{"id":18281,"date":"2020-04-09T18:23:30","date_gmt":"2020-04-09T09:23:30","guid":{"rendered":"https:\/\/www.skyarch.net\/blog\/?p=18281"},"modified":"2025-12-09T14:14:00","modified_gmt":"2025-12-09T05:14:00","slug":"aws-ecs-cluster-auto-scaling","status":"publish","type":"post","link":"https:\/\/www.skyarch.net\/blog\/en\/aws-ecs-cluster-auto-scaling\/","title":{"rendered":"[AWS ECS] Cluster Auto Scaling"},"content":{"rendered":"<p>This new capability offered by AWS improves the cluster scaling experience by increasing the speed and reliability of <em>cluster scale-out<\/em>, giving the user control over the amount of spare capacity maintained in the cluster, and automatically managing instance termination on <em>cluster scale-in<\/em>.<\/p>\n<div>Previously, the workaround for enabling auto scaling on an ECS Cluster was to create <em>CloudWatch Alarms<\/em>\u00a0on the <em>ECS Cluster CloudWatch Metrics<\/em>\u00a0(e.g., <em>MemoryUtilization<\/em>, <em>CPUUtilization<\/em>) and then attach those alarms to scaling policies on the <em>Auto Scaling Group<\/em> of the ECS Cluster. With this approach, you have to decide how many EC2 instances to add or remove every time one of those alarms is triggered. You can read more on that <a href=\"https:\/\/www.unicon.net\/insights\/blogs\/aws-ecs-auto-scaling-part2\">here<\/a>.<\/div>\n<h2>Capacity Provider<\/h2>\n<div>Returning to the new ECS Cluster Auto Scaling: to enable it, you need to create a new ECS resource type called a <strong>Capacity Provider<\/strong>. A <em>Capacity Provider<\/em> can be associated with an <em>EC2 Auto Scaling Group (ASG)<\/em>. 
Through the <em>ASG<\/em>, the ECS Cluster maintains the optimal number of container instances without you creating additional <em>CloudWatch Metric Alarms<\/em>\u00a0or specifying how many EC2 instances to add or remove on scale-out or scale-in.<\/div>\n<div><\/div>\n<div>In particular, a <em>Capacity Provider<\/em> offers <strong>Managed Scaling<\/strong>, which uses an automatically created scaling policy on the <em>ASG<\/em> and a new scaling metric (<em>Capacity Provider Reservation<\/em>) to keep the optimal number of instances in the cluster. It also offers <strong>Managed instance termination protection<\/strong>, which enables container-aware termination of instances in the <em>ASG<\/em>\u00a0on scale-in.<\/div>\n<h3>Creating Capacity Provider and ECS Cluster<\/h3>\n<div>To create a <em>Capacity Provider<\/em>, you first need an <em>ASG<\/em>. An <em>ASG<\/em> can be created through the <em>AWS console<\/em>, the <em>AWS CLI<\/em>, or the <em>AWS SDK<\/em>. An <em>ASG<\/em>, in turn, requires a <em>Launch Configuration<\/em> (or a launch template), so that needs to be created first.<\/div>\n<div><\/div>\n<div>In the AWS console, the <em>ASG<\/em> and <em>Launch Configuration<\/em> sections can be found in the <em>EC2<\/em> console. 
Using <em>CLI<\/em>:<\/div>\n<h4>Launch Configuration:<\/h4>\n<pre>aws autoscaling create-launch-configuration --cli-input-json file:\/\/&lt;launch-config-json-file&gt; --user-data file:\/\/&lt;user-data-txt-file&gt;<\/pre>\n<h4>Auto Scaling Group:<\/h4>\n<pre>aws autoscaling create-auto-scaling-group --auto-scaling-group-name &lt;asg-name&gt; --cli-input-json file:\/\/&lt;asg-config-json-file&gt;<\/pre>\n<h4>Capacity Provider:<\/h4>\n<pre>aws ecs create-capacity-provider --cli-input-json file:\/\/&lt;capacity-provider-config-json-file&gt;<\/pre>\n<div>Finally, create the <em>ECS Cluster<\/em>\u00a0with the <em>Capacity Provider<\/em>\u00a0attached:<\/div>\n<pre>aws ecs create-cluster --cluster-name &lt;cluster-name&gt; --capacity-providers &lt;capacity-provider-name&gt; --default-capacity-provider-strategy capacityProvider=&lt;capacity-provider-name&gt;,weight=1<\/pre>\n<div><strong>Note: You do not need to create a <em>Launch Configuration<\/em> and an <em>ASG<\/em> if you create the <em>ECS Cluster<\/em> using the <em>EC2 Linux + Networking<\/em> template, because AWS creates them in the background. All you have to do is create a <em>Capacity Provider<\/em> and connect the pre-configured ASG to it. This is NOT possible if you create an EMPTY cluster, since no ASG is created for it.<\/strong><\/div>\n<div><\/div>\n<h2>Fargate and EC2 ECS Launch Types Comparison<\/h2>\n<div>This section compares the <strong><em>Fargate<\/em> <span style=\"color: #ff0000\">(A)<\/span><\/strong> launch type with the <strong><em>EC2<\/em> launch type, both with <span style=\"color: #ff0000\">(B)<\/span><\/strong> and <strong>without <span style=\"color: #ff0000\">(C)<\/span> a <em>Capacity Provider<\/em><\/strong>.<\/div>\n<h3>Setup<\/h3>\n<div>Three clusters are prepared, one for each of the three configurations. 
All of them use the same task definition and container definition for the service.<\/div>\n<div><\/div>\n<div><\/div>\n<div><span style=\"color: #ff0000\">(A)<\/span> does not need any EC2 or container instance configuration at all, as the underlying infrastructure is managed by AWS.<\/div>\n<div><\/div>\n<div><\/div>\n<div>For <span style=\"color: #ff0000\">(B)<\/span>, the <em>launch configuration<\/em> and <em>ASG<\/em>\u00a0that were created along with the ECS Cluster are used, and a <em>Capacity Provider<\/em>\u00a0was added with the <em>Managed Scaling<\/em>\u00a0option enabled.<\/div>\n<div><\/div>\n<div><\/div>\n<div><span style=\"color: #ff0000\">(C)<\/span> needs additional configuration, such as the <em>CloudWatch Metric Alarms<\/em>\u00a0mentioned earlier. One (1) EC2 instance is added every time an alarm is triggered.<\/div>\n<div><\/div>\n<div><\/div>\n<div>Twenty (20) tasks are launched by updating the service (i.e., setting <em>Number of tasks = 20<\/em>).<\/div>\n<h3>Speed<\/h3>\n<div><span style=\"color: #ff0000\">(A)<\/span> is the fastest at launching the 20 tasks, since most of its resources are managed and optimized by AWS.<\/div>\n<div><\/div>\n<div><a href=\"https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/a.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-18277\" src=\"https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/a-300x69.png\" alt=\"Fargate Launch Type\" width=\"1005\" height=\"231\" srcset=\"https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/a-300x69.png 300w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/a-1024x234.png 1024w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/a-768x176.png 768w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/a-1536x351.png 1536w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/a-728x166.png 728w, 
https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/a.png 1649w\" sizes=\"auto, (max-width: 1005px) 100vw, 1005px\" \/><\/a><\/div>\n<div><\/div>\n<div>As shown in the image, it took only ~2 minutes for the 20 tasks to be launched and for the service to reach a steady state.<\/div>\n<div><\/div>\n<div><\/div>\n<div><span style=\"color: #ff0000\">(B)<\/span> comes in second, taking ~11 minutes to do the same job.<\/div>\n<div><\/div>\n<div><a href=\"https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/b.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-18278\" src=\"https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/b-300x85.png\" alt=\"EC2 Launch Type with Capacity Provider\" width=\"1002\" height=\"284\" srcset=\"https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/b-300x85.png 300w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/b-1024x291.png 1024w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/b-768x218.png 768w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/b-1536x436.png 1536w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/b-728x207.png 728w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/b.png 1669w\" sizes=\"auto, (max-width: 1002px) 100vw, 1002px\" \/><\/a><\/div>\n<div>As shown, the service repeatedly failed to place the remaining tasks because the available container instances (EC2 instances) had insufficient memory. Through the <em>Capacity Provider<\/em>, the <em>ASG<\/em>\u00a0launched the necessary number of additional EC2 instances to accommodate the remaining tasks.<\/div>\n<div><\/div>\n<div><\/div>\n<div><span style=\"color: #ff0000\">(C)<\/span> is the slowest of the three, mainly because it depends on the cluster's <em>CPUUtilization<\/em> and <em>MemoryUtilization<\/em>\u00a0metric alarms. 
Also, since only one instance is added per alarm trigger, it took a while for the cluster to reach the optimal number of EC2 instances.<\/div>\n<div><\/div>\n<div><a href=\"https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/c.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-18276\" src=\"https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/c-300x159.png\" alt=\"EC2 Launch Type without Capacity Provider\" width=\"996\" height=\"528\" srcset=\"https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/c-300x159.png 300w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/c-1024x542.png 1024w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/c-768x407.png 768w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/c-1536x814.png 1536w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/c-728x386.png 728w, https:\/\/www.skyarch.net\/blog\/wp-content\/uploads\/2020\/04\/c.png 1635w\" sizes=\"auto, (max-width: 996px) 100vw, 996px\" \/><\/a><\/div>\n<div>It took ~28 minutes for <span style=\"color: #ff0000\">(C)<\/span> to reach a steady state. This could be improved by tuning the <em>CloudWatch Alarms<\/em>\u00a0thresholds and\/or the number of instances added per alarm trigger.<\/div>\n<h3>Zero Scale<\/h3>\n<div><span style=\"color: #ff0000\">(A)<\/span> and <span style=\"color: #ff0000\">(B)<\/span> can both scale out from zero. 
For <span style=\"color: #ff0000\">(C)<\/span>, however, it needs at least one container instance for the <em>CloudWatch Metrics<\/em>\u00a0to record data; only then can it decide whether to add more instances or not.<\/div>\n<h2>References<\/h2>\n<ul>\n<li>https:\/\/aws.amazon.com\/blogs\/compute\/building-blocks-of-amazon-ecs\/<\/li>\n<li>https:\/\/aws.amazon.com\/about-aws\/whats-new\/2019\/12\/amazon-ecs-capacity-providers-now-available\/<\/li>\n<li>https:\/\/aws.amazon.com\/blogs\/aws\/aws-ecs-cluster-auto-scaling-is-now-generally-available\/<\/li>\n<li>https:\/\/aws.amazon.com\/blogs\/containers\/deep-dive-on-amazon-ecs-cluster-auto-scaling\/<\/li>\n<li>https:\/\/github.com\/aws\/containers-roadmap\/issues\/76<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>This new capability offered by AWS improves the cluster scaling experience by increasing the speed and reliabi&#8230;<\/p>\n","protected":false},"author":128,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_locale":"en_US","_original_post":"https:\/\/www.skyarch.net\/blog\/?p=18281","footnotes":""},"categories":[20],"tags":[1019,102,303,1018],"class_list":{"0":"post-18281","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-aws","7":"tag-autoscaling","8":"tag-aws","9":"tag-ecs","10":"tag-fargate","11":"en-US"},"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.skyarch.net\/blog\/wp-json\/wp\/v2\/posts\/18281","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.skyarch.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.skyarch.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.skyarch.net\/blog\/wp-json\/wp\/v2\/users\/128"}],"replies":[{"embeddable":true,"href":"https:\/\/www.skyarch.net\/blog\/wp-json\/wp\/v2\/comments?post=18281"}],"version-history":[{"count":5,"href":"https:\/\/www.skyarch.net\/b
log\/wp-json\/wp\/v2\/posts\/18281\/revisions"}],"predecessor-version":[{"id":30073,"href":"https:\/\/www.skyarch.net\/blog\/wp-json\/wp\/v2\/posts\/18281\/revisions\/30073"}],"wp:attachment":[{"href":"https:\/\/www.skyarch.net\/blog\/wp-json\/wp\/v2\/media?parent=18281"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.skyarch.net\/blog\/wp-json\/wp\/v2\/categories?post=18281"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.skyarch.net\/blog\/wp-json\/wp\/v2\/tags?post=18281"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}