Auto Scaling



CHAPTER 6


Auto Scaling


In this chapter, you will


•   Learn what Auto Scaling is


•   Understand the benefits of Auto Scaling


•   Understand various Auto Scaling policies


•   See how to set up Auto Scaling


•   Learn what Elastic Load Balancing is


•   Understand how Elastic Load Balancing works


•   See the various types of load balancing


Auto Scaling is the technology that allows you to scale your workload up and down automatically based on the rules you define. It is one of the innovations that makes the cloud elastic and lets you tailor capacity to your own requirements. With Auto Scaling, you don’t have to over-provision resources to meet peak demand. Auto Scaling spins up and configures new resources automatically and then takes them down when demand drops. In this chapter, you’ll learn all about the advantages of Auto Scaling.


On-premises deployments require customers to go through an extensive sizing exercise, essentially guessing at the resources required to meet peak workloads. Experience shows that it’s almost impossible to get the sizing estimates right. Most often, customers end up with underutilized resources during normal operation while still underestimating what peak workloads need. Other times customers plan for the peak by over-provisioning the resources. For example, you might provision all the hardware for Black Friday at the beginning of the year because that is when you get your capital budget. So, for the whole year those servers run at only, say, 15 to 20 percent CPU utilization and reach their peak only during the Black Friday sale. In this case, you have wasted a lot of compute capacity throughout the year that could have been used for some other purpose.


With Amazon Web Services (AWS), you have the ability to spin servers up when your workloads require additional resources and spin them back down when demand drops. You can set up rules with parameters to ensure your workloads have the right resources.


You can integrate Auto Scaling with Elastic Load Balancing; by doing so you can distribute the workload across multiple EC2 servers.





CAUTION    People often think about Auto Scaling as Auto Scaling for EC2 servers, but now Auto Scaling is available for many AWS products, so you should not restrict your thoughts to EC2 only.


Benefits of Auto Scaling


These are the main benefits of Auto Scaling:


•   Dynamic scaling The biggest advantage of Auto Scaling is the dynamic scaling of resources based on demand. Within the service limits of your account, you can scale from two servers to hundreds, thousands, or even tens of thousands of servers in a short time. Using Auto Scaling, you can make sure that your application always performs optimally and gets additional horsepower in terms of CPU and other resources whenever needed. You are able to provision them in real time.


•   Best user experience and performance Auto Scaling helps to provide the best possible experience for your users because you never run out of resources and your application always performs optimally. You can create various rules within Auto Scaling to provide the best user experience. For example, you can specify that if the CPU utilization increases to more than 70 percent, a new instance is started.


•   Health check and fleet management You can monitor the health of your Elastic Compute Cloud (EC2) instances using Auto Scaling. If you are hosting your application on a group of EC2 servers, the collection of those EC2 servers is called a fleet. You can configure health checks with Auto Scaling, and if a health check detects a failure on an instance, Auto Scaling automatically replaces the instance. This takes a lot of burden off you because you no longer have to replace the failed instance manually. It also helps to maintain the desired fleet capacity. For example, if your application is running on six EC2 servers, you will be able to maintain the fleet of six EC2 servers no matter how many times there is an issue with an EC2 server. In other words, if one or more servers go down, Auto Scaling will start replacement servers to make sure you always have six instances running. When you configure Auto Scaling with Elastic Load Balancing (ELB), it is capable of using ELB health checks as well. There are various kinds of health checks ELB can do, such as for hardware failure, system performance degradation, and so on. Detecting these failures on the fly while always maintaining a constant fleet of resources is really painful in the on-premises world. With AWS, everything is taken care of for you automatically.


•   Load balancing Since Auto Scaling is used to dynamically scale up and down the resources, it can take care of balancing the workload across multiple EC2 instances when you use Auto Scaling along with ELB. Auto Scaling also automatically balances the EC2 instances across multiple AZs when multiple AZs are configured. Auto Scaling makes sure that there is a uniform balance of EC2 instances across multiple AZs that you define.


•   Target tracking You can have Auto Scaling track a particular target, and Auto Scaling adjusts the number of EC2 instances for you in order to meet that target. The target can be any scaling metric that Auto Scaling supports. For example, if you always want the CPU utilization of your application servers to remain at 65 percent, Auto Scaling will increase and decrease the number of EC2 instances automatically to keep CPU utilization at 65 percent.


•   Cost control Using Auto Scaling, you can also automatically remove the resources you don’t need in order to avoid overspending. For example, in the evening when the users leave, Auto Scaling will remove the excess resources automatically. This helps in keeping the budget under control.


•   Predictive scaling Auto Scaling is now integrated with machine learning (ML), and by using ML, Auto Scaling can automatically scale your compute capacity in advance based on a predicted increase in demand. Auto Scaling collects data from your actual EC2 usage and then uses machine learning models to predict your expected daily and weekly traffic. The data is evaluated every 24 hours to create a forecast for the next 48 hours.


Auto Scaling is most popular for EC2, but it can also be used to scale some other services. You can use Application Auto Scaling to define scaling policies that scale these resources up and down. Here are the other services where Auto Scaling can be used:


•   EC2 spot instances


•   Amazon Elastic Container Service (ECS)


•   Elastic MapReduce (EMR) clusters


•   AppStream 2.0 instances


•   Amazon Aurora Replicas


•   DynamoDB


Let’s see how Auto Scaling works in real life. Say you have an application that consists of two web servers hosted on two separate EC2 instances. To maintain high availability, you have placed the web servers in different Availability Zones (AZs). You have integrated both web servers with ELB, and the users connect to the ELB. The architecture will look something like Figure 6-1.





Figure 6-1 Application with two web servers in two different AZs


Everything is going well when all of a sudden you notice an increase in web traffic. To meet the additional traffic, you provision two additional web servers and integrate them with ELB, as shown in Figure 6-2. Up to this point you are doing everything manually, which includes adding web servers and integrating them with ELB. Also, if your traffic goes down, you need to bring down the instances manually, since keeping them running costs you more. This is fine and manageable when you have a small number of servers to manage and you can predict the traffic. What if you have hundreds or thousands of servers hosting the application? What if the traffic is totally unpredictable? Can you still add hundreds or thousands of servers almost instantly and then integrate each one of them with ELB? What about taking those servers down? Can you do it quickly? Not really. Auto Scaling solves this problem for you.





Figure 6-2 Adding two additional web servers to the application


When you use Auto Scaling, you simply add the EC2 instances to an Auto Scaling group, define the minimum and maximum number of servers, and then define the scaling policy. Auto Scaling takes care of adding and deleting the servers and integrating them with ELB based on the usage. When you integrate Auto Scaling, the architecture looks something like Figure 6-3.





Figure 6-3 Adding all four web servers as part of Auto Scaling


Scaling Plan


The first step in using Auto Scaling is to create a scaling plan. By using a scaling plan, you can configure and manage the scaling for the AWS resources you are going to use along with Auto Scaling. The scaling plan can be applied to all the supported Auto Scaling resources. The following sections outline the steps to create a scaling plan.


Identify Scalable Resources


You can automatically discover or manually choose the resources you want to use with your Auto Scaling plan. This can be done in three different ways:


•   Search via CloudFormation stack You can select an existing AWS CloudFormation stack to have AWS Auto Scaling scan it for resources that can be configured for automatic scaling. AWS Auto Scaling only finds resources that are defined in the selected stack. It does not traverse through nested stacks. The stack must be successfully created and cannot have an operation in progress.


•   Search by tag You can also use tags to find the following resources:


    •   Aurora DB clusters


    •   Auto Scaling groups


    •   DynamoDB tables and global secondary indexes


    When you search by more than one tag, each resource must have all of the listed tags to be discovered.


•   EC2 Auto Scaling groups You can choose one or more Auto Scaling groups to add to your scaling plan. EC2 Auto Scaling is covered in detail later in this chapter. Figure 6-4 shows all the options for finding a scalable resource from the console.





Figure 6-4 Finding a scalable resource
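The rule that a resource must carry all of the listed tags can be illustrated with a short sketch. The resource names and tags below are hypothetical, and the functions only mimic the matching behavior described above; they are not part of any AWS API.

```python
def matches_all_tags(resource_tags, required_tags):
    """Return True only if the resource carries every required key/value pair."""
    return all(resource_tags.get(key) == value
               for key, value in required_tags.items())


def discover(resources, required_tags):
    """List the names of the resources that match every required tag."""
    return [name for name, tags in resources.items()
            if matches_all_tags(tags, required_tags)]


# Hypothetical resources and their tags.
resources = {
    "orders-asg": {"env": "prod", "team": "web"},
    "orders-table": {"env": "prod"},
    "staging-asg": {"env": "staging", "team": "web"},
}

# Searching by two tags finds only the resource that has both of them.
print(discover(resources, {"env": "prod", "team": "web"}))
```

Searching on the single tag env=prod would also return orders-table, which shows how each extra tag narrows the result.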


Specify Scaling Strategy


Once you identify the resource you are going to use with Auto Scaling, the next step is to specify a scaling strategy by which the resource will scale up and down. There are four different ways by which you can create a scaling strategy:


•   Optimize for Availability With this option, Auto Scaling automatically scales the resources in and out to make sure they are always available. The CPU/resource utilization is kept at 40 percent.


•   Balance Availability and Cost This option keeps a uniform balance between the availability and the cost. Here, the CPU/resource utilization is kept at 50 percent in order to maintain the perfect balance between the availability and cost.


•   Optimize for Cost As the name suggests, the goal of this option is to lower the cost; hence, the CPU/resource utilization is kept at 70 percent. This option is very useful for non-production environments where performance is not critical.


•   Custom Scaling Strategy Using this option, you can choose your own scaling metric if the off-the-shelf strategy doesn’t meet your requirements. Here, you can decide your own CPU/resource utilization value. The various options related to scaling strategy from the AWS console are shown in Figure 6-5.





Figure 6-5 Choosing a scaling strategy
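To see what the three predefined targets mean in practice, the following sketch estimates how many instances each strategy would keep running for the same workload. The utilization targets come from the text; the strategy keys, the function names, and the idea of expressing load as a sum of per-instance CPU percentages are assumptions made for illustration.

```python
import math

# Target utilization per strategy, as described above.
STRATEGY_TARGETS = {
    "optimize_for_availability": 40.0,
    "balance_availability_and_cost": 50.0,
    "optimize_for_cost": 70.0,
}


def target_utilization(strategy, custom_target=None):
    """Return the utilization target (percent) for a strategy."""
    if strategy == "custom":
        if custom_target is None or not 0 < custom_target <= 100:
            raise ValueError("a custom strategy needs a target between 0 and 100")
        return float(custom_target)
    return STRATEGY_TARGETS[strategy]


def instances_needed(total_load, strategy, custom_target=None):
    """Instances required so that average utilization stays at the target.

    total_load is the summed CPU percentage across the fleet; for example,
    four instances at 70 percent represent a load of 280.
    """
    return math.ceil(total_load / target_utilization(strategy, custom_target))


for name in STRATEGY_TARGETS:
    print(name, instances_needed(280, name))
```

A lower target leaves more headroom on every instance but requires a larger fleet for the same load, which is the availability versus cost trade-off the strategies encode.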





TIP    As a solutions architect, you will be dealing with various kinds of workloads. You will notice that scaling strategies for one particular workload will be different from another workload. Therefore, you should experiment with scaling strategies for each workload and then come up with the correct one.


While choosing a scaling strategy, you can also enable predictive scaling and dynamic scaling. If you enable predictive scaling, machine learning is used to analyze the historical workload and then forecast the future workload. Predictive scaling makes sure you have the resource capacity provisioned before your application demands it. If you enable dynamic scaling, target tracking scaling policies are created for the resources in your scaling plan. For example, via dynamic scaling, you can specify that the EC2 servers should run at 60 percent CPU utilization. Then, whenever the CPU utilization rises above 60 percent, the scaling policy is triggered and adds capacity. In this way, the scaling policy adjusts resource capacity in response to live changes in resource utilization.


Using EC2 Auto Scaling


Auto Scaling is most popular with EC2 instances. In this section, we are going to cover in detail how to set up Auto Scaling with an EC2 instance. The concepts for Auto Scaling we have discussed previously apply here as well. In the case of EC2 Auto Scaling, the resource is an EC2 server only. Let’s look in detail at all the steps required to use EC2 Auto Scaling. The first step in this process is to create a launch configuration.


Launch Configuration


When you use Auto Scaling to scale up your instances, it needs to know what kind of server to use. You can define this by creating a launch configuration. A launch configuration is a template that stores all the information about the instance, such as the AMI (Amazon machine image) details, instance type, key pair, security group, IAM (Identity and Access Management) instance profile, user data, storage attached, and so on.


Once you create a launch configuration, you can link it with an Auto Scaling group. You can use the same launch configuration with multiple Auto Scaling groups, but an Auto Scaling group has only one launch configuration attached to it at a time. You will learn about Auto Scaling groups in the next section. You can’t edit a launch configuration once it has been created; to change the settings, you create a new launch configuration and associate the Auto Scaling group with it. Subsequent instances are launched as per the new launch configuration; instances that are already running are not affected.


For example, say you have created a launch configuration with C4 large instances and attached it to your Auto Scaling group. You launch four C4 large instances as part of the initial launch. Then you create a new launch configuration that specifies C4 extra-large instances and associate it with the Auto Scaling group in place of the old one. From now on, any new instances that spin up will be C4 extra-large. Say a scaling rule kicks in and the Auto Scaling group starts two more instances; both will be C4 extra-large. Now you are running a fleet with four C4 large and two C4 extra-large instances. If one of the C4 large instances goes down because of a hardware fault, the replacement instance that the Auto Scaling group launches will be a C4 extra-large and not a C4 large, since the group no longer references a launch configuration with C4 large instances.
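The C4 example can be traced with a small simulation. This is only a model of the behavior described above, not AWS code: the AutoScalingGroup class is hypothetical, and a plain string stands in for the launch configuration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AutoScalingGroup:
    """Toy model: the group remembers only its current launch configuration."""
    launch_instance_type: str
    instances: list = field(default_factory=list)

    def launch(self, count=1):
        # New instances always use whatever configuration is attached now.
        self.instances.extend([self.launch_instance_type] * count)

    def replace_failed(self, index):
        # The failed instance is removed and one replacement is launched.
        self.instances.pop(index)
        self.launch(1)


group = AutoScalingGroup("c4.large")
group.launch(4)                            # initial fleet: four c4.large
group.launch_instance_type = "c4.xlarge"   # associate the new configuration
group.launch(2)                            # scale-out adds two c4.xlarge
group.replace_failed(0)                    # a c4.large fails and is replaced

fleet = Counter(group.instances)
print(fleet)
```

After the replacement, the fleet holds three instances of each type, because the group has no record of the older configuration.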





TIP    You can save and reuse the launch configuration. (For example, you can use the launch configuration of the production environment for building a test environment.)


Auto Scaling Groups


An Auto Scaling group is the place where you define the logic for scaling up and scaling down. It has all the rules and policies that govern how the EC2 instances will be terminated or started. Auto Scaling groups are the collection of all the EC2 servers running together as a group and dynamically going up or down as per your definitions. When you create an Auto Scaling group, first you need to provide the launch configuration that has the details of the instance type, and then you need to choose the scaling plan or scaling policy. You can scale in the following ways:


•   Maintaining the instance level This is also known as the default scaling plan. In this scaling policy, you define the number of instances you always want to operate with, either as a minimum or as a specific number of servers that must be running all the time. The Auto Scaling group makes sure you are always running that many instances. For example, if you define that you are always going to run six instances, then whenever an instance goes down because of a hardware failure or any other issue, the Auto Scaling group spins up a new server, making sure you are always operating with a fleet of six servers.


•   Manual scaling You can also scale up or down manually via the console, the API, or the CLI. When you scale manually, you add or terminate the instances yourself. Manual scaling should be your last resort, since Auto Scaling provides so many ways of automating your scaling. If you still scale manually, you are defeating the purpose of the Auto Scaling setup.


•   Scaling as per the demand Another usage of Auto Scaling is to scale to meet the demand. You can scale according to various CloudWatch metrics such as CPU utilization, disk reads, disk writes, network in, network out, and so on. For example, you can have a rule that says if CPU utilization stays above 80 percent for more than five minutes, Auto Scaling will spin up a new server for you. When you define scaling policies this way, you must define two policies, one for scaling up and the other for scaling down.


•   Scaling as per schedule If your traffic is predictable and you know that you are going to have an increase in traffic during certain hours, you can have a scaling policy as per the schedule. For example, your application may have heaviest usage during the day and hardly any activity at night. You can scale the application to have more web servers during the day and scale down during the night. To create an Auto Scaling policy for scheduled scaling, you need to create a scheduled action that tells the Auto Scaling group to perform the scaling action at the specified time.
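The day and night example for scheduled scaling can be sketched as a lookup of the most recent scheduled action. The times and capacities below are hypothetical, and the function models only the idea of a scheduled action; it does not call the Auto Scaling API.

```python
from datetime import time

# Hypothetical scheduled actions: (start time, desired capacity).
SCHEDULED_ACTIONS = [
    (time(8, 0), 6),    # scale out for daytime traffic
    (time(20, 0), 2),   # scale in for the night
]


def desired_capacity_at(now, actions):
    """Return the capacity set by the latest action at or before `now`."""
    ordered = sorted(actions)
    # Before the first action of the day, the last action of the
    # previous day is still in effect.
    capacity = ordered[-1][1]
    for start, scheduled_capacity in ordered:
        if start <= now:
            capacity = scheduled_capacity
    return capacity


print(desired_capacity_at(time(12, 30), SCHEDULED_ACTIONS))
print(desired_capacity_at(time(3, 0), SCHEDULED_ACTIONS))
```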


To create an Auto Scaling group, you need to provide the minimum number of instances that must be running at any time. You also need to set the maximum number of instances to which the group can scale. Optionally, you can set a desired capacity, which is the number of instances the group should ideally be running. Auto Scaling then acts as follows:


•   If the desired capacity is greater than the current capacity, then launch instances.


•   If the desired capacity is less than the current capacity, then terminate instances.
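The two rules above amount to comparing the desired capacity with the current capacity. A minimal sketch follows; the function name is hypothetical, and clamping the desired value to the minimum and maximum is a simplifying assumption of this sketch rather than a statement about how the service validates input.

```python
def reconcile(current, desired, minimum, maximum):
    """Decide what to do so that the group reaches its desired capacity."""
    # Assumption for this sketch: keep the desired value within the bounds.
    desired = max(minimum, min(desired, maximum))
    delta = desired - current
    if delta > 0:
        return ("launch", delta)
    if delta < 0:
        return ("terminate", -delta)
    return ("none", 0)


print(reconcile(current=4, desired=6, minimum=2, maximum=10))
print(reconcile(current=6, desired=3, minimum=2, maximum=10))
```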


It is important that you know when the Auto Scaling group is increasing or decreasing the number of servers for your application. To do so, you can configure Amazon Simple Notification Service (SNS) to send an SNS notification whenever your Auto Scaling group scales up or down. Amazon SNS can deliver notifications as HTTP or HTTPS POST, as an e-mail, or as a message posted to an Amazon SQS queue.
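As a rough sketch of how such a notification could be set up programmatically, the function below builds the request parameters for the put_notification_configuration operation of the Auto Scaling API. The group name and topic ARN are placeholders, and the sketch only assembles a dictionary; it makes no call to AWS.

```python
def notification_request(group_name, topic_arn):
    """Build parameters for an Auto Scaling notification configuration."""
    return {
        "AutoScalingGroupName": group_name,
        "TopicARN": topic_arn,
        # Notify on both scale-out (launch) and scale-in (terminate) events.
        "NotificationTypes": [
            "autoscaling:EC2_INSTANCE_LAUNCH",
            "autoscaling:EC2_INSTANCE_TERMINATE",
        ],
    }


# Placeholder names; substitute your own group and SNS topic.
params = notification_request(
    "web-asg",
    "arn:aws:sns:us-east-1:123456789012:asg-events",
)
print(params)
```

Assuming boto3 is installed and credentials are configured, the parameters could then be passed as boto3.client("autoscaling").put_notification_configuration(**params).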


There are some limits to how many Auto Scaling groups you can have. Since the number keeps on changing, it is recommended that you check the AWS web site for the latest numbers. All these numbers are soft limits, which can be increased with a support ticket.


Please note that an Auto Scaling group cannot span regions; it can be part of only one region. However, it can span multiple AZs within a region. By doing so, you can achieve a high-availability architecture.


It is recommended that you use the same instance type throughout an Auto Scaling group, since load is distributed most effectively when the instances are of the same type. However, if you switch to a launch configuration with a different instance type, all the new instances that are started will be of the new type, and the group will run a mix of types.


Let’s talk about the scaling policy types in more detail. You can have three types of scaling policies.


Simple Scaling


Using simple scaling, you can scale up or down on the basis of only one scaling adjustment. In this mechanism, you select an alarm, which can be based on CPU utilization, disk read, disk write, network in, network out, and so on, and then scale the instances up or down when that particular alarm occurs. For example, if the CPU utilization reaches 80 percent, you can add one more instance, or if the CPU utilization drops below 40 percent, you can take one instance down. You can also define how long to wait after a scaling activity before the next one can start. This waiting period is called the cooldown period. When you create a simple scaling policy, you need to create two policies, one for scaling up or increasing the group size and another for scaling down or decreasing the group size. Figure 6-6 shows what a simple scaling policy looks like.





Figure 6-6 Simple scaling policy


If you look at Figure 6-6, you will notice the policy is executed when the alarm occurs, so the first step is to create an alarm. By clicking Add New Alarm, you can create a new alarm in which you specify whom to notify and the scaling conditions. Figure 6-7 shows an alarm that sends a notification to the admin when CPU utilization is at 50 percent or higher for one consecutive period of five minutes.





Figure 6-7 Creating an alarm


Once you create the alarm, you need to define the action that adds an EC2 instance for scaling up and removes an EC2 instance for scaling down; then you enter how long to wait before the next scale-up or scale-down activity can happen, as shown in Figure 6-8. If you look at the top of Figure 6-8, you will see that the group is configured to run between one and six instances; therefore, the maximum number of instances it can scale up to is six.





Figure 6-8 Simple scaling policy with all the parameters
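The interplay of the two policies and the cooldown period can be sketched as a small simulation. The thresholds of 80 and 40 percent and the limit of six instances come from the text; the function, the sample data, and the 300-second cooldown are assumptions made for illustration.

```python
def run_simple_scaling(samples, capacity, min_size, max_size,
                       high=80.0, low=40.0, cooldown=300):
    """Replay (timestamp_seconds, cpu_percent) samples against two policies.

    One policy adds an instance at or above `high`; the other removes one
    below `low`. No action is taken while the cooldown is still running.
    """
    last_action = None
    history = []
    for timestamp, cpu in samples:
        in_cooldown = (last_action is not None
                       and timestamp - last_action < cooldown)
        if not in_cooldown:
            if cpu >= high and capacity < max_size:
                capacity += 1
                last_action = timestamp
            elif cpu < low and capacity > min_size:
                capacity -= 1
                last_action = timestamp
        history.append(capacity)
    return history


samples = [(0, 85), (60, 90), (300, 88), (600, 30), (660, 20), (900, 35)]
history = run_simple_scaling(samples, capacity=2, min_size=1, max_size=6)
print(history)
```

The samples at 60 and 660 seconds breach a threshold but change nothing, because each arrives within 300 seconds of the previous scaling activity.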


Simple Scaling with Steps


With simple scaling, as we have discussed, you can scale up or down based on the occurrence of an event, and every time Auto Scaling performs the same action. Sometimes you might need finer-grained control. For example, let’s say you have defined a policy that says when the CPU utilization is more than 50 percent, add another instance. With steps, you can instead specify that when the CPU utilization is between 50 percent and 60 percent, add two more instances, and when the CPU utilization is 60 percent or more, add four more instances. If you want to do this kind of advanced configuration, simple scaling with steps is the solution. With simple scaling with steps, you do everything just like simple scaling, but at the end you add a few more steps. Figure 6-8 showed the option Creating A Scaling Policy With Steps. Once you click this, the Add Step button is enabled, and from there you can define the additional steps, as shown in Figure 6-9.





Figure 6-9 Simple scaling with steps
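The stepped policy from the example can be written as a table of ranges. This is a minimal sketch: the step boundaries and adjustments come from the text, while the tuple layout and the function name are assumptions.

```python
# (lower bound inclusive, upper bound exclusive or None, instances to add)
STEPS = [
    (50.0, 60.0, 2),   # CPU between 50 and 60 percent: add two instances
    (60.0, None, 4),   # CPU at 60 percent or more: add four instances
]


def step_adjustment(cpu, steps):
    """Return how many instances to add for the observed CPU utilization."""
    for lower, upper, adjustment in steps:
        if cpu >= lower and (upper is None or cpu < upper):
            return adjustment
    return 0  # below the first step, the alarm has not been breached


for cpu in (45, 55, 75):
    print(cpu, step_adjustment(cpu, STEPS))
```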


When you are doing the scaling up or down using simple scaling or simple scaling with steps, you can change the capacity in the following ways:


•   Exact capacity You can provide the exact capacity to increase or decrease. For example, if the current capacity of the group is two instances and the adjustment is four, Auto Scaling changes the capacity to four instances when the policy is executed.


•   Change in capacity You can increase or decrease the current capacity by providing a specific number. For example, if the current capacity of the group is two instances and the adjustment is four, Auto Scaling changes the capacity to six instances when the policy is executed.


•   Percentage change in capacity You can also increase or decrease the current capacity by a percentage. For example, if your current capacity is 10 instances and the adjustment is 20 percent, Auto Scaling adds two more instances when the policy runs, making a total of 12. Because the adjustment is a percentage, the result will not always be a whole number, and Auto Scaling rounds it as follows. Values greater than 1 are rounded down; for example, 13.5 is rounded to 13. Values between 0 and 1 are rounded to 1; for example, .77 is rounded to 1. Values between 0 and –1 are rounded to –1; for example, –.72 is rounded to –1. Values less than –1 are rounded up; for example, –8.87 is rounded to –8.
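The three adjustment types and the rounding rules can be checked with a short sketch. The function names and the type labels "exact", "change", and "percent" are made up for this example; the arithmetic follows the rules listed above.

```python
import math


def round_percent_change(value):
    """Round a fractional instance count using the rules described above."""
    if value >= 1:
        return math.floor(value)   # 13.5 becomes 13
    if 0 < value < 1:
        return 1                   # 0.77 becomes 1
    if -1 < value < 0:
        return -1                  # -0.72 becomes -1
    if value <= -1:
        return math.ceil(value)    # -8.87 becomes -8
    return 0


def apply_adjustment(current, adjustment_type, adjustment):
    """Return the new capacity after applying one scaling adjustment."""
    if adjustment_type == "exact":
        return adjustment
    if adjustment_type == "change":
        return current + adjustment
    if adjustment_type == "percent":
        return current + round_percent_change(current * adjustment / 100)
    raise ValueError("unknown adjustment type: " + adjustment_type)


print(apply_adjustment(2, "exact", 4))
print(apply_adjustment(2, "change", 4))
print(apply_adjustment(10, "percent", 20))
```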


Target-Tracking Scaling Policies


You can configure dynamic scaling using target-tracking scaling policies. In this policy, you either select a predefined metric or choose your own metric and then set a target value for it. For example, you can choose a metric of CPU utilization and set the target value to 50 percent. When you create a policy like this, Auto Scaling automatically scales the EC2 instances up or down to keep CPU utilization at or near 50 percent. Internally, Auto Scaling creates and monitors the CloudWatch alarms that trigger the scaling policy. Once an alarm is triggered, Auto Scaling calculates how many instances it needs to add or remove to meet the target and adjusts the group automatically.
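One way to picture the calculation is a proportional estimate: if the total load stays the same, spreading it across more or fewer instances moves the average toward the target. The first function below is a simplified model under that assumption and is not the algorithm the service uses internally. The second function builds request parameters for the put_scaling_policy operation with a predefined CPU metric; the policy name and group name are placeholders.

```python
import math


def target_tracking_capacity(current_capacity, metric_value, target_value,
                             min_size, max_size):
    """Estimate the capacity that brings the average metric to the target."""
    needed = math.ceil(current_capacity * metric_value / target_value)
    # The result is kept within the group's minimum and maximum size.
    return max(min_size, min(needed, max_size))


def target_tracking_request(group_name, target_cpu):
    """Build parameters for a target-tracking policy on average CPU."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "keep-cpu-at-target",   # placeholder name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu),
        },
    }


# Four instances at 75 percent CPU with a 50 percent target.
print(target_tracking_capacity(4, 75.0, 50.0, min_size=1, max_size=10))
policy = target_tracking_request("web-asg", 50)
print(policy["TargetTrackingConfiguration"]["TargetValue"])
```

Assuming boto3 is installed and credentials are configured, the dictionary could be passed as boto3.client("autoscaling").put_scaling_policy(**policy).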


Termination Policy


Auto Scaling allows you to scale up as well as scale down. When you scale down, your EC2 instances are terminated; therefore, it is important to shut them down in a graceful manner so that you have better control. You can decide exactly how the EC2 servers are terminated when you have to scale down. Say, for example, that you are running a total of six EC2 instances across two AZs. In other words, there are three instances in each AZ. Now you want to terminate one instance. Since in this case the instances are evenly balanced across the two AZs, terminating any one instance from either AZ is fine. If you have to terminate two instances, it is important to shut down one instance from each AZ so that the configuration stays balanced. If you terminated two servers from a single AZ, you would have three servers running in one AZ and one server running in the other, which is an unbalanced configuration.
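The balancing argument can be sketched by always removing an instance from the AZ that currently holds the most instances. The AZ names are examples, and the alphabetical tie-break is an assumption added to make the sketch deterministic.

```python
from collections import Counter


def pick_az_for_termination(instance_azs):
    """Choose the AZ with the most instances (ties broken alphabetically)."""
    counts = Counter(instance_azs)
    return min(counts, key=lambda az: (-counts[az], az))


def scale_in(instance_azs, count):
    """Terminate `count` instances while keeping the AZs balanced."""
    remaining = list(instance_azs)
    for _ in range(count):
        remaining.remove(pick_az_for_termination(remaining))
    return remaining


# Six instances, three in each of two AZs.
fleet = ["us-east-1a"] * 3 + ["us-east-1b"] * 3
after = Counter(scale_in(fleet, 2))
print(after)
```

Terminating two instances leaves two in each AZ rather than three in one and one in the other.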


You can configure termination policies to terminate an instance. The termination policy determines which EC2 instance you are going to shut down first. When you terminate a machine, it deregisters itself from the load balancer, if any, and then it waits for the grace period, if any, so that any connections opened to that instance will be drained. Then the policy terminates the instance.


There are multiple ways to define termination policies. One way is to determine the longest-running server in your fleet and terminate it. The advantage of this is that the server that has been running the longest is the most likely to be missing patches, to be suffering from memory leaks, and so on.


You can also terminate the servers that are closest to the next billing hour. By terminating these servers, you extract the maximum benefit from the Auto Scaling feature. For example, if you have two servers and one of them has been running for just 5 minutes and the other for around 54 minutes, terminating the one that has been running for 54 minutes gives you more value for the money, because you have already used most of the hour you are paying for.
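Both selection rules can be expressed in a few lines. This is a sketch under hourly billing with hypothetical instance IDs and uptimes; the function names are not AWS termination policy names.

```python
def oldest_instance(uptime_minutes):
    """Pick the instance that has been running the longest."""
    return max(uptime_minutes, key=uptime_minutes.get)


def closest_to_billing_hour(uptime_minutes):
    """Pick the instance with the fewest minutes left in its paid hour."""
    return min(uptime_minutes,
               key=lambda instance: 60 - (uptime_minutes[instance] % 60))


# Hypothetical instance IDs mapped to minutes of uptime.
fleet = {"i-aaa": 5, "i-bbb": 54, "i-ccc": 600}

print(oldest_instance(fleet))
print(closest_to_billing_hour(fleet))
```

The two rules can disagree: the oldest instance here has just started a new billing hour, so the billing-hour rule picks the 54-minute instance instead.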





NOTE    AWS has now moved to a new billing model that is based on paying per second for certain instance types along with paying per hour. You should be aware of both concepts.


Aug 1, 2021 | Posted by in Building and Construction | Comments Off on Auto Scaling