Auto-scaling in Practice

Özgür Bal | Autoscaling


How do we do auto-scaling in practice? Where do we begin, and what changes, if any, do we need to make to our application? What else do we need to consider? This is the third post in the series on auto-scaling. Auto-scaling, a primer started off with the basics: it described the problem auto-scaling solves, introduced some terms, and motivated why it matters. Advanced predictive and proactive auto-scaling described how we can use predictions about future capacity demand to get even better auto-scaling behavior, and presented how the Elastisys cloud platform’s predictive and proactive auto-scaler works. In this post, we present a checklist of general questions that need answers before we can auto-scale an application. With that information at hand, we then describe how to use the Elastisys cloud platform’s auto-scaler in City Cloud.

Auto-scaling checklist

Before we can configure auto-scaling for any application, we need to go through a checklist and answer the following questions:

  1. What component(s) of the application do we want to auto-scale?
  2. What capacity/size should each server instance have?
  3. How do we automate making a new server instance operational?
  4. How do server instances find each other (service discovery) and synchronize/collaborate?
  5. What metric do we use for determining when to scale?
  6. What is a single server instance’s capacity (in terms of our monitored metric(s))?
  7. What limits on deployment size do we have?

Read on for additional insight into each of these questions.

1. What component(s) of the application do we want to auto-scale?

Applications are typically divided into well-defined components. If you install the WordPress blogging engine on a server, the WordPress software itself does not store the contents of your blog posts; it asks a database server (such as MySQL or MariaDB) to do that. Similarly, it does not concern itself with how to actually communicate its data over HTTP to the web browsers of your visitors; it asks a web server (such as Apache or NGINX) to do that.

Two aspects matter a lot to auto-scaling: how well these components scale horizontally (by adding and removing instances) and how they are deployed. Deploying all components on a single server is nice from a performance point of view (no network delays between components!), but really bad from a horizontal scaling perspective. The reasons are many, but the two main ones are that:

  • if the application is suffering because there are not enough instances of one component, scaling up the others as well does not make sense, neither performance-wise nor cost-efficiency-wise; and that
  • if we simply clone the server instance, we will also need to perform considerable configuration updates to make sure that the service as a whole is kept in sync.

Our deployment strategy should only put components on a single server instance if it makes sense from a horizontal scaling point of view. In our WordPress example, this means that we place WordPress bundled with a web server on a server instance, but keep the database on a separate set of servers. If we are using memcached to speed up the blog and offload the database, we keep that on a separate set as well.


Figure 1. Single all-in-one Linux-Apache-MariaDB-PHP (LAMP) server versus a fully scalable WordPress install, where each component class can be horizontally scaled independently.

2. What capacity/size should each server instance have?

Now that we know what components to run on each server, we need to figure out how such servers are started. We need to define the capacity requirements so we know which size of VM or container we need to provision from the cloud provider. Which VM or container size you need relates to what software you need to run, and how much additional work synchronizing instances requires. If adding a new instance requires very little in terms of synchronization, you can typically add many smaller instances. If each new instance needs to synchronize a lot with its peers, as in a distributed database, you will likely want to keep the number of instances lower, and instead opt for giving each plenty of capacity.

3. How do we automate making a new server instance operational?

Next, we need to figure out how each new server should configure itself once it boots up. With auto-scaling, server instances will come and go at all times during the day. We need to automate that process, so it does not require a human system administrator logging in and taking care of each server instance. We can either do this via a simple shell script, or via a more complex but more flexible configuration management tool such as Chef, Puppet, Salt, or Ansible.
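As an illustration, a minimal boot script (passed, for instance, as cloud-init user data) for a web server instance might look like the following. The package names and commands assume a Debian/Ubuntu image, and the whole thing is a sketch rather than a production-ready script:

```shell
#!/bin/bash
# Minimal boot-time provisioning sketch for a Debian/Ubuntu web server.
# Runs once when the instance starts, e.g. as cloud-init user data.
set -e                      # abort on the first failed step
apt-get update              # refresh package lists
apt-get install -y apache2  # install and start the web server
# Steps to register with a load balancer or to synchronize with
# peers would follow here.
```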

Regardless of which configuration approach we choose, we typically need to install software, and configure the software to either register with some registry (e.g. a load balancer) or synchronize with its peers.

4. How do server instances find each other and synchronize/collaborate?

When a new server instance starts, the components deployed in it must make their existence known to the rest of the components in some way. In the case of a web server, that typically means registering with a load balancer. In the case of a database server, that typically means contacting a master node and becoming a member with a suitable role. The appropriate steps must be added to the boot script or the configuration management software, so a new instance becomes a fully operational member in whatever group it belongs to.
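As a sketch, registering a freshly booted web server with a load balancer could be a single API call at the end of the boot script. The endpoint, port, and payload below are entirely hypothetical, invented for illustration; the real interface depends on your load balancer’s documentation:

```shell
#!/bin/bash
# Hypothetical registration step: tell the load balancer about this
# new backend instance. The URL and JSON format are made up.
BACKEND_IP="10.0.0.5"   # in a real script: $(hostname -I | awk '{print $1}')
PAYLOAD="{\"host\": \"${BACKEND_IP}\", \"port\": 80}"
echo "$PAYLOAD"
# The actual registration call might then look like:
# curl -s -X POST "http://lb.internal:9000/backends" \
#      -H "Content-Type: application/json" -d "$PAYLOAD"
```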

Some less than obvious pitfalls exist here, too. Again, let’s use WordPress as an example (it is, after all, rather popular, powering some 60 million blogs out there!). Textual content (pages, blog posts, comments, etc.) is stored in the database, but when you upload a media file (such as an image) to go with your blog entry, it is stored on the instance’s local disk. So if we just add a bunch of WordPress web server instances via auto-scaling, they will all correctly serve the textual content of your site, since they query the same database, but the media can only be served by the instance that happened to accept your upload. The resulting mess will lead to a blog that is half-broken, all of the time. To avoid that issue, you need to either use a networked file system (such as GlusterFS) to synchronize content across your instances, or use a content delivery network.

5. What metric do we use for determining when to scale?

As we described in the first entry in this series, some metrics are better suited for auto-scaling than others. What we are ideally looking for is a set of metrics that:

  • is related to only the pertinent set of components, and
  • is not capped by the performance of the current deployment size (as it is impossible to know by how much capacity demand exceeds availability if our reporting is capped by current availability).

This means that measuring, e.g., CPU usage across all servers we have right now is a bad choice, because high values only tell us that we need more servers, not how many. It is therefore better to measure, for example, how long certain typical or time-critical database queries take to finish, or the current request rate coming in through a load balancer. If we have insufficient capacity available, we then at least know by how much we need to scale up.

6. What is a single server instance’s capacity?

If we monitor good metrics, we then need to translate their values into a number of servers. If we know that a web server with our particular application can handle X requests per second, that is great. Then we can adjust the size of our deployment accordingly: divide the number of requests we need to handle by X, round up, and we have the number of server instances we need. As we explained in the second entry in this series, the auto-scaler has to be clever about these values to avoid scaling up and down rapidly, which would stress the system further.
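The arithmetic above can be sketched in a few lines of shell. The rate and per-instance capacity values here are made-up examples; in practice the rate comes from the load balancer and the capacity from benchmarking:

```shell
#!/bin/bash
# Sketch: turn a measured request rate into a desired number of servers.
rate=750      # current requests/s, as reported by the load balancer
capacity=100  # benchmarked per-instance capacity (requests/s)
# Integer division, rounded up: a partial server's worth of load
# still needs a whole server.
needed=$(( (rate + capacity - 1) / capacity ))
echo "$needed instances"   # 8 instances for 750 req/s at 100 req/s each
```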

7. What limits on deployment size do we have?

We need to set reasonable lower and upper limits for our deployment size. The lower limit ensures that we always have a reasonable baseline: there should never be fewer than a certain number of server instances. At the very least, we need one server available at all times. The upper limit is set by our budget. If you recall our second entry in this series, the capacity limits are enforced as the last step in the auto-scaler’s pipeline. This acts as a dependable safeguard against exceeding the limits.
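Enforcing such limits amounts to a simple clamp applied after the desired size has been computed. A minimal sketch, with illustrative limits of 1 and 20 instances:

```shell
#!/bin/bash
# Sketch: clamp the desired deployment size to configured limits,
# as the last step before acting on a scaling decision.
min_size=1    # baseline: never run fewer instances than this
max_size=20   # budget cap: never run more instances than this
clamp() {
  local desired=$1
  if   [ "$desired" -lt "$min_size" ]; then echo "$min_size"
  elif [ "$desired" -gt "$max_size" ]; then echo "$max_size"
  else echo "$desired"
  fi
}
clamp 0     # prints 1: never below the baseline
clamp 8     # prints 8: within limits, used as-is
clamp 150   # prints 20: capped by the budget limit
```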

Auto-scaling example

To keep the focus on auto-scaling, we will auto-scale a simple web application. We will deploy it on City Cloud and use the load balancing service City Cloud provides. It is not WordPress, just a simple Apache web server serving static files. We do that to avoid the nitty-gritty details of how a scalable WordPress install is made (if that interests you, subscribe to our blog’s feed; an article about that is in the works). Our answers to the checklist are as follows:

Table I. Auto-scaling checklist for our simple web application.
  • What component(s) of the application do we want to auto-scale? Apache web servers, running on Ubuntu 14.04 LTS.
  • What capacity/size should each server instance have? 1 CPU core, 1 GB RAM.
  • How do we automate making a new server instance operational? Update package lists, install Apache via the repositories.
  • How do server instances find each other and synchronize/collaborate? No synchronization, but servers will be registered with a City Cloud load balancer via an agent.
  • What metric do we use for determining when to scale? Requests per second, as reported by the load balancer.
  • What is a single server instance’s capacity? 100 requests per second*
  • What limits on deployment size do we have? Between 1 and 20 instances.

* It is good to pick a conservative number for a server instance’s capacity first and then test rigorously to find a value closer to the actual limit. The capacity is of course dependent on a number of factors, including how many CPU cores and how much RAM we assign to each server instance.

Configuring auto-scaling in City Cloud

Here is a screencast where we deploy our auto-scaler in City Cloud, and use it to configure our simple web application according to the check list:

In the video, we show how we:

  • configure a load balancer for our application,
  • obtain our OpenStack API credentials,
  • register a key pair for use in instances created via the OpenStack API,
  • deploy the auto-scaler, and
  • configure the auto-scaler using our simple web configuration UI, built specifically for City Cloud.

The result is an application that will auto-scale nicely according to the parameters we set. Serving a static page is admittedly a simple use case, but additional complexity can always be added later. Complexity is like that: it adds up over time, whether you want it to or not.

If you would like to see the auto-scaler in action, we recently made a screencast of that available, too. In it, we auto-scale an application across multiple clouds: City Cloud and Amazon EC2. It is available for your viewing pleasure here:

Join the Elastisys auto-scaling beta

We are working on making the auto-scaler available to all customers and streamlining the process of getting up and running. If this series of blog posts has caught your interest in auto-scaling, you are welcome to join our beta. Contact Elastisys, and we will be happy to set you up!

Upcoming events

We will attend #CloudBeerStockholm on November 5, 2015, where we will present a talk entitled Autoscaling in a multi-cloud environment and give a live demo using the Elastisys configuration UI built for City Cloud. Don’t miss it, it will be fun!

Auto-scaling an application can be rather complex, depending on the application. In a joint effort with City Cloud, we will host a webinar in the future where we discuss the steps required to scale up a WordPress blog from a single server to a fully auto-scaled install, ready to take on huge amounts of visitors. Stay tuned to this blog and the City Cloud newsletter for updates!


About the authors of this article


This has been a guest post by elastisys, a Swedish startup company that has spun off from the distributed systems research group at Umeå university. Elastisys turns academic research into products and services with the mission of helping companies improve the robustness, performance, and cost-efficiency of their cloud applications. You can find elastisys on various social media and its home on the web, elastisys.com.


Lars is a software architect at elastisys who specializes in research and development of scalable systems. When not directly involved in software development, he gives lectures on advanced topics in distributed systems and cloud application analysis and design. Contact him via email or visit his LinkedIn page.