Setting Google AppEngine (GAE) Performance Sliders For Instance and Latency

If you are trying to figure out how not to get eaten alive by the new billing for Google AppEngine, but also don’t want your application to be dog slow, then you have to get the mix right.  Tuning GAE is more finicky than tuning the carb on my classic Mustang, and the results are often far more mysterious.

The first step in tuning your app is looking at your average number of active instances.

This is a poorly optimized example:

[Dashboard graph: active instances vs. billed instances over time]

In the graph above, the number of active instances stays pretty near 0 while the billed instances sit at 5 the whole time, until the end, when some tuning was done (sorry, I decided to write this after I had already started playing with the instance settings).

Since at no time was this app billed for fewer than 5 instances, setting Min Idle Instances to 5 would instantly result in about 40% savings, because reserved (Min Idle) instances are billed at a lower rate than on-demand ones. That’s a big deal by itself, but this app could likely benefit from some Max Idle Instance tweaking as well. So let’s go over how to calculate the Min Idle, Max Idle, and Pending Latency settings.
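Here is a quick back-of-the-envelope sketch of where that roughly 40% comes from. The rates are my own assumptions (roughly the 2011 frontend pricing of about $0.08 per on-demand instance-hour versus about $0.05 per reserved one); swap in whatever your bill actually shows.

```python
# Back-of-the-envelope savings from reserving the 5 instances you are
# billed for anyway. The rates below are assumptions, not gospel.
ON_DEMAND_RATE = 0.08   # $/instance-hour, assumed
RESERVED_RATE = 0.05    # $/instance-hour, assumed

billed_instances = 5            # the chart never drops below 5
hours_per_month = 24 * 30

before = billed_instances * hours_per_month * ON_DEMAND_RATE
after = billed_instances * hours_per_month * RESERVED_RATE

print(f"before: ${before:.2f}/mo, after: ${after:.2f}/mo, "
      f"savings: {100 * (before - after) / before:.0f}%")
# -> savings: 38%, i.e. the "about 40%" above
```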

First, the easy one.  I have found no case where setting Min Pending Latency to 20 ms was not the right answer, so Min Pending Latency should always be 20 ms.

Max Pending Latency.  If this is set lower than your average request time (which you can find by looking at Milliseconds/Request on the dashboard), then any time you receive a burst of requests from an HTTP 1.1 browser that fires 8 requests at the server at once, you will spawn more instances rather than serving some of the responses in series.  That is your call, but generally I think it is not the best idea unless you run lots of instances (24+) at all times.

For example, let’s say you have a site where each page load makes 6 calls to AppEngine.  If you have 4 instances set to a Max Pending Latency of 500 ms and the average request takes 1500 ms, then 2 new instances will be spun up. If your warm-up time is more than 1500 ms, those instances won’t serve anything once they have warmed up, because the other instances will be free by then.

If you have a page that makes 8 requests and you always have 12 idle instances, then as long as Max Pending Latency is set to 75% of the average request time (1,125 ms here), you will not need any additional instances to serve the traffic unless 2 people make requests in the same 2-second window.
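To make the arithmetic in those two examples concrete, here is a rough sketch. It is a deliberately simplistic model (all requests land at once, every request takes the average time, and a pending request spawns a new instance only when no existing instance will free up before Max Pending Latency runs out); the real scheduler is fuzzier, so treat the numbers as illustration, not prediction.

```python
def new_instances_spawned(instances, concurrent_requests,
                          avg_request_ms, max_pending_latency_ms):
    """Simplistic model: requests beyond the number of free instances sit
    in the pending queue. If the first existing instance frees up (after
    one average request time) later than Max Pending Latency allows, the
    scheduler spins up a new instance for each pending request."""
    pending = max(0, concurrent_requests - instances)
    if pending == 0:
        return 0
    return pending if avg_request_ms > max_pending_latency_ms else 0

# Example 1: 6 calls per page, 4 instances, 500 ms max pending, 1500 ms average
print(new_instances_spawned(4, 6, 1500, 500))     # -> 2 new instances

# Example 2: 8 calls per page, 12 idle instances, max pending at 75% of average
print(new_instances_spawned(12, 8, 1500, 1125))   # -> 0 (one visitor)
print(new_instances_spawned(12, 16, 1500, 1125))  # -> 4 (two visitors in the same window)
```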

So what is the formula for success?

It depends on what you are optimizing for.  If you want balls-to-the-wall speed, set your Min Idle Instances to 24 or 3x the number of requests a visitor makes per page load, whichever is greater, set Max Pending Latency to 100 ms, and leave Max Idle Instances on Automatic.  You will always have enough instances to handle 3 HTTP 1.1 users arriving at the exact same time, and new instances will spin up whenever you get more than that.

If you want settings that are conservative but won’t break your budget or "time out", set Min Idle Instances equal to the number of requests a visitor makes per page load, Max Pending Latency to twice the average request time, and Max Idle Instances to 50% more than the Min Idle.  If you find after a day or two that billed instances are always sitting at your Max Idle, raise Min Idle Instances by 25% and leave Max Idle alone.

If you want absolute economy, look at your maximum request time and subtract it from 45 seconds.  Set Max Pending Latency to that, set Min Idle Instances to 1 (unless you don’t want always-on), and Max Idle Instances to 1.  With these settings you should never time out, but your requests might take a very long time to serve.
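If it helps to see the three recipes side by side, here is a sketch that turns them into numbers. It is just a restatement of the rules above; the function and parameter names are mine, and None stands for leaving Max Idle on Automatic.

```python
def recommended_settings(profile, requests_per_page,
                         avg_request_ms=None, max_request_ms=None):
    """Return (min_idle, max_idle, min_pending_ms, max_pending_ms) for the
    three profiles described above. A max_idle of None means "Automatic"."""
    if profile == "speed":
        # 24 or 3x the requests per page load, whichever is greater
        min_idle = max(24, 3 * requests_per_page)
        return min_idle, None, 20, 100
    if profile == "conservative":
        min_idle = requests_per_page
        max_idle = round(min_idle * 1.5)           # 50% more than min idle
        return min_idle, max_idle, 20, 2 * avg_request_ms
    if profile == "economy":
        # stay under the 45-second budget even for your slowest request
        return 1, 1, 20, 45_000 - max_request_ms
    raise ValueError(profile)

print(recommended_settings("speed", requests_per_page=8))
# -> (24, None, 20, 100)
print(recommended_settings("conservative", requests_per_page=8, avg_request_ms=1500))
# -> (8, 12, 20, 3000)
print(recommended_settings("economy", requests_per_page=8, max_request_ms=5000))
# -> (1, 1, 20, 40000)
```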

Other hints:

Whatever the lowest Billed Instances count falls to should be your Min Idle Instances.  This will save you money.

If your startup time is more than 2x the Max Pending Latency, you need Max Idle Instances to be at least twice Min Idle; if it is 3x, Max Idle needs to be 3x as well.  This doesn’t hold up once you get to 20 or 30 instances, but until you are large enough that one additional visitor adds less than roughly a tenth of your traffic, you need to follow that rule. As you approach the next order of magnitude the numbers change a bit, but the incremental cost of another instance is a smaller share of your budget, so it matters less.
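Here is a sketch of those two hints together. The names are mine and the billed-instance samples are made up; the point is only the arithmetic.

```python
import math

def min_idle_from_history(billed_instance_samples):
    """First hint: whatever the Billed Instances count bottoms out at is a
    safe Min Idle Instances value."""
    return min(billed_instance_samples)

def max_idle_from_startup(min_idle, startup_ms, max_pending_latency_ms):
    """Second hint: if startup takes N times longer than Max Pending
    Latency, keep roughly N times as many Max Idle as Min Idle instances
    (only sensible while you are still in low double digits)."""
    ratio = max(1, math.ceil(startup_ms / max_pending_latency_ms))
    return min_idle * ratio

samples = [7, 5, 6, 9, 5, 8]                   # billed instances over a day, say
min_idle = min_idle_from_history(samples)      # -> 5
print(min_idle, max_idle_from_startup(min_idle, startup_ms=3000,
                                      max_pending_latency_ms=1000))  # -> 5 15
```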

The scheduler is much better with large apps than it is with small apps.  If you are small, consider locking the instances down: set Min Idle Instances about 25% higher than your average (say 5 when you only need 4), set Max Idle Instances to 1 more than the min, leave Min Pending Latency at 20 ms, and push Max Pending Latency up to 15 s.  This will effectively turn off scaling except in absolute emergencies.
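And a sketch of that lock-it-down recipe, assuming (as I recall) that 15 s is the top of the Max Pending Latency slider:

```python
import math

def locked_down_settings(avg_active_instances):
    """Small-app settings that effectively disable scaling: pin Min Idle
    about 25% above what you actually need, allow one extra idle instance,
    and leave the latency sliders at their extremes so the scheduler only
    adds instances in a genuine emergency."""
    min_idle = math.ceil(avg_active_instances * 1.25)   # e.g. 5 when you need 4
    return {
        "min_idle_instances": min_idle,
        "max_idle_instances": min_idle + 1,
        "min_pending_latency_ms": 20,
        "max_pending_latency_ms": 15_000,   # assumed slider maximum
    }

print(locked_down_settings(4))
# -> {'min_idle_instances': 5, 'max_idle_instances': 6,
#     'min_pending_latency_ms': 20, 'max_pending_latency_ms': 15000}
```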