Lambda Warmer: Optimize AWS Lambda Function Cold Starts

At a recent AWS Startup Day event in Boston, MA, Chris Munns, the Senior Developer Advocate for Serverless at AWS, discussed Lambda cold starts and how to mitigate them. According to Chris (although he acknowledged that it is a “hack”), using the CloudWatch Events “ping” method is really the only way to do it right now. He gave a number of really good tips to pre-warm your functions “correctly”:

  • Don’t ping more often than every 5 minutes
  • Invoke the function directly (i.e. don’t use API Gateway to invoke it)
  • Pass in a test payload that can be identified as such
  • Create handler logic that replies accordingly without running the whole function

He also addressed how to keep several concurrent functions warm. You need to invoke the same function multiple times, each with a delayed execution. This prevents the system from reusing the same container.

💡 There were a lot of great insights in his presentation. I’ve summarized them for you in my post: 15 Key Takeaways from the Serverless Talk at AWS Startup Day. I highly suggest you read this to learn some serverless best practices directly from the AWS team.

I deal with cold starts quite a bit, especially when using API Gateway. So following these “best practices”, I created Lambda Warmer. It’s a lightweight Node.js module that can be added to your AWS Lambda functions to manage “warming” ping events. It also handles automatic fan-out for warming concurrent functions. Just instrument your code and schedule a “ping”.

Here is an example usage. You simply require the lambda-warmer package and then use it to inspect the event. And that’s it! Lambda Warmer will either recognize that this is a “warming” event, scale out if need be, and then short-circuit your function – OR – it will detect that this is an actual request and pass execution off to your main logic.

💡 Lambda Warmer returns a resolved promise, so you can use it with promise chains too. 😉
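
As a minimal sketch of both styles (async/await and a promise chain), the handler below short-circuits on warming events and otherwise hands off to the main logic. So that it runs without installing anything, `warmer` here is a local stand-in for `require('lambda-warmer')`. That the promise resolves to `true` for warming events and `false` otherwise is an assumption of this sketch, and `mainLogic` is a hypothetical placeholder for your own code.

```javascript
// Stand-in for `const warmer = require('lambda-warmer')` so the sketch runs
// without the package. ASSUMPTION: like the real module, it returns a promise
// that resolves to true for warming events and false for everything else.
const warmer = (event) => Promise.resolve(Boolean(event && event.warmer));

// Hypothetical business logic that should only run for real requests.
const mainLogic = async (event) => `Hello, ${event.name || 'world'}`;

// async/await style: short-circuit on warming events.
const handler = async (event) => {
  if (await warmer(event)) return 'warmed';
  return mainLogic(event);
};

// Promise-chain style: same behavior without async/await.
const chainedHandler = (event) =>
  warmer(event).then((isWarming) => (isWarming ? 'warmed' : mainLogic(event)));

// In a deployed function you would export one of these as the Lambda handler.
handler({ warmer: true, concurrency: 1 }).then(console.log); // logs "warmed"
chainedHandler({ name: 'Lambda' }).then(console.log); // logs "Hello, Lambda"
```

Either way, the check sits at the very top of the handler so that a warming ping never reaches the expensive part of the function.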

Why should I use Lambda Warmer? 🤔

It’s important to note that you DO NOT HAVE TO WARM all your Lambda functions. First of all, according to Chris, cold starts account for less than 0.2% of all function invocations. That is an incredibly small percentage that will only affect a tiny number of invocations. Secondly, unless cold starts are causing noticeable latency issues with synchronous invocations, there probably isn’t a need to warm them.

For example, if you have functions responding to asynchronous or streaming events, no one is going to notice a startup delay of a few hundred milliseconds 0.2% of the time. There probably isn’t any reason to warm these functions. If, however, your functions are responding to synchronous API Gateway requests and users periodically experience 5 to 10 seconds of latency, then Lambda Warmer might make sense.

So, if you have to pre-warm your functions, why would you choose Lambda Warmer? First of all, it’s open source. I built Lambda Warmer to solve my own problem and shared it with the community so that others can benefit from it. Pull requests, suggestions and feedback are always welcome. Secondly, it is super lightweight with only one dependency (Bluebird for managing delays). Dependency injection scares me, and given the string of recent NPM hacks, minimizing dependencies is becoming a best practice.

And then there’s this:

As the old saying goes, “If it’s good enough for Chris Munns, it’s good enough for me!” 😂

But seriously, my goal was to follow best practices and make it easily shareable with others. If this helps a few more people adopt serverless or makes their apps a bit more responsive, awesome. 🙌

How does Lambda Warmer work? 👷‍♂️

The code for Lambda Warmer is available on GitHub and is MIT licensed. Feel free to contribute or fork it and create your own implementation.

There are a number of configuration options, all explained in the documentation, so I won’t repeat them here. I do want to outline the basics since the default implementation will probably be sufficient for most use cases.

❗️IMPORTANT NOTE: Lambda Warmer facilitates the warming of your functions by analyzing invocation events and appropriately managing handler processing. It DOES NOT create or manage CloudWatch Rules or other services that periodically invoke your functions.

You can create a rule manually, using a SAM template, or using the Serverless framework. You could also write another Lambda function that periodically invokes your functions (maybe even with some smarter rules). Regardless of how you automate the “ping” of your functions, the two most important things are:

  1. Non-VPC functions are kept warm for approximately 5 minutes whereas VPC-based functions are kept warm for 15 minutes. Set your schedule for invocations accordingly. There is no need to ping your functions more often than the minimum warm time.
  2. Functions must be invoked with a JSON object like this: { "warmer":true,"concurrency":3 }. The warmer and concurrency field names can be changed using the configuration options.
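
To make those two points concrete, here is a small sketch that derives a schedule and the constant JSON input for a rule. `buildPing` and its options are hypothetical names used only for illustration, and the warm times are the approximate figures quoted above rather than guarantees.

```javascript
// Approximate minimum warm times quoted above, in minutes.
const WARM_MINUTES = { nonVpc: 5, vpc: 15 };

// Hypothetical helper: returns the schedule expression and the constant JSON
// input you would configure on a CloudWatch rule that targets the function.
function buildPing({ inVpc = false, concurrency = 1, fields = {} } = {}) {
  const minutes = inVpc ? WARM_MINUTES.vpc : WARM_MINUTES.nonVpc;
  // Field names default to "warmer" and "concurrency"; override them only if
  // Lambda Warmer has been configured to look for different names.
  const flagField = fields.flag || 'warmer';
  const concurrencyField = fields.concurrency || 'concurrency';
  return {
    scheduleExpression: `rate(${minutes} minutes)`,
    input: JSON.stringify({ [flagField]: true, [concurrencyField]: concurrency }),
  };
}

const ping = buildPing({ concurrency: 3 });
console.log(ping.scheduleExpression); // rate(5 minutes)
console.log(ping.input); // {"warmer":true,"concurrency":3}
```

The two values map onto a rule's schedule and its constant input, however you choose to create the rule.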

When your function is pinged with a concurrency of 1, the process is quite simple:

Lambda Warmer inspects the event, determines it’s a “warming” event, and then short-circuits your code without executing the rest of your handler logic. This saves you money by minimizing the execution time. If the function isn’t warm, it may take a few hundred milliseconds to start up. But if it is warm, it should be less than 100ms.

If you are warming your function with a concurrency greater than 1, then Lambda Warmer handles that for you as well:

As with single concurrency warming events, Lambda Warmer inspects the event to determine if it is a “warming” invocation. If it is, Lambda Warmer attempts to invoke copies of the function to simulate concurrent load. It does this using the AWS SDK, which means that your function needs the lambda:InvokeFunction permission. The invocations are asynchronous and use the Event type to avoid blocking. Each invoked function executes with a delay that keeps that function busy while others are starting. The last invocation uses the RequestResponse type so that the initial function doesn’t end before the final invocation is made. This prevents the system from reusing any of the running containers.

Lambda Warmer is invoking copies of the same function by sending it a similar “warming” event, which is handled appropriately for you.
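
The fan-out described above can be sketched roughly as follows. This is a simplified illustration, not Lambda Warmer's actual source: `planFanOut` and `warm` are hypothetical names, `invoke` is injected so the sketch runs without the AWS SDK, the 75 ms delay is an assumed default, and every single-concurrency warming event is delayed here for simplicity. `FunctionName`, `InvocationType` and `Payload` are the parameter names of the Lambda Invoke API.

```javascript
// Resolves after `ms` milliseconds; used to keep a warmed container busy.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Builds parameters for the extra invocations needed to reach `concurrency`
// warm containers. The current container counts as one, so N - 1 are invoked.
function planFanOut(functionName, concurrency) {
  const copies = Math.max(concurrency - 1, 0);
  return Array.from({ length: copies }, (_, i) => ({
    FunctionName: functionName,
    // Fire-and-forget for all but the last copy, which is awaited so the
    // initiating function stays alive until every invocation has been made.
    InvocationType: i === copies - 1 ? 'RequestResponse' : 'Event',
    // No concurrency is passed on, so each copy treats it as 1 and stops there.
    Payload: JSON.stringify({ warmer: true }),
  }));
}

// Returns true for warming events (after fanning out or delaying) and false
// for real requests, which should continue on to the main handler logic.
async function warm(event, functionName, invoke, delayMs = 75) {
  if (!event || !event.warmer) return false;
  const concurrency = event.concurrency > 1 ? event.concurrency : 1;
  if (concurrency > 1) {
    const plan = planFanOut(functionName, concurrency);
    await Promise.all(plan.map((params) => invoke(params)));
  } else {
    await delay(delayMs); // stay busy so other copies land on new containers
  }
  return true;
}

console.log(planFanOut('my-function', 3).map((p) => p.InvocationType));
// [ 'Event', 'RequestResponse' ]
```

In a real function, `invoke` would wrap the SDK's Lambda invoke call, which is why the function's role needs the lambda:InvokeFunction permission.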

What’s the purpose of logging “warming” events?  📊

Logs are automatically generated unless the log configuration option is set to false. I personally like the logs because they contain useful information beyond just invocation data. The warm field indicates whether or not the Lambda function was already warm when invoked. The lastAccessed field is the timestamp (in milliseconds) of the last time the function was accessed by a non-warming event. Similarly, the lastAccessedSeconds gives you a counter (in seconds) of how long it’s been since it has been accessed.

You can use these to create metric filters in CloudWatch, which could give you some really interesting stats. Armed with this information, you can determine things like whether your concurrency needs to be lowered (or raised).
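
As a rough sketch of the kind of stats these fields make possible, the snippet below summarizes a handful of log entries. The entries are illustrative only: the field names come from the description above, but the values are made up, real log lines may carry additional fields, and `summarize` and its 900-second idle threshold are hypothetical choices.

```javascript
// ILLUSTRATIVE entries only: made-up values for the fields described above.
const entries = [
  { warm: true, lastAccessed: 1700000000000, lastAccessedSeconds: 42 },
  { warm: false, lastAccessed: null, lastAccessedSeconds: null },
  { warm: true, lastAccessed: 1700000000000, lastAccessedSeconds: 1260 },
  { warm: true, lastAccessed: 1700000300000, lastAccessedSeconds: 12 },
];

// coldRate: share of warming pings that hit a container that was not warm.
// idleRate: share of pings whose container has not served a real request
// for longer than `idleThresholdSeconds`.
function summarize(logs, idleThresholdSeconds = 900) {
  const cold = logs.filter((e) => !e.warm).length;
  const idle = logs.filter(
    (e) =>
      typeof e.lastAccessedSeconds === 'number' &&
      e.lastAccessedSeconds > idleThresholdSeconds
  ).length;
  return { pings: logs.length, coldRate: cold / logs.length, idleRate: idle / logs.length };
}

console.log(summarize(entries));
// { pings: 4, coldRate: 0.25, idleRate: 0.25 }
```

A persistently high cold rate might suggest the ping interval is too long or the concurrency too low, while a high idle rate might suggest the concurrency could be lowered.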

Here is a sample log:

Warm and fuzzy Lambda functions

I hope you find this module useful and that it helps you overcome (or at least lessen) this little annoyance as much as possible. I said in a previous post about cold starts:

I’m excited to find new and creative ways that serverless infrastructures can solve real-world problems. As soon as the cold start issue is fully addressed, the possibilities become endless.

I still believe this is true and that serverless is the future. It was also nice to finally meet Chris Munns in person and hear his thoughts on the issue. He relayed that the AWS Lambda team is well aware of the cold start issue and is working diligently to address it. This certainly makes me feel better to know that it is a priority for them.

But even with this outstanding issue, we can still feel warm and fuzzy about the future of serverless and do what we can today by following best practices to keep our Lambdas warm and fuzzy too. ⚡️❤️🙌


35 thoughts on “Lambda Warmer: Optimize AWS Lambda Function Cold Starts”

  1. Thanks for your article, but you are wrong. Invoking a Lambda from another service or another Lambda is useless for API Gateway invocations. API Gateway execute-api always causes Lambda to create a new container in order to run the invocation, even though there are already idle containers warmed by other services.

    1. I’m not sure where you heard that from, but that is incorrect. API Gateway is just an event trigger. It pulls from the pool of warm Lambdas, or creates new containers if none exist. Whether you use CloudWatch or API Gateway to invoke the function doesn’t matter; they both access the same underlying pool.

    2. hmm, I think Cow might be right, I’ve been seeing some similar behaviour. A cloudwatch event pings a lambda function every 5 mins, but api gateway invokes a completely new one (even though the pinged one is not in use by any other service or busy handling a different api call). This can be seen by a) two different logs for each lambda function in cloudwatch (not more than 2 functions were invoked, one by the cloudwatch event and one by api gateway) and the fact that the second api request through api gateway was significantly faster than the initial one even though that function was already supposed to be warmed up by the cloudwatch event. This is reproducible behaviour, I’ve seen this many times and I wonder why that might be.

    3. Hmm, from talking both to the Lambda team at AWS and Chris Munns, who is the Principal Developer Advocate for Serverless at AWS, I can tell you that this is not expected behavior. There is never a guarantee that a container will stay warm, so it is possible that an API Gateway invocation could cold start even if another instance is warm, but that should happen very infrequently.

      If you have examples of this, please send them to me and I’d be happy to forward it along to the team at AWS to investigate.


    4. Hi Jeremy,

      thanks for responding so quickly. Yeah, so we’ve been fighting with this quite a bit and I think we found the issue. This seems to happen when x-ray is enabled. When that’s the case api gw always starts a new lambda in a new container rather than using an existing one. After, however, we disabled x-ray for that specific lambda and any lambda calling it (via api gw) it worked fine again.

    5. edit to my previous post:

      what I meant to say is: when x-ray is enabled, api gw will only reuse those containers it has started itself and not any container started by other services.

    6. Interested to hear any further results. I also have experienced this behavior when X-Ray is involved. In short, after many different setups and iterations, I haven’t been able to get “affinity” between a container being warmed by lambda-warmer (i.e., on the lambda side), and one being invoked via API Gateway, if the lambda is built with X-Ray. Tried things like limiting subnets, creating the warmed container first, creating the container via API Gateway first, etc. No luck. When using plain old aws-sdk without X-Ray, it is possible (my lambdas are nodejs fwiw), though it does seem that API Gateway is too quick to create a new container, which it often does even when there are quiescent, warmed lambdas that are not apparently busy. This involves Lambdas in a VPC, so maybe I’m just getting unlucky with Subnets, but anyway X-Ray-wrapped lambdas called from API Gateway never pick a warmed lambda in my experience.

      The obvious workaround is to pre-warm a container via an actual invocation that comes thru API Gateway (and not use X-Ray) but of course this is less elegant and in that case there’s no need for lambda-warmer anyway.

      The X-Ray problem is most disturbing because really it has nothing to do with lambda-warmer, rather it implies that API Gateway and Lambdas must be tightly coupled from a design/test standpoint in order to achieve pre-warming, i.e., when using X-Ray you cannot do anything exclusively from the Lambda side (at least not if the Lambda is in a VPC), to pre-warm. You have to pre-warm by simulating the entire event chain starting from API Gateway, which is much more involved than a simple rule timer.

      I’ll check out Was G’s reddit too.


  2. Thanks for sharing this excellent tool. I realize we want to short circuit the Lambdas we are keeping warm, but what should that look like in practice? In my case, the log from the warmer works as expected, but that is followed by an error, such as “Unexpected error { i: Required value missing: ___”…

    Is that the desired outcome?

  3. Hi Jeremy, great article and we are actually testing out your warmer right now. Question: what does a concurrency more than 1 accomplish? Isn’t your code warm or cold? If it’s warm, won’t either 1 invocations or 100,000 invocations all be executed from the warm code? Why keep 3 of them running?

    1. Hi Jason,

      Lambda invokes a separate container for each “concurrent” connection. So if 10 clients access your function at the same time, 10 separate copies of your Lambda function will be created and executed. This is how they are able to achieve near infinite scaling. Once a container is created, Lambda will attempt to reuse those “warm” containers (but there is no guarantee). So if at one point you had 10 concurrent users, then subsequent requests (up to 10 concurrent connections), should all get warm containers. Containers that are unused only stay warm for about 15 minutes (if they are in a VPC, 5 mins if not), so if you spike to 10 concurrent users, but then it drops to 2 concurrent users for the next 15 minutes, another spike will require 8 cold starts to handle the additional concurrency.

      The idea of Lambda Warmer’s concurrency setting is to make sure that X number of Lambda containers are warm at any given time. It essentially makes several concurrent calls to the same function to be sure that you always have enough warm functions.

      Hope that helps,

  4. Maybe I should read through the source code to find answer for this question, but since I am lazy (and not 100% sure I would be able to understand what I read), I will just ask here. 🙂

    So, my guess is that this lambda-warmer (implemented within the actual lambda function we need to run) calls the same lambda function, and the newly invoked lambda function does the same, and on and on, is that right? You know, just like recursive function calls.

    If so, that must mean:
    1. There is some counter in the event the function receives and passes down to the next function, which gets decremented by one each time an invocation is made, so that it knows when to stop the recursion
    2. The lambda-warmer is not designed to go after some specific instances of the lambda function (i.e. the ones it initiated earlier). Instead it just makes N number of “concurrent”* calls, which is fine since we are guaranteed to have N instances of the lambda refreshed after that.
    (* I figured that invocations are actually sequential, not exactly concurrent, but they work out as if they were concurrent because lambda-warmer forces the function to run for 75 ms, during which a chain of warming calls will complete, making them concurrent in a practical sense).
    This also means that if there was an invocation from an actual customer, not by the lambda-warmer during this time, we may end up with N+1 loaded instances (which should be fine).

    Are these all correct understanding?

    It’s a bummer that internally initialized lambda instances won’t be used by API Gateway with X-Ray on (we do use API Gateway with VPC, and we want to use X-Ray as well). Maybe we need to make N concurrent calls from outside (through API Gateway). How to ensure they do get translated to actually concurrent invocation of our lambda internally? I have no idea 🙁

    This still is a great article, however. Thank you for sharing it!

  5. Hi,
    In the CloudWatch rule, I set the constant input to { "warmer":true,"concurrency":3 }.
    Node.js code snippet:
    var http = require('http');
    var ResponseJSON = "";
    const warmer = require('lambda-warmer');
    exports.handler = function (event, context, callback) {
    console.log('event ' + JSON.stringify(event, null, 2));
    if (event.warmer) {
    console.log('Lambda Warmer event-to keep lambda warmup');
    return context.logStreamName;
    }
    // next set of code as part of real event.
    I would expect two lambdas get executed but in cloud watch logs could see only one set of logs.
    concurrency code is not executing. Am I missing something? Kindly share your thought.
    Configured the Lambda to run myVPC and not in default VPC or No VPC.
    Configured private subnets in myVPC.
    Do i need to configure public subnet to which does have NAT configured?
    At present the lambda warmer is running but concurrency is not happening.
    Thanks in advance;

    1. Sorry for the late reply. Yes, in order for your Lambda functions (within a VPC) to invoke other Lambda functions, you need to have a NAT gateway enabled.

  6. Hi Jeremy,
    Thank you for the lambda-warm module. I have been researching ways to warm up Lambdas. We have a few Lambdas within a VPC that need to be warmed up. In your opinion, is it good practice to create a lambda-warmer for each Lambda that needs to be warmed up? It seems like the lambda-warmer should be able to warm up multiple Lambdas. Any thoughts on that?

    1. You can certainly warm multiple Lambdas, you’d just need to have multiple CloudWatch Rules to do that. Just make sure you actually have to warm your Lambdas before you do it. In many cases, apps with fairly steady traffic do not need the warmer as cold starts aren’t a huge issue. In low traffic situations, or if you’re preparing for some sort of burst, it sometimes makes sense.

  7. I couldn’t figure out where you are breaking the infinite loop. How do you make sure each function call doesn’t end up calling other concurrent functions?

    1. The calls to other Lambdas do not pass the concurrency configuration, so they default to 1.

      let concurrency = event[config.concurrency]
      && !isNaN(event[config.concurrency])
      && event[config.concurrency] > 1
      ? event[config.concurrency] : 1

  8. Great design, thank you. I ported to C# and I’m using it. I have the same issue regarding API Gateway as others – although I can see multiple concurrent ‘warming’ invocations, a request via API Gateway still starts slowly (~14 sec) and then subsequent requests execute quickly (~200ms).

    1. Yes, we’re trying out provisioned concurrency next. As an update… the Lambda warming is working fine, as I can see cloudwatch logs that start with a warming event and then show subsequent web-driven events. The extra delay we are still seeing appears to be due to connection establishment delays with DynamoDB. Provisioned concurrency won’t fix that, so we’re adjusting the warming event to see if it can establish a DynamoDB connection that will be reused by functional requests.

  9. If my requirement is for 100 concurrent lambdas, should I warm 100 every time or how should I determine this number?

    1. Hi Rahul. Sorry for the late reply. If you need to warm that many Lambda functions, you should look at provisioned concurrency instead. You should also evaluate as to whether or not you really need to warm your functions. Cold starts are generally pretty low now, and you can easily spin up 500 functions without any throttling.

    1. Hi John. Provisioned concurrency can get expensive, but for the right use case, it can make a lot of sense. I should update this post with a note that this is NOT the preferred way to do it any more.

  10. Is there a risk of the warmer taking all of the Lambda’s concurrency, and causing real calls to fail?
    In most cases, the Lambdas should be already warm when called by the warmer, and since they basically do a no-op for the warmer, the period of time when the entire concurrency is taken by warmers should be very short.
    However, the first time the warmers are run, after a change to the function or some other event that resets the containers, I would assume all instances would be busy initializing for the warmers, and any real call would be throttled.
    Would you agree?

    1. Hi Shahar. Yes, this was never an optimal solution, but it was the official AWS recommendation for some time. The problem you discuss is of course possible, though given the fact that the Lambda Warmer short-circuits your function, the response time should be very quick (and against a warm function at that). It’s unlikely that you’ll run into this issue, but it is definitely possible.

      AWS now recommends using Provisioned Concurrency to keep your Lambda Functions warm. I recommend not using anything (including this library) to keep functions warm. If you build small, single purpose Lambda functions, you can usually achieve cold starts in the 100-300ms range. If you need to scale quickly, provisioned concurrency can help with that.
