Serverless Tip: Don’t overpay when waiting on remote API calls

Our serverless applications become a lot more interesting when they interact with third-party APIs like Twilio, SendGrid, Twitter, MailChimp, Stripe, IBM Watson and others. Most of these APIs respond relatively quickly (within a few hundred milliseconds or so), allowing us to include them in the execution of synchronous workflows (like our own API calls).  Sometimes we run these calls asynchronously as background tasks completely disconnected from any type of front end user experience.

Regardless of how they’re executed, the Lambda functions calling them need to stay running while they wait for a response. Unfortunately, Step Functions don’t have a way to create HTTP requests and wait for a response. And even if they did, you’d at least have to pay for the cost of the state transitions, which can get a bit expensive at scale. This may not seem like a big deal on the surface, but depending on your memory configuration, the cost can really start to add up.

In this post we’ll look at the impact of memory configuration on the performance of remote API calls, run a cost analysis, and explore ways to optimize our Lambda functions to minimize cost and execution time when dealing with third-party APIs.

TLDR;

Setting your memory configuration to 128 MB has a negligible (if any) effect on the execution time of a Lambda function making remote HTTP calls. By specializing functions to call remote API services, you can safely lower the memory configuration and save a significant amount of money.

Lambda is already really inexpensive

Yes, I agree, which is one of the reasons I build serverless applications in the first place. Pay only for what you use, no servers to manage, etc. However, when you start to see significant scale, or you have some longer-running processes, those fractions of a cent start to add up. This is particularly true when your Lambda functions aren’t doing any processing at all, but simply waiting for a response from some third-party API.

NOTE: Lambda bills in 100 ms increments, at a rate that scales with the amount of memory you allocate. See the Lambda pricing details page for more information.

As responsible developers, we should always be looking for ways to optimize our applications. This should be true for both performance and cost. At AlertMe, we are processing thousands of articles per day, and this requires us to call a number of APIs, including some that don’t respond in a few hundred milliseconds. Finding ways to be more efficient should be at the heart of every startup, and we’re no different. So when I saw our Lambda bill starting to look like real money, I decided to dig a little deeper and figure out why.

Testing a hypothesis

We have a number of Lambda functions at AlertMe that work together to execute our article intake system. There is quite a bit of choreography required, including de-duping, text analysis/comparison, NLP processing, etc. We’ve experimented in the past with different memory configurations and have found that for most of our workloads, 1,024 MB works just fine. However, there were a few functions that made remote API calls, did some post-processing, and then saved the information to a datastore. Not overly complex, but some of these Lambdas take 30 seconds or more to complete because of how long the remote APIs take to respond. This seemed like a lot of wasted execution time, so I set up a few experiments.

My hypothesis was that lowering the memory configuration would make the Lambda function execute more slowly and perhaps end up less cost effective. In my 15 Key Takeaways from the Serverless Talk at AWS Startup Day post, I highlight the fact that sometimes more memory will result in faster execution times, therefore saving you both money and time. I made the same assumption here. To test this, I set up a series of Lambda functions, each with a different memory configuration, and had them all call the same API endpoint, which introduced an artificial delay. I ran that experiment using a number of different delays, and the results were very interesting.

Here is a sample function that called the API endpoint.
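
A minimal Node.js sketch of what that test function looked like; the endpoint URL and the delay query parameter here are placeholders, not the exact code from the experiment:

```javascript
// Minimal sketch of the test function (hypothetical endpoint and parameters).
// A copy of this function was deployed at each memory setting; the "delay"
// value tells the test endpoint how long to wait before responding.
const https = require('https');

const get = (url) =>
  new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        let body = '';
        res.on('data', (chunk) => (body += chunk));
        res.on('end', () => resolve(body));
      })
      .on('error', reject);
  });

exports.handler = async (event) => {
  const delay = event.delay || 100; // artificial delay in ms
  const start = Date.now();

  // Call the remote test endpoint and wait for the response
  await get(`https://api.example.com/test?delay=${delay}`);

  // Return the measured round-trip time so it can be logged and compared
  return { delay, elapsed: Date.now() - start };
};
```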

Time is money

Let’s say our external process only takes 100 ms (maybe we’re looking up the weather or something really simple). In the chart below, I outline remote API calls with a 100 ms processing delay and record the total execution time, including the added HTTP latency. The Total column is the total Lambda execution time (in ms). The next three columns calculate the cost of making 100,000, 1,000,000 and 10,000,000 API calls using the memory configuration in the Memory column. Note that the total execution times are rounded up to the next 100 ms for the cost calculations, which is how Lambda bills.

| Memory | Delay | Total | 100,000 calls | 1,000,000 calls | 10,000,000 calls |
|---|---|---|---|---|---|
| 128 MB | 100 ms | 142 ms | $0.04 | $0.42 | $4.16 |
| 512 MB | 100 ms | 165 ms | $0.17 | $1.67 | $16.68 |
| 1024 MB | 100 ms | 135 ms | $0.33 | $3.33 | $33.34 |
| 2048 MB | 100 ms | 129 ms | $0.67 | $6.67 | $66.68 |
| 3008 MB | 100 ms | 142 ms | $0.98 | $9.79 | $97.94 |
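
As a quick sanity check on how these figures are derived (assuming the on-demand pricing in effect at the time, roughly $0.00001667 per GB-second billed in 100 ms increments, and ignoring the flat per-request charge, which is the same at every memory size): at 1,024 MB, 135 ms is billed as 200 ms, or 0.2 GB-seconds, which works out to about $0.0000033 per invocation, or roughly $3.33 per million calls, matching the table above.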

You can see that the overall performance of all the memory configurations is quite similar. I ran these tests several times and the numbers were fairly consistent, with no added benefit from increasing the memory. These are relatively small numbers; the difference between $0.42 and $9.79 for 1M API calls seems like a rounding error in the grand scheme of things. If you get to significant volume, however, you start to leave some money on the table.

But seriously, how many third-party API calls only take 100ms? Let’s be a bit more realistic and figure at least 1 second. Same chart, with updated numbers and cost calculations.

| Memory | Delay | Total | 100,000 calls | 1,000,000 calls | 10,000,000 calls |
|---|---|---|---|---|---|
| 128 MB | 1000 ms | 1067 ms | $0.23 | $2.29 | $22.88 |
| 512 MB | 1000 ms | 1125 ms | $1.00 | $10.01 | $100.08 |
| 1024 MB | 1000 ms | 1044 ms | $1.83 | $18.34 | $183.37 |
| 2048 MB | 1000 ms | 1026 ms | $3.67 | $36.67 | $366.74 |
| 3008 MB | 1000 ms | 1074 ms | $5.39 | $53.87 | $538.67 |

That added 900 ms really starts to make the numbers look a bit more real. Still relatively low, of course, but at 1M or 10M API calls, spending an extra $50 to $500 to wait for another process seems a bit crazy.

What if you’re doing something a bit more complex? Maybe calling a natural language processing API? These types of calls might take 10 seconds or more.

| Memory | Delay | Total | 100,000 calls | 1,000,000 calls | 10,000,000 calls |
|---|---|---|---|---|---|
| 128 MB | 10000 ms | 10085 ms | $2.10 | $21.01 | $210.08 |
| 512 MB | 10000 ms | 10045 ms | $8.42 | $84.23 | $842.34 |
| 1024 MB | 10000 ms | 10038 ms | $16.84 | $168.37 | $1,683.67 |
| 2048 MB | 10000 ms | 10064 ms | $33.67 | $336.73 | $3,367.34 |
| 3008 MB | 10000 ms | 10062 ms | $49.46 | $494.60 | $4,945.97 |

Now we’re talking hundreds to thousands of dollars of added cost just to wait for another system to respond. Even at only 100,000 calls, we’re starting to see quite a difference.

Let’s go a bit further and address the problem we were having at AlertMe. We need to call the Diffbot API to power our article parsing component. The quality of the results is really good, and we have no interest in building our own web scraper, so overall we are happy with the service. However, a call to their API often takes 30 seconds or more to complete.

Think about it. We call their API with a URL, they then download that page (sometimes several pages), run it through their parsing system, enrich it with some basic NLP, and then return the data to us. There are a lot of factors involved here, like the speed of the site we’re downloading the page from, so there’s not much we can do, other than wait for the response. ☹️

So here is the cost breakdown when you need to download hundreds of thousands of articles from that API:

| Memory | Delay | Total | 100,000 calls | 1,000,000 calls | 10,000,000 calls |
|---|---|---|---|---|---|
| 128 MB | 25000 ms | 25091 ms | $5.22 | $52.21 | $522.08 |
| 512 MB | 25000 ms | 25081 ms | $20.93 | $209.33 | $2,093.34 |
| 1024 MB | 25000 ms | 25183 ms | $42.01 | $420.08 | $4,200.84 |
| 2048 MB | 25000 ms | 25091 ms | $83.68 | $836.83 | $8,368.34 |
| 3008 MB | 25000 ms | 25064 ms | $122.91 | $1,229.15 | $12,291.47 |

Suddenly activities like this start to become cost prohibitive at the higher memory configurations.

What about Cold Starts?

I’m glad you asked. I ran a number of experiments that compared the cold start time at each memory configuration. Since these Lambdas were not in a VPC, there was little to no difference. The higher-memory functions did shave a few hundred milliseconds off the startup time, but cold starts should be a tiny fraction of your invocations, so it isn’t enough to move the needle.

Save even more time and money with concurrent requests

Another great feature of Lambda is that you can use different programming languages for each of your functions. So regardless of whether you use Python, Ruby, or Go as your primary language, you can always take advantage of another language’s strengths when a task is better suited to it. While it’s certainly possible in other languages, NodeJS is really good at handling concurrent outgoing requests. In fact, I ran the script below, and it was able to process 100 parallel requests (with a simulated 1,000 ms delay) in just over 1,200 ms.
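
The script looked roughly like the sketch below; the endpoint is a placeholder for a test API that simulates a 1,000 ms delay:

```javascript
// Sketch of the concurrency test (hypothetical endpoint).
// Fires 100 requests at once and waits for all of them with Promise.all,
// so the total time is roughly one round trip instead of 100 sequential ones.
const https = require('https');

const get = (url) =>
  new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        let body = '';
        res.on('data', (chunk) => (body += chunk));
        res.on('end', () => resolve(body));
      })
      .on('error', reject);
  });

exports.handler = async () => {
  const start = Date.now();

  // Build 100 requests against an endpoint with a simulated 1,000 ms delay
  const requests = Array.from({ length: 100 }, () =>
    get('https://api.example.com/test?delay=1000')
  );

  // Run them all concurrently and wait for every response
  const results = await Promise.all(requests);

  return { count: results.length, elapsed: Date.now() - start };
};
```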

So whether you use NodeJS or another language to make remote API calls, bundling multiple calls together and running them concurrently can save a significant amount of Lambda execution time. In an asynchronous process, this work could be passed off to another Lambda function, sent to an SNS topic or SQS queue, or passed into Kinesis for further processing.
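
For example, a higher-memory function could hand the slow API work off to a small, specialized, low-memory function and move on. Here’s a minimal sketch using the Node.js AWS SDK; the function name and payload shape are made up for illustration:

```javascript
// Sketch: delegate the slow remote API call to a specialized, low-memory
// function instead of waiting for it here. "article-fetcher" is a
// hypothetical function name.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

exports.handler = async (event) => {
  // InvocationType 'Event' is fire-and-forget: the call returns as soon as
  // the invocation is queued, so this (higher-memory) function never pays
  // to sit and wait on the remote API.
  await lambda
    .invoke({
      FunctionName: 'article-fetcher',
      InvocationType: 'Event',
      Payload: JSON.stringify({ url: event.url })
    })
    .promise();

  return { queued: true };
};
```

The 128 MB function then does all of the waiting, so the expensive one never foots the bill for it.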

Wrapping up

Making remote API calls from our Lambda functions is an unavoidable reality if we want our applications to be fully serverless. It’s possible to set up an EC2 instance or a container to do this work for us, but that adds more complexity for minimal benefit. My experiments show that lower memory configurations have little to no effect on total Lambda execution time when the function is simply waiting for a response and not doing any processing of its own.

Bottom line: Functions that make remote API calls can be broken down into small, asynchronous components with low memory settings. We get the same performance and significantly reduce our costs, especially at scale.

Do you know of a better way to do this? I’m always interested in finding new ways to optimize serverless applications, so please contact me or leave a comment.



Did you like this post? 👍  Do you want more? 🙌  Follow me on Twitter or check out some of the projects I’m working on.

7 thoughts on “Serverless Tip: Don’t overpay when waiting on remote API calls”

  1. Great post. I also built a Lambda function that calls a slow API and runs for longer periods of time. I use Python’s multiprocessing module for concurrent API calls in a function with 128 MB of memory. They return results to a multiprocessing Manager. For my application, I start the function every 5 minutes with a CloudWatch rule; it checks the queue for tasks and starts processing them. If there are no tasks, or if it’s done with its tasks, it will quit. If there are more tasks than can be processed in 5 minutes, it calculates the average API response time so it knows when to gracefully quit before the 5-minute timeout. CloudWatch will start it again at the next scheduled time. I designed it this way because the target API has a maximum of 10 concurrent calls, so I only want 1 Lambda function running at any time. The 5-minute wait between invocations is fine, because it’s more of a batch process.

    1. Hi Jep,

      Sounds like a good approach for your use case. Lambda functions can run up to 15 minutes now, so it’s possible to extend your execution even longer. Also, you could change the CloudWatch rule to invoke your function more frequently (like every minute) and then use a concurrency setting of 1 so that if your function is running, the CloudWatch trigger will just get throttled. Finally, you can use the get_remaining_time_in_millis method from the context object to determine how much time your function has left.

      – Jeremy

  2. Hey great article!
    I’m curious about the methodology used here to test whether increasing the Lambda’s memory reduces the run time due to I/O latency. What was the size of the data being transferred in the requests you tested? If the delays were all artificial, and the actual network bandwidth used was next to nothing, why would increasing the network bandwidth (via increasing the Lambda memory) have any effect on the duration? Where I’ve seen huge cost reductions from increasing memory is when the network request actually transfers a meaningful amount of data, like querying for a hash key in DynamoDB that could return 10-100s of records, or publishing a batch of events to a Kinesis stream. Have you done any tests around how memory size can reduce run times when network bandwidth could be a bottleneck?

    1. Hey Alex,
      For all the experiments, the delays were artificial so that I could guarantee execution time from the calling Lambda. I was returning a response that was approximately 150 KB, which was the size of the response we get from one of the third-party APIs we use. If you were to increase the size of the response, I agree that increased memory would likely add some benefits because of the added network throughput. However, if you are using this for consistently-sized responses, I’ve found these lower memory settings to be quite efficient.
      – Jeremy

  3. The ‘breaking down into small async components’ approach to reducing cost is in fact very good, though it still waits for all the APIs to respond.
    I landed on this interesting blog post because I am looking for “call and forget” type invocations from Lambda. I would like to do this because the external API does some long processing and saves the result in a database, so the Lambda function doesn’t need to wait until processing is finished.
    I think of Step Functions as one option (https://aws.amazon.com/blogs/aws/new-compute-database-messaging-analytics-and-machine-learning-integration-for-aws-step-functions/). Would you suggest AWS Step Functions or something different for this?
    Thanks.

  4. Hi! Very interesting. We have to use AWS to expose some orchestration logic as an API. The orchestration should include some request validation, a remote HTTPS API call, and then execute some Java code and return the output synchronously as a response. One approach could be to expose a Java Lambda via API Gateway that does everything. Another approach could be to expose a synchronous Step Functions Express workflow via API Gateway that orchestrates the API call, the validations, and the Java Lambda execution. Which approach is best regarding cold starts, performance, and cost? Thanks.
