Serverless Tip: Don't overpay when waiting on remote API calls

Making remote API calls from your serverless apps? You're probably paying to wait for a response. Learn how to optimize your functions and save money.

Our serverless applications become a lot more interesting when they interact with third-party APIs like Twilio, SendGrid, Twitter, MailChimp, Stripe, IBM Watson and others. Most of these APIs respond relatively quickly (within a few hundred milliseconds or so), allowing us to include them in the execution of synchronous workflows (like our own API calls). Sometimes we run these calls asynchronously as background tasks, completely disconnected from any type of front-end user experience.

Regardless of how they're executed, the Lambda functions calling them need to stay running while they wait for a response. Unfortunately, Step Functions don't have a way to make HTTP requests and wait for a response. And even if they did, you'd at least have to pay for the cost of the state transitions, which can get a bit expensive at scale. This may not seem like a big deal on the surface, but depending on your memory configuration, the cost can really start to add up.

In this post we'll look at the impact of memory configuration on the performance of remote API calls, run a cost analysis, and explore ways to optimize our Lambda functions to minimize cost and execution time when dealing with third-party APIs.

TLDR;

Setting your memory configuration to 128 MB has a negligible (if any) effect on the execution time of a Lambda function making remote HTTP calls. By specializing functions to call remote API services, you can safely lower the memory configuration and save a significant amount of money.

Lambda is already really inexpensive

Yes, I agree, which is one of the reasons I build serverless applications in the first place. Pay only for what you use, no servers to manage, etc. However, when you start to see significant scale, or you have some longer-running processes, those fractions of a cent start to add up. This is particularly true when your Lambda functions aren't even doing any processing, but simply waiting for a response from some third-party API.

NOTE: Lambda has a per 100 ms billing model that changes based on the amount of memory you use. See the Lambda pricing details page for more information.
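
To make that concrete, here's a minimal sketch of how the cost figures in the tables below can be reproduced. The per-100 ms rates are back-calculated from those tables and reflect the pricing page at the time of writing; always check the current pricing page before relying on them.

javascript
// Rough Lambda cost model. The per-100 ms rates below are back-calculated
// from the tables in this post and reflect pricing at the time of writing.
const PRICE_PER_100MS = {
  128: 0.000000208,
  512: 0.000000834,
  1024: 0.000001667,
  2048: 0.000003334,
  3008: 0.000004897
}

// Billed duration is rounded up to the nearest 100 ms
const lambdaCost = (memoryMB, durationMs, invocations) =>
  Math.ceil(durationMs / 100) * PRICE_PER_100MS[memoryMB] * invocations

console.log(lambdaCost(128, 142, 10000000).toFixed(2))  // ~4.16
console.log(lambdaCost(3008, 142, 10000000).toFixed(2)) // ~97.94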

As responsible developers, we should always be looking for ways to optimize our applications. This should be true for both performance and cost. At AlertMe, we are processing thousands of articles per day, and this requires us to call a number of APIs, including some that don't respond in a few hundred milliseconds. Finding ways to be more efficient should be at the heart of every startup, and we're no different. So when I saw our Lambda bill starting to look like real money, I decided to dig a little deeper and figure out why.

Testing a hypothesis

We have a number of Lambda functions at AlertMe that work together to execute our article intake system. There is quite a bit of choreography required, including de-duping, text analysis/comparison, NLP processing, etc. We've experimented in the past with different memory configurations and have found that for most of our workloads, 1,024 MB works just fine. However, there were a few functions that made remote API calls, did some post-processing, and then saved the information to a datastore. Not overly complex, but some of these Lambdas take 30 seconds or more to complete because of the complexity of the remote APIs. This seemed like a lot of wasted execution time, so I set up a few experiments.

My hypothesis was that lowering the memory configuration would make the execution of the Lambda function slower and perhaps not as cost effective. In my 15 Key Takeaways from the Serverless Talk at AWS Startup Day post, I highlighted the fact that sometimes more memory will result in faster execution times, saving you both money and time. I made the same assumption here. To test this, I set up a series of Lambda functions, each with a different memory configuration, and had them all call the same API endpoint, which introduced an artificial delay. I ran the experiment with a number of different delays, and the results were very interesting.

Here is a sample function that called the API endpoint.

javascript
// Assumes a promise-based HTTP client like request-promise
const REQUEST = require('request-promise')

// Lambda function configured with 128 MB of memory
exports.t128 = async (event) => {
  let timer = Date.now()
  let result = await callAPI(128, event.delay)
  return Object.assign(result, { total: (Date.now() - timer) + 'ms' })
} // end t128

// POST the memory setting and requested delay to the test endpoint
const callAPI = async (memory, delay) => {
  let options = {
    method: 'POST',
    uri: 'https://XXXXXXX.execute-api.us-east-1.amazonaws.com/dev/test',
    body: { memory, delay },
    headers: { 'Content-Type': 'application/json' },
    json: true
  }
  return await REQUEST(options)
}
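
The test endpoint itself isn't shown above. Here's a minimal sketch of what it might look like: a Lambda function behind an API Gateway proxy integration that simply sleeps for the requested delay before echoing the payload back. The handler name and response shape are my assumptions, not taken from the original experiment.

javascript
// Hypothetical test endpoint: sleep for the requested delay, then respond
exports.delayTest = async (event) => {
  const { memory, delay } = JSON.parse(event.body)
  // simulate a slow third-party API
  await new Promise(resolve => setTimeout(resolve, delay))
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ memory, delay })
  }
}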

Time is money

Let's say our external process only takes 100 ms (maybe we're looking up the weather or something really simple). In the chart below, I make remote API calls with a 100 ms processing delay and record the total execution time, including the added HTTP latency. The Simulated Delay column is the artificial delay added at the endpoint, and the Total Time column is the total Lambda execution time (in ms). The next three columns calculate the cost of making 100,000, 1,000,000 and 10,000,000 API calls at the memory setting in the Memory column. Note that the total execution times are rounded up to the nearest 100 ms for the cost calculations.

Memory     Simulated Delay   Total Time   100,000 calls   1,000,000 calls   10,000,000 calls
128 MB     100 ms            142 ms       $0.04           $0.42             $4.16
512 MB     100 ms            165 ms       $0.17           $1.67             $16.68
1024 MB    100 ms            135 ms       $0.33           $3.33             $33.34
2048 MB    100 ms            129 ms       $0.67           $6.67             $66.68
3008 MB    100 ms            142 ms       $0.98           $9.79             $97.94

You can see that the overall performance of all the memory configurations is quite similar. I ran these tests several times and the numbers were fairly consistent, with no added benefit to increasing the memory. These are relatively small numbers, and the difference between $0.42 and $9.79 for 1M API calls seems like a rounding error in the grand scheme of things. If you get to significant volume, however, you start to leave money on the table.

But seriously, how many third-party API calls take only 100 ms? Let's be a bit more realistic and figure at least 1 second. Here's the same chart with updated numbers and cost calculations.

Memory     Simulated Delay   Total Time   100,000 calls   1,000,000 calls   10,000,000 calls
128 MB     1000 ms           1067 ms      $0.23           $2.29             $22.88
512 MB     1000 ms           1125 ms      $1.00           $10.01            $100.08
1024 MB    1000 ms           1044 ms      $1.83           $18.34            $183.37
2048 MB    1000 ms           1026 ms      $3.67           $36.67            $366.74
3008 MB    1000 ms           1074 ms      $5.39           $53.87            $538.67

That added 900 ms really starts to make the numbers look a bit more real. Still relatively low, of course, but at 1M or 10M API calls, spending an extra $50 to $500 to wait for another process seems a bit crazy.

What if you're doing something a bit more complex? Maybe calling a natural language processing API? These types of calls might take 10 seconds or more.

Memory     Simulated Delay   Total Time   100,000 calls   1,000,000 calls   10,000,000 calls
128 MB     10000 ms          10085 ms     $2.10           $21.01            $210.08
512 MB     10000 ms          10045 ms     $8.42           $84.23            $842.34
1024 MB    10000 ms          10038 ms     $16.84          $168.37           $1,683.67
2048 MB    10000 ms          10064 ms     $33.67          $336.73           $3,367.34
3008 MB    10000 ms          10062 ms     $49.46          $494.60           $4,945.97

Now we're talking hundreds to thousands of dollars of added cost just to wait for another system to respond. Even at only 100,000 calls, we're starting to see quite a difference.

Let's go a bit further and address the problem we were having at AlertMe. We need to call the Diffbot API to power our article parsing component. The quality of the results is really good, and we have no interest in building our own web scraper, so overall we are happy with the service. However, a call to their API often takes 30 seconds or more to complete.

Think about it. We call their API with a URL, they then download that page (sometimes several pages), run it through their parsing system, enrich it with some basic NLP, and then return the data to us. There are a lot of factors involved here, like the speed of the site we're downloading the page from, so there's not much we can do, other than wait for the response. ☹️

So here is the cost breakdown when you need to download hundreds of thousands of articles from that API:

Memory     Simulated Delay   Total Time   100,000 calls   1,000,000 calls   10,000,000 calls
128 MB     25 s              25091 ms     $5.22           $52.21            $522.08
512 MB     25 s              25081 ms     $20.93          $209.33           $2,093.34
1024 MB    25 s              25183 ms     $42.01          $420.08           $4,200.84
2048 MB    25 s              25091 ms     $83.68          $836.83           $8,368.34
3008 MB    25 s              25064 ms     $122.91         $1,229.15         $12,291.47

Suddenly activities like this start to become cost prohibitive at the higher memory configurations.

What about Cold Starts?

I'm glad you asked. I ran a number of experiments that compared the cold start time at each memory configuration. There was little to no difference since these Lambdas were not in a VPC. The higher memory functions did shave a few hundred milliseconds off the startup time, but cold starts should be a tiny fraction of your invocations. It isn't enough to move the needle.

Save even more time and money with concurrent requests

Another great feature of Lambda is that you can use a different programming language for each of your functions. So regardless of whether you use Python, Ruby, or Go as your primary language, you can always take advantage of the features of another language if there's a task better suited to it. While it's certainly possible in other languages, NodeJS is really good at handling concurrent outgoing requests. In fact, I ran the script below and it was able to process 100 parallel requests (with a simulated 1,000 ms delay) in just over 1,200 ms.

javascript
// Fire off multiple remote API calls concurrently and wait for all of them
exports.multicall = async (event) => {
  let timer = Date.now()
  let calls = event.calls ? event.calls : 5
  let events = []
  for (let i = 0; i < calls; i++) {
    events.push(callAPI(1024, event.delay)) // callAPI from the earlier example
  }
  let result = await Promise.all(events)
  return Object.assign({ result }, { total: (Date.now() - timer) + 'ms' })
} // end multicall

So whether you use NodeJS or another language to make remote API calls, bundling multiple calls together and running them concurrently can save a significant amount of Lambda execution time. In an asynchronous process, the work could be handed off to another Lambda function, sent to an SNS topic or SQS queue, or passed into Kinesis for further processing.
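
For example, a larger function could hand the slow remote call off to a small, specialized 128 MB function using an asynchronous Lambda invocation, so the expensive function never sits around waiting. This is just a sketch; the function name and payload are hypothetical.

javascript
// Assumes the AWS SDK for JavaScript (v2), available in the Lambda runtime
const AWS = require('aws-sdk')
const lambda = new AWS.Lambda()

exports.orchestrator = async (event) => {
  // InvocationType 'Event' makes the invocation asynchronous, so this
  // function returns immediately and stops accruing billed execution time
  await lambda.invoke({
    FunctionName: 'parse-article-128mb',        // hypothetical specialized function
    InvocationType: 'Event',
    Payload: JSON.stringify({ url: event.url }) // hypothetical payload
  }).promise()

  return { status: 'queued' }
}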

Wrapping up

Making remote API calls from our Lambda functions is an unavoidable reality if we want our applications to be fully serverless. It's possible to set up an EC2 instance or a container to do this work for us, but that adds complexity for minimal benefit. My experiments show that using lower memory configuration settings has little to no effect on total Lambda execution time when the function is simply waiting for a response and not doing any processing of its own.

Bottom line: Functions that make remote API calls can be broken down into small, asynchronous components with low memory settings. We get the same performance and significantly reduce our costs, especially at scale.
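
To illustrate that idea, here's what the configuration of such a specialized function might look like if you deploy with the Serverless Framework. The framework choice, function name, and handler path are my assumptions, not something prescribed above.

yaml
# serverless.yml (excerpt): a hypothetical specialized function that mostly
# waits on a third-party API, so it gets the lowest memory setting
functions:
  parseArticle:
    handler: handlers/parseArticle.handler  # hypothetical handler path
    memorySize: 128   # minimum memory: the function is mostly waiting
    timeout: 60       # generous timeout for slow remote responses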

Do you know of a better way to do this? I'm always interested in finding new ways to optimize serverless applications, so please contact me or leave a comment.
