Update June 5, 2019: The Data API team has released another update that adds improvements to the JSON serialization of the responses. Any unused type fields will be removed, which makes the response size 80+% smaller.
Update June 4, 2019: After playing around with the updated Data API, I found myself writing a few wrappers to handle parameter formation, transaction management, and response formatting. I ended up writing a full-blown client library for it. I call it the “Data API Client“, and it’s available now on GitHub and NPM.
Update May 31, 2019: AWS has released an updated version of the Data API (see here). There have been a number of improvements (especially to the speed, security, and transaction handling). I’ve updated this post to reflect the new changes/improvements.
On Tuesday, November 20, 2018, AWS announced the release of the new Aurora Serverless Data API. This has been a long awaited feature and has been at the top of many a person’s #awswishlist. As you can imagine, there was quite a bit of fanfare over this on Twitter.
Obviously, I too was excited. The prospect of not needing to use VPCs with Lambda functions to access an RDS database is pretty compelling. Think about all those cold start savings. Plus, connection management with serverless and RDBMS has been quite tricky. I even wrote an NPM package to help deal with the
max_connections issue and the inevitable zombies 🧟♂️ roaming around your RDS cluster. So AWS’s RDS via HTTP seems like the perfect solution, right?
Well, not so fast. 😞 (Update May 31, 2019: There have been a ton of improvements, so read the full post.)
Amazon announced the General Availability of Aurora Serverless on August 9, 2018. I have been playing around with the preview of Aurora Serverless for a few months, and I must say that overall, I’m very impressed. There are A LOT of limitations with this first release, but I believe that Amazon will do what Amazon does best, and keep iterating until this thing is rock solid.
The announcement gives a great overview and the official User Guide is chock full of interesting and useful information, so I definitely suggest giving those a read. In this post, I want to dive a little bit deeper and discuss the pros and cons of Aurora Serverless. I also want to dig into some of the technical details, pricing comparisons, and look more closely at the limitations.
Someone asked a great question on my How To: Reuse Database Connections in AWS Lambda post about how to end the unused connections left over by expired Lambda functions:
I’m playing around with AWS lambda and connections to an RDS database and am finding that for the containers that are not reused the connection remains. I found before that sometimes the connections would just die eventually. I was wondering, is there some way to manage and/or end the connections without needing to wait for them to end on their own? The main issue I’m worried about is that these unused connections would remain for an excessive amount of time and prevent new connections that will actually be used from being made due to the limit on the number of connections.
🧟♂️ Zombie RDS connections leftover on container expiration can become a problem when you start to reach a high number of concurrent Lambda executions. My guess is that this is why AWS is launching Aurora Serverless, to deal with relational databases at scale.
At the time of this writing it is still in preview mode.
Update September 2, 2018: I wrote an NPM module that manages MySQL connections for you in serverless environments. Check it out here.
Update August 9, 2018: Aurora Serverless is now Generally Available!
Overall, I’ve found that Lambda is pretty good about closing database connections when the container expires, but even if it does it reliably, it still doesn’t solve the MAX CONNECTIONS problem. Here are several strategies that I’ve used to deal with this issue.
I came across an interesting problem the other day. As part of our URL normalization strategy at AlertMe, we have been adding a trailing slash to URLs without file extensions. We did a lot of research when deciding on this tactic and the general consensus around the web was to use trailing slashes for directories and (obviously) no slashes on filenames. See this article from the official Google Webmasters blog: https://webmasters.googleblog.com/2010/04/to-slash-or-not-to-slash.html (I know it’s old, but the concept is still relevant).
We even tested a number of publisher URLs to see what their redirection strategies were. Every one we tested responded correctly to both the slash and no-slash versions of the URL. Some redirected to a trailing slash, some redirected to no trailing slash, but they all returned (or redirected to) the intended page.
Update 9/2/2018: I wrote an NPM module that manages MySQL connections for you in serverless environments. Check it out here.
I work with AWS Lambda quite a bit. The ability to use this Functions-as-a-Service (FaaS) has dramatically reduced the complexity and hardware needs of the apps I work on. This is what’s known as a “Serverless” architecture since we do not need to provision any servers in order to run these functions. FaaS is great for a number of use cases (like processing images) because it will scale immediately and near infinitely when there are spikes in traffic. There’s no longer the need to run several underutilized processing servers just waiting for someone to request a large job.
AWS Lambda is event-driven, so it’s also possible to have it respond to API requests through AWS’s API Gateway. However, since Lambda is stateless, you’ll most likely need to query a persistent datastore in order for it to do anything exciting. Setting up a new database connection is relatively expensive. In my experience it typically takes more than 200ms. If we have to reconnect to the database every time we run our Lambda functions (especially if we’re responding to an API request) then we are already adding over 200ms to the total response time. Add that to your queries and whatever additional processing you need to perform and it becomes unusable under normal circumstance. Luckily, Lambda lets us “freeze” and then “thaw” these types of connections.
Update 4/5/2018: After running some new tests, it appears that “warm” functions now average anywhere between 4 and 20ms to connect to RDS instances in the same VPC. Cold starts still average greater than 100ms. Lambda does handle setting up DB connections really well under heavy load, but I still favor connection reuse as it cuts several milliseconds off your execution time.
I’ve been managing a few small MySQL databases lately that often need record updates but certainly don’t warrant building a separate management interface. The easiest way to accomplish this (assuming you don’t have complicated joins and relationships) is to install phpMyAdmin, a robust, web-based admin utility for MySQL that is built in php. In my case, I’m running these mostly on Amazon Linux instances, so after a little poking around, it turns out the installation is just 3 simple steps.