Three years ago at re:Invent 2017, AWS announced the original Amazon Aurora Serverless preview. I spent quite a bit of time with it, and when it went GA nine months later, I published my thoughts in a post titled “Aurora Serverless: The Good, the Bad and the Scalable”.
If you read the post, you’ll see that I was excited and optimistic, even though there were a lot of missing features. After several more months of experiments, I finally moved some production workloads onto it and had quite a bit of success. Over the last 18 months, we’ve seen some improvements to the product (including support for PostgreSQL and the Data API), but there were still loads of problems with the scale up/down speeds, failover time, and lack of Aurora provisioned cluster features.
That all changed with the introduction of Amazon Aurora Serverless v2. I finally got access to the preview and spent a few hours trying to break it. My first impression? This thing might just be a silver bullet!
I know that’s a bold statement. 😉 But even though I’ve only been using it for a few hours, I’ve also read through the (minimal) docs, reviewed the pricing, and talked to one of the PMs to understand it the best I could. There clearly must be some caveats, but from what I’ve seen, Aurora Serverless v2 is very, very promising. Let’s take a closer look!
Update December 9, 2020: I’ve updated the post with some more information after having watched the “Amazon Aurora Serverless v2: Instant scaling for demanding workloads” presentation by Murali Brahmadesam (Director of Engineering, Aurora Databases and Storage) and Chayan Biswas (Principal Product Manager, Amazon Aurora). The new images are courtesy of their presentation.
Update June 5, 2019: The Data API team has released another update that adds improvements to the JSON serialization of the responses. Any unused type fields will be removed, which makes the response size 80+% smaller.
Update June 4, 2019: After playing around with the updated Data API, I found myself writing a few wrappers to handle parameter formation, transaction management, and response formatting. I ended up writing a full-blown client library for it. I call it the “Data API Client”, and it’s available now on GitHub and NPM.
Update May 31, 2019: AWS has released an updated version of the Data API (see here). There have been a number of improvements (especially to the speed, security, and transaction handling). I’ve updated this post to reflect the new changes/improvements.
On Tuesday, November 20, 2018, AWS announced the release of the new Aurora Serverless Data API. This has been a long-awaited feature and has been at the top of many a person’s #awswishlist. As you can imagine, there was quite a bit of fanfare over this on Twitter.
Obviously, I too was excited. The prospect of not needing to use VPCs with Lambda functions to access an RDS database is pretty compelling. Think about all those cold start savings. Plus, connection management with serverless and RDBMS has been quite tricky. I even wrote an NPM package to help deal with the max_connections issue and the inevitable zombies 🧟♂️ roaming around your RDS cluster. So AWS’s RDS via HTTP seems like the perfect solution, right?
Well, not so fast. 😞 (Update May 31, 2019: There have been a ton of improvements, so read the full post.)
“What? You can’t use MySQL with serverless functions, you’ll just exhaust all the connections as soon as it starts to scale! And what about zombie connections? Lambda doesn’t clean those up for you, meaning you’ll potentially have hundreds of sleeping threads blocking new connections and throwing errors. It can’t be done!” ~ Naysayer
I really like DynamoDB and Bigtable (even Cosmos DB is pretty cool), and for most of my serverless applications, they would be my first choice as a datastore. But I still have a love for relational databases, especially MySQL. It had always been my go-to choice, perfect for building normalized data structures, enforcing declarative constraints, providing referential integrity, and enabling ACID-compliant transactions. Plus the elegance of SQL (structured query language) makes organizing, retrieving and updating your data drop-dead simple.
But now we have SERVERLESS. And serverless functions (like AWS Lambda, Google Cloud Functions, and Azure Functions) scale almost infinitely by creating separate instances for each concurrent user. This is a MAJOR PROBLEM for RDBMS solutions like MySQL, because available connections can be quickly maxed out by concurrent functions competing for access. Reusing database connections doesn’t help, and even the release of Aurora Serverless doesn’t solve the max_connections problem. Sure, there are some tricks we can use to mitigate the problem, but ultimately, using MySQL with serverless is a massive headache.
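To put rough numbers on the max_connections problem, here is a minimal back-of-the-envelope sketch in TypeScript. It is a simplified model, not a measurement: it assumes every concurrent function instance holds exactly one connection, and that zombie (sleeping) connections from frozen instances still count against the limit. The `estimateConnections` helper, its field names, and the figures in the example are all made up for illustration.

```typescript
// Illustrative model only. Assumption: each concurrent function instance
// opens and holds exactly one database connection.
interface ConnectionDemand {
  maxConnections: number;      // the database's max_connections setting
  concurrentInstances: number; // function instances running at the same time
  idleConnections: number;     // sleeping "zombie" connections still held open
}

interface ConnectionOutcome {
  served: number;   // instances that manage to get a connection
  rejected: number; // instances that fail because the limit is reached
}

function estimateConnections(demand: ConnectionDemand): ConnectionOutcome {
  // Zombie connections occupy slots, so they shrink what is really available.
  const available = Math.max(0, demand.maxConnections - demand.idleConnections);
  const served = Math.min(demand.concurrentInstances, available);
  return { served, rejected: demand.concurrentInstances - served };
}

// Hypothetical figures: a limit of 90, 40 zombies, and a burst of 500 instances.
const outcome = estimateConnections({
  maxConnections: 90,
  concurrentInstances: 500,
  idleConnections: 40,
});
console.log(outcome); // { served: 50, rejected: 450 }
```

Even in this generous model, a burst of traffic leaves most invocations without a connection, and every zombie makes it worse.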
Well, maybe not anymore. 😀 I’ve been dealing with MySQL scaling issues and serverless functions for years now, and I’ve finally incorporated all of my learning into a simple, easy to use NPM module that (I hope) will solve your Serverless MySQL problems.