“What? You can’t use MySQL with serverless functions, you’ll just exhaust all the connections as soon as it starts to scale! And what about zombie connections? Lambda doesn’t clean those up for you, meaning you’ll potentially have hundreds of sleeping threads blocking new connections and throwing errors. It can’t be done!” ~ Naysayer
I really like DynamoDB and BigTable (even Cosmos DB is pretty cool), and for most of my serverless applications, they would be my first choice as a datastore. But I still have a love for relational databases, especially MySQL. It had always been my goto choice, perfect for building normalized data structures, enforcing declarative constants, providing referential integrity, and enabling ACID-compliant transactions. Plus the elegance of SQL (structured query language) makes organizing, retrieving and updating your data drop dead simple.
But now we have SERVERLESS. And Serverless functions (like AWS Lambda, Google Cloud Functions, and Azure Functions) scale almost infinitely by creating separate instances for each concurrent user. This is a MAJOR PROBLEM for RDBS solutions like MySQL, because available connections can be quickly maxed out by concurrent functions competing for access. Reusing database connections doesn’t help, and even the release of Aurora Serverless doesn’t solve the max_connections
problem. Sure there are some tricks we can use to mitigate the problem, but ultimately, using MySQL with serverless is a massive headache.
Well, maybe not anymore. 😀 I’ve been dealing with MySQL scaling issues and serverless functions for years now, and I’ve finally incorporated all of my learning into a simple, easy to use NPM module that (I hope) will solve your Serverless MySQL problems.
Introducing Serverless MySQL
Serverless MySQL is a wrapper for Doug Wilson’s amazing mysql Node.js module. It adds a connection management component to the mysql
module that is designed specifically for use with serverless applications. The module can monitor the number of connections being utilized, and then based on your settings, manage those connections to allow thousands of concurrent executions to share them. It will reuse connections when possible, clean up zombie threads, enforce connection limits per user, and retry connections using trusted backoff algorithms.
In addition, Serverless MySQL also adds modern async/await
support to the mysql
module, eliminating callback hell or the need to wrap calls in promises. It also dramatically simplifies transactions, giving you a simple and consistent pattern to handle common workflows.
NOTE: This module has been tested with AWS’s RDS MySQL, Aurora MySQL, and Aurora Serverless, but it should work with any standards-based MySQL server.
A Simple Example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
// Require and initialize outside of your main handler const mysql = require('serverless-mysql')({ config: { host : process.env.ENDPOINT, database : process.env.DATABASE, user : process.env.USERNAME, password : process.env.PASSWORD } }) // Main handler function exports.handler = async (event, context) => { // Run your query let results = await mysql.query('SELECT * FROM table') // Run clean up function await mysql.end() // Return the results return results } |
Installation and Requirements
Serverless MySQL is an NPM module that can be installed and included in your Node.js functions.
1 |
> npm i serverless-mysql |
This module uses modern JavaScript methods and therefore requires Node 8.10+. It also requires the use of a MySQL-compatable server/cluster.
There really isn’t anything special that needs to be done in order for your MySQL server (including RDS, Aurora, and Aurora Serverless) to use serverless-mysql
. You should just be aware of the following two scenarios.
If you set max user_connections
, the module will only manage connections for that user. This is useful if you have multiple clients connecting to the same MySQL server (or cluster) and you want to make sure your serverless app doesn’t use all of the available connections.
If you’re not setting max user_connections
, the user MUST BE granted the PROCESS
privilege in order to count other connections. Otherwise it will assume that its connections are the only ones being used. Granting PROCESS
is fairly safe as it is a read only permission and doesn’t expose any sensitive data.
How to use Serverless MySQL 🚀
Serverless MySQL wraps the mysql module, so it supports pretty much everything that the mysql
module does. It uses all the same connection options, provides a query()
method that accepts the same arguments when performing queries (except the callback of course), and passes back the query results exactly as the mysql
module returns them. There are a few things that don’t make sense in serverless environments, like streaming rows, so there is no support for that yet.
To use Serverless MySQL, require
it OUTSIDE your main function handler. This will allow for connection reuse between executions. The module must be initialized before its methods are available.
There are a number of supported configuration options (including event callbacks) detailed in the documentation. These must be passed in during initialization.
1 2 3 4 5 6 7 8 9 |
// Require and initialize with default options const mysql = require('serverless-mysql')() // <-- initialize with function call // OR include configuration options const mysql = require('serverless-mysql')({ backoff: 'decorrelated', base: 5, cap: 200 }) |
MySQL connection options can be passed in at initialization (using the config
property) or later using the config()
method.
1 2 3 4 5 6 |
mysql.config({ host : process.env.ENDPOINT, database : process.env.DATABASE, user : process.env.USERNAME, password : process.env.PASSWORD }) |
You can explicitly establish a connection using the connect()
method if you want to, though it isn’t necessary. This method returns a promise, so you’ll need to await
the response or wrap it in a promise chain.
1 |
await mysql.connect() |
Running queries is super simple using the query()
method. It supports all query options supported by the mysql
module, but returns a promise instead of using the standard callbacks. You either need to await
them or wrap them in a promise chain.
1 2 3 4 5 6 7 8 9 10 11 12 |
// Simple query let results = await query('SELECT * FROM table') // Query with placeholder values let results = await query('SELECT * FROM table WHERE name = ?', ['serverless']) // Query with advanced options let results = await query({ sql: 'SELECT * FROM table WHERE name = ?', timeout: 10000, values: ['serverless']) }) |
Once you’ve run all your queries and your serverless function is ready to return data, call the end()
method to perform connection management tasks. This will do things like check the current number of connections, clean up zombies, or even disconnect if there are too many connections being used. Be sure to await
its result before continuing.
1 2 |
// Perform connection management tasks await mysql.end() |
Note that end()
will NOT necessarily terminate the connection. Only if it has to in order to free up connections. If you’d like to explicitly terminate connections, use the quit()
method instead.
1 2 |
// Gracefully terminate the connection mysql.quit() |
If you need access to the connection object, you can use the getClient()
method. This will allow you to use any supported feature of the mysql
module directly.
1 2 3 4 5 |
// Get the connection object let connection = mysql.getClient() // Use it to escape a value let value = connection.escape('Some value to be escaped') |
Transaction Support 🙌
Transaction support in Serverless MySQL has been dramatically simplified. Start a new transaction using the transaction()
method, and then chain queries using the query()
method. The query()
method supports all standard query options.
You can specify an optional rollback()
method in the chain. This will receive the error
object if a transaction fails, allowing you to add additional logging or perform some other action. Call the commit()
method when you are ready to execute the queries.
1 2 3 4 5 6 |
let results = await mysql.transaction() .query('INSERT INTO table (x) VALUES(?)', [1]) .query('UPDATE table SET x = 1 ') .query('SELECT * FROM table') .rollback(e => { /* do something with the error */ }) // optional .commit() // execute the queries |
Automatic Connection Retries and Backoff 🤓
Serverless MySQL will automatically kill idle connections or disconnect the current connection if the connUtilization
limit is reached. Even with this aggressive strategy, it is possible that multiple functions will be competing for the same available connections. The backoff
setting uses the strategy outlined here to use Jitter instead of Exponential Backoff when attempting connection retries.
The two supported methods are full
and decorrelated
Jitter. Both are effective in reducing server strain and minimize retries. The module defaults to full
.
Full Jitter: LESS work, MORE time
1 |
sleep = random_between(0, min(cap, base * 2 ** attempts)) |
Decorrelated Jitter: MORE work, LESS time
1 |
sleep = min(cap, random_between(base, sleep * 3)) |
You can also provide your own backoff algorithm.
Does this thing actually work? 🤔
Fair question. I’ve run a lot of tests using a number of different configurations. Ramp ups appear to work best, but once there are several warm containers, the response times are much better. Below is an example test I ran using AWS Lambda and Aurora Serverless. Aurora Serverless was configured with 2 ACUs (and it did not autoscale), so there were only 90 connections available to the MySQL cluster. The Lambda function was configured with 1,024 MB of memory. This test simulated 500 users per second for one minute. Each user ran a sample query retrieving a few rows from a table.
From the graph below, you can see that the average response time was 41 ms (min 20 ms, max 3743 ms) with ZERO errors.
Other tests that use larger configurations were extremely successful too, but I’d appreciate other independent tests to verify my assumptions. 📊
Where do we go from here?
There is plenty more to do with this module. I’ve done a number of simulated tests, but there are several others that need to be run. The module also needs some automated tests written along with code coverage reporting. I’ll be adding that soon so that we don’t introduce regressions as new features are added.
I’ve actually been using this module in production for quite some time and it has worked perfectly for several of my applications. Even if you are not experiencing massive scale, this module is a handy tool for simply managing your connections and providing a more intuitive interface into the mysql
module.
I’d love to know your thoughts and see your test results. There is always room for improvement, so your feedback, bug reports, suggestions, and contributions are greatly appreciated.
Good luck! 🤘🏻
Tags: aurora, aurora serverless, aws, aws lambda, google cloud, microsoft azure functions, nodejs, rds, serverless
Did you like this post? 👍 Do you want more? 🙌 Follow me on Twitter or check out some of the projects I’m working on.
Could you write similar wrapper for MongoDB and provide node module. It will be really great to many users. thanks.
Hi Chandru,
I wish I had the time. I don’t use MongoDB in serverless production environments, so I’m not as familar with the ins and outs of the technology. I do think most of the same techniques I use would work for MongoDB, so perhaps someone wants to fork the repo and add that functionality?
Thanks,
Jeremy
Thanks for sharing! Do you plan to integrate this with existing ORMs? Like sequelize? Or would you prefer to just keep using raw SQL queries?
Hi Tom,
I personally hate ORMs. I think they mostly provide a layer of abstraction that has little to no benefit. They often require several queries to be run when a simple join could have solved the problem. Plus, the likelihood of me changing my underlying database technology is slim to none, and even if I did, most ORMs would require code changes anyway. I do have a separate NPM module I’ve been working on that integrates a caching layer, but it’s not quite ready to share.
Thanks for feedback,
Jeremy
Thanks for the article and the lib, we have the same Lambda/Aurora SLS combo so we’ll definitely give it a try.
Regarding your comment on ORMs and the likelihood to change DB tech… I don’t believe that “convenient switching of underlying DB tech” is something that is highlighted as an ORM benefit. At the beginning of a project, you can pick whatever you like / need, but at later stages, it immediately disappears because switch is not “just plug-in a different DB driver”, there so much more to that.
Hi Adrian,
I did point out that changing your DB tech often requires a number of query rewrites. You’ve obviously had experience with ORMs, so you are well aware of this fact. However, Doctrine ORM (as an example) says, “This provides developers with a powerful alternative to SQL that maintains flexibility without requiring unnecessary code duplication.” IMO, someone new to ORMs might interpret “maintains flexibility” as a reference to the underlying DB tech. In some cases this is true, until you add complexity.
My disdain for most ORMs is that they provide too much abstraction, and in doing so, require unnecessary steps (read multiple queries) to accomplish simple things. If you are just using it to list, sort and paginate data, they may work well. I am a fan of query builders, since they can shortcut quite a bit of code, but I don’t like handing query planning over to an ORM.
Thanks for the comment and I hope the lib works for you.
– Jeremy
Good stuff. As I work through this I so far noticed in your first example you say “Man handler” when I think you meant “main handler” but that was funny so might be worth keeping lol. Also in that example, it’s missing the arrow operator to make the handler a function. Other than that it works for me. Thanks for the cool library!
🤦🏻♂️ Thanks for noticing. I just updated the post.
Hi , i read your article and its very interesting , i also have implemented it in lambdas with MySQL database , however now I am in a project where use Aurora MySQL its mandatory , i have treid to connected with mysql-serverless but it doesnt work , do you know a way to connect it ?
Hi Carlos,
The
serverless-mysql
package is Amazon Aurora compatible. Make sure that your Lambdas are in the same VPC and that you have granted the correct security permissions.– Jeremy
Hi Jeremy! This is very cool. Do you happen to know of anything similar for Python?
Hi Jani,
Unfortunately I don’t, but I’m sure it wouldn’t be super difficult to reimplement in Python.
– Jeremy
Hi Jeremy, I just installed this plugin, but when I try to run a query I get the following error:
(node:7292) UnhandledPromiseRejectionWarning: Error: Error: ER_NOT_SUPPORTED_AUTH_MODE: Client does not support authentication protocol requested by server; consider upgrading MySQL client
at connect (C:\Users\Jackie\Amazli.com\website\App\node_modules\serverless-mysql\index.js:79:15)
However I have no problems at all connecting using regular mysql.js.
Any ideas?
Um… you can disregard that last comment, asking about auth problems. I was forgetting to call my config function. Herp-a-derp!
Glad you figured it out!
This is great. What are your thoughts of implementing the mysql initialization and common method calls in a common Layer instead of each individual lambda handler file? Is it going to be faster or slower? How about efficiency?
You could certainly include some of the boilerplate in a Layer, but I would still have each function initialize connections when they were needed.
Hi Jeremy,
I’m upgrading my lambda NodeJS8 to 10 and when I run the lambda I get an error about serverless-mysql
saying Error: Error: Cannot find module ‘isarray’ at connect (/var/task/node_modules/serverless-mysql/index.js:78:13)It used to work perfectly with Node JS 8 and now all my apis using mysql are failing.
Can you please help?
Please post this in the GitHub issues and I’ll take a look.
Thanks,
Jeremy
Hey Jeremy, Great Stuff. I’m regular listener of your podcast and a big fan your tweets. I’m aware of this module for long but looking at it closely because my client may need something like this for Postgres. Couple of questions.
1. Quoting from the post, “The module can monitor the number of connections being utilized, and then based on your settings, manage those connections to allow thousands of concurrent executions to share them” I’m kind of lost when you say “thousands of concurrent executions sharing” the connection. There is no sharing happening between concurrent executions right? because concurrent executions happen in different lambda containers, each per request – a lambda container instance executes one request at a time. Understand that the same connection is reused for subsequent requests in the same warm lambda container. Could you please clarify? Maybe I am not reading it right.
2. Has anyone attempted to write a similar module for Postgres? Probably based on the same design as mysql version.
Sorry for the late reply. You are not sharing connections, but rather using Lambda to automatically disconnect and then retry failed connections to utilize the total connections available. Even though it can often handle concurrent connections that far exceed the number of available database connections, you should still size the database accordingly to handle normal load. I saw someone working on a Postgres version, but last time I checked it wasn’t very far along.
Thanks for the article Jeremy! One question here, let’s say we reach the maximum of opened connection and we have one new lambda execution that wants to open a new connection to the database. what is going to happen with the new connection? is it going to be created once a new connection is available in the pool?
is there a timeout for establishing connection? As far as I can see in the library, there are timeouts for zombie connections and queries.
Thanks for the article, It’s great! couple of questions here:
1) Let’s say we reach the maximum number of connections opened, and we have a new lambda executions that wants to open a new connection, what is going to happen when instantiating the new connection? is the lambda execution going to wait until finding an available connection? do we have a timeout for creating connections?
2) In case we use RDS Aurora serverless and the Data API, do we need the library for handling max_connections? or that is managed by Data API?
Thanks
Marcos
Hi Jeremy
I want to ask you if the queries get sanitized to prevent SQL injection? And when do the queries get sanitized? If you use the ? with values: or do we have to do something else too that the queries get sanitized? I tried it myself with postman and i recognized that the ? paremeters are wrapped as strings so where I tried to do a test injection it got prevented because it got an error. The query was “‘DELETE FROM task WHERE id=30′” so a String in a String. Is that the sanitation?
Thanks
Andre