Event Injection: Protecting your Serverless Applications
Serverless applications, like all event-driven architectures, must account for user-supplied data from multiple sources. Make sure you protect your applications.
Updated January 25, 2019: This post was updated based on feedback from the community.
The shared security model of cloud providers extends much further with serverless offerings, but application security is still the developer's responsibility. Many traditional web applications are front-ended with WAFs (web application firewalls), RASPs (runtime application self-protection), EPPs (endpoint protection platforms) and WSGs (web security gateways) that inspect incoming and outgoing traffic. These extra layers of protection can save developers from themselves when making common programming mistakes that would otherwise leave their applications vulnerable. If you're invoking serverless functions from sources other than API Gateway, you no longer have the ability to use the protection of a WAF.
Serverless makes it easy to deploy a function to the cloud and not think about the infrastructure it's running on. While certainly convenient, this can leave many developers with a false sense of security. By relying too heavily on the cloud provider, and not coding defensively, developers can significantly reduce their overall security posture. As with any type of software, there are a myriad of attacks possible against serverless infrastructures. However, unlike many traditional web applications, serverless architectures are "event-driven". This means they can be triggered by a number of different sources with multiple formats and encodings, bypassing protection provided by WAFs and opening them up to different types of attacks. 🤯
This certainly isn't unique to serverless, as all event-driven architectures are prone to this. In fact, most event sources existed prior to the introduction of Lambda. But it is important to understand that developers are still responsible for their application code.
Where does event data come from? 🤔
Common web application exploits include SQL injection, code injection and cross-site scripting (XSS) attacks. WAFs are usually pretty good at detecting and defending against these types of attacks, and you should certainly enable one on your API Gateways. However, WAFs only go so far, so I would hope that most of us apply some simple application layer input sanitization as well. But even the best programmers are prone to overlook less common data patterns like local or remote file inclusion (LFI/RFI) attacks.
Above we're assuming the input is coming from a web application. In this case, we (or a WAF) would be inspecting just the request body and the URL parameters. Still a lot to think about, but what about data from other triggers? At the time of this writing, there are 47 supported event sources that can trigger an AWS Lambda function, including:
- Amazon S3
- Amazon DynamoDB
- Amazon Kinesis Data Streams
- Amazon Simple Notification Service
- Amazon Simple Email Service
- Amazon CloudWatch Logs
- Amazon CloudWatch Events (as a proxy to 25+ other services)
- Scheduled Events
- AWS Config
- Amazon Alexa
- Amazon Lex
- Amazon API Gateway
- AWS IoT Button
- Amazon CloudFront
- Amazon Kinesis Data Firehose
- Amazon Simple Queue Service
You can also invoke functions on demand from your own code or by accessing data from SQS and other message brokers. Take a look at the sample event data from these sources and you'll see that they all vary in their formats and complexity. That's a lot of places where malicious, user-supplied data could sneak into our apps. AWS isn't alone when it comes to supporting events either. Google Cloud Functions supports four types of triggers and Microsoft Azure Functions supports at least nine.
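To make the variety concrete, here is a minimal sketch that pulls the user-influenced field out of a few of these event shapes. The function name `extractUntrustedValues` is my own invention, the field names are assumed to match the sample events AWS publishes, and only four sources are handled, so treat it as an illustration rather than a complete inventory.

```javascript
// Collect the user-influenced strings from a few common Lambda event shapes.
// Assumption: field names follow the sample events published by AWS.
// Only four sources are handled; anything else is reported as unrecognized.
function extractUntrustedValues(event) {
  const values = [];
  for (const record of event.Records || []) {
    // SNS capitalizes its keys (EventSource, Sns); most other sources do not.
    const source = record.eventSource || record.EventSource;
    switch (source) {
      case 'aws:s3':
        values.push({ source, field: 's3.object.key', value: record.s3.object.key });
        break;
      case 'aws:sqs':
        values.push({ source, field: 'body', value: record.body });
        break;
      case 'aws:sns':
        values.push({ source, field: 'Sns.Message', value: record.Sns.Message });
        break;
      case 'aws:kinesis':
        // Kinesis payloads arrive Base64 encoded.
        values.push({
          source,
          field: 'kinesis.data',
          value: Buffer.from(record.kinesis.data, 'base64').toString('utf8'),
        });
        break;
      default:
        values.push({ source: source || 'unknown', field: null, value: null });
    }
  }
  return values;
}
```

Every `value` that comes back should be treated as attacker-controlled until it has been validated for the context where it will be used.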
If we look closer at the data attribute for CloudWatch Logs and Kinesis Data Streams, we'll see that the records are Base64 encoded and, in the case of CloudWatch Logs, also compressed with the gzip format:
    {
      "awslogs": {
        "data": "H4sIAAAAAAAAAHWPwQqCQBCGX0Xm7EFtK+smZBEUgXoLCdMhFtKV3akI8d0bLYmibvPPN3wz00CJxmQnTO41whwWQRIctmEcB6sQbFC3CjW3XW8kxpOpP+OC22d1Wml1qZkQGtoMsScxaczKN3plG8zlaHIta5KqWsozoTYw3/djzwhpLwivWFGHGpAFe7DL68JlBUk+l7KSN7tCOEJ4M3/qOI49vMHj+zCKdlFqLaU2ZHV2a4Ct/an0/ivdX8oYc1UVX860fQDQiMdxRQEAAA=="
      }
    }
This data has to be decoded, unzipped, and then inspected to make sure it's safe to use. Even if we had the luxury of a WAF, it would have no idea how to deal with this type of input.
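As a rough sketch of the "decode and unzip" part, the snippet below uses Node's built-in `zlib` module. The helper names `decodeAwsLogs` and `makeSampleEvent` are hypothetical, and the second one exists only so the example can build its own test payload instead of relying on a real log event.

```javascript
const zlib = require('zlib');

// Decode the `awslogs.data` field of a CloudWatch Logs event:
// Base64 string -> gzip bytes -> JSON text -> object.
function decodeAwsLogs(event) {
  const compressed = Buffer.from(event.awslogs.data, 'base64');
  const json = zlib.gunzipSync(compressed).toString('utf8');
  return JSON.parse(json);
}

// Hypothetical helper: wrap an arbitrary payload the same way, so the
// decoder can be exercised without a real CloudWatch Logs subscription.
function makeSampleEvent(payload) {
  const json = Buffer.from(JSON.stringify(payload), 'utf8');
  return { awslogs: { data: zlib.gzipSync(json).toString('base64') } };
}

const sample = makeSampleEvent({
  logEvents: [{ id: '1', message: '<script>alert(1)</script>' }],
});
const decoded = decodeAwsLogs(sample);
// The message text is still untrusted after decoding; it only becomes
// readable here, so validation has to happen after this step.
console.log(decoded.logEvents[0].message);
```

Decoding is only the first half of the job. Whatever ends up in each log message was written by something upstream, possibly with user input in it, so it needs the same scrutiny as a request body.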
How are these attacks possible? 🤷‍♂️
I too was a bit skeptical when I first heard of this type of attack. How would user-supplied data even make its way into these other types of events? The answer is frighteningly simple. Ory Segal, CTO & Co-Founder of PureSec, gave a talk at Serverless Days TLV and presented a few examples. I highly suggest you watch this entire 22-minute talk as it gives a great overview (examples here). There is an obvious case of XSS with an API Gateway request, but also less obvious cases where he demonstrates how seemingly harmless input, once trusted, can end up executed as shell commands to devastating effect.
I thought these examples were interesting, but I was curious to dig a bit deeper and see if I could exploit something even more obscure. I thought about a case where we would be tempted to implicitly trust the input. This led me to S3 file names. I created a sample bucket and attached a trigger that fires off a Lambda function. Then I uploaded a text file and the event data looked like this (edited for brevity):
    {
      "Records": [
        {
          "eventSource": "aws:s3",
          "eventName": "ObjectCreated:Put",
          "s3": {
            "bucket": { ... },
            "object": {
              "key": "Title%3B+with+a+semicolon",
              "size": 4
            }
          }
        }
      ]
    }
The event conveniently gives me a key with the name of the file I uploaded. This is URL encoded, so I'll decode it like this:
    let filename = decodeURIComponent(s3.object.key.replace(/\+/g, '%20'))
    // => Title; with a semicolon
And then record that to my database:
    connection.query(
      'INSERT INTO uploads (`file`) VALUES ("' + filename + '")',
      (error, results) => {}
    )
The SQL query above is prone to SQL injection and is obviously bad practice. But let's be honest, we've all done this. Maybe it was before we knew what SQL injection was or because we were just quickly coding a prototype. Either way, this kind of mistake creeps into code ALL THE TIME. We might even think, "well, this is just a file name from an S3 event, it should be a trusted value." But that's not true.
I created a new file named "1");(delete * from uploads" and uploaded it to my S3 bucket. The event came through like this:
    {
      "Records": [
        {
          "eventSource": "aws:s3",
          "eventName": "ObjectCreated:Put",
          "s3": {
            "bucket": { ... },
            "object": {
              "key": "1%22%29%3B%28delete+*+from+uploads",
              "size": 4
            }
          }
        }
      ]
    }
And as soon as I decode it, my SQL query becomes:
    INSERT INTO uploads (`file`) VALUES ("1");(delete * from uploads)
Uh oh! 🤦🏻‍♂️ Hope you have a recent database backup. While this example might be a bit contrived, you can see that this type of exploit is entirely plausible, especially for the developer who isn't overly security conscious (read: most developers). This only scratches the surface of creative ways that attackers could inject data into SNS topics, CloudWatch logs, Amazon Alexa commands and more.
It's also important to note that this isn't an issue or security hole in AWS or the other cloud providers. As Chris Munns said, "I can throw acid or water down a hose to your lawn, none of that is the hose's fault because it doesn't do liquid verification." This can be entirely avoided by implementing proper coding techniques.
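For the S3 file name case, one such technique is to hand the value to the driver as a placeholder parameter instead of concatenating it into the SQL string. This is only a sketch: `recordUpload` is a name I made up, and it assumes `connection` exposes `query(sql, values, callback)` the way the `mysql` and `mysql2` packages do.

```javascript
// Store an uploaded file name using a `?` placeholder instead of string
// concatenation. Assumption: `connection` is a mysql/mysql2 style connection
// whose query() accepts (sql, values, callback).
function recordUpload(connection, filename, callback) {
  connection.query(
    'INSERT INTO uploads (`file`) VALUES (?)',
    [filename], // escaped by the driver rather than concatenated by hand
    callback
  );
}
```

With this version, a file named `1");(delete * from uploads` is stored as an odd-looking string and nothing more, because the SQL text itself never changes.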
Trust no one, including yourself 👽
Application Security 101 tells us we should ALWAYS sanitize user input. With serverless applications (as with all event-driven applications), we have to think about even more sources where unfiltered user input can come from. This means we need to be hypervigilant about sanitizing every piece of data that comes into our functions, even if we think we can trust the source.
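As one hedged example of what that can look like for the S3 file names used earlier, the sketch below decodes the key and then checks it against an allowlist. The `SAFE_KEY` pattern and the `safeKeyFromRecord` name are assumptions of mine; the right set of allowed characters depends entirely on what your application expects.

```javascript
// Allowlist for object names this hypothetical application accepts:
// letters, digits, spaces, dots, underscores, hyphens and slashes.
const SAFE_KEY = /^[A-Za-z0-9 ._\/-]{1,255}$/;

// Decode the object key from one S3 event record and reject anything
// outside the allowlist. Throws an Error on malformed or unexpected input.
function safeKeyFromRecord(record) {
  let decoded;
  try {
    // S3 event keys are URL encoded, with spaces sent as "+".
    decoded = decodeURIComponent(record.s3.object.key.replace(/\+/g, '%20'));
  } catch (err) {
    throw new Error('S3 key is not valid URL encoding');
  }
  if (!SAFE_KEY.test(decoded)) {
    throw new Error('S3 key contains unexpected characters');
  }
  return decoded;
}
```

A key like `report+2019.txt` comes back as `report 2019.txt`, while the malicious key from the earlier example is rejected before it gets anywhere near a query. Validation like this complements parameterized queries; it doesn't replace them.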
If you'd like to learn more about serverless security, read my post Securing Serverless: A Newbie's Guide. Also be sure to check out 10 Things You Need To Know When Building Serverless Applications to jumpstart your serverless knowledge.