Use AWS Lambda to build a Serverless REST API, storing data in S3 and querying it with Athena.
We’re going to build a Serverless REST API and deploy it on AWS without setting up any server!
Why use serverless?
The great thing about going serverless is being able to deploy your code instantly without having to set up a server.
But after a while, you quickly realize that having stateless bits of code in the cloud has its limits. For example, how do you persist your data?
In this article, we’ll build a REST API using AWS Lambda (Python 3.6) that stores data on an S3 bucket and then queries it using AWS Athena.
We’ll create the following API:
- POST /user: Create a user
- GET /user/{user_id}: Fetch the data matching the user_id
- PUT /user/{user_id}: Update the data matching the user_id
- DELETE /user/{user_id}: Delete the data matching the user_id
- GET /user/list: Return a list of all the users
TL;DR: To jump to the full working example, you can go there.
We will take the following steps:
- Install the necessary toolkits and create a serverless project
- Write the serverless deployment configuration
- Define a helper class S3Model to write and read data from the S3 Bucket created in the config file
- Define a helper class S3ApiRaw to generate the API handlers
- Create the user.py module with the User schema and generate the API handlers
- Configure AWS Athena to be able to query our data using SQL statements
Let’s get started!
Step 0: The requirements
- Install the Serverless toolkit
- Configure your AWS credentials (you can find a tutorial here)
- Make sure you have Python 3.6 installed (installation instructions can be found here)
- You’re now ready to create your project:
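For example, with the Serverless CLI (the service name below is just a placeholder):

```bash
serverless create --template aws-python3 --path serverless-user-api
cd serverless-user-api
```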
The serverless-python-requirements plugin is used to ship the Python dependencies with the Lambda functions; to do this, all you need is to create a requirements.txt file.
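The plugin can be installed from the project folder with the standard Serverless command:

```bash
serverless plugin install -n serverless-python-requirements
```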
Step 1: Writing the deployment config
By creating a serverless project, a serverless.yml file was created. This config file will contain multiple key aspects of your project:
- Which Lambda function executes which piece of code (a function called a handler)
- Which route / parameters / HTTP method will trigger which Lambda function
- What other AWS resources you want to create (one S3 bucket in our case)
- The permissions the Lambda functions need to interact with other AWS resources
In our case we will need to specify the following:
- One S3 Bucket to store our data
- Five Lambda functions: user_get, user_post, user_put, user_delete, user_list
- The Lambda functions need to be able to read from and write to the S3 bucket
Let’s take a look at what our serverless.yml will look like:
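Here is a minimal sketch of such a configuration; the service and bucket names are placeholders, and the IAM statements are kept to the strict minimum:

```yaml
service: serverless-user-api

plugins:
  - serverless-python-requirements

custom:
  bucketName: serverless-user-api-bucket  # placeholder, must be globally unique

provider:
  name: aws
  runtime: python3.6
  environment:
    S3_BUCKET: ${self:custom.bucketName}
  iamRoleStatements:
    # Allow the Lambdas to read and write on the data bucket
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObject
        - s3:DeleteObject
        - s3:ListBucket
      Resource:
        - arn:aws:s3:::${self:custom.bucketName}
        - arn:aws:s3:::${self:custom.bucketName}/*

functions:
  user_post:
    handler: user.post
    events:
      - http:
          path: user
          method: post
  user_get:
    handler: user.get
    events:
      - http:
          path: user/{user_id}
          method: get
  user_put:
    handler: user.put
    events:
      - http:
          path: user/{user_id}
          method: put
  user_delete:
    handler: user.delete
    events:
      - http:
          path: user/{user_id}
          method: delete
  user_list:
    handler: user.list
    events:
      - http:
          path: user/list
          method: get

resources:
  Resources:
    UserBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${self:custom.bucketName}
```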
Given this configuration file, we now have to provide a Python module user.py with 5 functions: get, post, put, delete and list.
Step 2: The S3Model class
An S3Model class is characterized by two class attributes:
- a SCHEMA that will define the fields and field types of our object
- a name that will specify an S3 folder in which the data will be stored
The class will have 5 methods (sketched after this list):
- load(id): to get data associated with an id from the bucket
- save(object): to save data on the bucket
- delete(id): to delete data associated with an id from the bucket
- list_ids(): list all the ids on the bucket
- validate(obj_data): ensure that the data we save to the bucket complies with the JSON schema
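Here is a minimal sketch of the class, assuming the jsonschema package for validation, an S3_BUCKET environment variable set in serverless.yml, and a save(obj_id, obj_data) signature (the exact signatures may differ in the full project):

```python
import json
import os

import boto3
from jsonschema import validate  # assumption: jsonschema performs the validation

s3 = boto3.resource("s3")
BUCKET = os.environ["S3_BUCKET"]  # set in serverless.yml


class S3Model:
    """Stores JSON objects as <name>/<id>.json files on the bucket."""

    SCHEMA = {}   # JSON schema, overridden by subclasses
    name = None   # S3 "folder" prefix, overridden by subclasses

    @classmethod
    def _key(cls, obj_id):
        return f"{cls.name}/{obj_id}.json"

    @classmethod
    def validate(cls, obj_data):
        # Raises a jsonschema.ValidationError if the data does not match SCHEMA
        validate(obj_data, cls.SCHEMA)

    @classmethod
    def load(cls, obj_id):
        obj = s3.Object(BUCKET, cls._key(obj_id)).get()
        return json.loads(obj["Body"].read())

    @classmethod
    def save(cls, obj_id, obj_data):
        cls.validate(obj_data)
        s3.Object(BUCKET, cls._key(obj_id)).put(Body=json.dumps(obj_data))

    @classmethod
    def delete(cls, obj_id):
        s3.Object(BUCKET, cls._key(obj_id)).delete()

    @classmethod
    def list_ids(cls):
        prefix = f"{cls.name}/"
        return [
            obj.key[len(prefix):-len(".json")]
            for obj in s3.Bucket(BUCKET).objects.filter(Prefix=prefix)
        ]
```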
Now that we can easily interact with some storage, the API handlers are pretty straightforward for a given S3Model.
Step 3: The S3ApiRaw class
An S3ApiRaw class has one class attribute:
- An S3Model class, that will manage the interaction with the bucket
This class will have 6 methods:
- get, put, post, delete, all: one for each route
- get_api_methods: returns the 5 methods above
We also define a decorator, handle_api_error, that formats the return values for API Gateway (see the sketch below).
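A sketch of the decorator and the class, assuming the handlers follow the usual Lambda (event, context) signature and that ids are generated server-side with uuid (both assumptions):

```python
import functools
import json
import uuid


def handle_api_error(func):
    """Format uncaught errors as API Gateway responses."""
    @functools.wraps(func)
    def wrapper(event, context):
        try:
            return func(event, context)
        except Exception as error:  # a real API would catch narrower exceptions
            return {"statusCode": 400, "body": json.dumps({"error": str(error)})}
    return wrapper


class S3ApiRaw:
    s3_model_class = None  # an S3Model subclass, set by subclasses

    @classmethod
    def get(cls, event, context):
        obj_id = event["pathParameters"]["user_id"]
        return {"statusCode": 200, "body": json.dumps(cls.s3_model_class.load(obj_id))}

    @classmethod
    def post(cls, event, context):
        obj_id = str(uuid.uuid4())  # assumption: ids are generated server-side
        cls.s3_model_class.save(obj_id, json.loads(event["body"]))
        return {"statusCode": 201, "body": json.dumps({"id": obj_id})}

    @classmethod
    def put(cls, event, context):
        obj_id = event["pathParameters"]["user_id"]
        cls.s3_model_class.save(obj_id, json.loads(event["body"]))
        return {"statusCode": 200, "body": json.dumps({"id": obj_id})}

    @classmethod
    def delete(cls, event, context):
        cls.s3_model_class.delete(event["pathParameters"]["user_id"])
        return {"statusCode": 204, "body": ""}

    @classmethod
    def all(cls, event, context):
        return {"statusCode": 200, "body": json.dumps(cls.s3_model_class.list_ids())}

    @classmethod
    def get_api_methods(cls):
        # The five wrapped handlers, in the order get, post, put, delete, all
        return tuple(
            handle_api_error(method)
            for method in (cls.get, cls.post, cls.put, cls.delete, cls.all)
        )
```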
Step 4: Define what a user looks like and get the API handlers
We define two classes (sketched below):
- User, which inherits from S3Model. By setting name to “user” and providing a SCHEMA, we ensure that all the files in the “user” folder on the bucket will be formatted as specified by the schema.
- UserResource, which inherits from S3ApiRaw. By setting the s3_model_class attribute to User, all the handlers defined above become specific to the User model.
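Putting it together, user.py could look like this (the module paths and schema fields are illustrative):

```python
# user.py
from s3_model import S3Model      # hypothetical module names
from s3_api_raw import S3ApiRaw


class User(S3Model):
    name = "user"
    SCHEMA = {
        "type": "object",
        "properties": {
            "first_name": {"type": "string"},
            "last_name": {"type": "string"},
            "email": {"type": "string"},
        },
        "required": ["first_name", "last_name", "email"],
    }


class UserResource(S3ApiRaw):
    s3_model_class = User


# The five handlers referenced in serverless.yml
get, post, put, delete, list = UserResource.get_api_methods()
```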
Final Step: Deployment
One final thing we need before deploying is to specify the requirements (boto3 is natively available in the Lambda Python runtime):
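Assuming jsonschema is our only external dependency, requirements.txt is a single line:

```
jsonschema
```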
We are all set, let’s deploy:
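```bash
serverless deploy
```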
And that’s it, you now have a fully functional REST API!
Here are some command lines to play with your new API:
(I am using the httpie package; here is the GitHub repository)
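For example (the endpoint URL is printed at the end of serverless deploy; the one below is a placeholder):

```bash
# Create a user (httpie sends key=value pairs as a JSON body)
http POST https://XXXX.execute-api.us-east-1.amazonaws.com/dev/user first_name=John last_name=Doe email=john@doe.com

# Fetch it back, then list all users
http GET https://XXXX.execute-api.us-east-1.amazonaws.com/dev/user/<user_id>
http GET https://XXXX.execute-api.us-east-1.amazonaws.com/dev/user/list
```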
If you want to test some intensive requests, here is a snippet of code that creates 1000 users (using the Python asyncio module):
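Here is one way to write it, assuming aiohttp as the async HTTP client (the endpoint URL is again a placeholder):

```python
import asyncio

import aiohttp  # assumption: aiohttp provides the async HTTP client

API_URL = "https://XXXX.execute-api.us-east-1.amazonaws.com/dev/user"  # placeholder


async def create_user(session, i):
    payload = {
        "first_name": f"user{i}",
        "last_name": "test",
        "email": f"user{i}@example.com",
    }
    async with session.post(API_URL, json=payload) as response:
        return await response.json()


async def main():
    async with aiohttp.ClientSession() as session:
        # Fire all 1000 requests concurrently
        await asyncio.gather(*(create_user(session, i) for i in range(1000)))


# Python 3.6: asyncio.run() does not exist yet
asyncio.get_event_loop().run_until_complete(main())
```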
As you can see our API already works pretty nicely :)
Bonus Step: Improving performance using AWS Athena to query S3 data
As soon as you have a few hundred objects, the /user/list route will time out, because it iteratively downloads every file on the bucket.
This is where AWS Athena comes into play (see AWS Athena’s documentation): it lets you query file data stored on S3 with common SQL SELECT statements.
In order to use Athena, we need to run queries. This is how it is done:
- Launch a query execution
- Wait for the execution to finish
- Once the execution is done, fetch the results
Let’s write some helpers to manage Athena queries:
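A sketch with boto3; the output location for query results is an assumption (Athena requires one on S3):

```python
import time

import boto3

athena = boto3.client("athena")

# Assumption: query results go to a dedicated prefix on our bucket
OUTPUT_LOCATION = "s3://serverless-user-api-bucket/athena-results/"


def execute_query(query, database=None):
    """Launch a query, wait for it to finish, then return the result rows."""
    kwargs = {
        "QueryString": query,
        "ResultConfiguration": {"OutputLocation": OUTPUT_LOCATION},
    }
    if database:
        kwargs["QueryExecutionContext"] = {"Database": database}
    execution_id = athena.start_query_execution(**kwargs)["QueryExecutionId"]

    # Poll until the execution reaches a terminal state
    while True:
        execution = athena.get_query_execution(QueryExecutionId=execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(0.2)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {state}: {query}")

    results = athena.get_query_results(QueryExecutionId=execution_id)
    return results["ResultSet"]["Rows"]
```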
Now that we can execute SQL queries, we can create our Athena table. We’ll write a Lambda function that will execute CREATE statements.
For this, we need to make a few additions to the serverless.yml file:
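Sketched additions (the broad athena and glue permissions are for simplicity; scope them down in production):

```yaml
provider:
  iamRoleStatements:
    # ...statements from step 1, plus:
    - Effect: Allow
      Action:
        - athena:*
        - glue:*   # Athena stores table metadata in the Glue Data Catalog
      Resource: "*"

functions:
  # ...functions from step 1, plus:
  init_athena_schema:
    handler: athena_helpers.init_schema  # hypothetical module name
```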
We now need to write the init_schema function that will:
- Create a database: execute a CREATE DATABASE statement
- Create a table: execute a CREATE EXTERNAL TABLE statement
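A sketch of init_schema reusing execute_query; the database name and the SerDe choice (the OpenX JSON SerDe, which Athena supports for JSON files) are assumptions:

```python
DATABASE = "users_db"  # hypothetical database name


def init_schema(event, context):
    # Step 1: create the database
    execute_query(f"CREATE DATABASE IF NOT EXISTS {DATABASE}")

    # Step 2: create an external table over the JSON files in the user/ folder
    execute_query(f"""
        CREATE EXTERNAL TABLE IF NOT EXISTS {DATABASE}.user (
            first_name string,
            last_name string,
            email string
        )
        ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
        LOCATION 's3://serverless-user-api-bucket/user/'
    """)
    return "Athena schema initialized"
```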
We can now initialize the Athena database with the command:
serverless invoke local -f init_athena_schema
Now that the database is set up, we can modify our user.list function to make a query to the database:
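A sketch of the new handler; the first row returned by Athena holds the column headers, the following rows hold the data:

```python
import json

# assumption: the Athena helpers above live in an importable module
from athena_helpers import DATABASE, execute_query


@handle_api_error
def list(event, context):
    rows = execute_query(f"SELECT * FROM {DATABASE}.user")
    headers = [col["VarCharValue"] for col in rows[0]["Data"]]
    users = [
        dict(zip(headers, (col.get("VarCharValue") for col in row["Data"])))
        for row in rows[1:]
    ]
    return {"statusCode": 200, "body": json.dumps(users)}
```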
And that’s it: you can now query your resources as if they were in a database. You can find the full code here.
If you are looking for data engineering experts, don’t hesitate to contact us!
Thanks to Antoine Toubhans and Alexandre Chaintreuil.