This tutorial assumes you have a working installation of Terraform, Docker, and Python. You can find installation tutorials for Terraform here and for Docker here.
Did you know that only about 1 out of 10 data science projects makes it to production? According to VentureBeat, only 13% of machine learning projects are used in a production setting. Many factors contribute to this, but the main one is the difficulty of managing the deployment of models and infrastructure. That’s why the use of serverless, and the need for MLOps, a discipline aimed at managing the deployment and production use of models, have been rising.
When you look at the steps needed to support the deployment of an ML project, it can seem like an impossible task.
Let’s imagine you have a brand new model trained on your customer data that is really good at evaluating if a customer will buy a new toy.
The problem is that everyone will want to use this model around Christmas, and you don’t really have the skills in-house to handle scaling, monitoring, server management... and you don’t want your deployment to be expensive.
That’s where serverless comes in to save the day! It’s a paradigm where you only provide your code, and let your cloud provider handle where and how it’s run. Let’s try with AWS and Lambda (but you can do the same with Google Cloud Platform or Azure).
Target serverless architecture
- The loading of our model and the computation of the prediction will be done on AWS Lambda.
- Our users will be able to access the lambda through an API gateway.
- The lambda will use an image stored in an ECR repository to bypass the limit on Python package size in Lambda deployments.
Infrastructure As Code with Terraform for serverless
First, you will want to initialize your Terraform remote state with AWS. You will need an S3 bucket already available in your account. It will hold the remote state safe and sound!
Terraform is a declarative language. You will want to specify every resource you will use in the project. Everything will then be synced with the state stored in the bucket you specify in the Terraform backend.
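A minimal sketch of what this backend configuration could look like, assuming a bucket named my-terraform-state-bucket (a placeholder) and the eu-west-3 region used in the AWS commands later in this post:

```hcl
# Store the Terraform state in the S3 bucket you created beforehand
# ("my-terraform-state-bucket" is a placeholder name).
terraform {
  backend "s3" {
    bucket = "my-terraform-state-bucket"
    key    = "model-serving/terraform.tfstate"
    region = "eu-west-3"
  }
}

# AWS provider configured for the same region as the rest of the post.
provider "aws" {
  region = "eu-west-3"
}
```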
The first thing is to create a repository for the Docker image that will be our lambda’s context. An ECR repository is storage for Docker images, which can then be used by your AWS services.
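For instance, something along these lines (the Terraform resource name is illustrative; the repository name matches the one used in the Docker commands later in this post):

```hcl
# ECR repository that will hold the image our Lambda runs.
resource "aws_ecr_repository" "lambda_repository" {
  name = "lambda-deployment-ecr-repository"
}
```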
Then let’s declare our lambda function!
We define the function name and the role it will assume. Additionally, since we want to use a Docker image as the context for our lambda, we pass the package type “Image” and the URI of the image in the ECR repository (we will cover it later).
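Here is a sketch of what this declaration might look like. The function name matches the one used in the AWS CLI command at the end of this post; the IAM role and its name, as well as the timeout and memory settings, are assumptions:

```hcl
# A minimal execution role for the Lambda (name is an assumption).
resource "aws_iam_role" "lambda_role" {
  name = "model-serving-lambda-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

# Allow the Lambda to write its logs to CloudWatch.
resource "aws_iam_role_policy_attachment" "lambda_basic_execution" {
  role       = aws_iam_role.lambda_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# The Lambda itself, packaged as a container image from our ECR repository.
# The referenced image tag must exist in ECR when the function is created.
resource "aws_lambda_function" "model_serving" {
  function_name = "sicara-model-serving-lambda"
  role          = aws_iam_role.lambda_role.arn
  package_type  = "Image"
  image_uri     = "${aws_ecr_repository.lambda_repository.repository_url}:latest"
  timeout       = 30
  memory_size   = 1024
}
```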
Then let’s create the API Gateway that will trigger the lambda.
We now have a working API Gateway that triggers the lambda when we call the right address with a POST request. The payload of the request is then transferred to the lambda in its input event.
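Below is a minimal sketch of one way to wire this up, using an HTTP API (API Gateway v2). The API name and the POST /predict route are assumptions; the api_gateway_url output matches the one mentioned just after:

```hcl
# HTTP API fronting the Lambda.
resource "aws_apigatewayv2_api" "lambda_api" {
  name          = "model-serving-api"
  protocol_type = "HTTP"
}

# Proxy integration: the request payload is forwarded to the Lambda event.
resource "aws_apigatewayv2_integration" "lambda_integration" {
  api_id                 = aws_apigatewayv2_api.lambda_api.id
  integration_type       = "AWS_PROXY"
  integration_uri        = aws_lambda_function.model_serving.invoke_arn
  payload_format_version = "2.0"
}

# POST route that triggers the Lambda ("/predict" is an assumed path).
resource "aws_apigatewayv2_route" "predict_route" {
  api_id    = aws_apigatewayv2_api.lambda_api.id
  route_key = "POST /predict"
  target    = "integrations/${aws_apigatewayv2_integration.lambda_integration.id}"
}

# Default stage, deployed automatically on every change.
resource "aws_apigatewayv2_stage" "default" {
  api_id      = aws_apigatewayv2_api.lambda_api.id
  name        = "$default"
  auto_deploy = true
}

# Let API Gateway invoke the Lambda.
resource "aws_lambda_permission" "allow_api_gateway" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.model_serving.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_apigatewayv2_api.lambda_api.execution_arn}/*/*"
}

# URL you will use to call the model.
output "api_gateway_url" {
  value = aws_apigatewayv2_stage.default.invoke_url
}
```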
Simply execute
terraform init && terraform apply
in the folder where your Terraform code is stored and you should be good to go!
The command should return the “api_gateway_url” of your deployment; keep it somewhere, as you will need it to access the model.
Serving the model
I’m using joblib to dump and save our model. Our lambda will then load the model at runtime and execute the prediction. Since we are using a serverless paradigm, we may experience a cold start for the first predictions, but the next ones will be much faster! The body of the request will be available in the lambda’s input event; we expect to receive JSON as input.
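Here is a sketch of what such a handler could look like; the model file name (model.joblib) and the “features” input key are assumptions, not the article’s exact code:

```python
import json

import joblib

# Load the model once at import time, so warm invocations reuse it
# ("model.joblib" is an assumed file name copied into the image).
model = joblib.load("model.joblib")


def handler(event, context):
    # With the API Gateway proxy integration, the POST body arrives as a string.
    payload = json.loads(event["body"])

    # "features" is an assumed key holding the list of feature values.
    features = [payload["features"]]
    prediction = model.predict(features)

    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction.tolist()}),
    }
```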
Docker in serverless lambda
One thing that may seem odd is the use of a Docker image as our lambda context. We plan on using XGBoost as our model, and the package itself is pretty heavy. It may not fit within the size limits of a classic Lambda deployment package, but it will fit in a Docker image! That’s one of the great things about the AWS serverless paradigm. On one of my projects, we needed to create a PDF report from a Python notebook using LaTeX, and it was convenient to do so using Docker in a serverless environment as well.
Let’s define a Dockerfile!
We use the base image from AWS and copy our requirements and source code in. We could also install additional system dependencies with the image’s package manager if needed.
The model file is then copied inside the Docker image so our lambda will be able to access it.
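For reference, a Dockerfile along these lines could look like the following; the handler module (app.py), the model file name (model.joblib) and the Python version are assumptions:

```dockerfile
# AWS base image for Python Lambdas (the Python version is an assumption).
FROM public.ecr.aws/lambda/python:3.9

# Install the Python dependencies (XGBoost, joblib, ...).
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip install -r requirements.txt

# Copy the handler code and the serialized model into the image.
COPY app.py ${LAMBDA_TASK_ROOT}/
COPY model.joblib ${LAMBDA_TASK_ROOT}/

# Point the Lambda runtime to the handler function.
CMD ["app.handler"]
```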
The command to build the Docker image is then:
docker build -t model-serving .
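If your Docker client is not yet logged in to ECR, you will likely need to authenticate first (assuming the AWS CLI is configured and $AWS_ACCOUNT_ID is set):
aws ecr get-login-password --region eu-west-3 | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.eu-west-3.amazonaws.com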
Then we will tag the image and push it to the ECR repository:
docker tag model-serving:latest $AWS_ACCOUNT_ID.dkr.ecr.eu-west-3.amazonaws.com/lambda-deployment-ecr-repository:latest
Using Docker tags could allow us to handle many versions of our lambda API.
docker push $AWS_ACCOUNT_ID.dkr.ecr.eu-west-3.amazonaws.com/lambda-deployment-ecr-repository:latest
Finally, we need to update our lambda function so that it uses the new image we uploaded:
aws lambda update-function-code --function-name sicara-model-serving-lambda --image-uri $AWS_ACCOUNT_ID.dkr.ecr.eu-west-3.amazonaws.com/lambda-deployment-ecr-repository:latest --region eu-west-3
Accessing our model
Remember the URL you got from executing the terraform apply command? You can now request this endpoint (through POST) and you should get a response from your model!
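For example, with curl (the /predict route and the “features” payload key follow the assumptions made in the sketches above, with $API_GATEWAY_URL being the value output by Terraform):
curl -X POST "$API_GATEWAY_URL/predict" -H "Content-Type: application/json" -d '{"features": [34, 2, 1]}'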
The next step in your MLOps journey, once you have a model served, can be to think about retraining, monitoring performance in production, and managing experiments. If you have a bit of time, read this amazing article by Google on the subject. For managing experiments, you can check out how to manage computer vision experiment analysis.
Are you looking for Image Recognition Experts? Don't hesitate to contact us!