June 10, 2024 • 4 min read

How Data & ML Engineers Can Uncover Memory Leaks Before They Hit Production

Written by Arsène Tripard


In this article, I want to shed light on the recurrent issue of memory stability for ML and data engineers shipping applications to production, and show how to uncover and locate memory leaks in Python data applications.

Understanding memory leaks in Python

Ninety-nine percent of the time when writing Python code, there is no need to know how the Python interpreter manages memory for you. However, in data and ML engineering, the stability of the application is always a crucial factor, and memory management, especially in a demanding production environment, can cause a lot of instability if not handled carefully. That’s why a couple of principles about Python memory management are good to know.

Within Python’s built-in memory manager are two essential mechanisms. The first is reference counting: the interpreter keeps count of the references to each object in memory and reclaims the memory as soon as an object’s reference count drops to zero.

At this point, the most attentive readers might have spotted a loophole. What happens if two or more objects reference each other in a cycle, thereby preserving positive reference counts? Fortunately, the second mechanism, the garbage collector, has a dedicated algorithm to detect and reclaim such cyclic references.

Figure: Python variable reference counts
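Here is a quick sketch of both mechanisms in action (the objects and names are purely illustrative):

```python
import gc
import sys

a = []
b = [a]  # "a" is now also referenced from inside "b"
# getrefcount reports one extra reference (the temporary argument of the call itself)
print(sys.getrefcount(a))

del b  # the reference count of "a" drops again; at zero, the memory is reclaimed immediately

# Cyclic references: these two objects keep each other's count above zero...
class Node:
    def __init__(self):
        self.other = None

x, y = Node(), Node()
x.other, y.other = y, x
del x, y

# ...so the dedicated cycle-detection algorithm has to step in to reclaim them
print(f"objects collected by the cycle detector: {gc.collect()}")
```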

These two principles guarantee that every block of memory can be deallocated once it becomes obsolete. However, keep in mind that Python extensions such as NumPy or TensorFlow can have their own memory management systems. Even if Python’s garbage collector is working correctly, the memory used by these extensions might not be released as expected, which can lead to higher memory usage.

Load testing with Postman to replicate memory leaks

Before deploying a data or ML application to production, how can you be confident about its stability under a high volume of requests?

The answer is load testing (or stress testing), and it should be an unavoidable step in your deployment pipelines. Load testing is about evaluating a system’s performance under high load to ensure its reliability. For example, in the case of a machine learning API, you want to send hundreds or even thousands of HTTP requests within a few minutes. That way you can simulate real-world usage while monitoring the API’s behavior: average response time, memory usage, etc.
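If you just want a quick scripted sanity check before opening a dedicated tool, a few lines of Python can already fire a burst of requests. The endpoint, payload and volumes below are placeholders to adapt to your own API:

```python
from concurrent.futures import ThreadPoolExecutor

import requests  # third-party package: pip install requests

URL = "http://localhost:8000/predict"    # placeholder endpoint
PAYLOAD = {"features": [1.0, 2.0, 3.0]}  # placeholder request body

def call_api(_):
    response = requests.post(URL, json=PAYLOAD, timeout=10)
    return response.status_code, response.elapsed.total_seconds()

# Send 500 requests with 20 concurrent workers and report a basic timing summary
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(call_api, range(500)))

latencies = [elapsed for _, elapsed in results]
print(f"requests sent: {len(results)}, average latency: {sum(latencies) / len(latencies):.3f}s")
```

This is no replacement for a proper load testing tool, but it is often enough to reproduce a leak locally while you watch the process’s memory.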

In the case of memory leaks on a machine learning API, load testing is particularly crucial. Indeed, chances are you won’t notice anything if you just send the usual 3-4 validation requests before deployment.

There are many software tools that help developers interact with APIs, and load testing is only one aspect of that. Specialised tools like Apache JMeter or Gatling provide more flexibility and customisation for load testing, but I still recommend Postman overall for its comprehensive set of features and its intuitive user interface. In my experience, Postman has been more than sufficient for all my use cases, load testing included.

In particular, Postman is super smooth when it comes to juggling between your environments (I have written an article explaining how here), which means you can always launch your app locally and load test it with Postman. That way you won’t consume your company’s cloud resources for your tests.

Note that Postman offers many settings to customise load testing. You can find more information here; below is a basic configuration with 20 virtual users over 10 minutes.

Figure: Basic load testing configuration in Postman

Another great feature in Postman is the ability to compare different load test runs, which is super handy when you are trying to fix an issue and want a clear before-and-after picture.

Figure: Compare load test runs in Postman

Configure tracemalloc to record memory allocations

I read in Fugue’s technical blog (now part of the cybersecurity company Snyk) that “a language is only as good as its debugging and profiling tools”. And as far as memory profiling is concerned, Python is very well equipped with tracemalloc, short for Trace Memory Allocation.

Essentially, tracemalloc records every block of memory allocated by Python during execution. From that, it can identify the portions of code responsible for the biggest memory footprint; in other words, it can uncover the most significant memory leaks.
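The basic pattern, close to what the official documentation illustrates, looks like this (the top-10 limit is an arbitrary choice):

```python
import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per traceback for more context

# ... run the code you want to profile here ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    # each statistic points to a file and line, with the total memory allocated there
    print(stat)
```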

What’s the point, you say? Well, the memory usage of an application in production is often not perfectly steady over time. By comparing memory snapshots taken periodically with tracemalloc, you can locate the portions of code responsible for the overall increase in memory usage. It’s the starting point before thinking about code improvements to contain memory usage.

What does tracemalloc look like in the code? You can take a peek at the official documentation, where the basic usage is illustrated. To make tracemalloc easier to use and interpret, I wrote a couple of helper functions, which I’m happy to share with you below:

Figure: tracemalloc helper functions
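In essence, they boil down to the minimal sketch below: compare_successive_snapshots is the function used in the next section, while record_snapshot simply stores successive snapshots (the implementation details are simplified here):

```python
import tracemalloc

_snapshots = []  # successive snapshots, most recent last

def start_tracing(nframes=25):
    """Start recording memory allocations, keeping enough frames for useful tracebacks."""
    tracemalloc.start(nframes)

def record_snapshot():
    """Take a snapshot of current allocations and keep it for later comparison."""
    _snapshots.append(tracemalloc.take_snapshot())

def compare_successive_snapshots(top_n=10):
    """Print the lines of code that allocated the most memory since the previous snapshot."""
    if len(_snapshots) < 2:
        return
    stats = _snapshots[-1].compare_to(_snapshots[-2], "lineno")
    for stat in stats[:top_n]:
        # output looks like: path/to/module.py:42: size=73.7 KiB (+73.7 KiB), count=..., average=...
        print(stat)
```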

Analyze tracemalloc logs to locate memory leaks

Now that you have Postman to send API requests, and tracemalloc set up to record memory movements, you’re ready to go.

Coming back to the example of a machine learning API, you can start by taking a snapshot every 10 requests and use the function compare_successive_snapshots to print out the 10 lines of code responsible for the biggest memory allocations since the last snapshot was taken. Instead of printing out the top 10, you could also define a threshold and only print the modules that exceed it.
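To make this concrete, here is a hypothetical request handler instrumented with the helpers above; the model call and the framework wiring are placeholders to adapt to your own API:

```python
REQUESTS_PER_SNAPSHOT = 10
_request_count = 0

def predict(payload):
    """Hypothetical prediction endpoint taking periodic tracemalloc snapshots.

    Assumes start_tracing() was called once at application startup.
    """
    global _request_count
    _request_count += 1

    result = model.predict(payload)  # placeholder for your actual inference code

    if _request_count % REQUESTS_PER_SNAPSHOT == 0:
        record_snapshot()                       # helpers from the sketch above
        compare_successive_snapshots(top_n=10)  # print the top 10 allocation diffs

    return result
```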

Below is what the output should look like, which I annotated so that you can make sense of tracemalloc’s output.

Figure: annotated tracemalloc output

From the screenshot above, you can draw the following conclusions:

  • Between every snapshot taken (in other words, every 10 HTTP requests), the module responsible for the largest increase in memory usage is consistently constant_op from TensorFlow
  • For every 10 requests sent, constant_op consumes roughly another 72 KB of memory (+73.7 KB, +72.9 KB and +71.1 KB to be exact)
  • The rest of the code has a relatively insignificant impact on memory usage. For example, in the second iteration, constant_op accounts for 72.9 KB while the others (utils.py, libevreactor.py, etc.) stay below 1,300 B

With that in mind, you get a clearer sense of what your next steps should be: investigate your use of TensorFlow, and in particular, it seems, something to do with TensorFlow constants!

To take things further, you can take a look at this article to learn more about micro-analysis of Python code, and this article to dig into deep learning memory usage in the PyTorch framework.

Are you looking for help building your data and ML applications? Our team of data and ML engineers is available to assist and guide you along your data science projects.

This article was written by Arsène Tripard