There are tons of Machine Learning tools available on the market, and it is sometimes difficult for data scientists to find their way around. Undoubtedly, Streamlit (15K ⭐ on GitHub) and DVC (8.1k ⭐ on GitHub) are among them.
This article shows how Streamlit and DVC can help data scientists to quickly develop a web UI to analyze their experiments on their ML projects.
In Machine Learning projects, you need scripts to build your dataset, train and evaluate your models. DVC (Data Version Control) is a very popular ML tool that allows the orchestration of these scripts and the tracking of the input/output data: datasets, models, and metrics.
Yet, DVC has some limitations:
- Metrics tracking is limited to scalar values. Even though you can track any large file, it is not easy to compare different versions of the tracked data;
- Files are just files: DVC provides limited tools (e.g., plots) to dynamically explore the data, e.g., dig into large CSV files, play with the trained models by running predictions, or drill down into nice data visualizations.
Another popular ML tool is Streamlit, which “turns data scripts into shareable web apps in minutes”. In this article, I’ll show how Streamlit lets you harness the potential of the tracked data. Combined with DVC, it allows you to compare different versions (meaning data tracked at different commits) in a very nice and customizable way.
If you are familiar with DVC, you may skip parts 1 and 2 that present a very simple pipeline to train a cat vs dog classifier.
In this article, I provide code snippet examples. If you are interested in the code, you can clone the companion repository of this article here.
1. A Cat versus Dog Classifier DVC Pipeline
Let’s say I want to train a model to classify cats and dogs, simple right?
To do so, there are 4 steps:
- Download the cats_vs_dogs dataset from Tensorflow Datasets;
- Split the dataset into `train`/`val`/`test` subsets;
- Train a neural network classifier using the `train`/`val` subsets;
- Evaluate the trained model on the `test` subset.
In order to orchestrate the 4 steps and track the data, I created a 4-stage DVC pipeline running python scripts. Concretely, the DVC pipeline is a `dvc.yaml` file describing, for each stage, the command to run and the input and output files to be tracked.
In the end, it defines a DAG that looks like this:
Note: arrows are reversed as they represent stage dependencies. You can generate this graph automatically with the DVC command line by running:
dvc dag --full --dot | dot -Tpng -o docs/images/dvc-pipeline.png
1.1 Download the data
First and foremost, we need to download the dataset. The `download_dataset` stage uses `wget` to download the dataset archive from Tensorflow and unzips it in the `data/raw` folder. Let's write the first stage in `dvc.yaml`:
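The original stage definition is embedded from the companion repository; here is a minimal sketch of what it could look like (the exact archive URL and the paths are assumptions on my side):

```yaml
stages:
  download_dataset:
    cmd: >-
      wget https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip -O data/cats_and_dogs.zip
      && unzip -q data/cats_and_dogs.zip -d data/raw
    outs:
      - data/raw
```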
There are no dependencies (inputs) for this stage, and it produces (outputs) data in `data/raw` that is tracked by DVC (see the `outs` key in the YAML).
Note the flexibility of DVC: it lets you run any shell command, not only a python script.
1.2 Split the dataset
Then, after you have executed the first stage (run `dvc repro dvc.yaml:download_dataset`), you’ll see `train` and `validation` splits in the `data/raw` folder but no `test` subset:
That is perfectly normal: no `test` split is provided by Tensorflow for the cat vs dog dataset. To alleviate that, I (arbitrarily) chose to split the `validation` set into `val` (70%) and `test` (30%) subsets. I do the split using pandas (see the `split_dataset.py` script) and add the new stage to the pipeline:
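The new stage goes under the `stages:` key of `dvc.yaml`; a sketch of what it could look like (the script location and the output folder are assumptions):

```yaml
  split_dataset:
    cmd: python src/split_dataset.py
    deps:
      - src/split_dataset.py
      - data/raw
    outs:
      - data/splits
```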
Now, if you execute the pipeline (run `dvc repro dvc.yaml:split_dataset`), you see the `train`/`val`/`test` splits:
1.3 Train the model
Now, I’ll train a Tensorflow binary classifier. To do so, I simply adapted the Tensorflow transfer learning tutorial to write the `train.py` script.
Same as before, let's add the `train` stage to `dvc.yaml`:
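Again, a sketch of what the stage could look like (paths are assumptions; the full version in the repository also declares training parameters):

```yaml
  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/splits
    outs:
      - data/models/model
```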
And let's run `dvc repro dvc.yaml:train`, wait a few minutes, and your model is trained!
1.4 Evaluate the model
Finally, we want to evaluate the trained model on the `test` subset. First, we run the trained model on the `test` set so as to obtain, for each image, the probability of it being a cat or a dog, producing a CSV file that looks like this:
Then, we compute the accuracy of the model, i.e., the ratio of images correctly classified, and write the result in a summary JSON file:
You can find the code in the `evaluate.py` script. Let's add the final stage to our pipeline:
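A sketch of the stage (script and model paths are assumptions; the predictions and metrics paths are the ones used throughout this article):

```yaml
  evaluate:
    cmd: python src/evaluate.py
    deps:
      - src/evaluate.py
      - data/models/model
      - data/splits
    outs:
      - data/evaluation/predictions.csv
    metrics:
      - data/evaluation/metrics.json:
          cache: false
```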
Note: `metrics` are a special kind of `outs`; more details in the next section.
And that is it! We now have a fully functional DVC pipeline that downloads and prepares the data, then trains and evaluates the model. You can run the whole pipeline with `dvc repro`.
If you are interested in more advanced DVC features, you may look at the full version of the DVC pipeline here, which adds parameters, metrics, plots, dvclive for training summaries... If you want to go further, please read the awesome DVC blog.
2. What Can I Do with DVC?
First, let's clone the repo and install requirements:
git clone git@github.com:sicara/dvc-streamlit-example.git
pip install -r requirements.txt
2.1 Track the Data
The core feature of DVC is to track, at any commit, the data produced when executing the DVC pipeline presented in part 1.
Let's take an example: first, have a look at the git commits:
git log --stat --color # or glg with oh-my-zsh installed
You can see that commit `f242e6ebdb1ddd1fbef8d6f1ed1b7e6f1345348a` modified the `dvc.lock` file, meaning the pipeline was executed to produce that commit:
With DVC, the data produced by the pipeline at any commit can be easily retrieved. First, check out the commit:
git checkout f242e6ebdb1ddd1fbef8d6f1ed1b7e6f1345348a
Then pull the data:
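With the remote configured, this is a single command:

```
dvc pull
```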
And that's it! Now the pipeline outputs are restored in your local file system as they were when the pipeline was executed at commit `f242e6e`.
2.2 Track more Data
Let's say you want to retrain your model: you make some changes in the code or in the training parameters and then you commit your changes:
# Do some modifications in the model, parameters, ...
git add YOUR_MODIFIED_FILES
git commit -m "My changes on the model, params, ..."
To execute the DVC pipeline again, you simply need to run `dvc repro`:
Then, commit the changes:
git add dvc.lock data
git commit -m "DVC repro"
Finally, push changes to the git and DVC remotes:
# Save changes
git push
dvc push
And that’s it: pipeline inputs and outputs are versioned by git and DVC so that they can be retrieved later on.
Note: the remote storage of the repository is Sicara's public S3 bucket (see the DVC config file). By default, you have permission to read (`dvc pull`) but you cannot write (`dvc push`). If you want to run experiments and save your results with `dvc push`, consider adding your own DVC remote.
2.3 Metrics and Plots
If you look closer at the `dvc.yaml` file from the Github repository, you’ll see `metrics` and `plots` in the `evaluate` stage:
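Compared with the minimal stage shown in part 1, it also declares a plot on the predictions file; a sketch of what this could look like (the column names are assumptions):

```yaml
  evaluate:
    # cmd, deps and outs omitted for brevity
    metrics:
      - data/evaluation/metrics.json:
          cache: false
    plots:
      - data/evaluation/predictions.csv:
          cache: false
          template: confusion
          x: predicted_label
          y: true_label
```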
These are special outputs that enable additional DVC features. Here is what DVC says in its documentation:
DVC has two concepts for metrics, that represent different results of machine learning training or data processing:
1. `dvc metrics` represent scalar numbers such as AUC, true positive rate, etc.
2. `dvc plots` can be used to visualize data series such as AUC curves, loss functions, confusion matrices, etc.
Let's try it: to see current metrics values, run `dvc metrics show`:
Additionally, `dvc metrics diff` lets you compare metrics values between different commits:
Pretty nice :) Let's now try to plot some data:
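For the predictions file produced by the `evaluate` stage, that is:

```
dvc plots show data/evaluation/predictions.csv
```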
It outputs a `plots.html` file; let's open it in the browser:
Very nice! When running `dvc plots show data/evaluation/predictions.csv`, DVC does the following:
- the `predictions.csv` file is parsed;
- the predefined confusion matrix template is used to process/transform the data so as to produce the confusion matrix;
- VEGA renders the confusion matrix (embedded in an HTML page).
If you want to go further, it is even possible to add your own templates to DVC plots (see the documentation).
3. What Can I Not (Easily) Do with DVC?
ML projects often consist of exploratory research: data scientists run experiments and retrain models. To know where they’re going, they need to track model performance, and they need to visualize the model and the data to understand what is going on.
Let's step back a little bit and look closer at DVC capabilities:
- show scalar values (e.g., model accuracy) at one commit with `dvc metrics show`;
- plot data series (e.g., the training loss function) at one commit with `dvc plots show`;
- compare different commits with `dvc metrics diff` or `dvc plots diff`;
- define more complex data visualizations by extending DVC plots with custom templates, which allows data transformations and interactive data visualization;
- track a scalar value (e.g., the model accuracy) through the project history. For instance, `dvc metrics show -A` shows you metrics values for all commits in the command line.
I sum up DVC's data abilities on two axes:
- visualization expressiveness: how easy it is to build more complex visualizations from more complex input data, e.g., tabular data, images, videos, ...
- version aggregation abilities: the capability to collect data from different commits and aggregate it in different ways, e.g., diffing the model accuracy between two commits.
DVC Limits
Even though DVC is an amazing tool, it has some limitations:
- data visualizations are limited: input data formats are limited to tabular file formats (JSON, CSV, and YAML files), which excludes other data such as images, videos;
- diffing is limited to scalar values. It is possible to show the evolution of scalar values through all commits (`dvc metrics show -A`) but it is restricted to the command-line interface;
- it provides no real UI: `dvc plots` relies on VEGA, a declarative language for creating, saving, and sharing interactive visualizations. Yet, it does not provide a standalone UI: it needs to be rendered somewhere (e.g., embedded into an HTML page).
DVC needs help to bridge the gap between data tracking in the command-line interface and a real UI that can interactively show any comparison between any kind of data.
4. Streamlit Bridges the Gap
Streamlit is an open-source python library, very popular among data scientists, that lets you build interactive UIs for manipulating data very quickly:
Streamlit turns data scripts into shareable web apps in minutes. All in Python. All for free. No front‑end experience required.
- quote from streamlit.io webpage
If I had to place it on the two-axis graph above, it would sit top right: top because there is no data version comparison, right because the data visualization is very expressive:
So, regarding DVC limitations I described in the previous section, Streamlit appears to be a good candidate to bridge the gap:
- it provides an interactive Web UI;
- it allows you to represent almost any kind of data, not only scalars or data series;
- together with git and DVC python API, it also allows comparing any versions of any kind of data in a very flexible way.
In the following, I’ll go into the details of the third point by describing several concrete use cases from the Cat vs Dog classifier example.
4.1 Build a Commit Selector with Git Python API
The first thing to do is to be able to select the commit you want to see the data from. First, we retrieve the list of commits using the git python API:
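The actual snippet is embedded from the companion repository; a minimal sketch with GitPython could look like this (variable names are mine):

```python
from git import Repo

# Open the repository the Streamlit app runs in
repo = Repo(".")

# Keep only the commits for which the DVC pipeline was executed,
# i.e., commits that modified the dvc.lock file
commits = list(repo.iter_commits(paths="dvc.lock"))
```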
Note the `paths` argument: I am interested in commits that correspond to a DVC pipeline execution, so I only need to keep commits that modified the `dvc.lock` file.
Now, I simply use a Streamlit selectbox to let the user choose the commit:
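A sketch of the selector (the label and the formatting are my own choices):

```python
import streamlit as st

# Show a short hash and the first line of the commit message for each entry
selected_commit = st.selectbox(
    "Select a commit",
    commits,
    format_func=lambda commit: f"{commit.hexsha[:7]} - {commit.message.splitlines()[0]}",
)
```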
To start the streamlit app, simply run:
streamlit run {PATH_TO_YOUR_SCRIPT}.py
Go to your browser and you should see:
4.2 Explore the Performance of any Model on the Test Set
Now that the user can select a commit, I’ll show how to load any data file tracked by DVC and show it in the Streamlit app.
If you remember the Cat vs Dog classifier pipeline I introduced in part 1, the `evaluate` stage outputs a `data/evaluation/predictions.csv` file containing the model predictions on the test set.
The DVC python API is simple yet very powerful: it provides a `dvc.api.open()` function that behaves like the core python `open()` function but for files tracked by DVC:
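A sketch of what `load_predictions()` could look like (the CSV path is the pipeline output described in part 1):

```python
import dvc.api
import pandas as pd

def load_predictions(rev: str) -> pd.DataFrame:
    # Read the predictions file as it was produced by the pipeline at revision `rev`
    with dvc.api.open("data/evaluation/predictions.csv", rev=rev) as f:
        return pd.read_csv(f)
```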
Provided a commit hash, `load_predictions()` reads the corresponding prediction CSV file with pandas. Then, we use the commit selector to load the selected predictions and show them with Streamlit.
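For instance, reusing the commit selector from section 4.1:

```python
predictions = load_predictions(selected_commit.hexsha)
st.dataframe(predictions)
```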
And that’s it: when you select a commit, you’ll see the dataframe change dynamically:
4.3 Compare Predictions of Two Different Models
Let’s say we want to see where two different versions of the trained model disagree on the test set. First, we put two git commit selectors:
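A sketch, reusing the commit list from section 4.1 (the `key` argument keeps the two widgets independent):

```python
left_commit = st.selectbox("Left commit", commits, key="left_commit")
right_commit = st.selectbox("Right commit", commits, key="right_commit")
```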
Then, read both prediction files with the `load_predictions()` function, merge them with pandas, and select the test set images where the two models disagree:
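A sketch, assuming the predictions CSV has `image_path` and `predicted_label` columns (the real column names may differ):

```python
left_predictions = load_predictions(left_commit.hexsha)
right_predictions = load_predictions(right_commit.hexsha)

# Align the two prediction sets on the image path and keep the disagreements
merged = left_predictions.merge(
    right_predictions, on="image_path", suffixes=("_left", "_right")
)
disagreements = merged[merged["predicted_label_left"] != merged["predicted_label_right"]]
```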
Finally, show the resulting dataframe and the corresponding images with `st.image()`:
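Still assuming an `image_path` column pointing to files available locally (e.g., after a `dvc pull`):

```python
st.dataframe(disagreements)

# Show each image on which the two models disagree
for _, row in disagreements.iterrows():
    st.image(row["image_path"], caption=row["image_path"])
```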
And here it goes, with a few lines of python, we built a simple web page for comparing two models:
4.4 Build an Experiments Tracking UI à la MLFlow
Many ML frameworks offer a UI that displays the list of experiments (i.e., training runs) with model parameters, training statistics, and model performance. Here is MLFlow's, for instance:
With DVC and Streamlit, it is quite easy to build the same. First, let’s collect the list of commits that modified the `dvc.lock` file, as in section 4.1:
Then, let’s collect the model parameters that are written in the `dvc.lock` file. The `dvc.lock` file is tracked with git, so we need a utility function to read it from any commit:
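Here is a sketch of such a utility; the implementation in the companion repository may differ slightly, but the idea is the same (it reuses the `repo` object from section 4.1):

```python
import yaml

# First commit of the repository (hardcoded in the companion repository)
FIRST_COMMIT = next(repo.iter_commits(reverse=True))

def read_dvc_lock(rev: str) -> dict:
    # Diff the first commit against `rev`, restricted to dvc.lock, and read the
    # file content on the `rev` side of the diff
    diff = FIRST_COMMIT.diff(repo.commit(rev), paths="dvc.lock")
    return yaml.safe_load(diff[0].b_blob.data_stream.read())
```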
It is a trick: I recover the file with the git python API by computing the diff between the current revision `rev` and the first commit (`FIRST_COMMIT`).
Then, I can collect the model parameters from the `dvc.lock` files:
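The exact keys depend on how the parameters are declared in the pipeline; assuming they are attached to the `train` stage through a `params.yaml` file, a sketch could be:

```python
def get_params(rev: str) -> dict:
    # dvc.lock records, for each stage, the parameter values used at execution time
    dvc_lock = read_dvc_lock(rev)
    return dvc_lock["stages"]["train"]["params"]["params.yaml"]
```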
Now, I’ll collect the model performance from the `metrics.json` file:
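Since `metrics.json` is a DVC-tracked output, it can be read through the DVC API (the path is the one assumed throughout this article):

```python
import json
import dvc.api

def get_metrics(rev: str) -> dict:
    with dvc.api.open("data/evaluation/metrics.json", rev=rev) as f:
        return json.load(f)
```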
Finally, let’s assemble the collected information into a single dataframe and show it in the Streamlit app:
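A sketch, reusing `commits`, `get_params()`, and `get_metrics()` defined above:

```python
experiments = pd.DataFrame(
    [
        {
            "commit": commit.hexsha[:7],
            "date": commit.committed_datetime,
            "message": commit.message.splitlines()[0],
            **get_params(commit.hexsha),
            **get_metrics(commit.hexsha),
        }
        for commit in commits
    ]
)
st.dataframe(experiments)
```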
Of course, it is not as pretty as the MLFlow experiment tracking interface, but it does the job and it is very flexible: I can easily choose what to show in the Streamlit app, even after the experiments were run, as long as the data was tracked by git or DVC.
4.5 Run Inferences with Models from Different Commits
Now, I would like to have direct interactions with trained models: an interface where I can upload any image and run the model of my choice on it. A Streamlit page that looks like this:
I can reuse the same selector as before, but loading the model is a bit more technically challenging: a model is not a single file, it is a folder:
It is a problem because the `dvc.api.open()` function only works for single files, whereas Tensorflow's `tf.keras.models.load_model()` requires a folder as input.
We need something more. We need the `dvc get` CLI command:
Provides an easy way to download files or directories tracked in any DVC repository (e.g. datasets, intermediate results, ML models)
Unfortunately, there is no python API for this command. No worries, DVC is written in python, so I can use the internal DVC API to retrieve the model folder. A bit of caution here: the internal DVC API is subject to changes in future versions.
To load the model, I built a `load_model(rev)` function that downloads the desired model to a model cache directory and then loads it with TensorFlow:
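A sketch of `load_model(rev)`, assuming `dvc.repo.Repo.get()` is the internal entry point behind `dvc get` and that the model folder is tracked at `data/models/model` (both are assumptions, and internal APIs may change between DVC versions):

```python
import os

import tensorflow as tf
from dvc.repo import Repo as DvcRepo

MODEL_CACHE_DIR = ".model_cache"
MODEL_PATH = "data/models/model"  # assumed location of the DVC-tracked model folder

def load_model(rev: str) -> tf.keras.Model:
    out = os.path.join(MODEL_CACHE_DIR, rev)
    if not os.path.exists(out):
        # Equivalent of `dvc get . data/models/model -o <out> --rev <rev>`
        DvcRepo.get(url=".", path=MODEL_PATH, out=out, rev=rev)
    return tf.keras.models.load_model(out)
```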
Voila: selected models are downloaded to the `.model_cache` directory:
Now, I have all I need to build the Streamlit page:
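A sketch of the page, assuming the model expects 160x160 images (the input size used in the Tensorflow transfer learning tutorial) and outputs a single logit per image:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

selected_commit = st.selectbox("Select a model (commit)", commits)
model = load_model(selected_commit.hexsha)

uploaded_file = st.file_uploader("Upload a cat or dog image", type=["jpg", "jpeg", "png"])
if uploaded_file is not None:
    st.image(uploaded_file)
    # Preprocess the uploaded image into a single-image batch
    image = Image.open(uploaded_file).convert("RGB").resize((160, 160))
    batch = np.expand_dims(np.array(image, dtype=np.float32), axis=0)
    # The sigmoid turns the model's logit into a probability
    # (which class it refers to depends on the label ordering, an assumption here)
    dog_probability = float(tf.sigmoid(model.predict(batch))[0][0])
    st.write(f"Probability of being a dog: {dog_probability:.2%}")
```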
Conclusion: DVC+Streamlit = ❤️
I hope I convinced you that Streamlit lets you build a custom web UI very quickly on top of DVC. At Sicara, I use it in my computer vision project; it is very convenient for sharing results with the team and our client. If you enjoyed the article, please leave me a comment, star the repo, or contact us!
If you want more inspiration on Streamlit, look at their blog and gallery.
Last-minute note: DVC released DVC Studio a few days ago, a web UI for tracking experiments. I have not tested it yet and I don't know if its UI is as flexible as a Streamlit dashboard, but it looks awesome and I'm looking forward to trying it.