Until the mid-2010s, Machine Learning projects involved a series of manual operations. The MLOps tools that have emerged since then have gradually structured and streamlined a growing share of these tasks, culminating in complete platforms that cover the entire lifecycle of models.
Databricks, available since 2015, is one of the precursors of these platforms, along with the ML services of the three main cloud providers: Google Vertex AI, Amazon SageMaker, and Azure ML.
These platforms enable rapid implementation of a project's main building blocks: training and prediction infrastructure, a model registry, pipeline orchestration, experiment tracking, etc.
Without going into the details of each platform, several recurring drawbacks can be identified:
- Resource costs: they are consistently higher than with less managed solutions. For example, training on Vertex AI costs roughly 15% more than running the same workload directly on Compute Engine.
- Rigidity: all-in-one ML services offer limited customization and integrate poorly with tools outside their ecosystem. For example, it may be difficult to version data with DVC or to launch low-carbon training jobs on Qarnot.
- Vendor lock-in: dependence on a single provider is all the more frustrating in AI, where technologies evolve very rapidly.
The main alternative is to combine specialized, less managed tools such as DVC, Streamlit, and Airflow, at the cost of a significant upfront setup investment.
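To make this composable approach concrete, here is a minimal sketch of what one piece of such a stack could look like: a DVC pipeline definition (`dvc.yaml`) that versions data and chains a preparation and a training stage. The scripts, paths, and parameter names (`src/prepare.py`, `train.learning_rate`, etc.) are hypothetical placeholders, not part of the original text; an orchestrator like Airflow could then schedule `dvc repro` runs.

```yaml
# dvc.yaml — hypothetical pipeline sketch; stage names, scripts, and
# paths are illustrative assumptions, not a prescribed layout.
stages:
  prepare:
    cmd: python src/prepare.py        # turn raw data into training-ready files
    deps:
      - data/raw                      # DVC-tracked raw dataset
      - src/prepare.py
    outs:
      - data/prepared                 # versioned intermediate output
  train:
    cmd: python src/train.py
    deps:
      - data/prepared
      - src/train.py
    params:
      - train.learning_rate           # read from params.yaml
    outs:
      - models/model.pkl              # versioned model artifact
    metrics:
      - metrics.json:
          cache: false                # keep metrics in Git for easy diffing
```

With this in place, `dvc repro` re-runs only the stages whose dependencies changed, which is the kind of capability a managed platform would otherwise provide out of the box.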
OUR PERSPECTIVE
At Sicara, our default stack does not use an end-to-end ML platform. We prefer a combination of open-source tools tailored to our customers’ needs, minimizing costs and avoiding vendor lock-in. Sicarator, our open-source ML project generator, accelerates their implementation.
An end-to-end solution remains preferable when a team lacks the time or skills to build a custom stack, or for companies planning only a limited number of ML projects in the near future.