A Custom-Built, Scalable, Event-Driven ETL on GCP: Engineered to Perfection in Record Time


Context

Descartes & Mauss is a company that combines artificial intelligence and industrial expertise to help its clients make more informed decisions and create organizational convergence around a strategy that will bring their ambition to life.

They turn large volumes of raw data into actionable insight, improving the impact and relevance of their clients' decisions.

 

Challenge

The company has proprietary, cutting-edge natural language processing (NLP) algorithms and wanted to develop its data platform to ingest more information, automate more of its processing, and deliver reproducible data quality.

They needed an ETL text preprocessing pipeline that supports distributed Spark computation to handle the large volume of data to be processed. The solution had to be scalable and cost-effective.

Descartes & Mauss' Data Engineering and infrastructure teams needed to quickly integrate this solution into their overall platform, with the ability to adapt it to new text processing tasks.

 

Solution

We implemented a secure and scalable data platform on GCP with Dataproc Workflow pipelines. These pipelines are specific to each client and perform NLP text preprocessing.
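To illustrate what a client-specific Dataproc workflow can look like, here is a minimal sketch in Python. It is not the implementation delivered in this project: the bucket layout, script names, step names, and machine types are hypothetical, and only the field names follow the Dataproc workflow template structure. The function builds one template per client, with an ephemeral managed cluster and two ordered PySpark steps.

```python
from typing import Any, Dict


def build_workflow_template(client_id: str, bucket: str, num_workers: int = 2) -> Dict[str, Any]:
    """Build a Dataproc workflow template (as a plain dict) for one client.

    All paths, step names, and machine types are illustrative assumptions.
    """
    scripts = f"gs://{bucket}/jobs"        # hypothetical location of the PySpark scripts
    data = f"gs://{bucket}/{client_id}"    # hypothetical per-client data prefix

    return {
        "id": f"nlp-preprocessing-{client_id}",
        "placement": {
            # A managed cluster is created for the workflow run and deleted
            # afterwards, so compute is only paid for while the jobs execute.
            "managed_cluster": {
                "cluster_name": f"nlp-{client_id}",
                "config": {
                    "master_config": {
                        "num_instances": 1,
                        "machine_type_uri": "n1-standard-4",
                    },
                    "worker_config": {
                        "num_instances": num_workers,
                        "machine_type_uri": "n1-standard-4",
                    },
                },
            }
        },
        "jobs": [
            {
                "step_id": "clean-text",
                "pyspark_job": {
                    "main_python_file_uri": f"{scripts}/clean_text.py",
                    "args": [f"{data}/raw/", f"{data}/clean/"],
                },
            },
            {
                "step_id": "tokenize",
                # Runs only after the cleaning step has succeeded.
                "prerequisite_step_ids": ["clean-text"],
                "pyspark_job": {
                    "main_python_file_uri": f"{scripts}/tokenize.py",
                    "args": [f"{data}/clean/", f"{data}/tokens/"],
                },
            },
        ],
    }


if __name__ == "__main__":
    template = build_workflow_template("acme", "example-bucket")
    print(template["id"])
    print([job["step_id"] for job in template["jobs"]])
```

A dict of this shape could then be submitted with the `google-cloud-dataproc` client, for example through `WorkflowTemplateServiceClient.instantiate_inline_workflow_template`. Adding a new text processing task amounts to appending another step with its own `prerequisite_step_ids`.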

Tech Stack

Results

In just 5 weeks, we delivered a modular data pipeline to their internal technical team, allowing them to apply a range of transformations to different types of information and securely access the resulting data.

Our approach included clear documentation and training to ensure a smooth handover, so they could adopt the solution at the speed they need.

Descartes & Mauss now has a data platform and the expertise to adapt existing pipelines to future clients and to whatever text processing they require.

Need expert advice?

Contact us