In this article, I’ll explain how I used BigQuery and Gemini to improve my eating habits by creating a more accurate food database. I initially relied on conventional calorie-tracking apps, but their databases, filled with user-submitted data, can be unreliable.
Frustrated by this, I decided to build my own database using Gemini and BigQuery.
I started by using traditional OCR methods (like in this article) to extract nutritional information and store it in BigQuery. However, I faced challenges due to inconsistent formats and varying portion sizes on labels, making it hard to build a dataset.
That’s when I turned to generative AI. By using Gemini, I was able to automatically process the nutritional data without manual intervention. In fact, Gemini seamlessly handled the variability in data formats, allowing me to focus on building a more accurate and comprehensive database with minimal technical effort.
Gemini 1.5 Flash is one of GCP’s (Google Cloud Platform) multimodal large language models (LLMs). It can process text, images, video, and audio: you can, for example, input a video together with a related question, and it will generate an answer.
BigQuery is GCP’s data warehousing product. Data analysts and data engineers mainly use it to perform analytics at scale.
By the end of this article, you will be able to reproduce this approach on a use case of your choice.
Step-by-step guide: Generative AI in action with BigQuery and Gemini
Let’s take a look at the target architecture:
Step 0: Create your GCP account
Create your GCP account here and receive free credits to follow along with this article.
Step 1: Load your files into BigQuery
First, you need to load your files into a Cloud Storage bucket. You can simply drag and drop them.
Then, create an external table referencing the videos from Cloud Storage into BigQuery:
To do so, create a service account (SA) and give it access to the bucket you just created.
Open a new notebook in BigQuery and execute the following block:
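As a sketch of that setup block (all names below — project `my-project`, region `us`, connection `gcs-connection`, bucket `my-food-labels-bucket` — are placeholders to replace with your own): in practice, a BigQuery Cloud resource connection comes with its own service account, which you then grant read access to the bucket.

```shell
# Create a BigQuery Cloud resource connection (it ships with a service account).
bq mk --connection --location=us --project_id=my-project \
    --connection_type=CLOUD_RESOURCE gcs-connection

# Display the connection to find its service account id.
bq show --connection my-project.us.gcs-connection

# Grant that service account read access to the bucket.
# SERVICE_ACCOUNT_ID is the id printed by the previous command.
gcloud storage buckets add-iam-policy-binding gs://my-food-labels-bucket \
    --member="serviceAccount:SERVICE_ACCOUNT_ID" \
    --role="roles/storage.objectViewer"
```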
Once it’s done, create the external table using the service account you just created:
Step 2: Create a BigQuery model using the Vertex AI Gemini-1.5-flash endpoint
First, you need to enable the Vertex AI API and allow the previously created service account to use it:
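A minimal sketch of those two commands, assuming the same placeholder project and service account as before:

```shell
# Enable the Vertex AI API on the project.
gcloud services enable aiplatform.googleapis.com --project=my-project

# Allow the connection's service account to call Vertex AI endpoints.
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:SERVICE_ACCOUNT_ID" \
    --role="roles/aiplatform.user"
```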
Then, create the model in BigQuery:
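A sketch of the model creation, assuming the placeholder dataset and connection names used above:

```sql
-- Register the Gemini 1.5 Flash endpoint as a remote BigQuery model.
CREATE OR REPLACE MODEL `my-project.food_dataset.gemini_flash`
REMOTE WITH CONNECTION `my-project.us.gcs-connection`
OPTIONS (ENDPOINT = 'gemini-1.5-flash');
```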
Step 3: Using a prompt, create your database
Since I want to parse the generated answer programmatically, I wrote a prompt that constrains the output format:
You are provided with a photo of a product label.
Extract and process the nutritional information for 100 grams of the product.
If the nutritional values are not directly provided for 100 grams, perform the necessary conversions.
Specifically, focus on extracting the kcal value for calories and not the kJ value (performing a conversion if only kJ is provided).
Present the processed data in a valid JSON format, ensuring that all keys (calories, protein, carbs, fat) are enclosed in double quotes and that no additional formatting (such as backticks or markdown) is used.
The output should be in the following format: {"calories": x, "protein": x, "carbs": x, "fat": x}, where x represents the appropriate values for each nutrient.
Then, ask Gemini to generate an answer, and iterate by modifying your prompt and the parameters:
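A sketch of that query, reusing the placeholder model and external table names from the previous steps (tune `temperature` to your needs):

```sql
SELECT
  uri,
  ml_generate_text_llm_result AS nutrition_json
FROM
  ML.GENERATE_TEXT(
    MODEL `my-project.food_dataset.gemini_flash`,
    TABLE `my-project.food_dataset.label_photos`,
    STRUCT(
      -- Replace with the full prompt from above.
      'You are provided with a photo of a product label. ...' AS prompt,
      0.2 AS temperature,
      TRUE AS flatten_json_output
    )
  );
```

With `flatten_json_output` set to `TRUE`, the model’s answer is returned as a plain string in the `ml_generate_text_llm_result` column, which makes the downstream parsing straightforward.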
Wait for the cell to execute and enjoy the database that you just created 😎
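Before persisting the results, it helps to sanity-check that each answer really is the strict JSON object the prompt asks for. A minimal validator (the function name and schema check are my own, not part of any library) might look like:

```python
import json

# The four keys the prompt instructs Gemini to return.
REQUIRED_KEYS = {"calories", "protein", "carbs", "fat"}


def parse_nutrition(raw: str) -> dict:
    """Parse one Gemini answer and check it matches the expected schema.

    Raises ValueError when the answer is not the strict JSON object the
    prompt asks for, so malformed rows can be filtered out.
    """
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if set(data) != REQUIRED_KEYS:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    # All values should be numeric (per 100 g of product).
    return {k: float(v) for k, v in data.items()}


# Example of a well-formed answer:
row = parse_nutrition('{"calories": 539, "protein": 6.3, "carbs": 57.5, "fat": 30.9}')
```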
Once you are happy with the results, you can persist them as a BigQuery table with a `CREATE TABLE AS SELECT` statement.
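For instance, assuming the generated answers were saved in a table named `raw_gemini_answers` (a placeholder name) with the JSON string in a `nutrition_json` column, the nutrients can be extracted with BigQuery’s JSON functions:

```sql
-- Persist the parsed answers as a regular table, one column per nutrient.
-- SAFE_CAST returns NULL instead of failing on malformed values.
CREATE TABLE `my-project.food_dataset.nutrition_facts` AS
SELECT
  uri,
  SAFE_CAST(JSON_VALUE(nutrition_json, '$.calories') AS FLOAT64) AS calories,
  SAFE_CAST(JSON_VALUE(nutrition_json, '$.protein')  AS FLOAT64) AS protein,
  SAFE_CAST(JSON_VALUE(nutrition_json, '$.carbs')    AS FLOAT64) AS carbs,
  SAFE_CAST(JSON_VALUE(nutrition_json, '$.fat')      AS FLOAT64) AS fat
FROM `my-project.food_dataset.raw_gemini_answers`;
```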
What we learned
We discovered a technique to create databases from data extracted from images, videos, and PDF files using Gemini and BigQuery.
While this approach offers significant potential, there are several limitations to be aware of:
- Lack of Accuracy Metrics: One of the key drawbacks is the absence of built-in metrics to evaluate the accuracy of the model. Without these, you have to trust the model's output or resort to using an LLM as a judge as a form of validation.
- Generalized Model Performance: The model is designed to handle general, "common" queries effectively. However, for niche use cases, such as recognizing the specific make and model of a car, it fails to provide good results. In such cases, you would need to fine-tune the model or opt for a traditional computer vision model, which can be more complex and resource-intensive to implement.
- Video Length Limitation: Gemini 1.5 Flash is currently limited to processing the first 2 minutes of video files when using the ML.GENERATE_TEXT function. For a full list of current limitations, refer to the documentation here.
- Compliance and Privacy Risks: Extracting data from sensitive documents can introduce compliance and privacy risks. This article illustrates how developers can make LLMs more secure.
- Latency: BigQuery is an analytics tool. It is not optimized for near-real-time tasks, so expect some delay (around 2 minutes) when generating the answers.
If this article inspired you in your own business, do not hesitate to contact us!