Effective decision-making is at the heart of successful data science projects. Every decision, whether selecting an optimal machine learning model or picking the right database technology, can shape the project's future. What if there was a way to enhance this process, ensuring more informed and efficient decisions? In this article, we'll explore how GPT-4 can elevate your decision-making in data science, offering insights, clarity, and confidence in each choice you make. Let's illustrate how GPT-4 will help some particular processes; we will use an Architecture Decision Record for a data science project using vector databases.
How can GPT-4 integrate Architecture Decision Records?
First, an Architecture Decision Record (ADR) is a document that records critical decisions made regarding the architecture of a software project. These decisions, including the reasonings and consequences, are documented for future reference, aiding in transparency, traceability, and knowledge sharing.
At its core, the key takeaway in constructing an ADR is to meticulously record the decision, allowing for enhanced understanding, communication, and review of the architectural choices made in the project's development journey.
In software development and data science, preserving such records is part of our culture. It ensures that decisions are well thought-out and helps future team members understand the context and reasons behind particular design choices.
Using vector databases as an ADR example for GPT-4
Next, to illustrate how GPT-4 will assist your decision-making, let's make an ADR to find the best vector database for data stored in Google's Big Query. A vector database manages high-dimensional vectors, optimizing similarity searches within large vector spaces. Notably, it's essential for handling machine learning embeddings; Here, we will use the vector database with Google's Big Query data, streamlining operations. So it helps creating a more efficient workflow.
Please consider this excellent blog post on Vector database, by Noé Achache, which is strongly more relevant if you need a curation of the best vector databases out there!
GPT-4 as a catalyst for enhanced decision-making
Now, navigating the process of creating a comprehensive ADR can be time-consuming and requires substantial expertise. That's where GPT-4 comes into play. Let's break down the ADR creation process to understand how GPT-4 can help at each step:
Formulate the 'Why'
Initially, the first part of any ADR defines the problem or the 'why.' GPT-4, with its advanced language processing and comprehension, can help in articulating the problem statement. It can distill complex ideas into simple, clear sentences, enhancing the clarity of your problem statement.
Here is the prompt step-by-step:
- Introduce the context: "I am making an Architecture Decision Record."
- Summon the expertise needed: "I want you to act like the best data Product Manager."
- Give the object of your prompt (more context!): "Given the need for our chatbot to use GCP's Big Query, data which vector database should we consider to maximize machine learning-driven insights?"
- Given the context, explicitly ask what you need: "Reformulate this sentence to explain the Why!"
Capitalizing the "Why" distinguishes the word "why" from Simon Sinek's concept, where he glorifies it. So you should reproduce this tip using a specific idea when you need an answer!
Here's what stands out about the worth of GPT-4's answer :
- Clarity: It ties the technical choice of a vector database to the need to maximize ML insights in this example.
- Rationale and holistic approach: It explains that a vector database efficiently manages high-dimensional data, justifying the choice from a tech perspective and integrating broader business goals.
- Business impact: By highlighting the improvements, it ties the decision drivers for the next parts of the ADR – a tangible benefit!
- User-focused: The emphasis on "rapid, high-quality insights" suggests a user-focused approach.
Make GPT-4 list the Decision Drivers.
Then, decision drivers are critical factors or criteria that should drive decisions in data science. For instance, when creating custom datasets for image detection using state-of-the-art generative AI with Stable Diffusion. Identifying the correct decision drivers requires a fair amount of expertise. GPT-4 can assist in this aspect by helping you brainstorm possible decision drivers based on the problem context. GPT-4 has been trained on diverse texts, enabling it to generate insights from multiple perspectives and uncover decision drivers you might overlook.
Here is the prompt step-by-step:
- Continue chatting from your last prompt: More context!
- Give the object of your next question: "We are choosing a new vector database for Big Query data. What factors should we consider?"
- Format GPT-4 answer: "Don't explain much; just list the factors."
Give attention to the format: It can dramatically change the quality of the output. You can ask for a paragraph or a detailed enumeration. My trick to refine the GPT-4 answer was to make a notation of each factor: "List then the top 10 factors and their associated scores."
The following aspects compel me:
- Relevance to Objective: The factors and their scores directly align with our goal! It completes well with the hand-crafted ADR template from years of Sicara experience.
- Quantifiable Metrics: By associating scores with each factor, we can easily prioritize their efforts and resources. This quantification aids in making informed decisions and trade-offs.
- Clarity & Detail: Each element briefly explains to ensure that even non-technical stakeholders understand the reasoning behind each score. Moreover, we address all critical areas.
Let's list your different solutions using GPT-4!
Following this, one of the challenges in decision-making is considering all viable solutions. It's easy to gravitate towards familiar options while overlooking others. GPT-4 can assist in generating a comprehensive list of potential solutions based on its extensive knowledge base.
If you can access the Web Browser plugin in ChatGPT, please give your prompt a link containing a list of your solutions. I recommend taking a look at GitHub's awesome repositories!
Here is the prompt step-by-step:
- Continue chatting from your last prompt: More context!
- Give the object of your question: "What are some vector databases we can use?"
- Make the chatbot focus on the context: "Based on the factors you listed?"
- Summon expertise: "Explain as a Data expert."
- Format GPT-4 answer: "using one sentence, and without explaining each decision driver."
Here, we want to ensure the chatbot can answer the available solutions. Be bold and experiment with as many lists of products as you can!
In this example, we are trying to discover some vector databases. As of mid-2023, this industry is evolving rapidly, with new products and actors each week! So I would take the ChatGPT answer with some cautions, as it will maximize my satisfaction and offer me to try the most known products and make me miss some discoveries. And remember to also capitalize on your team and network's expertise!
Let's unpack the noteworthy bits of GPT-4's answer:
- Contextual Alignment: The suggestions can be based on the detailed factors we provided, ensuring the solutions align well with our requirements.
- Narrowed Choices: Instead of overwhelming the vast number of options available, the response distilled the list to a few select choices, simplifying our decision-making process.
- Brevity and Clarity: The answer was concise, providing a quick overview without delving too deeply into specifics, making it easier for stakeholders to comprehend.
Attribute a score to each solution.
After listing possible solutions, the next step is to evaluate them. GPT-4 provides objective overviews and comparisons of different solutions based on available data, which can reduce the bias often seen when relying solely on manufacturers' or developers' claims.
You can also directly ask for a table that will summarize the results more comprehensively and efficiently!
Here is the prompt step-by-step:
- Continue chatting from your last prompt: More context!
- Give the object of your question: "Give me an objective comparison."
- Focus GPT-4 on your few solutions: "Of Milvus and Pinecone for this task."
- Ask for the score attribution: "Assigning a score for each decision driver out of 5 stars."
- Ask how GPT-4 scores while making it reflect on its answer: "And explain the score differences for each decision driver to understand the scoring better."
- Format GPT-4 answer: "Wrap your explanation in a table."
In this score attribution prompt, the key is the formatting! We expect a lot of relevant information on how it factors each decision driver, which can be overwhelming for a reader. So a table has two benefits - making the answer pretty and helping GPT-4 to be concise! The 5-star rating is the most useful in this case.
Use the provided analysis to score each decision driver of your solutions. The GPT-4 answer offers a balanced, detailed breakdown tailored to specific criteria, which promotes informed decision-making. This approach gives a comprehensive yet concise overview, making choices straightforward, especially when expertise on a particular architecture is lacking.
Rule to get the best solution
Finally, making the actual decision can be daunting. Here, GPT-4 can assist by summarizing the pros and cons of each option, helping you make a well-informed decision.
The use of GPT-4 continues beyond decision-making. It will also help in documenting the ADR effectively.
Here is the prompt step-by-step:
- Summon a stakeholder! "As a Lead Data Scientist"
- Focus on why you will be satisfied with GPT-4 answer: "What would be the positive consequences of using Milvus in this case."
- Add more context on what GPT-4 should rule for: "To make the well-informed decision to select it instead of Pinecone."
- Format the GPT-4 answer: "I only need the positive consequences that you will curate to keep at most the five best ones."
Here, the focus is on the ruling. Feel free to experiment with different prompts using "As a CTO" or "as a senior data scientist" and make GPT-4 reflect the potential changes of the ruling using various stakeholders! Moreover, feel free to force the format by clarifying your needs, such as "One paragraph only."
However, it's imperative to note that while GPT-4 is immensely capable, it shouldn't be the only entity making decisions. It serves best as an advisor, offering perspectives and recommendations to steer decisions. As the human overseeing operations, you:
- Have the final say in all decisions.
- Bring a vital knowledge of the unique context.
- Understand the specific requirements of the project.
- Factor in nuances and elements that a machine might not fully comprehend.
Incorporating AI tools like ChatGPT into decision-making in data science offers a fresh perspective, and its value isn't limited to producing ADRs. Leveraging AI resources like ChatGPT help you make well-informed decisions, conserves time, and boosts efficiency, positioning you at an advantage in data science.
Wrapping up
You can extract many of the benefits of GPT-4 from your prompt engineering. Using ChatGPT for nine months now, what I've learned is the result of the diverse experimentations I have done and curated examples. My favorite way to discover new tips and tricks is by reading the Prompt Engineering Daily newsletter from Aadit Sheth. Consider subscribing to his newsletter!
GPT-4 is revolutionizing our approach to decision-making in data science. It lends its capabilities to formulating clear problem statements, constructing comprehensive lists of decision drivers, and generating potential solutions while assisting their evaluation, culminating in a streamlined decision-making process. By coupling human expertise with AI capabilities, we accelerate our decision-making process, enhance productivity, and ultimately steer our projects toward success.
This advancement marks the beginning of a period where AI-human collaboration is at the forefront, enabling us to surpass human limitations in decision-making and utilize AI capabilities for amplification.
However, as we progress, it provokes a crucial question - How can we employ the strength of AI to persist in expanding the limits of what's achievable in data science? The prospects are captivating!
But wait, there's more! If you have a GenerativeAI project to kickstart, we're here to help! Follow our GenerativeAI offers here! Our team of LLM experts is available to assist and guide you through your data journey.
Don't hesitate to contact us and let us be part of your success story! 🚀