Why Your AI Is Only as Good as Your Instructions

By Sadiq Balogun, Data Scientist at LIDA
Large language models (LLMs), or AI as most people call them, are becoming an essential part of our daily lives and jobs. They're no longer just chatbots for quick questions and answers; they're working behind the scenes in tools and technologies we rely on every day. For instance, when you record a meeting and the app automatically generates a transcript with key points highlighted, that feature is powered by an LLM analysing patterns in data to make smart predictions.
As a data scientist, I recently discovered just how much these models depend on the quality of the instructions you give them. The lesson came while I was working on a project to integrate an LLM into my app's database. But before I get into that journey, let me quickly explain what LLMs are and how they work.
LLMs are sophisticated systems trained on massive amounts of data, including text, images, and videos scraped from the internet, like social media posts, articles, blogs, research papers, and books. This training gives them a broad "knowledge" base, allowing them to generate human-like responses.
However, despite this training, there is still plenty they are unfamiliar with. For example, ChatGPT wouldn't have a clue about the custom inventory tracking systems at Sainsbury's or ASDA; these are unique, private details that aren't publicly available online. Figure 1 shows ChatGPT's response when asked about ASDA's inventory tracking system.

Figure 1. Screenshot of a chat with ChatGPT about ASDA's inventory tracking system
This is where prompting comes in. Prompting is essentially the craft of writing clear, detailed instructions that tell the AI what to do and give it the context it needs. Think of it like hiring someone to help with your garden: asking them to 'tidy up' could mean anything, but specifying 'prune the roses, weed the flower beds, and cut the grass to 3cm' ensures you get exactly what you envisioned. Through my recent work, I've learned that poor prompting leads to mediocre results, while thoughtful prompting can unlock impressive capabilities. Let me show you exactly how this played out in my project.
My project involved making database querying more accessible. Traditionally, querying a database requires you to write SQL code, which can be intimidating if you are not a programmer. In most organisations, this creates a bottleneck where business users constantly need to ask developers or data analysts for simple data requests. This back-and-forth wastes everyone's time and slows down decision-making. Non-technical team members often have valuable questions about the data but lack the SQL skills to answer them independently. I wanted to create a solution that lets users query the database in plain English instead of SQL code.
To implement this, I chose Meta's Llama 3 8B-Instruct model. The '8B' refers to 8 billion parameters, a rough measure of the model's size and complexity. For context, GPT-4o, the model behind ChatGPT's free tier, is estimated to have over 200 billion parameters. Yes, today you learned that AI models come in different sizes 😊
Here's how the process works: I use a system prompt to give the model the database schema. When a user asks a question in natural language, the model interprets the request against the schema I've provided, converts it into the corresponding SQL code, and returns that code to the user. The model doesn't run the query - it only generates the SQL for the user to execute themselves. In other words, it never has access to the data in the database itself. Figure 2 below shows this process visually.

Figure 2. Text-to-SQL process workflow
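
To make this concrete, here is a minimal sketch of that flow in Python. It assumes the Hugging Face transformers library and access to the gated Llama 3 8B-Instruct weights, and the schema and example question are invented stand-ins rather than my project's actual database.

from transformers import pipeline

# Load the instruction-tuned model from Hugging Face (access to the
# weights is gated behind Meta's licence agreement).
generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
)

# A schema-only system prompt, similar in spirit to the basic prompt
# discussed below. The tables are made up for illustration.
BASIC_PROMPT = """You are a text-to-SQL assistant.
Given the schema below, answer each question with a single SQL query.
Return only the SQL code; never execute anything.

Schema:
CREATE TABLE patients (patient_id INT, name TEXT, date_of_birth DATE);
CREATE TABLE visits (visit_id INT, patient_id INT, visit_date DATE, cost TEXT);
"""

def question_to_sql(question, system_prompt=BASIC_PROMPT):
    # The pipeline applies Llama 3's chat template to these messages.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]
    result = generator(messages, max_new_tokens=256)
    # The reply is handed back as text; the query is never run here, so
    # the model never touches the data in the database itself.
    return result[0]["generated_text"][-1]["content"]

print(question_to_sql("How many visits did each patient make in 2024?"))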
The most important part of this process is the system prompt. Figure 3 below shows the initial system prompt used for this app. It was quite basic, containing just the database schema without any additional context, much like the schema-only sketch above. It allowed the model to generate simple queries, but the model struggled with intermediate queries involving table joins and aggregations, as seen in Figure 4.

Figure 3. Original basic system prompt

Figure 4. Query result of basic prompting
This poor performance prompted me to dig into why the model wasn't behaving as expected. I realised that it needed to understand the specific domain it was working with, and the way to achieve this was to include domain knowledge directly in the system prompt. Figure 5 below shows how the model responded to the same question as in Figure 4 once the prompt included that context - the difference in results is quite telling. When the model is aware of the database context, it produces far more accurate results.

Figure 5. Query result after improving system prompt
Improving the system prompt became an iterative process: I tested multiple scenarios and used the results to refine the prompt further. This experience taught me that effective AI engineering isn't just about the technical implementation. You need a deep understanding of the domain you're working in, a clear picture of what your users want to achieve, and the patience to keep refining your approach until you bridge the gap between what the model can theoretically do and what it actually delivers in practice.
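To give a flavour of that iteration, here is a simplified sketch of the kind of checks I mean. The questions and expected keywords are invented for illustration, and question_to_sql is the helper from the earlier sketch.

# Representative questions, each paired with keywords the generated SQL
# should contain if the model has understood the request.
TEST_CASES = [
    ("How many visits did each patient make?", ["JOIN", "GROUP BY", "COUNT"]),
    ("List patients born after 1990", ["WHERE", "date_of_birth"]),
    ("What was the total cost of visits last month?", ["SUM", "visit_date"]),
]

def score_prompt(system_prompt):
    # Run every test question through the model with this prompt version
    # and count how many outputs contain all the expected keywords.
    hits = 0
    for question, expected in TEST_CASES:
        sql = question_to_sql(question, system_prompt)
        if all(keyword.lower() in sql.lower() for keyword in expected):
            hits += 1
    return hits / len(TEST_CASES)

A prompt revision that scores higher than the previous one becomes the new baseline.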
Figures 6-8 below show parts of the domain-aware system prompt that enabled the model to generate much more sophisticated SQL code, as demonstrated in Figures 9 and 10.

Figure 6. Part 1 of improved system prompt with type of database and database schema context

Figure 7. Part 2 of improved system prompt with research context

Figure 8. Part 3 of improved system prompt with naming convention context
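
Pulling those three parts together, a domain-aware prompt might look something like the sketch below. The schema, research context, and naming conventions here are hypothetical stand-ins for the real ones shown in the figures.

# An illustrative domain-aware system prompt mirroring the three parts in
# Figures 6-8; every detail below is invented for illustration.
DOMAIN_AWARE_PROMPT = """You are a text-to-SQL assistant for a PostgreSQL database.

Database schema:
CREATE TABLE patients (patient_id INT, name TEXT, date_of_birth DATE);
CREATE TABLE visits (visit_id INT, patient_id INT, visit_date DATE, cost TEXT);

Research context:
The database supports a health study. A 'visit' is one clinic appointment.
Costs are stored as text with a leading currency symbol, e.g. '£12.50'.

Naming conventions:
Table names are plural and snake_case; primary keys are named <singular>_id.
Always qualify column names with their table when joining.

Return only PostgreSQL-compatible SQL, and never execute anything.
"""

Swapping this string in for the basic prompt in question_to_sql is all the earlier sketch needs to become domain-aware.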

Figure 9. More accurate result after prompt improvement

Figure 10. Advanced SQL generated with string manipulation
One particularly striking example of this improvement can be seen in Figure 10, where the model reached for the 'FROM' keyword inside a string function - PostgreSQL-flavoured syntax for pulling values out of text and converting them to numbers. This wasn't something we explicitly taught it to do; rather, the model understood from our context that it was working with a PostgreSQL database and automatically selected the most appropriate syntax for that environment. This small but significant detail demonstrates how providing proper context enables the AI to make intelligent technical decisions that align with the specific tools and requirements of your domain.
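To give a flavour of that style of query, here is my own illustrative example against the hypothetical schema from the earlier sketches - not the exact SQL from Figure 10:

# SUBSTRING ... FROM pulls the digits out of the text column using a POSIX
# regex, and ::numeric is PostgreSQL's shorthand cast to a number. The
# column names are the invented ones from the sketches above.
example_sql = "SELECT SUBSTRING(cost FROM '[0-9.]+')::numeric AS cost_value FROM visits;"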
It's worth noting that other parameters also play an important role in this process. For instance, temperature is a setting that controls how random or predictable the model's output is. There are also settings such as the maximum number of tokens the model may generate, but since this article focuses specifically on prompting, I won't dive into those technical details here.
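That said, just to show where these settings live, here is how they would be passed in the transformers pipeline from the earlier sketch (the values are illustrative, not tuned):

result = generator(
    messages,            # the system/user messages built in the earlier sketch
    max_new_tokens=256,  # upper bound on the length of the generated output
    do_sample=True,      # enable sampling so that temperature has an effect
    temperature=0.1,     # low temperature keeps SQL generation nearly deterministic
)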
In conclusion, this project showed me how crucial proper prompting is when working with LLMs. The difference between a basic prompt and a well-crafted, domain-aware one was significant: it turned a model that struggled with intermediate queries into one that could handle complex SQL generation effectively. The impact of pairing these models with databases is substantial. Non-technical users can get answers on their own almost as quickly as an expert would, and SQL-savvy users can generate complex queries as starting points and then refine them as needed, cutting their query-writing time significantly.
For anyone looking to integrate LLMs into their applications, the takeaway is simple: invest time in understanding how to communicate effectively with the model. The technology is powerful, but it requires clear, detailed instructions to perform well. As I discovered, a model is only as good as the prompts you give it.
By Sadiq Balogun,
LIDA Data Scientist,
Data Scientist Development Programme 2024-25