How AI is tuned for real-world workloads
Inference is when you call a model to get a prediction or an answer based on given conditions.
1. Prompt engineering
The most common approach. You pass instructions and context directly to the model.
2. Dynamic context
Models are trained on general data, but business problems require your up-to-date, internal data.
Before inference, you prepare context, for example by querying your database.
Then you run standard prompt engineering using that prepared context.
3. RAG (Retrieval-Augmented Generation)
Sometimes it’s hard to prepare context without involving AI. For example, when you get a free-form user query and need to find relevant data.
Full-text search tools like Elasticsearch can help, but the quality is often not great.
This is where AI helps:
Convert text into numerical vectors (embeddings) using AI
Store both the data and vectors in a vector database
Perform vector similarity search without AI involvement
Inject the retrieved context into the prompt for inference
4. Function calling
Prompt size is limited. When context becomes too large, previous approaches stop working.
Instead of passing all the data, you provide the model with a list of available functions.
The model decides which function to call to get the data it needs.
You effectively act as a router for the model.
5. MCP (Model Context Protocol)
Being a router becomes expensive as inference logic gets more complex.
External system APIs can be registered as MCP servers.
During inference, the model can directly query them through its internal tools to get required data.
Example: Gmail MCP server. You ask the model to find emails from former colleagues. The model queries Gmail and filter required emails.
6. AI agents
Complex business processes are typically handled by BPM engines like Camunda.
But when transitions between states aren’t strictly defined, BPM falls short.
That’s where AI comes in.
AI manages the execution of the process, and each step can be handled by different models. This is what we call an AI agent.
This isn’t a full reference, just a simplified guide.


