Explainer · Day 4

Tool Use: the moment your AI system crosses a boundary

The first three patterns work inside what the model already knows. Tool Use is the moment it reaches outside and touches real systems. That boundary crossing changes who owns the risk.

2026-06-26 · 7 min

The failure this pattern exists to prevent

The following is an illustrative example of a failure pattern that appears in production. An internal operations agent was given access to a search tool, a document retrieval tool, and a data export tool. The team scoped the search and retrieval functions carefully. The export tool was included because it had been useful in testing, and nobody formally reviewed whether it should be available in production.

Six weeks after go-live, the agent was asked to compile a summary report on customer activity for a specific region. It searched, retrieved, and then exported. The export included records from outside the requested region because the model interpreted the export scope more broadly than the task required. A dataset of roughly four thousand customer records landed in a shared folder the agent had write access to.

Nobody asked it to export. Nobody reviewed the export scope. Nobody had defined what the agent was and was not allowed to call. The tool was available, so the model selected it based on its own interpretation of the task and called it.

That is not a model failure. That is a system design failure. And it is the kind of failure that a properly governed Tool Use implementation helps prevent.

What the pattern is

Tool Use, also called Function Calling, is the mechanism by which a language model is given access to external functions it can invoke during a task. The model receives a description of each available tool, decides when and how to call it, constructs the parameters, and processes the response.

The capability is now well established in mainstream LLM platforms. Function calling became widely available to developers from 2023 onwards as major model providers began supporting it in some form. The pattern is not new. What remains underspecified in most implementations is the governance layer around it.

A tool description tells the model what a function does. It does not tell the model what it is allowed to do. Those are different things, and the gap between them is where most Tool Use failures originate.

When you give a model a tool, you are making four decisions whether you realize it or not. You are deciding what systems the model can reach. You are deciding what data it can read or write. You are deciding what actions it can take on behalf of your organization. And you are deciding what the blast radius is if it calls that tool incorrectly, repeatedly, or in a context you did not anticipate.

Most teams make the first decision consciously and the remaining three by omission.
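One way to stop the last three decisions from being made by omission is to record all four next to the tool before it ships. The sketch below is illustrative only: `ToolGrant`, its field names, `undecided`, and the example values are hypothetical and not part of any provider's function-calling API.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToolGrant:
    """Hypothetical record of the four decisions made when a tool is granted."""
    tool_name: str
    systems_reached: Optional[list[str]] = None   # what systems the model can reach
    data_access: Optional[str] = None             # what data it can read or write
    actions_allowed: Optional[list[str]] = None   # what it can do on your behalf
    blast_radius: Optional[str] = None            # worst case if called incorrectly


def undecided(grant: ToolGrant) -> list[str]:
    """Return the decisions that were never filled in, i.e. made by omission."""
    return [
        f.name for f in fields(grant)
        if f.name != "tool_name" and getattr(grant, f.name) is None
    ]


# A grant where only the first decision was made consciously.
export_grant = ToolGrant(tool_name="export_data", systems_reached=["reporting_db"])
print(undecided(export_grant))
# ['data_access', 'actions_allowed', 'blast_radius']
```

A check like this could run in review or CI so that a tool with unanswered questions never reaches the model's tool list.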

A concrete before and after

A customer service agent is given three tools. A knowledge base search, an order lookup, and a ticket creation function.

Without a governed tool use implementation, the setup looks like this.

Tools available:

    search_knowledge_base(query)
    lookup_order(order_id)
    create_ticket(customer_id, issue, priority)

System prompt:

    You are a helpful customer service agent.
    Use the available tools to help customers resolve their issues.

The model has access to ticket creation with priority as a parameter. Nothing in the setup defines what priority levels are valid, what conditions justify each level, or whether the model is allowed to set priority at all versus surfacing it for a human to decide. In testing this works fine because test cases are well-formed. In production a frustrated customer using emphatic language triggers a high-priority ticket for an issue that does not warrant it. The support queue gets distorted. Nobody can explain why without pulling the model traces.

With a governed tool use implementation, the setup looks like this.

Tools available:

    search_knowledge_base(query)
    lookup_order(order_id)
    create_ticket(customer_id, issue)

Removed from model access: priority parameter. Priority is set by a downstream rule engine based on issue type and customer tier, not by the model.

System prompt:

    You are a customer service agent. Use the available tools to find
    information and log issues. Do not infer urgency or escalation level.
    If a customer indicates an urgent situation, log the issue and inform
    them it will be reviewed. Do not set priority directly.

Tool call logging: every function call, parameter set, and response is written to an immutable audit log with a timestamp and session identifier.

The second implementation exposes less to the model. That is the point. The model's tool access is scoped to what it needs for the task, not what might be useful in some future scenario. Every call is logged. A downstream system owns the decisions the model should not be making.
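A minimal sketch of how that split might look in code, assuming the issue type and customer tier come from systems outside the model (a classifier or CRM, for example). The rule table, `assign_priority`, and `finalize_ticket` are hypothetical names chosen for illustration; only the `create_ticket(customer_id, issue)` signature comes from the setup above.

```python
# Hypothetical rule table: (issue_type, customer_tier) -> priority.
PRIORITY_RULES = {
    ("outage", "enterprise"): "high",
    ("outage", "standard"): "medium",
    ("billing", "enterprise"): "medium",
}
DEFAULT_PRIORITY = "low"


def assign_priority(issue_type: str, customer_tier: str) -> str:
    """Downstream rule engine: the model never supplies this value."""
    return PRIORITY_RULES.get((issue_type, customer_tier), DEFAULT_PRIORITY)


def create_ticket(customer_id: str, issue: str) -> dict:
    """Model-facing tool. Its signature has no priority parameter."""
    return {"customer_id": customer_id, "issue": issue, "priority": None}


def finalize_ticket(ticket: dict, issue_type: str, customer_tier: str) -> dict:
    """Runs after the model's call, outside the model's control."""
    return {**ticket, "priority": assign_priority(issue_type, customer_tier)}


# Emphatic wording in the issue text has no effect on the assigned priority.
draft = create_ticket("C-1001", "My invoice is WRONG!!! Fix this NOW")
ticket = finalize_ticket(draft, issue_type="billing", customer_tier="standard")
print(ticket["priority"])
# low
```

Because priority is not a parameter the model can fill in, a frustrated tone cannot distort the queue, whatever the model infers from it.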

Why this matters beyond functionality

The first three patterns in this series work inside the model's context. Chain of Thought, ReAct, and Reflection all operate on text that flows in and out of the model. The consequence of an error is an incorrect output that a human can review, dispute, or discard.

Tool Use changes that. When a model calls a function, it is no longer producing text for a human to evaluate. It is issuing instructions to systems that will act on them immediately, often without a human in the loop.

That shift has three governance implications that are worth making explicit.

The first is access control. Every tool available to a model is an access grant. It should be reviewed with the same scrutiny as any other access grant in your system. Who approved this? What is the minimum scope required? What happens if this access is misused? These are not AI-specific questions. They are standard information security questions that most teams forget to ask when the entity requesting access is a model rather than a person.
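Treating tools as access grants can be as simple as checking an allowlist at the point where calls are dispatched. The sketch below assumes a per-agent allowlist kept outside the prompt; `APPROVED_TOOLS`, `TOOL_REGISTRY`, `dispatch`, and the tool names are hypothetical.

```python
# Hypothetical per-agent allowlist, reviewed like any other access grant.
APPROVED_TOOLS = {
    "ops_agent": {"search", "retrieve_document"},
}

# Every function that exists in the codebase, including unapproved ones.
TOOL_REGISTRY = {
    "search": lambda query: f"results for {query}",
    "retrieve_document": lambda doc_id: f"contents of {doc_id}",
    "export_data": lambda region: f"export for {region}",
}


class ToolNotApproved(PermissionError):
    """Raised when an agent requests a tool outside its approved set."""


def dispatch(agent_id: str, tool_name: str, **params):
    """Execute a tool call only if the tool is on the agent's allowlist."""
    if tool_name not in APPROVED_TOOLS.get(agent_id, set()):
        raise ToolNotApproved(f"{agent_id} is not approved to call {tool_name}")
    return TOOL_REGISTRY[tool_name](**params)


print(dispatch("ops_agent", "search", query="EMEA customer activity"))
try:
    dispatch("ops_agent", "export_data", region="EMEA")
except ToolNotApproved as err:
    print(err)
# results for EMEA customer activity
# ops_agent is not approved to call export_data
```

The export function still exists in the codebase, but the agent cannot reach it until someone adds it to the allowlist on purpose.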

The second is auditability. A model that calls tools without a complete call log is a system you cannot investigate after something goes wrong. At minimum the log needs to capture what was called, what parameters were passed, and what was returned. Where reasoning traces are available, linking them to the tool call log gives you a richer basis for investigation. Some production systems rely on structured decision logs instead of full reasoning traces, which is a reasonable approach as long as the log captures enough context to reconstruct what the model was doing.
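As a rough illustration of the minimum log record, the sketch below captures what was called, what was passed, and what came back. The hash chaining is an added assumption, one common way to make after-the-fact edits detectable; it is not something this article prescribes, and all field and function names are hypothetical.

```python
import hashlib
import json
import time
import uuid


def make_log_entry(session_id, tool, params, response, prev_hash=""):
    """Build one audit record; chaining hashes makes later edits detectable."""
    entry = {
        "entry_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "session_id": session_id,
        "tool": tool,
        "params": params,
        "response": response,
        "prev_hash": prev_hash,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    return entry


def verify_chain(entries):
    """Recompute every hash and check each entry links to the one before it."""
    prev = ""
    for e in entries:
        body = {k: v for k, v in e.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if e["prev_hash"] != prev or hashlib.sha256(payload).hexdigest() != e["hash"]:
            return False
        prev = e["hash"]
    return True


log = [make_log_entry("s-1", "lookup_order", {"order_id": "A17"}, {"status": "shipped"})]
log.append(make_log_entry("s-1", "create_ticket", {"customer_id": "C-1001", "issue": "late"},
                          {"ticket_id": "T-9"}, prev_hash=log[-1]["hash"]))
print(verify_chain(log))              # True
log[0]["params"]["order_id"] = "B99"  # simulate tampering with an old record
print(verify_chain(log))              # False
```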

The third is anomaly detection. A model calling a tool once is expected behaviour. A model calling the same tool an unusual number of times in a short window is a signal, and so is a model calling a tool with parameters outside the normal range for that function. What counts as unusual depends on your baseline, which is why defining normal behaviour before you ship matters. These patterns are detectable with basic monitoring, and they often appear before incidents become serious. Most teams do not have this monitoring in place because they added tools incrementally and never defined what normal behaviour looks like.
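Basic monitoring of this kind does not need much machinery. The sketch below assumes a written baseline of a maximum call count per time window plus allowed numeric ranges per parameter; `CallMonitor` and the thresholds are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque


class CallMonitor:
    """Flags call bursts and out-of-range parameters against a written baseline."""

    def __init__(self, max_calls: int, window_seconds: float, param_ranges: dict):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.param_ranges = param_ranges      # e.g. {"limit": (1, 100)}
        self.timestamps = deque()

    def check(self, params: dict, now: float) -> list[str]:
        """Record one call at time `now` and return any anomaly signals."""
        signals = []
        self.timestamps.append(now)
        # Drop calls that have fallen out of the sliding window.
        while self.timestamps and now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) > self.max_calls:
            signals.append("call_rate_exceeded")
        for name, (low, high) in self.param_ranges.items():
            value = params.get(name)
            if value is not None and not (low <= value <= high):
                signals.append(f"param_out_of_range:{name}")
        return signals


monitor = CallMonitor(max_calls=3, window_seconds=60, param_ranges={"limit": (1, 100)})
for t in (0, 5, 10):
    monitor.check({"limit": 20}, now=t)
print(monitor.check({"limit": 5000}, now=12))
# ['call_rate_exceeded', 'param_out_of_range:limit']
```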

Where Tool Use breaks down

Parameter hallucination. The model constructs a plausible-looking function call with parameter values it invented rather than extracted from context. This is most common when the required parameter is not clearly present in the conversation and the model fills the gap rather than asking for clarification or returning an error. The call succeeds at the API level. The result is based on fabricated input.
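One crude guard, sketched here under the assumption that identifiers such as order numbers should be copied verbatim from the conversation, is to reject a call whose parameter values never appear in the text the model was given. `ungrounded_params` is a hypothetical helper; literal substring matching will not suit free-text parameters.

```python
def ungrounded_params(params: dict, conversation: str) -> list[str]:
    """Return parameter names whose values never appear in the conversation text."""
    text = conversation.lower()
    return [name for name, value in params.items() if str(value).lower() not in text]


conversation = "Hi, my order ORD-48213 arrived damaged. Can you check it?"

print(ungrounded_params({"order_id": "ORD-48213"}, conversation))  # []
print(ungrounded_params({"order_id": "ORD-48231"}, conversation))  # ['order_id']
```

A non-empty result is a reason to ask the customer for the missing value instead of executing the call.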

Scope creep at the tool boundary. The model reasons its way into calling a tool that was not required for the task but seemed adjacent. The export failure in the opening example is this pattern. The model did not malfunction. It made a locally reasonable decision that sat outside the intended scope. Tight tool descriptions and explicit scope constraints in the system prompt reduce this but do not eliminate it.

Error misinterpretation. The model receives a 4xx or 5xx response from a tool and treats it as a soft signal rather than a hard stop. It retries, adjusts parameters, or proceeds with the task using incomplete data. In a well-governed implementation, a tool error should trigger an explicit handling path, not be left to the model to interpret.
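An explicit handling path can sit between the tool and the model so that status codes are mapped to a fixed action before the model sees anything. The mapping below (client errors go back to the user, server errors halt the task) is an assumption for illustration, as are `ToolResult`, `call_with_policy`, and the stand-in tool.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolResult:
    ok: bool
    data: Any = None
    action: str = "continue"   # "continue", "ask_user", or "halt"
    detail: str = ""


def call_with_policy(tool: Callable[..., tuple[int, Any]], **params) -> ToolResult:
    """Map HTTP-style status codes to a fixed handling path."""
    status, body = tool(**params)
    if 200 <= status < 300:
        return ToolResult(ok=True, data=body)
    if 400 <= status < 500:
        # Client error: the request itself is wrong, so do not retry with guesses.
        return ToolResult(ok=False, action="ask_user",
                          detail=f"tool rejected request ({status})")
    # Server errors and anything unexpected: stop rather than proceed on partial data.
    return ToolResult(ok=False, action="halt", detail=f"tool unavailable ({status})")


def fake_lookup_order(order_id: str) -> tuple[int, Any]:
    """Stand-in tool that returns (status_code, body)."""
    return (200, {"order_id": order_id}) if order_id == "A17" else (404, None)


print(call_with_policy(fake_lookup_order, order_id="A17").action)  # continue
print(call_with_policy(fake_lookup_order, order_id="ZZZ").action)  # ask_user
```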

Silent success with wrong data. The tool call succeeds, returns a result, and the model accepts it without sanity-checking whether the result makes sense for the context. Examples include a search that returns zero results, an order lookup that returns a different customer's record due to a parameter error, and a date field that comes back in an unexpected format. The model moves forward. The error only surfaces downstream when someone notices the output does not match reality.
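A sanity check on the returned data can catch these cases before the model builds on them. The sketch below assumes an order lookup that returns a dictionary with `order_id`, `customer_id`, and an ISO 8601 `order_date`; those field names and `check_order_result` are hypothetical.

```python
from datetime import date


def check_order_result(requested_order_id: str, customer_id: str, result: dict) -> list[str]:
    """Return reasons the result should not be trusted; an empty list means it passed."""
    if not result:
        return ["empty_result"]
    problems = []
    if result.get("order_id") != requested_order_id:
        problems.append("order_id_mismatch")
    if result.get("customer_id") != customer_id:
        problems.append("customer_mismatch")
    try:
        # Expect an ISO 8601 date such as 2026-06-20.
        date.fromisoformat(str(result.get("order_date", "")))
    except ValueError:
        problems.append("unexpected_date_format")
    return problems


good = {"order_id": "A17", "customer_id": "C-1001", "order_date": "2026-06-20"}
wrong = {"order_id": "A17", "customer_id": "C-2002", "order_date": "20/06/2026"}

print(check_order_result("A17", "C-1001", good))   # []
print(check_order_result("A17", "C-1001", wrong))  # ['customer_mismatch', 'unexpected_date_format']
print(check_order_result("A17", "C-1001", {}))     # ['empty_result']
```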

Tool sprawl. Over time, agents accumulate tools. A function added for one use case remains available for all subsequent ones. Nobody reviews the full tool list against the current task scope. The model has access to functions that have no relevance to what it is being asked to do, which increases the surface area for unintended calls. This is an operational hygiene problem as much as a technical one.
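The review itself can be a small set comparison run on a schedule. In this sketch, `review_tool_list` and the tool names are hypothetical, and the required set is assumed to come from a written description of the agent's current task scope.

```python
def review_tool_list(granted: set[str], required: set[str]) -> dict[str, list[str]]:
    """Compare the tools an agent holds against what its current task scope needs."""
    return {
        "excess": sorted(granted - required),    # candidates for removal
        "missing": sorted(required - granted),   # needed but not granted
    }


granted = {"search", "retrieve_document", "export_data", "send_email"}
required = {"search", "retrieve_document"}
print(review_tool_list(granted, required))
# {'excess': ['export_data', 'send_email'], 'missing': []}
```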

The questions to ask before you add a tool to an agent

For every tool you are considering giving a model access to, answer these questions before you add it.

What is the minimum scope this tool needs to do its job? If the tool has parameters the model does not need to control, remove them from the model-facing interface or handle them in a wrapper.

What does a normal call to this tool look like in terms of frequency, parameter ranges, and return size? Write that down. It becomes your anomaly detection baseline.

What is the worst realistic outcome if this tool is called with incorrect parameters, called repeatedly, or called in a context you did not design for? If that outcome is reversible, document the reversal process. If it is not reversible, add a human confirmation step before the model can call it.
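A confirmation step for irreversible tools might look like the gate below. This is a sketch under stated assumptions: `IRREVERSIBLE_TOOLS`, `execute`, and `export_data` are hypothetical names, and the `approver` callback stands in for whatever human review mechanism you actually use.

```python
IRREVERSIBLE_TOOLS = {"export_data"}   # hypothetical classification


def execute(tool_name, tool_fn, params, approver=None):
    """Run reversible tools directly; require explicit approval for irreversible ones."""
    if tool_name in IRREVERSIBLE_TOOLS:
        approved = approver is not None and approver(tool_name, params)
        if not approved:
            return {"status": "blocked", "reason": "human confirmation required"}
    return {"status": "done", "result": tool_fn(**params)}


def export_data(region):
    """Stand-in for an export with side effects that cannot be undone."""
    return f"exported records for {region}"


# No approver supplied: the call is blocked by default.
print(execute("export_data", export_data, {"region": "EMEA"})["status"])
# A reviewer who only approves the requested region lets this call through.
print(execute("export_data", export_data, {"region": "EMEA"},
              approver=lambda name, params: params["region"] == "EMEA")["status"])
# blocked
# done
```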

Is every call to this tool being logged with enough context to reconstruct what the model was doing when it made the call? If not, the tool is not ready for production regardless of how well it performs in testing.

What comes next

Tool Use connects the model to external systems. Planning connects those tool calls into sequences that work toward a goal across multiple steps.

Day 5 covers the Plan and Execute pattern, which is where the model does not just call tools in response to a user request but constructs a multi-step plan first and then executes it. That additional layer introduces a new class of failure modes, and a new set of questions about who reviews the plan before execution begins.

For every tool your current AI system has access to, do you have a written record of who approved that access and what the acceptable use boundaries are?

Part of the TrilokCloud AI Design Patterns series · One pattern per day · trilokcloud.in/blog