AI Agents and Tool Use (Sample)
How AI models go beyond chat by executing actions in the real world
An AI agent is a language model that can take actions — not just generate text. It can search the web, run code, call APIs, read files, and make decisions about what to do next. This shift from passive text generation to active problem-solving represents one of the most significant developments in applied AI.
From Chat to Action
A chatbot answers questions. An agent solves problems. The difference is autonomy: agents decide which tools to use, in what order, and how to handle errors.
Consider the difference in practice. You ask a chatbot: “What’s the weather in Tokyo?” It might tell you based on its training data — which is months or years old and almost certainly wrong. You ask an agent the same question, and it calls a weather API, retrieves the current data, and returns an accurate, up-to-date answer.
The chatbot generates plausible text. The agent interacts with the world.
The Spectrum of Autonomy
Not all agents are equally autonomous. There is a spectrum:
- Tool-assisted chat — the model can call tools, but only in direct response to user requests. One tool call per turn.
- Multi-step agents — the model can chain multiple tool calls together to accomplish a task, deciding the sequence on its own.
- Fully autonomous agents — the model operates independently over extended periods, making decisions, handling errors, and pursuing goals with minimal human oversight.
Most production systems today sit at levels 1-2. Fully autonomous agents are an active area of research with significant safety challenges still to solve.
Tool Use
Tool use lets an AI model call external functions. The model decides when a tool is needed, generates the right parameters, and incorporates the result into its response. This turns a text generator into a capable assistant.
How Tool Use Works
The mechanics are straightforward:
- Tool definition — you describe the available tools to the model, including their names, parameters, and what they do. This is typically provided as structured JSON in the system prompt or via a dedicated API field.
- Decision — when processing a user request, the model decides whether a tool would be helpful. If so, it generates a tool call with the appropriate parameters.
- Execution — your application executes the tool call (the model does not execute it directly) and returns the result.
- Integration — the model incorporates the tool result into its response to the user.
Example Tool Definition
{
"name": "search_documentation",
"description": "Search the product documentation for relevant articles",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query"
},
"max_results": {
"type": "integer",
"description": "Maximum number of results to return",
"default": 5
}
},
"required": ["query"]
}
}
The model sees this definition and knows it can search documentation. When a user asks a product question, the model generates a call like search_documentation(query="how to reset password"), your system executes the search, and the model uses the results to compose an accurate answer.
Common Tool Categories
Production agent systems typically offer tools in several categories:
- Information retrieval — web search, database queries, file reading, API calls
- Code execution — running Python, JavaScript, or shell commands in a sandboxed environment
- Communication — sending emails, posting messages, creating tickets
- File manipulation — creating, editing, and organizing files
- System operations — deploying code, managing infrastructure, running CI pipelines
The tools you provide define the boundaries of what the agent can do. A well-designed tool set gives the agent enough capability to be useful without enough power to be dangerous.
Agentic Loops
The most powerful pattern is the agentic loop: the model plans a step, executes it, observes the result, and decides the next step. This loop continues until the task is complete or the model determines it cannot proceed.
The Loop in Practice
Consider an agent tasked with debugging a failing test:
- Plan — “I should first read the test file to understand what it’s testing”
- Execute — calls
read_file("tests/auth.test.ts") - Observe — sees the test expects a 200 status but gets a 401
- Plan — “I should check the auth middleware to see what’s returning 401”
- Execute — calls
read_file("src/middleware/auth.ts") - Observe — finds that the token validation logic has a bug in the expiration check
- Plan — “I should fix the expiration comparison”
- Execute — calls
edit_file(...)with the fix - Observe — confirms the edit was made
- Plan — “I should run the test to verify the fix”
- Execute — calls
run_command("pnpm test tests/auth.test.ts") - Observe — test passes
- Complete — reports the fix to the user
Each step involves the model reasoning about the current state, deciding what to do next, and adapting based on what it discovers. This is fundamentally different from a linear script — the agent handles unexpected findings and changes course when needed.
Handling Errors in the Loop
Robust agents must handle failures gracefully. A tool might return an error, a file might not exist, or an API might be rate-limited. Good agent design includes:
- Retry logic — retry transient failures with backoff
- Alternative strategies — if one approach fails, try another
- Graceful degradation — if the task cannot be completed fully, complete as much as possible and explain what remains
- Loop limits — set a maximum number of iterations to prevent infinite loops when the agent gets stuck
Designing Effective Tools
The quality of an agent system depends heavily on the quality of its tools. Poorly designed tools lead to confused agents and incorrect results.
Tool Design Principles
- Clear names —
search_usersis better thanquery_db_1. The model uses the name to decide when to call the tool. - Descriptive parameters — include descriptions for every parameter. The model reads these descriptions to determine what values to pass.
- Focused scope — each tool should do one thing well. A
read_filetool and awrite_filetool are better than afile_operationstool with a mode parameter. - Useful errors — return clear error messages that help the model understand what went wrong and what to try instead.
- Idempotent when possible — tools that can be safely retried simplify error handling.
Risks
Agents that can take actions can take wrong actions. Sandboxing, confirmation steps, and human-in-the-loop reviews are essential safety measures for any production agent system.
Categories of Risk
- Destructive actions — an agent with file system access could delete important files. An agent with database access could drop tables. Sandbox environments and permission boundaries are essential.
- Data exfiltration — an agent that can both read sensitive data and make network requests could inadvertently (or through prompt injection) leak information.
- Runaway costs — an agent in a loop calling expensive APIs can rack up significant costs quickly. Budget limits and rate limiting are practical necessities.
- Incorrect actions taken confidently — the agent might misunderstand a request and take an irreversible action. For high-stakes operations, always require human confirmation.
Safety Patterns
Production agent systems should implement several safety patterns:
- Least privilege — give the agent only the tools it needs for its specific task, nothing more
- Sandboxing — execute code and file operations in isolated environments
- Confirmation gates — require human approval for destructive or irreversible actions
- Audit logging — record every tool call and its result for review
- Kill switches — provide mechanisms to immediately halt a running agent
- Budget limits — set hard caps on API calls, token usage, and compute time
The goal is not to prevent agents from being useful — it is to ensure they are useful within well-defined boundaries.
Comments