Tool Use Patterns for LLMs in 2026: Parallel Calls, Error Handling, Schema Design
Tool use (sometimes called function calling) is the mechanism that turns an LLM from a text-in/text-out system into something that can actually do things: search a database, call an API, run code, modify files, or take any action in the world. Getting tool use right is the difference between an agent that works reliably and one that constantly gets stuck, calls the wrong tool, or produces errors it can't recover from.
This post is about the design decisions that matter in practice: how to structure tool schemas so models call them correctly, how to handle parallel tool execution, and how to build error recovery that keeps agents from getting stuck.
The basic pattern
Tool use follows a consistent pattern across OpenAI, Anthropic, and most other providers:
- You define tools in your API request as a list of JSON schema objects describing the tool's name, description, and parameters
- The model decides whether to call a tool, and if so, returns a tool call with filled-in arguments
- You execute the tool call on your side and send the result back to the model
- The model continues, potentially calling more tools or producing a final response
The model doesn't execute tools itself. It returns a structured request for you to execute a tool, and you pass the result back. This is important for security and reliability: the model's outputs are always text, and you control what actually gets executed.
A minimal example with the Anthropic API:
import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units"
                }
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}]
)
The model returns a tool_use content block with the tool name, the arguments, and an ID. You extract those, call your actual weather API, and send the result back as a tool_result content block in a user message that references the same ID.
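As a rough sketch of that round trip, the helper below builds the message list for the follow-up request. It works on plain dicts so it runs without an API key; with the SDK you would read the same fields as attributes on the response's content blocks. `fetch_weather` and `build_followup` are hypothetical names used only for illustration, and the weather values are made up.

```python
import json

def fetch_weather(city, units="celsius"):
    # Hypothetical stand-in for a real weather API call.
    return {"city": city, "temperature": 18, "units": units}

def build_followup(messages, assistant_content):
    """Return the message list for the next request, with one
    tool_result block per tool_use block in the assistant turn."""
    results = []
    for block in assistant_content:
        if block["type"] != "tool_use":
            continue
        output = fetch_weather(**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # echoes the id from the tool_use block
            "content": json.dumps(output),
        })
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": results},
    ]

# A hand-written assistant turn shaped like a tool_use response.
history = [{"role": "user", "content": "What's the weather in Paris?"}]
assistant_turn = [
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Paris"}},
]
next_messages = build_followup(history, assistant_turn)
```

The assistant turn is replayed verbatim before the results, so the model sees its own request followed by the matching answer.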
Schema design: what actually matters
The quality of your tool schemas directly determines how reliably the model calls tools correctly. Ambiguous or poorly documented schemas produce wrong calls; clear, specific schemas produce correct calls.
Descriptions are load-bearing. The description field on both the tool and each parameter is not optional decoration. The model uses these descriptions to decide when to call the tool and how to fill in parameters. A tool description that says "Get data" will produce inconsistent behavior. A description that says "Retrieve the current account balance for a specific user ID. Returns a float representing the balance in USD." tells the model when the tool applies and what it gets back, so it is far more likely to be called correctly.
Use enums to constrain discrete choices. If a parameter can only take specific values, list them as an enum. Don't leave it as a free-form string and hope the model guesses right.
# Bad: model might write "USD", "usd", "dollars", "US dollars"
"currency": {"type": "string", "description": "Currency code"}

# Good: model picks from the list
"currency": {
    "type": "string",
    "enum": ["USD", "EUR", "GBP", "JPY"],
    "description": "ISO 4217 currency code"
}
Mark required vs. optional explicitly. The required array in JSON schema tells the model which parameters it must provide. Parameters not in required are optional. Use this intentionally: if the model should almost always provide a parameter, make it required even if your code has a default. Let the model explicitly decide on optional parameters rather than having them silently default.
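The enum and required constraints are also something you can check on your side before executing a call. The sketch below is a deliberately small, hand-rolled check (the name `check_tool_input` is hypothetical) that covers only required, enum, and unknown parameters. It is not a full JSON Schema validator; a dedicated validation library would be the more thorough option.

```python
def check_tool_input(input_schema, args):
    """Return a list of human-readable problems; empty means the call looks OK."""
    problems = []
    properties = input_schema.get("properties", {})
    # Every name listed in "required" must be present in the arguments.
    for name in input_schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter '{name}'")
    # Every supplied argument must be declared and respect its enum, if any.
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unknown parameter '{name}'")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(
                f"'{name}' must be one of {spec['enum']}, got {value!r}"
            )
    return problems

schema = {
    "type": "object",
    "properties": {
        "amount": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP", "JPY"]},
    },
    "required": ["amount", "currency"],
}

ok = check_tool_input(schema, {"amount": 10, "currency": "EUR"})
bad = check_tool_input(schema, {"currency": "dollars"})
```

Returning the problem list to the model as an error result gives it something concrete to correct on the next attempt.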
Name tools unambiguously. Tool names should be self-describing even without the description. search_documents is better than search. get_user_by_email is better than get_user. When the model has 20 tools available, names that communicate intent at a glance produce fewer wrong-tool selections.
Parallel tool calls
Models from OpenAI (GPT-4o and above), Anthropic (Claude 3+ family), and Google (Gemini 2.x) all support returning multiple tool calls in a single response. The model can decide that two independent operations should happen simultaneously rather than sequentially.
This matters for performance. Consider a task like "summarize the weather and stock price for New York." Without parallel tool calls, the model calls get_weather, waits for the result, calls get_stock_price, waits again, then writes the summary: three model calls, with the two tool executions running back to back. With parallel tool calls, both tool calls arrive in the first response, you execute them concurrently, and the model writes the summary in the next call: two model calls, and the tool latency is that of the slower tool rather than the sum of both.
The model decides whether to use parallel calls based on whether the tasks seem independent. Your job is to structure your tool definitions so independent operations look independent to the model. Don't over-merge tools: a single get_weather_and_stocks tool prevents the model from deciding to run those operations in parallel when appropriate.
Executing parallel tool calls:
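import asyncio

# Map tool names to the async functions that implement them.
# Assumes get_weather and get_stock_price are defined elsewhere.
TOOL_HANDLERS = {
    "get_weather": get_weather,
    "get_stock_price": get_stock_price,
}

async def execute_tool_calls(tool_calls):
    # An unknown tool name raises KeyError here instead of being skipped,
    # so results can never silently fall out of step with tool_calls.
    tasks = [TOOL_HANDLERS[call.name](**call.input) for call in tool_calls]
    # gather runs the coroutines concurrently and preserves input order;
    # return_exceptions=True keeps one failure from discarding the rest.
    return await asyncio.gather(*tasks, return_exceptions=True)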
Send all results back to the model in a single user message containing one tool_result content block per tool call, each matched to its call by ID. The model expects to see results for all tool calls it made before continuing.
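A minimal sketch of that last step, assuming tool calls arrive as plain dicts shaped like tool_use blocks. The two tool functions are hypothetical stubs that return fixed values, and `run_one` and `results_message` are illustrative names rather than part of any SDK.

```python
import asyncio
import json

async def get_weather(city):
    # Hypothetical stub; a real version would await an HTTP call.
    await asyncio.sleep(0.01)
    return {"city": city, "forecast": "cloudy"}

async def get_stock_price(ticker):
    # Hypothetical stub returning a fixed price.
    await asyncio.sleep(0.01)
    return {"ticker": ticker, "price": 101.5}

TOOLS = {"get_weather": get_weather, "get_stock_price": get_stock_price}

def error_result(block, message):
    return {"type": "tool_result", "tool_use_id": block["id"],
            "is_error": True, "content": message}

async def run_one(block):
    """Run one tool_use block and wrap the outcome as a tool_result block."""
    handler = TOOLS.get(block["name"])
    if handler is None:
        return error_result(block, f"Error: unknown tool '{block['name']}'")
    try:
        output = await handler(**block["input"])
    except Exception as exc:
        return error_result(block, f"Error: {exc}")
    return {"type": "tool_result", "tool_use_id": block["id"],
            "content": json.dumps(output)}

async def results_message(tool_use_blocks):
    # gather preserves input order, so results line up with the calls.
    results = await asyncio.gather(*(run_one(b) for b in tool_use_blocks))
    return {"role": "user", "content": list(results)}

calls = [
    {"type": "tool_use", "id": "toolu_a", "name": "get_weather",
     "input": {"city": "New York"}},
    {"type": "tool_use", "id": "toolu_b", "name": "get_stock_price",
     "input": {"ticker": "XYZ"}},
    {"type": "tool_use", "id": "toolu_c", "name": "not_a_tool", "input": {}},
]
message = asyncio.run(results_message(calls))
```

Every call gets exactly one result block, including the call to a tool that does not exist, so the model never waits on a missing ID.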
Error handling: don't just crash
Tool calls fail. External APIs return errors, databases time out, parameters are invalid. How you handle these failures determines whether your agent recovers gracefully or gets stuck in a broken state.
The key principle: always send a result back to the model for every tool call, even if the tool failed. If you silently drop failed tool calls, the model is waiting for results that never come.
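import json

async def safe_tool_call(tool_call):
    # tool_call is a tool_use block with .id, .name, and .input.
    # execute_tool is assumed to be your own dispatcher, defined elsewhere.
    try:
        result = await execute_tool(tool_call.name, tool_call.input)
        return {
            "type": "tool_result",
            "tool_use_id": tool_call.id,
            "content": json.dumps(result)
        }
    except Exception as e:
        # Report the failure to the model instead of dropping the call.
        return {
            "type": "tool_result",
            "tool_use_id": tool_call.id,
            "is_error": True,
            "content": f"Error: {e}"
        }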
The is_error: true flag tells the model the tool failed. A well-prompted model will then decide what to do: retry with different parameters, try an alternative approach, or tell the user there's a problem.
Give models actionable error messages. "Error: timeout" is unhelpful. "Error: database query timed out after 5 seconds. Try again with a narrower date range." gives the model information it can use to recover.
Limit retry depth. If an agent retries the same failed tool call in a loop, you need a circuit breaker. Track the number of times a specific tool has been called in a session and stop the loop after 3-5 attempts. Log the stuck state for debugging.
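One way to sketch such a circuit breaker is a small counter keyed by tool name and arguments. The `RetryBudget` class and its method names are hypothetical, and keying on the arguments as well as the tool name is a design assumption: it lets the agent retry with different parameters while still stopping identical repeats.

```python
import json
from collections import Counter

class RetryBudget:
    """Counts failed attempts per (tool, arguments) pair within one session."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = Counter()

    def _key(self, tool_name, tool_args):
        # Serialize with sorted keys so argument order does not matter.
        return (tool_name, json.dumps(tool_args, sort_keys=True))

    def record_failure(self, tool_name, tool_args):
        self.failures[self._key(tool_name, tool_args)] += 1

    def exhausted(self, tool_name, tool_args):
        return self.failures[self._key(tool_name, tool_args)] >= self.max_failures

budget = RetryBudget(max_failures=3)
args = {"query": "orders", "range": "2020-2025"}
for _ in range(3):
    budget.record_failure("search_documents", args)

same_call_blocked = budget.exhausted("search_documents", args)
narrower_call_blocked = budget.exhausted(
    "search_documents", {"query": "orders", "range": "2025"}
)
```

Check `exhausted` before executing a call; when it returns true, return an error result that tells the model to stop retrying, and log the state.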
Forced tool use vs. auto mode
Both OpenAI and Anthropic let you control whether the model is allowed to respond without calling a tool, or whether it must call a tool.
In Anthropic's API, tool_choice accepts:
- auto (default): model decides whether to call a tool
- any: model must call at least one tool
- tool: model must call a specific named tool
In OpenAI's API, tool_choice similarly accepts auto, required, or a specific tool reference.
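For reference, here is roughly what those values look like as request parameters. The OpenAI shapes shown are the Chat Completions form, and get_weather is just the example tool from earlier; check the current API reference for the endpoint you use, since the shapes differ between endpoints.

```python
# Anthropic Messages API: tool_choice is always an object with a "type".
ANTHROPIC_TOOL_CHOICE = {
    "auto": {"type": "auto"},
    "any": {"type": "any"},
    "tool": {"type": "tool", "name": "get_weather"},
}

# OpenAI Chat Completions API: a string, or an object naming one function.
OPENAI_TOOL_CHOICE = {
    "auto": "auto",
    "required": "required",
    "function": {"type": "function", "function": {"name": "get_weather"}},
}
```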
Use forced tool use when you need structured output. Define a single tool whose schema is the output you want and set tool_choice to that tool; the model must then respond with a call to it, with arguments shaped by the schema. This is a reliable way to get structured output from any model that supports tool calling, even if it doesn't have a native JSON output mode. Still validate the arguments before using them, since forcing the tool does not by itself guarantee that every value conforms to the schema.
# Force the model to return user information in a specific schema
extract_user_tool = {
    "name": "extract_user_info",
    "description": "Extract user information from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
            "company": {"type": "string"}
        },
        "required": ["name", "email"]
    }
}

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Messages API
    tools=[extract_user_tool],
    tool_choice={"type": "tool", "name": "extract_user_info"},
    messages=[{"role": "user", "content": user_bio_text}]
)
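Reading the structured output back out could look something like the sketch below. It uses a hand-written list of dicts in place of response.content so it runs offline; with the SDK the blocks are objects and the same fields are attributes. `extract_forced_call` is a hypothetical helper, and the sample name and email are invented.

```python
def extract_forced_call(content_blocks, tool_name, required):
    """Return the input dict of the forced tool call, or raise ValueError."""
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            data = block["input"]
            # Check required fields ourselves rather than trusting the call.
            missing = [key for key in required if key not in data]
            if missing:
                raise ValueError(f"missing fields: {missing}")
            return data
    raise ValueError(f"no tool_use block for {tool_name!r}")

# Hand-written stand-in for response.content, using plain dicts.
content = [{
    "type": "tool_use",
    "id": "toolu_01",
    "name": "extract_user_info",
    "input": {"name": "Ada Lovelace", "email": "ada@example.com"},
}]

user = extract_forced_call(content, "extract_user_info", ["name", "email"])
```

The optional company field is simply absent here, which is why downstream code should use `user.get("company")` rather than indexing.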
Tool granularity: how many tools is too many?
There's a real performance degradation when you give a model too many tools at once. With 5-10 tools, selection accuracy is high. With 30-40 tools, models start calling the wrong tool more often, especially tools with similar descriptions or overlapping scope.
Some approaches to managing large tool sets:
Dynamic tool selection: Rather than giving the model all tools at every step, determine which tools are relevant to the current task and include only those. A retrieval step over tool descriptions can find the most relevant 5-10 tools for a given user request.
Tool namespacing: Group tools by domain and use multi-step agents where the first agent selects a domain (and therefore a subset of tools) before the task-specific agent executes with the right tool set.
Tool documentation in the system prompt: For complex tools, supplement the schema description with examples of when to use the tool and when not to use it. The schema description field has limited space; move detailed usage guidance to the system prompt.
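The dynamic tool selection approach can be sketched with a toy retrieval step. The version below ranks tools by keyword overlap between the request and each tool's name and description. That scoring is a stand-in chosen to keep the example dependency-free; a production system would more likely use embedding similarity. The catalog, the stopword list, and the `select_tools` name are all assumptions for illustration.

```python
import re

# Common words that would otherwise create meaningless matches.
STOPWORDS = {"a", "an", "the", "for", "by", "in", "of", "is", "what", "get"}

def tokenize(text):
    # Lowercase alphanumeric runs; underscores in tool names act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def select_tools(tools, request, k=5):
    """Return up to k tools whose name or description shares words with the request."""
    wanted = tokenize(request)

    def score(tool):
        return len(wanted & tokenize(tool["name"] + " " + tool["description"]))

    ranked = sorted(tools, key=score, reverse=True)
    return [tool for tool in ranked[:k] if score(tool) > 0]

catalog = [
    {"name": "get_weather", "description": "Get current weather for a city"},
    {"name": "get_stock_price", "description": "Get the latest stock price for a ticker symbol"},
    {"name": "search_documents", "description": "Search internal documents by keyword"},
    {"name": "get_user_by_email", "description": "Look up a user account by email address"},
]

selected = select_tools(catalog, "What is the weather in the city of Paris today?", k=2)
selected_names = [tool["name"] for tool in selected]
```

Only the selected tools would then be passed in the tools parameter for that request.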
The pattern that actually works in production
After experimenting with many different agentic patterns, the one that holds up best in production combines:
- Tight schemas with explicit descriptions on every field
- Parallel calls enabled for independent operations
- Error results always returned with actionable messages
- Max iteration limit as a hard circuit breaker (5-10 steps for most agents)
- Logging of every tool call and result for debugging
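Put together, the loop might look something like the sketch below. `call_model` and `execute_tool` are hypothetical callables you would supply: the first wraps your provider's API and returns a list of content blocks as dicts, the second runs one tool and returns a string. The two fake models exist only to exercise the loop without a network call.

```python
def run_agent(call_model, execute_tool, user_message, max_steps=5):
    """Drive a tool loop until the model stops calling tools or the limit hits."""
    messages = [{"role": "user", "content": user_message}]
    log = []
    for step in range(max_steps):
        content = call_model(messages)
        messages.append({"role": "assistant", "content": content})
        calls = [block for block in content if block["type"] == "tool_use"]
        if not calls:
            return messages, log, "done"
        results = []
        for call in calls:
            block = {"type": "tool_result", "tool_use_id": call["id"]}
            try:
                block["content"] = execute_tool(call["name"], call["input"])
            except Exception as exc:
                # Failures still produce a result so the model can react.
                block["content"] = f"Error: {exc}"
                block["is_error"] = True
            log.append((step, call["name"], call["input"], block["content"]))
            results.append(block)
        messages.append({"role": "user", "content": results})
    # Hard circuit breaker: stop even if the model keeps asking for tools.
    return messages, log, "step_limit"

def looping_model(messages):
    # Fake model that never stops asking for the same tool.
    return [{"type": "tool_use", "id": f"toolu_{len(messages)}",
             "name": "search_documents", "input": {"query": "q"}}]

def one_shot_model(messages):
    # Fake model: one tool call, then a final text answer.
    if len(messages) == 1:
        return looping_model(messages)
    return [{"type": "text", "text": "Found it."}]

def fake_tool(name, args):
    return "no results"

_, stuck_log, stuck_status = run_agent(looping_model, fake_tool, "find x", max_steps=5)
_, ok_log, ok_status = run_agent(one_shot_model, fake_tool, "find x")
```

The returned status distinguishes a normal finish from a tripped limit, and the log holds one entry per tool call for later debugging.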
The complexity of agentic systems grows fast. Keeping the tool interface simple and explicit is what keeps agents debuggable when things go wrong, which they will.
Tool use has matured a lot in 2026. The primitives are stable, the models are reliable on well-defined schemas, and the main remaining challenge is designing the right tool set for your specific domain. That design problem, unlike the API mechanics, doesn't have a general solution.