C1. Built-in Tools: Code Execution, Web Search, and the Tool Use Loop | Claude API Practitioner's Guide | DataMy

About this article: This article is part of Building with Claude — A Practitioner's Guide to the Anthropic API, a study-notes-plus-commentary series based on Anthropic's official "Building with the Claude API" course (hosted on Coursera) and the public Anthropic API documentation at docs.anthropic.com.

Original course and documentation material is © Anthropic. Direct quotes are cited inline. Commentary, code adaptations, and examples are © DataMy. This series is independent and not affiliated with or endorsed by Anthropic.

Companion notebook: C1_builtin_tools.ipynb · Setup: see README.md · Dataset: data/warehouse_usage.csv

Moving from generation to action

Every article in this series up to now has treated Claude as a text transformer: something goes in, text comes out. Tool use changes that picture. Claude becomes an agent that can decide to take actions (call a function, run code, search the web) and incorporate the results into its response.

The Anthropic documentation frames the shift precisely:

"Tools allow Claude to interact with external services and perform actions beyond just generating text responses. By defining tools that Claude can call, you enable it to look up information, perform calculations, interact with APIs, and more." — Anthropic API Docs, "Tool use (function calling)", accessed 2026-06-10.

This article covers the mechanics of tool use from the ground up: how the API works, how to implement the execution loop, and two practical tools that cover the cases you encounter most often in data work — code execution and web search. C2 extends this to multi-tool agent loops with custom function calling.

1. The tool use execution model

Before the API details, a mental model worth holding.

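Tool use is a conversation turn structure, not a new model capability. The model itself does not execute code or search the web. What Claude does is decide to request a tool call, including the exact inputs it wants to pass. For client tools, which are the focus of most of this article, your code executes the tool and the result comes back to Claude as the next turn in the conversation. Claude incorporates the result and either produces a final answer or requests another tool call. Server-side tools such as web search (section 5) follow the same request-then-result pattern, but Anthropic's infrastructure executes them inside a single API call.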

The loop:

You:        messages=[{"role": "user", "content": question}]
Claude:     stop_reason="tool_use", content=[..., ToolUseBlock(id, name, input)]
Your code:  result = execute(name, input)
You:        messages.append(assistant turn, unchanged)
            messages.append({"role": "user", "content": [ToolResultBlock(tool_use_id=id, content=result)]})
            call the API again with the same tools
Claude:     stop_reason="end_turn", content=[TextBlock(final_answer)]

Two things follow from this structure:

For client tools, Claude is not doing the execution. It is generating a description of what it wants done, in a structured format you can parse. The actual execution happens in your environment, under your control.
The loop can repeat. If Claude needs multiple tool calls to answer a question, it will cycle through the loop — tool call, result, tool call, result — until it has what it needs. Your code drives the iteration.

This is what Anthropic calls the "agentic" pattern:

"In agentic contexts, Claude will sometimes act as an orchestrator of multi-agent pipelines and sometimes as a subagent within those pipelines, and sometimes as both. Orchestrators direct agents to use tools or undertake tasks. Subagents implement those instructions, taking actions with real-world consequences." — Anthropic API Docs, "Building with Claude — Agentic and multi-agent frameworks".

2. The API mechanics

Defining a tool

A tool definition has three parts: a name, a description, and a JSON Schema for its inputs. The description is the most important part — it is what Claude reads when deciding whether to call this tool.

tools = [
    {
        "name": "run_python",
        "description": (
            "Execute Python code and return the printed output. "
            "Use this to analyse data, compute statistics, or generate insights. "
            "A pandas DataFrame `df` containing Snowflake warehouse usage data is pre-loaded."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Valid Python code. Use print() to produce output.",
                }
            },
            "required": ["code"],
        },
    }
]

Good descriptions answer three questions: when Claude should use the tool (not every tool is appropriate for every request), what the tool does, and how to construct valid inputs.
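A cheap way to hold yourself to those questions, and to catch malformed definitions before they reach the API, is a local sanity check. The sketch below is a hypothetical helper of our own (check_tool_definition is not part of the Anthropic SDK), not the API's own validation. The name pattern reflects the tool-name constraint in Anthropic's tool use docs at the time of writing and should be re-checked against the current reference; the 40-character description threshold is an arbitrary heuristic.

```python
import re

# Assumption: tool names must match this pattern (per Anthropic's tool use docs
# at the time of writing); re-check against the current API reference.
TOOL_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def check_tool_definition(tool: dict) -> list[str]:
    """Return a list of problems with a client tool definition (empty list = looks OK).

    Hypothetical helper: a local sanity check, not the API's own validation.
    """
    problems = []
    name = tool.get("name", "")
    if not TOOL_NAME_RE.match(name):
        problems.append(f"name {name!r} does not match {TOOL_NAME_RE.pattern}")
    # Arbitrary heuristic: very short descriptions rarely say when to use the
    # tool or how to build its inputs.
    if len(tool.get("description", "").strip()) < 40:
        problems.append("description is missing or very short")
    schema = tool.get("input_schema", {})
    if schema.get("type") != "object":
        problems.append('input_schema.type should be "object"')
    properties = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in properties:
            problems.append(f"required field {field!r} is not defined in properties")
    for field, spec in properties.items():
        if not spec.get("description"):
            problems.append(f"property {field!r} has no description")
    return problems
```

Run over the run_python definition above, this returns an empty list; a definition with a space in its name, an empty description, and a required field missing from properties yields three problems.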

A tool use response

When Claude decides to call a tool, the response has stop_reason="tool_use" and a content list that contains one or more ToolUseBlock objects alongside any text Claude generated before deciding to call the tool:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
messages = [{"role": "user", "content": "Which warehouse had the highest credit spend last week?"}]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4096,
    tools=tools,
    messages=messages,
)

print(response.stop_reason)   # "tool_use" when Claude decides to call a tool
for block in response.content:
    print(block.type)         # "text" and/or "tool_use"
    if block.type == "tool_use":
        print(block.name)     # "run_python"
        print(block.input)    # {"code": "print(df[df.date >= ...].groupby(...)...)"}
        print(block.id)       # "toolu_01..." -- needed for the tool_result

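The block.id is critical. When you return the tool result, you must include the matching ID so Claude knows which tool call produced which result; the API rejects the follow-up request if a tool_use block from the previous assistant turn has no matching tool_result.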

Returning a tool result

The follow-up user message carries the execution result inside a tool_result content block:

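# Pick the tool_use block explicitly: after the loop above, `block` is just the
# last content block, which is not guaranteed to be a tool call.
tool_block = next(b for b in response.content if b.type == "tool_use")
execution_output = execute_python(tool_block.input["code"])  # defined in section 4

messages.append({"role": "assistant", "content": response.content})
messages.append({
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": tool_block.id,
            "content": execution_output,   # the result as a string
        }
    ],
})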

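Then you call the API again with the updated messages and the same tools list. Claude reads the tool result and produces its final answer or, if it needs more information, makes another tool call. If a response contains several tool_use blocks, return one tool_result per block, all in the same user message.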
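A small pre-flight check catches a missing result before the API does, which is easier to debug when Claude issues several tool calls in one turn. This is a hypothetical helper, not part of the SDK; it only assumes that the SDK's content blocks expose .type and .id, as the ToolUseBlock objects above do. SimpleNamespace stands in for real response blocks so the example runs offline.

```python
from types import SimpleNamespace


def unmatched_tool_use_ids(assistant_content, user_content: list[dict]) -> set[str]:
    """Return ids of tool_use blocks in the assistant turn that lack a tool_result.

    assistant_content: response.content (blocks exposing .type and .id).
    user_content: the list of content blocks you are about to send back.
    """
    requested = {b.id for b in assistant_content if b.type == "tool_use"}
    answered = {
        block["tool_use_id"]
        for block in user_content
        if block.get("type") == "tool_result"
    }
    return requested - answered


# Two parallel tool calls, but only one result prepared:
content = [
    SimpleNamespace(type="text", text="Let me check both warehouses."),
    SimpleNamespace(type="tool_use", id="toolu_a"),
    SimpleNamespace(type="tool_use", id="toolu_b"),
]
pending = [{"type": "tool_result", "tool_use_id": "toolu_a", "content": "42"}]
print(unmatched_tool_use_ids(content, pending))  # {'toolu_b'}
```

Raising when the returned set is non-empty turns a 400 response from the API into a local error that names the missing call.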

3. The execution loop

The pattern above, wrapped into a reusable function:

import anthropic

def run_tool_loop(
    messages: list[dict],
    tools: list[dict],
    execute_tool_fn,
    *,
    model: str,
    max_tokens: int = 4096,
    max_turns: int = 10,
) -> anthropic.types.Message:
    """Run Claude with tools until it stops requesting them or max_turns is reached.

    execute_tool_fn(name, input_dict) -> str
        Called by the loop whenever Claude requests a tool. Must return a string.

    `messages` is extended in place, so the caller keeps the full transcript.
    """
    client = anthropic.Anthropic()
    for _ in range(max_turns):
        response = client.messages.create(
            model=model,
            max_tokens=max_tokens,
            tools=tools,
            messages=messages,
        )

        if response.stop_reason != "tool_use":
            # "end_turn" is the normal finish. Any other reason (e.g. "max_tokens")
            # would resend the identical request if we looped, so hand it back.
            return response

        # Echo the assistant turn, then answer every tool_use block in one user turn.
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool_fn(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
        messages.append({"role": "user", "content": tool_results})

    raise RuntimeError(f"Tool loop reached max_turns={max_turns} without end_turn.")

The loop is simple because the pattern is simple: keep cycling until end_turn, routing each tool_use block to your dispatcher.

A note on max_turns: always set one. An agent loop that calls tools indefinitely is both expensive and hard to debug. Ten turns is a generous upper bound for most data analysis tasks.

4. Code execution as a tool

The most practical client-side tool for data work: give Claude access to a Python interpreter that you run, and let it write and execute analysis code against your data.

import contextlib
import io
import traceback

import pandas as pd

df = pd.read_csv("data/warehouse_usage.csv", parse_dates=["date"])

def execute_python(code: str) -> str:
    """Execute Python code in a namespace that includes pd and df. Capture stdout."""
    # Fresh namespace per call; df.copy() keeps in-place edits (e.g. inplace=True)
    # from leaking into later tool calls.
    namespace = {"pd": pd, "df": df.copy()}
    buf = io.StringIO()
    try:
        # redirect_stdout restores sys.stdout even if the code raises.
        with contextlib.redirect_stdout(buf):
            exec(compile(code, "<tool>", "exec"), namespace)
    except Exception:
        # Return the traceback as text so Claude can read it and fix its code.
        return f"EXECUTION ERROR:\n{traceback.format_exc()}"
    return buf.getvalue().strip() or "(no output -- did you use print()?)"

Because the tool description tells Claude that a DataFrame named df is pre-loaded, it can write arbitrary pandas code against it to answer the user's question. If the code has a bug, the traceback is returned as the tool result and Claude typically corrects itself on the next turn: a natural self-correction loop without any explicit retry logic.

What Claude does well with code execution

Exploratory analysis: "find the 90th-percentile daily credit spend per warehouse". Claude usually generates a correct pandas groupby/quantile aggregation on the first try.
Anomaly detection: "flag any date where credits exceeded 3× the warehouse's median". Claude typically writes the per-warehouse median comparison correctly, and handles z-score or IQR rules when those are what you ask for.
Formatting results: "produce a summary table of the top three cost drivers". Claude uses df.to_string() or manual formatting to return human-readable output.
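For the anomaly-detection request above, a typical answer is a grouped median plus a comparison. The sketch below is our own illustration of that pattern, not captured model output; the column names warehouse and credits are assumptions about warehouse_usage.csv, and the rows are made up.

```python
import pandas as pd


def flag_credit_spikes(df: pd.DataFrame, factor: float = 3.0) -> pd.DataFrame:
    """Return rows where credits exceed `factor` x that warehouse's median credits."""
    # transform("median") broadcasts each warehouse's median back onto its rows,
    # so the comparison stays row-aligned with df.
    warehouse_median = df.groupby("warehouse")["credits"].transform("median")
    return df[df["credits"] > factor * warehouse_median]


# Made-up rows; the column names are assumptions about the real dataset.
usage = pd.DataFrame({
    "date": pd.to_datetime(["2025-07-20", "2025-07-21", "2025-07-22"] * 2),
    "warehouse": ["WH_BI_M"] * 3 + ["WH_ETL_L"] * 3,
    "credits": [4.0, 5.0, 16.0, 10.0, 11.0, 12.0],
})
print(flag_credit_spikes(usage).to_string(index=False))
```

Here WH_BI_M has a median of 5.0, so only its 16.0-credit day clears the 15.0 threshold; using transform rather than a separate aggregate-and-merge keeps the original row order and index intact.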

Safety considerations

The exec() approach is acceptable in a notebook or a trusted internal tool. For production systems handling arbitrary user input, sandbox properly: run code in a subprocess with resource limits, use a container with no network access, or delegate to a dedicated code execution service. The code Claude writes is shaped by that input, so treat it as untrusted and never exec() it in an unrestricted production process.
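A first step beyond in-process exec() is a child process with a timeout. This stdlib-only sketch is deliberately incomplete: it stops runaway loops and keeps crashes out of your process, but it does not block network or filesystem access, so it is not a sandbox on its own; pair it with OS resource limits or a container as described above. The function name and the preamble parameter are hypothetical.

```python
import subprocess
import sys


def execute_python_subprocess(code: str, timeout_s: float = 10.0, preamble: str = "") -> str:
    """Run code in a separate Python process with a wall-clock timeout.

    `preamble` is prepended to the code (end it with a newline), e.g. lines that
    import pandas and load the CSV, because the child does not share your df.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", preamble + code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child if this expires
        )
    except subprocess.TimeoutExpired:
        return f"EXECUTION ERROR: timed out after {timeout_s} seconds"
    if proc.returncode != 0:
        # Return stderr (the traceback) so Claude can correct its code.
        return f"EXECUTION ERROR:\n{proc.stderr.strip()}"
    return proc.stdout.strip() or "(no output -- did you use print()?)"
```

The -I flag ignores PYTHON* environment variables and the user site-packages directory; if pandas is installed only in user site-packages, drop it. Loading the CSV in every call costs time, which is the price of a fresh, disposable process per tool call.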

5. Built-in server-side tools: web search

Some tools are executed by Anthropic's infrastructure rather than your code. The web search tool is the primary example: you declare it in the tools list with a special type identifier, and when Claude calls it, Anthropic performs the search and injects the results into the conversation automatically.

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    tools=[
        {
            "type": "web_search_20250305",
            "name": "web_search",
        }
    ],
    messages=[{
        "role": "user",
        "content": "What is the current Snowflake enterprise credit pricing for US East?",
    }],
)

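The key difference from client-executed tools: you do not handle the tool result. Anthropic executes the search, the results appear in Claude's context, and the response arrives with the answer already grounded in them. The response content records what happened: server_tool_use blocks hold the queries Claude issued, web_search_tool_result blocks hold the results, and text blocks carry the answer. Because those calls were resolved on Anthropic's side, stop_reason is normally end_turn and you do not need to implement an execution loop; in the common case the conversation resolves in one API call. If a long-running turn comes back with stop_reason="pause_turn", send the response back as the assistant turn so Claude can continue.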
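To see what the search actually did, walk the content blocks. This hypothetical helper relies only on the block types described above (server_tool_use with an input such as {"query": ...}, and text blocks with .text); the stand-in response is fabricated for offline testing and is not real API output.

```python
from types import SimpleNamespace


def summarise_server_tool_response(response) -> dict:
    """Collect the searches Claude ran and the answer text from a web search response."""
    queries = [
        block.input.get("query")
        for block in response.content
        if block.type == "server_tool_use" and block.name == "web_search"
    ]
    # The answer may be split across several text blocks (e.g. around citations),
    # so concatenate them in order.
    answer = "".join(block.text for block in response.content if block.type == "text")
    return {"stop_reason": response.stop_reason, "queries": queries, "answer": answer}


# Offline stand-in shaped like a web search response; NOT real API output.
fake_response = SimpleNamespace(
    stop_reason="end_turn",
    content=[
        SimpleNamespace(type="server_tool_use", name="web_search",
                        input={"query": "Snowflake enterprise credit price US East"}),
        SimpleNamespace(type="web_search_tool_result", content=[]),
        SimpleNamespace(type="text", text="(answer text, "),
        SimpleNamespace(type="text", text="continued)"),
    ],
)
print(summarise_server_tool_response(fake_response))
```

Logging the queries alongside the answer makes it easy to spot when Claude searched for something other than what the user asked.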

Server-side vs client-side: the architectural distinction

Aspect	Server-side (web search)	Client-side (run_python)
Who executes	Anthropic's infrastructure	Your code
Tool result handling	Automatic	Your execution loop
Data access	Public internet only	Your environment, your data
Customisation	Limited to what the API exposes	Unlimited
API round trips	One (searches run inside the call)	One per tool turn

Use server-side tools when Anthropic has already built the right connector. Use client-side tools whenever the execution needs access to your own data, APIs, or environment — which is most of the time in enterprise data work.

6. `tool_choice`: controlling when Claude reaches for tools

By default, Claude decides autonomously whether to use a tool (tool_choice={"type": "auto"}). Three other modes are available:

# Force Claude to use at least one tool
tool_choice = {"type": "any"}

# Force Claude to use a specific tool
tool_choice = {"type": "tool", "name": "run_python"}

# Prevent tool use entirely (despite tools being defined)
tool_choice = {"type": "none"}

{"type": "tool", "name": ...} was introduced in B1 as a structured-output enforcer. The same mechanism appears here in its natural habitat: forcing a code execution call when you want to guarantee Claude analyses the data rather than answering from its training knowledge.

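For most data analysis use cases, {"type": "auto"} is correct: let Claude decide when it needs to run code and when it can answer directly. Reserve {"type": "tool"} for cases where you specifically need a guaranteed execution trace, and apply it to the first request only. With "any" or "tool" set, every response must be a tool call, so a loop that forces it on every turn can never reach end_turn.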
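A minimal sketch of scoping the override to the first request, assuming you adapt run_tool_loop to pass tool_choice on each call (the version in section 3 does not take that parameter). tool_choice_for_turn is a hypothetical helper.

```python
from typing import Optional


def tool_choice_for_turn(turn: int, forced_tool: Optional[str] = None) -> dict:
    """Return the tool_choice for a 0-based loop turn.

    Force `forced_tool` on the first request only; from the second request on,
    fall back to "auto" so Claude can finish with end_turn once it has results.
    """
    if turn == 0 and forced_tool is not None:
        return {"type": "tool", "name": forced_tool}
    return {"type": "auto"}


# Inside an adapted loop, pass it per request, e.g.:
#   client.messages.create(..., tool_choice=tool_choice_for_turn(turn, "run_python"))
```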

7. Handling tool errors gracefully

Two error patterns worth designing for explicitly.

Execution errors: The tool ran but raised an exception. Return the traceback as the content of the tool_result. Claude reads the error, works out what went wrong, and typically corrects its code on the next turn. This self-correction is usually reliable for syntax errors, missing column names, and type mismatches.

Empty results: The tool ran but produced no useful output (e.g., a query returned zero rows). Return a clear string ("No results for this filter") rather than an empty string. An empty tool result can confuse Claude's next-turn reasoning; an explicit "no results" message is unambiguous.

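# Dispatch table: tool name -> function taking the tool_input dict.
TOOL_DISPATCH = {
    "run_python": lambda tool_input: execute_python(tool_input["code"]),
}

def safe_execute(name: str, tool_input: dict) -> str:
    # Look the tool up separately: a KeyError raised *inside* a tool (e.g. a
    # missing DataFrame column) must not be reported as an unknown tool.
    handler = TOOL_DISPATCH.get(name)
    if handler is None:
        return f"ERROR: unknown tool '{name}'"
    try:
        raw = str(handler(tool_input))
    except Exception as e:
        return f"EXECUTION ERROR: {type(e).__name__}: {e}"
    return raw if raw.strip() else "(tool returned no output)"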
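The Messages API also lets you mark a failed call explicitly: a tool_result block accepts an optional is_error flag. A minimal sketch, assuming your tools signal failure with the "ERROR" and "EXECUTION ERROR" prefixes used in this article; that prefix convention and the helper name are ours, not the API's.

```python
def make_tool_result(tool_use_id: str, output: str) -> dict:
    """Build a tool_result block, flagging error strings with is_error=True.

    Assumption: errors are signalled by the "ERROR" / "EXECUTION ERROR" prefixes
    used by this article's tools.
    """
    block = {"type": "tool_result", "tool_use_id": tool_use_id, "content": output}
    if output.startswith(("ERROR", "EXECUTION ERROR")):
        block["is_error"] = True  # tells Claude the call failed, not that it returned data
    return block
```

Keeping the traceback in content while setting is_error gives Claude both the signal and the detail it needs to fix its next attempt.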

Practitioner Notes

Write tool descriptions for Claude, not for humans. The description is the primary signal Claude uses to decide whether to call a tool. "Execute Python code" is worse than "Execute Python code against the warehouse_usage DataFrame to answer quantitative questions about Snowflake credit spend." The more specific the description, the more reliable the tool selection.
Return structured text from tools. Claude integrates tool output into its reasoning. A plain number ("14.7") gives Claude less to work with than "Total credits for WH_BI_M on 2025-07-22: 14.7 (3-sigma above the 90-day median of 4.8)." The extra context helps Claude draw the right conclusions.
Log every tool call in production. The tool call log is your audit trail: which tool Claude chose, what input it generated, what result it got, and how it used the result. Without it, debugging unexpected agent behaviour is nearly impossible.
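A minimal sketch of that audit trail using only the standard library. logged_tool, the logger name tool_calls, and the log format are our own choices, and the wrapper assumes tools take a tool_input dict and return a string; in production you would likely send these records to structured logging rather than plain text.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("tool_calls")


def logged_tool(name: str, fn):
    """Wrap a tool function (tool_input dict -> str) so every call is logged."""
    @functools.wraps(fn)
    def wrapper(tool_input: dict) -> str:
        payload = json.dumps(tool_input, default=str)  # the input Claude generated
        start = time.perf_counter()
        try:
            result = fn(tool_input)
        except Exception:
            # Log the failing input with its traceback, then let the caller handle it.
            logger.exception("tool=%s input=%s raised", name, payload)
            raise
        logger.info(
            "tool=%s input=%s elapsed_ms=%.1f result_chars=%d result_head=%r",
            name, payload, (time.perf_counter() - start) * 1000,
            len(result), result[:200],
        )
        return result
    return wrapper
```

To wire it in, wrap the entries of a dispatch table such as TOOL_DISPATCH in section 7, for example logged_tool("run_python", lambda tool_input: execute_python(tool_input["code"])). Pair these records with the saved messages transcript to see how Claude used each result.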
Add a max_turns ceiling before you add a retry. A common cause of runaway tool loops is not a logic error but a tool that always returns an error: Claude keeps retrying, and every turn resends the growing conversation, so you pay for input as well as output tokens on each one. A hard ceiling is cheaper and safer than elaborate retry logic.
Test each tool independently before testing the agent. The tool execution function is just a Python function. Test it with a fixed input/output pair before attaching it to the Claude loop. This isolates bugs in the tool from bugs in the agent's reasoning.
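In practice that can be an ordinary pytest-style test function. The tool below is hypothetical (a median_credits tool is not part of this article's notebook); what matters is the shape: fixed inputs, exact expected strings, and an edge case that exercises the "no results" path.

```python
import statistics


def median_credits(tool_input: dict) -> str:
    """Hypothetical tool: median daily credits, returned as structured text."""
    values = tool_input.get("credits", [])
    if not values:
        # Explicit "no results" text rather than an empty string.
        return "No credit values supplied."
    median = statistics.median(values)
    return f"Median daily credits over {len(values)} days: {median:.1f}"


def test_median_credits() -> None:
    # Fixed input/output pairs: no Claude, no API key, no network.
    assert median_credits({"credits": [4.8, 14.7, 5.2]}) == "Median daily credits over 3 days: 5.2"
    assert median_credits({"credits": []}) == "No credit values supplied."
    assert median_credits({}) == "No credit values supplied."


test_median_credits()
```

Once a tool passes tests like these, any odd behaviour in the full loop points at the agent's reasoning or the tool description rather than the tool itself.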

Beyond the Docs

The course covers tool use as a feature for "extending Claude's capabilities." Two architectural points the course documentation leaves implicit:

Tool use is how Claude interacts with your environment, not just with APIs. The execution loop described here is the same loop that drives database queries, file writes, Slack notifications, and Snowflake warehouse operations. Every agentic action in C2, D1, and D2 runs through this same structure. Understanding the loop at the API level — the turn structure, the tool_use block, the tool_result message — is the prerequisite for everything that comes after.
The quality of tool output determines answer quality more than prompt quality. A well-written system prompt cannot compensate for a tool that returns ambiguous results or empty strings on edge cases. In production agentic systems, time spent improving tool reliability and output clarity returns more than the same time spent refining the system prompt.

Previous: B5 — RAG Advanced: Reranking & Contextual Retrieval · Next: C2 — Custom Tools & Function Calling · Series index: Building with the Claude API — A Practitioner's Guide