C2. Custom Tools: Building a Multi-Tool Data Platform Agent

About this article

This article is part of Building with Claude: A Practitioner's Guide to the Anthropic API, a study-notes-plus-commentary series based on Anthropic's official "Building with the Claude API" course (hosted on Coursera) and the public Anthropic API documentation at docs.anthropic.com.

Original course and documentation material is © Anthropic. Direct quotes are cited inline. Commentary, code adaptations, and examples are © DataMy. This series is independent and not affiliated with or endorsed by Anthropic.

Companion notebook: C2_custom_tools.ipynb
Setup: see README.md
Datasets: data/warehouse_usage.csv, data/pipeline_jobs.csv

From one tool to a tool suite

C1 established the fundamental tool use loop with a single run_python tool. Real agentic applications almost never stop at one. A data platform monitoring agent needs to check pipeline status, query cost data, verify data freshness, and fire alerts — each a distinct operation, each best expressed as its own tool.

This article builds that agent. The mechanics from C1 apply unchanged: define tools, run the loop, execute tool calls, return results. The new skill in C2 is tool design — deciding what tools to create, how to write their descriptions so Claude selects them correctly, how parallel tool calls work, and how to handle tools that produce side effects.

1. Designing tool schemas: the description is the interface

When Claude decides which tool to call, it reads your tool definitions the way a developer reads an API reference. The description is the primary signal — it determines both when Claude reaches for a tool and how it constructs the inputs.

"Provide extremely detailed descriptions. This is by far the most important thing you can do to improve tool performance. Describe what the tool does, when it should be used (and when it shouldn't), what each parameter means, and any important limitations or edge cases." — Anthropic API Docs, "Tool use — best practices for tool definitions", accessed 2026-06-10.

The difference between a tool Claude uses correctly and one it misuses is almost always in the description. Compare two versions of the same tool:

# Underspecified -- Claude may call this when it should call run_python instead
{
    "name": "get_job_status",
    "description": "Returns job status data from the pipeline jobs table.",
    ...
}

# Precise -- tells Claude exactly when to use this tool vs. alternatives
{
    "name": "get_job_status",
    "description": (
        "Look up the recent run history for a named pipeline job. "
        "Call this when the user asks about pipeline health, recent failures, "
        "job duration trends, or whether a specific job ran recently. "
        "Returns the last N runs with status, duration, credits, and error messages. "
        "For cross-dataset analysis (e.g. correlating job failures with warehouse cost), "
        "call this alongside get_warehouse_spend in the same turn."
    ),
    ...
}

The second version does three things the first does not: it names the triggering intent ("pipeline health, recent failures, job duration trends"), it distinguishes this tool from related tools, and it explicitly hints at when to call multiple tools together.

Schema design principles

A few rules that hold across tool suites of any size:

Single responsibility. Each tool should do one thing. A get_pipeline_and_warehouse_data tool that returns both job history and spend figures looks convenient but forces Claude to call a heavy tool even when it only needs one of the two datasets. Two separate tools that each do one thing give Claude finer-grained control.

Granular parameters over flags. get_job_status(job_name, days=7) is better than get_job_status(job_name, include_failures=True, include_recent=True, days=7). Flags that are almost always True are noise; omit them or bake them into the tool's default behaviour.

Explicit enum values in the description. If a parameter only accepts specific values ("status must be one of: success, failed, running, skipped"), put that in the description as well as in the JSON Schema enum field. Claude sees both, but the enum can only list the allowed values; the description is where you explain what each value means and when to use it.
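As a sketch of these principles applied to get_job_status, the definition below uses the name / description / input_schema shape that the Messages API expects for a custom tool. The description wording, the parameter descriptions, and the choice to make only job_name required are illustrative assumptions based on the examples in this article, not the companion notebook's actual definition.

```python
# Illustrative tool definition. Wording and parameter details are assumptions.
STATUS_VALUES = ["success", "failed", "running", "skipped"]

get_job_status_tool = {
    "name": "get_job_status",
    "description": (
        "Look up the recent run history for a named pipeline job. "
        "Call this when the user asks about pipeline health, recent failures, "
        "job duration trends, or whether a specific job ran recently. "
        # Allowed values are spelled out in prose as well as in the enum below.
        "status_filter must be one of: " + ", ".join(STATUS_VALUES) + ". "
        "Omit status_filter to return runs of every status."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "job_name": {
                "type": "string",
                "description": "Exact pipeline job name, e.g. dbt_fct_subscriptions.",
            },
            "days": {
                "type": "integer",
                "description": "How many days of history to return. Defaults to 7.",
            },
            "status_filter": {
                "type": "string",
                "enum": STATUS_VALUES,
                "description": "Return only runs with this status.",
            },
        },
        # Single responsibility: one job per call, so only the name is mandatory.
        "required": ["job_name"],
    },
}
```

Keeping the allowed values in one list and reusing it for both the description and the enum means the two cannot drift apart.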

2. The four-tool monitoring suite

The companion notebook builds a monitoring agent for Acme's data platform using two datasets (warehouse_usage.csv and pipeline_jobs.csv) and four custom tools:

get_job_status(job_name, days=7, status_filter=None) Returns recent run history for a named pipeline job: started_at, duration, status, credits, error_message. Use to answer questions about pipeline health, recent failures, or job duration trends.

get_warehouse_spend(warehouse_name, start_date, end_date) Returns daily credit consumption for a named warehouse over a date range. Use for cost trend analysis and anomaly detection.

check_data_freshness(job_name) Returns the time elapsed since the last successful run of a job, and whether it is within its SLA window (defined as 26 hours for daily jobs). Use when the user asks "is this table up to date?" or "when did X last run successfully?"

send_alert(severity, title, message) Records an alert to the session log (simulating a real alerting system). Severity is one of: info, warning, critical. Call this when you identify a condition requiring human attention. This action cannot be undone — use critical severity only for genuine incidents.

The four tools together cover the monitoring cycle: detect a problem (get_job_status, get_warehouse_spend), assess severity (check_data_freshness), and act (send_alert).
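To make one of these tools concrete, here is a minimal sketch of how check_data_freshness might be implemented against the 26-hour SLA window. The RUNS rows are hypothetical stand-ins for pipeline_jobs.csv, and the extra now and runs parameters are assumptions added so the function can be exercised with fixed inputs; the tool exposed to Claude takes only job_name.

```python
from datetime import datetime, timedelta

SLA_WINDOW = timedelta(hours=26)  # SLA window for daily jobs, as defined above

# Hypothetical rows standing in for pipeline_jobs.csv
RUNS = [
    {"job_name": "dbt_fct_subscriptions",
     "started_at": datetime(2025, 7, 14, 2, 0), "status": "success"},
    {"job_name": "dbt_fct_subscriptions",
     "started_at": datetime(2025, 7, 15, 2, 0), "status": "failed"},
]

def check_data_freshness(job_name, now, runs=RUNS):
    """Report hours since the last successful run and whether it is inside the SLA."""
    successes = [r["started_at"] for r in runs
                 if r["job_name"] == job_name and r["status"] == "success"]
    if not successes:
        return f"{job_name}: no successful run found."
    last_success = max(successes)
    age = now - last_success
    hours = age.total_seconds() / 3600
    verdict = "within SLA" if age <= SLA_WINDOW else "SLA BREACHED"
    return (f"{job_name}: last success {last_success:%Y-%m-%d %H:%M}, "
            f"{hours:.1f}h ago, {verdict} (window 26h).")

print(check_data_freshness("dbt_fct_subscriptions", now=datetime(2025, 7, 15, 12, 0)))
```

The failed run on 2025-07-15 is ignored, so the age is measured from the last success on 2025-07-14 and the job is reported as outside its window.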

3. Parallel tool calls

When Claude determines that two pieces of information are independent, it may call both tools in a single response turn rather than sequentially. This is one of the more powerful behaviours in the agentic pattern:

"Claude may call multiple tools in parallel when processing a request. This is especially useful when multiple tool calls are independent of each other, allowing Claude to gather information faster." — Anthropic API Docs, "Tool use — parallel tool use".

The API signal: multiple tool_use blocks appear in response.content of the same response. The execution loop handles this naturally: every tool_use block is executed, and all results are returned together in a single user message:

tool_results = []
for block in response.content:
    if block.type == "tool_use":
        result = dispatch(block.name, block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,   # must match the block that requested it
            "content": str(result),
        })
# ALL results for this turn go in one user message
messages.append({"role": "user", "content": tool_results})

A critical constraint: all tool_result blocks for a given assistant turn must be returned in a single user message. Sending them as separate messages is an API error.
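The single-message constraint can be checked without calling the API. The sketch below uses SimpleNamespace objects as stand-ins for the content blocks of one assistant response; the block ids, the tool inputs, and the echoing dispatch function are all made up for illustration.

```python
from types import SimpleNamespace

# Stand-ins for the content blocks of ONE assistant response (ids are made up).
response_content = [
    SimpleNamespace(type="text", text="I'll check both sources."),
    SimpleNamespace(type="tool_use", id="toolu_01", name="get_job_status",
                    input={"job_name": "dbt_fct_subscriptions", "days": 7}),
    SimpleNamespace(type="tool_use", id="toolu_02", name="get_warehouse_spend",
                    input={"warehouse_name": "WH_BI_M",
                           "start_date": "2025-07-14", "end_date": "2025-07-20"}),
]

def dispatch(name, tool_input):
    # Hypothetical dispatcher: echoes the call instead of querying real data.
    return f"{name} called with {tool_input}"

def build_tool_result_message(content):
    """Collect one tool_result per tool_use block into a single user message."""
    results = [
        {"type": "tool_result",
         "tool_use_id": block.id,
         "content": str(dispatch(block.name, block.input))}
        for block in content
        if block.type == "tool_use"  # text blocks need no result
    ]
    return {"role": "user", "content": results}

message = build_tool_result_message(response_content)
print(len(message["content"]))                          # 2
print([r["tool_use_id"] for r in message["content"]])   # ['toolu_01', 'toolu_02']
```

Two tool_use blocks produce one user message carrying two tool_result blocks, each tied back to its request by tool_use_id.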

A question that tends to trigger parallel calls: "Is there a correlation between the pipeline failures on 2025-07-15 and the warehouse cost data for that week?" Claude recognises that both get_job_status and get_warehouse_spend are needed and that neither depends on the other, so it requests both in the same response.

4. Tool chaining: output informs input

Parallel calls work when tools are independent. A different pattern — tool chaining — applies when the output of one tool determines the input to the next.

Example: "Which job has the worst SLA compliance rate this month, and when did it last fail?"

1. Claude calls get_job_status with a broad date range to identify all jobs with failures.
2. From the result, it identifies dbt_fct_subscriptions as the worst offender.
3. Claude calls check_data_freshness("dbt_fct_subscriptions") to get the last-success timestamp.
4. Claude produces its final answer.

Steps 1 and 3 are sequential — step 3 requires the output of step 1. The loop handles this automatically: each tool call returns a result that appears in Claude's context for the next turn, and Claude decides what to do next.

No special orchestration code is needed. The loop from C1 (cycle until end_turn or max_turns) handles both parallel and chained patterns correctly.
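A minimal sketch of such a loop, exercised with a scripted fake model so the chained pattern can be seen end to end. The create callable is an assumption standing in for a call like client.messages.create with the model, tools, and max_tokens already bound; the scripted responses and stub dispatcher are invented for the demonstration and are not the C1 notebook's code.

```python
from types import SimpleNamespace

def run_agent(create, messages, dispatch, max_turns=10):
    """Cycle until Claude stops requesting tools or max_turns is reached."""
    for _ in range(max_turns):
        response = create(messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response  # end_turn or another terminal stop reason
        # One tool_result per tool_use block, all in a single user message.
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": str(dispatch(b.name, b.input))}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return None  # max_turns exhausted without a final answer

def tool_call(call_id, name, tool_input):
    return SimpleNamespace(type="tool_use", id=call_id, name=name, input=tool_input)

# Scripted fake model: job status first, then freshness, then the answer.
scripted = iter([
    SimpleNamespace(stop_reason="tool_use", content=[
        tool_call("t1", "get_job_status", {"job_name": "dbt_fct_subscriptions", "days": 30})]),
    SimpleNamespace(stop_reason="tool_use", content=[
        tool_call("t2", "check_data_freshness", {"job_name": "dbt_fct_subscriptions"})]),
    SimpleNamespace(stop_reason="end_turn", content=[
        SimpleNamespace(type="text", text="final answer")]),
])

calls = []
def fake_dispatch(name, tool_input):
    calls.append(name)  # record the order in which tools were requested
    return f"stub result for {name}"

messages = [{"role": "user", "content": "Which job has the worst SLA compliance?"}]
final = run_agent(lambda msgs: next(scripted), messages, fake_dispatch)
print(calls)  # ['get_job_status', 'check_data_freshness']
```

The loop itself contains no chaining logic: the second call happens only because the first result is already in messages when the model is asked again.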

5. Side-effect tools: tools that write state

send_alert is different from the first three tools in one critical way: calling it has consequences outside the current conversation. In a real deployment, this might page an on-call engineer, create a ticket, or trigger an automated remediation workflow.

Design rules for side-effect tools:

Describe the consequence, not just the action. "Records an alert to the incident log and notifies the on-call team via PagerDuty" is more accurate than "sends an alert." Claude uses the description to decide whether the stakes justify the call.

Use irreversibility as a selector. A tool that writes to a staging log is different from one that pages an engineer. If your system has both, expose them as separate tools with explicit severity levels. Reserve the disruptive action for conditions that justify it.

Log every call. Unlike read-only tools, side-effect tool calls leave traces in the world. Your execution loop should log the full (tool_name, input, result, turn, timestamp) tuple for every side-effect tool call. This is your audit trail.
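One way to implement the audit trail is to wrap the dispatcher, as in the sketch below. The SIDE_EFFECT_TOOLS set, the in-memory action_log list, the handlers mapping, and the stand-in send_alert are all assumptions for illustration; a real deployment would write to durable storage.

```python
import json
from datetime import datetime, timezone

SIDE_EFFECT_TOOLS = {"send_alert"}  # tools whose calls change state outside the chat
action_log = []                      # in production: durable storage, not a list

def dispatch_with_audit(name, tool_input, turn, handlers):
    """Run a tool and, if it has side effects, record the full call."""
    result = handlers[name](**tool_input)
    if name in SIDE_EFFECT_TOOLS:
        action_log.append({
            "tool_name": name,
            "input": tool_input,
            "result": str(result),
            "turn": turn,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
    return result

def send_alert(severity, title, message):
    # Hypothetical stand-in: formats the alert instead of paging anyone.
    return f"[{severity.upper()}] {title}: {message}"

handlers = {"send_alert": send_alert}
dispatch_with_audit(
    "send_alert",
    {"severity": "warning", "title": "Stale table",
     "message": "dbt_fct_subscriptions is outside its SLA window"},
    turn=3,
    handlers=handlers,
)
print(json.dumps(action_log[-1], indent=2))
```

Because the check lives in the dispatcher rather than in each tool, adding a new side-effect tool only requires adding its name to SIDE_EFFECT_TOOLS.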

The Anthropic documentation on agentic footprint applies directly:

"Prefer reversible over irreversible actions, and err on the side of doing less and confirming with users when uncertain about intended scope in order to preserve human oversight." — Anthropic API Docs, "Build with Claude — Agentic and multi-agent frameworks", accessed 2026-06-10.

6. When tools beat prompting

A tool is the right choice when the required information is:

Dynamic: it changes faster than a system prompt update cycle (daily pipeline run history, hourly warehouse credits)
Precise and computed: it requires arithmetic over a dataset rather than text retrieval (90th-percentile credit spend, z-score anomaly detection)
Scoped to the query: only a subset of the data is relevant to each question, so sending all 630 rows of warehouse_usage.csv in every prompt wastes tokens on rows the question does not need

A tool is the wrong choice when:

The information fits comfortably in the system prompt and changes infrequently (e.g., the list of warehouse names and their owners)
The question is structural ("what does this runbook say about auto-suspend?") — RAG (B4/B5) handles this better
The tool just wraps a static lookup that produces the same answer every time

The clearest test: if the answer would be the same regardless of the current date, it probably belongs in the system prompt or a RAG index. If the answer requires the model to see today's data, it belongs in a tool.

7. Structuring the tool output for Claude

How a tool formats its output has a larger effect on answer quality than most practitioners expect. Claude reasons over the tool output exactly as it is returned, so ambiguous output produces ambiguous conclusions.

Compare two versions of the same data:

# Sparse output -- Claude has to infer the context
WH_BI_M,2025-07-22,8.37
WH_BI_M,2025-07-23,8.91
WH_BI_M,2025-07-24,9.12

# Contextualised output -- Claude knows what it is looking at
WH_BI_M daily credits (2025-07-20 to 2025-07-25):
  date        credits  vs_baseline  note
  2025-07-20  5.92     +0%
  2025-07-21  5.88     -1%
  2025-07-22  8.37     +42%         ANOMALY (>1.25x baseline)
  2025-07-23  8.91     +51%         ANOMALY
  2025-07-24  9.12     +54%         ANOMALY
  2025-07-25  6.01     +2%
  Baseline (30-day median): 5.91 credits/day

The second version takes only a few lines of formatting work in the tool function. It is the difference between Claude saying "there appear to be elevated credits in late July" and "WH_BI_M experienced a 42–54% cost elevation from 2025-07-22 through 2025-07-24, consistent with the Tableau extract proliferation incident documented in the runbook."
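A sketch of a formatter that produces the contextualised layout. The function name format_spend and its parameters are hypothetical; the baseline is passed in as a number because computing the 30-day median needs history that is not shown here, and this version repeats the threshold on every anomalous row, where the sample output shows it only once.

```python
def format_spend(warehouse, rows, baseline, threshold=1.25):
    """Render (date, credits) rows with a baseline comparison and anomaly flag."""
    lines = [
        f"{warehouse} daily credits ({rows[0][0]} to {rows[-1][0]}):",
        "  date        credits  vs_baseline  note",
    ]
    for date, credits in rows:
        pct_text = f"{(credits / baseline - 1) * 100:+.0f}%"
        is_anomaly = credits > threshold * baseline
        note = f"ANOMALY (>{threshold}x baseline)" if is_anomaly else ""
        # Fixed-width columns; rstrip drops padding on rows without a note.
        lines.append(f"  {date:<10}  {credits:<7.2f}  {pct_text:<11}  {note}".rstrip())
    lines.append(f"  Baseline (30-day median): {baseline:.2f} credits/day")
    return "\n".join(lines)

rows = [
    ("2025-07-20", 5.92), ("2025-07-21", 5.88), ("2025-07-22", 8.37),
    ("2025-07-23", 8.91), ("2025-07-24", 9.12), ("2025-07-25", 6.01),
]
print(format_spend("WH_BI_M", rows, baseline=5.91))
```

The anomaly rule lives in one place (the threshold argument), so changing what counts as anomalous does not require touching the prompt.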

Practitioner Notes

Write the description before writing the implementation. If you cannot describe when a tool should be called in two sentences, the tool's scope is unclear. Clarify the scope, then implement.
Return strings, not dicts or objects. The content of a tool_result is a string (or a list of content blocks), not an arbitrary Python object. Convert everything to a formatted string before returning; a pandas DataFrame printed with to_string() is often easier to read than a raw JSON dict.
Test each tool with a fixed input/output before attaching it to the loop. A tool that returns the wrong thing for a known input will mislead Claude silently. Unit test the tool functions independently before agent integration.
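Use tool_choice={"type": "any"} for monitoring agents, on the first request only. When you want the agent to ground its answer in current data rather than inference, force at least one tool call. "any" makes Claude call some tool without forcing a specific one. Because it forces a tool call on every request it is attached to, switch back to {"type": "auto"} after the first turn; otherwise the loop never reaches end_turn.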
Audit side-effect tools separately from read-only tools. Keep two logs in your execution loop: a read log (get_job_status, get_warehouse_spend) and an action log (send_alert). Reviewing the action log tells you what the agent did; reviewing the read log tells you what it knew.
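A minimal sketch of applying tool_choice per turn, so that only the first request forces a tool call and later requests leave Claude free to finish with a text answer. The helper names are hypothetical, and model and max_tokens are omitted from the request arguments on purpose; they would be supplied alongside these when calling the API.

```python
def tool_choice_for_turn(turn):
    """Force some tool call on the first request; let Claude decide afterwards."""
    return {"type": "any"} if turn == 0 else {"type": "auto"}

def request_kwargs(turn, messages, tools):
    """Build the per-turn arguments that vary inside the agent loop."""
    return {
        "messages": messages,
        "tools": tools,
        "tool_choice": tool_choice_for_turn(turn),
    }

first = request_kwargs(0, messages=[], tools=[])
later = request_kwargs(1, messages=[], tools=[])
print(first["tool_choice"], later["tool_choice"])
```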

Beyond the Docs

The course introduces custom tool use as a way to "give Claude access to external data and APIs." Two consequences of a well-designed tool suite that the course doesn't make explicit:

The tool suite is where domain knowledge lives. A system prompt describes how the agent should reason; the tools describe what the agent can see. The contextualised, baseline-annotated output format from §7 is a form of domain encoding: you are teaching the agent what "anomalous" means by expressing it in the tool output rather than in a prompt. This is often more reliable than prompt engineering because it is applied at the point of data access rather than at the point of reasoning.
Multi-tool agents expose the quality of your data. A single-tool agent that runs code can hide data quality problems by producing plausible-looking answers from messy data. A multi-tool agent that cross-references job failures against warehouse spend against data freshness will surface inconsistencies that a single-tool agent misses. The monitoring agent in this article is simultaneously a Claude use case and a data quality check on the underlying datasets.
