Agents · 2025-12-12

AI Agent Tool Integration & Function Calling: Design, Contracts, and Safety

How to wire tools into agents safely and reliably: function schemas, argument validation, tool routing, retries, observability, and evaluation—plus clear examples.

Function calling turns model output into structured actions, but it is only as safe as the contract around each tool. Tools are not “magic”: they are contracts that specify a name, description, input schema, auth, and side effects. Good agents compose small tools, validate inputs, and log every call.

Quick answer

  • Design tools as small, single-responsibility functions with explicit JSON schemas.
  • Validate arguments server-side; never trust model-generated inputs.
  • Route intents to tools; add allowlists and permissions.
  • Log and evaluate calls, failures, latency, and user outcomes.

1) Tool schema (contract)

{
  "name": "search_docs",
  "description": "Search enterprise docs by keyword and tag",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "minLength": 2 },
      "tags": { "type": "array", "items": { "type": "string" }, "default": [] },
      "limit": { "type": "integer", "minimum": 1, "maximum": 50, "default": 10 }
    },
    "required": ["query"]
  },
  "auth": "user",
  "side_effects": false
}
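
One way to carry this contract into code is a typed definition plus a helper that fills in schema defaults before the tool runs. The sketch below is only an assumption about how such a contract might be modelled; `ToolContract`, `PropertySchema`, and `applyDefaults` are hypothetical names, not part of any SDK.

```typescript
// Hypothetical types mirroring the JSON contract above.
type PropertySchema = {
  type: 'string' | 'integer' | 'array' | 'boolean';
  default?: unknown;
};

interface ToolContract {
  name: string;
  description: string;
  input_schema: {
    type: 'object';
    properties: Record<string, PropertySchema>;
    required: string[];
  };
  auth: 'user' | 'service';
  side_effects: boolean;
}

const searchDocs: ToolContract = {
  name: 'search_docs',
  description: 'Search enterprise docs by keyword and tag',
  input_schema: {
    type: 'object',
    properties: {
      query: { type: 'string' },
      tags: { type: 'array', default: [] },
      limit: { type: 'integer', default: 10 },
    },
    required: ['query'],
  },
  auth: 'user',
  side_effects: false,
};

// Copy the args and fill in any omitted property that has a schema default.
function applyDefaults(
  contract: ToolContract,
  args: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...args };
  for (const [key, prop] of Object.entries(contract.input_schema.properties)) {
    if (out[key] === undefined && prop.default !== undefined) {
      out[key] = prop.default;
    }
  }
  return out;
}

const filled = applyDefaults(searchDocs, { query: 'rotate API key' });
```

Applying defaults on the server keeps the tool implementation and the published schema from drifting apart.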
      

2) Good vs Bad tool integration

| Bad | Issue | Good | Why |
| --- | --- | --- | --- |
| Free-form strings to DB tool | No schema; injection risk; brittle | JSON schema + validation + parameterized queries | Safe and maintainable |
| One giant tool that “does everything” | Hard to test; poor routing; hidden effects | Small tools (search, fetch, update) composed by agent | Modular and debuggable |
| No logs for tool calls | No visibility; hard to improve | Structured logs: tool, args, latency, outcome | Observability for evals and fixes |
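
To illustrate the first row, the sketch below builds a parameterized query object from validated arguments instead of concatenating strings. The `docs` table, its columns, and `buildSearchQuery` are hypothetical. The `{ text, values }` shape with `$1`-style placeholders is the form PostgreSQL drivers such as node-postgres accept; adapt it to whatever driver you use.

```typescript
interface SearchArgs {
  query: string;
  limit: number;
}

interface ParameterizedQuery {
  text: string;
  values: (string | number)[];
}

// User-controlled values travel in `values`, never inside the SQL text.
function buildSearchQuery(args: SearchArgs): ParameterizedQuery {
  return {
    text: 'SELECT id, title FROM docs WHERE title ILIKE $1 LIMIT $2',
    values: ['%' + args.query + '%', args.limit],
  };
}

// A hostile input stays inert because it is bound as data, not parsed as SQL.
const q = buildSearchQuery({ query: "x'; DROP TABLE docs; --", limit: 5 });
```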

3) Routing and permissions

  • Router: map intents to tools (keywords, embeddings, simple rules).
  • Allowlist: per-user/session tool availability with scope.
  • Auth: user vs service tools; require tokens for side-effects.
  • Rate limits: protect expensive tools; backoff and queue.
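
The rate-limit bullet can be sketched with a token bucket, one common approach among several. `TokenBucket` and its parameters are hypothetical names; the clock is injectable so the behavior can be checked without waiting.

```typescript
// Token bucket: each tool call spends one token; tokens refill at a steady rate.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // clock in milliseconds
    this.tokens = capacity;
    this.last = now();
  }

  // Returns true if the call may proceed, false if the caller should back off or queue.
  tryTake(): boolean {
    const t = this.now();
    const elapsedSeconds = (t - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.last = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Usage with a fake clock: capacity 2, refilling 1 token per second.
let fakeTime = 0;
const bucket = new TokenBucket(2, 1, () => fakeTime);
const first = bucket.tryTake();
const second = bucket.tryTake();
const third = bucket.tryTake(); // bucket is empty here
fakeTime = 1000; // one second later, one token has refilled
const afterRefill = bucket.tryTake();
```

Keeping one bucket per user and tool pair lets an expensive tool be throttled without slowing down cheap ones.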

4) Safety and validation

  • Schema validation: reject invalid types and missing fields.
  • Sanitization: escape strings; block unsafe paths; redact secrets.
  • Policy checks: ensure tenant/ACL constraints on every call.
  • Dry-run mode: for dangerous tools, require explicit confirmation.
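
Two of the sanitization checks above can be sketched as small pure functions. Both helpers are hypothetical and deliberately conservative; the key pattern used for redaction is an assumption and should be tuned to your own argument names.

```typescript
// Reject absolute paths and any path that contains a parent-directory segment.
function isSafeRelativePath(p: string): boolean {
  if (p.length === 0 || /^([\\/]|[A-Za-z]:)/.test(p)) return false;
  return !p.split(/[\\/]/).includes('..');
}

// Keys that look sensitive (assumed pattern).
const SENSITIVE_KEY = /(password|secret|token|api[_-]?key)/i;

// Replace the values of sensitive-looking keys before the args are logged.
function redactSecrets(args: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(args)) {
    out[key] = SENSITIVE_KEY.test(key) ? '[REDACTED]' : value;
  }
  return out;
}

const safeArgs = redactSecrets({ query: 'rotate API key', apiKey: 'sk-123' });
```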

5) Orchestration patterns

  • Plan → act → observe: agent drafts plan, calls tools, reflects, and answers.
  • Decompose: break tasks into small steps and checkpoint.
  • Retry: exponential backoff for transient failures; circuit breakers on repeated failures.
  • Caching: cache results of side-effect-free tools, keyed by args, for speed.
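
The retry bullet can be sketched as follows. This is a minimal illustration, not a production policy: `withRetry` and `CircuitBreaker` are hypothetical names, the breaker reopens only through a manual `reset()` (real breakers usually add a cool-down timer), and the sleep function is injectable so the delays can be observed without waiting.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `fn` up to `maxAttempts` times, doubling the delay after each failure.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
  sleep: Sleep = realSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // No delay after the final attempt; the error is rethrown below.
      if (attempt < maxAttempts - 1) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// After `threshold` consecutive failures, reject calls until reset() is called.
class CircuitBreaker {
  private failures = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  get isOpen(): boolean {
    return this.failures >= this.threshold;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen) throw new Error('Circuit open: tool temporarily disabled');
    try {
      const result = await fn();
      this.failures = 0; // any success closes the failure streak
      return result;
    } catch (err) {
      this.failures += 1;
      throw err;
    }
  }

  reset(): void {
    this.failures = 0;
  }
}
```

Retries are safest on read-only tools; for writes, combine them with an idempotency key so a repeated attempt cannot apply the change twice.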

6) Example: search → fetch → summarize

// Tool call 1
{
  "tool": "search_docs",
  "args": { "query": "rotate API key", "tags": ["security"], "limit": 5 }
}
// Tool result
{
  "results": [ {"id": "DOC-12", "title": "API key rotation"}, {"id": "DOC-33", "title": "SSO security"} ]
}
// Tool call 2
{
  "tool": "fetch_doc",
  "args": { "id": "DOC-12" }
}
// Final answer: summarize with citations
{"answer": "Go to Admin → Auth → Keys...", "citations": ["DOC-12"], "confidence": 0.86}
      

7) Observability and evaluation

  • Logs: tool name, args hash, latency, status, error.
  • Metrics: success rate, retries, p95 latency, user outcomes.
  • Evals: task-specific success, preference wins, regression tests.
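
A sketch of the logging and metrics bullets, under stated assumptions: `instrument`, `hashArgs`, and `p95` are hypothetical helpers, the hash is a small FNV-1a-style function used only to keep raw arguments out of logs (it is not cryptographic and is sensitive to key order), and logs are kept in memory for illustration.

```typescript
interface ToolLog {
  tool: string;
  argsHash: string;
  latencyMs: number;
  status: 'ok' | 'error';
  error?: string;
}

// FNV-1a-style hash over the JSON-serialized args, returned as 8 hex characters.
function hashArgs(args: unknown): string {
  const text = String(JSON.stringify(args));
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

const toolLogs: ToolLog[] = [];

// Run a tool function and record one structured entry whether it succeeds or throws.
async function instrument<T>(tool: string, args: unknown, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  const argsHash = hashArgs(args);
  try {
    const result = await fn();
    toolLogs.push({ tool, argsHash, latencyMs: Date.now() - start, status: 'ok' });
    return result;
  } catch (err) {
    toolLogs.push({ tool, argsHash, latencyMs: Date.now() - start, status: 'error', error: String(err) });
    throw err;
  }
}

// Nearest-rank 95th percentile over recorded latencies.
function p95(latencies: number[]): number {
  if (latencies.length === 0) return 0;
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length);
  return sorted[rank - 1];
}
```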

8) Try it: minimal validator + router

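// Minimal JSON schema validator (runtime): checks types and required fields only
type JSONSchema = { type: string; properties?: Record<string, JSONSchema>; required?: string[] };

function validate(schema: JSONSchema, data: any): { ok: boolean; error?: string } {
  // typeof reports 'object' for null and arrays, so classify those explicitly
  const actual = data === null ? 'null' : Array.isArray(data) ? 'array' : typeof data;
  const matches = schema.type === 'integer' ? Number.isInteger(data) : schema.type === actual;
  if (!matches) return { ok: false, error: 'Expected ' + String(schema.type) };
  if (actual === 'object') {
    for (const key of schema.required || []) {
      if (!(key in data)) return { ok: false, error: 'Missing field: ' + String(key) };
    }
    // Check the type of each declared property that is present
    for (const [key, prop] of Object.entries(schema.properties || {})) {
      if (key in data) {
        const v = validate(prop, data[key]);
        if (!v.ok) return { ok: false, error: key + ': ' + String(v.error) };
      }
    }
  }
  return { ok: true };
}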

// Tool registry
const tools = {
  search_docs: {
    schema: {
      type: 'object',
      required: ['query'],
      properties: {
        query: { type: 'string' },
        tags: { type: 'array' },
        limit: { type: 'number' }
      }
    },
    // Default limit matches the schema contract in section 1
    run: async ({ query, tags = [], limit = 10 }: { query: string; tags?: string[]; limit?: number }) => {
      // Replace with your search impl
      return { results: [{ id: 'DOC-12', title: 'API key rotation' }] };
    }
  },
  fetch_doc: {
    schema: { type: 'object', required: ['id'], properties: { id: { type: 'string' } } },
    run: async ({ id }: { id: string }) => {
      // Replace with your fetch impl
      return { id, content: 'To rotate an API key, go to Admin → Auth → Keys...' };
    }
  }
};

// Simple router stub: fetch when the query names a document ID (e.g. DOC-12), otherwise search
function routeIntent(userQuery: string) {
  if (/\bDOC-\d+\b/i.test(userQuery)) return 'fetch_doc';
  return 'search_docs';
}

// Execute tool with validation
async function callTool(name: keyof typeof tools, args: any) {
  const tool = tools[name];
  const v = validate(tool.schema as any, args);
  if (!v.ok) throw new Error('Invalid args for ' + String(name) + ': ' + String(v.error));
  return await tool.run(args);
}

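// Example
async function example() {
  const toolName = routeIntent('rotate API key'); // => 'search_docs'
  // callTool returns a union of tool results, so narrow each result to the expected shape
  const res1 = (await callTool(toolName, { query: 'rotate API key', tags: ['security'], limit: 5 })) as { results: { id: string; title: string }[] };
  const doc = (await callTool('fetch_doc', { id: res1.results[0].id })) as { id: string; content: string };
  return { answer: 'Go to Admin → Auth → Keys...', citations: [doc.id], confidence: 0.86 };
}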
      

9) Try it: side-effect tool with allowlist + dry-run

// Allowlist (per user/session)
const allowedTools: Record<string, string[]> = {
  user_123: ['search_docs', 'fetch_doc', 'update_user_email']
};

function isAllowed(userId: string, tool: string) {
  return (allowedTools[userId] || []).includes(tool);
}

// Simple sanitization helper
function sanitizeEmail(input: string) {
  const s = input.trim().toLowerCase();
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(s)) throw new Error('Invalid email');
  return s;
}

// Side-effect tool (requires confirm)
const update_user_email = {
  schema: { type: 'object', required: ['userId', 'newEmail', 'confirm'], properties: {
    userId: { type: 'string' },
    newEmail: { type: 'string' },
    confirm: { type: 'boolean' }
  }},
  run: async ({ userId, newEmail, confirm }: { userId: string; newEmail: string; confirm: boolean }) => {
    // Validate first so a dry run surfaces bad input before anything is applied
    const email = sanitizeEmail(newEmail);
    if (!confirm) return { dryRun: true, message: 'Set confirm=true to apply change.' };
    // Perform the update (replace with real DB/API)
    return { ok: true, userId, email };
  }
};

async function callSideEffectTool(userId: string, name: string, args: any) {
  if (!isAllowed(userId, name)) throw new Error('Tool not allowed for this user');
  const tool = name === 'update_user_email' ? update_user_email : null;
  if (!tool) throw new Error('Unknown tool');
  const v = validate(tool.schema as any, args);
  if (!v.ok) throw new Error('Invalid args: ' + String(v.error));
  return await tool.run(args);
}

// Example
async function exampleSideEffect() {
  const resDry = await callSideEffectTool('user_123', 'update_user_email', { userId: 'user_123', newEmail: 'Admin@Example.com ', confirm: false });
  // => { dryRun: true, message: 'Set confirm=true to apply change.' }
  const resLive = await callSideEffectTool('user_123', 'update_user_email', { userId: 'user_123', newEmail: 'admin@example.com', confirm: true });
  // => { ok: true, userId: 'user_123', email: 'admin@example.com' }
  return resLive;
}
      

Side-effect tools: checklist

  • Auth scope: verify user/session permissions and required tokens per call.
  • Audit logging: log who, what, when, args hash, and outcome; retain for reviews.
  • Idempotency: use idempotency keys on writes to avoid duplicates.
  • Dry-run + confirm: require explicit confirmation for risky actions.
  • Rollback plan: record previous state; provide reversible operations where possible.
  • Rate limits: protect expensive or sensitive tools with quotas and backoff.
  • Validation + sanitization: strict schema checks and string sanitization.
  • Observability: metrics for success rate, error classes, p95 latency.
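
The idempotency item can be sketched like this. It assumes the caller supplies a unique key per logical request; `updateEmailOnce` and the in-memory `completed` map are hypothetical stand-ins for a real write path and a persistent key store with expiry.

```typescript
interface WriteResult {
  ok: boolean;
  userId: string;
  email: string;
}

// In-memory store for illustration; a real system would persist keys with a TTL.
const completed = new Map<string, WriteResult>();
let writeCount = 0; // counts how many times the underlying write actually ran

function updateEmailOnce(idempotencyKey: string, userId: string, email: string): WriteResult {
  const previous = completed.get(idempotencyKey);
  if (previous) return previous; // replayed request: return the stored result, skip the write
  writeCount += 1; // stand-in for the real DB/API call
  const result: WriteResult = { ok: true, userId, email };
  completed.set(idempotencyKey, result);
  return result;
}

const firstTry = updateEmailOnce('req-001', 'user_123', 'admin@example.com');
const retried = updateEmailOnce('req-001', 'user_123', 'admin@example.com'); // same key, no second write
```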

FAQ (direct answers)

Can tools call other tools?

Prefer agent-level orchestration. If a tool composes others, keep boundaries clear and log subcalls for debuggability.
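
If a tool does compose others, one way to keep subcalls debuggable is to record a parent ID with every call. The sketch below is illustrative only; `traced`, `callTrace`, and the composite tool `answer_question` are hypothetical.

```typescript
interface CallRecord {
  id: number;
  parentId: number | null;
  tool: string;
}

const callTrace: CallRecord[] = [];
let nextCallId = 1;

// Record a call, then run `body` with the new call ID so nested calls can point back to it.
function traced<T>(tool: string, parentId: number | null, body: (callId: number) => T): T {
  const id = nextCallId++;
  callTrace.push({ id, parentId, tool });
  return body(id);
}

// Hypothetical composite tool: answer_question calls search_docs, then fetch_doc.
const answer = traced('answer_question', null, (rootId) => {
  const hitId = traced('search_docs', rootId, () => 'DOC-12');
  return traced('fetch_doc', rootId, () => ({ id: hitId, content: '...' }));
});
```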

How do I handle long results?

Return compact structured data with IDs and pagination; the agent decides what to display and what to retrieve next.
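
A minimal sketch of that pattern, assuming offset-based cursors over an in-memory list; `searchPage` and `corpus` are hypothetical, and a real backend would typically use opaque cursors.

```typescript
interface DocHit {
  id: string;
  title: string;
}

interface Page {
  results: DocHit[];
  nextCursor: string | null;
}

// Hypothetical corpus standing in for a real search backend.
const corpus: DocHit[] = Array.from({ length: 7 }, (_, i) => ({
  id: 'DOC-' + (i + 1),
  title: 'Document ' + (i + 1),
}));

// Return one page of compact hits; the cursor is the offset of the next page as a string.
function searchPage(limit: number, cursor: string | null = null): Page {
  const start = cursor === null ? 0 : parseInt(cursor, 10);
  if (Number.isNaN(start) || start < 0) throw new Error('Invalid cursor');
  const results = corpus.slice(start, start + limit);
  const next = start + limit;
  return { results, nextCursor: next < corpus.length ? String(next) : null };
}

const page1 = searchPage(3);
const page2 = searchPage(3, page1.nextCursor);
const page3 = searchPage(3, page2.nextCursor); // last page: nextCursor is null
```

The agent keeps only IDs and titles in context and calls a fetch tool for the few documents it actually needs.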
