Observability (Loki & Tempo)

Coro agents treat production telemetry as primary evidence. When code is ambiguous, the base runtime guidance explicitly prefers live LogQL / TraceQL queries over guessing from static reads (see /concepts/architecture/ and the agent runtime .claude/CLAUDE.md).

Three built-in MCP tools support this workflow:

| Tool | Purpose |
| --- | --- |
| `loki_query` | Execute LogQL range queries via the configured Loki base URL |
| `tempo_get_trace` | Fetch trace JSON by hex trace id |
| `tempo_search` | Run TraceQL search templates exposed by Grafana Tempo |

All three tools share the tool-context clients that are instantiated at runner bootstrap (createLokiClient, createTempoClient) from in-memory Settings fields.
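As a rough illustration of what the two Tempo tools could translate to, the sketch below builds request URLs against Tempo's public HTTP API (`GET /api/traces/<traceID>` and `GET /api/search` with a `q` parameter for TraceQL). The helper names (`isHexTraceId`, `traceUrl`, `searchUrl`) and the validation rule are assumptions made for this example; the actual createTempoClient implementation may work differently.

```typescript
// Hypothetical helpers for illustration; not the runner's actual client code.

/** Strip trailing slashes so paths can be appended safely. */
function trimSlash(url: string): string {
  return url.replace(/\/+$/, "");
}

/**
 * OpenTelemetry trace ids are 16 bytes (32 hex characters).
 * Assumption: this sketch also tolerates shorter hex ids.
 */
function isHexTraceId(id: string): boolean {
  return /^[0-9a-fA-F]{1,32}$/.test(id);
}

/** URL for fetching a single trace by id. */
function traceUrl(baseUrl: string, traceId: string): string {
  if (!isHexTraceId(traceId)) {
    throw new Error(`not a hex trace id: ${traceId}`);
  }
  return `${trimSlash(baseUrl)}/api/traces/${traceId.toLowerCase()}`;
}

/** URL for a TraceQL search; the query is URL-encoded by URLSearchParams. */
function searchUrl(baseUrl: string, traceql: string, limit = 20): string {
  const params = new URLSearchParams({ q: traceql, limit: String(limit) });
  return `${trimSlash(baseUrl)}/api/search?${params.toString()}`;
}
```

Rejecting non-hex ids before any request is made keeps a malformed id from turning into a confusing backend error.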

Configuring backends

There is no dashboard Settings field for Loki or Tempo today. Configure backends on the runner host via environment variables read at bootstrap (packages/runner/src/runner/build-settings.ts):

| Variable | Meaning |
| --- | --- |
| `LOKI_BASE_URL` | Base URL of a Grafana Loki querier reachable from the runner |
| `LOKI_API_KEY` / `LOKI_USERNAME` | Optional tenancy / auth pairing |
| `TEMPO_BASE_URL` | Base URL of the Tempo HTTP API gateway |
| `TEMPO_API_KEY` | Optional bearer/API token |

Kubernetes manifests or systemd units should inject these centrally; developers can export them locally for debugging.
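A minimal sketch of how these variables might be folded into settings at bootstrap is shown below. The `ObservabilitySettings` shape and the `readObservabilitySettings` function are hypothetical and are not taken from build-settings.ts; only the variable names come from the table above.

```typescript
// Hypothetical settings shape; the real Settings type lives in the runner package.
interface ObservabilitySettings {
  loki?: { baseUrl: string; apiKey?: string; username?: string };
  tempo?: { baseUrl: string; apiKey?: string };
}

type Env = Record<string, string | undefined>;

/** Treat unset, empty, and whitespace-only values as "not configured". */
function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

/** Build settings from an env-like map; a backend is only present if its base URL is set. */
function readObservabilitySettings(env: Env): ObservabilitySettings {
  const settings: ObservabilitySettings = {};

  const lokiUrl = nonEmpty(env.LOKI_BASE_URL);
  if (lokiUrl) {
    settings.loki = {
      baseUrl: lokiUrl,
      apiKey: nonEmpty(env.LOKI_API_KEY),
      username: nonEmpty(env.LOKI_USERNAME),
    };
  }

  const tempoUrl = nonEmpty(env.TEMPO_BASE_URL);
  if (tempoUrl) {
    settings.tempo = { baseUrl: tempoUrl, apiKey: nonEmpty(env.TEMPO_API_KEY) };
  }

  return settings;
}
```

On a runner host this would typically be called as `readObservabilitySettings(process.env)`. Taking the environment as a parameter keeps the function easy to test with a plain object.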

Codifying conventions: backend URLs come from env vars on the runner host, but tenant memory (and snippets under memory/snippets/) is the right place for recommended LogQL queries and dashboards, label-cardinality tips, and escalation contacts that everyone should reuse.

When evaluations call tools

Evaluator and QA-phase agents verify acceptance criteria against real traffic whenever possible, for example by confirming a canary rollout before closing the job loop. Teach teams to cite dashboard deep links and the query text in evaluation artefacts (post_artifact report-md).
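One way to make that citation habit mechanical is to render each query and its finding into a small markdown section before it goes into the report. The sketch below is an assumption about what such a helper could look like; `QueryEvidence` and `renderEvidence` are hypothetical names, and the report-md schema itself is not shown here.

```typescript
// Hypothetical evidence record; field names are illustrative only.
interface QueryEvidence {
  tool: "loki_query" | "tempo_get_trace" | "tempo_search";
  query: string; // LogQL, TraceQL, or a trace id
  window: string; // human-readable time window, e.g. "last 5m"
  finding: string; // one-sentence conclusion drawn from the result
  dashboardUrl?: string; // optional deep link
}

/**
 * Render evidence as a markdown section.
 * Assumption: queries contain no backticks, since they are wrapped in inline code.
 */
function renderEvidence(items: QueryEvidence[]): string {
  const lines: string[] = ["## Telemetry evidence", ""];
  for (const item of items) {
    lines.push(`- ${item.finding}`);
    lines.push(`  - tool: ${item.tool} (${item.window})`);
    lines.push(`  - query: \`${item.query}\``);
    if (item.dashboardUrl) {
      lines.push(`  - dashboard: ${item.dashboardUrl}`);
    }
  }
  return lines.join("\n");
}
```

Keeping the exact query text next to the finding lets a reviewer rerun it instead of trusting a paraphrase.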

Operational hygiene

Rate limits: heavy queries inflate the wall-clock time of a phase, so start explorations with a narrow time window (5m) before widening.
Secrets: do not return raw stack traces that contain PII; summarise with counts and exemplar trace ids that point to Tempo instead.
Offline fallback: when the env vars are absent, the clients return available: false responses, and agents escalate instead of hallucinating infrastructure.
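The narrow-window and offline-fallback rules can be sketched together as follows. This is an illustrative assumption, not the createLokiClient implementation: `buildLokiRangeQuery` and `LokiResult` are hypothetical names, while the endpoint (`/loki/api/v1/query_range`) and its `query`, `start`, `end`, and `limit` parameters come from Loki's public HTTP API, which accepts nanosecond Unix epoch timestamps.

```typescript
// Hypothetical result type mirroring the "available: false" fallback described above.
type LokiResult =
  | { available: false; reason: string }
  | { available: true; url: string };

/** Convert integer milliseconds to a nanosecond epoch string without losing precision. */
function msToNs(ms: number): string {
  return `${Math.floor(ms)}000000`;
}

/**
 * Build a range-query URL for the last `windowMinutes` (default 5).
 * Returns available: false when no base URL is configured.
 */
function buildLokiRangeQuery(
  baseUrl: string | undefined,
  logql: string,
  nowMs: number,
  windowMinutes = 5,
  limit = 100,
): LokiResult {
  if (!baseUrl) {
    return { available: false, reason: "LOKI_BASE_URL is not set" };
  }
  const startMs = nowMs - windowMinutes * 60 * 1000;
  const params = new URLSearchParams({
    query: logql,
    start: msToNs(startMs),
    end: msToNs(nowMs),
    limit: String(limit),
  });
  const root = baseUrl.replace(/\/+$/, "");
  return { available: true, url: `${root}/loki/api/v1/query_range?${params.toString()}` };
}
```

A caller checks `available` first and, when it is false, reports the missing configuration instead of guessing at log contents.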

Related reading

/guides/add-skill/: skill bundles such as observability-additions (layered under .claude/skills/).
/guides/byo-mcp/: if you need Grafana- or Datadog-specific MCP adjuncts in addition to the built-ins.
