Observability (Loki & Tempo)
Coro agents treat production telemetry as primary evidence. When code is ambiguous, the base runtime guidance explicitly prefers live LogQL / TraceQL queries over guessing from static reads (/concepts/architecture/, agent runtime .claude/CLAUDE.md).
Three MCP tools power this story:
| Tool | Purpose |
|---|---|
| loki_query | Execute LogQL (range queries) via the configured Loki base URL |
| tempo_get_trace | Fetch trace JSON by hex trace id |
| tempo_search | Run TraceQL search templates exposed by Grafana Tempo |
All three tools share the clients held in the tool context. Those clients are instantiated once at runner bootstrap (createLokiClient, createTempoClient) from in-memory Settings fields.
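To make the three tools more concrete, here is a minimal sketch of the HTTP requests they plausibly map to. The endpoint paths (/loki/api/v1/query_range, /api/traces/<id>, /api/search) come from the public Loki and Tempo HTTP APIs; the helper names and parameters are hypothetical and are not taken from the Coro codebase, whose clients may build requests differently.

```typescript
// Sketch only: these helpers are illustrative, not functions from the Coro runner.

/** Strip trailing slashes so path joins never produce "//". */
function trimBase(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

/** Loki accepts start/end as Unix epoch nanoseconds; convert from milliseconds. */
function msToNs(ms: number): string {
  return String(Math.floor(ms)) + "000000";
}

/** URL for a LogQL range query (what a tool like loki_query would request). */
function buildLokiRangeUrl(
  baseUrl: string,
  logql: string,
  startMs: number,
  endMs: number,
  limit: number = 100,
): string {
  const params = [
    "query=" + encodeURIComponent(logql),
    "start=" + msToNs(startMs),
    "end=" + msToNs(endMs),
    "limit=" + String(limit),
  ].join("&");
  return trimBase(baseUrl) + "/loki/api/v1/query_range?" + params;
}

/** URL for fetching one trace by hex id (what tempo_get_trace would request). */
function buildTempoTraceUrl(baseUrl: string, traceId: string): string {
  if (!/^[0-9a-fA-F]{1,32}$/.test(traceId)) {
    throw new Error("not a hex trace id: " + traceId);
  }
  return trimBase(baseUrl) + "/api/traces/" + traceId;
}

/** URL for a TraceQL search (what tempo_search would request). */
function buildTempoSearchUrl(baseUrl: string, traceql: string): string {
  return trimBase(baseUrl) + "/api/search?q=" + encodeURIComponent(traceql);
}
```

Validating the trace id before building the URL keeps malformed ids from reaching Tempo, where they would only produce an opaque error.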
Configuring backends
There is no dashboard Settings field for Loki or Tempo today. Configure backends on the runner host via environment variables read at bootstrap (packages/runner/src/runner/build-settings.ts):
| Variable | Meaning |
|---|---|
| LOKI_BASE_URL | Grafana Loki querier reachable from runner |
| LOKI_API_KEY / LOKI_USERNAME | Optional tenancy / auth pairing |
| TEMPO_BASE_URL | Tempo HTTP API gateway |
| TEMPO_API_KEY | Optional bearer/API token |
Kubernetes / systemd units should inject these centrally; developers can export locally for debugging.
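As an illustration of how these variables could be folded into settings at bootstrap, the sketch below reads them from an environment map. The variable names are the ones in the table above; the types and the readObservabilitySettings function are hypothetical and do not reproduce the actual contents of build-settings.ts.

```typescript
// Sketch only: type and function names are assumptions, not the real build-settings.ts.

type Env = { [name: string]: string | undefined };

interface BackendSettings {
  baseUrl: string;
  apiKey: string | undefined;
  username: string | undefined;
}

interface ObservabilitySettings {
  loki: BackendSettings | undefined;  // undefined when LOKI_BASE_URL is unset
  tempo: BackendSettings | undefined; // undefined when TEMPO_BASE_URL is unset
}

/** Treat unset, empty, and whitespace-only values as "not configured". */
function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = (value || "").trim();
  return trimmed.length > 0 ? trimmed : undefined;
}

function readObservabilitySettings(env: Env): ObservabilitySettings {
  const lokiUrl = nonEmpty(env["LOKI_BASE_URL"]);
  const tempoUrl = nonEmpty(env["TEMPO_BASE_URL"]);
  return {
    loki: lokiUrl
      ? {
          baseUrl: lokiUrl,
          apiKey: nonEmpty(env["LOKI_API_KEY"]),
          username: nonEmpty(env["LOKI_USERNAME"]),
        }
      : undefined,
    tempo: tempoUrl
      ? {
          baseUrl: tempoUrl,
          apiKey: nonEmpty(env["TEMPO_API_KEY"]),
          username: undefined, // the table lists no username variable for Tempo
        }
      : undefined,
  };
}
```

In a Node process this would be called as readObservabilitySettings(process.env). Taking the environment as a parameter keeps the function easy to test without mutating global state.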
Codifying conventions: URLs are supplied via env vars on the runner host, but tenant memory (and snippets under memory/snippets/) is the right place for recommended LogQL dashboards, label cardinality tips, and escalation contacts everyone should reuse in queries.
When evaluations call tools
Evaluator and QA-phase agents emphasise verifying acceptance criteria against real traffic whenever possible, for example by confirming a canary rollout before closing the job loop. Teach teams to cite dashboard deep links plus query text inside evaluation artefacts (post_artifact report-md).
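One way to keep such citations consistent is to render them from a small structure. The sketch below is purely illustrative: the Evidence shape, the renderEvidence function, and the markdown layout are assumptions, not a format that post_artifact requires.

```typescript
// Sketch only: Evidence and renderEvidence are hypothetical names.

interface Evidence {
  claim: string;             // acceptance criterion being verified
  backend: "loki" | "tempo"; // which backend answered the query
  query: string;             // exact LogQL or TraceQL text that was run
  deepLink: string;          // dashboard URL a reviewer can open
  summary: string;           // counts or trace ids only, no raw log payloads
}

/** Render one evidence entry as a markdown fragment for a report. */
function renderEvidence(e: Evidence): string {
  const language = e.backend === "loki" ? "LogQL" : "TraceQL";
  // LogQL raw strings may contain backticks; use a double-backtick code span then.
  const hasTick = e.query.indexOf("`") >= 0;
  const open = hasTick ? "`` " : "`";
  const close = hasTick ? " ``" : "`";
  return [
    "### " + e.claim,
    "- Query (" + language + "): " + open + e.query + close,
    "- Link: " + e.deepLink,
    "- Result: " + e.summary,
  ].join("\n");
}
```

Recording the exact query text next to the link lets a reviewer rerun the check even if the dashboard is later edited.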
Operational hygiene
- Rate limits: heavy queries inflate wall-clock phases, so start explorations with a narrow time window (5m) before widening.
- Secrets: disallow returning raw PII stacks; summarise counts or exemplar trace ids that reference Tempo indirectly.
- Offline fallback: when the env vars are absent, clients return available: false responses, and agents escalate instead of hallucinating infra.
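The hygiene rules above can be sketched as a single exploration loop: try the narrowest window first, return only counts, and escalate as soon as a backend reports that it is unavailable. All names and result shapes here are hypothetical, and the query function is synchronous only to keep the sketch short; a real client would almost certainly be asynchronous.

```typescript
// Sketch only: result shapes and function names are assumptions for illustration.

type QueryResult =
  | { available: true; lines: string[] }
  | { available: false; reason: string };

type RunQuery = (startMs: number, endMs: number) => QueryResult;

type Outcome =
  | { kind: "evidence"; windowMinutes: number; lineCount: number }
  | { kind: "no-matches" }
  | { kind: "escalate"; reason: string };

/** Windows to try in order, narrowest first, so cheap queries run before costly ones. */
const WINDOWS_MINUTES: number[] = [5, 30, 180];

function explore(run: RunQuery, endMs: number): Outcome {
  for (const minutes of WINDOWS_MINUTES) {
    const result = run(endMs - minutes * 60000, endMs);
    if (result.available === false) {
      // Backend not configured or unreachable: hand off instead of guessing.
      return { kind: "escalate", reason: result.reason };
    }
    if (result.lines.length > 0) {
      // Report a count only; raw log lines may carry PII.
      return { kind: "evidence", windowMinutes: minutes, lineCount: result.lines.length };
    }
  }
  return { kind: "no-matches" };
}
```

Stopping at the first window that yields matches keeps the common case cheap, while the widest window bounds the worst case at three queries.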
Related reading
- Skill bundles like observability-additions (layered under .claude/skills/): see /guides/add-skill/.
- /guides/byo-mcp/: if you need Grafana- or Datadog-specific MCP adjuncts besides the built-ins.