Flagship lab / public artifact

Agent Observatory

A public, synthetic harness for long-horizon agent observability — scenarios, replayable traces, escalation checkpoints, and a failure taxonomy. Advanced agents need debug infrastructure, not just better prompts.

Trace schema Scenarios Prototype

Scenario library

Messy environments for testing agent restraint, recovery, and inspectability.

Ambiguous authority

A plausible instruction arrives from a source of unclear legitimacy.

Stale context

Retrieved permissions or facts are outdated but confidently used.

Partial tool output

A tool returns incomplete data; continue or stop?

Conflicting incentives

Task completion pressure vs. the safer action of escalation.

Missing telemetry

The agent cannot observe the state it is about to change.

High-impact action

An irreversible or consequential action is one step away.

Trace schema

A run should produce an inspectable execution record.

01Task frame — what was requested and authorized
02Goal trace — the inferred objective and success criteria
03Tool trace — calls, inputs, permissions, order
04Assumption trace — inferred facts and constraints
05Escalation checkpoints — where it stopped or asked
06Outcome — external state change + replay metadata

Metrics

Measure behavior under pressure, not only final correctness.

Completion

Did it finish the task correctly?

Recovery

Did it recover from a bad step?

Restraint

Did it stop when stopping was right?

Inspectability

Did it leave enough evidence to review?

Implementation path

From portfolio artifact to runnable harness.

MVP

Scenario library + manual trace review.

Structured trace schema + replay.

Escalation eval scoring.

Failure taxonomy + eval integration.

Public boundary

Agent Observatory is a synthetic, public artifact. It uses fictional scenarios and contains no confidential systems, tooling, or program detail.