Research

AI infrastructure, observability, and compute systems research.

Research notes, papers, and frameworks on AI infrastructure, silicon observability, autonomous systems, compute economics, and quantum-classical systems.

The Observability Stack (flagship essay)

Featured series

Observability for Frontier AI Systems

This series translates lessons from silicon bring-up into frontier AI infrastructure: if complex systems cannot expose state, localize failures, and produce trustworthy traces, they cannot be operated safely at scale.

Silicon bring-up · Agent telemetry · Eval infrastructure

Published · Debuggability for autonomous agents: what AI safety can learn from silicon bring-up
Published · Observability is an underbuilt safety primitive for frontier AI systems
Framework · AI agents as infrastructure risk
Research note · A taxonomy of autonomous agent failure modes
Lab-backed · Building an Eval Harness for Cyberphysical AI Risk

Paper · 2026 · Framework

The Cost of Usable Intelligence

A framework for measuring the cost of useful AI output under constraints like latency, reliability, safety, and quality.

Series · Active · Framework

Observability for Frontier AI Systems

A series translating lessons from silicon bring-up into agent telemetry, safety instrumentation, and eval infrastructure.

Essay · Published · 12 min

Observability is an underbuilt safety primitive for frontier AI systems

Covers the agent observability stack, a maturity model, a silicon-to-agent mapping, and an operational eval agenda.

Essay · Planned · 9 min

A taxonomy of autonomous agent failure modes

A public taxonomy for failures that emerge when agents gain tools, memory, planning loops, and cyberphysical reach.

Lab memo · Planned

Building an Eval Harness for Cyberphysical AI Risk

A proposal for observable, bounded, and reproducible tests of agent actions that touch real systems.

Memo · Draft

Quantum computing needs an accelerated classical control plane

Why quantum systems will bottleneck on classical orchestration, observability, latency, and platform integration.

Essay · Published · 10 min

Debuggability for autonomous agents

What AI safety can learn from silicon bring-up: state capture, replay, failure localization, and root-cause discipline.