AI Infrastructure

The Observability Stack

A curated technical publication of essays, frameworks, and labs on observability for complex compute systems, from silicon to autonomous agents.

A structured body of work, not a newsletter feed.

Each artifact is meant to be read, cited, or inspected. Published work comes first; research arcs are labeled honestly by maturity.

Core pillars

Four pillars, separated by maturity.

Core practice

Silicon observability

State capture, scan access, and debug platforms for AI/HPC silicon.

Published framework

Compute economics

The Cost of Usable Intelligence: useful output under real constraints.

Emerging arc

Agent observability

Tracing, replay, escalation, and failure taxonomy for autonomous systems.

Emerging arc

Quantum-classical

Accelerated classical control planes for fragile quantum systems.

Featured artifacts

Public work you can read, cite, or inspect.

Frameworks

Reusable models for complex-system observability.

Publication roadmap

Published artifacts first. Future directions second.

Published

Shipped

  • Observability safety primitive (essay)
  • Debuggability for autonomous agents (essay)
  • The Cost of Usable Intelligence (paper)
  • Scan / fault / LCI labs

Planned

Next

  • Agent trace schema
  • Escalation eval suite
  • Agents as infrastructure risk
  • Failure-mode taxonomy