A taxonomy of autonomous agent failure modes

Thesis

Classifying agent failures by symptom alone is not enough. For each failure family, the taxonomy should also describe the observability gaps that hide the failure, the containment controls that limit its impact, and the requirements for reproducing it.

Outline

Planning failures
Goal drift, invalid decomposition, missing constraints, and brittle long-horizon execution.
Tool-use failures
Unsafe calls, malformed actions, permission confusion, and partial side effects.
Memory failures
Stale context, poisoned memory, privacy leakage, and incorrect recall.
Coordination failures
Multi-agent handoff errors, conflicting plans, and hidden dependencies.
Observability mapping
Required traces, metrics, and replay controls for each failure family.
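
One way to make the outline above checkable is to hold the taxonomy as data and ask which families still lack an observability mapping. The sketch below is illustrative, not a settled schema. The family and failure-mode names come from the outline. The names FailureFamily, ObservabilityRequirement, and observability_gaps, and the sample entries for tool-use failures, are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailureFamily(Enum):
    """The four failure families named in the outline."""
    PLANNING = "planning"
    TOOL_USE = "tool-use"
    MEMORY = "memory"
    COORDINATION = "coordination"


# Failure modes per family, copied from the outline.
TAXONOMY: dict[FailureFamily, list[str]] = {
    FailureFamily.PLANNING: [
        "goal drift", "invalid decomposition",
        "missing constraints", "brittle long-horizon execution",
    ],
    FailureFamily.TOOL_USE: [
        "unsafe calls", "malformed actions",
        "permission confusion", "partial side effects",
    ],
    FailureFamily.MEMORY: [
        "stale context", "poisoned memory",
        "privacy leakage", "incorrect recall",
    ],
    FailureFamily.COORDINATION: [
        "multi-agent handoff errors", "conflicting plans",
        "hidden dependencies",
    ],
}


@dataclass
class ObservabilityRequirement:
    """Traces, metrics, and replay controls required for one family."""
    traces: list[str] = field(default_factory=list)
    metrics: list[str] = field(default_factory=list)
    replay_controls: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Return the names of the categories that have no entries yet."""
        categories = ("traces", "metrics", "replay_controls")
        return [name for name in categories if not getattr(self, name)]


# Hypothetical placeholder entries, shown only to exercise the structure.
# They are assumptions, not requirements proposed by the framework.
OBSERVABILITY: dict[FailureFamily, ObservabilityRequirement] = {
    FailureFamily.TOOL_USE: ObservabilityRequirement(
        traces=["tool call arguments and results"],
        metrics=["malformed action rate"],
        replay_controls=["recorded tool responses"],
    ),
}


def observability_gaps(
    taxonomy: dict[FailureFamily, list[str]],
    mapping: dict[FailureFamily, ObservabilityRequirement],
) -> dict[FailureFamily, list[str]]:
    """For each family in the taxonomy, list the categories still unmapped."""
    gaps: dict[FailureFamily, list[str]] = {}
    for family in taxonomy:
        # A family with no mapping at all is treated as missing everything.
        requirement = mapping.get(family, ObservabilityRequirement())
        gaps[family] = requirement.missing()
    return gaps
```

Calling observability_gaps(TAXONOMY, OBSERVABILITY) with the placeholder mapping reports no gaps for tool-use failures and all three categories missing for the other families. Keeping the mapping separate from the taxonomy lets the failure families stabilize before the observability requirements are filled in.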

Status

This framework will be developed as a public artifact that can support future demos and evaluation harnesses.

References

References will be added from public AI agent, safety eval, security, reliability, and incident taxonomy sources.