NVIDIA // AI Hardware DFX & Silicon Observability

Silicon observability for the AI compute stack.

I build and study debug, verification, and observability systems for complex compute, from AI/HPC silicon state capture to frontier AI infrastructure and autonomous systems risk.

Aditya Morey
AI hardware systems engineer building silicon observability and debug platforms.
Cross-team debug tooling, AI/HPC silicon systems, Scan/JTAG state capture, Pre-silicon verification, Platform observability

Start here

A focused hub for AI hardware infrastructure.

Publication hub

The Observability Stack

A curated technical publication surface for essays, frameworks, and labs on observability for complex compute systems.

Open Stack

Public artifacts

Working demos, papers, and essays that make the technical thesis inspectable.

Lab

Scan Chain / TAP Visualizer

Interactive visualizer for scan-chain state movement, TAP-style control flow, and state-capture intuition.

View lab
Lab

Fault Injection / Observability Demo

Public demo showing how observability changes fault isolation and root-cause workflow.

View lab
Paper / framework

The Cost of Usable Intelligence

Published framework for reasoning about useful AI output under latency, reliability, safety, and quality constraints.

View on SSRN
Technical essay

Observability is an underbuilt safety primitive

Flagship essay on tracing, replaying, and debugging agent behavior before failures scale.

Read essay

Featured work

Making complex hardware observable.

Selected professional themes, described without employer-confidential details. The strongest professional thread is platform leverage: turning low-level silicon behavior into repeatable workflows engineers can use.

Debug platforms for deterministic state capture

Built and maintain internal scan-debug tooling used by multiple engineering teams to support deterministic state capture and root-cause workflows.

Cross-functional silicon readiness

Work across DFX, design, verification, bring-up, and debug stakeholders to improve observability, readiness, and platform reliability for next-generation AI/HPC silicon programs.

Hardware/software debug boundary

Focus on making complex hardware systems observable, debuggable, and reliable through state capture, test access, and engineering productivity tooling.

Featured research & essays

Three public artifacts anchoring the site thesis.

Paper / framework

The Cost of Usable Intelligence

A framework for measuring the cost of useful AI output under real constraints like latency, reliability, safety, and quality.

Compute economics, AI infrastructure, Framework
Technical essay

Debuggability for autonomous agents

A public thesis connecting silicon bring-up observability to agent safety, eval infrastructure, replay, escalation checkpoints, and root-cause workflows.

Agent observability, AI safety, Debug infrastructure
Flagship essay

Observability is an underbuilt safety primitive

A framework for agent debuggability, maturity levels, silicon-debug lessons, escalation evals, and deployment readiness.

Frontier AI systems, Agent evals, Safety infrastructure

Featured labs

Selected technical demos in silicon observability, failure isolation, and AI infrastructure economics.

Shifting bits through an observable scan path
Live

Scan Chain / TAP Visualizer

Shows how serial debug paths expose internal state for controlled observability.

Launch demo
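
To make the scan-path idea concrete, here is a minimal sketch of a serial shift. It is an illustration only, not the code behind the demo: the names `scan_shift` and `unload`, the 4-bit chain, and the captured values are all hypothetical. It assumes a single chain where index 0 sits nearest scan-in and the last element drives scan-out.

```python
from typing import List, Tuple


def scan_shift(chain: List[int], scan_in: int) -> Tuple[List[int], int]:
    """Shift the chain one position toward scan-out.

    Returns the new chain contents and the bit that fell off the scan-out end.
    """
    scan_out = chain[-1]
    return [scan_in] + chain[:-1], scan_out


def unload(chain: List[int], fill: int = 0) -> Tuple[List[int], List[int]]:
    """Shift out every captured bit, refilling the chain with `fill`."""
    observed: List[int] = []
    for _ in range(len(chain)):
        chain, bit = scan_shift(chain, fill)
        observed.append(bit)
    return chain, observed


# Hypothetical captured flop values; index 0 is nearest scan-in.
captured = [1, 0, 1, 1]
_, observed = unload(captured)

# The flop nearest scan-out appears first, so reversing the observed
# stream recovers the internal state in chain order.
recovered = observed[::-1]
print(observed, recovered)
```

The point of the sketch is that internal state which has no dedicated output pin still becomes visible, one bit per shift, at a single serial pin.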
Pipeline stage lighting up under fault injection
Live

Fault Injection / Observability

Injects controlled failures and shows how instrumentation shortens the path to root cause.

Launch demo
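
As a rough illustration of why instrumentation shortens root-cause time, the sketch below runs a toy four-stage pipeline twice, once clean and once with a single injected bit flip, and compares per-stage traces. Everything here is an assumption made for the example: the stage functions, the `fault_at` parameter, and the bit-flip fault model are hypothetical and are not taken from the demo.

```python
from typing import Callable, List, Optional

Stage = Callable[[int], int]


def run_pipeline(stages: List[Stage], x: int,
                 fault_at: Optional[int] = None) -> List[int]:
    """Run x through each stage, recording every stage output (the "taps").

    If fault_at is set, flip the low bit of that stage's output.
    """
    trace: List[int] = []
    for i, stage in enumerate(stages):
        x = stage(x)
        if i == fault_at:
            x ^= 1  # injected fault: flip the least significant bit
        trace.append(x)
    return trace


def first_divergence(golden: List[int], observed: List[int]) -> Optional[int]:
    """Return the index of the first stage whose output differs, or None."""
    for i, (g, o) in enumerate(zip(golden, observed)):
        if g != o:
            return i
    return None


# Hypothetical stages for illustration.
stages: List[Stage] = [lambda v: v + 3, lambda v: v * 2,
                       lambda v: v - 1, lambda v: v * v]

golden = run_pipeline(stages, 5)
faulty = run_pipeline(stages, 5, fault_at=1)

# Without taps, only the final values are comparable: a mismatch, no location.
print("final mismatch:", golden[-1] != faulty[-1])
# With taps, the first diverging stage is identified directly.
print("first diverging stage:", first_divergence(golden, faulty))
```

Comparing only the final outputs says that something failed; comparing the per-stage traces says where it first went wrong, which is the difference the demo is meant to show.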
Useful-output curve moving with reliability and latency
Live

LCI Compute Economics Simulator

Explores useful AI cost under latency, reliability, quality, and safety constraints.

Launch demo

Current focus

Core practice and emerging research arcs.

"Core practice" means direct professional or published work. "Emerging research arcs" means active study, writing, and future work.

Core practice

Silicon observability and debug platforms

State capture, scan access, JTAG/TDI-based workflows, MBIST context, and platform readiness.

AI infrastructure economics

Useful output cost, latency, reliability, safety constraints, and infrastructure productivity.

Emerging research arcs

Autonomous agent observability and red-team evals

Failure modes, telemetry, containment, auditability, and cyberphysical risk evaluation.

Quantum-classical infrastructure

Control planes, accelerated classical co-processing, observability, and system integration bottlenecks.

Selected writing

Published work only.

Frontier AI Systems: Observability is an underbuilt safety primitive for frontier AI systems
Frontier AI Systems: Debuggability for autonomous agents
AI Infrastructure Economics: The Cost of Usable Intelligence

Contact

For conversations around AI hardware observability, frontier infrastructure, research evals, or high-leverage technical systems.

Email: adityamorey1723@gmail.com