The Production AI Reading List

A living, curated reading list for engineers shipping AI to production. Organized by pillar. New entries added as we read them.

8 entries · curated by The AI Runtime · last updated May 10, 2026

From the publication

Latest from The AI Runtime

Fresh issues from our own Substack — auto-pulled from the feed.

Issue The AI Runtime · May 8, 2026

A Portfolio That Practices MRE

Vishnu Purohitham’s four shipped projects are a worked example of Model Reliability Engineering — and a quiet critique of the typical AIfolio.

Issue The AI Runtime · May 6, 2026

Inside Mintlify’s Agent Stack

A teardown of the two-harness architecture — async sandboxes for writes, virtual filesystems for reads — and what it teaches about wrapping a model in production.

Issue The AI Runtime · May 2, 2026

How Vertical Agents Self-Improve in Production

Field notes on the harness loop at Harvey, Hippocratic, Anterior, and Azure SRE — where production failures compound into skill without retraining the model.

Start here

Highlights

agents Anthropic · Erik Schluntz, Barry Zhang · Dec 2024

Building Effective Agents

Anthropic's reference taxonomy of agent patterns — when to build a workflow vs. an agent, what each pattern is good for, and the failure modes of premature autonomy. The starting point for any production agent conversation.

agents Eugene Yan · Jul 2023

Patterns for Building LLM-based Systems & Products

Seven production patterns — evals, RAG, fine-tuning, caching, guardrails, defensive UX, feedback collection — with citations and rationale. The reference Eugene's been updating since 2023; the canonical "what does ship-to-prod look like" map.

evals Hamel Husain · May 2024

A Field Guide to Rapidly Improving AI Products

How to actually improve an AI product after the demo: build evals from real failures, cluster errors, iterate the prompts and the data. The closest thing the field has to a playbook for "we shipped, it kind of works, now what."

Evals & Observability
evals Hamel Husain · May 2024

A Field Guide to Rapidly Improving AI Products

How to actually improve an AI product after the demo: build evals from real failures, cluster errors, iterate the prompts and the data. The closest thing the field has to a playbook for "we shipped, it kind of works, now what."
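
The loop the guide describes — log failures, cluster them, turn the clusters into evals — can be sketched minimally. The `Failure` fields and the eval-case shape below are illustrative assumptions, not the guide's own schema:

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical failure-log entry; field names are illustrative.
@dataclass
class Failure:
    prompt: str
    output: str
    category: str  # hand-labeled during error analysis

def cluster_failures(failures):
    """Count failures per category so the biggest buckets get fixed first."""
    return Counter(f.category for f in failures).most_common()

def to_eval_cases(failures):
    """Turn each logged failure into a regression eval case."""
    return [{"input": f.prompt, "must_not_contain": f.output} for f in failures]

failures = [
    Failure("summarize the invoice", "I cannot help", "refusal"),
    Failure("extract the due date", "March 32", "hallucinated_date"),
    Failure("summarize the invoice", "As an AI...", "refusal"),
]
```

The point of the sketch is the ordering: `cluster_failures` surfaces the largest error bucket first, which is where the guide says your next iteration should go.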

Agents in Production
agents Anthropic · Erik Schluntz, Barry Zhang · Dec 2024

Building Effective Agents

Anthropic's reference taxonomy of agent patterns — when to build a workflow vs. an agent, what each pattern is good for, and the failure modes of premature autonomy. The starting point for any production agent conversation.
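
The workflow-vs-agent distinction the post draws fits in a few lines. `call_model` below is a canned stub standing in for any LLM call; the two control-flow shapes are the point, not the stub:

```python
# Stub for an LLM call; returns canned outputs for this sketch.
def call_model(prompt: str) -> str:
    canned = {
        "outline: topic": "1. intro 2. body",
        "draft: 1. intro 2. body": "full draft",
    }
    return canned.get(prompt, "done")

# Workflow: the control flow is fixed in code; the model fills in each step.
def workflow(topic: str) -> str:
    outline = call_model(f"outline: {topic}")
    return call_model(f"draft: {outline}")

# Agent: the model's own output decides the next step, bounded by a budget.
def agent(goal: str, max_steps: int = 5) -> str:
    state = goal
    for _ in range(max_steps):
        state = call_model(state)
        if state == "done":
            break
    return state
```

A workflow is debuggable because the path is in your code; an agent trades that for flexibility, which is why the post warns against reaching for autonomy prematurely.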

agents Eugene Yan · Jul 2023

Patterns for Building LLM-based Systems & Products

Seven production patterns — evals, RAG, fine-tuning, caching, guardrails, defensive UX, feedback collection — with citations and rationale. The reference Eugene's been updating since 2023; the canonical "what does ship-to-prod look like" map.
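
Two of the seven patterns — caching and guardrails — compress into a small sketch. `generate` and the blocklist below are illustrative stand-ins, not Eugene's implementation:

```python
import functools

# Stand-in for a real model call.
def generate(prompt: str) -> str:
    return f"answer to: {prompt}"

@functools.lru_cache(maxsize=1024)
def cached_generate(prompt: str) -> str:
    # Identical prompts hit the cache instead of paying for a second call.
    return generate(prompt)

BLOCKLIST = {"ssn", "password"}

def guarded_generate(prompt: str) -> str:
    # Input guardrail: refuse before spending tokens on a disallowed request.
    if any(term in prompt.lower() for term in BLOCKLIST):
        return "refused: disallowed content"
    return cached_generate(prompt)
```

The layering matters: guardrails run before the cache, so refused prompts never pollute it, and the cache runs before the model, so repeated traffic is free.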

agents Anthropic · Sep 2025

Effective Context Engineering for AI Agents

Practical guide to designing the context that agents see — how to build inputs that survive long-horizon tool use without devolving into prompt-stuffing. Closes the gap between "this works in the playground" and "this works in production with real users."
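
The core move — budget the context and keep only what earns its tokens — can be sketched roughly. The whitespace token count and the drop-oldest policy below are simplifying assumptions, not the post's recipe:

```python
# Crude token count: whitespace split, not a real tokenizer.
def count_tokens(text: str) -> int:
    return len(text.split())

def build_context(system: str, tool_results: list[str], budget: int) -> str:
    """Assemble a context under a token budget, newest tool results first."""
    parts = [system]
    used = count_tokens(system)
    # Walk tool results newest-to-oldest; stop once the budget is spent,
    # so the oldest results are the ones that get dropped.
    for result in reversed(tool_results):
        cost = count_tokens(result)
        if used + cost > budget:
            break
        parts.insert(1, result)  # keep chronological order in the output
        used += cost
    return "\n".join(parts)
```

Making the budget explicit is the whole trick: the failure mode it prevents is the playground habit of appending every tool result until long-horizon runs drown the prompt.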

agents Lil'Log · Lilian Weng · Jun 2023

LLM-Powered Autonomous Agents

The canonical survey of agent design — planning, memory, tool use, reflection. Older than most "agent" posts you'll see but still the reference frame everyone draws from. Read this if you've ever wondered why your agent loops forever.
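
On "why your agent loops forever": without a step budget or a progress check, a model that repeats the same action never terminates. A minimal guard, with `step_fn` as a hypothetical policy stand-in:

```python
def run_agent(step_fn, max_steps: int = 10):
    """Run an agent policy with two termination guards: a step budget
    and a stall check for repeated actions."""
    history = []
    for _ in range(max_steps):
        action = step_fn(history)
        if action == "finish":
            return history, "done"
        if history and action == history[-1]:
            # Same action twice in a row: likely stuck, bail out.
            return history, "stalled"
        history.append(action)
    return history, "budget_exhausted"
```

This is the crudest version of the reflection idea in the survey: the agent's own trajectory is inspected to decide whether continuing is worthwhile.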

agents Anthropic · Nov 2024

Introducing the Model Context Protocol

MCP — the open protocol Anthropic shipped for connecting agents to tools and data sources. Read this before designing any agent that needs to act on systems it doesn't own. Especially relevant here: how agents authenticate and enumerate capabilities at endpoints they've never seen.

Inference & Serving
inference Chip Huyen · Apr 2023

Building LLM Applications for Production

Chip Huyen's early but durable map of the production LLM stack — what changes when you move from a notebook to real users. Cost, latency, hallucination, evals, drift. The first read for anyone moving from research to product.

inference Simon Willison · Dec 2024

Things we learned about LLMs in 2024

Simon's yearly LLM survey — the best-spent 30 minutes of reading before designing anything new. Tools matured. Agents didn't (yet). Per-token costs cratered. Required reading for grounding intuition.

Subscribe

Get the Reading Roundup

Every few weeks, we ship a Reading Roundup issue — the best new items added to the library, with editorial notes. No vendor pitches, no idea-stage tourists.