The Production AI Reading List

A living, curated reading list for engineers shipping AI to production. Organized by pillar. New entries added as we read them.

8 entries · curated by The AI Runtime · last updated May 10, 2026

From the publication

Latest from The AI Runtime

Fresh issues from our own Substack — auto-pulled from the feed.

Issue The AI Runtime · May 8, 2026

A Portfolio That Practices MRE

Vishnu Purohitham’s four shipped projects are a worked example of Model Reliability Engineering — and a quiet critique of the typical AIfolio.

Issue The AI Runtime · May 6, 2026

Inside Mintlify’s Agent Stack

A teardown of the two-harness architecture — async sandboxes for writes, virtual filesystems for reads — and what it teaches about wrapping a model in production.

Issue The AI Runtime · May 2, 2026

How Vertical Agents Self-Improve in Production

Field notes on the harness loop at Harvey, Hippocratic, Anterior, and Azure SRE — where production failures compound into skill without retraining the model.

Start here

Highlights

agents Anthropic · Erik Schluntz, Barry Zhang · Dec 2024

Building Effective Agents

Anthropic's reference taxonomy of agent patterns — when to build a workflow vs. an agent, what each pattern is good for, and the failure modes of premature autonomy. The starting point for any production agent conversation.

agents Eugene Yan · Jul 2023

Patterns for Building LLM-based Systems & Products

Seven production patterns — evals, RAG, fine-tuning, caching, guardrails, defensive UX, feedback collection — with citations and rationale. The reference Eugene's been updating since 2023; the canonical "what does ship-to-prod look like" map.

evals Hamel Husain · May 2024

A Field Guide to Rapidly Improving AI Products

How to actually improve an AI product after the demo: build evals from real failures, cluster errors, iterate the prompts and the data. The closest thing the field has to a playbook for "we shipped, it kind of works, now what."

Evals & Observability
evals Hamel Husain · May 2024

A Field Guide to Rapidly Improving AI Products

How to actually improve an AI product after the demo: build evals from real failures, cluster errors, iterate the prompts and the data. The closest thing the field has to a playbook for "we shipped, it kind of works, now what."
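
The loop the guide describes — log failures, cluster them, turn the clusters into evals — can be sketched minimally. The `Failure` fields and the eval-case shape below are illustrative assumptions, not the guide's own schema:

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical failure-log entry; field names are illustrative.
@dataclass
class Failure:
    prompt: str
    output: str
    category: str  # hand-labeled during error analysis

def cluster_failures(failures):
    """Count failures per category so the biggest buckets get fixed first."""
    return Counter(f.category for f in failures).most_common()

def to_eval_cases(failures):
    """Turn each logged failure into a regression eval case."""
    return [{"input": f.prompt, "must_not_contain": f.output} for f in failures]

failures = [
    Failure("summarize the invoice", "I cannot help", "refusal"),
    Failure("extract the due date", "March 32", "hallucinated_date"),
    Failure("summarize the invoice", "As an AI...", "refusal"),
]
```

The point of the sketch is the ordering: `cluster_failures` surfaces the largest error bucket first, which is where the guide says your next iteration should go.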

Agents in Production
agents Anthropic · Erik Schluntz, Barry Zhang · Dec 2024

Building Effective Agents

Anthropic's reference taxonomy of agent patterns — when to build a workflow vs. an agent, what each pattern is good for, and the failure modes of premature autonomy. The starting point for any production agent conversation.
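
The workflow-vs-agent distinction the post draws fits in a few lines. `call_model` below is a canned stub standing in for any LLM call; the two control-flow shapes are the point, not the stub:

```python
# Stub for an LLM call; returns canned outputs for this sketch.
def call_model(prompt: str) -> str:
    canned = {
        "outline: topic": "1. intro 2. body",
        "draft: 1. intro 2. body": "full draft",
    }
    return canned.get(prompt, "done")

# Workflow: the control flow is fixed in code; the model fills in each step.
def workflow(topic: str) -> str:
    outline = call_model(f"outline: {topic}")
    return call_model(f"draft: {outline}")

# Agent: the model's own output decides the next step, bounded by a budget.
def agent(goal: str, max_steps: int = 5) -> str:
    state = goal
    for _ in range(max_steps):
        state = call_model(state)
        if state == "done":
            break
    return state
```

A workflow is debuggable because the path is in your code; an agent trades that for flexibility, which is why the post warns against reaching for autonomy prematurely.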

agents Eugene Yan · Jul 2023

Patterns for Building LLM-based Systems & Products

Seven production patterns — evals, RAG, fine-tuning, caching, guardrails, defensive UX, feedback collection — with citations and rationale. The reference Eugene's been updating since 2023; the canonical "what does ship-to-prod look like" map.
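
Two of the seven patterns — caching and guardrails — compress into a small sketch. `generate` and the blocklist below are illustrative stand-ins, not Eugene's implementation:

```python
import functools

# Stand-in for a real model call.
def generate(prompt: str) -> str:
    return f"answer to: {prompt}"

@functools.lru_cache(maxsize=1024)
def cached_generate(prompt: str) -> str:
    # Identical prompts hit the cache instead of paying for a second call.
    return generate(prompt)

BLOCKLIST = {"ssn", "password"}

def guarded_generate(prompt: str) -> str:
    # Input guardrail: refuse before spending tokens on a disallowed request.
    if any(term in prompt.lower() for term in BLOCKLIST):
        return "refused: disallowed content"
    return cached_generate(prompt)
```

The layering matters: guardrails run before the cache, so refused prompts never pollute it, and the cache runs before the model, so repeated traffic is free.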

agents Anthropic · Sep 2025

Effective Context Engineering for AI Agents

Practical guide to designing the context that agents see — how to build inputs that survive long-horizon tool use without devolving into prompt-stuffing. Closes the gap between "this works in the playground" and "this works in production with real users."
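
The core move — budget the context and keep only what earns its tokens — can be sketched roughly. The whitespace token count and the drop-oldest policy below are simplifying assumptions, not the post's recipe:

```python
# Crude token count: whitespace split, not a real tokenizer.
def count_tokens(text: str) -> int:
    return len(text.split())

def build_context(system: str, tool_results: list[str], budget: int) -> str:
    """Assemble a context under a token budget, newest tool results first."""
    parts = [system]
    used = count_tokens(system)
    # Walk tool results newest-to-oldest; stop once the budget is spent,
    # so the oldest results are the ones that get dropped.
    for result in reversed(tool_results):
        cost = count_tokens(result)
        if used + cost > budget:
            break
        parts.insert(1, result)  # keep chronological order in the output
        used += cost
    return "\n".join(parts)
```

Making the budget explicit is the whole trick: the failure mode it prevents is the playground habit of appending every tool result until long-horizon runs drown the prompt.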

agents Lil'Log · Lilian Weng · Jun 2023

LLM-Powered Autonomous Agents

The canonical survey of agent design — planning, memory, tool use, reflection. Older than most "agent" posts you'll see but still the reference frame everyone draws from. Read this if you've ever wondered why your agent loops forever.
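
On "why your agent loops forever": without a step budget or a progress check, a model that repeats the same action never terminates. A minimal guard, with `step_fn` as a hypothetical policy stand-in:

```python
def run_agent(step_fn, max_steps: int = 10):
    """Run an agent policy with two termination guards: a step budget
    and a stall check for repeated actions."""
    history = []
    for _ in range(max_steps):
        action = step_fn(history)
        if action == "finish":
            return history, "done"
        if history and action == history[-1]:
            # Same action twice in a row: likely stuck, bail out.
            return history, "stalled"
        history.append(action)
    return history, "budget_exhausted"
```

This is the crudest version of the reflection idea in the survey: the agent's own trajectory is inspected to decide whether continuing is worthwhile.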

agents Anthropic · Nov 2024

Introducing the Model Context Protocol

MCP — the open protocol Anthropic shipped for connecting agents to tools and data sources. Read this before designing any agent that needs to act on systems it doesn't own. Especially relevant here: how agents authenticate and enumerate capabilities at endpoints they've never seen.

Inference & Serving
inference Chip Huyen · Apr 2023

Building LLM Applications for Production

Chip Huyen's early but durable map of the production LLM stack — what changes when you move from a notebook to real users. Cost, latency, hallucination, evals, drift. The first read for anyone moving from research to product.

inference Simon Willison · Dec 2024

Things we learned about LLMs in 2024

Simon's yearly LLM survey — the best-spent 30 minutes of reading before designing anything new. Tools matured. Agents didn't (yet). Per-token costs cratered. Required reading for grounding intuition.

Subscribe

Get the Reading Roundup

Every few weeks, we ship a Reading Roundup issue — the best new items added to the library, with editorial notes. No vendor pitches, no idea-stage tourists.