The Big Idea
In June 2025, Shopify CEO Tobi Lütke tweeted a definition that instantly went viral: context engineering is "the art of providing all the context for the task to be plausibly solvable by the LLM." Two days later, Andrej Karpathy amplified it - calling context engineering "the delicate art and science of filling the context window with just the right information for the next step." Within weeks, the term replaced prompt engineering in practitioner discourse. The shift is not cosmetic. Prompt engineering optimized a single input string. Context engineering designs entire dynamic systems that assemble instructions, retrieved knowledge, tool outputs, conversation history, and persistent memory into the context window at runtime. As AI moves from chatbots to autonomous agents, the bottleneck is no longer what you ask - it is what information is available when you ask it.
Before vs After
Prompt engineering treated the LLM interaction as a writing exercise - craft the perfect sentence, add the right examples, iterate on phrasing. Context engineering treats it as systems engineering - design the infrastructure that dynamically populates the context window with everything the model needs to succeed.
Prompt Engineering (2022-2024)
- Craft a clever single query or instruction
- Static text optimization - same prompt every time
- Focus on phrasing, word choice, few-shot examples
- One interaction at a time, no state
- "Find the magic sentence that makes GPT do the thing"
- Skills needed: writing, trial-and-error
Context Engineering (2025+)
- Design dynamic systems that populate the context window
- Runtime assembly - different context per task, user, state
- Focus on RAG, tools, memory, state management, compression
- Manage full agent lifecycle across multiple turns
- "Build the operating system that feeds the LLM the right data"
- Skills needed: systems design, retrieval, infrastructure
How It Works
Simon Willison framed the naming problem clearly: "prompt engineering" acquired an unfortunate inferred definition - "typing into a chatbot." The real work was always more complex, but the name undersold it. "Context engineering" sticks because its inferred meaning matches the actual complexity of what practitioners do. Philipp Schmid from Hugging Face formalized the definition: context engineering is "the discipline of designing and building dynamic systems that provide the right information and tools, in the right format, at the right time."
LangChain formalized four core strategies in their July 2025 framework. Write - persist information outside the active context (scratchpads, long-term memory stores). Select - strategically retrieve only what is relevant (semantic search over tools improved selection accuracy 3x). Compress - retain only essential tokens (Claude Code auto-compacts after 95% window usage). Isolate - split work across separate context windows via multi-agent architectures, though Anthropic found this can consume up to 15x more tokens than single-agent approaches.
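The Write and Compress strategies can be sketched in a few lines. The class below is illustrative, not any real framework's API: token counts are approximated by word counts, and the bracketed summary is a stub standing in for recursive LLM summarization like Claude Code's auto-compaction.

```python
def count_tokens(text: str) -> int:
    # crude word-count proxy for a real tokenizer
    return len(text.split())

class ContextWindow:
    """Toy context manager: compacts when usage crosses a threshold."""

    def __init__(self, max_tokens: int, compact_at: float = 0.95):
        self.max_tokens = max_tokens
        self.compact_at = compact_at
        self.messages: list[str] = []
        self.archive: list[str] = []  # Write: persistent store outside the window

    def used(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if self.used() >= self.compact_at * self.max_tokens:
            self._compact()

    def _compact(self) -> None:
        # Write: move older messages to external storage.
        # Compress: keep only the latest message plus a summary stub.
        older, self.messages = self.messages[:-1], self.messages[-1:]
        self.archive.extend(older)
        summary = f"[summary of {len(older)} earlier messages]"
        self.messages.insert(0, summary)
```

A real implementation would replace the stub with an LLM summarization call and a durable store, but the control flow - monitor utilization, persist, compress - is the same.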
The O'Reilly analogy from Addy Osmani captures it precisely: treat the LLM as a CPU and its context window as RAM. The context engineer functions as an operating system - loading the right programs, managing memory, scheduling I/O. A CPU with the wrong data in RAM produces garbage regardless of its processing power. Same with LLMs.
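The analogy maps directly to code: given a fixed token budget (the "RAM"), an assembler decides which candidate context parts get loaded, in priority order, and drops the rest. A minimal sketch, using a word-count stand-in for a real tokenizer and hypothetical example data:

```python
def tokens(text: str) -> int:
    # crude word-count proxy for a real tokenizer
    return len(text.split())

def assemble_context(parts: list[tuple[int, str]], budget: int) -> str:
    """Fill the fixed token budget with the highest-priority parts,
    like an OS deciding which pages get loaded into RAM."""
    chosen, used = [], 0
    for _, text in sorted(parts, key=lambda p: p[0]):
        cost = tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return "\n\n".join(chosen)
```

In production the priorities themselves come from retrieval scores, recency, and task state - which is exactly where the engineering effort moves under this paradigm.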
Key Findings
- Semantic search over tool descriptions improved selection accuracy 3x. When agents have access to dozens of tools, simply listing them all in the prompt fails. Context engineering selects and surfaces only relevant tool definitions per task - a retrieval problem, not a prompting problem.
- Multi-agent architectures consume up to 15x more tokens. Anthropic found that isolating context across multiple agents dramatically increases cost. The tradeoff: better focus per agent vs. massive token overhead. Context engineers must decide when isolation is worth the cost.
- Claude Code auto-compacts at 95% context utilization. When the context window fills, the system triggers recursive summarization of older messages. This is context engineering in production - managing finite memory under real constraints, not crafting a better opening line.
- Anthropic's multi-agent researcher persists plans to memory at 200K+ tokens. Before the context window truncates, the system explicitly writes its working plan to external storage. This is the Write strategy - engineering around context window limitations rather than hoping the prompt is good enough.
- Four failure modes identified by Drew Breunig: Context Poisoning (hallucinations entering the window), Context Distraction (too much context overwhelming the training signal), Context Confusion (superfluous content influencing outputs), and Context Clash (conflicting information within the same window).
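The first finding - selecting tools by semantic relevance instead of listing all of them - can be sketched as a retrieval step. Jaccard word overlap stands in for real embedding similarity, and the tool registry below is hypothetical:

```python
def similarity(a: str, b: str) -> float:
    # Jaccard word overlap as a stand-in for embedding cosine similarity
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

# Hypothetical tool registry: name -> natural-language description
TOOLS = {
    "send_email": "send an email message to a recipient",
    "query_db": "run a sql query against the customer database",
    "resize_image": "resize or crop an image file",
}

def select_tools(task: str, k: int = 1) -> list[str]:
    """Surface only the k most relevant tool definitions for this task."""
    ranked = sorted(TOOLS, key=lambda t: similarity(task, TOOLS[t]), reverse=True)
    return ranked[:k]
```

With dozens of tools, only the top-k selected definitions enter the prompt; the rest never consume context, which is what makes this a retrieval problem rather than a prompting problem.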
Why This Matters for AI and Automation Practitioners
If you are building AI agents, RAG pipelines, or any system where an LLM does real work - this shift redefines your job description. You are no longer a prompt writer. You are a context architect. The difference between a demo that impresses and a system that works in production is almost never the prompt. It is whether the model had access to the right customer data, the right tool definitions, the right conversation history, and the right constraints when it generated its response.
For automation practitioners specifically, context engineering is the bridge between "AI chatbot" and "AI that actually does things." Every n8n workflow that feeds data into an LLM node, every RAG pipeline that retrieves documents before generation, every agent that calls tools - these are all context engineering. The discipline gives a name and framework to what practitioners were already doing, and provides systematic patterns (Write, Select, Compress, Isolate) for doing it better.
My Take
The naming shift matters more than it looks. "Prompt engineering" attracted writers and marketers. "Context engineering" attracts systems engineers and infrastructure builders. The second group is who actually ships production AI systems. The term change filters the talent pipeline toward the right skill set - people who think in pipelines, retrieval strategies, and memory management rather than linguistic tricks.
That said, prompting skill does not disappear inside context engineering - it becomes one component among seven. The system prompt still matters. Few-shot examples still help. But they are 15% of the problem now, not 90%. The other 85% is infrastructure: what gets retrieved, when it gets retrieved, how it gets compressed, and whether the model has the right tools available at the moment of generation. If you are still spending most of your time iterating on prompt wording rather than building retrieval pipelines and memory systems, you are optimizing the wrong layer.
The practitioners who will excel in this paradigm are those who already think in systems - backend engineers, data engineers, platform builders. They have been building the plumbing that context engineering requires for decades. The domain knowledge transfers directly. What is new is that the "application" running on top of that plumbing is now an LLM rather than a deterministic program.
Discussion question: Context engineering requires infrastructure (vector stores, memory systems, tool registries, orchestration layers) that prompt engineering never needed. At what point does the infrastructure overhead of proper context engineering exceed the value it delivers - and how do you decide when a simple, well-crafted prompt is still the right answer?