When the creator of one of the world's most sophisticated coding agents reveals his secrets, the entire tech world stops to take notes. Boris Cherny, the head of Claude Code at Anthropic, recently shared his workflow, and the reaction was seismic. It wasn't just a collection of tips; it was a manifesto signaling the end of coding as we know it and the beginning of software engineering managed by an AI "fleet commander."
For the engineering community, this revelation validated a powerful, emerging paradigm: AI is no longer an incremental speed boost for typing. It is an entirely new operating system for labor itself. This workflow, surprisingly simple in setup yet revolutionary in output, suggests that a single, skilled engineer can now operate with the capacity of a mid-sized team. This development forces us to re-evaluate compute strategy, team structures, and the very definition of developer skill.
Traditional software development follows a linear "inner loop": write a bit of code, compile, test, debug, repeat. Cherny shatters this model completely. His approach feels less like programming and more like playing a real-time strategy game such as StarCraft, where the player manages autonomous units.
Cherny revealed he runs **five Claude agents simultaneously** in his terminal. This is the heart of the multi-agent revolution. While one AI agent is busy running complex integration tests, another is refactoring old, messy code (a task humans dread), and a third might be drafting detailed technical documentation. By using system notifications (via tools like iTerm2), he ensures he only intervenes when an agent specifically requests input.
This mirrors the emerging concept of **Multi-Agent Systems (MAS)** in AI research. Instead of relying on one generalist model to handle every step, modern systems delegate tasks to specialized 'sub-agents.' This is validated by broader industry interest in frameworks designed for MAS orchestration. For enterprise technology leaders, this proves that the next major productivity leap isn't about building a single, vastly more powerful monolithic AI, but about mastering the art of orchestration—directing an army of competent agents effectively.
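The orchestration pattern itself, independent of any specific Anthropic tooling, can be sketched in a few lines. The "agents" below are simulated placeholders standing in for the real interactive terminal sessions:

```python
import asyncio

# Hypothetical sketch: each "agent" is a coroutine specializing in one task.
# Real Claude Code agents are interactive terminal sessions; this only
# illustrates the delegate-and-gather orchestration pattern.

async def run_agent(name: str, task: str, seconds: float) -> str:
    """Simulate a specialized agent working on its assigned task."""
    await asyncio.sleep(seconds)  # stands in for test runs, refactors, etc.
    return f"{name} finished: {task}"

async def orchestrate() -> list[str]:
    # Delegate to specialized sub-agents and let them run concurrently,
    # mirroring the five-terminal setup described above.
    agents = [
        run_agent("agent-1", "run integration tests", 0.03),
        run_agent("agent-2", "refactor legacy module", 0.02),
        run_agent("agent-3", "draft API documentation", 0.01),
    ]
    return await asyncio.gather(*agents)

results = asyncio.run(orchestrate())
for line in results:
    print(line)
```

The key design choice is that the human only sees the gathered results, not each agent's intermediate steps, which is exactly the role the system notifications play in Cherny's setup.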
Furthermore, Cherny mixes terminal agents with web-based sessions, using a "teleport" command to seamlessly move tasks between local control and the browser interface. This hybrid approach maximizes flexibility, proving that the best workflow utilizes every available interface strategically.
In a world obsessed with low latency—getting code completions back in milliseconds—Cherny makes a counterintuitive choice: he exclusively uses Anthropic’s largest, most thoughtful model, Opus 4.5, even though it is slower than smaller versions like Sonnet.
His logic is profound and carries massive economic implications: The true bottleneck in AI development is not token generation speed; it is human time spent correcting the AI’s errors.
By choosing the "smarter" model, Cherny willingly pays a higher upfront "compute tax" per token generated. The investment pays off because the smarter model requires significantly less steering and makes fewer foundational mistakes. The time saved debugging or rewriting subpar code, the dreaded "correction tax," far outweighs the fractional difference in raw generation speed. For CTOs, this suggests a clear pivot: stop prioritizing inference speed for complex tasks and start prioritizing reasoning quality. A slightly slower output that is 95% correct is far faster in practice than a rapid output that is only 70% correct and requires constant human oversight.
This finding aligns with broader discussions in AI circles concerning the effectiveness of large models in complex reasoning. Studies often show that while smaller models offer speed for simple classification, tasks requiring multi-step logic, planning, and deep contextual understanding—like substantial software refactoring—only truly unlock efficiency at the highest tiers of model capability. The engineer becomes an auditor, not a perpetual proofreader.
One persistent frustration with LLMs is their short-term memory: every new session starts largely from scratch with respect to your company's unique coding styles, design patterns, and past mistakes. Cherny's team solved this with radical simplicity: a single, shared file named `CLAUDE.md` committed directly into their version control system (Git).
This file serves as the AI's evolving constitution. Anytime a human spots an error made by Claude, they fix the code and add an explicit instruction or correction rule to `CLAUDE.md`. This transforms the codebase into a self-correcting organism. The longer the team uses this system, the smarter the AI becomes at adhering to specific, proprietary standards.
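The article does not reproduce the team's actual file, but a hypothetical `CLAUDE.md` following the "every mistake becomes a rule" pattern might look like this (all rules below are invented for illustration):

```markdown
# CLAUDE.md — conventions the agent must follow

## Style rules
- Prefer small, pure functions; avoid classes for stateless helpers.
- Handle all timestamps in UTC; never use local time.

## Corrections (every mistake becomes a rule)
- Do not reformat generated SQL; our migration tooling is whitespace-sensitive.
- Run the full test suite before declaring any task complete.
```

Because the file lives in Git, every correction is reviewed, versioned, and shared across the whole team rather than trapped in one engineer's chat history.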
This pattern strongly suggests the future of enterprise AI integration will heavily rely on specialized RAG (Retrieval-Augmented Generation) techniques that connect the LLM directly to a verified, evolving repository of organizational truth. It moves the AI from being a general-purpose tool to becoming a specialized, domain-aware team member. As one observer noted, "Every mistake becomes a rule."
The final pillar of Cherny's hyper-productivity is the automation of all bureaucratic and repetitive tasks. He doesn't just use the AI to write logic; he uses it to manage the development process itself.
This level of automation shows that the value isn't just in generating novel code but in eliminating the "glue work" that consumes developer bandwidth. This shift towards specialized agents echoes research into building robust AI frameworks where different modules handle planning, execution, and verification independently.
Perhaps the most crucial unlock, and likely the source of Claude Code’s rapid reported revenue growth, is the **verification loop**. An AI that can write code is valuable; an AI that can test its own code and confirm the user experience is excellent is transformative.
Cherny confirmed that Claude tests every change it lands, often using the Claude Chrome extension to automate browser actions, run UI tests, and iterate until the result meets both functional and aesthetic standards. This creates a closed-loop quality assurance process handled entirely by the AI.
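The closed loop can be sketched generically. Everything here is hypothetical: `verified_change`, the stubbed generator, and the round budget stand in for the real browser-driven checks the article describes:

```python
from typing import Callable

# Hypothetical sketch of a verify-and-iterate cycle. In the real workflow,
# "verify" would drive browser actions and UI tests; here it is a callable.

def verified_change(
    generate: Callable[[str], str],
    verify: Callable[[str], bool],
    feedback: str = "initial task",
    max_rounds: int = 5,
) -> tuple[str, bool]:
    """Regenerate until verification passes or the round budget runs out."""
    attempt = ""
    for round_no in range(max_rounds):
        attempt = generate(feedback)
        if verify(attempt):  # e.g. run UI tests, lint, visual checks
            return attempt, True
        # Failed checks become the next prompt, closing the loop.
        feedback = f"retry {round_no}: verification failed for {attempt!r}"
    return attempt, False

# Stub generator that succeeds on its third attempt.
calls = {"n": 0}
def fake_generate(prompt: str) -> str:
    calls["n"] += 1
    return f"patch-v{calls['n']}"

result, ok = verified_change(fake_generate, lambda patch: patch == "patch-v3")
print(result, ok)  # patch-v3 True
```

The essential point is that failure output is fed back into the next generation round automatically, so the human only sees work that has already passed its own checks.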
This move from *code generation* to *code validation* is the real game-changer. It dramatically reduces the typical friction point between AI output and production readiness. When the AI proves its own work, the human engineer is elevated to a system designer and validator, increasing throughput by a factor of two or three, as Cherny suggests.
The collective astonishment from Silicon Valley is not just about a clever hack; it's about recognizing a fundamental reorganization of work. For years, AI coding tools offered better autocomplete—faster typing. Cherny’s workflow repositions the technology as a genuine management layer, an Operating System for Labor.
The implication for the future of AI is clear: the battleground is shifting from model raw intelligence benchmarks to orchestration frameworks. Who can build the most efficient system for coordinating multiple LLM instances, feeding them institutional knowledge, and giving them the tools (like web automation or bash access) to verify their output?
The programmers who embrace this "fleet commander" mindset, trading linear typing for directing a command structure, won't just be slightly faster; they will fundamentally be playing a different game. They will be leveraging tools that multiply their output by five, leaving behind those still treating AI as merely a slightly better autocomplete assistant.
This is not the future arriving slowly; it is a live demonstration of an exponential productivity curve already underway, driven by superior workflow architecture rather than just raw model size.
Source context derived from analysis of the workflow shared by Boris Cherny regarding Claude Code productivity.