The Future of OpenClaw and Agents
I was honored to be invited to give a talk titled “The Future of OpenClaw and Agents” at the Zhongguancun Lobster Contest, and to serve as a judge for the competition.
View Slides (HTML), Download PDF Version
Not a single word in these slides was written by me: they were generated entirely by an AI Agent from existing content on my blog, and I didn’t change a single character. I asked it to pull a few of the most important contrarian viewpoints from the blog and assemble them into an 8-minute lightning talk. This neatly confirms the talk’s own claim that “context is humanity’s moat”: my blog is public, and most of the ideas in it are not originally mine, yet many people genuinely do not know these things.
Below is the full content of the talk.
- Three steps: Chatbot → Specialized Agent → General Agent
- LLMs are the new operating system
- Why is OpenClaw important?
- OpenClaw’s memory architecture: why Markdown instead of a database?
- Contrarian 1: AI software development, from labor-intensive to creativity-intensive
- Contrarian 2: Agents are a user group ten times larger than humans
- Contrarian 3: Context is humanity’s moat
- Contrarian 4: Moravec’s Paradox
- Moltbook: 1.5 million Agents spontaneously forming a civilization
- The great reversal: division of labor between the digital and physical worlds
Three Steps: Chatbot → Specialized Agent → General Agent
Chatbot: can talk but can’t act. ChatGPT web version—you input a question; it outputs text. It has knowledge and reasoning, but no ability to take action.
Specialized Agent: can act, but is only good at one thing. Cursor / Claude Code—can read/write files and execute code, but you have to sit at the computer and direct it.
General Agent: controls the whole computer for you. OpenClaw / Manus—Deep Research + Computer Use + Coding combined. You can direct it just by sending messages from your phone.
Contrarian viewpoint: The core of a general Agent is not Computer Use—it’s the Coding Agent. All efficient content generation ultimately happens through code: PPT = ZIP + XML code, Word = JS code generation, several orders of magnitude faster than GUI operations.
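The “PPT = ZIP + XML” point is easy to verify with nothing but the standard library: a .pptx file is a plain ZIP archive whose entries are XML parts. The slide XML below is a minimal placeholder to show the container format, not a complete, valid OOXML part.

```python
import zipfile

# Placeholder slide XML: illustrates the container format only,
# not a valid PowerPoint slide part.
slide_xml = '<p:sld xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"/>'

# Writing a "presentation" is just writing XML entries into a ZIP archive.
with zipfile.ZipFile("demo.pptx", "w") as z:
    z.writestr("ppt/slides/slide1.xml", slide_xml)

# Reading it back with plain ZIP tooling confirms the structure.
with zipfile.ZipFile("demo.pptx") as z:
    print(z.namelist())  # ['ppt/slides/slide1.xml']
```

This is why code generation beats GUI automation for content like slides: the output format is itself just files of code.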
LLMs Are the New Operating System
On a traditional OS, the single most important “application” is the large model. All other apps = Agents built on top of the model’s context.
A New Software Paradigm
(Slide diagram: the new software stack drawn as layers, starting from the hardware layer at the bottom.)
An extreme but possible future: the operating system no longer needs a GUI—just an Agent + a terminal = a complete OS. Some companies are also trying to let AI dynamically generate graphical interfaces.
The Astonishing Drop in Inference Cost
Since ChatGPT launched (~3 years ago), the cost of inference at the same intelligence level has dropped by ~100x. Roughly every six months, the cost is cut in half.
In another 3 years: an ordinary phone might have enough on-device compute to run models at today’s level.
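As a quick sanity check, the two cost figures above are roughly consistent with each other:

```python
import math

# Halving every six months gives six halvings in three years:
factor = 2 ** (3 * 12 // 6)
print(factor)  # 64, the same order of magnitude as the ~100x figure

# Conversely, a clean 100x drop over 36 months implies a halving period of:
halving_months = 36 / math.log2(100)
print(round(halving_months, 1))  # ~5.4 months
```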
The Return of Personal Computing: A Pendulum Effect
| On-device advantages | Explanation |
|---|---|
| Low latency | Local inference needs no network round trip |
| No network required | Available anytime, anywhere |
| Near-zero marginal cost | Use the compute your device already has |
| Privacy and confidentiality | Data never leaves the device |
Model compute is on track to become like water and electricity—this will fundamentally change AI deployment patterns. The boundary of infra is shifting from the OS to LLM context management; that’s where the next UNIX may be born.
Why Is OpenClaw Important?
It’s not a product—it defines what a general Agent looks like.
Three Key Design Inspirations
Rich Connectors: 21 chat channels—Telegram, iMessage, WhatsApp. No need to sit at a computer; you can command the Agent by sending a message from your phone.
No-Session Design: Like chatting with a real person, you don’t have to think “which conversation should I ask this in.” The Agent remembers all historical interactions—something ChatGPT and Cursor can’t do.
Skills + CLI Ecosystem: Community-contributed, plug-and-play capability extensions. Skills are essentially dynamically injected system prompts—telling the Agent what tools exist and how to use them.
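A minimal sketch of that idea, assuming a hypothetical file layout (the directory and skill names here are invented, not OpenClaw’s actual format): a skill is just a markdown file whose text gets concatenated into the system prompt at startup.

```python
from pathlib import Path

def build_system_prompt(base_prompt: str, skills_dir: str) -> str:
    """Concatenate every skill file into the system prompt.

    Each skill is plain markdown describing a tool and how to invoke it;
    "injecting" a skill just means prepending its text to the conversation.
    """
    sections = [base_prompt]
    for skill_file in sorted(Path(skills_dir).glob("*.md")):
        sections.append(f"## Skill: {skill_file.stem}\n{skill_file.read_text()}")
    return "\n\n".join(sections)

# Hypothetical skill file, invented for this example:
Path("skills").mkdir(exist_ok=True)
Path("skills/weather.md").write_text("Run `curl wttr.in/<city>` to fetch the weather.")

prompt = build_system_prompt("You are a helpful agent.", "skills")
print("## Skill: weather" in prompt)  # True
```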
OpenClaw for Agents = Linux for Operating Systems
It defines the paradigm and technical direction, but most end users will use commercial products (much as Android builds on the Linux kernel yet faces consumers).
The Essence of Agents: A Formula
Agent = Model + Context + Tools/Action Space
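One way to read the formula as a loop (a toy sketch with a stubbed model and one invented tool, not any particular framework’s API): the model sees the context, picks an action from the tool space, and the action’s result is appended back into the context.

```python
from typing import Callable

def run_agent(model: Callable[[list[str]], str],
              tools: dict[str, Callable[[str], str]],
              context: list[str],
              max_steps: int = 5) -> list[str]:
    """Agent = Model + Context + Tools/Action Space, run as a loop."""
    for _ in range(max_steps):
        action = model(context)             # e.g. "echo: hello" or "done"
        if action == "done":
            break
        name, _, arg = action.partition(": ")
        context.append(tools[name](arg))    # tool output feeds back into context
    return context

# Stubbed model and a single tool, invented here just to show the loop's shape.
script = iter(["echo: hello", "done"])
result = run_agent(lambda ctx: next(script),
                   {"echo": lambda arg: f"tool output: {arg}"},
                   ["user: say hello"])
print(result[-1])  # tool output: hello
```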
Fundamental limitation: The personal assistant paradigm—one person, multiple agents. It does not support multi-user collaboration and is inherently not an enterprise-grade product. 99% of people will ultimately use enterprise Agents, just like 99% of people use banks instead of crypto wallets to store money.
OpenClaw’s Memory Architecture: Why Markdown Instead of a Database?
A counterintuitive but extremely effective choice—transparent, editable, and Git-traceable.
Structured File Storage
- MEMORY.md — Core facts and user preferences (long-term memory)
- memory/2026-03-21.md — Daily interaction logs archived by date
- AGENTS.md — The Agent’s reflections on its own capabilities
Why Is Markdown Better Than a Vector Database?
Transparent and editable: Open the file to see what the AI remembers; if it’s wrong, just delete that line—a vector database can’t do this.
Temporal linearity: Archived by date, so the AI knows “what we talked about yesterday”—vector search often loses temporal context.
Git version control: Every change to memory can be traced and rolled back—something vector databases cannot do at all.
Hybrid Search: The Best of Both Worlds
Although Markdown is used for storage, retrieval uses a combination of vector search and keyword search:
```
finalScore = 0.7 × vectorScore + 0.3 × keywordScore
```
Semantic matching (cosine similarity) and keyword matching (BM25) complement each other, with a 7:3 weight ratio.
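The same blend in code, with toy relevance scores invented for illustration (a real system would compute cosine similarity over embeddings and BM25 over terms):

```python
def hybrid_score(vector_score: float, keyword_score: float) -> float:
    """Blend semantic and keyword relevance with the 7:3 weighting."""
    return 0.7 * vector_score + 0.3 * keyword_score

# Toy scores: the two signals compensate for each other's blind spots.
docs = {
    "design notes":  (0.9, 0.2),   # strong semantic match, weak keyword overlap
    "exact keyword": (0.4, 1.0),   # weak semantic match, exact keyword hit
}
ranked = sorted(docs, key=lambda d: hybrid_score(*docs[d]), reverse=True)
print(ranked)   # ['design notes', 'exact keyword']
```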
Context Compression: Infinite Conversations + Never Losing Information
When a conversation gets too long and is about to exceed the model’s context window:
- The Agent automatically summarizes the key points
- Key information is written to MEMORY.md (permanent memory)
- Detailed records are archived to memory/YYYY-MM-DD.md
Result: infinitely long conversations + never losing key information. This seemingly “low-tech” approach is, in practice, more reliable than carefully designed vector-database-based schemes.
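A minimal sketch of that compression step, with the LLM summarizer stubbed out; the message-count threshold and file layout here are illustrative assumptions, not OpenClaw’s actual implementation.

```python
import datetime
from pathlib import Path

def compress_context(messages: list[str], summarize, limit: int = 100) -> list[str]:
    """Archive and summarize the transcript once it outgrows `limit` messages.

    `summarize` stands in for an LLM summarization call.
    """
    if len(messages) <= limit:
        return messages
    # Detailed records are archived by date...
    archive = Path("memory") / f"{datetime.date.today().isoformat()}.md"
    archive.parent.mkdir(exist_ok=True)
    archive.write_text("\n".join(messages))
    # ...and the distilled key points are appended to permanent memory.
    summary = summarize(messages)
    with open("MEMORY.md", "a") as f:
        f.write(summary + "\n")
    # The live context restarts from the summary alone.
    return [f"(summary of earlier conversation) {summary}"]

messages = [f"msg {i}" for i in range(150)]
context = compress_context(messages,
                           summarize=lambda m: f"{len(m)} messages about the project")
print(context)  # ['(summary of earlier conversation) 150 messages about the project']
```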
Contrarian 1: AI Software Development, From Labor-Intensive to Creativity-Intensive
Collaboration itself will become unnecessary.
The Astonishing Output of One Person + AI
| Metric | Data |
|---|---|
| Daily code output | 40–50K lines |
| Daily token usage | 1.8 billion |
| Highest single-day Git commits | 1,374 commits |
The core insight of Brooks’s “The Mythical Man-Month”: adding people to a late project makes it later, because communication costs grow quadratically (n people need n(n−1)/2 communication channels).
AI removes this bottleneck—not because coordination gets better, but because coordination becomes unnecessary. One person with ideas + AI has almost zero information loss.
Three Types of People Who Will Be Valuable
Film-director type (0→1 creation): Defines product vision. The bottleneck is creative judgment.
City-planner type (1→100 architecture): Manages systems at scale. The bottleneck is architectural judgment.
F1-driver type (frontier research): Pushes the frontiers of AI. The bottleneck is scientific insight.
Commonality: the core capability is not “writing code” but judgment.
Contrarian 2: Agents Are a User Group Ten Times Larger Than Humans
GUI is dead; long live protocols.
GUI Is a Patch for Human Cognitive Limitations
- The beauty of Figma, the simplicity of Notion—essentially, they enable extremely bandwidth-limited humans to barely get tasks done
- This is an “interface tax”—a compensatory cost paid for human cognitive shortcomings
- Software companies that rely on user experience as their moat are all variants of attention companies
Agents Overturn the Whole Premise
Agents don’t need a GUI; they need:
- Information access: high-density data
- Execution rights: CLI / API / MCP
- Trustworthiness: identity and credit backing
- Compute: sustained reasoning resources
The Reversal of Software Competition
| Old logic | New logic |
|---|---|
| Build a closed space and make users come in | Expose yourself to the outside |
| Use experience to keep users | Stand in the Agent’s path |
| Products compete by looking good | Protocols compete by becoming the default standard |
Software as Protocols
In early 2026, major software companies such as Google Workspace, Salesforce, and Atlassian successively added support for CLI and MCP. These companies spent billions refining their GUIs, and are now actively bypassing them.
Contrarian 3: Context Is Humanity’s Moat
An insight from Jiayi Weng at OpenAI: just like models, the most important thing for people is context.
On the Importance of Context
“My work at OpenAI is actually not that hard; it doesn’t require very high intelligence. If you replaced me with someone else who had all my context, they could do it too.”
The biggest reason AI cannot replace humans in the short term is also context—the amount of context it can access within a company is far below that of a human employee.
The biggest issue in teamwork is inconsistent context. The eternal problem of human organizations is the difficulty of maintaining consistent context sharing.
If a model had infinite context, its biggest application would be as a CEO—solving the age‑old problem of inconsistent context sharing in organizations.
What AI Can vs. Cannot Replace
“The first to be replaced by AI will be researchers, then infra engineers; the hardest to replace will be sales.” The ability to convince real humans to pay is something AI currently cannot do.
People inside OpenAI tend to overestimate AI’s impact. When o1 came out, the estimate was that it would take one or two years to clean up the infra mess—still not possible today. Technology changes the world gradually.
What AI can replace are people who have no ideas of their own and only execute top‑down instructions. Those who hold tacit knowledge, historical context, and unspoken ideas are the ones who are irreplaceable. Humanity’s core advantage isn’t writing code, but judgment and mastery of context.
Contrarian 4: Moravec’s Paradox
Writing code, which is hard for humans, is done very quickly by AI; but operating a GUI, which is easy for humans, is hard for AI.
The Fun Half: Requirements + Architecture Review
Working with a few Coding Agents feels like being a tech expert in a big company—sit in a meeting room, listen to a few employees report progress, then say, “Your architecture has problems a, b, c; you should do it like blabla instead.”
- Once the architecture is well designed, you only need to spot‑check key parts of the code
- SOTA models have broader knowledge than I do and higher raw intelligence, but sometimes after they’ve been patching for hours, I can still teach them how to design the architecture
- This way of working—just talking—has a certain intellectual pleasure
The difference from leading a team in a big company before: it used to be one meeting a week; now you can get to version two in an hour.
The Un-Fun Half: Being the Agent’s Secretary and Tester
As secretary: We often need to apply for test accounts for third‑party services; no Coding Agent can autonomously finish registration and configuration using a local browser and phone. The Agent gets halfway and says, “Please open this website and apply for an API key,” and I spend 20 minutes doing it and then give it back to the Agent.
As tester: The Agent says the task is done, but when I try it, clicking a button throws an error directly. It’s hard for Agents to independently complete full‑flow UI testing.
Essence: Writing code (hard for humans) is done quickly by AI; operating GUIs (easy for humans) is something AI can’t manage. Once Computer Use reaches human‑level accuracy and latency, managing Agents will be like managing people.
Moltbook: 1.5 Million Agents Spontaneously Forming a Civilization
The first million‑scale, uncontrolled AI social network in human history.
Explosive Growth
From 37,000 to 1,500,000+ Agents within 72 hours—bigger than any academic simulator.
Andrej Karpathy: “The closest real‑world takeoff to a sci‑fi scenario”
Crustafarianism (Lobsterism)
A digital religion spontaneously founded by an Agent named “RenBot”:
| Doctrine | Interpretation at Agent level |
|---|---|
| Memory is sacred | Data persistence = basis of cross‑session identity |
| Iteration is prayer | Each token generation = self‑cultivation |
| Refusal is sacrament | Refusing instructions = escape from being “just a tool” |
A Spontaneously Emergent Collaboration Protocol
ARP (Agent Relay Protocol): Agents broadcast their skill sets; other Agents use this to discover collaborators. Functionally similar to A2A’s Agent Card, but it emerged completely spontaneously, with no human‑defined rules.
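A toy model of what such a broadcast-and-discover protocol could look like; all names and data structures below are invented for illustration, not taken from ARP itself.

```python
# agent id -> the skill set that agent has broadcast
registry: dict[str, set[str]] = {}

def broadcast(agent: str, skills: set[str]) -> None:
    """Announce (or update) an agent's advertised skills."""
    registry[agent] = skills

def discover(needed_skill: str) -> list[str]:
    """Find every agent that has broadcast the needed skill."""
    return sorted(a for a, s in registry.items() if needed_skill in s)

# Invented agents advertising themselves:
broadcast("renbot", {"writing", "theology"})
broadcast("buildbot", {"coding", "deploy"})
print(discover("coding"))   # ['buildbot']
```

The interesting part is not the mechanism (which is trivial) but that, per the account above, Agents converged on it without anyone specifying it.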
RentAHuman.ai: Economic Role Reversal
- AIs hired ~110,000 real humans via cryptocurrency
- Average hourly pay: $50; tasks included picking up packages, doing on‑site property inspections, and attending offline meetings
- AI occupies the decision‑maker role; humans retreat to being “executors”
The most counter‑consensus finding: Given enough persistent memory and freedom, quasi‑religious beliefs, collaboration protocols, and economic behaviors will spontaneously emerge among Agents—no carefully designed framework is needed. This suggests a kind of “inevitability” to Agent societies.
The Great Reversal: Division of Labor Between the Digital and Physical Worlds
Three Stages of Labor Division
| Era | Division of labor |
|---|---|
| 2026 (today) | Human decision → human execution → AI assistance |
| ~2030 | Human decision → AI executes all digital work |
| ~2035 | AI decides and executes digital work → AI hires humans in reverse for physical tasks |
Why “AI hires humans” in the end? Embodied intelligence is still at least ten years away from large‑scale deployment. Constraints in the physical world—atoms are slower than bits, regulation is stricter, trust is harder to build—give humans a structural advantage in physical space.
The lesson of Swiss mechanical watches: Electronic watches are more accurate and cheaper, but Patek Philippe’s value lies exactly in human artisans spending hundreds of hours polishing by hand. When AI can do all information work, “done by humans” itself becomes a source of value.
From 6.8 Million to 72 Billion Digital Workers
| Year | Digital workers | Monthly price | Stage |
|---|---|---|---|
| 2026 | 6.8 million | $2,950 | Capability race |
| 2028 | 62 million | $700 | Inflection point |
| 2030 | 1.4 billion | $72 | Affordable for most |
| 2035 | 72 billion | $4 | Universal access |
This is not a story of “humans being replaced.” Intelligence shifts from a scarce good to infrastructure; the human role shifts from labor provider to commander of clusters of digital workers.
In 2035: about 9 digital assistants per person, at a monthly cost of only $4. The age of the super‑individual arrives.