LLM
2025-09-08
I was honored to be invited by Prof. Jiaxing Zhang to give an academic talk titled “Two Dark Clouds Over Agents: Real-time Interaction with Environments, Learning from Experience” at the Lion Rock Artificial Intelligence Laboratory on September 4. Today I’m sharing the slides and video for your reference and discussion.
Talk Materials
- 🎬 Talk Video
- 📖 English Slides
- 📖 Chinese Slides
Talk Summary
In 1900, Lord Kelvin said in a lecture: “The beauty and clearness of our present views … two clouds …” Those two small clouds later triggered the revolutions of relativity and quantum mechanics. Today, the AI Agent field faces two similar “dark clouds” of its own.
The First Cloud: Challenges of Real-time Interaction
Current AI agents face severe latency when interacting with environments in real time:
The predicament of voice interaction
- Serial processing vs. real-time needs: the agent must wait for the user to finish speaking before it starts thinking, and must finish thinking before it starts speaking
- The fast/slow thinking dilemma: deep thinking takes 10+ seconds (users lose patience), while quick responses are error-prone
- Technical bottlenecks: every stage of the pipeline adds waiting time (VAD, ASR, LLM reasoning, TTS synthesis)
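A rough sketch of why the serial pipeline hurts. All latencies below are made-up placeholders, not measurements from the talk, and the streaming variant is only an assumed point of comparison: it supposes each stage can hand off its first partial output before it has finished.

```python
# Hypothetical time (ms) for each stage to finish completely; placeholders only.
FULL_STAGE_MS = {"vad": 500, "asr": 300, "llm": 1000, "tts": 400}

# Hypothetical time (ms) for each stage to emit its first partial output
# when stages stream into each other instead of waiting for completion.
FIRST_CHUNK_MS = {"vad": 500, "asr": 100, "llm": 200, "tts": 100}


def time_to_first_audio_ms(stage_ms):
    """Sum the wait each stage contributes before the user hears anything."""
    return sum(stage_ms.values())


# Serial: every stage runs to completion before the next one starts.
serial_ms = time_to_first_audio_ms(FULL_STAGE_MS)

# Streaming (assumed): each stage starts on the previous stage's first chunk.
streaming_ms = time_to_first_audio_ms(FIRST_CHUNK_MS)

print(serial_ms, streaming_ms)
```

In this simplified model the waiting adds up stage by stage, so latency falls only if stages overlap or each stage gets faster.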
The “last mile” problem of GUI operations
- Agents operate computers 3–5× slower than humans
- Each click requires re-screenshotting and thinking (3–4 seconds of latency)
- Moravec’s paradox: the model “knows” what to do but “can’t do it”
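To make the per-click cost concrete, here is a back-of-the-envelope sketch. It uses the 3–4 seconds per click quoted above; the 10-click task and the function name are hypothetical, chosen only for illustration.

```python
def task_latency_range_s(num_clicks, per_click_range_s=(3, 4)):
    """Return (low, high) total latency in seconds for a task of num_clicks,
    assuming every click pays the full screenshot-and-think cost."""
    low, high = per_click_range_s
    return num_clicks * low, num_clicks * high


# A hypothetical 10-click task: tens of seconds spent only on
# re-screenshotting and thinking, before any actual work is counted.
print(task_latency_range_s(10))
```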
2025-07-30
[This article is based on a presentation at the Turing Community’s Large Model Technology Learning Camp, Slides Link]
This article explores the design philosophy and practical strategies of AI Agents in depth. Moving from the conversational mode of Chatbots to the action mode of Agents, it looks at how to systematically design and manage an Agent’s information environment in order to build efficient and reliable AI Agent systems.
Table of Contents
- Part 1: Paradigm Shift - From Chatbot to Agent
- Part 2: Core Analysis of Agents
- Part 3: Context Engineering
- Part 4: Memory and Knowledge Systems
Part 1: Paradigm Shift - From Chatbot to Agent
From Chatbot to Agent: A Fundamental Paradigm Shift
We are experiencing a fundamental shift in AI interaction modes:
Chatbot Era
- 🗣️ Conversational Interaction: User asks → AI answers → Repetitive Q&A cycle
- 📚 Knowledgeable Advisor: Can “speak” but not “act,” and only responds passively to user needs
- 🛠️ Typical Products: ChatGPT, Claude Chat
Agent Era
- 🎯 Autonomous Action Mode: User sets goals → Agent executes → Autonomous planning and decision-making
- 💪 Capable Assistant: Can both “think” and “act,” proactively discovering and solving problems
- 🚀 Typical Products: Claude Code, Cursor, Manus
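The difference between the two modes can be sketched as a single call versus a loop. This is a minimal illustration, not how any of the products above are implemented: `chatbot_turn`, `run_agent`, `toy_policy`, and `toy_tools` are hypothetical names, and the "policy" is a hard-coded stand-in for an LLM.

```python
from typing import Callable, Dict, List


def chatbot_turn(answer_fn: Callable[[str], str], question: str) -> str:
    """Chatbot mode: one question in, one answer out, no actions taken."""
    return answer_fn(question)


def run_agent(
    goal: str,
    policy: Callable[[str, List[str]], Dict[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> List[str]:
    """Agent mode: repeatedly choose an action, execute it, record the result."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = policy(goal, observations)  # decide the next step
        if action["tool"] == "finish":       # the policy judges the goal met
            break
        result = tools[action["tool"]](action["input"])  # act
        observations.append(result)                      # observe
    return observations


def toy_policy(goal: str, observations: List[str]) -> Dict[str, str]:
    """Hard-coded two-step plan standing in for a model's decisions."""
    plan = [
        {"tool": "search", "input": goal},
        {"tool": "write", "input": "summary"},
    ]
    if len(observations) < len(plan):
        return plan[len(observations)]
    return {"tool": "finish", "input": ""}


toy_tools = {
    "search": lambda query: f"found notes on {query}",
    "write": lambda text: f"wrote {text}",
}

print(chatbot_turn(lambda q: f"answer to {q}", "what is an agent?"))
print(run_agent("agents", toy_policy, toy_tools))
```

The chatbot returns after one exchange. The agent keeps going until its policy decides the goal is met or the step budget runs out.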