2025-09-08
I was honored to be invited by Prof. Zhang Jiaxing to give an academic talk titled “The Two Dark Clouds over Agents: Real‑time Interaction with the Environment, Learning from Experience” at Lion Rock Artificial Intelligence Lab on September 4. Today I’m sharing the slides and video from the talk for your reference and discussion.
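📰 Official coverage: [Industry-Research Matchmaking] Session 2 of “FAIR plus × 狮子山问道” successfully held, exploring the bottlenecks and breakthroughs of AI agents and all-terrain embodied intelligence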
Talk materials
- 🎬 Talk video
- 📖 Slides in English
- 📖 Slides in Chinese
Talk overview
In 1900, Lord Kelvin said in a speech: “The beauty and clearness of the dynamical theory, which asserts heat and light to be modes of motion, is at present obscured by two clouds…”. These two “small clouds” later triggered the revolutions of relativity and quantum mechanics. Today, the AI Agent field is facing a similar pair of “dark clouds”.
First dark cloud: challenges of real‑time interaction
Current AI Agents suffer from severe latency issues when interacting with the environment in real time:
The dilemma of voice interaction
- Serial processing vs real‑time needs: the Agent must wait for the user to finish speaking before it starts thinking, and must finish thinking before it starts speaking
- Fast vs slow thinking: deep thinking takes 10+ seconds, by which point users lose patience, while fast responses are prone to errors
- Technical bottlenecks: every stage of the pipeline adds its own wait (VAD, ASR recognition, LLM thinking, TTS synthesis)
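To make the serial bottleneck concrete, here is a minimal sketch of how per-stage waits add up when no stage can start before the previous one finishes. The stage names come from the list above; the millisecond figures are hypothetical placeholders chosen for illustration (only the 10-second "deep thinking" figure echoes the talk), not measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: int  # hypothetical latency, for illustration only


def serial_latency_ms(stages: List[Stage]) -> int:
    """In a strictly serial pipeline, the user waits for the sum of all stages."""
    return sum(stage.latency_ms for stage in stages)


def slowest_stage(stages: List[Stage]) -> Stage:
    """Return the stage that contributes the most waiting time."""
    return max(stages, key=lambda stage: stage.latency_ms)


# Assumed numbers: only the LLM figure reflects the "10+ seconds" in the text.
PIPELINE = [
    Stage("VAD", 500),
    Stage("ASR", 300),
    Stage("LLM", 10_000),
    Stage("TTS", 400),
]

if __name__ == "__main__":
    print("total wait (ms):", serial_latency_ms(PIPELINE))
    print("dominant stage:", slowest_stage(PIPELINE).name)
```

Under these assumed numbers the LLM stage dominates, but the smaller stages still sit on the critical path because nothing overlaps.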
The “last mile” challenge of GUI operations
- Agents operate computers 3–5× slower than humans
- Every click requires a fresh screenshot and another round of model thinking (3–4 seconds of latency per click)
- “Moravec’s paradox”: the model “knows” what to do, but “can’t do it” well
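As a rough back-of-the-envelope sketch, the per-click latency quoted above can be multiplied out over a whole task. The function name and the 10-click task are hypothetical; the 3–4 second range is the one given in the list, and the sketch assumes every click pays the full screenshot-plus-thinking cost.

```python
from typing import Tuple


def task_time_range_s(num_clicks: int, per_click_s: Tuple[int, int] = (3, 4)) -> Tuple[int, int]:
    """Estimate (best, worst) total latency in seconds for a GUI task.

    Assumes each click incurs one screenshot and one thinking round,
    costing between per_click_s[0] and per_click_s[1] seconds.
    """
    if num_clicks < 0:
        raise ValueError("num_clicks must be non-negative")
    low, high = per_click_s
    return (num_clicks * low, num_clicks * high)


if __name__ == "__main__":
    # Hypothetical task that needs 10 clicks.
    best, worst = task_time_range_s(10)
    print(f"10 clicks: between {best} and {worst} seconds of pure latency")
```

Even a short task accumulates tens of seconds of waiting before any time spent on the actions themselves is counted.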
2025-07-30
[This article is based on a talk given at Turing Community’s Large Model Tech Study Camp. Slides: Slides link, Download PDF version]
This article is a deep dive into the design philosophy and practical strategies for AI Agents. It traces the shift from the dialogue pattern of chatbots to the action pattern of Agents, and explains how to systematically design and manage an Agent's information environment in order to build efficient and reliable AI Agent systems.
Table of Contents
- Part 1: Paradigm Shift - From Chatbot to Agent
- Part 2: Core Analysis of Agents
- Part 3: Context Engineering
- Part 4: Memory and Knowledge Systems
Part 1: Paradigm Shift - From Chatbot to Agent
From Chatbot to Agent: A Fundamental Paradigm Shift
We are undergoing a fundamental transformation in AI interaction patterns:
Chatbot Era
- 🗣️ Conversational interaction: user asks → AI answers → repeated Q&A loop
- 📚 Knowledgeable advisor: can “talk” but not “act,” passively responding to user needs
- 🛠️ Typical products: ChatGPT, Claude Chat
Agent Era
- 🎯 Autonomous action mode: user sets goal → Agent executes → autonomous planning and decision-making
- 💪 Capable assistant: can both “think” and “do,” actively discovering and solving problems
- 🚀 Typical products: Claude Code, Cursor, Manus
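The difference between the two eras can be sketched as a difference in control flow: a chatbot maps one message to one reply, while an Agent loops over decide, act, and observe until it judges the goal complete. The sketch below is a toy illustration under stated assumptions: `toy_policy` and `toy_tools` are hypothetical stand-ins for a model and its tools, not the API of any product named above.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (tool name, argument, observation)


def chatbot_turn(model: Callable[[str], str], user_message: str) -> str:
    """Chatbot pattern: user asks, model answers, no action is taken."""
    return model(user_message)


def agent_run(
    policy: Callable[[str, List[Step]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 10,
) -> List[Step]:
    """Agent pattern: repeatedly choose a tool, run it, and record the result."""
    history: List[Step] = []
    for _ in range(max_steps):
        tool_name, argument = policy(goal, history)
        if tool_name == "done":  # the policy decides the goal is reached
            break
        observation = tools[tool_name](argument)
        history.append((tool_name, argument, observation))
    return history


def toy_policy(goal: str, history: List[Step]) -> Tuple[str, str]:
    """Hypothetical fixed plan: search first, then summarize, then stop."""
    if not history:
        return ("search", goal)
    if len(history) == 1:
        return ("summarize", history[-1][2])
    return ("done", "")


# Hypothetical tools that only manipulate strings.
toy_tools: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query}",
    "summarize": lambda text: text.upper(),
}

if __name__ == "__main__":
    print(chatbot_turn(lambda message: f"echo: {message}", "hello"))
    for step in agent_run(toy_policy, toy_tools, "agent latency"):
        print(step)
```

The `max_steps` cap is a design choice in this sketch so that a policy that never returns "done" still terminates.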