The Two Dark Clouds over Agents: Real‑time Interaction with the Environment, Learning from Experience
I was honored to be invited by Prof. Zhang Jiaxing to give an academic talk titled “The Two Dark Clouds over Agents: Real‑time Interaction with the Environment, Learning from Experience” at Lion Rock Artificial Intelligence Lab on September 4. Today I’m sharing the slides and video from the talk for your reference and discussion.
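📰 Official coverage: [Industry-Research Matchmaking] The 2nd “FAIR plus × Lion Rock WenDao” Session Successfully Held, Exploring Bottlenecks and Breakthroughs in AI Agents and All-terrain Embodied Intelligence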
Talk materials
- 🎬 Talk video
- 📖 Slides in English
- 📖 Slides in Chinese
Talk overview
In 1900, Lord Kelvin said in a speech: “The beauty and clearness of the dynamical theory, which asserts heat and light to be modes of motion, is at present obscured by two clouds…”. These two “small clouds” later triggered the revolutions of relativity and quantum mechanics. Today, the AI Agent field is facing a similar pair of “dark clouds”.
First dark cloud: challenges of real‑time interaction
Current AI Agents suffer from severe latency issues when interacting with the environment in real time:
The dilemma of voice interaction
- Serial processing vs real‑time needs: they must wait for the user to finish speaking before thinking, and finish thinking before speaking
- Fast vs slow thinking: deep thinking needs 10+ seconds (users lose patience), fast responses are prone to errors
- Technical bottlenecks: every step is a wait (VAD detection, ASR recognition, LLM thinking, TTS synthesis)
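A back-of-the-envelope sketch of why serial processing hurts: when each stage waits for the previous one, the delay before the user hears anything is the sum of all stage latencies. The stage names follow the list above; the latency values are illustrative assumptions, not measurements from the talk.

```python
# Illustrative per-stage latencies in seconds (assumed values, not measurements).
STAGE_LATENCY_S = {
    "vad_endpointing": 0.5,   # waiting to be sure the user has stopped speaking
    "asr": 0.3,               # speech recognition on the finished utterance
    "llm_thinking": 1.5,      # time until the model has something to say
    "tts_first_audio": 0.4,   # synthesis delay before the first audio plays
}


def serial_response_delay(stages: dict) -> float:
    """In a strictly serial pipeline the stages cannot overlap,
    so the silence the user hears is the sum of every stage."""
    return sum(stages.values())


def slowest_stage(stages: dict) -> str:
    """Name of the stage that contributes the most latency."""
    return max(stages, key=stages.get)


if __name__ == "__main__":
    print(f"silence before reply: {serial_response_delay(STAGE_LATENCY_S):.1f} s")
    print(f"largest contributor:  {slowest_stage(STAGE_LATENCY_S)}")
```

With these assumed numbers the user waits about 2.7 s in silence, and even removing the largest stage entirely would still leave more than a second of delay, because every remaining stage is still a wait.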
The “last mile” challenge of GUI operations
- Agents operate computers 3–5× slower than humans
- Every click requires a new screenshot and thinking (3–4 seconds of latency)
- “Moravec’s paradox”: the model “knows” what to do, but “can’t do it” well
Our solution: the SEAL architecture
SEAL (Streaming, Event-driven Agent Loop) is our proposed architecture that abstracts all interaction as asynchronous event streams:
Perception layer
- Converts continuous signals (voice, GUI) into discrete events
- Streaming speech perception models replace VAD + ASR
- Outputs rich acoustic events (interruptions, emotions, laughter, etc.)
Thinking layer
- Interactive ReAct: breaks the rigid “observe–think–act” loop
- Enables thinking while listening and speaking while thinking
- Fast thinking (0.5 s) → slow thinking (5 s) → continuous thinking
Execution layer
- Trains end‑to‑end VLA models
- Generates natural speech pauses and fillers
- Produces human‑like mouse trajectories
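The three layers can be pictured as a single asynchronous loop that consumes events and never blocks on slow reasoning. The following is a minimal sketch of that idea using Python's `asyncio`, not Pine AI's implementation: the event kinds (`utterance`, `interrupt`, `stop`), the `slow_think` stand-in, the canned fast reply, and all timings are hypothetical.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Event:
    kind: str          # hypothetical kinds: "utterance", "interrupt", "stop"
    payload: str = ""


async def slow_think(text: str) -> str:
    # Stand-in for seconds of deep reasoning by an LLM.
    await asyncio.sleep(0.05)
    return f"detailed answer to: {text}"


async def think_and_speak(text: str, spoken: list) -> None:
    spoken.append("fast: one moment")            # fast thinking: immediate reply
    spoken.append(f"slow: {await slow_think(text)}")  # slow thinking follows


async def agent_loop(events: asyncio.Queue, spoken: list) -> None:
    current = None  # the reasoning task in flight, if any
    while True:
        ev = await events.get()   # the loop keeps listening while a task thinks
        if ev.kind == "stop":
            break
        if current is not None:   # a new utterance or an interruption
            current.cancel()      # abandons reasoning that is now stale
            current = None
        if ev.kind == "utterance":
            current = asyncio.create_task(think_and_speak(ev.payload, spoken))
    if current is not None:
        try:
            await current
        except asyncio.CancelledError:
            pass


async def demo() -> list:
    events, spoken = asyncio.Queue(), []
    loop_task = asyncio.create_task(agent_loop(events, spoken))
    await events.put(Event("utterance", "book a table"))
    await asyncio.sleep(0.01)     # user cuts in while slow thinking is running
    await events.put(Event("interrupt"))
    await events.put(Event("utterance", "actually, cancel my order"))
    await asyncio.sleep(0.2)      # give the second request time to finish
    await events.put(Event("stop"))
    await loop_task
    return spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, the first request gets only its fast reply because the interruption cancels its slow reasoning, while the second request receives both the fast and the slow answer. The design choice being illustrated is that perception events, not the completion of a think step, drive the loop.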
Second dark cloud: learning from experience
Currently, Agents start from scratch on every task; they cannot accumulate domain knowledge or improve task proficiency.
The challenge of going from “smart” to “skilled”
- SOTA models ≈ top fresh graduates (knowledgeable but inexperienced)
- Business workflows are dynamic and private
- Improving base models alone cannot solve the “experience” problem
Three learning paradigms
1. Post‑training
- Method: update parameters via RL
- Value: solidify experience into parameters
- Example: Kimi K2’s Model as Agent
2. In‑context learning
- Method: leverage the Transformer’s attention mechanism
- Breakthroughs:
  - DeepSeek MLA: 16× KV cache compression
  - Sparse attention: turn KV cache into a vector database
  - MiniMax-01: hybrid architecture of linear attention + softmax attention
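To put the 16× figure in perspective, the sketch below applies the usual KV cache size arithmetic (keys plus values, per layer, per head, per token) to a hypothetical model. The 16× ratio is the figure quoted above; the model dimensions and context length are assumptions chosen for illustration, and the final division only models the quoted ratio, not how MLA achieves it.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of a standard multi-head attention KV cache for one sequence.

    The leading factor 2 accounts for storing both keys and values;
    bytes_per_value=2 corresponds to 16-bit storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape and context length (assumptions, not a real model).
baseline = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=128_000)

# Apply the 16x compression ratio quoted in the text.
compressed = baseline // 16

GIB = 2 ** 30

if __name__ == "__main__":
    print(f"baseline:   {baseline / GIB:.2f} GiB")
    print(f"compressed: {compressed / GIB:.2f} GiB")
```

Under these assumptions a single 128K-token context drops from tens of GiB to a few GiB of cache, which is what makes long in-context "experience" practical to keep resident.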
3. Externalized learning (core innovation)
Knowledge base: persistent experience storage without retraining
- Contextual Retrieval: add context to each document chunk
- LLM‑driven summarization: turn compute into a scalable knowledge base
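A minimal sketch of the "add context to each chunk" idea: before indexing, every chunk is prefixed with a short description of where it comes from, so that queries naming the source can still match it. In practice an LLM would write that description; here `stub_context` is a hypothetical stand-in, and the keyword-overlap scorer is a deliberately simple substitute for a real embedding or BM25 index.

```python
from typing import Callable, List


def contextualize(chunks: List[str], make_context: Callable[[str], str]) -> List[str]:
    """Prefix each chunk with context describing its place in the document."""
    return [f"{make_context(chunk)}\n{chunk}" for chunk in chunks]


def keyword_score(query: str, text: str) -> int:
    """Toy relevance score: number of query words that appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def stub_context(chunk: str) -> str:
    # Hypothetical stand-in for an LLM call that reads the whole document
    # and writes one sentence situating this chunk within it.
    return "This chunk is from the ACME Q2 2023 report"


if __name__ == "__main__":
    chunk = "Revenue grew 3% over the previous quarter"
    query = "ACME revenue"
    print("raw chunk score:          ", keyword_score(query, chunk))
    print("contextualized chunk score:",
          keyword_score(query, contextualize([chunk], stub_context)[0]))
```

The raw chunk never mentions the company, so a query about "ACME revenue" only partially matches it; the contextualized chunk matches both terms. The company name and report are made-up example data.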
Tool generation: self‑evolving Agents
- Intelligent RPA: summarize repetitive operations into tools (checking weather reduced from 47 s to 10 s)
- Automatic diagnosis: automatic triage from production logs
- MCP-Zero: proactive tool discovery, 98% token saving
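The tool-generation idea can be sketched as a library that remembers the steps discovered on the first, slow run of a task and replays them afterwards without further exploration. This is an illustration under stated assumptions rather than the system described in the talk: `ToolLibrary`, `CountingExplorer`, and the step strings are hypothetical, and a real system would have an LLM generalize the recorded steps into parameterized code instead of storing them verbatim.

```python
from typing import Callable, Dict, List, Tuple


class ToolLibrary:
    """Stores step sequences learned from past runs, keyed by task name."""

    def __init__(self) -> None:
        self._tools: Dict[str, List[str]] = {}

    def run(self, task: str,
            explore: Callable[[str], List[str]]) -> Tuple[List[str], bool]:
        """Return (steps, reused). Explore only when no tool exists yet."""
        if task in self._tools:
            return self._tools[task], True    # fast path: replay stored tool
        steps = explore(task)                 # slow path: step-by-step operation
        self._tools[task] = steps             # keep the experience as a tool
        return steps, False


class CountingExplorer:
    """Hypothetical stand-in for slow, LLM-driven GUI exploration."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, task: str) -> List[str]:
        self.calls += 1
        return ["open weather site", "type city name", "read forecast"]


if __name__ == "__main__":
    library, explorer = ToolLibrary(), CountingExplorer()
    for attempt in (1, 2):
        steps, reused = library.run("check weather", explorer)
        print(f"attempt {attempt}: reused={reused}, steps={steps}")
    print("slow explorations performed:", explorer.calls)
```

The second attempt skips exploration entirely, which is the mechanism behind the kind of speedup cited above for the weather example.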
Extending the Scaling Law
“The two methods that seem to scale arbitrarily … are search and learning.” — Rich Sutton, The Bitter Lesson
Externalized learning breaks the limitation of model parameters:
- Search → external knowledge bases and tool libraries
- Learning → LLMs summarize experience into knowledge and code
- Extend the boundary of the Scaling Law into the external ecosystem
Key insights
- Essence of real‑time interaction: not making LLMs faster, but enabling them to “think while listening and speak while thinking” like humans
- Essence of learning: not stuffing all knowledge into parameters, but building a reliable system of external knowledge and tools
- Future of Agents: from containers of knowledge to engines of discovery
Practice at Pine AI
At Pine AI, we are putting these ideas into practice so that AI Agents can:
- Interact with the world in real time (voice calls, GUI operations)
- Learn from experience (knowledge accumulation, tool generation)
- Truly solve problems and get things done for users
If you’re interested in building SOTA autonomous AI Agents, you’re welcome to join the Pine AI team. We are looking for full‑stack engineers who enjoy collaborating with AI to code, love hands‑on problem‑solving, and have solid engineering skills. Contact: [email protected]
Official coverage
The following is reprinted from the official WeChat account of Shenzhen Robotics Association
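[Industry-Research Matchmaking] The 2nd “FAIR plus × Lion Rock WenDao” Session Successfully Held, Exploring Bottlenecks and Breakthroughs in AI Agents and All-terrain Embodied Intelligence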
To promote technological innovation and the commercialization of research results in the embodied intelligence industry, and to facilitate coordinated development across the entire industrial chain, the Shenzhen Robotics Association and the China Merchants Lion Rock Artificial Intelligence Lab jointly host the “FAIR plus × Lion Rock WenDao” event series. Each session invites industry experts to share cutting-edge technologies and practical experience in the embodied intelligence field.
On September 4, the 2nd session of “FAIR plus × Lion Rock WenDao” was successfully held. Two speakers were invited: Li Bojie, co-founder and Chief Scientist of Pine AI and among the first cohort of “Huawei Genius Youth”, and Zhang Huaidong, co-founder of Cyberbot Robotics and associate professor at South China University of Technology. They spoke on the core bottlenecks in AI Agent development and on application breakthroughs for all-terrain embodied intelligent robots.
At the beginning of the event, guests visited the China Merchants Lion Rock Artificial Intelligence Lab in the Futian Hetao Innovation Center, where they gained an in-depth understanding of the lab’s work on computing platforms, algorithm R&D, and scenario-based applications, and saw first-hand the close connection between cutting-edge research and industrial implementation.
Chen Chao, member of the Party Committee and Deputy General Manager of China Merchants Innovation & Technology (Group) Co., Ltd., and Deputy Director of the Advanced Technology Research Institute of China Merchants Group, delivered the opening speech. He emphasized that embodied intelligence is a key link between AI and the real economy, and that its technological breakthroughs and scenario implementation depend on deep collaboration among industry, academia, research, and application. He expressed the hope that through this series of events, they can explore technical paths and expand the industrial ecosystem together with industry peers, promoting the transformation of cutting‑edge technologies into real‑world applications.
Zhang Jiaxing, Chief AI Scientist of China Merchants Group and Director of Lion Rock Artificial Intelligence Lab, hosted the event. He stated that the “FAIR plus × Lion Rock WenDao” series aims to build a platform for communication and mutual learning that covers the whole chain of embodied intelligence, from industry and academia to research and application. By bringing together cutting-edge academic research and practical experience, the series promotes cross-domain exchange of ideas and resource matching, supporting the development of the embodied intelligence industry.
In the keynote session, Li Bojie, co-founder and Chief Scientist of Pine AI and among the first cohort of “Huawei Genius Youth”, delivered a deep-dive talk titled “The Two Dark Clouds over Agents: Real-time Interaction with the Environment, Learning from Experience”. He pointed out that Agent development faces two key challenges: first, how to achieve efficient real-time interaction in complex and dynamic environments; second, how to learn and evolve autonomously from limited experience. Around these two “dark clouds”, he combined cutting-edge research with practical cases to propose directions for future breakthroughs in algorithmic architecture, computing resources, and scenario deployment.
Zhang Huaidong, co-founder of Cyberbot Robotics and associate professor at South China University of Technology, then gave a talk titled “Breakthroughs in Complex Scenario Applications of All-terrain Embodied Intelligent Robots”. Drawing on his team’s experience in complex terrain adaptation, cross-modal perception and control, and dynamics optimization, he systematically described the application progress and innovative results of all-terrain embodied intelligent robots in typical scenarios such as disaster rescue, industrial inspection, and outdoor exploration. He pointed out that for robots to move from the lab to scalable, replicable real-world deployment, the key lies in closing the gap between the software and hardware on one side and the application scenarios on the other.
As a key link connecting industry, academia, research, and application in embodied intelligence, the “FAIR plus × Lion Rock WenDao” event series is committed to promoting the efficient integration of cross-disciplinary innovation resources and accelerating the transformation of core technologies from R&D to industrialization. In the future, the association and the laboratory will continue to advance this branded event series, fostering a virtuous cycle between technological innovation and industrial implementation, and helping embodied intelligence technologies move out of the lab, integrate into the industrial chain, and empower the development of new-quality productive forces.
About Lion Rock Artificial Intelligence Laboratory
Lion Rock Artificial Intelligence Laboratory was established by China Merchants Group at Hong Kong Science Park on September 12, 2024. Led by Dr. Zhang Jiaxing, Chief Scientist of Artificial Intelligence at China Merchants Group, the lab brings together gifted young talents and senior experts across many fields, including large models, computer vision, localization and navigation, motion control, and mechanical structures, enabling interdisciplinary talent exchange and collaboration.
Upholding the mission and vision of “empowering machines with intelligence and bringing warmth to humanity,” the lab focuses on service scenarios to carry out cutting-edge research in embodied intelligence and innovative product development. The lab is committed to developing end-to-end models, believes in exploratory machine learning, and seeks to uncover the value of natural language, ultimately creating embodied intelligence technologies that can enter thousands of households.
As an important part of the China Merchants Advanced Technology Research Institute, the lab will build an industrial innovation ecosystem through Shenzhen–Hong Kong collaboration, accelerate the construction of a full-stack “Embodied-AI X Agentic-AI” R&D system, and strive to become a global leader in embodied intelligence technologies, providing technological empowerment for high-quality development across hundreds of industries and creating a new lifestyle paradigm with greater happiness for hundreds of millions of families.
About FAIR plus
FAIR plus is a platform focused on technologies and development resources across the entire robotics industry chain. Through academic conferences, technical standards, community building, supply-demand matchmaking, and other approaches, it creates opportunities for offline meetings and cooperation between two groups: technical personnel working on development, products, engineering, and solutions at every stage of the AI + robotics industry chain, and process, equipment, and IT personnel from scenarios intending to adopt robots. In doing so, it promotes the development of robots toward greater intelligence and enhances the overall capacity building and configuration of the industry.
Recently, the Robotics Full-Industry-Chain Conference FAIR plus 2026 has been officially scheduled for April 22–24, 2026. The exhibition will focus on technologies and development resources across the robotics industry chain, with specially curated zones such as a full-robotics-industry-chain exhibition area and a joint exhibition area for startups. It will cover more than 50 segments of the embodied intelligent robotics industry and feature physical exhibits from over 500 upstream and complete-machine companies in the robotics industry chain. It is expected to attract more than 50,000 professional visitors and over 100 professional overseas buyer delegations.
About Shenzhen Robotics Association
Shenzhen Robotics Association (SRA) was initiated and established by the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, in September 2009, making it the earliest robotics industry association in China. The association is a non-profit social organization voluntarily formed by enterprises, R&D institutions, and related upstream and downstream units in Shenzhen’s robotics industry. Growing alongside the development of Shenzhen’s robotics industry and the expansion of robotics enterprises, the association now has more than 800 member companies in industrial robots, service robots, medical robots, educational robots, special robots, artificial intelligence, and other fields, with a combined output value exceeding 150 billion RMB. It is the largest local association in the robotics field in terms of both membership and output value.
Relying on the scientific research resources of SIAT, the association has set up the Shenzhen Artificial Intelligence Expert Committee, Youth Expert Committee, and Medical Robotics Expert Committee, and has successively initiated the establishment of the Shenzhen Artificial Intelligence Society, the South China Machine Vision Industry Alliance, and the Shenzhen Logistics Robotics Industry Alliance. It provides long-term technical support, industry matchmaking, and other consulting services for governments, enterprises, and third-party organizations.