The Two Dark Clouds over Agents: Real-time Interaction with the Environment, Learning from Experience
I was honored to be invited by Prof. Jiaxing Zhang to give an academic talk titled “The Two Dark Clouds over Agents: Real-time Interaction with the Environment, Learning from Experience” at Lion Rock AI Lab on September 4. Today I’m sharing the slides and video of this talk for your reference and discussion.
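📰 Official Coverage: [Industry-Research Matchmaking] The 2nd “FAIR plus × Lion Rock Questions” Successfully Held, Exploring Bottlenecks and Breakthroughs of AI Agents and All-terrain Embodied Intelligence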
Talk Materials
- 🎬 Talk Video
- 📖 Slides in English
- 📖 Slides in Chinese
Talk Overview
In 1900, Lord Kelvin said in a lecture: “The beauty and clearness of the dynamical theory… is at present obscured by two clouds.” These two small clouds later triggered the revolutions of relativity and quantum mechanics. Today, the field of AI Agents faces two similar “dark clouds”.
Dark Cloud One: Challenges of Real-time Interaction
Current AI Agents face severe latency issues when interacting with the environment in real time:
The dilemma of voice interaction
- Serial processing vs real-time needs: The system has to wait for the user to finish speaking before it can think, and wait for thinking to finish before it can speak
- The fast–slow thinking dilemma: Deep thinking takes 10+ seconds (users lose patience), while fast responses are error-prone
- Technical bottlenecks: Every stage adds waiting time (VAD detection, ASR recognition, LLM thinking, TTS synthesis)
The “last mile” problem of GUI operation
- Agents operate computers 3–5× slower than humans
- Every click requires re-screenshotting and thinking (3–4 seconds latency)
- This is a form of “Moravec’s paradox”: the model “knows” what to do but “can’t do it”
Our Solution: The SEAL Architecture
SEAL (Streaming, Event-driven Agent Loop) is the architecture we propose. It abstracts all interactions as asynchronous event streams:
Perception Layer
- Converts continuous signals (voice, GUI) into discrete events
- Streaming speech perception model replaces VAD + ASR
- Outputs rich acoustic events (interruptions, emotion, laughter, etc.)
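As a rough illustration of what "converting continuous signals into discrete events" can look like, the sketch below turns a stream of per-frame labels into start/end events. The `Event` type, the label names, and the 20 ms frame spacing are assumptions made up for this example. The per-frame labels stand in for the output of a streaming perception model and do not describe SEAL's actual interface.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Event:
    kind: str  # e.g. "speech_start", "speech_end", "laughter_start"
    t_ms: int  # timestamp of the frame that triggered the event


def frames_to_events(frames: Iterable[Tuple[int, str]]) -> Iterator[Event]:
    """Emit an event whenever the per-frame label changes.

    `frames` yields (timestamp_ms, label) pairs. The labels are a
    hypothetical stand-in for a streaming perception model's output.
    """
    previous = "silence"
    for t_ms, label in frames:
        if label != previous:
            if previous != "silence":
                yield Event(f"{previous}_end", t_ms)
            if label != "silence":
                yield Event(f"{label}_start", t_ms)
            previous = label


# Illustrative input: one labeled frame every 20 ms.
frames = [(0, "silence"), (20, "speech"), (40, "speech"),
          (60, "laughter"), (80, "silence")]
events = list(frames_to_events(frames))
for event in events:
    print(event)
```

Downstream layers then only see a short list of timestamped events instead of raw audio frames, which is what makes an event-driven loop possible.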
Thinking Layer
- Interactive ReAct: Breaks the rigid “observe–think–act” loop
- Enables thinking while listening, speaking while thinking
- Fast thinking (0.5s) → slow thinking (5s) → continuous thinking
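The "fast thinking, then slow thinking, while still listening" idea can be sketched with `asyncio`. This is a toy under stated assumptions, not the SEAL implementation: `fast_think`, `slow_think`, and `respond` are hypothetical names, the model calls are replaced by short sleeps (scaled down from the 0.5 s and 5 s figures above), and an interruption is modeled as an `asyncio.Event`.

```python
import asyncio
from typing import List, Optional


async def fast_think(query: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a fast (~0.5 s) response
    return f"Got it, looking into: {query}"


async def slow_think(query: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for a slow (~5 s) response
    return f"Here is the detailed answer for: {query}"


async def respond(query: str, interrupted: asyncio.Event, spoken: List[str]) -> None:
    # Speak a quick reply first so the user is not left waiting in silence.
    spoken.append(await fast_think(query))
    # Keep thinking in the background while still listening for interruptions.
    slow = asyncio.create_task(slow_think(query))
    stop = asyncio.create_task(interrupted.wait())
    done, pending = await asyncio.wait({slow, stop}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    if slow in done:
        spoken.append(slow.result())
    else:
        spoken.append("(stops talking and listens)")


async def demo(interrupt_after: Optional[float]) -> List[str]:
    spoken: List[str] = []
    interrupted = asyncio.Event()

    async def user() -> None:
        # Simulated user who may barge in after a delay.
        if interrupt_after is not None:
            await asyncio.sleep(interrupt_after)
            interrupted.set()

    await asyncio.gather(respond("reschedule my flight", interrupted, spoken), user())
    return spoken


uninterrupted = asyncio.run(demo(None))
interrupted_run = asyncio.run(demo(0.02))
print(uninterrupted)
print(interrupted_run)
```

The design point is that the slow thought runs as a cancellable background task, so a new event from the user can pre-empt it instead of waiting for the loop to finish.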
Execution Layer
- Trains end-to-end VLA models
- Generates natural speech pauses and filler words
- Achieves human-like mouse movement trajectories
Dark Cloud Two: Learning from Experience
Currently, Agents start from scratch on every task. They cannot accumulate domain knowledge or improve task proficiency.
The challenge of moving from “smart” to “skilled”
- SOTA models ≈ top fresh graduates (knowledgeable but inexperienced)
- Business processes are dynamic and non-public
- Improving only the base model cannot solve the “experience” problem
Three Learning Paradigms
1. Post-training
- Method: Update parameters via RL
- Value: Solidify experience into parameters
- Example: Kimi K2’s Model as Agent
2. In-context Learning
- Method: Use the attention mechanism of Transformers
- Breakthroughs:
  - DeepSeek MLA: 16× KV cache compression
  - Sparse attention: turns KV cache into a vector database
  - MiniMax-01: hybrid architecture of linear attention + softmax attention
3. Externalized Learning (Core Innovation)
Knowledge base: Persistent experience storage without retraining
- Contextual Retrieval: attach context to each document chunk
- LLM-based automated summarization: turn compute into a scalable knowledge base
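A minimal sketch of why attaching context to each chunk helps retrieval. In practice an LLM would write the situating context and a real retriever (embeddings, BM25) would do the scoring. Here `add_context` and the token-overlap `score` are deliberately simple stand-ins, and the company names and policies are invented for the example.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, chunk: str) -> int:
    # Toy relevance score: number of shared tokens.
    return len(tokens(query) & tokens(chunk))


def add_context(title: str, chunk: str) -> str:
    # Stand-in for an LLM call that writes a one-line situating context.
    return f"From the document '{title}'. {chunk}"


def best(query: str, index: List[str]) -> str:
    return max(index, key=lambda chunk: score(query, chunk))


# Hypothetical documents, one chunk each.
docs = {
    "ACME refund policy": ["Requests must be filed within 30 days."],
    "Globex refund policy": ["Requests must be filed within 14 days."],
}
plain = [c for chunks in docs.values() for c in chunks]
contextual = [add_context(t, c) for t, chunks in docs.items() for c in chunks]

query = "Globex refund deadline"
print([score(query, c) for c in plain])       # both plain chunks score 0
print(best(query, contextual))                # the Globex chunk wins
```

Without context, the two chunks are indistinguishable to the retriever because neither mentions which company it belongs to. With the context prepended, the query can land on the right one.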
Tool generation: Agents self-evolve
- Intelligent RPA: summarize repeated operations into tools (checking weather reduced from 47s to 10s)
- Automatic diagnosis: automatically triage issues from production logs
- MCP-Zero: proactive tool discovery, 98% token savings
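One way to picture "summarizing repeated operations into tools" is a library that promotes action sequences seen more than once into named, replayable tools. This is a simplified sketch under assumptions: the action names, `ToolLibrary`, and the exact-match rule for "repeated" are all hypothetical, whereas the approach described above relies on an LLM to do the summarizing.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def repeated_sequences(traces: Sequence[Sequence[str]],
                       min_count: int = 2) -> List[Tuple[str, ...]]:
    """Return action sequences that occur at least `min_count` times."""
    counts = Counter(tuple(trace) for trace in traces)
    return [seq for seq, n in counts.items() if n >= min_count]


class ToolLibrary:
    def __init__(self) -> None:
        self.tools: Dict[str, Tuple[str, ...]] = {}

    def register(self, name: str, steps: Tuple[str, ...]) -> None:
        self.tools[name] = steps

    def run(self, name: str, execute: Callable[[str], str]) -> List[str]:
        # Replay the stored steps without asking the model to re-plan each one.
        return [execute(step) for step in self.tools[name]]


# Hypothetical action traces recorded from earlier tasks.
traces = [
    ["open_browser", "open_weather_site", "enter_city", "read_forecast"],
    ["open_browser", "open_mail", "read_inbox"],
    ["open_browser", "open_weather_site", "enter_city", "read_forecast"],
]

library = ToolLibrary()
for i, steps in enumerate(repeated_sequences(traces)):
    library.register(f"tool_{i}", steps)

log = library.run("tool_0", execute=lambda step: f"done:{step}")
print(library.tools)
print(log)
```

Replaying a stored tool skips the per-step screenshot-and-think cycle, which is where a speedup of the kind reported above would come from.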
Extending the Scaling Law
“The two methods that seem to scale arbitrarily … are search and learning.” — Rich Sutton, The Bitter Lesson
Externalized learning breaks the limitation of model parameters:
- Search → external knowledge bases and tool libraries
- Learning → LLMs summarizing experience into knowledge and code
- Extends the boundary of the Scaling Law to the external ecosystem
Key Insights
- The essence of real-time interaction: It’s not about making LLMs faster, but making them “think while listening, speak while thinking” like humans
- The essence of learning: It’s not about stuffing all knowledge into parameters, but about building reliable external knowledge and tool systems
- The future of Agents: From containers of knowledge to engines of discovery
Practice at Pine AI
At Pine AI we are putting these ideas into practice, enabling AI Agents to:
- Interact with the world in real time (voice calls, GUI operations)
- Learn from experience (knowledge accumulation, tool generation)
- Truly solve problems and get things done for users
If you’re interested in building SOTA autonomous AI Agents, you’re welcome to join our Pine AI team. We are looking for full-stack engineers who enjoy programming alongside AI, love hands-on problem solving, and have solid engineering skills. Contact: [email protected]
Official Coverage
The following content is reprinted from the official WeChat account of Shenzhen Robotics Association
[Industry-Research Matchmaking] The 2nd “FAIR plus × Lion Rock Questions” Successfully Held, Exploring Bottlenecks and Breakthroughs of AI Agents and All-terrain Embodied Intelligence
To promote technological innovation and achievement transformation in the embodied intelligence industry, and to facilitate coordinated development across the entire industry chain, Shenzhen Robotics Association and China Merchants Lion Rock AI Lab jointly host the “FAIR plus × Lion Rock Questions” series of events. Each session invites industry experts to share cutting-edge technologies and practical experience in the embodied intelligence field.
On September 4, the second session of “FAIR plus × Lion Rock Questions” was successfully held. The speakers were Li Bojie, Co-founder and Chief Scientist of Pine AI and a member of the first cohort of “Huawei Genius Youth”, and Zhang Huaidong, Co-founder of Cyborg Robotics and Associate Professor at South China University of Technology. They spoke on topics including the core bottlenecks in AI Agent development and application breakthroughs of all-terrain embodied intelligent robots.
At the beginning of the event, the guests visited the China Merchants Lion Rock AI Lab located in the Hetao Shenzhen–Hong Kong Science and Technology Innovation Cooperation Zone in Futian. On site, they gained an in-depth understanding of the lab’s layout in terms of computing platforms, algorithm R&D, and scenario-based applications, and felt the close connection between cutting-edge research and industrial deployment.
Chen Chao, Member of the Party Committee and Deputy General Manager of China Merchants Innovation & Technology (Group) Co., Ltd., and Vice President of China Merchants Advanced Technology Institute, delivered the opening speech. He emphasized that embodied intelligence is a key link connecting AI and the real economy, and its technological breakthroughs and scenario implementation rely on deep collaboration among industry, academia, and application partners. He looks forward to using this series of events to explore technical paths and expand industrial ecosystems with peers, promoting the transformation of cutting-edge technologies into practical applications.
Jiaxing Zhang, Chief AI Scientist of China Merchants Group and Director of Lion Rock AI Lab, chaired the event, stating that the “FAIR plus × Lion Rock Questions” series aims to build a communication and mutual-learning platform covering the full chain of embodied intelligence from industry to academia and applications. By gathering frontier research results and practical experience from academia, the series seeks to promote cross-domain intellectual collisions and resource matching, providing support for the development of the embodied intelligence industry.
In the keynote session, Li Bojie, Co-founder and Chief Scientist of Pine AI and a member of the first cohort of “Huawei Genius Youth”, gave an in-depth talk titled “The Two Dark Clouds over Agents: Real-time Interaction with the Environment, Learning from Experience”. He pointed out that the development of agents faces two key challenges: first, how to achieve efficient real-time interaction in complex and dynamic environments; second, how to learn and evolve from limited experience. Around these two “dark clouds”, he combined cutting-edge research with practical cases and outlined where future breakthroughs for agents may come from in algorithmic architecture, computing resources, and scenario implementation.
Zhang Huaidong, Co-founder of Cyborg Robotics and Associate Professor at South China University of Technology, delivered a talk titled “Application Breakthroughs of All-terrain Embodied Intelligent Robots in Complex Scenarios”. Drawing on his team’s experience in complex terrain adaptation, cross-modal perception and control, and dynamic optimization, he systematically described the application progress and innovative achievements of all-terrain embodied intelligent robots in typical scenarios such as disaster rescue, industrial inspection, and outdoor exploration. He pointed out that for robots to move from the lab to replicable and scalable deployment, the key is to connect software, hardware, and real-world application scenarios.
As a key link connecting industry, academia, research, and application in embodied intelligence, the “FAIR plus × Lion Rock Questions” series is dedicated to promoting the efficient integration of cross-disciplinary innovation resources and accelerating the transition of core technologies from R&D to industrialization. In the future, the Association and the Laboratory will continue to advance this branded event series, fostering a virtuous cycle between technological innovation and industrial implementation, helping embodied intelligence technologies move out of the lab, integrate into the industrial chain, and empower the development of new-quality productive forces.
About Lion Rock Artificial Intelligence Laboratory
Lion Rock Artificial Intelligence Laboratory was established by China Merchants Group at Hong Kong Science Park on September 12, 2024. Led by Dr. Zhang Jiaxing, Chief Scientist of Artificial Intelligence at China Merchants Group, the lab brings together talented young researchers and senior experts in many fields including large models, computer vision, positioning and navigation, motion control, and mechanical structures, enabling interdisciplinary talent exchange and collaboration.
Upholding the mission and vision of “empowering machines with intelligence and delivering warmth to humanity,” the laboratory focuses on service scenarios, carrying out cutting-edge research in embodied intelligence and innovative product development. The lab is committed to end-to-end model R&D, believes in exploratory machine learning, discovers the value of natural language, and ultimately aims to create embodied intelligence technologies that can enter thousands of households.
As an important part of China Merchants Advanced Technology Research Institute, the laboratory will build an industrial innovation ecosystem through Shenzhen–Hong Kong collaborative cooperation, accelerate the construction of a full-stack R&D system for “Embodied-AI × Agentic-AI,” and strive to become a global leader in embodied intelligence technologies. It aims to provide technological empowerment for high-quality development across thousands of industries and create a new lifestyle paradigm with greater happiness for hundreds of millions of families.
About FAIR plus
FAIR plus is a platform focused on technologies and development resources across the entire robotics industry chain. Through academic conferences, technical standards, community building, supply-demand matchmaking, and other means, it creates offline meeting opportunities for two groups: technical professionals working in development, product, engineering, and solutions across all links of the AI + robotics industry chain, and process, equipment, and IT personnel from application scenarios that want to introduce robots. The resulting cooperation is intended to promote the intelligent development of robots and to strengthen how capabilities are built and allocated across the industry as a whole.
Recently, the robotics full-industry-chain expo FAIR plus 2026 was officially scheduled for April 22–24, 2026. The expo focuses on technologies and development resources across the entire robotics industry chain, and will feature major themed areas such as a full robotics industry chain exhibition zone and a joint exhibition zone for startups. It will cover 50+ segments of the embodied intelligent robotics industry and showcase physical exhibits from 500+ upstream and complete-machine enterprises in the robotics industry chain. It is expected to attract more than 50,000 professional visitors and over 100 overseas professional buyer delegations.
About Shenzhen Robotics Association
Shenzhen Robotics Association (SRA) was initiated and established by the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, in September 2009, making it the earliest robotics industry association in China. The Association is a non-profit social organization voluntarily formed by enterprises, R&D institutions, and related upstream and downstream units engaged in the robotics industry in Shenzhen. Growing alongside the development of Shenzhen’s robotics industry and the expansion of robotics enterprises, the Association now has more than 800 member companies in fields such as industrial robots, service robots, medical robots, educational robots, special robots, and artificial intelligence, with a total member output value exceeding 150 billion yuan. It is the largest local association in the robotics field in terms of both number of members and scale of output value.
Relying on the scientific research resources of the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, the Association has set up the Shenzhen Artificial Intelligence Expert Committee, Young Experts Committee, and Medical Robotics Expert Committee, and has successively initiated the establishment of the Shenzhen Artificial Intelligence Society, the South China Machine Vision Industry Alliance, and the Shenzhen Logistics Robotics Industry Alliance. It provides long-term consulting services such as technical support and industrial matchmaking for governments, enterprises, and third-party organizations.