It was my great honor, at the invitation of Professor Jiaxing Zhang, to give an academic talk titled “Two Clouds over Agents: Real-time Interaction with the Environment and Learning from Experience” at the Lion Rock Artificial Intelligence Laboratory on September 4. Today I’m sharing the slides and video of this talk for your reference and discussion.

Talk materials

Talk summary

In 1900, Lord Kelvin said in a lecture: “The building of physics is almost complete; there are only two small clouds…” Those two small clouds later triggered the revolutions of relativity and quantum mechanics. Today, the AI agent field faces similar “two clouds.”

The first cloud: the challenge of real-time interaction

Current AI agents face severe latency when interacting with the environment in real time:

The dilemma of voice interaction

  • Serial processing vs real-time needs: the agent must wait for the user to finish speaking before it can think, and finish thinking before it can speak
  • Fast vs slow thinking dilemma: deep reasoning takes 10+ seconds (users lose patience), while quick responses are error-prone
  • Technical bottlenecks: every stage of the pipeline (VAD, ASR, LLM reasoning, TTS synthesis) waits on the one before it
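The cost of a serial pipeline can be made concrete with a little arithmetic. A minimal sketch, using assumed per-stage latencies (not measurements from any real system): in a serial pipeline the user waits for the sum of all stages, while in a streaming pipeline the stages overlap and latency is dominated by the slowest stage.

```python
# Illustrative per-stage latencies in milliseconds (assumed values).
STAGES_MS = {"VAD": 300, "ASR": 500, "LLM": 2000, "TTS": 700}

def serial_latency(stages):
    """Serial pipeline: each stage waits for the previous one to finish,
    so response latency is the sum of all stage latencies."""
    return sum(stages.values())

def streaming_latency(stages):
    """Fully streaming pipeline: stages overlap, so latency is dominated
    by the slowest stage (per-stage startup costs ignored here)."""
    return max(stages.values())

print(serial_latency(STAGES_MS))     # 3500 ms before the user hears anything
print(streaming_latency(STAGES_MS))  # 2000 ms when stages overlap
```

The point of the toy model: making any single stage faster helps a serial pipeline only linearly, whereas overlapping the stages changes the shape of the latency entirely.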

The “last mile” challenge in GUI operation

  • Agents operate a computer 3–5× slower than humans
  • Every click requires a fresh screenshot and thinking (3–4 s latency)
  • Moravec’s paradox: the model “knows” what to do but “can’t do it”

Our solution: the SEAL architecture

SEAL (Streaming, Event-driven Agent Loop) is the architecture we propose. It abstracts all interactions as asynchronous event streams:

  1. Perception

    • Convert continuous signals (speech, GUI) into discrete events
    • Streaming speech perception model replaces VAD + ASR
    • Output rich acoustic events (interruptions, emotions, laughter, etc.)
  2. Thinking

    • Interactive ReAct: break the rigid “observe–think–act” loop
    • Realize thinking-while-listening and speaking-while-thinking
    • Fast thinking (0.5 s) → slow thinking (5 s) → continuous thinking
  3. Execution

    • Train end-to-end VLA models
    • Generate natural speech pauses and fillers
    • Produce human-like mouse movement trajectories
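To make the event-driven idea concrete, here is a minimal sketch of such a loop in Python's asyncio. All names and event kinds are illustrative, not the actual SEAL implementation: perception pushes discrete events onto a queue, and the agent reacts to partial input immediately (fast thinking) instead of waiting for a complete observe–think–act cycle.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Event:
    kind: str      # e.g. "speech_partial", "speech_final", "gui_change"
    payload: str

async def perception(queue):
    """Perception: convert continuous input into discrete events.
    Stubbed here with two canned speech events."""
    for ev in [Event("speech_partial", "I'd like to cancel"),
               Event("speech_final", "I'd like to cancel my subscription")]:
        await queue.put(ev)
    await queue.put(None)  # end-of-stream sentinel

async def agent_loop(queue, log):
    """Thinking: react to events as they arrive, so fast thinking on
    partial input can overlap with the user still speaking."""
    while (ev := await queue.get()) is not None:
        if ev.kind == "speech_partial":
            log.append(f"fast-think on partial: {ev.payload!r}")
        elif ev.kind == "speech_final":
            log.append(f"slow-think and respond to: {ev.payload!r}")

async def main():
    queue, log = asyncio.Queue(), []
    await asyncio.gather(perception(queue), agent_loop(queue, log))
    return log

log = asyncio.run(main())
```

Because perception and thinking are independent coroutines sharing a queue, new events (such as an interruption) can reach the agent mid-thought, which is what the rigid ReAct loop cannot do.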

The second cloud: learning from experience

Today’s agents start from scratch on every task, unable to accumulate domain knowledge or improve task proficiency.

The challenge of moving from “smart” to “skilled”

  • SOTA models ≈ top graduates (knowledgeable but lacking experience)
  • Business processes are dynamic and non-public
  • Improving the base model alone cannot solve the “experience” problem

Three learning paradigms

1. Post-training

  • Method: parameter updates via RL
  • Value: solidify experience into parameters
  • Example: Kimi K2’s Model as Agent

2. In-context Learning

  • Method: leverage the Transformer’s attention mechanism
  • Breakthroughs:
    • DeepSeek MLA: 16× KV cache compression
    • Sparse attention: turn the KV cache into a vector database
    • MiniMax-01: hybrid architecture of linear attention + softmax attention
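Why KV cache compression matters for in-context learning is easiest to see with back-of-the-envelope arithmetic. A minimal sketch with an assumed model shape (not any specific model's real configuration), applying a 16× compression factor of the kind MLA reports:

```python
def kv_cache_bytes(layers, seq_len, heads, head_dim, bytes_per_elem=2):
    """Standard multi-head attention caches one key and one value vector
    per layer, per token, per head (fp16: 2 bytes per element)."""
    return 2 * layers * seq_len * heads * head_dim * bytes_per_elem

# Illustrative shape: 32 layers, 128k context, 32 heads of dim 128 (assumed).
full = kv_cache_bytes(layers=32, seq_len=128_000, heads=32, head_dim=128)
mla = full / 16  # K/V projected into a small shared latent, ~16x smaller

print(f"{full / 2**30:.1f} GiB -> {mla / 2**30:.1f} GiB")  # 62.5 GiB -> 3.9 GiB
```

At these (assumed) dimensions the uncompressed cache alone would not fit on a single accelerator, which is why long-context "memory" depends as much on cache architecture as on model quality.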

3. Externalized Learning (core innovation)

  • Knowledge base: persistent experience storage, no retraining needed

    • Contextual retrieval: add context to each document chunk
    • LLM automated summarization: turn compute into a scalable knowledge base
  • Tool generation: agent self-evolution

    • Smart RPA: summarize repetitive operations into tools (checking the weather reduced from 47 s to 10 s)
    • Automatic diagnosis: automatically triage issues from production logs
    • MCP-Zero: proactive tool discovery, 98% token savings
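The contextual-retrieval step above can be sketched in a few lines. This is an illustration of the technique, not Pine AI's implementation; `summarize` stands in for an LLM call, replaced here by a trivial lambda so the example runs:

```python
def contextualize_chunks(document, chunks, summarize):
    """Contextual retrieval: before indexing, prepend to each chunk a short
    LLM-written note situating it in the whole document, so a chunk like
    "the fee is $20" still retrieves well when seen out of context.

    `summarize(document, chunk)` is a stand-in for an LLM call."""
    return [f"{summarize(document, chunk)}\n\n{chunk}" for chunk in chunks]

# Toy stand-in for the LLM: simply name the source document.
doc = "Acme billing policy"
chunks = ["The late fee is $20.", "Refunds take 5 business days."]
indexed = contextualize_chunks(doc, chunks, lambda d, c: f"[From: {d}]")
print(indexed[0])
```

This is also where "turn compute into a scalable knowledge base" comes from: the summarization cost is paid once at indexing time, and every later retrieval benefits.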

Extending the Scaling Law

“The two methods that seem to scale arbitrarily … are search and learning.” — Rich Sutton, The Bitter Lesson

Externalized learning breaks the limits of model parameters:

  • Search → external knowledge bases and tool repositories
  • Learning → LLMs summarize experience into knowledge and code
  • Extend the boundary of the Scaling Law to the external ecosystem

Key insights

  1. The essence of real-time interaction: not making LLMs faster, but enabling them to “think while listening and speak while thinking” like humans
  2. The essence of learning: not stuffing all knowledge into parameters, but building a reliable external system of knowledge and tools
  3. The future of agents: from containers of knowledge to engines of discovery

Pine AI in practice

At Pine AI we are putting these ideas into practice so that AI agents can:

  • Interact with the world in real time (voice calls, GUI operations)
  • Learn from experience (knowledge accumulation, tool generation)
  • Truly solve problems and get things done for users

If you’re interested in building SOTA autonomous AI agents, you’re welcome to join our Pine AI team. We’re looking for full-stack engineers who enjoy co-programming with AI, love hands-on problem solving, and have solid engineering skills. Contact: boj@19pine.ai
