2025-10-24
Continuous Learning for Agents: Why a Reasoner Is Not a Real Agent?

Richard Sutton, the father of reinforcement learning, says that today’s large language models are a dead end.

This sounds shocking. As the author of The Bitter Lesson and a 2024 Turing Award laureate, Sutton is perhaps the strongest believer that "more compute plus general methods will win," so in theory he should be full of praise for large models like GPT-5, Claude, and Gemini. Yet in a recent interview, Sutton bluntly pointed out that LLMs merely imitate what people would say, rather than understanding how the world works.

The interview, hosted by podcaster Dwarkesh Patel, sparked heated debate. Andrej Karpathy then responded in writing and later elaborated in another interview. Their exchange reveals three fundamental, often-overlooked issues in current AI development:

First, the myth of the small-world assumption: Do we really believe that a sufficiently large model can internalize all important knowledge and never need to learn again? Or does the real world fit a large-world assumption—no matter how big the model is, it still needs to keep learning in concrete settings?

Second, the absence of continual learning: Current model-free RL methods (PPO, GRPO, etc.) learn only from sparse rewards and cannot leverage the rich feedback provided by the environment. This makes agents extremely sample-inefficient on real-world tasks and unable to adapt quickly.
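
To make the sparse-reward point concrete, here is a minimal REINFORCE-style sketch (my own illustration, not code from either interview): when the only feedback is a single scalar reward at the end of an episode, every action in the trajectory receives exactly the same learning signal, so the agent cannot tell which steps actually mattered.

```python
# Minimal illustration (not from the interviews): with a sparse, episode-level
# reward, REINFORCE scales the gradient of every action by the same scalar,
# so no step-level credit assignment is possible.
import numpy as np

rng = np.random.default_rng(0)
n_actions, horizon = 4, 20
logits = np.zeros(n_actions)               # toy policy: shared logits at every step

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

probs = softmax(logits)
actions = rng.choice(n_actions, size=horizon, p=probs)

episode_reward = 1.0                       # the ONLY feedback for all 20 steps

grad = np.zeros(n_actions)
for a in actions:
    # gradient of log pi(a) for a softmax policy, weighted by the same episode reward
    grad += (np.eye(n_actions)[a] - probs) * episode_reward

logits += 0.1 * grad                       # crucial and irrelevant steps get identical credit
```

Dense environment feedback (tool outputs, error messages, intermediate observations) carries far more information per step, which is precisely the signal the article argues current methods fail to exploit.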

Third, the gulf between Reasoners and Agents: OpenAI divides AI capability into five levels, from Chatbot to Reasoner to Agent. But many people mistakenly think that turning a single-turn Reasoner into a multi-turn one makes it an Agent. The core difference between a true Agent and a Reasoner is continual learning capability.

This article will systematically review the core viewpoints from those two interviews and, combined with our hands-on experience building real-time agents at Pine AI, explore how to bridge this gap.

Read More

2025-09-28
The Thinking Behind Unified Bus

The protocol documentation for Unified Bus has finally been released. Most of the initial design work for the protocol was done four or five years ago, and I haven’t worked on interconnects for more than two years. Yet reading this 500+ page document today still feels very familiar.

As with most protocol documents, the UB documentation presents a wealth of detail about the Unified Bus protocol but rarely touches on the thinking behind its design. As a rank-and-file engineer who took part in UB in its early days, I'll share some personal reflections. The productized UB of today may differ in many ways from what we designed back then, so don't take this as an authoritative guide; read it as a collection of anecdotes.

Why UB

To understand the inevitability of Unified Bus (UB), we must return to a fundamental contradiction in computer architecture: the split between the Bus and the Network.

For a long time, the computing world has been divided into islands by these two completely different interconnect paradigms.

  • Inside an island (for example, within a single server or a chassis), we use bus technologies such as PCIe or NVLink. They are designed for tightly coupled systems; devices share a unified physical address space, communication latency can be on the order of nanoseconds, and bandwidth is extremely high. This is a performance paradise, but its territory is very limited—the physical distance and the number of devices a bus can connect are strictly constrained.
  • Between islands, we rely on network technologies such as Ethernet or InfiniBand. They are born for loosely coupled systems, excel at connecting tens of thousands of nodes, and have superb scalability. But that scalability comes at a cost: complex protocol stacks, additional forwarding overhead, and latencies in the microsecond or even millisecond range create an orders-of-magnitude gap compared with buses.

This “inside vs. outside” architecture worked well for a long time. However, a specter began to haunt the computing world—Scaling Law.

About 10 years ago, researchers in deep learning discovered a striking regularity: as long as you keep increasing model size, data, and compute, model performance predictably and steadily improves. This discovery changed the game. What used to be a “good enough” single machine with 8 GPUs suddenly became a drop in the bucket in the face of models with tens or hundreds of billions of parameters.

At that moment, a clear and urgent need presented itself to system architects everywhere: can we tear down the wall between buses and networks? Can we create a unified interconnect that offers bus-level programming simplicity and extreme performance, while also providing network-level massive scalability?

This is UB’s core mission. It’s not merely a patch or improvement on existing protocols but a thorough rethinking. UB aims to build a true “datacenter-scale computer,” seamlessly connecting heterogeneous compute, memory, and storage across the entire cluster into a unified, programmable whole. In this vision, accessing memory on a remote server should be as simple and natural as accessing local memory; tens of thousands of processors should collaborate as efficiently as if they were on a single chip.

Read More

2025-09-12
Qwen3-Next: Hybrid Attention + Ultra-Sparse MoE + MTP = SOTA Inference Speed

Recently, Alibaba’s Qwen team released the Qwen3-Next model, another major innovation after Qwen3. The model achieves multiple breakthroughs in architectural design, especially reaching industry-leading levels in the balance between inference efficiency and performance. This article briefly summarizes Qwen3-Next’s core innovations.

Three major breakthroughs of Qwen3-Next:

  1. Hybrid attention architecture: 3 layers of linear attention + 1 layer of traditional attention, incorporating DeltaNet’s delta rule idea
  2. Ultra-sparse MoE: only 11 of 512 experts activated; 80B parameters with only 3B activated
  3. 100+ tokens/s inference speed: reaches a state-of-the-art level via MTP (multi-token prediction)

Core value: With 1/10 the compute cost and 10× the token processing speed, it achieves performance surpassing 32B dense models, benchmarking against Gemini 2.5 Flash.
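
To make the "ultra-sparse" point concrete, here is a rough top-k routing sketch (my own illustration, not Qwen3-Next's actual code; the shapes are toy values): a router scores all 512 experts for each token, but only the 11 best-scoring experts are executed, which is how a model can hold 80B parameters while activating only about 3B per token.

```python
# Illustrative top-k MoE routing sketch (not Qwen3-Next's real implementation).
# With 512 experts and k = 11, roughly 2% of the experts run for each token.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 512, 11

x = rng.standard_normal(d_model)                    # one token's hidden state
router_w = rng.standard_normal((n_experts, d_model))

scores = router_w @ x                               # router logits, one per expert
chosen = np.argpartition(scores, -top_k)[-top_k:]   # indices of the top-k experts

weights = np.exp(scores[chosen] - scores[chosen].max())
weights /= weights.sum()                            # softmax over the selected experts only

# Only the chosen experts' FFNs run; their outputs are combined by routing weight.
expert_ffns = rng.standard_normal((n_experts, d_model, d_model))
output = sum(w * (expert_ffns[e] @ x) for w, e in zip(weights, chosen))
print(f"activated {top_k} of {n_experts} experts; output shape: {output.shape}")
```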

Read More

2025-09-08
Two Dark Clouds Over Agents: Real-time Interaction with Environments, Learning from Experience

I was honored to be invited by Prof. Jiaxing Zhang to give an academic talk titled “Two Dark Clouds Over Agents: Real-time Interaction with Environments, Learning from Experience” at the Lion Rock Artificial Intelligence Laboratory on September 4. Today I’m sharing the slides and video for your reference and discussion.

📰 Official Report: [Industry-Research Docking] Issue 2 "FAIR plus × Lion Rock Wendao" successfully held, exploring the bottlenecks and breakthroughs of AI agents and all-terrain embodied intelligence

Talk Materials

Talk Summary

In 1900, Lord Kelvin remarked in a lecture that the beauty and clearness of the dynamical theory of physics was obscured by "two clouds." Those two small clouds later triggered the revolutions of relativity and quantum mechanics. Today, the AI Agent field faces a similar pair of dark clouds.

The First Cloud: Challenges of Real-time Interaction

Current AI agents face severe latency when interacting with environments in real time:

The predicament of voice interaction

  • Serial processing vs. real-time needs: must wait for the user to finish speaking before thinking, and finish thinking before speaking
  • The fast/slow thinking dilemma: deep thinking takes 10+ seconds (users lose patience), quick responses are error-prone
  • Technical bottlenecks: waiting at every step (VAD detection, ASR recognition, LLM reasoning, TTS synthesis)
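
A rough back-of-the-envelope sketch of the serial-pipeline problem (all latency numbers below are illustrative assumptions, not measurements from the talk): in a strictly serial pipeline the user-perceived delay is the sum of every stage, while a streaming design that overlaps stages can hide most of it.

```python
# Back-of-the-envelope sketch; every number here is an illustrative assumption.
stages_ms = {
    "VAD (detect end of speech)": 500,
    "ASR (transcribe the utterance)": 300,
    "LLM (reason about a reply)": 2000,
    "TTS (synthesize first audio)": 400,
}

serial_latency = sum(stages_ms.values())
print(f"serial pipeline: first audio after ~{serial_latency} ms")

# If ASR streams partial transcripts, the LLM streams tokens, and TTS starts on
# the first sentence, perceived delay approaches the slowest single stage plus
# handoff overhead rather than the full sum.
overlapped_latency = max(stages_ms.values()) + 200   # +200 ms assumed handoff cost
print(f"streaming/overlapped pipeline: roughly {overlapped_latency} ms")
```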

The “last mile” problem of GUI operations

  • Agents operate computers 3–5× slower than humans
  • Each click requires re-screenshotting and thinking (3–4 seconds of latency)
  • Moravec’s paradox: the model “knows” what to do but “can’t do it”
Read More

2025-08-18
AI Agent Bootcamp: Build Your General-Purpose Agent in 9 Weeks

[This article is compiled from the first live session of the Turing Community AI Agent Bootcamp, Slides link]

Turing Community “AI Agent Bootcamp” purchase link

Build an AI Agent of your own—start here. This article not only systematically introduces the foundational technical path to building a general-purpose AI Agent from scratch (context engineering, RAG systems, tool use, multimodal interaction, and more), but also covers advanced techniques such as fast/slow thinking and multi-Agent collaboration. Through 9 weeks of hands-on projects, you will progressively master the full lifecycle of Agent development along with the core advanced skills.

This course had its first live preview on August 18 and will officially start on September 11. Each week includes about 2 hours of class time covering all the foundational and advanced topics below. Of course, just 2 hours of lectures per week is not enough—you’ll also need to spend time coding and practicing.

Core Goals of the Bootcamp

Build an AI Agent of your own—start here

🎯 Master core architecture and engineering capabilities

  • Deeply understand Agent architecture: Systematically grasp the core design paradigm of LLM + context + tools.
  • Master context engineering: Learn multi-layered context management from conversation history and long-term user memory to external knowledge bases (RAG) and file systems.
  • Master dynamic tool calling: Reliably integrate Agents with external APIs and MCP Server, and enable self-improvement via code generation.
  • Build advanced Agent patterns: Design and implement fast/slow thinking (Mixture-of-Thoughts), orchestration, and other complex Agent collaboration patterns.

💡 Build a systematic understanding of development and deployment

  • Understand the path of technical evolution: See the progression from basic RAG to Agents that can autonomously develop tools.
  • Master the full Agent lifecycle: Be able to independently complete the closed loop of design, development, evaluation with LLM as a Judge, and deployment.
  • Build domain knowledge: Accumulate cross-domain Agent development experience through hands-on projects in law, academia, programming, etc.
  • Consolidate your knowledge system: Co-create the book “AI Agent, Explained,” turning fragmented knowledge into a systematic output.

9-Week Hands-On Plan Overview

| Week | Topic | Content Overview | Hands-On Case |
| --- | --- | --- | --- |
| 1 | Agent Basics | Agent structure and taxonomy, workflow-based vs. autonomous | Build a web-connected search Agent |
| 2 | Context Design | Prompt templates, conversation history, long-term user memory | Add persona and long-term memory to your Agent |
| 3 | RAG and Knowledge Base | Document structuring, retrieval strategies, incremental updates | Build a legal Q&A Agent |
| 4 | Tool Use and MCP | Tool wrapping and MCP integration, external API calls | Connect to an MCP Server to build a deep-research Agent |
| 5 | Programming and Code Execution | Codebase understanding, reliable code modification, consistent execution environment | Build an Agent that can develop Agents by itself |
| 6 | Model Evaluation and Selection | Model capability evaluation, LLM as a Judge, safety guardrail design | Build an evaluation dataset and auto-evaluate Agents with LLM as a Judge |
| 7 | Multimodal and Real-time Interaction | Real-time voice Agent, operating computers and phones | Implement a voice call Agent & integrate browser-use to operate a computer |
| 8 | Multi-Agent Collaboration | A2A communication protocol, Agent team roles and collaboration | Design a multi-Agent collaboration system to "make calls while operating the computer" |
| 9 | Project Integration and Demo | Final integration and demo of the Agent project, polishing the final deliverable | Showcase your unique general-purpose Agent |

9-Week Advanced Topics

| Week | Topic | Advanced Content Overview | Advanced Hands-On Case |
| --- | --- | --- | --- |
| 1 | Agent Basics | The importance of context | Explore how missing context affects Agent behavior |
| 2 | Context Design | Organizing user memory | Build a personal knowledge management Agent to summarize long texts |
| 3 | RAG and Knowledge Base | Long-context compression | Build a research paper analysis Agent to summarize core contributions |
| 4 | Tool Use and MCP | Learning from experience | Enhance the deep-research Agent's expert capability (sub-agents and domain experience) |
| 5 | Programming and Code Execution | Agent self-evolution | Build an Agent that autonomously leverages open-source software to solve unknown problems |
| 6 | Model Evaluation and Selection | Parallel sampling and sequential revision | Add parallelism and revision capabilities to the deep-research Agent |
| 7 | Multimodal and Real-time Interaction | Combining fast and slow thinking | Implement a real-time voice Agent that combines fast and slow thinking |
| 8 | Multi-Agent Collaboration | Orchestration Agent | Use an Orchestration Agent to dynamically coordinate phone calls and computer operations |
| 9 | Project Integration and Demo | Comparing Agent learning methods | Compare four ways an Agent learns from experience |
Read More

2025-08-03
Another Vibe Coding Interview Question: Attention-Based LLM Hallucination Detector

Following "Solving LLM Constrained Sampling Interview Question with Vibe Coding", I'm sharing another Vibe Coding interview question from our company (Pine AI) about the fundamental principles of LLMs.

Many people misunderstand Vibe Coding, thinking it’s just about constantly asking AI, “How do you do this? How do you implement that?” This approach is doomed to fail. True Vibe Coding requires you to be the architect and product manager, guiding the AI like a teacher instructing a student, not the other way around.

This interview question assesses a candidate's understanding of the basic principles of Transformers and their engineering ability to implement a solution quickly through vibe coding. This is exactly the kind of person we need: someone who understands models and has strong engineering skills.

The Challenge: Attention-Based LLM Hallucination Detector

1. Background & Problem Statement

In many applications, large language models (LLMs) need to answer questions or extract information based on a given context, a process often referred to as "In-Context Learning." However, LLMs have a known and serious reliability flaw: when asked about information that is not present in the context, they may "hallucinate" a well-formatted but factually incorrect answer instead of admitting that the information is missing.
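
The full problem statement follows in the original post; as a hedged sketch of the underlying idea (my own minimal illustration using an assumed small open model, not the reference solution expected from candidates), one can check how much attention the answer tokens place on the context span: grounded answers tend to put substantial attention mass on the context, while hallucinated ones often do not.

```python
# Minimal illustration of attention-based grounding checking.
# Not the reference solution; the model name and the 0.2 threshold are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed small model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")

context = "Alice joined Pine AI in 2023 as a software engineer."
question = "When did Alice join Pine AI?"
answer = " Alice joined in 2023."

prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
prompt_ids = tok(prompt, return_tensors="pt").input_ids
answer_ids = tok(answer, return_tensors="pt", add_special_tokens=False).input_ids
input_ids = torch.cat([prompt_ids, answer_ids], dim=1)

with torch.no_grad():
    out = model(input_ids, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer.
# Average over layers and heads, then look at the rows for the answer tokens.
attn = torch.stack(out.attentions).mean(dim=(0, 2))[0]        # (seq, seq)

ctx_len = tok(f"Context: {context}", return_tensors="pt").input_ids.shape[1]  # approximate prefix length
answer_rows = attn[prompt_ids.shape[1]:]                      # answer tokens' attention rows
mass_on_context = answer_rows[:, :ctx_len].sum(dim=-1).mean().item()

print(f"average attention mass on context: {mass_on_context:.2f}")
if mass_on_context < 0.2:   # arbitrary threshold, just for the sketch
    print("low grounding in the context -> possible hallucination")
```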

Read More

2025-07-30
From Prompt Engineering to Context Engineering: The Secret to Writing Good Agents

[This article is based on a presentation at the Turing Community’s Large Model Technology Learning Camp, Slides Link]

This article takes a deep look at the design philosophy and practical strategies of AI Agents: moving from the conversational mode of Chatbots to the action mode of Agents, and systematically designing and managing an Agent's information environment to build efficient and reliable AI Agent systems.

Table of Contents

  1. Part 1: Paradigm Shift - From Chatbot to Agent
  2. Part 2: Core Analysis of Agents
  3. Part 3: Context Engineering
  4. Part 4: Memory and Knowledge Systems

Part 1: Paradigm Shift - From Chatbot to Agent

From Chatbot to Agent: A Fundamental Paradigm Shift

We are experiencing a fundamental shift in AI interaction modes:

Chatbot Era

  • 🗣️ Conversational Interaction: User asks → AI answers → Repetitive Q&A cycle
  • 📚 Knowledgeable Advisor: Can only “speak” but not “act,” passively responding to user needs
  • 🛠️ Typical Products: ChatGPT, Claude Chat

Agent Era

  • 🎯 Autonomous Action Mode: User sets goals → Agent executes → Autonomous planning and decision-making
  • 💪 Capable Assistant: Can both “think” and “act,” proactively discovering and solving problems
  • 🚀 Typical Products: Claude Code, Cursor, Manus
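
A toy contrast in code (purely illustrative; `llm`, `tools`, and the decision format below are hypothetical placeholders, not any real framework's API): a Chatbot is a single call that returns text, while an Agent is a loop that plans, acts through tools, observes results, and keeps going until the goal is met.

```python
# Toy sketch of the paradigm shift; `llm` and `tools` are hypothetical placeholders.
def chatbot(user_message: str, llm) -> str:
    # Chatbot era: one question in, one answer out, nothing else happens.
    return llm(user_message)

def agent(goal: str, llm, tools: dict, max_steps: int = 10) -> str:
    # Agent era: the model plans, calls tools, observes, and iterates toward a goal.
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Assume llm returns a parsed decision dict, e.g.
        # {"tool": "browser", "args": {...}} or {"done": "final answer"}.
        decision = llm("\n".join(history))
        if "done" in decision:
            return decision["done"]
        observation = tools[decision["tool"]](**decision["args"])
        history.append(f"Called {decision['tool']} -> observed: {observation}")
    return "Stopped after max_steps without reaching the goal."
```
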
Read More

2025-07-25
OpenRouter, Anthropic, Volcano Engine, Siliconflow Usage Guide

In AI application development, choosing the right LLM API service is crucial. Whether you are building an intelligent dialogue system, developing an AI Agent, or participating in an AI Hackathon, this article will provide you with a comprehensive API usage guide, covering mainstream services such as OpenRouter, Anthropic API, Volcano Engine, and Siliconflow.

Why Do You Need Multiple API Services?

Different LLMs have their own strengths, and when developing AI Agents in particular, you need to choose the right model for each scenario:

  • Claude (Anthropic): Excels in complex reasoning, programming, and Agent tasks, particularly suitable for scenarios requiring deep thinking
  • Gemini (Google): Performs well in long text processing and multimodal understanding, suitable for handling multimedia content such as images and videos
  • GPT (OpenAI): Strong in image understanding and mathematical reasoning, excellent for everyday conversation experiences
  • Doubao (ByteDance): Fast access speed in China, good voice dialogue experience, especially suitable for real-time interaction scenarios
  • Open Source Models: Low cost, highly customizable, suitable for large-scale deployment
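
In practice, many of these services expose OpenAI-compatible endpoints, so switching providers is often just a matter of changing the base URL, API key, and model identifier. A minimal sketch (assuming an OpenAI-compatible endpoint such as OpenRouter's; the model id and environment variable are illustrative):

```python
# Minimal sketch: calling an OpenAI-compatible endpoint (e.g. OpenRouter).
# The base URL, model name, and env var below are illustrative assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",       # swap per provider
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",             # provider-specific model id
    messages=[{"role": "user", "content": "Why do agents need multiple LLM providers?"}],
)
print(resp.choices[0].message.content)
```
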
Read More

2025-07-21
AI, Our Free "Dopamine Engine": From "Freedom From" to "Freedom To"

(This article is automatically generated based on my one-hour voice chat with Gemini 2.5 Pro)

The human pursuit of freedom is a profound dialogue with the biological instincts deep within us. Before we embark on this dialogue, we must first understand the two core aspects of “freedom,” as articulated by philosopher Isaiah Berlin:

  • The first is “freedom from”, which is negative freedom. It aims to rid us of external constraints, coercion, and interference. This is about delineating a sacred, inviolable “space” in our lives, with its ultimate form being financial freedom—where you are free from the compulsion to sell your labor for a living.
  • The second is “freedom to”, which is positive freedom. It seeks to make us masters of our own will, possessing enough ability and resources to realize our self-worth. This endows us with the “power” to act, with its ultimate form being creative freedom—where you can turn imagination into reality.

Understanding this pair of concepts allows us to uncover a deeper secret, one that Richard Sutton, the father of reinforcement learning and a 2024 Turing Award laureate, lays out in his classic textbook "Reinforcement Learning: An Introduction": what drives our happiness is not the static "reward" itself, but the dynamic "reward prediction error." What truly makes our brains secrete dopamine and feel joy is the positive gap between "actual gain" and "prior expectation."
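
In Sutton's formalism, this reward prediction error is the temporal-difference error (a standard textbook form, added here for reference, not a quote from the article):

$$\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$$

Joy corresponds to outcomes that beat expectations ($\delta_t > 0$); a perfectly predictable life drives $\delta_t$ toward zero.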

A completely predictable, surprise-free world, no matter how affluent, has a reward prediction error approaching zero. This biologically explains why pure “Freedom From”—a comfortable, worry-free but unchanging haven—can ultimately lead to emptiness. In contrast, “Freedom To,” filled with challenges, exploration, and creation, is a powerful engine that continuously generates positive prediction errors.

Today, the rise of AI is handing the keys to this engine to each of us in unprecedented ways.

Read More

2025-07-18
Setting Up an IKEv2 Tunnel Without Installing a Client to Bypass Cursor's Regional Restrictions

(Thanks to Koutian Wu for thoroughly debugging and deploying, and for pointing out several technical issues in the original article, which have been corrected in this version)

As access to tools like Cursor and Claude Code becomes restricted in China, traditional HTTP/SOCKS proxies can no longer meet daily needs. These tools not only impose regional restrictions on the server side but may also employ multi-layered techniques to detect the user’s true geographical location (currently only partially implemented, but may be upgraded in the future):

  1. Basic IP Database Matching: Traditional GeoIP database queries
  2. Timezone Consistency Check: Obtaining the client’s timezone via JavaScript and cross-verifying with the IP’s geographical location
  3. DNS Resolution Check: Using Geo DNS resolution results to check the real location
  4. WebRTC IP Leak Detection: Obtaining the user’s real IP address via WebRTC
  5. Cloudflare Source Address Retrieval: Obtaining the real source address through Cloudflare's HTTP headers

Most current HTTP/SOCKS proxies can only handle the basic detection methods, and are often powerless against more complex multi-dimensional detection. A Layer 3 tunnel, operating at the network layer, can hide the user's real network environment much more thoroughly.

Besides bypassing geographical restrictions, a Layer 3 tunnel is also suitable for the following scenarios:

  1. Server Access Control: Avoid exposing the SSH access port of company servers on the public internet
  2. Development and Testing Environment: Avoid exposing the company’s test servers, internal APIs, etc., on the public internet
  3. Secure Network Environment: Ensure communication security in untrusted public WiFi environments

While solutions like WireGuard and OpenVPN are stable and efficient, they require installing dedicated clients, which can be cumbersome in multi-device usage scenarios.

IKEv2, as a modern VPN standard, not only offers excellent performance and stability but, more importantly, is natively integrated into mainstream operating systems like macOS, Windows, iOS, and Android, eliminating the need to install any third-party clients.

This article builds on the architecture idea from "Skillfully Using Hong Kong as a Relay to Build a Smooth and Stable China-US Three-Layer Tunnel" to construct a three-hop China -> Hong Kong -> USA IKEv2 tunnel.

Read More