[This article is based on a presentation at the Turing Community’s Large Model Technology Learning Camp, Slides Link]

An in-depth look at the design philosophy and practical strategies behind AI Agents: from the conversational mode of chatbots to the action mode of Agents, and how to systematically design and manage an Agent's information environment in order to build efficient, reliable AI Agent systems.

Table of Contents

  1. Part 1: Paradigm Shift - From Chatbot to Agent
  2. Part 2: Core Analysis of Agents
  3. Part 3: Context Engineering
  4. Part 4: Memory and Knowledge Systems

Part 1: Paradigm Shift - From Chatbot to Agent

From Chatbot to Agent: A Fundamental Paradigm Shift

We are experiencing a fundamental shift in AI interaction modes:

Chatbot Era

  • 🗣️ Conversational Interaction: User asks → AI answers → Repetitive Q&A cycle
  • 📚 Knowledgeable Advisor: Can only “speak” but not “act,” passively responding to user needs
  • 🛠️ Typical Products: ChatGPT, Claude Chat

Agent Era

  • 🎯 Autonomous Action Mode: User sets goals → Agent executes → Autonomous planning and decision-making
  • 💪 Capable Assistant: Can both “think” and “act,” proactively discovering and solving problems
  • 🚀 Typical Products: Claude Code, Cursor, Manus

Core Techniques of Chatbots: Prompt Engineering

Prompt Engineering focuses on optimizing single conversations with LLMs:

  1. 🎭 Role Setting: Set a professional background for AI

    You are a seasoned architect...

  2. 📝 Clear Instructions: Specify output requirements

    Summarize in 200 words...

  3. 📖 Few-Shot Learning: Provide example guidance

    Example 1: Input → Output
    Example 2: Input → Output

💡 Essence: These techniques optimize the quality of conversation, not the effectiveness of actions

Source: OpenAI, Prompt Engineering Guide

Fundamental Limitations of Chatbots

  • ❌ Cannot Act

    • Cannot execute code
    • Cannot access web pages
    • Cannot manipulate files
  • 🔄 Stateless

    • Each conversation is relatively independent
    • Difficult to handle long-term tasks
    • Lacks contextual continuity
  • 🧠 No Memory

    • Cannot accumulate experience
    • Always “starts from scratch”
    • Cannot learn and improve
  • ⏳ Passive Waiting

    • Can only respond to user input
    • Cannot proactively discover problems
    • Lacks autonomy

Part 2: Core Analysis of Agents

What is an AI Agent?

Definition: An AI system capable of perceiving the environment, making autonomous decisions, taking actions, and accumulating experience.

Core Capabilities:

  • 🛠️ Tool Use: Interact with the external world through APIs, command lines, etc.
  • 📋 Task Planning: Break down complex goals into executable steps
  • 🔧 Error Recovery: Adjust strategies when encountering problems
  • 📚 Experience Learning: Accumulate knowledge from successes and failures

Source: Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models”

Core Loop of Agents: ReAct Framework

User goal: "Help me analyze the sales trends in this CSV file"

Agent loop:
1. Observe: The user wants to analyze a CSV file
2. Think: I need to read the file first, then analyze the data, and finally generate a visualization
3. Act: Call read_file('sales.csv')
4. Observe: Got the CSV data, containing months and sales figures
5. Think: I need to write a Python script to analyze the data
6. Act: Call write_file('analyze.py', 'import pandas as pd...')
7. Observe: The script file has been created
8. Think: Now run the script to analyze the data
9. Act: Call execute_command('python analyze.py')
10. Observe: Output shows Q3 sales grew 15%; the chart was saved as trend.png
11. Think: Task complete; I need to summarize the findings and present the chart
12. Act: Return the analysis result: "Q3 sales grew 15%, see trend.png"

Observe → Think → Act → Observe - This loop allows Agents to continuously adapt and solve problems.

Let’s understand how an Agent invokes tools through a specific example:

// User request
{
  "role": "user",
  "content": "Help me search for the latest release information about the OpenAI o3 model"
}

// Agent decision: say something to the user and call a tool
{
  "role": "assistant",
  "content": "I'll search for the latest information about the OpenAI o3 model.",
  "tool_calls": [{
    "name": "web_search",
    "id": "toolu_01ABC123def456",
    "arguments": {
      "search_term": "OpenAI o3 model release announcement 2024",
      "explanation": "Search for the latest release information about the OpenAI o3 model"
    }
  }]
}

// Tool response
{
  "tool_use_id": "toolu_01ABC123def456",
  "content": {
    "results": [
      {
        "title": "OpenAI Announces o3 Model",
        "url": "https://openai.com/blog/o3-model",
        "snippet": "OpenAI today announced o3, its latest reasoning model..."
      },
      {
        "title": "o3 Achieves Breakthrough Performance",
        "url": "https://techcrunch.com/2024/openai-o3",
        "snippet": "The o3 model demonstrates significant improvements in mathematical reasoning..."
      }
    ],
    "search_query": "OpenAI o3 model release announcement 2024"
  }
}

// The Agent continues, synthesizes the information, and outputs the result to the user
{
  "role": "assistant",
  "content": "Based on the search results, here is the latest information on the OpenAI o3 model:\n\n1. **Release date**: December 2024...\n2. **Key features**: breakthrough progress in mathematical reasoning...\n3. **Performance gains**: compared to the o1 model..."
}

Core Implementation of the Agent Loop

Based on an in-depth analysis of Claude Code, the main loop of a production-grade Agent includes the following key stages:

            Agent Loop execution flow

User input
    ↓
Message preprocessing ────────┬─ Token usage estimation
                              ├─ Compression threshold check
                              └─ Context validation
    ↓
System prompt generation ───── Dynamically generate a prompt adapted to the current task
    ↓
Conversation stream handling ─┬─ Model configuration and selection
                              ├─ Streaming response management
                              └─ Interrupt signal handling
    ↓
Tool call detection ──────────┬─ No tool calls → return a text response
                              └─ Tool calls present → enter the tool execution flow
    ↓
Tool execution engine ────────┬─ Tool discovery and validation
                              ├─ Permission checks
                              ├─ Concurrency control
                              └─ Result handling
    ↓
Loop decision ────────────────┬─ Continue → back to message preprocessing
                              └─ Stop → output the final result
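
The flow above can be condensed into a short loop. Below is a minimal Python sketch, where estimate_tokens, compress_history, build_system_prompt, llm_call, and execute_tool are hypothetical helpers standing in for Claude Code's internal machinery, not its actual implementation:

# A minimal sketch of the loop; all helpers here are hypothetical stand-ins.
def agent_loop(user_input, tools, context_window=200_000):
    messages = [{"role": "user", "content": user_input}]
    while True:
        # Message preprocessing: estimate tokens, compress near the threshold
        if estimate_tokens(messages) > 0.9 * context_window:
            messages = compress_history(messages)

        # System prompt generation: adapted to the current task
        system_prompt = build_system_prompt(messages, tools)

        # Conversation stream: call the model
        response = llm_call(system=system_prompt, messages=messages, tools=tools)
        messages.append({"role": "assistant",
                         "content": response.content,
                         "tool_calls": response.tool_calls})

        # Tool call detection: no tool calls means we have the final answer
        if not response.tool_calls:
            return response.content

        # Tool execution engine: run each call and feed results back
        for call in response.tool_calls:
            result = execute_tool(call, tools)
            messages.append({"role": "tool",
                             "tool_use_id": call.id,
                             "content": result})
        # Loop decision: fall through and return to message preprocessing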

Six-Stage Pipeline for Tool Execution

A production-grade Agent must have a strict tool execution process to ensure safety and reliability:

Stage 1: Tool discovery and validation
├─ Tool name resolution
├─ Tool registry lookup
└─ Availability check

Stage 2: Input validation
├─ Schema validation (e.g., Zod)
├─ Parameter type checks
└─ Required-parameter validation

Stage 3: Permission check
├─ Allow: execute directly
├─ Deny: refuse execution
└─ Ask: request user confirmation

Stage 4: Cancellation check
├─ AbortController signal
├─ User interrupt handling
└─ Timeout control

Stage 5: Tool execution
├─ Asynchronous execution
├─ Streaming result output
└─ Error capture

Stage 6: Result formatting
├─ Result normalization
├─ State cleanup
└─ Event logging
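
A simplified Python sketch of these six stages follows; the in-memory tool registry and the check_permission and ask_user helpers are illustrative assumptions rather than Claude Code's real interfaces:

import asyncio

TOOL_REGISTRY = {}  # name -> {"handler": async callable, "validate": callable}

async def run_tool(name, args, abort_event: asyncio.Event, timeout: float = 30.0):
    # Stage 1: tool discovery and validation
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        return {"error": f"unknown tool: {name}"}

    # Stage 2: input validation against the tool's schema
    problems = tool["validate"](args)
    if problems:
        return {"error": f"invalid input: {problems}"}

    # Stage 3: permission check (Allow / Deny / Ask)
    decision = check_permission(name, args)        # hypothetical policy hook
    if decision == "deny":
        return {"error": "permission denied"}
    if decision == "ask" and not await ask_user(name, args):  # hypothetical prompt
        return {"error": "user declined the call"}

    # Stage 4: cancellation check before doing any work
    if abort_event.is_set():
        return {"error": "cancelled"}

    # Stage 5: execution with timeout and error capture
    try:
        output = await asyncio.wait_for(tool["handler"](**args), timeout)
    except Exception as exc:
        return {"error": f"{type(exc).__name__}: {exc}"}

    # Stage 6: result formatting into a uniform envelope
    return {"ok": True, "content": output}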

Concurrent Tool Management: The Key to Smart Scheduling

Core Issue: How to safely and efficiently execute multiple tools concurrently?

// Concurrency-safety classification of tools
const toolConcurrencySafety = {
  // Tools that are safe to run concurrently (read operations)
  concurrent: ["read_file", "list_dir", "search", "web_fetch"],

  // Tools that must run serially (write operations)
  sequential: ["write_file", "edit_file", "run_command", "delete_file"]
};

Key Design Principles:

  • ✅ Read operations can be safely concurrent
  • ⚠️ Write operations must be executed serially
  • ⚡ Dynamic scheduling to optimize performance
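
As a sketch of this scheduling rule, the snippet below fans read-only calls out with asyncio and runs write calls strictly in order; run_tool is a stand-in executor (here simplified to take a name and arguments):

import asyncio

CONCURRENT_SAFE = {"read_file", "list_dir", "search", "web_fetch"}

async def schedule_tool_calls(calls, run_tool):
    results = {}
    reads  = [c for c in calls if c["name"] in CONCURRENT_SAFE]
    writes = [c for c in calls if c["name"] not in CONCURRENT_SAFE]

    # Read operations: safe to fan out in parallel
    read_outputs = await asyncio.gather(
        *(run_tool(c["name"], c["args"]) for c in reads)
    )
    results.update({c["id"]: out for c, out in zip(reads, read_outputs)})

    # Write operations: strictly serial, in request order
    for c in writes:
        results[c["id"]] = await run_tool(c["name"], c["args"])
    return results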

The Future of Tool Systems: From “Using Tools” to “Creating Tools”

Current Status: The toolset of Agents is predefined by developers and is static.

Future: Agents can autonomously evolve their toolsets.

Technical Path:

  • MCP (Model Context Protocol): A standardized context protocol to simplify tool development
  • Automatic Tool Generation: When Agents encounter new problems, they can:
    1. Search Open Source Code: Look for relevant implementations on platforms like GitHub
    2. Automatic Wrapping: Wrap the found code into MCP tools
    3. Test and Verify: Test new tools in a sandbox environment
    4. Persist and Reuse: Add effective tools to the tool library for future use

Significance: This will transform Agents from passive tool users to proactive tool creators, greatly expanding their capability boundaries.

MCP (Model Context Protocol) - Standardization of Tool Development

What is MCP?

Model Context Protocol is an open protocol launched by Anthropic to standardize the connection between LLMs and external data sources and tools.

Core Features:

  • Standardized Interface - Unified tool definition format
  • Language Agnostic - Supports Python, JS, Go, etc.
  • Plug and Play - Simple server/client model
  • Secure Isolation - Built-in permissions and sandbox mechanisms

MCP Tool Example:

// MCP server definition
const server = new MCPServer({
  name: "weather-service",
  version: "1.0.0"
});

// Register a tool
server.tool({
  name: "get_weather",
  description: "Get the weather for a given city",
  parameters: {
    location: { type: "string" }
  },
  handler: async ({ location }) => {
    // The actual weather API call
    return await fetchWeather(location);
  }
});

Source: Model Context Protocol Documentation

Mainstream Model Tool Invocation

Revolutionary Design of Claude Sonnet 4

Key Innovation: Special Token Instead of XML

Claude uses special token sequences to mark tool invocations, which offers several advantages:

  • No Escaping Needed: Code, JSON, special characters can be directly passed
  • What Problem It Solves:
    # Traditional XML/JSON approaches require escaping
    {"code": "print(\"Hello, World!\")"}  # quotes must be escaped

    # Claude's special-token approach
    <tool_use>print("Hello, World!")</tool_use>  # passed through directly
  • Why It’s Important: Avoids numerous escaping errors when handling code generation and complex data structures

Source: Anthropic Docs, Tool Use

Gemini 2.5 Pro: Native Multimodal Integration

Core Advantages:

  • Native Google Search: Can use Google Search as a built-in tool directly. The model can autonomously determine when real-time information is needed and automatically perform searches, returning verifiable results with source references.
  • Multimodal Input: Can directly take images, short videos, audio, and even large PDF documents as input, with the model natively understanding these contents
  • Long Document Processing: Can directly process documents up to 2M tokens, including PDF, Word formats

Unique Capability Example:

from google import genai
from google.genai import types

client = genai.Client()
with open("report_2024.pdf", "rb") as f:
    pdf_data = f.read()

prompt = "Please analyze the revenue trends in this annual report"
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        types.Part.from_bytes(
            data=pdf_data,
            mime_type='application/pdf',
        ),
        prompt
    ]
)
# Gemini natively understands the PDF's:
# - text, charts, and tables
# - layout and formatting
# - no text extraction needed!

Source: Google AI Docs, Document Processing

Revolutionary Capability of Claude: Computer Use

Definition: This is not a simple tool but grants the Agent the ability to directly operate a complete, isolated Linux virtual computer environment.

Core Capabilities:

  • File Operations: read_file('data.csv'), write_file('analysis.py', 'import pandas as pd...')
  • Command Execution: run_command('python analysis.py'), run_command('ls -l | grep .csv')
  • GUI Operations: Screenshot, click, input text, operate applications
  • Complete Environment: Possesses file system, network, process management, and other complete operating system capabilities

Workflow:

  1. Provide Tools and Prompts

    • Add computer use tools to API requests
    • User prompt: “Save a picture of a cat to the desktop”
  2. Claude Decides to Use Tools

    • Evaluate the need for desktop interaction
    • Construct tool invocation request
  3. Execute and Return Results

    • Extract tool input (screenshot/click/type)
    • Perform operations in a virtual machine, return execution results
  4. Loop Until Completion

    • Claude analyzes results
    • Continue tool invocation or complete the task

Supported Operations:

  • 📸 Screenshot capture
  • 🖱️ Mouse control (click, drag)
  • ⌨️ Keyboard input
  • 🖥️ Command execution

Computing Environment Requirements:

  • Sandbox Isolation: Dedicated virtual machine or container, secure boundary protection
  • Virtual Display: Recommended resolutions 1024x768, 1280x720, or 1920x1080 (must strictly adhere, otherwise mouse click coordinates may be inaccurate)
  • Operation Processor: Screenshot capture, mouse action simulation, keyboard input processing

Agent Loop Implementation Example:

while True:
    # Claude requests tool use
    response = client.messages.create(...)

    # Extract and execute the actions
    for action in response.tool_calls:
        result = execute_action(action)

    # Return the results to Claude
    if task_complete:
        break

Security Considerations:

  • Manual confirmation for sensitive operations
  • Operation log audit
  • Prompt injection protection

Revolutionary Significance:

  • Elevates the Agent from an “API caller” to a “programmer” and “data analyst”
  • Capable of completing end-to-end complex tasks previously unimaginable
  • Truly realizes the vision of “giving AI a computer to solve problems on its own”

Source: Anthropic Docs, Computer Use

Evolution Direction of Agent: Independent Virtual Operating System

Current Limitations: Current Agents are mostly confined to a browser tab or a temporary Shell instance, which is “single-threaded” and unable to handle complex tasks requiring persistent state and multi-application collaboration.

Evolution Direction: Each Agent should have its own dedicated, isolated, persistent virtual operating system environment.

In this environment, it can:

  • Handle Multiple Tasks in Parallel: Like humans using a computer, open and operate multiple application windows simultaneously (Terminal, Browser, IDE, File Manager)
  • Freely Switch Contexts: Seamlessly switch between different task windows, managing each task’s independent state and history
  • Have a Persistent File System: Save work results, manage project files, install new software
  • Network Communication Capability: Interact with other Agents or services

This is the inevitable direction for Agents to evolve towards more powerful capabilities.


Model as Agent: Achieving Capability Leap through RL

Based on Kimi-Researcher’s practice, the real breakthrough in Agent capabilities comes from end-to-end reinforcement learning training:

Limitations of Traditional Methods:

  • Workflow Systems: Rely on manually designed multi-agent collaboration, difficult to adapt to environmental changes
  • Imitation Learning (SFT): Limited by human demonstration data, difficult to handle long-term tasks

Advantages of End-to-End RL:

  • Overall Optimization: Planning, perception, tool usage, and other capabilities are learned together
  • Adaptability: Naturally adapts to changes in tools and environment
  • Exploratory Learning: Discover optimal strategies through extensive exploration

Practical Results: Kimi-Researcher improved accuracy from 8.6% to 26.9% through RL training, averaging 23 inference steps, exploring over 200 URLs.

Source: Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities

Core Principles of Tool Design

💡 Core Concept: Let the model draw on its pre-trained knowledge, reducing how much it has to learn from the prompt alone

1️⃣ Orthogonality

  • Each tool solves an independent problem, avoiding functional overlap
  • ✅ Good examples:
    • search_web / search_knowledge_base - clearly distinguish search scopes
    • text_browser / visual_browser - clearly distinguish processing methods
  • ❌ Bad examples:
    • search_google / search_milvus - expose the underlying implementation
    • browse_url / browse_page - unclear functional overlap

2️⃣ Descriptive Naming

  • Names directly reflect functionality, leveraging the model’s language understanding ability
  • ✅ Good examples:
    • web_search_agent / phone_call_agent - clearly indicate functionality
    • execute_code / read_file - verb + noun structure
  • ❌ Bad examples:
    • daisy / ivy - abstract code names that reveal nothing about functionality
    • sw / exec - overly abbreviated
  • Tool descriptions need to be clear and unambiguous
  • Describe tool limitations (e.g., phone can only call domestic numbers)

3️⃣ Familiar Patterns

  • Parameters align with developer intuition
  • File operations resemble Unix commands
  • Search resembles Google API

4️⃣ Allow Agent to Handle Errors Autonomously

  • Return detailed error information when tool invocation fails, allowing the Agent to decide the next step
  • Utilize SOTA LLM’s programming capabilities
  • ❌ Bad example: Phone Call Failed
  • ✅ Good example: Phone Call Failed: failed to initiate call: phone number format error

These principles enable the model to leverage knowledge learned during pre-training, reducing the cost of learning new tools.
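
As a small illustration of these principles, the following hypothetical tool definition uses a descriptive name, states its limitation in the description, and returns detailed errors; the schema format follows common function-calling conventions and the names are made up:

phone_call_tool = {
    "name": "phone_call_agent",          # descriptive name, no internal code words
    "description": (
        "Place a phone call and hold a spoken conversation on the user's behalf. "
        "Limitation: can only call domestic numbers."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "phone_number": {"type": "string", "description": "The number to dial"},
            "goal": {"type": "string", "description": "What the call should achieve"},
        },
        "required": ["phone_number", "goal"],
    },
}

def call_failed(reason: str) -> dict:
    # A detailed, actionable error instead of a bare "Phone Call Failed"
    return {"ok": False, "error": f"Phone Call Failed: failed to initiate call: {reason}"}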

Multi-Agent Architecture: Breaking Through the Limitations of a Single Agent

Based on Anthropic Research Feature’s practice:

Why Multi-Agent?

Limitations of a Single Agent:

  • 💰 High context window cost
  • ⏱️ Inefficient sequential execution
  • 📉 Decline in long context quality (“lost in the middle” problem)
  • 🧠 Unable to explore in parallel

Advantages of Multi-Agent:

  • ✅ Parallel processing of multiple subtasks
  • ✅ Independent context windows
  • ✅ Overall performance improvement
  • ✅ Suitable for open-ended research tasks

Architecture Diagram:

User query
    ↓
Lead Agent (coordinator)
    ↓
  ├─→ Analyze the task
  ├─→ Create SubAgents
    ↓
  ├─→ SubAgent 1 (independent search)
  ├─→ SubAgent 2 (independent search)
  └─→ SubAgent 3 (independent search)
    ↓
Result aggregation → Final report

Key Design:

  1. Compression is Essential

    • Each SubAgent explores in an independent context
    • Extracts the most important tokens to return
    • Lead Agent synthesizes multi-source information
  2. Mind Token Usage, Don’t Blow Up the Credit Card

    • Single chat: 1×
    • Single Agent: 4×
    • Multi-Agent: 15×
    • Value must match cost
  3. Applicable Scenarios

    • ✅ Information gathering tasks
    • ✅ Multi-angle analysis needs
    • ✅ Tasks easily parallelized

Source: Anthropic, How we built our multi-agent research system

SubAgent Architecture: Key Strategies for Controlling Context Length

As task complexity increases, the main Agent’s context rapidly expands, leading to:

  • Increased Cost: Each token incurs a fee
  • Increased Latency: Longer context means slower response
  • Quality Decline: Overly long context leads to “lost in the middle” problem

SubAgent provides an elegant solution:

Main Agent
    │ delegates the complex task to a SubAgent
    ↓
SubAgent (independent context)
  ├─ Contains only task-relevant information
  ├─ Does not inherit the main Agent's history
  ├─ Returns only a result summary when done
  └─ The main Agent's context stays lean

Core Value of SubAgent:

  1. Context Isolation: Each SubAgent has its own clean context
  2. Cost Control: Avoid redundant transmission of irrelevant historical information
  3. Parallel Capability: Multiple SubAgents can handle different tasks simultaneously
  4. Result Aggregation: Only return key results to the main Agent

Typical Application Scenarios:

  • Analyzing code from different modules
  • Parallel processing of multiple independent tasks
  • Performing specialized tasks requiring extensive context
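
A minimal sketch of such a delegation, assuming a hypothetical llm_call helper; only the SubAgent's summary flows back into the main Agent's context:

def run_subagent(task: str, relevant_files: list[str], llm_call) -> str:
    # The SubAgent starts from a clean context: only the task and relevant
    # material, none of the main Agent's history.
    briefing = (
        f"Task: {task}\n\n"
        "Relevant files:\n" + "\n".join(relevant_files) + "\n\n"
        "Return a concise summary of your findings (at most 300 words)."
    )
    response = llm_call(messages=[{"role": "user", "content": briefing}])
    return response.content   # only this summary enters the main Agent's context

# In the main Agent, several of these can be fanned out, keeping only
# the summaries:
#   summaries = [run_subagent(t, files, llm_call) for t in subtasks]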

In-Depth Agent Dialogue Flow: Making Thought Processes Explicit

user: "Check today's weather in San Francisco for me, then recommend what I should wear based on the temperature."

A:
  thinking: "The user has two requests: 1. check the weather; 2. recommend clothing based on it.
  I need to call the weather API first, then give advice based on the result."
  text: "Sure, let me look up the weather in San Francisco for you."
  tool: get_weather(location="San Francisco")

tool:
  content: {"temperature": "15°C", "condition": "Sunny", "wind": "15km/h"}

A:
  thinking: "15°C plus a 15 km/h wind will feel colder. Windproof, warm layers are needed."
  text: "San Francisco is sunny today, 15°C and windy. I recommend long sleeves plus a windbreaker."

Key Implementation Details:

  1. Explicit Thought Process: Make the Agent’s reasoning process visible and debuggable
  2. Tool Invocation Timing: Invoke only when external information is clearly needed
  3. Result Interpretation: Not just displaying data, but providing explanations and suggestions

Part 3: Context Engineering

Core Concepts of Context Engineering

“The context is the agent’s operating system.” - Manus

Definition: Systematically designing and managing the Agent’s information environment to enable efficient and reliable task completion.

Why Important:

  • Context determines what the Agent can “see” 👁️
  • Context determines what the Agent can “remember” 🧠
  • Context determines what the Agent can “do” 💪

Sources of Context Information

1. Documented Team Knowledge

  • Technical documentation, API documentation, design documents
  • Meeting notes, decision records, project plans
  • Code comments, commit messages, PR descriptions

2. Structured Project Information

  • Data from task management systems
  • Structure and history of code repositories
  • Test cases and test results

3. Dynamic Execution Environment

  • Current system status
  • Real-time logs and monitoring data
  • Instant user feedback

Strategy One: Architecture Design Around KV Cache

Core Issue: The input-output ratio of the Agent is extremely unbalanced (100:1)

Solution: Maximize KV cache hit rate

Best Practices:

  1. Stable Prompt Prefix: Keep system prompts unchanged

     System: You are a helpful assistant...

  2. Append-only Context: Only append, do not modify

     messages.push(newMessage)
     // Do not modify existing messages

  3. Deterministic Serialization: Fixed order of JSON keys

     {
       "name": "...",
       "age": "...",
       "city": "..."
     }

  4. Cache-friendly Tool Output: Uniform format, reduce changes
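
Putting these practices together, here is a small, self-contained Python sketch of cache-friendly context construction: a stable system prefix, an append-only message list, and deterministic JSON serialization for tool output:

import json

SYSTEM_PROMPT = "You are a helpful assistant..."   # never edited between turns

messages: list[dict] = []

def append_message(role: str, content: str) -> None:
    # Append only; existing entries are never rewritten, so the prefix the
    # model sees is byte-identical across turns and stays cache-hot.
    messages.append({"role": role, "content": content})

def serialize_tool_result(result: dict) -> str:
    # sort_keys fixes the key order; fixed separators avoid whitespace drift
    return json.dumps(result, sort_keys=True, ensure_ascii=False,
                      separators=(",", ":"))

append_message("user", "Analyze sales.csv")
append_message("tool", serialize_tool_result({"rows": 120, "columns": ["month", "sales"]}))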

Strategy Two: Context Compression - A Last Resort

Core Principle: Avoid compression unless necessary due to cost and latency constraints.

Compression is a trade-off:

  • Benefits: Reduces token usage, lowers costs
  • Costs: Loss of information detail, may affect task quality

Based on Claude Code’s practice, compression is only triggered when approaching context limits:

8-Section Structured Compression Template:

1. Background Context
   - Project type and tech stack
   - Current working directory and environment
   - The user's overall goal

2. Key Decisions
   - Important technical choices and why they were made
   - Architecture decisions and design considerations

3. Tool Usage Log
   - Main types of tools used
   - File operation history

4. User Intent Evolution
   - How requirements changed over time
   - Priority adjustments

5. Execution Results
   - Successfully completed tasks
   - Generated code and files

6. Errors and Solutions
   - Types of problems encountered
   - How they were resolved

7. Open Issues
   - Problems still pending
   - Known limitations

8. Future Plans
   - Next action steps
   - Features the user expects

Timing for Compression:

  1. Prefer using SubAgent to control context length
  2. Next, consider cleaning irrelevant historical information
  3. Finally, use compression algorithms
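
A sketch of this ordering as a compression trigger, assuming hypothetical count_tokens and summarize helpers in which the latter renders the 8-section template above:

COMPRESSION_THRESHOLD = 0.92   # fraction of the context window

def maybe_compress(messages, context_window, count_tokens, summarize):
    used = count_tokens(messages)
    if used < COMPRESSION_THRESHOLD * context_window:
        return messages                      # prefer not to compress at all
    digest = summarize(messages, template="8-section")   # structured summary
    # Keep the digest plus the most recent turns verbatim
    return [{"role": "system", "content": digest}] + messages[-6:]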

Strategy Three: TODO List - Intelligent Task Management System

## Current task status (updated in real time)
- [x] Project initialization
- [x] Dependency installation
- [ ] Implement user authentication ← currently in progress
- [ ] Design the database schema
- [x] Create API endpoints
- [ ] Front-end integration
- [ ] Deploy to production

Core Value of the TODO System:

  1. Combat Forgetfulness: Maintain clear goals in long conversations
  2. Progress Visualization: Both user and Agent can see progress
  3. Priority Management: Automatic sorting (in progress > pending > completed)
  4. Subtask Decomposition: Automatically break down complex tasks into executable steps

Intelligent Sorting Algorithm:

// Status priority: in_progress(0) > pending(1) > completed(2)
// Importance priority: high(0) > medium(1) > low(2)
// Sort by creation time within the same priority
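
The same ordering rule, written out as a runnable Python sketch (the field names are illustrative):

STATUS_RANK   = {"in_progress": 0, "pending": 1, "completed": 2}
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}

def sort_todos(todos: list[dict]) -> list[dict]:
    # Status first, then importance, then creation time (oldest first)
    return sorted(todos, key=lambda t: (STATUS_RANK[t["status"]],
                                        PRIORITY_RANK[t["priority"]],
                                        t["created_at"]))

todos = [
    {"task": "Front-end integration", "status": "pending", "priority": "medium", "created_at": 3},
    {"task": "Implement user authentication", "status": "in_progress", "priority": "high", "created_at": 2},
    {"task": "Project initialization", "status": "completed", "priority": "high", "created_at": 1},
]
print([t["task"] for t in sort_todos(todos)])
# ['Implement user authentication', 'Front-end integration', 'Project initialization']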

Strategy Four: Errors as Learning Opportunities

Traditional Approach vs Correct Approach:

❌ Traditional Approach: Hide errors, retry

Action: npm install
// Error hidden
Action: npm install (retry)

✅ Correct Approach: Retain complete error context

Action: npm install
Error: npm ERR! peer dep missing: react@^18.0.0

Thinking: Need to install the correct version of React first
Action: npm install react@^18.0.0
Success: Dependencies installed

Principle: Error information helps the Agent update its world model, learning environmental constraints and dependencies.
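
A tiny sketch of this principle: whatever a command returns, the full output is appended to the conversation rather than being swallowed by a silent retry (run_command and the message format here are placeholders):

def record_action(messages: list[dict], command: str, run_command):
    result = run_command(command)
    # Success or failure, the complete output goes into the context so the
    # model can reason about the constraint it just hit.
    messages.append({"role": "tool",
                     "content": f"$ {command}\n{result.output}"})
    return result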


Strategy Five: Six-layer Security Protection System

Enterprise-level practice based on Claude Code:

Layer 1: Input validation
├─ Schema validation (e.g., Zod)
├─ Enforced parameter type checks
└─ Boundary constraint validation

Layer 2: Permission control
├─ Allow: execute directly
├─ Deny: refuse execution
└─ Ask: user confirmation

Layer 3: Sandbox isolation
├─ Command execution sandbox
├─ File system restrictions
└─ Network access control

Layer 4: Execution monitoring
├─ AbortController interrupts
├─ Timeout control
└─ Resource limits

Layer 5: Error recovery
├─ Exception capture
├─ Automatic retry
└─ Graceful degradation

Layer 6: Audit logging
├─ Operation logs
├─ Security events
└─ Compliance reports

Each layer provides security assurance for the Agent’s safe execution, ensuring stable operation even in complex environments.
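
As an illustration of the permission layer alone, here is a minimal rule table with the Allow/Deny/Ask semantics described above; the rules themselves are made up:

PERMISSION_RULES = [
    ("read_file", "allow"),
    ("web_fetch", "allow"),
    ("delete_file", "deny"),
    ("run_command", "ask"),
]

def check_permission(tool_name: str) -> str:
    for name, decision in PERMISSION_RULES:
        if name == tool_name:
            return decision
    return "ask"   # anything unknown falls back to asking the user

assert check_permission("delete_file") == "deny"
assert check_permission("unknown_tool") == "ask"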


Strategy Six: Dynamic Reminder Injection (System Reminder)

Automatically inject context reminders at critical moments:

<system-reminder>
Detected that you are modifying authentication-related code.
Please note:
1. Passwords must be stored encrypted
2. Use bcrypt or argon2
3. Implement rate limiting to prevent brute-force attacks
</system-reminder>

Trigger Conditions:

  • Changes in TODO list status
  • Detection of specific operation patterns
  • Repeated occurrence of error patterns
  • Security-related operations

This dynamic injection mechanism provides the Agent with additional guidance at critical moments, avoiding common mistakes.
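
A minimal sketch of such an injection step, assuming simple keyword triggers; a real system would also key off TODO state changes and recurring error patterns:

REMINDER_RULES = [
    (("auth", "password", "login"),
     "You are modifying authentication-related code. Store passwords hashed "
     "(bcrypt or argon2) and add rate limiting against brute force."),
    (("drop table", "rm -rf"),
     "Destructive operation detected. Ask the user to confirm before proceeding."),
]

def inject_reminders(messages: list[dict]) -> list[dict]:
    recent = " ".join(str(m.get("content", "")) for m in messages[-3:]).lower()
    extra = []
    for keywords, reminder in REMINDER_RULES:
        if any(k in recent for k in keywords):
            extra.append({"role": "system",
                          "content": f"<system-reminder>{reminder}</system-reminder>"})
    return messages + extra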


Strategy Seven: Parallel Sampling - Balancing Quality and Efficiency

Explore multiple independent inference paths in parallel, then select or synthesize the best result:

User question
    ↓
  ├─→ Agent instance 1: explore path A
  ├─→ Agent instance 2: explore path B
  └─→ Agent instance 3: explore path C
    ↓
Result verification and selection
  ├─ Consistency check
  ├─ Quality scoring
  └─ Best-answer selection

Application Scenarios:

  • Multi-angle analysis of complex problems
  • Critical decisions requiring high reliability
  • Parallel attempts for exploratory tasks

Implementation Points:

  • Use different initial prompts or temperature parameters
  • Set reasonable parallelism (usually 3-5)
  • Establish an effective result evaluation mechanism
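
A compact sketch of parallel sampling with asyncio, assuming an async llm_call and a score function supplied by the caller:

import asyncio

async def parallel_sample(question, llm_call, score, temperatures=(0.3, 0.7, 1.0)):
    # Explore several independent paths with different temperatures
    candidates = await asyncio.gather(
        *(llm_call(question, temperature=t) for t in temperatures)
    )
    # Verification and selection: here simply the highest-scoring candidate
    return max(candidates, key=score)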

Strategy Eight: Sequential Revision - Iterative Optimization

Allow the Agent to reflect and improve on its output:

Initial Response → Self-assessment → Identify Issues → Generate Improvements → Final Output

Revision Prompt Example:

Please review the answer you just gave:
1. Does it fully solve the user's problem?
2. Is any important information missing?
3. Is there anything that could be improved?

Based on this reflection, please provide an improved answer.

Best Practices:

  • Limit revision rounds (usually 2-3 rounds)
  • Provide specific improvement dimensions
  • Retain revision history for learning
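
A bounded revision loop might look like the following sketch, with llm_call again standing in for the model client:

REVISION_PROMPT = (
    "Review the answer you just gave: did it fully solve the user's problem, "
    "is any important information missing, and what could be improved? "
    "Based on this reflection, provide an improved answer."
)

def revise(question: str, llm_call, max_rounds: int = 2):
    answer = llm_call([{"role": "user", "content": question}])
    history = [answer]
    for _ in range(max_rounds):
        answer = llm_call([
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
            {"role": "user", "content": REVISION_PROMPT},
        ])
        history.append(answer)        # keep the revision history for learning
    return answer, history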

Source: Thinking in Phases: How Inference-Time Scaling Improves LLM Performance


Design Philosophy of Context Engineering: Minimal Design, Maximum Capability (Alita)

Core Concepts:

  • Minimal Predefinition:

    • Avoid over-design and presets
    • Maintain system flexibility
    • Allow the Agent to adapt based on task needs
  • Maximal Self-Evolution:

    • Ability to learn and accumulate experience
    • Dynamically adapt to new task types
    • Continuously optimize execution strategies

Practical Insights:

  • Do not attempt to predefine all possible scenarios
  • Give the Agent enough flexibility to explore solutions
  • Simple architectures are often more effective than complex systems
  • “Simplicity is the ultimate sophistication”

Part 4: Memory and Knowledge Systems

Three-layer Memory Architecture: From Short-term to Long-term

Based on Claude Code’s practice, a mature Agent requires a three-layer memory system:

┌─ Short-term memory layer ──────────────────────────────
│ Current session context (messages[])
│ • User Messages
│ • Assistant Messages
│ • Tool Results
│ • System Prompts
│
│ Traits: O(1) lookup, real-time access, automatic token accounting
└─────────────┬──────────────────────────────────────────
              │ Triggered when approaching the context limit
              ↓
┌─ Mid-term memory layer ────────────────────────────────
│ 8-section structured compression
│ • Background context   • Execution results
│ • Key decisions        • Error handling
│ • Tool usage           • Open issues
│ • User intent          • Future plans
│
│ Traits: intelligent compression, context continuity, major token savings
└─────────────┬──────────────────────────────────────────
              │ Persisted to storage
              ↓
┌─ Long-term memory layer ───────────────────────────────
│ Persistent knowledge system (CLAUDE.md / Knowledge Base)
│ • Project context      • Development environment
│ • User preferences     • Security configuration
│ • Code style           • Workflows
│
│ Traits: cross-session recovery, user customization, persistent project memory
└────────────────────────────────────────────────────────

User Memory - The Key to Personalization

Definition: The Agent remembers specific user preferences, history, and context

Stored Content:

  • User Preferences: “I prefer concise code style”, “Avoid using class components”
  • Project Context: “We use React 18 + TypeScript”, “API uses GraphQL”
  • Historical Decisions: “Last time we chose PostgreSQL over MySQL”, “Use pnpm for dependency management”

Implementation Mechanism:

  • Vector database storage
  • Similarity retrieval
  • Dynamic injection into system prompts
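
A sketch of this retrieval-and-injection step, assuming a vector store client with embed and search methods (hypothetical names):

def build_personalized_prompt(base_prompt: str, user_query: str, memory_store, top_k: int = 5) -> str:
    # Retrieve the user's most relevant stored preferences and past decisions
    query_vector = memory_store.embed(user_query)
    memories = memory_store.search(query_vector, top_k=top_k)
    memory_block = "\n".join(f"- {m.text}" for m in memories)
    # Dynamically inject them into the system prompt
    return f"{base_prompt}\n\nKnown user context:\n{memory_block}"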

Source: Survey on Building Agentic RAG Systems


Knowledge Base - The Crystallization of Collective Wisdom

Definition: Reusable knowledge accumulated from interactions between all users and the Agent

Content Types:

  1. Solution Templates: “How to configure Nginx reverse proxy”
  2. Best Practices: “Standard structure for Python projects”
  3. Problem Patterns: “This type of error is usually because…”

Knowledge Lifecycle:

  1. Capture: Agent successfully solves a problem
  2. Refinement: Extract general patterns
  3. Storage: Structurally save
  4. Reuse: Directly apply to similar problems

Source: 01.me, How Agents Learn from Experience


Summary: Core Points of Context Engineering

“The context is the agent’s operating system.”

Core Insights:

  1. Paradigm Shift: From Conversation to Action

    • Chatbot → Agent
    • Prompt Engineering → Context Engineering
  2. Four Major Capabilities of Agents

    • 🛠️ Tool Use
    • 📋 Task Planning
    • 🔧 Error Recovery
    • 📚 Learning from Experience

Eight Practice Strategies:

  1. KV Cache Optimization - Core of architecture design
  2. Intelligent Compression - 8-section structured template
  3. TODO System - Intelligent task management
  4. Error Learning - Retain complete error context
  5. Security Protection - Six-layer protection system
  6. Dynamic Reminders - Injection at critical moments
  7. Parallel Sampling - Balancing quality and efficiency
  8. Sequential Revision - Iterative output optimization

Technical Summary:

  • Intelligent Compression Mechanism: Compress only when necessary, prioritize using SubAgent
  • Concurrent Tool Management: Concurrent read operations, serial write operations
  • Three-layer Memory Architecture: Short-term, mid-term, and long-term memory work together
  • Model as Agent: Achieve capability leap through end-to-end RL training
  • Tool Design Principles: Orthogonality, descriptiveness, close to common usage
  • Multi-Agent Architecture: Overcome the limitations of a single Agent

🚀 The future is agentic. Let’s build it with well-designed context.


Intelligent Injection Mechanism for File Content

User mentions a file → automatic detection by the system
    ↓
Path resolution and validation
  ├─ Security checks
  ├─ Permission verification
  └─ File existence check
    ↓
Smart capacity control
  ├─ Limit the number of files
  ├─ Cap single-file size
  └─ Manage total capacity
    ↓
Formatted content injection
  ├─ Syntax highlighting
  ├─ Line numbers
  └─ Relevance ordering

This intelligent injection mechanism ensures that the Agent can efficiently access relevant file content while avoiding context overload.


Join Pine AI

We are looking for full-stack engineers capable of building SOTA autonomous AI Agents.

Our philosophy: Everyone’s contribution to the company’s valuation should be over ten million dollars

Requirements to Join Pine AI

🤖 1. Proficient in AI Programming

  • 80%+ of the code is completed through human-machine collaboration
  • Code interview: Complete feature development in 2 hours with AI assistance
  • All internal systems are built on AI

💻 2. Passionate about Hands-on Problem Solving

  • “Talk is cheap, show me the code”
  • Become a combination of architect and product manager
  • Directly command AI to reduce information loss

🏗️ 3. Solid Software Engineering Skills

  • Comprehensive documentation and testing
  • Enable AI to understand and maintain code
  • High-quality engineering practices

🧠 4. Understanding of LLM Principles

  • Understand basic principles and capability boundaries
  • Master the correct methods to harness LLM
  • Provide appropriate context and tools

🚀 5. Confidence in Solving World-Class Problems

  • Pursue SOTA levels
  • Grow with the startup
  • Continuously surpass existing levels

🎯 Our Mission

By building Agents that can interact with the world in real-time and learn from experience, we truly solve problems for users and get things done.

Gradually build user trust, ultimately entrusting important tasks to Pine.

Pine AI - Building Agents That Get Things Done

mail -s "Join Pine AI" -A /path/to/your_resume.pdf boj@19pine.ai

Meta Information: How This Slide Deck Was Created

This slide deck is itself a product of human-machine collaboration, and an example of context engineering in practice.

  • The draft was generated by AI based on provided reference materials
  • Humans provided direction, structure, and key insights
  • AI was responsible for expansion, organization, and refinement
  • Multiple iterations for continuous improvement

[This slide deck was created with Slidev. Original Slidev Markdown]
