[This article is based on a talk given at Turing Community’s Large Model Tech Study Camp. Slides: Slides link, Download PDF version]

A deep dive into the design philosophy and practical strategies of AI Agents: from the dialogue pattern of chatbots to the action pattern of Agents, and how to systematically design and manage an Agent's information environment to build efficient, reliable Agent systems.

Table of Contents

  1. Part 1: Paradigm Shift - From Chatbot to Agent
  2. Part 2: Core Analysis of Agents
  3. Part 3: Context Engineering
  4. Part 4: Memory and Knowledge Systems

Part 1: Paradigm Shift - From Chatbot to Agent

From Chatbot to Agent: A Fundamental Paradigm Shift

We are undergoing a fundamental transformation in AI interaction patterns:

Chatbot Era

  • 🗣️ Conversational interaction: user asks → AI answers → repeated Q&A loop
  • 📚 Knowledgeable advisor: can “talk” but not “act,” passively responding to user needs
  • 🛠️ Typical products: ChatGPT, Claude Chat

Agent Era

  • 🎯 Autonomous action mode: user sets goal → Agent executes → autonomous planning and decision-making
  • 💪 Capable assistant: can both “think” and “do,” actively discovering and solving problems
  • 🚀 Typical products: Claude Code, Cursor, Manus

Core Technique for Chatbots: Prompt Engineering

Prompt Engineering focuses on optimizing single-turn conversations with LLMs:

  1. 🎭 Role setting: define a professional background for the AI

     `You are a senior architect...`

  2. 📝 Clear instructions: specify output requirements

     `Summarize this in 200 words...`

  3. 📖 Few-shot learning: guide with examples

     ```
     Example 1: input → output
     Example 2: input → output
     ```

💡 Essence: These techniques optimize the quality of the conversation, not the effectiveness of actions.

Reference: OpenAI, Prompt Engineering Guide

Fundamental Limitations of Chatbots

  • ❌ Inability to act

    • Cannot execute code
    • Cannot access web pages
    • Cannot manipulate files
  • 🔄 Stateless

    • Each conversation is relatively independent
    • Hard to handle long-term tasks
    • Lacks contextual continuity
  • 🧠 No memory

    • Cannot accumulate experience
    • Always “starts from scratch”
    • Cannot learn and improve
  • ⏳ Passive waiting

    • Can only respond to user input
    • Cannot proactively discover problems
    • Lacks autonomy

Part 2: Core Analysis of Agents

What Is an AI Agent?

Definition: An AI system that can perceive its environment, make autonomous decisions, take actions, and accumulate experience.

Core capabilities:

  • 🛠️ Tool Use: interact with the external world via APIs, command line, etc.
  • 📋 Task Planning: decompose complex goals into executable steps
  • 🔧 Error Recovery: adjust strategy when encountering problems
  • 📚 Learning from Experience: accumulate knowledge from successes and failures

Reference: Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models”

Core Loop of Agents: ReAct Framework

```
User goal: "Help me analyze the sales trend in this CSV file"

Agent loop:
1. Observe: the user wants a CSV file analyzed
2. Think: I need to read the file first, then analyze the data, and finally generate a visualization
3. Act: call read_file('sales.csv')
4. Observe: got the CSV data, containing months and sales figures
5. Think: I need to write a Python script to analyze the data
6. Act: call write_file('analyze.py', 'import pandas as pd...')
7. Observe: the script file has been created
8. Think: now run the script to analyze the data
9. Act: call execute_command('python analyze.py')
10. Observe: the output shows Q3 sales grew 15%; the chart was saved as trend.png
11. Think: task complete; summarize the findings and present the chart
12. Act: return the analysis result: "Q3 sales grew 15%, see trend.png"
```

Observe → Think → Act → Observe - this loop enables Agents to continuously adapt and solve problems.
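The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; `llm_think` and `tools` are hypothetical stand-ins for a real model call and a tool registry:

```python
# Minimal ReAct-style loop: the model alternates between reasoning and
# acting until it emits a final answer instead of a tool call.
def react_loop(goal, llm_think, tools, max_steps=10):
    observations = [f"User goal: {goal}"]
    for _ in range(max_steps):
        # Think: the model sees all observations so far and decides the next step
        decision = llm_think(observations)
        if decision["type"] == "final_answer":
            return decision["content"]
        # Act: dispatch the chosen tool; Observe: record its result
        result = tools[decision["tool"]](**decision["arguments"])
        observations.append(f"{decision['tool']} -> {result}")
    return "Stopped: step limit reached"
```

The step cap matters in practice: without it, a confused model can loop on the same failing action indefinitely.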

Let’s use a concrete example to understand how an Agent calls tools:

```
// User request
{
  "role": "user",
  "content": "Search for the latest release information about the OpenAI o3 model"
}

// Agent decision: say something to the user, and call a tool
{
  "role": "assistant",
  "content": "I'll search for the latest information on the OpenAI o3 model.",
  "tool_calls": [{
    "name": "web_search",
    "id": "toolu_01ABC123def456",
    "arguments": {
      "search_term": "OpenAI o3 model release announcement 2024",
      "explanation": "Search for the latest release information about the OpenAI o3 model"
    }
  }]
}

// Tool response
{
  "tool_use_id": "toolu_01ABC123def456",
  "content": {
    "results": [
      {
        "title": "OpenAI Announces o3 Model",
        "url": "https://openai.com/blog/o3-model",
        "snippet": "OpenAI today announced o3, its latest reasoning model..."
      },
      {
        "title": "o3 Achieves Breakthrough Performance",
        "url": "https://techcrunch.com/2024/openai-o3",
        "snippet": "The o3 model demonstrates significant improvements in mathematical reasoning..."
      }
    ],
    "search_query": "OpenAI o3 model release announcement 2024"
  }
}

// Agent continues: integrates the information and outputs the result to the user
{
  "role": "assistant",
  "content": "Based on the search results, here is the latest on the OpenAI o3 model:\n\n1. **Release date**: December 2024...\n2. **Key features**: breakthrough progress in mathematical reasoning...\n3. **Performance gains**: compared with the o1 model..."
}
```

Core Implementation of the Agent Loop

Based on an in-depth analysis of Claude Code, the main loop of a production-grade Agent includes the following key stages:

```
            Agent Loop execution flow

User input
    ↓
Message preprocessing ──┬─ Token usage estimation
                        ├─ Compression threshold check
                        └─ Context validation
    ↓
System prompt generation ── dynamically generate a prompt adapted to the current task
    ↓
Conversation stream processing ──┬─ Model configuration and selection
                                 ├─ Streaming response management
                                 └─ Interrupt signal handling
    ↓
Tool call detection ──┬─ No tool calls → return a text response
                      └─ Tool calls → enter the tool execution flow
    ↓
Tool execution engine ──┬─ Tool discovery and validation
                        ├─ Permission checks
                        ├─ Concurrency control
                        └─ Result processing
    ↓
Loop decision ──┬─ Continue → back to message preprocessing
                └─ Stop → output the final result
```

The 6-Stage Pipeline of Tool Execution

A production-grade Agent must have a strict tool execution process to ensure safety and reliability:

```
Stage 1: Tool discovery and validation
├─ Tool name resolution
├─ Tool registry lookup
└─ Availability check

Stage 2: Input validation
├─ Schema validation (e.g. Zod)
├─ Parameter type checks
└─ Required-parameter validation

Stage 3: Permission checks
├─ Allow: execute directly
├─ Deny: refuse execution
└─ Ask: request user confirmation

Stage 4: Cancellation checks
├─ AbortController signal
├─ User interrupt handling
└─ Timeout control

Stage 5: Tool execution
├─ Asynchronous execution
├─ Streaming result output
└─ Error capture

Stage 6: Result formatting
├─ Result normalization
├─ State cleanup
└─ Event logging
```

Concurrent Tool Management: The Key to Smart Scheduling

Core question: How to execute multiple tools concurrently in a safe and efficient way?

```javascript
// Concurrency-safety classification of tools
const toolConcurrencySafety = {
  // Tools that can safely run concurrently (read operations)
  concurrent: ["read_file", "list_dir", "search", "web_fetch"],

  // Tools that must run sequentially (write operations)
  sequential: ["write_file", "edit_file", "run_command", "delete_file"]
};
```

Key design principles:

  • ✅ Read operations can safely run concurrently
  • ⚠️ Write operations must be executed serially
  • ⚡ Dynamic scheduling to optimize performance
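One way to realize this split is a two-phase scheduler: fan out all read-safe calls at once, then drain write calls in order. A hedged sketch with asyncio, where the tool lists mirror the classification above and `execute` is a hypothetical async dispatcher:

```python
import asyncio

# Read-only tools may run concurrently; everything else runs serially.
CONCURRENT_SAFE = {"read_file", "list_dir", "search", "web_fetch"}

async def run_tool_batch(calls, execute):
    """calls: list of (tool_name, args); execute: async fn(name, args)."""
    results = {}
    # Phase 1: fire all concurrency-safe calls at once
    safe = [(n, a) for n, a in calls if n in CONCURRENT_SAFE]
    outputs = await asyncio.gather(*(execute(n, a) for n, a in safe))
    results.update({n: o for (n, _), o in zip(safe, outputs)})
    # Phase 2: run the remaining (write) calls strictly in request order
    for n, a in calls:
        if n not in CONCURRENT_SAFE:
            results[n] = await execute(n, a)
    return results
```

A real scheduler would also track per-resource conflicts (two writes to different files could still run in parallel), but the safe default shown here is what the classification above implies.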

The Future of Tool Systems: From “Using Tools” to “Making Tools”

Current state: An Agent’s toolset is predefined by developers and is static.

Future: Agents can autonomously evolve their toolsets.

Technical path:

  • MCP (Model Context Protocol): a standardized context protocol that makes tool development easy
  • Automatic tool generation: when an Agent encounters a new problem, it can:
    1. Search open-source code: find relevant implementations on platforms like GitHub
    2. Auto-wrap: wrap the found code as an MCP tool
    3. Test and verify: test the new tool in a sandbox environment
    4. Persist and reuse: add effective tools to the tool library for future use

Significance: This will turn Agents from passive tool users into proactive tool creators, greatly expanding their capability boundaries.

MCP (Model Context Protocol) - Standardizing Tool Development

What is MCP?

Model Context Protocol is an open protocol introduced by Anthropic to standardize how LLMs connect to external data sources and tools.

Core features:

  • Standardized interfaces - unified tool definition format
  • Language-agnostic - supports Python, JS, Go, etc.
  • Plug-and-play - simple server/client model
  • Security isolation - built-in permissions and sandbox mechanisms

MCP tool example:

```javascript
// MCP server definition
const server = new MCPServer({
  name: "weather-service",
  version: "1.0.0"
});

// Register a tool
server.tool({
  name: "get_weather",
  description: "Get the weather for a given city",
  parameters: {
    location: { type: "string" }
  },
  handler: async ({ location }) => {
    // The actual weather API call
    return await fetchWeather(location);
  }
});
```

Reference: Model Context Protocol Documentation

Tool Use in Mainstream Models

Claude Sonnet 4’s Revolutionary Design

Key innovation: Special Tokens instead of XML

Using special token sequences to mark tool calls brings several advantages:

  • No escaping needed: code, JSON, and special characters can be passed directly
  • What problem does it solve:

    ```
    # Traditional XML/JSON approaches require escaping
    {"code": "print(\"Hello, World!\")"}  # quotes must be escaped

    # Claude's special-token approach
    <tool_use>print("Hello, World!")</tool_use>  # passed through directly
    ```
  • Why it matters: avoids a large number of escaping errors when dealing with code generation and complex data structures

Reference: Anthropic Docs, Tool Use

Gemini 2.5 Pro: Native Multimodal Integration

Core advantages:

  • Native Google Search: can use Google Search as a built-in tool. The model can autonomously decide when real-time information is needed, automatically execute the search, and return verifiable results with source citations.
  • Multimodal input: can directly take images, short videos, audio, and even large PDF documents as input, with native understanding of this content
  • Long-document processing: can directly handle documents up to 2M tokens, including formats like PDF and Word

Unique capability example:

```python
from google import genai
from google.genai import types

client = genai.Client()
with open("report_2024.pdf", "rb") as f:
    pdf_data = f.read()

prompt = "Please analyze the revenue trend in this annual report"
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        types.Part.from_bytes(
            data=pdf_data,
            mime_type='application/pdf',
        ),
        prompt
    ]
)
# Gemini natively understands, inside the PDF:
# - text, charts, and tables
# - layout and formatting
# - no text extraction needed!
```

Reference: Google AI Docs, Document Processing

Claude’s Revolutionary Capability: Computer Use

Definition: not a single tool but a capability that lets an Agent directly operate a full, isolated Linux virtual machine environment.

Core capabilities:

  • File operations: read_file('data.csv'), write_file('analysis.py', 'import pandas as pd...')
  • Command execution: run_command('python analysis.py'), run_command('ls -l | grep .csv')
  • GUI operations: screenshots, clicking, text input, application control
  • Full environment: has full operating system capabilities including file system, network, process management, etc.

Workflow:

  1. Provide tools and prompts

    • Add computer use tools to the API request
    • User prompt: “Save a picture of a cat to the desktop”
  2. Claude decides to use the tool

    • Evaluate whether desktop interaction is needed
    • Construct the tool call request
  3. Execute and return results

    • Extract tool inputs (screenshot/click/type)
    • Execute operations in the virtual machine and return the execution results
  4. Loop until completion

    • Claude analyzes the results
    • Continue calling tools or complete the task

Supported operations:

  • 📸 Screenshot capture
  • 🖱️ Mouse control (click, drag)
  • ⌨️ Keyboard input
  • 🖥️ Command execution

Compute environment requirements:

  • Sandbox isolation: Dedicated virtual machine or container with security boundary protection
  • Virtual display: Recommended resolutions 1024x768, 1280x720 or 1920x1080 (must be strictly followed, otherwise mouse click coordinates may be inaccurate)
  • Operation processor: Screenshot capture, mouse action simulation, keyboard input processing

Example of Agent loop implementation:

```python
while True:
    # Claude requests tool use
    response = client.messages.create(...)

    # Extract and execute the actions
    for action in response.tool_calls:
        result = execute_action(action)

    # Return the results to Claude
    if task_complete:
        break
```

Security considerations:

  • Human confirmation for sensitive operations
  • Operation log auditing
  • Prompt injection protection

Revolutionary significance:

  • Elevates the Agent from an “API caller” to a “programmer” and “data analyst”
  • Can complete end-to-end complex tasks that were previously unimaginable
  • Truly realizes the vision of “give an AI a computer and let it solve problems on its own”

Source: Anthropic Docs, Computer Use

The evolution direction of Agents: independent virtual operating systems

Current limitations: Most current Agents are constrained to a single browser tab or a temporary shell instance, which is “single-threaded” and cannot handle complex tasks requiring persistent state and multi-application collaboration.

Evolution direction: each Agent should have its own dedicated, isolated, persistent virtual operating system environment.

In this environment, it can:

  • Handle multiple tasks in parallel: like humans using computers, open and operate multiple application windows at the same time (Terminal, Browser, IDE, File Manager)
  • Freely switch context: seamlessly switch between different task windows and manage each task’s independent state and history
  • Have a persistent file system: save work results, manage project files, install new software
  • Network communication capability: interact with other Agents or services

This is the inevitable direction for Agents to evolve towards more powerful capabilities.


Model as Agent: achieving capability leaps through RL

Based on the practice of Kimi-Researcher, the real breakthrough in Agent capabilities comes from end-to-end reinforcement learning training:

Limitations of traditional methods:

  • Workflow systems: rely on manually designed multi-agent collaboration and are hard to adapt to environmental changes
  • Imitation learning (SFT): limited by human demonstration data and hard to handle long-horizon tasks

Advantages of end-to-end RL:

  • Global optimization: planning, perception, tool use and other capabilities are learned together
  • Adaptivity: naturally adapts to changes in tools and environments
  • Exploratory learning: discovers optimal strategies through massive exploration

Practical results: Through RL training, Kimi-Researcher improved accuracy from 8.6% to 26.9%, with an average of 23 reasoning steps executed and more than 200 URLs explored.

Source: Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities

Core principles of tool design

💡 Core idea: let the model leverage its pretraining knowledge and reduce the difficulty of learning prompts

1️⃣ Orthogonality

  • Each tool solves an independent problem and avoids functional overlap
  • ✅ Good examples:
    • search_web / search_knowledge_base - clearly distinguishes search scopes
    • text_browser / visual_browser - clearly distinguishes handling modes
  • ❌ Bad examples:
    • search_google / search_milvus - exposes the underlying implementation
    • browse_url / browse_page - overlapping, unclear functionality

2️⃣ Descriptive naming

  • Names directly reflect functionality and leverage the model’s language understanding
  • ✅ Good examples:
    • web_search_agent / phone_call_agent - names clearly indicate function
    • execute_code / read_file - verb + noun structure
  • ❌ Bad examples:
    • daisy / ivy - abstract code names that reveal nothing about function
    • sw / exec - overly abbreviated
  • Tool descriptions need to be clear and unambiguous
  • Describe tool limitations (e.g., the phone can only call domestic numbers)

3️⃣ Familiar patterns

  • Parameters fit developer intuition
  • File operations resemble Unix commands
  • Search resembles Google APIs

4️⃣ Let the Agent handle errors autonomously

  • When tool calls fail, return detailed error information and let the Agent decide the next step
  • Leverage the programming capabilities of SOTA LLMs
  • ❌ Bad example: `Phone Call Failed`
  • ✅ Good example: `Phone Call Failed: failed to initiate call: phone number format error`

These principles allow the model to leverage knowledge learned during pretraining and reduce the cost of learning new tools.
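As a sketch of principle 4, a tool might return errors like this. The `phone_call_tool`, its format check, and the example number are illustrative, not from any real system:

```python
# Instead of swallowing failures, return the full, actionable error so the
# Agent can reason about the next step itself.
def phone_call_tool(number: str) -> dict:
    if not (number.startswith("+") and number[1:].isdigit()):
        return {
            "status": "error",
            # Detailed and specific — not just "Phone Call Failed"
            "error": "failed to initiate call: phone number format error, "
                     "expected E.164 format like +8613800138000, got: " + repr(number),
        }
    return {"status": "ok", "call_id": "call_001"}
```

Given the detailed message, a capable model can usually reformat the number and retry on its own, with no hand-written retry logic.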

Multi-Agent architecture: breaking through the limitations of a single Agent

Based on the practice of Anthropic Research Feature:

Why do we need Multi-Agent?

Limitations of a single Agent:

  • 💰 Context window costs are high
  • ⏱️ Sequential execution is inefficient
  • 📉 Quality degradation in long contexts (the “lost in the middle” problem)
  • 🧠 Cannot explore in parallel

Advantages of Multi-Agent:

  • ✅ Handle multiple subtasks in parallel
  • ✅ Independent context windows
  • ✅ Overall performance improvement
  • ✅ Suitable for open-ended research tasks

Architecture diagram:

```
User query
    ↓
Lead Agent (coordinator)
    ├─→ analyze the task
    ├─→ create SubAgents
    │
    ├─→ SubAgent 1 (independent search)
    ├─→ SubAgent 2 (independent search)
    └─→ SubAgent 3 (independent search)
    ↓
Result aggregation → final report
```

Key design points:

  1. Compression is essence

    • Each SubAgent explores in an independent context
    • Extracts and returns the most important tokens
    • Lead Agent integrates multi-source information
  2. Mind your token usage, don’t max out your credit card

    • Single chat: 1×
    • Single Agent: 4×
    • Multi-Agent: 15×
    • Value must match cost
  3. Applicable scenarios

    • ✅ Information-gathering tasks
    • ✅ Needs for multi-angle analysis
    • ✅ Tasks that are easy to parallelize

Source: Anthropic, How we built our multi-agent research system

SubAgent architecture: key strategy for controlling context length

As task complexity increases, the main Agent’s context will rapidly expand, leading to:

  • Increased cost: every token has to be paid for
  • Increased latency: longer context means slower responses
  • Quality degradation: overly long contexts cause the “lost in the middle” problem

SubAgents provide an elegant solution:

```
Main Agent
    │ delegates a complex task to a SubAgent
    ↓
SubAgent (independent context)
├─ contains only task-relevant information
├─ does not inherit the main Agent's history
├─ returns only a result summary when done
└─ the main Agent's context stays lean
```

Core value of SubAgents:

  1. Context isolation: each SubAgent has its own clean context
  2. Cost control: avoids repeatedly passing irrelevant historical information
  3. Parallel capability: multiple SubAgents can process different tasks simultaneously
  4. Result aggregation: only key results are returned to the main Agent

Typical application scenarios:

  • Analyzing different modules of code
  • Parallel processing of multiple independent tasks
  • Executing specialized work that requires large amounts of context
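A minimal sketch of this delegation pattern, with `run_llm` as a hypothetical wrapper around a model-driven agent loop:

```python
# A SubAgent runs with its own fresh context and returns only a summary,
# so the main Agent's history stays small.
def delegate_to_subagent(task: str, run_llm, tools) -> str:
    sub_context = [
        {"role": "system", "content": "You are a focused sub-agent. "
                                      "Solve only the task given, then summarize."},
        {"role": "user", "content": task},  # no main-Agent history inherited
    ]
    full_transcript = run_llm(sub_context, tools)  # may grow very large internally
    # Only the final summary message crosses back into the main Agent's context
    return full_transcript[-1]["content"]
```

The key property is the asymmetry: the SubAgent may burn through an arbitrarily long transcript, but the main Agent pays only for the task string going in and the summary coming back.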

Diving into Agent dialogue flow: making thinking explicit

```
user: "Check today's weather in San Francisco, then recommend what I should wear based on the temperature."

A:
  thinking: "The user has two requests: 1. check the weather; 2. recommend clothing based on it.
             I should call the weather API first, then give advice based on the result."
  text: "Sure, let me look up the weather in San Francisco."
  tool: get_weather(location="San Francisco")

tool:
  content: {"temperature": "15°C", "condition": "Sunny", "wind": "15km/h"}

A:
  thinking: "15°C plus a 15 km/h wind will feel colder. Windproof, warm layers are needed."
  text: "San Francisco is sunny today, 15°C and windy. I'd suggest long sleeves plus a windbreaker."
```

Key implementation details:

  1. Explicit thinking process: make the Agent’s reasoning process visible and debuggable
  2. Tool call timing: call tools only when external information is clearly needed
  3. Result explanation: not just showing data, but providing explanations and suggestions

Part 3: Context Engineering

Core ideas of context engineering

“The context is the agent’s operating system.” - Manus

Definition: Systematically designing and managing an agent’s information environment so it can complete tasks efficiently and reliably.

Why it matters:

  • Context determines what the agent can “see” 👁️
  • Context determines what the agent can “remember” 🧠
  • Context determines what the agent can “do” 💪

Sources of contextual information

1. Documented team knowledge

  • Technical docs, API docs, design docs
  • Meeting notes, decision records, project plans
  • Code comments, commit messages, PR descriptions

2. Structured project information

  • Data from task management systems
  • Code repository structure and history
  • Test cases and test results

3. Dynamic execution environment

  • Current system state
  • Real-time logs and monitoring data
  • Immediate user feedback

Strategy 1: Architecture design around KV cache

Core issue: The agent’s input-output ratio is extremely unbalanced (100:1)

Solution: Maximize KV cache hit rate

Best practices:

  1. Stable prompt prefix: keep the system prompt unchanged

     ```
     System: You are a helpful assistant...
     ```
  2. Append-only context: only append, never modify

     ```javascript
     messages.push(newMessage)
     // Do not modify existing messages
     ```
  3. Deterministic serialization: fixed JSON key ordering

     ```json
     {
       "name": "...",
       "age": "...",
       "city": "..."
     }
     ```
  4. Cache-friendly tool output: unified formatting to reduce variation
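Practices 1–3 can be combined into one small helper. This is a minimal illustration (the KV cache itself lives on the inference side; the client's job is only to keep the request prefix byte-stable):

```python
import json

# Append-only message list with deterministic serialization: the stable
# prefix is what lets the server reuse its KV cache across turns.
SYSTEM_PROMPT = "You are a helpful assistant..."  # never changes

def build_request(messages, new_message):
    messages = messages + [new_message]  # append; never rewrite history
    return {
        "system": SYSTEM_PROMPT,
        # sort_keys fixes the JSON key order, so earlier turns serialize
        # to byte-identical prefixes of later requests
        "payload": json.dumps({"messages": messages}, sort_keys=True),
    }, messages
```

Each turn's payload literally starts with the previous turn's payload, which is exactly the property a prefix cache needs.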

Strategy 2: Context compression - a last resort

Core principle: If cost and latency allow, avoid compression whenever possible.

Compression is a tradeoff:

  • Benefits: Fewer tokens, lower cost
  • Costs: Loss of detail, possible impact on task quality

Based on Claude Code practice, compression is only triggered when nearing context limits:

8-part structured compression template:

```
1. Background Context
   - Project type and tech stack
   - Current working directory and environment
   - The user's overall goal

2. Key Decisions
   - Important technical choices and the reasons for them
   - Architecture decisions and design considerations

3. Tool Usage Log
   - The main tool types used
   - File operation history

4. User Intent Evolution
   - How requirements changed over time
   - Priority adjustments

5. Execution Results
   - Tasks completed successfully
   - Code and files produced

6. Errors and Solutions
   - Types of problems encountered
   - How they were resolved

7. Open Issues
   - Problems still to be solved
   - Known limitations

8. Future Plans
   - Next action steps
   - Features the user expects
```

Choosing when to compress:

  1. Prefer using SubAgents to control context length
  2. Next, consider clearing irrelevant history
  3. Only then use compression algorithms
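The trigger itself can be sketched in a few lines. The context limit, threshold, and chars-per-token ratio below are illustrative values, not Claude Code's actual ones:

```python
# Compress only as a last resort: trigger when the estimated token count
# approaches the context window limit.
def should_compress(messages, context_limit=200_000, threshold=0.92):
    # Rough token estimate: ~4 characters per token
    estimated = sum(len(m["content"]) for m in messages) // 4
    return estimated >= context_limit * threshold
```

Keeping the threshold high encodes the principle above: as long as there is headroom, full detail beats a lossy summary.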

Strategy 3: TODO List - Intelligent task management system

```markdown
## Current task status (updated in real time)
- [x] Project initialization
- [x] Dependency installation
- [ ] Implement user authentication ← in progress
- [ ] Design the database schema
- [x] Create API endpoints
- [ ] Frontend integration
- [ ] Deploy to production
```

Core value of the TODO system:

  1. Counteracting forgetting: Keep goals clear in long conversations
  2. Progress visualization: Both user and agent can see progress
  3. Priority management: Auto-sorting (in progress > pending > completed)
  4. Task decomposition: Automatically break complex tasks into executable steps

Intelligent sorting algorithm:

```
// Status priority: in_progress(0) > pending(1) > completed(2)
// Importance priority: high(0) > medium(1) > low(2)
// Same priority is ordered by creation time
```
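The sorting rule above translates directly into a composite sort key. A sketch in Python, where the field names (`status`, `priority`, `created_at`) are assumed:

```python
# Sort order: in_progress before pending before completed,
# then high before medium before low, then by creation time.
STATUS_RANK = {"in_progress": 0, "pending": 1, "completed": 2}
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}

def sort_todos(todos):
    return sorted(todos, key=lambda t: (STATUS_RANK[t["status"]],
                                        PRIORITY_RANK[t["priority"]],
                                        t["created_at"]))
```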

Strategy 4: Errors are learning opportunities

Traditional approach vs correct approach:

❌ Traditional: Hide errors and retry

```
Action: npm install
// Error hidden
Action: npm install (retry)
```

✅ Correct: Preserve full error context

```
Action: npm install
Error: npm ERR! peer dep missing: react@^18.0.0

Thinking: I need to install the correct React version first
Action: npm install react@^18.0.0
Success: Dependencies installed
```

Principle: Error messages help the agent update its world model and learn environmental constraints and dependencies.


Strategy 5: 6-layer security protection system

Based on Claude Code’s enterprise-grade practice:

```
Layer 1: Input validation
├─ Schema validation (e.g. Zod)
├─ Enforced parameter type checks
└─ Boundary constraint validation

Layer 2: Permission control
├─ Allow: execute directly
├─ Deny: refuse execution
└─ Ask: user confirmation

Layer 3: Sandbox isolation
├─ Command execution sandbox
├─ File system restrictions
└─ Network access control

Layer 4: Execution monitoring
├─ AbortController interrupts
├─ Timeout control
└─ Resource limits

Layer 5: Error recovery
├─ Exception capture
├─ Automatic retries
└─ Graceful degradation

Layer 6: Audit logging
├─ Operation logs
├─ Security events
└─ Compliance reports
```

Each layer ensures safe execution for the agent, keeping it stable even in complex environments.


Strategy 6: Dynamic reminder injection (System Reminder)

Automatically inject contextual reminders at critical moments:

```
<system-reminder>
Detected that you are modifying authentication-related code.
Please note:
1. Passwords must be stored encrypted
2. Use bcrypt or argon2
3. Implement rate limiting to prevent brute-force attacks
</system-reminder>
```

Trigger conditions:

  • TODO list status changes
  • Detection of specific operation patterns
  • Repeated error patterns
  • Security-related operations

This dynamic injection mechanism gives the agent extra guidance at key moments and helps avoid common mistakes.
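A sketch of such a trigger-and-inject mechanism. The rules and context fields below are made-up examples of the trigger conditions listed above, not any system's real rule set:

```python
# Check each trigger against the current context and append matching
# reminders as <system-reminder> blocks.
REMINDER_RULES = [
    (lambda ctx: "auth" in ctx["last_file"],
     "You are editing authentication code: store passwords hashed "
     "(bcrypt/argon2) and add rate limiting."),
    (lambda ctx: ctx["consecutive_errors"] >= 3,
     "The same error has repeated several times; reconsider the approach "
     "instead of retrying."),
]

def inject_reminders(messages, ctx):
    for matches, text in REMINDER_RULES:
        if matches(ctx):
            messages.append({"role": "system",
                             "content": f"<system-reminder>{text}</system-reminder>"})
    return messages
```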


Strategy 7: Parallel Sampling - balancing quality and efficiency

Explore multiple independent reasoning paths in parallel, then select or aggregate the best result:

```
User question
    ↓
├─→ Agent instance 1: explore path A
├─→ Agent instance 2: explore path B
└─→ Agent instance 3: explore path C
    ↓
Result verification and selection
├─ Consistency checks
├─ Quality scoring
└─ Best-answer selection
```

Use cases:

  • Multi-angle analysis of complex problems
  • High-reliability critical decisions
  • Exploratory tasks with parallel attempts

Implementation points:

  • Use different initial prompts or temperature settings
  • Choose a reasonable degree of parallelism (typically 3–5)
  • Establish an effective result evaluation mechanism
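These points can be sketched with asyncio. `sample` and `score` are hypothetical stand-ins for a model call and a result-evaluation function; varying the temperature is one way to make the paths diverge:

```python
import asyncio

# Sample several independent reasoning paths in parallel, then keep the
# highest-scoring result.
async def parallel_sample(question, sample, score, temperatures=(0.2, 0.7, 1.0)):
    answers = await asyncio.gather(*(sample(question, t) for t in temperatures))
    return max(answers, key=score)
```

A consistency check (majority vote over the sampled answers) can replace `max` when there is no good scalar score.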

Strategy 8: Sequential Revision - iterative refinement

Let the agent reflect on and improve its own outputs:

```
Initial response → Self-evaluation → Identify issues → Generate improvements → Final output
```

Example revision prompt:

```
Review your previous answer:
1. Does it fully solve the user's problem?
2. Is any important information missing?
3. Is there anything that could be improved?

Based on this reflection, provide an improved answer.
```

Best practices:

  • Limit the number of revision rounds (typically 2–3)
  • Provide concrete dimensions for improvement
  • Keep revision history for learning purposes
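A hedged sketch of the revision loop, assuming a `run_llm` chat-completion wrapper (a hypothetical function, not a specific API):

```python
# Iteratively ask the model to critique and improve its own answer,
# with the round count capped as recommended above.
REVISION_PROMPT = ("Review your previous answer: does it fully solve the "
                   "problem? Is anything missing or improvable? "
                   "Provide a revised answer.")

def sequential_revise(question, run_llm, max_rounds=2):
    answer = run_llm([{"role": "user", "content": question}])
    history = [answer]  # keep revision history for learning purposes
    for _ in range(max_rounds):
        answer = run_llm([
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
            {"role": "user", "content": REVISION_PROMPT},
        ])
        history.append(answer)
    return answer, history
```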

Source: Thinking in Phases: How Inference-Time Scaling Improves LLM Performance


Design philosophy of context engineering: minimal design, maximal capability (Alita)

Core ideas:

  • Minimal Predefinition:

    • Avoid over-design and over-specification
    • Keep the system flexible
    • Let the agent adapt to task requirements
  • Maximal Self-Evolution:

    • Build the ability to learn from experience
    • Dynamically adapt to new task types
    • Continuously optimize execution strategies

Practical insights:

  • Don’t try to predefine every possible scenario
  • Give the agent enough flexibility to explore solutions
  • Simple architectures are often more effective than complex systems
  • “Simplicity is the ultimate sophistication”

Part 4: Memory and knowledge systems

Three-layer memory architecture: from short-term to long-term

Based on Claude Code practice, a mature agent needs a three-layer memory system:

```
┌─────────────────────────────────────────
│ Short-term memory layer
│ Current session context (messages[])
│ • User Message
│ • Assistant Message
│ • Tool Results
│ • System Prompts
│
│ Traits: O(1) lookup, real-time access, automatic token accounting
└─────────────┬───────────────────────────
              │ triggered when nearing the context limit
              ↓
┌─────────────────────────────────────────
│ Mid-term memory layer
│ 8-part structured compression
│ • Background context   • Execution results
│ • Key decisions        • Error handling
│ • Tool usage           • Open issues
│ • User intent          • Future plans
│
│ Traits: intelligent compression, context continuity, large token savings
└─────────────┬───────────────────────────
              │ persisted to storage
              ↓
┌─────────────────────────────────────────
│ Long-term memory layer
│ Persistent knowledge system (CLAUDE.md / Knowledge Base)
│ • Project context      • Dev environment
│ • User preferences     • Security config
│ • Code style           • Workflows
│
│ Traits: cross-session recovery, user customization, persistent project memory
└─────────────────────────────────────────
```

User Memory - the key to personalization

Definition: The agent remembers a specific user’s preferences, history, and context

What is stored:

  • User preferences: “I prefer concise code” , “Avoid using class components”
  • Project context: “We use React 18 + TypeScript” , “The API uses GraphQL”
  • Historical decisions: “Last time we chose PostgreSQL over MySQL” , “We use pnpm for dependency management”

Implementation mechanisms:

  • Vector database storage
  • Similarity search
  • Dynamic injection into system prompts
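A toy sketch of this retrieve-and-inject flow. Word overlap stands in for a real vector similarity search, and the memory strings echo the examples above:

```python
# Retrieve the stored user memories most relevant to the current query,
# then inject them into the system prompt.
def retrieve_memories(query, memories, top_k=2):
    q = set(query.lower().split())
    # Naive relevance: count overlapping words (a real system would use
    # embeddings in a vector database)
    scored = sorted(memories, key=lambda m: -len(q & set(m.lower().split())))
    return scored[:top_k]

def build_system_prompt(base, query, memories):
    relevant = retrieve_memories(query, memories)
    return base + "\n\nKnown user context:\n" + "\n".join(f"- {m}" for m in relevant)
```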

Source: Survey on Building Agentic RAG Systems


Knowledge Base - crystallized collective intelligence

Definition: Reusable knowledge accumulated across all user–agent interactions

Types of content:

  1. Solution templates: “How to configure Nginx as a reverse proxy”
  2. Best practices: “Standard structure for a Python project”
  3. Problem patterns: “This type of error is usually caused by…”

Knowledge lifecycle:

  1. Capture: Agent successfully solves a problem
  2. Refine: Extract generalizable patterns
  3. Store: Save in structured form
  4. Reuse: Apply directly to similar problems

Source: 01.me, How agents learn from experience


Summary: Core points of context engineering

“The context is the agent’s operating system.”

Key insights:

  1. Paradigm shift: from conversation to action

    • Chatbot → Agent
    • Prompt engineering → Context engineering
  2. Four core agent capabilities

    • 🛠️ Tool Use
    • 📋 Planning
    • 🔧 Error Recovery
    • 📚 Learning from experience

Eight practical strategies:

  1. KV cache optimization - core of architecture design
  2. Intelligent compression - 8-part structured template
  3. TODO system - intelligent task management
  4. Learning from errors - keep full error context
  5. Security protection - 6-layer protection system
  6. Dynamic reminders - inject guidance at key moments
  7. Parallel sampling - balance quality and efficiency
  8. Sequential revision - iteratively refine outputs

Key Technical Takeaways:

  • Intelligent Compression Mechanism: Compress only when necessary, prioritize using SubAgent
  • Concurrent Tool Management: Concurrent for read operations, serial for write operations
  • Three-layer Memory Architecture: Short-term, mid-term, and long-term memories work together
  • Model as Agent: Achieve capability leaps through end-to-end RL training
  • Tool Design Principles: Orthogonality, descriptiveness, close to common usage
  • Multi-Agent Architecture: Break through the limitations of a single Agent

🚀 The future is agentic. Let’s build it with well-designed context.


Intelligent Injection Mechanism for File Contents

```
User mentions a file → system detects it automatically
    ↓
Path resolution and validation
├─ Security checks
├─ Permission verification
└─ File existence check
    ↓
Smart capacity control
├─ Limit the number of files
├─ Cap single-file size
└─ Manage total capacity
    ↓
Formatted content injection
├─ Syntax highlighting
├─ Line-number display
└─ Relevance ordering
```

This intelligent injection mechanism ensures that the Agent can efficiently access relevant file contents while avoiding context overload.


Join Pine AI

We are looking for full-stack engineers who can build SOTA autonomous AI Agents.

Our belief: everyone’s contribution to the company’s valuation should exceed tens of millions of dollars

Requirements to Join Pine AI

🤖 1. Proficient in AI-assisted programming

  • 80%+ of code completed via human–AI collaboration
  • Coding interview: complete feature development in 2 hours with AI assistance
  • All internal systems are built based on AI

💻 2. Passionate about hands-on problem solving

  • “Talk is cheap, show me the code”
  • Become a combination of architect and product manager
  • Directly command AI to reduce information loss

🏗️ 3. Solid software engineering skills

  • Complete documentation and testing
  • Make code understandable and maintainable by AI
  • High-quality engineering practices

🧠 4. Understanding of LLM fundamentals

  • Know basic principles and capability boundaries
  • The right ways to harness LLMs
  • Provide appropriate context and tools

🚀 5. Confidence to tackle world-class challenges

  • Pursue SOTA levels
  • Grow together with a startup
  • Continuously surpass the current state of the art

🎯 Our Mission

By building Agents that can interact with the world in real time and learn from experience, we truly solve users’ pains and get things done.

Enable users to gradually build trust and ultimately hand over important tasks to Pine.

Pine AI - Building Agents That Get Things Done

```shell
mail -s "Join Pine AI" -A /path/to/your_resume.pdf [email protected]
```

Meta: How These Slides Were Created

These slides are themselves a product of human–AI collaboration, a concrete embodiment of context engineering in practice.

  • First draft generated by AI based on the provided reference materials
  • Humans provided direction, structure, and key insights
  • AI expanded, organized, and polished the content
  • Multiple rounds of iteration for continuous refinement

These slides were created using Slidev. (Original Slidev Markdown)
