From Prompt Engineering to Context Engineering: Secrets to Building Great Agents
[This article is based on a talk given at Turing Community's Large Model Tech Study Camp; slides and a PDF version are available.]
A deep dive into the design philosophy and practical strategies for AI Agents. From the dialogue pattern of chatbots to the action pattern of Agents, we systematically design and manage the information environment of Agents to build efficient and reliable AI Agent systems.
Table of Contents
- Part 1: Paradigm Shift - From Chatbot to Agent
- Part 2: Core Analysis of Agents
- Part 3: Context Engineering
- Part 4: Memory and Knowledge Systems
Part 1: Paradigm Shift - From Chatbot to Agent
From Chatbot to Agent: A Fundamental Paradigm Shift
We are undergoing a fundamental transformation in AI interaction patterns:
Chatbot Era
- 🗣️ Conversational interaction: user asks → AI answers → repeated Q&A loop
- 📚 Knowledgeable advisor: can “talk” but not “act,” passively responding to user needs
- 🛠️ Typical products: ChatGPT, Claude Chat
Agent Era
- 🎯 Autonomous action mode: user sets goal → Agent executes → autonomous planning and decision-making
- 💪 Capable assistant: can both “think” and “do,” actively discovering and solving problems
- 🚀 Typical products: Claude Code, Cursor, Manus
Core Technique for Chatbots: Prompt Engineering
Prompt Engineering focuses on optimizing single-turn conversations with LLMs:
🎭 Role setting: define a professional background for the AI

```
You are a senior architect...
```

📝 Clear instructions: specify output requirements

```
Summarize in 200 words...
```

📖 Few-shot learning: guide with examples

```
Example 1: input → output
Example 2: input → output
```
💡 Essence: These techniques optimize the quality of the conversation, not the effectiveness of actions.
Reference: OpenAI, Prompt Engineering Guide
Fundamental Limitations of Chatbots
❌ Inability to act
- Cannot execute code
- Cannot access web pages
- Cannot manipulate files
🔄 Stateless
- Each conversation is relatively independent
- Hard to handle long-term tasks
- Lacks contextual continuity
🧠 No memory
- Cannot accumulate experience
- Always “starts from scratch”
- Cannot learn and improve
⏳ Passive waiting
- Can only respond to user input
- Cannot proactively discover problems
- Lacks autonomy
Part 2: Core Analysis of Agents
What Is an AI Agent?
Definition: An AI system that can perceive its environment, make autonomous decisions, take actions, and accumulate experience.
Core capabilities:
- 🛠️ Tool Use: interact with the external world via APIs, command line, etc.
- 📋 Task Planning: decompose complex goals into executable steps
- 🔧 Error Recovery: adjust strategy when encountering problems
- 📚 Learning from Experience: accumulate knowledge from successes and failures
Reference: Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models”
Core Loop of Agents: ReAct Framework
```
User goal: "Help me analyze the sales trends in this CSV file"
```
Observe → Think → Act → Observe - this loop enables Agents to continuously adapt and solve problems.
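The loop can be sketched in a few lines of Python. This is a minimal illustration with a mocked model policy and a mocked tool (`fake_llm`, `read_csv_head`, and the transcript format are all illustrative stand-ins, not a real API):

```python
# Minimal ReAct-style loop: the "model" reads the transcript so far,
# either emits an action (tool call) or a final answer.

TOOLS = {
    "read_csv_head": lambda path: "date,region,sales\n2024-01,US,100",
}

def fake_llm(transcript):
    """Decide the next step from the transcript (mocked policy)."""
    if "Observation:" not in transcript:
        return {"thought": "I need to look at the file first.",
                "action": ("read_csv_head", "sales.csv")}
    return {"thought": "I have enough data to summarize.",
            "final": "Sales are recorded monthly per region."}

def react_loop(goal, max_steps=5):
    transcript = f"Goal: {goal}"
    for _ in range(max_steps):          # Think → Act → Observe, repeated
        step = fake_llm(transcript)
        transcript += f"\nThought: {step['thought']}"
        if "final" in step:             # the model decides the task is done
            return step["final"]
        name, arg = step["action"]
        observation = TOOLS[name](arg)  # Act: call the tool
        transcript += f"\nAction: {name}({arg})\nObservation: {observation}"
    return "Step limit reached."

print(react_loop("Analyze the sales trends in this CSV file"))
```

The key design point is that every observation is appended back into the transcript, so the next "Think" step is grounded in what actually happened.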
Example of Tool Use: Web Search
Let’s use a concrete example to understand how an Agent calls tools:
```
// User request
```
Core Implementation of the Agent Loop
Based on an in-depth analysis of Claude Code, the main loop of a production-grade Agent includes the following key stages:
```
Agent Loop execution flowchart
```
The 6-Stage Pipeline of Tool Execution
A production-grade Agent must have a strict tool execution process to ensure safety and reliability:
```
Stage 1: Tool discovery and validation
```
Concurrent Tool Management: The Key to Smart Scheduling
Core question: How to execute multiple tools concurrently in a safe and efficient way?
```
// Concurrency-safety classification of tools
```
Key design principles:
- ✅ Read operations can safely run concurrently
- ⚠️ Write operations must be executed serially
- ⚡ Dynamic scheduling to optimize performance
The Future of Tool Systems: From “Using Tools” to “Making Tools”
Current state: An Agent’s toolset is predefined by developers and is static.
Future: Agents can autonomously evolve their toolsets.
Technical path:
- MCP (Model Context Protocol): a standardized context protocol that makes tool development easy
- Automatic tool generation: when an Agent encounters a new problem, it can:
- Search open-source code: find relevant implementations on platforms like GitHub
- Auto-wrap: wrap the found code as an MCP tool
- Test and verify: test the new tool in a sandbox environment
- Persist and reuse: add effective tools to the tool library for future use
Significance: This will turn Agents from passive tool users into proactive tool creators, greatly expanding their capability boundaries.
MCP (Model Context Protocol) - Standardizing Tool Development
What is MCP?
Model Context Protocol is an open protocol introduced by Anthropic to standardize how LLMs connect to external data sources and tools.
Core features:
- Standardized interfaces - unified tool definition format
- Language-agnostic - supports Python, JS, Go, etc.
- Plug-and-play - simple server/client model
- Security isolation - built-in permissions and sandbox mechanisms
MCP tool example:
```
// MCP server definition
```
Reference: Model Context Protocol Documentation
Tool Use in Mainstream Models
Claude Sonnet 4’s Revolutionary Design
Key innovation: Special Tokens instead of XML
Using special token sequences to mark tool calls brings several advantages:
- No escaping needed: code, JSON, and special characters can be passed directly
- What problem does it solve:

```
# The traditional XML/JSON approach requires escaping
{"code": "print(\"Hello, World!\")"}  # quotes must be escaped

# Claude's special-token approach
<tool_use>print("Hello, World!")</tool_use>  # passed directly
```

- Why it matters: avoids a large number of escaping errors when dealing with code generation and complex data structures
Reference: Anthropic Docs, Tool Use
Gemini 2.5 Pro: Native Multimodal Integration
Core advantages:
- Native Google Search: can use Google Search as a built-in tool. The model can autonomously decide when real-time information is needed, automatically execute the search, and return verifiable results with source citations.
- Multimodal input: can directly take images, short videos, audio, and even large PDF documents as input, with native understanding of this content
- Long-document processing: can directly handle documents up to 2M tokens, including formats like PDF and Word
Unique capability example:
```python
from google import genai
```
Reference: Google AI Docs, Document Processing
Claude’s Revolutionary Capability: Computer Use
Definition: This is not a simple tool, but the ability that lets an Agent directly operate a full, isolated Linux virtual machine environment.
Core capabilities:
- File operations: `read_file('data.csv')`, `write_file('analysis.py', 'import pandas as pd...')`
- Command execution: `run_command('python analysis.py')`, `run_command('ls -l | grep .csv')`
- GUI operations: screenshots, clicking, text input, application control
- Full environment: has full operating system capabilities including file system, network, process management, etc.
Workflow:
1. Provide tools and prompts
   - Add computer use tools to the API request
   - User prompt: "Save a picture of a cat to the desktop"
2. Claude decides to use the tool
   - Evaluates whether desktop interaction is needed
   - Constructs the tool call request
3. Execute and return results
   - Extract tool inputs (screenshot/click/type)
   - Execute the operations in the virtual machine and return the results
4. Loop until completion
   - Claude analyzes the results
   - Continues calling tools or completes the task
Supported operations:
- 📸 Screenshot capture
- 🖱️ Mouse control (click, drag)
- ⌨️ Keyboard input
- 🖥️ Command execution
Compute environment requirements:
- Sandbox isolation: Dedicated virtual machine or container with security boundary protection
- Virtual display: Recommended resolutions 1024x768, 1280x720 or 1920x1080 (must be strictly followed, otherwise mouse click coordinates may be inaccurate)
- Operation processor: Screenshot capture, mouse action simulation, keyboard input processing
Example of Agent loop implementation:
```python
while True:
```
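A minimal sketch of such a loop, with a mocked model and a mocked VM executor (`mock_model` and `vm_execute` are illustrative stand-ins for the real API and sandbox):

```python
# Computer-use agent loop: call the model, execute any requested
# action in the isolated VM, feed the result back, repeat.

def mock_model(history):
    """Return either a tool call or a final answer (mocked)."""
    if not history:
        return {"tool": "screenshot", "input": {}}
    return {"answer": "The cat picture is saved to the desktop."}

def vm_execute(tool, tool_input):
    """Pretend to run an action inside the isolated VM."""
    return f"{tool} executed with {tool_input}"

def agent_loop(max_turns=10):
    history = []
    for _ in range(max_turns):
        step = mock_model(history)
        if "answer" in step:                       # model decides it is done
            return step["answer"]
        result = vm_execute(step["tool"], step["input"])
        history.append((step["tool"], result))     # feed results back
    return "Turn limit reached."

print(agent_loop())
```

A real implementation would replace `mock_model` with an API call and enforce the security layers described below (confirmation for sensitive actions, operation logging).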
Security considerations:
- Human confirmation for sensitive operations
- Operation log auditing
- Prompt injection protection
Revolutionary significance:
- Elevates the Agent from an “API caller” to a “programmer” and “data analyst”
- Can complete end-to-end complex tasks that were previously unimaginable
- Truly realizes the vision of “give an AI a computer and let it solve problems on its own”
Source: Anthropic Docs, Computer Use
The evolution direction of Agents: independent virtual operating systems
Current limitations: Most current Agents are constrained to a single browser tab or a temporary shell instance, which is “single-threaded” and cannot handle complex tasks requiring persistent state and multi-application collaboration.
Evolution direction: each Agent should have its own dedicated, isolated, persistent virtual operating system environment.
In this environment, it can:
- Handle multiple tasks in parallel: like humans using computers, open and operate multiple application windows at the same time (Terminal, Browser, IDE, File Manager)
- Freely switch context: seamlessly switch between different task windows and manage each task’s independent state and history
- Have a persistent file system: save work results, manage project files, install new software
- Network communication capability: interact with other Agents or services
This is the inevitable direction for Agents to evolve towards more powerful capabilities.
Model as Agent: achieving capability leaps through RL
Based on the practice of Kimi-Researcher, the real breakthrough in Agent capabilities comes from end-to-end reinforcement learning training:
Limitations of traditional methods:
- Workflow systems: rely on manually designed multi-agent collaboration and are hard to adapt to environmental changes
- Imitation learning (SFT): limited by human demonstration data and hard to handle long-horizon tasks
Advantages of end-to-end RL:
- Global optimization: planning, perception, tool use and other capabilities are learned together
- Adaptivity: naturally adapts to changes in tools and environments
- Exploratory learning: discovers optimal strategies through massive exploration
Practical results: Through RL training, Kimi-Researcher improved accuracy from 8.6% to 26.9%, with an average of 23 reasoning steps executed and more than 200 URLs explored.
Source: Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities
Core principles of tool design
💡 Core idea: let the model leverage its pretraining knowledge and reduce the difficulty of learning prompts
1️⃣ Orthogonality
- Each tool solves an independent problem and avoids functional overlap
- ✅ Good examples:
  - `search_web` / `search_knowledge_base` - clearly distinguishes search scopes
  - `text_browser` / `visual_browser` - clearly distinguishes handling modes
- ❌ Bad examples:
  - `search_google` / `search_milvus` - exposes the underlying implementation
  - `browse_url` / `browse_page` - overlapping, unclear functionality
2️⃣ Descriptive naming
- Names directly reflect functionality and leverage the model’s language understanding
- ✅ Good examples:
  - `web_search_agent` / `phone_call_agent` - clearly indicate function
  - `execute_code` / `read_file` - verb+noun structure
- ❌ Bad examples:
  - `daisy` / `ivy` - abstract code names with no discernible function
  - `swexec` - overly abbreviated
- Tool descriptions need to be clear and unambiguous
- Describe tool limitations (e.g., the phone can only call domestic numbers)
3️⃣ Familiar patterns
- Parameters fit developer intuition
- File operations resemble Unix commands
- Search resembles Google APIs
4️⃣ Let the Agent handle errors autonomously
- When tool calls fail, return detailed error information and let the Agent decide the next step
- Leverage the programming capabilities of SOTA LLMs
- ❌ Bad example: `Phone Call Failed`
- ✅ Good example: `Phone Call Failed: failed to initiate call: phone number format error`
These principles allow the model to leverage knowledge learned during pretraining and reduce the cost of learning new tools.
Multi-Agent architecture: breaking through the limitations of a single Agent
Based on the practice of Anthropic Research Feature:
Why do we need Multi-Agent?
Limitations of a single Agent:
- 💰 Context window costs are high
- ⏱️ Sequential execution is inefficient
- 📉 Quality degradation in long contexts (the “lost in the middle” problem)
- 🧠 Cannot explore in parallel
Advantages of Multi-Agent:
- ✅ Handle multiple subtasks in parallel
- ✅ Independent context windows
- ✅ Overall performance improvement
- ✅ Suitable for open-ended research tasks
Architecture diagram:
```
User query
```
Key design points:
Compression is the essence
- Each SubAgent explores in an independent context
- Extracts and returns the most important tokens
- Lead Agent integrates multi-source information
Mind your token usage, don’t max out your credit card
- Single chat: 1×
- Single Agent: 4×
- Multi-Agent: 15×
- Value must match cost
Applicable scenarios
- ✅ Information-gathering tasks
- ✅ Needs for multi-angle analysis
- ✅ Tasks that are easy to parallelize
Source: Anthropic, How we built our multi-agent research system
SubAgent architecture: key strategy for controlling context length
As task complexity increases, the main Agent’s context will rapidly expand, leading to:
- Increased cost: every token has to be paid for
- Increased latency: longer context means slower responses
- Quality degradation: overly long contexts cause the “lost in the middle” problem
SubAgents provide an elegant solution:
```
Main Agent
```
Core value of SubAgents:
- Context isolation: each SubAgent has its own clean context
- Cost control: avoids repeatedly passing irrelevant historical information
- Parallel capability: multiple SubAgents can process different tasks simultaneously
- Result aggregation: only key results are returned to the main Agent
Typical application scenarios:
- Analyzing different modules of code
- Parallel processing of multiple independent tasks
- Executing specialized work that requires large amounts of context
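The pattern above can be sketched as follows. Everything here is illustrative (the task split, `summarize` as a stand-in for LLM compression); the point is that only compact summaries flow back into the main agent's context:

```python
# SubAgent delegation: each subtask runs in a fresh, minimal context,
# and the main agent receives only a compressed report.

def summarize(detail):
    return detail[:60]                           # stand-in for LLM compression

def sub_agent(task, shared_facts):
    context = [f"Task: {task}"] + shared_facts   # clean, isolated context
    # ... the subagent would explore, call tools, accumulate detail ...
    detail = f"Explored '{task}' in {len(context)} context lines."
    return summarize(detail)                     # return only the essence

def main_agent(goal):
    subtasks = [f"{goal}: module {m}" for m in ("auth", "billing")]
    # The main agent's context grows only by the summaries, not the raw work.
    return [sub_agent(t, shared_facts=["Repo: monorepo"]) for t in subtasks]

reports = main_agent("Review code")
print(reports)
```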
Diving into Agent dialogue flow: making thinking explicit
```
user: "Check today's weather in San Francisco for me, then recommend what to wear based on the temperature."
```
Key implementation details:
- Explicit thinking process: make the Agent’s reasoning process visible and debuggable
- Tool call timing: call tools only when external information is clearly needed
- Result explanation: not just showing data, but providing explanations and suggestions
Part 3: Context Engineering
Core ideas of context engineering
“The context is the agent’s operating system.” - Manus
Definition: Systematically designing and managing an agent’s information environment so it can complete tasks efficiently and reliably.
Why it matters:
- Context determines what the agent can “see” 👁️
- Context determines what the agent can “remember” 🧠
- Context determines what the agent can “do” 💪
Sources of contextual information
1. Documented team knowledge
- Technical docs, API docs, design docs
- Meeting notes, decision records, project plans
- Code comments, commit messages, PR descriptions
2. Structured project information
- Data from task management systems
- Code repository structure and history
- Test cases and test results
3. Dynamic execution environment
- Current system state
- Real-time logs and monitoring data
- Immediate user feedback
Strategy 1: Architecture design around KV cache
Core issue: The agent’s input-output ratio is extremely unbalanced (100:1)
Solution: Maximize KV cache hit rate
Best practices:
- Stable prompt prefix: Keep the system prompt unchanged

```
System: You are a helpful assistant...
```

- Append-only context: Only append, never modify

```javascript
messages.push(newMessage)
// Do not modify existing messages
```

- Deterministic serialization: Fixed JSON key ordering

```json
{
  "name": "...",
  "age": "...",
  "city": "..."
}
```

- Cache-friendly tool output: Unified formatting to reduce variation
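Deterministic serialization is easy to get in Python with `json.dumps`. Sorted keys are one deterministic choice (any fixed ordering works); the point is that two logically identical payloads must serialize to byte-identical prefixes, or the KV cache is invalidated:

```python
import json

# Two dicts built by different code paths, same logical content.
profile_a = {"city": "...", "name": "...", "age": "..."}
profile_b = {"name": "...", "age": "...", "city": "..."}

def stable_dump(obj):
    # sort_keys pins the key order; separators pin the whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

print(stable_dump(profile_a) == stable_dump(profile_b))  # byte-identical
```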
Strategy 2: Context compression - a last resort
Core principle: If cost and latency allow, avoid compression whenever possible.
Compression is a tradeoff:
- Benefits: Fewer tokens, lower cost
- Costs: Loss of detail, possible impact on task quality
Based on Claude Code practice, compression is only triggered when nearing context limits:
8-part structured compression template:
```
1. Background Context
```
Choosing when to compress:
- Prefer using SubAgents to control context length
- Next, consider clearing irrelevant history
- Only then use compression algorithms
Strategy 3: TODO List - Intelligent task management system
```
## Current task status (updated in real time)
```
Core value of the TODO system:
- Counteracting forgetting: Keep goals clear in long conversations
- Progress visualization: Both user and agent can see progress
- Priority management: Auto-sorting (in progress > pending > completed)
- Task decomposition: Automatically break complex tasks into executable steps
Intelligent sorting algorithm:
```
// Status priority: in_progress(0) > pending(1) > completed(2)
```
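The sorting rule above is a one-liner in practice. The field names here are illustrative:

```python
# Rank statuses so that active work surfaces first:
# in_progress(0) > pending(1) > completed(2)
STATUS_RANK = {"in_progress": 0, "pending": 1, "completed": 2}

todos = [
    {"task": "write tests", "status": "completed"},
    {"task": "fix parser bug", "status": "in_progress"},
    {"task": "update docs", "status": "pending"},
]

todos.sort(key=lambda t: STATUS_RANK[t["status"]])
print([t["task"] for t in todos])
# → ['fix parser bug', 'update docs', 'write tests']
```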
Strategy 4: Errors are learning opportunities
Traditional approach vs correct approach:
❌ Traditional: Hide errors and retry
```
Action: npm install
```
✅ Correct: Preserve full error context
```
Action: npm install
```
Principle: Error messages help the agent update its world model and learn environmental constraints and dependencies.
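One way to apply this principle is to record the full stderr of a failed command in the agent's transcript rather than swallowing it. This sketch uses only the standard library; the transcript format is an illustrative assumption:

```python
import subprocess
import sys

def run_and_record(cmd, transcript):
    """Run a command and append its outcome - including errors - to the transcript."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    entry = f"Action: {' '.join(cmd)}\nExit code: {proc.returncode}"
    if proc.returncode != 0:
        entry += f"\nStderr: {proc.stderr.strip()}"  # preserve, don't swallow
    transcript.append(entry)
    return proc.returncode

transcript = []
# A command guaranteed to fail: importing a nonexistent package.
run_and_record([sys.executable, "-c", "import nonexistent_pkg"], transcript)
print(transcript[-1])  # the full error context stays visible to the agent
```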
Strategy 5: 6-layer security protection system
Based on Claude Code’s enterprise-grade practice:
```
Layer 1: Input validation
```
Each layer ensures safe execution for the agent, keeping it stable even in complex environments.
Strategy 6: Dynamic reminder injection (System Reminder)
Automatically inject contextual reminders at critical moments:
```
<system-reminder>
```
Trigger conditions:
- TODO list status changes
- Detection of specific operation patterns
- Repeated error patterns
- Security-related operations
This dynamic injection mechanism gives the agent extra guidance at key moments and helps avoid common mistakes.
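A minimal sketch of this mechanism: check simple trigger conditions before each model call and append a `<system-reminder>` block when any fire. The trigger names and wording here are illustrative assumptions:

```python
# Dynamic reminder injection: map agent-state triggers to reminders
# that get appended to the message list before the next model call.

def collect_reminders(state):
    reminders = []
    if state.get("todo_changed"):
        reminders.append("The TODO list changed; re-check task priorities.")
    if state.get("repeated_errors", 0) >= 2:
        reminders.append("The same error occurred twice; try another approach.")
    return reminders

def inject(messages, state):
    for text in collect_reminders(state):
        messages.append({"role": "system",
                         "content": f"<system-reminder>{text}</system-reminder>"})
    return messages

msgs = inject([], {"todo_changed": True, "repeated_errors": 2})
print(len(msgs))  # both triggers fired
```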
Strategy 7: Parallel Sampling - balancing quality and efficiency
Explore multiple independent reasoning paths in parallel, then select or aggregate the best result:
```
User question
```
Use cases:
- Multi-angle analysis of complex problems
- High-reliability critical decisions
- Exploratory tasks with parallel attempts
Implementation points:
- Use different initial prompts or temperature settings
- Choose a reasonable degree of parallelism (typically 3–5)
- Establish an effective result evaluation mechanism
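A sketch of the idea with a majority-vote selector; `sample_answer` mocks the independent reasoning paths (a real system would vary temperature or prompts per path, and might use an LLM judge instead of voting):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def sample_answer(seed):
    """Mocked reasoning path; one path disagrees with the others."""
    return "42" if seed % 3 else "41"

def parallel_sample(n=3):
    # Run n independent paths concurrently, then select by vote.
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(sample_answer, range(n)))
    best, _ = Counter(answers).most_common(1)[0]
    return best

print(parallel_sample())  # → "42": the majority wins over the outlier
```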
Strategy 8: Sequential Revision - iterative refinement
Let the agent reflect on and improve its own outputs:
```
Initial response → Self-evaluation → Identify issues → Generate improvements → Final output
```
Example revision prompt:
```
Please review your previous answer:
```
Best practices:
- Limit the number of revision rounds (typically 2–3)
- Provide concrete dimensions for improvement
- Keep revision history for learning purposes
Source: Thinking in Phases: How Inference-Time Scaling Improves LLM Performance
Design philosophy of context engineering: minimal design, maximal capability (Alita)
Core ideas:
Minimal Predefinition:
- Avoid over-design and over-specification
- Keep the system flexible
- Let the agent adapt to task requirements
Maximal Self-Evolution:
- Build the ability to learn from experience
- Dynamically adapt to new task types
- Continuously optimize execution strategies
Practical insights:
- Don’t try to predefine every possible scenario
- Give the agent enough flexibility to explore solutions
- Simple architectures are often more effective than complex systems
- “Simplicity is the ultimate sophistication”
Part 4: Memory and knowledge systems
Three-layer memory architecture: from short-term to long-term
Based on Claude Code practice, a mature agent needs a three-layer memory system:
```
(Diagram: three-layer memory architecture - short-term / mid-term / long-term)
```
User Memory - the key to personalization
Definition: The agent remembers a specific user’s preferences, history, and context
What is stored:
- User preferences: "I prefer concise code", "Avoid using class components"
- Project context: "We use React 18 + TypeScript", "The API uses GraphQL"
- Historical decisions: "Last time we chose PostgreSQL over MySQL", "We use pnpm for dependency management"
Implementation mechanisms:
- Vector database storage
- Similarity search
- Dynamic injection into system prompts
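The retrieval step can be sketched without a real vector database. Here `embed` is a toy bag-of-words stand-in for an embedding model, and the stored memories mirror the examples above; a production system would use learned embeddings and an ANN index:

```python
import math

def embed(text):
    """Toy bag-of-words 'embedding': word -> count."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

memories = ["I prefer concise code",
            "We use React 18 + TypeScript",
            "We use pnpm for dependency management"]

def recall(query, k=1):
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]   # top matches get injected into the system prompt

print(recall("which package manager do we use for dependency installs?"))
```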
Knowledge Base - crystallized collective intelligence
Definition: Reusable knowledge accumulated across all user–agent interactions
Types of content:
- Solution templates: “How to configure Nginx as a reverse proxy”
- Best practices: “Standard structure for a Python project”
- Problem patterns: “This type of error is usually caused by…”
Knowledge lifecycle:
- Capture: Agent successfully solves a problem
- Refine: Extract generalizable patterns
- Store: Save in structured form
- Reuse: Apply directly to similar problems
Summary: Core points of context engineering
“The context is the agent’s operating system.”
Key insights:
Paradigm shift: from conversation to action
- Chatbot → Agent
- Prompt engineering → Context engineering
Four core agent capabilities
- 🛠️ Tool Use
- 📋 Planning
- 🔧 Error Recovery
- 📚 Learning from experience
Eight practical strategies:
- KV cache optimization - core of architecture design
- Intelligent compression - 8-part structured template
- TODO system - intelligent task management
- Learning from errors - keep full error context
- Security protection - 6-layer protection system
- Dynamic reminders - inject guidance at key moments
- Parallel sampling - balance quality and efficiency
- Sequential revision - iteratively refine outputs
Key Technical Takeaways:
- Intelligent Compression Mechanism: Compress only when necessary, prioritize using SubAgent
- Concurrent Tool Management: Concurrent for read operations, serial for write operations
- Three-layer Memory Architecture: Short-term, mid-term, and long-term memories work together
- Model as Agent: Achieve capability leaps through end-to-end RL training
- Tool Design Principles: Orthogonality, descriptiveness, close to common usage
- Multi-Agent Architecture: Break through the limitations of a single Agent
🚀 The future is agentic. Let’s build it with well-designed context.
Intelligent Injection Mechanism for File Contents
```
User mentions a file → system auto-detects
```
This intelligent injection mechanism ensures that the Agent can efficiently access relevant file contents while avoiding context overload.
Join Pine AI
We are looking for full‑stack engineers who can build SOTA autonomous AI Agents.
Our belief: everyone’s contribution to the company’s valuation should exceed tens of millions of dollars
Requirements to Join Pine AI
🤖 1. Proficient in AI-assisted programming
- 80%+ of code completed via human–AI collaboration
- Coding interview: complete feature development in 2 hours with AI assistance
- All internal systems are built based on AI
💻 2. Passionate about hands-on problem solving
- “Talk is cheap, show me the code”
- Become a combination of architect and product manager
- Directly command AI to reduce information loss
🏗️ 3. Solid software engineering skills
- Complete documentation and testing
- Make code understandable and maintainable by AI
- High-quality engineering practices
🧠 4. Understanding of LLM fundamentals
- Know basic principles and capability boundaries
- The right ways to harness LLMs
- Provide appropriate context and tools
🚀 5. Confidence to tackle world-class challenges
- Pursue SOTA levels
- Grow together with a startup
- Continuously surpass the current state of the art
🎯 Our Mission
By building Agents that can interact with the world in real time and learn from experience, we truly solve users’ pains and get things done.
Enable users to gradually build trust and ultimately hand over important tasks to Pine.
Pine AI - Building Agents That Get Things Done
```shell
mail -s "Join Pine AI" -A /path/to/your_resume.pdf [email protected]
```
Meta: How These Slides Were Created
These Slides themselves are a product of human–AI collaboration, precisely an embodiment of context engineering in real applications.
- First draft generated by AI based on the provided reference materials
- Humans provided direction, structure, and key insights
- AI expanded, organized, and polished the content
- Multiple rounds of iteration for continuous refinement
[These slides were created using Slidev; the original Slidev Markdown is available.]