From Prompt Engineering to Context Engineering: The Secret to Writing Good Agents
[This article is based on a presentation at the Turing Community’s Large Model Technology Learning Camp, Slides Link]
Explore the design philosophy and practical strategies of AI Agents in depth. From the conversational mode of Chatbots to the action mode of Agents, systematically design and manage the information environment of Agents to build efficient and reliable AI Agent systems.
Table of Contents
- Part 1: Paradigm Shift - From Chatbot to Agent
- Part 2: Core Analysis of Agents
- Part 3: Context Engineering
- Part 4: Memory and Knowledge Systems
Part 1: Paradigm Shift - From Chatbot to Agent
From Chatbot to Agent: A Fundamental Paradigm Shift
We are experiencing a fundamental shift in AI interaction modes:
Chatbot Era
- 🗣️ Conversational Interaction: User asks → AI answers → Repetitive Q&A cycle
- 📚 Knowledgeable Advisor: Can only “speak” but not “act,” passively responding to user needs
- 🛠️ Typical Products: ChatGPT, Claude Chat
Agent Era
- 🎯 Autonomous Action Mode: User sets goals → Agent executes → Autonomous planning and decision-making
- 💪 Capable Assistant: Can both “think” and “act,” proactively discovering and solving problems
- 🚀 Typical Products: Claude Code, Cursor, Manus
Core Techniques of Chatbots: Prompt Engineering
Prompt Engineering focuses on optimizing single conversations with LLMs:
🎭 Role Setting: Set a professional background for AI
> You are a seasoned architect...

📝 Clear Instructions: Specify output requirements
> Summarize in 200 words...

📖 Few-Shot Learning: Provide example guidance
> Example 1: Input → Output
> Example 2: Input → Output
💡 Essence: These techniques optimize the quality of conversation, not the effectiveness of actions
Source: OpenAI, Prompt Engineering Guide
Fundamental Limitations of Chatbots
❌ Cannot Act
- Cannot execute code
- Cannot access web pages
- Cannot manipulate files
🔄 Stateless
- Each conversation is relatively independent
- Difficult to handle long-term tasks
- Lacks contextual continuity
🧠 No Memory
- Cannot accumulate experience
- Always “starts from scratch”
- Cannot learn and improve
⏳ Passive Waiting
- Can only respond to user input
- Cannot proactively discover problems
- Lacks autonomy
Part 2: Core Analysis of Agents
What is an AI Agent?
Definition: An AI system capable of perceiving the environment, making autonomous decisions, taking actions, and accumulating experience.
Core Capabilities:
- 🛠️ Tool Use: Interact with the external world through APIs, command lines, etc.
- 📋 Task Planning: Break down complex goals into executable steps
- 🔧 Error Recovery: Adjust strategies when encountering problems
- 📚 Experience Learning: Accumulate knowledge from successes and failures
Source: Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models”
Core Loop of Agents: ReAct Framework
(Diagram: the ReAct loop, starting from the user goal "Help me analyze the sales trends in this CSV file")
Observe → Think → Act → Observe - This loop allows Agents to continuously adapt and solve problems.
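The loop above can be sketched in a few lines of Python. The `llm` and `run_tool` functions here are illustrative stand-ins for a real model API and tool executor, not actual library calls:

```python
# A minimal ReAct-style loop: Think (model call) → Act (tool) → Observe
# (append result), repeated until the model decides to finish.

def llm(history):
    # Stand-in for a model call: decide the next step from the history.
    if not any(step.startswith("Observation:") for step in history):
        return ("act", "read_csv", "sales.csv")
    return ("finish", "Sales rose 12% quarter over quarter.", None)

def run_tool(name, arg):
    # Stand-in tool executor.
    return f"loaded {arg}: 4 quarters of sales data"

def react_loop(goal, max_steps=5):
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        kind, a, b = llm(history)                      # Think
        if kind == "finish":
            return a
        observation = run_tool(a, b)                   # Act
        history.append(f"Observation: {observation}")  # Observe
    return "step budget exhausted"

print(react_loop("analyze sales trends in this CSV"))
```

The key property is that every tool result is fed back into the history before the next model call, which is what lets the Agent adapt mid-task.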
Tool Invocation Example: Web Search
Let’s understand how an Agent invokes tools through a specific example:
(Code example: a user request invoking a web-search tool)
Core Implementation of the Agent Loop
Based on an in-depth analysis of Claude Code, the main loop of a production-grade Agent includes the following key stages:
(Diagram: Agent Loop execution flow)
Six-Stage Pipeline for Tool Execution
A production-grade Agent must have a strict tool execution process to ensure safety and reliability:
(Diagram: the six-stage pipeline, beginning with Stage 1: tool discovery and validation)
Concurrent Tool Management: The Key to Smart Scheduling
Core Issue: How to safely and efficiently execute multiple tools concurrently?
(Code example: concurrency-safety classification of tools)
Key Design Principles:
- ✅ Read operations can be safely concurrent
- ⚠️ Write operations must be executed serially
- ⚡ Dynamic scheduling to optimize performance
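These principles can be sketched as a simple scheduler. The tool names and the read/write classification below are illustrative assumptions, not part of any real Agent's API:

```python
# Sketch: run read-only tools concurrently, write tools strictly serially.
from concurrent.futures import ThreadPoolExecutor

READ_ONLY = {"read_file", "search_web"}  # assumed safe to parallelize

def execute(call):
    name, arg = call
    return f"{name}({arg}) done"

def schedule(calls):
    reads = [c for c in calls if c[0] in READ_ONLY]
    writes = [c for c in calls if c[0] not in READ_ONLY]
    results = []
    with ThreadPoolExecutor() as pool:   # reads: concurrent
        results.extend(pool.map(execute, reads))
    for call in writes:                  # writes: one at a time
        results.append(execute(call))
    return results

print(schedule([("read_file", "a.txt"),
                ("search_web", "kv cache"),
                ("write_file", "out.txt")]))
```

A production scheduler would also track which files each write touches, so that independent writes could still be interleaved safely.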
The Future of Tool Systems: From “Using Tools” to “Creating Tools”
Current Status: The toolset of Agents is predefined by developers and is static.
Future: Agents can autonomously evolve their toolsets.
Technical Path:
- MCP (Model Context Protocol): A standardized context protocol to simplify tool development
- Automatic Tool Generation: When Agents encounter new problems, they can:
- Search Open Source Code: Look for relevant implementations on platforms like GitHub
- Automatic Wrapping: Wrap the found code into MCP tools
- Test and Verify: Test new tools in a sandbox environment
- Persist and Reuse: Add effective tools to the tool library for future use
Significance: This will transform Agents from passive tool users to proactive tool creators, greatly expanding their capability boundaries.
MCP (Model Context Protocol) - Standardization of Tool Development
What is MCP?
Model Context Protocol is an open protocol launched by Anthropic to standardize the connection between LLMs and external data sources and tools.
Core Features:
- Standardized Interface - Unified tool definition format
- Language Agnostic - Supports Python, JS, Go, etc.
- Plug and Play - Simple server/client model
- Secure Isolation - Built-in permissions and sandbox mechanisms
MCP Tool Example:
(Code example: MCP server definition)
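To illustrate the standardized interface, a tool declaration in the MCP style can be sketched as a plain JSON document. The field names follow the protocol's general shape, but this is a simplified sketch rather than a complete, spec-conformant definition:

```python
# A hypothetical MCP-style tool declaration: name, description (including
# limitations, per the tool-design principles later in this article), and
# a JSON Schema describing the input.
import json

weather_tool = {
    "name": "get_weather",
    "description": ("Get the current weather for a city. "
                    "Limitation: city names must be in English."),
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

print(json.dumps(weather_tool, indent=2))
```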
Mainstream Model Tool Invocation
Revolutionary Design of Claude Sonnet 4
Key Innovation: Special Token Instead of XML
Using special token sequences to mark tool invocation, offering advantages:
- No Escaping Needed: Code, JSON, special characters can be directly passed
- What Problem It Solves:

```
# Traditional XML/JSON approaches require escaping
{"code": "print(\"Hello, World!\")"}  # quotes must be escaped

# Claude's special-token approach
<tool_use>print("Hello, World!")</tool_use>  # passed through directly
```

- Why It's Important: Avoids numerous escaping errors when handling code generation and complex data structures
Source: Anthropic Docs, Tool Use
Gemini 2.5 Pro: Native Multimodal Integration
Core Advantages:
- Native Google Search: Can use `Google Search` directly as a built-in tool. The model can autonomously determine when real-time information is needed, perform searches automatically, and return verifiable results with source references.
- Multimodal Input: Can directly take images, short videos, audio, and even large PDF documents as input, with the model natively understanding these contents
- Long Document Processing: Can directly process documents up to 2M tokens, including PDF, Word formats
Unique Capability Example:
(Code example: Python snippet beginning with `from google import genai`)
Revolutionary Capability of Claude: Computer Use
Definition: This is not a simple tool but grants the Agent the ability to directly operate a complete, isolated Linux virtual computer environment.
Core Capabilities:
- File Operations: `read_file('data.csv')`, `write_file('analysis.py', 'import pandas as pd...')`
- Command Execution: `run_command('python analysis.py')`, `run_command('ls -l | grep .csv')`
- GUI Operations: Screenshot, click, input text, operate applications
- Complete Environment: Possesses file system, network, process management, and other complete operating system capabilities
Workflow:
1. Provide Tools and Prompts
   - Add computer use tools to API requests
   - User prompt: "Save a picture of a cat to the desktop"
2. Claude Decides to Use Tools
   - Evaluate the need for desktop interaction
   - Construct tool invocation request
3. Execute and Return Results
   - Extract tool input (screenshot/click/type)
   - Perform operations in a virtual machine, return execution results
4. Loop Until Completion
   - Claude analyzes results
   - Continue tool invocation or complete the task
Supported Operations:
- 📸 Screenshot capture
- 🖱️ Mouse control (click, drag)
- ⌨️ Keyboard input
- 🖥️ Command execution
Computing Environment Requirements:
- Sandbox Isolation: Dedicated virtual machine or container, secure boundary protection
- Virtual Display: Recommended resolutions 1024x768, 1280x720, or 1920x1080 (these must be adhered to exactly, otherwise mouse-click coordinates may be inaccurate)
- Operation Processor: Screenshot capture, mouse action simulation, keyboard input processing
Agent Loop Implementation Example:
(Code example: the agent loop, beginning `while True:`)
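The `while True:` loop can be sketched end to end as follows. The `call_model` and `run_in_vm` functions are stand-ins for the real model API and the sandboxed VM executor, and the two-turn transcript is a deliberately toy scenario:

```python
# Sketch of a computer-use agent loop: call the model, execute its
# requested action in a sandboxed VM, feed the result back, repeat.

def call_model(transcript):
    # Stand-in model: ask for a screenshot first, then declare success.
    if "screenshot taken" not in transcript:
        return {"type": "tool_use", "action": "screenshot"}
    return {"type": "final", "text": "Saved the cat picture to the desktop."}

def run_in_vm(action):
    # Stand-in for executing a GUI action inside the virtual machine.
    return f"{action} taken"

def agent_loop(prompt, max_turns=10):
    transcript = prompt
    for _ in range(max_turns):
        reply = call_model(transcript)
        if reply["type"] == "final":         # task complete
            return reply["text"]
        result = run_in_vm(reply["action"])  # execute in the sandbox
        transcript += f"\n{result}"          # feed the result back
    return "turn budget exhausted"

print(agent_loop("Save a picture of a cat to the desktop"))
```

Note the `max_turns` cap: a real loop needs a budget so a confused model cannot click forever.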
Security Considerations:
- Manual confirmation for sensitive operations
- Operation log audit
- Prompt injection protection
Revolutionary Significance:
- Elevates the Agent from an “API caller” to a “programmer” and “data analyst”
- Capable of completing end-to-end complex tasks previously unimaginable
- Truly realizes the vision of “giving AI a computer to solve problems on its own”
Source: Anthropic Docs, Computer Use
Evolution Direction of Agent: Independent Virtual Operating System
Current Limitations: Current Agents are mostly confined to a browser tab or a temporary Shell instance, which is “single-threaded” and unable to handle complex tasks requiring persistent state and multi-application collaboration.
Evolution Direction: Each Agent should have its own dedicated, isolated, persistent virtual operating system environment.
In this environment, it can:
- Handle Multiple Tasks in Parallel: Like humans using a computer, open and operate multiple application windows simultaneously (Terminal, Browser, IDE, File Manager)
- Freely Switch Contexts: Seamlessly switch between different task windows, managing each task’s independent state and history
- Have a Persistent File System: Save work results, manage project files, install new software
- Network Communication Capability: Interact with other Agents or services
This is the inevitable direction for Agents to evolve towards more powerful capabilities.
Model as Agent: Achieving Capability Leap through RL
Based on Kimi-Researcher’s practice, the real breakthrough in Agent capabilities comes from end-to-end reinforcement learning training:
Limitations of Traditional Methods:
- Workflow Systems: Rely on manually designed multi-agent collaboration, difficult to adapt to environmental changes
- Imitation Learning (SFT): Limited by human demonstration data, difficult to handle long-term tasks
Advantages of End-to-End RL:
- Overall Optimization: Planning, perception, tool usage, and other capabilities are learned together
- Adaptability: Naturally adapts to changes in tools and environment
- Exploratory Learning: Discover optimal strategies through extensive exploration
Practical Results: Kimi-Researcher improved accuracy from 8.6% to 26.9% through RL training, averaging 23 inference steps, exploring over 200 URLs.
Source: Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities
Core Principles of Tool Design
💡 Core Concept: Allow the model to leverage pre-trained knowledge, reducing the difficulty of learning prompts
1️⃣ Orthogonality
- Each tool solves an independent problem, avoiding functional overlap
- ✅ Good examples:
  - `search_web` / `search_knowledge_base`: clearly distinguish search scopes
  - `text_browser` / `visual_browser`: clearly distinguish processing methods
- ❌ Bad examples:
  - `search_google` / `search_milvus`: expose the underlying implementation
  - `browse_url` / `browse_page`: unclear functional overlap
2️⃣ Descriptive Naming
- Names directly reflect functionality, leveraging the model’s language understanding ability
- ✅ Good examples:
  - `web_search_agent`, `phone_call_agent`: clearly indicate functionality
  - `execute_code`, `read_file`: verb+noun structure
- ❌ Bad examples:
  - `daisy`, `ivy`: abstract codenames that reveal nothing about functionality
  - `sw`, `exec`: overly abbreviated
- Tool descriptions need to be clear and unambiguous
- Describe tool limitations (e.g., phone can only call domestic numbers)
3️⃣ Familiar Patterns
- Parameters align with developer intuition
- File operations resemble Unix commands
- Search resembles Google API
4️⃣ Allow Agent to Handle Errors Autonomously
- Return detailed error information when tool invocation fails, allowing the Agent to decide the next step
- Utilize SOTA LLM’s programming capabilities
- ❌ Bad example: `Phone Call Failed`
- ✅ Good example: `Phone Call Failed: failed to initiate call: phone number format error`
These principles enable the model to leverage knowledge learned during pre-training, reducing the cost of learning new tools.
Multi-Agent Architecture: Breaking Through the Limitations of a Single Agent
Based on Anthropic Research Feature’s practice:
Why Multi-Agent?
Limitations of a Single Agent:
- 💰 High context window cost
- ⏱️ Inefficient sequential execution
- 📉 Decline in long context quality (“lost in the middle” problem)
- 🧠 Unable to explore in parallel
Advantages of Multi-Agent:
- ✅ Parallel processing of multiple subtasks
- ✅ Independent context windows
- ✅ Overall performance improvement
- ✅ Suitable for open-ended research tasks
Architecture Diagram:
(Architecture diagram: a user query dispatched by the Lead Agent to parallel SubAgents)
Key Design:
Compression is Essential
- Each SubAgent explores in an independent context
- Extracts the most important tokens to return
- Lead Agent synthesizes multi-source information
Mind Token Usage, Don’t Blow Up the Credit Card
- Single chat: 1×
- Single Agent: 4×
- Multi-Agent: 15×
- Value must match cost
Applicable Scenarios
- ✅ Information gathering tasks
- ✅ Multi-angle analysis needs
- ✅ Tasks easily parallelized
Source: Anthropic, How we built our multi-agent research system
SubAgent Architecture: Key Strategies for Controlling Context Length
As task complexity increases, the main Agent’s context rapidly expands, leading to:
- Increased Cost: Each token incurs a fee
- Increased Latency: Longer context means slower response
- Quality Decline: Overly long context leads to “lost in the middle” problem
SubAgent provides an elegant solution:
(Diagram: the main Agent delegating isolated subtasks to SubAgents)
Core Value of SubAgent:
- Context Isolation: Each SubAgent has its own clean context
- Cost Control: Avoid redundant transmission of irrelevant historical information
- Parallel Capability: Multiple SubAgents can handle different tasks simultaneously
- Result Aggregation: Only return key results to the main Agent
Typical Application Scenarios:
- Analyzing code from different modules
- Parallel processing of multiple independent tasks
- Performing specialized tasks requiring extensive context
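The context-isolation idea can be sketched as follows. The `summarize` function is a stand-in for a model call; the point is that a SubAgent consumes a large body of material in its own context and returns only a distilled result to the main Agent:

```python
# Sketch: each SubAgent works over a large, isolated context and returns
# only a short summary, so the main Agent's context stays small.

def summarize(context):
    # Stand-in for a model call that compresses its input.
    return f"summary({len(context)} chars)"

def run_subagent(task, documents):
    sub_context = "\n".join(documents)  # isolated, possibly huge
    return summarize(sub_context)       # only the distilled result leaves

main_context = ["user goal: audit two modules"]
for task, docs in [("module A", ["..." * 1000]),
                   ("module B", ["..." * 2000])]:
    main_context.append(run_subagent(task, docs))  # main context stays small

print(main_context)
```

Each SubAgent here reads thousands of characters, but the main Agent only ever sees a one-line summary per subtask.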
In-Depth Agent Dialogue Flow: Making Thought Processes Explicit
(Dialogue example. User: "Check today's weather in San Francisco for me, then recommend what to wear based on the temperature.")
Key Implementation Details:
- Explicit Thought Process: Make the Agent’s reasoning process visible and debuggable
- Tool Invocation Timing: Invoke only when external information is clearly needed
- Result Interpretation: Not just displaying data, but providing explanations and suggestions
Part 3: Context Engineering
Core Concepts of Context Engineering
“The context is the agent’s operating system.” - Manus
Definition: Systematically designing and managing the Agent’s information environment to enable efficient and reliable task completion.
Why Important:
- Context determines what the Agent can “see” 👁️
- Context determines what the Agent can “remember” 🧠
- Context determines what the Agent can “do” 💪
Sources of Context Information
1. Documented Team Knowledge
- Technical documentation, API documentation, design documents
- Meeting notes, decision records, project plans
- Code comments, commit messages, PR descriptions
2. Structured Project Information
- Data from task management systems
- Structure and history of code repositories
- Test cases and test results
3. Dynamic Execution Environment
- Current system status
- Real-time logs and monitoring data
- Instant user feedback
Strategy One: Architecture Design Around KV Cache
Core Issue: The input-output ratio of the Agent is extremely unbalanced (100:1)
Solution: Maximize KV cache hit rate
Best Practices:
- Stable Prompt Prefix: Keep system prompts unchanged
  - e.g. `System: You are a helpful assistant...`
- Append-only Context: Only append, do not modify
  - e.g. `messages.push(newMessage)  // do not modify existing messages`
- Deterministic Serialization: Fixed order of JSON keys
  - e.g. always serialize as `{"name": ..., "age": ..., "city": ...}`
- Cache-friendly Tool Output: Uniform format, reduce changes
Strategy Two: Context Compression - A Last Resort
Core Principle: Avoid compression unless necessary due to cost and latency constraints.
Compression is a trade-off:
- Benefits: Reduces token usage, lowers costs
- Costs: Loss of information detail, may affect task quality
Based on Claude Code’s practice, compression is only triggered when approaching context limits:
8-Section Structured Compression Template:
(Template excerpt, beginning with section 1: Background Context)
Timing for Compression:
- Prefer using SubAgent to control context length
- Next, consider cleaning irrelevant historical information
- Finally, use compression algorithms
Strategy Three: TODO List - Intelligent Task Management System
(Example TODO list: "Current task status (updated in real time)")
Core Value of the TODO System:
- Combat Forgetfulness: Maintain clear goals in long conversations
- Progress Visualization: Both user and Agent can see progress
- Priority Management: Automatic sorting (in progress > pending > completed)
- Subtask Decomposition: Automatically break down complex tasks into executable steps
Intelligent Sorting Algorithm:
(Code example: status priority, in_progress(0) > pending(1) > completed(2))
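The priority ordering described above amounts to a one-line sort key. A minimal sketch in Python (the task names are made up for illustration):

```python
# Sort TODO items by status priority: in_progress(0) > pending(1) > completed(2).
STATUS_RANK = {"in_progress": 0, "pending": 1, "completed": 2}

todos = [
    {"task": "write tests", "status": "completed"},
    {"task": "fix parser", "status": "in_progress"},
    {"task": "update docs", "status": "pending"},
]

todos.sort(key=lambda t: STATUS_RANK[t["status"]])
print([t["task"] for t in todos])  # in-progress work surfaces first
```

Because Python's sort is stable, items with the same status keep their original relative order, which preserves any manual ordering within a status group.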
Strategy Four: Errors as Learning Opportunities
Traditional Approach vs Correct Approach:
❌ Traditional Approach: Hide errors, retry
(Log excerpt: `Action: npm install` with the error output discarded)
✅ Correct Approach: Retain complete error context
(Log excerpt: `Action: npm install` with the complete error output retained)
Principle: Error information helps the Agent update its world model, learning environmental constraints and dependencies.
Strategy Five: Six-layer Security Protection System
Enterprise-level practice based on Claude Code:
(Diagram: the six protection layers, beginning with Layer 1: input validation)
Each layer provides security assurance for the Agent’s safe execution, ensuring stable operation even in complex environments.
Strategy Six: Dynamic Reminder Injection (System Reminder)
Automatically inject context reminders at critical moments:
(Example: a `<system-reminder>` block injected into the context)
Trigger Conditions:
- Changes in TODO list status
- Detection of specific operation patterns
- Repeated occurrence of error patterns
- Security-related operations
This dynamic injection mechanism provides the Agent with additional guidance at critical moments, avoiding common mistakes.
Strategy Seven: Parallel Sampling - Balancing Quality and Efficiency
Explore multiple independent inference paths in parallel, then select or synthesize the best result:
(Diagram: the user question fanned out to multiple parallel inference paths)
Application Scenarios:
- Multi-angle analysis of complex problems
- Critical decisions requiring high reliability
- Parallel attempts for exploratory tasks
Implementation Points:
- Use different initial prompts or temperature parameters
- Set reasonable parallelism (usually 3-5)
- Establish an effective result evaluation mechanism
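The implementation points above can be sketched as follows. The `sample` and `score` functions are stand-ins for a model call and a result-evaluation mechanism; in practice scoring might itself be another model call:

```python
# Sketch: sample candidates in parallel at different temperatures,
# then pick the best one with a scoring function.
from concurrent.futures import ThreadPoolExecutor

def sample(temperature):
    # Stand-in for one independent inference path.
    return {"temperature": temperature, "answer": f"answer@{temperature}"}

def score(candidate):
    # Stand-in evaluator; here it simply prefers lower temperature.
    return 1.0 - candidate["temperature"]

def parallel_sample(temperatures=(0.2, 0.7, 1.0)):  # parallelism of 3
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(sample, temperatures))
    return max(candidates, key=score)

print(parallel_sample()["answer"])
```

The parallelism of 3 matches the 3-to-5 range suggested above; beyond that, the marginal quality gain rarely justifies the token cost.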
Strategy Eight: Sequential Revision - Iterative Optimization
Allow the Agent to reflect and improve on its output:
Initial Response → Self-assessment → Identify Issues → Generate Improvements → Final Output
Revision Prompt Example:
(Prompt example, beginning: "Please review the answer you just gave: ...")
Best Practices:
- Limit revision rounds (usually 2-3 rounds)
- Provide specific improvement dimensions
- Retain revision history for learning
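These best practices fit into a short loop. The `critique` and `revise` functions are stand-ins for model calls, the round cap follows the 2-to-3 recommendation above, and the full history is retained as suggested:

```python
# Sketch of a bounded sequential-revision loop with history retention.

def critique(answer):
    # Stand-in self-assessment: flag an issue, or None if acceptable.
    return "too vague" if "detail" not in answer else None

def revise(answer, issue):
    # Stand-in improvement step.
    return answer + " (added detail)"

def sequential_revision(draft, max_rounds=3):
    history = [draft]
    for _ in range(max_rounds):
        issue = critique(history[-1])
        if issue is None:                       # good enough, stop early
            break
        history.append(revise(history[-1], issue))
    return history[-1], history                 # keep history for learning

final, history = sequential_revision("initial answer")
print(final)
```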
Source: Thinking in Phases: How Inference-Time Scaling Improves LLM Performance
Design Philosophy of Context Engineering: Minimal Design, Maximum Capability (Alita)
Core Concepts:
Minimal Predefinition:
- Avoid over-design and presets
- Maintain system flexibility
- Allow the Agent to adapt based on task needs
Maximal Self-Evolution:
- Ability to learn and accumulate experience
- Dynamically adapt to new task types
- Continuously optimize execution strategies
Practical Insights:
- Do not attempt to predefine all possible scenarios
- Give the Agent enough flexibility to explore solutions
- Simple architectures are often more effective than complex systems
- “Simplicity is the ultimate sophistication”
Part 4: Memory and Knowledge Systems
Three-layer Memory Architecture: From Short-term to Long-term
Based on Claude Code’s practice, a mature Agent requires a three-layer memory system:
(Diagram: the three-layer memory architecture)
User Memory - The Key to Personalization
Definition: The Agent remembers specific user preferences, history, and context
Stored Content:
- User Preferences: “I prefer concise code style”, “Avoid using class components”
- Project Context: “We use React 18 + TypeScript”, “API uses GraphQL”
- Historical Decisions: “Last time we chose PostgreSQL over MySQL”, “Use pnpm for dependency management”
Implementation Mechanism:
- Vector database storage
- Similarity retrieval
- Dynamic injection into system prompts
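The three-step mechanism can be sketched end to end. The bag-of-words similarity below is a toy stand-in for a real embedding model and vector database, and the stored memories are the examples from this section:

```python
# Sketch: store memories, retrieve the most similar ones for the current
# query, and inject them into the system prompt.
from collections import Counter

memories = [
    "I prefer concise code style",
    "We use React 18 + TypeScript",
    "Use pnpm for dependency management",
]

def similarity(a, b):
    # Toy word-overlap score standing in for embedding similarity.
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    return sum((wa & wb).values())

def recall(query, k=1):
    return sorted(memories, key=lambda m: similarity(query, m), reverse=True)[:k]

hits = recall("use pnpm or npm for dependency management")
system_prompt = "You are a coding assistant.\nUser memory:\n" + "\n".join(hits)
print(hits)
```

Only the top-k relevant memories are injected, so the system prompt grows with relevance rather than with the user's entire history.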
Knowledge Base - The Crystallization of Collective Wisdom
Definition: Reusable knowledge accumulated from interactions between all users and the Agent
Content Types:
- Solution Templates: “How to configure Nginx reverse proxy”
- Best Practices: “Standard structure for Python projects”
- Problem Patterns: “This type of error is usually because…”
Knowledge Lifecycle:
- Capture: Agent successfully solves a problem
- Refinement: Extract general patterns
- Storage: Structurally save
- Reuse: Directly apply to similar problems
Summary: Core Points of Context Engineering
“The context is the agent’s operating system.”
Core Insights:
Paradigm Shift: From Conversation to Action
- Chatbot → Agent
- Prompt Engineering → Context Engineering
Four Major Capabilities of Agents
- 🛠️ Tool Use
- 📋 Task Planning
- 🔧 Error Recovery
- 📚 Learning from Experience
Eight Practice Strategies:
- KV Cache Optimization - Core of architecture design
- Intelligent Compression - 8-section structured template
- TODO System - Intelligent task management
- Error Learning - Retain complete error context
- Security Protection - Six-layer protection system
- Dynamic Reminders - Injection at critical moments
- Parallel Sampling - Balancing quality and efficiency
- Sequential Revision - Iterative output optimization
Technical Summary:
- Intelligent Compression Mechanism: Compress only when necessary, prioritize using SubAgent
- Concurrent Tool Management: Concurrent read operations, serial write operations
- Three-layer Memory Architecture: Short-term, mid-term, and long-term memory work together
- Model as Agent: Achieve capability leap through end-to-end RL training
- Tool Design Principles: Orthogonality, descriptiveness, close to common usage
- Multi-Agent Architecture: Overcome the limitations of a single Agent
🚀 The future is agentic. Let’s build it with well-designed context.
Intelligent Injection Mechanism for File Content
(Flow: user mentions a file → system detects it automatically → content injected into context)
This intelligent injection mechanism ensures that the Agent can efficiently access relevant file content while avoiding context overload.
Join Pine AI
We are looking for full-stack engineers capable of building SOTA autonomous AI Agents.
Our philosophy: every person's contribution to the company's valuation should exceed ten million dollars
Requirements to Join Pine AI
🤖 1. Proficient in AI Programming
- 80%+ of the code is completed through human-machine collaboration
- Code interview: Complete feature development in 2 hours with AI assistance
- All internal systems are built on AI
💻 2. Passionate about Hands-on Problem Solving
- “Talk is cheap, show me the code”
- Become a combination of architect and product manager
- Directly command AI to reduce information loss
🏗️ 3. Solid Software Engineering Skills
- Comprehensive documentation and testing
- Enable AI to understand and maintain code
- High-quality engineering practices
🧠 4. Understanding of LLM Principles
- Understand basic principles and capability boundaries
- Master the correct methods to harness LLM
- Provide appropriate context and tools
🚀 5. Confidence in Solving World-Class Problems
- Pursue SOTA levels
- Grow with the startup
- Continuously surpass existing levels
🎯 Our Mission
By building Agents that can interact with the world in real-time and learn from experience, we truly solve problems for users and get things done.
Gradually build user trust, ultimately entrusting important tasks to Pine.
Pine AI - Building Agents That Get Things Done
`mail -s "Join Pine AI" -A /path/to/your_resume.pdf boj@19pine.ai`
Meta Information: How This Slide Deck Was Created
This slide deck is itself a product of human-machine collaboration, embodying context engineering in practical application.
- The draft was generated by AI based on provided reference materials
- Humans provided direction, structure, and key insights
- AI was responsible for expansion, organization, and refinement
- Multiple iterations for continuous improvement
【This slide deck was created using Slidev. Original Slidev Markdown】