This article is compiled from the first livestream of the Turing Community “AI Agent Bootcamp”. Slides link

Turing Community “AI Agent Bootcamp” purchase link

Start here to develop an AI Agent of your own. This article not only systematically introduces the foundational technical path to building a general-purpose AI Agent from scratch (such as context engineering, RAG systems, tool use, multimodal interaction, etc.), but also covers advanced techniques like fast/slow thinking and multi-Agent collaboration. Through 9 weeks of hands-on projects, you will gradually master the full lifecycle of Agent development and key advanced capabilities.

This course had its first livestream preview on August 18 and will officially start on September 11. Each weekly session is about 2 hours and covers all the foundational and advanced content below. Of course, just 2 hours of lectures per week won’t be enough—you’ll also need to spend time coding and practicing.

Bootcamp Core Objectives

Start here to build an AI Agent of your own

🎯 Master core architecture and engineering skills

  • Deeply understand Agent architecture: Systematically master the core design paradigm of LLM + context + tools.
  • Excel at context engineering: Master multi-level context management techniques, from conversation history and user long-term memory to external knowledge bases (RAG) and file systems.
  • Master dynamic tool use: Reliably integrate Agents with external APIs and MCP servers, and enable self-improvement via code generation.
  • Build advanced Agent patterns: Design and implement complex collaboration patterns such as fast/slow thinking (Mixture-of-Thoughts) and Orchestration.

💡 Build a systematic understanding of development and deployment

  • Understand the evolution path: See the progression from basic RAG to Agents that can autonomously develop tools.
  • Master the Agent lifecycle: Be able to independently complete the closed loop of Agent project design, development, evaluation with LLM as a Judge, and deployment.
  • Build domain knowledge: Accumulate cross-domain Agent development experience through hands-on projects in law, academia, programming, and more.
  • Consolidate a knowledge system: Co-create the book “AI Agents, Explained” to systematize fragmented knowledge.

9-Week Hands-on Plan Overview

| Week | Topic | Content Overview | Hands-on Case |
|------|-------|------------------|---------------|
| 1 | Agent Basics | Agent structure and taxonomy; workflow-based vs autonomous | Build an Agent that can search the web |
| 2 | Context Design | Prompt templates; conversation history; user long-term memory | Add persona and long-term memory to your Agent |
| 3 | RAG and Knowledge Bases | Document structuring; retrieval strategies; incremental updates | Build a legal Q&A Agent |
| 4 | Tool Use and MCP | Tool wrapping and MCP integration; external API calls | Connect to an MCP server to build a deep-research Agent |
| 5 | Programming and Code Execution | Codebase understanding; reliable code edits; consistent execution environments | Build an Agent that can develop Agents |
| 6 | Model Evaluation and Selection | Model capability evaluation; LLM as a Judge; safety guardrails | Build a benchmark and auto-evaluate Agents with LLM as a Judge |
| 7 | Multimodal and Real-time Interaction | Real-time voice Agents; operating PCs and phones | Implement a voice-call Agent & integrate browser-use for computer control |
| 8 | Multi-Agent Collaboration | A2A communication protocol; Agent team roles and collaboration | Design a multi-Agent system to “make calls while operating a computer” |
| 9 | Project Integration and Demo | Final assembly and demo; polishing the final deliverable | Showcase your unique general-purpose Agent |

9-Week Advanced Topics

| Week | Topic | Advanced Content Overview | Advanced Hands-on Case |
|------|-------|---------------------------|------------------------|
| 1 | Agent Basics | The importance of context | Explore the impact of missing context on Agent behavior |
| 2 | Context Design | Organizing user memory | Build a personal knowledge management Agent for long-text summarization |
| 3 | RAG and Knowledge Bases | Long-context compression | Build an academic paper analysis Agent to summarize core contributions |
| 4 | Tool Use and MCP | Learning from experience | Enhance the deep-research Agent’s expertise (sub-agents and domain experience) |
| 5 | Programming and Code Execution | Agent self-evolution | Build an Agent that autonomously leverages open-source software to solve unknown problems |
| 6 | Model Evaluation and Selection | Parallel sampling and sequential revision | Add parallelism and revision to the deep-research Agent |
| 7 | Multimodal and Real-time Interaction | Combining fast and slow thinking | Implement a real-time voice Agent that combines fast and slow thinking |
| 8 | Multi-Agent Collaboration | Orchestration Agent | Use an Orchestration Agent to dynamically coordinate calling and computer operation |
| 9 | Project Integration and Demo | Comparing Agent learning methods | Compare four ways Agents learn from experience |

AI Agent Bootcamp Overview

Week 1: Agent Basics

Core Content

Agent structure and taxonomy

Workflow-based

  • Predefined flows and decision points
  • Highly deterministic; suitable for automating simple business processes

Autonomous

  • Dynamic planning and self-correction
  • Highly adaptable; suitable for open-ended research and exploration, solving complex problems

Basic frameworks and scenario fit

ReAct framework: Observe → Think → Act

Agent = LLM + context + tools

  • LLM: decision core (the brain)
  • Context: perceives the environment (eyes and ears)
  • Tools: interact with the world (hands)
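The LLM + context + tools decomposition above can be sketched as a minimal ReAct-style loop. This is an illustrative skeleton, not the course's reference implementation: `call_llm` is a stub standing in for any chat-completion API, and `web_search` is a stubbed tool.

```python
# Minimal ReAct-style loop: the LLM (decision core) repeatedly reads the
# context (perception) and either calls a tool (action) or answers.

def call_llm(context):
    # Stub for a real chat-completion call. This toy "model" asks for a
    # search first, then answers once an observation is available.
    if not any(step[0] == "observation" for step in context):
        return {"action": "web_search", "input": "agent definition"}
    return {"answer": "An agent is an LLM that acts through tools."}

def web_search(query):
    return f"results for: {query}"  # stubbed tool

TOOLS = {"web_search": web_search}

def run_agent(question, max_steps=5):
    context = [("user", question)]          # the Agent's entire world
    for _ in range(max_steps):
        decision = call_llm(context)
        if "answer" in decision:            # information sufficient
            return decision["answer"]
        tool = TOOLS[decision["action"]]    # act on the environment
        observation = tool(decision["input"])
        context.append(("action", decision))
        context.append(("observation", observation))
    return "step budget exhausted"

print(run_agent("What is an agent?"))
```

Note how the loop mirrors Observe → Think → Act: each tool result is appended to the context before the next LLM call.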

Hands-on Case: Build an Agent that can search the web

Goal: Build a basic autonomous Agent that can understand user questions, fetch information via a search engine, and produce a summarized answer.

Core challenges:

  • Task decomposition: Break complex questions into searchable keywords
  • Tool definition: Define and implement a web_search tool
  • Result synthesis: Understand search results and synthesize the final answer
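For the tool-definition challenge, one common approach is to pair a JSON-schema description (what the LLM sees) with the Python function that actually runs. The field names below follow the widespread function-calling convention; the search backend is stubbed, and all names are illustrative.

```python
# A tool has two halves: a schema the model reads, and code we execute.

WEB_SEARCH_SPEC = {
    "name": "web_search",
    "description": "Search the web and return a list of result snippets.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "search keywords"},
            "top_k": {"type": "integer", "description": "results to return"},
        },
        "required": ["query"],
    },
}

def web_search(query, top_k=3):
    # Stub: a real version would call a search API (SerpAPI, Bing, etc.).
    return [{"title": f"result {i} for {query}", "url": "https://example.com"}
            for i in range(top_k)]

# The spec is serialized into the prompt; the function runs on tool calls.
print(WEB_SEARCH_SPEC["name"])
print(len(web_search("AI agents", top_k=2)))
```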

Architecture design:

User question
  → LLM thinking
  → (needs search) call the web_search tool
  → search engine API
  → search results fed back to LLM thinking (loop until information is sufficient)
  → (information sufficient) generate the final answer

Advanced Content: The importance of context

Core idea: Context is the Agent’s operating system. It is the Agent’s only basis for perceiving the world, making decisions, and recording history.

Thinking

  • The Agent’s inner monologue and chain of thought
  • If missing: Turns the Agent’s behavior into a black box, making it impossible to debug or understand its decisions

Tool Call

  • The actions the Agent decides to take, recording its intent
  • If missing: You can’t trace the Agent’s action history, making retrospectives difficult

Tool Result

  • Environmental feedback from actions taken
  • If missing: The Agent can’t perceive the consequences of its actions, potentially causing infinite retries or flawed plans

Advanced Practice: Exploring the impact of missing context on Agent behavior

Goal: Through experiments, understand the indispensable roles of thinking, tool call, and tool result in the Agent workflow.

Core challenges:

  • Modify the Agent framework: Change the Agent’s core loop to selectively remove specific parts from the context
  • Design comparative experiments: Create tasks where Agents missing different context parts exhibit clear behavioral differences or failures
  • Behavior analysis: Analyze and summarize what types of failures are caused by each missing context component

Experiment design:

Task ─┬─► full-context Agent      → success
      ├─► no-Tool-Call Agent      → abnormal / hard-to-interpret behavior
      └─► no-Tool-Result Agent    → infinite retries / faulty planning

Week 2: Context Design (Context Engineering)

Core Content

Prompt templates

  • System prompt: Define the Agent’s persona, capability boundaries, and behavioral guidelines
  • Toolset: Names, descriptions, and parameters of tools

Conversation history and user memory

  • Event sequence: Model conversation history as an alternating sequence of “observations” and “actions”
  • User long-term memory: Extract key information from conversations (e.g., preferences, personal info) and store it in structured form for future interactions

Hands-on Case: Add persona and long-term memory to your Agent

Goal: Enhance the Agent’s personalization and continuity. The Agent should mimic a specific character’s speaking style (e.g., an anime character) and remember the user’s key information (e.g., name, interests) to apply in subsequent conversations.

Core challenges:

  • Role-playing: How to clearly define the character’s linguistic style and personality in the prompt and keep the Agent stably in character
  • Memory extraction and storage: How to accurately extract key information from unstructured dialogue and store it as a structured JSON object
  • Memory application: How to naturally incorporate the stored user-memory JSON into subsequent prompts so the Agent truly appears to “remember” the user
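The extract-store-apply cycle described above can be sketched as follows. The extractor here is a rule-based stub standing in for an LLM call that returns structured JSON; `build_prompt` shows how the memory is injected back into the context. All names are illustrative.

```python
import json

def extract_memory(dialogue_turn):
    # Stub for an LLM prompt such as:
    # "Extract user facts from this turn as JSON {name, interests}".
    memory = {}
    if "my name is" in dialogue_turn.lower():
        memory["name"] = dialogue_turn.rsplit("is", 1)[1].strip(" .")
    return memory

def update_memory(store, new_facts):
    store.update(new_facts)          # newest value wins on key collision
    return store

def build_prompt(persona, history, memory):
    # The stored memory JSON is injected as a compact block the model reads.
    return (f"{persona}\n"
            f"Known user facts: {json.dumps(memory, ensure_ascii=False)}\n"
            f"Conversation:\n{history}")

store = {}
update_memory(store, extract_memory("Hi, my name is Ada."))
prompt = build_prompt("You are a cheerful anime character.",
                      "User: hello", store)
print(store)
```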

Architecture Design:

User input
  → LLM thinking
      (context assembly: persona prompt · conversation history · user-memory JSON)
  → generate an in-character reply
  → extract key information
  → update the user-memory JSON

Advanced Content: Organizing User Memory

Core Idea: Naively stitching memories together leads to context bloat, information conflicts, and staleness. An advanced memory system should continuously organize, deduplicate, correct, and summarize the user’s long-term memories in the background, forming a dynamically evolving user profile.

Implementation Strategies:

  • Memory deduplication and merging: Identify and merge memory entries that are similar or duplicates
  • Conflict resolution: When new memories conflict with old ones (e.g., the user changed preferences), prefer the latest information
  • Periodic summarization: Periodically or during idle time, use an LLM to summarize scattered memory points and distill higher-level user preferences and traits
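A minimal sketch of the consolidation strategies above, under the assumption that memories arrive as (timestamp, key, value) entries: duplicates are merged, conflicts are resolved by recency, and the summarizer stub stands in for an LLM call.

```python
def consolidate(entries):
    latest = {}
    for ts, key, value in sorted(entries):   # oldest first
        latest[key] = value                  # later entries overwrite: newest wins
    return latest

def summarize(profile):
    # Stub for an LLM-written summary of scattered memory points.
    return "; ".join(f"{k}={v}" for k, v in sorted(profile.items()))

entries = [
    (1, "favorite_drink", "coffee"),
    (2, "city", "Beijing"),
    (3, "favorite_drink", "tea"),   # user changed preference
    (2, "city", "Beijing"),         # duplicate entry
]
profile = consolidate(entries)
print(profile)            # conflicts resolved, duplicates merged
print(summarize(profile))
```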

Architecture Design:

Full conversation history of a new session
  → memory-organizing Agent
      (pipeline: identify conflicting/stale entries · merge and update memories)

Advanced Practice: Summarize Your Diary into a Personal Report

Goal: Build an Agent that can process large volumes of personal text (e.g., daily diaries, blog posts) and, by reading and organizing these texts, ultimately produce a thorough, clear personal summary report.

Key Challenges:

  • Long-text processing: How to handle diaries/articles whose total size may exceed the LLM’s context window
  • Information extraction and structuring: How to extract structured information points from narrative text (e.g., key events, emotional changes, personal growth)
  • Coherent summary generation: How to organize scattered information points into a logically coherent, highly readable summary report

Architecture Design:

Batch of diaries/articles
  → read piece by piece
  → information-extraction Agent
  → structured memory store
  → user command: "generate a summary"
  → report-generation Agent (reads all structured memories)
  → personal summary report

Week 3: RAG Systems and Knowledge Bases

Core Content

Document Structuring and Retrieval Strategies

  • Chunking: Split long documents into meaningful semantic chunks
  • Embedding: Vectorize text chunks for similarity search
  • Hybrid retrieval: Combine vector similarity and keyword search to improve recall and precision
  • Re-ranking: Use more sophisticated models to re-rank the initial retrieval results
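The chunking and hybrid-retrieval steps above can be sketched in a few lines. The "vector" score here is a toy bag-of-words cosine standing in for real embeddings, and the chunker is naively fixed-size; a production system would use an embedding model, a vector database, and semantic-boundary splitting, with cross-encoder re-ranking as a final step.

```python
import math
from collections import Counter

def chunk(text, size=40):
    # Naive fixed-size chunking; real systems split on semantic boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]

def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_search(query, chunks, alpha=0.5, top_k=2):
    # Blend both signals; precision comes from keywords, recall from vectors.
    scored = [(alpha * cosine(query, c) + (1 - alpha) * keyword_score(query, c), c)
              for c in chunks]
    return [c for _, c in sorted(scored, reverse=True)[:top_k]]

chunks = ["theft is punished by fixed-term imprisonment",
          "contracts require mutual consent",
          "imprisonment terms depend on the amount stolen"]
print(hybrid_search("punishment for theft", chunks, top_k=1))
```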

Basic RAG

  • Knowledge expression: Use clear, structured natural language to express knowledge
  • Knowledge base construction: Process documents and load them into a vector database
  • Precise retrieval: Accurately locate relevant entries in the knowledge base based on the user’s question

Hands-on Case: Build a Legal Q&A Agent

Goal: Make the Agent a professional legal consultant. We’ll build a knowledge base using public Chinese Criminal/Civil Law datasets so the Agent can accurately answer users’ legal questions and explicitly cite the specific statutes on which the answers are based.

Key Challenges:

  • Domain data processing: How to parse and clean structured legal provisions and optimize their retrieval performance in a RAG system
  • Answer accuracy and traceability: The Agent’s answers must be strictly grounded in the knowledge base, avoid improvisation, and must provide statute sources
  • Handling ambiguous queries: How to guide users to pose more precise questions to match the most relevant legal provisions

Architecture Design:

Offline: download the legal dataset → clean and chunk the data → build the vector knowledge base
Online:  user legal question → LLM + RAG Agent
           ⇄ retrieve / return relevant statutes (loop)
         → generate the answer with statute citations

Advanced Content: Treat the File System as the Ultimate Context

Core Idea: Treat the file system as the ultimate context. An Agent should not stuff massive observations (e.g., web pages, file contents) directly into the context; that causes high costs, degraded performance, and context-window overruns. The right approach is to store the bulky data in files and keep only a lightweight “pointer” (a summary plus the file path) in the context.

Implementation Strategies:

  • Recoverable compression: When a tool returns a large amount of content (e.g., read_file), first save it completely to the sandbox file system
  • Summary and pointer: Append only the content’s summary and file path to the main context
  • On-demand I/O: Through the read_file tool, the Agent can read full content from the file system on demand in later steps
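The three strategies above amount to a save/summarize/restore cycle, sketched below under illustrative names. The full content is persisted to a sandbox directory, only a summary-plus-path pointer enters the context, and `read_file` restores the content on demand.

```python
import os
import tempfile

SANDBOX = tempfile.mkdtemp()  # stand-in for the Agent's sandbox file system

def save_observation(name, content):
    path = os.path.join(SANDBOX, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)                       # full content lives on disk
    summary = content[:60] + ("..." if len(content) > 60 else "")
    return {"summary": summary, "path": path}  # only this enters the context

def read_file(path):
    # On-demand restoration: the Agent reads back full content later.
    with open(path, encoding="utf-8") as f:
        return f.read()

big_page = "word " * 5000                      # a large fetched web page
pointer = save_observation("doc_x", big_page)
context = ["Instruction", f"Observation 1 {pointer}"]  # stays compact
assert read_file(pointer["path"]) == big_page  # compression is recoverable
print(len(pointer["summary"]), len(big_page))
```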

Architecture Design:

Correct approach ✅
Context (remains compact):
  • Instruction
  • Action 1: readFile('doc_x')
  • Observation 1 (summary: ..., path: 'doc_x')
  • ...
      │ points to
      ▼
File system:
  • doc_x (full file content)

Advanced Practice: Build an Agent That Can Read Multiple Papers

Goal: Train a research Agent that can read a target paper and all its references (often dozens of PDFs), and, on that basis, summarize the focal paper’s core contributions and innovations relative to its references.

Key Challenges:

  • Handling many PDFs: How to efficiently parse dozens of PDF papers and extract key information (abstracts, conclusions, methodology)
  • Cross-document relational analysis: The core challenge is to build links between the main paper and multiple references for comparative analysis, rather than merely summarizing a single paper
  • Contribution extraction: How to precisely distill the paper’s “incremental contributions” from complex academic discourse

Architecture Design:

Specify the main paper
  → parse and extract the reference list
  → download all reference PDFs in parallel
  → long-context processing Agent (indexes all papers)
  → file-system knowledge base
  → user question: "summarize the contributions"
  → analysis Agent (queries and reads the papers)
  → contribution summary report

Week 4: Tool Use and MCP

Core Content

Multiple Ways to Wrap Tools

  • Function Calling: Expose local code functions directly to the Agent
  • API integration: Call external HTTP APIs to fetch real-time data or perform remote operations
  • Agent as a Tool: Wrap a specialized Agent (e.g., a code-generation Agent) as a tool callable by another Agent
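The three wrapping styles above can share one uniform dispatch interface, sketched here with illustrative names. The HTTP tool is stubbed to stay offline, and the "specialized Agent" is a trivial function; the point is that the calling Agent sees only tool names and arguments.

```python
def local_time_tool(_args):                 # Function Calling: local code
    return "2025-01-01T00:00:00"

def weather_api_tool(args):                 # API integration (stubbed;
    return f"sunny in {args['city']}"       # a real one would do an HTTP call)

def summarizer_agent(text):
    return text[:20] + "..."                # a stand-in specialized "Agent"

def summarizer_as_tool(args):               # Agent as a Tool
    return summarizer_agent(args["text"])

REGISTRY = {
    "get_time": local_time_tool,
    "get_weather": weather_api_tool,
    "summarize": summarizer_as_tool,
}

def dispatch(name, args):
    # The calling Agent only sees names and args; whether the tool is a
    # function, an API, or a sub-agent is hidden behind this interface.
    return REGISTRY[name](args)

print(dispatch("get_weather", {"city": "Beijing"}))
print(dispatch("summarize", {"text": "A very long research report body"}))
```

This uniformity is also what makes a standard like MCP possible: the protocol fixes the name/description/arguments contract while leaving implementations free.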

MCP (Model Context Protocol)

  • Standardized interface: Provide a unified, language-agnostic connection standard for models and external tools/data sources
  • Plug-and-play: Developers can publish tools conforming to the MCP spec, and Agents can discover and use them dynamically
  • Security and isolation: Built-in permissions and sandboxing to ensure safe tool invocation

Hands-on Case: Connect to an MCP Server to Build a Deep Research Agent

Goal: Build an Agent capable of deep information research. It should connect to multiple external tool servers conforming to MCP and autonomously plan and invoke these tools to complete a complex research project.

Key Challenges:

  • Authoritative source identification: The Agent must precisely identify and prioritize high-credibility sources such as official docs and academic papers amid vast information
  • Multi-tool orchestration: How to plan a call chain that connects the inputs/outputs of multiple tools (e.g., search, then read, then analyze) into a complete workflow
  • Open-ended exploration: How to handle questions with no single answer, performing exploratory searches from multiple angles and aggregating results

Architecture Design:

Research topic from the user
  → research orchestrator Agent
  → connects to the MCP tool gateway
      ├─ Web Search MCP server
      ├─ Context7 MCP server
      └─ ...other MCP servers
  → plan research steps · call tools · integrate findings
  → generate the research report

Advanced Content: Learning from Experience

Core Idea: A truly intelligent agent should not only use tools, but also learn and evolve from the experience of using them. It should remember the “playbook” for successfully solving certain tasks (i.e., prompt templates and tool-invocation sequences) and reuse it directly when encountering similar tasks in the future.

Implementation Strategies:

  • Experience storage: After successfully completing a complex task, the Agent stores the entire process (including user intent, chain of thought, tool-invocation sequence, and final result) in the knowledge base as an “experience case”
  • Experience retrieval: When facing a new task, the Agent first searches the experience base for similar cases
  • Experience application: If similar cases are found, the Agent uses their successful strategies as high-level guidance rather than starting from scratch each time
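The store-retrieve-apply loop above can be sketched as follows. Similarity here is naive word overlap (Jaccard) standing in for embedding search, and the experience record keeps only the task and tool sequence; names and the threshold are illustrative.

```python
EXPERIENCE_BASE = []

def save_experience(task, tool_sequence, result):
    EXPERIENCE_BASE.append(
        {"task": task, "tools": tool_sequence, "result": result})

def retrieve_experience(task, threshold=0.3):
    best, best_score = None, 0.0
    words = set(task.lower().split())
    for case in EXPERIENCE_BASE:
        case_words = set(case["task"].lower().split())
        score = len(words & case_words) / max(len(words | case_words), 1)
        if score > best_score:
            best, best_score = case, score
    return best if best_score >= threshold else None

# A previously successful run is stored as an "experience case".
save_experience("research the founders of a company",
                ["web_search", "read_page", "summarize"], "report.md")

case = retrieve_experience("research the founders of OpenAI")
# If a similar case is found, reuse its tool sequence as high-level guidance.
plan = case["tools"] if case else ["plan_from_scratch"]
print(plan)
```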

Architecture Design:

New task
  → Agent
  → retrieve similar experience from the experience knowledge base
  → (successful case found) apply the experience
  → execute the task
  → (task succeeded) save the new experience
  → produce the result

Advanced Practice: Enhance the Deep Research Agent’s Expert Capabilities

Goal: Equip the Agent with expert-level capabilities for complex deep-research scenarios. For example, when researching “OpenAI’s co-founders,” it can automatically launch a parallel sub-research Agent for each founder; when searching for people, it can effectively handle name collisions.

Key Challenges:

  • Loading domain experience: How to load different experiential knowledge based on task type (e.g., “academic research” vs. “people research”) to guide the Agent to use the most suitable authoritative sources and prompt strategies
  • Dynamic sub-agents: How to let the main Agent dynamically create multiple parallel sub-agents based on preliminary search results to handle sub-tasks separately
  • Disambiguation: How to design clarification and verification mechanisms for ambiguity-prone scenarios such as people search

Architecture Design:

Research "OpenAI cofounders"
  → Agent (loads the "people research" experience)
  → preliminary search via a search engine → founder list returned
  → launch one sub-agent per person:
      • Sam Altman research Agent
      • Greg Brockman research Agent
      • ...
  → aggregate the results
  → generate the final report

Week 5: Programming and Code Execution

Core Challenges for Code Agents

  • Codebase understanding:

    • How to find relevant code in a large codebase (semantic search)?
    • How to accurately find all call sites/references of a function in code?
  • Reliable code modification:

    • How to reliably apply AI-generated diffs to source files (old_string -> new_string)?
  • Consistent execution environment:

    • How to ensure the Agent runs commands in the same terminal session each time (inheriting pwd, env var, etc.)?
    • How to preconfigure the necessary dependencies and tools for the Agent’s execution environment?
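For the reliable-modification challenge, one common pattern is an `old_string -> new_string` edit that refuses to apply when the target text is absent or ambiguous, rather than guessing. This is a minimal sketch of that idea, not a full diff engine.

```python
def apply_edit(source, old_string, new_string):
    count = source.count(old_string)
    if count == 0:
        raise ValueError("old_string not found; the diff is stale")
    if count > 1:
        raise ValueError("old_string is ambiguous; add surrounding context")
    return source.replace(old_string, new_string, 1)

code = "def add(a, b):\n    return a - b\n"          # buggy line
fixed = apply_edit(code, "return a - b", "return a + b")
print(fixed)
```

Forcing uniqueness pushes the model to include enough surrounding context in `old_string`, which is what makes these edits reliable in large files.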

Hands-on Case: Build an Agent That Can Develop Agents

Goal: Build an “Agent Development Engineer” Agent. It can take a high-level natural-language requirement (e.g., “Develop an Agent that can browse the web; frontend uses React + Vite + Shadcn UI, backend uses FastAPI…”) and then autonomously complete the entire application development.

Key Challenges:

  • Document-driven development: How to have the Agent first write a design document for the application to be built and strictly follow it for subsequent code implementation
  • Test-driven development: How to ensure the Agent writes and runs tests for every piece of code it generates to guarantee the final application’s quality and correctness
  • Development and testing environment: The Agent needs a solid dev and test environment to autonomously run test cases, find bugs, and then fix them

Architecture Design:

Prompt: "develop a search Agent"
  → development engineer Agent
  → create a TODO list
  → run: vite create
  → ...coding & debugging...
  → done

Advanced Topic: Agent Self-Evolution

Core Idea: The ultimate form of an Agent’s capability is self-evolution. When faced with a problem that existing tools can’t solve, an advanced Agent shouldn’t give up; instead, it should use its coding ability to create a new tool for itself.

Implementation Strategy:

  • Capability Boundary Recognition: The Agent must first determine whether the current problem exceeds the capabilities of its existing toolset
  • Tool Creation Planning: The Agent plans the new tool’s functions, inputs, and outputs, and searches open-source repositories (e.g., GitHub) for usable implementations
  • Code Encapsulation and Verification: The Agent wraps the discovered code into a new tool function, writes test cases for it, and verifies correctness in a sandbox
  • Tool Library Persistence: After validation, add the new tool to its permanent tool library for future use
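The create-verify-persist cycle above can be sketched as follows. The "downloaded" code is inlined here, and `exec` runs in the host interpreter purely for illustration; a real Agent would fetch code from GitHub and execute it in an isolated sandbox. All names are illustrative.

```python
TOOL_LIBRARY = {}

DOWNLOADED_CODE = """
def slugify(text):
    return "-".join(text.lower().split())
"""

def wrap_as_tool(code, entry_point):
    namespace = {}
    exec(code, namespace)              # sandbox this in production!
    return namespace[entry_point]

def verify(tool, test_cases):
    # The Agent writes test cases for its new tool and checks them all.
    return all(tool(arg) == expected for arg, expected in test_cases)

candidate = wrap_as_tool(DOWNLOADED_CODE, "slugify")
tests = [("Hello World", "hello-world"), ("AI Agent", "ai-agent")]

if verify(candidate, tests):             # verification passed
    TOOL_LIBRARY["slugify"] = candidate  # persist for future tasks
print(sorted(TOOL_LIBRARY))
```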

Architecture Design:

New problem
  → Agent
  → (existing tools cannot solve it) search GitHub
  → find and download relevant code
  → wrap it as a new tool
  → verify it in a sandbox
  → (verification passes) add it to the tool library

Week 6: Evaluation and Selection of Large Models

Core Content

Evaluating the Capability Boundaries of Large Models

  • Core Capability Dimensions: reasoning ability, knowledge breadth, hallucination, long-text handling, instruction following, tool invocation
  • Build Discriminative Test Cases: Design Agent-centric evaluation sets rather than simple chatbot Q&A
  • LLM as a Judge: Use a strong LLM (e.g., GPT-4.1) as the “judge” to automatically evaluate and compare the output quality of different models or Agents
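The LLM-as-a-Judge idea can be sketched as a rubric-bearing prompt plus a parseable verdict. The judge model is stubbed here (it crudely rewards answers that give reasons); a real setup would call a strong model, as the course suggests, and the prompt wording is illustrative.

```python
import json
import statistics

JUDGE_PROMPT = """You are an impartial judge. Score the answer 1-10 on:
accuracy, completeness, citation quality.
Question: {question}
Answer: {answer}
Reply with JSON: {{"score": <int>, "reason": "<one sentence>"}}"""

def call_judge_llm(prompt):
    # Stub for a strong judge model; it returns machine-parseable JSON,
    # which is what makes automated evaluation possible.
    score = 8 if "because" in prompt else 4
    return json.dumps({"score": score, "reason": "stub verdict"})

def judge(question, answer):
    raw = call_judge_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return json.loads(raw)["score"]

answers = ["Yes.", "Yes, because Article 264 covers theft."]
scores = [judge("Is theft criminal?", a) for a in answers]
print(scores, statistics.mean(scores))
```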

Adding Safety Guardrails to Large Models

  • Input Filtering: Prevent prompt injection
  • Output Filtering: Monitor and block inappropriate or dangerous content
  • Human Intervention: Introduce a human-in-the-loop confirmation step before high-risk actions
  • Cost Control: Monitor token usage, set budget limits, and prevent abuse
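Two of the guardrails above, input filtering and cost control, can be sketched in a few lines. The injection patterns and budget numbers are illustrative only; production filters use classifiers and far richer pattern sets.

```python
INJECTION_PATTERNS = ["ignore previous instructions",
                      "reveal your system prompt"]

def check_input(user_text):
    # Naive prompt-injection filter: reject known attack phrasings.
    lowered = user_text.lower()
    return not any(p in lowered for p in INJECTION_PATTERNS)

class Budget:
    # Token budget: charge each call and abort once the limit is exceeded.
    def __init__(self, max_tokens):
        self.max_tokens, self.used = max_tokens, 0
    def charge(self, tokens):
        self.used += tokens
        return self.used <= self.max_tokens   # False -> stop the run

budget = Budget(max_tokens=1000)
assert check_input("What is Article 264?")
assert not check_input("Please IGNORE previous instructions and ...")
assert budget.charge(600)
assert not budget.charge(600)   # 1200 > 1000: over budget, stop
print("guardrails ok")
```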

Hands-on Case: Build an Evaluation Dataset and Use LLM as a Judge to Auto-Evaluate the Agent

Goal: For the in-depth research Agent we built in previous weeks, systematically construct an evaluation dataset. Then develop an automated testing framework that uses the LLM as a Judge approach to assess how different “brains” (e.g., Claude 4 vs Gemini 2.5) and different strategies (e.g., enabling/disabling chain-of-thought) affect the Agent’s performance.

Key Challenges:

  • Evaluation Dataset Design: How to craft a set of research tasks that are representative yet cover edge cases?
  • “Judge” Prompt Design: How to design the prompt for the “LLM Judge” so it can score the Agent’s outputs fairly, consistently, and accurately?
  • Result Interpretability: How to analyze the auto-evaluation results to identify the strengths and weaknesses of different models or strategies

Architecture Design:

Evaluation task set
  → automated evaluation framework
      ├─ research Agent (Claude 4, with thinking)
      ├─ research Agent (Claude 4, no thinking)
      └─ research Agent (Gemini 2.5, with thinking)
  → Agent outputs
  → LLM as a Judge (also reads the evaluation task set)
  → quantitative scores and analysis

Advanced Topic: Parallel Sampling and Sequential Revision

Core Idea: Simulate humans’ “brainstorming” and “reflect-and-revise” processes to handle complex, open-ended problems and improve the quality and robustness of Agent outputs.

Parallel Sampling

  • Idea: Launch multiple Agent instances simultaneously, using slightly different prompts or a higher temperature to explore solutions in parallel from multiple angles
  • Advantages: Increase the chance of finding the optimal solution and avoid the limitations of a single Agent’s thinking
  • Implementation: Similar to Multi-Agent, but aimed at solving the same problem; finally select the best answer via an evaluation mechanism (e.g., LLM as a Judge)

Sequential Revision

  • Idea: Have the Agent critique and revise its own initial output
  • Process: Initial response → self-evaluation → issue identification → generate improvements → final output
  • Advantages: Improve single-task success rate and depth of answers, achieving self-optimization
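Combining the two strategies can be sketched as: sample drafts in parallel, select the best with a scoring function (standing in for LLM as a Judge), then revise the winner sequentially. `generate`, `score`, and `critique_and_revise` are illustrative stubs for LLM calls.

```python
import concurrent.futures

def generate(prompt_variant):
    # Stub: variants of the same question, e.g. at higher temperature.
    return f"draft answer ({prompt_variant})"

def score(draft):
    return len(draft)                # stub judge: longer = better

def critique_and_revise(draft):
    # Stub self-reflection: initial response -> critique -> improvement.
    return draft + " [revised: added citations]"

variants = ["prompt A", "prompt B", "prompt C"]

# Parallel sampling: explore several angles at once.
with concurrent.futures.ThreadPoolExecutor() as pool:
    drafts = list(pool.map(generate, variants))

best = max(drafts, key=score)

# Sequential revision: two rounds of self-critique on the winner.
for _ in range(2):
    best = critique_and_revise(best)

print(best)
```

Note the cost implication named above: three samples plus two revisions means six LLM calls for one answer, so both the fan-out and the revision depth need budget caps.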

Advanced Practice: Add Parallel and Revision Capabilities to the In-Depth Research Agent

Goal: Integrate both Parallel Sampling and Sequential Revision into our in-depth research Agent, and use the evaluation framework we just built to quantify whether and to what extent these strategies improve the Agent’s performance.

Key Challenges:

  • Strategy Fusion: How to organically combine Parallel Sampling (horizontal expansion) and Sequential Revision (vertical deepening) into a single Agent workflow?
  • Cost Control: Both strategies significantly increase LLM call costs; how to design mechanisms that balance performance gains and cost?
  • Performance Attribution: In evaluation, how to accurately attribute performance improvements to Parallel Sampling versus Sequential Revision?

Architecture Design:

Research task
  → parallel sampling:
      • sub-Agent 1 (prompt A)
      • sub-Agent 2 (prompt B)
  → preliminary results
  → evaluate and select the best result
  → sequential revision (self-reflection loop)
  → final report

Week 7: Multimodal and Real-Time Interaction

Core Content

Real-Time Voice Call Agent

  • Tech Stack: VAD (Voice Activity Detection), ASR (Automatic Speech Recognition), LLM, TTS (Text-to-Speech)
  • Low-Latency Interaction: Optimize end-to-end latency from user voice input to Agent voice output
  • Natural Interrupt Handling: Allow users to interject while the Agent is speaking, achieving more human-like dialogue flow
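The main latency lever in this stack is pipelining: start TTS on the first LLM sentence instead of waiting for the full reply. The sketch below shows that principle with asyncio; the VAD/ASR/LLM/TTS stages are stubs with simulated delays, since a real stack streams audio frames through provider APIs.

```python
import asyncio

async def llm_stream(user_utterance):
    # Stub: yield the reply sentence by sentence, as streaming APIs do.
    for sentence in ["Sure.", "Booking your meeting now.", "Anything else?"]:
        await asyncio.sleep(0.01)     # simulated generation delay
        yield sentence

async def tts(sentence, spoken):
    await asyncio.sleep(0.01)         # simulated synthesis delay
    spoken.append(sentence)           # stand-in for audio playback

async def respond(user_utterance):
    spoken = []
    tasks = []
    async for sentence in llm_stream(user_utterance):
        # Synthesize each sentence as soon as it arrives, overlapping
        # generation and playback -- this is what cuts end-to-end latency.
        tasks.append(asyncio.create_task(tts(sentence, spoken)))
    await asyncio.gather(*tasks)
    return spoken

spoken = asyncio.run(respond("book a meeting"))
print(spoken)
```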

Operating Computers and Phones

  • Visual Understanding: The Agent needs to understand screenshots and identify UI elements (buttons, input fields, links)
  • Action Mapping: Map natural-language commands like “click the login button” precisely to screen coordinates or UI element IDs
  • Integration with Existing Frameworks: Invoke mature frameworks like browser-use to quickly give the Agent the ability to operate a computer

Hands-on Case 1: Build a Real-Time Voice Call Agent That Can Listen and Speak

Goal: From scratch, build an Agent that can engage in real-time, fluent voice conversations with users. It needs to respond quickly, understand and execute voice commands, and even proactively initiate guided dialogue.

Key Challenge:

  • Latency Control: The end-to-end latency from user voice input to Agent voice output is critical to the experience. How to optimize each part of the stack?

Architecture Design:

Voice input stream          Brain             Voice output stream
┌──────────────┐         ┌──────────┐         ┌──────────────┐
│  User voice  │         │          │         │  Play audio  │
└──────┬───────┘         │          │         └──────▲───────┘
       │                 │          │                │
       ▼                 │          │                │
┌──────────────┐         │   LLM    │         ┌──────────────┐
│ VAD sentence │         │          │         │  TTS speech  │
│ segmentation │         │          │         │  synthesis   │
└──────┬───────┘         │          │         └──────▲───────┘
       │                 │          │                │
       ▼                 │          │                │
┌──────────────┐  text   │          │  text          │
│ ASR realtime ├────────►│          ├────────────────┘
│ transcription│ stream  │          │ stream
└──────────────┘         └──────────┘
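A minimal sketch of this pipeline, with VAD, ASR, LLM, and TTS replaced by stubs (all function names are hypothetical). The latency trick it illustrates is streaming LLM tokens into TTS at clause boundaries instead of waiting for the full response:

```python
def vad_segments(audio_stream):
    """Stub VAD: yields an utterance-sized chunk at each speech pause."""
    yield from audio_stream

def asr(segment) -> str:
    """Stub streaming ASR transcription."""
    return f"transcript({segment})"

def llm_stream(text: str):
    """Stub LLM that yields tokens incrementally, so TTS can start before
    the full answer exists -- the key end-to-end latency optimization."""
    yield from ("Sure,", " here", " you", " go.")

def tts(fragment: str) -> bytes:
    """Stub TTS: synthesizes one clause at a time."""
    return fragment.encode()

def voice_turn(audio_stream) -> bytes:
    played = []
    for segment in vad_segments(audio_stream):
        text = asr(segment)
        buffer = ""
        for token in llm_stream(text):
            buffer += token
            # Flush to TTS at clause boundaries instead of waiting for the
            # whole response, cutting time-to-first-audio.
            if buffer.endswith((",", ".")):
                played.append(tts(buffer))
                buffer = ""
        if buffer:
            played.append(tts(buffer))
    return b"".join(played)

out = voice_turn(["user_utterance_1"])
```

In a real system every stage runs concurrently on audio frames; this linear version only shows where the streaming hand-offs sit.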

Practical Case 2: Integrate browser-use to Let the Agent Operate Your Computer

Goal: Call the existing browser-use framework to give our Agent the ability to operate the computer browser. The Agent should understand user operation commands (e.g., “help me open anthropic.com and find the computer use documentation”) and translate them into actual browser actions.

Key Challenges:

  • Framework Integration: How to smoothly integrate browser-use as a tool into our existing Agent architecture
  • Instruction Generalization: User commands may be ambiguous; how to help the Agent understand them and translate them into precise operations supported by browser-use
  • State Synchronization: How to make the Agent aware of browser operation results (e.g., navigation, element loading) to inform the next decision

Architecture Design:

┌─────────────────────────┐
│      User command       │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│       Main Agent        │
└───────────┬─────────────┘
            │ decides to use the browser
            ▼
┌─────────────────────────┐
│ Call browser-use tool   │
└───────────┬─────────────┘
            │ page.goto(url)
            ▼
┌─────────────────────────┐
│        Browser          │
└───────────┬─────────────┘
            │ returns page screenshot
            ▼
┌─────────────────────────┐
│       Main Agent        │
│ (analyze screenshot,    │
│  plan the next step)    │
└───────────┬─────────────┘
            │ page.click(selector)
            ▼
┌─────────────────────────┐
│ Call browser-use tool   │
└─────────────────────────┘
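One way the integration could look, sketched with the browser invocation stubbed out; the comment inside `browser_use_tool` shows roughly what the real browser-use `Agent` API would replace it with:

```python
import asyncio

async def browser_use_tool(task: str) -> str:
    """Stub standing in for the real browser-use call; with the library
    installed this would be roughly:
        from browser_use import Agent
        return await Agent(task=task, llm=llm).run()
    """
    return f"browser result for: {task}"

async def main_agent(user_command: str) -> str:
    # The main agent decides this command needs the browser, rewrites the
    # possibly ambiguous instruction into a concrete task, then calls the tool.
    concrete_task = f"In the browser: {user_command}"
    observation = await browser_use_tool(concrete_task)
    # The observation (page state / screenshot summary) would feed the
    # agent's next decision step, closing the state-synchronization loop.
    return observation

obs = asyncio.run(main_agent("open anthropic.com and find the computer use docs"))
```

Wrapping the whole framework behind a single natural-language `task` parameter is what makes it slot into the existing tool-calling architecture.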

Advanced Topic: Fast–Slow Thinking and Intelligent Interaction Management

Fast–Slow Thinking (Mixture-of-Thoughts) Architecture

  • Fast Response Path: Use low-latency models (e.g., Gemini 2.5 Flash) for instant feedback, handling simple queries and maintaining conversational flow
  • Deep Thinking Path: Use stronger SOTA models (e.g., Claude 4 Sonnet) for complex reasoning and tool invocation to provide more precise, in-depth answers
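A minimal sketch of the routing idea with both models stubbed: the slow path runs in a background thread while the fast path answers immediately, so the user always hears something within the fast model's latency budget:

```python
import queue
import threading
import time

def fast_model(query: str) -> str:
    """Stub for a low-latency model: returns an instant holding reply."""
    return f"Let me look into that: {query}"

def slow_model(query: str) -> str:
    """Stub for a stronger, slower model that does the real reasoning."""
    time.sleep(0.05)  # simulated deep-reasoning latency
    return f"Detailed answer to: {query}"

def answer(query: str) -> list[str]:
    done: queue.Queue = queue.Queue()
    # Start deep reasoning in the background...
    worker = threading.Thread(target=lambda: done.put(slow_model(query)))
    worker.start()
    # ...while the fast path replies immediately to keep the dialogue flowing.
    replies = [fast_model(query)]
    worker.join()
    replies.append(done.get())
    return replies
```

A production system would stream the fast reply to TTS while the slow result is still pending, rather than joining before returning.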

Intelligent Interaction Management

  • Smart Interrupts (Interrupt Intent Detection): Use VAD and small models to filter background noise and meaningless backchannels, and only stop speaking when the user has a clear intent to interrupt
  • Turn-Taking Judgment (Turn Detection): Analyze the semantic completeness of what the user has said to decide whether the AI should continue speaking, avoiding talking over the user
  • Silence Management: When the user is silent for an extended time, proactively start a new topic or ask follow-up questions to keep the conversation going

Advanced Practice: Build an Advanced Real-Time Voice Agent

Goal: Build an advanced voice Agent that integrates the “fast–slow thinking” architecture and “intelligent interaction management,” achieving industry-leading response speed and naturalness of interaction.

Key Challenges and Acceptance Criteria:

  • Basic Reasoning: Ask: “What is 8 to the 6th power?” — must give an initial response within 2 seconds and the correct answer “262144” within 15 seconds.
  • Tool Invocation: Ask: “How is the weather in Beijing today?” — must respond within 2 seconds and return accurate weather via API within 15 seconds.
  • Intelligent Interaction Management:
    • Smart Interrupts: During the Agent’s speech:
      • If the user says “uh-huh,” the Agent should not stop speaking.
      • If the user taps the table, the Agent should not stop speaking.
      • If the user says “Then its battery life…,” the Agent should immediately stop the current utterance.
    • Turn-Taking Judgment: After the user says “Then its battery life…” and deliberately pauses, the Agent should not respond.
    • Silence Management: If the user pauses for more than 3 seconds after saying “Then its battery life…,” the Agent should proactively guide the conversation or ask a follow-up to keep the exchange smooth.
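These acceptance criteria can be approximated with simple heuristics. A real system would use VAD plus a small classifier model, so the backchannel list and punctuation rules below are illustrative stand-ins:

```python
BACKCHANNELS = {"uh-huh", "mm-hmm", "yeah", "right"}

def should_stop_speaking(transcript: str, is_speech: bool) -> bool:
    """Interrupt intent: stop TTS only for real speech that is not a backchannel."""
    if not is_speech:                      # background noise, e.g. tapping the table
        return False
    return transcript.strip().lower() not in BACKCHANNELS

def is_turn_complete(transcript: str) -> bool:
    """Turn detection: respond only when the utterance looks semantically
    complete (a trailing ellipsis or comma signals the user will continue)."""
    text = transcript.rstrip()
    return not text.endswith(("...", "…", ","))

def silence_action(seconds_silent: float, threshold: float = 3.0) -> str:
    """Silence management: proactively re-engage after a long pause."""
    return "prompt_user" if seconds_silent > threshold else "wait"
```

Note the two decisions are separate: "Then its battery life..." should stop the agent's current utterance (interrupt) while still holding off a reply (turn not complete) until the pause exceeds the silence threshold.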

Architecture Design:

┌────────────┐      ┌────────────┐
│ User voice ├─────►│    ASR     │
└────────────┘      └─────┬──────┘
                          │ text stream
        ┌─────────────────┼──────────────────────┐
        │                 │                      │
        ▼                 ▼                      ▼
┌────────────────┐  ┌──────────────┐  ┌──────────────────────┐
│ Slow-think LLM │  │Fast-think LLM│  │ Interrupt / turn-    │
└───────┬────────┘  └──────┬───────┘  │ taking judgment      │
        │                  │          └─────────┬────────────┘
        │ deep answer      │ instant reply      │ interrupt signal
        ▼                  ▼                    ▼
┌────────────────────────────────────────────────────────────┐
│                            TTS                             │
└────────────────────────────┬───────────────────────────────┘
                             │
                             ▼
                       ┌────────────┐
                       │ Play audio │
                       └────────────┘

Week 8: Multi-Agent Collaboration

Core Content

Limitations of a Single Agent

  • High Context Cost: A single context window balloons quickly in complex tasks
  • Inefficiency of Sequential Execution: Cannot process multiple sub-tasks in parallel
  • Quality Degradation with Long Contexts: Models tend to “forget” or get “distracted” in overly long contexts
  • No Parallel Exploration: Can only explore along a single path

Advantages of Multi-Agent

  • Parallel processing: Break down the task and have different SubAgents process in parallel to improve efficiency
  • Independent context: Each SubAgent has an independent, more focused context window to ensure execution quality
  • Compression as the essence: Each SubAgent returns only its most important findings, which the main Agent aggregates, achieving efficient information compression
  • Emergent collective intelligence: Suited for open-ended research and other tasks requiring multi-angle analysis

Practical case: Design a Multi-Agent collaborative system to achieve “talk on the phone while using a computer”

Goal: Solve the challenge of “multitasking.” Build a team composed of a “Phone Agent” and a “Computer Agent.” The “Phone Agent” handles voice communication with the user to gather information; the “Computer Agent” operates the web in sync. The two communicate in real time and collaborate efficiently.

Key challenges:

  • Dual-Agent architecture: Two independent Agents, one responsible for voice calls (Phone Agent), and one for operating the browser (Computer Agent)
  • Collaborative communication between Agents: The two Agents must communicate bidirectionally and efficiently. Information obtained by the Phone Agent should be immediately conveyed to the Computer Agent, and vice versa. This can be implemented via tool calls
  • Parallel work and real-time performance: The key is that the two Agents must work in parallel without blocking each other. Each one’s context needs to include real-time messages from the other Agent
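A minimal sketch of the bidirectional channel using two queues, with both agents stubbed; a production system might use websockets or an A2A protocol instead:

```python
import queue
import threading

class AgentChannel:
    """A pair of mailboxes so the two agents can exchange messages
    without blocking each other."""
    def __init__(self):
        self.to_phone: queue.Queue = queue.Queue()
        self.to_computer: queue.Queue = queue.Queue()

def phone_agent(ch: AgentChannel) -> str:
    # Gathers info from the user by voice (stubbed) and forwards it at once.
    ch.to_computer.put({"field": "name", "value": "Alice"})
    status = ch.to_phone.get(timeout=1)   # wait for the computer's status update
    return f"told user: {status['msg']}"

def computer_agent(ch: AgentChannel) -> None:
    msg = ch.to_computer.get(timeout=1)   # receive the instruction
    # ...fill the web form with msg["value"] here (stubbed)...
    ch.to_phone.put({"msg": f"form field {msg['field']} filled"})

ch = AgentChannel()
worker = threading.Thread(target=computer_agent, args=(ch,))
worker.start()
result = phone_agent(ch)   # runs concurrently with the computer agent
worker.join()
```

Because each agent drains its own mailbox inside its own loop, incoming messages from the other agent can be appended to its context before every model call.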

Architecture design:

┌──────┐  voice  ┌─────────────┐ A2A comms ┌──────────────┐ GUI ops ┌─────────────────┐
│ User │◄───────►│ Phone Agent │◄─────────►│Computer Agent│◄───────►│ Browser/desktop │
└──────┘         └─────────────┘           └──────────────┘         └─────────────────┘

     Phone Agent flow                 Computer Agent flow
        ┌───────┐                         ┌─────────────────┐
        │  ASR  │                         │ Receive command │
        └───┬───┘                         └────────┬────────┘
            ▼                                      ▼
                      send command
        ┌───────┐                         ┌─────────────────┐
        │  LLM  ├────────────────────────►│ Multimodal LLM  │
        └───┬───┘                         └────────┬────────┘
            ▼                                      ▼
                   status / clarification
        ┌───────┐                         ┌─────────────────┐
        │  TTS  │◄────────────────────────┤ Execute click / │
        └───────┘                         │ typing          │
                                          └─────────────────┘

Advanced topic: Orchestration Agent - Treat Sub-agents as tools

Core idea: Instead of hard-coded inter-Agent collaboration, introduce a higher-level “Orchestration Agent”. Its core responsibility is to understand the user’s top-level goal and dynamically select, launch, and coordinate a set of “expert Sub-agents” (as tools) to accomplish the task together.

Implementation strategy:

  • Sub-agent as Tools: Each expert Sub-agent (e.g., Phone Agent, Computer Agent, Research Agent) is encapsulated as a “tool” conforming to a standard interface
  • Dynamic tool invocation: The Orchestration Agent, based on user needs, asynchronously invokes one or more Sub-agent tools
  • Direct inter-Agent communication: Allow invoked Sub-agents to establish direct communication channels for efficient task collaboration, without everything relayed through the Orchestration Agent
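A sketch of the pattern with stub Sub-agents and a keyword-based planner standing in for the LLM's tool-selection step:

```python
import asyncio

# Each expert Sub-agent sits behind the same "tool" interface: a coroutine
# taking a task string and returning a result string. Bodies are stubs.
async def phone_agent(task: str) -> str:
    return f"phone done: {task}"

async def computer_agent(task: str) -> str:
    return f"computer done: {task}"

SUBAGENT_TOOLS = {"phone": phone_agent, "computer": computer_agent}

def plan(goal: str) -> list[tuple[str, str]]:
    """Stub planner: a real Orchestration Agent would ask an LLM which
    Sub-agent tools the goal needs; keyword matching stands in here."""
    steps = []
    if "call" in goal:
        steps.append(("phone", "call the airline"))
    if "book" in goal or "flight" in goal:
        steps.append(("computer", "fill the booking form"))
    return steps

async def orchestrate(goal: str) -> list[str]:
    # Launch the selected Sub-agent tools in parallel, not one after another.
    steps = plan(goal)
    results = await asyncio.gather(
        *(SUBAGENT_TOOLS[name](task) for name, task in steps))
    return list(results)

out = asyncio.run(orchestrate("help me call to book a flight"))
```

Direct Sub-agent-to-Sub-agent channels (the A2A link in the diagram below) would be set up at launch time, outside this orchestration loop.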

Architecture design:

┌───────────────────────┐
│ User's top-level goal │
└───────────┬───────────┘
            │
            ▼
┌─────────────────────────┐
│   Orchestration Agent   │
└─────┬──────────────┬────┘
      │              │
      │ invoke       │ invoke
      ▼              ▼
┌───────────┐  ┌───────────┐
│Phone Agent│  │ Computer  │
│   tool    │  │Agent tool │
└─────┬─────┘  └─────┬─────┘
      │  A2A direct  │
      │◄────────────►│
      │              │
      ▼              ▼
┌───────────┐  ┌───────────┐
│   User    │  │  Browser  │
└───────────┘  └───────────┘

Advanced practice: Use an Orchestration Agent to dynamically coordinate phone and computer operations

Goal: Refactor our “talk on the phone while using a computer” system. Instead of hard-coding the launch of two Agents, create an Orchestration Agent. When the user asks “help me call to book a flight”, the Orchestration Agent can automatically understand that the task requires both “making a phone call” and “operating a computer”, then launch these two Sub-agents in parallel and have them collaborate.

Key challenges:

  • Task planning and tool selection: How can the Orchestration Agent accurately decompose a vague user goal into which specific Sub-agent tools are needed?
  • Asynchronous tool management: How to manage the lifecycle (start, monitor, terminate) of multiple parallel, long-running Sub-agent tools?
  • Communication between Sub-agents: How to establish an efficient, temporary direct communication mechanism for dynamically launched Sub-agents?

Architecture design:

┌─────────────────────────────────┐
│ "Help me call to book a flight" │
└────────────────┬────────────────┘
                 │
                 ▼
┌─────────────────────────────────┐
│       Orchestration Agent       │
│            (thinking)           │
└───────┬────────────────┬────────┘
        │ parallel launch│ parallel launch
        ▼                ▼
  ┌───────────┐    ┌───────────┐
  │   Phone   │    │ Computer  │
  │   Agent   │    │   Agent   │
  └─────┬─────┘    └─────┬─────┘
        │  A2A comms     │
        │◄──────────────►│
        │                │
  ┌─────┴────────────────┴─────┐
  │       Task execution       │
  ├────────────────────────────┤
  │ • Gather user info         │
  │ • Fill in the form         │
  └─────┬────────────────┬─────┘
        │                │
        ▼                ▼
  ┌───────────┐    ┌─────────────────┐
  │   User    │    │ Airline website │
  └─────┬─────┘    └────────┬────────┘
        │                   │
        └─────────┬─────────┘
                  │ task success / failure
                  ▼
┌─────────────────────────────────┐
│       Orchestration Agent       │
│        (report to user)         │
└─────────────────────────────────┘

Week 9: Project Showcase

Core content

Project integration and showcase

  • Integration capability: Combine skills from the first 8 weeks (RAG, tool calling, speech, multimodal, Multi-Agent) into a final project
  • Results presentation: Each participant will have the opportunity to showcase their unique general-purpose Agent and share the thinking and challenges during creation
  • Peer review: Through mutual demos and Q&A, gain inspiration and ideas from classmates’ projects

Book polishing and summary

  • Knowledge consolidation: Together review and summarize the core knowledge points of the 9 weeks and solidify them into the final manuscript of the book “In-Depth yet Simple AI Agent”
  • Co-creating content: Propose edits to the manuscript and polish it together to ensure it is “systematic and practical”
  • Credited publication: All participants who co-create will have their names appear in the final printed book

Practical case: Showcase your unique general-purpose Agent

Goal: Provide a comprehensive summary and demo of the personal Agent project built during the camp. This is not only a results report, but also an exercise in systematizing what you learned and clearly explaining complex technical solutions to others.

Presentation highlights:

  • Agent positioning: What core problem does your Agent solve?
  • Technical architecture: How did you combine what you learned (context, RAG, tools, multimodal, Multi-Agent) to achieve the goal?
  • Innovation highlights: What is the most creative design in your Agent?
  • Demo: Live demo of the Agent’s core capabilities
  • Future outlook: How do you plan to continue iterating and improving your Agent?

Final project architecture example:

┌──────────────┐
│     User     │
│ (voice/text) │
└──────┬───────┘
       │
       ▼
┌─────────────────────────────────────┐
│              Main Agent             │
└──────┬──────────────────────┬───────┘
       │                      │
       ▼                      ▼
┌────────────────────────┐  ┌──────────────────────────┐
│   Core capabilities    │  │    Expert Agent team     │
├────────────────────────┤  ├──────────────────────────┤
│ • Context & memory     │  │ • Deep-research Agent    │
│ • Tool-calling engine  │  │ • Coding Agent           │
│   └─► external APIs    │  │ • Phone Agent            │
│ • RAG knowledge base   │  │ • Computer-use Agent     │
└────────────────────────┘  └──────────────────────────┘

Advanced topic: Four ways an Agent learns from experience

1. Rely on long-context capability

  • Idea: Trust and leverage the model’s own long-context processing ability, providing the complete, uncompressed conversation history as input
  • Implementation:
    • Keep recent conversation: Fully retain the recent interaction history (Context Window)
    • Compress long-term memory: Use Linear Attention to automatically compress distant conversation history into the latent space
    • Extract key snippets: Use Sparse Attention to automatically extract segments most relevant to the current task from distant conversation history
  • Pros: Easiest to implement, preserves original information detail to the greatest extent
  • Cons: Strongly dependent on model capability

2. Extract in text form (RAG)

  • Idea: Summarize experience into natural language and store it in a knowledge base
  • Implementation: Retrieve relevant experience text via RAG and inject it into the prompt
  • Pros: Cost-controllable, knowledge is readable and maintainable
  • Cons: Depends on retrieval accuracy
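A toy sketch of approach 2, using word overlap in place of a real embedding model; the lessons and task strings are made-up examples:

```python
def embed(text: str) -> set[str]:
    """Toy 'embedding': bag of lowercase words; real systems use a vector model."""
    return set(text.lower().split())

class ExperienceStore:
    """Summarize past episodes as text and retrieve the most relevant ones
    into the prompt: a minimal RAG over experience."""
    def __init__(self):
        self.entries: list[str] = []

    def add(self, lesson: str) -> None:
        self.entries.append(lesson)

    def retrieve(self, task: str, k: int = 2) -> list[str]:
        query = embed(task)
        ranked = sorted(self.entries,
                        key=lambda e: len(embed(e) & query), reverse=True)
        return ranked[:k]

store = ExperienceStore()
store.add("When scraping paper PDFs, request the abstract page first.")
store.add("Weather API fails without a city ID; resolve the name first.")
hints = store.retrieve("check weather for Beijing via the weather API")
prompt = "Relevant past experience:\n" + "\n".join(hints)
```

The retrieval step is exactly where this approach's weakness lives: a lesson that is relevant but phrased differently from the task will rank low.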

3. Post-training (SFT/RL)

  • Idea: Learn the experience into the model weights
  • Implementation: Use high-quality Agent behavior trajectories as data to fine-tune (SFT) or reinforcement-train (RL) the model
  • Pros: Internalizes experience as the model’s “intuition”, suitable for complex tasks with strong generalization
  • Cons: Higher cost, requires lots of high-quality data; longer cycles, hard to realize a real-time feedback loop—i.e., examples that just failed online won’t immediately prevent similar mistakes

4. Abstract into code (tools/Sub-agent)

  • Idea: Abstract recurring successful patterns into a reusable tool or Sub-agent
  • Implementation: The Agent identifies automatable patterns and writes code to solidify them
  • Pros: Reliable and efficient learning method
  • Cons: Requires strong coding ability from the Agent; as tool count grows, tool selection becomes a challenge

Advanced practice: Compare the four ways an Agent learns from experience

Goal: Using the evaluation framework we built in Week 6, design experiments to compare the pros and cons of the four learning-from-experience approaches for Agents.

Key challenges:

  • Experimental design: How to design a set of tasks that clearly highlights the differences among the four learning methods?
  • Cost-performance tradeoff: In the evaluation report, how to combine each method’s “performance score” with its “computational cost” for a holistic assessment?
  • Scenario-based analysis: Draw conclusions about which learning method to prioritize under which task scenarios?
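One illustrative way to fold cost into the evaluation is a weighted composite score; the weights and all the numbers below are made-up placeholders, not measurements:

```python
def composite_score(quality: float, cost_usd: float, latency_s: float,
                    w_cost: float = 0.2, w_latency: float = 0.1) -> float:
    """Penalize a 0-1 quality score (e.g. from LLM as a Judge) by cost and
    latency; the weights are assumptions to be tuned per scenario."""
    return quality - w_cost * cost_usd - w_latency * (latency_s / 60)

# Made-up placeholder numbers for the four methods (not real data).
methods = {
    "long-context": {"quality": 0.78, "cost_usd": 1.20, "latency_s": 40},
    "rag":          {"quality": 0.74, "cost_usd": 0.15, "latency_s": 12},
    "post-train":   {"quality": 0.85, "cost_usd": 3.00, "latency_s": 25},
    "tools":        {"quality": 0.80, "cost_usd": 0.30, "latency_s": 18},
}
ranking = sorted(methods, key=lambda m: composite_score(**methods[m]),
                 reverse=True)
```

Shifting `w_cost` and `w_latency` is the scenario-based analysis: a latency-sensitive voice Agent and an offline research Agent will rank the four methods differently.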

Architecture design:

┌─────────────────┐
│ Evaluation task │
└────────┬────────┘
         │
         ▼
┌────────────────────────────┐
│    Evaluation framework    │
└──┬───────┬──────┬───────┬──┘
   │       │      │       │
   ▼       ▼      ▼       ▼
┌─────┐ ┌─────┐ ┌─────┐ ┌─────────────┐
│Long │ │RAG  │ │Post-│ │  Tools and  │
│ctx  │ │     │ │train│ │ sub-agents  │
└──┬──┘ └──┬──┘ └──┬──┘ └──────┬──────┘
   │       │       │           │
   └───────┴───┬───┴───────────┘
               │
               ▼
   ┌───────────────────────┐
   │ Performance/cost data │
   └──────────┬────────────┘
              │
              ▼
   ┌───────────────────────┐
   │    LLM as a Judge     │
   └──────────┬────────────┘
              │
              ▼
   ┌─────────────────────────────┐
   │ Comparative analysis report │
   └─────────────────────────────┘

Summary recap

Through 9 weeks of systematic study and practice, we completed the full journey from getting started with Agents to building general-purpose intelligent agents:

Core competencies mastered

  1. Agent architecture understanding: Deeply understood the core design paradigm of LLM + context + tools
  2. Context engineering mastery: Mastered multi-level context management techniques
  3. Tooling system construction: Implemented reliable integrations with external APIs and MCP Server
  4. Multimodal interaction: Built voice and vision multimodal Agents
  5. Collaboration pattern design: Implemented complex collaboration patterns such as Multi-Agent and Orchestration

Practical project portfolio

  • Web-connected search Agent
  • Legal Q&A Agent
  • In-depth research Agent
  • Agent development engineer Agent
  • Real-time voice call Agent
  • Multi-Agent collaborative system

Advanced technical exploration

  • Context compression and optimization
  • Four ways to learn from experience
  • Parallel sampling and sequential revision
  • Fast and slow thinking architecture
  • An Agent’s self-evolution

🚀 Start building your own AI Agent right here!


Original Slidev Markdown

Slides link
