Bojie Li
2025-07-10
The Translated Work “Illustrated DeepSeek Technology” by Meng Jiaying and Me Is About to Be Released
2025-07-10
The AI Agent Hackathon at UCAS in February 2025 was a great success. Building on that, I will host two more AI Agent practice sessions: July 27-30, 2025 at Zhongguancun AI Academy, and July 31 to August 4, 2025 at UCAS.
Many thanks to Professor Zheng Shuxin, Vice Dean of Zhongguancun AI Academy, and Professor Liu Junming of UCAS for inviting me to host these two AI Agent practice sessions.
All topics in this AI Agent practice will take you deep into the cutting-edge technologies for building the next generation of AI Agents. You will have the opportunity to practice:
- Multimodal models and thinking model applications: Build the “brain” of the agent with industry-leading multimodal models and thinking models such as Gemini 2.5 Pro and Claude 4 Sonnet.
- Real-time voice interaction: Integrate VAD, ASR, LLM, and TTS technology stacks to create real-time voice agents capable of streaming conversations (a structural sketch follows this list).
- Autonomous GUI operation: Develop agents that can stably operate browsers and other GUIs to complete complex real-world tasks.
- Advanced Agent Architecture: Explore advanced architectures such as “fast and slow thinking,” “thinking while listening,” and multi-agent collaboration to equip agents with both real-time response and deep thinking capabilities.
- Learning from experience: Build agents that can learn from experience, allowing them to become more proficient with repeated tasks.
- Identifying authoritative information sources: Enable agents to accurately identify and adopt high-credibility information such as official documents and academic papers from vast amounts of information.
- Autonomous tool invocation and creation: Enable agents not only to use existing tools but also to autonomously learn and create new tools to solve open-ended problems.
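To make the voice topic above concrete, here is a minimal structural sketch of the VAD → ASR → LLM → TTS loop, written as ordinary Python. All four stage functions are hypothetical placeholders, not any particular vendor’s API; the point is only the pipeline shape, in which reply audio starts streaming before the LLM finishes generating.

```python
from typing import Iterator

# Hypothetical placeholders for the four stages; a real agent would back
# them with streaming VAD/ASR/TTS services and an LLM API.
def detect_speech(chunk: bytes) -> bool:
    """VAD: decide whether an audio chunk contains speech."""
    return len(chunk) > 0  # stand-in for a real voice-activity detector

def transcribe(speech: bytes) -> str:
    """ASR: turn buffered speech audio into text."""
    return "<user utterance>"

def generate_reply(text: str) -> Iterator[str]:
    """LLM: stream reply tokens for the transcribed utterance."""
    yield from ["Hello", ", ", "how can I help?"]

def synthesize(token: str) -> bytes:
    """TTS: turn a text fragment into audio bytes."""
    return token.encode()

def voice_agent_loop(mic: Iterator[bytes]) -> Iterator[bytes]:
    """Buffer speech until silence, then stream the spoken reply."""
    buffer = b""
    for chunk in mic:
        if detect_speech(chunk):
            buffer += chunk
        elif buffer:  # silence marks the end of the utterance
            for token in generate_reply(transcribe(buffer)):
                yield synthesize(token)  # speak while tokens keep arriving
            buffer = b""

# Example run: two speech chunks followed by a silent chunk.
for audio_out in voice_agent_loop(iter([b"\x01", b"\x02", b""])):
    print(audio_out)
```

A production agent would run these stages concurrently rather than sequentially, so that TTS audio reaches the caller with sub-second latency.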
Suggestions for AI-assisted programming: In this AI Agent practice, we encourage everyone to use AI-assisted programming, which means “developing agents with agents.” We recommend using Cursor for Vibe Coding, and here are some suggestions:
- Documentation first, code later: First, let Cursor write the design document. Your role is to provide improvement suggestions for the AI-generated design document and iterate with the AI until satisfied. Then, let Cursor write the code according to the final design document. During coding, always keep the design document in the agent’s context as a reference.
- Choose the right model: Do not use Cursor’s “auto” mode; be sure to choose a model with thinking ability (with a brain icon next to it), such as Claude 4 Sonnet.
- Test-driven: Be sure to have AI write and execute test cases for its code to ensure code quality.
Feel free to form teams and choose any one of the following topics to start your creative journey!
2025-07-10
The AI Agent Practical Course is a hands-on course conducted by Professor Liu Junming from UCAS and myself. The first session in 2024 had over 50 participants, and the second session in 2025 had over 100 participants. The 2025 Spring AI Agent Practical Course took place in early February 2025 in Beijing.
- Click here to see some of the practical results from the UCAS 2025 AI Agent Practical Course
- Click here to view reference project code (note, not complete code, for reference only)
Course Directory:
2025-06-12
[This article is based on my invited talk at the A2M Internet Architecture and AI Technology Summit, Turing Large Model Technology Session.]
Hello everyone, welcome to the A2M Summit. Today, I will be sharing on the topic “Effective Agents: Real-Time Interaction with the Environment, Learning from Experience.”
Let me first introduce myself. I am the co-founder and chief scientist of Pine AI.
Currently, our business at Pine AI is to help users handle daily chores and disputes by making phone calls through AI. In the U.S., calling customer service is often a hassle. For instance, you might have to wait for half an hour and then spend a long time communicating with the service representative. If the representative is unwilling to help, you might be transferred to another department. So, the entire process can sometimes take one or two hours. Many people don’t have the time to argue with customer service, and sometimes they just have to accept the loss. Additionally, some people struggle with English, making phone communication difficult. Pine can automate this entire process through AI.
Making today’s AI capable of handling tasks end-to-end is extremely challenging; it is not as simple as applying a SOTA model with a prompt. Most AI products only provide users with information, such as generating a research report, while the user still has to contact customer service themselves to actually get the task done.
Today, we will discuss the core technical challenges of enabling AI Agents to complete tasks end-to-end, and how Pine AI addresses them.
2025-04-28
```python
#!/usr/bin/env python3
```
3. Configure Windows to Use Local DNS
- Open “Control Panel” -> “Network and Sharing Center” -> “Change adapter settings”
- Right-click the network connection currently in use and select “Properties”
- Double-click “Internet Protocol Version 4 (TCP/IPv4)”
- Select “Use the following DNS server addresses” and enter 127.0.0.1
4. Start the DNS Server
Run in the command prompt:

```
python C:\SmartDNS\standalone_smart_dns.py
```

Conclusion
With the approach described in this article, you can set up a lightweight smart DNS splitting system locally that effectively mitigates DNS pollution and automatically selects the best resolution result for your network environment. It works not only on macOS but also on Windows, providing a flexible cross-platform solution.
3. Create a Windows Service
To run a Python script as a Windows service, we need to use the NSSM (Non-Sucking Service Manager) tool:
- Download NSSM
- Extract it to a suitable directory, such as C:\SmartDNS\nssm
- Run the command prompt as an administrator, then execute:
```
cd C:\SmartDNS\nssm\win64
nssm.exe install SmartDNS
```

In the configuration window that appears, set:
- Path: C:\Windows\System32\python.exe
- Startup directory: C:\SmartDNS
- Arguments: C:\SmartDNS\standalone_smart_dns.py
- Set the service display name and description in the “Details” tab
- Choose “Local System account” in the “Log on” tab
- Click “Install service”
Note: In Windows, port 53 is a privileged port, and the Python script needs to run with administrator privileges to bind to this port. The above service setup using “Local System account” can solve this issue. If you still encounter permission issues, consider changing the listening port in the script to a non-privileged port (such as 5353), and then use firewall rules to forward requests from UDP port 53 to 5353.
- Start the service:

```
sc start SmartDNS
```
4. Configure the System to Use Local DNS
- Open Control Panel > Network and Internet > Network and Sharing Center
- Click on the current active network connection
- Click “Properties”
- Select “Internet Protocol Version 4 (TCP/IPv4)” and click “Properties”
- Choose “Use the following DNS server addresses”
- Set the preferred DNS server to “127.0.0.1”
- Click “OK” to save the settings
5. Verification Test
Execute in the command prompt:
```
nslookup baidu.com 127.0.0.1
```
If the settings are correct, the domain names should resolve correctly.
6. Restore Default Settings
If you need to restore the default settings:
Stop the DNS Service:

```
sc stop SmartDNS
sc delete SmartDNS
```

Restore Default DNS Settings:
- Open Control Panel > Network and Internet > Network and Sharing Center
- Click on the current active network connection
- Click “Properties”
- Select “Internet Protocol Version 4 (TCP/IPv4)” and click “Properties”
- Choose “Obtain DNS server address automatically”
- Click “OK” to save the settings
Conclusion
The combination of local anti-pollution DNS and Layer 3 Tunnel provides us with an elegant solution that avoids DNS pollution issues while ensuring the best speed for accessing both domestic and international websites. This solution is particularly suitable for users who need to access both domestic and international resources simultaneously.
When you use both local anti-pollution DNS and a Layer 3 tunnel (configured with regional routing), you will gain the following advantages:
- Pollution-free Resolution: All domain names can obtain the correct IP address
- Efficient Access:
  - DNS queries for domestic websites go directly through the local network, obtaining the IP most suitable for your network environment
  - DNS queries for international websites go through the tunnel, avoiding pollution and obtaining CDN IPs close to the tunnel exit
  - Domestic websites connect directly, so access is fast
  - International websites go through the tunnel, so access is stable and reliable
- Fully Automatic Traffic Splitting: The system automatically determines which route to take, without the need to manually switch DNS or proxies
2025-04-27
[Thank you to all the readers who sent in over 50 corrections! Readers were truly meticulous to have found so many errors, and I am very grateful!]
My translation “Illustrated Large Models - Principles and Practice of Generative AI” (Hands-On Large Language Models) was released in May 2025. You can search for “Illustrated Large Models” on platforms like JD.com and Taobao.
Praise for the Book (Chinese Edition)
Many thanks to Yuan Jinhui, founder of SiliconFlow; Zhou Lidong, director of Microsoft Research Asia; Lin Junyang, head of the Qwen algorithm team at Alibaba; Li Guohao, founder of the CAMEL-AI.org community; and Zhong Tai, founder of AgentUniverse, for their wholehearted recommendations!
Translator’s Preface
The development of large models is rapid; as the saying goes, “a day in AI is like a year in the human world.” Many people find themselves lost in the flourishing garden of models, unsure of which model to use for their application scenarios and unable to predict the direction models will take in the coming year, often feeling anxious. In fact, almost all large models today are based on the Transformer architecture; however much they vary, they all stem from the same origin.
This book, “Illustrated Large Models,” is an excellent resource to help you systematically understand the basic principles and capability boundaries of Transformers and large models. When Turing Company approached me to translate this book, I immediately agreed upon seeing the author’s name, as it was Jay Alammar’s blog post “The Illustrated Transformer” that truly helped me understand Transformers (Chapter 3 of this book is an expansion of that blog post). Although there are countless books and articles explaining large models on the market today, the exquisite illustrations and the depth of explanation in this book are rare. The book starts with tokens and embeddings, not limited to generative models, but also includes representation models that many overlook. Additionally, the book covers practical content such as text classification, text clustering, prompt engineering, RAG, and model fine-tuning.
I am very honored to be the translator of this book, working with editor Liu Meiying to bring this book to Chinese readers.
Taking some time to read this book and systematically understanding the basic principles and capability boundaries of Transformers and large models is like having a map and compass on an adventure journey through large models. This way, we won’t worry about newly released models rendering long-term engineering accumulation useless overnight, and we can develop products for future models. Once the model capabilities are ready, the product can scale up immediately.
I hope this book can become a sightseeing bus in the garden of large models, allowing more people to see the full view of large models. Thus, the ever-expanding capability boundaries of large models become a visual feast rather than a monster devouring everything; we have the opportunity to stand at the forefront of AI, realize more dreams, and gain more freedom.
2025-04-27
This article is companion material for the book Hands-On Large Language Models.
When interviewing candidates and attending industry seminars, I often find that many people have extensive practical experience but know very little about the basic principles of models. To help everyone better understand this book, and to facilitate those who need to prepare for interviews to read this book more purposefully, I have systematically compiled common interview questions in the field of large models around the themes of each chapter of this book. Most of the answers to these questions can be found directly in the book, while some advanced questions can be answered from the references in this book or the latest papers on the internet. I hope all readers can read this book with these questions in mind.
Chapter 1: Introduction to Large Language Models
- What is the difference between the encoder and decoder in Transformer, and are models with only an encoder or only a decoder useful?
- What are the differences between GPT and the model architecture in the original Transformer paper?
- What are the advantages and disadvantages of encoder-only (BERT-like), decoder-only (GPT-like), and full encoder-decoder architectures?
- Why is the self-attention mechanism of Transformer considered a significant advancement over the attention mechanism in early RNNs?
- Why do large language models have the concept of maximum context length? Why does it refer to the total length of input and output?
- How are the first token latency, input throughput, and output throughput of large language models calculated? What are the requirements for first token latency, input, and output throughput in different application scenarios? (A small worked example follows this list.)
- Why is the two-step paradigm of pre-training and fine-tuning so important? What core capabilities does the foundational model acquire through pre-training? What role does fine-tuning play in guiding the model to follow instructions, answer questions, and align with human values?
- How does LLaMA-3 8B achieve comprehensive capabilities stronger than LLaMA-1 70B?
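As a quick worked illustration of the serving metrics in the question above (toy numbers of my own, not from the book): first token latency is the time from sending a request to receiving the first output token, input throughput is prompt tokens divided by prefill time, and output throughput is generated tokens divided by decode time.

```python
# Toy numbers for illustration only.
prompt_tokens = 2048
output_tokens = 256
t_request, t_first_token, t_last_token = 0.0, 0.5, 8.5  # seconds

first_token_latency = t_first_token - t_request                      # 0.5 s
input_throughput = prompt_tokens / (t_first_token - t_request)       # 4096 tokens/s (prefill)
output_throughput = output_tokens / (t_last_token - t_first_token)   # 32 tokens/s (decode)
print(first_token_latency, input_throughput, output_throughput)
```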
2025-04-25
When building a cross-regional server network, such as the VLESS connection used in the article “Building a Three-Layer Tunnel with Full US IP, No Manual Proxy Setup Required,” we often encounter an efficiency issue: the congestion control mechanism of the TCP protocol itself. Although TCP congestion control is crucial for the public internet, in tunnel scenarios where the application layer protocol is already encapsulated (and may have its own flow control or congestion handling), the outer layer TCP congestion control becomes a burden.
Why Disable TCP Congestion Control and Nagle in Tunnels?
- TCP-over-TCP Problem: When you transmit data of one TCP connection (e.g., VLESS over TCP) inside another TCP connection, the so-called “TCP-over-TCP” problem arises. Both the inner and outer TCP have their own congestion control and retransmission mechanisms. When packet loss occurs, both layers of TCP will attempt retransmission and reduce the congestion window. This dual processing is not only redundant but can also lead to a sharp decline in performance, especially on high-latency, high-packet-loss international links. The retransmission timer of the inner TCP may trigger prematurely due to the delay and retransmission of the outer TCP, and vice versa, forming a vicious cycle. Additionally, TCP-over-TCP can cause severe Head-of-Line Blocking issues: a lost packet in the outer TCP will block all data of the inner connections it contains, even if these inner connections are completely unrelated. This means that a connection issue of one user may affect other users sharing the same tunnel.
- Application Layer Flow Control: The application layer protocol transmitted in the tunnel may already have its own flow control and reliability mechanisms. In this case, the congestion control of the underlying TCP is completely redundant, and it will only interfere with the normal operation of the upper-layer protocol, limiting its performance potential.
- Nagle Algorithm Delay: The Nagle algorithm aims to reduce the number of small packets in the network by aggregating small TCP packets into a larger one, thereby improving network utilization. However, in tunnel scenarios, we usually want data to be transmitted through the tunnel as quickly as possible, especially for interactive applications (like SSH) or applications with high real-time requirements. The delay introduced by the Nagle algorithm may negatively impact these applications. Disabling Nagle (via the TCP_NODELAY option) allows small packets to be sent immediately, reducing latency.
- UDP’s Dilemma on the Public Internet: You might wonder, if TCP has so many issues, why not use UDP to establish tunnel connections directly? Unfortunately, UDP on the public internet, especially on international links, is often subject to ISP QoS (Quality of Service) policies, has lower priority, and is more likely to be dropped or throttled, leading to unstable connections. Therefore, in many cases, we have to choose TCP as the tunnel transport layer protocol, which requires us to find ways to optimize TCP’s behavior.
Therefore, for tunnel connections between servers (especially cross-regional connections), disabling the outer layer TCP’s congestion control and Nagle algorithm can significantly improve the tunnel’s throughput and response speed.
Solution: A Script
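The script itself is not included in this excerpt, but as a minimal sketch of the idea (my own illustration, not the author’s actual script): on Linux, Nagle is disabled per socket with TCP_NODELAY, and while TCP congestion control cannot be switched off outright, the per-socket algorithm can be swapped via TCP_CONGESTION. The choice of "bbr" below is an assumption; any algorithm loaded in the kernel works.

```python
import socket

def tune_tunnel_socket(sock: socket.socket) -> None:
    """Apply the two tweaks discussed above to an outer tunnel TCP socket."""
    # Disable Nagle so small packets (e.g. interactive traffic inside the
    # tunnel) are sent immediately instead of being batched.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Linux only: change the congestion control algorithm for this socket.
    # "bbr" is an assumption here; see
    # /proc/sys/net/ipv4/tcp_available_congestion_control for what is loaded.
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
    except (AttributeError, OSError):
        pass  # non-Linux platform or algorithm unavailable: keep the default
```

A tunnel client would call this right after connect(); the same congestion control change can also be applied system-wide via sysctl net.ipv4.tcp_congestion_control.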
2025-04-01
[This article is based on my keynote speech at the 2025 China Generative AI Conference. The content is the result of a 2-hour brainstorming session with AI, followed by 3 hours of collaborative work with AI in Cursor for refinement.]
Summary: Some teams have found that the efficiency gains from applying AI in programming and writing are not as significant as expected. The reason often lies in the fact that a lot of knowledge exists only in the minds of specific employees and is not documented. As a result, AI Agents, like new interns, find it difficult to write code, and even if they manage to, they do not know how to test it. Another reason is that internal tools like project management systems can only be operated through GUIs, which are not AI Agent-friendly. Today’s text reasoning models have reached human-level capabilities; when they fail to complete a task, it is often due to a lack of background knowledge and AI-friendly tools.
We will discuss how to build an AI-native team that is friendly to AI Agents from the perspectives of software development, project management, and operations. An AI-native team needs to use recorded voice and written communication as much as possible, like an open-source community, to reduce reliance on individuals. AI Agents need to access various internal company tools through MCP, have enough context information, and a test environment to work efficiently. AI Agents need memory compression mechanisms, reflection mechanisms, and checkpoint rollback mechanisms to work continuously overnight without human intervention, making useful progress every hour. AI employees also need to actively communicate with human employees and other AI employees. This way, human employees can spend most of their time thinking and discussing, while most repetitive execution work is handed over to AI.
Below is the full text of the speech. (The slides are the version used at the 2025 China Generative AI Conference; the text is not a verbatim transcript but an expanded version produced through brainstorming with AI.)
Cover Page
2025-03-14
Live Theme: AI Agent, Destined to Explode?!
Time: March 13, 2025, 20:00-22:00
Method: GeekPark WeChat Video Channel “Tonight’s Tech Talk” Live Broadcast (with guests)
Live Guests:
- Jingyu | Deputy Editor of GeekPark
- Li Bojie | Chief Scientist of PINE AI
- Wanchen | Reporter at GeekPark
Key Highlights Summary
- The core features of AI Agents are the abilities to perceive, plan, and act, enabling them to autonomously gather information, make plans, and execute actions.
- General Agents like Manus will mimic “geek programmers” rather than ordinary people, possessing computational thinking and knowing when to use code and tools to solve problems.
- Current AI Agents are mainly divided into compiled types (like Dify) and interpreted types (like Manus), with compiled types having fixed workflows and interpreted types autonomously planning and making decisions.
- Compiled Agents and interpreted Agents will coexist for a long time rather than replace each other, with different scenarios having different optimal solutions.
- There is a “100x cost law” for large models: chip companies earn 10 times, and large model companies earn another 10 times, revealing the huge gap between model pricing and actual costs.
- Foundational models are key to enhancing the capabilities of general Agents, and humans find it hard to imagine something 10 times smarter than themselves, so human thinking should not be imposed on AI.
- Manus emphasizes “Less Structure, More Intelligence,” similar to the classic “The Bitter Lesson,” where the fewer structural constraints humans impose on AI, the higher the AI’s capability ceiling.
- New generation models like Claude 3.7 Sonnet have made significant breakthroughs in tool usage and programming capabilities, laying the foundation for Agent development.
- The open-source release of DeepSeek R1 makes RL (reinforcement learning) technology more accessible, lowering the threshold for developing high-quality Agents.
- RL training is an important means of building competitive barriers, converting industry experience and expertise into model capabilities.
- The computational power threshold required for RL training is not as high as imagined, and small models trained with RL can surpass large models in some vertical domains.
- Multi-agent architectures are not suitable for all scenarios and may replicate inefficient collaboration models found in human organizations in fields like software development.
- AI programming tools can also play a significant role in large software engineering projects but require a high-quality code engineering foundation, including comprehensive documentation, test cases, and standardized interfaces.
- AI programming tools struggle with “spaghetti code” for the same reason new interns find it hard to take over—there’s too much undocumented tribal knowledge in the code.
- The development of Agent technology will drive improvements in software engineering practices, enhancing code quality and maintainability to meet the standards of well-known open-source projects, making more projects AI-friendly.
- The MCP protocol proposed by Anthropic provides a standardized solution for the interconnection of the Agent ecosystem, allowing diverse professional services to connect rather than replace each other.
- OpenAI’s Responses API, Realtime API, and Anthropic’s MCP represent the direction of Agent framework development.
- The work efficiency of Agents is currently limited by the latency of visual models, with humans still having an advantage in certain operational speeds.
- Virtual machine sandboxes can provide independent working environments but require better personal data integration solutions.
- In the future, AI Agents may be divided into “fast thinking” (user interaction) and “slow thinking” (background processing) parts working together.
- General Agents are a battleground for hardware and operating system giants, but large companies will be relatively cautious in releasing products.
- Opportunities for startups in the Agent field mainly lie in vertical domains, accumulating professional data and industry knowledge through deep cultivation of specific scenarios.
- Programming, education, and interpersonal communication are the three fields most likely to see mature Agent applications first.