Bojie Li
2025-06-12
[This article is based on the author’s invited talk at the A2M Internet Architecture and AI Technology Summit Turing Large Model Technology Session.]
Hello everyone, welcome to the A2M Summit. Today, I will be sharing on the topic “More Human-like Agents: Real-time Interaction with the Environment, Learning from Experience.”
Let me first introduce myself. I am the co-founder and chief scientist of Pine AI.
Currently, Pine AI's business is helping users handle daily chores and disputes through AI phone calls. In the U.S., calling customer service is often a hassle: you might wait half an hour on hold and then spend a long time talking with a representative, and if that representative is unwilling to help, you may be transferred to yet another department, so the whole process can take one to two hours. Many people don't have the time to argue with customer service and sometimes simply swallow the loss. Others struggle with English, which makes phone communication difficult. Pine automates this entire process with AI.
Today, we will discuss some core challenges in deploying AI Agents and how Pine AI addresses these issues.
2025-04-28
#!/usr/bin/env python3
3. Configure Windows to Use Local DNS
- Open “Control Panel” > “Network and Sharing Center” > “Change adapter settings”
- Right-click the network connection currently in use and select “Properties”
- Double-click “Internet Protocol Version 4 (TCP/IPv4)”
- Select “Use the following DNS server addresses” and enter
127.0.0.1
4. Start the DNS Server
Run in the command prompt:
python C:\SmartDNS\standalone_smart_dns.py
Conclusion
With the solution described in this article, users can set up a lightweight smart DNS splitting system locally, effectively solving DNS pollution problems and automatically selecting the best resolution result for the current network environment. The solution works not only on macOS but can also be implemented on Windows, making it a flexible cross-platform option.
3. Create a Windows Service
To run a Python script as a Windows service, we need to use the NSSM (Non-Sucking Service Manager) tool:
Download NSSM
Extract it to a suitable directory, such as
C:\SmartDNS\nssm
Run the command prompt as an administrator, then execute:
cd C:\SmartDNS\nssm\win64
nssm.exe install SmartDNS
In the configuration window that appears, set:
- Path:
C:\Windows\System32\python.exe
- Startup directory:
C:\SmartDNS
- Arguments:
C:\SmartDNS\standalone_smart_dns.py
- Set the service display name and description in the “Details” tab
- Choose “Local System account” in the “Log on” tab
- Click “Install service”
Note: In Windows, port 53 is a privileged port, and the Python script needs to run with administrator privileges to bind to this port. The above service setup using “Local System account” can solve this issue. If you still encounter permission issues, consider changing the listening port in the script to a non-privileged port (such as 5353), and then use firewall rules to forward requests from UDP port 53 to 5353.
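For reference, here is a minimal sketch of what the listening side of such a script might look like with a configurable port. This is an illustration only; the actual standalone_smart_dns.py may be structured differently.

import socket

LISTEN_PORT = 5353  # change to 53 when running with sufficient privileges

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", LISTEN_PORT))
print(f"DNS server listening on 127.0.0.1:{LISTEN_PORT}")

while True:
    data, addr = sock.recvfrom(512)  # classic DNS datagrams fit in 512 bytes
    # The real resolver logic (upstream queries, pollution filtering)
    # would go here; echoing the query back is only a placeholder.
    sock.sendto(data, addr)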
- Start the service:
sc start SmartDNS
4. Configure the System to Use Local DNS
- Open Control Panel > Network and Internet > Network and Sharing Center
- Click on the current active network connection
- Click “Properties”
- Select “Internet Protocol Version 4 (TCP/IPv4)” and click “Properties”
- Choose “Use the following DNS server addresses”
- Set the preferred DNS server to “127.0.0.1”
- Click “OK” to save the settings
5. Verification Test
Execute in the command prompt:
nslookup baidu.com 127.0.0.1
If the settings are correct, the domain names should resolve correctly.
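You can also verify programmatically. A quick check with dnspython (an extra dependency, installed via pip install dnspython) might look like this:

import dns.resolver  # pip install dnspython

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["127.0.0.1"]  # query the local smart DNS directly

for name in ("baidu.com", "google.com"):
    answer = resolver.resolve(name, "A")
    print(name, [rr.address for rr in answer])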
6. Restore Default Settings
If you need to restore the default settings:
Stop the DNS Service:
sc stop SmartDNS
sc delete SmartDNS
Restore Default DNS Settings:
- Open Control Panel > Network and Internet > Network and Sharing Center
- Click on the current active network connection
- Click “Properties”
- Select “Internet Protocol Version 4 (TCP/IPv4)” and click “Properties”
- Choose “Obtain DNS server address automatically”
- Click “OK” to save the settings
Conclusion
The combination of local anti-pollution DNS and Layer 3 Tunnel provides us with an elegant solution that avoids DNS pollution issues while ensuring the best speed for accessing both domestic and international websites. This solution is particularly suitable for users who need to access both domestic and international resources simultaneously.
When you use both local anti-pollution DNS and a Layer 3 tunnel (configured with regional routing), you will gain the following advantages:
- Pollution-free Resolution: All domain names can obtain the correct IP address
- Efficient Access:
  - DNS queries for domestic websites go directly through the local network, obtaining the IP most suitable for your network environment
  - DNS queries for international websites go through the tunnel, avoiding pollution and obtaining CDN IPs close to the tunnel exit
  - Domestic websites connect directly, so access is fast
  - International websites go through the tunnel, so access is stable and reliable
- Fully Automatic Traffic Splitting: The system automatically determines which route to take, without the need to manually switch DNS or proxies
2025-04-27
My translation “Illustrated Large Models: Principles and Practice of Generative AI” (the Chinese edition of Hands-On Large Language Models) has finally gone to print and will be available in mid-May.
Praise for the Book (Chinese Edition)
Many thanks to Yuan Jinhui, founder of SiliconFlow; Zhou Lidong, director of Microsoft Research Asia; Lin Junyang, head of algorithms for Alibaba's Qwen; Li Guohao, founder of the CAMEL-AI.org community; and Zhong Tai, founder of AgentUniverse, for their strong recommendations!
Translator’s Preface
Large models are developing at a breakneck pace; as the saying goes, “One day in AI, one year in the human world.” Many people are lost in the flourishing garden of models, unsure which model to use for their application scenario, unable to predict where models will head in the coming year, and often anxious as a result. In fact, almost all large models today are based on the Transformer architecture, whose fundamentals have remained unchanged.
The book “Illustrated Large Models” is an excellent resource to help you systematically understand the basic principles and capability boundaries of Transformers and large models. When Turing Company approached me to translate this book, I immediately agreed upon seeing the author’s name, as it was Jay Alammar’s blog post “The Illustrated Transformer” that truly helped me understand Transformers (Chapter 3 of this book is an expansion of that blog post). Although there are countless books and articles explaining large models on the market, the exquisite illustrations and the depth and simplicity of the explanations in this book are rare. The book starts with tokens and embeddings, not limited to generative models, but also includes representation models that many overlook. Additionally, the book covers practical content such as text classification, text clustering, prompt engineering, RAG, and model fine-tuning.
I am very honored to be the translator of this book, working with editor Liu Meiying to bring this book to Chinese readers.
Taking some time to read this book and systematically understand the basic principles and capability boundaries of Transformers and large models is like carrying a map and compass on an adventure through the world of large models. This way, we won't worry about new models rendering long-term engineering accumulation useless overnight, and we can build products for future models: once model capabilities are ready, the product can scale up immediately.
I hope this book can become a sightseeing bus in the garden of large models, allowing more people to see the panorama of large models. Thus, the ever-expanding capability boundaries of large models become a visual feast rather than a monster devouring everything; we have the opportunity to stand at the forefront of AI, realize more dreams, and gain more freedom.
2025-04-27
This article is a companion material for the book Hands-On Large Language Models.
When interviewing candidates and attending industry seminars, I often find that many people have extensive practical experience but know very little about the basic principles of models. To help everyone better understand this book, and to facilitate those who need to prepare for interviews to read this book more purposefully, I have systematically compiled common interview questions in the field of large models around the themes of each chapter of this book. Most of the answers to these questions can be found directly in the book, while some advanced questions can be answered from the references in this book or the latest papers on the internet. I hope all readers can read this book with these questions in mind.
Chapter 1: Introduction to Large Language Models
- What is the difference between the encoder and decoder in Transformer, and are models with only an encoder or only a decoder useful?
- What are the differences between GPT and the model architecture in the original Transformer paper?
- What are the advantages and disadvantages of encoder-only (BERT-like), decoder-only (GPT-like), and full encoder-decoder architectures?
- Why is the self-attention mechanism of Transformer considered a significant advancement over the attention mechanism in early RNNs?
- Why do large language models have the concept of maximum context length? Why does it refer to the total length of input and output?
- How are the first token latency, input throughput, and output throughput of large language models calculated? (A worked sketch follows this list.) What are the requirements for first token latency, input throughput, and output throughput in different application scenarios?
- Why is the two-step paradigm of pre-training and fine-tuning so important? What core capabilities does the foundational model acquire through pre-training? What role does fine-tuning play in guiding the model to follow instructions, answer questions, and align with human values?
- How does LLaMA-3 8B achieve comprehensive capabilities stronger than LLaMA-1 70B?
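As a companion to the latency and throughput question above, here is a small worked sketch; the numbers are hypothetical and chosen only to show how the metrics relate:

# Hypothetical serving numbers, for illustration only.
prompt_tokens = 2000   # input (prefill) tokens
output_tokens = 500    # generated (decode) tokens
prefill_time_s = 0.4   # time to process the whole prompt
decode_time_s = 10.0   # time to generate all output tokens

# First token latency is roughly the prefill time
# (queuing and network overhead are ignored here).
first_token_latency_s = prefill_time_s

# Each throughput is tokens divided by the time spent in that phase.
input_throughput = prompt_tokens / prefill_time_s   # 5000 tokens/s
output_throughput = output_tokens / decode_time_s   # 50 tokens/s

print(first_token_latency_s, input_throughput, output_throughput)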
2025-04-25
When building a cross-regional server network, such as the VLESS connection used in the article “Building a Three-Layer Tunnel with Full US IP, No Manual Proxy Setup Required,” we often encounter an efficiency issue: the congestion control mechanism of the TCP protocol itself. Although TCP congestion control is crucial for the public internet, in tunnel scenarios where the application layer protocol is already encapsulated (and may have its own flow control or congestion handling), the outer layer TCP congestion control becomes a burden.
Why Disable TCP Congestion Control and Nagle in Tunnels?
- TCP-over-TCP Problem: When you transmit data of one TCP connection (e.g., VLESS over TCP) inside another TCP connection, the so-called “TCP-over-TCP” problem arises. Both the inner and outer TCP have their own congestion control and retransmission mechanisms. When packet loss occurs, both layers of TCP will attempt retransmission and reduce the congestion window. This dual processing is not only redundant but can also lead to a sharp decline in performance, especially on high-latency, high-packet-loss international links. The retransmission timer of the inner TCP may trigger prematurely due to the delay and retransmission of the outer TCP, and vice versa, forming a vicious cycle. Additionally, TCP-over-TCP can cause severe Head-of-Line Blocking issues: a lost packet in the outer TCP will block all data of the inner connections it contains, even if these inner connections are completely unrelated. This means that a connection issue of one user may affect other users sharing the same tunnel.
- Application Layer Flow Control: The application layer protocol transmitted in the tunnel may already have its own flow control and reliability mechanisms. In this case, the congestion control of the underlying TCP is completely redundant, and it will only interfere with the normal operation of the upper-layer protocol, limiting its performance potential.
- Nagle Algorithm Delay: The Nagle algorithm aims to reduce the number of small packets in the network by aggregating small TCP packets into a larger one, thereby improving network utilization. However, in tunnel scenarios, we usually want data to be transmitted through the tunnel as quickly as possible, especially for interactive applications (like SSH) or applications with high real-time requirements. The delay introduced by the Nagle algorithm may negatively impact these applications. Disabling Nagle (via the TCP_NODELAY option) allows small packets to be sent immediately, reducing latency.
- UDP's Dilemma on the Public Internet: You might wonder: if TCP has so many issues, why not use UDP to establish tunnel connections directly? Unfortunately, UDP on the public internet, especially on international links, is often subject to ISP QoS (Quality of Service) policies, has lower priority, and is more likely to be dropped or throttled, leading to unstable connections. Therefore, in many cases, we have to choose TCP as the tunnel transport layer protocol, which requires us to find ways to optimize TCP's behavior.
Therefore, for tunnel connections between servers (especially cross-regional connections), disabling the outer layer TCP’s congestion control and Nagle algorithm can significantly improve the tunnel’s throughput and response speed.
Solution: A Script
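As a minimal sketch of the two per-socket knobs involved (not the author's actual script): TCP_NODELAY disables Nagle, and on Linux, TCP_CONGESTION selects the congestion control algorithm for a single socket. Truly disabling congestion control would require a custom kernel module; BBR is a common choice for lossy international links.

import socket

def tune_tunnel_socket(sock: socket.socket) -> None:
    # Disable Nagle so small packets are sent immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Linux only: pick the congestion control algorithm per socket.
    if hasattr(socket, "TCP_CONGESTION"):
        try:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
        except OSError:
            pass  # algorithm not loaded; keep the system default

# Example: apply to an outbound tunnel connection (hypothetical endpoint).
sock = socket.create_connection(("tunnel.example.com", 443))
tune_tunnel_socket(sock)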
2025-04-01
[This article is based on my keynote speech at the 2025 China Generative AI Conference. The content is the result of a 2-hour brainstorming session with AI, followed by 3 hours of collaborative work with AI in Cursor for refinement.]
Summary: Some teams have found that the efficiency gains from applying AI in programming and writing are not as significant as expected. The reason often lies in the fact that a lot of knowledge is only in the minds of specific employees and not documented. As a result, AI Agents, like new interns, find it difficult to write code, and even if they do, they don’t know how to test it. Another reason is that internal tools like project management systems can only be operated through GUIs, which are not AI Agent-friendly. Today’s text inference models have reached human-level capabilities, and the inability to complete tasks is often due to a lack of background knowledge and AI-friendly tools.
We will discuss how to build an AI-native team that is friendly to AI Agents from the perspectives of software development, project management, and operations. An AI-native team needs to use recorded voice and written communication as much as possible, like an open-source community, to reduce reliance on individuals. AI Agents need to access various internal company tools through MCP, have enough context information, and a test environment to work efficiently. AI Agents need memory compression mechanisms, reflection mechanisms, and checkpoint rollback mechanisms to work continuously overnight without human intervention, making useful progress every hour. AI employees also need to actively communicate with human employees and other AI employees. This way, human employees can spend most of their time thinking and discussing, while most repetitive execution work is handed over to AI.
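To make the overnight-work loop concrete, here is a conceptual sketch; the agent object and its methods (is_done, act, compress, reflect) are hypothetical placeholders, not Pine AI's implementation:

import pickle
import time

def run_overnight(agent, task, checkpoint_path="agent.ckpt", max_hours=8):
    deadline = time.time() + max_hours * 3600
    while not agent.is_done(task) and time.time() < deadline:
        with open(checkpoint_path, "wb") as f:
            pickle.dump(agent.state, f)  # checkpoint before each step
        try:
            agent.act(task)  # one step of real work
            agent.state.memory = agent.compress(agent.state.memory)  # keep context small
        except Exception as err:
            agent.reflect(err)  # record the lesson learned...
            with open(checkpoint_path, "rb") as f:
                agent.state = pickle.load(f)  # ...then roll back and retry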
Below is the full text of the speech: (The PPT is the version used at the 2025 China Generative AI Conference, but the text explanation is not a transcript; it is an expanded version generated through brainstorming with AI):
Cover Page
2025-03-14
Live Theme: AI Agent, Destined to Explode?!
Time: March 13, 2025, 20:00—22:00
Method: GeekPark WeChat Video Channel “Tonight’s Tech Talk” Live Broadcast (with guests)
Live Guests:
- Jingyu | Deputy Editor of GeekPark
- Li Bojie | Chief Scientist of PINE AI
- Wanchen | Reporter at GeekPark
Key Highlights Summary
- The core features of AI Agents are the abilities to perceive, plan, and act, enabling them to autonomously gather information, make plans, and execute actions.
- General Agents like Manus will mimic “geek programmers” rather than ordinary people, possessing computational thinking and knowing when to use code and tools to solve problems.
- Current AI Agents are mainly divided into compiled types (like Dify) and interpreted types (like Manus), with compiled types having fixed workflows and interpreted types autonomously planning and making decisions.
- Compiled Agents and interpreted Agents will coexist for a long time rather than replace each other, with different scenarios having different optimal solutions.
- There is a “100x cost law” for large models: chip companies take a 10x markup, and large model companies take another 10x, revealing the huge gap between model pricing and actual cost.
- Foundational models are key to enhancing the capabilities of general Agents, and humans find it hard to imagine something 10 times smarter than themselves, so human thinking should not be imposed on AI.
- Manus emphasizes “Less Structure, More Intelligence,” similar to the classic “The Bitter Lesson,” where the fewer structural constraints humans impose on AI, the higher the AI’s capability ceiling.
- New generation models like Claude 3.7 Sonnet have made significant breakthroughs in tool usage and programming capabilities, laying the foundation for Agent development.
- The open-source release of DeepSeek R1 makes RL (reinforcement learning) technology more accessible, lowering the threshold for developing high-quality Agents.
- RL training is an important means of building competitive barriers, converting industry experience and expertise into model capabilities.
- The computational power threshold required for RL training is not as high as imagined, and small models trained with RL can surpass large models in some vertical domains.
- Multi-agent architectures are not suitable for all scenarios and may replicate inefficient collaboration models found in human organizations in fields like software development.
- AI programming tools can also play a significant role in large software engineering projects but require a high-quality code engineering foundation, including comprehensive documentation, test cases, and standardized interfaces.
- AI programming tools struggle with “spaghetti code” for the same reason new interns find it hard to take over—there’s too much undocumented tribal knowledge in the code.
- The development of Agent technology will drive improvements in software engineering practices, enhancing code quality and maintainability to meet the standards of well-known open-source projects, making more projects AI-friendly.
- The MCP protocol proposed by Anthropic provides a standardized solution for the interconnection of the Agent ecosystem, allowing diverse professional services to connect rather than replace each other.
- OpenAI’s Responses API, Realtime API, and Anthropic’s MCP represent the direction of Agent framework development.
- The work efficiency of Agents is currently limited by the latency of visual models, with humans still having an advantage in certain operational speeds.
- Virtual machine sandboxes can provide independent working environments but require better personal data integration solutions.
- In the future, AI Agents may be divided into “fast thinking” (user interaction) and “slow thinking” (background processing) parts working together.
- General Agents are a battleground for hardware and operating system giants, but large companies will be relatively cautious in releasing products.
- Opportunities for startups in the Agent field mainly lie in vertical domains, accumulating professional data and industry knowledge through deep cultivation of specific scenarios.
- Programming, education, and interpersonal communication are the three fields most likely to see mature Agent applications first.
2025-03-14
Why You Need a Three-Layer Tunnel
Does your AI company often encounter the following situations?
- Need to access applications or large model APIs that are only open to US IPs, such as OpenAI, Anthropic, Google, etc.
- Need to connect to the company’s internal network in the US but don’t want to frequently set up proxies
Many people set up application-layer proxies, which require setting HTTP_PROXY, HTTPS_PROXY, and similar environment variables. However, much software does not support configuring proxies through environment variables at all; a short sketch after the list below illustrates the inconsistency. For example:
- Docker containers do not inherit the host's environment variables. If you want to use existing docker compose files and have the services inside the containers automatically use the proxy, you'll have to tinker a bit.
- Docker requires separate proxy configuration when accessing docker.io to pull images and build images.
- Various software sources, such as pip, npm, etc., require separate proxy configuration.
- Some software, like Google Cloud CLI, do not read proxy configurations from environment variables and require separate proxy configuration.
- Some software, like Cursor, directly use IP addresses to access servers and use non-standard WebSocket protocols, which some proxy software are not compatible with or are prone to issues.
- Some Node.js server-side libraries do not directly detect the HTTP_PROXY environment variable and require configuring an HTTP Proxy Agent. Some libraries (like axios) have bugs in proxy mode.
- Some compiled language code (like C++, Go) often assembles HTTP requests themselves and may not support configuring HTTP proxies.
- Some apps (like ChatGPT, Claude Code) use additional mechanisms to detect network environments. If they detect a proxy, they may refuse service or reduce intelligence (e.g., using a poorer model instead of the SOTA model).
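To see how inconsistent proxy support is in practice, here is a small sketch (with a hypothetical local proxy address) contrasting a library that honors the proxy environment variables with a raw socket that silently bypasses them:

import os
import requests  # pip install requests

os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"  # hypothetical local proxy

# requests resolves proxies from the environment per URL:
print(requests.utils.get_environ_proxies("https://api.openai.com/v1/models"))
# typically -> {'https': 'http://127.0.0.1:7890'}

# A raw socket has no notion of HTTP_PROXY/HTTPS_PROXY:
# socket.create_connection(("api.openai.com", 443)) would connect
# directly, which is why programs that assemble HTTP requests
# themselves leak around the proxy.

A layer-3 tunnel sidesteps all of this: routing happens below the socket layer, so every application takes the tunnel without per-application configuration.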
2025-03-08
Overall, I think Manus is a product with a great idea, but there is still a lot of room for improvement in engineering.
Key Innovation: An Agent with Computational Thinking
Many people think it's just a better Computer Use, but even at first glance I noticed a fundamental difference: OpenAI Operator and Anthropic Computer Use both mimic ordinary people, while Manus mimics a geek programmer.
OpenAI Operator / Deep Research and Anthropic Computer Use open browsers, desktop GUIs, and mobile apps, delivering results as a piece of text (at most with some Markdown format). Manus, on the other hand, opens a command-line terminal, writes a todo list using a text editor, continuously writes code for automation during work, and the final deliverable (Artifact) is also a piece of code (interactive web pages and charts).
This immediately reminded me of Dr. Jeannette Wing at MSR talking to us about Computational Thinking. Computational thinking is about abstracting problems in daily life and work, and then solving them with systematic logical reasoning and automation tools. I also introduced computational thinking to many juniors during my time at USTC.
2025-03-08
Reposted from NetEase Technology Public Account
Original Title: “Will Manus Initiate the Year of the Agent? A Conversation with Two AI Entrepreneurs Who Left Big Companies”
Produced by | NetEase Technology Attitude Column
Author | Yuan Ning
Editor | Ding Guangsheng
Like a boulder thrown into a lake, the splash from Manus’s release has gradually subsided, but the ripples continue to spread.
Will Manus initiate the year of the Agent? How should we understand Agents and their barriers? Is now the right opportunity for the development of Agents? How are different players preparing for the wave of Agents? Can current Agents replace interns…
On March 8, NetEase Technology invited two guests who left big companies and are now on the front lines of AI entrepreneurship—Li Bojie and Peng Kangwei—to share their insights and thoughts.
Li Bojie, a former “genius youth” at Huawei, served as the deputy chief expert at Huawei’s Computer Network and Protocol Laboratory and is a recipient of the Microsoft Scholar Award. In 2023, he ventured into AI entrepreneurship and is currently the Chief Scientist at PINE AI, dedicated to building a general intelligent assistant like Samantha from “Her” for everyone and every organization.
Peng Kangwei, who once built a consumer product with over 100 million monthly active users at Tencent, left to start his own business in 2023 and founded Dream Horse Intelligence, which is building a new generation of AI content platforms.
As entrepreneurs riding the AI wave, how do they find direction amidst the giant waves? What kind of future for Agents can be seen through their perspective? NetEase Technology has compiled their answers to ten key questions.
The following content has been edited by NetEase Technology without changing the original intent: