Bojie Li
2025-07-12
In the previous article, “Building a Three-Layer Tunnel with Full US IP and No Manual Proxy Settings,” we addressed many of the network issues encountered when accessing global services through a Domestic Server -> US Server architecture. However, a new performance bottleneck has gradually emerged: the public internet connection between the domestic server and the US server suffers from high latency and severe packet loss during peak hours.
This results in issues like SSH operation lag, online meeting disconnections, and API request timeouts, even when using a tunnel. The root cause lies in the international internet link between China and the US, which is like a highway during holidays—congestion is the norm.
Faced with this problem, a counterintuitive solution emerges: If the direct route is blocked, would taking a detour be faster?
2025-07-10
The Translated Work “Illustrated DeepSeek Technology” by Meng Jiaying and Me Is About to Be Released
2025-07-10
The AI Agent Hackathon at UCAS in February 2025 was very successful, so I will host two more AI Agent practice sessions: July 27–30 at Zhongguancun Artificial Intelligence Academy and July 31–August 4 at UCAS.
Many thanks to Professor Zheng Shuxin, Vice Dean of Zhongguancun Artificial Intelligence Academy, and Professor Liu Junming of UCAS for inviting me to host these two AI Agent practical activities.
The topics in this AI Agent practice will take you deep into the cutting-edge technology of building next-generation AI Agents. You will have the opportunity to practice:
- Multimodal models and thinking model applications: Build the “brain” of the agent with industry-leading multimodal models and thinking models such as Gemini 2.5 Pro and Claude 4 Sonnet.
- Real-time voice interaction: Integrate VAD, ASR, LLM, and TTS technology stacks to create real-time voice agents capable of streaming conversations.
- Autonomous operation of graphical interfaces: Develop agents that can stably operate GUIs such as browsers to complete complex real-world tasks.
- Advanced Agent Architecture: Explore advanced architectures such as “fast and slow thinking,” “thinking while listening,” and multi-agent collaboration to give agents the ability to respond in real-time and think deeply.
- Learning from experience: Build agents that can learn from experience, allowing them to become more proficient in repetitive tasks.
- Identifying authoritative information sources: Enable agents to accurately identify and adopt high-credibility information such as official documents and academic papers from vast amounts of information.
- Autonomous tool invocation and creation: Allow agents not only to use existing tools but also to autonomously learn and create new tools to solve open-ended problems.
Suggestions on AI-assisted programming: In this AI Agent practice, we encourage everyone to use AI-assisted programming, which means “developing agents with agents.” We recommend using Cursor for Vibe Coding, and here are some suggestions:
- Documentation first, code later: Let Cursor write the design document first. Your role is to provide improvement suggestions for the AI-generated design document and iterate with the AI until satisfied. Then, let Cursor write the code according to the final design document. During coding, always keep the design document in the agent’s context as a reference.
- Choose the right model: Do not use Cursor’s “auto” mode; be sure to choose a model with thinking ability (with a brain icon next to it), such as Claude 4 Sonnet.
- Test-driven: Be sure to have AI write and execute test cases for its code to ensure code quality.
Feel free to form teams and choose any of the following topics to start your creative journey!
2025-07-10
The AI Agent Practical Course is a hands-on course conducted by Professor Liu Junming from UCAS and myself. The first session in 2024 had over 50 participants, and the second session in 2025 had over 100 participants. The 2025 Spring AI Agent Practical Course took place in early February 2025 in Beijing.
- Click here to see some of the practical results from the UCAS 2025 AI Agent Practical Course
- Click here to view reference project code (note, not complete code, for reference only)
Course Directory:
2025-07-08
Who are we?
Pine AI is dedicated to using AI to help users handle everyday chores and disputes.
In the United States, calling customer service is often a hassle. You might have to wait half an hour, then spend a long time talking with an agent. If the agent isn’t willing to help, you may be transferred to another department. By the end, a single call can take one to two hours. Many people don’t have that much time to haggle with customer service and end up taking the loss. Others find phone communication difficult due to limited spoken English.
Pine AI is building eloquent, knowledgeable AI Agents with exceptional memory that can automate this entire process—making calls, sending emails, and operating a computer—to handle tasks like a human secretary would.
This is absolutely not as simple as slapping a prompt on a SOTA model. We are looking for exceptional people to tackle this world-class challenge with us.
2025-06-12
[This article is based on my invited talk at the A2M Internet Architecture and AI Technology Summit, Turing Large Model Technology Session.]
Hello everyone, welcome to the A2M Summit. Today, I will be sharing on the topic “Effective Agents: Real-Time Interaction with the Environment, Learning from Experience.”
Let me first introduce myself. I am the co-founder and chief scientist of Pine AI.
Currently, our business at Pine AI is to help users handle daily chores and disputes by making phone calls through AI. In the U.S., calling customer service is often a hassle. For instance, you might have to wait for half an hour and then spend a long time communicating with the service representative. If the representative is unwilling to help, you might be transferred to another department. So, the entire process can sometimes take one or two hours. Many people don’t have the time to argue with customer service, and sometimes they just have to accept the loss. Additionally, some people struggle with English, making phone communication difficult. Pine can automate this entire process through AI.
Making today’s AI capable of handling tasks end-to-end is extremely challenging; it’s not as simple as applying a SOTA model with a prompt. Most AI products only provide users with information, like generating a research report, but the actual task still requires the user to contact customer service.
Enabling AI Agents to complete tasks end-to-end is genuinely difficult. Today, we will discuss some of the core technical challenges and how Pine AI addresses them.
2025-04-28
```
#!/usr/bin/env python3
```
3. Configure Windows to Use the Local DNS
- Open Control Panel -> Network and Sharing Center -> Change adapter settings
- Right-click the network connection currently in use and select “Properties”
- Double-click “Internet Protocol Version 4 (TCP/IPv4)”
- Select “Use the following DNS server addresses” and enter 127.0.0.1
4. Start the DNS Server
Run in the command prompt:
```
python C:\SmartDNS\standalone_smart_dns.py
```
Conclusion
With the approach described in this article, users can build a lightweight smart DNS splitting system locally that effectively solves DNS pollution and automatically selects the best resolution result for the current network environment. The solution works not only on macOS but can also be implemented on Windows, providing a flexible cross-platform option.
3. Create a Windows Service
To run a Python script as a Windows service, we need to use the NSSM (Non-Sucking Service Manager) tool:
Download NSSM
Extract it to a suitable directory, such as
C:\SmartDNS\nssm
Run the command prompt as an administrator, then execute:
```
cd C:\SmartDNS\nssm\win64
nssm.exe install SmartDNS
```
In the configuration window that appears, set:
- Path: C:\Windows\System32\python.exe
- Startup directory: C:\SmartDNS
- Arguments: C:\SmartDNS\standalone_smart_dns.py
- Set the service display name and description in the “Details” tab
- Choose “Local System account” in the “Log on” tab
- Click “Install service”
Note: In Windows, port 53 is a privileged port, and the Python script needs to run with administrator privileges to bind to this port. The above service setup using “Local System account” can solve this issue. If you still encounter permission issues, consider changing the listening port in the script to a non-privileged port (such as 5353), and then use firewall rules to forward requests from UDP port 53 to 5353.
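As a rough illustration of the fallback described above, here is a minimal sketch (the port numbers and function name are hypothetical, not part of the actual script) of a UDP relay that listens on a non-privileged port and forwards each DNS query to an upstream resolver, relaying the reply back:

```python
import socket

def run_udp_relay(listen_port=5353, upstream=("114.114.114.114", 53),
                  max_packets=None):
    """Forward UDP DNS queries from listen_port to an upstream resolver.

    A minimal blocking relay: one query in flight at a time. Real use
    would need concurrency and timeouts, but this shows the data path.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", listen_port))
    handled = 0
    while max_packets is None or handled < max_packets:
        data, client_addr = server.recvfrom(4096)   # query from the client
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as up:
            up.settimeout(5)
            up.sendto(data, upstream)               # forward query upstream
            reply, _ = up.recvfrom(4096)            # wait for the answer
        server.sendto(reply, client_addr)           # relay answer back
        handled += 1
    server.close()
```

With this in place, the firewall rule only needs to redirect UDP port 53 to the relay's port, and the relay itself never needs elevated privileges.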
- Start the service:
```
sc start SmartDNS
```
4. Configure the System to Use Local DNS
- Open Control Panel > Network and Internet > Network and Sharing Center
- Click on the current active network connection
- Click “Properties”
- Select “Internet Protocol Version 4 (TCP/IPv4)” and click “Properties”
- Choose “Use the following DNS server addresses”
- Set the preferred DNS server to “127.0.0.1”
- Click “OK” to save the settings
5. Verification Test
Execute in the command prompt:
```
nslookup baidu.com 127.0.0.1
```
If the settings are correct, the domain names should resolve correctly.
6. Restore Default Settings
If you need to restore the default settings:
Stop the DNS Service:
```
sc stop SmartDNS
sc delete SmartDNS
```
Restore Default DNS Settings:
- Open Control Panel > Network and Internet > Network and Sharing Center
- Click on the current active network connection
- Click “Properties”
- Select “Internet Protocol Version 4 (TCP/IPv4)” and click “Properties”
- Choose “Obtain DNS server address automatically”
- Click “OK” to save the settings
Conclusion
The combination of local anti-pollution DNS and Layer 3 Tunnel provides us with an elegant solution that avoids DNS pollution issues while ensuring the best speed for accessing both domestic and international websites. This solution is particularly suitable for users who need to access both domestic and international resources simultaneously.
When you use both local anti-pollution DNS and a Layer 3 tunnel (configured with regional routing), you will gain the following advantages:
- Pollution-free Resolution: All domain names can obtain the correct IP address
- Efficient Access:
  - DNS queries for domestic websites go directly through the local network, obtaining the IP best suited to your network environment
  - DNS queries for international websites go through the tunnel, avoiding pollution and obtaining CDN IPs close to the tunnel exit
  - Domestic websites connect directly, for fast speeds
  - International websites go through the tunnel, stable and reliable
- Fully Automatic Traffic Splitting: The system automatically determines which route to take, without the need to manually switch DNS or proxies
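The splitting rule behind this behavior can be as simple as a suffix match against a list of domestic domains. The helper below is an illustrative sketch (the domain list and resolver addresses are made-up examples, not the actual script's configuration):

```python
# Hypothetical routing rule: suffix-match a domain against a "domestic" list.
# Domestic domains resolve via a local ISP resolver; everything else goes
# through the tunnel-side resolver to avoid polluted answers.
DOMESTIC_SUFFIXES = {"cn", "baidu.com", "taobao.com", "qq.com"}

LOCAL_DNS = "114.114.114.114"   # example domestic resolver
TUNNEL_DNS = "10.0.0.1"         # example resolver reachable via the tunnel

def pick_resolver(domain: str) -> str:
    """Return the resolver to use for a domain, based on suffix matching."""
    labels = domain.lower().rstrip(".").split(".")
    # Check every suffix: "www.baidu.com" -> "baidu.com" -> "com"
    for i in range(len(labels)):
        if ".".join(labels[i:]) in DOMESTIC_SUFFIXES:
            return LOCAL_DNS
    return TUNNEL_DNS
```

For example, `pick_resolver("www.baidu.com")` selects the local resolver, while `pick_resolver("google.com")` selects the tunnel-side one.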
2025-04-27
[Thank you to the readers who sent in over 50 corrections! You were remarkably meticulous to catch so many errors, and I am very grateful for every one of them!]
My translation “Illustrated Large Models - Principles and Practice of Generative AI” (Hands-On Large Language Models) was released in May 2025. You can search for “Illustrated Large Models” on platforms like JD.com and Taobao.
Praise for the Book (Chinese Edition)
Many thanks to Yuan Jinhui, founder of Silicon Flow, Zhou Lidong, director of Microsoft Research Asia, Lin Junyang, head of Alibaba Qwen Algorithm, Li Guohao, founder of CAMEL-AI.org community, and Zhong Tai, founder of AgentUniverse, for their wholehearted recommendations!
Translator’s Preface
The development of large models is rapid; as the saying goes, “a day in the AI world is like a year in the human world.” Many people find themselves lost in the flourishing garden of models, unsure which model to use for their application scenarios and unable to predict the direction models will take in the coming year, often feeling anxious. In fact, almost all large models today are based on the Transformer architecture, with all variations stemming from the same origin.
This book, “Illustrated Large Models,” is an excellent resource to help you systematically understand the basic principles and capability boundaries of Transformers and large models. When Turing Company approached me to translate this book, I immediately agreed upon seeing the author’s name, as it was Jay Alammar’s blog post “The Illustrated Transformer” that truly helped me understand Transformers (Chapter 3 of this book is an expansion of that blog post). Although there are countless books and articles explaining large models on the market today, the exquisite illustrations and the depth of explanation in this book are rare. The book starts with tokens and embeddings, not limited to generative models, but also includes representation models that many overlook. Additionally, the book covers practical content such as text classification, text clustering, prompt engineering, RAG, and model fine-tuning.
I am very honored to be the translator of this book, working with editor Liu Meiying to bring this book to Chinese readers.
Taking some time to read this book and systematically understanding the basic principles and capability boundaries of Transformers and large models is like having a map and compass on an adventure journey through large models. This way, we won’t worry about newly released models rendering long-term engineering accumulation useless overnight, and we can develop products for future models. Once the model capabilities are ready, the product can scale up immediately.
I hope this book can become a sightseeing bus in the garden of large models, allowing more people to see the full view of large models. Thus, the ever-expanding capability boundaries of large models become a visual feast rather than a monster devouring everything; we have the opportunity to stand at the forefront of AI, realize more dreams, and gain more freedom.
2025-04-27
This article is a companion material for the book Hands-On Large Language Models.
When interviewing candidates and attending industry seminars, I often find that many people have extensive practical experience but know very little about the basic principles of models. To help everyone better understand this book, and to facilitate those who need to prepare for interviews to read this book more purposefully, I have systematically compiled common interview questions in the field of large models around the themes of each chapter of this book. Most of the answers to these questions can be found directly in the book, while some advanced questions can be answered from the references in this book or the latest papers on the internet. I hope all readers can read this book with these questions in mind.
Chapter 1: Introduction to Large Language Models
- What is the difference between the encoder and decoder in Transformer, and are models with only an encoder or only a decoder useful?
- What are the differences between GPT and the model architecture in the original Transformer paper?
- What are the advantages and disadvantages of encoder-only (BERT-like), decoder-only (GPT-like), and full encoder-decoder architectures?
- Why is the self-attention mechanism of Transformer considered a significant advancement over the attention mechanism in early RNNs?
- Why do large language models have the concept of maximum context length? Why does it refer to the total length of input and output?
- How are the first token latency, input throughput, and output throughput of large language models calculated? What are the requirements for first token latency, input, and output throughput in different application scenarios?
- Why is the two-step paradigm of pre-training and fine-tuning so important? What core capabilities does the foundational model acquire through pre-training? What role does fine-tuning play in guiding the model to follow instructions, answer questions, and align with human values?
- How does LLaMA-3 8B achieve comprehensive capabilities stronger than LLaMA-1 70B?
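As a worked example for the latency/throughput question above: first-token latency is roughly the prefill time (input tokens divided by input throughput), and total latency adds the decode time (output tokens divided by output throughput). All numbers below are made up for illustration:

```python
def first_token_latency_s(input_tokens: int, prefill_tps: float) -> float:
    """Approximate time-to-first-token: prefill must process the whole prompt."""
    return input_tokens / prefill_tps

def total_latency_s(input_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> float:
    """Prefill time plus sequential decoding of each output token."""
    return input_tokens / prefill_tps + output_tokens / decode_tps

# Hypothetical numbers: a 2000-token prompt at 5000 tok/s prefill,
# then a 500-token answer at 50 tok/s decode.
ttft = first_token_latency_s(2000, 5000)       # 0.4 s before the first token
total = total_latency_s(2000, 500, 5000, 50)   # 0.4 + 10.0 = 10.4 s in total
```

This simple model also shows why interactive chat cares most about first-token latency, while batch workloads care mostly about aggregate throughput.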
2025-04-25
When building a cross-regional server network, such as the VLESS connection used in the article “Building a Three-Layer Tunnel with Full US IP, No Manual Proxy Setup Required,” we often encounter an efficiency issue: the congestion control mechanism of the TCP protocol itself. Although TCP congestion control is crucial for the public internet, in tunnel scenarios where the application layer protocol is already encapsulated (and may have its own flow control or congestion handling), the outer layer TCP congestion control becomes a burden.
Why Disable TCP Congestion Control and Nagle in Tunnels?
- TCP-over-TCP Problem: When you transmit data of one TCP connection (e.g., VLESS over TCP) inside another TCP connection, the so-called “TCP-over-TCP” problem arises. Both the inner and outer TCP have their own congestion control and retransmission mechanisms. When packet loss occurs, both layers of TCP will attempt retransmission and reduce the congestion window. This dual processing is not only redundant but can also lead to a sharp decline in performance, especially on high-latency, high-packet-loss international links. The retransmission timer of the inner TCP may trigger prematurely due to the delay and retransmission of the outer TCP, and vice versa, forming a vicious cycle. Additionally, TCP-over-TCP can cause severe Head-of-Line Blocking issues: a lost packet in the outer TCP will block all data of the inner connections it contains, even if these inner connections are completely unrelated. This means that a connection issue of one user may affect other users sharing the same tunnel.
- Application Layer Flow Control: The application layer protocol transmitted in the tunnel may already have its own flow control and reliability mechanisms. In this case, the congestion control of the underlying TCP is completely redundant, and it will only interfere with the normal operation of the upper-layer protocol, limiting its performance potential.
- Nagle Algorithm Delay: The Nagle algorithm aims to reduce the number of small packets on the network by aggregating small TCP packets into larger ones, thereby improving network utilization. However, in tunnel scenarios we usually want data to pass through the tunnel as quickly as possible, especially for interactive applications (like SSH) or applications with strict real-time requirements, and the delay introduced by the Nagle algorithm can hurt them. Disabling Nagle (via the TCP_NODELAY socket option) allows small packets to be sent immediately, reducing latency.
- UDP’s Dilemma on the Public Internet: You might wonder: if TCP has so many issues, why not use UDP for the tunnel connection directly? Unfortunately, UDP on the public internet, especially on international links, is often subject to ISP QoS (Quality of Service) policies, receives lower priority, and is more likely to be dropped or throttled, leading to unstable connections. Therefore, in many cases we have to choose TCP as the tunnel transport protocol, which forces us to optimize TCP’s behavior.
Therefore, for tunnel connections between servers (especially cross-regional connections), disabling the outer layer TCP’s congestion control and Nagle algorithm can significantly improve the tunnel’s throughput and response speed.
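For a tunnel implemented in Python, disabling Nagle is a one-line socket option. Congestion control itself cannot be fully disabled from userspace on a standard kernel, but on Linux you can select a different algorithm per socket via TCP_CONGESTION. The helper below is a sketch, not the configuration of any particular tunnel software:

```python
import socket

def tune_tunnel_socket(sock, congestion=None):
    """Disable Nagle on a TCP socket; optionally select a congestion algorithm.

    TCP_NODELAY sends small packets immediately instead of coalescing them.
    TCP_CONGESTION is Linux-specific and requires the named algorithm
    (e.g. "bbr") to already be available in the kernel.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    if congestion is not None and hasattr(socket, "TCP_CONGESTION"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION,
                        congestion.encode())

# Example: create a socket and verify Nagle is off.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tune_tunnel_socket(s)
assert s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
s.close()
```

The same two options can be set in most languages' socket APIs; what matters is applying them to the outer tunnel connection on both endpoints.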