2025-04-27
My Translation "Illustrated Large Models: Principles and Practice of Generative AI" Is Coming Soon

My translation “Illustrated Large Models: Principles and Practice of Generative AI” (Hands-On Large Language Models) has finally gone to print and will be available in mid-May.

Praise for the Book (Chinese Edition)

Many thanks to Yuan Jinhui, founder of SiliconFlow; Zhou Lidong, director of Microsoft Research Asia; Lin Junyang, head of algorithms for Alibaba’s Qwen; Li Guohao, founder of the CAMEL-AI.org community; and Zhong Tai, founder of AgentUniverse, for their strong recommendations!

Translator’s Preface

Large models are developing at breakneck speed; as the saying goes, “One day in AI, one year in the human world.” Many people are lost in this flourishing garden of models: unsure which model to use for their application scenario, unable to predict where models will go in the coming year, and often anxious as a result. In fact, almost all of today’s large models are based on the Transformer architecture; the fundamentals have changed little.

The book “Illustrated Large Models” is an excellent resource for systematically understanding the basic principles and capability boundaries of Transformers and large models. When Turing Company approached me to translate it, I agreed the moment I saw the author’s name: it was Jay Alammar’s blog post “The Illustrated Transformer” that truly helped me understand Transformers (Chapter 3 of this book is an expansion of that post). Although countless books and articles explaining large models are on the market, this book’s exquisite illustrations and its knack for explaining deep ideas in simple terms are rare. It starts with tokens and embeddings and is not limited to generative models; it also covers the representation models that many overlook. It further includes practical topics such as text classification, text clustering, prompt engineering, RAG, and model fine-tuning.

I am very honored to be the translator of this book, working with editor Liu Meiying to bring this book to Chinese readers.

Take some time to read this book and systematically understand the basic principles and capability boundaries of Transformers and large models, just like having a map and compass on an adventure in the world of large models. This way, we won’t worry about new models rendering long-term engineering accumulation useless overnight, and we can develop products for future models. Once the model capabilities are ready, the product can scale up immediately.

I hope this book can become a sightseeing bus in the garden of large models, allowing more people to see the panorama of large models. Thus, the ever-expanding capability boundaries of large models become a visual feast rather than a monster devouring everything; we have the opportunity to stand at the forefront of AI, realize more dreams, and gain more freedom.

Read More

2025-04-25
Disable TCP Congestion Control for Tunnel Connections to Improve Transmission Efficiency

When building a cross-regional server network, such as the VLESS connection used in the article “Setting Up a Three-Layer Tunnel with Full US IP, No Manual Proxy Configuration Required,” we often run into an efficiency problem: the TCP protocol’s own congestion control. Although congestion control is crucial on the public internet, in tunnel scenarios where the encapsulated application-layer protocol may already have its own flow control or congestion handling, the outer TCP’s congestion control becomes a burden.

Why Disable TCP Congestion Control and Nagle in Tunnels?

  1. TCP-over-TCP Problem: When you transmit data of one TCP connection (e.g., VLESS over TCP) inside another TCP connection, the so-called “TCP-over-TCP” problem arises. Both the inner and outer TCP have their own congestion control and retransmission mechanisms. When packet loss occurs, both layers of TCP will attempt retransmission and reduce the congestion window. This dual processing is not only redundant but can also lead to a sharp decline in performance, especially on high-latency, high-packet-loss international links. The retransmission timer of the inner TCP may trigger prematurely due to the delay and retransmission of the outer TCP, and vice versa, forming a vicious cycle. Additionally, TCP-over-TCP can cause severe Head-of-Line Blocking issues: a lost packet in the outer TCP will block all data of the inner connections it contains, even if these inner connections are completely unrelated. This means that a connection issue of one user may affect other users sharing the same tunnel.
  2. Application Layer Flow Control: The application layer protocol transmitted in the tunnel may already have its own flow control and reliability mechanisms. In this case, the congestion control of the underlying TCP is completely redundant, and it will only interfere with the normal operation of the upper-layer protocol, limiting its performance potential.
  3. Nagle Algorithm Delay: The Nagle algorithm aims to reduce the number of small packets in the network by aggregating small TCP packets into a larger one, thereby improving network utilization. However, in tunnel scenarios, we usually want data to be transmitted through the tunnel as quickly as possible, especially for interactive applications (like SSH) or applications with high real-time requirements. The delay introduced by the Nagle algorithm may negatively impact these applications. Disabling Nagle (via the TCP_NODELAY option) allows small packets to be sent immediately, reducing latency.
  4. UDP’s Dilemma on the Public Internet: You might wonder, if TCP has so many issues, why not use UDP to establish tunnel connections directly? Unfortunately, UDP on the public internet, especially on international links, is often subject to ISP QoS (Quality of Service) policies, has lower priority, and is more likely to be dropped or throttled, leading to unstable connections. Therefore, in many cases, we have to choose TCP as the tunnel transport layer protocol, which requires us to find ways to optimize TCP’s behavior.

Therefore, for tunnel connections between servers (especially cross-regional connections), disabling the outer layer TCP’s congestion control and Nagle algorithm can significantly improve the tunnel’s throughput and response speed.
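The article’s actual script is behind the “Read More” link; as a minimal sketch of the per-socket tuning described above (the function name is my own, and the choice of BBR is an assumption): Linux does not let you disable congestion control outright, but the `TCP_CONGESTION` socket option lets you pick a loss-tolerant algorithm per socket, and `TCP_NODELAY` turns off Nagle.

```python
import socket

def tune_tunnel_socket(sock: socket.socket, congctl: bytes = b"bbr") -> None:
    """Disable Nagle and, on Linux, pick a per-socket congestion control algorithm."""
    # TCP_NODELAY: send small packets immediately instead of waiting to
    # coalesce them (disables the Nagle algorithm).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # TCP_CONGESTION is Linux-only. Congestion control cannot be removed
    # entirely, but a delay-based algorithm such as BBR avoids the drastic
    # window collapses of loss-based algorithms on long, lossy links.
    if hasattr(socket, "TCP_CONGESTION"):
        try:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, congctl)
        except OSError:
            pass  # requested algorithm not compiled into / loaded by the kernel

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tune_tunnel_socket(sock)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # non-zero means Nagle is off
```

A real tunnel daemon would apply this to both the listening side and each accepted connection; system-wide defaults can also be set via `sysctl net.ipv4.tcp_congestion_control`.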

Solution: A Script

Read More

2025-04-01
New Exploration of AI Agents: Building AI-Native Teams and Empowering AI Employees

[This article is based on my keynote speech at the 2025 China Generative AI Conference. The content is the result of a 2-hour brainstorming session with AI, followed by 3 hours of collaborative work with AI in Cursor for refinement.]

Summary: Some teams have found that the efficiency gains from applying AI to programming and writing are smaller than expected. One common reason is that much of the necessary knowledge lives only in the heads of specific employees and is never written down; as a result, AI Agents, like new interns, struggle to write code, and even when they do, they don’t know how to test it. Another reason is that internal tools such as project management systems can only be operated through GUIs, which are not AI-Agent-friendly. Today’s text reasoning models have reached human-level capability; when they fail to complete a task, it is usually for lack of background knowledge and AI-friendly tools.

We will discuss how to build an AI-native team that is friendly to AI Agents from the perspectives of software development, project management, and operations. An AI-native team needs to use recorded voice and written communication as much as possible, like an open-source community, to reduce reliance on individuals. AI Agents need to access various internal company tools through MCP, have enough context information, and a test environment to work efficiently. AI Agents need memory compression mechanisms, reflection mechanisms, and checkpoint rollback mechanisms to work continuously overnight without human intervention, making useful progress every hour. AI employees also need to actively communicate with human employees and other AI employees. This way, human employees can spend most of their time thinking and discussing, while most repetitive execution work is handed over to AI.

Download the PPT of “New Exploration of AI Agents: Building AI-Native Teams and Empowering AI Employees” (PDF)

Below is the full text of the speech. (The PPT is the version used at the 2025 China Generative AI Conference; the text is not a transcript but an expanded version produced through brainstorming with AI.)

Cover Page

Read More

2025-03-14
AI Agent, Destined to Explode - GeekPark "Tonight's Tech Talk" Live Broadcast

Live Theme: AI Agent, Destined to Explode?!

Time: March 13, 2025, 20:00–22:00

Method: GeekPark WeChat Video Channel “Tonight’s Tech Talk” Live Broadcast (with guests)

Live Guests:

  • Jingyu | Deputy Editor of GeekPark
  • Li Bojie | Chief Scientist of PINE AI
  • Wanchen | Reporter at GeekPark

Key Highlights Summary

  • The core features of AI Agents are the abilities to perceive, plan, and act, enabling them to autonomously gather information, make plans, and execute actions.
  • General Agents like Manus will mimic “geek programmers” rather than ordinary people, possessing computational thinking and knowing when to use code and tools to solve problems.
  • Current AI Agents are mainly divided into compiled types (like Dify) and interpreted types (like Manus), with compiled types having fixed workflows and interpreted types autonomously planning and making decisions.
  • Compiled Agents and interpreted Agents will coexist for a long time rather than replace each other, with different scenarios having different optimal solutions.
  • There is a “100x cost law” for large models: chip companies earn 10 times, and large model companies earn another 10 times, revealing the huge gap between model pricing and actual costs.
  • Foundational models are key to enhancing the capabilities of general Agents, and humans find it hard to imagine something 10 times smarter than themselves, so human thinking should not be imposed on AI.
  • Manus emphasizes “Less Structure, More Intelligence,” similar to the classic “The Bitter Lesson,” where the fewer structural constraints humans impose on AI, the higher the AI’s capability ceiling.
  • New generation models like Claude 3.7 Sonnet have made significant breakthroughs in tool usage and programming capabilities, laying the foundation for Agent development.
  • The open-source release of DeepSeek R1 makes RL (reinforcement learning) technology more accessible, lowering the threshold for developing high-quality Agents.
  • RL training is an important means of building competitive barriers, converting industry experience and expertise into model capabilities.
  • The computational power threshold required for RL training is not as high as imagined, and small models trained with RL can surpass large models in some vertical domains.
  • Multi-agent architectures are not suitable for all scenarios and may replicate inefficient collaboration models found in human organizations in fields like software development.
  • AI programming tools can also play a significant role in large software engineering projects but require a high-quality code engineering foundation, including comprehensive documentation, test cases, and standardized interfaces.
  • AI programming tools struggle with “spaghetti code” for the same reason new interns find it hard to take over—there’s too much undocumented tribal knowledge in the code.
  • The development of Agent technology will drive improvements in software engineering practices, enhancing code quality and maintainability to meet the standards of well-known open-source projects, making more projects AI-friendly.
  • The MCP protocol proposed by Anthropic provides a standardized solution for the interconnection of the Agent ecosystem, allowing diverse professional services to connect rather than replace each other.
  • OpenAI’s Responses API, Realtime API, and Anthropic’s MCP represent the direction of Agent framework development.
  • The work efficiency of Agents is currently limited by the latency of visual models, with humans still having an advantage in certain operational speeds.
  • Virtual machine sandboxes can provide independent working environments but require better personal data integration solutions.
  • In the future, AI Agents may be divided into “fast thinking” (user interaction) and “slow thinking” (background processing) parts working together.
  • General Agents are a battleground for hardware and operating system giants, but large companies will be relatively cautious in releasing products.
  • Opportunities for startups in the Agent field mainly lie in vertical domains, accumulating professional data and industry knowledge through deep cultivation of specific scenarios.
  • Programming, education, and interpersonal communication are the three fields most likely to see mature Agent applications first.
Read More

2025-03-14
Setting Up a Three-Layer Tunnel with Full US IP, No Manual Proxy Configuration Required

Why You Need a Three-Layer Tunnel

Does your AI company often encounter the following situations?

  • Need to access applications or large model APIs that are only open to US IPs, such as OpenAI, Anthropic, Google, etc.
  • Need to connect to the company’s internal network in the US but don’t want to frequently set up proxies

Many people set up application-layer proxies, which require setting HTTP_PROXY, HTTPS_PROXY, and similar environment variables. However, much software does not pick up proxy settings from environment variables, for example:

  • Docker containers do not inherit the host’s environment variables. If you want to reuse existing docker compose files and have the services inside Docker use the proxy automatically, you’ll have to tinker a bit.
  • Docker requires separate proxy configuration when accessing docker.io to pull images and build images.
  • Various software sources, such as pip, npm, etc., require separate proxy configuration.
  • Some software, like Google Cloud CLI, do not read proxy configurations from environment variables and require separate proxy configuration.
  • Some software, like Cursor, accesses servers directly by IP address and uses non-standard WebSocket protocols, which some proxy software does not support or handles poorly.
  • Some Node.js server-side libraries do not honor the HTTP_PROXY environment variable and require an HTTP proxy agent to be configured explicitly. Some libraries (like axios) have bugs in proxy mode.
  • Some compiled language code (like C++, Go) often assembles HTTP requests themselves and may not support configuring HTTP proxies.
  • Some apps (like ChatGPT, Claude Code) use additional mechanisms to detect network environments. If they detect a proxy, they may refuse service or reduce intelligence (e.g., using a poorer model instead of the SOTA model).
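For reference, the conventional application-layer setup that the list above critiques looks roughly like this (a sketch; the address and port are placeholders, not values from the article):

```shell
# Conventional per-shell proxy setup -- exactly the approach shown above to be
# leaky: each line only covers software that actually honors these variables.
export HTTP_PROXY=http://127.0.0.1:7890
export HTTPS_PROXY=http://127.0.0.1:7890
export NO_PROXY=localhost,127.0.0.1

# Tools like pip and npm still need their own configuration on top, e.g.:
#   pip config set global.proxy http://127.0.0.1:7890
#   npm config set proxy http://127.0.0.1:7890
```

A layer-3 tunnel instead routes all traffic at the IP level, so none of this per-tool configuration is needed, which is the article’s point.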
Read More

2025-03-08
Manus: An Agent with Computational Thinking, Like a Geek Programmer

This article was first published in a Zhihu answer to the question “How do you evaluate the general AI Agent product Manus released by a Chinese team? Will it become the next big hit?”

Overall, I think Manus is a product with a great idea, but there is still a lot of room for improvement in engineering.

Key Innovation: An Agent with Computational Thinking

Many people think it’s just a better Computer Use, but at first glance I noticed a fundamental difference: OpenAI Operator and Anthropic Computer Use both mimic ordinary people, while Manus mimics a geek programmer.

OpenAI Operator / Deep Research and Anthropic Computer Use open browsers, desktop GUIs, and mobile apps, delivering results as a piece of text (at most with some Markdown format). Manus, on the other hand, opens a command-line terminal, writes a todo list using a text editor, continuously writes code for automation during work, and the final deliverable (Artifact) is also a piece of code (interactive web pages and charts).

This immediately reminded me of Dr. Jeannette Wing at MSR talking to us about Computational Thinking. Computational thinking is about abstracting problems in daily life and work, and then solving them with systematic logical reasoning and automation tools. I also introduced computational thinking to many juniors during my time at USTC.

Read More

2025-03-08
Will Manus Initiate the Year of the Agent? - NetEase Technology Live

Reposted from NetEase Technology Public Account

Original Title: “Will Manus Initiate the Year of the Agent? A Conversation with Two AI Entrepreneurs Who Left Big Companies”

Produced by | NetEase Technology Attitude Column

Author | Yuan Ning

Editor | Ding Guangsheng

Like a boulder thrown into a lake, the splash from Manus’s release has gradually subsided, but the ripples continue to spread.

Will Manus initiate the year of the Agent? How should we understand Agents and their barriers? Is now the right opportunity for the development of Agents? How are different players preparing for the wave of Agents? Can current Agents replace interns…

On March 8, NetEase Technology invited two guests who left big companies and are now on the front lines of AI entrepreneurship—Li Bojie and Peng Kangwei—to share their insights and thoughts.

Li Bojie, a former “genius youth” at Huawei, served as the deputy chief expert at Huawei’s Computer Network and Protocol Laboratory and is a recipient of the Microsoft Scholar Award. In 2023, he ventured into AI entrepreneurship and is currently the Chief Scientist at PINE AI, dedicated to building a general intelligent assistant like Samantha from “Her” for everyone and every organization.

Peng Kangwei, who once developed a C-end product with over 100 million monthly active users at Tencent, left to start his own business in 2023 and founded Dream Horse Intelligence, which is working on a new generation of AI content platforms.

As entrepreneurs riding the AI wave, how do they find direction amidst the giant waves? What kind of future for Agents can be seen through their perspective? NetEase Technology has compiled their answers to ten key questions.

The following content has been edited by NetEase Technology without changing the original intent:

Read More

2025-02-17
USTC Course Review Community's 10th Anniversary: Original Developers Return to Create Course Review Community 2.0

This article is reposted from the “Woke Xiaodao News” WeChat public account

What began as a flash of inspiration, with two friends roped in, and officially launched after more than two months of work, has now existed in Woke for ten years.

“10 years ago,” during the 2015 spring semester course selection, Zhang Jingning, a freshman from the School of Physics, was actively participating in discussions in a QQ group chat.

“Which teacher is good for the compulsory course next semester?”

“How is the grading?”

“Are there any interesting elective courses?”

The group chat was a closed ecosystem. Participants usually only received a sentence or two of evaluation from a senior, akin to the blind men feeling an elephant. These fragmented discussions made it difficult to filter out truly valuable information and even harder to preserve it.

Zhang Jingning recalled her experience with online courses (MOOCs): she had studied them spontaneously and proactively, because she could learn about a course’s content, teaching style, and difficulty in advance and then choose courses based on her own interests, preferences, and needs.

Coinciding with Academician Hou Jianguo’s launch of the “Freshman ‘Science and Society’ Seminar” at USTC, Zhang Jingning, along with her friends, Li Bojie and Chang Zhen from the School of Computer Science, developed the USTC Course Review Community to promote the transparency of course information on campus and help students find courses that suit them better.

The project started on March 8, 2015, and released its beta version on May 25, taking more than two months.

As of today (February 17, 2025), the website has been running for 3,566 days, with 14,234 participants contributing 37,176 reviews for 17,431 courses.

Read More

2025-01-14
In Memory of My Grandpa

At 1:00 PM on January 12, 2025, my father called me to say that my grandpa had suddenly passed away at home that afternoon.

Grandpa’s Lifetime in Geology

When my grandpa was young, he was a top student. In the late 1950s, he was admitted to Beijing Geological Institute (the predecessor of China University of Geosciences) to study mechanical engineering. At that time, Beijing Geological Institute was a prestigious university that produced many talents. Premier Wen was his junior, and “Father of Chang’e” Ouyang Ziyuan was his senior. Of course, my grandpa was far from being an outstanding alumnus of Beijing Geological Institute. In his junior year, the Sino-Soviet split occurred, and all Soviet experts withdrew, leaving no one to teach. In his senior year, grandpa joined the Institute of Geography of the Chinese Academy of Sciences and became an ordinary geologist.

Although grandpa’s position was not a field post and he mainly did research in the lab, he often had to travel across the country for geological surveys. Geological exploration was not tourism; roughing it was the norm. Transportation was undeveloped back then, and just taking a green train to the destination took several days. The places he went were remote (places with many people didn’t need exploration), all wild mountains and rivers. Encountering wild animals while camping in the open, or geological disasters halfway up a mountain, was not uncommon. There were no mobile phones or GPS back then; if you got lost, you might end up stranded in the mountains.

A photo of grandpa from the family album, taken after his retirement during a mountain climb

Read More

2025-01-12
Data is the Moat for Internet and AI Companies

This article was first published in a Zhihu answer to the question “Looking back at the development of the internet, what underlying logics seem simple but will continue to be effective in the future?”

Data is the most important moat.

The Moat for Internet Companies is Data

I really like Lao Wang’s product class. Wang Huiwen is a co-founder of Xiaonei and Meituan, and his Tsinghua product class is a classic worth revisiting repeatedly. It discusses economies of scale, and social networks have network effects. The essence of network effects is actually data: who are my friends? How close am I to each of them?

Lao Wang’s product class mentions that replicating WeChat is difficult: Alibaba and ByteDance both tried to challenge WeChat and failed. However, if one day there were a Prophet app that knew all of a person’s real-life friendships and automatically generated friend relationships from them, it could potentially compete with WeChat. This is the value of WeChat’s control over friend-relationship data.

But this Prophet app doesn’t have WeChat’s chat history or Moments history, so something is still missing. This is the value of conversation history data. If the Prophet app goes further and knows what everyone says and does every day, then even WeChat might not be its match.

Read More