2025-08-03
Another Vibe Coding Interview Question: Attention-Based LLM Hallucination Detector

Following “Solving LLM Constrained Sampling Interview Questions with Vibe Coding,” I’m sharing another Vibe Coding interview question from our company (Pine AI), this one about the fundamental principles of LLMs.

Many people misunderstand Vibe Coding, thinking it’s just about constantly asking AI, “How do you do this? How do you implement that?” This approach is doomed to fail. True Vibe Coding requires you to be the architect and product manager, guiding the AI like a teacher instructing a student, not the other way around.

This interview question assesses a candidate’s understanding of the basic principles of Transformers, together with the engineering ability to deliver a working implementation quickly through Vibe Coding. That is exactly the kind of person we need: someone who understands models and has strong engineering skills.

The Challenge: Attention-Based LLM Hallucination Detector

1. Background & Problem Statement

In many applications, large language models (LLMs) need to answer questions or extract information based on a given context, a process often referred to as “In-Context Learning.” However, LLMs have a well-known and serious safety flaw: when asked about information not present in the context, they may “hallucinate” a well-formatted but factually incorrect answer instead of admitting that the information is missing.
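
One plausible detection signal, sketched below, is to measure how much attention the answer tokens pay to the context tokens: a grounded answer should attend heavily to the context, while a hallucinated one mostly attends to itself. This is a minimal sketch under that assumption, not the reference solution; the model, prompts, and threshold are illustrative.

```python
# A minimal sketch, not the reference solution: flag answers whose tokens
# pay little attention to the given context. Model and prompts are examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # assumption: any causal LM; eager attention exposes the weights
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, attn_implementation="eager")
model.eval()

def context_attention_mass(context: str, qa: str) -> float:
    """Average fraction of each Q&A token's attention that lands on context."""
    n_ctx = tok(context, return_tensors="pt").input_ids.shape[1]
    ids = tok(context + qa, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_attentions=True)
    # out.attentions: one (batch, heads, q_len, k_len) tensor per layer
    attn = torch.stack(out.attentions).mean(dim=(0, 2))[0]  # (q_len, k_len)
    return attn[n_ctx:, :n_ctx].sum(dim=-1).mean().item()

score = context_attention_mass(
    "Alice was born in Paris in 1990.", " Q: Where was Alice born? A: Paris."
)
print(f"attention mass on context: {score:.3f}")  # low values suggest hallucination
```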

Read More

2025-07-30
From Prompt Engineering to Context Engineering: The Secret to Writing Good Agents

[This article is based on a presentation at the Turing Community’s Large Model Technology Learning Camp, Slides Link]

This talk explores the design philosophy and practical strategies of AI Agents in depth: from the conversational mode of Chatbots to the action mode of Agents, and how to systematically design and manage an Agent’s information environment in order to build efficient and reliable AI Agent systems.

Table of Contents

  1. Part 1: Paradigm Shift - From Chatbot to Agent
  2. Part 2: Core Analysis of Agents
  3. Part 3: Context Engineering
  4. Part 4: Memory and Knowledge Systems

Part 1: Paradigm Shift - From Chatbot to Agent

From Chatbot to Agent: A Fundamental Paradigm Shift

We are experiencing a fundamental shift in AI interaction modes:

Chatbot Era

  • 🗣️ Conversational Interaction: User asks → AI answers → Repetitive Q&A cycle
  • 📚 Knowledgeable Advisor: Can only “speak” but not “act,” passively responding to user needs
  • 🛠️ Typical Products: ChatGPT, Claude Chat

Agent Era

  • 🎯 Autonomous Action Mode: User sets goals → Agent executes → Autonomous planning and decision-making
  • 💪 Capable Assistant: Can both “think” and “act,” proactively discovering and solving problems
  • 🚀 Typical Products: Claude Code, Cursor, Manus

Read More

2025-07-25
OpenRouter, Anthropic, Volcano Engine, Siliconflow Usage Guide

In AI application development, choosing the right LLM API service is crucial. Whether you are building an intelligent dialogue system, developing an AI Agent, or participating in an AI Hackathon, this article will provide you with a comprehensive API usage guide, covering mainstream services such as OpenRouter, Anthropic API, Volcano Engine, and Siliconflow.

Why Do You Need Multiple API Services?

Different LLM models have their own strengths, and when developing AI Agents in particular, you need to choose the right model for each scenario (a minimal calling sketch follows the list):

  • Claude (Anthropic): Excels in complex reasoning, programming, and Agent tasks, particularly suitable for scenarios requiring deep thinking
  • Gemini (Google): Performs well in long text processing and multimodal understanding, suitable for handling multimedia content such as images and videos
  • GPT (OpenAI): Strong in image understanding and mathematical reasoning, excellent for everyday conversation experiences
  • Doubao (ByteDance): Fast access speed in China, good voice dialogue experience, especially suitable for real-time interaction scenarios
  • Open Source Models: Low cost, highly customizable, suitable for large-scale deployment
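
Conveniently, several of these services expose an OpenAI-compatible endpoint; OpenRouter does, for example. A minimal sketch of calling it, where the model slug and environment variable name are illustrative assumptions:

```python
# A minimal sketch of calling OpenRouter via its OpenAI-compatible endpoint.
# The model slug and environment variable name are illustrative assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
resp = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # pick per task, per the list above
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```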

Read More

2025-07-21
AI, Our Free "Dopamine Engine": From "Freedom From" to "Freedom To"

(This article is automatically generated based on my one-hour voice chat with Gemini 2.5 Pro)

The human pursuit of freedom is a profound dialogue with the biological instincts deep within us. Before we embark on this dialogue, we must first understand the two core aspects of “freedom,” as articulated by philosopher Isaiah Berlin:

  • The first is “freedom from”, which is negative freedom. It aims to rid us of external constraints, coercion, and interference. This is about delineating a sacred, inviolable “space” in our lives, with its ultimate form being financial freedom—where you are free from the compulsion to sell your labor for a living.
  • The second is “freedom to”, which is positive freedom. It seeks to make us masters of our own will, possessing enough ability and resources to realize our self-worth. This endows us with the “power” to act, with its ultimate form being creative freedom—where you can turn imagination into reality.

Understanding this pair of concepts allows us to uncover a deeper secret, revealed by Richard Sutton, the 2024 Turing Award laureate and father of reinforcement learning, in his classic textbook “Reinforcement Learning: An Introduction”: what drives our happiness is not the static “reward” itself, but the dynamic “reward prediction error.” What truly makes our brains secrete dopamine and feel joy is the positive gap between “actual gain” and “prior expectation.”

A completely predictable, surprise-free world, no matter how affluent, has a reward prediction error approaching zero. This biologically explains why pure “Freedom From”—a comfortable, worry-free but unchanging haven—can ultimately lead to emptiness. In contrast, “Freedom To,” filled with challenges, exploration, and creation, is a powerful engine that continuously generates positive prediction errors.
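
In the standard temporal-difference formulation from that textbook, the prediction error at each step is

$$\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t),$$

the gap between what actually happened (the reward plus the discounted value of the new situation) and what was expected. Joy tracks the moments where $\delta_t > 0$, which is exactly why a perfectly predictable life drives it toward zero.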

Today, the rise of AI is handing the keys to this engine to each of us in unprecedented ways.

Read More

2025-07-18
Setting Up an IKEv2 Tunnel Without Installing a Client to Bypass Cursor's Regional Restrictions

(Thanks to Koutian Wu for thoroughly debugging and deploying, and for pointing out several technical issues in the original article, which have been corrected in this version)

As access to tools like Cursor and Claude Code becomes restricted in China, traditional HTTP/SOCKS proxies can no longer meet daily needs. These tools not only impose regional restrictions on the server side but may also employ multi-layered techniques to detect the user’s true geographical location (currently only partially implemented, but may be upgraded in the future):

  1. Basic IP Database Matching: Traditional GeoIP database queries
  2. Timezone Consistency Check: Obtaining the client’s timezone via JavaScript and cross-verifying with the IP’s geographical location
  3. DNS Resolution Check: Using Geo DNS resolution results to check the real location
  4. WebRTC IP Leak Detection: Obtaining the user’s real IP address via WebRTC
  5. Cloudflare Source Address Retrieval: Obtaining the real source address through Cloudflare’s HTTP headers (such as CF-Connecting-IP)

Most current HTTP/SOCKS proxies can only defeat the basic detection methods and are powerless against the more sophisticated multi-dimensional checks. A Layer-3 tunnel, operating at the network layer, hides the user’s real network environment far more thoroughly.

Besides bypassing geographical restrictions, a Layer-3 tunnel is also suitable for the following scenarios:

  1. Server Access Control: Avoid exposing the SSH access port of company servers on the public internet
  2. Development and Testing Environment: Avoid exposing the company’s test servers, internal APIs, etc., on the public internet
  3. Secure Network Environment: Ensure communication security in untrusted public WiFi environments

While solutions like WireGuard and OpenVPN are stable and efficient, they require installing dedicated clients, which can be cumbersome in multi-device usage scenarios.

IKEv2, as a modern VPN standard, not only offers excellent performance and stability but, more importantly, is natively integrated into mainstream operating systems like macOS, Windows, iOS, and Android, eliminating the need to install any third-party clients.
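
On the server side, strongSwan is a common way to terminate IKEv2. Below is a minimal swanctl.conf sketch with illustrative names and addresses; the full setup, including certificates and the Hong Kong relay, is covered in the article itself.

```
# Minimal strongSwan swanctl.conf sketch; all names and addresses are examples.
connections {
    ikev2-rw {
        version = 2
        pools = rw_pool
        local {
            auth = pubkey
            certs = server-cert.pem
            id = vpn.example.com        # assumption: your server's domain name
        }
        remote {
            auth = eap-mschapv2         # username/password auth, supported by the
            eap_id = %any               # built-in macOS/Windows/iOS/Android clients
        }
        children {
            rw {
                local_ts = 0.0.0.0/0    # full tunnel: route all client traffic
            }
        }
    }
}
pools {
    rw_pool { addrs = 10.10.10.0/24 }
}
```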

This article builds on the architecture from “Skillfully Using Hong Kong as a Transit Point to Build a Smooth and Stable Layer-3 Tunnel Between China and the US” to construct a three-hop China -> Hong Kong -> USA IKEv2 tunnel.

Read More

2025-07-15
Solving LLM Constrained Sampling Interview Questions with Vibe Coding

This is an interview question from our company.

Some say our Vibe Coding programming questions are too difficult, but in fact our company’s 2-hour Vibe Coding interview questions hardly require you to write any code yourself. Just feed the question into the prompt, keep interacting with the LLM to state requirements and directions for improvement, and the AI will complete it for you.

Why is it called Vibe Coding? Because it minimizes direct code writing. The division of labor between human and AI becomes very clear: the human is responsible for setting direction, defining the problem, and reviewing results, while the AI handles the concrete implementation. Claude Code is an extreme example, where humans are not allowed to touch the code at all; only the LLM writes it.

Below, I will demonstrate how Vibe Coding works through the complete experience of this interview question. This entire exploration process was not smooth sailing; the AI’s initial solution had serious flaws. It was through my continuous review and direction correction that we finally arrived at a usable solution. This is not only about solving a technical problem but also a deep exploration of the future software development model.

It is worth mentioning that this article itself was also automatically generated by Gemini 2.5 Pro in Cursor based on my work log (including all my conversations with AI and the evolution of the code). From the moment I first posed the question to Cursor, to completing the final usable program, and then generating this illustrated blog post, the entire process took only 1.5 hours.

The Challenge: LLM Constrained Sampling

An application for learning English needs to ensure that every word output by its built-in LLM falls within a given 3000-word vocabulary.

Requirements:

  1. Use the Constrained Sampling technique for large language models (LLMs): modify the token sampling step in an inference framework (such as transformers) to ensure that all content the LLM outputs stays within the given 3000-word vocabulary (see the sketch after this list).

  2. Of course, punctuation, spaces, line breaks, etc., are allowed, but special characters, Chinese, French, emojis, etc., are not allowed.

  3. Case transformations of words in the vocabulary are considered valid words, for example, if the word apple is in the vocabulary, then apple, Apple, APPLE are all considered valid outputs.

  4. The 3000-word vocabulary can be any common English word list found online.

  5. The performance of the constrained sampling algorithm should be as good as possible.
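
As a starting point, here is a minimal sketch of the masking idea using a transformers LogitsProcessor. It naively assumes one token per word and ignores sub-word boundaries and performance, all of which a real solution must handle:

```python
# A minimal sketch of the masking idea, not a full solution: keep only tokens
# that decode to an allowed word (naively assumed to be a single token) or to
# pure punctuation/whitespace. Sub-word handling and speed are omitted.
import torch
from transformers import LogitsProcessor

class VocabConstraint(LogitsProcessor):
    def __init__(self, tokenizer, allowed_words):
        allowed = set()
        for word in allowed_words:
            for v in (word.lower(), word.capitalize(), word.upper()):
                ids = tokenizer.encode(" " + v, add_special_tokens=False)
                if len(ids) == 1:          # keep only single-token variants
                    allowed.add(ids[0])
        for tok_id in range(tokenizer.vocab_size):
            if tokenizer.decode([tok_id]).strip(" \t\n.,;:!?\"'()-") == "":
                allowed.add(tok_id)        # punctuation and whitespace tokens
        self.allowed = torch.tensor(sorted(allowed))

    def __call__(self, input_ids, scores):
        masked = torch.full_like(scores, float("-inf"))
        masked[:, self.allowed] = scores[:, self.allowed]
        return masked
```

Wrapped in a LogitsProcessorList and passed to model.generate via the logits_processor argument, this constrains every sampling step; the performance requirement in point 5 is what pushes the design toward a precomputed token-level trie rather than this flat whitelist.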

Read More

2025-07-12
Skillfully Using Hong Kong as a Transit Point to Build a Smooth and Stable Layer-3 Tunnel Between China and the US

In the previous article, “Building a Layer-3 Tunnel with a Full US IP and No Manual Proxy Settings,” we addressed many of the network issues encountered when accessing global services through a Domestic Server -> US Server architecture. However, a new performance bottleneck gradually emerged: the public link between the domestic server and the US server suffers high latency and severe packet loss during peak hours.

This results in issues like SSH operation lag, online meeting disconnections, and API request timeouts, even when using a tunnel. The root cause lies in the international internet link between China and the US, which is like a highway during holidays—congestion is the norm.

Faced with this problem, a counterintuitive solution emerges: If the direct route is blocked, would taking a detour be faster?

Read More

2025-07-10
The Translated Work “Illustrated DeepSeek Technology” by Meng Jiaying and Me Is About to Be Released

Read More

2025-07-10
Zhongguancun Artificial Intelligence Academy & UCAS 2025 Summer AI Agent Practical Topics

The AI Agent Hackathon at UCAS in February 2025 was a great success, so I will again host two AI Agent practical sessions: July 27-30 at Zhongguancun Artificial Intelligence Academy, and July 31 to August 4 at UCAS.

Many thanks to Professor Zheng Shuxin, Vice Dean of Zhongguancun Artificial Intelligence Academy, and Professor Liu Junming of UCAS for inviting me to host these two AI Agent practical activities.

The topics in this AI Agent practice will take you deep into the cutting-edge techniques for building the next generation of AI Agents. You will have the opportunity to work hands-on with:

  • Multimodal models and thinking model applications: Build the “brain” of the agent with industry-leading multimodal models and thinking models such as Gemini 2.5 Pro and Claude 4 Sonnet.
  • Real-time voice interaction: Integrate VAD, ASR, LLM, and TTS technology stacks to create real-time voice agents capable of streaming conversations.
  • Autonomous operation of graphical interfaces: Develop agents that can stably operate GUIs such as browsers to complete complex real-world tasks.
  • Advanced Agent Architecture: Explore advanced architectures such as “fast and slow thinking,” “thinking while listening,” and multi-agent collaboration to give agents the ability to respond in real-time and think deeply.
  • Learning from experience: Build agents that can learn from experience, allowing them to become more proficient in repetitive tasks.
  • Identifying authoritative information sources: Enable agents to accurately identify and adopt high-credibility information such as official documents and academic papers from vast amounts of information.
  • Autonomous tool invocation and creation: Allow agents not only to use existing tools but also to autonomously learn and create new tools to solve open-ended problems.

Suggestions on AI-assisted programming: In this AI Agent practice, we encourage everyone to use AI-assisted programming, that is, “developing agents with agents.” We recommend using Cursor for Vibe Coding:

  1. Documentation first, code later: Let Cursor write the design document first. Your role is to provide improvement suggestions for the AI-generated design document and iterate with the AI until satisfied. Then, let Cursor write the code according to the final design document. During coding, always keep the design document in the agent’s context as a reference.
  2. Choose the right model: Do not use Cursor’s “auto” mode; be sure to choose a model with thinking ability (with a brain icon next to it), such as Claude 4 Sonnet.
  3. Test-driven: Be sure to have AI write and execute test cases for its code to ensure code quality.

Feel free to form teams and choose any of the following topics to start your creative journey!

Read More

2025-07-10
UCAS 2025 Spring AI Agent Practical Course

The AI Agent Practical Course is a hands-on course conducted by Professor Liu Junming from UCAS and myself. The first session in 2024 had over 50 participants, and the second session in 2025 had over 100 participants. The 2025 Spring AI Agent Practical Course took place in early February 2025 in Beijing.

Course Outline:

Read More