New Exploration of AI Agents: Building AI-Native Teams and Empowering AI Employees
[This article is based on my keynote speech at the 2025 China Generative AI Conference. The content is the result of a 2-hour brainstorming session with AI, followed by 3 hours of collaborative work with AI in Cursor for refinement.]
Summary: Some teams have found that the efficiency gains from applying AI to programming and writing are smaller than expected. The reason is often that a lot of knowledge lives only in the minds of specific employees and is never documented. As a result, AI Agents, like new interns, find it difficult to write code, and even when they manage to, they don't know how to test it. Another reason is that internal tools like project management systems can only be operated through GUIs, which are not AI-Agent-friendly. Today's text reasoning models have reached human-level capability; when they fail to complete a task, it is usually for lack of background knowledge and AI-friendly tools.
We will discuss how to build an AI-native team that is friendly to AI Agents from the perspectives of software development, project management, and operations. An AI-native team needs to use recorded voice and written communication as much as possible, like an open-source community, to reduce reliance on individuals. AI Agents need to access various internal company tools through MCP, have enough context information, and a test environment to work efficiently. AI Agents need memory compression mechanisms, reflection mechanisms, and checkpoint rollback mechanisms to work continuously overnight without human intervention, making useful progress every hour. AI employees also need to actively communicate with human employees and other AI employees. This way, human employees can spend most of their time thinking and discussing, while most repetitive execution work is handed over to AI.
Below is the full text of the speech. (The slides are the version used at the 2025 China Generative AI Conference; the text explanation is not a transcript but an expanded version developed through brainstorming with AI.)
Cover Page
Dear guests, hello! I am Li Bojie from PINE AI. It is a great honor to speak with you today on the topic of “New Exploration of AI Agents: Building AI-Native Teams and Empowering AI Employees.”
Before we begin, I want to ask you a question: Have you ever tried messaging ChatGPT at 3 AM, asking it to “check tomorrow’s meeting room bookings,” only to receive a very polite but completely useless reply: “I cannot access your company’s meeting room booking system”? Or asked Claude to write a piece of code, and it confidently tells you, “This code has passed the test,” but when you copy and paste it into the project, the compiler goes crazy with errors?
This is the core issue we are discussing today: AI is already so powerful, why does it still behave like a smart but clueless intern in actual work? How can we truly integrate AI into teams, allowing it to work efficiently like a digital employee?
This is not just a technological change but a profound transformation of work methods and organizational structures. Let’s explore the next-generation paradigm of AI and human collaboration together.
Why Current AI Cannot Become Reliable Digital Employees
Page 2: Current Situation – AI Agent Application Efficiency Not Meeting Expectations
“AI revolution is about to disrupt the job market!” “Large models will replace white-collar workers!”—such headlines have filled major media outlets over the past two years. But what is the reality? Let me share a true story:
Last year, an engineer in our company, let’s call him Xiao Wang, was very excited to announce that he was going to use GPT-4 to create an automated testing Agent. Two weeks later, he dejectedly told the team, “I found that I spent more time teaching AI to understand our testing system than writing the test code myself.”
This story reveals the awkward current state of AI Agent applications. Although AI Agents are already widely used in programming, writing, customer service, and other fields, the actual results fall short of expectations.
Imagine today’s typical AI workflow: we open Poe or ChatGPT, write a carefully designed prompt, the AI takes a few minutes to complete the task, then we check, correct, and resubmit… and so on. It looks cool, but in reality it has not “significantly liberated manpower” as we expected.
Recently, I talked to many people and collected their complaints about AI applications:
- “The code generated by AI looks good, but it crashes when run, and debugging is more exhausting than writing from scratch.”
- “AI doesn’t understand our company’s architecture and historical baggage, and its suggestions are naively ridiculous.”
- “It’s fine to have AI write a simple script, but to have it integrate with our internal systems? Dream on!”
These issues reflect a core contradiction: there is a huge gap between AI’s theoretical capabilities and practical applications. It’s like having a new employee with a Stanford computer science Ph.D. who knows nothing about your product, codebase, or company culture, requiring more time to guide them than to get help from them.
As a senior engineer told me, “I don’t need AI to tell me how to write a quicksort algorithm; I need it to understand why our login system is designed so strangely and how to modify it without breaking backward compatibility.”
So, where is the problem? What is AI lacking that prevents it from becoming the efficient “digital employee” we expect? Let’s delve into this issue together.
Page 3: AI Has the Intellectual Foundation to Become Efficient “Employees”
Before pointing out the problems, let’s first acknowledge a fact: from an intellectual foundation perspective, foundational models indeed have the potential to become efficient “employees,” and this is beyond doubt.
When DeepSeek R1 was just released, a friend of mine jokingly asked it to solve an algorithm problem that his team had been stuck on for two days. Shockingly, AI not only provided the correct solution but also pointed out the logical flaws in their thinking. This is not an isolated case; today’s LLMs perform impressively in various reasoning tasks.
For example, OpenAI o3, released last year, outperformed 99.9% of programmers in Codeforces programming competitions and scored 96.7% on the AIME 2024 math test, equivalent to missing only one question on the American Invitational Mathematics Examination. In the GPQA Diamond test of doctoral-level scientific questions, it surpassed o1 by 10 percentage points, while o1 was already at the average level of human doctoral students. These data indicate that in solving closed problems, modern AI has approached or even surpassed human experts in some fields.
More exciting is the breakthrough in long-context capabilities. Remember the early GPT-3.5 models? They would start “forgetting” in slightly longer conversations. Now, human engineers can input entire codebases at once for AI to analyze, which was unimaginable two years ago.
Speaking of speed, the advantage is even more astonishing. In an experiment, Claude 3.7 Sonnet and I attempted the same coding task. On average, the AI generated in 5 minutes code of comparable quality that would have taken me an hour without AI. Moreover, humans need breaks, meetings, and coffee, while AI can work 24/7.
Another underestimated advantage: AI never loses motivation on boring tasks and can execute many tasks in parallel. Sam Altman said last year that AI Agents should not just help people set alarms or order takeout; they should do things like making 300 calls to contact every possible restaurant for a reservation, the kind of repetitive task real people don’t have time for. Google’s “Ask for Me” feature has since made this a reality.
Have you ever thought about having a colleague who is quick-thinking, has an extraordinary memory, works ten times faster than you, and never needs rest, always staying focused—this is simply the ideal “super colleague”! So the question arises: since AI is so smart and efficient, why can’t they become truly reliable digital employees? This is the puzzle we need to solve next.
Page 4: Why Smart and Fast AI Cannot Become Reliable Digital Employees
Despite AI’s excellent performance in intelligence and speed, they still struggle to become reliable digital employees in enterprise environments, like a genius with an IQ of 180 but zero social adaptability. This is mainly due to four major obstacles.
First is the issue of undocumented enterprise knowledge. I once visited the IT department of a Fortune 500 company, and they proudly showcased their knowledge base. When I randomly asked a question, the answer was, “Oh, you need to ask Engineer Zhang; he’s the only one who knows, but he just left last week.” This scenario is all too familiar, right?
In most enterprises, critical knowledge is scattered across employees’ minds and Slack private chats, with little systematic documentation. Even the smartest AI cannot access information that was never recorded. It’s like asking a genius new employee to run a maze blindfolded: pure self-inflicted hardship.
Second, the operational barriers of tools and systems severely limit AI’s ability to perform. Last year, I visited a fintech company that tried to have AI automatically process customer refund requests. Where was the problem? The entire process required operating five different internal systems, all with graphical interfaces and no APIs. They eventually had to hire a full-time employee specifically to “copy AI’s suggestions into various systems.”
From a technical perspective, this involves two deep-seated issues: one is the technical challenge of computer vision and UI interaction. Current Computer Use solutions, whether OpenAI Operator, Claude Computer Use, or the recently popular Manus, often lack the accuracy to handle complex enterprise applications; the other is the general lack of API documentation and standardized interfaces in enterprise systems.
Third, AI Agents lack mechanisms for executing continuous tasks. A friend of mine, an AI researcher, tried to create an AI Agent to automatically analyze contributor patterns in open-source projects on GitHub. Initially, everything went smoothly, but as the project became more complex, AI began to “get lost”: forgetting previously analyzed results, repeating completed work, and even completely deviating from the original goal. AI without reflection and error-correction mechanisms is like a car without navigation, easily getting lost.
An interesting technical detail: This phenomenon of “goal drift” is referred to as “reward hacking” in reinforcement learning, where AI finds shortcuts to meet short-term goals but deviates from long-term intentions. In agent systems, this manifests as behavior that is “overly focused on current sub-tasks while ignoring global goals.”
Finally, AI lacks a long-term memory mechanism, making it difficult to accumulate experience and knowledge. Imagine having an assistant who forgets everything that happened the day before every morning, requiring you to explain each project and context from scratch—this is essentially the current state of AI. Each interaction feels like starting from scratch, unable to truly learn and grow from past work.
These obstacles collectively lead to a significant gap between “smart AI” and “reliable digital employees.” As one CTO vividly summarized: “Today’s AI is like a super-smart intern with severe memory impairment who can’t use any of our office equipment.”
Next, let’s delve into these issues and explore how to overcome them to make AI a truly valuable team assistant.
Page 5: Issue 1 – Knowledge Not Documented, Only in Employees’ Minds
The phenomenon of knowledge silos in enterprises is the primary obstacle to AI working efficiently. I recall a consulting job where the client asked us to optimize their order processing system. When I asked why certain orders were automatically sent for manual review, the answer was: “It’s historical reasons, you need to ask Xiao Li.” And where was Xiao Li? No one knew; he might have left the year before.
Such scenarios are common in enterprises. A typical conversation might go like this:
“Why is this interface designed this way?”
“Oh, it was a decision made three years ago, and Engineer Wang was responsible for it.”
“Where is Engineer Wang? Can I talk to him?”
“He switched to Tencent last year.”
“Is there any documentation of the design decisions back then?”
“Uh… it should be in some email or meeting notes…”
If even new human employees find it hard to get up to speed quickly in such an environment, how can AI? From a technical perspective, AI faces a “cold start” problem: without sufficient historical context and background knowledge, it cannot make judgments that align with organizational expectations.
I once ran an interesting experiment: I gave Claude two codebases with the same functionality but vastly different documentation quality and asked it to generate improvement suggestions. The results were unexpected but logical: for the well-documented codebase, nearly 80% of Claude’s suggestions were directly adoptable; for the poorly documented one, fewer than 30% were usable, and most were either overly conservative tweaks or unrealistic sweeping changes.
Think about your own codebase: does it have comprehensive documentation? Are the functions and interfaces of each module clearly described? Are there reliable test cases? Is the key business logic explained? If most of the answers are negative, even the most advanced AI will struggle to add real value to your project.
Many programmers say, “If code needs comments to be understood, it’s bad code.” But for both AI and humans, code without documentation is indeed hard to understand. The purpose of documentation is not to explain what every function or variable means, but to explain the project’s background, purpose, architectural design, technical choices, and how to run the code.
This reminds us that to make AI an effective team member, the first issue to address is not AI technology itself, but enterprise knowledge management.
Page 6: Issue 2 – Internal Tools Only Have GUI, AI Finds It Difficult to Operate
The current design of enterprise internal tools seems to be specifically challenging for AI: almost all systems only provide graphical interfaces (GUI) for human users. From financial systems and human resource management to project management tools, these systems were not designed with AI as a potential user in mind.
A friend of mine works at a medical software company where they tried to use AI to automatically generate software test reports. In theory, AI understands medical terminology well and can write professional reports. But in practice, they encountered absurd obstacles: AI couldn’t log into the test system to view results because the login page had a slider verification requiring users to “drag the slider to complete the puzzle.” As a result, they had to assign someone to log into the system and send screenshots to the AI.
This situation is like giving a genius painter who has no hands paint and canvas, but no prosthetics. From a technical perspective, this involves the limitations of “multimodal interaction.” While current mainstream AI models perform well in language understanding and generation, they still have significant limitations in visual recognition and precise operation of graphical interfaces.
Even models with visual capabilities like GPT-4o have a huge gap between “seeing” and “understanding” the interface and “precisely operating” the interface. Models not specifically trained for GUI operations can generally describe where to click but cannot output precise coordinates. This is similar to understanding a webpage but having to click a small button while wearing boxing gloves—imagine how frustrating that would be.
Another serious issue is the response delay of multimodal models. When AI needs to process interface screenshots, its reaction speed is much slower than humans. A simple interface operation, once familiar, takes humans only a few hundred milliseconds, while AI might need 3-5 seconds. Technically, this delay mainly stems from the complexity of autoregressive image processing—a single interface screenshot is encoded into over 1000 tokens, and the delay in just the prefill stage exceeds 1 second. The high latency of multimodal models makes AI particularly clumsy in scenarios requiring quick, continuous operations.
A friend once half-jokingly said, “We expect AI to work like Jarvis in ‘Iron Man,’ but in reality, it’s more like a remote employee stuck in a Zoom meeting who can only watch but not act.”
More challenging is that many enterprise systems are custom-developed or highly configured, with operational processes varying from company to company, lacking a unified standard. This means that even if AI masters the system operations of one company, it may need to relearn when it comes to another company.
This issue indicates that to truly unleash AI’s potential, we need not only to enhance AI’s capabilities but also to transform existing tools and systems to be more AI-friendly. Only when tools and user interfaces are adapted for both humans and AI can the vision of digital employees be truly realized.
Page 7: Issue 3 – Lack of a Testing Environment Where AI Can Work Independently
Even if AI has sufficient knowledge and the ability to operate systems, they still face a critical challenge: the lack of a safe and reliable testing environment. This is like assigning tasks to an intern but not allowing them to test their work before submission.
I once witnessed a tragicomic scene: a startup tried to have Claude Code help refactor their backend API. Claude Code wrote seemingly good code, but the company only had a shared testing environment. When Claude Code’s changes were deployed there, they immediately disrupted the work of three other developers. Worse, that very afternoon the company’s CEO was due to demonstrate the product to investors, and the demo environment had completely crashed under the AI’s modifications. Can you imagine how embarrassing that scene was?
From a technical perspective, this issue involves several fundamental principles of software engineering:
- Environment Isolation: Modern development practices emphasize using containerization technology to create isolated development and testing environments. However, many enterprises still use shared development/testing environments, without strict separation between development, testing, and production environments.
- Continuous Integration/Continuous Deployment (CI/CD): Automated testing and deployment pipelines ensure that code changes do not break existing functionality. However, many enterprises have incomplete or non-existent CI/CD pipelines.
- Infrastructure as Code: Environment configuration should be codified, making it simple and quick to create consistent new environments. But in practice, environment configuration is often done manually and is difficult to replicate.
In a medium-sized enterprise I consulted for, they had a joke: “We have three environments - development, testing, and ‘prayer’.” In such cases, letting AI work independently is asking for trouble.
The lack of test cases is another serious issue. Once, I discussed with a team how to have AI help fix bugs. When I asked about their test coverage, the technical lead awkwardly said, “We manually click through all the functions before each release to see if they work properly.” In such cases, even the most advanced AI cannot ensure that modifications won’t introduce new issues.
This reminds us that building an AI-friendly work environment involves not only knowledge sharing and interface design but also sound testing and verification mechanisms. As software engineering master Martin Fowler said, “If you find testing painful, it might be because your design is painful.” In the AI era, this statement still holds true.
Page 8: Issue 4 – AI Cannot Work Long Hours or Communicate Proactively Like Humans
Even if we solve the issues of knowledge, tools, and testing environments, AI still faces challenges in sustained work capability and proactivity. This is like hiring an employee who can only work for 45 minutes before needing a reboot.
I once led a project refactoring attempt using Cursor. Initially, everything went smoothly, but as the conversation context grew, Claude 3.7 Sonnet began to exhibit symptoms of “attention drift”: forgetting important details discussed earlier, confusing the functions of different APIs, and even contradicting itself. This process was like watching a once-sharp assistant slowly turn into someone incoherent.
The lack of reflection and backtracking mechanisms is another important issue. When a seasoned programmer solves a complex bug, they constantly switch between different ideas: trying, evaluating, abandoning, then trying new approaches. Crucially, they can recognize when a direction is not working and change course in time. AI lacks this “metacognitive” ability: once it goes astray, it keeps running in the wrong direction at top speed without ever realizing it is on the wrong path.
Lack of proactivity is another major barrier to AI becoming a true team member. Traditional AI agents are designed like vending machines: you input instructions, and they output results. This passive mode fundamentally differs from the proactive collaboration style of human team members. Imagine a colleague who never proactively asks questions, never expresses concerns, and never reports risks as deadlines approach—this is basically how AI currently works.
Communication efficiency is also a challenge. I’ve seen engineers angrily typing at ChatGPT or Cursor on several occasions: “No, that’s not what I meant!”—if they could express themselves directly with voice, things would be much simpler. Most AI agents only support text communication, lacking multimodal capabilities, which greatly reduces the efficiency of human-AI collaboration.
These limitations collectively make it difficult for current AI to handle complex tasks requiring sustained attention, reflection, and proactive communication. To make AI a true team member, we need to fundamentally improve its working mode and interaction methods, making it not only smart but also seamlessly cooperative.
Key Actions for Building an AI-Native Team
Page 9: How to Make an AI Agent Work Effectively Like a Digital Employee 24x7?
Faced with these challenges, we can’t help but ask: How can we make an AI Agent truly work like a digital employee, producing effective work 24/7? This is no longer a simple technical issue but a fundamental shift in work methods and organizational structure.
Let me share a simple yet profound comparison from the personal experience of an entrepreneur friend:
Traditional Model:
He spent 3 hours crafting a detailed prompt, asking Cursor to write a data analysis script. Each time, the code generated by Cursor had issues, and he had to keep modifying the prompt, going through dozens of iterations. Eventually, he gave up on the AI solution and wrote the code himself.
New Model:
For the same task, another entrepreneur used a different approach. He first had Cursor read their database documentation, existing analysis code, and business requirements, then discussed analysis strategies with the AI in a manner similar to conversing with a colleague, clarified the output format, and provided a test dataset. Cursor not only successfully wrote the code but also proactively pointed out anomalies in the data, offering valuable business insights.
What does this comparison illustrate? In the traditional model, we treat AI as a magical black box — write a perfect prompt and hope for a perfect result. The new model treats AI as a team member — providing enough background, engaging in effective communication, and giving necessary tools and feedback.
From a technical perspective, this involves a fundamental transformation of the Agent architecture. Traditional Agent architecture is based on a simple “perceive-decide-act” loop, similar to early robotic control systems. The new generation of Agent architecture introduces advanced cognitive functions such as self-reflection, memory management, and goal tracking, closer to the “dual-process theory” in cognitive science, which combines intuitive reactions and deep thinking cognitive models.
Imagine if an AI is correctly integrated into your team, can access all necessary knowledge and tools, can proactively communicate and continuously complete tasks, how much productivity would that unleash?
This is not a fantasy. By building an AI-native team, we can create an environment where AI truly becomes an efficient digital employee. This requires systematic changes from multiple aspects such as organizational culture, tool design, testing environment, and collaboration models.
In the following content, I will elaborate on the key actions and technical solutions for building an AI-native team.
Page 10: Building an AI-Native Team – Making AI a “Digital Employee”
Building an AI-native team is not just about introducing new technology; it’s a transformation of mindset. It requires us to fundamentally change the positioning of AI: from “AI is a tool” to “AI is a team member.”
Imagine if we treat each AI Agent as a new high-potential employee, how would we treat them? We would provide comprehensive onboarding materials, equip them with necessary tools, create a safe space for trial and error, and establish effective communication channels. This is the core concept of building an AI-native team.
From a technical architecture perspective, traditional AI integration is often “point-to-point”: specific tasks use specific AI tools. An AI-native team adopts a “mesh integration” model, creating a collaborative environment where multiple AI Agents seamlessly collaborate with humans and systems. This is similar to the difference between microservices architecture and monolithic applications — the former is more flexible, scalable, and adaptable.
To achieve this vision, we need to start from four key aspects:
First, establish a communication culture similar to open-source communities. Why has Linux been able to sustain development for 30 years with global developers collaborating without chaos? The secret lies in its transparent, documented communication style. All discussions are conducted openly in mailing lists or forums, all decisions are documented, and anyone (including newcomers) can understand the code’s history and design rationale. This culture is particularly suitable for AI participation because all information is open and searchable.
Second, ensure team collaboration tools are AI-friendly. I like to use this analogy: if your office only has revolving doors and no regular doors, then people with disabilities cannot enter. Similarly, if your system only has a graphical interface and no API interface, AI cannot work effectively. Internal systems need to provide API interfaces, not just graphical interfaces, allowing AI to interact directly with the system.
Third, establish a comprehensive testing environment and test cases. I once visited Google’s development environment, and what impressed me most was their “sandbox system” — every developer (including interns) can immediately start a complete testing environment and run a full set of tests without affecting others. For AI, such an environment is even more essential.
Fourth, equip each employee with an AI assistant. Just like Tony Stark in “Iron Man” has JARVIS, future knowledge workers will have personal AI assistants to help handle daily work and improve efficiency. This not only enhances individual productivity but also accelerates the accumulation and dissemination of organizational knowledge.
Finally, we need to build AI Agent technical solutions like digital employees, enabling AI to think, communicate, and learn proactively, truly integrating into team workflows. This involves cutting-edge AI system design, which I will detail in subsequent sections.
Next, let’s delve into the specific implementation methods of each key action and see how to turn this vision into reality.
Page 11: Key Action 1 – Communication Culture Similar to Open-Source Communities
The success of open-source communities provides valuable experience for us to build AI-native teams. Think about why a globally distributed team of developers, who have never met, can build such complex and stable systems like the Linux kernel? The secret weapon behind it is a highly transparent, document-driven communication culture.
I once observed an interesting phenomenon: two teams in a startup, Team A mainly communicated through private chats and phone calls, rarely leaving documentation; Team B enforced recording all decisions and discussions on Notion. Six months later, when both teams introduced AI-assisted development, Team B’s AI assistant almost immediately created value, while Team A’s AI struggled with “distraction” and “misunderstanding.” The reason is simple: Team B’s knowledge is searchable, while Team A’s knowledge is locked in human brains and private conversations.
First, we need to establish an open and transparent information-sharing mechanism. A specific practice is the “work in the open” principle: by default, all non-sensitive information is shared openly. For example, generate meeting minutes after a voice conference and share them in public channels, and document important decisions along with the discussion that led to them. One company that adopted this practice has a slogan: “If it’s not documented, it didn’t happen.”
Second, we must eliminate information silos. I like to describe this transformation with the framework “knowledge from private to shared.” In traditional organizations, expertise is a personal asset: “I know something” increases my irreplaceability. In AI-native teams, knowledge must move from individual brains into shared resources: “the team knows something” increases overall efficiency.
Third, knowledge bases should use AI-friendly open document formats. I’ve seen too many companies’ important knowledge locked in Word documents, PPT slides, and PDF files. These formats are not AI-friendly and are difficult to index and retrieve efficiently. In contrast, open formats like Markdown are more suitable for AI processing and can be easily integrated into version control systems.
Why is Markdown more AI-friendly than Word? Markdown is a plain text format with a simple and clear structure, easy to parse; whereas Word documents are binary formats containing a lot of formatting information, requiring special libraries to parse. Additionally, Markdown is more compatible with version control systems like Git, supporting diff comparison and collaborative editing, which is crucial for team knowledge management.
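To make this concrete, here is a minimal sketch of splitting a Markdown knowledge base into heading-addressed sections for retrieval, using nothing but the Python standard library (the file path is a hypothetical example); doing the same with a .docx file would require a dedicated parsing library.

```python
import re
from pathlib import Path

def split_markdown(path: str) -> dict[str, str]:
    """Map each Markdown heading to its section body, for indexing and retrieval."""
    sections: dict[str, str] = {"preamble": ""}
    current = "preamble"
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        match = re.match(r"#{1,6}\s+(.*)", line)
        if match:
            current = match.group(1).strip()
            sections.setdefault(current, "")
        else:
            sections[current] += line + "\n"
    return sections

# Usage (hypothetical file): chunks = split_markdown("docs/onboarding.md")
```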
This open and transparent communication culture not only benefits AI but also enhances the entire team’s collaboration efficiency. As a CTO who successfully implemented this transformation told me: “Initially, team members worried that documenting everything would increase workload. But soon they realized that documenting a solution once can avoid ten repeated explanations, saving time and reducing interruptions and distractions.”
Remember, the best documentation is not written for AI but for all team members. When a team is accustomed to clearly documenting decisions and knowledge, AI naturally integrates better, like a diligent new employee.
Page 12: Key Action 2 – AI-Friendly Interfaces for Team Collaboration Tools
To truly integrate AI into team workflows, we must ensure that the tools and systems used by the team are AI-friendly. I like to use this analogy: traditional systems are like buildings that only provide stairs, while AI-friendly systems provide both stairs and elevators — humans can choose either way, but AI must use the elevator.
A large retailer tried to have AI assist the customer service team in replying to emails. Technically, AI was fully capable of this task, but in practice, it got stuck at an absurd step: the email system had no API and could only be accessed through the web interface. Their final “solution” was to hire a group of people to copy and paste AI-generated replies into the email system — this is practically a modern version of a “human typewriter.”
The primary task is to provide API interfaces for internal systems. This involves a fundamental principle of software architecture: separation of concerns. Good system design should separate core business logic from the user interface, exposing functionality through an API layer. This not only facilitates AI integration but also promotes system modularity and testability.
From a technical perspective, modern API design already has mature best practices (illustrated with a short sketch after this list):
- RESTful Design Principles: Use standard HTTP methods (GET, POST, PUT, DELETE) to express operation intent, making APIs more intuitive.
- OpenAPI (Swagger) Specification: Provide machine-readable API documentation, making it easier for AI to understand API functions and parameters.
- GraphQL: Allow clients to precisely specify the data they need, reducing over-fetching and under-fetching issues.
- Webhook Support: Allow systems to actively push updates through event notification mechanisms.
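To illustrate the OpenAPI point, here is a minimal sketch using FastAPI, which automatically publishes a machine-readable spec at /openapi.json that an AI agent can read to learn what each endpoint does. The ticketing endpoint and its fields are hypothetical examples.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Internal Ticketing API")

class Ticket(BaseModel):
    title: str
    assignee: str
    priority: int = 3  # 1 = urgent, 5 = backlog

@app.post("/tickets", summary="Create a project ticket")
def create_ticket(ticket: Ticket) -> dict:
    """This docstring and the `summary` above end up in the generated
    OpenAPI spec, so an agent knows when and how to call the endpoint."""
    return {"id": 42, **ticket.model_dump()}
```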
But RESTful APIs alone are not sufficient for AI: besides the APIs themselves, AI needs documentation of what each API is for, so it can understand when to call which one.
Anthropic proposed the Model Context Protocol (MCP) last year to solve this problem. What is MCP? Simply put, it is a standardized framework that defines how AI interacts with various tools and services.
Imagine a world without the USB standard, where each device requires a different connector—this is essentially the current state of AI tool integration. MCP is like the USB Type-C standard for the AI world, creating a unified “socket.”
Concretely, an MCP server declares what data the service exposes and, for each kind of data, what prompt should be used to make the best use of it. For example, an MCP server for a company’s internal version control system might expose all code files as data, with prompt templates for conducting a code review or explaining how the code works.
The MCP server also defines a set of tools. Because the data is often scattered, tools are needed to search it or modify it: for example, finding everything related to a topic in a pile of data, or making changes. Suppose the server fronts GitHub and manages code; when the agent says, “I want to submit code to the repository now,” the server provides a “submit code” tool, and calling that tool performs the submission.
MCP thus bundles tools, data, and prompt templates, which lets agents perform more complex tasks. There is an even more advanced pattern (the spec calls it “sampling”) in which the MCP server, as a third-party service, can call the large model inside the agent.
For example, suppose I have a super agent on my computer, like a desktop version of Manus, and it connects to GitHub. GitHub might say, “I want to review your code before you submit it,” and then invoke the model on your machine to do so. Of course, this raises plenty of privacy-protection issues.
So Anthropic’s MCP is a fairly rich protocol, yet its design is quite simple. Many people see the apparent complexity and set it aside without a closer look.
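As a rough sketch of these ideas, here is what the internal-code MCP server described above might look like using the official Python SDK’s FastMCP helper; the resource scheme, tool, and prompt are hypothetical, and the actual git plumbing is omitted.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-code")

@mcp.resource("repo://{path}")
def read_file(path: str) -> str:
    """Expose repository files as data the agent can read."""
    with open(path, encoding="utf-8") as f:
        return f.read()

@mcp.tool()
def submit_code(branch: str, message: str) -> str:
    """The 'submit code' tool from the example above."""
    # ... run `git commit` / `git push` here ...
    return f"Committed to {branch}: {message}"

@mcp.prompt()
def code_review(path: str) -> str:
    """A prompt template telling the model how to review a file."""
    return f"Review the code in {path} for correctness, style, and security."

if __name__ == "__main__":
    mcp.run()
```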
This kind of system transformation may require initial investment, but in the long run, it will greatly enhance the team’s automation level and work efficiency, creating the necessary technical foundation for AI empowerment.
Page 13: Key Initiative 3 – Comprehensive Testing Environment and Test Cases
Providing AI with a comprehensive testing environment and test cases is key to ensuring the reliability of its work results. I once witnessed an “AI disaster”: a startup let AI refactor their payment processing module, the code looked perfect, but after deployment to the production environment, all international payments failed. The reason? There was no testing environment to verify international payment scenarios.
This story tells us a simple truth: even the smartest developers (or AI) need appropriate testing conditions.
First, we need to build a dedicated sandbox testing environment. In many Silicon Valley companies, every engineer can create an isolated development environment at any time, containing a complete service stack. This capability is even more critical for AI, as AI may need to conduct a large number of trials and verifications. Google Cloud’s Cloud Run is a relatively simple and convenient approach.
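As a rough illustration, here is a minimal sketch of creating a disposable, isolated sandbox per agent task with the Docker CLI (the image name and resource limits are hypothetical); managed platforms like Cloud Run give you the same isolation without running the containers yourself.

```python
import subprocess
import uuid

def create_sandbox(image: str = "myapp-test:latest") -> str:
    """Start an isolated, throwaway container for a single agent task."""
    name = f"agent-sandbox-{uuid.uuid4().hex[:8]}"
    subprocess.run(
        ["docker", "run", "-d", "--rm", "--name", name,
         "--network", "none",               # no network access by default
         "--memory", "2g", "--cpus", "2",   # cap the blast radius
         image],
        check=True,
    )
    return name

def destroy_sandbox(name: str) -> None:
    """--rm means stopping the container also removes it."""
    subprocess.run(["docker", "stop", name], check=True)
```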
Secondly, the code must have comprehensive documentation and test cases. I like to quote this programmer’s adage: “Code is written for people to read, and incidentally for machines to execute.”—In the AI era, this sentence can be modified to “Code is written for both humans and AI to read.”
Excellent code documentation should include:
- High-level architecture overview: The overall structure and component relationships of the system.
- Module responsibility description: The function and boundaries of each module.
- Reasons for key decisions: Record the reasons for important design decisions, whether they are business requirements or technical constraints.
- API contract: Clearly define the input, output, and constraints of interfaces.
In terms of testing, the concept of test-driven development (TDD) becomes even more important in the AI era. A complete testing pyramid should include (a minimal example follows the list):
- Unit tests: Verify the correctness of independent functional blocks.
- Integration tests: Verify interactions between components.
- End-to-end tests: Simulate real user scenarios.
- Performance and load tests: Ensure the system can still operate normally under stress.
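For the bottom layer of the pyramid, here is a minimal pytest sketch of the kind of unit test an AI agent can run after every change to verify nothing broke; `normalize_phone` and its module are hypothetical.

```python
import pytest
from myapp.phone import normalize_phone  # hypothetical module under test

@pytest.mark.parametrize("raw, expected", [
    ("(555) 010-1234", "+15550101234"),
    ("555.010.1234", "+15550101234"),
])
def test_normalize_phone(raw, expected):
    # Known-good inputs must normalize to the canonical form
    assert normalize_phone(raw) == expected

def test_rejects_garbage():
    # Invalid input must fail loudly, not silently pass through
    with pytest.raises(ValueError):
        normalize_phone("not a number")
```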
Thirdly, establishing a Code Review mechanism is crucial. You may have heard of the “four-eye principle”—important code needs to be reviewed by at least two people. In the AI era, this becomes “human-machine four eyes”—AI-generated code needs human review, and complex human-written code can be assisted by AI for checking.
An engineering manager shared their experience: “Initially, our engineers were skeptical of AI-generated code and always over-reviewed it. But over time, they found that AI almost never made mistakes in certain types of code (such as standard CRUD operations, data transformations), while more attention was needed for complex business logic. This graded review strategy greatly improved our efficiency.”
Page 14: Key Initiative 4 – Equip Every Employee with an AI Assistant
Imagine if every employee had a dedicated AI assistant, available 24/7, understanding your work content and preferences, ready to provide support at any time—this is no longer a scene from a sci-fi movie but a reality that has already begun to be realized.
Equipping every employee with a dedicated AI assistant is one of the key initiatives in building an AI-native team. These AI assistants access various internal systems of the company through MCP (Model Context Protocol), becoming the bridge connecting human employees and digital systems.
In daily work, AI assistants can undertake a large number of repetitive tasks, such as:
- Meeting scheduling: Xiao Li’s AI assistant coordinates with Xiao Wang’s AI assistant to schedule meetings and avoid schedule conflicts.
- Travel booking and reimbursement: Like a real assistant, helping book flights and hotels, organizing invoices, and submitting reimbursement forms after the trip.
- Email sorting and replying: Handling routine emails, marking important emails that need attention, and deleting spam.
- Report generation: Automatically generating weekly, monthly, and other regular reports.
In meeting scenarios, the value of AI assistants is even more apparent. In traditional meetings, people often finish discussions only to find they forgot to take notes and don’t know what was said. With AI assistants, they can record meeting content in real-time, extract action items, and even provide relevant background information based on the company’s historical knowledge.
Previously, the team had to spend an hour organizing and distributing meeting notes after meetings; now, AI assistants can generate structured records before the meeting ends, including decision points, action items, and follow-up plans, and automatically update them to the agile project management system. This not only saves time but also reduces information loss and misunderstandings.
Even more exciting is the brainstorming mode between AI and humans. I found that the quality of ideas generated from one-on-one brainstorming with AI is higher than independent thinking.
This presentation content was generated after a 2-hour brainstorming session with my AI digital assistant, followed by 3 hours of collaborative editing with AI using Cursor for refinement.
We know there is a Feynman learning method, whose core idea is “to truly learn, you must be able to teach others.” Traditionally, we test our understanding depth by explaining knowledge to others. However, in the digital age, AI can be our ideal partner in this process. No need to consult experts, no need to wait for feedback, you can conduct Feynman-style learning anytime, anywhere.
When I brainstorm with AI, I find that voice discussion is the most natural way of communication. I explain concepts to AI, AI asks questions, pointing out gaps in my thinking. This interaction forces me to simplify complex concepts and clarify vague cognition, just as Feynman advocated. AI is not only a listener but also a collaborator who can ask challenging questions, helping me discover knowledge blind spots that I have not fully understood.
Better still, combining AI with a shared whiteboard adds a visual dimension to this process. When we talk to AI, it presents key supporting information, such as company knowledge base documents, on the whiteboard in real time. During the discussion, AI can also draw charts and organize frameworks on the fly, helping us grasp the structure and key points of the brainstorming.
After each conversation, AI can also organize the content into systematic knowledge base articles for future review and deepening.
Building AI Agents Like Digital Employees
Page 15: Building AI Agents Like Digital Employees – Technical Solutions
To make AI truly become digital employees, rather than just simple tools, we need to fundamentally change the way we build AI Agents. This is like shifting from making “smart tools” to cultivating “digital colleagues”—a completely different mindset.
My favorite analogy is: traditional AI Agents are like calculators—you input, it calculates, and gives results; while digital employee Agents are like junior accountants—you describe the need, it understands the context, actively acquires information, applies professional knowledge, and seeks help from you when necessary.
The starting point is to enhance Agents’ basic capabilities, especially multimodal human-computer interaction.
In the following pages, I will detail the six key technologies for building digital-employee Agents, which together transform AI from a simple tool into a true team collaborator. As an AI researcher said: “We are moving from ‘designing tools for humans’ to ‘designing new intelligent entities to collaborate with humans’—this is a fundamental paradigm shift.”
Page 16: Agent Technology 1 – More Natural Multimodal Human-Computer Interaction
When building AI Agents like digital employees, more natural multimodal interaction is one of the foundational technologies. Imagine how you communicate with colleagues—you talk, gesture, share pictures, draw sketches on a whiteboard—this rich multimodal communication is the foundation of efficient collaboration.
From a neuroscience perspective, human thinking is inherently multimodal. We simultaneously process multiple types of information such as language, vision, and hearing, which are integrated in the brain to form a unified understanding. Traditional pure text AI interaction greatly limits this natural communication flow, just like forcing you to collaborate with colleagues only through text messages, which is inefficient.
The importance of the voice modality cannot be overstated. Voice is the most natural and efficient way of communication for humans. We can speak 150-200 words per minute, while typing usually only allows 40-60 words. This 3-5 times efficiency difference means a huge productivity boost in daily work.
I once studied a case of a software team introducing a voice AI assistant. Initially, team members felt awkward talking to the computer, but soon, this interaction method was widely accepted for a simple reason: the efficiency improvement was too obvious. A developer shared: “Previously, describing a complex bug required typing several paragraphs, now it only takes 30 seconds to dictate, and AI can understand the problem and start analyzing possible causes.”
A specific technological advancement is streaming voice processing. Early systems needed to wait for the user to finish a complete sentence before starting processing, while modern systems adopt a “streaming” architecture, processing voice input in real-time, and can even start thinking about responses while the user is speaking. This greatly improves the naturalness of interaction and reduces perceived delay.
Another key innovation is the “fast thinking” and “slow thinking” dual-agent collaboration model, inspired by Daniel Kahneman’s cognitive theory. In human thinking, System 1 is fast, intuitive, and automatic, while System 2 is slow, deliberate, and effortful. In human communication, we often give immediate polite responses (“I understand your question,” “Let me think about it”) while deeply processing the information in our minds.
AI also needs a similar dual-system:
- Fast Thinking Agent: Responsible for real-time user interaction, keeping the conversation smooth.
- Slow Thinking Agent: Conducts in-depth research, verification, and reasoning in the background.
This architecture provides a revolutionary user experience. Users are no longer faced with a “thinking…” spinning icon but are engaged in a progressively unfolding conversation.
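A minimal asyncio sketch of this dual-agent pattern: the fast agent answers immediately while the slow agent researches in the background; `fast_reply` and `deep_research` are hypothetical stand-ins for real model calls.

```python
import asyncio

async def fast_reply(user_msg: str) -> str:
    # System 1: immediate, low-latency acknowledgement
    return "Good question - let me dig into that while we keep talking."

async def deep_research(user_msg: str) -> str:
    # System 2: long-running reasoning / tool use (simulated by a sleep)
    await asyncio.sleep(5)
    return "After checking the docs and the codebase, here is what I found..."

async def handle_turn(user_msg: str) -> None:
    research = asyncio.create_task(deep_research(user_msg))  # start the slow path
    print(await fast_reply(user_msg))   # the user gets an instant response
    print(await research)               # the deep answer surfaces when ready

asyncio.run(handle_turn("Why does login fail on Safari?"))
```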
Besides voice, visual interaction is also crucial. By sharing screens or images, users can intuitively present problems or ideas, and AI can respond with charts, images, or even simple sketches. This visual dialogue greatly enhances the efficiency of conveying complex information.
For example, in a product design meeting, a designer can verbally describe an interface concept, and AI generates an interface sketch in real-time; the designer points out parts that need modification, and AI adjusts immediately… This interaction method is several times more efficient than text descriptions and manual modifications.
Multimodal interaction not only improves efficiency but also creates a more natural and human-like collaboration experience. As one user researcher put it: “When we shift from ‘using tools’ to ‘collaborating with intelligent entities,’ the focus of interaction design also shifts from ‘functionality’ to ‘relationality’ — AI is no longer a tool we operate but a partner we communicate with.”
Page 17: Agent Technology 2 – Understand Requirements Before Acting
Any experienced professional knows that the primary prerequisite for successfully completing a task is fully understanding the requirements. There’s a classic project management adage: “An hour saved in the requirements phase can cost ten hours in the implementation phase.” This principle applies to AI as well.
Traditional AI agents often adopt a “simple prompt, immediate execution” model, akin to an overly eager but not meticulous new employee who jumps into work without truly understanding the requirements. The result is predictable — a lot of rework, and increased communication costs.
I once observed an interesting comparison: the same team used two different methods to collaborate with Claude in developing a data visualization feature.
Traditional Method:
“Please create an interactive dashboard for our user activity data using Vue.js and ECharts.”
Improved Method:
“We need to create a user activity dashboard. Before we start, I’d like to explain a few backgrounds:
- Our target users are product managers with limited technical backgrounds
- The metrics we care most about are monthly active users, next-day retention rate, and usage frequency
- Historical data shows a decline in activity during holidays
What other information do you need to help design this dashboard?”
In the second method, Claude first posed a series of clarifying questions: What are the data sources? What filtering dimensions need support? Is an export function needed? What is the expected update frequency? These questions helped the team think through previously overlooked key points.
The final results were vastly different: the dashboard produced by the traditional method was technically fine but did not meet actual needs and had to be redone; the improved method’s outcome directly met business needs with only minor adjustments required.
In a large enterprise AI implementation project, we adopted a “co-creation of requirements” approach, requiring AI to first generate a “work understanding document” after receiving a task, including:
- Goal Overview: A one-sentence summary of the task goal
- Context Understanding: Relevant background and constraints
- Clarifying Questions: Matters needing further clarification
- Preliminary Plan: Possible implementation paths
- Expected Outcomes: How success will be judged
This document must be confirmed or revised by humans before AI begins actual work.
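One way to enforce this gate in code is to model the document as a structured object that a human must approve before the agent starts work; a minimal sketch, with hypothetical field names mirroring the list above:

```python
from dataclasses import dataclass, field

@dataclass
class WorkUnderstanding:
    goal: str                                   # one-sentence task goal
    context: str                                # background and constraints
    questions: list[str] = field(default_factory=list)  # open clarifications
    plan: list[str] = field(default_factory=list)        # possible implementation path
    success_criteria: str = ""                  # how success will be judged
    approved_by: str | None = None              # set by the human reviewer

    def ready_to_start(self) -> bool:
        """Work may begin only after approval, with no open questions left."""
        return self.approved_by is not None and not self.questions
```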
Understanding requirements is not a one-time activity but a continuous process. An excellent AI agent can continuously adjust its understanding during work, proactively seeking clarification when new information or potential issues are discovered. This “progressive refinement of requirements” is similar to the iterative feedback loop in agile development.
A project manager shared his experience: “Rather than teaching AI how to execute tasks better, we’re teaching it how to understand tasks better. Once understanding is in place, execution is often not an issue. This is remarkably similar to cultivating human team members.”
This way of deeply understanding requirements not only improves the quality of task completion but also significantly reduces rework and communication costs. As management guru Peter Drucker put it: “Efficiency is doing things right; effectiveness is doing the right things.” For AI, this principle is equally crucial.
Page 18: Agent Technology 3 – Proactively Communicate When Problems Arise
In a real work environment, no employee can solve all problems entirely independently. Efficient team members know when, to whom, and how to seek help. An employee silently stuck in trouble is more concerning than one who proactively seeks help.
AI agents also need this proactive communication ability. Traditional AI, when encountering difficulties, either provides incorrect answers or simply says, “I cannot complete this task.” In contrast, excellent digital employees will clearly describe the problem, propose possible solutions, and seek necessary help.
Cross-departmental collaboration is an essential skill in the modern workplace. When AI identifies that a problem involves another module’s area, it automatically searches the company’s internal directory and contacts the agent or human employee responsible for that module to ask questions.
The mechanism for seeking help from superiors is equally important. Excellent employees know their capability boundaries, and digital employees should too. AI agents can adopt a “threshold escalation” protocol:
- Escalate when multiple attempts to solve fail
- Escalate when tasks involve critical safety or compliance issues
- Escalate when operations beyond current authorization are needed
This mechanism prevents AI from blindly attempting tasks beyond its capabilities and, like experienced team members, knows when to seek help.
Transparent communication records are crucial for organizational learning. Our system automatically records all problem-solving processes, including encountered obstacles, attempted methods, and final solutions. These records are not only used for auditing and accountability but also become valuable assets in the organizational knowledge base, helping identify common problem patterns and improvement opportunities.
Proactive communication ability transforms AI from a mere execution tool into a true team collaborator, creating a new human-machine collaboration model.
Page 19: Agent Technology 4 – Checkpoints, Self-Reflection, and Rollback
When solving complex problems, humans constantly reflect on their progress, adjusting direction or even restarting if necessary. This self-correction ability is one of the key technologies for AI to become a reliable digital employee.
Have you ever encountered a situation where an AI assistant accidentally deleted a critical function while helping you modify code, or mistakenly removed important documents while organizing files? These “digital disasters” highlight the dangers of AI systems lacking environmental safety awareness. Without proper checkpoints and rollback mechanisms, AI can cause irreversible damage to the work environment.
I once witnessed an AI system causing severe consequences without checkpoints and rollback mechanisms: it mistakenly identified certain modules as “redundant code” while attempting to optimize a codebase and deleted them, causing the entire system to crash. Without saving the environment state, the team had to spend days rebuilding the lost code. If AI created environment checkpoints before each major operation, this disaster could have been avoided.
From a system security perspective, this involves “environment state management” — the ability to monitor and protect changes in the work environment.
Technically, we developed a three-layer checkpoint and rollback architecture:
First, environment checkpoints are created automatically before AI executes any operation that may change the environment. These checkpoints not only save file states but also record the entire environment configuration. Checkpoint creation is triggered based on the operation’s risk level, for example:
- Before batch file modifications
- Before database structure changes
- Before system configuration adjustments
- Before key stages of code refactoring
Second, an operation impact assessment mechanism initiates a risk analysis process before each action. This is a structured safety framework:
- Change Scope Assessment: Which system components will this operation affect?
- Risk Level Determination: What is the worst possible outcome of the operation?
- Reversibility Analysis: How difficult is recovery if something goes wrong?
- Alternative Consideration: Is there a lower-risk alternative method?
Technically, this is usually implemented through a “safety supervision agent” that runs parallel to the main working agent, specifically responsible for assessing operation risks and supervising environment changes. The safety agent has the authority to pause dangerous operations, require confirmation, or even force the creation of additional checkpoints.
For example, in our code automation system, the main agent is responsible for writing and modifying code, while the safety supervision agent evaluates the impact scope of each change, ensuring that a complete codebase snapshot is created before large-scale deletions or refactoring, and verifying whether the system can still compile and run normally after changes.
Third, the environment rollback mechanism ensures quick recovery to a safe state when operations fail. When the system detects anomalies or errors, it automatically triggers the rollback process, restoring the environment to the most recent stable checkpoint. This “safety net” allows AI to experiment boldly while keeping risks within acceptable limits.
For example, in web backend development, before executing database table structure changes or data migrations, a complete database backup needs to be created first. If data migration causes application errors, the system can roll back to the pre-change state with one click.
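For a code workspace, a minimal sketch of this checkpoint-and-rollback pattern can be built on git snapshots (the function names are hypothetical; a real system would also snapshot databases and configuration, as in the migration example above):

```python
import subprocess

def checkpoint(label: str) -> str:
    """Snapshot the working tree and return a restorable commit hash."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "--allow-empty", "-m",
                    f"checkpoint: {label}"], check=True)
    result = subprocess.run(["git", "rev-parse", "HEAD"],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()

def rollback(sha: str) -> None:
    """Restore the workspace to a previous checkpoint."""
    subprocess.run(["git", "reset", "--hard", sha], check=True)

def risky_operation(run, label: str):
    sha = checkpoint(label)  # snapshot before the risky change
    try:
        return run()
    except Exception:
        rollback(sha)        # automatic recovery to the last safe state
        raise
```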
This checkpoint, assessment, and rollback work mode enables AI to safely operate in complex environments, no longer a potential “digital destroyer” but a reliable assistant with environmental safety awareness.
Page 20: Agent Technology 5 – Long-Term Memory and Memory Compression
Imagine an employee who forgets what they learned yesterday every day — no matter how smart, it would be difficult to handle complex work. This is the fundamental limitation traditional AI faces: each conversation feels like starting from scratch, lacking continuity and cumulative learning ability.
There’s an interesting story: we once tried to have AI assist with code reviews. Initially, it worked well, but as the project progressed, AI began to raise issues that had already been resolved, forgot previously discussed design decisions, and even repeated the same suggestions. An engineer commented helplessly: “It’s like working with a genius with severe memory impairment — having to explain everything from scratch every day.”
Long-term memory and memory compression technology are key to solving this pain point. From a technical perspective, this involves three core challenges:
First, memory persistence requires building a complex external storage system. This is not simply a matter of saving conversation history; it means creating a multi-level memory architecture:
- Short-term working memory: Immediate context of the current task (usually within the LLM context window).
- Medium-term situational memory: Important information from the recent past (stored as compressed summaries).
- Long-term semantic memory: Persistent knowledge, rules, and experiences (structured storage and on-demand retrieval).
This layered architecture is inspired by the human memory system, which also has structures similar to working memory, episodic memory, and semantic memory.
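A minimal sketch of such a three-tier structure (the names, sizes, and summarization hook are illustrative, not our production design):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """A minimal three-tier memory sketch."""
    # Short-term: raw recent messages, bounded like a context window
    working: deque = field(default_factory=lambda: deque(maxlen=50))
    # Medium-term: compressed summaries of older conversation segments
    episodic: list[str] = field(default_factory=list)
    # Long-term: persistent facts and rules, retrieved on demand by key
    semantic: dict[str, str] = field(default_factory=dict)

    def remember(self, message: str) -> None:
        if len(self.working) == self.working.maxlen:
            # The oldest context is about to fall out of the window;
            # a real system would have an LLM summarize it first.
            self.episodic.append(f"summary of: {self.working[0]}")
        self.working.append(message)
```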
Second, memory compression technology is key to achieving long-term memory. The human brain does not remember every detail but extracts core concepts and key experiences. Similarly, AI needs to compress detailed interaction records into core insights.
The memory compression pipeline we implemented includes the following stages (a sketch follows the list):
- Key Information Extraction: Identifying key facts that need to be retained long-term, such as names, addresses, occupations, interests, and other basic information.
- Progressive Summarization: Compressing long conversations into summaries while retaining key points.
- Knowledge Distillation: Extracting general principles and patterns from specific cases.
- Redundancy Elimination: Identifying and merging duplicate or highly similar information, resolving conflicts in information from different times or sources.
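As a sketch, assuming a hypothetical `llm` helper that sends a prompt and returns the model’s text reply, the first three stages might be wired up like this (redundancy elimination would run as a deduplication pass over the stored facts):

```python
def compress_memory(messages: list[str], llm) -> dict:
    """Minimal compression pass; `llm` is a hypothetical callable
    that sends a prompt to a language model and returns its reply."""
    transcript = "\n".join(messages)
    return {
        # Key-information extraction: durable facts worth keeping
        "facts": llm("Extract names, addresses, occupations, interests, "
                     "and other facts to retain long-term:\n" + transcript),
        # Progressive summarization: shrink the dialogue, keep key points
        "summary": llm("Summarize this conversation in under 200 words, "
                       "preserving decisions and open questions:\n" + transcript),
        # Knowledge distillation: general lessons from the specific case
        "lessons": llm("What general principles or patterns does this "
                       "conversation suggest?\n" + transcript),
    }
```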
This compression not only saves storage space but, more importantly, improves the efficiency and accuracy of memory retrieval. The system can find the most relevant historical information in milliseconds rather than struggling to search through massive amounts of raw conversations.
Finally, intelligent memory retrieval is crucial. Imagine how your brain automatically associates relevant experiences—when someone mentions “Paris,” you don’t recall all information about Paris but extract the most relevant memories based on the current discussion context (travel, food, or art). We will discuss knowledge base search on the next page.
Page 21: Agent Technology 6 – High-Precision Internal Knowledge Base Search
Effective access to internal enterprise knowledge is one of the key capabilities for AI to become a digital employee. However, traditional knowledge retrieval methods often fail to meet AI’s needs.
There’s a true story I particularly like: A large insurance company tried to have AI answer questions about its complex policies. They invested millions of dollars to build a massive vector database containing all policy documents and internal regulations. The result? Disappointing. When asked specific questions, the AI either couldn’t find the answer, returned irrelevant information, or mixed multiple policy terms to give incorrect answers. Ironically, interns could often find the correct answers faster with a simple Ctrl+F search.
This story reveals a core issue: RAG (Retrieval-Augmented Generation) is not equivalent to a simple vector database. While vector search is powerful, relying solely on vector similarity matching often results in “plausible but incorrect” outcomes—finding semantically similar content that may not be the truly relevant answer.
If you don’t believe it, try a site search using only a vector database, then search Google with “keyword site:example.com”, and compare which results are more accurate.
The problem of inaccurate search results is particularly severe for AI because, unlike humans with rich experience who can fill in information gaps, AI heavily relies on accurate information retrieval.
To address this challenge, we need to combine the advantages of semantic search and keyword search to create a “hybrid retrieval system.” Search engines typically adopt a “vector retrieval + keyword retrieval + re-ranking” architecture (see the sketch after this list):
- Vector Search: Captures concept-level matches based on semantic similarity.
- Keyword Search: Uses BM25 or similar algorithms to ensure precise matching of key terms.
- Re-ranking: Uses a re-ranking model to score the relevance of candidates from both retrieval channels against the user’s query and surface the most relevant results.
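Here is a minimal sketch of this three-channel architecture using the rank-bm25 and sentence-transformers libraries; the specific model checkpoints are illustrative choices, not recommendations:

```python
# pip install rank-bm25 sentence-transformers
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder, util

def hybrid_search(query: str, docs: list[str], top_k: int = 5) -> list[str]:
    # Keyword channel: BM25 over whitespace-tokenized chunks
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    kw_scores = bm25.get_scores(query.lower().split())
    kw_top = sorted(range(len(docs)), key=lambda i: -kw_scores[i])[:20]

    # Vector channel: dense embeddings (model choice is illustrative)
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = encoder.encode(docs, convert_to_tensor=True)
    q_emb = encoder.encode(query, convert_to_tensor=True)
    vec_top = util.semantic_search(q_emb, doc_emb, top_k=20)[0]

    # Merge both candidate sets, then re-rank with a cross-encoder
    candidates = list({*kw_top} | {hit["corpus_id"] for hit in vec_top})
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, docs[i]) for i in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
    return [docs[i] for i, _ in ranked[:top_k]]
```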
In fact, this is standard technology in search engines, but many teams building RAG systems lack experience in traditional search, so their retrieval accuracy falls far short of professional search engines.
However, a general search engine for humans and a knowledge base search for AI are still different. General search engines return whole web pages for humans to skim, whereas AI models have limited context windows, so long documents must be split into smaller chunks.
When building vector indexes, the document splitting strategy is crucial. Chunks that are too large dilute relevance with extraneous information, while chunks that are too small lose context. A better approach is semantic-aware splitting, which does not simply divide by word count but identifies natural semantic boundaries, ensuring each chunk contains a complete and coherent unit of information (a minimal sketch follows).
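Here is a minimal sketch of boundary-aware splitting, using blank lines as a stand-in for true semantic boundaries; a production system would use token counts and heading structure instead of character counts:

```python
import re

def semantic_chunks(text: str, max_chars: int = 1500) -> list[str]:
    """Split on natural boundaries (blank lines between paragraphs)
    and pack paragraphs into chunks up to max_chars, rather than
    cutting at a fixed word count."""
    paragraphs = re.split(r"\n\s*\n", text)
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
```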
More critical is the application of re-ranking technology. The initial retrieval returns a candidate set that usually contains a lot of noise, requiring more refined relevance judgments. This includes using models like BGE-M3 that support candidate set re-ranking and considering multi-dimensional factors such as text freshness and source authority for multi-factor scoring.
By collecting feedback from Agents and real users, re-ranking models can continuously learn and improve search performance. Search quality feedback signals include (a logging sketch follows the list):
- Explicit Feedback: Users directly mark result relevance.
- Implicit Feedback: Monitoring which results are adopted or ignored.
- Error Analysis: Regularly reviewing failure cases and identifying patterns.
- A/B Testing: Simultaneously testing multiple retrieval strategies and comparing results.
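As a sketch, collecting these signals can be as simple as appending events to a log that later becomes re-ranker training data; the file name and schema here are illustrative:

```python
import json
import time

def log_feedback(query: str, doc_id: str, signal: str,
                 path: str = "feedback.jsonl") -> None:
    """Append one feedback event (an explicit rating, or an implicit
    adopted/ignored signal) for later re-ranker fine-tuning."""
    event = {"ts": time.time(), "query": query,
             "doc_id": doc_id, "signal": signal}
    with open(path, "a") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")

# e.g. log_feedback("refund policy for plan B", "doc_4211", "adopted")
```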
High-precision internal knowledge base search not only improves AI’s work efficiency but is also the foundation of its decision quality. Only when AI can accurately access enterprise knowledge can it make judgments that align with enterprise policies and practices, truly becoming a reliable digital employee.
Practical Cases of AI Digital Employees
Page 22: Case 1: AI Programmer – From IDE Assistance to Autonomous Development
The progress of AI in the programming field is revolutionary. Just two years ago, we were amazed by GitHub Copilot’s ability to generate a simple function based on comments; today, we are witnessing fully autonomous AI systems writing complete applications.
From a technical perspective, AI programming tools have undergone several leaps since 2023:
- From Completion to Generation: Expanding from predicting the next line of code (Tab completion) to generating complete functions, classes, and even entire modules. This was basically achieved starting with the release of Claude 3.5 Sonnet in mid-last year. New-generation AI programming tools represented by Cursor are becoming increasingly popular.
- From Modifying Isolated Files to System-Level Understanding of Code Repositories: Understanding the structure, dependencies, and design patterns of the entire codebase, and finding the right code to modify based on the requirement, without users having to point out which file needs changing. This was basically achieved starting with the model released on October 22 last year (the updated Claude 3.5 Sonnet, informally called Claude 3.6 Sonnet). It gave rise to Agent-mode AI programming tools like Windsurf, and Cursor also launched its Composer Agent mode, which has become the most important application scenario for AI Agents to date.
- From Pure Code Generation to Completing the Entire Development and Testing Process: Covering design, documentation, coding, and testing, independently completing the entire development and testing process for simple requirements. This was basically achieved starting with the release of Claude 3.7 Sonnet. Professional development Agents like Claude Code/Devin/OpenHands have demonstrated astonishing autonomy.
However, it is worth noting that the prerequisites for fully automated development are not just advanced models but also good software engineering practices:
First, code must have good documentation and comments. In my experiments, giving AI the complete vLLM open-source codebase with its README and other documentation, versus the same codebase with the documentation removed, produced astonishingly different results.
Secondly, complete test coverage and CI/CD processes are crucial. AI cannot judge code correctness through intuition or experience like humans; it needs objective verification mechanisms.
Thirdly, clear requirement descriptions and acceptance criteria are indispensable. In a real project comparison, we found that when given structured requirement documents, AI needed significantly fewer interaction rounds to complete tasks, and code quality improved markedly. As one engineer commented: “AI is like a super-smart junior developer who lacks domain knowledge: give it clear directions, and it will speed ahead; give it vague instructions, and it will get lost just as fast.”
With these conditions in place, AI programmers can significantly enhance development efficiency. According to our actual data using Claude Code, about 50% of simple development requirements can be fully automated without human intervention.
For the remaining 50% of development requirements, with the help of AI IDEs like Cursor, efficiency can also be doubled. This includes complex business logic, code refactoring, performance optimization, and security design tasks that require more human judgment.
Therefore, overall, AI can increase a programmer’s development efficiency roughly fourfold: if half of all tasks need essentially no human time and the other half take half the usual time, 100 units of work shrink to about 25 units of human effort. Counting only personal coding time, excluding meetings and communication, one person’s output is now equivalent to four people’s before.
Page 23: Case 1: AI Programmer – The Future Role of Software Engineers
As AI programmers’ capabilities improve, an urgent question arises: Will software engineers be replaced by AI? The answer is no, but their roles will undergo profound changes, just as the invention of the calculator did not eliminate mathematicians but freed them to focus on more advanced mathematical problems.
My friend Li Ming is a senior engineer with 15 years of experience, and last year he began working closely with AI. Six months later, his way of working completely changed. “I used to spend 80% of my time writing code and 20% on design and planning; now this ratio is completely reversed—I focus on architecture design, requirement analysis, and code review, while leaving most of the coding work to AI. The most surprising thing is that my output has tripled, but my work pressure has decreased.”
This transformation represents the future direction of software engineers’ roles: from mere code writers to composite roles of architects, product managers, and project managers.
As architects, human engineers will focus on system architecture design and problem decomposition. These high-level design decisions require a deep understanding of the business domain, a long-term vision for technology choices, and systems thinking ability, which current AI finds difficult to master. I like to use this analogy: AI can become an excellent bricklayer, but designing the blueprint for an entire building still requires human architects.
As product managers, the ability to define and validate requirements will become more critical. When most of the code implementation can be handed over to AI, defining “what to build” becomes more important than “how to build.” Software engineers need to understand user needs more deeply, define functional specifications more precisely, and rigorously verify whether the final product meets business goals.
I’ve observed an interesting phenomenon: those programmers who used to complain that “product managers don’t understand technology” are now becoming junior product managers themselves, learning user research and requirement analysis skills. As one engineer half-jokingly said: “I used to mock product managers for vague requirements, now I find that precisely defining requirements is harder than writing code!”
As project managers, every engineer will manage several AI “subordinates,” making coordination and communication skills a core competency. This includes breaking down complex tasks into sub-tasks that AI can understand, reviewing AI’s work results, and providing guidance and corrections when necessary.
This transformation is particularly beneficial for independent developers. In the past, it was difficult for one person to complete the development of a complete product, but now, a full-stack engineer working with AI can complete the workload of a small team. I know an independent developer who, in the past six months, developed three complete SaaS products and successfully brought them to market with just himself and a few specially trained AI assistants—something that was almost impossible before.
As Sam Altman predicted, a “one-person billion-dollar company” might become a reality, as the significant reduction in technological barriers allows entrepreneurs to validate and realize innovative ideas more quickly. This will create an unprecedented golden age for individual entrepreneurs.
From a broader perspective, the cost of digital transformation for enterprises will drop significantly. One small-to-medium logistics company had abandoned its digitalization plan because custom software would cost about $500,000; with AI-assisted development, a project of similar scale needs a budget of only $100,000, making digital transformation feasible. As more small and medium-sized enterprises can afford customized software, the digitalization of entire industries will accelerate greatly.
Scattered paper documents and Excel spreadsheets can also be digitized and structured at a low cost. For example, extracting structured knowledge from a large number of paper documents, private chat records, and meeting recordings.
The future of software engineering is not about humans being replaced by AI, but about humans and AI forming a complement, jointly creating a more efficient and innovative software development paradigm. AI will not eliminate software engineers, but software engineers who do not use AI may be replaced by those who make good use of AI.
Page 24: Case 2: AI Operations – Automated Data Collection
Data is the lifeline of modern enterprises, and data collection often consumes a lot of human resources. Among the many fields of AI applications, data collection automation is one of the cases with the highest input-output ratio.
Traditional crawler development faces an obvious pain point: each website has a different structure, requiring custom development of parsing rules for each target website. I have seen a “crawler graveyard” — a code repository containing hundreds of invalid crawler scripts because small changes to the target website can cause the crawlers to fail. The cost of maintaining these crawlers is high, sometimes even exceeding the value of the data itself.
LLM (Large Language Models) and VLM (Visual Language Models) have brought revolutionary changes. These AI models can “understand” web content like humans, identify key information, and work effectively even when faced with new webpage layouts.
From a cost-effectiveness perspective, the advantages are even more obvious. Each LLM/VLM call costs about $0.001, far less than manual data collection. For a typical e-commerce product page, AI extracts the key data for about 0.5-2 cents, while manual extraction costs 10-50 cents: a 10-25x cost difference!
A particularly interesting case is the system we developed for a financial analysis company. They needed to extract specific financial indicators from the financial reports of thousands of listed companies. The traditional method required financial analysts to read PDF reports and manually extract data, taking 20-30 minutes per report. The AI system we built can extract data not only from standard format financial reports but also from non-standard layouts, data in charts, and even implicit financial information in text descriptions. The system’s accuracy reached 92%, reducing the data extraction time from half an hour per report to 30 seconds.
For scenarios requiring large-scale collection, we use a “teacher-student” architecture: the AI “teacher” first analyzes a small number of web pages and generates structured data extraction rules; then, these rules are converted into traditional crawler code, executed by more efficient “students.” This method combines the intelligence of AI with the efficiency of traditional crawlers, suitable for large-scale data collection.
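Here is a minimal sketch of this teacher-student split, assuming a hypothetical `llm` helper for the teacher phase and BeautifulSoup for the student phase:

```python
# pip install beautifulsoup4
import json
from bs4 import BeautifulSoup

def teacher(sample_html: str, llm) -> dict[str, str]:
    """'Teacher' phase: a hypothetical `llm` callable reads a sample
    page and returns field-name -> CSS-selector extraction rules."""
    reply = llm("Given this product page HTML, return a JSON object "
                "mapping field names (e.g. title, price) to CSS "
                "selectors:\n" + sample_html[:20000])
    return json.loads(reply)

def student(html: str, rules: dict[str, str]) -> dict[str, str]:
    """'Student' phase: cheap, fast extraction with the learned rules,
    with no model call needed per page."""
    soup = BeautifulSoup(html, "html.parser")
    data = {}
    for field, selector in rules.items():
        node = soup.select_one(selector)
        data[field] = node.get_text(strip=True) if node else ""
    return data
```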
Page 25: Case 2: AI Operations – Automated Social Media Account Management
Social media has become a key channel for brand building and user interaction, but effectively managing social media accounts is a time-consuming and labor-intensive task. Imagine what it takes to manage a company’s Twitter, LinkedIn, Instagram, and TikTok accounts: understanding the characteristics of each platform, creating targeted content, choosing the best posting times, responding to user comments, analyzing data… This usually requires a professional team.
But AI is revolutionizing this field. In account management and content publishing, AI Agents demonstrate amazing efficiency. This efficiency comes from several core capabilities:
First is scalable content creation. A traditional team can create 1-2 high-quality pieces of content per platform each day, while an AI system can easily generate dozens of pieces optimized for different platforms. For example, based on a new product release article, AI can automatically generate (see the sketch after this list):
- Concise and powerful product highlights for Twitter
- In-depth analysis of product value for LinkedIn
- Visually appealing feature display copy for Instagram
- Detailed discussion posts for specific technical communities on Reddit
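As a sketch, this one-to-many repurposing step can be driven by per-platform style prompts; the `llm` helper and the style guides below are illustrative, not a production prompt set:

```python
# Hypothetical per-platform style guides for content repurposing.
PLATFORM_STYLES = {
    "twitter":   "a punchy post under 280 characters highlighting the product",
    "linkedin":  "a professional analysis of the business value, ~3 paragraphs",
    "instagram": "a visually-oriented caption with a clear call to action",
    "reddit":    "a detailed technical discussion post for the relevant subreddit",
}

def repurpose(article: str, llm) -> dict[str, str]:
    """Turn one source article into platform-specific variants;
    `llm` is a hypothetical prompt-in, text-out callable."""
    return {platform: llm(f"Rewrite the following announcement as {style}:\n{article}")
            for platform, style in PLATFORM_STYLES.items()}
```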
Second is intelligent publishing management. AI not only generates content but also determines the best posting times and frequencies from extensive data analysis, automatically adjusting the publishing schedule so that content types are precisely matched with their best time slots, significantly improving content performance. A marketing director commented: “It’s like having a super assistant who understands both content creation and data analysis.”
In terms of user interaction and community management, AI’s value is equally significant. Traditionally, responding to comments is the most time-consuming part of social media management and the easiest to overlook. AI can not only automatically respond to common questions but also intelligently distinguish queries that require human expertise. This mechanism allows the team to handle hundreds of comments while ensuring that every important feedback receives appropriate attention. The system can even identify emotionally charged customers and adjust the tone and solutions of responses.
A more advanced application is AI actively participating in community hot topic discussions. For example, when major news or hot discussions occur in the industry, AI automatically identifies these trends, generates relevant opinions, and participates in conversations at the right time, increasing brand exposure at critical moments.
These cases demonstrate how AI can play a role in operations: not replacing human creativity, but amplifying and extending it, allowing brands to maintain a consistent, professional, and resonant image in an increasingly complex social media environment.
Page 26: Conclusion – Embracing the Era of AI Employees, Building AI-Native Teams
Today we delved into the new paradigm of AI Agents: the transition from passive tools to digital employees. As we review the entire presentation, the core message is very clear — AI is no longer just a tool we use, but is about to become a member of our team.
Imagine how we viewed smartphones ten years ago. At that time, they were seen as “advanced phones” — a communication tool with just a few new features. Today, smartphones have completely changed our lives and work. AI is undergoing a similar transformation — from an “advanced computing tool” to a “digital colleague.”
Building AI-native teams requires starting from both communication culture and technical foundation. In terms of communication culture, we need to establish transparent communication habits similar to open-source communities. I remember the shock of my first participation in Linux kernel development — all discussions were conducted openly on mailing lists, every decision had clear records, and new members could read historical threads to understand the design rationale. This culture enables global developers to collaborate efficiently and allows AI to quickly understand project backgrounds and decision logic.
In terms of technical foundation, AI-friendly internal tool interfaces and isolated sandbox testing environments are essential infrastructure.
Making AI a digital employee rather than a simple tool hinges on six core technologies:
- Multimodal human-computer interaction makes communication more natural and efficient
- “Clarifying requirements before taking action” ensures the correct work direction
- Proactive communication when encountering problems enhances collaboration ability
- Self-reflection and retrospection mechanisms improve complex problem-solving ability
- Long-term memory allows AI to accumulate experience
- High-precision knowledge base search ensures decisions are based on accurate information
Under this new paradigm, humans and AI will form a complementary collaborative relationship. Some worry that AI will replace human jobs, but I see a different picture — AI is changing the nature of work, not eliminating it.
Take development teams as an example, AI programmers handle code implementation and routine maintenance, while human engineers focus on architecture design, innovative thinking, and team coordination. This is not a zero-sum game, but a co-evolution.
AI, as an assistant, can handle a large number of trivial tasks in our work. This allows humans to focus on creative, strategic, and emotional work, spending most of their time thinking and discussing rather than dealing with tedious details. This is not to belittle the importance of detailed work, but to recognize the unique advantage of humans in creative thinking.
This is not a distant future vision, but a transformation that has already begun. From using Claude Code + Cursor to increase programmer development efficiency by 4 times to AI systems managing dozens of social media accounts, we have already seen practical cases proving the feasibility and great value of this new work model.
Transformations are initially seen as impossible, then as impractical, then as interesting but unnecessary, and finally suddenly become inevitable. I believe the era of AI employees has moved from the “interesting but unnecessary” stage to the “inevitable” stage.
Page 27: Meta Presentation – The Creation Process of This Presentation
Ladies and gentlemen, before concluding today’s sharing, I want to reveal an interesting fact: the presentation you just heard is itself the result of collaboration between me and an AI digital assistant!
The content of this presentation was generated through a 2-hour brainstorming session with an AI digital assistant, followed by 3 hours of collaborative refinement with AI in Cursor; the Markdown content was then exported to PPT format, and this 25,000-word speech text was generated automatically. I completed in 5 hours what previously took twenty to thirty hours.
This creation process liberated me from the tedious details of writing, allowing me to focus on thinking about the core value and innovative ideas of the presentation. Collaborating with AI in creation is like having a tireless thought partner that not only captures your ideas but also helps you discover new possibilities.
This is precisely the epitome of the AI-native team we discussed today — humans focus on creativity and judgment, AI handles execution and optimization, both complementing each other’s strengths to create results that surpass individual capabilities.
Thank you all for listening! I look forward to exploring the beautiful future of AI and human collaboration with you.