Will Manus Initiate the Year of the Agent? - NetEase Technology Live
Reposted from NetEase Technology Public Account
Original Title: “Will Manus Initiate the Year of the Agent? A Conversation with Two AI Entrepreneurs Who Left Big Companies”
Produced by | NetEase Technology Attitude Column
Author | Yuan Ning
Editor | Ding Guangsheng
Like a boulder thrown into a lake, the splash from Manus’s release has gradually subsided, but the ripples continue to spread.
Will Manus initiate the year of the Agent? How should we understand Agents and their barriers? Is now the right opportunity for the development of Agents? How are different players preparing for the wave of Agents? Can current Agents replace interns…
On March 8, NetEase Technology invited two guests who left big companies and are now on the front lines of AI entrepreneurship—Li Bojie and Peng Kangwei—to share their insights and thoughts.
Li Bojie, a former “genius youth” at Huawei, served as the deputy chief expert at Huawei’s Computer Network and Protocol Laboratory and is a recipient of the Microsoft Scholar Award. In 2023, he ventured into AI entrepreneurship and is currently the Chief Scientist at PINE AI, dedicated to building a general intelligent assistant like Samantha from “Her” for everyone and every organization.
Peng Kangwei, who once developed a C-end product with over 100 million monthly active users at Tencent, left to start his own business in 2023 and founded Dream Horse Intelligence, which is working on a new generation of AI content platforms.
As entrepreneurs riding the AI wave, how do they find direction amidst the giant waves? What kind of future for Agents can be seen through their perspective? NetEase Technology has compiled their answers to ten key questions.
The following content has been edited by NetEase Technology without changing the original intent:
How to Understand Agents?
Li Bojie: The concept of Agents is actually very old. As early as the 1960s, when artificial intelligence first appeared, the concept of Agents was already there. The essence of an Agent is actually very simple: it is an intermediary or agent. An Agent can perceive the world, plan based on perception, and then take action.
In traditional ChatBots like ChatGPT, a user inputs a question, and the system provides an answer, ending the Q&A. But Agents are different. When you assign a task to an Agent, it not only answers but also actively collects various materials, utilizes various resources around it, and ultimately helps you get things done. This is the core difference between ChatBots and Agents.
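The chatbot-vs-agent distinction described above can be sketched as a loop: a chatbot is a single model call, while an agent repeatedly plans, acts with tools, and observes until the task is done. The sketch below is a toy illustration; every name in it (`llm_plan`, the stub tools, the task string) is hypothetical and not any real product's API.

```python
# Toy illustration of chatbot vs. agent. The "LLM" and "tools" here are
# stubs; a real agent would call a model and a browser, search engine, etc.

def chatbot(question):
    # One round trip: answer and stop -- no tools, no follow-up actions.
    return f"answer to '{question}'"

def llm_plan(task, observations):
    """Stand-in for the model's planning step: choose the next action."""
    if not observations:
        return ("search", task)            # first, gather material
    if len(observations) < 2:
        return ("read", observations[-1])  # then dig into a result
    return ("finish", None)                # enough material: deliver

TOOLS = {
    "search": lambda query: f"results for '{query}'",
    "read":   lambda doc: f"summary of: {doc}",
}

def run_agent(task, max_steps=10):
    """Perceive -> plan -> act loop: keep acting until the plan says finish."""
    observations = []
    for _ in range(max_steps):
        action, arg = llm_plan(task, observations)
        if action == "finish":
            return f"report on '{task}' built from {len(observations)} observations"
        observations.append(TOOLS[action](arg))
    return "step budget exhausted"

print(run_agent("plan a 7-day trip to Japan"))
```

The difference is structural: the chatbot returns after one call, while the agent keeps a growing list of observations and decides at each step whether to act again or deliver.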
Bill Gates actually valued this direction very early on. When I worked at Microsoft, there was a small paperclip assistant (Clippy) in Office 2003 that would pop up and ask if you needed help. This was actually an early exploration of Agents. But at that time, AI technology was immature, and it was eventually removed in Office 2007.
Later assistants like Cortana and Siri can also be seen as early Agents, capable of helping users complete small tasks. As development progressed, today’s Agents (like Manus) can really help users complete long-term, complex tasks, even lasting half an hour to an hour.
Peng Kangwei: In my view, previous large language models (LLMs) were more like consulting experts, capable of answering difficult questions. But Agents are more like high-IQ interns, able to help us complete specific tasks.
The magic of Agents lies in their three capabilities: “brain, hands, and delivery.” LLMs provide cognitive and thinking abilities, equivalent to the “brain” of an Agent; the “action” ability of an Agent is reflected in its ability to call browsers, search for information, execute commands, etc., equivalent to its “hands”; and the “delivery” ability is reflected in its ability to ultimately provide complete results, whether in the form of documents or web displays.
In Manus’s interface, you can clearly see its work process: on the left is its browsing and operation record, and on the right is its thought process. Ultimately, it can present results in a complete form, such as a document or web display. For actual business, this capability is of very high value.
Why Did Manus Go Viral?
Li Bojie: I think Manus went viral mainly for two reasons. On one hand, media coverage gave it an unusually wide reach.
On the other hand, I think Manus’s design is also very clever. By showing the computer operation process, it makes it easier for ordinary users to understand the working principle of Agents. Many products used to hide the intermediate operation process and directly present the final result. But Manus is different; it directly uses screen animations to step-by-step show the operation process, such as how to browse the web and execute tasks.
Although this “visual operation” method may not be as efficient as directly accessing the web, the display effect is more intuitive, making it easier for users to understand the actual capabilities of Agents.
Peng Kangwei: To add, I think Manus went viral because it was the first product to propose the concept of a “general AI Agent.”
Previously, there were many excellent programming tools that could help developers write code, but Manus extended the capabilities of Agents to a broader field, allowing ordinary people to use it to complete various tasks. This is its first highlight.
The second highlight is the significant improvement in Manus’s engineering capabilities. Take the task of price comparison as an example: in an e-commerce scenario, finding the optimal price requires searching a large amount of information, involving different supply chains and cost structures. Manus can integrate this complex information, conduct multi-level searches and analyses, and ultimately provide the best solution.
This cross-scenario, cross-field general capability is the biggest difference between Manus and early Agent products.
What Are the Barriers to Agents?
Peng Kangwei: From a technical perspective, I think the essence of Agents can actually be likened to the “hands” and “brain” mentioned earlier. Large models are equivalent to the “brain,” while the “hands” are the ability to call various web pages and tools. Technically speaking, reproducing this interactive ability is not difficult.
But the real barrier lies in the engineering deployment and optimization for each specific scenario. For example, in specific scenarios like interviews, e-commerce, and programming, the front-end and back-end joint deployment and code integration require a lot of engineering work. This customized implementation in different scenarios is the key to building barriers.
Li Bojie: I completely agree with Kangwei’s view. The core challenge of Agents is not in calling large models or open interfaces, but in accumulating engineering experience in professional fields.
For general tasks, such as planning a 7-day trip to Japan, Agents may perform well. But in professional fields, such as healthcare, Agents need to have professional knowledge and domain accumulation, and this knowledge may not be in public corpora. Therefore, relying solely on general model datasets may not meet professional needs, and specific fine-tuning and post-training are still required.
Moreover, replicating what others have already done is a task with a low barrier to entry but a high ceiling. For example, when I used Manus to produce research reports, I found that OpenAI's model often outperformed Manus, likely because OpenAI used its own unpublished models, specifically optimized for research scenarios. The Manus team currently cannot build a proprietary model at that scale, so they can only fine-tune open-source models.
Even with open-source alternatives such as Owl, the results still fall short of Manus due to insufficient engineering optimization.
How Will Manus Profit in the Future? Development Direction?
Peng Kangwei: Based on the Manus team's previous experience with Monica, they have accumulated a large number of users and usage scenarios in the browser and plugin fields, with a deep understanding of user needs. Therefore, creating an "entry-point" product might be a reasonable direction.
"Entry-point" means a platform like TikTok or WeChat that attracts a large number of users and builds an ecosystem. For example, creating an Agent platform similar to GitHub Copilot, where users can upload and share Agents for professional fields such as e-commerce, education, and law. This not only meets specific needs but also forms a virtuous cycle through the power of users and the open-source community, improving the entire ecosystem.
Even if Manus cannot become such an entry point, the current market heat will drive the development of the open-source ecosystem, allowing more people to apply Agent technology across industries and create real commercial value.
Li Bojie: I also believe that creating an entry-point product is the biggest opportunity. But for entrepreneurs like me with less experience, directly challenging big companies and mature products is not easy. A more realistic path might be to focus on high-value vertical fields and build high-value-added industry solutions.
Some vertical-field customers have a strong demand for AI empowerment but lack their own technical reserves, which creates room for Agents to be deployed in industry scenarios. In addition, users in vertical fields can often bear higher usage costs.
For example, the current model inference cost of Manus is relatively high, with a single execution possibly costing more than $2. If used daily, using it 5 times a day would cost $10, which might be difficult for individual users to afford. But for high-value-added industries like law and finance, the added value after AI empowerment is enough to cover this part of the cost. Therefore, Manus may be more suitable for commercialization in high-value-added fields.
Is Now the Opportunity for Agent Development?
Peng Kangwei: I believe now is indeed an important time for Agent development. Agents have been around for a long time, but due to limited model capabilities and high calling costs, their development was previously restricted.
But with the optimization of models and inference technology, and the continuous reduction of calling costs, the application scenarios of Agents have become more extensive.
Li Bojie: To add to that: besides cost reduction, the improvement of model capabilities is also a key factor. Early models often stumbled on complex tasks; even when equipped with tools (such as web search, email, or a calculator), they might not know how to use them correctly.
But this year’s new models, such as domestic ones like DeepSeek, have already solved this problem. When uncertain, the model can actively call external tools to make more reliable judgments.
In addition, the stability of models in solving complex tasks is also improving. Suppose a task requires 10 steps, and the accuracy of each step increases from 90% to 99.9%, the overall success rate will increase from 35% to over 90%. This stability improvement greatly enhances the usability of Agents in commercial environments.
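The arithmetic in the example above checks out: if the ten steps are independent, per-step reliability compounds multiplicatively.

```python
# Overall success rate of a 10-step task, assuming independent steps,
# as in the example above.
steps = 10
low = 0.90 ** steps    # 90% accuracy per step
high = 0.999 ** steps  # 99.9% accuracy per step
print(f"90%   per step -> {low:.0%} overall")   # about 35%
print(f"99.9% per step -> {high:.0%} overall")  # about 99%
```

This is why small per-step gains matter so much for Agents: error compounds over every step of a long task.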
This is why people generally believe that 2025 is the “Year of the Agent.” Previous models were too slow and too dumb to meet actual needs. But this year’s models have made breakthroughs in inference speed, tool-calling capabilities, and stability, allowing Agents to truly have the conditions for widespread deployment.
Currently, the application of Agents in the programming field is already relatively mature, and I believe this capability will expand to more industries this year, truly achieving cross-field applications.
Will Improvements in Model Capabilities Subsume Agent Capabilities?
Li Bojie: My own feeling is that they will. I read this question as asking whether the functional gains Agents achieve through fine-tuning or strategy optimization will be superseded by stronger base models. For example, Manus today is a typical "multi-Agent system," in which different Agents are responsible for searching, writing code, operating computers, and so on.
In the past, we usually designed Agents based on human workflows, like having them do A first, then B, which might yield better results. But as model capabilities improve, many of these engineering-optimized Agents may no longer be necessary.
I’ve experienced this myself. I did a lot of optimization on a project, and when a new model came out, all the fine-tuned engineering I had done was rendered obsolete. I guess Kangwei might have had similar experiences during his entrepreneurial journey.
However, from a technological development perspective, engineering optimization still holds value: in the future, everyone will have access to equally powerful models, and whoever achieves better engineering optimization on top of them will have the advantage.
Peng Kangwei: I think this should be viewed from both a short-term and long-term perspective.
In the long term, many technical and algorithm teams believe that model capabilities will eventually surpass those of Agents. From the standpoint of what people want, we naturally hope models become strong enough, and that goal is highly likely to be reached.
In the short term, engineering optimization is still necessary. The “tuition” or “cost” we pay in entrepreneurship is essential because, in the short term, models cannot quickly cover all specific industry scenarios. Therefore, doing engineering optimization well can create barriers and enable products to better serve users.
For example, after this wave of large models emerged, companies with rich experience in NLP and CV fields will find it easier to quickly apply large models, combining industry barriers and experience to form competitiveness.
How Can SMEs Embrace the Agent Wave?
Li Bojie: If you are a beginner company with little understanding of AI, you can start by using low-code tools or the Dify knowledge base system to build a basic knowledge base, and then construct a simple Agent on this foundation. Low-code tools allow you to quickly build an Agent system through drag-and-drop methods.
If you are a company with some technical accumulation, you can try the “fine-tuning” method to integrate industry knowledge and experience into the model.
This approach is equivalent to “internalizing” knowledge into the model, rather than relying on the knowledge base for queries every time. Internalization gives the model a general direction, but when facing specific problems, it still needs to use the knowledge base to confirm details.
When building a knowledge base, retrieval capability is crucial. For example, Google’s search results are significantly better than Bing’s because Google can better ensure the relevance of search results based on user feedback and ranking algorithms. Companies should also pay attention to optimizing retrieval algorithms to improve relevance and user experience when building a knowledge base.
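As a toy illustration of the retrieval point above, the sketch below ranks knowledge-base documents by query-word overlap, weighting rare words more heavily (a crude stand-in for TF-IDF). The document IDs and texts are made up; a production knowledge base would use a real index plus embedding- or feedback-based ranking.

```python
# Crude relevance ranking for a knowledge-base lookup: score each document
# by overlap with the query, with rare words counting more (TF-IDF-like).
# All documents here are hypothetical examples.
import math
from collections import Counter

DOCS = {
    "refund-policy":  "refunds are issued within 7 days of purchase",
    "shipping-times": "standard shipping takes 5 business days",
    "agent-setup":    "connect the agent to the knowledge base before first use",
}

def rank(query, docs):
    # Inverse document frequency: words appearing in fewer docs score higher.
    n = len(docs)
    df = Counter(w for text in docs.values() for w in set(text.split()))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    q = set(query.lower().split())
    scores = {
        doc_id: sum(idf.get(w, 0.0) for w in q & set(text.split()))
        for doc_id, text in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

print(rank("how long do refunds take", DOCS)[0])  # refund-policy ranks first
```

Even this crude scheme shows the point in the text: retrieval quality is a ranking problem, and better relevance signals (user feedback, semantic matching) directly improve what the Agent sees.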
Is Agent Development a Game for Big Companies?
Peng Kangwei: I don’t think this issue is that severe for entrepreneurs. Agents essentially serve specific industries and products.
Firstly, Agents are not the endpoint but tools to accelerate products and services. Secondly, whether a product can succeed depends more on market and user feedback, rather than the technical level of the Agent itself.
Although big companies have resources, they won't do everything, and their reach is limited; otherwise there wouldn't be so many upstream and downstream companies. So there's no need to feel overly anxious: there is still plenty for startups to do.
Li Bojie: Kangwei is right. Having worked in big companies, I can feel that they are often cautious about innovative products, only launching them when costs are reduced, accuracy is improved, and the product is mature enough that “users won’t make mistakes.” Startups don’t have these concerns and can iterate and experiment more flexibly.
For example, in the AI-assisted programming field, big companies like Microsoft may have had prototypes early on, but they haven’t released them due to concerns about disrupting existing business models or high costs. This leaves room for startups.
What Will Be the Hallmark of the Agent's Inaugural Year?
Li Bojie: I personally feel that a hallmark event might be a product capable of comprehensively utilizing multiple modalities such as vision, hearing, and language, independently using computers, phones, and even making calls to help users complete daily tasks. If this capability is realized by 2025, I think it could be called the inaugural year of the Agent. And I also hope I can achieve this.
Peng Kangwei: I think this is more like a vertical explosion, similar to the AI capability explosion at the end of 2022. At that time, we could truly feel the breakthrough in AI’s intelligence level, a qualitative leap compared to the previous generation of AI. This is also why it’s called a large model—the key lies in “large” itself, representing a significant improvement in intelligence.
For AI Agents, I believe it involves two core aspects: one is multimodal understanding. As mentioned earlier, Agents can simultaneously understand and process information from different modalities such as vision, hearing, and language.
The other is multi-task collaboration. This involves various Agent scenarios we discussed, whether in e-commerce, manufacturing, finance, or programming fields, where Agents can efficiently complete tasks and deliver results.
The key is not just the action of completing tasks itself, but being able to deliver tasks with high quality without human intervention. This differs from an intern completing tasks, as an intern may make mistakes during execution and require human correction (whether it’s goal adjustment or action correction). But the goal of AI Agents is to automatically adjust and deliver high-quality results without human intervention.
From this perspective, I also look forward to the “intelligent emergence” moment of AI in task delivery and goal achievement, which will mark a new height in AI’s task understanding and execution efficiency.
Can Agents Currently Replace Interns?
Li Bojie: The key issue is still “memory.” Current Agents remain at the level of “factual memory,” such as simple information like “what was eaten today.” However, when it comes to “procedural memory” (such as the memory of actions like riding a bicycle), AI still finds it challenging.
Currently, methods based on knowledge bases, retrieval, and summarization still fall short of human memory. If AI cannot solve the "memory" problem, it cannot truly replicate humans' long-term learning ability: an intern who has worked at a company for a year would be no better than one who has been there a single day.
Peng Kangwei: In specific fields (such as programming), Agents can already replace some intern work. Companies can use Agents or have each full-time employee “bring a few programming Agents” to achieve cost reduction and efficiency improvement.
In repetitive, process-oriented tasks, Agents indeed help improve productivity.