  1. To build or not to build a foundational large model?
  2. To B or to C? Domestic or overseas?
  3. RMB capital or USD capital?
  4. Is AI Native application a mobile internet-level opportunity?
  5. Is your vision AGI?
  6. Can the problem of large models talking nonsense be solved?
  7. How does large model infra make a profit?
  8. Where is your moat?
  9. Can your business model scale?
  10. How to deal with the regulation and legal responsibility of large models?

Below are my views on these 10 soul-searching questions.

To build or not to build a foundational large model?

Building a foundational large model requires hundreds of millions of dollars of initial investment. How do you raise that much capital, and how do you recruit a reliable algorithm, data, and infra team? So many companies are already building foundational large models, including giants and star startups, that new entrants have no first-mover advantage. How do you compete with these big players?

If you don’t build a foundational large model and only call other vendors’ commercial model APIs, the cost is too high, while the capabilities of open-source models are insufficient. How do you build a moat?

My view: This involves a judgment about the future trend of large models. Many people believe foundational models will end up winner-takes-all for giants, a few in the US and a few in China, much like today’s cloud computing market, where a handful of giants occupy most of the market and small companies have little chance.

I think this judgment is only partially correct. The most powerful foundational models, such as GPT-4 or even GPT-5, are likely to be closed-source models, resulting in a winner-takes-all situation. But the inference cost of such models will be high, just like it costs 10 dollars for GPT-4 to read a paper now, only high-net-worth customers and scenarios exploring the scientific frontier can afford it. For more widespread needs, such as chat, voice assistants, intelligent customer service, simple document writing, knowledge Q&A, etc., the current LLaMA 2 can basically meet the needs after appropriate fine-tuning. The capabilities of open-source models will rapidly progress in the next year, catching up with the capabilities of GPT-3.5, and can meet the needs of the general public at a low cost.

You could say that GPT-4 or even stronger models are like Apple, and open-source models are like Android, each corresponding to different markets. The market for open-source models may be larger, but it will also be more diverse. The market for closed-source models is not small, but it will be highly concentrated.

Why do I think the capabilities of open-source models will continue to progress? On the one hand, the algorithms and data for training large models are gradually becoming democratized, more and more information is being made public or leaked, and fine-tuning models like Vicuna are essentially “distilling” data from GPT-4. On the other hand, it is now a battle of hundreds of models. If a company’s model is not competitive enough to compete with the most powerful closed-source models, then some companies will choose the open-source route, just like Meta has taken the lead in open-source models.

In the future, large-scale applications will definitely use different sizes of models to solve problems of different difficulties in order to reduce costs; at the same time, there will be some models that combine industry-specific data and know-how. Even though they may just be fine-tuned on the foundational models, the proprietary data and processes become the moat. This is like how cloud computing platforms now offer different types of virtual machines, some with more CPUs, some with more memory, some with more GPUs, etc. Foundational models will also become a kind of heavy-asset general infrastructure like the IaaS of cloud computing platforms (the assets of cloud computing are servers, the assets of large model companies are models and data), and the main competition will be cost in a few years.

To B or to C? Domestic or overseas?

To B, it is easy to fall into custom development and price wars, as happened to some companies in the last AI wave, where everything ultimately came down to customer relationships and price. Can a technically-led startup team handle enterprise customers? Beyond the AI itself, much of the custom development is outsourcing-like work that is hard to scale. Can the high labor costs of a star startup, plus GPU costs, be earned back?

To C, can you get a license in China? Even if you get a license, can you guarantee not to output illegal speech? Can a technical team startup handle the design and marketing of C-end products? Can a large model facing the C-end recover its cost?

If you do overseas markets, with the current tense Sino-US relations, will American customers trust the products of Chinese companies? Even if the company’s operating entity is in the US, the identity of Chinese people is still not reassuring.

My view: To B is actually two completely different markets, to small companies and to large companies/government.

To small companies is still like to C: build scalable, replicable products with a subscription or API charging model. Serving small companies domestically is harder than overseas, because domestic companies’ ability to pay is weaker than in developed countries. If you are selling the large model itself, it is natural for application companies to pay for the API. But if you only make middleware between the large model and the application, domestic willingness to pay is relatively weak. Middleware companies are therefore better off bundling the large model and offering a model + middleware solution.

To large companies and government is highly dependent on customer relationships; technology is not necessarily the most important factor, and the team must include people who understand the business. The team also needs a talent ladder rather than only high-end hires: large orders involve a certain amount of outsourcing-style custom development that ordinary programmers can handle.

To C is highly dependent on product design; technology is not necessarily the most important factor. In many scenarios, ordinary users may not even perceive the difference between GPT-4 and LLaMA 70B, so the team must include people who understand product. Not everything needs to be done by GPT-4, and some things do not need a large model at all. Just as we do not hire top programmers for every development task, programmers of different levels handle different types of development tasks.

Whether it’s to B or to C, try not to position it as a replacement for humans, but as an assistant to humans, capable of extending the boundaries of human abilities and accomplishing things that a person cannot do on their own. For example, a person who doesn’t understand programming can develop a full-stack website in a week’s spare time with the help of ChatGPT. An AI programmer without much academic background can read 100 of the latest papers in the AI field in a day with the help of ChatGPT.

Firstly, being an assistant can avoid many risks brought about by the unpredictability of the model, because the model will not autonomously do things that may have serious consequences, but requires human confirmation. This is like a secretary who will not make major decisions on behalf of the boss, but will only provide some decision-making references. Secondly, intelligent assistants can avoid many compliance risks compared to general Q&A.
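The assistant-with-confirmation pattern described above can be sketched in a few lines. Everything here (the action names, the `confirm` callback) is hypothetical and only shows the control flow: the model proposes, the human disposes.

```python
# Minimal sketch of "assistant, not replacement": the model may propose
# actions, but anything with serious consequences is gated behind explicit
# human confirmation. Action names and the confirm callback are illustrative.

RISKY_ACTIONS = {"send_email", "delete_file", "execute_trade"}

def run_action(action: str, args: dict, confirm) -> str:
    """Execute a model-proposed action, gating risky ones behind `confirm`.

    `confirm` is a callable that asks the human and returns True/False;
    in a real product it would be a UI prompt, not a lambda.
    """
    if action in RISKY_ACTIONS and not confirm(action, args):
        return f"skipped: {action} (human declined)"
    return f"executed: {action}"

# Usage: a stub stands in for the human; non-risky actions run without asking.
print(run_action("summarize_doc", {"doc": "q3.pdf"}, confirm=lambda a, g: False))
print(run_action("send_email", {"to": "boss"}, confirm=lambda a, g: False))
```

Like the secretary in the analogy, the wrapper never makes the consequential decision itself; it only prepares it.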

The overseas market is not just the American market, and the American government is not the American people. Firstly, there are many countries and regions friendly to China where business can still be done. Secondly, even in the United States, to C and to small companies face far less strict background checks than to large companies or to government.

RMB Capital or USD Capital?

The Biden administration now bars USD capital from investing in Chinese AI companies. Even if there is a way to obtain USD capital, companies backed by it will find it difficult to win government and state-owned-enterprise projects in China. And even if the operating entity is in the United States doing overseas business, it is still hard for Chinese founders to win the trust of mainstream American investors.

RMB investors demand short return cycles, often requiring founders to sign repurchase or even valuation-adjustment (bet-on) agreements from the A round onwards, which puts great pressure on the company to generate revenue quickly.

My view: Unless it is an all-star team, a startup should not spread itself thin from day one. It is better to start from a niche market, seek a monopoly in that niche, generate revenue quickly, and then refine a replicable product to expand into wider fields.

Most startups only need an incubator-style angel investor at the beginning; if there are big names on the team, they may not need investors at all. Wait until the product has reached a stage where it can be replicated at scale before bringing in investors. This is also the route most startups in history have taken: first products and users, then investment, rather than raising a large round on a slide deck, staking one’s entire reputation, and bearing heavy pressure to deliver. Raising money while already profitable is not only easier; the terms are usually friendlier to the founders too.

I think the two biggest advantages of startups are speed and stealth. Speed: a small ship turns easily, so develop agilely and iterate through quick trial and error; many companies’ initial products are not the final ones that form a replicable business model. Stealth: stay out of the spotlight, on the one hand to avoid leaking trade secrets, on the other to avoid spending too much energy on direction disputes and persuading others. It doesn’t matter whether a cat is black or white; a cat that catches mice is a good cat.

In the current international situation, if you don’t plan to bet on one side, a dual-line layout in China and the United States is a feasible method. The operating entities, investors, computing power platforms, and customers on both sides can be isolated, so no matter how the situation changes, there are two ways to go, which can meet the compliance requirements of customers in different regions.

Is AI Native application a mobile internet level opportunity?

The smart assistant of the mobile phone relies on the mobile phone as the entrance, the smart assistant of the office relies on the Office suite as the entrance, the smart assistant of enterprise management relies on ERP and OA software as the entrance, and the smart assistant of social interaction relies on social software as the entrance…

Are this wave’s AI opportunities all in the hands of big tech companies, who only need to add a natural language interface to existing applications, turning the original GUI into an NUI (Natural-language UI)? If so, it would be hard for startups to find opportunities.

My view: The earliest applications of mobile internet were indeed traditional internet applications with a mobile app shell, such as NetEase becoming the NetEase News client, Baidu becoming the Baidu client, and Taobao becoming the Taobao client. But the mobile internet wave also gave birth to many Mobile Native apps, these applications would not exist without mobile phones, for example:

  • Didi: Mobile GPS can track the locations of passengers and drivers in real time, making it possible to hail a ride anytime, anywhere, with higher dispatch efficiency than traditional taxis;
  • Meituan: Phones let users order and pay anytime, recommend nearby restaurants based on GPS, and track riders’ locations for efficient food-delivery dispatch;
  • Maps: Rely on the capability of mobile GPS;
  • WeChat: Phones make instant communication easier;
  • Toutiao: Phones let users browse recommended content anytime, anywhere to fill fragmented time, and personalized recommendation replaced category directories and search as the main way to obtain information in the mobile era;
  • TikTok: Phones let users shoot short videos or live-stream anytime, anywhere, and browse videos anytime, anywhere to fill fragmented time;
  • Xiaohongshu: Phones let users take and share photos anytime, anywhere, and browse anytime, anywhere to fill fragmented time.

Are there any AI Native apps in the era of large models? In fact, there are already many. For example:

  • ChatGPT: General Q&A tool;
  • Character.AI: Personalized chat robot;
  • Midjourney, Runway ML: Image and video generation tools;
  • Jasper: Document writing tool;
  • Generative Agents: Social AI entities;
  • Office/Teams Copilot: Office and meeting assistant.

Of course, AI Native applications still have many unsolved problems: the high cost of large models, hallucinations, safety, multimodality, reliable execution of long-process tasks, long-term memory, incorporating enterprises’ internal knowledge bases, and so on, which for now limit the application scenarios. If all of these problems are truly solved, Ready Player One or Westworld will no longer be a dream. This is also a good opportunity for technologists: this AI wave will be driven more by technology, not just by product and business.

Why is the wave of large models a mobile Internet-level opportunity, while the AI wave in 2016 was not? Firstly, the CV and NLP in 2016 were single-point technologies, which were difficult to generalize to universal scenarios, and each scenario required a lot of customization costs. This wave of large models is a general technology, and GPT itself is a pun (Generative Pretrained Transformers, General Purpose Technology).

Secondly, large models have become an extension of human brainpower. Why is mobile Internet important? Because smartphones are an extension of human senses. The current large models can help people do some simple repetitive mental work, and can also help people do things like generating pictures and videos that humans are not good at. The large models of the future will become an extension of human intelligence, smarter than humans, which will be another great opportunity.

Is your vision AGI?

AGI (Artificial General Intelligence) is the holy grail of the AI field. It reaches or even surpasses human intelligence. Once achieved, humans may not have to do mental work, and the social form of humans may undergo huge changes. Is your roadmap leading to AGI?

Is what you are doing now on the critical path of AGI? If not, will it be replaced in the future?

My view: Whether to do AGI is actually similar to whether to do basic large models. According to the current cognition of most people, AGI requires very large models and consumes a huge amount of computing power. I said in an interview with Zhizao Gongshe that computing power may become a key bottleneck for AGI.

The computing power bottleneck shows up in materials and power consumption. The material limitation is chip production capacity. Although silicon is abundant on Earth, turning silicon into chips is a very complicated process requiring many precision instruments and other materials, so chip capacity is limited. TSMC’s advanced process nodes are already fully booked; NVIDIA may get capacity for roughly 1 million AI chips next year, about half of which has reportedly been booked by Microsoft and OpenAI, and the rest is supplied to the US market first, leaving a limited number of chips for all the companies in China combined. As a result, ordering H100s in the United States currently takes several months, and ordering H800s in China takes more than half a year.

CoreWeave raised 2.3 billion US dollars by working with NVIDIA, using its existing AI chips as collateral to buy new ones, more than the total funding of the leading AI application companies; selling shovels really is more profitable than panning for gold. At retail rental prices, CoreWeave can recoup the cost of an H100 in about 7 months; even at the lower prices given to large customers, the payback is still very fast compared with general cloud computing.
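The 7-month figure can be reproduced with a back-of-the-envelope calculation. The purchase price and rental rate below are illustrative assumptions, not CoreWeave’s actual numbers; they only show the shape of the arithmetic.

```python
# Back-of-the-envelope GPU payback calculation. All numbers are assumed
# for illustration; they happen to land near the article's 7-month figure.

gpu_cost_usd = 30_000        # assumed purchase price of one H100
rental_usd_per_hour = 6.0    # assumed on-demand retail rental rate
utilization = 1.0            # fully booked, as during a GPU shortage

monthly_revenue = rental_usd_per_hour * 24 * 30 * utilization
payback_months = gpu_cost_usd / monthly_revenue
print(f"payback ≈ {payback_months:.1f} months")
```

Under these assumptions the card pays for itself in roughly 7 months; at lower large-customer rates the payback stretches proportionally.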

The power-consumption limitation is energy. Humans have not yet achieved breakthroughs in controlled nuclear fusion or room-temperature superconductivity, so the energy supply can only grow linearly, while the energy cost of computation is constrained by chip process technology. Data centers currently account for about 2% of total human energy consumption, which is hard to scale up substantially. Energy scarcity shows up concretely as tight IDC rental supply and the many restrictions on building new data centers in developed regions.

What does the computing power bottleneck have to do with AGI? AGI requires a lot of computing power, and chip process, chip production capacity and energy limit the total scale of available computing power, so at least in the short term, AGI will still be a very expensive thing.

The very expensive characteristic of AGI determines that it mainly serves high-net-worth customers and frontier technology exploration. Just like most people will not find an academician to tutor elementary school math problems, most needs must be solved with cheaper models. Elementary school teachers and academicians are both very important professions in society, and their division of labor is different.

Which companies are suited to AGI? The leaders in foundational large models: Microsoft has MSR, Google has DeepMind and Google Brain, Meta has FAIR, Huawei has the 2012 Labs, Alibaba has DAMO Academy, and so on. The leading Chinese startups building foundational large models also stand a good chance. The more stable a large company’s finances, the more it invests in basic research. Of course, once a small company grows to a certain scale, it too has the opportunity to build foundational large models or even AGI.

Achieving AGI does not mean humans no longer need to do mental work. No matter how smart AGI is, humans still need to tell it what to do. In a world full of AGI Agents, everyone will need to transition from individual contributor to team leader, directing a group of AGI Agents to complete tasks.

Can the problem of large models talking nonsense be solved?

The hallucination problem of large models is well known. For example, ask a model about “Lin Daiyu uprooting a willow tree” and it may spin a long, completely fictitious story. The smaller the model, the more serious the hallucination problem.

In enterprise scenarios, the consequences of hallucination can be serious. For example, ask a large model how a certain project was executed last year when the project does not exist at all; if the model fabricates a story, and, having learned the conventions of the company’s internal projects, makes it look plausible, then no one will dare to use this large model with confidence.

My view: The hallucination problem is essentially caused by the Transformer training method and the design of test datasets. Standardized test datasets are like human exams: there are questions, correct answers earn points, and wrong answers simply earn nothing. Our teachers long taught us never to leave a question blank, especially an objective question, in case we guess right.

Transformer training likewise covers up a token (which can be understood as a word) and checks whether the model can predict it correctly. The large-scale pre-training corpus consists mostly of text that carries on to an answer rather than stopping abruptly, so the trained model rarely outputs “I don’t know”.

Therefore, solving hallucination must start from the training and test datasets; it cannot rely on alignment alone. For example, tests should deduct points for wrong answers and give zero points for unanswered questions, so that confident guessing is no longer rewarded.

In the short term, there are two expedient measures to solve the hallucination problem. The first is to build a “lie detector” for the model. We know that when people lie, their brain waves fluctuate, and the lie detector is based on this principle. So when a large model is fabricating facts, are there any anomalies in its internal state? Can we build a small model that uses the intermediate results of the large model’s inference process to infer whether the model is lying?

The second is to build a factual check system outside the large model. Factual checks can use vector databases, traditional keyword-based information retrieval technology, knowledge graphs, search engines, etc. to build an external knowledge base. Take the user’s question and the large model’s answer to search in the external knowledge base, and use another large model to compare whether the facts stated in the large model’s answer match the top K results. If they match, it is probably not making things up; if they do not match, it may be making things up.
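The external fact-check loop just described can be sketched as follows. `search_knowledge_base` and `llm_compare` are hypothetical stand-ins for a real retriever (vector database, keyword search, search engine, knowledge graph) and a second large-model call; this is a sketch of the control flow, not a production implementation.

```python
# Sketch of an external fact-check: retrieve evidence for the question plus
# the model's answer, then ask a second model whether the facts match.
# The retriever and comparison model are stubbed; names are illustrative.

def fact_check(question: str, answer: str, search_knowledge_base, llm_compare,
               top_k: int = 5) -> bool:
    """Return True if the answer appears supported by the knowledge base."""
    # 1. Retrieve the top-K documents for the question plus the model's answer.
    docs = search_knowledge_base(question + " " + answer)[:top_k]
    if not docs:
        return False  # no supporting evidence found: treat as unsupported
    # 2. Ask a second model whether the answer's facts match the evidence.
    return llm_compare(answer, docs)

# Usage with toy stubs: the "knowledge base" knows exactly one fact.
kb = lambda q: ["Project Apollo ran in 2022"] if "Apollo" in q else []
compare = lambda ans, docs: any("2022" in d for d in docs)
print(fact_check("When did Project Apollo run?", "In 2022.", kb, compare))
```

In a real system, an unsupported answer would be flagged or regenerated rather than shown to the user as-is.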

Solving the hallucination problem may also let smaller models perform at the level of larger ones. Experiments have shown that unaligned large models know more details, such as which teacher teaches a certain course at USTC; after alignment, the model only knows who the president of USTC is. That is, fine-tuning and alignment sacrifice the detailed memory within the model’s general ability. If hallucination, safety, and similar issues are instead solved by systems around the model, smaller models may also show impressive factual memory, thereby reducing costs.

How does large model infra make a profit?

Infra is generally called middleware in China. China likes end-to-end overall solutions, and it is not easy to sell if the middleware is separated.

Will cloud vendors also do infra? Cloud vendors will also do high-performance training and inference platforms.

Will the developers of large models also do infra? Will LangChain become part of the model in the future?

My view: Large model infra can be divided into three categories: computing power platforms like CoreWeave, training and inference systems like Lepton.AI, Colossal and OneFlow, and middleware between models and applications like LangChain.

Computing power platforms provide cloud services for computing power rental, and the fundamental advantage lies in scale. The larger the scale, the lower the hardware price that can be obtained, and the fixed costs of building data centers can be shared. But does this mean that small computing power platforms have no chance?

If AI computing power were not a bottleneck, then, as with general CPU computing power, small computing power platforms would have little chance. During the blockchain bear market, only large mining farms could profit from scale and cheap electricity, and individual miners struggled to break even. But AI computing power is currently a bottleneck: many cloud vendors’ A100/H100 capacity is sold out, and, as in a blockchain bull market, anyone with a channel to buy GPU cards can make money, even by reselling them. It is like when I was mining in 2017: even though I bought cards at retail prices and paid 1.5 yuan per kWh for industrial electricity, I could still make money.

In today’s shortage of GPU cards and data center energy, the key competitiveness of computing power platforms is to get cards. Small computing power platforms can also find small companies as customers, and even some relatively large large model startups are renting GPU cards in increments of dozens. As long as this wave of AI fever continues, computing power platforms are a sure-win business.

Training and inference systems are about optimization on the one hand and simplifying programming on the other.

Optimization includes improving performance, reducing costs, reducing latency, and reducing downtime caused by failures, etc. I believe that the space for training performance optimization is relatively small, because state-of-the-art training frameworks can already achieve 70%~80% effective computing power utilization rate on medium and small scale clusters, and there is not much room for improvement. Training on large-scale clusters is affected by network bandwidth and failures, and the effective computing power utilization rate is not high, and there is more room for optimization.

Most frameworks currently do relatively little optimization for fault handling. In fact, a lot can be done with checkpointing, fault detection, and topology-aware fault recovery; some research even shows that simply ignoring the gradients from failed nodes is a viable approach.

There is more room for inference performance optimization: because of the Transformer structure, effective compute utilization is only 10%~20% in many scenarios, and with batching, latency and throughput become a trade-off. Academia spends most of its effort on training, and there is less research on inference optimization. For example, Berkeley’s vLLM can improve inference performance by 2~4 times. In addition, some improvements to the model itself can also greatly optimize inference performance.

PyTorch beat TensorFlow by simplifying programming. In the era of large models, a domain model can be obtained by fine-tuning a pre-trained model with a small amount of domain-specific data; in many scenarios no data labeling is even required, you just feed in the domain’s internal documents. This greatly lowers the threshold for fine-tuning large models, so that people who understand neither programming nor large models can fine-tune too. Baidu’s EasyDL, for example, works this way.

Middleware between models and applications is currently more common abroad, such as LangChain, AutoGPT, Martian, etc. Some people in China believe that as the capabilities of large models improve, large models will gradually incorporate the capabilities of middleware, so just do a good job of large models. I do not agree with this view.

If you imagine a large model as a person, middleware is the set of social rules that forms a society. In primitive society there was almost no concept of social rules; as civilization progressed, relationships between people grew more complex and social rules became more refined. “Sapiens: A Brief History of Humankind” argues that human intelligence has not significantly improved over thousands of years; it is the ability to use tools and the structure of human society that carry the light of human civilization. Likewise, while a large model’s own IQ matters, its ability to interact with the external environment and the organizational structure of collaboration among large models are what can take large models further.

LangChain addresses the problem of how large models interact with the external environment, such as how to interface with external data, how to build long-term memory, and so on.

AutoGPT addresses collaboration between large models. It has limitations, and the academic work MindStorm has made improvements on top of it. The biggest problem with these systems, however, is that the AI completes tasks entirely on its own, with no way for humans to intervene. Imagine a product manager handing a programmer a finished design document and then just waiting for everything to be delivered; would that be reliable? Usually it is a process of building and communicating at the same time, correcting the design as you go. A large model system performing complex tasks must therefore be able to communicate with humans in real time.

Martian solves the problem of routing user requests among multiple large models: it estimates each model’s answer quality, cost, and latency for each question, then selects an appropriate model based on the user’s requirements for quality, cost, and latency. Its basic assumption is that different models are good at different kinds of tasks, and that higher-cost models give higher-quality answers while lower-cost models give lower-quality ones.
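The routing idea can be sketched as a simple weighted score. The model names and their quality/cost/latency numbers below are made-up assumptions; a real router like Martian would estimate them per request rather than hard-code them.

```python
# Sketch of quality/cost/latency-aware model routing. The catalogue of
# models and all numbers are illustrative assumptions, not real benchmarks.

MODELS = {
    #  name           (est. quality, $ per request, latency in seconds)
    "big-model":    (0.95, 0.10, 8.0),
    "medium-model": (0.85, 0.02, 2.0),
    "small-model":  (0.70, 0.002, 0.5),
}

def route(quality_weight: float, cost_weight: float, latency_weight: float) -> str:
    """Pick the model maximizing weighted quality minus weighted cost and latency."""
    def score(entry):
        quality, cost, latency = entry
        return quality_weight * quality - cost_weight * cost - latency_weight * latency
    return max(MODELS, key=lambda name: score(MODELS[name]))

# A quality-first user and a cost-sensitive user get routed differently.
print(route(quality_weight=1.0, cost_weight=0.1, latency_weight=0.01))
print(route(quality_weight=1.0, cost_weight=20.0, latency_weight=0.1))
```

The same mechanism also implements the earlier point that cheap models should absorb easy, high-volume requests while expensive models are reserved for hard ones.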

NVIDIA H100 also supports confidential computing, which can ensure that the model and data will not be leaked, making it safer to deploy the model to third-party cloud platforms.

Finally, I want to mention that middleware between models and applications may become a new programming language and remote procedure call (RPC) interface. A key feature of large models is that they shift the programming interface from programming languages to natural language, making natural-language programming possible. This changes not only the human-machine interface but also the machine-to-machine interface.
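A minimal sketch of this “natural language as RPC” idea, with a toy stand-in for the model: the caller sends a natural-language request, and a model on the service side maps it onto the service’s capability. All names here are illustrative, and `fake_llm` stands in for a real model API.

```python
# Sketch of natural language as the machine-to-machine interface: instead of
# a rigid schema, the "RPC" payload is free-form text interpreted by a model.

def nl_call(service_prompt: str, request: str, llm) -> str:
    """Invoke a service by sending it a natural-language request."""
    return llm(f"{service_prompt}\nRequest: {request}\nResponse:")

# A toy "LLM" that recognizes a single intent, standing in for a real model.
def fake_llm(prompt: str) -> str:
    if "weather" in prompt.lower():
        return "sunny, 25°C"
    return "unsupported request"

print(nl_call("You are a weather service.", "What's it like in Beijing today?", fake_llm))
```

The caller never needs the service’s exact schema; the model absorbs the mismatch, which is exactly why this layer can replace a conventional RPC interface definition.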

Where is your moat?

What is the company’s moat? Is it technology, customer resources, or something else?

My view: The moat of a foundational model company is algorithms, computing power, data, and brand. Algorithms, computing power, and data are the well-known three pillars of AI, but there are many challenges to using these three as a moat.

  • In terms of algorithms, everyone is a Transformer. There are indeed many know-hows in the training process, but they are constantly leaking. Algorithm innovation needs to be derived from the theoretical level, which requires a deep foundation. The personnel of various companies are constantly flowing, and algorithm innovations in academia are emerging one after another.
  • In terms of computing power, as long as there is enough investment, you can always rent or buy enough cards for training. For example, training LLaMA 70B takes on the order of 2,000 A100s, and the training can be completed within a 10-million-dollar budget; many companies building foundational models can afford this. GPT-4, of course, requires a top company to train.
  • In terms of data, there are more and more public datasets, and many data companies are selling non-public domain datasets, which can always be bought as long as you spend money. The data flywheel (data generated by existing users on the platform) does have a certain effect on improving the model, but it is not as important as high-quality pre-training corpus.

Therefore, just as with search engines, result quality piled up from algorithms, computing power, and data alone does not decide everything. In the battle of a hundred models, brand matters a great deal. For example, even though GPT-4 seems to have gotten worse recently, and Claude has caught up well in many scenarios while also supporting longer contexts, most people still trust GPT-4 more; that is the power of brand. Do not rush to release your own large model before its capabilities are mature; for example, models that are not even as good as LLaMA should not be released.

The moat of application companies: cost, personalization, network effect. First, if reading a paper still costs 10 dollars like GPT-4, and generating a 7.5-minute video still costs 95 dollars like Runway ML, most people can’t afford to use large models. How to achieve high-quality content generation at a low cost is the key competitiveness of applications.

Second, most of the current AI applications are quite general and lack personalization. For example, tools for generating pictures and writing articles do not take into account the user’s personality, the user’s stickiness is not strong, and the substitutability is high. The current chatbots don’t even take the initiative to contact users, they are all one question and one answer, let alone having their own thoughts, emotions, and stories. I believe that personalized agents or assistants will become the trend of future large model applications.

In the mobile Internet wave, personalized recommendation became a key technology for improving user stickiness. In this wave of large models, personalization will again be the key to stickiness. A large model that has accompanied a user for years is like a partner of many years: it earns trust and dependence. Once large models solve long context and external knowledge bases, there is no need to fine-tune for each user; a single unified model can serve personalized assistants for massive numbers of users while keeping each user's data isolated.
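
As a rough illustration of that last point, here is a minimal sketch of one shared model serving many users, each with an isolated memory store that is retrieved into the prompt. All class and function names are hypothetical, and a real system would use embedding-based retrieval rather than the keyword-overlap stand-in below.

```python
# Minimal sketch: one shared model, per-user isolated memory (names hypothetical).
from dataclasses import dataclass, field

@dataclass
class UserMemory:
    """Each user's long-term memory lives in its own store, never shared."""
    facts: list = field(default_factory=list)

    def add(self, fact: str):
        self.facts.append(fact)

    def retrieve(self, query: str, k: int = 3):
        # A real system would rank by embedding similarity; keyword
        # overlap is only a stand-in here.
        scored = sorted(self.facts,
                        key=lambda f: -len(set(f.split()) & set(query.split())))
        return scored[:k]

class PersonalAssistant:
    def __init__(self, shared_model):
        self.model = shared_model   # one model serves every user
        self.memories = {}          # user_id -> isolated UserMemory

    def chat(self, user_id: str, message: str) -> str:
        mem = self.memories.setdefault(user_id, UserMemory())
        context = mem.retrieve(message)   # only this user's facts enter the prompt
        prompt = f"Known about user: {context}\nUser: {message}"
        reply = self.model(prompt)
        mem.add(message)                  # remember, but only in this user's store
        return reply

assistant = PersonalAssistant(shared_model=lambda p: f"[reply to] {p[-40:]}")
assistant.chat("alice", "I love skiing in Hokkaido")
print(assistant.chat("alice", "plan my next skiing trip"))
```

The point of the design is that personalization lives in the per-user memory, not in per-user model weights, so one model can serve everyone while isolation falls out of the data layout.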

Third, in most of the current AI applications, each user is an information island, lacking interaction between users.

Network effects were an important driving force of the Internet wave. The network effect is what Metcalfe's Law describes: the more people use a network, the greater the value of the network to each user, so more people are willing to join, which makes the network still more valuable, forming a virtuous circle.
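
Metcalfe's Law can be made concrete with one line of arithmetic: a network of n users has n(n-1)/2 possible pairwise connections, so total value grows roughly quadratically with n. The per-link value below is an arbitrary illustrative constant.

```python
# Metcalfe's Law: network value grows with the number of possible user pairs.
def metcalfe_value(n_users: int, value_per_link: float = 1.0) -> float:
    return value_per_link * n_users * (n_users - 1) / 2

for n in (10, 100, 1000):
    print(n, metcalfe_value(n))
# Growing users 10x grows value ~100x, which is the virtuous circle above.
```

This quadratic growth is why early market share compounds: the incumbent network's value advantage grows faster than its user-count advantage.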

Facebook and LinkedIn both exploited network effects, but network effects are not limited to social networks. Railway networks, highway networks, and power grids, as well as communication networks such as the telegraph and telephone, all have network effects. In the Internet wave, Google's PageRank was built on the link network between pages: the more pages indexed, the more accurate PageRank becomes. eBay evaluates seller reputation based on the trading network between users, and PayPal detects fraud based on the same kind of trading network.

I believe personalized assistants built on large models should form a social network, much like the generative agents developed at Stanford, which interact and act autonomously in a virtual world. This creates a network effect: the more agents in the social network, the greater the network's value to each agent.

The moat of middleware companies: ecosystem. Is performance the key to middleware competitiveness? Cost does matter to application companies, so performance is indeed a selling point for middleware, but it is hard for performance alone to become a long-term moat: too many people are working on training and inference performance optimization, and effective compute utilization is capped at 100%.

In the software world, doing it early often matters more than doing it well. Google's gRPC, for example, does not have outstanding performance, yet it has become the de facto standard for RPC; only users who need extreme performance consider other, more optimized RPC frameworks. gRPC's success lies in its ecosystem: it integrates with many peripheral components such as service governance, load balancing, and web services. Switch to another RPC framework, and many of those peripheral systems can no longer be used.

Large model middleware likewise needs to occupy a position in the ecosystem, integrating with upstream applications, downstream foundational models, and other middleware. Ideally, this integration should not be as easy to swap out as the OpenAI API is.

Can your business model scale?

Many to B companies easily slip into outsourced customization: an order here, an order there, each with non-standardized requirements demanding heavy custom development. The result is some early revenue but little ability to scale.

Many to C companies build a product that turns out to be a flash in the pan, never forming user stickiness; or the customer group they face, or can reach through promotion, is small, and other potential customers never learn the product exists.

My view: whether a business can scale depends on how universal the product is. The general market and the niche market involve a real trade-off. The general market is usually large, but unit prices are often low and competitors many; a niche market is smaller, but unit prices are often higher and competitors relatively fewer. It is not true that the more universal and theoretically scalable the model, the better the final revenue and profit. A small but beautiful startup can also be a good outcome.

The last wave of AI mainly served the to B market, and the solutions were not very universal, so they often had to be customized to each customer's needs. The defining trait of large models is universality, so if you want to scale, you must build universal products. Huawei, for example, offers many industry solutions, but they are all composed of standardized base stations, switches, routers, and so on; it does not custom-build a set of base stations for each customer.

Some to B customers will still have non-AI custom development needs. As mentioned earlier, a talent echelon is needed: not every task calls for sending in the Marine Corps; outsourced development work can be handled by the militia.

The question of to C user stickiness was answered earlier: personalization on the one hand, network effects on the other. Picture the scenes in "Ready Player One" and you can see how far today's large models still have to go, and in which direction to push. Many people say AI lacks application scenarios, but movies and science fiction have already given some reference answers; if we cannot build them yet, that is a problem of technology or cost.

Using a ski resort as an analogy: the width of the slope is the market size, the length of the slope is the stage of the industry, and the steepness of the slope is the competitive landscape. Be clear about whether what you are doing is "+AI" or "AI+", that is, whether the thing could be done without AI. If AI is merely icing on the cake, be careful: the opportunity may be better suited to the existing players.

How to deal with the regulation and legal responsibility of large models?

L4 autonomous driving is hard not only because the previous generation of AI was not universal enough, requiring piles of if-else to handle corner cases, but more importantly because of legal issues. If an autonomous car kills someone, who goes to jail? AI can help people do many things, but it cannot go to jail for them.

Today, governments around the world also impose many regulatory requirements on large models. Is it possible to meet the regulatory requirements on privacy and content compliance without crippling the model's general capabilities?

My view: large models should be positioned as assistants, which means legal responsibility rests mainly with the users; this is also how responsibility is commonly attributed for today's software and Internet products. In some scenarios the assistant may take autonomous actions that affect the external environment; if it does something wrong there, the large model developer needs to bear legal responsibility.

On privacy and content compliance, the large model itself can of course reduce non-compliant output through alignment, but if alignment is too heavy-handed, the model easily loses its own capability. For example, LLaMA 2 Chat will refuse to answer "How to kill a Linux process", which is a joke. Rather than crippling the model's own thinking, I think it is better to add content compliance checks outside the model.

Content compliance checks are not simple sensitive-word matching; otherwise you end up with text like "I love Beijing [sensitive word], the sun rises over [sensitive word]". Content compliance needs its own independent model, trained on a corpus of compliant and non-compliant content, and it can be applied at both the input and the output.
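
A sketch of what "checks outside the model" could look like: an independent classifier screens both the user input and the model output, while the base model itself stays untouched. The classifier below is a toy keyword stand-in for the separately trained compliance model the text describes, and all names are hypothetical.

```python
# Sketch: compliance checks outside the model, at both input and output.

def compliance_classifier(text: str) -> bool:
    """Stand-in for an independent moderation model trained on compliant
    and non-compliant corpora. Returns True if the text is allowed."""
    banned_intents = ("build a bomb", "credit card numbers")  # toy list only
    return not any(b in text.lower() for b in banned_intents)

def moderated_chat(model, user_input: str) -> str:
    if not compliance_classifier(user_input):     # input-side check
        return "Sorry, I can't help with that."
    output = model(user_input)
    if not compliance_classifier(output):         # output-side check
        return "Sorry, the generated answer was withheld."
    return output

# A legitimate question passes through untouched, so the base model
# never needs to be over-aligned into refusing "kill a Linux process":
print(moderated_chat(lambda q: "Use `kill <pid>`.", "How to kill a Linux process?"))
```

Because the filter is a separate component, it can be retrained or tightened for each jurisdiction without touching, or crippling, the base model.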

Why is it so hard to cripple the model's own thinking at the corpus level? Since Chinese corpora are smaller and of lower quality than English ones, even a Chinese large model generally has to be trained on English and Chinese corpora together. Train purely on Chinese corpora like Tieba, and you may get a model that cracks jokes but struggles with serious questions. So even if we can guarantee that the Chinese corpus is compliant, it is hard to guarantee the English corpus is too. Moreover, if a large model has never seen non-compliant content, it has no ability to recognize it and can easily be led astray.

In this article, from eliminating hallucinations to personalization to content compliance, I have repeatedly emphasized the importance of the systems around the model. The foundational large model is like the CPU in a computer architecture, and the peripheral systems are like the memory, disks, network cards, GPUs, and other chips around the CPU. Many concepts from operating systems, distributed systems, and computer architecture have counterparts in large model systems, which I elaborated on in "From Networking to AI: My Thoughts".

The privacy issue is actually not hard to solve: simply do not use user data for training. Because ChatGPT uses user data for training, many people assume privacy is a hard problem; in fact, it is quite easy.

The real question is: if you are not allowed to use user data for training, how do you build a data flywheel? Despite the many privacy-computing technologies, I believe protecting privacy and running a data flywheel may be fundamentally at odds. A simple example: a user asks a private question, "Is A B's girlfriend?", and gives the answer a thumbs up or down. If the model is updated according to that feedback, the large model has learned private information about the relationship between A and B.

Conclusion

The enthusiasm for large model entrepreneurship is gradually returning to rationality. The enthusiasm came from everyone discovering that AI can truly understand natural language, has passed the Turing test, and has become a general-purpose technology. The return to rationality comes from realizing that large models still have some distance to go before they enter thousands of industries and change human life: on the one hand, there are still gaps in basic capabilities such as controllability, safety, and long-term memory; on the other hand, the cost remains high.

Large model entrepreneurship will always face plenty of soul-searching questions. Thinking is the problem; doing is the answer. While the apes on both banks cry without cease, the light boat has already passed ten thousand mountains.
