The Crossroads of AI: Professional Models and Personal Models
(This article was written by the author in November 2024 at the invitation of Open Source China for the “2024 OSChina Annual AI Review”)
In 2024, large models truly began to see real-world deployment: most tech workers now use at least one large model to work more efficiently, and many apps with nationwide user bases, as well as mobile phone manufacturers, have integrated large models. Large models are starting to diverge in two directions: professional models and personal models.
Professional Models
Professional models are designed to enhance productivity, for example in AI-assisted programming, writing, design, consulting, and education. Once a model’s capabilities cross a certain threshold, professional models deliver high added value. In 2024, professional models were already deployed in many fields. For example, AI-assisted programming can more than double development efficiency: for API or IDE subscription fees of just tens of dollars per month, it delivers output equivalent to that of an engineer costing tens of thousands of dollars per month. AI-generated images, podcasts, and live broadcasts can make artists, voice actors, and hosts hundreds of times more productive. AI consulting services in psychology, law, and medicine can reach the level of junior professionals, whose hourly fees far exceed the cost of running the model. AI virtual foreign English teachers can already rival human foreign teachers, and thanks to their standard pronunciation, they even outperform most domestic English teachers. In the future, AI-assisted teaching will change the traditional one-to-many classroom model, making one-on-one AI tutoring possible and significantly improving the efficiency and quality of human teachers’ lesson preparation.
Professional models combine general large models with vertical domain data and workflows, and the foundational capability of the general large model is key. A world-leading general large model paired with a RAG (Retrieval-Augmented Generation) industry knowledge base often achieves better results than a weaker model fine-tuned on vertical domain data. Therefore, although professional models have high training and inference costs, the investment is worthwhile given the high premiums they can command.
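To make the “general model + RAG knowledge base” combination concrete, here is a minimal sketch of the retrieval step only. It is an illustration under stated assumptions: a toy bag-of-words cosine similarity stands in for a real embedding model, and the knowledge-base contents and the `retrieve` and `build_prompt` names are hypothetical. The call to the general model itself is omitted.

```python
import math
from collections import Counter

# Hypothetical industry knowledge base; in practice these would be chunks of domain documents.
KNOWLEDGE_BASE = [
    "Warranty claims must be filed within 30 days of delivery.",
    "Refunds are issued to the original payment method.",
    "Industrial pumps require inspection every 500 operating hours.",
]

def vectorize(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use an embedding model."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list:
    """Return the k knowledge-base chunks most similar to the query."""
    q = vectorize(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: cosine(q, vectorize(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, k: int = 1) -> str:
    """Prepend retrieved domain context so a general model can answer with industry knowledge."""
    context = "\n".join(retrieve(query, k))
    return f"Answer using the context below.\nContext:\n{context}\nQuestion: {query}"

print(build_prompt("How often do pumps need inspection?"))
```

The design point is that the domain knowledge lives outside the model, so upgrading to a stronger general model improves answers without retraining anything.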
Because general large models are general-purpose, it is difficult to build differentiated barriers or network effects around them, so competition among foundation model companies will be very fierce, and computing power will be the key to long-term competitiveness. For large companies, it is crucial to concentrate computing power, data, and talent while maintaining organizational efficiency. Startups need substantial financial backing or deep partnerships with cloud computing platforms or chip manufacturers to compete at the top tier of professional models. An exception is diffusion-based image and video generation: creative requests there are relatively simple and may not require such a large general language model, which leaves room for differentiated competition.
As the programming capabilities of professional models improve and AI Agent workflows mature, low-code programming will become practical, letting many ideas quickly turn into applications and sharply reducing the trial-and-error cost of starting an application business. Eventually, even the “one-person billion-dollar company” that Sam Altman has talked about may emerge. As the cost of custom development and of collecting and organizing knowledge falls, many real-world workflows will be turned into industry applications through Agent workflows, and large amounts of scattered industry knowledge will be converted into structured industry data, solving the custom-development challenges that hold back the digital transformation of traditional industries.
In 2025, for software of low technical difficulty, programmers will gradually shift into a combined role of architect, product manager, and project manager. They will only need to break a software project into tasks that each take less than an hour, are clearly described, and have automatically verifiable results, hand those tasks to AI for development, and then manually review the results and iterate on the requirements. For programmers, therefore, soft skills such as expressing and communicating requirements, along with hard skills such as system architecture design, will become increasingly important: AI performs well only when humans express requirements clearly, and complex architecture design and problem-solving still rely on humans.
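The idea of handing AI tasks that are small, clearly described, and automatically verifiable can be sketched as a task record paired with an acceptance check. This is a hypothetical illustration, not a prescribed workflow: the `Task` structure, the `accept` function, and the `slugify` function standing in for AI-written code are all invented for the example, and a real project would run its own test suite as the check.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    """One unit of work handed to an AI coding assistant (hypothetical structure)."""
    description: str                    # clear, self-contained requirement
    estimated_minutes: int              # the article suggests keeping tasks under an hour
    check: Callable[[Callable], bool]   # automatic acceptance test

def accept(task: Task, implementation: Callable) -> bool:
    """Reject tasks that are too coarse, then run the automatic check on the submission."""
    if task.estimated_minutes >= 60:
        raise ValueError("task too large; break it down further")
    return task.check(implementation)

# Hypothetical AI-produced implementation submitted for review.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

task = Task(
    description="Write slugify(title) that lowercases and joins words with hyphens.",
    estimated_minutes=20,
    check=lambda f: f("Hello World") == "hello-world" and f("  AI  Agents ") == "ai-agents",
)

print(accept(task, slugify))  # True
```

The human still writes the description and the check; the automatic check only decides whether a submission is worth a manual review.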
Strong reasoning models, exemplified by o1, aim to push the boundaries of human intelligence and will reach human-expert level in a limited set of fields such as mathematics and programming by 2025. Models exemplified by Claude 3.5 Sonnet will reach average human performance on general tasks completed through GUI operation, allowing AI to carry out clearly described, non-creative work end-to-end, much like a junior employee. Building on reasoning and GUI operation capabilities, AI Agents will cross a reliability threshold in 2025 and truly become human copilots, automating repetitive work and helping employees solve problems in unfamiliar domains, which accelerates their growth. Companies that adopt Agent workflows first will gain a significant competitive advantage.
Professional models are a necessary path to AGI. Anthropic’s CEO predicts that within the next five years, professional models will surpass human experts in almost all research fields, accelerating scientific progress tenfold, and that within 15 years the human lifespan could reach 150 years. However, the greatest uncertainties about whether AGI can be achieved lie in technology and funding. Technologically, some leading large model companies have found that Transformer capabilities have “hit a wall,” most high-quality training corpora have already been used, and the capability limits and generalization of reinforcement learning remain unverified. Financially, some think tanks predict that AGI will require trillions of dollars of investment and that powering the required chips would double humanity’s total energy consumption. If AGI is achieved, it will profoundly change the international competitive landscape and the way humans live.
Personal Models
Whereas professional models are more like an “Apollo Program,” personal models do not require such large training investments and are easier to monetize. Personal models aim to improve ordinary people’s quality of life as daily-life assistants, travel assistants, phone assistants, and so on, turning scenes from sci-fi movies like “Her” into reality.
It is generally believed that a model combining GPT-4o’s multimodal capabilities with o1’s reasoning capabilities would meet the needs of personal models. Today, top domestic closed-source and open-source models are close to this technical goal, with inference costs far below OpenAI’s current pricing. However, end-to-end multimodal models and reasoning models remain expensive and are not yet stable enough in some scenarios. Still, since 2023 model knowledge density has followed a Moore’s-Law-like trend, doubling roughly every eight months. Combined with hardware Moore’s Law and inference framework optimizations, within one to two years the cost of personal models will fall to the point where users can use them anytime, just like internet applications, and they can turn a profit through advertising and premium subscriptions.
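The “doubling every eight months” trend can be turned into a rough estimate. Assuming the doubling period stays constant (an extrapolation, not a guarantee), and writing ρ(t) for knowledge density t months from now and ρ₀ for today’s density:

```latex
\rho(t) = \rho_0 \cdot 2^{t/8}, \qquad
\frac{\rho(12)}{\rho_0} = 2^{1.5} \approx 2.8, \qquad
\frac{\rho(24)}{\rho_0} = 2^{3} = 8
```

If, as a further simplifying assumption, inference cost at a fixed capability level scales with the parameters that capability requires, density gains alone would cut cost by roughly 2.8× in one year and 8× in two, before hardware and inference-framework improvements are counted.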
Models with strong reasoning capabilities like o1 do not necessarily need to be large. Reasoning will become a standard feature of personal models, and models that frequently make calculation errors will be weeded out. Better reasoning will let Agent workflows handle complex tasks reliably, genuinely saving users time and even performing information collection and analysis beyond human capabilities.
On-device personal models running on mobile phones, PCs, and spatial computing devices, combined with Agent workflows, will suffice for most daily needs that humans can respond to instantly, and smart cars may become the home’s computing hub. Cloud models will complement on-device models, handling more complex tasks that would take a human some time to think through, as well as large volumes of repetitive tasks and data. The multimodal capabilities of models will make AR/VR and other spatial computing devices a more natural entry point for human-computer interaction. Multimodal and reasoning capabilities will also give embodied intelligence genuinely general perception, planning, and control.
Top professional model companies have the highest-quality data, so they can distill personal models with the highest knowledge density. However, because personal models are cheap to run, models with somewhat lower knowledge density may still find a market. With low training costs, personal models will flourish, making it hard for AI companies to build a moat on the model alone; the product will matter more than model capability.
The key to AI products for personal life and entertainment is user interaction. Today, excellent AI applications no longer just generate text. Since Claude Artifacts, AI generating code, running it, and producing answers rich in text and images, intuitive charts, multimodal podcasts with explanations, and even interactive mini-games and mini-apps has become the new paradigm for AI applications.
Until personal models become cheap enough to use freely, commercially successful applications are likely to have a high “read-write ratio,” meaning content generated by the model is consumed many times by users. One pattern is content communities, where creators use AI to generate content that a large number of users then access; another is applications where a high proportion of user questions repeat, such as photo-based homework search and research report generation; a third is applications that use AI only at the creation stage, such as AI-assisted game and video production or industry data collection and integration.
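The “read-write ratio” argument is amortization: content generated once and read many times spreads one generation cost over every read, and repeated questions can be answered from a cache instead of calling the model again. A minimal sketch, in which the $0.05 generation cost is a purely hypothetical figure (not a measured price) and `answer` is a hypothetical stand-in for a real model call:

```python
from functools import lru_cache

def cost_per_read(generation_cost: float, reads: int) -> float:
    """Amortized model cost per access for content generated once and read `reads` times."""
    if reads <= 0:
        raise ValueError("reads must be positive")
    return generation_cost / reads

model_calls = 0  # counts how many times the (hypothetical) model is actually invoked

@lru_cache(maxsize=None)
def answer(question: str) -> str:
    """Hypothetical stand-in for a model call; repeated questions are served from the cache."""
    global model_calls
    model_calls += 1
    return f"generated answer for: {question}"

# Hypothetical figure: one generated article costs $0.05 in inference.
for reads in (1, 100, 10_000):
    print(f"{reads:>6} reads -> ${cost_per_read(0.05, reads):.6f} per read")

# Many users photographing the same homework problems trigger only one model call per problem.
queries = ["solve x^2 = 4"] * 50 + ["solve 2x = 6"] * 30
for q in queries:
    answer(q)
print(model_calls)  # 2
```

Eighty user requests cost two model calls here, which is the effect that makes high read-write-ratio applications viable while inference is still expensive.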
Overall, AI applications are currently in their “iPhone 1” era, with model capabilities, application ecosystems, and user habits all evolving rapidly. The saying “one day in AI is a year in the human world” reflects that even AI experts struggle to keep up with every new advance. The era of large models has only just begun, and the best way to predict the future is to keep learning, exploring, and using AI, to discover one’s true interests and pursuits, and thereby to create the future.