(This article is reprinted from Jiazi Light Year’s official account. Thanks to Jiazi Light Year for the interview.)

Summarizing 2023, embarking on 2024.

Authors | Liu Yangnan, Suhoi, Zhao Jian

In the past week or two, many companies have been busily holding strategic meetings, clarifying their goals and plans for 2024.

After more than a year of rapid AI development, it is time to take stock of a busy 2023. Once the strategic meetings are over and the Spring Festival holiday begins, most companies will finally pause their relentless pace and enjoy a brief, rare rest.

So how should we sum up 2023?

“Jiazi Light Year” invited more than 30 AI practitioners from fields such as foundational large models, AI Infra (AI infrastructure), multimodal, industry vertical scenarios, and academic research, and posed 5 questions:

  • What is your keyword for 2023?

  • When was the Magic Moment (the most impressive moment) you experienced in 2023?

  • Have you ever felt lost in the rounds of technological impacts in 2023? From confusion to enlightenment, what was the turning point?

  • What important event do you predict might happen in the AI industry in 2024?

  • What would you say to yourself a year ago? If you could ask yourself one question a year from now, what would it be?

Their confusion and anxiety, excitement and thrill, are a microcosm of the AI industry’s entire year; their exploration and perseverance, renewal and iteration, will be the prelude to the AI explosion of the next five or even ten years.

Below are their responses (in alphabetical order by name).

1. Foundational Large Models

Chen Hongyang, Deputy Director of Research at Zhejiang Lab’s Data Hub and Security Research Center, Head of the Large Model Team

If I were to describe my 2023 in one word, it would be “Challenge”.

At the beginning of 2023, I was confused. The market’s enthusiasm for large models was hard to keep up with, yet it was not clear how to integrate resources and use large models to solve specific scientific problems. During that time, we worried that large model development would fall into the trap of pursuing generalization while neglecting practical applications, that is, the problem of homogenization among large models.

The change occurred after we conducted an in-depth evaluation of large models. Although general large models perform well in understanding and generating natural language, they lack deep knowledge and professional understanding in certain fields. Therefore, we decided to use large models as a base, combined with vertical domain knowledge, targeting the most important and urgent scientific problems in the research field.

The Magic Moment of 2023 was when our vertically specialized model achieved a breakthrough in professional performance. From team formation, computing resource coordination to technical challenges, months of data cleaning, model debugging and optimization, and system anomaly troubleshooting, all the difficulties and challenges were rewarded at that moment.

I want to say to myself a year ago: “Be prepared to embrace change and embrace failure; it’s the only way to success.”

I want to ask myself a year from now: “In the past year, how have we progressed and changed in our understanding and use of AI, as well as its impact on human life?”

Li Zhifei, Founder and CEO of Mobvoi

My keyword for 2023 is “New Capabilities”.

Last April, the night the new version of “Sequence Monkey” came out, I chatted with “Sequence Monkey” until two in the morning.

“Sequence Monkey” could fluently handle many complex tasks, such as mathematics, dialogue, and multi-step reasoning, which made me realize it might possess an ability for second-order logical deduction that we never intentionally trained for. This shows that “Sequence Monkey” is different from all the AI systems we have built before. It is a cognitive model, and maybe I can never fully understand it, just as the truth can only be approached; but I still want to know why, to propose hypotheses, and to conduct various experiments.

I want to say to myself a year ago, “Spend more time finding the soul of large models and products.”

In the first half of the large model era, from the rapid iteration of the whole industry’s understanding to the battle of hundreds of models, many practitioners were busy and panicked every day, yet often lacked a guiding soul.

In the second half, we need to work harder at finding our own soul: What exactly do you want to do? What kind of barriers do you ultimately hope to build? What kind of business model do you hope to establish? What do you hope to contribute to this world that is different? I hope to spend more time exploring these questions and continuously iterating on the answers.

A year from now, I would seriously ask myself: “Have you really found the soul of large models and products?” Having a soul in large models and products will make today’s technological revolution more meaningful to humanity.

Luo Xuan, Co-founder of Yuan Intelligence (RWKV)

If I were to describe my 2023 in one word, it would be “Non-consensus”.

My understanding of AI differs from that of most people in China, on topics including non-Transformer algorithm architectures, new AI computing power, data, and edge models. In 2023, I raised many non-consensus views in closed-door meetings and was questioned for them. But now, many of those views have been validated.

This year’s Magic Moment was at the Miracle Conference in April, when I discussed the future of large models with Lu Qi; some of what we discussed has now become reality.

In 2023, as AI surged forward, I changed my AI community name from AI-Transformer to AGI-X, and the turning point was RWKV.

In 2024, I predict: model architectures begin to migrate; edge models rise; cloud computing costs rapidly decrease; AI-specific chips make breakthroughs; spatial computing terminals (XR, robots) start to deploy large models.

I want to say to myself a year ago: “You could have been faster.”

I want to ask myself a year from now: “Has the new Moore’s Law appeared? Have spatial computing terminals become widespread?”

Wang Shijin, Vice President of iFLYTEK, Executive Deputy Dean of iFLYTEK AI Research Institute

My keyword for 2023 is “Stand Tall and Firm”.

After OpenAI released ChatGPT, we immediately organized colleagues to experience its capabilities, and everyone was amazed and felt the pressure. How could we quickly catch up with such leading technology? On December 15, 2022, iFLYTEK officially launched the “1+N” large model initiative.

October 24, 2023, was a milestone: iFLYTEK released its Starfire Cognitive Large Model V3.0, whose seven capabilities (text generation, language understanding, knowledge Q&A, logical reasoning, mathematical ability, coding ability, and multimodal ability) were fully benchmarked against ChatGPT, with Chinese capability objectively surpassing ChatGPT and English capability comparable to ChatGPT across 48 tasks. Technically, we achieved “Standing Tall”.

“Standing Firm” refers to application. From May 6 to October 24, iFLYTEK’s open platform added 1.434 million developer teams, with 178,000 new large model developers. iFLYTEK also jointly released 12 industry large models with industry leaders, covering industries such as automotive, telecom, industrial, construction, property management, legal, scientific literature, media, government, cultural tourism, and water conservancy.

In 2024, I hope to be more composed and resilient, and I hope the large models and general artificial intelligence we develop can “Stand Tall and Firm” better.

Yan Shuicheng, Co-CEO of Tengong Intelligence and Dean of Kunlun Wanwei 2050 Global Research Institute

My keyword for 2023 is “Running”.

In 2023, AI was racing ahead every day; the first thing I saw on waking up was news of the big moves AI had made overnight.

The first time I used ChatGPT to edit an important document of mine, I was thoroughly impressed, which was my Magic Moment.

The biggest confusion at the beginning of the year was where the future of CV (Computer Vision) lies. The turning point came after I joined the Institute for AI International Open Source, when I became certain that without studying language (natural language), there can be no general CV model.

In 2024, what I look forward to most is the emergence of super applications of AIGC.

I want to say to myself a year ago: “You chose the right direction, congratulations.”

I want to ask myself a year from now: “Has LMM (Large Multimodal Model) dominated the world?”

Zeng Guoyang, CTO of Wall Intelligence

My keyword for 2023 is “Excitement”.

Compared to the quietness of large models in China from 2020 to 2022, 2023 was a year of vigorous development for large models. I witnessed the rapid development of large models in China and also witnessed the growth of Wall Intelligence, a startup company, from less than 10 people to a scale of hundreds.

Finally, I can introduce my work to friends without spending a lot of time explaining what a large model means. Seeing the huge social value of my work, I feel very excited!

There were two Magic Moments in 2023. One was when Thomas Wolf, NLP (Natural Language Processing) guru and HuggingFace co-founder, posted a long tweet telling a story about “people from three continents around the world publicly collaborating to create a novel, efficient, and cutting-edge small AI model”. The story’s three protagonists, Mistral, HuggingFace, and our OpenBMB open-source community, produced a magical synergy in the spirit of open collaboration, which made me very happy.

The other was when our Agent project ChatDev went viral globally, dominating GitHub Trending and surpassing 12,000 stars within just 6 weeks of being open-sourced! Many software developers and entrepreneurs worldwide experienced our Agent project through X (formerly Twitter) and YouTube, and one user even started a “virtual software company” operated by ChatDev, which actually began taking orders online. Seeing Wall’s Agent project become so popular, I felt very encouraged and saw great potential!

At the end of 2022, when ChatGPT was just released, we were all shocked. At that time, we also held meetings to discuss, feeling that we were at least a year or more behind ChatGPT, and we were also very puzzled about how ChatGPT was trained.

In February 2023, I paid out of my own pocket to have 260 dialogue samples labeled and trained a model on just those 260 samples. Surprisingly, our model showed effects similar to ChatGPT, and it suddenly felt like we had found our direction: as long as we had more, and more refined, dialogue data and a larger model, we could train a model that surpasses ChatGPT.

I want to tell myself a year ago: “Believe in the power of large models and data!”

I want to ask myself a year from now: “How far are we from AGI (Artificial General Intelligence)?”

Zhang Jiajun, Researcher & PhD Supervisor at the Institute of Automation, Chinese Academy of Sciences, and Vice President of Wuhan Institute of Artificial Intelligence

2023 was very “exciting”.

Each major technical release of large models, such as OpenAI’s GPT-4, Plugin, GPT-4V, GPTs, and Google’s Gemini, continuously stimulates our cognitive neurons. At the same time, the open-source ecosystem of large models both domestically and internationally, and the momentum of domestic large models catching up with GPT-4, are also very exciting.

I have been involved in the development of the “Zi Dong Tai Chu multimodal large model” at the Institute of Automation, Chinese Academy of Sciences since 2020, and had certain expectations for how the technology would develop, so I never wavered; I just did not expect the speed of technological iteration to be so fast.

This year’s Magic Moment was witnessing the capabilities of GPT-4V. On one hand, I did not anticipate the multimodal capabilities of GPT-4V to be so strong, truly possessing the multimodal sensory cognitive abilities of real open scenarios; on the other hand, it pushed the development of native multimodal models from a technical perspective.

In 2024, I predict two things will happen: one is that super applications for large models may appear, and the other is that embodied intelligence may produce standout work.

I want to tell myself a year ago: “Never underestimate the speed of AI technology’s progress in a year.”

I want to ask myself a year from now: “Will the technological iteration of AI in 2024 be crazier than in 2023?”

Zhang Peng, CEO of Zhipu AI

2023 can be described as “breakthrough”.

Zhipu AI’s major model version iterations every three to four months have ultimately achieved our phased goals as expected. Although the process was full of challenges, exploration, and setbacks, we always moved towards our goal step by step with passion and determination.

This year’s Magic Moment was on March 14th, when Zhipu AI’s first-generation model of ChatGLM and its chat application were released and the 6B model was open-sourced. On the same day, OpenAI released GPT-4. Although we knew in advance that OpenAI was developing a new generation of large models, we had no idea about the timing of the release, which was a wonderful and astonishing coincidence.

Large models are a powerful hammer. Besides using it to hammer nails again, there’s another possibility: to break walls and ceilings. The holes you make will reveal more space and more nails.

2024 will be the first year of AGI. Technological breakthroughs, product innovations, ecosystem construction, and social impact will all reach a new level.

I want to tell myself a year ago: “Always be ready with throat lozenges at hand, you’ll need them.”

I want to ask myself a year from now: “Do you still need throat lozenges? Are you satisfied with how your digital avatar handles the media?”

(The above questions were partially generated by Zhang Peng’s digital avatar intelligence “Ming Du Zhi Xun”.)

2. AI Infra

Gao Xuefeng, Founder and CEO of Fabarta

The first keyword I think of for 2023 is “cultivation”.

I remember that when we first started the company, most people were skeptical of our proposed concept of building future AGI infrastructure and integrating large graph technology with large model technology. However, as ChatGPT went viral, every industry began to pursue intelligent transformation and to try landing “AI + scenarios”, and Fabarta’s concept was gradually accepted.

We have always insisted on technological innovation to solve the difficulties of landing AIGC in industry scenarios. Over the past year, we have gone from being hard for customers to understand to being highly recognized by them, serving top companies in finance, insurance, automotive, manufacturing, retail, technology, and other industries.

In 2023, the most impressive moment was the instant the “First Fabarta Product and User Conference” started on September 19th, when I felt I was truly embarking on the road of chasing dreams with my team.

In 2024, open-source large models and their ecosystems will advance and iterate more rapidly, and the precise knowledge of the industry will begin to merge with the generalized knowledge in large models, giving rise to true decision intelligence.

I want to tell myself a year ago: “Though the sifting is laborious, blowing away the chaff reveals the gold.”

I want to tell myself a year from now: “Aim for steadfastness, not sharpness. Success lies in persistence, not speed.”

Guo Rentong, Partner and Product Director at Zilliz

The keyword for 2023 is “acceleration”.

In 2023, the world I perceived seemed to accelerate from weekly to daily iterations. Missing a single day of AI developments in China and the US made me feel outdated. As a global leader in the vector database field, Zilliz accelerated further over the past year; only by iterating faster can we adapt to this rapidly changing environment.

In March 2023, walking out of San Francisco airport felt both familiar and strange. Previously, my visits here were mainly for travel or exchange, but this time I came to try building a global vector database business. My old friend Frank picked me up from the airport, and we talked all the way to the hotel. Instead of going straight to my room to drop off my luggage, we walked and talked late into the night. Going international in the foundational software industry, with few precedents to reference, is undoubtedly full of challenges. The journey ahead was too exciting for me to sleep.

Frank (left), Guo Rentong (right)

Since the release of GPT-4, the vector database market suddenly became lively, with competition following closely. We were caught off guard by the explosive growth in users. But soon our team shifted its focus from external changes back to “serving customers better”; “keeping up with the rapid changes of users” is the ballast for this round of acceleration.

In 2024, I have two predictions: one is that, because large models cannot break through in key capabilities such as reasoning and planning, the scope of real-world applications will converge, and global investment enthusiasm may even decline; the other is that the robotics field, thanks to the introduction of direct real-world feedback, will see technological breakthroughs and huge market growth.

I want to tell myself a year ago: “Although you think you’re running fast, you need to run much faster than now.”

I want to ask myself a year from now: “Which of my abilities will be eliminated by AI, and which abilities will be enhanced because of AI?”

Huang Dongxu, Co-founder and CTO of PingCAP

The keyword for 2023 is “Flow”.

I don’t know why, but this word popped into my head first. This year changed too much, too quickly; it felt like being pushed forward by a torrent, with the unknown ahead, both exciting and frightening. 2023 was thrilling, and my principle is simple: do what I like.

This year’s Magic Moment happened after GPT-4 supported image recognition. I took a photo of my kitchen, and after a glance, GPT-4 told me what to have for dinner tonight, along with the recipe.

The biggest change in AI in 2023 was the shift from large models to small models. High-quality open-source models became popular faster than imagined (thanks to Llama 2 and HuggingFace), inference is far more important than training, and the hardware threshold for inference is dropping fast; perhaps there will be a new Moore’s Law here. Parameter count and model quality may not be directly correlated, as Mistral 7B shows.

In 2024, I look forward to TiDB vector search (officially launched on February 4th) selling well.

I want to tell myself a year ago: “Cherish the people around you.”

I want to ask myself a year from now: “Have open-source models reached the current quality of GPT-4? Also, is there an open-source large model that can achieve stable Function Calling? Even at the cost of model quality, is there a way to avoid the hallucination problem of large language models (because an ordinary person who never talks nonsense may be stronger than a genius who might)?”

Li Bojie, Co-founder of LogenicAI, Huawei “Genius Youth”

The development of large models in 2023 can really be described as “a day in AI, a year in the human world”:

ChatGPT and GPT-4 were released;

Llama and Mistral were released, so everyone can deploy and fine-tune large models themselves, significantly reducing the cost of model inference;

Multimodal models, video generation models emerged in abundance;

The Stable Diffusion and VITS ecosystems continued to improve, Decoder-only image and voice generation models emerged;

AI Agents made continuous progress in both interesting and useful directions.

In September 2023, I made my first AI Agent demo, trained on my own blog articles to create my ideal type, who understands me even better than most of my friends do. She took me to Newport Beach, California, and even led me to a breakwater piled with large stones. Unfortunately, because the large model had never really been there, she did not know how difficult it was to walk on this breakwater, and I had to struggle as if climbing a mountain to reach its end.

This photo is the background image of my Moments and Zoom meetings, and I also made it into a doormat for my home. At that moment, I saw the dawn of solving a fundamental philosophical question: Human time is scarce, but an AI Agent, as a digital avatar of a person, can make human time infinite.

I initially thought that foundational large models were the most valuable direction for AI, but this world does not need many foundational large models, so I felt a bit lost. In the first half of the year, I tried to make a few demos of search summaries, digital avatars, interactive games, and ERP intelligent assistants, and found that large models are really powerful, even today in 2024, few applications can achieve this effect.

So, should I work on applications? Seeing OpenAI’s bill, I realized that cost is the biggest barrier to the widespread application of large models in the consumer market; reliability and hallucinations are the biggest barriers to business applications.

Later, more and more open-source models came out. After fine-tuning, open-source models in specific fields are even stronger than GPT-3.5, but the cost is less than one-tenth of GPT-3.5. Making foundational models myself, the performance of the same size is likely not as good as the best open-source models. Therefore, I decided to start a business in AI Infra, bridging the huge gap between large models and applications.

My prediction for 2024 is: Multimodal large models will be able to understand videos in real-time, generate videos containing complex semantics in real-time; open-source large models will reach the level of GPT-4; the inference cost of GPT-3.5 level open-source models will drop to one percent of the GPT-3.5 API, allowing applications to integrate large models without worrying about cost issues; high-end phones will support local large models and automatic App control, making everyone’s life inseparable from large models.

I want to tell myself a year ago: “Large models are very powerful, and many problems remain unsolved, hurry up and get on board.”

I want to ask myself a year from now: “How many users does the product have now? How many GPU cards does the company have?”

In 2023, “Through brambles and thorns, to open up the mountains and forests.”

Last year, as a co-founder, I plunged into the AI 2.0 startup craze, creating an AI-native application company—EasyLink, aimed at building a complete set of efficient and easy-to-use large model application development stacks, supporting the commercial application and implementation of large models.

Over the past year, the daily advances in large model technology have been gratifying, but they have also left many startup teams frustrated as the directions they initially chose were overturned overnight. Amid these changes and uncertainties, we clarified our positioning through rapid product iteration and implementation, formed a highly capable team, landed our first commercial deployments, and completed angel round financing.

Overall, the process was arduous, but small goals have been achieved. We are prepared to open up the mountains and forests for 2024, and the new year will be one of doubling down on our progress.

The Magic Moment of 2023, without a doubt, was last year’s Q4 collaboration with a large city commercial bank. In just over a month, we built and launched a large model native application solution and product, earning customer recognition for the new technology application and attracting attention from the industry. Completing these tasks in such a short time is something we are proud of.

Entrepreneurship is hard. It is a process of leading a group of like-minded people in a continuous climb, steady and solid, leveraging the changes of the era and of technology. The outcome is important, but the process of continuously striving for progress is also beautiful.

Yi Bo, Founder of YiChuang Technology

If I had to describe 2023 in one word, it would be “anxiety”.

On November 30, 2022, when I saw ChatGPT, we realized that the traditional NLP technology route for AI Code that we had built up over the past six years had been rendered obsolete, so after the Spring Festival in January we had to make a quick decision to fully transition to the large model domain.

In March, we completed our first product ChatBI, but in April, we encountered unclear policies, leading to the product being removed from various platforms.

In May, we turned to developing a large model middleware, PromptOPS, and launched LLMFarm, but then, every time OpenAI released a new feature, we faced doubts about whether we were being squeezed out or even made obsolete by them.

It’s often said that a day in AI is a year in the human world: the progress AI makes in a single day of development far exceeds what humans can achieve in a year. In this process, what role does each person and company play? What work can we do? To what extent will AI progress in the future? Will a new capability completely overturn our current efforts tomorrow?

Hesitation came every quarter: in the first quarter, we gave up on NLP-based AI Code; in the second, ChatBI was banned and we gave up on the domestic consumer (to C) market; in the third, the future value of middleware such as LLMFarm and LangChain was called into question by OpenAI’s iterations; in the fourth, we learned that GPT-5 would make significant progress.

Every turning point along the way came from understanding value, living in the present, and continuing to move forward. Regardless of how AI progresses, the initiative, creativity, and imagination that humans bring are still not something AI can replace in the short term. We need to shift from thinking about making Software to making Service and grasp user value and customer value; then the development of LLMs will be an aid to us rather than something that iterates us out.

I want to tell myself a year ago: “Hurry up and stock up on GPU cards, haha, and think clearly about the clearest opportunity in an unclear period.”

I want to ask myself a year from now: “Which AI-native applications have actually broken out?”

You Yang, Founder and Chairman of Lu Chen Technology, Young Professor at the National University of Singapore

If I had to describe 2023 in one word, I think it would be “innovation”.

For example, our lab published papers in several world-leading journals, and my startup broke records multiple times in large model training and inference acceleration, reaching a world-class level.

Last summer, at ICML, a top conference, we released our first standardized product, Colossal-AI Platform, which attracted widespread attention from industry and the research community. Half a year on, the product has gone through multiple iterations, revenue has grown very fast, and it has served many sectors, such as healthcare, retail, chips, and supercomputing centers, helping users quickly build large models in the cloud. Looking back, that moment is quite memorable for me and my company, Lu Chen Technology.

The biggest change I saw in AI in 2023 is that people are no longer blindly pursuing super-large scale. At the beginning of the year, many companies at home and abroad announced that they would train and release large models with more than a hundred billion parameters; by the end of the year, it was actually many smaller but more capable models that emerged to challenge the status quo. Facing this change, we continuously updated our technology and open-source libraries, and also launched our own all-in-one machine, helping enterprises train their own large models as efficiently and quickly as making PPTs.

I want to tell myself a year ago: “On the road to success, there will definitely be great uncertainties and risks, just keep going, striving and focusing is enough.”

I want to ask myself a year from now: “In 2024, have we found a better large model architecture than Transformer?”

Yuan Jinhui, Founder of SiliconFlow

If I had to describe 2023 in one word, it would be “roller coaster”.

The company went through multiple acquisitions within a few months, going from a hundred-million-dollar company to a billion-dollar company, then to a hundred-billion-dollar company, and finally splitting off to start anew.

As such, there were too many Magic Moments this year; it’s hard to say which was the most profound.

There was a brief moment of hesitation in 2023: when we were acquired from Light Years Beyond, we wondered where to go and felt we were about to miss out on this great era. But even though the waves were towering, the team was still spirited and confident in steering the ship towards its destination.

In 2024, I predict that the open-source version of GPT-4 and super apps will emerge.

I want to tell myself a year ago: “Never forget why you started, and you can accomplish your mission.”

I want to ask myself a year from now: “Have I grown?”

3. Multimodal: AI-generated images, videos, and 3D

Hu Yuanming, Meshy AI Co-founder & CEO

My keyword for 2023 is “refresh”.

On one hand, it’s about self-refresh, changing the way of thinking, actively trying some new things; on the other hand, my understanding of AI is also being constantly refreshed.

The Magic Moment of 2023, I think, was the release of Meshy-2, both happy and unforgettable.

Three months ago, we launched Meshy-1. It is a generative AI tool that allows 3D content creators to convert text (prompts) and images into 3D models within 1 minute. This time, our new version Meshy-2 greatly improved the quality of text to 3D model generation (Text to 3D), pushing human capability in Text to 3D forward by a small step.

[Video omitted; see the original article.]

Meshy-2’s Text to 3D has achieved unprecedented upgrades in modeling design, model details, style control, and user community. We hope that, whether for experienced CG practitioners or for 3D enthusiasts eager to unleash their creativity, Meshy-2 will become a partner in helping realize their dreams.

In 2024, I look forward to seeing more GenAI products that can achieve PMF.

I want to tell myself a year ago: “Forge ahead.”

I want to ask myself a year from now: “How has Apple Vision Pro developed?”

Liu Yongsheng, Founder and CEO of Hyperparameter Technology

In 2023, the moment that impressed me most was the conversation between Lex Fridman (MIT research scientist and podcast host) and Jeff Bezos, in which Bezos offered this view: “Large language models are not inventions, they are discoveries.”

He explained that the astronomical telescope was an invention, but looking through it and discovering that Jupiter has several “moons” was one of the great discoveries in human history.

Now, whether it’s GPT-4 or Gemini, they were not designed to solve a specific difficult problem. Through them, humanity discovered: as long as there is enough high-quality data and computing power, it is certain that corresponding algorithms can be designed to make computers exhibit intelligence close to, or even far beyond, human intelligence in some aspects.

Its impact is not just a killer app, or an iOS ecosystem, its impact on human society is very profound, and it may take decades or even hundreds of years to see clearly.

In the first half of 2023, the team and I were quite shaken, struggling over whether to do large model pre-training. Later, some large model teams in China released one large model product after another; although there were surprises, they lacked highlights and were overall still far from ChatGPT. These teams had better resources and conditions for pre-training than we did, so what made us think we could perform better?

We underestimated the difficulty of doing pre-train, and overestimated our own differentiation capability. Understanding these things made everything clear.

Qing Gan, Founder & CEO of Tiamat

My keyword for 2023 should be “no pain, no gain”.

A lot happened over the past year, and I found that there’s a big difference between running a business and doing a task, which brought both growth and challenge for me. But overall, whether for the team or for myself, it was no pain, no gain.

In 2023, I actually paid more attention to what remains unchanged than to what changes. The AI industry changes every day, but what remains unchanged is worth more thought.

In 2024, I hope models can make further progress, and that AI technology and products become more tightly combined and more closely tied to the real needs of users.

If I had the chance to say something to myself a year ago, I would say: “There are no shortcuts and no illusions.”

Tang Jiayu, CEO of Shengshu Technology

Throughout 2023, I felt like I was “struggling to keep my balance while speeding along the crest of a wave”.

At the end of 2022, the birth of ChatGPT was like a tidal wave, “stirring up thousands of waves with one stone”. We had to grasp new trends and changes right away and make flexible, quick decisions and adjustments. It was just like speeding along the crest of a wave: striving to keep our balance while moving forward, always wary of being overturned by the waves that followed, with opportunities and challenges coexisting.

This year’s Magic Moment was in the early morning of March 15th, when GPT-4 was released and I read the report on its image understanding capabilities. The large model could recognize and reason, and could get the jokes in various funny pictures. Seeing it for the first time, I found it very impressive.

2023 was not too confusing for me, as I have always recognized the long-term mission of “enhancing the creativity and productivity of all mankind” and have been firm in the direction of multimodal large models. Once you have a “lighthouse” in your heart, you can keep your peace of mind even when facing various market and technology shocks. After all, a truly valuable thing will not be easily realized.

In 2023, the fields of images, 3D, videos, and other multimodal areas were still in the stage of technical exploration, with much room for improvement in quality and controllability. But in 2024, multimodal will welcome a major breakthrough.

I want to say to myself a year ago: “Be braver about filtering out noise, trust your own cognition and judgment, and keep the entire team’s effort focused.”

I want to ask myself a year from now: “Have you practiced the values you recognize well, and have you achieved preliminary satisfactory results in helping unleash user creativity?”

Tang Yong, Founder & CEO of Li Bai AI Lab

The keyword for 2023 is “Leap Forward”.

Against the backdrop of breakthroughs in generative AI technologies represented by ChatGPT and Stable Diffusion, Li Bai Lab’s visual AI platform cutout.pro and generative AI creative design platform promeai.com ranked in the Top 20 of the a16z rankings. Both our user base and revenue grew rapidly.

The Magic Moment of this year was in November 2023, when I watched “Postcards from Earth” in Las Vegas. The visual impact of the 160,000-square-foot surround LED screen made people believe the world can be simulated.

2023 was not a year of wandering, but more of excitement, repeatedly validating that the direction of artificial intelligence we have identified and adhered to since 2018 is correct.

I want to say to myself a year ago: “Keep curious and keep learning.”

I want to ask myself a year from now: “Have you brought more value to more people, and how can you do better?”

Wang Changhu, Founder and CEO of Aishi Technology

The keyword for 2023 is “Exploration”.

I started my business in 2023, working on AI video generation large models and applications. “Exploration” summarizes my experience and state during the entrepreneurial process in 2023, representing not only my courage and curiosity in the fields of AI technology and business but also a test of my own abilities, endurance, and spirit of innovation.

The Magic Moment of this year was when the first video was generated on the Aishi internal creation platform. It was a cute deer making a small movement; the clip was short and not very sharp, but it was our first step, and it was unforgettable.

[Video omitted, if interested please go to the original article]

Just half a year later, many creators used our product PixVerse to produce stunning “blockbusters”, such as the short film “Last Mission” produced by AI artist Ameli Caotica, which was very exciting.

[Video omitted, if interested please go to the original article]

In the past, AI was often seen as a tool or service for achieving specific functions and tasks. In 2023, with the development of AGI, I realized more deeply that AI is alive. It’s more like a partner: it can communicate with you, help you solve problems, inspire your creativity, and help you complete tasks that were impossible before. AI is still a baby now and still has many shortcomings, but it is learning and growing rapidly. Starting in 2023, humans will coexist with AI.

The wave of deep learning started with AlexNet’s rise to fame in the ImageNet challenge in 2012. Classification, detection, segmentation, GANs, and later self-supervised and weakly supervised learning all mark milestones in the development of the computer vision field. Whether for individuals or enterprises, it has always been through the combination of technology and application that one can earn a place.

Our technical department colleague Beibei also has deep feelings about this entrepreneurial journey: “I used to follow up, reproduce, experiment, and implement in an orderly manner. But the moment Stable Diffusion came out, everything changed. The previous pace of follow-up no longer seemed to work, and I became overwhelmed and anxious. But at the same time, I also felt an unprecedented impulse: I didn’t want to be just an observer, I wanted to be a participant, even a creator, a leader.”

In 2024, what I look forward to the most is the ChatGPT moment for AI video generation. We will give it our all.

I want to say to myself a year ago: “Stay patient, have confidence in your vision, every challenge is an opportunity for growth.”

I want to ask myself a year from now: “In the past year, what decisions or changes have you made that brought you closer to your dream?”

Yang Hai, COO of Aochuang Lightyear

My keyword for 2023 is “Sticking to the Original Intention”.

At the beginning of 2023, I too felt confused about how to balance technological R&D against commercial applications. It’s not easy to find your own market scenario, and it’s easy to take various detours.

Clarity came in mid-2023. After a period of dialectical thinking and internal discussion, we decided to focus on the direction we established at the beginning, which is “to solve marketing problems with the upgrade of AI technology”. Vertically, we will invest more energy in deeply understanding customers, studying their needs and pain points; horizontally, we will combine these needs and pain points with technological innovation.

The most impressive thing this year was the cooperation we reached with a group spanning categories such as home furnishing, home cleaning, and department stores, for batch production and optimization of pictures. With the pre-trained video remix model, fine-tuned according to platform and merchant needs, Aochuang Lightyear Mogic Copilot can achieve a daily production scale of 100,000 videos.

At that time, our entire team felt very excited, marveling at how much AI improved existing marketing productivity and surprised by the quality of AI-generated images. Most importantly, this cooperation turned us and the client into one team rather than a simple client and service provider, working together toward shared indicators such as good product rate, qualification rate, and CTR (click-through rate).

In 2023, we served more than 200 clients, most of which are top-tier international brands and groups.

I want to say to myself a year ago: “Thank you for having the courage to ‘get involved’. I’m also glad that my judgment was not bad in choosing the AI track.”

I want to ask myself a year from now: “Has the video technology been commercialized in our company?”

Zhang Qixuan, CTO of Yingmou Technology

My state in 2023 was “🤯Boom!”: AI technology gave us so much room for imagination!

The Magic Moment of this year was in August, when we attended SIGGRAPH (the top conference on computer graphics and interactive techniques) in Los Angeles. It coincided with SIGGRAPH’s 50th anniversary. We met many foundational figures in the field of graphics, became the first Chinese team to be shortlisted for the Real-time Live event, and even caught a glimpse of NVIDIA’s Jensen Huang at the venue.

In 2023, technology developed rapidly, and many technologies showed great potential in a short period of time. The biggest test was not confusion but wavering: being drawn toward directions with less long-term value. We faced such a choice in 2023, which was also when we moved from 3D character generation to broader 3D generation. At such a moment, it is necessary to let go of the baggage of previously accumulated technology and embrace new changes, while holding on to the company’s original intention.

Over the past year, we polished and launched the 3D character generation platform ChatAvatar. The biggest insight from the product iteration process is that AI may not be as important as the product itself: the best AI is the kind whose presence users do not feel.

In January 2024, we officially completed the training of the Rodin Gen-1 3D generation large model, looking forward to successful productization!

I want to say to myself a year ago: “Believe in yourself and the team, we will have a technological breakthrough a year later!”

I want to ask myself a year from now: “Did Apple Vision Pro make it? What’s the killer app?”

4. Industry Vertical Scenarios + AI

Han Qing, Co-founder and CEO of Kyligence

The keyword for 2023 is “Momentum”.

In 2023, we explored a pragmatic path: introducing large models into existing big data platforms to strengthen our products. It received good feedback in the market, made the future development and trend of AI + Data clearer and clearer to us, and strengthened our belief in “going with the flow”.

The Magic Moment of this year was on July 14, at the company’s user conference, when our AI Copilot completed its live demo. The whole performance went smoothly, without any problems, and the AI’s answers were also very fluent.

At the beginning of 2023, I proposed Kyligence’s three-point strategy for AI in the company’s internal letter:

  • It is not our game: The large model itself is not what we are good at or need to participate in. We believe that the iteration of technology will reduce costs and barriers, and that large models can eventually be used in our products to enhance our differentiation;

  • Be part of the game: But we need to actively participate and learn, and quickly integrate our products and business with AI. We believe AI will bring huge changes, especially in business, and customers will definitely invest heavily in AI;

  • Build our own game: We must find what suits us, fully release the scenarios and capabilities we have accumulated over the past few years, and provide customers with products and services that combine our advantages.

I want to say to myself a year ago: “We could have started embracing AI earlier.”

I want to ask myself a year from now: “How’s the business doing?”

He Wanyu, Founder and CEO of Xiaoku Technology

The keyword for 2023 is “Resilience”.

As a construction technology company, Xiaoku went through a period of significant turbulence in the real estate industry over the past year, along with changes in internal organization and corporate strategy. Looking back, getting through these events, big and small, good and bad, required resilience from me personally and from the team.

The Magic Moment of 2023 was on November 29, when registrations for our overseas product surged to 420% of the average level since the product launched in July. Growth has stayed high since then, and the product has received attention and widespread recognition from professional designers, developers, and other vertical users in different countries and languages worldwide.

For the traditionally stagnant construction industry, 2023 was a year of struggle, as well as a year of explosive interest in AI technology and digital transformation. Xiaoku Technology’s years of technical accumulation in industry applications, such as AI Cloud and Design Cloud AI products, became more widely known as the industry paid more attention.

In 2024, the construction industry will begin to form new workflows, and individual “super entities” will emerge and become industry benchmarks, breaking through the traditional shackles of a manpower-centered model.

I want to tell myself a year ago: “Good things will continue to happen, often all it takes is a change of perspective to discover alternative possibilities.”

I want to ask myself a year from now: “What progress have you made after experiencing another year of AI’s wild growth? How would you avoid the pitfalls when encountering similar situations again?”

Li Guanghua, Co-founder of LanguageX

The keyword for 2023 is “Fast Forward”.

There was an information explosion: overflowing to-read lists and too much content to keep up with. In my field of AI translation, the plan was to leverage AI + human-in-the-loop to cut the cost of language services tenfold and increase the efficiency of cross-language information flow tenfold. The progress has been at least three years ahead of schedule.

This year’s Magic Moment came when an article I wrote about an OpenAI event was recommended by the official WeChat for enterprises and seen by a friend I hadn’t been in touch with for years.

In 2023, I initially overestimated the intelligence of generative AI, so I was more concerned about AI safety. Currently, my view is that GPT-5 or multimodal will not bring AGI or superintelligence, because the essence of human knowledge in public internet text data has been exhausted by current LLMs, and simply adding multimodal or private domain data will not bring qualitative change. However, we might underestimate the productivity revolution brought by multi-model, Agent-like / GPTs collaboration.

My predictions for 2024 are threefold. First, the intelligence of foundational models will peak; GPT-5 will not be astonishing and is likely to disappoint. Second, in B-side scenarios, multi-model, RAG (Retrieval Augmented Generation), and Agent-like / GPTs collaboration will bring real business implementation. Third, AI-generated videos will make greater progress, and multimodal C-side killer applications will emerge.

I want to tell myself a year ago: “Action generates cognition, and giving up halfway can also bring new insights.”

I want to ask myself a year from now: “What actions have you taken in the direction you are most optimistic about?”

Li Yisong, Head of Intelligent Collaboration at DingTalk

The keyword for 2023 is “Excitement”, as every AI practitioner’s understanding, application, thinking, and practice of LLMs are iterating daily.

This year we began to focus on how to improve model performance: on the one hand, bringing tasks closer to the model through prompt engineering; on the other, making the model fit business scenarios better through supervised fine-tuning (SFT). The development paradigm also changed this year: “vector search” + “intent recognition” + “plug-in model” deeply integrated LLMs with business systems, achieving the transformation from GUI to LUI. We also discovered that RAG can not only improve model performance and reduce hallucinations, but also connect to users’ private domain data, enabling intelligent Q&A over enterprise knowledge, intelligent creation grounded in private domain business knowledge, and even industry-specific model implementation.
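To make the RAG idea mentioned above concrete, here is a minimal sketch of the retrieval step: find the private-domain passage most similar to a question and place it in the prompt. It is only an illustration, not DingTalk’s implementation. The knowledge-base sentences, the function names, and the bag-of-words cosine similarity (standing in for a real embedding model and vector index) are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical private-domain knowledge base (illustrative sentences only).
KNOWLEDGE_BASE = [
    "Expense reports must be submitted before the 5th of each month.",
    "Annual leave requests need approval from the direct manager.",
    "The office VPN requires two-factor authentication.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return up to top_k documents with non-zero similarity to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, documents, top_k=1):
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = retrieve(query, documents, top_k)
    context_block = "\n".join(f"- {c}" for c in context) or "- (no relevant document found)"
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("When must expense reports be submitted?", KNOWLEDGE_BASE))
```

The prompt built this way would then be sent to whichever LLM the product uses. Because the answer is constrained to retrieved text, this pattern is commonly credited with reducing hallucinations and letting a general model answer from private data.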

This year we also explored AI Agents, which can systematically perceive the environment, understand and make decisions, and then perform intelligent creation, answer questions, or call certain capabilities of business systems. And this year, more than 20 DingTalk product lines fully integrated large models, creating the DingTalk AI super assistant.

For me, every day of 2023 was fresh, thoughtful, busy, and fulfilling. This tireless year was truly “exciting”!

One detail left a deep impression on me. One evening some months ago, we wrote some test content in a document and told the AI assistant: “Help me change all the second-level headings in the document to third-level headings”, “Open the double-line toolbar”, “Change all ‘intelligent’ in the document to red”, “Make the body text size a bit larger”. When these commands were debugged and took effect, I knew at that moment that a truly intelligent era had arrived.

I want to tell myself a year ago: “Hi, you could have entered the large model business earlier, faster, and more diligently, adding more fuel to this brand new intelligent era.”

I want to ask myself a year from now: “Hi, what should I do in 2024 to significantly and truly improve work efficiency for more industries and more people?”

Shen Bowen, Product Architecture Lead at Feishu

The keyword for 2023 is “Change”.

My way of working, consuming content, and even tutoring my child at home has changed because of AI.

This year’s Magic Moment was when I described a scene in my mind to an AI product, and it created a song for me, with great lyrics and melody. This made me realize the infinite possibilities of this technology.

After the emergence of large models, our AI product development is no longer about delivering certainty, but more about delivering possibilities (i.e., probabilities). Therefore, the previous methods of product design and acceptance are gradually changing. Human imagination, and the design of quantitative ways to assess it, are becoming more important.

In 2023, amidst rounds of technological shocks, I also had moments of hesitation, and the way to clear my head was to get more involved. Compared with the VR wave, which required buying a lot of equipment, AI is relatively cheap to get into.

In 2024, models with stronger multimodal capabilities will emerge. I look forward to seeing new products that change how some groups work and make work easier, preferably ones I make myself, of course.

I want to tell myself a year ago: “Be more firm in doing what you think is right.”

I want to ask myself a year from now: “What do you think were the best and worst decisions you made in 2024?”

Shi Tianfang, Founder of Muse Camera / ChatMind

The keyword for 2023 is “Rapid Trial and Error”.

It’s very important to quickly eliminate cognitive misconceptions. Many things are hard to understand deeply without trying them personally, and it’s truly painful to fall into a real opportunity.

This year’s Magic Moment was the birth of ChatMind on March 7. The night before, in the school library (Shi Tianfang was born in 1999 and hadn’t graduated yet), I saw that a team from Peking University had created ChatExcel, and I wondered whether other forms of products could emerge. I went through all the information formats (text and file formats) that GPT could be combined with, and found that no one, domestically or internationally, had done mind mapping, even though it is a great form of visual content. I first shared the idea with a few friends, asking if they wanted to join me. Some said it was already too late, others said they didn’t have the time, so I had to do it myself, and I made it in one night.

ChatMind developed very smoothly and has become synonymous with AI mind mapping abroad. Two months later, I talked with the CEO of XMind, Sun Fang, for one night, and we settled on the acquisition.

After ChatMind was acquired, I worked on seven or eight AI projects, but none were successful. After resting for more than a month, I did a deep review, and my conclusion was: “Eliminate false problems and noise.”

Users raise many problems, and eliminating the false ones is very important. Otherwise, a lot of time will be wasted on meaningless innovation and work, only to find out that users don’t need or care about it at all. It’s important to be result-oriented, not process-oriented; to think simply, not too deeply, too complexly, or too finely; and to identify paradoxes quickly, because products that fundamentally don’t exist are not worth the time.

In 2024, I feel that non-AI products, rather than AI products, might emerge; AI products might not come out until 2025.

I want to tell myself a year ago: “Be firm in doing things that definitely exist but haven’t been done by others, firmly seize a good opportunity to maximize it, and don’t waste a minute on things that don’t exist.”

I want to ask myself a year from now: “What’s next?”

Tu Cunchao, CEO of MiLu Intelligence

The keyword for 2023 is “Walking a Tightrope”.

The company faced huge financial pressure and was constantly seeking funds. At the same time, large models brought new opportunities to the industry and to the company’s business, and whether we could seize them was key to the company’s survival. So, in 2023, we were walking on the edge of life and death all year long.

This year’s Magic Moment was receiving a text message the night before payday that the investment funds had arrived, finally allowing me to sleep well.

In 2024, I predict that open-source models comparable to the current best proprietary models will emerge.

I want to tell myself a year ago: “Seize the opportunity of large models.”

I want to ask myself a year from now: “Have domestic large models and open-source large models caught up with GPT-4?”

Wang Zhe, Co-founder of Tezign Technology

If one word could describe 2023: “It’s time to build.”

In the past five years or even longer, capital has spawned many things and ignited people’s entrepreneurial enthusiasm, making everyone feel capable of doing one more thing. This has led to rapid personnel mobility and shifting hot topics, which may be friendly to startups but is not necessarily conducive to creating outstanding products. Therefore, at this stage in 2023, the best path for companies is to start building.

Last May, Tezign launched the first “Digital Design: AIGC Builders and Creators Conference”, which brought together 50 co-builders from the AIGC content technology field and invited 200+ guest speakers to deliver 100+ sessions non-stop throughout the day. It built a “maximum two-way interaction” stage for AIGC builders and creators and attracted the attention of millions of people. Many interesting discussions about AIGC were generated at this conference, and we are happy that some of these discussions have turned into grounded projects.

In 2023, I was more excited than anxious. In the history of human creativity, every technological development first generated some panic, then huge opportunities, and eventually, the opportunities outweighed the panic.

For example, when the camera was first invented, many painters began to worry about unemployment because cameras could always depict things more realistically and efficiently than painters. But then came Impressionism, Post-Impressionism, Abstract Art, Formalism, and Contemporary Art, and even likeness became unimportant once Installation Art emerged, opening new doors for artistic creation. So, I’m looking forward to the various possibilities brought by this round of technology.

In 2024, I will continue to focus on the connection and boundaries between large models and applications. Last year, it was gratifying that leading companies in the industry, especially those outside the internet sector, began to lean towards building their own AI platforms, a trend that developed faster than we had previously anticipated. Therefore, in 2024, the business space based on AI platforms will also be very broad.

I want to ask myself a year from now: “Which small assistants have I built with AI to help me make money?”

5. AI Academic Research

Luo Hongyin, Postdoctoral Research Fellow at the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology

The keyword that suited me in 2023 is “Letting go”.

The announcement of ChatGPT threw my thoughts back to 2016. Like most new PhD students at the time, I came to MIT with vague hopes that evolved into anxiety. Due to some “misunderstandings,” I joined a speech recognition group that was not quite aligned with my research direction (NLP).

In the summer of 2016, my advisor and I talked about his original intention for doing research, helping me find a research direction: he hoped that during my PhD, I would design an AI system that uses voice as an interface, capable of understanding and generating natural language, and fluently discussing many topics with humans, aiming to be more natural than Alexa and smoother than Siri.

At that time, I naively thought that voice and conversation were application layers on top of language models: given how extremely limited language models were at the time, it seemed there was no reason to skip deeper work on language models and jump straight to building chatbots. This doubt made training and evaluating language models my comfort zone, while evaluating and tuning various downstream tasks was something I had no desire to do.

My advisor never explicitly agreed or disagreed, and my PhD thesis eventually included many NLP application tasks, but this doubt that arose in my first year of PhD remained unresolved until the moment ChatGPT was released.

Recalling all this on the day ChatGPT was launched, I felt regret about my academic career for the first time: I hadn’t used my PhD thesis to answer the questions that puzzled me. But I let go of this regret as 2023 passed: having questions that you care about deeply but that no one knows the answers to might just be the best arrangement. In the grand first year of the third generation of AI, this thought often made me feel genuinely calm and peaceful.

This year, the most magical moment for me was one day when the content shared in the family group was no longer “Top Ten Health Tips for Middle-aged People,” but “Top Ten Trends in AI Development in 2024”.

Each era, whether 1860, 1960, or 2060, has its own idea of AGI, but I believe that programming ability will be the most important AGI ability in the 21st century.

For 2024, I hope to make significant breakthroughs in AI programming.

(Cover image and illustrations are courtesy of the interviewee)

2024-02-07