Bojie Li
2024-04-22
A month ago, the interview video from Zhihu's "New Figures" series was finally released. It was my first time doing an interview that touched on my personal life, and it was definitely not company PR: our company's name and products were never mentioned in the entire session, and few people even know the company's real name.
It seems that Zhihu still maintains journalistic integrity, as they did not let me view the video before publishing it; all editing, titles, and voice-overs were done by the Zhihu editors.
(04:16, 215 MB)
Video shooting locations:
- Beijing office
- Home (interview, cooking with my wife, and some photos)
- Shucun Suburban Park (a place where I often run; the electric flying butterfly I built in 2017 got caught in a tree during the shoot, and our very capable photographer climbed the tree to retrieve it)
2024-04-17
Produced by | Sohu Technology
Author | Liang Changjun
“Every day from 9 AM to 3 PM, I'm in meetings with the overseas team for remote development, internal testing, or bug fixing.” Entrepreneur Li Bojie, who is about to launch an AI product overseas, has been exceptionally busy recently.
This is a consumer-facing (C-end) AI evaluation product that recommends different AI models and products to users. He hopes to make it the “TikTok of the large model era.”
More than a year ago, when Li Bojie decided to leave Huawei to start his own business, he aimed to enter the overseas market. At that time, domestic large models were still in the stage of fierce technical competition, but now more and more companies are choosing the same direction as him.
Whether it’s ByteDance, Baidu, Alibaba, or large model unicorns like MiniMax, Dark Side of the Moon, and Zero One Everything, they are all accelerating their overseas expansion to tap into the global market.
Many companies are quietly making a fortune. Sohu Technology has learned that several overseas products have achieved rapid growth in users and revenue, and some have even started to become profitable. Some products have seen a surge in traffic with AI support, and are expected to achieve profits of 70 to 80 million yuan this year.
In the mobile internet era, Chinese companies went overseas and created TikTok. Now everyone is trying to create the TikTok of the AI era. This is a huge opportunity, but also full of challenges.
2024-04-15
(This article was first published as my Zhihu answer to “How to develop research taste in the field of computer systems?”)
In the blink of an eye, it’s been nearly 10 years since I graduated from USTC. Yesterday, while discussing with my wife the recent developments of our classmates in the USTC systems circle, I realized that research taste is the most critical factor in determining academic outcomes. The second key factor is hands-on ability.
What is research taste? I believe that research taste is about identifying influential future research directions and topics.
Many students are technically strong, meaning they have strong hands-on skills and system implementation abilities, but still fail to produce influential research outcomes. The main reason is poor research taste, choosing research directions that either merely chase trends without original thought or are too niche to attract attention.
PhD Students’ Research Taste Depends on Their Advisors
I believe that research taste initially depends heavily on the advisor, and later on one’s own vision.
2024-04-14
(This article was first published as my Zhihu answer to “What are the current benchmarks for evaluating large language models?”)
We must praise our co-founder @SIY.Z for Chatbot Arena!
Chatbot Arena is a community-based evaluation benchmark for large models. Since its launch a year ago, Chatbot Arena has received over 650,000 valid user votes.
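Each vote is a pairwise comparison between two anonymous models, and the leaderboard is computed from these comparisons with a rating model (the live leaderboard fits a Bradley-Terry model offline; an Elo-style online update conveys the same idea). Below is a minimal sketch with hypothetical model names and starting scores, not the actual Chatbot Arena code:

```python
# Minimal Elo-style update from pairwise votes (illustrative sketch only;
# the real Chatbot Arena leaderboard fits a Bradley-Terry model offline).
K = 4  # rating change scale per vote
ratings = {"model_a": 1000.0, "model_b": 1000.0}  # hypothetical starting scores

def expected_win(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def record_vote(winner: str, loser: str) -> None:
    """Update both ratings after one user vote."""
    e = expected_win(ratings[winner], ratings[loser])
    ratings[winner] += K * (1.0 - e)
    ratings[loser] -= K * (1.0 - e)

record_vote("model_a", "model_b")  # one vote: model_a preferred
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```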
Chatbot Arena Witnesses the Rapid Evolution of Large Models
In the past month, we have witnessed several very interesting events on Chatbot Arena:
- Anthropic released Claude 3, with its large Opus model surpassing GPT-4-Turbo and its medium Sonnet and small Haiku models matching GPT-4. This marks the first time a company other than OpenAI has taken the top spot on the leaderboard. Anthropic's valuation has reached $20B, closing in on OpenAI's $80B; OpenAI should feel at least a little threatened.
- Cohere released the strongest open-source model to date, Command R+: a 104B model that matches GPT-4, though it still trails GPT-4-Turbo. Earlier this year, in an interview with Jiazi Guangnian (“AI One Day, Human World One Year: My 2023 with AI | Jiazi Guangnian”), I laid out four major trends for large models in 2024: multimodal large models capable of understanding video in real time and generating videos with complex semantics; open-source large models reaching GPT-4 level; the inference cost of GPT-3.5-level open-source models dropping to one percent of the GPT-3.5 API, making it cost-effective to integrate large models into applications; and high-end smartphones running local large models and operating apps automatically, making everyone's life dependent on large models. The first has already come true with Sora, and the second with Command R+. I also stand by what I said then: if a company whose main business is foundation models cannot train a GPT-4-level model in 2024, it might as well stop, because it would only burn a lot of computing power without even matching the open-source models.
- Tongyi Qianwen released a 32B open-source model, almost reaching the top 10, performing well in both Chinese and English. The cost-effectiveness of the 32B model is still very strong.
- After being surpassed by Anthropic's Claude Opus, OpenAI naturally refused to be outdone and immediately released GPT-4-Turbo-2024-04-09, reclaiming the top spot on the leaderboard. However, OpenAI has been slow to release GPT-4.5 or GPT-5, and the much-anticipated multimodal model has not yet appeared, which is somewhat disappointing.
2024-04-07
This video is an interview with me by the Bilibili uploader “Apple Bubbles”, original video link
The entire interview lasted half an hour, recorded in one take, with no edits except for the intro added by the uploader, and no prepared answers to the questions.
(27:07, 136 MB)
2024-03-29
(The full text is about 40,000 words, based mainly on a 2-hour talk at the USTC Alumni AI Salon on December 21, 2023; it is a technically extended version of the 15-minute talk at the Zhihu AI Pioneers Salon on January 6, 2024, organized and expanded by the author.)
- Should AI Agents Be More Entertaining or More Useful: Slides PDF
- Should AI Agents Be More Entertaining or More Useful: Slides PPTX
I am honored to share some of my thoughts on AI Agents at the USTC Alumni AI Salon. I am Li Bojie, from the 2010 Science Experimental Class, and I pursued a joint PhD at USTC and Microsoft Research Asia from 2014 to 2019. From 2019 to 2023, I was part of the first cohort of Huawei’s Genius Youth. Today, I am working on AI Agent startups with a group of USTC alumni.
Today is the seventh day since the passing of Professor Tang Xiao'ou, so I deliberately set today's slides on a black background, the first time I have used a black background for a presentation. I also hope that as AI technology develops, everyone will one day have their own digital avatar, achieving a kind of eternal life in the digital world, where life is no longer limited and there is no more sorrow of parting.
AI: Entertaining and Useful
The development of AI has always had two directions, one is entertaining AI, which is more human-like, and the other is useful AI, which is more tool-like.
Should AI be more like a human or more like a tool? There is actually a lot of controversy here. For example, Sam Altman, CEO of OpenAI, has said that AI should be a tool, not a life form. Yet many sci-fi movies depict human-like AI, such as Samantha in Her, Tu Yaya in The Wandering Earth 2, and Ash in Black Mirror, and we hope to bring these sci-fi scenarios into reality. Only a few sci-fi movies feature tool-like AI, such as Jarvis in Iron Man.
Besides the horizontal dimension of entertaining versus useful, there is a vertical dimension: fast thinking versus slow thinking. The concept comes from the book “Thinking, Fast and Slow,” which describes human thinking as divided into a fast mode and a slow mode.
Fast thinking covers abilities that require no deliberate thought, such as basic visual and auditory perception and expressive abilities like speaking. ChatGPT and Stable Diffusion are tool-like, fast-thinking AIs: they respond to specific questions and do not initiate interaction unless prompted. Character AI, Inflection Pi, and Talkie (Hoshino), on the other hand, simulate conversations with a person or an anime/game character, but these conversations do not involve solving complex tasks and lack long-term memory, so they are only suitable for casual chat and cannot help solve problems in life and work the way Samantha does in Her.
Slow thinking refers to stateful, complex thinking: planning and solving complex problems, deciding what to do first and what to do next. For example, MetaGPT writes code by simulating the division of labor in a software development team, and AutoGPT breaks a complex task into many stages to complete step by step. Although these systems still have many practical issues, they already represent a nascent form of slow-thinking capability.
Unfortunately, there are almost no products in the first quadrant that combine slow thinking with human-like attributes. Stanford AI Town is a notable academic attempt, but there is no real human interaction in Stanford AI Town, and the AI Agent’s daily schedule is pre-arranged, so it is not very interesting.
Interestingly, most of the AI in sci-fi movies actually falls into this first quadrant, so this is the current gap between AI Agents and the human dream. What we are doing is therefore exactly the opposite of what Sam Altman said: we hope to make AI more human-like while also capable of slow thinking, eventually evolving into a digital life form.
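To make the slow-thinking idea above concrete, here is a minimal plan-then-execute loop in the spirit of AutoGPT-style task decomposition. It is only a sketch: the prompts, the `gpt-4` model name, and the single-pass structure are assumptions for illustration, not our product's implementation.

```python
# Sketch of a slow-thinking agent: ask the model for a plan, then execute
# each step with a separate call. Assumes the OpenAI Python SDK (v1 style)
# and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",  # assumed model name for illustration
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def solve(task: str) -> str:
    # Slow thinking, phase 1: decompose the task into ordered steps.
    plan = ask(f"Break this task into short numbered steps:\n{task}")
    steps = [s for s in plan.splitlines() if s.strip()]
    # Slow thinking, phase 2: execute steps one by one, carrying state forward.
    notes = ""
    for step in steps:
        notes += ask(f"Task: {task}\nProgress so far:\n{notes}\nNow do: {step}") + "\n"
    return ask(f"Summarize the result of the task '{task}':\n{notes}")

print(solve("Plan a three-day USTC alumni meetup in Beijing"))
```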
2024-02-25
Since December 2023, I have been working as a corporate mentor in collaboration with Professor Junming Liu from USTC on an AI Agent practical project, with about 80 students from across the country participating. Most of them are undergraduates with only basic programming skills, along with some doctoral and master’s students with a foundation in AI.
In December 2023 and January 2024, we held six group meetings covering the basics of AI Agents, how to use the OpenAI API, and the practical project itself, and answered questions students ran into during their practice. The practical project tracks include:
- Corporate ERP Assistant
- Werewolf
- Intelligent Data Collection
- Mobile Voice Assistant
- Meeting Assistant
- Old Friends Reunion
- Undercover
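For reference, the OpenAI API usage covered in the group meetings boils down to a call like the one below. This is a generic minimal sketch (the model name and prompts are placeholders), not any student's actual project code:

```python
# Minimal chat completion call with the OpenAI Python SDK (v1 style).
# Assumes `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are an ERP assistant for a small company."},
        {"role": "user", "content": "Summarize this month's purchase orders."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```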
From February 20-24, some students participating in this research project gathered in Beijing for a hackathon and presented the interim results of their projects. Participants generally felt the power of large models and were surprised that such complex functions could be achieved with just a few hundred lines of code. Below are some of the project outcomes:
2024-02-22
Recently, Groq’s inference chips have made headlines with their large model output speed of 500 tokens/s.
In a nutshell, the chip trades space for time: it keeps both the model weights and the intermediate data in SRAM rather than in HBM or DRAM.
This is something I did 8 years ago at Microsoft Research Asia (MSRA). It suited the neural networks of that time, but it is really not suitable for today's large models, because Transformer-based large models need a lot of memory to hold the KV cache.
Although Groq’s chips have a very fast output speed, due to the limited memory size, the batch size cannot be very large. If we calculate the cost-effectiveness in terms of $/token, it may not be competitive.
Groq needs a cluster of hundreds of cards to run the LLaMA-2 70B model
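A back-of-envelope calculation shows why. The numbers below are assumptions I am using for illustration (LLaMA-2 70B with 80 layers and 8 KV heads of dimension 128 under grouped-query attention, FP16 weights and cache, and roughly a couple hundred MB of SRAM per Groq chip); the exact figures may differ, but the conclusion is the same:

```python
# Rough estimate of SRAM pressure when serving LLaMA-2 70B entirely from SRAM.
# All numbers are assumptions for illustration, not Groq's published specs.
GiB = 1024 ** 3
MiB = 1024 ** 2

params = 70e9
weight_bytes = params * 2                  # FP16 weights
sram_per_chip = 230 * MiB                  # assumed on-chip SRAM per accelerator

# KV cache per token: 2 (K and V) * layers * kv_heads * head_dim * bytes
kv_per_token = 2 * 80 * 8 * 128 * 2        # FP16, grouped-query attention

chips_for_weights = weight_bytes / sram_per_chip
print(f"weights alone: {weight_bytes / GiB:.0f} GiB -> ~{chips_for_weights:.0f} chips")
print(f"KV cache: {kv_per_token / 1024:.0f} KiB per token, "
      f"{kv_per_token * 4096 / GiB:.2f} GiB per 4K-token sequence")
```

Even before any KV cache, the weights alone call for hundreds of chips, and every additional sequence in the batch adds on the order of a gigabyte of cache, which is why the batch size (and hence $/token) is hard to improve.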
2024-02-20
My Early Encounters with AI
Meeting AI During My PhD
Originally, my PhD research was focused on networks and systems; my dissertation was titled “High-Performance Data Center Systems Based on Programmable Network Cards”. Many people in networks and systems look down on some AI research, claiming that AI papers are easy to churn out: with just an idea, a paper can be published in one or two months. In contrast, top-conference papers in networks and systems often require a huge amount of work and can take as long as a year to complete.
Aside from the AI courses I took in school, my first serious AI-related project was in 2016, using FPGA to accelerate neural networks in Bing Ranking. That period was the previous wave of AI hype, and the so-called “four AI dragons” of today all started during that time.
Microsoft deployed FPGAs at scale in its data centers not only for network virtualization but also for an important workload: neural network inference acceleration. At that time, we used pipeline parallelism to keep all the neural network weights in the FPGAs' SRAM, achieving super-linear speedup. This story is described in more detail in the section “Exploration of Machine Learning Accelerators” in “Five Years of PhD at MSRA — Leading My First SOSP Paper”.
At that time, many people working in networks and systems didn't understand AI, nor did they care to: they couldn't tell training from inference, or forward from backward operators. By optimizing these operators, I at least came to understand how a basic feedforward neural network (FFNN) works, but I never got involved in business applications or tinkered with models of my own.
2024-02-16
A joke is circulating among investors today: “I can finally get a good night’s sleep, because I no longer have to worry about the video generation companies I’ve invested in being overtaken by others.”
Last month, in an interview with Jiazi Guangnian (“AI One Day, Human World One Year: My 2023 with AI | Jiazi Guangnian”), I predicted the four major trends of 2024, the first of which was video generation. I didn't expect it to come true so quickly. (Of course, the videos Sora generates do not yet contain complex semantics, and it cannot generate in real time, so others still have a chance.)
- Multimodal large models can understand videos in real-time and generate videos containing complex semantics in real-time;
- Open-source large models reach the level of GPT-4;
- The inference cost of GPT-3.5 level open-source models drops to one percent of the GPT-3.5 API, alleviating cost concerns when integrating large models into applications;
- High-end phones support local large models and automatic App operations, making everyone’s life inseparable from large models.
Video Generation Models as World Simulators
The title of OpenAI's technical report is also very meaningful: “Video generation models as world simulators”.
The last sentence of the technical report is also well written: we believe the capabilities Sora demonstrates today show that continued scaling of video models is a promising path toward powerful simulators of the physical world, the digital world, and the objects, animals, and people that live within them.
In fact, as early as 2016, OpenAI explicitly stated that generative models are the most promising path for computers to understand the world. They even quoted the physicist Feynman: “What I cannot create, I do not understand.”