Page 2 | Bojie Li

2024-05-05

Are Highly Ambitious Men Suitable as Life Partners?

(This article is my Zhihu answer to “Are highly ambitious men suitable as life partners?”)

After starting my own business, I met many entrepreneurs, most of whom are highly ambitious men.

I noticed an interesting phenomenon: these entrepreneurs are single at a significantly higher rate than their peers, and their marriages are also less stable.

High Single Rate

In the fields of AI, mobile internet, and Web3, successful co-founders of startups are generally worth at least a small fortune; even those whose startups didn’t succeed have very impressive resumes, such as graduating from prestigious schools, holding high positions in big companies, and having various titles and awards. They certainly don’t have trouble finding good partners. So why is the single rate so high and the marriage stability so low?

The core reason is that highly ambitious men spend most of their time and interest on their careers, investing relatively little in life, emotions, and family.

2024-04-22

Zhihu "New Figures" Interview: The AGI Belief of Huawei Top Talent

A month ago, the interview video by Zhihu “New Figures” was finally released. It was my first time taking part in an interview that covered aspects of my personal life. It definitely wasn’t company PR: the name and products of our company were never mentioned during the session, and few people even know the real name of our company.

It seems that Zhihu still maintains journalistic integrity, as they did not let me view the video before publishing it; all editing, titles, and voice-overs were done by the Zhihu editors.

(04:16, 215 MB)

Video shooting locations:

Beijing office
Home (interview, cooking with my wife, and some photos)
Shucun Suburban Park (a place where I often run; I made the flying electric butterfly in 2017, it got caught in a tree during the shoot, and our very capable photographer climbed the tree to retrieve it)

2024-04-17

Is It Difficult to Develop Large Models Domestically? ByteDance, Baidu, and Unicorns Compete Overseas, Who Will Profit First from Over 70 AI Products?

Source: Sohu Technology Interview “Is It Difficult to Develop Large Models Domestically? ByteDance, Baidu, and Unicorns Compete Overseas, Who Will Profit First from Over 70 AI Products?”

Produced by | Sohu Technology

Author | Liang Changjun

“Every day from 9 AM to 3 PM, I have meetings with foreign teams for remote development, internal testing, or bug fixing.” Entrepreneur Li Bojie, who is about to launch an AI product overseas, has been exceptionally busy recently.

This is a consumer-facing (to-C) AI evaluation product that recommends different AI models or products to users. He hopes to make this product the “TikTok of the large model era.”

More than a year ago, when Li Bojie decided to leave Huawei to start his own business, he aimed to enter the overseas market. At that time, domestic large models were still in the stage of fierce technical competition, but now more and more companies are choosing the same direction as him.

Whether it’s ByteDance, Baidu, Alibaba, or large model unicorns like MiniMax, Moonshot AI, and 01.AI, they are all accelerating their overseas expansion to tap into the global market.

Many companies are quietly making a fortune. Sohu Technology has learned that several overseas products have achieved rapid growth in users and revenue, and some have even started to become profitable. Some products have seen a surge in traffic with AI support, and are expected to achieve profits of 70 to 80 million yuan this year.

In the mobile internet era, Chinese companies went overseas and created TikTok. Now everyone is trying to create the TikTok of the AI era. This is a huge opportunity, but also full of challenges.

2024-04-15

How to Develop Research Taste?

(This article was first published on Zhihu answer: “How to develop research taste in the field of computer systems?”)

In the blink of an eye, it’s been nearly 10 years since I graduated from USTC. Yesterday, while discussing with my wife the recent developments of our classmates in the USTC systems circle, I realized that research taste is the most critical factor in determining academic outcomes. The second key factor is hands-on ability.

What is research taste? I believe that research taste is about identifying influential future research directions and topics.

Many students are technically strong, meaning they have strong hands-on skills and system implementation abilities, but still fail to produce influential research outcomes. The main reason is poor research taste, choosing research directions that either merely chase trends without original thought or are too niche to attract attention.

PhD Students’ Research Taste Depends on Their Advisors

I believe that research taste initially depends heavily on the advisor, and later on one’s own vision.

2024-04-14

Chatbot Arena: A Community-Based Evaluation Benchmark for Large Models

(This article was first published on Zhihu Answer: “What are the current benchmarks for evaluating large language models?”)

We must praise our co-founder @SIY.Z for Chatbot Arena!

Chatbot Arena is a community-based evaluation benchmark for large models. Since its launch a year ago, Chatbot Arena has received over 650,000 valid user votes.
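
To illustrate how community votes can become a leaderboard, here is a minimal sketch that assumes each vote is a pairwise comparison between two models and applies a plain online Elo update. The function name, the `k` and `base` values, and the model names are illustrative assumptions, not Chatbot Arena's actual pipeline.

```python
from collections import defaultdict

def elo_ratings(votes, k=32.0, base=1000.0):
    """Compute Elo-style ratings from pairwise votes.

    votes: iterable of (model_a, model_b, winner), where winner is
    "a", "b", or "tie". All names and constants are illustrative.
    """
    ratings = defaultdict(lambda: base)
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the Elo logistic model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Zero-sum update: whatever model_a gains, model_b loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return dict(ratings)

# Hypothetical votes, for illustration only.
votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-z", "a"),
    ("model-y", "model-z", "tie"),
]
ratings = elo_ratings(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

An online update like this depends on the order in which votes arrive; a production leaderboard would likely need a more stable fitting procedure and confidence intervals.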

Chatbot Arena Witnesses the Rapid Evolution of Large Models

In the past month, we have witnessed several very interesting events on Chatbot Arena:

Anthropic released Claude-3, with its large Opus model surpassing GPT-4-Turbo and its medium Sonnet and small Haiku models matching the performance of GPT-4. This marks the first time a company other than OpenAI has taken the top spot on the leaderboard. Anthropic’s valuation has reached $20B, compared with OpenAI’s $80B. OpenAI should feel a bit threatened.
Cohere released the strongest open-source model to date, Command R+, a 104B model that matches the performance of GPT-4, although it is still behind GPT-4-Turbo. Earlier this year, I mentioned the four major trends for large models in 2024 during an interview with Jiazi Guangnian (“AI One Day, Human One Year: My Year with AI | Jiazi Guangnian”): “Multimodal large models capable of real-time video understanding and generating videos with complex semantics; open-source large models reaching GPT-4 level; the inference cost of GPT-3.5 level open-source models dropping to one percent of the GPT-3.5 API, making it cost-effective to integrate large models; high-end smartphones supporting local large models and automatic app operation, making everyone’s life dependent on large models.” The first is Sora and the second is Command R+; both have come true. I still hold this view: if a company focused mainly on foundation models cannot train a GPT-4-level model in 2024, it should stop trying, because it is wasting a lot of computing power without even matching open-source models.
Tongyi Qianwen released a 32B open-source model that almost reached the top 10 and performs well in both Chinese and English. The 32B model remains highly cost-effective.
After being surpassed by Anthropic’s Claude Opus, OpenAI naturally did not back down: it immediately released GPT-4-Turbo-2024-04-09 and reclaimed the top spot on the leaderboard. However, OpenAI has been slow to release GPT-4.5 or GPT-5, and the much-anticipated multimodal model has not yet appeared, which is somewhat disappointing.

2024-04-07

Bilibili Uploader Interview with Li Bojie: Why Start a Business

This video is an interview with me by the Bilibili uploader “Apple Bubbles”, original video link

The entire interview lasted half an hour, recorded in one take, with no edits except for the intro added by the uploader, and no prepared answers to the questions.

(27:07, 136 MB)

2024-03-29

Long Talk: Should AI Agents Be More Entertaining or More Useful?

(The full text is about 40,000 words, mainly from a 2-hour report at the USTC Alumni AI Salon on December 21, 2023, and is a technical extended version of the 15-minute report at the Zhihu AI Pioneers Salon on January 6, 2024. The article has been organized and expanded by the author.)

I am honored to share some of my thoughts on AI Agents at the USTC Alumni AI Salon. I am Li Bojie, from the 2010 Science Experimental Class, and I pursued a joint PhD at USTC and Microsoft Research Asia from 2014 to 2019. From 2019 to 2023, I was part of the first cohort of Huawei’s Genius Youth. Today, I am working on AI Agent startups with a group of USTC alumni.

Today is the seventh day since the passing of Professor Tang Xiaou, so I specially set today’s PPT to a black background, which is also my first time using a black background for a presentation. I also hope that as AI technology develops, everyone can have their own digital avatar in the future, achieving eternal life in the digital world, where life is no longer limited and there is no more sorrow from separation.

AI: Entertaining and Useful

The development of AI has always had two directions, one is entertaining AI, which is more human-like, and the other is useful AI, which is more tool-like.

Should AI be more like humans or more like tools? Actually, there is a lot of controversy about this. For example, Sam Altman, CEO of OpenAI, said that AI should be a tool, not a life form. However, many sci-fi movies depict AI that is more human-like, such as Samantha in Her, Tu Ya Ya in The Wandering Earth 2, Ash in Black Mirror, so we hope to bring these sci-fi scenarios to reality. Only a few sci-fi movies feature tool-like AI, such as Jarvis in Iron Man.

Besides the horizontal dimension of entertaining and useful, there is another vertical dimension, which is fast thinking and slow thinking. This is a concept from cognitive psychology, popularized by Daniel Kahneman’s book “Thinking, Fast and Slow,” which says that human thinking can be divided into fast thinking and slow thinking.

Fast thinking refers to basic visual and auditory perception and to expressive abilities such as speaking that require no deliberate thought. ChatGPT and Stable Diffusion are tool-like fast-thinking AIs: they respond to specific questions and do not initiate interaction unless prompted. Character AI, Inflection Pi, and Talkie (Hoshino), by contrast, simulate conversations with a person or with an anime or game character, but these conversations do not involve solving complex tasks and lack long-term memory, so they are only suitable for casual chat and cannot help solve problems in life and work the way Samantha does in Her.

Slow thinking refers to stateful complex thinking, which involves planning and solving complex problems, determining what to do first and what to do next. For example, MetaGPT writing code simulates the division of labor in a software development team, and AutoGPT breaks down a complex task into many stages to complete step by step. Although these systems still have many practical issues, they already represent a nascent form of slow thinking capability.
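
The idea of stateful, step-by-step task decomposition can be sketched as a small plan-and-execute loop. This is only an illustrative sketch: `AgentState`, `run_agent`, `toy_plan`, and `toy_execute` are hypothetical names, and the planner and executor are stubs standing in for model calls. It is not how MetaGPT or AutoGPT are implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class AgentState:
    """State carried across steps; this is what makes the loop stateful."""
    goal: str
    pending: List[str] = field(default_factory=list)
    done: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)

def run_agent(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str, AgentState], str],
    max_steps: int = 20,
) -> AgentState:
    # Plan first: break the goal into an ordered list of steps.
    state = AgentState(goal=goal, pending=plan(goal))
    for _ in range(max_steps):
        if not state.pending:
            break
        step = state.pending.pop(0)
        # Each step can read the accumulated state of earlier steps.
        result = execute(step, state)
        state.done.append(step)
        state.notes.append(result)
    return state

# Stub planner and executor; a real agent would call a model here.
def toy_plan(goal: str) -> List[str]:
    return [f"outline: {goal}", f"draft: {goal}", f"review: {goal}"]

def toy_execute(step: str, state: AgentState) -> str:
    return f"finished '{step}' after {len(state.done)} earlier step(s)"

state = run_agent("write a sorting function", toy_plan, toy_execute)
for note in state.notes:
    print(note)
```

The `max_steps` cap is a simple guard against a planner that keeps producing work; a fuller design would also let the executor revise the remaining plan.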

Unfortunately, there are almost no products in the first quadrant that combine slow thinking with human-like attributes. Stanford AI Town is a notable academic attempt, but there is no real human interaction in Stanford AI Town, and the AI Agent’s daily schedule is pre-arranged, so it is not very interesting.

Interestingly, most of the AI in sci-fi movies actually falls into this first quadrant. This is the current gap between AI Agents and human dreams. So what we are doing is exactly the opposite of what Sam Altman said: we hope to make AI more human-like while also capable of slow thinking, eventually evolving into a digital life form.

2024-02-25

USTC Practical Project: Undergraduates with Basic Programming Skills Can Also Develop AI Agents

Since December 2023, I have been working as a corporate mentor in collaboration with Professor Junming Liu from USTC on an AI Agent practical project, with about 80 students from across the country participating. Most of them are undergraduates with only basic programming skills, along with some doctoral and master’s students with a foundation in AI.

In December 2023 and January 2024, we held 6 group meetings to explain the basics of AI Agents, how to use the OpenAI API, this AI Agent practical project, and to answer questions students had during the practice. The practical project includes:

Corporate ERP Assistant
Werewolf
Intelligent Data Collection
Mobile Voice Assistant
Meeting Assistant
Old Friends Reunion
Undercover

From February 20-24, some students participating in this research project gathered in Beijing for a Hackathon and presented the interim results of their projects. Participants generally felt the power of large models, surprised that such complex functions could be achieved with just a few hundred lines of code. Below are some of the project outcomes:

2024-02-22

Groq Inference Chips: A Trick of Trading Space for Time

Recently, Groq’s inference chips have made headlines with their large model output speed of 500 tokens/s.

In a nutshell, this chip plays a trick of trading space for time, storing both model weights and intermediate data in SRAM, instead of HBM or DRAM.

This is something I did 8 years ago at Microsoft Research Asia (MSRA). It suited the neural networks of that time, but it is really not suitable for today’s large models, because large models based on Transformers require a lot of memory to store the KV Cache.

Although Groq’s chips have a very fast output speed, due to the limited memory size, the batch size cannot be very large. If we calculate the cost-effectiveness in terms of $/token, it may not be competitive.
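
To make the memory constraint concrete, here is a back-of-the-envelope sketch. It assumes roughly 230 MB of on-chip SRAM per chip (a commonly cited figure, not stated in this article) and the published LLaMA-2 70B shape (80 layers, 8 key-value heads under grouped-query attention, head dimension 128). It ignores activations and any replication of weights, so the results are lower bounds.

```python
import math

# Assumption: ~230 MB of on-chip SRAM per chip.
SRAM_PER_CHIP_BYTES = 230e6

# LLaMA-2 70B shape.
N_PARAMS = 70e9
N_LAYERS = 80
N_KV_HEADS = 8      # grouped-query attention
HEAD_DIM = 128

def chips_for_weights(bytes_per_param: float) -> int:
    """Minimum number of chips whose SRAM can hold the weights alone."""
    return math.ceil(N_PARAMS * bytes_per_param / SRAM_PER_CHIP_BYTES)

def kv_cache_bytes(batch_size: int, context_len: int, bytes_per_value: int = 2) -> int:
    """KV Cache size: one key and one value vector per layer, KV head, and token."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return batch_size * context_len * per_token

print("chips for INT8 weights:", chips_for_weights(1))
print("chips for FP16 weights:", chips_for_weights(2))
print("KV Cache per token (FP16):", kv_cache_bytes(1, 1), "bytes")
print("KV Cache for one 4096-token sequence (FP16):",
      round(kv_cache_bytes(1, 4096) / 1e9, 2), "GB")
```

Under these assumptions the weights alone already need hundreds of chips, and every additional sequence in the batch adds KV Cache that must also live in SRAM, which is why the batch size stays small.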

Groq needs a cluster of hundreds of cards to run the LLaMA-2 70B model

2024-02-20

How I Embarked on the AI Entrepreneurship Journey

My Early Encounters with AI

Meeting AI During My PhD

Originally, my PhD research was focused on networks and systems, with my dissertation titled “High-Performance Data Center Systems Based on Programmable Network Cards”. Many in the field of networks and systems look down on some AI research, claiming that AI papers are easy to churn out and that with just an idea, a paper can be published in one or two months. In contrast, top conference papers in networks and systems often require a significant amount of work, taking as long as a year to complete.

Aside from the AI courses I took in school, my first serious AI-related project was in 2016, using FPGAs to accelerate neural networks in Bing Ranking. That period was the previous wave of AI hype, and the so-called “four AI dragons” of today all started during that time.

Microsoft deployed FPGAs on a large scale in data centers not only for network virtualization but also, importantly, for neural network inference acceleration. At that time, we also used pipeline parallelism to store all the neural network weights in the FPGAs’ SRAM, achieving super-linear acceleration. This story is described in more detail in the section “Exploration of Machine Learning Accelerators” in “Five Years of PhD at MSRA — Leading My First SOSP Paper”.

At that time, many people working in networks and systems didn’t understand AI and didn’t care to; they could not distinguish training from inference, or forward operators from backward operators. By optimizing these operators, I at least understood how basic feedforward neural networks (FFNN) work. However, I didn’t get involved in business applications or tinker with my own models.


Bojie Li

2024-05-05

Are Highly Ambitious Men Suitable as Life Partners?

High Single Rate

2024-04-22

Zhihu "New Figures" Interview: The AGI Belief of Huawei Top Talent

2024-04-17

Is It Difficult to Develop Large Models Domestically? ByteDance, Baidu, and Unicorns Compete Overseas, Who Will Profit First from Over 70 AI Products?

2024-04-15

How to Develop Research Taste?

PhD Students’ Research Taste Depends on Their Advisors

2024-04-14

Chatbot Arena: A Community-Based Evaluation Benchmark for Large Models

Chatbot Arena Witnesses the Rapid Evolution of Large Models

2024-04-07

Bilibili Uploader Interview with Li Bojie: Why Start a Business

2024-03-29

Long Talk: Should AI Agents Be More Entertaining or More Useful?

AI: Entertaining and Useful

2024-02-25

USTC Practical Project: Undergraduates with Basic Programming Skills Can Also Develop AI Agents

2024-02-22

Groq Inference Chips: A Trick of Trading Space for Time

Groq needs a cluster of hundreds of cards to run the LLaMA-2 70B model

2024-02-20

How I Embarked on the AI Entrepreneurship Journey

My Early Encounters with AI

Meeting AI During My PhD
