Bojie Li
2024-12-31
(This article was written by the author in November 2024 at the invitation of Open Source China for the “2024 OSChina Annual AI Review”)
In 2024, large models truly began to land in real-world use, with most tech workers relying on at least one large model to boost their work efficiency. Many national-level applications and mobile phone manufacturers have also integrated large models. Large models are starting to diverge into two directions: professional models and personal models.
Professional Models
Professional models are designed to enhance productivity, in areas such as AI-assisted programming, writing, design, consulting, and education. Once a model's capabilities cross a certain threshold, professional models bring high added value. In 2024, professional models were already deployed in many fields. For example, AI-assisted programming can more than double development efficiency, with API or IDE subscription costs of just tens of dollars per month, versus engineers who cost tens of thousands of dollars per month. AI-generated images, podcasts, and live broadcasts can increase the work efficiency of artists, voice actors, and hosts a hundredfold. AI consulting services in psychology, law, and medicine can reach the level of junior professionals, whose hourly rates are far higher than the model costs. AI virtual foreign-language teachers can already rival real foreign teachers, and thanks to standard pronunciation, can even outperform most domestic English teachers. In the future, AI-assisted teaching will change the traditional one-to-many classroom model, making one-on-one AI tutoring possible and significantly improving the efficiency and quality of human teachers' lesson preparation.
2024-12-28
Long-article warning: this article contains 10,016 words; estimated reading time 27 minutes.
“Dialogue” is a series of in-depth interviews launched by the Woke Advanced Alliance. We invite and interview outstanding alumni from USTC who have experienced setbacks, tasted failures, and achieved success during their university life at USTC. Through in-depth conversations, we hope to showcase their life journeys and personal choices, hoping that their experiences can illuminate more paths for future USTC students.
In this issue of the Dialogue column, we invited Senior Brother Li Bojie (personal homepage: 01.me), a USTC 1000 alumnus, USTC-MSRA joint PhD, one of the first Huawei “Genius Youth” awardees, AI entrepreneur, and co-founder of the USTC course evaluation community. He was an assistant scientist and deputy chief expert at Huawei’s Computer Network and Protocol Laboratory. He has published multiple papers at top conferences such as SIGCOMM, SOSP, NSDI, and ATC, and has received the ACM China Outstanding Doctoral Dissertation Award and the “Microsoft Scholar” scholarship.
This article is original by Woke Advanced Alliance. Do not repost without permission.
Interview, Editing | Feng Wenjun, Chen Lei
Proofreading | Zhao Guohua
Theme Summary
Learning and Practice Experience During University
How to View Mathematical Foundations
Development History of the Course Evaluation Community
How to Transition to AI Research
Academic Planning and Career Choices
Misconceptions and Suggestions for Choosing a PhD
2024-12-21
This article was first published in a Zhihu answer to “What do you think of OpenAI’s latest o3 model? How powerful is it?”
When o1 first came out, many people were skeptical, arguing that it had not yet reached AGI (Artificial General Intelligence). The programming and mathematical capabilities demonstrated by o3 not only meet the threshold for AGI but even touch the edge of ASI (Artificial Superintelligence).
o3 further validates the value of RL and test-time scaling, providing a path to continue enhancing model intelligence and solving more difficult problems through post-training and increased inference time when high-quality pre-training data is nearly exhausted and model capabilities hit a “wall.”
Many have seen the specific performance metrics of o3, so I won’t repeat them. Here’s a summary:
- o3 defeated 99.9% of programmers in Codeforces programming competitions, ranking 175th among 168,076 programmers. Even o3’s own authors couldn’t beat it.
- o3 also shows significant improvement over o1 in meeting real-world programming needs. On the SWE-Bench software development benchmark, the previously released o1-preview scored 41.3%, while o3 scored 71.7%. This means o3 can directly satisfy about 70% of real-world requirements and pass unit tests, leaving only 30% of the work for human programmers, and AI can significantly boost efficiency on that remaining portion as well.
- It scored 96.7% on AIME 2024, equivalent to missing only one question on the American Invitational Mathematics Examination.
- In the GPQA Diamond test for PhD-level scientific questions, it exceeded o1 by 10 percentage points, while o1 was already at the average level of human PhD students.
- In graphical logic reasoning ARC-AGI, after fine-tuning, o3 reached 87.5%, surpassing the human average (85%).
2024-11-16
On the evening of November 15, 2024, at the Zhihu Academic Bar, I, along with prominent figures like Kai-Fu Lee, Zhiyuan Liu, and Guohao Dai, participated in an open mic sharing session.
Question:
“Vulnerabilities & Bugs—What moment made you feel like the world had a bug?”
On Zhihu, there are several highly upvoted questions about bugs, such as “What moment made you feel like the world had a bug?” and “What are some bugs that left you dumbfounded?”
However, it’s not scary when the world has a bug; what’s scary is when AI discovers a bug.
Recently, AI discovered a major real-world security vulnerability for the first time: a vulnerability in SQLite was found by Google’s AI agent and, fortunately, was fixed before it caused any damage. Could it be that, with further evolution, AI could permanently prevent global blue-screen incidents like the recent one involving Microsoft? This possibility is exciting.
Answer:
2024-11-01
Original podcast content: Six Forks Podcast “R&D Positions Must Embrace AI, Then Spend the Remaining Time Doing Experiments—A Conversation with Huawei’s First Batch of Genius Youth Li Bojie”
The following content is approximately 35,000 words, organized by the author using AI based on the podcast content. Thanks to Hunter Leslie for the wonderful interview and post-production, the 2-hour session was a blast without any retakes. Also, thanks to AI for allowing me to organize 30,000 words of content in an afternoon and supplement it with previously written materials.
Core Points Summary:
- AI scenarios from sci-fi works like the film “Her” and the series “Black Mirror” have already been realized or are close to realization; turning sci-fi into reality will undoubtedly carry immense value.
- Model capabilities are rapidly increasing, and small AI companies should make friends with foundation model companies rather than merely patching up or wrapping models.
- “20% projects” have a relatively high success rate: start with interest projects driven by your own daily work and life needs in your spare time, and if the need turns out to be general, expand them into commercial projects.
- Many performance issues in AI applications are not model problems but should be solved with system optimization based on first principles.
- A lot of work in the AI industry has not been published or open-sourced, creating a huge information gap.
- The information gap in modern society is enormous; AI interacting more with users can understand everyone’s knowledge boundaries, greatly improving recommendation efficiency and helping to bridge the information gap.
- OpenAI o1’s strong reasoning ability is crucial for the reliability of model applications in serious scenarios.
- For most users’ daily life needs, the most capable models are already sufficient; the focus is on reducing costs. AGI might be very expensive, mainly used to solve the most important problems in human science.
- Limited energy and chip manufacturing capabilities are major challenges for AGI.
- Startups need to recruit people with solid computer science knowledge, strong learning ability, and strong self-drive.
- AI-assisted programming can significantly enhance programmers’ work efficiency, freeing up time for exploring “20% projects” or achieving a better work-life balance.
- After AI improves efficiency, it will bring more demand, turning more needs into reality, and even independent developers can complete work that previously required a team.
- A person’s career is composed of a series of projects, and it’s important that each project has an impact. Different projects are suitable for different approaches, including startups, small and beautiful companies, communities, academic projects, etc.
Full Text:
2024-10-24
Q: What is the one product you most want to share from the past year?
A: I previously mentioned a saying, “AI in a day, human in a year.” There have been many exciting products in the past year. If I had to choose one, I would pick OpenAI o1, which, simply put, taught AI to think. This thinking is most evident in mathematics and programming. We shouldn’t understand mathematics and programming narrowly, as they are the biggest challenges for current large models in commercial applications.
In mathematics, most large models currently can’t calculate accurately, such as not distinguishing between 3.8 and 3.11, leading to low accuracy and making them unreliable in serious scenarios, like booking a flight or calculating expenses. What if they make a mistake? Now that models can calculate accurately, they can be used in many serious scenarios.
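The "3.8 vs 3.11" trap mentioned above is easy to reproduce: the answer flips depending on whether the two strings are read as decimal numbers or as software version numbers. This sketch just illustrates the ambiguity; it is not a claim about any particular model's internals.

```python
from decimal import Decimal

# Read as decimal numbers: 3.8 (= 3.80) is larger than 3.11
as_numbers = Decimal("3.8") > Decimal("3.11")  # True

def version_key(v: str) -> tuple:
    # "3.11" -> (3, 11): compare dot-separated components numerically,
    # the way software versions (e.g. Python releases) are ordered
    return tuple(int(part) for part in v.split("."))

# Read as version numbers: (3, 8) < (3, 11), so 3.11 is "larger"
as_versions = version_key("3.8") > version_key("3.11")  # False

print(as_numbers, as_versions)  # True False
```

A model that answers without first fixing one of these two interpretations can plausibly give either answer, which is exactly why such questions are unreliable in serious scenarios.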
Programming isn’t just for programmers. We’ve observed an important trend in AI applications: the generated content is not just text but a multimodal content with images and text, or even interactive mini-games or mini-programs, like Claude Artifacts, OpenAI Canvas, Google NotebookLM generating podcasts, and Perplexity generating illustrated wikis. These contents are essentially a piece of code generated by large models and then dynamically rendered. This kind of multimodal content tests the programming ability of large models.
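The "generated content is code, then dynamically rendered" pattern above can be sketched in a few lines. This is a minimal illustration, not any product's actual implementation: the model's reply carries a fenced code block, and the client extracts it for rendering. The `reply` string here is a hypothetical stand-in for a real model response.

```python
import re

# Hypothetical model reply containing an HTML artifact in a fenced block
reply = ("Here is your interactive widget:\n"
         "```html\n"
         "<button onclick=\"alert('hi')\">Click me</button>\n"
         "```")

# Extract the code between the ```html fences
match = re.search(r"```html\n(.*?)\n```", reply, re.DOTALL)
if match:
    artifact = match.group(1)
    # A real client would render this in a sandboxed iframe or canvas;
    # here we just print the extracted code
    print(artifact)
```

The heavy lifting is entirely on the model side: the quality of the rendered "multimodal" result depends on how good the generated code is, which is why this class of products tests a model's programming ability.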
2024-10-20
Before starting my business, my wife bought me “Xiaomi’s Entrepreneurial Thinking,” but I never read it. Recently, I had some time to go through it and found it very rewarding. I used to dislike such books, thinking these experiences were processed and beautified, and some advice might not be applicable. However, after having personal entrepreneurial experience, reading books by industry leaders makes a lot of sense.
The essence of “Xiaomi’s Entrepreneurial Thinking” is in Chapter Six, “The Seven-Word Formula for the Internet,” which is Focus, Extreme, Reputation, Speed.
The development approach of MIUI fully embodies the “Focus, Extreme, Reputation, Speed” seven-word formula for the internet:
- Focus: Initially, only four functions were developed (phone, SMS, contacts, and desktop), with extreme restraint.
- Extreme: With customizable lock screens and themes, it could simulate any phone, pursuing an extreme experience.
- Reputation: The entire company communicated with users on forums, making friends with them. It was very popular on the XDA forum and became a hit abroad, with its earliest internationalization starting from MIUI.
- Speed: Weekly iterations, adopting an internet development model.
Focus
Focus is the most important of the seven-word formula for the internet and applies to all companies and products.
Companies Need Focus
Lei Jun shared his first entrepreneurial failure experience. Lei Jun was technically strong, completing four years of credits by his sophomore year. In his junior year, he wrote the antivirus software “Immunity 90,” which sold for a million yuan—a significant amount in the 1990s. So, in his senior year, he founded the Tricolor Company with two tech experts, Li Ruxiong and Wang Quanguo (both of whom are very successful now), but this venture quickly ended in failure.
2024-10-08
(This article was first published in a Zhihu answer to “Why was the 2024 Nobel Prize in Physics awarded to machine learning in artificial neural networks?”)
Some people joked that many physicists hadn’t heard of the two people who won this year’s Nobel Prize in Physics…
The Connection Between Artificial Neural Networks and Statistical Physics Is Not Accidental
In early July, when I returned to my alma mater for the 10th anniversary of my undergraduate graduation, I chatted with some classmates who are into mathematics and physics about AI. I was surprised to find that many fundamental concepts in AI today originate from statistical physics, such as diffusion models and emergence.
@SIY.Z also explained to me the statistical physics foundations behind many classic AI algorithms, such as the significant achievement of the two Nobel laureates, the Restricted Boltzmann Machine (RBM).
This connection is not accidental because statistical physics studies the behavior of systems composed of a large number of particles, just as artificial neural networks are systems composed of a large number of neurons. The early development of artificial neural networks clearly reveals this connection:
Hopfield Network
In 1982, Hopfield, while studying the principles of human memory, aimed to create a mathematical model to explain and simulate how neural networks store and reconstruct information, especially how neurons in the brain form memories through interconnections.
Specifically, the purpose of this research was to construct a CAM (Content-Addressable Memory) that supports “semantic fuzzy matching,” where multiple pieces of data are stored during the storage phase, and during the reconstruction phase, a partially lost or modified piece of data is input to find the original data that matches it best.
The Hopfield network drew on the spin of atoms in matter, each of which can be viewed as a tiny magnet. This is why the Hopfield network and later artificial neural networks resemble the Ising model in statistical physics, the model that explains why matter exhibits ferromagnetism.
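The content-addressable memory behavior described above can be sketched in a few lines of Python. This is a minimal toy, not Hopfield's original formulation: patterns of +1/-1 "spins" are stored with the Hebbian rule, and a corrupted probe relaxes back toward the nearest stored pattern.

```python
import numpy as np

def train(patterns):
    # Hebbian learning: weight matrix is the sum of outer products
    # of stored patterns, with the self-connections zeroed out
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)
    return W

def recall(W, state, steps=10):
    # Each neuron repeatedly aligns with its local field, which
    # monotonically lowers the network's Ising-like energy
    s = state.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1  # break ties toward +1
    return s

# Store one 8-spin pattern, then recover it from a probe with 2 flipped bits
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = train(pattern[None, :])
probe = pattern.copy()
probe[[0, 3]] *= -1            # corrupt two "atoms"
print(recall(W, probe))        # relaxes back to the stored pattern
```

This is exactly the "semantic fuzzy matching" CAM described above: storage writes patterns into the weights, and reconstruction feeds in a partially damaged pattern and lets the dynamics settle into the best-matching original.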
2024-10-02
On September 20-21, I was invited to attend the 2024 Yunqi Conference. I spent nearly two days exploring all three exhibition halls and engaged with almost every booth that piqued my interest.
- Hall 1: Breakthroughs and Challenges in Foundational Models
- Hall 2: Computing Power and Cloud Native, the Core Architecture Supporting AI
- Hall 3: Application Implementation, AI Empowering Various Industries
My previous research focus was the computing infrastructure and cloud native content of Hall 2. Now I mainly work on AI applications, so I am also very familiar with the content of Halls 1 and 3. After two days of discussions, I truly felt I had “cleared” the entire Yunqi Conference.
After the conference, I spoke into a recorder for over two hours, and then had AI organize this nearly 30,000-word article. I couldn’t finish organizing it by September 22, and with my busy work schedule, I took some time during the National Day holiday to edit it with AI, spending about 9 hours in total, including the recording. In the past, without AI, it was unimaginable to write 30,000 words in 9 hours.
Outline of the full text:
Hall 1 (Foundational Models): The Primary Driving Force of AI
- Video Generation: From Single Generation to Breakthroughs in Diverse Scenarios
- From Text-to-Video to Multi-Modal Input Generation
- Motion Reference Generation: From Static Images to Dynamic Videos
- Digital Human Technology Based on Lip Sync and Video Generation
- Speech Recognition and Synthesis
- Speech Recognition Technology
- Speech Synthesis Technology
- Music Synthesis Technology
- Future Directions: Multi-Modal End-to-End Models
- Agent Technology
- Inference Technology: The Technological Driving Force Behind a Hundredfold Cost Reduction
Hall 3 (Applications): AI Moving from Demo to Various Industries
- AI-Generated Design: A New Paradigm of Generative AI
- PPT Generation (Tongyi Qianwen)
- Chat Assistant with Rich Text and Images (Kimi’s Mermaid Diagram)
- Displaying Generated Content in Image Form (New Interpretation of Chinese)
- Design Draft Generation (Motiff)
- Application Prototype Generation (Anthropic Claude)
- Intelligent Consumer Electronics: High Expectations, Slow Progress
- AI-Assisted Operations: From Hotspot Information Push to Fan Interaction
- Disruptive Applications of AI in Education: From Personalized to Contextual Learning
Hall 2 (Computing Infrastructure): The Computing Power Foundation of AI
- CXL Architecture: Efficient Integration of Cloud Resources
- Cloud Computing and High-Density Servers: Optimization of Computing Power Clusters
- Cloud Native and Serverless
- Confidential Computing: Data Security and Trust Transfer in the AI Era
Conclusion: Two Bitter Lessons in Foundational Models, Computing Power, and Applications
- The Three Exhibition Halls of the Yunqi Conference Reflect Two Bitter Lessons
- Lesson One: Foundational Models are Key to AI Applications
- Lesson Two: Computing Power is Key to Foundational Models
2024-09-18
Why don’t American internet companies need 996, and yet have higher per capita output?
Many people simply attribute it to an “involuted” social culture and lax enforcement of the eight-hour workday, but I don’t think these are the main reasons. Many companies with overseas operations don’t impose 996 on their overseas teams, and don’t even require clocking in, yet their domestic teams still work 996. Why is that?
As a programmer who has some understanding of both domestic and American companies, I believe the main reasons are as follows:
- Higher customer unit price for American companies
- Lower time requirements for manual services from American customers
- Higher code quality of junior programmers in American companies
- Lower management costs for American companies
- Better use of tools and SaaS services by American companies
- Clearer goals and boundaries for American companies
- A few 007 heroes in American companies carrying the load
Higher Customer Unit Price for American Companies
A person with similar abilities, working the same amount of time, is likely to generate more revenue and profit in an American company than in a Chinese company. The reason lies in the customer unit price.