2024-11-01
Making Friends with Foundational Model Companies—Six Forks Podcast

Original podcast content: Six Forks Podcast “R&D Positions Must Embrace AI, Then Spend the Remaining Time Conducting Experiments—A Conversation with Huawei’s First Batch of Genius Youth Li Bojie”

The following content is about 30,000 words, organized by the author using AI based on the podcast content. Thanks to Hunter Leslie for the wonderful interview and post-production. The two-hour session was a blast, with no re-recordings. Also, thanks to AI for allowing me to organize 30,000 words of content in one afternoon and supplement it with other previously written materials.

Hunter Leslie: Welcome to Six Forks, I’m Hunter. Today we’re going to talk about R&D-related topics, and our guest is Li Bojie. Bojie was jointly trained by USTC and Microsoft, and was part of Huawei’s first batch of genius youth. In just three years at Huawei, he became a level 20 senior expert. In July 2023, driven by his belief in AI, he ventured into entrepreneurship in the fields of large models and Web3. Bojie, please say hello to everyone.

Li Bojie: Hello everyone, my name is Li Bojie. I was an undergraduate at USTC in 2010, then a Ph.D. student at USTC and MSRA (Microsoft Research Asia) in 2014, and part of Huawei’s first batch of genius youth in 2019. In 2023, I left Huawei to start a business with my classmates.

Hunter Leslie: Yes, exactly. So the first question I really want to ask is, you see, in 2019, the genius youth joined Huawei, and in just two or three years reached level 20. Having been at Huawei, I know how difficult it is to reach that level. Everything seemed to be going smoothly, so why suddenly start a business, right? Because at this stage, being able to advance so quickly within a platform is actually a very difficult thing.

Li Bojie: If I were to sum it up in one sentence, I wanted to experience a different life through entrepreneurship and enable AI to better benefit humanity.

If I elaborate a bit more, I can share my story about when I first connected with AI. Initially, I was doing systems research during my undergraduate years and didn’t understand AI. But when I joined MSRA, which is considered China’s best AI lab and often referred to as the “Huangpu Military Academy” of AI, I was exposed to a lot of AI-related things even though I was working on systems and networks. However, I didn’t initially learn AI algorithms because many of us in systems thought AI was as intelligent as the amount of human input it received. Why? Because AI was still quite primitive at the time, unable to truly understand natural language. It could only capture patterns and relationships between inputs, data, and outputs, but whether it truly understood was questionable.

I think it was in early 2017 when a lecture changed my entire perspective. I was at MSRA, and I can’t remember which professor it was, but they talked about two movies from 2013: one was “Her,” and the other was an episode from “Black Mirror” (Be Right Back).

Read More

2024-10-24
Live Sharing on Byte MarsCode 1024 Code Night

Q: What is the one product you most want to share from the past year?

A: I previously mentioned a saying, “AI in a day, human in a year.” There have been many exciting products in the past year. If I had to choose one, I would pick OpenAI o1, which, simply put, taught AI to think. This thinking is most evident in mathematics and programming. We shouldn’t understand mathematics and programming narrowly, as they are the biggest challenges for current large models in commercial applications.

In mathematics, most large models currently can’t calculate accurately, such as not distinguishing between 3.8 and 3.11, leading to low accuracy and making them unreliable in serious scenarios, like booking a flight or calculating expenses. What if they make a mistake? Now that models can calculate accurately, they can be used in many serious scenarios.

Programming isn’t just for programmers. We’ve observed an important trend in AI applications: the generated content is not just text but a multimodal content with images and text, or even interactive mini-games or mini-programs, like Claude Artifacts, OpenAI Canvas, Google NotebookLM generating podcasts, and Perplexity generating illustrated wikis. These contents are essentially a piece of code generated by large models and then dynamically rendered. This kind of multimodal content tests the programming ability of large models.

Read More

2024-10-20
Thoughts Inspired by "Xiaomi's Entrepreneurial Thinking"

Before starting my business, my wife bought me “Xiaomi’s Entrepreneurial Thinking,” but I never read it. Recently, I had some time to go through it and found it very rewarding. I used to dislike such books, thinking these experiences were processed and beautified, and some advice might not be applicable. However, after having personal entrepreneurial experience, reading books by industry leaders makes a lot of sense.

The essence of “Xiaomi’s Entrepreneurial Thinking” is in Chapter Six, “The Seven-Word Formula for the Internet,” which is Focus, Extreme, Reputation, Speed.

The development approach of MIUI fully embodies the “Focus, Extreme, Reputation, Speed” seven-word formula for the internet:

  • Focus: Initially, only four functions were developed (phone, SMS, contacts, and desktop), with extreme restraint.
  • Extreme: With customizable lock screens and themes, it could simulate any phone, pursuing an extreme experience.
  • Reputation: The entire company communicated with users on forums, making friends with them. It was very popular on the XDA forum and became a hit abroad, with its earliest internationalization starting from MIUI.
  • Speed: Weekly iterations, adopting an internet development model.

Focus

Focus is the most important of the seven-word formula for the internet and applies to all companies and products.

Companies Need Focus

Lei Jun shared his first entrepreneurial failure experience. Lei Jun was technically strong, completing four years of credits by his sophomore year. In his junior year, he wrote the antivirus software “Immunity 90,” which sold for a million yuan—a significant amount in the 1990s. So, in his senior year, he founded the Tricolor Company with two tech experts, Li Ruxiong and Wang Quanguo (both of whom are very successful now), but this venture quickly ended in failure.

Read More

2024-10-08
Why the Nobel Prize in Physics Was Awarded to AI

(This article was first published in a Zhihu answer to “Why was the 2024 Nobel Prize in Physics awarded to machine learning in artificial neural networks?”)

Some people joked that many physicists hadn’t heard of the two people who won this year’s Nobel Prize in Physics…

The Connection Between Artificial Neural Networks and Statistical Physics Is Not Accidental

In early July, when I returned to my alma mater for the 10th anniversary of my undergraduate graduation, I chatted with some classmates who are into mathematics and physics about AI. I was surprised to find that many fundamental concepts in AI today originate from statistical physics, such as diffusion models and emergence.

@SIY.Z also explained to me the statistical physics foundations behind many classic AI algorithms, such as the significant achievement of the two Nobel laureates, the Restricted Boltzmann Machine (RBM).

This connection is not accidental because statistical physics studies the behavior of systems composed of a large number of particles, just as artificial neural networks are systems composed of a large number of neurons. The early development of artificial neural networks clearly reveals this connection:

Hopfield Network

In 1982, Hopfield, while studying the principles of human memory, aimed to create a mathematical model to explain and simulate how neural networks store and reconstruct information, especially how neurons in the brain form memories through interconnections.

Specifically, the purpose of this research was to construct a CAM (Content-Addressable Memory) that supports “semantic fuzzy matching,” where multiple pieces of data are stored during the storage phase, and during the reconstruction phase, a partially lost or modified piece of data is input to find the original data that matches it best.

The Hopfield network utilized the atomic spin characteristic of matter, which allows each atom to be viewed as a small magnet. This is why the Hopfield network and subsequent artificial neural networks resemble the Ising model in statistical physics, which explains why matter has ferromagnetism.

Read More

2024-10-02
2024 Yunqi Conference: Foundational Models, Applications, and Two Bitter Lessons in Computing Power

On September 20-21, I was invited to attend the 2024 Yunqi Conference. I spent nearly two days exploring all three exhibition halls and engaged with almost every booth that piqued my interest.

  • Hall 1: Breakthroughs and Challenges in Foundational Models
  • Hall 2: Computing Power and Cloud Native, the Core Architecture Supporting AI
  • Hall 3: Application Implementation, AI Empowering Various Industries

My previous research focus was on the computing infrastructure and cloud native in Hall 2. Now, I mainly work on AI applications, so I am also very familiar with the content of Hall 1 and Hall 3. After two days of discussions, I really felt like I had completed the Yunqi Conference.

After the conference, I spoke into a recorder for over two hours, and then had AI organize this nearly 30,000-word article. I couldn’t finish organizing it by September 22, and with my busy work schedule, I took some time during the National Day holiday to edit it with AI, spending about 9 hours in total, including the recording. In the past, without AI, it was unimaginable to write 30,000 words in 9 hours.

Outline of the full text:

  • Hall 1 (Foundational Models): The Primary Driving Force of AI

    • Video Generation: From Single Generation to Breakthroughs in Diverse Scenarios
      • From Text-to-Video to Multi-Modal Input Generation
      • Motion Reference Generation: From Static Images to Dynamic Videos
      • Digital Human Technology Based on Lip Sync and Video Generation
    • Speech Recognition and Synthesis
      • Speech Recognition Technology
      • Speech Synthesis Technology
      • Music Synthesis Technology
      • Future Directions: Multi-Modal End-to-End Models
    • Agent Technology
    • Inference Technology: The Technological Driving Force Behind a Hundredfold Cost Reduction
  • Hall 3 (Applications): AI Moving from Demo to Various Industries

    • AI-Generated Design: A New Paradigm of Generative AI
      • PPT Generation (Tongyi Qianwen)
      • Chat Assistant with Rich Text and Images (Kimi’s Mermaid Diagram)
      • Displaying Generated Content in Image Form (New Interpretation of Chinese)
      • Design Draft Generation (Motiff)
      • Application Prototype Generation (Anthropic Claude)
    • Intelligent Consumer Electronics: High Expectations, Slow Progress
    • AI-Assisted Operations: From Hotspot Information Push to Fan Interaction
    • Disruptive Applications of AI in Education: From Personalized to Contextual Learning
  • Hall 2 (Computing Infrastructure): The Computing Power Foundation of AI

    • CXL Architecture: Efficient Integration of Cloud Resources
    • Cloud Computing and High-Density Servers: Optimization of Computing Power Clusters
    • Cloud Native and Serverless
    • Confidential Computing: Data Security and Trust Transfer in the AI Era
  • Conclusion: Two Bitter Lessons in Foundational Models, Computing Power, and Applications

    • The Three Exhibition Halls of the Yunqi Conference Reflect Two Bitter Lessons
    • Lesson One: Foundational Models are Key to AI Applications
    • Lesson Two: Computing Power is Key to Foundational Models
Read More

2024-09-18
Why American Tech Giants Don't Need 996

Why don’t American internet companies need 996, and yet have higher per capita output?

Many people simply attribute it to social culture “involution” and insufficient enforcement of the eight-hour workday, but I don’t think these are the main reasons. Many companies with overseas operations don’t implement 996 for their overseas teams, and they don’t even require clocking in, but their domestic teams still need 996. Why is that?

As a programmer who has some understanding of both domestic and American companies, I believe the main reasons are as follows:

  1. Higher customer unit price for American companies
  2. Lower time requirements for manual services from American customers
  3. Higher code quality of junior programmers in American companies
  4. Lower management costs for American companies
  5. Better use of tools and SaaS services by American companies
  6. Clearer goals and boundaries for American companies
  7. A few 007 heroes in American companies carrying the load

Higher Customer Unit Price for American Companies

A person with similar abilities, working the same amount of time, is likely to generate more revenue and profit in an American company than in a Chinese company. The reason lies in the customer unit price.

Read More

2024-09-14
Cursor: Writing 800 Lines of Code in 2 Hours to Develop an AI Course Selection Assistant

Switching from IDE and Vim to Cursor

Previously, I used JetBrains series IDEs (PyCharm, CLion) for larger projects and vim for smaller ones. The most annoying part of developing larger projects is writing boilerplate code. Most of the time is not spent thinking about the design of functionalities or algorithms but on boilerplate code.

Cursor is an AI-assisted programming IDE similar to GitHub co-pilot, with an interface quite similar to VS Code. When Cursor was first open-sourced in 2023, I started using it, but it wasn’t particularly useful due to the limitations of the foundational model at that time. After GPT-4o was released in May this year, I started using Cursor again and found it more convenient than asking code questions in ChatGPT. Firstly, there is no need to switch windows back and forth, and secondly, Cursor has context, making queries more efficient.

In the past three months, with the more powerful coding capabilities of Claude 3.5 Sonnet, I have completely switched from PyCharm and Vim to Cursor because Cursor’s development efficiency is much higher than PyCharm with AI completion features, doubling the overall development efficiency. My GitHub has also easily stayed all green in the past three months.

GitHub has stayed all green in the past three months

Cursor can help quickly get started with new languages and frameworks

Cursor is not only useful for improving development efficiency but also for quickly familiarizing ourselves with new programming languages, frameworks, and tech stacks. For example, writing backends in Go, frontends in React, and smart contracts in Solidity were all new to me, but with AI-assisted programming, these are not difficult. If I had such powerful AI when I was in school, I could have learned many more programming skills.

Read More

2024-09-13
OpenAI o1: The Powerful Reasoning Ability of Slow Thinking

Rumors about OpenAI o1 started with last year’s Q*, and this year’s Strawberry fueled the fire again. Apart from the name o1, most of the content has already been speculated: using reinforcement learning methods to teach large models more efficient Chain-of-Thought thinking, significantly enhancing the model’s reasoning ability.

I won’t repeat OpenAI’s official test data here. From my experience, the overall effect is very good, and the claims are not exaggerated.

  • It can score over 120 points on the 2024 college entrance exam math paper (out of 150) and completed the test in just 10 minutes.
  • It can solve elementary school math competition problems correctly, thinking of both standard equation methods and “clever solutions” suitable for elementary students.
  • It can solve previously challenging problems for large models, such as determining whether 3.8 or 3.11 is larger, whether Pi or 3.1416 is larger, and how many r’s are in “strawberry.”
  • In programming, it can independently complete the development of a demo project, seemingly stronger in coding ability than the current best, Claude 3.5 Sonnet.
  • An example in the OpenAI o1 System Card shows that when solving a CTF problem, the remote verification environment’s container broke, and o1-preview found a vulnerability in the competition platform, started a new container, and directly read the flag. Although OpenAI intended to highlight AI’s security risks, this also demonstrates o1’s ability to actively interact with the environment to solve problems.

Some say that OpenAI has created such a powerful model that the gap with other companies has widened, making small companies unnecessary. I believe the situation is quite the opposite. For AI companies and academia without the capability to train foundational models themselves, as well as AI Infra companies and AI Agent companies, this is optimistic news.

Read More

2024-08-10
AI Generated Qixi Video

The Qixi gift I made for my wife: an AI-generated video composed of 25 AI-generated 5-second clips and a piece of AI-generated music. Most of these videos were created using our static photos combined with textual descriptions of actions, some of which are quirky movements; some were generated by overlaying our photos onto other landscape pictures.

The cost of generating the video was about 10 dollars. Although the result is not as good as Sora and has many obvious physical inaccuracies, it is much better than last year’s open-source models like Stable Video Diffusion, and the consistency with the reference images has also improved.

(Video 02:02, 44 MB)

Read More

2024-07-21
Three Sources of Trust in Web3: From Trusting People to Trusting Math, Community, and Economics

The two biggest advantages of Web3 are tokenomics and trust. Tokenomics addresses the issue of profit distribution. This article mainly discusses the issue of trust.

The essence of traditional Web2 trust is trust in people. I dare to store my data with Apple and Google because I believe they won’t sell my data. I dare to anonymously complain about my company on platforms like Maimai because I trust they won’t leak my identity. But obviously, in the face of profit, people are not always trustworthy.

How can Web3 better solve the trust issue? I believe that Web3’s trust comes from three main sources: Cryptographic Trust, Decentralized Trust, and Economic Trust.

The essence of cryptographic trust is trusting math, the essence of decentralized trust is trusting that the majority won’t collude to do evil, and the essence of economic trust is trusting that the majority won’t engage in unprofitable trades. Therefore, these three types of trust decrease in reliability.

So why not just use cryptographic trust? Because many problems cannot be solved by cryptographic trust alone. Although these three types of trust decrease in reliability, their application scope increases.

Next, we will introduce these three types of trust one by one.

Cryptographic Trust

  • How can I prove my identity without revealing who I am? For example, Maimai needs to verify that I am a member of a certain company, but I don’t want to reveal my exact identity to Maimai. Is this possible?
  • How can online games with randomness ensure fairness? For example, how can a Texas Hold’em platform prove that its dealing is absolutely fair and that the dealer is not secretly looking at the cards?
Read More
RSS