Making Friends with Foundational Model Companies—Six Forks Podcast
Original podcast content: Six Forks Podcast “R&D Positions Must Embrace AI, Then Spend the Remaining Time Doing Experiments—A Conversation with Huawei’s First Batch of Genius Youth Li Bojie”
The following content is approximately 35,000 words, organized by the author with AI assistance from the podcast content. Thanks to Hunter Leslie for the wonderful interview and post-production; the 2-hour session was a blast and was recorded in one take. Thanks also to AI for letting me organize 30,000 words of content in an afternoon and supplement it with material I had written previously.
Core Points Summary:
- The AI scenarios in sci-fi works like “Her” and “Black Mirror” have already been realized or are close to realization; turning sci-fi into reality will undoubtedly have immense value.
- Model capabilities are rapidly increasing, and small AI companies should make friends with foundational model companies rather than embellishing or wrapping models.
- “20% projects” have a relatively high success rate: start with an interest project in your spare time that addresses a need from your own work and life, and if the need turns out to be general, expand it into a commercial project.
- Many performance issues in AI applications are not model problems but should be solved with system optimization based on first principles.
- A lot of work in the AI industry has not been published or open-sourced, creating a huge information gap.
- The information gap in modern society is enormous; AI interacting more with users can understand everyone’s knowledge boundaries, greatly improving recommendation efficiency and helping to bridge the information gap.
- OpenAI o1’s strong reasoning ability is crucial for the reliability of model applications in serious scenarios.
- For most users’ daily life needs, the most capable models are already sufficient; the focus is on reducing costs. AGI might be very expensive, mainly used to solve the most important problems in human science.
- Limited energy and chip manufacturing capabilities are major challenges for AGI.
- Startups need to recruit people with solid computer science knowledge, strong learning ability, and strong self-drive.
- AI-assisted programming can significantly enhance programmers’ work efficiency, freeing up time for exploring “20% projects” or achieving a better work-life balance.
- After AI improves efficiency, it will bring more demand, turning more needs into reality, and even independent developers can complete work that previously required a team.
- A person’s career is composed of a series of projects, and it’s important that each project has an impact. Different projects are suitable for different approaches, including startups, small and beautiful companies, communities, academic projects, etc.
Full Text:
Hunter Leslie: Welcome to Six Forks, I’m Hunter. Today we’re talking about R&D-related matters, and our guest is Li Bojie. Bojie is a joint-trained student of USTC and Microsoft, one of Huawei’s first batch of genius youth, and in just three years at Huawei, he became a level 20 senior expert. In July 2023, driven by his belief in AI, he started a business in the fields of large models and Web3. Bojie, please say hello to everyone.
Li Bojie: Hello everyone, my name is Li Bojie. I was an undergraduate at USTC in 2010, then a PhD student at USTC and MSRA (Microsoft Research Asia) in 2014, and one of Huawei’s first batch of genius youth in 2019. In 2023, I left Huawei to start a business with my classmates.
Hunter Leslie: Yes, exactly. So the first question I really want to ask is, you see, in 2019, the genius youth joined Huawei, and in two or three years reached level 20. Having worked at Huawei, I know that this level is very difficult to achieve. Everything seemed to be going smoothly, so why did you suddenly start a business? Because at this stage, being able to advance so quickly within a platform is actually a very difficult thing.
Li Bojie: If I were to sum it up in one sentence, starting a business is about wanting to experience a different life and enabling AI to better benefit humanity.
If I elaborate a bit more, I can tell the story of how I first got involved with AI. Initially I was doing systems research, and as an undergraduate I didn’t understand AI. When I got to MSRA, which is considered the best AI lab in China and is often called the “Huangpu Military Academy” of AI, I was exposed to a lot of AI-related work even though I was doing systems and networking. But I didn’t learn AI algorithms at first, because many of us on the systems side felt that AI only had as much intelligence as the human effort put into it. Why? Because AI was still quite dumb at the time: it couldn’t truly understand natural language, it only captured patterns between input and output data, and whether it truly understood anything was questionable.
I think it was in early 2017 when a lecture changed all my views. I was at MSRA, and I can’t remember which professor it was, but they talked about two movies, both from 2013: one called “Her” and another episode from “Black Mirror” (Be Right Back).
The first, “Her,” as many might know now, is about a general AI assistant that can listen, see, and speak, helping you operate a computer and complete daily tasks, make phone calls, solve social anxiety issues, and provide emotional value. The male protagonist, going through a divorce, finds emotional solace in the AI and eventually falls in love with it.
The other, the “Black Mirror” episode, tells a different story. The female protagonist’s partner dies, and she then discovers she is pregnant. A friend recommends an AI digital clone of him, initially just text chat trained on his online data. Later it upgrades to voice after she uploads some videos, and eventually she orders a physical robot resembling her deceased partner, and they continue living together. It raises the ethical question of whether AI can replace a real person, which remains difficult.
These two movies were recommended by the professor, and after watching them, I was deeply moved. MSRA had many studies on text and voice processing, and it seemed technically feasible, especially the text chat in “Black Mirror,” which MSRA’s technology could achieve even in 2017. So I wondered if I could train something using my chat logs, as I had just broken up with my girlfriend and had 100,000 chat records. I tried to see if I could train something from them.
But I didn’t know AI, and our group didn’t have GPUs since we were focused on systems and networks. Coincidentally, cryptocurrency mining was booming, and Bitcoin prices were soaring. I realized that mining GPUs were becoming expensive, and buying some might even be profitable. So I spent tens of thousands on dozens of cards, like the old 980 and 1080Ti, as it was 2017. I rented a basement in Beijing…
Hunter Leslie: Were you still in school then?
Li Bojie: Yes, I was still in school. I was a joint student at USTC, but I spent most of my time in Beijing at MSRA. I found a cheap basement and ran an electric line because the machines consumed too much power for regular wiring. I also set up some fans to prevent it from becoming an oven. I assembled some cases and put the GPUs in. Most of the time, they weren’t training models because I didn’t have much time; they were mainly mining. When I sold the machines, I found that the second-hand ones were more expensive than the original price due to the Bitcoin surge, so I made a small profit.
Meanwhile, I trained models occasionally, but my skills were limited and I didn’t understand AI well, so I could only use other people’s models, which didn’t work well. In 2017, Transformer-based models were barely on the scene, so the AI models available were quite old and ineffective.
Later, I learned about the Microsoft Xiaoice team, a famous chatbot since 2013. In 2014, they even hired a celebrity as a product manager, making it very popular. Xiaoice had many capabilities, including text chat, voice, riddles, couplets, and poetry, so I learned a lot from that team, gaining some understanding of AI.
Now, seven years later, in 2024, I find that the things we couldn’t solve before are now mostly solvable because AI models have developed rapidly. Whether it’s voice or text, these are no longer issues. As mentioned earlier, whether it’s the virtual assistant scenario in “Her” or the digital clone in “Black Mirror,” they are now feasible, and our company has the technology.
Even the scenarios we can’t solve yet, like creating a robot identical to a boyfriend in “Black Mirror,” which I thought would take 20 years, now seem achievable within five years due to the rapid development of embodied intelligence.
So I think it’s truly a great era where many sci-fi movie scenarios can become reality. We’ve seen many sci-fi movies, like Avatar and Marvel, involving physical laws or mechanical limitations that are hard to overcome in the short term. But with AI, these movie scenarios are either already reality or could become reality in the near future. I believe AI is incredibly exciting because sci-fi movies represent humanity’s aspirations for future technology. Turning sci-fi into reality will undoubtedly have immense commercial value.
Hunter Leslie: Yes, because you just mentioned embodied intelligence, I think there’s still a lack of consensus among entrepreneurs and investors, including some scholars. People think that besides the brain, you need a body, and part of the control algorithm, but it doesn’t seem easy. Even with strong systems like Google’s RT2. So you think it might take about four to five years, but there’s also a mainstream view that it might take ten years or more. Elon Musk’s timeline is around 2030, with about 100 billion robots. Would you say you’re relatively optimistic?
Li Bojie: Five years actually aligns pretty well with Musk’s expectations; I’m not overly optimistic either. I think there are just two major events in the next 5 years: one is AGI, where AI’s capabilities reach or surpass general human intelligence; the second is embodied intelligence, where humanoid robots become commercially viable.
Hunter Leslie: But AGI might happen in the next year or two.
Li Bojie: Yes, AGI might progress a bit faster in the next couple of years. Our company doesn’t work on embodied intelligence, and I’m not very knowledgeable about it, so I’ll just make some bold statements. I feel that the most challenging aspect might still be the latency of foundational models because current embodied intelligence still uses non-large model approaches, relying on traditional reinforcement learning methods for control. As for large models, the main issue is that their latency is still too high, making it difficult to achieve precise low-latency control at the millisecond level. However, model advancements are happening very quickly, right? We might discuss this further later, as model latency is one of the key areas we’re focusing on.
Another point is that I feel the mechanics of robots are actually quite ready because I’ve seen many robot manufacturers do demos where they have someone remotely controlling the robot from behind, and the remote control works quite well. So, I feel that the final gap really lies in AI; once AI is sorted out, embodied intelligence will naturally be resolved.
Although I don’t have the capability to work on embodied intelligence and currently have no plans to touch this area, I believe its potential is enormous. As we mentioned earlier, the AI part in sci-fi movies is already on its way to becoming a reality, while the mechanical and space exploration parts are tasks for embodied intelligence. The bottleneck for embodied intelligence currently seems to be AI.
A quote from Liu Cixin deeply moved me: “The promised stars and sea, yet you only gave me Facebook… From a long-term perspective, in countless possible futures, no matter how prosperous Earth becomes, those futures without space travel are dim.” Why has the world turned out as Liu described? Traveling to other planets is almost everyone’s common dream. Why does capital flow into the internet and AI but not as much into manned spaceflight? Because in recent decades, there hasn’t been a major breakthrough in energy technology, and the vast distances and strong gravity of the universe have become nearly insurmountable obstacles for human physical exploration.
But we know that the speed of information transmission is the speed of light, making vast distances not seem so unreachable. Even if information requires a physical carrier, embodied intelligence might be more suitable for space environments than the human body. I believe AI is currently the most feasible technological route to spread human civilization into the depths of the universe. If AI can carry human intelligence on chips and survive, reproduce, and evolve autonomously, then why can’t chips be another form of life? Life has evolved significantly in form to adapt to environments, from oceans to land. Why can’t adapting to the cosmic environment be another evolution of life? I don’t wish for humans on Earth to be replaced by AI, but why must life in space and on other planets take the form of human bodies?
Therefore, even though some people say AI is a bubble, and some say AI products are hard to monetize, I don’t care about these. As long as what I do can help humanity realize the scenes in sci-fi movies, I have immense passion.
Hunter Leslie: Is your current company called Logenic?
Li Bojie: Actually, Logenic was our earliest name, and we haven’t used that name for a long time.
Hunter Leslie: So, I’m not sure if you can talk about this. For example, what is your current company doing, and what problem do you want to solve?
Li Bojie: Actually, the name Logenic was something my co-founder Siyuan and I came up with together, but we didn’t really think through what we wanted to do at the time, so we just picked a name. After naming it, we kept changing directions, and after changing directions, we stopped using that name because Logenic felt too generic and lacked specificity. Later, we switched to a more focused name, but that new name hasn’t been publicized or disclosed.
I think there’s a saying by Lei Jun that I find quite insightful. He said that in the early stages of a startup, don’t make a big fuss about promoting the personal relationship between the entrepreneur and the company. Why? He said that when he founded Xiaomi, he had already succeeded in entrepreneurship with Kingsoft, and people’s expectations of him were very high. If he started again with MIUI, starting small, two problems might arise.
First, the company’s team might leverage his reputation for promotion, leading to two reactions when people hear about Lei Jun’s work. The first reaction is, “How could someone as impressive as Lei Jun come up with something as simple as MIUI?” The second reaction is, “Lei Jun’s work must be amazing, I’ll use it without thinking.” This way, people actually ignore whether the product itself is good or not and don’t care.
Additionally, he has many resources, so he might directly buy traffic, right? Many foundational model companies do this now, spending 15 or 20 yuan per user to buy traffic, and suddenly gaining 10 million users. But by the second month, 95% of those users are gone. I think this kind of thing is mostly a waste of money or convenient for fundraising, but not much else. So, we haven’t promoted this aspect ourselves. But I think this might just be my personal opinion, and it might not be good because most people still go for this rapid scaling approach, right?
Hunter Leslie: Yes, yes, yes, like the “six little tigers,” right? Very aggressive. So, if this project is currently in a relatively confidential stage, looking at the medium to long term, maybe three to five years, what problem do you want to solve with AI in this wave of entrepreneurship?
Li Bojie: Before talking about the problem I want to solve, I think I should first share my thoughts on what small-scale startups like ours should do.
This year, I think the biggest realization is that small-scale startups must make friends with foundational model companies, not enemies.
Foundational model companies are those that work on pre-trained foundational models, like OpenAI, Anthropic, or the domestic “six little tigers” you mentioned. These companies have enormous resources, with a single funding round possibly exceeding $1 billion, which lets them explore AGI. We know models follow a scaling law: the larger the model, the higher its performance ceiling. So foundational model companies can explore AGI, while small companies like ours find that very hard. To work on AGI now, $1 billion might not be enough; it might take $100 billion or $1 trillion. Such resources are clearly beyond entrepreneurs at our level, so competing with them on model capability is very risky.
Another situation is if I work on applications, but applications are built on top of existing models. This phenomenon is also risky because you often see a foundational model company, like OpenAI, release a new model, and suddenly a bunch of startups are wiped out. The problem is that you’re actually competing in the same track as them. Many people say that every time OpenAI holds a conference, a wave of startups dies. This shows that these companies are actually making enemies with foundational model companies.
But I think AI is still in the rapid ascent phase of the S-curve, and the capabilities of foundational models are rapidly improving. At this time, if I only do a little packaging on top of it, make some small engineering optimizations…
Hunter Leslie: Wrapping?
Li Bojie: It’s hard to have a moat because it gets replaced quickly. I have a deep understanding of this because I’ve made this mistake myself.
Last year, our team initially did a lot of fine-tuning work, one for voice fine-tuning and one for text fine-tuning. Voice fine-tuning meant we wanted to create celebrity voices, like Musk, Trump…
Hunter Leslie: Guo Degang
Li Bojie: For example, if I wanted to create a Guo Degang voice pack, I’d download a bunch of Guo Degang’s voice recordings and tune them until everything he said sounded like Guo Degang. But this required very high-quality voice downloads. If it’s Guo Degang, it’s fine because he’s a comedian. But if it’s Musk, who speaks hesitantly, and the YouTube video quality isn’t high, the downloaded voice quality isn’t clean, and it often crashes during training. After crashing, there are many corner cases that are hard to solve, so the final result wasn’t good.
How was this problem eventually solved? This year, new models emerged with zero-shot learning, meaning I only need to upload a one-minute voice clip, and it doesn’t matter if there’s background noise or stuttering. It doesn’t matter what kind of voice it is; it can mimic it, even stuttering.
Hunter Leslie: I saw ByteDance also released such a product, and it works well.
Li Bojie: Actually, some open-source ones work even better. For example, I like an open-source project called Fish Speech, from Fish Audio. You upload a one-minute voice clip and it handles everything for you. Of course, it’s not perfect and there’s still room for improvement, but it’s commercially usable.
That’s the first fine-tuning thing, which is essentially an improvement on foundational models, and many previous engineering optimizations are no longer needed. The second thing is text fine-tuning. At that time, we believed in one thing: I have a small model, and I do some fine-tuning. Fine-tuning means, for example, Trump speaks amusingly, right? So, I want to create a model that mimics Trump’s speech, and I gather a lot of Trump’s speech data, and it really mimics Trump’s style. But after achieving that, the problem is that the fine-tuned model often loses some of its original capabilities. For example, it can mimic Trump’s speech, but it might not even be able to solve a simple elementary math problem. There are many such issues that are hard to solve.
Also, about the models back then: when we first started, we used LLaMA 1, the earliest open-source model. My co-founder worked on Vicuna at Berkeley, which was the first open-source dialogue model based on LLaMA. But because the foundational model’s capabilities were lacking, that model often started talking nonsense after 20 rounds of dialogue, not knowing what to say. That wasn’t something simple fine-tuning could solve.
But now, with the same cost, models at the 7B or 8B level don’t have this problem. Whether it’s domestic Qwen 2.5 or overseas LLaMA 3.1, or the newly released Yi Lightning, they don’t have this issue. So, with the improvement of foundational model capabilities, fine-tuning might not even be necessary. I just need to set the character, like putting Trump’s speech characteristics into a prompt, and using the best models now, like Yi Lightning, OpenAI’s latest GPT-4o, or Claude 3.5 Sonnet, they can all handle it without needing fine-tuning.
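For illustration, here is a minimal sketch of the “character in the prompt instead of fine-tuning” idea, using the OpenAI Python SDK. The persona text is made up, and any of the chat models mentioned above would work the same way through an OpenAI-compatible API; this is just one way to do it, not a claim about our product.

```python
# Sketch: set a persona via the system prompt, no fine-tuning needed.
# Assumes the official OpenAI Python SDK and an API key in OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

persona = (
    "You speak in the style of a bombastic public figure: short punchy sentences, "
    "superlatives, frequent self-praise. Stay in character, but answer correctly."
)

resp = client.chat.completions.create(
    model="gpt-4o",  # or any chat model exposed through an OpenAI-compatible API
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": "Explain in two sentences why the sky is blue."},
    ],
)
print(resp.choices[0].message.content)
```

Because the underlying model keeps all of its general abilities, the “mimics Trump but can’t do elementary math” problem that fine-tuning caused simply doesn’t arise.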
I just talked a lot, and it might have been a bit verbose. The point I wanted to make is that the fine-tuning and engineering changes we make can easily be overshadowed by the advancements in foundational models. This means that what we’re doing is essentially working against foundational model companies.
So, I’m thinking about how to become friends with foundational model companies. I currently have two ideas. The first is to optimize systems. Companies like OpenAI or Anthropic focus on improving algorithms within the model, but there are many things outside the model in a complete application that need improvement, which I’ll discuss in more detail later.
The second thing is that most people, including those in the AI industry, are not well-informed about AI models’ new developments, their boundaries, when to use which model, and how to write prompts. Bridging this information gap is also crucial.
Hunter Leslie: So, are you planning to focus mainly on these two directions in the future?
Li Bojie: Yes, I plan to focus mainly on these two directions in the future.
Specifically, the first thing is system optimization based on first principles.
From my undergraduate days tinkering with systems in the Linux Association to my Ph.D. research on high-performance data center systems at MSRA, where I was featured in a special report “Tinkering with Systems to Improve Performance by 10 Times,” to my work at Huawei on system performance optimization, I’ve developed a habit of thinking based on first principles. I consider what the application’s performance should be based on hardware capabilities, what it currently is, and the reasons for the gap. First principles thinking is not only used by Musk but also advocated by Google’s Jeff Dean.
When many people talk about system optimization, they think of AI training and inference optimization, which is highly competitive. Many believe that training and inference optimization mainly involves optimizing CUDA operators and designing new Attention algorithms or position encoding, but that’s not the case. There are many areas in the system worth optimizing beyond the model itself.
Recently, something struck me. We calculated that using H100 machines to serve a 70B model and sell the API would lose money at current market API prices. Are these companies really running the business at a loss? In September, Berkeley’s community version of vLLM, 0.6, improved performance by 2.7 times. Why? In previous versions, only 38% of the time was spent on GPU computation; the rest was wasted on the HTTP API server and scheduling, including contention on Python’s GIL (global interpreter lock). The 2.7x improvement didn’t come from CUDA kernels, attention algorithms, or quantization, but from the seemingly insignificant HTTP server and scheduler. Friends at several companies told me they had already made these optimizations internally, plus some the vLLM community version still hasn’t done, so their actual inference performance is much higher than the community version’s.
If we look beyond model inference and consider the entire AI application end-to-end, we’ll find many critical points for AI applications that no one is addressing.
For example, API call latency is crucial for real-time interactive applications, but many large-model, speech-synthesis, and speech-recognition APIs have high first-token latency (TTFT). For instance, Fish Speech, an open-source speech-synthesis project I like, has a server on the US East Coast: first-token latency is only 200 milliseconds when called from the East Coast, but over 600 milliseconds from the West Coast. A ping from the West Coast to the East Coast is only 75 milliseconds, and data-center bandwidth is fast, so the 500 KB of synthesized speech should theoretically take only about 5 milliseconds to transmit. So why is the latency 600 milliseconds rather than the roughly 300 milliseconds you would expect? One reason is TCP slow start; another is the multiple round trips of connection establishment. These are basic wide-area network optimizations, and even though optimizing them won’t produce great papers, neither Google nor Cloudflare has done it, nor have I seen anyone else do it.
The OpenAI API is similar: first-token latency is only about 400 milliseconds on the West Coast but up to 1 second from Asia. The network latency between the West Coast and Asia is only about 200 milliseconds, so where did the other 400 milliseconds go? Again, connection-establishment overhead. OpenAI uses Cloudflare, and the Asian access point is actually a Cloudflare Asian edge IP, which shows that even a service as large as Cloudflare has significant room for optimization. And is this API latency unique to AI? No, other APIs are the same. The issue has existed since the beginning of the internet, for decades, with known solutions in academia and many large companies quietly using them, yet most people are unaware.
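A minimal sketch of the kind of fix being implied here: keep one warm connection and reuse it, so later requests skip the TCP and TLS handshakes (TLS session resumption and HTTP/3 0-RTT go further, but plain connection reuse already removes most of the gap). The URL is only illustrative; any HTTPS endpoint shows the effect.

```python
# Sketch: reuse one HTTP connection so only the first request pays TCP/TLS setup.
# Requires the httpx library; keep-alive connection pooling is on by default.
import time
import httpx

client = httpx.Client()          # long-lived client = warm connection pool

def ttfb(url: str) -> float:
    """Milliseconds until the first response bytes arrive (a proxy for TTFT)."""
    start = time.perf_counter()
    with client.stream("GET", url) as resp:
        next(resp.iter_bytes(), b"")   # wait for the first chunk only
    return (time.perf_counter() - start) * 1000

url = "https://api.openai.com/v1/models"   # illustrative endpoint
print("cold:", round(ttfb(url)), "ms")     # pays DNS + TCP + TLS handshakes
print("warm:", round(ttfb(url)), "ms")     # reuses the same connection
```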
Another example I want to share is voice calls. When we first made a demo of voice calls at the end of last year, the latency was as high as 5 seconds. ChatGPT’s earliest voice call feature also had this level of latency. We analyzed what was slow, such as AI algorithms, the theoretical inference time based on model computation and memory access, compared with GPU performance metrics, and found that the actual time was more than 10 times longer, indicating over 10 times optimization space. We also looked at how much time was wasted on network protocols, database access, and client-side, gradually reducing it from 5 seconds to 2.5 seconds, 2 seconds, 1.5 seconds, 1 second, 750 milliseconds, until now we can achieve end-to-end 500-600 milliseconds, faster than any other I’ve seen, and the entire system can run on a single 4090.
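As an illustration of this kind of first-principles accounting, here is a minimal sketch of attributing one voice-call turn’s latency to its stages. The three stage functions are stand-ins that just sleep; they are not a real ASR/LLM/TTS API, and the numbers are placeholders.

```python
# Sketch: attribute the end-to-end latency of one voice-call turn to its stages.
# The stage functions below are placeholders that just sleep; swap in real
# ASR / LLM / TTS calls to get actual numbers.
import time

def asr(audio):             # placeholder speech recognition
    time.sleep(0.15); return "hello"

def llm_first_token(text):  # placeholder LLM call, measured to first token
    time.sleep(0.25); return "hi"

def tts_first_audio(text):  # placeholder speech synthesis, to first audio chunk
    time.sleep(0.12); return b"\x00" * 320

def timed(name, fn, arg, timings):
    start = time.perf_counter()
    out = fn(arg)
    timings[name] = (time.perf_counter() - start) * 1000  # milliseconds
    return out

timings = {}
text  = timed("asr", asr, b"...", timings)
token = timed("llm_first_token", llm_first_token, text, timings)
audio = timed("tts_first_audio", tts_first_audio, token, timings)

# Compare each number against what the model's FLOPs and memory traffic say it
# *should* take on your hardware; the difference is the optimization budget.
print(timings, "total ms:", round(sum(timings.values())))
```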
Moreover, because we use the latest open-source voice-cloning models, the AI can mimic anyone’s voice. I can use any favorite person or game character as a voice pack, and even bring Trump and Musk into a group chat with me. This is something OpenAI absolutely cannot do. It’s not that OpenAI can’t do voice cloning, but its scale is too large and the copyright risk too high. OpenAI ran into legal trouble just because GPT-4o’s voice sounded like the female lead in “Her.” If all celebrity voices could be freely mimicked, OpenAI would definitely be in trouble.
Such an optimized voice call that can mimic anyone’s voice costs less than 3 cents per hour, while OpenAI’s latest realtime API costs 120 yuan per hour, with only a few fixed voices and higher end-to-end latency than ours. If you use OpenAI’s speech recognition, large model, and speech synthesis to assemble a system, it would cost 6 yuan per hour, with end-to-end latency definitely over 2 seconds. After using a 500-millisecond voice call, going back to a 2-second one feels like it’s broken. What does 3 cents per hour mean? Watching an hour of high-definition video on Bilibili might cost more than 3 cents in bandwidth. We can look at Tencent Cloud and Agora’s RTC service pricing; even WeChat voice call services sold externally are priced close to 3 cents per hour. This means that the cost of large models is no longer an issue, and many applications no longer need users to pay for subscriptions; they can use the internet model.
I think many people, especially those working on algorithms, only care about model performance metrics but not system performance metrics. But in real business scenarios, performance metrics are often crucial. For example, many companies providing large model API services don’t pay attention to TTFT (first-word latency), but it’s critical for AI Agents. For instance, Claude 3.5’s Computer Use is still relatively slow, and other RPA Agents that fully use large models to operate phones and computers take four to five seconds to respond, which is a latency issue. Embodied intelligence robots now find it difficult to use large models directly for control, also due to latency. If an application only calls a large model once end-to-end and does nothing else, a one-second first-word latency might be acceptable. But in actual scenarios, it’s all about Agentic workflow, and with such high single-call latency, the total adds up to be very slow.
What is an agentic workflow? A simple example is the AI search everyone is working on. Perplexity can produce results in about 1 second, but most AI search applications take 4 to 5 seconds. Why so slow? Because I first need to convert the user’s question into a few search keywords; searching Google directly with a long sentence usually gives poor results. That means calling a large model, like OpenAI’s GPT-4o mini, which takes five or six hundred milliseconds. Then I call Google Search, which itself is fast at 0.2 seconds, but the official Google Search API is limited to 10,000 calls per day, and third-party search APIs take three to four seconds. Google Search only returns links and content snippets, so I need to download the full web pages, which takes about 2 seconds for five or six pages. Finally, I call the large model to generate the answer, with a long context and a first-token latency of about one second. It all adds up to roughly 4 seconds.
How do you get under 1 second like Perplexity? First, deploying the model locally speeds up keyword extraction and the final answer. The main delay is downloading the web pages, which are just HTML files; why should that take so long? It’s entirely a network-optimization problem, and for most websites 0.5 seconds is enough. That brings the total under 1 second. Of course, Perplexity may rely on caching and may not do it the way I described, but my point is that it can be done without caching. The slow third-party Google Search APIs are slow because their crawlers aren’t optimized; optimized, they would be no slower than calling Google directly. So after all this, do you see anything AI-specific here? It’s all system optimization.
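A sketch of the agentic search workflow just described, with the two biggest wins: a locally served small model for keyword extraction and answer synthesis, and concurrent page downloads. `local_llm()` and `search_web()` are placeholders standing in for a local model and a search API, not real libraries.

```python
# Sketch of the AI-search pipeline: keywords -> search -> parallel fetch -> answer.
# local_llm() and search_web() are placeholders, not real APIs.
import asyncio
import httpx

async def local_llm(prompt: str) -> str:            # placeholder for a local model
    return prompt.splitlines()[0][:80]

async def search_web(keywords: str) -> list[str]:   # placeholder for a search API
    return ["https://example.com", "https://example.org"]

async def fetch(client: httpx.AsyncClient, url: str) -> str:
    resp = await client.get(url, timeout=2.0)        # cap each page download
    return resp.text

async def answer(question: str) -> str:
    keywords = await local_llm(f"Search keywords for: {question}")
    urls = await search_web(keywords)
    async with httpx.AsyncClient(follow_redirects=True) as client:
        # the big win: download all pages in parallel instead of one by one
        pages = await asyncio.gather(*(fetch(client, u) for u in urls[:6]),
                                     return_exceptions=True)
    context = "\n\n".join(p for p in pages if isinstance(p, str))
    return await local_llm(f"Answer '{question}' using:\n{context}")

print(asyncio.run(answer("Why is the sky blue?")))
```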
The second thing is bridging the information gap.
I find that many things I consider common knowledge in the industry are unknown even to some people in the same industry. For example, some people think Anthropic was the first to make AI operate a computer, but AI operating computers and phones has been around for years, with many RPA solutions. Anthropic improved one benchmark from 7.9% to 14.9%, a score that has since been surpassed, while humans are at about 75%.
There’s also the issue of this year’s Nobel Prize in Physics. Many people ask, what does the Boltzmann machine have to do with large models? When GPT-4o realtime API was released, many people tried it and said AI could finally make calls. I reminded them to be careful not to blow up their bills. Some people don’t know that Claude 3.5 Sonnet is currently the most capable model for general programming.
AI can be very helpful in bridging the information gap. We know that previously people searched for information, such as through websites and search engines. Then came the era of information finding people, like recommendation systems. Now AI can generate unique information for each individual. After interacting with you more, AI understands your knowledge boundaries, which significantly increases the efficiency of recommendations.
Speaking of bridging the information gap, I remember an interesting project I participated in during school, the USTC Course Review Community, which is a website where students review courses. The reason for its creation was that my girlfriend at the time didn’t know which course to choose and had no idea where to find relevant information. So, she wanted to create a website for students to share their course experiences. She pulled me and my roommate in to develop this website, and now it has hundreds of thousands of visitors every month. Some students even said that after transferring from USTC to other schools, they didn’t know how to choose courses without such a review community. This is an example of bridging the information gap.
One regret I have is that much of my earlier work was never published, or was published as a paper but never open-sourced. For example, the core ideas behind FlashAttention—automatic operator fusion and loop tiling, trading recomputation against storing intermediate results—are things we had already published in the AKG automatic operator generator at PLDI 2021. I remember in 2019, when I first joined Huawei, I was responsible for writing an operator, specifically softmax. To fuse the softmax operator, an online algorithm was needed. After much searching, I finally found a 2018 NVIDIA paper proposing an online algorithm that computes the softmax normalizer in a single scan over the data. With that algorithm, combined with the AKG framework, the preceding matrix multiplication could be fused with the following softmax. At the time I didn’t even know what attention was, so I couldn’t have proposed FlashAttention. But if AKG had been open-sourced and adopted by the community back then, FlashAttention might have been invented earlier.
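For reference, a small sketch of that one-pass “online softmax” idea in plain Python (the algorithm from the 2018 NVIDIA paper, not the AKG implementation): keep a running maximum and a running sum, rescaling the sum whenever the maximum changes, so the normalizer comes out of a single scan without overflow.

```python
# Sketch: online softmax normalizer computed in one pass over the data.
import math

def online_softmax_normalizer(xs):
    m, d = float("-inf"), 0.0
    for x in xs:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)  # rescale old sum, add new term
        m = m_new
    return m, d  # softmax(x_i) = exp(x_i - m) / d

m, d = online_softmax_normalizer([1.0, 3.0, 2.0])
print([math.exp(x - m) / d for x in [1.0, 3.0, 2.0]])
```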
Similarly, with RPC, I actually developed a very high-performance RPC framework at Huawei. We mentioned earlier the issue of high API call latency between the West and East coasts of the US, and the framework I developed could solve this. But this work wasn’t worth publishing in a paper because the technologies used were already proposed by academia. It’s just that there wasn’t a good engineering implementation available. Perhaps many big companies have a lot of black technology internally for optimization, which means these technologies are locked away, creating an information gap.
During my PhD, I also worked on projects like ClickNP, a framework for developing network functions on FPGAs using high-level languages. It was just a paper at the time and wasn’t open-sourced. If it had been open-sourced, programming network functions on FPGAs in academia might have been much simpler. ClickNP was mainly used for research purposes at Microsoft, and after I stopped FPGA research, it was likely locked away. I believe this is a waste of valuable intellectual resources. To this day, there isn’t an open-source framework like ClickNP that supports developing network functions on FPGAs using high-level languages. Students in academia either have to use difficult-to-write Verilog or general HLS tools, without a framework optimized for network programming. If it had been open-sourced, many people could have continued to use and improve it, even if one day I no longer contributed to the project, as long as it wasn’t obsolete, others could continue to maintain it.
I think many information gaps exist because experts consider things too obvious to spell out, while most people don’t actually understand them. For example, the Transformer paper has a small footnote hinting at the KV cache idea. The authors probably thought it was obvious, but for most readers it wasn’t, so the KV cache had to be reinvented.
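For readers who haven’t met it, here is a toy numpy sketch of the KV-cache idea that footnote hints at (single head, no batching, my own illustration): during generation, the keys and values of earlier tokens never change, so you cache them and compute K and V only for the newest token instead of recomputing the whole prefix every step.

```python
# Toy sketch of a KV cache for one attention head during token-by-token decoding.
import numpy as np

d = 64
W_k, W_v = np.random.randn(d, d), np.random.randn(d, d)
k_cache, v_cache = [], []

def decode_step(x_new: np.ndarray, q: np.ndarray) -> np.ndarray:
    # x_new: embedding of the newest token; q: its query vector
    k_cache.append(x_new @ W_k)          # compute K, V only for the new token
    v_cache.append(x_new @ W_v)
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d)          # attend over all cached positions
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

out = decode_step(np.random.randn(d), np.random.randn(d))
print(out.shape)
```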
Some information gaps exist because most people’s attention is limited; if an article covers too many topics, the things in the corners go unnoticed. Leslie Lamport, a Turing Award winner at Microsoft Research and a pioneer of distributed systems, once told us that his most famous paper introduced the notion of relative time from relativity into distributed computer systems, proposing logical clocks. He also wrote in that paper that with such logical clocks you can totally order all input messages, and thus implement any state machine and any distributed system. Yet many people told him they never noticed the state-machine part of the paper, making him doubt his own memory. I think it’s because the concept of logical clocks is already mind-bending enough; most readers felt they had learned plenty just understanding that, and didn’t pay attention to the state-machine part.
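For context, the clock rules from that paper are tiny; here is a sketch (my own illustration, not Lamport’s code). The point he says readers missed is that once every request carries such a timestamp, all replicas can process requests in the same total order and therefore implement any state machine.

```python
# Sketch of Lamport logical clock rules: local events and sends increment the
# counter; a receive fast-forwards it to max(local, received) + 1.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self) -> int:
        self.time += 1
        return self.time

    def send(self) -> int:
        self.time += 1
        return self.time              # timestamp attached to the outgoing message

    def receive(self, msg_time: int) -> int:
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()
print(b.receive(t))  # b's clock jumps past a's timestamp, preserving causal order
```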
Since most people’s attention is limited, this reminds us that whether you are writing a paper or building a product, the focus must be sharp enough to explain in one sentence; otherwise, the good things hidden in the corners are hard to discover. A product manager told me that several recently popular apps have features that agents on ByteDance’s Coze platform already offer, but there are so many agents on Coze that users can’t tell at a glance what to do, so it gets less traffic than a single hit app.
I have another thought: bridging the information gap actually goes against the essence of business, which relies on information gaps to create a moat. Moreover, most people are lazy to think and unwilling to learn new knowledge, so bridging the information gap is inherently painful for most people. Therefore, many companies initially dedicated to bridging the information gap start with high-quality content, but after reaching a certain scale, they become vulgar and turn into time-wasting platforms. If I had to choose between founding a Wikipedia and a TikTok, I would definitely choose Wikipedia.
Hunter Leslie: I have a question because I talked to some entrepreneurial classmates before, and I find it difficult because large models iterate very quickly, covering many capabilities. Today, you have to find a direction within this, which I think is inherently challenging. How do I find a direction in such a large demand space, like you clearly say you want to do this and that? So, are there any good methods you think, for example, if I’m starting a business now, I definitely have to think about this issue, right? Are there any good methods to pinpoint something that might work?
Li Bojie: Are you saying that I first set a big framework saying I want to start a business, and now, okay, start looking for where I want to do AI, right?
Hunter Leslie: But I don’t want to be wiped out by OpenAI, so what should I do? Is there a methodology for this demand issue, do you think?
Li Bojie: At the beginning, I didn’t know what to do either; I explored slowly, adjusted directions back and forth, and stepped on many pitfalls. Later, I talked to some experienced people in entrepreneurial circles and watched interviews with outstanding founders like Zuckerberg and Lei Jun, and I found that they advocate the idea of “20% time entrepreneurship.” What does it mean? You first spend 20% of your time on something part-time, find that this side project is very popular with users, and then build on it to make it bigger and turn it into a commercial project. This has a higher chance of success. Recently, for example, Google’s hottest product, NotebookLM, which generates podcast-style audio overviews, came out of a Google 20% project. Google has spent so much money and so many people on so many products, but none of them…
Hunter Leslie: Made a comeback, right?
Li Bojie: Yes, made a comeback, and this was a 20% project.
This fundamentally raises a question, why is it difficult for big companies to innovate? We all know the difference between top-down innovation and bottom-up innovation.
Once I say I want to start a business, okay, now there’s money, people, everything is set up for you, and you have to find a nail to hit quickly, and it must be able to scale, right? At this point, sometimes actions become distorted, like I don’t know what to do, or I think something is too small to be worth doing because many things potentially have a lot of market potential or many users, but at first, you don’t know there is so much demand. So, maybe at the beginning of the discussion, you think this thing probably won’t be used by many people, so you dismiss it.
At this point, it’s easy to converge on some common needs, which are those common needs that everyone can see. In the end, it’s very likely to compete directly with big companies, like wanting to make a better ChatGPT, or a better Siri, or a better foundational model, right? Basically, it’s about tossing around a few things, which is definitely the track for direct competition with big companies.
Hunter Leslie: For example, if I’m a researcher at DeepMind today and I say I want to start a business, I understand that the initial direction might be my own interest, or I have a pain point, so I start doing this. It’s a bit like lean startup, where I spend some time making an MVP to run, see how it goes, and if it doesn’t work, I tweak it, and in the future, I might get closer to what I want to do, but I don’t necessarily invest fully from the start, right? So, it has a process, not like someone stands up and says I want to make Apple and just does it.
Li Bojie: Yes, that logic is correct because you see Facebook was also made when he was in school, right? And initially, it was about which student’s photo looked better, right? And Google wasn’t specifically started as a business; they first made a search engine algorithm in school, right? Then Larry Page and others took it out to start a business. Many other companies are similar.
Unless there’s a model like Copy To China, which can be done. It means it’s already there, right? Then I take it over, copy it, I have money to spend, and buy users and traffic. I think this can be done.
Hunter Leslie: Do you think this is feasible now? Because you see Meituan and Alibaba’s e-commerce, which originally copied eBay, during the mobile internet era, it seemed possible. Do you think today, for example, in the recruitment industry, which is quite hot recently with companies like Mercor valued at $250 million, and Final Run, if I directly copy them and do it in China, what do you think?
Li Bojie: I think this logic is still valid today. You say all the other foundational model companies are copying OpenAI, right? Not just in China, you say Anthropic was also made by people from OpenAI, right? So, OpenAI is the pioneer, and everyone else is chasing after them. But now, Anthropic seems to be running possibly faster than OpenAI, right? It’s hard to say, and you can’t say that those running behind are necessarily followers, right?
Hunter Leslie: If we were to replicate this in China with AI, what kind of adaptation do you think would be necessary? China and the U.S. are different, and user habits are different; 2C is different from 2B. If I decide to do something 2C, I might need some micro-innovation, and domestic models lag far behind the ones overseas. The environments abroad and at home differ: one difference is willingness to pay, and the other is that their models are inherently much stronger than ours. Essentially, I think the underlying model of an AI product is pretty much its lower bound, so it feels hard to build something that surpasses a similar application from Silicon Valley. That’s how I feel.
Li Bojie: Look at Baidu; its search results haven’t surpassed Google, right? But Baidu is still doing well in China, right?
So, I think the first point is that the model is sufficient in many scenarios. This also involves my judgment about the future. I think a GPT-4 level model, as long as its cost decreases a bit more—there’s already a trend of rapid decline—will be sufficient for most application scenarios. Because now, for example, GPT-4 is somewhat like a liberal arts student, right? This liberal arts student’s abilities are generally very useful for normal writing and daily tasks.
But many people haven’t used it yet, and I think there are two main reasons. The first is that the cost is still relatively high, so in many places a paywall has to be used to limit access: you can only use the best model if you pay, because otherwise the provider can’t afford to serve you. The second is that most users haven’t developed the habit yet, just like when the first iPhone came out: people who liked gadgets thought it was amazing, but most people still said their Nokia was better, right?
Hunter Leslie: There’s a first adopter issue, a problem of cultivating user habits.
Li Bojie: If its cost really drops to a very low level, and users can use it freely, then I think a GPT-4 level model is actually sufficient for most daily scenarios. In this way, most of the basic model companies in China have already reached this level. The next step is to figure out how to reduce the cost to a very low level.
The other path mentioned is moving towards AGI. I personally agree with Anthropic’s CEO. He recently published a long essay in which he argues that future AGI-level models will be very large and possibly very expensive. Such AGI might not be for ordinary people to use; instead, there will be millions of these superintelligences, smarter and more talented than any human, forming what amounts to a nation of geniuses in a data center.
This so-called nation of geniuses would be used to solve the most important problems in science: medicine, social science, natural science, biology. Many of these require lots of experiments, but humans run such experiments very inefficiently, so progress in fields like medicine, natural science, and biology is very slow. With AI, you can make many copies, equivalent to millions of top scientists doing research for you every day. So the scientific progress of the next 50 to 100 years could be compressed into 5 to 10 years: AGI is expected within about 5 years, and then in another 5 to 10 years it could give us 50 to 100 years’ worth of scientific progress.
Hunter Leslie: So, he expects that in 10 to 20 years, the average human lifespan could be expected to reach 150 years.
Li Bojie: You probably saw that too, right? Of course, that might be a bit optimistic, but I also agree with this direction. I think those models, the very impressive ones, will be very expensive. So, they are used for these high-end scientific research scenarios.
Hunter Leslie: And the next question is, because you just mentioned that some new models are constantly iterating. And then, the application still relies heavily on the underlying model. Recently, we’ve also seen that whether it’s OpenAI due to shareholder pressure or for financing, they released o1, and recently there was a leak that there might be Orion in December, which is said to have 100 times the performance of GPT-4. Including what you just mentioned about what Anthropic’s CEO said, this might be because it’s very complex, requires financing, and needs to stabilize the team, so what Anthropic’s CEO said might also be for similar reasons, needing external publicity to gain more attention and resources. Do you think this is a normal thing, or is there some hype involved?
Li Bojie: I think you have a point; there is definitely some purpose for financing. OpenAI’s usual approach is to hold back a big move and release it all at once when everything is ready, but you can see that the recently released real-time voice API and o1 have a feeling of unfinished research work. These are actually because it might need to raise a large amount of money, so it has to do this. Including Anthropic’s CEO, who has always had a relatively pessimistic attitude, calling for AI to be safe. He left OpenAI to start Anthropic with the intention of taking things slowly. But why has he suddenly changed to a more optimistic tone now? It must be to raise money.
I think there is indeed a fundamental problem, which is that developing AGI requires a lot of funds. This is also why I don’t want to compete head-on with OpenAI and Anthropic, because now OpenAI has used tens of billions of dollars, which is far from enough for AGI. Many think tank analysis reports show that from GPT-2 to GPT-4, computing power may have increased by 1,000 to 10,000 times. If it comes to AGI, it may require the same level of improvement.
GPT-4 already used on the order of 100,000 chips; increasing that by 1,000 to 10,000 times means 100 million to 1 billion chips. At today’s chip manufacturing and energy capacity, 100 million chips would already consume more energy than all the world’s data centers combined. But human energy production grows basically linearly, and controlled nuclear fusion has made no practical progress for decades, so it’s hard to expect energy supply to suddenly grow tenfold in 5 years. Limited energy and chip manufacturing capacity are therefore major challenges facing AGI.
With only that much energy, and with chip manufacturing capacity that can only be expanded gradually, which is also very hard, I think growth might hit its limit at roughly 1,000 times. At that point, can you still train AGI? And it means needing 1,000 times the money: what was 1 billion dollars becomes 1 trillion dollars, including energy and chips, which requires raising enormous amounts of money. So what you just said about o1 definitely serves that purpose too, including making the whole of society realize that this is very important and has the potential to become something very big.
Talking about o1, I’ll say a few more words. My personal view is that o1 is a very big breakthrough, and many in the industry say it has opened a new paradigm.
The first is reinforcement learning, which can greatly compensate for the lack of training data. As mentioned, GPT-4-level models have basically used up almost all the high-quality text data in human society. Where does new data come from? If you just let the model generate freely, it’s “garbage in, garbage out.” So what do you do? o1’s approach is reinforcement learning, using self-play in the spirit of AlphaGo’s training. It focuses on mathematics and programming, where answers are clearly right or wrong, because you need a reward function that can judge correctness. Since math and programming are easy to verify, you can generate essentially unlimited training data. This is called post-training, but post-training may end up with more training data than pre-training. In this way, the data can be expanded indefinitely.
The second thing is test-time scaling, which means using more “slow thinking” time at inference, and that is also crucial. For example, if you give me a math problem and ask me to answer within one second, I can’t do it, right? Each token carries only a fixed amount of computation, so without intermediate tokens the model’s thinking time is limited. If you give it more thinking time and let it write out the intermediate reasoning step by step, its accuracy can improve a lot.
For example, models previously couldn’t reliably figure out whether 3.8 or 3.11 is larger, mainly because if you give me two numbers and ask me to compare them within a second, intuition can easily get it wrong. But with more time, I have a methodology: compare digit by digit. o1 effectively does this, writing that kind of procedure into the RL process and into the test-time reasoning, so by following the methodology it doesn’t make mistakes.
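A toy illustration of the difference between answering on intuition and writing out a procedure; this is just ordinary chain-of-thought prompting, not a claim about o1’s internal mechanism.

```python
# Two ways to ask a chat model the same question.
fast_prompt = "Which is larger, 3.8 or 3.11? Reply with the number only."

slow_prompt = """Which is larger, 3.8 or 3.11?
Think step by step before answering:
1. Pad both numbers to the same number of decimal places: 3.80 vs 3.11.
2. Compare integer parts, then tenths, then hundredths.
3. Only then state the final answer."""
# The second prompt buys the model more tokens (and thus more computation) to
# apply a reliable procedure instead of answering on intuition.
```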
I think this is very crucial because the problem of AI making mistakes is a key factor that prevents it from being used normally in large-scale commercial applications. For example, in many commercial scenarios, like some 2B cases, some banks come to us and ask if we can use large models to do accounting. I say this can’t be done now because if you make a mistake in accounting, it’s a big problem, and the current large model’s accuracy is at most 90%, which is not enough, much lower than the accuracy people want.
Secondly, some tasks are more complex. I’ve been working on agents, trying to make them perform slightly more complex, multi-step actions. If each step has a 90% success rate, after 10 steps the overall success rate is only about 35%, and after twenty-odd steps it drops to around 10%. But if each step succeeds 99.9% of the time, then after 10 steps the success rate is still about 99%. Errors compound exponentially. So single-step accuracy must be high enough, at least higher than humans, for it to be useful. If o1 continues in this direction, it addresses a crucial question: whether AI can be used in serious, high-value commercial scenarios.
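The arithmetic is just the per-step success probability raised to the number of steps:

$$p_{\text{total}} = p_{\text{step}}^{\,n}:\qquad 0.9^{10} \approx 0.35,\qquad 0.9^{22} \approx 0.10,\qquad 0.999^{10} \approx 0.99$$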
Hunter Leslie: So, about this CoT (chain of thought) thing, do you think it might be a completely new paradigm, different from the previous Next Token Prediction?
Li Bojie: It’s still Next Token Prediction.
Hunter Leslie: You think it’s still Next Token Prediction?
Li Bojie: Yes, it definitely is, because it’s still predicting one token at a time; it just writes out the thought process. I remember a line in “Sapiens” saying that human thinking is conducted through language. That’s what CoT is: writing out the human thought process in linguistic form. Of course, the language doesn’t have to be English or Chinese; it could be some intermediate representation, but the idea is to express thinking in language, so the thought process itself becomes data.
Hunter Leslie: Right, I suddenly thought of a question, because it seems like there’s a consensus now that by 2029 or 2030, AGI might be ten thousand times stronger than human intelligence. If some black swan events happen, what we’re discussing today might turn out to be wrong; maybe in 2029 we can talk again and see whether our predictions held. Do you think there are events that could prevent this from happening? Like the red team versus blue team confrontations at Huawei: just imagine this thing cannot happen. What do you think could prevent AGI from being developed?
Li Bojie: I think there are many possible reasons. The first might be that the so-called scaling law hits a bottleneck after scaling to a certain point, which is possible, right? We know scaling worked from GPT-2 to GPT-3 to GPT-4, but can it scale to GPT-5? No one knows, and OpenAI may be having difficulties internally. If they had GPT-5 ready, they wouldn’t need to use o1 as a placeholder, right? So GPT-5 presumably hasn’t been trained to the level they want. They surely have many interesting things internally that they don’t yet feel ready to disclose, but it does suggest there are real challenges.
Then the second thing is, for example, even if the scaling law holds and continues to grow, before reaching the AGI level, human electrical energy or chip production capacity might be exhausted, meaning that even if all of humanity’s production capacity is concentrated, AGI might not be achievable. We can’t cover the entire Earth’s surface with solar panels, right?
The third possibility is that investors have lost confidence, because after all, this isn’t a matter of life and death for humanity. People like me, who are very radical e/acc advocates, are still relatively few; most people are pragmatic. If investors see no profit after five years, they might stop investing, right? Because a large part of humanity is pragmatic, if they can’t see short-term returns, they might stop further investment, which is another possibility.
The fourth possibility is geopolitical factors, because AI has a significant potential threat to humanity, which is why people like Ilya often bring it up. Could it be seen as something akin to nuclear weapons, where it truly poses a threat to humanity because it reaches a level of intelligence comparable to humans, meaning it can autonomously control many things, and if not managed well, it could directly wipe out humanity, right? Could governments or other organizations restrict its development?
I think these four points could potentially prevent AI from reaching AGI. But I hope these four points don’t happen, which is also the consensus of the entire industry.
Hunter Leslie: Since you’re in R&D, I understand that you might focus more on engineering than algorithms. And maybe now you’re hiring for your own company, building your own team. I’m curious, since you’ve been at Huawei, right? In a large system, at this point in time, how do you define an R&D engineer or an R&D position? What do you think makes a good R&D person or someone with the right capabilities? When you were at Huawei or now that you’re starting your own business, has your definition of competency models changed?
Li Bojie: I think it’s like this, firstly, a fundamental capability that’s crucial in both large companies and startups is having a solid foundation in computer science. They need to have a thorough understanding of basic computer system concepts and the basic components of each model or system, like operating systems, databases, etc. They need to have a clear understanding of what each can and cannot do. This is important everywhere. They don’t necessarily need to have published many papers, right? They might have worked on enough projects and have engineering experience, which is okay.
Secondly, I think this is where large companies and startups differ. In large companies, they might be like a cog in a machine, focusing on their specific work. They don’t need strong learning abilities to stay there. Of course, if their learning ability isn’t strong, their growth potential might be limited.
In startups, I think having strong learning abilities is crucial because startups change very quickly. Every startup in its early stages might frequently pivot to change products, so if I hire someone for NLP, for example, and then switch to CV, and they don’t know any CV algorithms, they might struggle to adapt.
Hunter Leslie: So, they need strong abilities to draw analogies and learn new things.
Li Bojie: Another point is that compared to large companies, startups require an additional ability, which is strong self-motivation. They need to know what they want to do and should do without much management or external pressure and KPI pressure, and complete their work with high quality.
I think I’ve suffered a bit in this area myself because Huawei’s management system is very comprehensive, and when I moved to a startup, I had to handle everything myself. I thought hiring a programmer who could work was enough, and I just needed to manage them a bit. But I found that there were many management issues in the company because Huawei has a large system. On the surface, I manage people and evaluate their performance, but besides that, the company has a whole set of assessment systems, attendance systems, and a complete HR system, right? And a whole company culture, right? Everyone knows when to start and finish work, and everyone stays late, etc., and some implicit things. Everyone follows the company’s norms, and if another team does it this way, I should too.
But in a startup, forming this culture is relatively difficult. I have to build up a company culture from scratch, and everyone needs to be committed and self-motivated to do this. If someone is there just for the salary, even if they’re highly capable, a startup might not want them because it would lead to very high management costs.
Hunter Leslie: So, it’s hard for startups to train someone from the start; screening becomes the most important thing.
Li Bojie: Yes, I think so. Many large companies also say, I remember ByteDance saying something like, people can’t be trained, only screened, right?
Hunter Leslie: ByteDance said that, I think.
Li Bojie: I think many large companies are similar. Of course, campus recruitment might still lean towards training because they’re fresh graduates, right? Including our genius youth program at Huawei, they have a special training program because they know you don’t know much at the start, with no engineering experience, so they won’t give you a large team of dozens of people at first. You’ll start as an individual contributor, familiarizing yourself with the company’s processes, systems, culture, and engineering experience. When I joined, it took about six months before they let me lead a small team of four or five people. Then I transitioned to a new project with ten people, and by the time I left, I was directly or indirectly leading a team of twenty-five people.
It’s a gradual process of development, allowing you to improve step by step. For example, from being an individual contributor to a project leader, which is one level, and then to a leader of leaders, which is a new challenge because you need to do indirect management, right? Of course, I haven’t reached a higher level of three layers yet, which might have new challenges. So, I think this is a very challenging thing that requires development. Of course, in a growing company, like a startup, everyone might go through this process.
Hunter Leslie: Yes, you mentioned three abilities: a solid professional foundation, which is necessary; second, the ability to learn and draw analogies; and third, self-motivation and vision. This might sound simple, but it’s actually a very high bar, and very few people meet these requirements. So I think the problem today is that everyone who wants to start a business hopes such people will come, but where do you find them? We’re not in the Bay Area, where everyone wants to change the world and is confident. In China, do you think there are good methods or reliable channels?
Li Bojie: If it’s about myself, I can’t say I’ve found them because I haven’t found many people who meet all three criteria, right? But I’ve listened to many interviews with big names, and I feel like they speak the truth, saying that people who meet these criteria are inherently rare, which is why the failure rate of startups is 99%. For a startup team to succeed, all conditions must be met, including the team, resources, the direction of the startup, and timing, all of which are crucial. So, the success rate of startups is inherently low, and there’s no way to force it.
How can we improve its success rate? I think it might be what was mentioned earlier (the 20% project), having something that looks like it’s taking shape, with a definite user demand, many people liking it, and it’s already on a fast growth path where everyone can see its potential. At this point, if I call out to start a business, the possibility of attracting like-minded people is relatively high. If I’m just a blank slate with only two pages of PPT, how can others believe that those two pages of PPT can eventually turn into a $100 billion company, right? That’s very difficult, right? So it’s hard to attract reliable people at that point.
Hunter Leslie: So it’s still necessary to let them see, or maybe your earliest partners are people you know well and trust, who understand your level.
Hunter Leslie: We’ve seen Cursor, OpenAI, and others releasing programming tools. There might be a few million engineers globally now. And then AI comes along, and many tasks are done by Agents. So for engineers, what are the things that change and don’t change about their roles? I think this is something everyone is concerned about, with daily fears about where their jobs are going, especially with big companies laying off staff.
Li Bojie: The things that change and don’t change in this context, I think this is a good question. It’s about whether we need to worry about unemployment after AI arrives, right?
I personally feel there’s never a need to worry about unemployment. Why? AI enhances your efficiency, which inevitably brings more demand. For example, over the past few decades, from writing assembly to C, then C++, and now Python, Java, etc., every technological advancement brings new demands, so there’s never a need to worry about unemployment. Instead, there are more programmers, and the entire IT industry is becoming more developed. Why? Originally, when I was writing assembly, only the military could afford to pay for programming, so its application range was very narrow. Later, with C, we could develop system software, like foundational companies such as Microsoft, Apple, and Google. Then came Java, PHP, Python, right? These languages now allow a small business owner to hire a few people and turn their idea into reality.
But not all ideas can become reality yet. There’s a joke that goes, “all my idea is missing is a programmer.” Programmers are scarce: there are many ideas, but hiring programmers costs money, so developing an app carries high costs. An app that would have been written line by line 10 years ago might have required a $1 million development budget. Now I might hire a few people, and if they’re proficient with AI-assisted programming tools like Cursor, it might only cost $100,000.
Some independent developers, if they are highly skilled, might handle it all by themselves without needing to hire programmers. I think this might happen soon, within two years, where some strong product managers, if they can clearly articulate their needs, might not need programmers at all. They could just tell AI what they want, and AI could handle everything for them. At that point, everyone would focus their energy on thinking about needs and what they want to do, rather than spending time on the minutiae of implementation.
Sam Altman said that in the future, there will be billion-dollar companies run by just one person, and I think that’s entirely possible. A person strong in both business and technology, if they can leverage the latest AI technology, could achieve what used to require a company of dozens of people. Look at the hottest AI companies in Silicon Valley now; when they reached a billion-dollar valuation, they only had about a dozen people. These companies are using AI extensively internally, and their efficiency is completely different from traditional companies. For example, when I have meetings with them, they often have an AI Notetaker in their Zoom meetings, automatically taking meeting notes. AI meeting notes aren’t rocket science anymore; even Tencent Meeting can do it officially now, but most companies just haven’t adopted it.
Therefore, as long as programmers are good at using AI and learning new technologies, they will never be unemployed. AI will definitely expand the market.
Another point is, for those who were doing foundational work, like system infrastructure optimization, will they be unemployed? I don’t think so. No matter when, these foundational optimizations remain a high-tech field that AI finds very difficult to replace. Even now, assembly language programmers haven’t been replaced because every compiler or operating system has some core high-performance code that interacts with hardware, which must be written in assembly and can’t be replaced. So it always has its application value.
Hunter Leslie: As an outsider, assembly language isn’t the same system as Python or Java, right?
Li Bojie: It’s a very low-level language, where you have to tell it, for example, to move something from register A to register B, then add the values of register A and B and put them in register C. In a computer, there are just those eight registers, and you have to load four bytes from one memory address to another, which is very low-level. So if you want to use this to develop an Android app, imagine how much work it would take, right? How many memory addresses would you have to manipulate just to draw an interface? It’s very difficult to program.
Hunter Leslie: So current AI programming tools like Copilot can’t handle assembly?
Li Bojie: It can write it, but it might not optimize it as well as professional optimization personnel. It can write some basic assembly. For example, if I want to write an operating system, I can use AI to help, but it definitely can’t optimize it as well as Linux. AI can also help develop a website, using some design templates, but it definitely can’t create something as smooth as TikTok.
Hunter Leslie: So essentially, as an engineer or architect, the unchanging part is still thinking about the product and the user. But the changing part might be that what used to take many person-days to accomplish can now be done much faster. And you have to use this technology because if you don’t, it’s hard to keep up; you might be eliminated.
Li Bojie: Yes, I think so. Many daily tasks are actually spent on these details. In everyday work it’s things like filling out forms, submitting expenses, collecting invoices one by one; AI can now basically handle those. For programmers, it’s writing so-called glue code: the front end provides an interface document, and the back end has to implement each interface in it, which is just CRUD (create, read, update, delete) operations: user CRUD, content CRUD. These things consume a lot of daily development energy, and they can be completely replaced by AI.
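For readers unfamiliar with the term, here is a minimal sketch of what that kind of glue code looks like; the framework (FastAPI) and the User fields are arbitrary illustrative choices, and this is exactly the sort of boilerplate AI assistants handle well:

```python
# Minimal sketch of typical "glue code": a user CRUD endpoint.
# FastAPI and the User model are illustrative choices only.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
db = {}  # maps user id -> User; stand-in for a real database

class User(BaseModel):
    id: int
    name: str
    email: str

@app.post("/users")
def create_user(user: User) -> User:
    db[user.id] = user
    return user

@app.get("/users/{user_id}")
def read_user(user_id: int) -> User:
    if user_id not in db:
        raise HTTPException(status_code=404, detail="User not found")
    return db[user_id]

@app.put("/users/{user_id}")
def update_user(user_id: int, user: User) -> User:
    db[user_id] = user
    return user

@app.delete("/users/{user_id}")
def delete_user(user_id: int) -> dict:
    db.pop(user_id, None)
    return {"deleted": user_id}
```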
If someone doesn’t use AI at all, their development efficiency might be at least half as slow as others. At least for myself, using AI has definitely doubled my development capability compared to not using it.
Hunter Leslie: So is there a saying that if you truly embrace AI, as a developer, how should you allocate your energy? For example, should you spend as much time writing code as before, or more time researching products and users? What’s your view on this? What might the ratio look like?
Li Bojie: I think for a programmer who doesn’t want to transition into a product manager role, there’s no need to focus too much on products and users too early. I just need to focus on how to use AI to quickly complete the tasks assigned by my boss, which is very important. For example, if my boss assigns a task to implement a new page, it might have taken a week before, but now with AI, I can finish it in a day. This greatly improves efficiency. With the increased efficiency, the remaining time can be used for rest or, as you mentioned, to think about product and user-related issues, thereby enhancing one’s understanding in other areas.
Hunter Leslie: So the main focus is still on completing engineering tasks, right?
Li Bojie: My feeling is that with the current capabilities of AI, it essentially frees up your time. Once your time is freed up, you don’t need to do those “manual labor” tasks, and you can think and do more valuable and meaningful things.
For programmers, what are more valuable and meaningful things? This is actually a matter of perspective. I prefer the concept of Google’s 20% project. I think, in China, because of the busy 996 work culture, most programmers don’t have time for 20% projects, so the overall innovation capability is relatively low. But in Silicon Valley, one good thing is that many programmers have enough time to work on part-time projects, and many companies there are relatively tolerant of these activities. Like Google, the 20% project is deeply embedded in the company culture.
I believe a lot of innovation actually comes from the bottom up, arising from difficult problems encountered in daily life and work, and then creating a project to solve these problems. If the solution is clever, you enjoy using it, and the problem itself has enough promotional value, meaning many others have the same need, then it’s a good product with PMF that can be launched. But I think this is hard to plan entirely from the top down. So that’s why the 20% project is meaningful. I think if programmers use AI in the future, reducing work hours and shortening the time to complete existing engineering tasks, they can spend more time on projects they are interested in. And with AI assistance, one person might be able to create an MVP using AI, without necessarily hiring a front-end, back-end, or designer.
But can we change the 20% time to 20% of people, allowing 20% of employees to focus on innovation instead of product development? Many big companies’ AI Labs do this, but few succeed. Why? I think a fundamental reason is that 20% project innovation is about solving problems encountered in daily life and work, not just brainstorming ideas. If 20% of people sit in an office thinking about innovation, the ideas they come up with might only be suitable for publishing papers, with no real demand.
Hunter Leslie: So, spending time on experiments is meaningful. Now, if I’m doing R&D at a big company and AI comes along, I need to use it, which involves a process of learning, adapting, and transforming. Do you have any suggestions? My own feeling is, say I’m already 40 this year, I might not have much motivation to learn new things; some colleagues want to transform but don’t have particularly strong motivation to learn, and some want to switch but don’t have particularly good ideas. Since you happen to be starting a business, what advice would you give? What methods or ideas can you share for embracing AI more quickly and letting AI empower oneself?
Li Bojie: My own suggestion is to first look at how others are using AI well. There’s plenty of content online, like your podcast, teaching people how to use AI. When you see someone get a decent little game done in half an hour, you know that this is how AI should be used.
Actually, I’m like this myself. Until last year, I always liked using ChatGPT. I would encounter a problem, like wanting it to write a piece of code, then I would type it into ChatGPT, then copy the code out and paste it into IDEs like PyCharm, but this way is quite inefficient.
So when did I make this change? It was when Cursor became quite popular, around April or May this year; I started using Cursor intensively because a very strong model, Claude 3.5 Sonnet, had come out with particularly strong coding ability. What’s the difference between using AI inside an IDE and using it outside in ChatGPT? ChatGPT doesn’t know the environment of your surrounding code, so the questions you can ask are always very limited, and it can’t directly modify long existing code for you. In IDEs like Cursor or GitHub Copilot it’s completely different, because the tool gets the context of the entire project; it reads the code and knows where to make changes, sometimes without you even having to locate the spot. There’s a very powerful feature where you just fill in a dialog box with what you want to do, and it gets it all done for you.
Of course, it can’t handle complex requirements, the current model’s capabilities are still limited, you might still need to modify it. But anyway, many times it can find out where it needs to change by itself, without people having to put in a lot of effort, so it’s a way of how machines and humans cooperate, exploring the boundaries of the model, and this boundary is constantly changing. The original model was poor, so I needed people to do more things, like finding which line of code to change, hooking it, and then saying change, and then it could change. Now it might be able to understand the code itself, so you don’t need to hook which line of code, just tell it the requirement, and that might be a new progress.
If it’s a project with hundreds of thousands of lines, it still can’t cover everything at once. So, I still need to tell it which module, because I’m the one who truly understands the project, I know which module my requirement should change, I need to tell it the few files related to this module, and then it can change it well. Maybe one day the model’s capabilities will be stronger, and I won’t even need to tell it which module to change.
Another issue is debugging, the current models often write code with bugs, and I still have to debug and fix them. Maybe in the future, it can debug itself, and that would be another step up.
So for each of us, learning means keeping up with how model capabilities are developing. But I think there are two points here:
First, if you are using it as a productivity tool, you must use the best model, don’t use a poor one. Sometimes if you use a poor model, it’s like if my first phone was a very poor one, like a knockoff, then I might have a bad impression of phones, thinking they’re really hard to use, right? But if the first phone you used was an Apple, then you might think phones are great, right? This means if you start with a poor model, you might have a bad impression of AI overall, and then you won’t have motivation later.
The second thing is to look at how others are using it. So I think what you’re doing with this podcast or many others doing work to bridge the information gap is very meaningful.
Hunter Leslie: So maybe first you need some channels to close the information gap, whether that’s listening to podcasts or watching videos on YouTube, and then you have to try it yourself. You mentioned using the best model: is “best” based on evaluation scores? Where can I look it up? For consumer products there are rankings on Product Hunt; are there other places? If I haven’t used any of them, where do I find the best model?
Li Bojie: Chatbot Arena is an academic project from Berkeley. It’s a blind-test platform: for each prompt, two models are drawn at random to answer, but you don’t know which is which, and everyone does A/B comparisons; whichever answer is better gets the vote. All these blind-test votes are aggregated into a leaderboard. Right now the top spots are mostly OpenAI’s models; domestically there’s Yi Lightning, which is cheap and good; then there are Google’s models and Anthropic’s Claude.
There are also category rankings; the one just mentioned is the overall ranking, but in the programming category, Anthropic’s Claude is definitely first right now. So if I’m doing programming, I look at the programming category ranking. These authoritative international rankings are the better reference for models. For products, as you said, Product Hunt is quite good: in the programming category, the top entries are probably Cursor, GitHub Copilot, and Devin, which are aimed at programmers.
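For the curious, an Arena-style leaderboard can be thought of as an Elo-style rating computed from pairwise votes; the sketch below is a simplified illustration with made-up model names, not the Arena’s actual pipeline:

```python
# Minimal sketch of turning pairwise "A beat B" votes into a leaderboard
# with an Elo-style update. Simplified illustration, not the Arena's real computation.
from collections import defaultdict

K = 32  # update step size

def expected_score(r_a: float, r_b: float) -> float:
    """Predicted win probability of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def rank(votes):
    """votes is a list of (winner, loser) pairs from blind A/B comparisons."""
    ratings = defaultdict(lambda: 1000.0)
    for winner, loser in votes:
        e_w = expected_score(ratings[winner], ratings[loser])
        ratings[winner] += K * (1 - e_w)
        ratings[loser] -= K * (1 - e_w)
    return dict(sorted(ratings.items(), key=lambda kv: -kv[1]))

# Hypothetical model names, purely for illustration.
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
print(rank(votes))
```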
Hunter Leslie: Another issue is, maybe you don’t deal with this problem much because you yourself have been relatively smooth in big companies and then started your own business doing what you like. But in fact, I think many people in the industry in China are worried about being laid off. There are two situations when being laid off, one is I’m still doing it, and the other is I’m not doing it anymore. So I don’t know if you’ve thought about it yourself, or if you have some people around you who you think are relatively successful, like if I’ve been doing R&D for ten years and I don’t want to do it anymore, I can’t do it anymore. If I want to transition to doing other things, do you think there are some good ideas? Like what might be more successful?
Li Bojie: I think there are quite a few people around me who have transitioned, like many who are a bit older than me, they might indeed feel that after doing R&D for a few years, they’re not that interested in technology anymore, or they feel that doing technology is too tiring, working 996 is too exhausting, and they want to focus more on family, like having a better work-life balance. In this case, I think there are a few directions.
First, I think for smart people, doing quant is a pretty good idea. Quant work trades with the market: as long as your method is good enough, you might not need a very large team, maybe just yourself or a small elite team of a few people, plus some capital, and as long as you can make money, that’s fine. Its only goal is to make money in the securities market. The work is relatively self-contained; you don’t need to do operations, manage large teams, or handle other complex things. Many smart people have actually made quite a bit of money in this area.
But this field is a bit winner-takes-all, because you’re competing with the smartest people in the world. If you feel your intelligence isn’t enough to compete with these top people, then going in you might just end up as a “leek”, the retail player who gets harvested.
Hunter Leslie: So, if you’re not that outstanding, it’s better to choose another direction.
Li Bojie: I feel that product and technical planning are two good directions. Because transitioning from technology to product and technical planning is actually advantageous. For example, if a product manager has no technical background, they might propose some unrealistic requirements because they don’t understand the boundaries and capabilities of technology. Many product managers have encountered this situation. Sometimes, a product manager thinks a requirement is simple, but it might actually take a year to complete. Conversely, some things a product manager thinks are difficult, but for technical people, it can be done in a day. So, if someone with a technical background transitions to product management, they will have a better grasp of the difficulty and complexity of technology, which is an advantage for a product manager.
The second direction is technical planning. Generally, in some big companies, there are planners or think tank analysts responsible for technical planning. If someone with a technical background does technical planning, they will definitely have more insights because planning itself requires predicting what the future will be like. For example, if you ask me to predict what AI will be like in five years, whether AGI will appear, I can analyze a lot for you. But if someone can’t even understand the current AI models, asking them to plan for the future will be problematic.
Besides these two directions, there are actually many other choices. For example, opening a B&B, doing education, etc., are all good choices. Especially in the education industry, many people want to pass on their knowledge to others but don’t want to live a 996 life, so they choose this industry. Whether it’s basic education, university education, or online podcasts or selling courses, they’re all good choices. I’ve seen many classmates or friends succeed in this field.
Hunter Leslie: You must have experienced a lot from last year to this year, and have new understandings of direction and team, right? I feel you’re someone who likes to think, so I want to ask what questions you’ve been thinking about recently? You might not have an answer yet, but maybe you want to find a wise person to discuss this question. What have you been thinking about recently?
Li Bojie: I definitely have many questions I want to ask. If I’m only allowed to ask one question, I would ask: Do you think AGI can be achieved, and when can it be achieved? This question is very crucial, it determines the upper limit of AI.
This wave of AI is advancing rapidly, directly heading towards AGI, or will there be some twists and turns in the middle, like the last wave of AI? For example, the CV models in 2016, at that time, the larger the model, the stronger the capability, but in the end, it was found that besides CV, the model couldn’t do anything else, it couldn’t handle NLP. So in the end, it was still Transformer that unified CV and NLP. Will Transformer also have an upper limit, with some things it can’t handle?
But I think the current wave has one advantage: it already has multimodal capabilities, OpenAI’s GPT-4o and Claude 3.5 Sonnet both have good coding abilities, and with o1, reasoning ability is also showing its first light. The impression these give is that the remaining problems may not be that hard; as long as you invest enough computing power and have enough insight, you can get them done.
Some even suggest that not so much computing power is needed. For example, recently, Kai-Fu Lee mentioned that he only spent $3 million to train the highly ranked Yi Lightning model, which can greatly reduce costs. Also, with o1, OpenAI, as a pioneer, spent a lot of computing power to train this reinforcement learning thing, but my gut feeling is that if the method is used correctly, it may not require that much computing power. You see, AlphaZero evolved very quickly, reaching top human levels by noon and surpassing human levels by evening. If the feedback is done right, a medium-sized company or even a school might be able to create something with reasoning capabilities comparable to the current o1 mini. So this is also very exciting.
Hunter Leslie: (Can AGI be realized?) This question is indeed very difficult to solve at present, even if you discuss it with Elon Musk and Sam Altman, everyone has their own views. But this question is very important, especially if you are starting a business. The capability boundaries of the model may affect your judgment and understanding when laying out products, so this is very critical.
Li Bojie: Therefore, I think it is necessary to be friends with foundational model companies, as I mentioned earlier. If you are seen as an enemy by foundational model companies, eventually, no foundational model company may be willing to share information with you, or even let you use their API, because they worry that your company will replace them.
But if you become friends, many companies may be willing to let you use some internal unpublished things first, or share their insights and ongoing work with you. For example, Devin, the AI programming agent, got beta access to o1 before its release and was featured in o1’s showcase. Chatbot Arena got the anonymous version of GPT-4o for users to test before GPT-4o was released. Agora and LiveKit had already adapted to the Realtime API for real-time voice calls before it was released.
Once you become friends with foundational model companies, you may have an edge over others in understanding what the future will look like. As we mentioned earlier, foundational models are in a rapid development stage, and the capabilities of foundational models determine what applications can and cannot do. So it’s best to be friends with top foundational model companies in the world, like OpenAI, Anthropic, or Google.
Hunter Leslie: What has been your biggest anxiety recently?
Li Bojie: Recently, my biggest anxiety is what I can do on the road to AGI and how I should do it.
I talked a lot earlier, about science fiction, foundational models, and applications, but AGI is a matter for all humanity, and my personal ability is very limited, so what should I do? I don’t want to do things like embellishing foundational models because once the foundational model progresses, all that embellishing work is wasted. So I think it’s still important to be friends with foundational model companies and be part of their ecosystem, like the system optimization and bridging information gaps I mentioned earlier. This way, my work can make a small contribution to the big task of AGI.
Another thing is how to do it. I remember when I was interning at MSRA, former dean Harry Shum said that he didn’t care whether he was in academia or industry; he often switched between the two. He only cared about whether each project was impactful enough because a person’s career is made up of a series of projects, and as long as each project is impactful enough, that’s fine. This had a big impact on me, and he also said that not everything is suitable for the industry, nor is everything suitable for academia; different things have different approaches.
I’ve also talked to many people and have a rough understanding that there are many different ways to do something.
One way is the typical startup company, which takes a lot of investors’ money, initially raising a lot of funds, pursuing growth and going public. Such companies are often seen as the most successful examples, but they may not be suitable for every company, especially not for certain fields. Because in the process of growing the company, there will inevitably be conflicts between commercial realities and technical ideals, like what happened with OpenAI in the past year, where many people left, involving compromises between technical ideals and commercial realities. Once the company grows, it is no longer as cool or as technically driven.
The second possible way is the small and beautiful startup company. As a small and beautiful company, they start as a small team, possibly without VC investment, but they have a very clear PMF, solving a real problem, so they can make money and support themselves. If one day the market is big enough, they might scale up quickly, but if not, they maintain their status.
For example, many people like the so-called “9-6-5” (no overtime) companies. In fact, in China, there are basically only two types: one is mature foreign companies, and the other is these small and beautiful startups. They may maintain a technically driven and cool vibe for a longer time, with a good internal technical atmosphere, because they don’t have much competitive pressure and can support themselves, so they don’t need to move too fast. This is the second way I think is also a good idea.
The third way is the community approach, such as open source communities. In the early days of Linux, if Linus had gone to an angel investor saying he was making a UNIX-like operating system and asked for investment, he might have been shown the door. But Linux expanded bit by bit through the community, and it became a genuinely valuable open-source operating system, while everything else at the time was commercial.
Open source has its value, but open-source projects have a problem: once they grow, or when individuals or teams face economic pressure, it involves a commercial issue: how to commercialize open-source projects? This is another challenge because many open-source projects have not been very successful in commercialization, mainly because it’s hard to balance the interests of the open-source community and the company. If I create something new, should I make it a closed commercial version or contribute it to the community? This is very troublesome.
Actually, I’ve been observing this recently, like vLLM, which is a very obvious example. vLLM hasn’t been commercialized yet, but many companies have forked vLLM and made many optimizations. One thing I found very touching was that in September, a vLLM 0.6 version was released, and its performance improved by 2.7 times. The 2.7 times performance improvement wasn’t due to a bunch of fancy optimizations, like optimizing some operators. It was actually an HTTP server wasting a lot of performance. Also, Python has a GIL global lock, and scheduling wasted a lot of performance. So they increased GPU utilization from 38% to much higher, and performance improved nearly threefold. I talked to several big companies, both domestic and foreign, and they said they had done such optimizations internally long ago, but they hadn’t contributed them to the community. They also said their internal stuff has better inference performance than the current open-source version of vLLM. Every big company has many things hidden internally, so there’s always this balance issue between commercial and open source.
Besides open-source communities, there are also non-profit projects, which may not be open source but are similar community efforts. Wikipedia is a good example: if Wikipedia had initially said it wanted to build an encyclopedia and gone looking for funding, it would have been hard to raise, but it clearly has its value.
The third kind of community project is Web3, which is also a good example. Bitcoin is the ancestor of Web3: if you had said you wanted to create a decentralized anonymous currency and sought funding, it would have been difficult, right? But it has its value. Many projects also start as community projects and then use Web3 to do some financing, which can work too. But Web3 has a problem now: the field is full of financial speculation projects. In such a mixed-up state, if your project is truly technology-driven, can it stand out and let people see its long-term value, rather than being wiped out by a Bitcoin cycle? That’s also a challenging thing.
For community projects, no matter which of these three routes they take, it’s still quite difficult. But I think this is something that technical idealists might like to do.
Community projects may not have clear commercial value initially, but at least they have community value, a public good, solving a public interest for humanity, right? If I just want to play with it myself and don’t know if it’s useful, it might be a research project. Many big names are doing research projects and doing them successfully.
So I think these are four different types: typical startups, small and beautiful companies, community projects, and research projects, each suited to different stages and kinds of work. I’m actually still thinking about which way to go for the two things I mentioned earlier; that’s something I’m working through.
Hunter Leslie: Okay, thank you for listening to this episode of the podcast. Please follow, like, and share. If you have any thoughts, feel free to interact and leave comments. The next episode will be even more exciting!