Bojie Li
2024-02-22
Recently, Groq’s inference chips have made headlines with their large model output speed of 500 tokens/s.
In a nutshell, this chip trades space for time: it stores both model weights and intermediate data in SRAM rather than in HBM or DRAM.
This is something I did 8 years ago at Microsoft Research Asia (MSRA). It suited the neural networks of that era, but it is a poor fit for today's large models, because Transformer-based models need a lot of memory to store the KV Cache.
Although Groq's chips output tokens very fast, their limited memory capacity keeps the batch size small. Measured in cost per token ($/token), they may not be competitive.
Groq needs a cluster of hundreds of cards to run the LLaMA-2 70B model
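The scale of the problem can be sketched with back-of-envelope arithmetic. The SRAM capacity and model shapes below are public approximations I am assuming (roughly 230 MB of on-chip SRAM per accelerator, and LLaMA-2 70B's published architecture), not official Groq or Meta figures:

```python
# Back-of-envelope estimate of why SRAM-only inference needs many chips,
# and why the KV cache limits batch size. All numbers are approximate
# public assumptions, not vendor specifications.

SRAM_PER_CHIP_GB = 0.23   # ~230 MB on-chip SRAM per accelerator (assumed)
PARAMS_B = 70             # LLaMA-2 70B
BYTES_PER_PARAM = 2       # FP16/BF16 weights

weights_gb = PARAMS_B * BYTES_PER_PARAM            # 140 GB of weights
chips_for_weights = weights_gb / SRAM_PER_CHIP_GB
print(f"chips just to hold the weights: {chips_for_weights:.0f}")  # ~609

# KV cache per token for LLaMA-2 70B: 80 layers, 8 KV heads (GQA),
# head_dim 128, K and V tensors, FP16 (2 bytes each).
kv_bytes_per_token = 80 * 8 * 128 * 2 * 2
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KB")   # 320 KB

# At a 4K context, each concurrent request pins this much memory,
# which is why limited on-chip memory caps the batch size:
kv_per_request_gb = kv_bytes_per_token * 4096 / 1e9
print(f"KV cache per 4K-context request: {kv_per_request_gb:.2f} GB")
```

Whatever the exact per-chip capacity, the weights alone force a cluster of hundreds of chips, and every concurrent request adds gigabyte-scale KV cache on top.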
2024-02-20
My Early Encounters with AI
Meeting AI During My PhD
Originally, my PhD research focused on networks and systems, with my dissertation titled "High-Performance Data Center Systems Based on Programmable Network Cards". Many in the networks and systems field look down on some AI research, claiming that AI papers are easy to churn out: with just an idea, a paper can be published in one or two months. In contrast, top-conference papers in networks and systems often require a significant amount of engineering, taking as long as a year to complete.
Aside from the AI courses I took in school, my first serious AI-related project was in 2016, using FPGAs to accelerate neural networks in Bing Ranking. That period was the previous wave of AI hype; today's so-called "four little AI dragons" all got their start during that time.
Microsoft deployed FPGAs at scale in its data centers not only for network virtualization but also for an important workload: neural network inference acceleration. At the time, we used pipeline parallelism to keep all the neural network weights in the FPGAs' SRAM, achieving super-linear speedup. This story is told in more detail in the section "Exploration of Machine Learning Accelerators" in "Five Years of PhD at MSRA — Leading My First SOSP Paper".
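Why pipelining weights into SRAM can be super-linear: a single chip whose weights spill to DRAM is bound by off-chip bandwidth, but once enough pipeline stages jointly hold the whole model in SRAM, each stage streams its slice at on-chip bandwidth. A toy model with made-up numbers (illustrative only, not our actual FPGA measurements) shows the effect:

```python
# Toy model of super-linear speedup from fitting weights in SRAM.
# All constants are invented for illustration.

MODEL_GB = 4.0            # weights too big for one chip's SRAM
DRAM_GBPS = 20.0          # off-chip bandwidth when weights spill to DRAM
SRAM_GBPS = 400.0         # on-chip bandwidth once a slice fits in SRAM
SRAM_PER_CHIP_GB = 0.5

def throughput(n_chips):
    # Each pipeline stage streams MODEL_GB / n_chips of weights per inference;
    # the pipeline rate is set by how fast a stage can stream its slice.
    per_stage_gb = MODEL_GB / n_chips
    bw = SRAM_GBPS if per_stage_gb <= SRAM_PER_CHIP_GB else DRAM_GBPS
    return bw / per_stage_gb  # inferences per second

base = throughput(1)
for n in (1, 4, 8):
    print(n, throughput(n) / base)  # 1x, 4x (linear), then a jump past 8x
```

Up to the point where slices still spill to DRAM the speedup is merely linear; the moment every slice fits in SRAM, bandwidth jumps and the speedup becomes far greater than the chip count.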
At that time, many people working in networks and systems neither understood AI nor cared to; they could not distinguish training from inference, or forward from backward operators. By optimizing these operators, I at least came to understand how basic feedforward neural networks (FFNNs) work. However, I didn't get involved in business applications or tinker with models of my own.
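For readers who, like my colleagues back then, have never looked inside a network: the forward pass of a basic FFNN is just alternating affine layers and nonlinearities. A dependency-free toy sketch (illustrative only, unrelated to the actual Bing Ranking models):

```python
import random

def relu(v):
    # Elementwise nonlinearity: negative activations are zeroed.
    return [max(0.0, x) for x in v]

def linear(W, b, x):
    # y = W @ x + b, with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def ffnn_forward(x, layers):
    # layers: list of (W, b) pairs; ReLU between layers, none after the last.
    for i, (W, b) in enumerate(layers):
        x = linear(W, b, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x

random.seed(0)
W1 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]  # 4 -> 8
b1 = [0.0] * 8
W2 = [[random.gauss(0, 1) for _ in range(8)] for _ in range(2)]  # 8 -> 2
b2 = [0.0] * 2
out = ffnn_forward([1.0, -0.5, 0.25, 2.0], [(W1, b1), (W2, b2)])
print(len(out))  # a 2-dimensional output vector
```

Inference runs only this forward pass; training additionally runs the backward pass, propagating gradients through the same layers in reverse.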
2024-02-16
A joke is circulating among investors today: “I can finally get a good night’s sleep, because I no longer have to worry about the video generation companies I’ve invested in being overtaken by others.”
Last month, during an interview with Jiazi Light Year, “AI one day, human world one year: My 2023 with AI | Jiazi Light Year,” I predicted the four major trends of 2024, the first of which was video generation. I didn’t expect it to come true so quickly. (Of course, the videos generated by Sora currently do not contain complex semantics, and it cannot generate in real-time, so there’s still a chance for others)
- Multimodal large models can understand videos in real-time and generate videos containing complex semantics in real-time;
- Open-source large models reach the level of GPT-4;
- The inference cost of GPT-3.5 level open-source models drops to one percent of the GPT-3.5 API, alleviating cost concerns when integrating large models into applications;
- High-end phones support local large models and automatic App operations, making everyone’s life inseparable from large models.
Video Generation Models as World Simulators
The title of OpenAI’s technical report is also very meaningful: Video generation models as world simulators.
The last sentence of the technical report is also well written: We believe that the capabilities Sora demonstrates today indicate that the continued scaling of video models is a promising path toward powerful simulators of the physical world, the digital world, and the objects, animals, and people that live within them.
In fact, as early as 2016, OpenAI explicitly stated that generative models are the most promising direction for computers to understand the world. They even quoted physicist Feynman’s words: What I cannot create, I do not understand.
2024-02-14
My mentor recommended that I read Peter Thiel's Zero to One, truly a must-read for entrepreneurs. Peter Thiel is a Silicon Valley angel investor, a thinker of the investment world, a co-founder of PayPal, and the leader of the "PayPal Mafia".
The author of "The Black Swan" commented on this book: when a risk-taker writes a book, it is a must-read; if the author is Peter Thiel, read it twice; but just to be safe, read it three times, because "Zero to One" is an absolute classic.
The biggest takeaway from this book is that entrepreneurship and research are almost the same in many aspects.
All Profitable Companies Are Monopolies
The most interesting point in the book is that all successful businesses are different, or put another way, all profitable companies are monopolies.
The monopolies discussed in the book are not those that rely on government resources to achieve monopoly, but those that innovate, making the products they supply to consumers unavailable from other businesses.
If an industry consists of many perfectly competitive companies, then no matter how much value it creates, each company's profit will be thin. For example, the American airline industry creates hundreds of billions of dollars in value every year, yet airlines earn only 37 cents per passenger per flight. Google creates less value annually than the airline industry, but its profit margin is 21%, more than 100 times that of the airlines.
Monopolists lie to protect themselves: they position themselves as part of a larger market to fabricate competition that doesn't exist. For example, Google positions itself not as a search engine company but as an advertising company or a diversified technology company; those markets are far larger, and in them Google is just a minor player.
Non-monopolists, by contrast, exaggerate their uniqueness by defining their market as the intersection of several smaller markets: for example, "the English restaurant in Palo Alto", or "the only company developing an email payment system" (PayPal).
But describing the market too narrowly is a deadly temptation: it seems you can naturally dominate it, yet such a market may not exist at all, or may be too small to support a company.
2024-02-07
(This article is reprinted from Jiazi Light Year’s official account, thanks to Jiazi Light Year for the interview)
Summarizing 2023, embarking on 2024.
Authors | Liu Yangnan, Suhoi, Zhao Jian
In the past week or two, many companies have been busily holding strategic meetings, clarifying their goals and plans for 2024.
After more than a year of rapid AI development, it is time for an annual summary of a busy 2023. Once the strategy meetings end and the Spring Festival holiday begins, most companies will finally pause their relentless pace for a brief and rare rest.
So, how to summarize the year 2023?
“Jiazi Light Year” invited more than 30 AI practitioners from fields such as foundational large models, AI Infra (AI infrastructure), multimodal, industry vertical scenarios, and academic research, and posed 5 questions:
1. What is your keyword for 2023?
2. What was the Magic Moment (the most impressive moment) you experienced in 2023?
3. Did you ever feel lost amid the waves of technological change in 2023? From confusion to clarity, what was the turning point?
4. Predict an important event that might happen in the AI industry in 2024.
5. What would you say to yourself a year ago? And if you could ask your future self, one year from now, a question, what would it be?
Their confusion and anxiety, excitement and exhilaration, are a microcosm of the AI industry's entire year; their exploration and perseverance, renewal and iteration, will be the prelude to the AI explosion of the next five or even ten years.
Below are their shares (in alphabetical order of their names).
2024-02-03
We booked our wedding photoshoot with Korean Artiz Studio as early as 2021, hoping to shoot at the University of Science and Technology of China, but after the pandemic the campus remained closed to outsiders. Fortunately, Artiz Studio is a nationwide chain, so we switched to Beijing for our August 2023 shoot at no extra cost, and the shooting environment in Beijing was even better than in Hefei.
Edited photo album (131 photos)
Video photo album (03:54, 150 MB)
2024-02-03
(This article is a transcript of a speech given by the author at the first Zhihu AI Pioneer Salon on January 6, 2024)
I am honored to meet everyone and to share at the Zhihu AI Pioneer Salon. I am Bojie Li, co-founder of Logenic AI. AI Agents are very popular right now: in one roadshow with more than 70 projects, over half were related to AI Agents. What will the future of AI Agents look like? Should they be more interesting, or more useful?
We know that the development of AI has always had two directions: one is interesting AI, AI that is more like a human; the other is useful AI, AI that is more like a tool. Should AI be more like humans or more like tools? There is a lot of controversy. For example, OpenAI CEO Sam Altman has said that AI should be a tool, not a life form; yet what we are doing now is the opposite, making AI more like a human. In fact, many AIs in science fiction movies are more human-like, such as Samantha in Her, Tu Ya Ya in "The Wandering Earth 2", and Ash in Black Mirror, so we hope to bring these sci-fi scenes into reality.
Besides interesting versus useful, there is another dimension: fast thinking versus slow thinking. The book "Thinking, Fast and Slow" divides human thinking into fast thinking and slow thinking. Fast thinking is subconscious and requires no deliberation; ChatGPT-style question answering can be considered a kind of fast thinking, since it won't proactively come to you when you don't ask it anything. Slow thinking, on the other hand, is stateful, complex thinking: how to plan and solve a complex problem, what to do first, and what to do next.
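The contrast can be sketched as code. Here `llm()` is a hypothetical placeholder for any chat-completion call, not a real API; the point is only that fast thinking is a single stateless call, while slow thinking keeps state and iterates toward a goal:

```python
def llm(prompt: str) -> str:
    # Placeholder standing in for any chat-completion model call.
    return f"<answer to: {prompt}>"

def fast_thinking(question: str) -> str:
    # "System 1": one stateless prompt in, one answer out, then forget.
    return llm(question)

def slow_thinking(goal: str, max_steps: int = 3) -> list:
    # "System 2": keep a memory of past steps and plan the next action
    # conditioned on everything done so far.
    memory = []
    for _ in range(max_steps):
        context = "; ".join(memory)
        action = llm(f"goal={goal}; done so far=[{context}]; next step?")
        memory.append(action)
    return memory

print(fast_thinking("What is the capital of France?"))
print(slow_thinking("plan a three-course dinner"))
```

The stateless loop-free call is all a Q&A chatbot needs; an agent that plans multi-step tasks has to carry the `memory` state between calls.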
2024-01-27
(The video contains 96 photos, 06:24, 190 MB)
2024-01-08
(Reprinted from Sohu Technology, author: Liang Changjun)
Editor’s Note:
A reignited life is like a spring willow sprouting: having endured the harsh winter, it finally bursts with vitality.
Everyone is a navigator. On the journey of life we inevitably meet difficulties, setbacks, and failures; facing the baptism of storms, we constantly adjust course, press forward firmly, and search for our own shore.
Reigniting one's life also means re-recognizing one's own worth: learning to appreciate our strengths, like the harmony of qin and se, and to accept our shortcomings, like raw jade that must be polished before it shines.
This road is not easy, but like a spring seeping from stone, it accumulates day by day and eventually converges into the sea.
On this New Year's Eve, Sohu Finance and Sohu Technology jointly launch a special report focusing on how ordinary people reignite their lives and bravely face life's challenges.
2023-12-23
People often ask me to recommend some classic papers related to AI Agents and large models. Here, I list some papers that have been quite enlightening for me, which can serve as a Reading List.
Most of these papers were published just this year, but some are classics on text large models and image/video generation models. Understanding these classics is key to understanding large models.
If you finish reading all these papers, even if you only grasp the core ideas, I guarantee you will no longer be just a prompt engineer but will be able to engage in in-depth discussions with professional researchers in large models.