OpenAI Sora: Video Generation Models as World Simulators

A joke is circulating among investors today: “I can finally get a good night’s sleep, because I no longer have to worry about the video generation companies I’ve invested in being overtaken by others.”

Last month, in an interview with Jiazi Light Year (“AI a Day, Human a Year: My 2023 with AI | Jiazi Light Year”), I predicted four major trends for 2024, the first of which was video generation. I didn’t expect it to come true so quickly. (Of course, the videos Sora generates do not yet contain complex semantics, and it cannot generate in real time, so there is still a chance for others.) The four predictions were:

  1. Multimodal large models can understand videos in real-time and generate videos containing complex semantics in real-time;
  2. Open-source large models reach the level of GPT-4;
  3. The inference cost of GPT-3.5 level open-source models drops to one percent of the GPT-3.5 API, alleviating cost concerns when integrating large models into applications;
  4. High-end phones support local large models and automatic App operations, making everyone’s life inseparable from large models.

Video Generation Models as World Simulators

The title of OpenAI’s technical report is also very meaningful: Video generation models as world simulators.

The last sentence of the technical report is also well written: We believe that the capabilities Sora demonstrates today indicate that continued scaling of video models is a promising path toward capable simulators of the physical and digital worlds, and of the objects, animals, and people that live within them.

In fact, as early as 2016, OpenAI explicitly stated that generative models are the most promising direction for computers to understand the world. They even quoted the physicist Richard Feynman: What I cannot create, I do not understand.

Read More

Reading Notes of Peter Thiel's Zero to One

My mentor recommended that I read Peter Thiel’s Zero to One, truly a must-read for entrepreneurs. Peter Thiel is a Silicon Valley angel investor, a thinker in the investment world, and a founder of the PayPal Mafia.

Nassim Nicholas Taleb, the author of “The Black Swan”, said of this book that when a risk taker writes a book, it is a must-read. If the author is Peter Thiel, read it twice. But just to be safe, read it three times, because “Zero to One” is definitely a classic.

The biggest takeaway from this book is that entrepreneurship and research are almost the same in many aspects.

All Profitable Companies Are Monopolies

The most interesting point in the book is that all successful businesses are different, or, put another way, all profitable companies are monopolies.

The monopolies discussed in the book are not those that rely on government resources to achieve monopoly, but those that innovate, making the products they supply to consumers unavailable from other businesses.

If an industry has many companies in perfect competition, then no matter how much value it creates, each company’s profits will be slim. For example, the American airline industry creates hundreds of billions of dollars in value every year, but airlines earn only 37 cents per passenger per flight. Google creates less value annually than the airline industry, but its profit margin is 21%, more than 100 times that of the airline industry.
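The “more than 100 times” comparison can be checked with a little arithmetic. The sketch below assumes an average one-way airfare of $178, the 2012 figure I recall from the book; that number does not appear in the notes above, so treat it as an assumption. The 37 cents and the 21% come from the paragraph above.

```python
# Rough check of the profit-margin comparison.
# AVG_FARE_USD is an assumption (recalled from the book, not stated above).
AVG_FARE_USD = 178.00              # assumed average one-way airfare
PROFIT_PER_PASSENGER_USD = 0.37    # from the text: profit per passenger per flight
GOOGLE_MARGIN = 0.21               # from the text: Google's profit margin


def profit_margin(profit: float, revenue: float) -> float:
    """Return profit as a fraction of revenue."""
    return profit / revenue


# Airline margin: profit per passenger divided by revenue per passenger.
airline_margin = profit_margin(PROFIT_PER_PASSENGER_USD, AVG_FARE_USD)

# How many times larger Google's margin is than the airlines' margin.
ratio = GOOGLE_MARGIN / airline_margin

print(f"Airline profit margin: {airline_margin:.2%}")
print(f"Google's margin is roughly {ratio:.0f}x the airline margin")
```

Under that assumed fare, the airline margin comes out at roughly a fifth of one percent, which is where the hundredfold gap comes from.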

Monopolists lie to protect themselves: they position themselves as part of a larger market to fabricate competition that does not exist. For example, Google does not position itself as a search engine company, but as an advertising company or a diversified technology company. Those two markets are much larger, and within them Google looks like just a minor player.

Non-monopolists, in order to exaggerate their uniqueness, often define their market as the intersection of several smaller markets. For example, a British restaurant in Palo Alto, or the only company developing an email payment system (PayPal).

But describing the market too narrowly is a deadly temptation. It seems like you can naturally dominate it, but such a market may not exist at all, or it may be too small to support a company.

Read More

AI a Day, Human a Year: My 2023 with AI | Jiazi Light Year

(This article is reprinted from Jiazi Light Year’s official account, thanks to Jiazi Light Year for the interview)

Summarizing 2023, embarking on 2024.

Authors | Liu Yangnan, Suhoi, Zhao Jian

In the past week or two, many companies have been busily holding strategic meetings, clarifying their goals and plans for 2024.

After more than a year of rapid AI development, it’s time to sum up a busy 2023. Once the strategy meetings are over and the Spring Festival holiday begins, most companies will finally pause their relentless pace and enter a brief and rare period of rest.

So, how to summarize the year 2023?

“Jiazi Light Year” invited more than 30 AI practitioners from fields such as foundational large models, AI Infra (AI infrastructure), multimodal, industry vertical scenarios, and academic research, and posed 5 questions:

  • What is your keyword for 2023?

  • When was the Magic Moment (the most impressive moment) you experienced in 2023?

  • Have you ever felt lost in the rounds of technological impacts in 2023? From confusion to enlightenment, what was the turning point?

  • What important event do you predict might happen in the AI industry in 2024?

  • What would you say to yourself a year ago? If you could ask yourself a year from now a question, what would it be?

Their confusion and anxiety, excitement and thrill, are a microcosm of the AI industry over the entire year; their exploration and perseverance, renewal and iteration, will be the prelude to the AI explosion of the next five or even ten years.

Below are their responses (in alphabetical order of their names).

Read More

Our Wedding Photoshoot with Korea’s Artiz Studio

We booked our wedding photoshoot with the Korean studio Artiz Studio as early as 2021, hoping to shoot at the University of Science and Technology of China (USTC) in Hefei, but since the pandemic the university has been closed to outsiders. Fortunately, Artiz Studio is a nationwide chain, so we switched to Beijing for our shoot in August 2023 at no additional cost, and the shooting environment in Beijing was even better than in Hefei.

Edited photo album (131 photos)

Video photo album (03:54, 150 MB)

Read More

The Next Stop for Generative AI: More Interesting or More Useful?

(This article is a transcript of a speech given by the author at the first Zhihu AI Pioneer Salon on January 6, 2024)

I am honored to meet everyone and to share at the Zhihu AI Pioneer Salon. I am Bojie Li, co-founder of Logenic AI. AI Agents are very popular right now. For example, at one roadshow with more than 70 projects, over half were related to AI Agents. What will the future of AI Agents look like? Should they be more interesting or more useful?

AI development has always had two directions: interesting AI, which is more like humans, and useful AI, which is more like a tool. Whether AI should be more like humans or more like tools is controversial. For example, Sam Altman, CEO of OpenAI, has said that AI should be a tool, not a life form. What we are doing is the opposite: we are making AI more like humans. In fact, many AIs in science fiction are more like humans, such as Samantha in “Her”, Tu Ya Ya in “The Wandering Earth 2”, and Ash in “Black Mirror”, and we hope to bring these sci-fi scenes to reality.

Besides interesting versus useful, there is another dimension: fast thinking versus slow thinking. The book “Thinking, Fast and Slow” divides human thinking into these two kinds. Fast thinking is subconscious and needs no deliberation. ChatGPT’s question-and-answer can be considered a kind of fast thinking, because it won’t proactively reach out to you when you don’t ask it anything. Slow thinking, by contrast, is stateful, complex thinking: planning how to solve a complex problem, deciding what to do first and what to do next.

Read More

My 2023 Trip to the United States

(The video contains 96 photos, 06:24, 190 MB)

Read More

Young People Diving into AI: Huawei Genius Gives Up Million-Dollar Salary, Gen Z Drops Out to Start a Business, Fearing Failure Less Than Not Making Money

(Reprinted from Sohu Technology, author: Liang Changjun)

Editor’s Note:

Life reignites, like spring willows sprouting, after enduring the harsh winter, finally bursting with vitality.

Everyone is a navigator. In the journey of life, we inevitably encounter difficulties, setbacks, and failures. Facing the baptism of storms, we constantly adjust our course, move forward firmly, and search for our own shore.

Reigniting life is also a rediscovery of self-worth. We need to learn to appreciate our strengths, like the harmony of a qin and se, and to accept our shortcomings, like raw jade that must be polished to shine.

This road is not easy, but like a spring in the stone, accumulating day by day, eventually converging into the sea.

On this New Year’s Eve, Sohu Finance and Sohu Technology jointly launch a special series that focuses on how ordinary individuals reignite their lives and bravely face life’s challenges.

Read More

AI Agent & Recommendations for Classic Papers on Large Models

I am often asked to recommend classic papers on AI Agents and large models. Here I list some papers that have been particularly enlightening for me, which can serve as a reading list.

Most of the papers listed here were published this year, but there are also some classic papers on text-based large models and image/video generation models. Understanding these classic papers is key to understanding large models.

If you finish reading all these papers, even if you only grasp the core idea of each, I guarantee you will no longer be just a prompt engineer, but will be able to discuss large models in depth with professional researchers.

Read More

Continuous Critical Hits, Teasing the Genius Youth Together? Beijing AI Salon on December 21

(Reprinted from USTC Alumni Foundation)

On December 21, the USTC Beijing Alumni AI Salon was held at the Computer Network Information Center of the Chinese Academy of Sciences. The former Huawei “genius youth” and co-founder of Logenic AI, Li Bojie (1000), delivered a keynote report on “The Next Stop for AI Agents: Interesting or Useful?” to nearly 200 students and alumni attending online and in person.

Keynote Report

The report revolved around the theme “AI Agent: Useful or Interesting?” and drew on specific life and work scenarios. From the “interesting” perspective, it analyzed how to give AI agents long-term memory at low cost and how to model the internal thought process of humans. From the “useful” perspective, it discussed how AI agents can achieve image understanding, plan and decompose complex tasks, and reduce hallucinations. He also shared his views on how to reduce the inference cost of large models.

Read More

Which Domestic AI Large Model Has the Most Promising Future?

(This article was first published on Zhihu)

No conflict of interest: Since I am not working on foundational large models (I work on infra and application layers) and am currently not involved in the domestic market, I can provide some information from a relatively neutral perspective.

After a few months of entrepreneurship, I found that I could access much more information than ordinary big company employees, learning a lot from investors and core members of the world’s top AI companies. Based on the information gathered in the United States over three months, I feel that ByteDance and Baidu are the most promising among the big companies, and among the startups that have publicly released large models, Zhipu and Moonshot are the most promising.

Although Robin Li said that hundreds of companies in China are already working on foundational large models, such models are relatively homogeneous, so the market is likely to end up like the public cloud market: the top 3 take most of the market share, and the rest are lumped together as “others”.

Most large model startups in China have existed for only about half a year, and nothing is set in stone yet. Some hidden masters are still quietly preparing their big moves. The era of large models has just begun, and as long as the green hills are there, one need not worry about firewood.

Read More