Intelligent Manufacturing Society Interviews Li Bojie, Assistant Scientist at Huawei 2012 Lab: AI is like a nuclear bomb, and we cannot fall behind
**(Article from the WeChat public account of the Intelligent Manufacturing Society, original link; many thanks to the Intelligent Manufacturing Society for their excellent questions and editing.)**
What impact will AI ultimately have on the technology and life of human society?
With the release of GPT-4, the performance of large-model AI has once again stretched the public’s imagination. Content produced by AIGC is becoming more realistic and refined, and with ever deeper data cleaning and training, AI’s understanding of natural language has made great strides. From passively accepting the data it is “fed” to actively asking the world questions, perhaps the “artificial intelligence life” of science fiction is no longer far away from us.
The anxiety is understandable, and “AI unemployment” does seem to be happening in some industries. On May 18, 2023, local time, BT, the largest telecom operator in the UK, said it would cut 40,000 to 55,000 jobs between 2028 and 2030. The layoffs will cover both BT’s direct employees and third-party contractors, reducing the company’s total headcount by 31-42%. BT currently has about 130,000 employees.
BT’s CEO Philip Jansen said publicly that after completing its fiber rollout, digitizing how it works, adopting artificial intelligence (AI), and simplifying its structure, the company will rely on a smaller workforce and a significantly reduced cost base: “the new BT Group will be a leaner enterprise with a brighter future”. Back home, some Chinese Internet technology companies have shown similar trends, with art outsourcing positions at game companies hit especially hard.
On this issue, Li Bojie, an assistant scientist at Huawei’s 2012 Lab, said that some of the public’s anxiety has been amplified by the media. AI is not a monster that will replace humans; rather, it liberates productivity and creates new kinds of jobs. “Take the past industrial revolution: people who used to farm by hand had to switch to machines. The education they needed, and the changes to society, the economy, and people’s way of life, were all enormous.”
Li Bojie believes that once AI becomes widespread as a new production tool, more industries and occupations will emerge around it. “For example, after computers appeared, there was no longer any need for copyists to transcribe documents by hand, right? AI is the same. Some industries directly involve people and cannot be replaced, like the service industry. But for work that follows fixed rules and patterns, AI can eliminate a great deal of labor.”
As a researcher of data center networking, a technology closely tied to AI, Li Bojie shared many views and reflections on AI. Below is the transcript of the conversation between Xiao Zhi, lead writer of the Intelligent Manufacturing Society, and Li Bojie:
AI is already widely used
Xiao Zhi: At this stage, what applications or experiments does Huawei have in using AI to detect and prevent network attacks?
Li Bojie: This falls under AI-based intelligent operations and maintenance, which means detecting whether there are attacks in the network from network logs. I don’t do this work myself, but I know other departments do, mainly the public cloud and data communication product lines, which have related projects and products.
At present, AI-based network attack detection generally does not use large models. Large models are mainly aimed at understanding human knowledge, that is, pictures or natural-language text, whereas attack detection deals with network protocol messages, which follow very fixed patterns, so a particularly complex neural network is not needed. Long before large models appeared, roughly 10 years ago when deep learning first became popular, many people were already using machine learning for intrusion detection, traffic classification, and so on.
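To make this concrete, here is a minimal sketch of the kind of classic machine-learning classifier such systems tend to use; the flow features, synthetic data, and model choice are illustrative assumptions, not a description of any Huawei product.

```python
# Minimal sketch: classify network flows as benign or attack with a classic
# ML model rather than a large language model. Features and data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-flow features: [packet count, mean packet size (bytes),
# duration (s), distinct destination ports]. A real system would extract
# these from switch, firewall, or NetFlow logs.
benign = rng.normal([200, 800, 5.0, 3], [50, 100, 2.0, 1], size=(500, 4))
attack = rng.normal([5000, 64, 0.5, 200], [1000, 8, 0.2, 50], size=(500, 4))
X = np.vstack([benign, attack])
y = np.array([0] * 500 + [1] * 500)  # 0 = benign, 1 = attack

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```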
Xiao Zhi: Will advances in AI improve the management of our data center network equipment?
Li Bojie: The first question you asked was about security detection. Besides security detection, AI can also use logs to detect faults, or even to give early warning of faults. For example, data centers have many gray failures: the device is not completely down, but it is teetering between working and broken and may fail in the future.
How do you know whether it will fail? Generally, many components go through a stage of abnormal behavior before they break down. A hard disk about to fail may develop bad sectors and show erratic access latency; a GPU about to fail may frequently overheat. These symptoms can be identified from logs with machine learning algorithms, so potential faults can be detected in advance and the at-risk components replaced before they fail, which is much better than replacing them after the failure occurs.
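As a rough sketch of the idea (the telemetry fields and values are invented for illustration), an anomaly detector trained on healthy-device telemetry can flag devices that start drifting toward failure:

```python
# Minimal sketch: flag devices whose telemetry looks anomalous as candidates
# for proactive replacement. Fields and values are invented for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Hypothetical per-device features: [temperature (C), corrected error count,
# p99 access latency (ms)]. Healthy devices cluster tightly.
healthy = rng.normal([65, 2, 1.0], [3, 1, 0.2], size=(200, 3))
failing = np.array([[88, 40, 9.5]])  # one device running hot with many errors

model = IsolationForest(contamination=0.01, random_state=0).fit(healthy)
labels = model.predict(np.vstack([healthy[:3], failing]))  # -1 marks anomalies
print(labels)  # the last device should be flagged before it actually fails
```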
Especially in large-scale AI training, the cost of recovering from a failure is high. Training a large model may require 10,000 cards. If one card fails, training across the entire 10,000-card cluster stops: the task on that card has to be migrated elsewhere, and then all the cards have to redo the current stage of the job. A single faulty component slows down the whole cluster.
If the failure rate is high, say one failure every hour, and each recovery takes 10 minutes, then 10 minutes of every hour are wasted on recovery. The larger the scale, the higher the failure frequency, the larger the share of time spent on recovery, and the lower the overall efficiency. So the best approach is to predict the likely failure points in advance and proactively avoid or replace them.
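Here is a back-of-the-envelope version of that arithmetic; the per-card failure rate is an assumed number, and the simple model treats failures as independent so the cluster-wide rate scales linearly with size:

```python
# Back-of-the-envelope sketch of the recovery overhead described above.
# Assumes independent per-card failures, so the cluster failure rate scales
# linearly with cluster size; the per-card rate below is an invented figure.
def recovery_overhead(num_cards, per_card_failures_per_hour, recovery_minutes):
    cluster_failures_per_hour = num_cards * per_card_failures_per_hour
    lost_minutes_per_hour = cluster_failures_per_hour * recovery_minutes
    return min(lost_minutes_per_hour / 60.0, 1.0)  # fraction of time lost

# One cluster-wide failure per hour, 10 minutes to recover: ~17% of time lost.
print(recovery_overhead(10_000, 1e-4, 10))
# Ten times more cards at the same per-card rate: a failure every ~6 minutes,
# and in this simple model recovery would eat essentially all the time.
print(recovery_overhead(100_000, 1e-4, 10))
```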
In a sense, today’s AI clusters are like ENIAC, humanity’s first computer, which was built from 18,000 vacuum tubes. Tube failures caused downtime every day, and maintenance staff had to replace the broken parts daily. Historically, integrated circuits with lower failure rates replaced vacuum tubes. So could future AI clusters also become self-healing? In the short term we can rely on AI-driven intelligent operations; in the long term we can consider new materials and processes. For example, could chips grow the way neurons in the human brain do and gain the ability to physically replace faulty parts? That could have a significant impact on the future of chips and AI.
Xiao Zhi: Are our current domestic network technology and data center network capacity sufficient to support AI entering daily life on a large scale?
Li Bojie: That is definitely sufficient; you can be confident of it. Look at the big domestic companies: ByteDance has more than 100,000 cards, Huawei also has more than 10,000, and Alibaba certainly has a lot as well. It has not said publicly how many, but I estimate probably no fewer than 100,000.
So it is not that everyone started frantically buying cards only after large models arrived; you cannot buy that many cards on short notice anyway. The major companies must have laid out their AI investments long ago.
In fact, every major domestic tech company already uses AI in its core business. The most typical example is search and recommendation: AI is the core technology behind all those recommendation algorithms. Search engines such as Baidu also use AI for ranking.
Alibaba likewise uses it to rank products, including advertising recommendations. We usually group these together as search, recommendation, and advertising; they are essentially one thing, built on the same set of technologies.
Huawei’s large-model strategy: computing platforms, vertical domains, ToB
Xiao Zhi: Which directions does Huawei’s AI strategy focus on most?
Li Bojie: It can mainly be divided into two aspects. One is the computing platform. Huawei’s Ascend and Kunpeng provide self-developed AI chips and general-purpose processors, along with solutions for the AI and computing industry. Large models will certainly keep growing, and major companies as well as some startups will build them. And because of the competition between China and the United States, there are restrictions on buying NVIDIA’s high-end hardware, so our computing platform will become key infrastructure. Huawei’s past growth rode the boom of the information industry: the Internet and the mobile Internet expanded the communication pipeline, and now large models are a good opportunity to expand the computing pipeline.
The other aspect is large models themselves. The large models we work on here may not be general-purpose consumer services like ChatGPT that will answer anything a user asks; they are more about solving problems in specific scenarios, as B-end services.
For example, we can typically build a financial model. The model itself may be general, but we do not let it answer all kinds of random questions, such as asking it how Lin Daiyu uprooted a weeping willow (a deliberately nonsensical question that mixes up two classic novels). It simply never receives such inputs; its inputs are content from a specific domain.
For example, let it work with an ERP system. A user might ask about the salaries of employees who joined a certain department less than a year ago, or look up the best-performing employees of the past 10 months. By interacting with the ERP system in natural language like this, users can greatly simplify workflows that previously required many mouse clicks.
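A purely illustrative sketch of that idea follows: a large model translates the natural-language request into a structured query the ERP system already understands. The schema, prompt, and `call_llm` helper are hypothetical placeholders, not Huawei’s ERP or model API.

```python
# Illustrative sketch only: turning a natural-language ERP request into SQL.
# The schema, prompt, and call_llm() helper below are hypothetical.
SCHEMA = """
employees(id, name, department, hire_date, salary)
performance(employee_id, month, score)
"""

def build_prompt(question: str) -> str:
    return (
        "Translate the question into a SQL query over this schema:\n"
        f"{SCHEMA}\n"
        "Return only the SQL.\n"
        f"Question: {question}\n"
    )

question = "Who were the best-performing employees over the past 10 months?"
print(build_prompt(question))
# sql = call_llm(build_prompt(question))  # hypothetical large-model call
# A reasonable generated query might look roughly like:
# SELECT e.name, AVG(p.score) AS avg_score
# FROM employees e JOIN performance p ON p.employee_id = e.id
# WHERE p.month >= date('now', '-10 months')
# GROUP BY e.id ORDER BY avg_score DESC LIMIT 10;
```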
In addition, there are voice assistants as the entry point to smart devices, like our Xiaoyi. Most earlier smart assistants were more “artificial” than intelligent: if the words you said differed even slightly from the preset templates, they did not know what to do. With a large model behind it, the assistant can at least help you do many things, though what it does is still kept within a controllable range.
How do large models affect the data center network?
Xiao Zhi: Large models have taken off. GPT-3 has close to 175 billion parameters, and GPT-4’s parameter count is even larger. What impact and changes will this growth in parameter scale bring to our data center network requirements?
Li Bojie: I think this is a very critical question, and it is what I am working on right now.
First of all, a model like GPT-3, with 175 billion parameters, requires many cards training together. Here at Huawei, the Pangu large model is likewise trained on thousands of cards, at a scale of hundreds of billions of parameters.
There are mainly two types of communication involved. One is within a supernode, or within a rack: a few hundred cards, or a few dozen hosts, communicating with each other. This requires extremely high bandwidth; current NVLink, for example, delivers on the order of 900 GB per second, so the performance requirements are very high.
At a larger scale, other communication methods can be used, and the bandwidth requirements are not as high. For example, when scaling out across racks, 10,000 GPUs might be divided into 20 clusters of 500 GPUs each: high-performance communication inside each group of 500, and InfiniBand between clusters, typically around 20 GB per second.
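As a rough illustration of what those two tiers mean in practice, the same amount of data takes very different times to move at each bandwidth; the 10 GB payload below is an assumed figure, not one from the interview.

```python
# Rough illustration of the two bandwidth tiers mentioned above.
# The 10 GB payload is an assumed gradient/activation size for illustration.
PAYLOAD_GB = 10.0

def transfer_seconds(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    return payload_gb / bandwidth_gb_per_s

print("intra-rack (NVLink-class, ~900 GB/s):",
      round(transfer_seconds(PAYLOAD_GB, 900), 4), "s")  # ~0.0111 s
print("inter-cluster (InfiniBand, ~20 GB/s): ",
      round(transfer_seconds(PAYLOAD_GB, 20), 4), "s")   # 0.5 s
```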
You asked what demands the continued growth of large models might place on network performance. Simply put, the bigger the model, the higher the demand on the network.
For example, suppose you train a 175-billion-parameter model and a 1-trillion-parameter model on the same cluster. If the model structures are similar, the 175-billion-parameter model might achieve a communication-to-computation ratio of 2:8, that is, communication takes only 20% of the time.
But for the trillion-parameter model, communication might take 70-80% of the time, meaning most of the time is spent on communication and the actual compute efficiency drops sharply: most of the time, the compute units are waiting for communication to finish before they can start the next round of calculation.
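To spell out what those ratios imply for training time, here is a tiny calculation that assumes communication and computation do not overlap (a simplification; real systems overlap them to some degree):

```python
# Tiny illustration of the 2:8 vs. ~75% communication ratios quoted above,
# assuming communication and computation do not overlap (a simplification).
def wall_clock_hours(compute_hours: float, comm_fraction: float) -> float:
    return compute_hours / (1.0 - comm_fraction)

print(wall_clock_hours(100, 0.20))  # 2:8 ratio -> 125 hours of wall clock
print(wall_clock_hours(100, 0.75))  # 75% comm  -> 400 hours of wall clock
```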
To solve this, we clearly need a higher-performance interconnect. At Huawei we are working on such a new interconnect bus, hoping to unify the small-scale NVLink-style bus and the large-scale InfiniBand-style fabric into a new bus standard that maintains high communication performance even at large scale.
Xiao Zhi: Is there any progress or breakthrough in this direction?
Li Bojie: In 2022, our rotating chairman Xu Zhijun announced at the annual report conference the launch of our Lingqu bus, a peer-to-peer architecture: GPUs, CPUs, and other computing devices can interconnect directly without relaying communication through the CPU. This achieves what I just mentioned, high performance even at large scale.
Computing power: the key constraint on AI
Xiao Zhi: What do you think of the overseas project that used GPT-4 to analyze GPT-2?
Li Bojie: I think it is very significant. I have seen that work before. It suggests that AI will certainly be able to evolve itself in the future, which is a very important development.
Previously, many people studied AI for science, that is, using AI to advance research, mainly in the natural sciences. Using AI to analyze AI itself is also very important: for example, can AI be used to explore how its own model architecture should be improved, how its data should be obtained, and even how it should interact with the world to further strengthen its model?
Think about how people learn about the world. From infancy, we constantly interact with the three-dimensional world, receive feedback, and gradually form a world model. But the way we currently train machine-learning models is still passive: people feed in piles of data, and the model never actively interacts with the world.
In the future, AI could have robots, actively interact with and explore the world, and have multiple different senses, the so-called multimodality: not only text but also vision, hearing, and other modalities. It would then form a much more powerful model.
Xiao Zhi: That sounds more like the artificial intelligence life of science fiction movies?
Li Bojie: I think that really is not far off; it can basically be achieved within 10 years, unless some hardware or theoretical obstacle makes it truly impossible, which of course cannot be predicted now. I think the biggest potential constraint is computing power. What really worries me is whether all of humanity’s computing power can support something like artificial intelligence life. With enough computing power, many things become possible.
Why do I think the computing-power constraint will be so severe? Look at the history of computer science: for decades, everyone has been trying to come up with more efficient algorithms. Take a database. If you want to look up how old a person is, scanning all of the hundreds of thousands of records in the entire database is obviously very inefficient; everyone agrees it is best to look up that person’s record directly, right?
But neural networks work in exactly the opposite way: they traverse everything. Each neuron stores some information, and for every input it is as if everything stored in the network is scanned and compared before the best answer is produced as output.
In traditional data structures, algorithms, or databases, that kind of efficiency would be considered unacceptable. But it is precisely because computing power has advanced to the point where the model can afford to go through all of its data that neural networks show such strong performance.
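A toy contrast of the two styles, purely for illustration: an indexed lookup touches a single record, while a dense neural-network layer touches every stored parameter for every query.

```python
# Toy contrast, for illustration only: indexed lookup vs. a dense layer that
# touches every parameter on every query.
import numpy as np

# 1) Indexed lookup: answers the question by inspecting a single record.
ages = {"alice": 34, "bob": 29}
print(ages["alice"])

# 2) A dense layer: every entry of the weight matrix participates in
#    answering every query.
rng = np.random.default_rng(0)
weights = rng.normal(size=(100_000, 128))  # "memory" spread across parameters
query = rng.normal(size=128)
scores = weights @ query                   # touches all 12.8 million weights
print(int(np.argmax(scores)))              # index of the best-matching row
```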
In the future we want AI to do more and more. Beyond generating text, we also want it to process video and audio, whose data volumes are much larger. Suppose each person has a smart assistant that needs to record everything that happens in that person’s life; that is a huge amount of data, reportedly up to 1 TB. If all of it is processed with today’s neural-network approach, our computing power may well be insufficient. The limit on computing power is the biggest problem for artificial intelligence today.
The most fundamental limits on computing power are energy and materials. How many joules of energy does it take to process one bit? Physically, the lower limit has not been fully explored, but the chip technology we can actually achieve is limited. Today, data centers account for roughly 1-2% of humanity’s energy consumption; including all electronic devices, terminals, and communication equipment, the figure is probably close to 5%. There is no obvious breakthrough in energy supply on the horizon, and controlled nuclear fusion remains unsolved, which means the energy available for computing can grow by at most roughly 20 times; there is no way around that.
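The roughly 20x figure follows directly from that 5% share, assuming total energy supply stays flat; in numbers:

```python
# Back-of-the-envelope version of the ~20x ceiling above. It shares the
# interview's assumptions: compute-related electronics already consume about
# 5% of humanity's energy, and total energy supply stays roughly flat.
current_share_of_energy = 0.05
max_growth_factor = 1.0 / current_share_of_energy
print(max_growth_factor)  # -> 20.0
```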
So whether the computing power we can get under today’s energy and chip-technology constraints can support such enormous demand is a challenging problem. With limited energy, it comes down to whether semiconductor technology can keep reducing the energy consumed per bit of computation. I saw recently that Sam Altman (CEO of OpenAI) has also invested in a controlled nuclear fusion company. If AI could help solve controlled fusion, that would be a completely new world.
“AI is like a nuclear bomb”
Xiao Zhi: There is a hotly debated issue lately: how do you view the anxiety about AI causing unemployment?
Li Bojie: I feel that AI will greatly change the way society is organized.
I don’t know whether this wave of AI will become the next industrial revolution; that may be too grand a claim. But take the past industrial revolution: people who used to farm by hand switched to machines, and the education they needed, as well as the changes to society, the economy, and people’s ways of living and working, were all enormous.
It used to take one person per acre of land, so everyone farmed. Now one tractor can handle 100 acres, and the other 99 people can do other things; that is the root of industrial society. Before, because agricultural productivity was very low, everyone was tied to the land and no one could do more complex work. It was precisely the expansion of productive capacity that freed many people for other work and created new industries.
Xiao Zhi: So you see AI as a technology that liberates productivity?
Li Bojie: It means some low-level positions are no longer needed. Actually, we don’t even have to reach back to the industrial revolution: the arrival of computers also replaced many professions, right?
After computers, there was no longer any need for people to copy documents by hand, was there? AI is the same. Some industries directly involve people and cannot be replaced, such as the service industry, but work that follows very fixed, standardized steps can have much of its labor simplified. For example, programming is still a fairly specialized occupation, but once GPT enables natural-language programming, everyone can be a programmer and have machines automate the repetitive work in every industry.
There is no need to be especially anxious about unemployment. The thing to worry about most with AI is preventing it from being used for evil. Some time ago, when Sam Altman testified before the US Congress, he raised this issue: should there be something like the nuclear non-proliferation treaty, so that once AI reaches a certain level of capability it must be subject to monitoring similar to the International Atomic Energy Agency’s? That is very important; if a powerful AI fell into the hands of terrorists, it would be quite terrifying.
Xiao Zhi: So the question now is no longer whether to develop AI, but how to govern it in the future?
Li Bojie: My feeling is that it is impossible to stop it from being developed. AI is like a nuclear bomb: once it has been invented, someone will certainly build it, and if you don’t do it yourself, you will only fall behind others.
Electrical signals in AI chips travel faster than signals between neurons in the human brain, and the communication bandwidth between AI chips is several orders of magnitude higher than human communication through language, so the intelligence of future AI is likely to exceed that of humans. Silicon-based intelligence, represented by AI, compares to the carbon-based intelligence of humans the way a nuclear bomb compares to conventional weapons. The reason humanity has not destroyed itself so far is strict control over nuclear weapons. What we can do is try to constrain AI toward good ends, let AI be a good assistant to humans, and not let humans be enslaved by AI as in science fiction.
The text has been edited with the consent of the interviewee.
Organized and edited by: Xiao Zhi