How to Develop Research Taste?
(This article was first published as an answer to the Zhihu question “How to develop research taste in the field of computer systems?”)
In the blink of an eye, it’s been nearly 10 years since I graduated from USTC. Yesterday, while discussing with my wife the recent developments of our classmates in the USTC systems circle, I realized that research taste is the most critical factor in determining academic outcomes. The second key factor is hands-on ability.
What is research taste? I believe that research taste is about identifying influential future research directions and topics.
Many students are technically strong, meaning they have strong hands-on skills and system implementation abilities, but still fail to produce influential research outcomes. The main reason is poor research taste: they choose research directions that either merely chase trends without original thought or are too niche to attract attention.
PhD Students’ Research Taste Depends on Their Advisors
I believe that research taste initially depends heavily on the advisor, and later on one’s own vision.
For PhD students, who are in the early stages of their research careers, research taste depends crucially on their advisors. When you propose an unreliable research topic, a good advisor with strong research taste will tell you it’s unreliable, while a poor advisor might say the direction is not bad and suggest you try it out. Thus, a good advisor helps you quickly weed out 95 unreliable topics from a potential 100, leaving 5 to research and explore at a slower pace. In contrast, an average advisor might let you go in circles among 100 unreliable topics, wasting a lot of time without much progress.
A PhD student’s research taste is largely influenced by their advisor’s research taste. Some advisors prefer more theoretical work, others more engineering work. Some like more innovative ideas, others more solid implementations. There’s no right or wrong here; all can produce valuable research. The key to research taste is to recognize trends in the field earlier than most.
My advisor led me through several influential projects, all because he could see trends that most others had not yet recognized.
My PhD research focused on FPGA-based programmable network cards. In 2013, when Software Defined Networking (SDN) was popular, most people thought the trend was to process network functions in software on data center CPUs. At that time, mainstream data center networks still ran at 10 Gbps and CPUs were not yet a bottleneck; programming CPUs was also much more convenient than working with fixed-function ASICs, so few people were interested in hardware acceleration for networking. But my advisor noticed that data center network performance was improving much faster than CPU performance, and concluded that CPUs would inevitably become a bottleneck in network protocol processing. FPGA-based programmable network cards were a good way to balance performance and programmability. Today, 200 Gbps data center networks are no longer a novelty, and nearly all cloud computing data centers have deployed programmable network cards for acceleration.
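A back-of-the-envelope calculation suggests why faster links squeeze the CPU. The sketch below computes how many CPU cycles a single core can spend on each packet if it has to keep up with line rate. The 3 GHz clock, the worst case of 64-byte minimum Ethernet frames with 20 bytes of preamble and inter-frame gap, and the function name `cycles_per_packet` are illustrative assumptions, not figures from the original project.

```python
# Back-of-the-envelope sketch: per-packet CPU cycle budget on one core.
# All constants are illustrative assumptions, not measurements.

CPU_HZ = 3.0e9          # assumed core clock: 3 GHz
FRAME_BYTES = 64        # minimum Ethernet frame size
OVERHEAD_BYTES = 20     # preamble (8 bytes) + inter-frame gap (12 bytes)


def cycles_per_packet(link_gbps: float, cpu_hz: float = CPU_HZ) -> float:
    """Cycles one core may spend per packet to keep up with line rate."""
    bits_on_wire = (FRAME_BYTES + OVERHEAD_BYTES) * 8
    packets_per_second = link_gbps * 1e9 / bits_on_wire
    return cpu_hz / packets_per_second


if __name__ == "__main__":
    for gbps in (10, 40, 100, 200):
        print(f"{gbps:>3} Gbps: {cycles_per_packet(gbps):6.1f} cycles per packet")
```

Under these assumptions the budget shrinks from roughly 200 cycles per packet at 10 Gbps to roughly 10 cycles at 200 Gbps. Real traffic uses larger packets and is spread over many cores, so this is a worst-case illustration of the trend rather than a measurement.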
My work at Huawei involved operator compilation optimization and large-scale high-performance network interconnection. In 2019, my advisor plotted AI computing power trends on a scatter chart and found that the demand for AI computing power was growing far faster than Moore’s Law, which made distributed parallel training extremely important. He therefore initiated the large-scale high-performance network interconnection project to support tens of thousands of cards. When I joined this project in 2020, the GPT-3 paper had just been published, further supporting our trend analysis. In 2020, most people were still using single-machine multi-card training, and distributed training jobs exceeding a few dozen cards were rare. Many thought we were crazy to plan for training a model on ten thousand cards, since the entire company didn’t even have that many cards at the time. Today, Huawei’s Ascend might be the only domestic AI chip capable of large-scale training.
To gauge an advisor’s research taste, the most important thing is to look at the impact of the advisor’s previous research and whether that work leads or follows. Some advisors publish many papers, but mostly by following others’ results. For example, seeing that someone built a key-value store and finding a way to double its performance can yield another paper. Although this approach produces papers quickly in the short term, it is detrimental to long-term research taste.
Lead-type papers usually propose somewhat immature designs at the beginning of a field but have significant subsequent influence. Many people with average research taste, upon seeing these classic works, think they are poorly done and believe they could design the system much better. For instance, many classic systems used for graph computing, such as Spark, Giraph, GraphLab, and GraphX, perform worse on 128 cores than a single-threaded implementation. The paper “Scalability! But at what COST?” criticizes this issue.
Then these people think, if I improve this classic work, make it perform better, safer, and more user-friendly, I could definitely publish in a top conference. After much effort, they manage to produce the work and submit it to a top conference, only for the reviewers to find it too engineering-focused with insufficient novelty.
What is novelty? Novelty is whether the reviewer learns something new from the paper. Many follow-type papers merely use engineering methods to boost performance, methods the reviewers have already thought of, so they learn nothing new. In contrast, the greatest contribution of a lead-type paper is often posing a problem others hadn’t thought of, or offering a new solution to a well-known problem. Classic papers often contain many insights and reward every rereading with something new; nearly every sentence in the Bitcoin paper, for example, has proved true over the subsequent 15 years. The implementation in a classic paper only needs to work and to beat existing work; it does not need to be perfect, which is why followers can find so many areas to improve.
Insufficient novelty is perhaps the most serious criticism a paper can receive. Unreasonable technical solutions, flawed proofs, poorly conducted comparative experiments, and poor writing can all be fixed. My two most recent SIGCOMM papers were submitted three and five times, respectively; I was still a PhD student at the first submission and had graduated by the time they were accepted. Meanwhile, several papers I wrote at Huawei were rejected repeatedly and have not been accepted to date. So being rejected is not necessarily bad; reviewer comments are the greatest motivation for improving a paper.
However, if multiple reviewers consistently find the paper’s novelty lacking, that is very difficult to fix. Insufficient novelty should ideally be identified by the advisor during the early project initiation phase. It is unfortunate to discover only after completing the work that it is merely a slight incremental improvement, or worse, that others had already done it and the initial literature survey missed it, or that other researchers with the same idea got there first.
Some, upon receiving reviews criticizing the lack of novelty, feel that the academic community is too elitist, believing the industry only cares about the best product, regardless of who proposed it first. Some even start businesses with this mindset, thinking that if their technology surpasses that of industry pioneers or leaders, they will surely win customers. But they end up disappointed.
The market may not value originality as much, but it values brand; B2B products rely on customer relationships, while B2C products depend on network effects. In “Zero to One”, Peter Thiel points out that for a technology to confer a monopolistic advantage, it must solve problems others can’t, which means being ten times better than existing technology; being merely one or two times better is insufficient.
Independent Researchers’ Research Taste Depends on Vision
After completing a PhD, you become an independent researcher, whether in academia or industry, and must come up with ideas on your own, no longer relying on an advisor.
It has only been five years since I finished my PhD, so I hesitate to call myself a great independent researcher. However, based on my observations of top independent researchers, I believe an independent researcher’s research taste mainly depends on vision.
In the entrepreneurial circle, my favorite example is Elon Musk, who aims to send humans to Mars, a long-term vision. SpaceX and Tesla are steps in the process of realizing this vision. Sometimes, compromises with business realities are necessary to achieve this vision. For example, to accumulate the energy technology needed for Mars, Tesla started by making electric vehicles, a commercially viable approach.
Silicon Valley startup guru Peter Thiel has said something similar. 25 years ago, he wanted to replace the dollar with digital currency, but attempts at completely decentralized digital currencies of the kind Bitcoin later became failed at the time. This led him to create PayPal and change how people pay first, which was also an important step toward the vision of digital currency.
Both Elon Musk and Peter Thiel start with their vision, then look at the current technologies and products on the market, identify what’s missing to achieve their vision, and work in that direction.
My personal vision is to realize silicon-based life. So, what does silicon-based life need? I believe there are four key technology directions:
- Computing Power (AI Infra): Enhancing the efficiency of AI training and inference. This has been my main research direction for the past 10 years, during my PhD at MSRA and my work at Huawei. AI Infra is also one of our startup team’s core competencies; for example, we can cut the latency of an end-to-end speech pipeline from 10 seconds to 1.5 seconds, and reduce the inference cost of the same model tenfold.
- Foundation Models: This is the most critical factor in how AI gains intelligence. However, I’m not an AI algorithm expert, and there are already enough people working in this field, so I haven’t pursued this direction.
- AI Agent: Addressing how AI can be more like humans. Led by OpenAI, most current research on AI Agents is tool-oriented, making AI increasingly like a cold, emotionless tool rather than possessing human memory, personality, and emotions. Making AI Agents more human-like has been my main research direction over the past year.
- Super Alignment: A superintelligence smarter than humans must obey human commands and follow human intentions, or it would be a disaster for humanity. Even for current models, alignment is very important for enhancing user experience. This is a research direction we recently started.
Whether it’s writing papers or starting a tech business, the key is to solve problems that others cannot solve. I won’t follow the crowd into those highly competitive fields where everyone else has already thought of and can solve the problems.
Research Taste Is the Ability to Judge Trends and Predict the Future
Metaphysically speaking, research taste is the ability to judge trends and predict the future.
It sounds unreliable to predict the future, but the essence of many things is about predicting the future:
- The working principle of large models is to predict the next token: given a sentence, predict the next word. For example, given “The capital of China is”, the next word is likely “Beijing”. Any proposition can be recast as a fill-in-the-blank question, so the ability to predict the next word implies general intelligence.
- The purpose of science is to discover a set of rules to predict what will happen in a given scenario. For example, if you pick up an apple and let go, will the apple fall to the ground or fly upwards? Having the ability to predict the future is also an important criterion for distinguishing science from pseudoscience.
- The essence of investing is predicting how much money the investment target will make in the future. For instance, 10 years ago, the consensus in the AI community was that computing power is one of the three pillars of AI, so NVIDIA’s stock was bound to rise, but no one expected it to rise 100-fold. At the end of last year, when Bitcoin was only $25,000, the consensus in the Web3 community was that Web3 would be popular in 2024-2025; now Bitcoin is close to $70,000.
- The fate of an individual and a company largely depends on the course of history itself, and the essence of making choices is predicting the future.
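To make the first item above concrete, here is a toy sketch of next-word prediction. It is only a count-based model over a three-sentence corpus invented for illustration; real large models use neural networks over subword tokens and vastly more data. The names `train`, `predict_next`, and `CORPUS` are hypothetical.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
CORPUS = [
    "the capital of china is beijing",
    "the capital of france is paris",
    "beijing is the capital of china",
]


def train(sentences, context_len=2):
    """Count which word follows each context of `context_len` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(context_len, len(words)):
            context = tuple(words[i - context_len:i])
            counts[context][words[i]] += 1
    return counts


def predict_next(counts, prompt, context_len=2):
    """Return the most frequent word seen after the prompt's last words."""
    context = tuple(prompt.lower().split()[-context_len:])
    candidates = counts.get(context)
    if not candidates:
        return None  # context never seen in the corpus
    return candidates.most_common(1)[0][0]


if __name__ == "__main__":
    model = train(CORPUS)
    print(predict_next(model, "The capital of China is"))
    print(predict_next(model, "The capital of France is"))
```

The sketch answers the fill-in-the-blank question only because the exact context appears in its corpus; it returns `None` for anything it has not seen, which is the gap that large models close by generalizing.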
As you can see, research taste is a scarce ability. Most people think predicting the future is unreliable because most people do not have the ability to predict the future. However, leading scientists and successful entrepreneurs have far superior abilities to judge trends and predict the future.
Some might say, isn’t this survivorship bias, where those who bet right become leading scientists and successful entrepreneurs, and those who bet wrong are seen as lacking ability? To answer this, one only needs to look at the trend judgments and future predictions made by leading scientists and successful entrepreneurs 10 years ago and see how many of these predictions have come true. Although I don’t have data or research results to support this, I believe that successful people have a stronger ability to predict the future than the average person.