Bojie Li
2019-12-08
Doctoral Thesis from University of Science and Technology of China, Author: Bojie Li
Chinese Version: High Performance Data Center Systems Based on Programmable Network Cards (PDF, 8 MB)
AI Translated Unofficial English Version: High Performance Data Center Systems with Programmable Network Interface Cards (PDF, 8 MB)
Publication Date: 2019-05-26.
2019-08-19
Communication-intensive applications on hosts with multi-core CPUs and high-speed networking hardware often put considerable stress on the native socket system in an OS. Existing socket replacements often leave significant performance on the table and also have limitations in compatibility and isolation.
In this paper, we describe SocksDirect, a user-space high performance socket system. SocksDirect is fully compatible with Linux socket and can be used as a drop-in replacement with no modification to existing applications. To achieve high performance, SocksDirect leverages RDMA and shared memory (SHM) for inter-host and intra-host communication, respectively. To bridge the semantics gap between socket and RDMA/SHM, we optimize for the common cases while maintaining compatibility in general. SocksDirect achieves isolation by employing a trusted monitor daemon to handle control plane operations such as connection establishment and access control. The data plane is peer-to-peer between processes, in which we remove multi-thread synchronization, buffer management, large payload copy and process wakeup overheads in common cases. Experiments show that SocksDirect achieves 7 to 20x better message throughput and 17 to 35x better latency compared with Linux socket, and reduces Nginx HTTP latency by 5.5 times.
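To make the idea of a peer-to-peer data plane without multi-thread synchronization more concrete, here is a minimal sketch of a single-producer, single-consumer ring buffer. It is only an illustration under stated assumptions, not SocksDirect's actual design: the class name SpscRing, the 4-byte length prefix, and the use of a plain bytearray in place of a real shared memory segment are all hypothetical choices made for this example.

```python
import struct


class SpscRing:
    """Toy single-producer, single-consumer ring of length-prefixed messages.

    Hypothetical illustration only. A bytearray stands in for a shared
    memory segment; a real cross-process version would also need memory
    barriers, which this sketch does not model.
    """

    HEADER = struct.Struct("<I")  # 4-byte little-endian length prefix (assumed format)

    def __init__(self, capacity: int) -> None:
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0  # total bytes consumed; written only by the receiver
        self.tail = 0  # total bytes produced; written only by the sender

    def _write(self, pos: int, data: bytes) -> None:
        # Copy byte by byte so that writes wrap around the end of the buffer.
        for i, b in enumerate(data):
            self.buf[(pos + i) % self.capacity] = b

    def _read(self, pos: int, n: int) -> bytes:
        return bytes(self.buf[(pos + i) % self.capacity] for i in range(n))

    def send(self, payload: bytes) -> bool:
        need = self.HEADER.size + len(payload)
        free = self.capacity - (self.tail - self.head)
        if need > free:
            return False  # ring is full; the caller may retry later
        self._write(self.tail, self.HEADER.pack(len(payload)))
        self._write(self.tail + self.HEADER.size, payload)
        self.tail += need  # publish the message only after its bytes are in place
        return True

    def recv(self):
        if self.head == self.tail:
            return None  # ring is empty
        (length,) = self.HEADER.unpack(self._read(self.head, self.HEADER.size))
        payload = self._read(self.head + self.HEADER.size, length)
        self.head += self.HEADER.size + length
        return payload


ring = SpscRing(16)
first_ok = ring.send(b"hello")      # fits: 4-byte header + 5-byte payload
second_ok = ring.send(b"world!")    # does not fit until "hello" is consumed
first_msg = ring.recv()
retry_ok = ring.send(b"world!")     # now fits, wrapping around the buffer end
second_msg = ring.recv()
```

Because each counter has exactly one writer (the sender owns tail, the receiver owns head), neither side needs a lock in this sketch, which is the property the abstract alludes to for the common case.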
2019-08-08
Introduction to Score Cheating (Informatics Competition)
Introduction to Score Cheating (Mathematics Competition)
The Principle of Numbers (July 2009)
Introduction to Score Cheating (Informatics Competition) Source File (Yongzhong Office Format)
Introduction to Score Cheating (Mathematics Competition) Source File (Microsoft Office Format)
2019-08-08
2019-08-07
Hardware-based transports, such as RDMA, are becoming prevalent because of their low latency, high throughput and low CPU overhead. However, current RDMA NICs have limited on-NIC memory to store per-flow transport states. When the number of flows exceeds the memory capacity, the NIC needs to swap flow states out to host memory via PCIe, leading to performance degradation.
This paper presents a hardware-based transport without per-flow state. At its core, flow state bounces between the two end hosts along with a data packet, analogous to a thread whose state is always in flight. To enable multiple in-flight packets, each thread is assigned a distinct sequence of packets to send. We enable each thread to fork, throttle and merge independently, which effectively simulates a window-based congestion control mechanism. For loss recovery, we design a single epoch-based loss detector shared by all flows, which enables selective retransmission with storage proportional to the number of packets lost in a round trip. When there are more losses than the NIC can handle, the receiver CPU is notified to recover the loss.
We design and implement RDMA, TCP and TLS transports without per-flow states in an FPGA prototype. The transports have small network bandwidth and CPU overhead. Simulations and testbed experiments show that flows share network bandwidth fairly in a multi-bottleneck network, and that the transports solve the incast problem even better than DCTCP and DCQCN. With a large number of concurrent flows, the throughput of our stateless hardware-based TLS transport is 100x that of a stateful hardware-based transport and 50x that of a software-based transport.
2018-11-28
(This is an interview script from the AI Finance Society WeChat public account “No.5 Danleng Street Magic Academy”. Due to copyright issues, only excerpts including myself are included. For the full text, please follow the “AI Finance Society” WeChat public account.)
1
Li Bojie is a PhD student intern jointly supervised by the Institute and the University of Science and Technology of China. He may be one of the few people who have had a “connection” with the Institute since childhood. In junior high school, he dabbled in the Olympiad in Informatics, attended computer training classes, and once asked on the “Kaifu Student Network”: “What will be the future development of computers?” He can’t remember the reply from Li Kaifu on the forum, but that long-distance exchange excited the young boy in front of the screen and made him feel encouraged from afar.
Later, he looked into the backgrounds of the directors of the Microsoft Asia Research Institute and found that they had all been trained at Carnegie Mellon University, almost all in the same major and under similar mentors. “This gave me an inspiration: I must go to a good environment like the Microsoft Research Institute to do academic work, and the connections I build there will also be a great help for my future development.”
Before coming to the Institute, Li Bojie also considered studying abroad, but it was too difficult to apply from China to a computer science powerhouse like Carnegie Mellon. Many people scramble to get into the top 50 universities in the world, but in his view, “except for the TOP20, our MSRA is higher in research level than other schools ranked 20-50.”
2
Li Bojie and his mentor Zhang Lintao occasionally disagree over the choice of research projects. Sometimes Li Bojie is persuaded, and sometimes he holds to his own view and persuades his mentor after deeper research. Zhang Lintao would say to Li Bojie: “Although I have a lot of experience, I don’t necessarily understand everything. Breakthroughs in the details still depend on your own thinking, and then we discuss and exchange feedback.”
3
In the conversation with the reporter, it can be seen that their eyes began to shine when they talked about “what is research”.
Li Bojie remembers that when he was interviewed at the Institute, he was well prepared to talk about technical questions. Because he was so interested in computers, he had switched from the mathematics department to the computer science major in college, and had used an innovative technology with his classmates to make it easier for thousands of people at the school to access the Internet. But at the end of the interview, mentor Zhang Yongguang asked him: “What is network research like?” Li Bojie was stunned on the spot and didn’t know how to answer. He had often tinkered with servers at school, so he imagined that network research was about pulling network cables, configuring IP addresses, and doing manual labor.
During his 5-year internship at the Institute, Li Bojie gradually came to understand. “Doing research is different from engineering. Doing research is to create something that others do not have, and what you make may not be useful for a while. You have to be able to withstand loneliness. The key is whether you have the attitude of creating knowledge for mankind.”
The process of research is also full of philosophical thinking. The most famous “philosophy” of mentor Zhang Lintao is the “30-year theory”. He observed that many technologies, such as the now popular neural networks, have gone through a 30-year cycle: first they sprout, then they “die”, and then they come back to life after a trough of one or two decades. Why is it 30 years?
“In the past, the conditions were not mature, and the mentors were all tormented by this thing, so they would tell their doctoral students that it is unreliable. But 30 years is basically a generation. When the mentors are old, the newcomers pick up the old idea, and the opportunity may just come.”
Li Bojie is a big boy with thick eyebrows. He gets very excited when talking about research and moves his hands without noticing. In the middle of the conversation, he suddenly stood up without warning, picked up a black pen and drew a quadrant chart on a whiteboard. The X-axis is innovation, the Y-axis is practicality, and clockwise the quadrants are “Pasteur, Einstein, academic garbage, Edison” - this is the “philosophy” once talked about by mentor Huo Qiang at the Institute. He said seriously: “Pasteur did research on bacteria and later developed immunology, which is both innovative and practical. I want to go in this research direction under the guidance of my mentor in the future.”
The Institute does not have KPIs, but heroes are judged by influence. In order to do influential research, researchers need to constantly question themselves: Is this research only a small increment? If it’s only a small increment, why do it? Where is the limit of this direction? Can we break this limit? What is the Big Picture of this work? Can it be extended to many problems? If so, you may be able to define a new direction, let more people come to participate in your research, and make your work more influential.
But for interns who have just started without any scientific research experience, this goal is still too far away. The most direct sign is that the first draft of a paper is full of grammatical errors and incoherent logic, just like a car accident scene. After becoming an intern in the system group, Li Bojie produced his first international paper almost entirely with the help of mentor Zhang Lintao, from the framework conception and experimentation to the final writing. “Compared to school, a good point of MSRA is that there are very experienced mentors to guide you through the first paper,” Li Bojie said.
4
This is not limited to the Microsoft Asia Research Institute. In the early stage of artificial intelligence being put to use in industry, both BAT and the AI teams of start-up companies first send their scientists and algorithm engineers to factories to find the enterprise’s pain points and where artificial intelligence can add value, and to explore together.
Under the push of this transformation, changes began to take place within the Institute. In Li Bojie’s eyes, the Institute is no longer an academic institution like a nursing home, but more like an Internet company with people constantly flowing in and out. He saw that while well-known senior researchers were leaving the Institute, fresh blood from academia, Internet companies, and start-up companies was constantly joining it. In the machine learning group where Qintao is located, a researcher who once left to start a business recently returned to the Institute and is responsible for external project cooperation.
“When people from the Institute leave to become executives in major companies, it makes the news. The Institute has also always been recruiting promising newcomers, but that does not cause a particularly big response in society,” Li Bojie said.
5
During his five years at the Institute, Li Bojie felt that his vision became wider and wider. In the first year, he just wanted to complete the tasks assigned by the mentor and get the code running. After the AI boom, he began to think about how to improve the paper on the existing basis, and whether the research results would have influence in the industry. This year, he began to think from the perspective of the industry, thinking about where the system he made can be applied specifically, and what common problems are worth studying.
Two of Bojie Li’s papers have been accepted by two top conferences, one of which can increase system performance by 10 times. In his words: if the operation of grabbing Spring Festival train tickets is regarded as a key-value access, using his research results, it would be possible for everyone in the country to grab tickets once per second. Seeing his research results being deployed on 1 million machines is something he could not have imagined five years ago. He still wants to stay here to continue his research, and he has just passed the first round of the research institute’s recruitment interview.
2018-04-09
RDMA is becoming prevalent because of its low latency, high throughput and low CPU overhead. However, current RDMA remains a single-path transport, which is prone to failures and falls short of utilizing the rich parallel paths in datacenters. Unlike previous multipath approaches, which mainly focus on TCP, this paper presents a multi-path transport for RDMA, i.e. MP-RDMA, which efficiently utilizes the rich network paths in datacenters.
MP-RDMA employs three novel techniques to address the challenge of the limited on-chip memory size of RDMA NICs: 1) a multi-path ACK-clocking mechanism to distribute traffic in a congestion-aware manner without incurring per-path states; 2) an out-of-order aware path selection mechanism to control the level of out-of-order delivered packets, thus minimizing the metadata required to track them; 3) a synchronise mechanism to ensure in-order memory updates whenever needed.
With all these techniques, MP-RDMA only adds 66B to each connection state compared to single-path RDMA. Our evaluation with an FPGA-based prototype demonstrates that, compared with single-path RDMA, MP-RDMA can significantly improve robustness under failures (2x to 4x higher throughput under a 0.5% to 10% link loss ratio) and improve the overall network utilization by up to 47%.
2018-02-16
(Reprinted from AI Tech Review WeChat public account, thanks to student Cui Tianyi for the interview)
2018-02-14
(Reposted from Microsoft Research Institute AI Headlines WeChat Public Account, thanks to Sister Beibei for the interview)
2018-01-03
(Reposted from USTC Graduate Student Union WeChat public account, thanks to Zhu Yixing and other students for the interview)