2017-08-04
Open Source Technology Event LinuxCon Debuts in China, Gathering Industry Giants to Focus on Industry Trends

(Reposted from Microsoft Research Asia)

From June 19 to 20, 2017, the open source technology event LinuxCon + ContainerCon + CloudOpen (LC3) was held in China for the first time. The two-day agenda was packed, with 17 keynote speeches, 88 technical talks across 8 tracks, and technical exhibitions and hands-on labs from companies including Microsoft. LinuxCon drew international and domestic internet and telecom giants and thousands of industry professionals, including Linux founder Linus Torvalds, to discuss industry trends.

Read More

2017-08-03
Implementing ClickNP: Highly Flexible and High-Performance Network Processing with FPGA + CPU

The First Asia-Pacific Workshop on Networking (APNet’17) Invited Talk:

Implementing ClickNP: Highly Flexible and High-Performance Network Processing with FPGA + CPU

Abstract: ClickNP is a highly flexible and high-performance network processing platform on reconfigurable hardware, published at SIGCOMM'16. This talk shares our experience implementing the ClickNP system, both before and after the paper submission. Over 8 months, we developed 100 elements and 5 network functions for the SIGCOMM paper, producing 1K commits and 20K lines of code. Since the submission, ClickNP has continued to grow into a general-purpose FPGA programming framework within our research team, now comprising 300 elements, 86 application projects, and 80K lines of code.

(1) Even with high-level languages, programming FPGAs is still much more challenging than programming CPUs. We struggled to understand the behavior and pitfalls of black-box compilers, and we share our findings through the coding style enforced in the ClickNP language design and the optimizations provided in the ClickNP compiler.

(2) The OpenCL host-to-kernel communication model is a poor fit for network processing. This talk elaborates on the internals of the high-performance communication channel between CPU and FPGA.

(3) FPGA compilation takes hours, run-time debugging is hard, and simulation is inaccurate. As a case study, we show how we identified and resolved a deadlock bug in the L4 load balancer using ClickNP's debugging facilities.

[Conference Website] [Slides]

Read More

2017-08-03
MELO: Memory Efficient Loss Recovery for Hardware-based Transport in Datacenter

Constrained by small on-chip memory, hardware-based transports typically implement go-back-N loss recovery, which consumes very little memory but is well known to perform poorly even at small packet loss ratios. We present MELO, an efficient selective-retransmission mechanism for hardware-based transport that consumes only a small constant amount of memory regardless of the number of concurrent connections. Specifically, MELO employs an architectural separation between data and metadata storage and uses a shared bits pool allocation mechanism to reduce the on-chip memory footprint of metadata. By adding on average only 23 B of extra on-chip state per connection, MELO achieves up to 14.02x higher throughput and reduces the 99th-percentile flow completion time (FCT) by 3.11x compared with go-back-N under certain loss ratios.
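As a toy illustration of the shared-bits-pool idea (a hypothetical, much simplified Python sketch, not the paper's actual hardware design): selective-repeat bitmaps are drawn from one shared pool and assigned to a connection only while it is actually recovering from loss, so idle connections consume no bitmap memory.

class SharedBitsPool:
    """Toy sketch of a shared bits pool (hypothetical, simplified).

    Instead of reserving a full out-of-order bitmap per connection,
    bitmap blocks are allocated from a shared pool only for connections
    that are currently recovering from loss."""

    def __init__(self, num_blocks: int, block_bits: int = 64):
        self.free_blocks = list(range(num_blocks))
        self.block_bits = block_bits
        self.state = {}  # connection id -> [block id, bitmap]

    def mark_out_of_order(self, conn: int, offset: int) -> bool:
        """Record an out-of-order arrival at bit `offset` past the hole."""
        if conn not in self.state:
            if not self.free_blocks:  # pool exhausted:
                return False          # fall back to go-back-N
            self.state[conn] = [self.free_blocks.pop(), 0]
        self.state[conn][1] |= 1 << (offset % self.block_bits)
        return True

    def release(self, conn: int) -> None:
        """Return the block to the pool once the hole is repaired."""
        block, _ = self.state.pop(conn)
        self.free_blocks.append(block)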

Read More

2017-01-10
A Sea Change in Chip Architecture: How to Evaluate Microsoft's Use of FPGAs in Data Centers?

(This article was first published on Zhihu, and then reposted on Microsoft Research Asia)

We are not using FPGAs to replace CPUs; rather, we use FPGAs to accelerate the computing tasks they are suited for, while other tasks still run on the CPU, letting FPGAs and CPUs work together.

This answer will cover three questions:

  1. Why use FPGAs? What are their characteristics compared with CPUs, GPUs, and ASICs (application-specific chips)?
  2. Where are Microsoft's FPGAs deployed? How do FPGAs communicate with each other and with CPUs?
  3. What role should FPGAs play in future cloud computing platforms? Are they just compute accelerator cards like GPUs?
Read More

2016-12-28
What advantages does the BBR algorithm in Linux kernel 4.9 have over previous TCP congestion control algorithms?

(This article was first published on Zhihu)

@Gao Yifan of the USTC LUG has deployed the TCP BBR congestion control algorithm from Linux 4.9 on the LUG HTTP proxy server. The measured download speed from USTC's China Mobile egress to a DigitalOcean server in Singapore increased from 647 KB/s to 22.1 MB/s (screenshot below).

(At the request of the experts in the comment section, here is the test environment: BBR was enabled on the server in Singapore, which is the data sender. This server is an HTTP proxy for accessing resources outside the firewall. The path between USTC's China Mobile egress and DigitalOcean is the public internet, not a dedicated line. The USTC Mobile egress is 1 Gbps with no rate limit (but shared with others), and DigitalOcean is capped at 200 Mbps. The RTT is 66 ms. The results are this good because most people use TCP CUBIC (Linux) / Compound TCP (Windows), and under a given packet loss rate TCP BBR is more aggressive and grabs more of the shared public bandwidth. It therefore feels somewhat unethical.)

The TCP BBR congestion control algorithm, submitted by Google to the Linux mainline and published in ACM Queue, continues Google's research tradition of deploying in production first, then open-sourcing and publishing. TCP BBR has already been deployed on YouTube servers and on Google's internal wide area network (B4).

TCP BBR aims to solve two problems:

  1. Fully utilize bandwidth on network links with a certain packet loss rate.
  2. Reduce the buffer occupancy rate on the network link to reduce latency.

The goal of TCP congestion control is to make full use of the bottleneck link's bandwidth. A network link is like a water pipe, and the best way to use it is to keep it full of water, that is:
The amount of water in the pipe = the volume of the pipe = the cross-sectional area of the pipe × the length of the pipe
In network terms:
The number of unacknowledged packets in the network = the number of packets the link can hold = link bandwidth × round-trip delay
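As a quick worked example, this product (the bandwidth-delay product) can be computed directly; here is a minimal Python sketch using the WAN numbers discussed below:

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: link bandwidth x round-trip delay."""
    return bandwidth_bps * rtt_s / 8  # /8 converts bits to bytes

# A 100 Mbps link with a 100 ms RTT holds about 1.25 MB in flight:
print(bdp_bytes(100e6, 0.100))  # -> 1250000.0 bytes = 1.25 MB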

TCP maintains a send window to estimate how many packets the network link can currently hold. As long as it has data to send, it transmits one new packet for each acknowledgment that comes back, keeping that many packets in flight at all times.

TCP and water pipe analogy (Image source: Van Jacobson, Congestion Avoidance and Control, 1988)
How to estimate the volume of the water pipe? One method that everyone can think of is to keep pouring water in until it overflows. The congestion control algorithm in standard TCP is similar: keep increasing the send window until packets start to be lost. This is the so-called “additive increase, multiplicative decrease”, that is, slowly increase the send window when an acknowledgment message is received, and quickly decrease the send window when a packet is lost.
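A minimal Python sketch of this rule, with the classic illustrative constants (grow by one packet per round trip, halve on loss):

def aimd_update(cwnd: float, packet_lost: bool) -> float:
    """One round trip of additive increase, multiplicative decrease."""
    if packet_lost:
        return max(cwnd / 2, 1.0)  # multiplicative decrease: halve on loss
    return cwnd + 1.0              # additive increase: +1 packet per RTT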

There are two problems with the standard TCP approach:

First, it assumes that all packet loss in the network is due to congestion (the buffer of the network device is full, so some packets have to be dropped). In fact, there may be transmission errors causing packet loss in the network, and congestion control algorithms based on packet loss cannot distinguish between congestion loss and error loss. Inside the data center, the error packet loss rate is on the order of one in a hundred thousand (1e-5); on the wide area network, the error packet loss rate is generally much higher.

More importantly, for "additive increase, multiplicative decrease" to work properly, the error loss rate must be inversely proportional to the square of the send window. Delay inside a data center is typically 10-100 microseconds and bandwidth 10-40 Gbps; multiplying the two gives a steady-state send window of 12.5 KB to 500 KB. On the wide area network, the bandwidth may be 100 Mbps and the delay 100 milliseconds, giving a steady-state send window of about 1.25 MB. Since the WAN send window is 1-2 orders of magnitude larger than in the data center, the error loss rate must be 2-4 orders of magnitude lower for TCP to work properly. Standard TCP therefore converges to a very small send window on long fat pipes (links with high delay and high bandwidth) that have a non-negligible error loss rate. This is one reason why a download can be very slow, or even stall partway through, even when both client and server have plenty of bandwidth and the carrier's core network is not saturated.
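The inverse-square relation above is the well-known approximation of Mathis et al. (1997), under which a loss-based TCP sustains a window of roughly 1.22/sqrt(p) packets at loss rate p. A small Python sketch makes the data-center vs. WAN contrast concrete (the 1460-byte MSS is an assumption):

def tolerable_loss_rate(window_bytes: float, mss: int = 1460) -> float:
    """Loss rate p at which a window of this size is sustainable,
    from W ~= 1.22 / sqrt(p) packets (Mathis et al. approximation)."""
    window_packets = window_bytes / mss
    return (1.22 / window_packets) ** 2

print(tolerable_loss_rate(12.5e3))  # small data-center window: ~2e-2
print(tolerable_loss_rate(1.25e6))  # WAN long fat pipe:        ~2e-6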

Second, there are buffers in the network, like the bulging part in the middle of an IV drip tube, which absorb fluctuations in network traffic. Because standard TCP estimates the send window by "filling the pipe", it tends to fill the buffer at the start of a connection. The buffer occupancy gradually decreases afterwards but never disappears entirely, so the sender's estimated pipe volume (send window size) is always slightly larger than the pipe's volume excluding the bulge. This problem is called bufferbloat.

Bufferbloat phenomenon illustration

Bufferbloat has two harms:

  1. It increases network latency: the more data queued in the buffer, the longer the wait.
  2. When many connections share the network bottleneck, the buffer may fill up and drop packets. Many people take this kind of packet loss for network congestion, but that is not what it is.

Round-trip delay over time. Red line: standard TCP (note the periodic delay variation, with the buffer almost always full); green line: TCP BBR (image from Google's paper in the September-October 2016 issue of ACM Queue [1], same below)

Many papers propose having network devices feed information about current buffer occupancy back to the endpoints, for example ECN (Explicit Congestion Notification), which is widely used in data centers. On the wide area network, however, there are many network devices that are hard to upgrade or replace, so solutions requiring network device cooperation are difficult to deploy at scale.

How does TCP BBR solve the above two problems?

  1. Since congestion loss and error loss are hard to tell apart, TCP BBR simply ignores packet loss.
  2. Since filling the pipe easily causes bufferbloat, TCP BBR estimates bandwidth and delay separately rather than estimating the pipe's volume directly.

The product of bandwidth and delay is exactly the size the send window should be. The TCP Westwood congestion control algorithm, invented in 2002 and since merged into the Linux kernel, already estimated bandwidth and delay separately and used their product as the send window. However, bandwidth and delay are like a particle's position and momentum: they cannot both be measured accurately at the same time. To measure the maximum bandwidth, the pipe must be filled, which leaves packets in the buffer and inflates the delay; to measure the lowest delay, the buffer must be empty, and the less traffic in the network the better, but then the measured bandwidth is low.

TCP BBR deals with the impossibility of measuring bandwidth and delay accurately at the same time by measuring them alternately, using the maximum bandwidth and the minimum delay observed over a period of time as the estimates.
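A minimal Python sketch of the two filters. The window lengths follow the published BBR design (about 10 round trips for the bandwidth max-filter, 10 seconds for the delay min-filter); expressing the bandwidth window as 1 second here is a simplifying assumption:

class BBREstimator:
    """Sketch: max-filter for bandwidth, min-filter for delay."""

    def __init__(self, bw_window_s: float = 1.0, rtt_window_s: float = 10.0):
        # bw_window_s approximates "~10 round trips" in seconds (assumption).
        self.bw_window_s, self.rtt_window_s = bw_window_s, rtt_window_s
        self.bw_samples, self.rtt_samples = [], []  # (time, value) pairs

    def on_ack(self, now: float, bw: float, rtt: float) -> None:
        """Record one sample and expire samples outside the windows."""
        self.bw_samples = [(t, b) for t, b in self.bw_samples
                           if now - t <= self.bw_window_s] + [(now, bw)]
        self.rtt_samples = [(t, r) for t, r in self.rtt_samples
                            if now - t <= self.rtt_window_s] + [(now, rtt)]

    def send_window_bytes(self) -> float:
        """Bandwidth-delay product (assumes at least one sample recorded)."""
        max_bw = max(b for _, b in self.bw_samples)    # bytes per second
        min_rtt = min(r for _, r in self.rtt_samples)  # seconds
        return max_bw * min_rtt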

When a connection is first established, TCP BBR uses a slow start similar to standard TCP's, increasing the sending rate exponentially. Standard TCP, however, enters congestion avoidance as soon as it encounters any packet loss. Its intent is to fill the pipe before entering congestion avoidance, but (1) if the link's error loss rate is high, it gives up before the pipe is full; (2) if the network has buffers, it always fills the buffer before giving up.

TCP BBR, on the other hand, enters congestion avoidance when the effective bandwidth inferred from received acknowledgments stops growing. (1) As long as the link's error loss rate is not too high, it has no effect on BBR; (2) when the sending rate grows to the point of occupying the buffer, the effective bandwidth stops growing and BBR backs off in time (in fact it backs off after occupying 3 times bandwidth × delay; the extra 2 times sitting in the buffer is drained later), so it never fills the buffer.
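A sketch of this exit check; the "less than 25% growth over 3 consecutive round trips" constants follow the published BBR design, while the bookkeeping is simplified:

def pipe_filled(bw_per_round: list) -> bool:
    """Exit slow start when measured bandwidth grew by less than ~25%
    over the last 3 round trips, i.e. the pipe has stopped filling."""
    if len(bw_per_round) < 4:
        return False  # not enough rounds observed yet
    return bw_per_round[-1] < 1.25 * bw_per_round[-4]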

The send window's relationship to round-trip delay and effective bandwidth. BBR settles between the left and right turning points; loss-based standard TCP settles at the right turning point (image from the TCP BBR paper, same below)
During slow start, the buffer is barely occupied early on, so the minimum delay observed then is the initial delay estimate; the maximum effective bandwidth at the end of slow start is the initial bandwidth estimate.

After slow start ends, to drain the extra 2 times bandwidth × delay worth of packets, BBR enters the drain phase, exponentially reducing its sending rate; the packets queued in the buffer slowly drain until the round-trip delay stops decreasing, as shown by the green line below.

Comparison of effective bandwidth and round-trip delay of TCP BBR (green line) and standard TCP (red line)
After the drain phase, BBR enters steady state, alternately probing bandwidth and delay. Because network bandwidth changes more often than delay, BBR spends most of its steady state in the bandwidth-probing phase. Bandwidth probing is a positive-feedback loop: periodically try a higher sending rate, and if the rate at which acknowledgments arrive also rises, raise the sending rate further.

Specifically, BBR cycles through 8 round-trip times. In the first round trip, it tries sending 1/4 faster than the estimated bandwidth (i.e., at 5/4 of it). In the second, to drain the extra packets sent during the first, it sends at 1/4 below the estimated bandwidth. For the remaining 6 round trips, it sends at the estimated bandwidth.
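Written out, this eight-phase pacing-gain cycle is just a table (a direct transcription of the description above):

# Phase gains: probe up for one RTT, drain for one RTT, cruise for six RTTs.
PACING_GAIN_CYCLE = [5/4, 3/4, 1, 1, 1, 1, 1, 1]

def pacing_rate(estimated_bw: float, round_index: int) -> float:
    """Sending rate for a given round trip within the probing cycle."""
    return estimated_bw * PACING_GAIN_CYCLE[round_index % 8]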

When the network bandwidth doubles, the estimated bandwidth rises by 1/4 per cycle, and each cycle takes 8 round trips. The upward spikes in the figure are the attempts to raise the sending rate by 1/4; the downward spikes drain at 1/4 below the estimate; the remaining 6 round trips use the updated estimated bandwidth. After 3 cycles, i.e. 24 round-trip times, the estimated bandwidth reaches the doubled network bandwidth.

Behavior when network bandwidth doubles. The green line is the number of packets in the network, and the blue line is the delay
When the network bandwidth is halved, the excess packets occupy the buffer, markedly increasing the delay of packets in the network (blue line below), and the effective bandwidth halves. Because the delay estimate takes the minimum, the actual increase in delay is not reflected in it (except during the delay-probing phase, discussed later). The bandwidth estimate takes the maximum within a sliding time window, so when the earlier estimate expires (slides out of the window), the halved effective bandwidth becomes the new estimate. The send window then halves, the sender has no window left to send new packets, and the buffer gradually drains.

Behavior when network bandwidth is halved. The green line is the number of packets in the network, and the blue line is the delay
When the bandwidth doubles, BBR converges in only 1.5 seconds; when it halves, BBR needs 4 seconds. The former is fast because the bandwidth estimate grows exponentially; the latter is slower mainly because the bandwidth estimate takes the maximum within a sliding window, so it takes time for a drop in effective bandwidth to show up in the estimate.

When the network bandwidth stays constant, TCP BBR's steady state looks as follows (we saw this figure earlier). Note the subtle delay oscillations with a period of 8 round trips.

Round-trip delay over time. Red line: Standard TCP; Green line: TCP BBR
The above covers BBR's steady-state bandwidth probing; so when is delay probed? During bandwidth probing, the delay estimate always takes the minimum seen so far. What if the actual delay really increases? Every 10 seconds, if the estimated delay has not changed (i.e., no lower delay has been observed), TCP BBR enters the delay-probing phase. This phase lasts only 200 milliseconds (or one round trip, whichever is longer), during which the send window is fixed at 4 packets, i.e., almost nothing is sent. The minimum delay measured during this time becomes the new delay estimate. In other words, BBR spends about 2% of its time sending at a very low rate to measure delay.
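The delay-probing trigger and its parameters, as described above, in sketch form:

PROBE_RTT_CWND_PACKETS = 4        # send window during delay probing
PROBE_RTT_MIN_DURATION_S = 0.200  # or one round trip, whichever is larger

def should_probe_rtt(now_s: float, last_min_rtt_update_s: float) -> bool:
    """Enter the delay-probing phase if no lower delay has been
    observed for 10 seconds."""
    return now_s - last_min_rtt_update_s >= 10.0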

TCP BBR also paces outgoing packets to reduce burstiness, avoiding sudden trains of packets that cause buffer bloat (a minimal sketch of pacing follows the list below). Bursty sending can have two causes:

  1. The data receiver, to save bandwidth, accumulates several acknowledgments (ACKs) into one; this is called ACK compression. On receiving such an accumulated acknowledgment, a sender without pacing responds with a burst of data packets.
  2. The data sender, lacking data to transmit for a while, accumulates idle send window. When the application layer suddenly has a lot of data to send, a sender without pacing emits a burst as large as the whole idle send window.
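A minimal sketch of what pacing does: instead of emitting a burst back-to-back, the sender spaces packets by packet size divided by pacing rate.

def inter_packet_gap_s(packet_bytes: int, pacing_rate_bps: float) -> float:
    """Gap between consecutive packet transmissions under pacing."""
    return packet_bytes * 8 / pacing_rate_bps

# e.g. 1500-byte packets paced at 100 Mbps go out 120 microseconds apart:
print(inter_packet_gap_s(1500, 100e6))  # -> 0.00012 s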

Now let's see how TCP BBR performs. First, the first problem BBR tries to solve: throughput under random packet loss. As the figure below shows, with a loss rate of just one in ten thousand, standard TCP keeps only 30% of the bandwidth; at one in a thousand, only 10%; at one percent, it nearly stalls. TCP BBR, by contrast, loses almost no bandwidth at loss rates below 5% and still retains 75% of the bandwidth at a 15% loss rate.

Packet loss rate vs. effective bandwidth on a 100 Mbps, 100 ms link (red line: standard TCP; green line: TCP BBR)

Transfers between remote data centers across the WAN typically have high bandwidth, high latency, and a non-negligible packet loss rate, and TCP BBR can significantly improve their speed. This is the main reason the USTC LUG HTTP proxy server and Google's WAN (B4) deployed TCP BBR.
Next, the second problem BBR tries to solve: reducing latency and buffer bloat. As the figure below shows, standard TCP tends to fill the buffer, and the larger the buffer, the higher the latency. For users on very slow links, this delay can exceed the operating system's connection-establishment timeout, causing connections to fail outright. TCP BBR avoids this problem.

Buffer size and latency relationship (Red line: Standard TCP, Green line: TCP BBR)
After YouTube deployed TCP BBR, median latency worldwide dropped by 53% (i.e., roughly twice as fast), and in developing countries by 80% (roughly five times as fast). As the figure below shows, the higher a user's latency, the larger the proportional reduction after adopting TCP BBR, from 10 seconds down to just 2 seconds. If your website needs to stay usable for visitors on GPRS or slow Wi-Fi, TCP BBR is worth a try.

Ratio of median round-trip latency between standard TCP and TCP BBR
In summary, TCP BBR neither treats packet loss as a congestion signal nor uses "additive increase, multiplicative decrease" to maintain its send window. Instead, it estimates the maximum bandwidth and minimum delay separately and uses their product as the send window size.

BBR's connection-start phase consists of slow start and drain. To cope with bandwidth and delay not being accurately measurable at the same time, BBR probes them alternately once the connection stabilizes: the bandwidth-probing phase occupies the vast majority of the time, responding quickly to changes in available bandwidth through positive feedback and periodic bandwidth gain; the occasional delay-probing phase sends packets very slowly in order to measure delay accurately.

BBR solves two problems:

  1. Fully utilize the bandwidth on a network link with a certain packet loss rate. Very suitable for high latency, high bandwidth network links.
  2. Reduce the buffer occupancy rate on the network link, thereby reducing latency. Very suitable for users with slow network access.

Many comments asked whether TCP BBR takes effect on the client or the server, so a reminder: the TCP congestion control algorithm is applied by the data sender to decide its send window, so it takes effect for data sent out from whichever side deploys it. For downloads, deploy it on the server; for uploads, deploy it on the client.

If you want to speed up access to foreign websites, and your download traffic far exceeds your upload traffic, deploying TCP BBR (or any acceleration based on TCP congestion control) on the client side has no effect. You need to deploy TCP BBR on the VPN's overseas exit server and perform TCP termination and TCP proxying. That is, the client actually establishes its connection with the VPN's overseas exit server, which in turn connects to the target server, so that on the high-loss, high-latency segment (client to overseas exit) the sender is the BBR-enabled exit server. Deploying BBR on the overseas exit as an HTTP(S) proxy works on the same principle.

Probably due to ACM Queue's length limits and target readership, the paper does not discuss fairness between TCP BBR and standard TCP (except under congestion loss). Nor does it compare BBR with existing congestion control algorithms, such as delay-based ones (e.g., TCP Vegas), ones combining loss and delay signals (e.g., Compound TCP, TCP Westwood+), ones relying on congestion information from network devices (e.g., ECN), and ones where network devices adopt new scheduling strategies (e.g., CoDel). I look forward to more detailed papers from Google, and to reports from others on TCP BBR's performance in experimental and production environments.

I am not an expert in TCP congestion control; if there are any errors or omissions, please correct me.

[1] Cardwell, Neal, et al. "BBR: Congestion-Based Congestion Control." ACM Queue 14.5 (2016): 50.

Read More

2016-12-24
How to Make the Motherboard Beeper Sound on Win10 Without Writing a Driver

The Windows API has a Beep function that produces a beep sound. This function has a long history; the BIOS alarm beep comes from the motherboard beeper. Its original mechanism was to program the Programmable Interval Timer (PIT) present in almost every machine. Unfortunately, starting with Windows Vista, the function's behavior changed to play the sound through the speakers instead of the motherboard beeper.

How can you drive the motherboard beeper on Windows Vista and later? Do you have to write a Windows driver? In fact, using the WinDbg kernel debugger, one line of code suffices. The following line makes the motherboard beeper sound at 800 Hz for 1000 milliseconds.

n 10; r $t0=800; r $t1=1000; ob 0x43 0xb6; ob 0x42 (1193180/$t0)&0xff; ob 0x42 (1193180/$t0)>>8; ob 0x61 3; .sleep $t1; ob 0x61 0

How to use:

  1. Download and install WinDbg.
  2. Enable kernel debugging: run "bcdedit /debug on" with administrator privileges, then reboot.
  3. Open WinDbg with administrator privileges, choose File -> Kernel Debug, select the "Local" tab, and confirm. If all goes well, you will enter a local kernel debugging session.
  4. Enter the code above. If your motherboard beeper works, you should hear the beep. (Unfortunately, the screenshot carries no sound.)

Principle:

n 10;                        Switch to decimal; WinDbg defaults to hexadecimal
r $t0=800;                   Set WinDbg pseudo-register $t0 to 800, the tone frequency (Hz)
r $t1=1000;                  Set WinDbg pseudo-register $t1 to 1000, the tone duration (ms)
ob 0x43 0xb6;                Configure the PIT to output a square wave to the motherboard beeper (ob here works like Linux's outb)
ob 0x42 (1193180/$t0)&0xff;  Low byte of the wave period (divisor of the 1,193,180 Hz PIT clock)
ob 0x42 (1193180/$t0)>>8;    High byte of the wave period
ob 0x61 3;                   Start the sound
.sleep $t1;                  Keep sounding for $t1 milliseconds
ob 0x61 0;                   Stop the sound

Thanks to The Square Root of Negative One (zzh1996) for the question.

Reference: http://wiki.osdev.org/PC_Speaker

Read More

2016-09-22
The Weathervane of Network Technology: SIGCOMM 2016

(Reprinted from Microsoft Research Asia)

As the oldest top academic conference in computer networking, ACM SIGCOMM has been held 37 times since 1977. The ACM Special Interest Group on Data Communication (SIGCOMM) proudly calls SIGCOMM its annual flagship conference on its homepage. Over the past 40 years, from the TCP congestion control of computer networking textbooks to the Software-Defined Networking (SDN) and Network Function Virtualization (NFV) of cloud data centers, SIGCOMM has witnessed the birth and growth of many key networking technologies.

SIGCOMM papers are known for their high quality: only about 40 are accepted each year, an acceptance rate of roughly 15%, and network researchers worldwide regard publishing at SIGCOMM as an honor. Each paper undergoes rigorous double-blind review. This year, for example, there were three rounds: 99 of 225 submissions survived the first round, 66 the second, and 60 went to the Program Committee (PC), which settled on the final 39 accepted papers after a day and a half of meetings. Each accepted paper received on average 8 reviews spanning dozens of pages. Even for papers not ultimately accepted, these expert reviews are very helpful for subsequent improvement.

Read More

2016-08-22
ClickNP FAQ

Read More

2016-08-22
ClickNP: Highly Flexible and High-Performance Network Processing with Reconfigurable Hardware

Highly flexible software network functions (NFs) are crucial components to enable multi-tenancy in the clouds. However, software packet processing on a commodity server has limited capacity and induces high latency. While software NFs could scale out using more servers, doing so adds significant cost. This paper focuses on accelerating NFs with programmable hardware, i.e., FPGA, which is now a mature technology and inexpensive for datacenters. However, FPGA is predominantly programmed using low-level hardware description languages (HDLs), which are hard to code and difficult to debug. More importantly, HDLs are almost inaccessible for most software programmers.

This paper presents ClickNP, an FPGA-accelerated platform for highly flexible and high-performance NFs with commodity servers. ClickNP is highly flexible as it is completely programmable using high-level C-like languages, and exposes a modular programming abstraction that resembles Click Modular Router. ClickNP is also high performance. Our prototype NFs show that they can process traffic at up to 200 million packets per second with ultra-low latency (< 2µs). Compared to existing software counterparts, with FPGA, ClickNP improves throughput by 10x, while reducing latency by 10x. To the best of our knowledge, ClickNP is the first FPGA-accelerated platform for NFs, written completely in high-level language and achieving 40 Gbps line rate at any packet size.

Read More

2016-08-01
A Scalable and Efficient Architecture for FPGA-based HTTPS Accelerator

Joint project with Tianyi Cui for Microsoft Hackathon 2016.

Read More