Bojie Li
2022-01-08
October 6, 2021, Lijiang
Short video (HD, 467 MB, 6:26)
Click here to view the wedding photo online album (70 refined photos)
Click here to view the wedding photo online album (some original photos)
2021-08-24
1Pipe is a causal and total order communication primitive for scattering groups of messages across a data center network. With in-network computation using Barefoot or Arista switches, 1Pipe achieves scalability and high performance with low CPU and network overheads. Published in SIGCOMM’21.
2021-07-06
AKG (Auto Kernel Generator) is a tensor compiler for NPUs. AKG leverages polyhedral schedulers to perform a much wider class of transformations, and extends the semantics of the polyhedral representation to combine complex tiling techniques and hierarchical fusion strategies. Published in MICRO’20 and PLDI’21.
2021-05-01
12:00, May 1, 2021, Hebei Yunzhen Century Hotel
Click here to see the photo book I made
Proposal video: (HD, 111 MB, 4:23)
Engagement ceremony video: (SD, 37 MB, 10:41)
2021-02-26
This article is reprinted from the Microsoft Research Asia (MSRA) official account, thanks to MSRA for the invitation!
Pursuing a doctorate is a lonely and challenging journey. Guidance from senior students can help you work through confusion and make choices with confidence. On February 8, five alumni of the Microsoft Research Asia Joint Doctoral Training Program went online to share personal insights from their doctoral studies and subsequent work, answer questions, and encourage everyone to stick to their dreams. The event was organized and hosted by Sun Lijun, Senior Academic Manager of Microsoft Research Asia, and Dou Anqi, who is in charge of the internship program.
We hope that what these alumni shared will encourage and inspire everyone on the road to a doctorate, and help you enjoy this rewarding and challenging time.
Guest speakers:
- Fu Xiaoming, 2016 graduate of the USTC-Microsoft joint doctoral program, currently Associate Researcher at the University of Science and Technology of China
- Zhang Chi, 2017 graduate of the Sun Yat-sen University-Microsoft joint doctoral program, currently Co-Founder and R&D Director at DeepMotion
- Huang Danqing, 2019 graduate of the Sun Yat-sen University-Microsoft joint doctoral program, currently a researcher at Microsoft Research Asia
- Li Bojie, 2019 graduate of the USTC-Microsoft joint doctoral program, currently a senior research engineer at Huawei 2012 Lab
- Li Xiao, 2019 graduate of the USTC-Microsoft joint doctoral program, currently a researcher at Microsoft Research Asia
2019-12-08
Doctoral Thesis from University of Science and Technology of China, Author: Bojie Li
Chinese Version: High Performance Data Center Systems Based on Programmable Network Cards (PDF, 8 MB)
AI Translated Unofficial English Version: High Performance Data Center Systems with Programmable Network Interface Cards (PDF, 8 MB)
Publication Date: 2019-05-26.
2019-08-19
Communication-intensive applications on hosts with multi-core CPUs and high-speed networking hardware often put considerable stress on the native socket system in an OS. Existing socket replacements often leave significant performance on the table, and also have limitations on compatibility and isolation.
In this paper, we describe SocksDirect, a user-space high-performance socket system. SocksDirect is fully compatible with Linux sockets and can be used as a drop-in replacement with no modification to existing applications. To achieve high performance, SocksDirect leverages RDMA and shared memory (SHM) for inter-host and intra-host communication, respectively. To bridge the semantic gap between sockets and RDMA/SHM, we optimize for the common cases while maintaining compatibility in general. SocksDirect achieves isolation by employing a trusted monitor daemon to handle control plane operations such as connection establishment and access control. The data plane is peer-to-peer between processes, in which we remove multi-thread synchronization, buffer management, large payload copy and process wakeup overheads in common cases. Experiments show that SocksDirect achieves 7 to 20x better message throughput and 17 to 35x better latency compared with Linux sockets, and reduces Nginx HTTP latency by 5.5 times.
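To give a feel for how a peer-to-peer data plane can avoid locks, here is a minimal sketch of a single-producer, single-consumer ring queue. It is an illustration only and not SocksDirect's actual code: the class name `SpscRing` and its methods are hypothetical, and a real shared-memory version would also need memory barriers and a fixed binary layout.

```python
class SpscRing:
    """Fixed-capacity queue for exactly one sender and one receiver.

    Hypothetical illustration: only the sender writes `tail` and only the
    receiver writes `head`, so neither side has to take a lock.
    """

    def __init__(self, capacity):
        # One slot is always left empty to tell "full" apart from "empty".
        self.slots = [None] * (capacity + 1)
        self.head = 0  # next slot to read, written only by the receiver
        self.tail = 0  # next slot to write, written only by the sender

    def try_send(self, msg):
        """Enqueue msg; return False instead of blocking when the ring is full."""
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            return False
        self.slots[self.tail] = msg
        self.tail = nxt
        return True

    def try_recv(self):
        """Dequeue the oldest message, or return None when the ring is empty."""
        if self.head == self.tail:
            return None
        msg = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        return msg


ring = SpscRing(capacity=2)
ring.try_send(b"GET /")
ring.try_send(b"Host: example")
print(ring.try_recv())  # oldest message comes out first
```

Giving each pair of communicating processes its own queue of this kind is one way to keep the common-case send and receive paths free of synchronization, at the cost of one queue per connection.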
2019-08-08
Introduction to Score Cheating (Informatics Competition)
Introduction to Score Cheating (Mathematics Competition)
The Principle of Numbers (July 2009)
Introduction to Score Cheating (Informatics Competition) Source File (Yongzhong Office Format)
Introduction to Score Cheating (Mathematics Competition) Source File (Microsoft Office Format)
2019-08-08
2019-08-07
Hardware-based transports, such as RDMA, are becoming prevalent because of their low latency, high throughput and low CPU overhead. However, current RDMA NICs have limited NIC memory to store per-flow transport states. When the number of flows exceeds memory capacity, the NIC needs to swap out flow states to host memory via PCIe, leading to performance degradation.
This paper presents a hardware-based transport without per-flow state. At its core, flow state bounces between the two end hosts along with a data packet, analogous to a thread whose state is always in flight. To enable multiple in-flight packets, each thread is assigned a distinct sequence of packets to send. We enable each thread to fork, throttle and merge independently, which effectively simulates a window-based congestion control mechanism. For loss recovery, we design an epoch-based single loss detector for all flows, which enables selective retransmission with storage proportional to the number of packets lost in a round trip. When there are more losses than the NIC can handle, the receiver CPU is notified to recover the loss.
We design and implement RDMA, TCP and TLS transports without per-flow states in an FPGA prototype. The transports have small network bandwidth and CPU overhead. Simulations and testbed experiments show that flows share network bandwidth fairly in a multi-bottleneck network, and that the transports solve the incast problem even better than DCTCP and DCQCN. With a large number of concurrent flows, the throughput of our stateless hardware-based TLS transport is 100x that of a stateful hardware-based transport and 50x that of a software-based transport.
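The fork and merge idea can be sketched in a few lines. This is a toy model under stated assumptions, not the paper's scheme: the `(start, stride)` encoding of "a distinct sequence of packets", the `Thread` class and the `merge` helper are all hypothetical, chosen only to show how the number of in-flight threads can play the role of a congestion window.

```python
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Thread:
    """State that travels with a packet (hypothetical encoding)."""
    start: int   # sequence number of the next packet this thread sends
    stride: int  # gap between consecutive packets of this thread

    def packets(self):
        """Yield the sequence numbers this thread is responsible for."""
        seq = self.start
        while True:
            yield seq
            seq += self.stride

    def advance(self):
        """State after one packet has been sent and acknowledged."""
        return Thread(self.start + self.stride, self.stride)

    def fork(self):
        """Split into two threads covering the same packets (window grows by one)."""
        return (Thread(self.start, 2 * self.stride),
                Thread(self.start + self.stride, 2 * self.stride))


def merge(a, b):
    """Inverse of fork: fold two sibling threads into one (window shrinks by one)."""
    if a.stride != b.stride or abs(a.start - b.start) * 2 != a.stride:
        raise ValueError("threads are not siblings")
    return Thread(min(a.start, b.start), a.stride // 2)


flow = Thread(start=0, stride=1)
left, right = flow.fork()
print(list(islice(left.packets(), 3)), list(islice(right.packets(), 3)))
print(merge(left, right) == flow)
```

In this model the sender keeps no table of flows: every decision is made from the state carried by the packet that just returned, and the window size is simply the number of threads currently in flight.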