Bojie Li
2013-03-19
Terminology: (It feels quite awkward)
- Mirror: A mirrored repository, such as Ubuntu, CPAN, or PyPI.
- Mirror site: An open source mirror site at a university; each mirror site hosts several mirrors.
- (Mirror) Node: Each mirror site that hosts the same mirror
Mirror Alliance Maintainers:
Following the Debian development model, the maintainers of the Mirror Alliance can be divided into Mirrors Maintainers and Mirrors Developers. Mirrors Developers are core developers who have maintenance access and decision-making voting rights on the main site of the Mirror Alliance; anyone involved in the development and maintenance of the Mirror Alliance can apply to become a Mirrors Maintainer.
2012-11-27
Welcome to USTC Blog. USTC Blog, after much anticipation, is finally online! This is a moment worth celebrating, but it also means that we will be continuously battling with bugs and features in the coming days.
Just daydreaming, when can the blog reach the traffic volume in the following picture?
The above picture is the traffic graph of mirrors.ustc.edu.cn
2012-11-27
Dr. Bojie Li was an Assistant Scientist and Associate Chief Expert with the Computer Network and Protocol Lab, Distributed and Parallel Software Lab, Central Software Institute, Huawei 2012 Labs. In 2019, Dr. Li obtained his Ph.D. in Computer Science from the University of Science and Technology of China (USTC) and Microsoft Research Asia (MSRA), supervised by Prof. Lintao Zhang and Prof. Enhong Chen. He has published papers at top conferences such as SIGCOMM, SOSP, NSDI, ATC, and PLDI. He is a recipient of the ACM China Doctoral Dissertation Award and the Microsoft Research Asia Ph.D. Fellowship Award.
2012-11-27
Past Research Projects
FastWake bridges the performance gap between interrupt and polling in RDMA by redesigning the interrupt-mode RDMA host network stack using commodity RDMA hardware, the Linux OS, and unmodified applications. Published in APNet’23.
AKG (Auto Kernel Generator) is a tensor compiler for NPUs. AKG leverages polyhedral schedulers to perform a much wider class of transformations, and extends the semantics of the polyhedral representation to combine complex tiling techniques and hierarchical fusion strategies. Published in MICRO’20 and PLDI’21.
1Pipe is a causal and total order communication primitive to scatter groups of messages via data center network. With in-network computation using Barefoot or Arista switches, 1Pipe achieves scalability and high performance with low CPU and network overheads. Published in SIGCOMM’21.
SocksDirect is a high performance user-space socket system that is compatible with existing applications and preserves isolation among processes, while being scalable to multiple cores. Performance close to RDMA and shared memory. Published in SIGCOMM’19.
KV-Direct is a high performance key-value store that leverages a programmable NIC to extend RDMA primitives and enable remote direct key-value access to the main host memory. A single NIC achieves 180 million key-value operations per second while keeping tail latency below 10µs. Published in SOSP’17.
ClickNP is a highly flexible and high-performance network processing platform with reconfigurable hardware. Completely programmable using C-like language and Click-like modular programming abstraction. Process packets at up to 200 million packets per second with less than 2µs latency. Published in SIGCOMM’16.
FTRouter is a fault-tolerant software architecture for SDN routers, which allows any component to fail or upgrade without interrupting the data plane, while the control plane recovers automatically. Dissertation for Bachelor’s Degree.
Participated Research Projects
MP-RDMA is a multi-path hardware-based transport for RDMA, which efficiently utilizes the rich network paths in datacenters, and optimizes for limited on-chip memory in RDMA NICs. Published in NSDI’18.
MELO is a memory efficient loss recovery mechanism for hardware-based transport in datacenters. Up to 14x throughput and 3x lower 99th-percentile tail FCT (flow completion time) with only 23 B of per-flow state. Published in APNet’17.
FUSO is a novel loss recovery approach that exploits multi-path diversity in datacenter networks. Recovery packets are sent over another sub-flow that is lossless or less lossy. Published in ATC’16.
Feniks is an operating system for FPGA to facilitate large scale FPGA deployment in datacenters. Provides abstracted interface, direct PCIe device access and resource allocation. Published in APSys’17.
Preliminary Research Projects
FTLinux is a transparent and efficient fault tolerant system for general distributed applications on commodity Linux servers. Efficient mechanisms for process migration, deterministic replay and distributed snapshots. Negligible latency and CPU overhead, fast recovery.
ReactDB is a real-time hybrid HTAP and streaming database that offers serializability efficiently. First, each stored procedure transaction is reactive to updates from other concurrent transactions. Second, physical data layout and indexes are reactive to data access pattern.
RDMA NICs have limited memory to store per-flow states. We design a stateless hardware-based transport in data center networks. Instead of storing per-flow states on endpoints, the states are piggybacked by network packets and keep bouncing between two endpoints.
P4Coder is a system to automatically synthesize hardware-accelerated data plane in P4 language by learning the behavior of an existing software network application. Capable of synthesizing data plane of firewall, TCP, key-value store, Paxos and more.
A transparent PCIe bump-in-the-wire debugger and gateway with a commodity FPGA-based PCIe board. Spoofs PCIe devices and corresponding OS drivers to proxy MMIO and DMA traffic via the PCIe gateway.
A library to transparently hide offloading latency by executing non-conflicting work for existing event-driven concurrent applications.
Selected Engineering Projects
icourse.club is a website for USTC students to rate and review courses. Since May 2015, icourse.club has gained 6,000+ users, who have generated 16,000+ high-quality reviews and ratings for 2,800+ courses at USTC. It is open source on GitHub under the GNU Affero General Public License.
A scalable and efficient architecture for RSA encryption/decryption on FPGA to accelerate HTTPS handshake. Throughput equivalent to 20 CPU cores.
LUG VPN is a smart global VPN network that enables students to efficiently access every host across the Internet from any network location. Users connect to a nearby access gateway, which selects an egress gateway close to the destination and forwards traffic to it via an optimized tunnel.
2012-11-27
Containers are an emerging lightweight virtualization technology, and a way to encapsulate software dependencies and simplify deployment. When we started to build Freeshell in 2012, Docker was immature and the dominant container technology was OpenVZ. So we built Freeshell, an elastic and efficient container cloud based on OpenVZ. Freeshell attracted 2,000+ users on the USTC campus. Our carefully tuned system consolidated 1,000+ active containers onto 8 servers.
Freeshell offers multiple pre-created OS images, including Debian, Ubuntu, CentOS, Arch Linux and other Linux distributions. To consolidate the disk storage of multiple containers, we store the file system of each container as a delta against its base image, and AUFS makes this layering transparent to applications.
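The delta-over-base-image idea can be illustrated with a toy model. This is only a sketch of the general union-mount concept, not Freeshell's or AUFS's actual implementation: the `UnionView` class, the `WHITEOUT` marker, and the in-memory dictionaries standing in for file systems are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical marker for a file deleted in a container's own layer.
WHITEOUT = object()


class UnionView:
    """Toy read-through view: a per-container delta over a shared base image."""

    def __init__(self, base: Dict[str, bytes]) -> None:
        self.base = base                      # shared, never modified
        self.delta: Dict[str, object] = {}    # only this container's changes

    def read(self, path: str) -> Optional[bytes]:
        # The delta layer wins; fall back to the base image otherwise.
        if path in self.delta:
            entry = self.delta[path]
            return None if entry is WHITEOUT else entry
        return self.base.get(path)

    def write(self, path: str, data: bytes) -> None:
        # Copy-on-write: the change lands in the delta, not in the base.
        self.delta[path] = data

    def delete(self, path: str) -> None:
        # Hide a base file without touching the shared image.
        self.delta[path] = WHITEOUT


base_image = {"/etc/os-release": b"Debian", "/bin/sh": b"shell"}
c1, c2 = UnionView(base_image), UnionView(base_image)
c1.write("/etc/hostname", b"c1")
c1.delete("/bin/sh")
```

In this model both containers share one copy of the base image, and each container's storage cost is only the size of its own delta.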
2012-11-27
Although the Internet is designed to be end-to-end, not all IP addresses are accessible from everywhere due to NATs, firewalls, organizational policies, and other network middleboxes. Additionally, direct routing may not be the best path due to ISP QoS policies and limited interconnect bandwidth between some ASes. Furthermore, TCP Cubic congestion control performs notably poorly on long fat pipes with occasional packet loss.
Since 2013, we have designed, deployed, and operated a global VPN network to enable USTC students to efficiently access every host across the Internet from any network location. LUG VPN has over one thousand users, serving about one terabyte of network traffic every day via tens of servers in global data centers.
Unlike traditional VPNs where the user connects to a gateway server and the server directly accesses the Internet, LUG VPN forwards traffic among gateway servers and finds an optimal server to access the destination host. Consequently, there are two or three hops from a VPN user to the destination host: from the user to an access gateway server, from the access gateway to an egress gateway (this hop is optional), and from the egress gateway to the destination host. By adding an additional hop, LUG VPN has the freedom to optimize the tunnel between the access gateway and egress gateway to bypass firewalls and improve QoS.
To choose the best egress gateway from an access gateway to a destination, we use a GeoIP-based strategy. First, we run a recursive DNS server on each access gateway to resolve users’ DNS requests, so that if a multi-homed website uses a GeoIP-based authoritative DNS server, it can return IP addresses close to the access gateway. Second, a GeoIP-based routing table is configured on each access gateway to route IP packets (including DNS packets) either directly to the destination or via a tunnel to an egress gateway. To make the routing decision for every pair of geographic regions, we periodically probe the latency and bandwidth of each route, i.e., through each egress gateway or directly. In addition, some rules exclude certain routes in order to bypass content-based firewalls and to comply with copyright restrictions. Among all candidate routes, we first find the best route with the lowest latency and highest bandwidth, then find other candidate routes whose latency and bandwidth are close to those of the best route. To load balance traffic among egress gateways, we use weighted round-robin to route connections via all candidate routes.
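The candidate selection and weighted round-robin steps might look roughly like the sketch below. The text does not say how latency and bandwidth are combined or how "close" is defined, so the ranking rule (latency first, bandwidth as tie-breaker), the 20% `tolerance`, the integer weights, and all route names here are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, List


@dataclass(frozen=True)
class Route:
    name: str               # "direct" or a hypothetical egress gateway label
    latency_ms: float       # from periodic probing
    bandwidth_mbps: float   # from periodic probing
    weight: int = 1         # assumed load-balancing weight


def candidate_routes(routes: List[Route], tolerance: float = 0.2) -> List[Route]:
    # Assumed ranking: lowest latency wins, higher bandwidth breaks ties.
    best = min(routes, key=lambda r: (r.latency_ms, -r.bandwidth_mbps))
    # Keep every route whose latency and bandwidth are within `tolerance` of the best.
    return [
        r for r in routes
        if r.latency_ms <= best.latency_ms * (1 + tolerance)
        and r.bandwidth_mbps >= best.bandwidth_mbps * (1 - tolerance)
    ]


def weighted_round_robin(routes: List[Route]) -> Iterator[Route]:
    # Simple expansion: a route with weight w appears w times per cycle.
    schedule = [r for r in routes for _ in range(r.weight)]
    return cycle(schedule)


probed = [
    Route("direct", 180.0, 40.0),
    Route("egress-tokyo", 90.0, 100.0, weight=2),
    Route("egress-seoul", 100.0, 95.0, weight=1),
]
candidates = candidate_routes(probed)
picker = weighted_round_robin(candidates)
first_six = [next(picker).name for _ in range(6)]
```

With these made-up probe results, the direct route is dropped as too slow, and new connections alternate between the two egress gateways in a 2:1 ratio.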
To ensure efficient forwarding between access and egress gateways, we create multiple tunnels between each pair of gateways using different tunneling technologies. Some are standard tunneling protocols (e.g. GRE). Others disguise the tunnel as another protocol (e.g. HTTPS) to bypass protocol filters and improve QoS priority in ISP networks. If one tunneling technology is blocked, the system automatically switches to another. To reduce TCP handshake latency, we terminate TCP connections at both access and egress gateways, so that each TCP connection is broken into three relayed TCP connections. For the tunnel connection between access and egress gateways, we deploy multiple WAN optimization techniques, including loss-agnostic congestion control, FEC (forward error correction) on lossy links, and compression.
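The automatic switch between tunneling technologies can be sketched as a preference-ordered fallback. The tunnel labels, the `pick_tunnel` helper, and the `is_usable` probe are hypothetical; the text does not describe how LUG VPN detects a blocked tunnel or orders its alternatives.

```python
from typing import Callable, Optional, Sequence


def pick_tunnel(
    preference: Sequence[str],
    is_usable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first tunnel type, in preference order, that passes the probe."""
    for tunnel in preference:
        if is_usable(tunnel):
            return tunnel
    return None  # every tunnel between this gateway pair is currently blocked


# Hypothetical setup: a plain GRE tunnel is preferred, with a tunnel
# disguised as HTTPS as the fallback when GRE is filtered.
preference = ["gre", "https-disguised"]
blocked = {"gre"}
active = pick_tunnel(preference, lambda t: t not in blocked)
```

Re-running the selection after each periodic probe would let a gateway pair move back to the preferred tunnel once it becomes usable again.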
The user can access the VPN by connecting to any of the public gateway servers via multiple protocols, including IPSec, OpenVPN, PPTP, L2TP, ShadowSocks, Socks5, IP over DNS, and HTTP(S) proxy. To direct the user to a nearby gateway server, the user specifies a domain name, and our authoritative DNS server resolves it to the IP address of the closest gateway server based on the GeoIP location of the user’s IP. Using DNS also adds a layer of indirection that lets the system automatically remove access gateways that fail or become overloaded.
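A minimal sketch of the GeoIP-based resolution step follows, assuming that "closest" means smallest great-circle distance and that the client's coordinates have already been looked up in a GeoIP database. The gateway addresses are documentation-range placeholders, and the coordinates and function names are illustrative only.

```python
import math
from typing import Dict, Optional, Set, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points on Earth, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(client: Coord, gateways: Dict[str, Coord],
            unhealthy: Set[str]) -> Optional[str]:
    """Answer a DNS query with the nearest gateway that is not failed or overloaded."""
    healthy = {ip: loc for ip, loc in gateways.items() if ip not in unhealthy}
    if not healthy:
        return None
    return min(healthy, key=lambda ip: haversine_km(client, healthy[ip]))


# Placeholder gateways (RFC 5737 documentation addresses) with rough coordinates.
gateways = {
    "192.0.2.1": (35.68, 139.69),      # Tokyo
    "198.51.100.1": (50.11, 8.68),     # Frankfurt
    "203.0.113.1": (37.34, -121.89),   # San Jose
}
client_in_hefei = (31.82, 117.23)
nearest = resolve(client_in_hefei, gateways, unhealthy=set())
fallback = resolve(client_in_hefei, gateways, unhealthy={"192.0.2.1"})
```

Because the answer is computed per query, marking a gateway unhealthy is enough to steer new users away from it, which is the indirection the paragraph above describes.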
To authenticate users, we deploy an LDAP system that integrates with the access gateways and develop a Web-based application system. To recover from failures automatically, the servers monitor each other via periodic tunnel ping probing and exchange results. If more than half of the peer servers find that a server has failed, the peer servers issue a remote reboot command via the management API (for cloud servers) or IPMI (for bare-metal servers). To let users route only a range of destination hosts via the VPN, we offer multiple configuration options in the software.
LUG VPN offers good reachability and performance in multiple usage scenarios. First, users with limited Internet access can access all sites. Second, users outside the campus can access the campus Intranet. Third, users outside the campus can access scholarly articles that are only available to campus IP addresses. Fourth, with WAN optimization, users experience much better throughput and latency, especially for international traffic. Additionally, users use LUG VPN to interconnect devices behind NATs and firewalls.
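The "more than half of peer servers" rule amounts to a strict-majority vote over exchanged probe results. The sketch below assumes a simple report format (`reports[peer][target]` is `True` when the peer's tunnel ping succeeded) and treats a missing report as "reachable"; both choices, along with the server names, are illustrative assumptions rather than the deployed design.

```python
from typing import Dict

# reports[peer][target] is True if `peer` could tunnel-ping `target`.
Reports = Dict[str, Dict[str, bool]]


def should_reboot(target: str, reports: Reports) -> bool:
    """True if a strict majority of the target's peers report it as failed."""
    peers = [p for p in reports if p != target]
    # Assumption: a peer with no report for the target does not count as a failure vote.
    failed = sum(1 for p in peers if not reports[p].get(target, True))
    return failed * 2 > len(peers)


reports: Reports = {
    "gw-a": {"gw-b": True, "gw-c": False, "gw-d": True},
    "gw-b": {"gw-a": True, "gw-c": False, "gw-d": True},
    "gw-c": {"gw-a": True, "gw-b": True, "gw-d": True},
    "gw-d": {"gw-a": True, "gw-b": True, "gw-c": True},
}
reboot_c = should_reboot("gw-c", reports)   # two of three peers see gw-c as down
reboot_d = should_reboot("gw-d", reports)   # nobody sees gw-d as down
```

Requiring a strict majority keeps a single peer with a broken uplink from triggering reboots of healthy servers.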