I’m a PhD student at the Max Planck Institute for Informatics (MPI-INF), working on data center networks with Yiting Xia. During my PhD, I interned at AWS AI, where I built resilient training systems for JAX.

I love building systems. My experience spans programmable networks, fault-tolerant distributed ML training, high-precision time synchronization, and hardware acceleration with FPGAs.

News

Selected Publications

  • [NSDI’26] OpenOptics: An Open Research Framework for Optical Data Center Networks.
    Yiming Lei, Federico De Marchi, Raj Joshi, Jialong Li, Balakrishnan Chandrasekaran, Yiting Xia.

  • [NSDI’26] SyncWise: Error-Aware Time Synchronization for Reconfigurable Data Center Networks.
    Yiming Lei, Jialong Li, Zhengqing Liu, Raj Joshi, Yiting Xia.

  • [ToN’25] Unlocking diversity of fast-switched optical data center networks with unified routing.
    Jialong Li, Federico De Marchi, Yiming Lei, Raj Joshi, Balakrishnan Chandrasekaran, Yiting Xia.

  • [SIGCOMM’24] Uniform-cost multi-path routing for reconfigurable data center networks.
    Jialong Li, Haotian Gong, Federico De Marchi, Aoyu Gong, Yiming Lei, Wei Bai, Yiting Xia.

  • [HotNets’22] Efficient Flow Scheduling in Distributed Deep Learning Training with Echelon Formation.
    Rui Pan*, Yiming Lei*, Jialong Li, Zhiqiang Xie, Binhang Yuan, Yiting Xia. (*Equal contribution).

Software

OpenOptics (Website, GitHub) - realizing customized optical data center networks with ~10 lines of code in Python.

Projects

SyncWise - A time synchronization protocol for reconfigurable data center networks that outperforms state-of-the-art protocols (PTP, Sundial, Graham) even on static networks.

ResilienX - Checkpoint-free failure recovery for JAX, significantly reducing training wall time while preserving training correctness.

EchelonFlow - Parallelism-aware flow scheduling for collective communication in distributed ML training.

Digital Molecular Computer - A specialized processor for the Boolean satisfiability problem (SAT), inspired by molecular computing. Prototyped in Verilog on an FPGA.

Experience

Misc.

Outside the office, you’ll often find me playing tennis, bouldering, hiking, experimenting in the kitchen, or hanging out with my cat.

Mengmeng