I’m a PhD student at the Max Planck Institute for Informatics (MPI-INF), working on data center networks with Yiting Xia. During my PhD, I interned at AWS AI, where I built resilient training systems in JAX.

I love building systems. My experience spans programmable networks, fault-tolerant distributed ML training, high-precision time synchronization, and hardware acceleration with FPGAs.

News

  • We will host a tutorial on OpenOptics at SIGCOMM’25. See you in Coimbra, Portugal!
  • OpSync has been accepted as a SIGCOMM’25 Poster! Come chat with me about time synchronization for reconfigurable data center networks.
  • OpenOptics was accepted as a demo at SIGCOMM’24! See you in Sydney!

Software

OpenOptics (Website, GitHub) - realizing customized optical data center networks with ~10 lines of code in Python.

Projects

OpSync - A time synchronization protocol for reconfigurable data center networks that outperforms state-of-the-art protocols (PTP, Sundial, Graham), even on static networks.

ResilienX - Checkpoint-free failure recovery for JAX, significantly reducing training wall time while preserving training correctness.

EchelonFlow - Parallelism-aware flow scheduling for collective communication in distributed ML training.

Digital Molecular Computer - A specialized processor for the Boolean satisfiability problem (SAT), inspired by molecular computing. Prototyped in Verilog on an FPGA.

Experience

Misc.

Outside the office, you’ll often find me playing tennis, bouldering, hiking, experimenting in the kitchen, or hanging out with my cat.

Mengmeng