Paper List for AI Systems

Survey

  • Training and Serving System of Foundation Models: A Comprehensive Survey

Training

System Architecture

  • NSDI ‘24 MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
  • NSDI ‘24 Resiliency at Scale: Managing Google’s TPUv4 Machine Learning Supercomputer
  • SOSP ‘23 Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates

Parallelism & Communication

  • NSDI ‘23 TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
  • NSDI ‘23 TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
  • NSDI ‘23 Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE
  • SIGCOMM ‘24 Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem
  • ASPLOS ‘23 Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models
  • ASPLOS ‘23 In-Network Aggregation with Transport Transparency for Distributed Training
  • ASPLOS ‘24 Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning
  • ASPLOS ‘24 AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning

Resource Management & Scheduling

  • NSDI ‘23 Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs
  • NSDI ‘23 Transparent GPU Sharing in Container Clouds for Deep Learning Workloads
  • NSDI ‘23 Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning
  • SOSP ‘23 Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling
  • ASPLOS ‘23 Snape: Reliable and Low-Cost Computing with Mixture of Spot and On-demand VMs
  • ASPLOS ‘23 Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
  • ASPLOS ‘23 ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning
  • ASPLOS ‘24 Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters
  • OSDI ‘24 MAST: Global Scheduling of ML Training across Geo-Distributed Datacenters at Hyperscale

Job Analysis & Optimization

  • NSDI ‘24 Characterization of Large Language Model Development in the Datacenter
  • OSDI ‘24 When will my ML Job finish? Toward providing Completion Time Estimates through Predictability-Centric Scheduling
  • NSDI ‘23 ModelKeeper: Accelerating DNN Training via Automated Training Warmup
  • NSDI ‘23 BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
  • SIGCOMM ‘23 Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models

Failure Recovery

  • SOSP ‘23 GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints

Serving

System Architecture

  • NSDI ‘23 SHEPHERD: Serving DNNs in the Wild
  • OSDI ‘24 Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
  • OSDI ‘24 ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
  • OSDI ‘24 InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
  • OSDI ‘24 DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
  • OSDI ‘24 Llumnix: Dynamic Scheduling for Large Language Model Serving

Resource Management & Optimization

  • OSDI ‘24 USHER: Holistic Interference Avoidance for Resource Optimized ML Inference
  • OSDI ‘24 Fairness in Serving Large Language Models
  • OSDI ‘24 dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
  • ASPLOS ‘24 ExeGPT: Constraint-Aware Resource Allocator for LLM Inference
  • ASPLOS ‘24 Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
  • ASPLOS ‘24 SpotServe: Serving Generative Large Language Models on Preemptible Instances

Memory & Cache Management

  • NSDI ‘24 Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models
  • ATC ‘24 Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
  • SOSP ‘23 Efficient Memory Management for Large Language Model Serving with PagedAttention
  • SIGCOMM ‘24 CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving

Infrastructure

Networking

  • SIGCOMM ‘24 NetLLM: Adapting Large Language Models for Networking
  • SIGCOMM ‘24 Alibaba HPN: A Data Center Network for Large Language Model Training

Cloud & Serverless

  • ASPLOS ‘24 RainbowCake: Mitigating Cold-starts in Serverless with Layer-wise Container Caching and Sharing
  • ASPLOS ‘23 AQUATOPE: QoS-and-Uncertainty-Aware Resource Management for Multi-stage Serverless Workflows
  • ASPLOS ‘24 AUDIBLE: A Convolution-Based Resource Allocator for Oversubscribing Burstable Virtual Machines

System Tools & Analysis

  • ASPLOS ‘24 Thesios: Synthesizing Accurate Counterfactual I/O Traces from I/O Samples
  • ASPLOS ‘24 A Journey of a 1,000 Kernels Begins with a Single Step: A Retrospective of Deep Learning on GPUs
  • ASPLOS ‘24 DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads
  • ASPLOS ‘24 NDPipe: Exploiting Near-data Processing for Scalable Inference and Continuous Training in Photo Storage