Schedule

Given the pace of innovation in this area, the following schedule is subject to change.

Legend: names following each [Required] paper are listed in order as Presenter, Reviewer, Scribe.

Introduction

Jan 8
Course Introduction
Anand
πŸ“– How to Read a Paper
πŸ“– How to Give a Bad Talk
πŸ“– Writing Reviews for Systems Conferences
Paper Presentation Preferences: fill out the form here
Jan 10
Overview of Challenges
Anand
πŸ“– Challenges and Applications of Large Language Models
πŸ“– Understanding LLMs: A Comprehensive Overview from Training to Inference
Jan 12
Paper Presentation Preferences Due
Jan 15
No class: Martin Luther King, Jr. Day

Basics of LLMs

Project

Pre-training

Fine-tuning

Retrieval & Augmentation

Inference

Project

Mar 11
Mid-Semester Project presentations
Mar 13
Mid-Semester Project presentations
Mar 18
No class: Spring break
Mar 20
No class: Spring break

Special Topics

Mar 25
Mixture-of-Experts
πŸ“– DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
πŸ“– Fast Inference of Mixture-of-Experts Language Models with Offloading [Required]
Vima Jingli Aditya
πŸ“– Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models
πŸ“– MegaBlocks: Efficient Sparse Training with Mixture-of-Experts [Required]
Shubham Mithilesh Kartik
πŸ“– Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
πŸ“– Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
πŸ“– Tutel: Adaptive Mixture-of-Experts at Scale
Mar 27
Model Compression
πŸ“– Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
πŸ“– AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration [Required]
Chinmay Zachary Rishi
πŸ“– GPT3.int8(): 8-bit Matrix Multiplication for Transformers at Scale [Required]
Aniruddha Mohit Abhimanyu
πŸ“– GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
πŸ“– QLoRA: Efficient Finetuning of Quantized LLMs
πŸ“– SqueezeLLM: Dense-and-Sparse Quantization
Apr 1
Dynamism in Large Models
πŸ“– Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
πŸ“– Confident Adaptive Language Modeling [Required]
Arpan Sera Ziyuan
πŸ“– Optimizing Dynamic Neural Networks with Brainstorm [Required]
Vima Zachary Huayi
Apr 3
Legal & Ethical Considerations
πŸ“– Ethical and social risks of harm from Language Models [Required]
Rohan Kartik Aayush
πŸ“– Foundation Models and Fair Use
πŸ“– On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 [Required]
Rajveer Shivashankar Mingzheng
Apr 8
Class canceled: Solar Eclipse
Apr 10
Security Implications
πŸ“– Extracting Training Data from Diffusion Models
πŸ“– Extracting Training Data from Large Language Models [Required]
Sashankh Apoorva Azeez
πŸ“– Identifying and Mitigating the Security Risks of Generative AI [Required]
Alex Rishi Chinmay

Conclusion

Apr 10
Course Wrap-up
Anand
Apr 15
Final Project presentations
Apr 17
Final Project presentations
Apr 22
Final Project presentations
Apr 26
Final Project Report + Code Due