news
| Date | News |
|---|---|
| Oct 06, 2025 | Released the preprint for my work 🪃 Boomerang Distillation Enables Zero-Shot Model Size Interpolation 🪃. We uncover boomerang distillation, a surprising phenomenon by which we can create a full family of models spanning fine-grained sizes, with no additional training, by interpolating between a pretrained model and a distilled model. |
| Sep 01, 2025 | My paper Continuous Language Model Interpolation yields Dynamic and Controllable Text Generation was published in TMLR! |
| Jun 23, 2025 | New preprint out: Hidden Breakthroughs in Language Model Training. We propose POLCA, a method for decomposing changes in the loss along arbitrary bases of the low-rank training subspace, and show that it can surface breakthroughs in training that are obscured when all variation is aggregated into a single scalar loss term. |
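For readers curious what "decomposing a loss change along a basis of a low-rank subspace" can look like, here is a minimal NumPy sketch. It is an illustrative assumption rather than POLCA's actual implementation: the random orthonormal basis, the first-order (gradient-based) approximation of the loss change, and all variable names are stand-ins chosen only to keep the example self-contained.

```python
import numpy as np

# Hypothetical sketch (not the paper's exact formulation): decompose a
# first-order change in the loss along an orthonormal basis of a low-rank
# subspace of parameter space.

rng = np.random.default_rng(0)
d, k = 1000, 8                      # parameter dimension, subspace rank

# Orthonormal basis B (d x k) for the low-rank subspace. In practice such a
# basis would be derived from the training trajectory; here it is random.
B, _ = np.linalg.qr(rng.normal(size=(d, k)))

g = rng.normal(size=d)              # loss gradient at the current step
delta = B @ rng.normal(size=k)      # parameter update, assumed to lie in the subspace

# Per-direction contributions to the first-order loss change g . delta:
# since delta = sum_i (b_i . delta) b_i, we have
# g . delta = sum_i (g . b_i)(b_i . delta).
contributions = (g @ B) * (B.T @ delta)

print(contributions)                                # one scalar per basis direction
print(np.isclose(contributions.sum(), g @ delta))   # decomposition is exact here
```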