Sitemap - 2024 - Gonzo ML
Star Attention: Efficient LLM Inference over Long Sequences
The Super Weight in Large Language Models
JAX things to watch for in 2025
Diffusion Models are Evolutionary Algorithms
“Deep Learning with JAX” is out!
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Transformer Layers as Painters
Chain-of-Thought → Whiteboard-of-Thought
TextGrad: Automatic "Differentiation" via Text
Superconducting Supercomputers
Open-Endedness is Essential for Artificial Superhuman Intelligence
You Only Cache Once: Decoder-Decoder Architectures for Language Models
Optimizing large-context models
xLSTM: Extended Long Short-Term Memory
Chronos: Learning the Language of Time Series
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
[DeepMind SIMA] Scaling Instructable Agents Across Many Simulated Worlds
OLMo: Accelerating the Science of Language Models
Thermodynamic AI is getting hotter
Optimizing Distributed Training on Frontier for Large Language Models
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws