Sitemap - 2024 - Gonzo ML

BLT: Byte Latent Transformer

ModernBERT, the BERT of 2024

Star Attention: Efficient LLM Inference over Long Sequences

The Super Weight in Large Language Models

JAX things to watch for in 2025

Diffusion Models are Evolutionary Algorithms

Make softmax great again

Deep Learning Frameworks

Listen to your papers

Discovering Shetland(ic)

Gödel Agent

The Fall of the Digital Babel

“Deep Learning with JAX” is out!

Were RNNs All We Needed?

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Transformer Layers as Painters

LayerShuffle

Chain-of-Thought → Whiteboard-of-Thought

TextGrad: Automatic "Differentiation" via Text

Superconducting Supercomputers

(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts

Open-Endedness is Essential for Artificial Superhuman Intelligence

Mamba-2 is here!

You Only Cache Once: Decoder-Decoder Architectures for Language Models

Unconventional Values

Optimizing large-context models

xLSTM: Extended Long Short-Term Memory

Yes, we KAN!

Dejavu Transformers

Chronos: Learning the Language of Time Series

Many-Shot In-Context Learning

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

[DeepMind SIMA] Scaling Instructable Agents Across Many Simulated Worlds

OLMo: Accelerating the Science of Language Models

Big Post About Big Context

Neural Network Diffusion

More Agents Is All You Need

Thermodynamic AI is getting hotter

Optimizing Distributed Training on Frontier for Large Language Models

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

TinyLlama: An Open-Source Small Language Model

GFlowNets

“Human Compatible”, Stuart Russell