The Super Weight in Large Language Models

Grigory Sapunov

Nov 29, 2024

How a Single 'Super Weight' Can Break Your Billion-Parameter Model

6 Comments

Vova Zakharov

Jun 27

How many super-weights do you think the human brain has?

Grigory Sapunov

Jun 27

Oh, that's a good question!

Misha Belkin

Dec 2, 2024

Very interesting. I find it quite counter-intuitive.

Grigory Sapunov

Dec 6, 2024 (edited)

That was unexpected for me as well.

There are definitely a lot of interesting things hidden in sparsity, loss landscapes, lottery tickets, grokking, double descent, and other (dis?)similar phenomena.

Kartik Singhal

Nov 29, 2024

Thanks, Grigory. Does the paper discuss why such a super weight forms in the first place?

Grigory Sapunov

Nov 30, 2024

No, we're waiting for separate research on that :)

Gonzo ML