4 Comments

Very interesting. I find it quite counter-intuitive.

That was unexpected for me as well.

There are definitely a lot of interesting things hidden in sparsity, loss landscapes, lottery tickets, grokking, double descent, and other (dis?)similar phenomena.

Thanks, Grigory. Does the paper discuss why such a super weight forms in the first place?

No, that's waiting for separate research :)