How a Single 'Super Weight' Can Break Your Billion-Parameter Model
Very interesting. I find it quite counter-intuitive.
That was unexpected for me as well.
There are definitely a lot of interesting things hidden in sparsity, loss landscapes, lottery tickets, grokking, double descent, and other (dis?)similar phenomena.
Thanks, Grigory. Does the paper discuss why such a super weight forms in the first place?
No, that's waiting for separate research :)