How a Single 'Super Weight' Can Break Your Billion-Parameter Model
How many super-weights do you think the human brain has?
Oh, that's a good question!
Very interesting. I find it quite counter-intuitive.
That was unexpected for me as well.
There are definitely a lot of interesting things hidden in sparsity, loss landscapes, lottery tickets, grokking, double descent, and other (dis?)similar phenomena.
Thanks, Grigory. Does the paper discuss why such a super weight forms in the first place?
No, that's waiting for separate research :)