Kolmogorov–Arnold Networks, a new architecture for deep learning.
anon 0x1ed said in #1507 13mo ago:
This is neat. Instead of fixed nonlinear activation functions on the nodes, they put trainable univariate functions on the edges, in place of scalar weights. They claim better scaling properties and performance on many problems, and better interpretability (because there are fewer nodes and each "weight" can be visualized as a graph of its learned function). They basically trade width parameters for activation parameters and apparently come out ahead.
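To make "learnable functions instead of weights" concrete, here's a minimal sketch in PyTorch. The actual paper parameterizes each edge function with B-splines plus a residual activation and refines the grid during training; this version just uses a fixed Gaussian basis with learned coefficients, so treat it as an illustration of the idea, not their implementation:

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Sketch of a KAN-style layer: every input-output edge carries its own
    learnable univariate function, here a weighted sum of Gaussian radial
    basis functions (the paper uses B-splines; this is just illustrative)."""

    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        # Fixed basis-function centers, shared by every edge.
        self.register_buffer("centers", torch.linspace(*grid_range, num_basis))
        self.gamma = num_basis / (grid_range[1] - grid_range[0])
        # One coefficient per (edge, basis function): these take the place
        # of the scalar weights in an ordinary linear layer.
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, num_basis) * 0.1)

    def forward(self, x):  # x: (batch, in_dim)
        # Evaluate every basis function at every input coordinate,
        # giving phi with shape (batch, in_dim, num_basis).
        phi = torch.exp(-(self.gamma * (x.unsqueeze(-1) - self.centers)) ** 2)
        # Each output node sums the learned univariate functions on its edges.
        return torch.einsum("bik,oik->bo", phi, self.coef)

model = nn.Sequential(KANLayer(2, 5), KANLayer(5, 1))
print(model(torch.randn(4, 2)).shape)  # torch.Size([4, 1])
```

Since each edge's function is just a 1-D curve over its coefficients and basis, you can plot it directly, which is where the interpretability claim comes from.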
Some of the tricks they use to make it work I still don't understand, but maybe it's a good idea.