
Learning to code machine learning (tinygrad!)

anon 0x43e said in #2532 1mo ago: 55

I've been thinking about AGI stuff for over a decade, but never really engaged much with cutting-edge machine learning practice except at a theoretical and philosophical level. But recently I've been getting more technical again and trying to build things, so I've been learning some actual machine learning techne.

I like Geohot's style so I'm learning ML with tinygrad. I did the quickstart tutorial[1] and the mnist tutorial[2], and then replicated the performance of the tinygrad mnist example code[3]. The quickstart tutorial was a mess: half of it was out of date or wrong, and a lot of subtle issues went unremarked. It was a bit of an extra IQ test to get a working model out of it, but I passed: I got models trained, running, and optimized up to matching the cutting-edge performance of tinygrad's example code.

[1]: https://docs.tinygrad.org/quickstart/
[2]: https://docs.tinygrad.org/mnist/
[3]: https://github.com/tinygrad/tinygrad/blob/master/examples/beautiful_mnist.py
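
For anyone curious, the core of what those tutorials build up to is only a couple dozen lines. Here's a minimal sketch from memory, assuming the tinygrad API as of when I went through it (the Model class and hyperparameters are mine; the real beautiful_mnist.py uses a deeper conv net to hit that performance):

    from tinygrad import Tensor, nn
    from tinygrad.nn.datasets import mnist

    class Model:
        def __init__(self):
            # a small MLP; the real example uses a conv net
            self.l1 = nn.Linear(784, 128)
            self.l2 = nn.Linear(128, 10)
        def __call__(self, x: Tensor) -> Tensor:
            return self.l2(self.l1(x.flatten(1)).relu())

    X_train, Y_train, X_test, Y_test = mnist()
    model = Model()
    opt = nn.optim.Adam(nn.state.get_parameters(model), lr=1e-3)

    with Tensor.train():  # put tinygrad in training mode
        for step in range(1000):
            samples = Tensor.randint(128, high=X_train.shape[0])  # random minibatch
            loss = model(X_train[samples]).sparse_categorical_crossentropy(Y_train[samples])
            opt.zero_grad()
            loss.backward()
            opt.step()

    acc = (model(X_test).argmax(axis=1) == Y_test).mean().item()
    print(f"test accuracy: {acc * 100:.2f}%")

The example code gets most of its speed from wrapping the train step in TinyJit, which I'm leaving out here.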

That was fun, but I'm still very fresh. I think I get what's going on in these models, but designing a performant one myself would still feel a bit like voodoo. I'll keep learning.

After that I got Claude to explain transformers to me and made sure I could explain it all back. Implementing a simple transformer from my own understanding feels like a good next exercise.
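
To test that understanding, here's the core of it as I'd write it down: scaled dot-product self-attention, in plain numpy rather than tinygrad, single head, no masking or positional encoding (just my sketch of the idea, not anyone's reference implementation):

    import numpy as np

    def self_attention(x, wq, wk, wv):
        # x: (seq_len, d_model); wq, wk, wv: (d_model, d_k) learned projections
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise similarities, (seq_len, seq_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
        return weights @ v                               # each position is a weighted mix of values

A full transformer block wraps this in multiple heads and adds residual connections, layer norm, and a position-wise MLP, but the attention step is where the "every token looks at every other token" part lives.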

Ultimately where I want to go with this is online causality learning and control on time series data. Some of you may have heard rumor of my "worm mind hypothesis" and AGI thermostat idea. I basically want to substantiate those in actual algorithms research: build an algorithm you can drop into a nonlinear domain with no training, only a goal, and it learns enough of the domain's causal structure to train itself to accomplish that goal. I have a paradigm hunch for how to do this that follows the insights of transformers and deep learning (though not all their particular details), which is why I'm interested in these techniques in particular. But I have a lot to learn, and my hunch may just turn out to be an excuse for learning some new skills.

More generally, as I develop towards a more technical phase in my career, I want to be fluent in the cutting edge and able to apply some of this exciting ML stuff to the problems I encounter.

Does anyone have good resources for learning how to solve problems with machine learning techniques, or good exercises to do? Do any of you work in the field and have tips?


anon 0x43f said in #2534 1mo ago: 33

Machine learning is a big field, and the transformer architecture that LLMs are based on is actually not representative of it. Transformers are a recent development in neural networks, which are somewhat distinct from traditional machine learning.

There is an ocean of resources available just by searching online.

I might start with a simple Python tutorial on traditional machine learning algorithms, of the sort implemented in scikit-learn.
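
Something this short is enough to see the shape of the traditional workflow (fit, predict, score, no training loop to write). A sketch on scikit-learn's built-in digits dataset; swap in SVC or LogisticRegression to compare algorithms:

    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)        # 8x8 grayscale digits, flattened to 64 features
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)                        # no gradients, no epochs
    print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))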

Once you have a good grip on the traditional approach, learn back-propagation in neural nets, again in Python.
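
The classic exercise is a two-layer net with the gradients written out by hand rather than via autograd. A numpy sketch learning XOR (sizes, init, and learning rate are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0,0],[0,1],[1,0],[1,1]], dtype=float)
    Y = np.array([[0],[1],[1],[0]], dtype=float)     # XOR: not linearly separable
    W1 = rng.normal(scale=0.5, size=(2, 8))
    W2 = rng.normal(scale=0.5, size=(8, 1))
    lr = 0.5

    for step in range(5000):
        # forward pass
        h = np.tanh(X @ W1)                          # hidden activations
        p = 1.0 / (1.0 + np.exp(-(h @ W2)))          # sigmoid output
        # backward pass: chain rule, layer by layer
        dz2 = (p - Y) / len(X)                       # grad of cross-entropy loss wrt pre-sigmoid output
        dW2 = h.T @ dz2
        dz1 = (dz2 @ W2.T) * (1.0 - h**2)            # tanh'(a) = 1 - tanh(a)^2
        dW1 = X.T @ dz1
        W1 -= lr * dW1                               # plain gradient descent
        W2 -= lr * dW2

    print(np.round(p.ravel(), 3))                    # should end up close to [0, 1, 1, 0]

Once that clicks, autograd frameworks like tinygrad or PyTorch are just this bookkeeping done for you over arbitrary graphs.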

Then you can look at fancier architectures like transformers and think about your own variations.

referenced by: >>2535


anon 0x43e said in #2535 1mo ago: 33

>>2534
>There is an ocean of resources available just by searching online.
What's your favorite resource? What things are being done or have been done in traditional ML that are especially worth studying?

I should have been more precise in separating traditional ML from deep learning. I'm mostly interested in the latter, because of the still-unexplored generality of the technique for learning open-ended nonlinear structure. The interesting part of traditional ML, in my admittedly simplistic model, is basically linear methods plus various tricks to turn nonlinear or discrete problems into linear ones (e.g. the kernel trick in SVMs). Linear methods are very powerful and very solvable, so I guess it never hurts to be a master of them. But then there's other stuff, like random forests, that is totally alien to me.
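
For concreteness, here's the kind of trick I mean: in scikit-learn the kernel trick is literally a one-argument change. A toy sketch on data no straight line can separate (the accuracy contrast is what I'd expect, not something I've verified):

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # two concentric rings: not linearly separable in the input space
    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    for kernel in ("linear", "rbf"):
        clf = SVC(kernel=kernel).fit(X, y)
        print(kernel, clf.score(X, y))    # linear stays near chance, rbf gets nearly everything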

One can search the web and find all kinds of references to various techniques and tutorials to learn them, but without singularly noteworthy resources or applications or paradigms to ground it in, it starts to feel like a sea of random ideas which may or may not have anything to do with current practice. Which of this stuff is obsolete? Which is the essential core of a paradigm (like linear methods)? How can I tell whether I should bother to understand random forests?

