Hey there! Thanks so much for your kind words. I am glad to hear you liked my article and found it helpful 😇
I am working on the second article, but at the moment I can’t find the time to finish it. I’ll do my best to get it done.
As for the backward pass, you’d reverse the topological ordering and filter out nodes that are not of type Operation. Then you’d call the backward function on each remaining node and pass it the upstream gradient. This function then calculates the gradient with respect to each of the node’s inputs. You also need to keep track of all gradients in a table, which I like to call the “gradient table”. In a nutshell, this is how a gradient descent minimizer obtains its gradients. More on that in my next blog post. I hope this gives you a brief overview of the steps involved.
Let me know if you have more questions.
And thanks again for your awesome feedback!! Really appreciate it.