Show HN: I wrote an autodiff in C++ and implemented LeNet with it

31 points by mebassett - 2 comments
tightbookkeeper 5 mins ago
Using new for each node and value, combined with virtual dispatch, tends to be a C++ anti-pattern.

Memory access and allocation are the key to performance, especially on the GPU.

Things to consider:

- Can you allocate memory for the whole system up front?
- Can you make types homogeneous so they fit in tight arrays? (Unions are common for nodes.)
- Can you batch similar types?
- Specifically for autodiff/math, can you represent operations as a stack instead of a tree?
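
To make a couple of those points concrete, here is a minimal sketch of a scalar reverse-mode tape. It is an illustration only: every name in it (Op, Node, Tape, leaf, add, mul, backward) is made up for this example and none of it is taken from the linked repository. It uses a tagged fixed-size struct rather than a union, reserves storage once, and stores operations in one contiguous array in evaluation order, so the backward pass is a plain reverse loop with a switch instead of virtual calls over heap-allocated nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; a sketch, not the author's implementation.
enum class Op : std::uint8_t { Leaf, Add, Mul };

// One plain, fixed-size record per operation: no vtable, no per-node `new`.
struct Node {
    Op op;
    std::uint32_t lhs;  // tape index of the first operand (unused for Leaf)
    std::uint32_t rhs;  // tape index of the second operand (unused for Leaf)
    double value;       // forward result
    double grad;        // accumulated d(output)/d(this node)
};

class Tape {
public:
    // Reserve storage for the expected number of nodes once, up front.
    explicit Tape(std::size_t capacity) { nodes_.reserve(capacity); }

    std::uint32_t leaf(double v) { return push(Op::Leaf, 0, 0, v); }
    std::uint32_t add(std::uint32_t a, std::uint32_t b) {
        return push(Op::Add, a, b, nodes_[a].value + nodes_[b].value);
    }
    std::uint32_t mul(std::uint32_t a, std::uint32_t b) {
        return push(Op::Mul, a, b, nodes_[a].value * nodes_[b].value);
    }

    // Operands always have smaller indices than their result, so walking the
    // array backwards visits nodes in reverse topological order.
    void backward(std::uint32_t output) {
        for (Node& n : nodes_) n.grad = 0.0;
        nodes_[output].grad = 1.0;
        for (std::size_t i = static_cast<std::size_t>(output) + 1; i-- > 0;) {
            const Node n = nodes_[i];  // copy; operand grads are updated below
            switch (n.op) {
                case Op::Leaf:
                    break;
                case Op::Add:
                    nodes_[n.lhs].grad += n.grad;
                    nodes_[n.rhs].grad += n.grad;
                    break;
                case Op::Mul:
                    nodes_[n.lhs].grad += n.grad * nodes_[n.rhs].value;
                    nodes_[n.rhs].grad += n.grad * nodes_[n.lhs].value;
                    break;
            }
        }
    }

    double value(std::uint32_t i) const { return nodes_[i].value; }
    double grad(std::uint32_t i) const { return nodes_[i].grad; }

private:
    std::uint32_t push(Op op, std::uint32_t l, std::uint32_t r, double v) {
        // Non-leaf nodes may only refer to nodes already on the tape.
        assert(op == Op::Leaf || (l < nodes_.size() && r < nodes_.size()));
        nodes_.push_back(Node{op, l, r, v, 0.0});
        return static_cast<std::uint32_t>(nodes_.size() - 1);
    }

    std::vector<Node> nodes_;
};
```

Usage would look something like: create a Tape, make leaves x and y, build f = add(mul(x, y), x), call backward(f), then read grad(x) and grad(y). A real tensor version would store buffer handles or offsets in the node instead of a single double, but the layout idea is the same.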

I am only bringing this up because you said your goal was to learn C++.

einpoklum 5 mins ago
The actual C++/CUDA code is here:

https://gitlab.com/mebassett/quixotic-learning/-/tree/master...

About 1,000 LoC overall.