Using new for each node and value, combined with virtual dispatch, tends to be a C++ anti-pattern.
Memory access and allocation are the key to performance, especially on the GPU.
Things to consider:
- can you allocate memory for the whole system?
- can you make types homogeneous so they fit in tight arrays? (unions are common for nodes)
- can you batch similar types?
- especially for auto diff/math, can you represent operations as a stack instead of a tree?
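
To make those points concrete, here is a minimal sketch of what a flat "tape" could look like for reverse-mode auto diff. It is not taken from the linked repo: the names Op, Node and Tape are hypothetical, and it only handles scalar add and multiply. The idea is one reserved array of identical plain structs, a tag plus a switch in place of virtual dispatch, and operand indices in place of pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only (Op, Node, Tape).
enum class Op : std::uint8_t { Leaf, Add, Mul };

// One fixed-size record per operation: no vtable, no per-node heap allocation.
// A union could hold op-specific payloads; here every op shares the same fields.
struct Node {
    Op op;
    std::uint32_t lhs;  // index of the first operand (unused for Leaf)
    std::uint32_t rhs;  // index of the second operand (unused for Leaf)
    double value;       // result of the forward pass
    double grad;        // accumulated gradient from the backward pass
};

struct Tape {
    std::vector<Node> nodes;

    // Reserve the storage once so building the expression does not reallocate
    // as long as it stays within capacity.
    explicit Tape(std::size_t capacity) { nodes.reserve(capacity); }

    std::uint32_t push(Op op, std::uint32_t lhs, std::uint32_t rhs, double value) {
        nodes.push_back(Node{op, lhs, rhs, value, 0.0});
        return static_cast<std::uint32_t>(nodes.size() - 1);
    }

    std::uint32_t leaf(double v) { return push(Op::Leaf, 0, 0, v); }

    std::uint32_t add(std::uint32_t a, std::uint32_t b) {
        return push(Op::Add, a, b, nodes[a].value + nodes[b].value);
    }

    std::uint32_t mul(std::uint32_t a, std::uint32_t b) {
        return push(Op::Mul, a, b, nodes[a].value * nodes[b].value);
    }

    // Operands are always pushed before the node that uses them, so walking
    // the array from root down to index 0 visits nodes in reverse topological
    // order, like popping a stack.
    void backward(std::uint32_t root) {
        assert(root < nodes.size());
        for (Node& n : nodes) n.grad = 0.0;
        nodes[root].grad = 1.0;
        for (std::size_t i = static_cast<std::size_t>(root) + 1; i-- > 0;) {
            const Node n = nodes[i];  // copy; operands live at lower indices
            switch (n.op) {
                case Op::Leaf:
                    break;
                case Op::Add:
                    nodes[n.lhs].grad += n.grad;
                    nodes[n.rhs].grad += n.grad;
                    break;
                case Op::Mul:
                    nodes[n.lhs].grad += n.grad * nodes[n.rhs].value;
                    nodes[n.rhs].grad += n.grad * nodes[n.lhs].value;
                    break;
            }
        }
    }
};
```

With x = 2 and y = 3, building z = t.add(t.mul(x, y), x) and calling t.backward(z) leaves nodes[x].grad at 4 (that is, y + 1) and nodes[y].grad at 2 (that is, x). Because the whole graph is one contiguous buffer of plain structs, it is also the kind of layout that can be copied to a GPU in a single transfer, though a real implementation would need tensor-valued nodes and more ops.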
I am only bringing this up because you said your goal was to learn C++.
https://gitlab.com/mebassett/quixotic-learning/-/tree/master...
about 1,000 LoC overall.