I wrote a non-RTX on-GPU raytracer a while back (naive compared to this), and it's super interesting to read about the advances in compressing BVH structures.
But the changes also highlight a shift in focus, from just implementing this naively (RDNA3 is technically not too far removed from the naive raytracer I wrote) to something carefully engineered and optimized for memory bandwidth (with savings circuits even built into the silicon?).
vardump 5 mins ago
Smaller data is where it’s at when optimizing nowadays. Less bandwidth required and higher cache hit rate.
You can compute a ton per bit transferred from DRAM. On both CPUs and GPUs.
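To make the "smaller data" point concrete, here is a minimal sketch of one generic BVH compression idea: storing a child's bounding box as 8-bit offsets inside its parent's box instead of six 32-bit floats. This is an illustration only, not the node layout any particular GPU uses; the function names `quantize_box` and `dequantize_box` and the 8-bit width are assumptions made for the example.

```python
import math
import struct

def quantize_box(child, parent, bits=8):
    """Quantize a child AABB to integer offsets inside the parent AABB.

    Boxes are ((minx, miny, minz), (maxx, maxy, maxz)).
    Mins round down and maxes round up, so the quantized box
    never shrinks below the original (it stays conservative).
    """
    levels = (1 << bits) - 1
    (pmin, pmax), (cmin, cmax) = parent, child
    qmin, qmax = [], []
    for axis in range(3):
        extent = pmax[axis] - pmin[axis]
        scale = levels / extent if extent > 0 else 0.0
        lo = math.floor((cmin[axis] - pmin[axis]) * scale)
        hi = math.ceil((cmax[axis] - pmin[axis]) * scale)
        qmin.append(max(0, min(levels, lo)))
        qmax.append(max(0, min(levels, hi)))
    return qmin, qmax

def dequantize_box(qbox, parent, bits=8):
    """Expand quantized offsets back to float coordinates."""
    levels = (1 << bits) - 1
    (pmin, pmax), (qmin, qmax) = parent, qbox
    lo = tuple(pmin[a] + qmin[a] * (pmax[a] - pmin[a]) / levels for a in range(3))
    hi = tuple(pmin[a] + qmax[a] * (pmax[a] - pmin[a]) / levels for a in range(3))
    return lo, hi

# Storage per box: six float32 values vs. six unsigned bytes.
FULL_BYTES = struct.calcsize("<6f")    # 24
PACKED_BYTES = struct.calcsize("<6B")  # 6

parent = ((0.0, 0.0, 0.0), (10.0, 10.0, 10.0))
child = ((1.3, 2.7, 0.1), (4.2, 9.9, 5.5))
packed = quantize_box(child, parent)
restored = dequantize_box(packed, parent)
```

The restored box is slightly larger than the original, which costs a few extra ray-box hits, but each box takes a quarter of the bytes, so more nodes fit in each cache line fetched from DRAM.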
https://news.ycombinator.com/item?id=43548212
https://news.ycombinator.com/item?id=43543933