So if I understand correctly, for the real-time version they don't query the NeRF to compute each frame's pixels on the fly. Instead they use the NeRF to pre-generate 3D voxel data representing the scene, which can then be rendered in real time with more traditional voxel rendering?
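To make sure I'm describing the same thing, here's a toy sketch of that two-stage idea. It assumes a dense grid, nearest-voxel lookup and NeRF-style alpha compositing along each ray. `toy_field` is a hypothetical stand-in for the trained network (just a solid red sphere), and all the function names are mine. Real systems presumably use sparse grids, interpolation and view-dependent color, so treat this as an illustration only:

```python
import math

# Hypothetical stand-in for a trained NeRF MLP: point -> (density, rgb).
# Here it's just a solid red sphere of radius 0.5 at the origin.
def toy_field(x, y, z):
    r = math.sqrt(x * x + y * y + z * z)
    density = 10.0 if r < 0.5 else 0.0
    return density, (1.0, 0.2, 0.2)

def bake(field, n, lo=-1.0, hi=1.0):
    """Slow offline step: query the field once per voxel center."""
    step = (hi - lo) / n
    grid = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                center = [lo + (idx + 0.5) * step for idx in (i, j, k)]
                grid[(i, j, k)] = field(*center)
    return grid, step

def lookup(grid, n, step, lo, p):
    """Nearest-voxel lookup; points outside the grid are empty space."""
    idx = tuple(int(math.floor((p[a] - lo) / step)) for a in range(3))
    if all(0 <= v < n for v in idx):
        return grid[idx]
    return 0.0, (0.0, 0.0, 0.0)

def render_ray(grid, n, step, lo, origin, direction, t_far=4.0, samples=200):
    """Fast step: march one ray through the baked grid (no network calls)."""
    dt = t_far / samples
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in range(samples):
        t = (s + 0.5) * dt
        p = [origin[a] + t * direction[a] for a in range(3)]
        sigma, rgb = lookup(grid, n, step, lo, p)
        alpha = 1.0 - math.exp(-sigma * dt)  # opacity of this ray segment
        for a in range(3):
            color[a] += transmittance * alpha * rgb[a]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early exit once the ray is nearly opaque
            break
    return color, transmittance
```

For example, `grid, step = bake(toy_field, 32)` followed by `render_ray(grid, 32, step, -1.0, (0.0, 0.0, -2.0), (0.0, 0.0, 1.0))` gives a pixel that is almost exactly the sphere's red. The expensive field queries all happen once in `bake`, and per-frame rendering only does cheap grid lookups, which is the split I'm asking about.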