Are you using any sort of acceleration data structure to manage your voxels, or just storing everything in a plain 3D array of positions? I imagine you could run into memory issues there if you don't account for spatial sparsity in your voxel data.
I'd be curious to know if you're using something like run-length encoding (RLE) or something hierarchical such as a B-Tree?
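To make the RLE idea concrete, here is a minimal Python sketch of what I have in mind. It assumes each column of voxels is stored as a list of integer material IDs; the IDs and the helper names `rle_encode` and `rle_decode` are made up for illustration and are not taken from your project:

```python
from itertools import groupby

def rle_encode(column):
    """Collapse a sequence of voxel IDs into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full column."""
    return [value for value, length in runs for _ in range(length)]

# Hypothetical 16-voxel column: 0 = air, 1 = stone, 2 = dirt.
column = [1] * 9 + [2] * 3 + [0] * 4
runs = rle_encode(column)
```

Here a 16-entry column shrinks to three runs, which is the kind of saving I'd expect wherever the data has long stretches of identical voxels. The trade-off is that random access to a single voxel then requires walking the runs rather than indexing directly.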