Connectivity-Stable 3D Voxel Diffusion via Sampling-Time Guidance

In this work, topology primarily refers to connectivity and skeleton continuity.

image.png

baseline_vs_specific_window.gif

Conclusion

The Problem

Current voxel-based 3D diffusion models tend to produce fragmented structures when generating thin, topologically structured objects (furniture, buildings, etc.): broken parts, floating islands, missing voxels. I don't think more training steps or larger model capacity will fully fix this. The reason, as I see it, is that diffusion denoising updates are essentially local refinements; they lack any explicit notion of global connectivity or geometric inertia. In this experiment I use Minecraft tree data to characterize the topology problem and validate a fix.

Below are tri-view projections of samples. Green = wood voxels, yellow = leaves (no wood), dark blue = air. Diffusion-generated 3D samples frequently exhibit structural breakage: baseline trunks are rarely continuous, meaning the green blocks do not form a single connected component under 26-neighbor connectivity.

simple_sample_017.png

simple_sample_010.png

Ground truth, in contrast, has fully connected trunks (no green-block discontinuity):

beech_2257_1764306233368.png

beech_2281_1764306233706.png
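
To make the "connected under 26-neighbor connectivity" criterion concrete, here is a minimal sketch of how trunk continuity could be checked. It is not the evaluation code used in this experiment. The function names (count_components, is_trunk_connected) and the representation of wood voxels as (x, y, z) tuples are assumptions made for illustration.

```python
from collections import deque
from itertools import product

# All 26 neighbor offsets in a 3x3x3 cube, excluding the center voxel.
NEIGHBORS_26 = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def count_components(voxels):
    """Count 26-connected components among (x, y, z) voxel coordinates."""
    remaining = set(voxels)
    components = 0
    while remaining:
        # Start a new component from any unvisited voxel and flood-fill it.
        components += 1
        queue = deque([remaining.pop()])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS_26:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    queue.append(neighbor)
    return components


def is_trunk_connected(wood_voxels):
    """True if the wood voxels form exactly one component (empty input is False)."""
    return count_components(wood_voxels) == 1


# Hypothetical toy trunks: a diagonal step still counts as connected,
# while a one-voxel gap splits the trunk in two.
continuous_trunk = [(0, 0, z) for z in range(4)] + [(1, 1, 4)]
broken_trunk = [(0, 0, 0), (0, 0, 1), (0, 0, 3), (0, 0, 4)]

print(count_components(continuous_trunk), is_trunk_connected(continuous_trunk))
print(count_components(broken_trunk), is_trunk_connected(broken_trunk))
```

A component count above one corresponds to the broken trunks and floating islands seen in the baseline samples. For dense voxel arrays, scipy.ndimage.label with a 3x3x3 structuring element of ones should give the same count more efficiently.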

Research Goal