LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion

Abstract

This paper introduces a novel hierarchical autoencoder that maps 3D models into a highly compressed latent space. The hierarchical autoencoder is specifically designed to tackle the challenges arising from large-scale datasets and from generative modeling with diffusion. Unlike previous approaches that operate only on a regular image or volume grid, our hierarchical autoencoder works on unordered sets of vectors. Each level of the autoencoder controls a different geometric level of detail. We show that the model can represent a wide range of 3D models while faithfully preserving high-resolution geometric details. Training the new architecture takes 0.70x the time and 0.58x the memory of the baseline. We also explore how the new representation can be used for generative modeling. Specifically, we propose a cascaded diffusion framework in which each stage is conditioned on the previous one. Our design extends existing cascaded designs for image and volume grids to vector sets.

We propose a multi-resolution autoencoder for 3D representation learning, built on the VecSet representation.
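
To make the idea concrete, here is a minimal PyTorch sketch of a multi-resolution VecSet-style encoder: each level cross-attends from the current set to a smaller set of learnable queries, yielding progressively coarser latent sets. This is an illustration under our own assumptions, not the paper's implementation; the class names (`VecSetLevel`, `HierarchicalVecSetEncoder`), the set sizes, and the single attention block per level are hypothetical simplifications.

```python
import torch
import torch.nn as nn

class VecSetLevel(nn.Module):
    """One level of the hierarchy: cross-attend from the input set to a
    smaller set of learnable latent queries, producing a compressed set."""
    def __init__(self, num_latents, dim, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        # x: (B, N, dim) unordered set of already-embedded point features
        # (a point-embedding layer would precede this in practice)
        q = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)
        z, _ = self.attn(q, x, x)            # queries attend to the whole set
        z = z + self.mlp(self.norm(z))
        return z                             # (B, num_latents, dim)

class HierarchicalVecSetEncoder(nn.Module):
    """Stack of levels with decreasing set sizes: Z1 (fine) -> Z3 (coarse)."""
    def __init__(self, dim=512, sizes=(512, 128, 32)):
        super().__init__()
        self.levels = nn.ModuleList(VecSetLevel(n, dim) for n in sizes)

    def forward(self, x):
        latents = []
        for level in self.levels:
            x = level(x)                     # each level compresses the last
            latents.append(x)
        return latents                       # [Z1, Z2, Z3]
```

With these (assumed) sizes, a set of 2048 embedded surface points would be compressed into latent sets of 512, 128, and 32 vectors; the coarser levels are correspondingly cheaper to model with diffusion.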

We first train a hierarchical autoencoder for 3D representation learning. The hierarchical latent space then lets us train a cascaded diffusion model in which each stage is conditioned on the previous one.
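
Below is a hedged sketch of how such a cascade could be sampled, coarse to fine. The denoiser architecture, the conditioning via cross-attention to the previous stage's latents, and the flow-matching-style Euler sampler are assumptions for illustration; the paper's exact noise schedule and parameterization are not reproduced here.

```python
import torch
import torch.nn as nn

class SetDenoiser(nn.Module):
    """Denoises one latent vector set; optionally cross-attends to the
    coarser stage's latents (the conditioning signal of the cascade)."""
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.time_emb = nn.Linear(1, dim)      # toy timestep embedding
        self.out = nn.Linear(dim, dim)

    def forward(self, z_t, t, cond=None):
        h = z_t + self.time_emb(t.view(-1, 1, 1))
        h = h + self.self_attn(h, h, h)[0]
        if cond is not None:                   # condition on the previous stage
            h = h + self.cross_attn(h, cond, cond)[0]
        return self.out(h)                     # predicted velocity

@torch.no_grad()
def sample_cascade(stages, shapes, steps=50):
    """Coarse-to-fine sampling: Z3 first, then Z2 | Z3, then Z1 | Z2."""
    cond, latents = None, []
    for net, (n, d) in zip(stages, shapes):
        z = torch.randn(1, n, d)               # each stage starts from noise
        for i in range(steps):
            t = torch.full((1,), i / steps)
            z = z + net(z, t, cond) / steps    # Euler step toward the data
        latents.append(z)
        cond = z                               # finer stage conditions on this
    return latents                             # [Z3, Z2, Z1]
```

For example, `sample_cascade([SetDenoiser(), SetDenoiser(), SetDenoiser()], [(32, 512), (128, 512), (512, 512)])` would draw Z3, then Z2 given Z3, then Z1 given Z2; the autoencoder's decoder would then turn the finest set into geometry.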

Each 4 x 4 block shares the same level-3 latents Z3, and 3D models within a block have similar structures. Within each block, every 1 x 4 row shares the same level-2 latents Z2, and models in the same row look almost identical except for some minor details. We therefore argue that Z3 controls the structure, Z2 affects the major details, and Z1 is responsible for the minor details.
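
As an illustration of how such a grid could be produced, here is a hedged sketch: one Z3 draw per block, one Z2 draw per row conditioned on that Z3, and a fresh Z1 per model. `sample_stage` mirrors the Euler sampler above, and `decode` is a hypothetical stand-in for the autoencoder's decoding path; whether each finer stage conditions on all coarser latents or only the immediately coarser one is our assumption.

```python
import torch

def sample_stage(net, shape, cond=None, steps=50):
    """One stage of the cascade (same Euler sampler as in the sketch above)."""
    z = torch.randn(1, *shape)
    for i in range(steps):
        t = torch.full((1,), i / steps)
        z = z + net(z, t, cond) / steps
    return z

def sample_block(net3, net2, net1, shapes, decode):
    """One 4 x 4 block of the figure: shared Z3, per-row Z2, per-model Z1."""
    z3 = sample_stage(net3, shapes[0])                  # fixed for the block
    block = []
    for _ in range(4):                                  # four rows
        z2 = sample_stage(net2, shapes[1], cond=z3)     # fixed along the row
        row = [decode(z3, z2, sample_stage(net1, shapes[2], cond=z2))
               for _ in range(4)]                       # only Z1 varies here
        block.append(row)
    return block
```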