LATO | ICML 2026

Abstract

In this paper, we introduce LATO, a novel topology-preserving latent representation that enables scalable, flow matching-based synthesis of explicit 3D meshes. LATO represents a mesh as a Vertex Displacement Field (VDF) anchored on the surface and uses a sparse voxel Variational Autoencoder (VAE) to compress this explicit signal into a structured, topology-aware voxel latent.

To decode the mesh, the VAE decoder progressively subdivides and prunes latent voxels to instantiate precise vertex locations. Finally, a dedicated connection head queries the voxel latent to predict edge connectivity between vertex pairs directly, allowing mesh topology to be recovered without isosurface extraction or heuristic meshing.

For generative modeling, LATO adopts a two-stage flow matching process, first synthesizing the structure voxels and subsequently refining the voxel-wise topology features. Compared to prior isosurface/triangle-based diffusion models and autoregressive generation approaches, LATO generates meshes with complex geometry and well-formed topology while remaining highly efficient at inference.

Paradigm

Topology-Preserving Generation

Comparison between topology-agnostic implicit generation, explicit mesh generation, and LATO. **LATO vs. Existing Paradigms.** Mainstream topology-agnostic approaches use vecset or voxel latents that are decoded into implicit fields, such as signed distance fields (SDFs), and rely on Marching Cubes for mesh extraction. Explicit mesh generation methods adopt per-face latents via autoregressive or diffusion models, but suffer from severe memory bottlenecks. LATO proposes T-Voxel latents to explicitly model topology, enabling direct generation of artist-friendly meshes.

Core Design

Key Ideas

Topology as a first-class signal

Instead of decoding an implicit field and extracting a mesh afterward, LATO exposes vertex and edge structure to the generative pipeline.

Dense VDF supervision

Every sampled surface point carries displacements to the three vertices of its triangle, yielding continuous topology-aware supervision.

T-Voxels

The sparse voxel latent encodes not only surface presence, but also vertex distribution and connectivity priors for explicit meshes.

Direct graph decoding

A hierarchical decoder places vertices through subdivision and pruning, while a connection head predicts edges between vertex pairs.
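The decoding idea above can be sketched on sparse integer voxel coordinates. This is a minimal illustration, not the LATO decoder: `keep_score` and `edge_score` are hypothetical stand-ins for the learned pruning and connection heads, the 0.5 thresholds are assumptions, and the all-pairs edge query is only practical for small vertex sets.

```python
from itertools import combinations

def subdivide(voxels):
    """Split every voxel (i, j, k) into its 8 children at the next finer level."""
    return {
        (2 * i + di, 2 * j + dj, 2 * k + dk)
        for (i, j, k) in voxels
        for di in (0, 1) for dj in (0, 1) for dk in (0, 1)
    }

def decode_structure(coarse, keep_score, levels, threshold=0.5):
    """Alternate subdivision and pruning for a fixed number of levels.

    keep_score(voxel, level) is a hypothetical callable standing in for a
    learned head; children scoring at or below the threshold are dropped.
    """
    voxels = set(coarse)
    for level in range(1, levels + 1):
        voxels = {v for v in subdivide(voxels) if keep_score(v, level) > threshold}
    return voxels

def connect(vertices, edge_score, threshold=0.5):
    """Keep index pairs (a, b), a < b, whose predicted edge score passes the threshold."""
    return [
        (a, b)
        for a, b in combinations(range(len(vertices)), 2)
        if edge_score(vertices[a], vertices[b]) > threshold
    ]

# Toy stand-ins for the learned heads (assumptions for illustration only).
diagonal = lambda v, level: 1.0 if v[0] == v[1] else 0.0
near = lambda a, b: 1.0 if sum((x - y) ** 2 for x, y in zip(a, b)) < 4 else 0.0

fine_voxels = decode_structure({(0, 0, 0)}, diagonal, levels=2)
edges = connect([(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5)], near)
```

Because pruning runs after every subdivision, the number of active voxels tracks the retained structure rather than growing eightfold per level.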

Representation

Vertex Displacement Field

VDF is the representation that lets LATO move from a surface signal to explicit topology. For a mesh \(\mathcal{M}=(\mathbf{V},\mathbf{F})\), take a point \(\mathbf{p}\) sampled on a triangular face \(\mathbf{f}=[f_0,f_1,f_2]\). Instead of assigning a discrete vertex/edge/face label, LATO records the displacements from \(\mathbf{p}\) to the three vertices of that face.

This makes topology learnable as a dense continuous field: near-zero displacements localize vertices, while changes in the displacement triplet reveal edges between vertex pairs. The resulting point feature is voxelized and pooled into T-Voxels for scalable generation.

VDF definition

\[ \mathcal{F}(\mathbf{p}) = \left\{\mathbf{v}-\mathbf{p}\mid \mathbf{v}\in \{\mathbf{v}_{f_0},\mathbf{v}_{f_1},\mathbf{v}_{f_2}\}\right\}, \quad \mathbf{p}\in\mathbf{f} \]

The VAE input at each sampled point is \(\mathbf{x}_k=[\mathbf{p}_k,\mathcal{F}(\mathbf{p}_k),\mathbf{n}(\mathbf{p}_k)]\), combining position, VDF offsets, and surface normal.
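As a concrete reading of the definition above, the following sketch samples a point on one triangle and assembles its feature from position, VDF offsets, and normal. It is an illustration under assumptions: the offsets are stored in face order, the feature is flattened to 15 numbers, the triangle is non-degenerate, and all function names are hypothetical.

```python
import random

def sub(a, b):
    """Component-wise a - b for 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sample_point_on_triangle(v0, v1, v2, rng):
    """Uniformly sample a point on a triangle using barycentric weights."""
    r1, r2 = rng.random(), rng.random()
    s = r1 ** 0.5
    w0, w1, w2 = 1.0 - s, s * (1.0 - r2), s * r2
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(v0, v1, v2))

def vdf_feature(p, v0, v1, v2):
    """Return [p, F(p), n(p)] flattened: 3 + 9 + 3 = 15 values.

    F(p) holds the displacements from p to the three face vertices.
    Assumes a non-degenerate triangle (non-zero area).
    """
    offsets = [sub(v, p) for v in (v0, v1, v2)]
    n = cross(sub(v1, v0), sub(v2, v0))
    length = sum(c * c for c in n) ** 0.5
    normal = tuple(c / length for c in n)
    return tuple(p) + sum(offsets, ()) + normal

v0, v1, v2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
p = sample_point_on_triangle(v0, v1, v2, random.Random(0))
x = vdf_feature(p, v0, v1, v2)
```

Adding any of the three offsets back to `p` recovers the corresponding vertex, and a sample placed exactly on a vertex has a zero offset to it, which is the vertex-localizing cue.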

Dense signal: Every surface sample contributes vertex displacement supervision.

Topology cue: Vertex locations and edge boundaries emerge from the displacement field.

Voxel-ready: Point features are pooled into sparse topology-aware T-Voxels.
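The pooling of point features into sparse voxels can be pictured with a dictionary keyed by voxel index. This is a sketch under assumptions: positions are normalized to the unit cube, mean pooling is used for simplicity, and `pool_into_voxels` is a hypothetical name. The result is a raw pooled signal, not the compressed T-Voxel latent produced by the VAE encoder.

```python
from collections import defaultdict
from math import floor

def pool_into_voxels(features, resolution):
    """Mean-pool per-point features into occupied voxels only.

    features: iterable of equal-length tuples whose first three entries are
              the point position, assumed to lie in [0, 1)^3.
    Returns {(i, j, k): pooled_feature}; empty voxels are never stored.
    """
    buckets = defaultdict(list)
    for x in features:
        # Clamp so positions on the upper boundary fall into the last voxel.
        key = tuple(min(resolution - 1, max(0, floor(c * resolution))) for c in x[:3])
        buckets[key].append(x)
    return {
        key: tuple(sum(column) / len(points) for column in zip(*points))
        for key, points in buckets.items()
    }

points = [
    (0.1, 0.1, 0.1, 1.0),
    (0.2, 0.2, 0.2, 3.0),
    (0.9, 0.9, 0.9, 5.0),
]
voxels = pool_into_voxels(points, resolution=2)
```

Storing only occupied cells keeps memory proportional to the surface area of the mesh rather than to the cube of the grid resolution.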

Pipeline

Method

LATO turns surface-anchored VDF samples into topology-aware T-Voxels, then decodes explicit vertices and edges through a sparse hierarchical pipeline.

LATO pipeline for mesh encoding, T-Voxel generation, and topology reconstruction

Encode

Voxelize surface VDF samples with positions and normals, then compress the pooled signal into compact T-Voxels.

Decode

Subdivide and prune latent voxels to place vertices; a connection head queries T-Voxels to recover edges directly.

Generate

Use two-stage flow matching: synthesize sparse structure first, then generate topology features before explicit decoding.
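The two-stage generation step can be sketched as two ODE integrations, the second restricted to the voxels that survive the first. Everything here is an assumption made for illustration: the Euler solver, the step count, thresholding occupancy at zero, and the toy velocity fields, which stand in for trained networks and simply transport noise to a fixed target.

```python
import random

def euler_sample(velocity, x0, steps=50):
    """Integrate dx/dt = velocity(x, t) from t = 0 (noise) to t = 1 (sample)."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def generate(structure_velocity, feature_velocity, grid_cells, feat_dim, rng, steps=50):
    """Stage 1 picks active voxels; stage 2 samples features on them."""
    # Stage 1: one occupancy value per candidate cell, thresholded at zero.
    noise = [rng.gauss(0.0, 1.0) for _ in grid_cells]
    occupancy = euler_sample(structure_velocity, noise, steps)
    active = [cell for cell, o in zip(grid_cells, occupancy) if o > 0.0]
    # Stage 2: features are generated only for the active voxels.
    noise = [rng.gauss(0.0, 1.0) for _ in range(len(active) * feat_dim)]
    flat = euler_sample(lambda x, t: feature_velocity(x, t, active), noise, steps)
    return {cell: flat[n * feat_dim:(n + 1) * feat_dim] for n, cell in enumerate(active)}

def toward(target):
    """Toy velocity for a straight path to one target: (target - x) / (1 - t)."""
    return lambda x, t: [(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]

cells = [(0, 0, 0), (0, 0, 1), (1, 1, 1)]
latent = generate(
    structure_velocity=toward([1.0, -1.0, 1.0]),
    feature_velocity=lambda x, t, active: toward([0.5] * len(x))(x, t),
    grid_cells=cells,
    feat_dim=2,
    rng=random.Random(0),
)
```

Splitting generation this way means the feature stage never spends computation on empty space, since its input size is set by the structure stage.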

Application

City Synthesis

LATO extends from object-level mesh generation to scene-scale urban synthesis by generating compositional block-wise building units.

Scene-scale urban generation results produced by LATO. **Scene-scale urban generation.** The structure model first synthesizes block-wise building envelopes; detailed T-Voxel features then populate each unit to form high-fidelity city blocks.

Interactive mesh

Compositional Urban Meshes

Block-wise building units are generated as sparse T-Voxels, then assembled into a larger explicit city mesh.
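The page does not specify how decoded blocks are merged, so the following is only one plausible sketch of the assembly step. It translates each block's vertices by a block offset and shifts its face indices so they keep pointing at the right vertices. The `assemble` name and the `(vertices, faces, offset)` layout are assumptions, and vertices shared along block boundaries are not welded.

```python
def assemble(blocks):
    """Concatenate per-block meshes into one mesh.

    blocks: list of (vertices, faces, offset), where faces index into that
            block's own vertex list and offset is the block's translation.
    """
    all_vertices, all_faces = [], []
    for vertices, faces, (ox, oy, oz) in blocks:
        base = len(all_vertices)  # index shift for this block's faces
        all_vertices += [(x + ox, y + oy, z + oz) for x, y, z in vertices]
        all_faces += [tuple(i + base for i in face) for face in faces]
    return all_vertices, all_faces

triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
city_vertices, city_faces = assemble([
    (triangle, [(0, 1, 2)], (0.0, 0.0, 0.0)),
    (triangle, [(0, 1, 2)], (10.0, 0.0, 0.0)),
])
```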

Experiments

Results

LATO produces explicit meshes with strong shape fidelity and more coherent topology across object and image-conditioned settings.

Geometry-conditioned comparison between LATO and baseline methods. **Geometry-conditioned generation.** LATO is compared with recent explicit mesh baselines under the same geometry-conditioned setting.

Additional geometry-conditioned comparison examples for LATO. **Additional examples.** More geometry-conditioned results highlighting mesh completeness and topology quality.

**Image to 3D.** Image-to-3D generation results.

Effect of vertex number condition on generated mesh complexity. **Vertex number.** Effect of the vertex number condition. As the vertex number condition scalar \(c_v\) increases, the model generates more vertices and triangles.

**Inference time.** Inference time comparison, measured on a single H100 GPU. LATO maintains rapid generation (3 to 10 s), whereas the inference time of autoregressive methods scales prohibitively, often requiring minutes for high-fidelity outputs.

LATO: 3D Mesh Flow Matching with Structured TOpology Preserving LAtents