DeFoG: Discrete Flow Matching for Graph Generation

LTS4, EPFL
ICML 2025 - Oral Presentation

*Indicates Equal Contribution
QM9 graph generation
(a) QM9 molecular dataset
SBM graph generation
(b) Synthetic SBM graph

Graph generation process of DeFoG on (a) QM9 molecular dataset and (b) synthetic SBM graphs.

Abstract

Graph generative models are essential across diverse scientific domains by capturing complex distributions over relational data. Among them, graph diffusion models achieve superior performance but face inefficient sampling and limited flexibility due to the tight coupling between training and sampling stages. We introduce DeFoG, a novel graph generative framework that disentangles sampling from training, enabling a broader design space for more effective and efficient model optimization. DeFoG employs a discrete flow-matching formulation that respects the inherent symmetries of graphs. We theoretically ground this disentangled formulation by explicitly relating the training loss to the sampling algorithm and showing that DeFoG faithfully replicates the ground truth graph distribution. Building on these foundations, we thoroughly investigate DeFoG's design space and propose novel sampling methods that significantly enhance performance and reduce the required number of refinement steps. Extensive experiments demonstrate state-of-the-art performance across synthetic, molecular, and digital pathology datasets, covering both unconditional and conditional generation settings. It also outperforms most diffusion-based models with just 5-10% of their sampling steps.

Illustration of noising and denoising process

One node, \( x^{(n)} \), is selected to illustrate both noising and denoising processes. For noising, DeFoG follows a straight path from the one-hot encoding \( p_1 \) of the clean node to the initial distribution \( p_0 \). For denoising, a network parameterized by \( \theta \) predicts the marginal distributions of the clean graph. There, the node's distribution \( {p}^{\theta, (n)}_{1|t} (\cdot \mid G_t) \) is used to compute its rate matrix \( {R}_t^{\theta,(n)} \), and, subsequently, its probability at the next time point \( t + \Delta t \).

Graph Generation Evaluation

State-of-the-art performance of DeFoG on synthetic, molecular and conditional graph generation tasks.

BibTeX


        @inproceedings{qin2025defog,
          title     = {DeFoG: Discrete Flow Matching for Graph Generation},
          author    = {Qin, Yiming and Madeira, Manuel and Thanou, Dorina and Frossard, Pascal},
          booktitle = {Proceedings of the 42nd International Conference on Machine Learning (ICML)},
          year      = {2025},
        }