Abstract
Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model’s ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency. The source code and trained models are publicly available at https://github.com/yrcong/ACTIG.
| Original language | English |
|---|---|
| Pages (from-to) | 4555-4570 |
| Number of pages | 16 |
| Journal | International Journal of Computer Vision |
| Volume | 133 |
| Issue number | 7 |
| E-pub ahead of print | 13 Mar 2025 |
| DOIs | |
| Publication status | Published - Jul 2025 |
Keywords
- Attribute-centric
- Compositional generation
- Text-to-image
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition
- Artificial Intelligence