# Texture-Preserving Multimodal Fashion Image Editing with Diffusion Models
## Overview
TP-MGD is a new method for texture-preserving multimodal fashion image editing based on diffusion models. It enables high-quality fashion image generation and editing with a lightweight architecture while preserving fine-grained texture details.
## TODO
- Release training code
- Release inference code
- Release processed datasets
- Release checkpoints to Hugging Face
- Create comprehensive documentation
## Quick Start

### Installation

```bash
git clone https://github.com/zibingo/TP-MGD.git
cd TP-MGD
```

Requirements:

- Python 3.9+
- PyTorch >= 2.5.0
- CUDA >= 12.4

Install the Python dependencies:

```bash
pip install diffusers accelerate transformers opencv-python einops wandb open_clip_torch
```
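Before training, a quick sanity check like the following (a minimal sketch, not part of the repository) can confirm that the required PyTorch/CUDA versions and the core libraries are visible:

```python
# Environment sanity check (illustrative; not part of the repo).
import torch
import diffusers
import transformers

print("PyTorch:", torch.__version__)             # expected >= 2.5.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)         # expected >= 12.4
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
```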
### Download Pre-trained Models

```bash
wget https://huggingface.co/h94/IP-Adapter/resolve/main/models/ip-adapter_sd15.bin
```
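If `wget` is not convenient, the same checkpoint can be fetched with the `huggingface_hub` client (installed as a dependency of `diffusers`); the `local_dir` below is only an example location:

```python
# Alternative download via huggingface_hub (local_dir is illustrative).
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="h94/IP-Adapter",
    filename="models/ip-adapter_sd15.bin",
    local_dir="pretrained",  # adjust to wherever you keep pre-trained weights
)
print(path)
```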
## Dataset Setup

### VITON-HD Dataset

1. Download VITON-HD: get the original dataset from VITON-HD.
2. Download the MGD multimodal data: get the additional data from MGD.
3. Download the preprocessed textures:

   ```bash
   wget https://huggingface.co/zibingo/TP-MGD/resolve/main/vitonhd-texture.zip
   ```

4. Configuration: set `dataroot_path` in the YAML files under the `configs/` directory.
Directory Structure:

```
├── captions.json (from MGD)
├── test/
│   ├── agnostic-mask/
│   ├── agnostic-v3.2/
│   ├── cloth/
│   ├── cloth-mask/
│   ├── cloth-texture/ (from Ours)
│   ├── im_sketch/ (from MGD)
│   ├── im_sketch_unpaired/ (from MGD)
│   ├── image/
│   ├── image-densepose/
│   ├── image-parse-agnostic-v3.2/
│   ├── image-parse-v3/
│   ├── openpose_img/
│   └── openpose_json/
├── test_pairs.txt
├── train/
│   ├── agnostic-mask/
│   ├── agnostic-v3.2/
│   ├── cloth/
│   ├── cloth-mask/
│   ├── cloth-texture/ (from Ours)
│   ├── gt_cloth_warped_mask/
│   ├── im_sketch/ (from MGD)
│   ├── image/
│   ├── image-densepose/
│   ├── image-parse-agnostic-v3.2/
│   ├── image-parse-v3/
│   ├── openpose_img/
│   └── openpose_json/
└── train_pairs.txt
```
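A small check along these lines (purely illustrative; the folder names are taken from the layout above and the path is a placeholder) can confirm that the dataset root is assembled correctly before training:

```python
# Verify the VITON-HD dataroot layout (illustrative helper, not part of the repo).
from pathlib import Path

dataroot = Path("/path/to/vitonhd")  # set this to your dataroot_path
expected = [
    "captions.json", "test_pairs.txt", "train_pairs.txt",
    "test/cloth", "test/cloth-texture", "test/im_sketch", "test/image",
    "train/cloth", "train/cloth-texture", "train/im_sketch", "train/image",
]
missing = [p for p in expected if not (dataroot / p).exists()]
print("Dataset layout looks complete." if not missing else f"Missing entries: {missing}")
```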
### DressCode Dataset

1. Download DressCode: get the original dataset from DressCode.
2. Download the MGD multimodal data: get the additional data from MGD.
3. Download the preprocessed textures:

   ```bash
   wget https://huggingface.co/zibingo/TP-MGD/resolve/main/dresscode-texture.zip
   ```

4. Configuration: set `dataroot_path` in the YAML files under the `configs/` directory.
Directory Structure:

```
├── dresses/
│   ├── dense/
│   ├── dresses_cloth-texture/ (from Ours)
│   ├── im_sketch/ (from MGD)
│   ├── im_sketch_unpaired/ (from MGD)
│   ├── images/
│   ├── keypoints/
│   ├── label_maps/
│   ├── test_pairs_paired.txt
│   ├── test_pairs_unpaired.txt
│   └── train_pairs.txt
├── lower_body/
│   ├── dense/
│   ├── im_sketch/ (from MGD)
│   ├── im_sketch_unpaired/ (from MGD)
│   ├── images/
│   ├── keypoints/
│   ├── label_maps/
│   ├── lower_body_cloth-texture/ (from Ours)
│   ├── test_pairs_paired.txt
│   ├── test_pairs_unpaired.txt
│   └── train_pairs.txt
├── upper_body/
│   ├── dense/
│   ├── im_sketch/ (from MGD)
│   ├── im_sketch_unpaired/ (from MGD)
│   ├── images/
│   ├── keypoints/
│   ├── label_maps/
│   ├── test_pairs_paired.txt
│   ├── test_pairs_unpaired.txt
│   ├── train_pairs.txt
│   └── upper_body_cloth-texture/ (from Ours)
├── coarse_captions.json (from MGD)
├── fine_captions.json (from MGD)
├── multigarment_test_triplets.txt
├── readme.txt
├── test_pairs_paired.txt
├── test_pairs_unpaired.txt
├── test_stitch_map/ (from MGD)
└── train_pairs.txt
```
## Usage

### Training

Single GPU:

```bash
python train_vitonhd.py
python train_dresscode.py
```

Multi-GPU:

```bash
CUDA_VISIBLE_DEVICES=0,1 accelerate launch train_vitonhd.py
CUDA_VISIBLE_DEVICES=0,1 accelerate launch train_dresscode.py
```
### Inference

- Download the pre-trained weights from Hugging Face and place them in the `checkpoints/` directory
- Update the configuration: modify the `resume_state` parameter in the YAML files under the `configs/` directory so that it points to your checkpoint directory

Single GPU:

```bash
python inference_vitonhd.py
python inference_dresscode.py
```

Multi-GPU:

```bash
CUDA_VISIBLE_DEVICES=0,1 accelerate launch inference_vitonhd.py
CUDA_VISIBLE_DEVICES=0,1 accelerate launch inference_dresscode.py
```
## Project Structure

```
TP-MGD/
├── configs/                  # Configuration files
├── checkpoints/              # Pre-trained model weights
├── assets/                   # Sample images
├── train_vitonhd.py          # VITON-HD training script
├── train_dresscode.py        # DressCode training script
├── inference_vitonhd.py      # VITON-HD inference script
├── inference_dresscode.py    # DressCode inference script
├── datasets.py               # Dataset loading utilities
└── attention_processor.py    # Custom attention mechanisms
```
## Configuration

Key configuration parameters in `configs/*.yaml`:

- `dataroot_path`: path to your dataset
- `resume_state`: path to the checkpoint used for inference or for resuming training
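As a rough illustration (the exact file names and schema are defined by the YAML files shipped in `configs/`), the values can be inspected with PyYAML before launching a run:

```python
# Inspect a config before launching (file name is an example; schema follows the repo's YAML).
import yaml

with open("configs/vitonhd.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg.get("dataroot_path"))  # dataset root
print(cfg.get("resume_state"))   # checkpoint for inference or for resuming training
```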
## Acknowledgments
- Our code is based on Diffusers
- We use Stable Diffusion v1.5 inpainting as the base model
- Thanks to VITON-HD, DressCode, and MGD for providing the public datasets
## Contact
For questions and support, please open an issue on GitHub or contact the authors.
If you find this project helpful, please give it a star!