# GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings
This model is part of the GAP-CLIP project for fashion search with guaranteed attribute positioning.
## Model Description
GAP-CLIP is a multi-modal search model for fashion that combines:
- Color embeddings (16 dimensions): Specialized for color representation
- Hierarchy embeddings (64 dimensions): Specialized for category classification
- General CLIP embeddings (432 dimensions): General visual-semantic understanding
Total embedding size: 512 dimensions
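Because each attribute occupies a fixed, known slice of the 512-dimensional vector, retrieval code can score the three subspaces independently and reweight them per query. The helper below is a minimal sketch of that idea, not part of the released code: `subspace_similarity` and the per-subspace weights are hypothetical, and the slice boundaries are taken from the list above.

```python
import torch
import torch.nn.functional as F

# Fixed subspace layout from the model description (assumed):
#   [0, 16) color, [16, 80) hierarchy, [80, 512) general CLIP
SUBSPACES = {"color": (0, 16), "hierarchy": (16, 80), "general": (80, 512)}

def subspace_similarity(query_emb, item_emb, weights=None):
    """Cosine similarity per subspace, combined as a weighted sum (illustrative)."""
    weights = weights or {name: 1.0 for name in SUBSPACES}
    total = torch.zeros(query_emb.shape[0], dtype=query_emb.dtype)
    for name, (lo, hi) in SUBSPACES.items():
        q = F.normalize(query_emb[:, lo:hi], dim=-1)
        k = F.normalize(item_emb[:, lo:hi], dim=-1)
        total = total + weights[name] * (q * k).sum(dim=-1)
    return total
```

Boosting one weight (e.g. `{"color": 2.0, "hierarchy": 1.0, "general": 1.0}`) would emphasize that attribute during ranking without retraining.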
## Quick Start
```python
from transformers import CLIPProcessor, CLIPModel
import torch

# Load the GAP-CLIP weights and the base CLIP processor
model = CLIPModel.from_pretrained("Leacb4/gap-clip")
processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")

# Encode a text query
text = "red dress"
inputs = processor(text=[text], return_tensors="pt", padding=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs)

# Split the 512-dimensional embedding into its attribute subspaces
color_emb = text_features[:, :16]        # dims 0-15: color
hierarchy_emb = text_features[:, 16:80]  # dims 16-79: category hierarchy
general_emb = text_features[:, 80:]      # dims 80-511: general CLIP
```
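The same slicing applies to image embeddings from `get_image_features`. The snippet below is a minimal sketch of image-side encoding and a per-subspace comparison, reusing `model`, `processor`, and `text_features` from the example above; the image path is a placeholder.

```python
from PIL import Image
import torch
import torch.nn.functional as F

# Encode an image with the same processor and model as above
image = Image.open("example.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    image_features = model.get_image_features(**inputs)

# Cosine similarity between text and image, subspace by subspace
for name, (lo, hi) in {"color": (0, 16), "hierarchy": (16, 80), "general": (80, 512)}.items():
    t = F.normalize(text_features[:, lo:hi], dim=-1)
    v = F.normalize(image_features[:, lo:hi], dim=-1)
    print(name, float((t * v).sum(dim=-1)))
```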
## Citation
```bibtex
@misc{gap-clip-2024,
  title={GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings for Fashion Search},
  author={Sarfati, Lea Attia},
  year={2024},
  url={https://huggingface.co/Leacb4/gap-clip}
}
```
## License
MIT License. See the LICENSE file for details.