# GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings
This model is part of the GAP-CLIP project for fashion search with guaranteed attribute positioning.
## Model Description
GAP-CLIP is a multi-modal search model for fashion that combines:
- Color embeddings (16 dimensions): Specialized for color representation
- Hierarchy embeddings (64 dimensions): Specialized for category classification
- General CLIP embeddings (432 dimensions): General visual-semantic understanding
Total embedding size: 512 dimensions
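Because each attribute occupies a fixed, known slice of the 512-dimensional vector, retrieval code can score the three subspaces independently and reweight them per query. The helper below is a minimal sketch of that idea, not part of the released code: `subspace_similarity` and the per-subspace weights are hypothetical, and the slice boundaries are taken from the list above.

```python
import torch
import torch.nn.functional as F

# Fixed subspace layout from the model description (assumed):
#   [0, 16) color, [16, 80) hierarchy, [80, 512) general CLIP
SUBSPACES = {"color": (0, 16), "hierarchy": (16, 80), "general": (80, 512)}

def subspace_similarity(query_emb, item_emb, weights=None):
    """Cosine similarity per subspace, combined as a weighted sum (illustrative)."""
    weights = weights or {name: 1.0 for name in SUBSPACES}
    total = torch.zeros(query_emb.shape[0], dtype=query_emb.dtype)
    for name, (lo, hi) in SUBSPACES.items():
        q = F.normalize(query_emb[:, lo:hi], dim=-1)
        k = F.normalize(item_emb[:, lo:hi], dim=-1)
        total = total + weights[name] * (q * k).sum(dim=-1)
    return total
```

Boosting one weight (e.g. `{"color": 2.0, "hierarchy": 1.0, "general": 1.0}`) would emphasize that attribute during ranking without retraining.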
## Quick Start
```python
from transformers import CLIPProcessor, CLIPModel
import torch

# Load the GAP-CLIP weights and the base CLIP processor
model = CLIPModel.from_pretrained("Leacb4/gap-clip")
processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")

# Encode a text query
text = "red dress"
inputs = processor(text=[text], return_tensors="pt", padding=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs)

# Split the 512-dimensional embedding into its attribute subspaces
color_emb = text_features[:, :16]        # dims 0-15: color
hierarchy_emb = text_features[:, 16:80]  # dims 16-79: category hierarchy
general_emb = text_features[:, 80:]      # dims 80-511: general CLIP
```
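The same slicing applies to image embeddings from `get_image_features`. The snippet below is a minimal sketch of image-side encoding and a per-subspace comparison, reusing `model`, `processor`, and `text_features` from the example above; the image path is a placeholder.

```python
from PIL import Image
import torch
import torch.nn.functional as F

# Encode an image with the same processor and model as above
image = Image.open("example.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    image_features = model.get_image_features(**inputs)

# Cosine similarity between text and image, subspace by subspace
for name, (lo, hi) in {"color": (0, 16), "hierarchy": (16, 80), "general": (80, 512)}.items():
    t = F.normalize(text_features[:, lo:hi], dim=-1)
    v = F.normalize(image_features[:, lo:hi], dim=-1)
    print(name, float((t * v).sum(dim=-1)))
```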
## Citation
```bibtex
@misc{gap-clip-2024,
  title={GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings for Fashion Search},
  author={Sarfati, Lea Attia},
  year={2024},
  url={https://huggingface.co/Leacb4/gap-clip}
}
```
## License
MIT License. See the LICENSE file for details.