# Model Card for BLIP-2 OPT-2.7b Fine-Tuned with QLoRA for Image Captioning
This is a fine-tuned version of Salesforce's BLIP-2 model, adapted for the task of image captioning using the QLoRA methodology for parameter-efficient fine-tuning. The model is trained on the Flickr8k dataset to generate descriptive, human-like captions for a wide variety of images.
## Model Details

### Model Description
This model is an adaptation of the BLIP-2 vision-language architecture, specifically the Salesforce/blip2-opt-2.7b variant, fine-tuned to generate accurate and contextually relevant captions for images.
The fine-tuning was performed with QLoRA (Quantized Low-Rank Adaptation), a technique that significantly reduces the computational and memory requirements of training: the base model is quantized to 4-bit precision, and small low-rank adapter matrices are trained on top of it while the vast majority of the original model's parameters remain frozen. This approach makes it possible to adapt large-scale models on consumer-grade hardware while preserving high performance.
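A minimal sketch of this setup using `transformers`, `bitsandbytes`, and `peft`. The LoRA rank, alpha, dropout, and target modules shown here are illustrative assumptions, not the exact training configuration used for this checkpoint:

```python
import torch
from transformers import Blip2ForConditionalGeneration, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base model to 4-bit (NF4) to reduce memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small low-rank adapters; only these matrices are trained.
lora_config = LoraConfig(
    r=16,                                 # assumed adapter rank
    lora_alpha=32,                        # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj"],  # assumed attention projections in the OPT decoder
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports a tiny fraction of the 2.7B parameters
```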
- Developed by: Salesforce
- Model type: Vision-Language Model (VLM) based on BLIP-2
- Language(s) (NLP): English (en)
- License: Apache 2.0
- Finetuned from model: Salesforce/blip2-opt-2.7b
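A minimal inference sketch for generating a caption with the fine-tuned adapter. The adapter repo id below is a hypothetical placeholder; substitute this repository's actual id:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration
from peft import PeftModel

# Load the processor and base model, then apply the trained LoRA adapter.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
base = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "your-username/blip2-flickr8k-qlora")  # hypothetical adapter id

# Caption an arbitrary RGB image.
image = Image.open("example.jpg")
inputs = processor(images=image, return_tensors="pt").to(base.device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=40)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```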