DeepSeek OCR - Fine-tuned for German/Deutsch
This model is a fine-tuned version of DeepSeek OCR on German text for Optical Character Recognition (OCR) tasks.
Model Description
- Base Model: DeepSeek OCR
- Language: German (de)
- Task: Image-to-Text (OCR)
- Training Data: 200K synthetic German text images
- License: Apache 2.0
This model has been fine-tuned specifically for recognizing German text in images, including handling of German-specific characters (ä, ö, ü, ß) and common German compound words.
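Because ä, ö, ü, and ß each have precomposed and decomposed Unicode forms, OCR output and reference text can look identical yet compare unequal. A small, model-independent post-processing sketch using only the standard library (not part of this model's pipeline):

```python
import unicodedata

def normalize_german(text: str) -> str:
    """Collapse decomposed umlauts (e.g. 'a' + combining diaeresis)
    into their precomposed forms so string comparisons behave."""
    return unicodedata.normalize("NFC", text)

# "a" + U+0308 (combining diaeresis) vs. precomposed "ä"
assert normalize_german("Ma\u0308dchen") == "M\u00e4dchen"
```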
Intended Uses
This model is designed for:
- Extracting German text from scanned documents
- Digitizing printed German materials
- Reading German text from photographs
- Processing German forms and receipts
- Any German text recognition tasks
How to Use
Basic Usage
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
# Load model and processor
processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
# Load image from a local path
image_path = "path_to_your_german_text_image.jpg"
image = Image.open(image_path).convert("RGB")
# Process
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
Batch Processing
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
# Multiple images
images = [Image.open(f"image_{i}.jpg").convert("RGB") for i in range(5)]
# Batch process
pixel_values = processor(images, return_tensors="pt", padding=True).pixel_values
generated_ids = model.generate(pixel_values)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
for text in generated_texts:
    print(text)
With GPU Acceleration
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german").to(device)
image = Image.open("german_text.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
Training Details
Training Data
The model was fine-tuned on a synthetic German OCR dataset containing 200,000 images with:
- Diverse German sentences covering multiple domains (everyday conversation, news, literature, technical, business)
- Various fonts and font sizes (16-48pt)
- Multiple augmentations: noise, blur, brightness/contrast variations
- Different text and background colors
Data Split:
- Train: 180,000 samples (90%)
- Validation: 10,000 samples (5%)
- Test: 10,000 samples (5%)
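The generation recipe above can be sketched with Pillow. The blur and brightness ranges below are illustrative, not the exact values used in training, and the default bitmap font is a placeholder; pass a real TrueType font (e.g. `ImageFont.truetype("DejaVuSans.ttf", 32)`) to cover the 16-48pt range:

```python
import random
from PIL import Image, ImageDraw, ImageEnhance, ImageFilter, ImageFont

def render_sample(text, font=None, size=(640, 64)):
    """Render one synthetic OCR training image with the augmentations
    listed above: optional Gaussian blur and brightness jitter."""
    font = font or ImageFont.load_default()
    img = Image.new("RGB", size, (255, 255, 255))           # white background
    ImageDraw.Draw(img).text((8, 8), text, fill=(0, 0, 0), font=font)
    if random.random() < 0.5:                               # blur augmentation
        img = img.filter(ImageFilter.GaussianBlur(random.uniform(0.3, 1.2)))
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.8, 1.2))
    return img

sample = render_sample("Die Straße war völlig überfüllt.")
```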
Training Framework
# Example training configuration
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
training_args = Seq2SeqTrainingArguments(
    output_dir="./deepseek-ocr-german",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=10,
    logging_steps=100,
    save_steps=1000,
    eval_steps=1000,
    evaluation_strategy="steps",
    save_total_limit=2,
    fp16=True,
    predict_with_generate=True,
)
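With `predict_with_generate=True`, evaluation compares generated strings against references; the standard metric for OCR is character error rate (CER). A minimal dependency-free sketch (libraries such as `jiwer` or `evaluate` provide the same metric):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance between the
    two strings, divided by the reference length."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n] / max(m, 1)

# "ß" -> "ss" costs one substitution plus one insertion
print(cer("Straße", "Strasse"))  # 2/6 ≈ 0.333
```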
Limitations
- Handwriting: trained on rendered fonts, so performance may degrade on handwritten text
- Image quality: Works best with clear, high-contrast images
- Domain specificity: Best performance on printed German text similar to training distribution
Citation
If you use this model, please cite:
@misc{deepseek-ocr-german,
  author       = {Santosh Pandit},
  title        = {DeepSeek OCR - German Fine-tuned},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/deepseek-ocr-german}},
}
Model Card Contact
For questions or feedback, please open an issue on the model repository or contact [[email protected]].
Acknowledgments
- Base model: DeepSeek AI
- Training data generation: LM Studio with local LLM
- Framework: Hugging Face Transformers
Uploaded Fine-tuned Model
- Developed by: neuralabs
- License: apache-2.0
- Finetuned from model: deepseek-ai/DeepSeek-OCR
Model tree for neuralabs/deepseek_ocr_de
- Base model: deepseek-ai/DeepSeek-OCR