DeepSeek OCR

DeepSeek OCR - Fine-tuned for German/Deutsch

This model is a version of DeepSeek OCR fine-tuned on German text for optical character recognition (OCR).

Model Description

  • Base Model: DeepSeek OCR
  • Language: German (de)
  • Task: Image-to-Text (OCR)
  • Training Data: 200K synthetic German text images
  • License: Apache 2.0

This model has been fine-tuned specifically for recognizing German text in images, including handling of German-specific characters (ä, ö, ü, ß) and common German compound words.
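When comparing recognized text against reference strings, note that umlauts can be encoded either as a single code point or as a base letter plus a combining diaeresis. The card does not state which form the model emits, so the following is a minimal sketch that normalizes both sides to NFC before comparison; `normalize_german` is a hypothetical helper name, not part of the model's API.

```python
import unicodedata


def normalize_german(text: str) -> str:
    """Return text in NFC form, so that 'ä' is a single code point."""
    return unicodedata.normalize("NFC", text)


decomposed = "Ma\u0308dchen"  # 'a' followed by a combining diaeresis (NFD form)
composed = "M\u00e4dchen"     # precomposed 'ä' (NFC form)

# The raw strings differ even though they render identically.
print(decomposed == composed)
# After normalization they compare equal.
print(normalize_german(decomposed) == normalize_german(composed))
```

Applying the same normalization to OCR output and ground truth avoids counting visually identical strings as recognition errors.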

Intended Uses

This model is designed for:

  • Extracting German text from scanned documents
  • Digitizing printed German materials
  • Reading German text from photographs
  • Processing German forms and receipts
  • Other recognition tasks involving printed German text

How to Use

Basic Usage

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the processor and model (this assumes the checkpoint loads with
# the TrOCR / VisionEncoderDecoder classes)
processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")

# Load a local image file and convert it to RGB
image_path = "path_to_your_german_text_image.jpg"
image = Image.open(image_path).convert("RGB")

# Preprocess the image, generate token IDs, and decode them to text
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)

Batch Processing

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")

# Load multiple images (expects image_0.jpg through image_4.jpg)
images = [Image.open(f"image_{i}.jpg").convert("RGB") for i in range(5)]

# Batch process; only pixel_values are used here, so no padding argument is needed
pixel_values = processor(images, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

for text in generated_texts:
    print(text)
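For more than a handful of images, loading everything into one batch may exhaust memory. A minimal sketch of splitting a list of file paths into fixed-size batches follows; `chunked` is a hypothetical helper and the batch size of 4 is an arbitrary assumption, not a recommendation from the model authors.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical file names, following the naming used in the batch example.
paths = [f"image_{i}.jpg" for i in range(10)]

for batch in chunked(paths, batch_size=4):
    # Each batch could be opened with PIL and passed to the processor in turn.
    print(batch)
```

Each yielded batch can then go through the same processor, generate, and decode steps shown in the batch example.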

With GPU Acceleration

import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german").to(device)

image = Image.open("german_text.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)

Training Details

Training Data

The model was fine-tuned on a synthetic German OCR dataset containing 200,000 images with:

  • Diverse German sentences covering multiple domains (everyday conversation, news, literature, technical, business)
  • Various fonts and font sizes (16-48pt)
  • Multiple augmentations: noise, blur, brightness/contrast variations
  • Different text and background colors
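The card does not include the generation script, so the following is only a rough sketch of how one such synthetic sample might be rendered with Pillow. The image size, color ranges, and augmentation strengths are all assumptions, `render_sample` is a hypothetical name, and the sketch uses Pillow's default font rather than the varied fonts and sizes described above.

```python
import random

from PIL import Image, ImageDraw, ImageEnhance, ImageFilter, ImageFont


def render_sample(text, size=(384, 64), seed=None):
    """Render `text` on a light background and apply mild augmentations."""
    rng = random.Random(seed)

    # Random light background and dark text color (assumed ranges).
    background = tuple(rng.randint(200, 255) for _ in range(3))
    foreground = tuple(rng.randint(0, 80) for _ in range(3))

    image = Image.new("RGB", size, background)
    draw = ImageDraw.Draw(image)
    draw.text((8, 8), text, fill=foreground, font=ImageFont.load_default())

    # Blur and brightness/contrast jitter.
    image = image.filter(ImageFilter.GaussianBlur(radius=rng.uniform(0.0, 1.0)))
    image = ImageEnhance.Brightness(image).enhance(rng.uniform(0.8, 1.2))
    image = ImageEnhance.Contrast(image).enhance(rng.uniform(0.8, 1.2))

    # Gaussian noise blended in at low opacity (not controlled by `seed`).
    noise = Image.effect_noise(size, 32).convert("RGB")
    return Image.blend(image, noise, alpha=0.1)


sample = render_sample("Größe und Maß", seed=0)
print(sample.size, sample.mode)
```

A real pipeline would additionally load several TrueType fonts at different sizes and pair each image with its ground-truth string.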

Data Split:

  • Train: 180,000 samples (90%)
  • Validation: 10,000 samples (5%)
  • Test: 10,000 samples (5%)
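The split above can be reproduced with a seeded shuffle of sample indices. This is a sketch under the assumption of a simple random split; the card does not say how the split was actually drawn, and `split_indices` and the seed value are illustrative.

```python
import random


def split_indices(n_samples, val_fraction=0.05, test_fraction=0.05, seed=42):
    """Shuffle indices and split them into train, validation, and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_val = round(n_samples * val_fraction)
    n_test = round(n_samples * test_fraction)

    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test


train, val, test = split_indices(200_000)
print(len(train), len(val), len(test))  # 180000 10000 10000
```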

Training Framework

# Example training configuration
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./deepseek-ocr-german",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=10,
    logging_steps=100,
    save_steps=1000,
    eval_steps=1000,
    evaluation_strategy="steps",  # renamed to eval_strategy in newer Transformers releases
    save_total_limit=2,
    fp16=True,
    predict_with_generate=True,
)
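To get a feel for what this configuration implies, the number of optimizer steps can be worked out from the 180,000 training samples. The sketch below assumes a single device and no gradient accumulation, neither of which is stated in the card.

```python
import math

train_samples = 180_000
per_device_batch = 8
num_devices = 1   # assumption: single GPU
grad_accum = 1    # assumption: no gradient accumulation
epochs = 10
eval_every = 1000

effective_batch = per_device_batch * num_devices * grad_accum
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
n_evals = total_steps // eval_every

print(steps_per_epoch, total_steps, n_evals)  # 22500 225000 225
```

Under these assumptions, training runs for 225,000 steps, with an evaluation and a checkpoint every 1,000 steps; `save_total_limit=2` keeps only the two most recent checkpoints on disk.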

Limitations

  • Font coverage: The model was trained on synthetic printed text, so performance may vary on handwritten text and on fonts unlike those seen in training
  • Image quality: Works best with clear, high-contrast images
  • Domain specificity: Best performance on printed German text similar to training distribution
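For low-contrast scans, a simple contrast stretch before recognition is one option to try. This is a minimal sketch using Pillow's `ImageOps`; whether it improves results with this particular model has not been verified here, and `enhance_contrast` is a hypothetical helper.

```python
from PIL import Image, ImageOps


def enhance_contrast(image: Image.Image, cutoff: float = 1.0) -> Image.Image:
    """Convert to grayscale, stretch the histogram, and return an RGB image.

    `cutoff` is the percentage of the darkest and lightest pixels to ignore
    when computing the stretch.
    """
    gray = ImageOps.grayscale(image)
    stretched = ImageOps.autocontrast(gray, cutoff=cutoff)
    return stretched.convert("RGB")


# A synthetic low-contrast image: mid-gray with one slightly lighter pixel.
low = Image.new("RGB", (4, 4), (100, 100, 100))
low.putpixel((0, 0), (150, 150, 150))

out = enhance_contrast(low, cutoff=0)
print(low.getextrema()[0], out.getextrema()[0])
```

The returned image can be passed to the processor in place of the original.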

Citation

If you use this model, please cite:

@misc{deepseek-ocr-german,
  author = {Santosh Pandit},
  title = {DeepSeek OCR - German Fine-tuned},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/deepseek-ocr-german}},
}

Model Card Contact

For questions or feedback, please open an issue on the model repository or contact [[email protected]].


Acknowledgments

  • Base model: DeepSeek AI
  • Training data generation: LM Studio with local LLM
  • Framework: Hugging Face Transformers

Uploaded finetuned model

  • Developed by: neuralabs
  • License: apache-2.0
  • Finetuned from model: deepseek-ai/DeepSeek-OCR