Qwen3-VL-8B WebSight Fine-tuned

This model is a fine-tuned version of Qwen/Qwen3-VL-8B-Instruct on the WebSight dataset for GUI automation tasks.

Model Description

  • Base Model: Qwen/Qwen3-VL-8B-Instruct
  • Fine-tuning Method: LoRA (merged)
  • Dataset: wave-ui/websight-v2
  • Task: Image-to-click location prediction
  • Output Format: pyautogui.click(x, y) commands

Usage

from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch

# Load model and processor
model = AutoModelForVision2Seq.from_pretrained(
    "Asanshay/qwen3-vl-8b-websight-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "Asanshay/qwen3-vl-8b-websight-merged",
    trust_remote_code=True
)

# Prepare input
image = Image.open("screenshot.png")
prompt = "click the login button"

inputs = processor(
    text=f"<image>\n{prompt}",
    images=image,
    return_tensors="pt"
).to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

result = processor.decode(outputs[0], skip_special_tokens=True)
print(result)  # Output: pyautogui.click(x, y)

Training Details

  • Training Framework: LLaMA-Factory
  • Hardware: 8x H100 GPUs
  • LoRA Config:
    • Rank: 64
    • Alpha: 128
    • Dropout: 0.05
    • Target modules: all linear layers

Output Format

The model outputs click coordinates normalized to 1400x800 resolution:

  • Format: pyautogui.click(x, y)
  • Example: pyautogui.click(565, 486)

Scale to your screen resolution:

x_actual = int(x_norm * (screen_width / 1400))
y_actual = int(y_norm * (screen_height / 800))

Citation

@misc{qwen3-vl-websight,
  title={Qwen3-VL Fine-tuned for GUI Automation},
  author={Your Name},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/Asanshay/qwen3-vl-8b-websight-merged}}
}

License

Apache 2.0 (inherited from base model)

Downloads last month
1
Safetensors
Model size
9B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Asanshay/websight-v2-grounded

Finetuned
(117)
this model