OneReward - ComfyUI

This repo contains the OneReward checkpoint, processed into a single model suitable for use in ComfyUI.

OneReward is a novel RLHF methodology for the visual domain that employs Qwen2.5-VL as a generative reward model to enhance multi-task reinforcement learning, significantly improving the policy model’s generation ability across multiple subtasks. Building on OneReward, FLUX.1-Fill-dev-OneReward, which is based on FLUX Fill [dev], outperforms the closed-source FLUX Fill [Pro] in inpainting and outpainting tasks and serves as a strong new baseline for future research in unified image editing.

For more details and examples, see the original model repo: OneReward

Sample Usage

The following code snippet illustrates how to use the model with the diffusers library. Note that this requires the custom FluxFillCFGPipeline defined in the official source code.

import torch
from diffusers.utils import load_image
from diffusers import FluxTransformer2DModel

# Note: pipeline_flux_fill_with_cfg.py must be available in your local environment
from src.pipeline_flux_fill_with_cfg import FluxFillCFGPipeline

transformer_onereward = FluxTransformer2DModel.from_pretrained(
    "bytedance-research/OneReward",
    subfolder="flux.1-fill-dev-OneReward-transformer",
    torch_dtype=torch.bfloat16
)

pipe = FluxFillCFGPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev",
    transformer=transformer_onereward,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Example: Image Fill
image = load_image('assets/image.png')
mask = load_image('assets/mask_fill.png')
image = pipe(
    prompt='the words "ByteDance", and in the next line "OneReward"',
    negative_prompt="nsfw",
    image=image,
    mask_image=mask,
    height=image.height,
    width=image.width,
    guidance_scale=1.0,
    true_cfg=4.0,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("image_fill.jpg")
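
The call above passes `guidance_scale=1.0` and `true_cfg=4.0` together with a `negative_prompt`. One plausible reading, assuming FluxFillCFGPipeline follows the standard classifier-free guidance formulation (this is not confirmed here; check the official source code), is that `true_cfg` weights the difference between the prompt-conditioned prediction and the negative-prompt prediction at each denoising step. The sketch below illustrates only that arithmetic on toy numbers; the function name `true_cfg_combine` and the sample values are hypothetical and not part of the OneReward code.

```python
def true_cfg_combine(cond, uncond, true_cfg):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    cond   -- prediction conditioned on the prompt (toy list of floats)
    uncond -- prediction conditioned on the negative prompt
    """
    return [u + true_cfg * (c - u) for c, u in zip(cond, uncond)]


# Toy per-element predictions; real pipelines operate on latent tensors.
cond = [1.0, 2.0, -0.5]
uncond = [0.5, 2.0, 0.5]

# A scale of 4.0 pushes the result away from the negative-prompt prediction.
guided = true_cfg_combine(cond, uncond, 4.0)

# A scale of 1.0 reduces to the prompt-conditioned prediction.
unguided = true_cfg_combine(cond, uncond, 1.0)
```

Under this assumption, a scale of 1.0 leaves the conditioned prediction unchanged, while larger values move the result further from the negative prompt at the cost of a second forward pass per step.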

Citation

@article{gong2025onereward,
  title={OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning},
  author={Gong, Yuan and Wang, Xionghui and Wu, Jie and Wang, Shiyin and Wang, Yitong and Wu, Xinglong},
  journal={arXiv preprint arXiv:2508.21066},
  year={2025}
}

Model tree for yichengup/flux.1-fill-dev-OneReward

Base model: black-forest-labs/FLUX.1-Fill-dev (this model is a finetune)