# Models

Common model zoos such as huggingface/transformers struggle with PyTorch native model parallelism. Following the design principle of vLLM, we keep a simple, parallelizable, highly-optimized model implementation with packed inputs in verl.
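
As a concrete illustration, here is a minimal sketch of what "packed inputs" means: a batch of variable-length sequences is concatenated into one flat tensor with cumulative offsets instead of being padded to a rectangle. The tensor names match the ones introduced in Step 2 below; the token values are made up.

```python
import torch

# A hypothetical batch of three variable-length sequences of token ids.
seqs = [torch.tensor([5, 9, 2]), torch.tensor([7, 1]), torch.tensor([3, 3, 8, 4])]

# Concatenate into one flat tensor of shape (total_nnz,) with no padding.
input_ids = torch.cat(seqs)

# cu_seqlens holds the start offset of each sequence plus the total length:
# here [0, 3, 5, 9], shape (batch_size + 1,), as expected by flash attention.
seqlens = torch.tensor([len(s) for s in seqs], dtype=torch.int32)
cu_seqlens = torch.zeros(len(seqs) + 1, dtype=torch.int32)
cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)

max_seqlen_in_batch = int(seqlens.max())  # 4
```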

## Adding a New Hugging Face Model

### Step 1: Copy the model file from HF to verl

- Add a new file under verl/models/hf
- Copy ONLY the model file from huggingface/transformers/models to verl/models/hf

### Step 2: Modify the model file to use packed inputs

- Remove all the code related to inference (KV cache)
- Modify the inputs to include only:
  - input_ids (total_nnz,)
  - cu_seqlens (batch_size + 1,)
  - max_seqlen_in_batch: int
- Note that this requires flash attention with a causal mask; a sketch of the attention call follows this list.
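
A minimal sketch of the resulting attention call, assuming the flash-attn package's flash_attn_varlen_func; the surrounding projection and rotary-embedding code is omitted, and the function name below is illustrative:

```python
import torch
from flash_attn import flash_attn_varlen_func


def packed_causal_attention(
    q: torch.Tensor,           # (total_nnz, num_heads, head_dim)
    k: torch.Tensor,           # (total_nnz, num_heads, head_dim)
    v: torch.Tensor,           # (total_nnz, num_heads, head_dim)
    cu_seqlens: torch.Tensor,  # (batch_size + 1,) int32
    max_seqlen_in_batch: int,
) -> torch.Tensor:
    # Variable-length attention over the packed batch; causal=True restricts
    # each token to earlier tokens within its own sequence, so sequences
    # packed next to each other never attend across their boundaries.
    return flash_attn_varlen_func(
        q, k, v,
        cu_seqlens_q=cu_seqlens,
        cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen_in_batch,
        max_seqlen_k=max_seqlen_in_batch,
        causal=True,
    )  # -> (total_nnz, num_heads, head_dim)
```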

### Step 2.5: Add tests

- Add a test that compares this version against the Hugging Face version
- Follow the existing test infrastructure and add the tests under tests/models/hf; a sketch of such a test follows this list
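
A sketch of such an equivalence test. The import path verl/models/hf/llama and the packed model's output interface are hypothetical; adapt them to the model you actually ported:

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical import: the packed port created in Steps 1-2.
from verl.models.hf.llama import LlamaForCausalLM as PackedLlamaForCausalLM


def test_packed_model_matches_hf():
    config = LlamaConfig(hidden_size=128, intermediate_size=256,
                         num_hidden_layers=2, num_attention_heads=4,
                         num_key_value_heads=4, vocab_size=1000)
    hf_model = LlamaForCausalLM(config).cuda().half().eval()
    packed_model = PackedLlamaForCausalLM(config).cuda().half().eval()
    packed_model.load_state_dict(hf_model.state_dict())  # identical weights

    seqs = [torch.randint(0, 1000, (n,), device="cuda") for n in (13, 7, 21)]

    # Reference: run each sequence through the HF model on its own.
    ref_logits = torch.cat(
        [hf_model(s.unsqueeze(0)).logits.squeeze(0) for s in seqs])

    # Packed: one flat forward pass over the concatenated batch.
    seqlens = torch.tensor([len(s) for s in seqs], dtype=torch.int32, device="cuda")
    cu_seqlens = torch.zeros(len(seqs) + 1, dtype=torch.int32, device="cuda")
    cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)
    packed_logits = packed_model(
        input_ids=torch.cat(seqs),
        cu_seqlens=cu_seqlens,
        max_seqlen_in_batch=int(seqlens.max()),
    ).logits  # assumed to be (total_nnz, vocab_size)

    torch.testing.assert_close(packed_logits, ref_logits, atol=1e-2, rtol=1e-2)
```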

### Step 3: Add a function to apply tensor parallelism
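
One possible shape for this function, sketched with PyTorch's built-in tensor-parallel API (the route torchtitan takes). The module names assume a Llama-style block; the function name and sharding plan are illustrative, not verl's actual implementation:

```python
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


def apply_tensor_parallelism(model, tp_size: int):
    # One-dimensional device mesh spanning the tensor-parallel ranks.
    tp_mesh = init_device_mesh("cuda", (tp_size,))
    for layer in model.model.layers:  # assumes a Llama-style module tree
        parallelize_module(layer, tp_mesh, {
            # Attention: shard Q/K/V column-wise, the output projection row-wise.
            "self_attn.q_proj": ColwiseParallel(),
            "self_attn.k_proj": ColwiseParallel(),
            "self_attn.v_proj": ColwiseParallel(),
            "self_attn.o_proj": RowwiseParallel(),
            # MLP: shard gate/up column-wise, down row-wise.
            "mlp.gate_proj": ColwiseParallel(),
            "mlp.up_proj": ColwiseParallel(),
            "mlp.down_proj": RowwiseParallel(),
        })
    return model
```

Note that the attention module's head bookkeeping (e.g. the number of heads per rank) also has to account for the sharding; torchtitan shows a complete plan.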

### Step 4: Add a function to apply data parallelism
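
A sketch of the data-parallel counterpart using FSDP, wrapping each transformer block as its own unit; the function name and wrapping policy are just one reasonable choice:

```python
import functools

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


def apply_data_parallelism(model, transformer_block_cls):
    # Shard parameters, gradients, and optimizer state across the
    # data-parallel ranks, one FSDP unit per transformer block.
    wrap_policy = functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={transformer_block_cls},
    )
    return FSDP(
        model,
        auto_wrap_policy=wrap_policy,
        device_id=torch.cuda.current_device(),
    )
```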

### Step 5: Add a function to apply pipeline parallelism

- Arrives in PyTorch 2.4
- Currently only available in alpha in the nightly builds
- Check torchtitan for more details; a hedged sketch follows this list
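
For reference, a heavily hedged sketch against the alpha torch.distributed.pipelining API in the nightlies; names and signatures may still change, and the split point below is an arbitrary illustration:

```python
from torch.distributed.pipelining import ScheduleGPipe, SplitPoint, pipeline


def apply_pipeline_parallelism(model, example_input_ids, stage_index, device):
    # Trace the model and split it into stages; cutting before layer 8 of a
    # Llama-style module tree is an illustrative choice, not a recommendation.
    pipe = pipeline(
        module=model,
        mb_args=(example_input_ids,),  # one example microbatch for tracing
        split_spec={"model.layers.8": SplitPoint.BEGINNING},
    )
    stage = pipe.build_stage(stage_index, device)
    # GPipe-style schedule; the first rank feeds microbatches in via step().
    return ScheduleGPipe(stage, n_microbatches=4)
```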