SmolLM2 smollm2-360m-score0_mix_rephrased_from_beginning_metadata-300B-mbs16-gbs1024-17feb-lr2e-05-gbs16 (Version: main)

Model Details

  • Architecture: SmolLM2
  • Parameters: 360M

Training Configuration

eval:
  final_validation: false
  initial_validation: false
  interval: 10000
  max_iters: 100
optimizer:
  class_path: torch.optim.AdamW
  init_args:
    betas:
    - 0.9
    - 0.95
    lr: 0.003
    weight_decay: 0.033
precision: bf16-mixed
seed: 42
train:
  global_batch_size: 1024
  log_interval: 10
  lr_warmup_steps: 2000
  max_norm: 1.0
  max_seq_length: 2048
  max_tokens: 300000000000
  micro_batch_size: 16
  min_lr: 0
  save_interval: 30000
  tie_embeddings: false
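A few derived quantities follow from the config above. This is a small sketch, not part of the training code; the world_size value is an assumption (the number of data-parallel GPUs is not stated in the config):

```python
# Derive gradient-accumulation steps and total optimizer steps
# from the training config above.
global_batch_size = 1024      # train.global_batch_size
micro_batch_size = 16         # train.micro_batch_size
max_seq_length = 2048         # train.max_seq_length
max_tokens = 300_000_000_000  # train.max_tokens (300B)
world_size = 8                # ASSUMED number of GPUs; not in the config

# Each optimizer step processes global_batch_size sequences, built up
# from micro-batches across devices and accumulation steps.
grad_accum_steps = global_batch_size // (micro_batch_size * world_size)

# Tokens consumed per optimizer step, and the resulting step count
# implied by the 300B-token budget.
tokens_per_step = global_batch_size * max_seq_length
total_steps = max_tokens // tokens_per_step

print(grad_accum_steps)  # 8 under the assumed world_size
print(tokens_per_step)   # 2097152
print(total_steps)       # 143051
```

With these numbers, the 2000-step lr_warmup_steps covers roughly the first 1.4% of training.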

Model Loading and Revision System

This repository hosts multiple revisions of the model. To load a specific revision, pass the revision argument to from_pretrained. For example:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("locuslab/mix_ift_v3-smollm2-360m-smollm2-360m-score0_mix_rephrased_from_beginning_metadata", revision="final")
tokenizer = AutoTokenizer.from_pretrained("locuslab/mix_ift_v3-smollm2-360m-smollm2-360m-score0_mix_rephrased_from_beginning_metadata", revision="final")

Replace "final" with the desired revision.
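Once loaded, the model can be used for text generation in the usual way. A minimal sketch, assuming the repository and revision load as above (the prompt and generation settings are illustrative only):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "locuslab/mix_ift_v3-smollm2-360m-smollm2-360m-score0_mix_rephrased_from_beginning_metadata"

model = AutoModelForCausalLM.from_pretrained(repo, revision="final")
tokenizer = AutoTokenizer.from_pretrained(repo, revision="final")

# Tokenize a prompt, generate a short continuation, and decode it.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```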

Safetensors

  • Model size: 0.4B params
  • Tensor type: F32