
heads48_L12_E768_max8000_bs128

Custom GPT (nanoGPT-style) trained on uint16 token bins.

Summary

  • Layers: 12
  • Heads: 48
  • Embedding dim: 768
  • Context length: 512
  • Vocab size: 50304
  • Dropout: 0.2
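
The uint16 token bins mentioned above are flat binary streams of token ids, in the style of nanoGPT data preparation. The sketch below shows how a training batch is typically sampled from such a bin; the file name train.bin and the use of a numpy memmap are assumptions following the nanoGPT convention, not details recorded in this repo.

import numpy as np
import torch

# Assumed nanoGPT-style data file: a flat array of uint16 token ids.
data = np.memmap("train.bin", dtype=np.uint16, mode="r")

block_size, batch_size = 512, 128  # matches the config above
ix = torch.randint(len(data) - block_size, (batch_size,))
x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
# x, y: (128, 512) inputs and next-token targets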

Training

  • Final step: 2500
  • Total tokens seen (cluster-level): 1,310,720,000
  • Optimizer: AdamW (see training script)
  • Mixed precision: configurable (--amp)
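
For reference, the token count is consistent with the other numbers: 2,500 steps × 128 sequences × 512 tokens = 163,840,000 tokens per process, and 1,310,720,000 / 163,840,000 = 8, i.e. an effective data-parallel or gradient-accumulation factor of 8 at the cluster level (the exact layout is not recorded here, only the total).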

This repository contains a minimal config.json and the weight files. The architecture is defined in the accompanying training script and is not a drop-in transformers model class; to load the weights in PyTorch, instantiate the same GPT class and call load_state_dict.

Loading Example (PyTorch)

import torch
from my_training_script import GPT, GPTConfig  # your script path

cfg = GPTConfig(
    block_size=512,
    vocab_size=50304,
    n_layer=12,
    n_head=48,
    n_embed=768,
    dropout=0.2,
)

model = GPT(cfg)

# If the repo holds a safetensors checkpoint:
# from safetensors.torch import load_file
# sd = load_file("model.safetensors")
# If it holds a PyTorch bin:
sd = torch.load("pytorch_model.bin", map_location="cpu")

model.load_state_dict(sd, strict=True)
model.eval()
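
Once the weights are loaded, inference follows the usual nanoGPT pattern. The sketch below assumes the GPT class exposes a nanoGPT-style generate(idx, max_new_tokens, ...) method and that the 50304-entry vocabulary is GPT-2 BPE (via tiktoken); both are assumptions, so adapt it to whatever tokenizer and sampling loop the training script actually uses.

import torch
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # assumption: GPT-2 BPE, matching vocab_size 50304
idx = torch.tensor([enc.encode("Hello")], dtype=torch.long)

with torch.no_grad():
    out = model.generate(idx, max_new_tokens=50)  # assumed nanoGPT-style generate()

print(enc.decode(out[0].tolist()))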