
heads48_L12_E768_max8000_bs128

Custom GPT (nanoGPT-style) trained on uint16 token bins.

Summary

  • Layers: 12
  • Heads: 48
  • Embedding dim: 768
  • Context length: 512
  • Vocab size: 50304
  • Dropout: 0.2
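
The uint16 token bins mentioned above are flat binary streams of token ids, in the style of nanoGPT data preparation. The sketch below shows how a training batch is typically sampled from such a bin; the file name train.bin and the use of a numpy memmap are assumptions following the nanoGPT convention, not details recorded in this repo.

import numpy as np
import torch

# Assumed nanoGPT-style data file: a flat array of uint16 token ids.
data = np.memmap("train.bin", dtype=np.uint16, mode="r")

block_size, batch_size = 512, 128  # matches the config above
ix = torch.randint(len(data) - block_size, (batch_size,))
x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
# x, y: (128, 512) inputs and next-token targets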

Training

  • Final step: 2500
  • Total tokens seen (cluster-level): 1,310,720,000
  • Optimizer: AdamW (see training script)
  • Mixed precision: configurable (--amp)
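
For reference, the token count is consistent with the other numbers: 2,500 steps × 128 sequences × 512 tokens = 163,840,000 tokens per process, and 1,310,720,000 / 163,840,000 = 8, i.e. an effective data-parallel or gradient-accumulation factor of 8 at the cluster level (the exact layout is not recorded here, only the total).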

This repository contains a minimal config.json and the weight files. The architecture is defined in the accompanying training script and is not a drop-in transformers model class; to load the weights in PyTorch, instantiate the same GPT class and call load_state_dict.

Loading Example (PyTorch)

import torch
from my_training_script import GPT, GPTConfig  # your script path

cfg = GPTConfig(
    block_size=512,
    vocab_size=50304,
    n_layer=12,
    n_head=48,
    n_embed=768,
    dropout=0.2,
)

model = GPT(cfg)

# If the repo holds a safetensors checkpoint:
# from safetensors.torch import load_file
# sd = load_file("model.safetensors")
# If it holds a PyTorch bin:
sd = torch.load("pytorch_model.bin", map_location="cpu")

model.load_state_dict(sd, strict=True)
model.eval()
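
Once the weights are loaded, inference follows the usual nanoGPT pattern. The sketch below assumes the GPT class exposes a nanoGPT-style generate(idx, max_new_tokens, ...) method and that the 50304-entry vocabulary is GPT-2 BPE (via tiktoken); both are assumptions, so adapt it to whatever tokenizer and sampling loop the training script actually uses.

import torch
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # assumption: GPT-2 BPE, matching vocab_size 50304
idx = torch.tensor([enc.encode("Hello")], dtype=torch.long)

with torch.no_grad():
    out = model.generate(idx, max_new_tokens=50)  # assumed nanoGPT-style generate()

print(enc.decode(out[0].tolist()))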