# heads48_L12_E768_max8000_bs128
Custom GPT (nanoGPT-style) trained on uint16 token bins.
## Summary
- Layers: 12
- Heads: 48
- Embedding dim: 768
- Context length: 512
- Vocab size: 50304
- Dropout: 0.2
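
For reference, a rough parameter count can be derived from the hyperparameters above. The sketch below assumes a standard nanoGPT-style block (4x MLP expansion, tied input/output embeddings), which may not match the training script exactly:

```python
# Rough parameter-count estimate from the hyperparameters above.
# Assumes a standard nanoGPT-style block (4x MLP expansion, tied
# token/output embeddings); the actual script may differ.
n_layer, n_head, n_embd, block_size, vocab = 12, 48, 768, 512, 50304

embed = vocab * n_embd + block_size * n_embd  # token + position embeddings
attn = 4 * n_embd * n_embd                    # QKV projection + output projection
mlp = 2 * 4 * n_embd * n_embd                 # up- and down-projection (4x expansion)
per_layer = attn + mlp

total = embed + n_layer * per_layer
print(f"~{total / 1e6:.0f}M parameters")      # ~124M under these assumptions
print(n_embd // n_head)                       # head dimension: 16
```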
## Training
- Final step: 2500
- Total seen tokens (cluster-level): 1310720000
- Optimizer: AdamW (see training script)
- Mixed precision: configurable (`--amp`)
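
The reported token count is consistent with the context length of 512 under a simple assumption about the cluster-level batch. The sketch below assumes a global batch of 1024 sequences per step (e.g. the per-device batch of 128 across 8 workers); this is an assumption, not something stated in the card:

```python
# Sanity check on the reported token count (a sketch; the global batch
# composition is an assumption, e.g. 128 sequences/device x 8 workers).
steps = 2500
block_size = 512
global_batch = 1024                          # assumed cluster-level sequences per step

tokens_per_step = global_batch * block_size  # 524288
total_tokens = steps * tokens_per_step       # 1310720000
print(total_tokens)                          # matches the figure above
```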
This repository contains a minimal `config.json` and weight files. The training architecture is defined in the accompanying script and is not a drop-in `transformers` model class. To load in PyTorch, instantiate the same `GPT` class and call `load_state_dict`.
## Loading Example (PyTorch)
```python
import os
import torch
from my_training_script import GPT, GPTConfig  # path to your training script

cfg = GPTConfig(
    block_size=512,
    vocab_size=50304,
    n_layer=12,
    n_head=48,
    n_embed=768,
    dropout=0.2,
)
model = GPT(cfg)

# Load whichever weight file is present in the repo.
if os.path.exists("model.safetensors"):
    from safetensors.torch import load_file
    sd = load_file("model.safetensors")
else:
    sd = torch.load("pytorch_model.bin", map_location="cpu")

model.load_state_dict(sd, strict=True)
model.eval()
```
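
Since the model was trained on uint16 token bins, evaluation or fine-tuning data will likely need to be read in the same format. A minimal sketch of sampling a batch from such a bin, assuming a nanoGPT-style flat binary file of uint16 token IDs (the file name `train.bin` is an assumption):

```python
import numpy as np
import torch

# Memory-map a flat uint16 token bin (nanoGPT-style data format).
data = np.memmap("train.bin", dtype=np.uint16, mode="r")

block_size, batch_size = 512, 4
ix = torch.randint(len(data) - block_size - 1, (batch_size,)).tolist()

# Inputs and next-token targets, cast up to int64 for embedding lookup.
x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
```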