Update README (2025-09-11 01:12:33)
README.md CHANGED
@@ -33,9 +33,9 @@ Attach these heads to the base model to **stop decoding early** when a token’s
 
 ## TL;DR
 
--
-- At each step,
-- Ships with a
+- One tiny linear **head per transformer layer**.
+- At each decoding step, compute layer-wise logits; if `max_prob >= confidence_threshold`, **exit** early.
+- Ships with a loader and a minimal generation helper.
 
 ---
 
@@ -58,7 +58,7 @@ early = importlib.util.module_from_spec(spec); sys.modules["early_exit_wrapper"]
 spec.loader.exec_module(early)
 
 # 3) Load wrapped model + tokenizer
-wrapped, tok = early.load_early_exit_from_hub(REPO_ID)  # picks CPU/MPS/CUDA & safe dtype
+wrapped, tok = early.load_early_exit_from_hub(REPO_ID)  # auto-picks CPU/MPS/CUDA & safe dtype
 
 # 4) Generate with early exit
 ids = early.generate_with_early_exit(
@@ -67,43 +67,3 @@ ids = early.generate_with_early_exit(
     max_new_tokens=64, temperature=0.7, top_p=0.9
 )
 print(tok.decode(ids[0], skip_special_tokens=True))
-
----
-language:
-- en
-tags:
-- early-exit
-- adapters
-- efficiency
-- inference
-- tinyllama
-- llama
-- text-generation
-license: apache-2.0
-base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
-library_name: transformers
-pipeline_tag: text-generation
-model-index:
-- name: tinyllama-1.1b-early-exit
-  results: []
----
-
-## Citation
-
-If you use this, please cite:
-
-```bibtex
-@misc{tinyllama_early_exit_2025,
-  title = {TinyLlama Early-Exit Heads (Adapter)},
-  author = {Sivateja (5ivatej)},
-  year = {2025},
-  url = {https://huggingface.co/5ivatej/tinyllama-1.1b-early-exit}
-}
-
-@misc{zhang2023tinyllama,
-  title = {TinyLlama: Open-Source Small Language Models},
-  author = {Zhang, et al.},
-  year = {2023},
-  howpublished = {\url{https://huggingface.co/TinyLlama}},
-  note = {Apache-2.0}
-}
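
For intuition, the exit rule the new TL;DR bullets describe (one small linear head per layer; stop as soon as the top token's probability clears `confidence_threshold`) can be sketched as below. This is a minimal sketch of the idea only; the function name, tensor shapes, and `heads` structure are assumptions, not the actual `early_exit_wrapper` internals.

```python
import torch.nn.functional as F

def early_exit_logits(hidden_states, heads, confidence_threshold=0.9):
    """Confidence-gated early exit (sketch, not the repo's real code).

    hidden_states: per-layer hidden states, each [batch, seq, d_model]
    heads: one small nn.Linear(d_model, vocab_size) per layer
    """
    for layer_idx, (h, head) in enumerate(zip(hidden_states, heads)):
        logits = head(h[:, -1, :])                  # logits for the last position
        max_prob = F.softmax(logits, dim=-1).max()  # confidence of the top token
        if max_prob >= confidence_threshold:        # confident enough: exit here
            return logits, layer_idx
    return logits, len(heads) - 1                   # no early exit: use final layer
```

A decoding loop would call this once per generated token, sampling from whichever layer's logits triggered the exit and falling back to the final layer when no head is confident.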
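
The updated `# auto-picks CPU/MPS/CUDA & safe dtype` comment suggests a loader heuristic along these lines. The actual selection logic inside `load_early_exit_from_hub` is not shown in this diff, so treat this as an assumption:

```python
import torch

def pick_device_and_dtype():
    # Assumed heuristic behind "auto-picks CPU/MPS/CUDA & safe dtype";
    # the repo's loader may choose differently.
    if torch.cuda.is_available():
        return torch.device("cuda"), torch.float16  # fp16 is safe on CUDA
    if torch.backends.mps.is_available():
        return torch.device("mps"), torch.float16   # fp16 also works on Apple MPS
    return torch.device("cpu"), torch.float32       # stay in fp32 on CPU
```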