# llama2-wrapper

- Use [llama2-wrapper](https://pypi.org/project/llama2-wrapper/) as your local llama2 backend for Generative Agents/Apps, [colab example](https://github.com/liltom-eth/llama2-webui/blob/main/colab/Llama_2_7b_Chat_GPTQ.ipynb).
- [Run OpenAI Compatible API](https://github.com/liltom-eth/llama2-webui#start-openai-compatible-api) on Llama2 models.

## Features

- Supporting models: [Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)/[13b](https://huggingface.co/llamaste/Llama-2-13b-chat-hf)/[70b](https://huggingface.co/llamaste/Llama-2-70b-chat-hf), [Llama-2-GPTQ](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ), [Llama-2-GGML](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML), [CodeLlama](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ)...
- Supporting model backends: [transformers](https://github.com/huggingface/transformers), [bitsandbytes (8-bit inference)](https://github.com/TimDettmers/bitsandbytes), [AutoGPTQ (4-bit inference)](https://github.com/PanQiWei/AutoGPTQ), [llama.cpp](https://github.com/ggerganov/llama.cpp)
- Demos: [Run Llama2 on MacBook Air](https://twitter.com/liltom_eth/status/1682791729207070720?s=20); [Run Llama2 on Colab T4 GPU](https://github.com/liltom-eth/llama2-webui/blob/main/colab/Llama_2_7b_Chat_GPTQ.ipynb)
- Use [llama2-wrapper](https://pypi.org/project/llama2-wrapper/) as your local llama2 backend for Generative Agents/Apps; [colab example](./colab/Llama_2_7b_Chat_GPTQ.ipynb).
- [Run OpenAI Compatible API](https://github.com/liltom-eth/llama2-webui#start-openai-compatible-api) on Llama2 models.
- [News](https://github.com/liltom-eth/llama2-webui/blob/main/docs/news.md), [Benchmark](https://github.com/liltom-eth/llama2-webui/blob/main/docs/performance.md), [Issue Solutions](https://github.com/liltom-eth/llama2-webui/blob/main/docs/issues.md)

[llama2-wrapper](https://pypi.org/project/llama2-wrapper/) is the backend of, and part of, [llama2-webui](https://github.com/liltom-eth/llama2-webui), which can run any Llama 2 model locally with a gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac).

| ## Install | |
| ```bash | |
| pip install llama2-wrapper | |
| ``` | |
| ## Start OpenAI Compatible API | |
| ``` | |
| python -m llama2_wrapper.server | |
| ``` | |
By default, it will use `llama.cpp` as the backend and run the `llama-2-7b-chat.ggmlv3.q4_0.bin` model.
Start the FastAPI server for the `gptq` backend:
```bash
| python -m llama2_wrapper.server --backend_type gptq | |
| ``` | |
| Navigate to http://localhost:8000/docs to see the OpenAPI documentation. | |
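Once the server is running, any HTTP client can call the OpenAI-compatible routes. Below is a minimal sketch using `requests` against `/v1/completions`; the request and response fields follow the standard OpenAI completion schema and are assumptions here, so check the `/docs` page above for the server's actual schema.
```python
import requests

# Assumes the server started above is listening on http://localhost:8000.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "Hi do you know Pytorch?",
        "max_tokens": 128,
        "temperature": 0.9,
    },
)
# OpenAI-style completion responses carry the generated text in choices[0].
print(resp.json()["choices"][0]["text"])
```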
| ## API Usage | |
| ### `__call__` | |
| `__call__()` is the function to generate text from a prompt. | |
For example, run a ggml llama2 model on CPU ([colab example](https://github.com/liltom-eth/llama2-webui/blob/main/colab/ggmlv3_q4_0.ipynb)):
| ```python | |
| from llama2_wrapper import LLAMA2_WRAPPER, get_prompt | |
| llama2_wrapper = LLAMA2_WRAPPER() | |
| # Default running on backend llama.cpp. | |
| # Automatically downloading model to: ./models/llama-2-7b-chat.ggmlv3.q4_0.bin | |
| prompt = "Do you know Pytorch" | |
| # llama2_wrapper() will run __call__() | |
| answer = llama2_wrapper(get_prompt(prompt), temperature=0.9) | |
| ``` | |
Run a gptq llama2 model on an Nvidia GPU ([colab example](https://github.com/liltom-eth/llama2-webui/blob/main/colab/Llama_2_7b_Chat_GPTQ.ipynb)):
| ```python | |
| from llama2_wrapper import LLAMA2_WRAPPER | |
| llama2_wrapper = LLAMA2_WRAPPER(backend_type="gptq") | |
| # Automatically downloading model to: ./models/Llama-2-7b-Chat-GPTQ | |
| ``` | |
Run Llama2 7b with bitsandbytes 8-bit inference by passing a `model_path`:
| ```python | |
| from llama2_wrapper import LLAMA2_WRAPPER | |
llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/Llama-2-7b-chat-hf",
    backend_type="transformers",
    load_in_8bit=True,
)
| ``` | |
| ### completion | |
| `completion()` is the function to generate text from a prompt for OpenAI compatible API `/v1/completions`. | |
| ```python | |
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
| prompt = get_prompt("Hi do you know Pytorch?") | |
print(llama2_wrapper.completion(prompt))
| ``` | |
| ### chat_completion | |
| `chat_completion()` is the function to generate text from a dialog (chat history) for OpenAI compatible API `/v1/chat/completions`. | |
| ```python | |
from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER()
| dialog = [ | |
| { | |
| "role":"system", | |
| "content":"You are a helpful, respectful and honest assistant. " | |
| },{ | |
| "role":"user", | |
| "content":"Hi do you know Pytorch?", | |
| }, | |
| ] | |
print(llama2_wrapper.chat_completion(dialog))
| ``` | |
| ### generate | |
`generate()` is the function to create a generator of responses from a prompt.
This is useful when you want to stream the output, as when typing in a chatbot.
| ```python | |
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER()
| prompt = get_prompt("Hi do you know Pytorch?") | |
| for response in llama2_wrapper.generate(prompt): | |
| print(response) | |
| ``` | |
| The response will be like: | |
| ``` | |
| Yes, | |
| Yes, I'm | |
| Yes, I'm familiar | |
| Yes, I'm familiar with | |
| Yes, I'm familiar with PyTorch! | |
| ... | |
| ``` | |
| ### run | |
`run()` is similar to `generate()`, but `run()` can also accept `chat_history` and `system_prompt` from the users.
It will process the input message into the llama2 prompt template with `chat_history` and `system_prompt` for a chatbot-like app (see the sketch below).
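A minimal usage sketch, assuming `run()` streams partial responses like `generate()` and that `chat_history` is a list of `(user, assistant)` message pairs; the parameter names here are assumptions based on the description above, so check the `llama2_wrapper` source for the exact signature:
```python
from llama2_wrapper import LLAMA2_WRAPPER

llama2_wrapper = LLAMA2_WRAPPER()

# Assumed format: chat_history as (user_message, assistant_reply) pairs.
chat_history = [("Hi do you know Pytorch?", "Yes, I'm familiar with PyTorch!")]
system_prompt = "You are a helpful, respectful and honest assistant."

# run() is assumed to yield partial responses, like generate(), so the
# output can be streamed into a chatbot UI.
for response in llama2_wrapper.run(
    "Can you show me a short example?",
    chat_history=chat_history,
    system_prompt=system_prompt,
):
    print(response)
```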
| ### get_prompt | |
`get_prompt()` will process the input message into the llama2 prompt with `chat_history` and `system_prompt` for a chatbot.
By default, `chat_history` and `system_prompt` are empty, and `get_prompt()` will add the llama2 prompt template to your message:
| ```python | |
| prompt = get_prompt("Hi do you know Pytorch?") | |
| ``` | |
The prompt will be:
| ``` | |
| [INST] <<SYS>> | |
| <</SYS>> | |
| Hi do you know Pytorch? [/INST] | |
| ``` | |
If you use `get_prompt("Hi do you know Pytorch?", system_prompt="You are a helpful...")`, the prompt will be:
| ``` | |
| [INST] <<SYS>> | |
| You are a helpful, respectful and honest assistant. | |
| <</SYS>> | |
| Hi do you know Pytorch? [/INST] | |
| ``` | |
| ### get_prompt_for_dialog | |
`get_prompt_for_dialog()` will process a dialog (chat history) into a llama2 prompt for the OpenAI compatible API `/v1/chat/completions`.
| ```python | |
from llama2_wrapper import get_prompt_for_dialog

dialog = [
| { | |
| "role":"system", | |
| "content":"You are a helpful, respectful and honest assistant. " | |
| },{ | |
| "role":"user", | |
| "content":"Hi do you know Pytorch?", | |
| }, | |
| ] | |
prompt = get_prompt_for_dialog(dialog)
| # [INST] <<SYS>> | |
| # You are a helpful, respectful and honest assistant. | |
| # <</SYS>> | |
| # | |
| # Hi do you know Pytorch? [/INST] | |
| ``` | |