Upload README.md with huggingface_hub
README.md CHANGED
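For context, a commit with this message is normally produced by the `huggingface_hub` upload API rather than a manual web upload. A minimal sketch of such an upload, where the repo id is a placeholder and authentication via `huggingface-cli login` is assumed:

```python
from huggingface_hub import HfApi

api = HfApi()  # uses the token cached by `huggingface-cli login` by default
api.upload_file(
    path_or_fileobj="README.md",                                   # local file to upload
    path_in_repo="README.md",                                      # destination path in the repo
    repo_id="<namespace>/refuelai-Llama-3-Refueled-GGUF-smashed",  # placeholder repo id
    commit_message="Upload README.md with huggingface_hub",
)
```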
@@ -32,7 +32,7 @@ tags:
 - Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
 - Request access to easily compress your *own* AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
 - Read the documentation to know more [here](https://pruna-ai-pruna.readthedocs-hosted.com/en/latest/)
-- Join Pruna AI community on Discord [here](https://discord.
+- Join the Pruna AI community on Discord [here](https://discord.gg/rskEr4BZJx) to share feedback/suggestions or get help.
 
 **Frequently Asked Questions**
 - ***How does the compression work?*** The model is compressed with GGUF.
@@ -73,7 +73,7 @@ The following clients/libraries will automatically download models for you, prov
 * Faraday.dev
 
 - **Option A** - Downloading in `text-generation-webui`:
-- **Step 1**: Under Download Model, you can enter the model repo:
+- **Step 1**: Under Download Model, you can enter the model repo: refuelai-Llama-3-Refueled-GGUF-smashed and below it, a specific filename to download, such as: Llama-3-Refueled.IQ3_M.gguf.
 - **Step 2**: Then click Download.
 
 - **Option B** - Downloading on the command line (including multiple files at once):
@@ -83,14 +83,14 @@ pip3 install huggingface-hub
 ```
 - **Step 2**: Then you can download any individual model file to the current directory, at high speed, with a command like this:
 ```shell
-huggingface-cli download
+huggingface-cli download refuelai-Llama-3-Refueled-GGUF-smashed Llama-3-Refueled.IQ3_M.gguf --local-dir . --local-dir-use-symlinks False
 ```
 <details>
 <summary>More advanced huggingface-cli download usage (click to read)</summary>
 Alternatively, you can also download multiple files at once with a pattern:
 
 ```shell
-huggingface-cli download
+huggingface-cli download refuelai-Llama-3-Refueled-GGUF-smashed --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
 ```
 
 For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).
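For reference, the same downloads can be scripted from Python with `huggingface_hub`. This is a sketch that assumes the repo id and filename exactly as written in the commands above (check the full repo id, including namespace, on the Hub before running it):

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Fetch a single quant file into the current directory.
hf_hub_download(
    repo_id="refuelai-Llama-3-Refueled-GGUF-smashed",
    filename="Llama-3-Refueled.IQ3_M.gguf",
    local_dir=".",
)

# Or mirror the --include pattern and grab every Q4_K variant at once.
snapshot_download(
    repo_id="refuelai-Llama-3-Refueled-GGUF-smashed",
    allow_patterns=["*Q4_K*gguf"],
    local_dir=".",
)
```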
@@ -104,7 +104,7 @@ pip3 install hf_transfer
 And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
 
 ```shell
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download
+HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download refuelai-Llama-3-Refueled-GGUF-smashed Llama-3-Refueled.IQ3_M.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
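The `hf_transfer` speed-up can also be enabled from Python. One detail worth noting: `huggingface_hub` reads `HF_HUB_ENABLE_HF_TRANSFER` when it is imported, so the variable has to be set first. A minimal sketch:

```python
import os

# Set the flag before importing huggingface_hub, which reads it at import time.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import hf_hub_download  # import after setting the flag

# Same download as in the previous sketch, now routed through hf_transfer.
hf_hub_download(
    repo_id="refuelai-Llama-3-Refueled-GGUF-smashed",
    filename="Llama-3-Refueled.IQ3_M.gguf",
    local_dir=".",
)
```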
@@ -119,7 +119,7 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 35 -m Llama-3-Refueled.IQ3_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<s>[INST] {prompt\} [/INST]"
+./main -ngl 35 -m Llama-3-Refueled.IQ3_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<s>[INST] {{prompt\}} [/INST]"
 ```
 
 Change `-ngl 35` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -180,7 +180,7 @@ You can use GGUF models from Python using the [llama-cpp-python](https://github.
 
 # Simple inference example
 output = llm(
-  "<s>[INST] {prompt} [/INST]", # Prompt
+  "<s>[INST] {{prompt}} [/INST]", # Prompt
   max_tokens=512,  # Generate up to 512 tokens
   stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
   echo=True        # Whether to echo the prompt
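In both the old and new versions of this line, `{prompt}` / `{{prompt}}` is a template placeholder to replace with the actual prompt text. For context, the changed line sits inside a script roughly like the following sketch; the constructor arguments `n_ctx` and `n_gpu_layers` are illustrative assumptions chosen to match the `-c 32768` / `-ngl 35` values in the llama.cpp command above, not values stated in this card:

```python
from llama_cpp import Llama

# Illustrative settings; lower n_ctx to save RAM, set n_gpu_layers=0 without GPU acceleration.
llm = Llama(
    model_path="./Llama-3-Refueled.IQ3_M.gguf",
    n_ctx=32768,
    n_gpu_layers=35,
)

output = llm(
    "<s>[INST] Write a one-line greeting. [/INST]",  # your prompt goes here
    max_tokens=512,
    stop=["</s>"],  # verify the correct stop token for this model before relying on it
    echo=True,
)
print(output["choices"][0]["text"])
```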
@@ -191,11 +191,11 @@ You can use GGUF models from Python using the [llama-cpp-python](https://github.
 llm = Llama(model_path="./Llama-3-Refueled.IQ3_M.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
 llm.create_chat_completion(
     messages = [
-        {"role": "system", "content": "You are a story writing assistant."},
-        {
+        {{"role": "system", "content": "You are a story writing assistant."}},
+        {{
            "role": "user",
            "content": "Write a story about llamas."
-        }
+        }}
     ]
 )
 ```
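Two hedged notes on the hunk above. First, the doubled braces introduced on the `+` lines look like template-escaping leftovers; as Python they would not form plain dicts, so the messages are written with single braces below. Second, the example keeps `chat_format="llama-2"` although the base model is Llama 3; recent llama-cpp-python releases also register a `llama-3` chat format, which may be the better match, but verify it exists in your installed version. A minimal sketch under those assumptions:

```python
from llama_cpp import Llama

# Assumption: the installed llama-cpp-python registers the "llama-3" chat format.
llm = Llama(model_path="./Llama-3-Refueled.IQ3_M.gguf", chat_format="llama-3")

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a story writing assistant."},
        {"role": "user", "content": "Write a story about llamas."},
    ]
)
print(response["choices"][0]["message"]["content"])
```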
@@ -218,4 +218,4 @@ The license of the smashed model follows the license of the original model. Plea
 ## Want to compress other models?
 
 - Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
-- Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
+- Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).