zackli4ai and ZoeyShu committed
Commit: 6c2c795
Parent: f939989

Update README.md (#9)


- Update README.md (2204e44dac5929245fe2aedcaecdddffd892846d)


Co-authored-by: Zoey Shu <[email protected]>

Files changed (1):
1. README.md (+35, -21)
README.md CHANGED
@@ -31,7 +31,14 @@ tags:
  **Acknowledgement**:
  We sincerely thank our community members, [Mingyuan](https://huggingface.co/ThunderBeee) and [Zoey](https://huggingface.co/ZY6), for their extraordinary contributions to this quantization effort. Please explore [Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4) for our original huggingface model.

- ## (Recommended) Run with [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## Get Started
+ To run the models, please download them to your local machine using either `git clone` or the [Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/en/guides/download):
+ ```
+ git clone https://huggingface.co/NexaAIDev/octopus-v4-gguf
+ ```
+
+ ## Run with [llama.cpp](https://github.com/ggerganov/llama.cpp) (Recommended)

  1. **Clone and compile:**

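For the Hugging Face Hub route mentioned in the Get Started hunk above, a minimal single-file download sketch; it assumes the `huggingface_hub` CLI is installed (`pip install huggingface_hub`) and that the file name matches the quantization table further down:

```bash
# Sketch: fetch one GGUF quant instead of cloning the whole repo
# (assumes huggingface-cli is installed and that this file name exists in the repo)
huggingface-cli download NexaAIDev/octopus-v4-gguf Octopus-v4-Q4_K_M.gguf --local-dir .
```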
@@ -42,49 +49,56 @@ cd llama.cpp
  make
  ```

- 2. **Prepare the Input Prompt File:**
-
- Navigate to the `prompt` folder inside the `llama.cpp`, and create a new file named `chat-with-octopus.txt`.
+ 2. **Execute the Model:**

- `chat-with-octopus.txt`:
+ Run the following command in the terminal:

  ```bash
- User:
+ ./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
  ```
-
- 3. **Execute the Model:**

- Run the following command in the terminal:
+ ## Run with [Ollama](https://github.com/ollama/ollama)
+
+ Since our models have not been uploaded to the Ollama server, please download the models and manually import them into Ollama by following these steps:
+
+ 1. Install Ollama on your local machine. You can also follow the guide from the [Ollama GitHub repository](https://github.com/ollama/ollama/blob/main/docs/import.md):

  ```bash
- ./main -m ./path/to/octopus-v4-Q4_K_M.gguf -c 512 -b 2048 -n 256 -t 1 --repeat_penalty 1.0 --top_k 0 --top_p 1.0 --color -i -r "User:" -f prompts/chat-with-octopus.txt
+ git clone https://github.com/ollama/ollama.git ollama
  ```

- Example prompt to interact
+ 2. Navigate into the local Ollama directory:
  ```bash
- <|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>
+ cd ollama
  ```

- ## Run with [Ollama](https://github.com/ollama/ollama)
- 1. Create a `Modelfile` in your directory and include a `FROM` statement with the path to your local model:
+ 3. Create a `Modelfile` in your directory:
  ```bash
- FROM ./path/to/octopus-v4-Q4_K_M.gguf
- ```
+ touch Modelfile
+ ```
+
+ 4. In the `Modelfile`, include a `FROM` statement with the path to your local model, along with the default parameters:

- 2. Use the following command to add the model to Ollama:
  ```bash
- ollama create octopus-v4-Q4_K_M -f Modelfile
+ FROM ./path/to/octopus-v4-Q4_K_M.gguf
  PARAMETER temperature 0
  PARAMETER num_ctx 1024
  PARAMETER stop <nexa_end>
  ```

- 3. Verify that the model has been successfully imported:
+ 5. Use the following command to add the model to Ollama:
+
+ ```bash
+ ollama create octopus-v4-Q4_K_M -f Modelfile
+ ```
+
+ 6. Verify that the model has been successfully imported:
+
  ```bash
  ollama ls
  ```

- ### Run the model
+ 7. Run the model:
  ```bash
  ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
  ```
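The long `-p` prompt in the hunk above can be scripted so that only the user query varies between runs; a minimal sketch, assuming llama.cpp was built as in step 1 and that the model path is adjusted to your download location:

```bash
#!/usr/bin/env bash
# Sketch: wrap a user query in the Octopus v4 router prompt template
# (the MODEL path below is a placeholder to adjust, as in the README)
MODEL=./path/to/octopus-v4-Q4_K_M.gguf
QUERY="$1"
SYSTEM="You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function."
./main -m "$MODEL" -n 256 -p "<|system|>${SYSTEM}<|end|><|user|>${QUERY}<|end|><|assistant|>"
```

Saved as, say, `ask-octopus.sh` (the name is arbitrary) and marked executable, `./ask-octopus.sh "Tell me the result of derivative of x^3 when x is 2?"` reproduces the command shown above.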
@@ -119,4 +133,4 @@ ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query fro
  | Octopus-v4-Q8_0.gguf | Q8_0 | 8 | 4.06 GB | 50.10 | very large, good quality |
  | Octopus-v4-f16.gguf | f16 | 16 | 7.64 GB | 30.61 | extremely large |

- _Quantized with llama.cpp_
+ _Quantized with llama.cpp_
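Ollama steps 3–5 from the hunk above can also be collapsed into two commands by writing the `Modelfile` with a heredoc; this sketch reuses only values that appear in the diff, and the model path remains a placeholder to adjust:

```bash
# Sketch: write the Modelfile in one step (equivalent to steps 3-4), then import it
cat > Modelfile <<'EOF'
FROM ./path/to/octopus-v4-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
EOF
ollama create octopus-v4-Q4_K_M -f Modelfile
```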
 
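Once `ollama create` has imported the model, it can also be queried through Ollama's local REST API instead of `ollama run`; a sketch assuming `ollama serve` is running on its default port 11434, with the router prompt taken verbatim from the README:

```bash
# Sketch: query the imported model via Ollama's local HTTP API
# (assumes the default localhost:11434 endpoint and the model name created above)
curl http://localhost:11434/api/generate -d '{
  "model": "octopus-v4-Q4_K_M",
  "prompt": "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>",
  "stream": false
}'
```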