octo-net-gguf / README.md

Shu

Update README.md

528c40a verified 6 months ago

5.95 kB

	---
	language:
	- en
	license: cc-by-nc-4.0
	model_name: Octopus-V4-GGUF
	base_model: NexaAIDev/Octopus-v4
	inference: false
	model_creator: NexaAIDev
	quantized_by: Nexa AI, Inc.
	tags:
	- function calling
	- on-device language model
	- gguf
	- llama cpp
	---
	# Octopus V4-GGUF: Graph of language models


	<p align="center">
	- <a href="https://huggingface.co/NexaAIDev/Octopus-v4" target="_blank">Original Model</a>
	- <a href="https://www.nexa4ai.com/" target="_blank">Nexa AI Website</a>
	- <a href="https://github.com/NexaAI/octopus-v4" target="_blank">Octopus-v4 Github</a>
	- <a href="https://arxiv.org/abs/2404.19296" target="_blank">ArXiv</a>
	- <a href="https://huggingface.co/spaces/NexaAIDev/domain_llm_leaderboard" target="_blank">Domain LLM Leaderbaord</a>
	</p>

	<p align="center" width="100%">
	<a><img src="octopus-v4-logo.png" alt="nexa-octopus" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
	</p>

	Acknowledgement:
	We sincerely thank our community members, [Mingyuan](https://huggingface.co/ThunderBeee) and [Zoey](https://huggingface.co/ZY6), for their extraordinary contributions to this quantization effort. Please explore [Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4) for our original huggingface model.

	## (Recommended) Run with [llama.cpp](https://github.com/ggerganov/llama.cpp)

	1. Clone and compile:

	```bash
	git clone https://github.com/ggerganov/llama.cpp
	cd llama.cpp
	# Compile the source code:
	make
	```

	2. Prepare the Input Prompt File:

	Navigate to the `prompt` folder inside the `llama.cpp`, and create a new file named `chat-with-octopus.txt`.

	`chat-with-octopus.txt`:

	```bash
	User:
	```

	3. Execute the Model:

	Run the following command in the terminal:

	```bash
	./main -m ./path/to/octopus-v4-Q4_K_M.gguf -c 512 -b 2048 -n 256 -t 1 --repeat_penalty 1.0 --top_k 0 --top_p 1.0 --color -i -r "User:" -f prompts/chat-with-octopus.txt
	```

	Example prompt to interact
	```bash
	<\|system\|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<\|end\|><\|user\|>Tell me the result of derivative of x^3 when x is 2?<\|end\|><\|assistant\|>
	```

	## Run with [Ollama](https://github.com/ollama/ollama)
	1. Create a `Modelfile` in your directory and include a `FROM` statement with the path to your local model:
	```bash
	FROM ./path/to/octopus-v4-Q4_K_M.gguf
	```

	2. Use the following command to add the model to Ollama:
	```bash
	ollama create octopus-v4-Q4_K_M -f Modelfile
	PARAMETER temperature 0
	PARAMETER num_ctx 1024
	PARAMETER stop <nexa_end>
	```

	3. Verify that the model has been successfully imported:
	```bash
	ollama ls
	```

	### Run the model
	```bash
	ollama run octopus-v4-Q4_K_M "<\|system\|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<\|end\|><\|user\|>Tell me the result of derivative of x^3 when x is 2?<\|end\|><\|assistant\|>"
	```

	### Dataset and Benchmark

	* Utilized questions from [MMLU](https://github.com/hendrycks/test) to evaluate the performances.
	* Evaluated with the Ollama [llm-benchmark](https://github.com/MinhNgyuen/llm-benchmark) method.


	## Quantized GGUF Models

	\| Name \| Quant method \| Bits \| Size \| Respons (token/second) \| Use Cases \|
	\| ---------------------- \| ------------ \| ---- \| ------- \| ---------------------- \| ----------------------------------------- \|
	\| Octopus-v4.gguf \| \| \| 7.64 GB \| 27.64 \| extremely large \|
	\| Octopus-v4-Q2_K.gguf \| Q2_K \| 2 \| 1.42 GB \| 54.20 \| extremely not recommended, high loss \|
	\| Octopus-v4-Q3_K.gguf \| Q3_K \| 3 \| 1.96 GB \| 51.22 \| not recommended \|
	\| Octopus-v4-Q3_K_S.gguf \| Q3_K_S \| 3 \| 1.68 GB \| 51.78 \| not very recommended \|
	\| Octopus-v4-Q3_K_M.gguf \| Q3_K_M \| 3 \| 1.96 GB \| 50.86 \| not very recommended \|
	\| Octopus-v4-Q3_K_L.gguf \| Q3_K_L \| 3 \| 2.09 GB \| 50.05 \| not very recommended \|
	\| Octopus-v4-Q4_0.gguf \| Q4_0 \| 4 \| 2.18 GB \| 65.76 \| good quality, recommended \|
	\| Octopus-v4-Q4_1.gguf \| Q4_1 \| 4 \| 2.41 GB \| 69.01 \| slow, good quality, recommended \|
	\| Octopus-v4-Q4_K.gguf \| Q4_K \| 4 \| 2.39 GB \| 55.76 \| slow, good quality, recommended \|
	\| Octopus-v4-Q4_K_S.gguf \| Q4_K_S \| 4 \| 2.19 GB \| 53.98 \| high quality, recommended \|
	\| Octopus-v4-Q4_K_M.gguf \| Q4_K_M \| 4 \| 2.39 GB \| 58.39 \| some functions loss, not very recommended \|
	\| Octopus-v4-Q5_0.gguf \| Q5_0 \| 5 \| 2.64 GB \| 61.98 \| slow, good quality \|
	\| Octopus-v4-Q5_1.gguf \| Q5_1 \| 5 \| 2.87 GB \| 63.44 \| slow, good quality \|
	\| Octopus-v4-Q5_K.gguf \| Q5_K \| 5 \| 2.82 GB \| 58.28 \| moderate speed, recommended \|
	\| Octopus-v4-Q5_K_S.gguf \| Q5_K_S \| 5 \| 2.64 GB \| 59.95 \| moderate speed, recommended \|
	\| Octopus-v4-Q5_K_M.gguf \| Q5_K_M \| 5 \| 2.82 GB \| 53.31 \| fast, good quality, recommended \|
	\| Octopus-v4-Q6_K.gguf \| Q6_K \| 6 \| 3.14 GB \| 52.15 \| large, not very recommended \|
	\| Octopus-v4-Q8_0.gguf \| Q8_0 \| 8 \| 4.06 GB \| 50.10 \| very large, good quality \|
	\| Octopus-v4-f16.gguf \| f16 \| 16 \| 7.64 GB \| 30.61 \| extremely large \|

	_Quantized with llama.cpp_