keitokei1994
/

shisa-v1-qwen2-7b-GGUF

Inference Endpoints

Model card Files Files and versions Community

shisa-v1-qwen2-7b-GGUF / README.md

keitokei1994's picture

Update README.md

c45ee6a verified 7 months ago

|

history blame contribute delete

1.73 kB

	---
	license: apache-2.0
	tags:
	- qwen
	language:
	- ja
	- en
	---
	# shisa-v1-qwen2-7b-gguf (English explanation is below.)
	[shisa-aiさんが公開しているshisa-v1-qwen2-7b](https://huggingface.co/shisa-ai/shisa-v1-qwen2-7b)のggufフォーマット変換版です。

	# Notice
	* 現在、qwen2-7B系列を基にしたモデルをGGUF形式で動かそうとすると、出力が壊れてしまうバグが出ています。Flash Attentionを有効化して動かすと回避できます。
	* LMStudioであれば、PresetからFlash Attentionを有効化してください。
	* Llama.cppであれば、以下の手順で対応してください:
	1. 以下のコマンドでビルドします:
	```
	make LLAMA_CUDA_FA_ALL_QUANTS=true GGML_CUDA=1
	```
	2. 以下のようなコマンドでFlashAttentionを有効化して実行します:
	```
	./llama-server -m ./models/shisa-v1-qwen2-7b.Q8_0.gguf -ngl 99 --port 8888 -fa
	```

	# shisa-v1-qwen2-7b-gguf
	This is a gguf format conversion of [shisa-v1-qwen2-7b](https://huggingface.co/shisa-ai/shisa-v1-qwen2-7b) published by shisa-ai.

	# Notice
	* Currently, there is a bug where the output gets corrupted when trying to run models based on the qwen2-7B series in GGUF format. This can be avoided by enabling Flash Attention.
	* If using LMStudio, please enable Flash Attention from the Preset.
	* If using Llama.cpp, please follow these steps:
	1. Build with the following command:
	```
	make LLAMA_CUDA_FA_ALL_QUANTS=true GGML_CUDA=1
	```
	2. Run with Flash Attention enabled using a command like this:
	```
	./llama-server -m ./models/shisa-v1-qwen2-7b.Q8_0.gguf -ngl 99 --port 8888 -fa
	```