Commit 2e54d9c (parent: ab6ce8a): Update README.md

README.md (CHANGED)
@@ -56,6 +56,122 @@ The new methods available are:

* GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw

</details>

## How to download GGUF files

**Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.

The following clients/libraries will automatically download models for you, providing a list of available models to choose from:

- LM Studio
- LoLLMS Web UI
- Faraday.dev

### In `text-generation-webui`

Under Download Model, you can enter the model repo: Ichsan2895/Merak-7B-v3-GGUF and below it, a specific filename to download, such as: Merak-7B-v3.Q4_K_M.gguf.

Then click Download.

### On the command line, including multiple files at once

I recommend using the `huggingface-hub` Python library:

```shell
pip3 install huggingface-hub
```

Then you can download any individual model file to the current directory, at high speed, with a command like this:

```shell
huggingface-cli download Ichsan2895/Merak-7B-v3-GGUF Merak-7B-v3.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
```
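
If you prefer to stay in Python, the same `huggingface-hub` library exposes `hf_hub_download`. A minimal, illustrative sketch (not part of the original instructions; it reuses the repo and file name from the CLI example above):

```python
# Minimal sketch: fetch a single GGUF file with the huggingface_hub Python API.
# The repo_id and filename mirror the huggingface-cli example above; pick the quant you want.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="Ichsan2895/Merak-7B-v3-GGUF",
    filename="Merak-7B-v3.Q4_K_M.gguf",
    local_dir=".",
)
print(f"Downloaded to {local_path}")
```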

<details>
<summary>More advanced huggingface-cli download usage</summary>

You can also download multiple files at once with a pattern:

```shell
huggingface-cli download Ichsan2895/Merak-7B-v3-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
```

For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).

To accelerate downloads on fast connections (1Gbit/s or higher), install `hf_transfer`:

```shell
pip3 install hf_transfer
```

And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:

```shell
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Ichsan2895/Merak-7B-v3-GGUF Merak-7B-v3.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
```

Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
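
If you download from Python rather than with `huggingface-cli`, the same speed-up applies; the variable just needs to be set before `huggingface_hub` is imported. A minimal, illustrative sketch:

```python
# Illustrative sketch: enable hf_transfer for Python downloads.
# The variable is read when huggingface_hub is imported, so set it first.
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="Ichsan2895/Merak-7B-v3-GGUF",
    filename="Merak-7B-v3.Q4_K_M.gguf",
    local_dir=".",
)
```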
</details>
<!-- README_GGUF.md-how-to-download end -->

<!-- README_GGUF.md-how-to-run start -->
## Example `llama.cpp` command

Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.

```shell
./main -ngl 32 -m Merak-7B-v3.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"
```

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

Change `-c 2048` to the desired sequence length. For extended sequence models - e.g. 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.

If you want a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.

For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).

## How to run in `text-generation-webui`

Further instructions are here: [text-generation-webui/docs/llama.cpp.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp.md).

## How to run from Python code

You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries. The ctransformers route is shown below, followed by a short llama-cpp-python sketch.

### How to load this model in Python code, using ctransformers

#### First install the package

Run one of the following commands, according to your system:

```shell
# Base ctransformers with no GPU acceleration
pip install ctransformers
# Or with CUDA GPU acceleration
pip install ctransformers[cuda]
# Or with AMD ROCm GPU acceleration (Linux only)
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
# Or with Metal GPU acceleration for macOS systems only
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```

#### Simple ctransformers example code

```python
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("Ichsan2895/Merak-7B-v3-GGUF", model_file="Merak-7B-v3-model-q4_k_m.gguf", model_type="mistral", gpu_layers=50)

print(llm("AI is going to"))
```
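
The same file also loads with llama-cpp-python, linked above. The following is a minimal, illustrative sketch (not from the original README): it assumes `pip install llama-cpp-python`, that `Merak-7B-v3.Q4_K_M.gguf` has been downloaded to the current directory, and it reuses the ChatML-style prompt and sampling settings from the `llama.cpp` command earlier; the system message and question are only examples.

```python
# Illustrative llama-cpp-python sketch; prompt format and settings mirror
# the llama.cpp command shown earlier in this README.
from llama_cpp import Llama

llm = Llama(
    model_path="Merak-7B-v3.Q4_K_M.gguf",
    n_ctx=2048,       # sequence length, as in the -c 2048 example above
    n_gpu_layers=32,  # set to 0 if you have no GPU acceleration
)

prompt = (
    "<|im_start|>system\nAnda adalah asisten yang membantu.<|im_end|>\n"  # example system message
    "<|im_start|>user\nApa ibu kota Indonesia?<|im_end|>\n"               # example question
    "<|im_start|>assistant"
)

output = llm(prompt, max_tokens=256, temperature=0.7, repeat_penalty=1.1, stop=["<|im_end|>"])
print(output["choices"][0]["text"])
```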

## How to use with LangChain

Here are guides on using llama-cpp-python and ctransformers with LangChain; an illustrative sketch follows the list:

* [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
* [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)

## CHANGELOG

**v3** = Fine-tuned on [Ichsan2895/OASST_Top1_Indonesian](https://huggingface.co/datasets/Ichsan2895/OASST_Top1_Indonesian) & [Ichsan2895/alpaca-gpt4-indonesian](https://huggingface.co/datasets/Ichsan2895/alpaca-gpt4-indonesian)

**v2** = Fine-tuned version of the first Merak-7B model. We fine-tuned it again on the same Indonesian Wikipedia articles (about 600k of them), changing only the prompt style used in the questions.

@@ -94,4 +210,6 @@ The new methods available are:

journal = {arXiv preprint arXiv:2305.14314},
year = {2023}
}
```

Special thanks to TheBloke for his README.md, which we adapted for this model.