Tags: Text Generation · GGUF · Indonesian · English
Ichsan2895 committed
Commit 2e54d9c
1 Parent(s): ab6ce8a

Update README.md

Files changed (1): README.md +118 -0
README.md CHANGED
* GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw
</details>

## How to download GGUF files

**Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.

The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
- LM Studio
- LoLLMS Web UI
- Faraday.dev

### In `text-generation-webui`

Under Download Model, you can enter the model repo: Ichsan2895/Merak-7B-v3-GGUF and below it, a specific filename to download, such as: Merak-7B-v3.Q4_K_M.gguf.

Then click Download.

### On the command line, including multiple files at once

I recommend using the `huggingface-hub` Python library:

```shell
pip3 install huggingface-hub
```

Then you can download any individual model file to the current directory, at high speed, with a command like this:

```shell
huggingface-cli download Ichsan2895/Merak-7B-v3-GGUF Merak-7B-v3.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
```
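
If you would rather script the download than shell out to the CLI, the same library exposes `hf_hub_download`. A minimal sketch (the filename is just the Q4_K_M example from above; pick whichever quant you actually want):

```python
from huggingface_hub import hf_hub_download

# Download a single GGUF file from the repo into the current directory.
model_path = hf_hub_download(
    repo_id="Ichsan2895/Merak-7B-v3-GGUF",
    filename="Merak-7B-v3.Q4_K_M.gguf",
    local_dir=".",
)
print(model_path)  # local path of the downloaded file
```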

<details>
<summary>More advanced huggingface-cli download usage</summary>

You can also download multiple files at once with a pattern:

```shell
huggingface-cli download Ichsan2895/Merak-7B-v3-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
```
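
The Python equivalent of that pattern-based download, if you prefer to script it, is `snapshot_download` with `allow_patterns` (a sketch mirroring the `--include` filter above):

```python
from huggingface_hub import snapshot_download

# Fetch only the files whose names match the pattern, e.g. all Q4_K quants.
snapshot_download(
    repo_id="Ichsan2895/Merak-7B-v3-GGUF",
    local_dir=".",
    allow_patterns=["*Q4_K*gguf"],
)
```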

For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).

To accelerate downloads on fast connections (1Gbit/s or higher), install `hf_transfer`:

```shell
pip3 install hf_transfer
```

And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:

```shell
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Ichsan2895/Merak-7B-v3-GGUF Merak-7B-v3.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
```

Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
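
If you are downloading from Python instead, the same switch can be flipped via `os.environ`, as a sketch only (`hf_transfer` must already be installed):

```python
import os

# Enable the hf_transfer backend; set this before importing huggingface_hub.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="Ichsan2895/Merak-7B-v3-GGUF",
    filename="Merak-7B-v3.Q4_K_M.gguf",
    local_dir=".",
)
```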
</details>
<!-- README_GGUF.md-how-to-download end -->

<!-- README_GGUF.md-how-to-run start -->
## Example `llama.cpp` command

Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.

```shell
./main -ngl 32 -m Merak-7B-v3.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"
```

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

Change `-c 2048` to the desired sequence length. For extended sequence models - e.g. 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.

If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.

For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).

## How to run in `text-generation-webui`

Further instructions here: [text-generation-webui/docs/llama.cpp.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp.md).

## How to run from Python code

You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.

### How to load this model in Python code, using ctransformers

#### First install the package

Run one of the following commands, according to your system:

```shell
# Base ctransformers with no GPU acceleration
pip install ctransformers
# Or with CUDA GPU acceleration
pip install ctransformers[cuda]
# Or with AMD ROCm GPU acceleration (Linux only)
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
# Or with Metal GPU acceleration for macOS systems only
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```

#### Simple ctransformers example code

```python
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("Ichsan2895/Merak-7B-v3-GGUF", model_file="Merak-7B-v3-model-q4_k_m.gguf", model_type="mistral", gpu_layers=50)

print(llm("AI is going to"))
```
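
For completeness, here is a comparable llama-cpp-python sketch. The file path, system message, and question are placeholders; the prompt simply reuses the chat format shown in the `llama.cpp` command above:

```python
from llama_cpp import Llama

# Point model_path at the GGUF file you downloaded.
# n_gpu_layers=0 runs fully on CPU; raise it to offload layers to the GPU.
llm = Llama(model_path="Merak-7B-v3.Q4_K_M.gguf", n_gpu_layers=32, n_ctx=2048)

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSiapa penulis novel Laskar Pelangi?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

output = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(output["choices"][0]["text"])
```

The `stop` list keeps generation from running past the end of the assistant's turn.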

## How to use with LangChain

Here are guides on using llama-cpp-python and ctransformers with LangChain:

* [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
* [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)

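As a rough sketch of the ctransformers route (the import path moved to `langchain_community.llms` in newer LangChain releases, and the question is just a placeholder; model settings mirror the ctransformers example above):

```python
from langchain.llms import CTransformers  # newer releases: from langchain_community.llms import CTransformers

llm = CTransformers(
    model="Ichsan2895/Merak-7B-v3-GGUF",
    model_file="Merak-7B-v3-model-q4_k_m.gguf",
    model_type="mistral",
    config={"gpu_layers": 50},
)

print(llm("Sebutkan lima pulau terbesar di Indonesia."))
```
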
## CHANGELOG
**v3** = Fine-tuned on [Ichsan2895/OASST_Top1_Indonesian](https://huggingface.co/datasets/Ichsan2895/OASST_Top1_Indonesian) & [Ichsan2895/alpaca-gpt4-indonesian](https://huggingface.co/datasets/Ichsan2895/alpaca-gpt4-indonesian)
**v2** = Finetuned version of the first Merak-7B model. We finetuned it again on the same 600k Indonesian Wikipedia articles, but changed the prompt style used in the questions.

  journal = {arXiv preprint arXiv:2305.14314},
  year = {2023}
}

Special thanks to TheBloke for his README.md, which we adapted for this model
```