---

### **System Requirements**

| **Precision** | **Total VRAM Usage** | **VRAM Per GPU (with 2 GPUs)** | **VRAM Per GPU (with 4 GPUs)** |
|---------------|----------------------|--------------------------------|--------------------------------|
| **FP32 (Full Precision)** | ~24GB | ~12GB | ~6GB |
| **FP16 (Half Precision)** | **~14GB** | **~7GB** | **~3.5GB** |
| **8-bit Quantization** | ~8GB | ~4GB | ~2GB |
| **4-bit Quantization** | ~4GB | ~2GB | ~1GB |

**Important Notes:**
- **Multi-GPU setups** distribute the model's memory usage across all available GPUs.
- Using **`device_map="auto"`** in `transformers` automatically balances memory across devices (see the sketch below).
- **Pre-quantized releases (8-bit, 4-bit)** are planned to lower VRAM requirements further; in the meantime, you can quantize at load time with **bitsandbytes** as shown in the next section.
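
For reference, here is a minimal sketch of half-precision, multi-GPU loading with `device_map="auto"` (illustrative only; actual per-GPU usage depends on your hardware and the number of visible GPUs):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "luvGPT/deepseek-uncensored-lore"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# FP16 roughly halves the weight memory relative to FP32 (~14GB total for this model);
# device_map="auto" lets accelerate spread the layers across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
```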

---

### **Loading the Model in 4-bit and 8-bit Quantization**
To reduce memory usage, you can load the model using **4-bit or 8-bit quantization** via **bitsandbytes**.

#### **Install Required Dependencies**
```bash
pip install transformers accelerate bitsandbytes
```

#### **Load Model in 8-bit Quantization**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "luvGPT/deepseek-uncensored-lore"

# Define quantization config for 8-bit loading
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model in 8-bit mode
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config
)
```
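
#### **Load Model in 4-bit Quantization**
4-bit loading follows the same pattern. The sketch below uses common bitsandbytes defaults (NF4 quantization with FP16 compute); these settings are illustrative rather than values published for this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "luvGPT/deepseek-uncensored-lore"

# Define quantization config for 4-bit loading
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumed default; NF4 is the usual choice
    bnb_4bit_compute_dtype=torch.float16, # assumed; run matmuls in FP16
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model in 4-bit mode
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config
)
```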

---

### **Future Work**
- **GGUF Format Support**: We plan to provide a **GGUF-quantized version** of this model, making it compatible with **llama.cpp** and other lightweight inference frameworks.
- **Fine-tuning & Alignment**: Exploring reinforcement learning and user feedback loops to improve storytelling accuracy and coherence.
- **Optimized Inference**: Integrating FlashAttention and Triton optimizations for even faster performance.

## Limitations
- **Bias**: Outputs may reflect biases present in the original DeepSeek model or training dataset.
- **Context Length**: Limited to 1,000 tokens per sequence (see the prompt-truncation sketch below).
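
Because generation is capped at 1,000 tokens per sequence, long prompts should be truncated before generation. A minimal sketch, assuming the `tokenizer` and `model` from the loading examples above (the prompt text is purely illustrative):

```python
prompt = "Write a short piece of lore about a haunted lighthouse."  # hypothetical prompt

# Cap the prompt at the model's 1,000-token sequence limit; in practice you may
# want a smaller max_length to leave room for the tokens you plan to generate.
inputs = tokenizer(
    prompt,
    truncation=True,
    max_length=1000,
    return_tensors="pt",
).to(model.device)
```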