ruslanmv committed
Commit bff8aa5 · verified · 1 parent: a71e994

Update README.md

Files changed (1):
  README.md +108 -8
README.md CHANGED
@@ -1,23 +1,123 @@
  ---
  base_model: ibm-granite/granite-3.1-2b-instruct
  tags:
- - text-generation-inference
  - transformers
- - unsloth
  - granite
  - trl
  - grpo
  license: apache-2.0
  language:
  - en
  ---

- # Uploaded model

- - **Developed by:** ruslanmv
- - **License:** apache-2.0
- - **Finetuned from model :** ibm-granite/granite-3.1-2b-instruct

- This granite model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
  base_model: ibm-granite/granite-3.1-2b-instruct
  tags:
+ - text-generation
  - transformers
+ - safetensors
+ - english
  - granite
+ - text-generation-inference
  - trl
  - grpo
+ - conversational
+ - inference-endpoints
+ - 4-bit precision
+ - bitsandbytes
  license: apache-2.0
  language:
  - en
  ---

+ # Granite-3.1-2B-Reasoning-4bit (Quantized for Efficiency)
+
+ ## Model Overview
+
+ This is a **4-bit quantized version** of **ruslanmv/granite-3.1-2b-Reasoning**, which is fine-tuned from **ibm-granite/granite-3.1-2b-instruct**. Quantization significantly reduces memory usage (see the rough estimate after the list below) while maintaining strong reasoning capabilities.
+
+ - **Developed by:** [ruslanmv](https://huggingface.co/ruslanmv)
+ - **License:** Apache 2.0
+ - **Base Model:** [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct)
+ - **Fine-tuned for:** Logical reasoning, structured problem-solving, long-context tasks
+ - **Quantized with:** **bitsandbytes (4-bit precision)**
+ - **Supported Languages:** English
+ - **Tensor Type:** **BF16**
+ - **Parameter Size:** **2.53B**
+
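+ As a rough back-of-the-envelope (an illustrative estimate of weight storage only, ignoring quantization constants and activation memory):
+
+ ```python
+ params = 2.53e9  # parameter count reported on this card
+
+ # Approximate weight footprint in GB at each precision.
+ bf16_gb = params * 2 / 1e9    # 2 bytes per weight   -> ~5.1 GB
+ int4_gb = params * 0.5 / 1e9  # 0.5 bytes per weight -> ~1.3 GB
+
+ print(f"bf16: {bf16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
+ ```
+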
+ ---
+
+ ## Why Use the Quantized Version?
+
+ This **4-bit quantized model** suits users who need **fast inference and reduced memory usage** while still benefiting from **Granite's reasoning capabilities**.
+
+ ✅ **Trained 2x faster** with Unsloth and Hugging Face's TRL library
+ ✅ **Lower VRAM usage**, well suited to consumer GPUs
+ ✅ **Optimized for inference**, making it more efficient to deploy
+
+ ---
+
+ ## Installation & Usage
+
+ To run the quantized model, install the required dependencies:
+
+ ```bash
+ pip install torch torchvision torchaudio
+ pip install accelerate
+ pip install transformers
+ pip install bitsandbytes
+ ```
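+
+ A quick optional sanity check: bitsandbytes' 4-bit kernels run on CUDA GPUs, so it is worth confirming one is visible before loading the model (a minimal sketch):
+
+ ```python
+ import torch
+
+ # 4-bit loading via bitsandbytes requires a CUDA-capable GPU.
+ assert torch.cuda.is_available(), "bitsandbytes 4-bit loading needs a CUDA GPU"
+ print(torch.cuda.get_device_name(0))
+ ```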
+
+ ### Running the Model
+
+ Use the following Python snippet to load and generate text with the **4-bit quantized** model:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ import torch
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model_path = "ruslanmv/granite-3.1-2b-Reasoning-4bit"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ # 4-bit loading is configured through transformers' BitsAndBytesConfig;
+ # bf16 compute matches the tensor type listed above.
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path,
+     device_map="auto",
+     quantization_config=bnb_config,
+ )
+ model.eval()
+
+ input_text = "Can you explain the difference between inductive and deductive reasoning?"
+ input_tokens = tokenizer(input_text, return_tensors="pt").to(device)
+
+ output = model.generate(**input_tokens, max_length=4000)
+ output_text = tokenizer.batch_decode(output, skip_special_tokens=True)
+
+ print(output_text)
+ ```
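+
+ Because the model is tagged as conversational, you may prefer the tokenizer's built-in chat template (a minimal sketch; the prompt and generation length are illustrative):
+
+ ```python
+ # Format a conversation with the chat template shipped in the tokenizer.
+ messages = [
+     {"role": "user", "content": "Explain inductive vs. deductive reasoning in two sentences."},
+ ]
+ chat_inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ chat_output = model.generate(chat_inputs, max_new_tokens=256)
+ # Decode only the newly generated tokens.
+ print(tokenizer.decode(chat_output[0][chat_inputs.shape[-1]:], skip_special_tokens=True))
+ ```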
+
+ ---
+
+ ## Intended Use
+
+ Granite-3.1-2B-Reasoning-4bit is designed for tasks requiring structured **reasoning**, including:
+
+ - **Logical and analytical problem-solving**
+ - **Text-based reasoning tasks**
+ - **Mathematical and symbolic reasoning**
+ - **Advanced instruction-following**
+
+ This model is particularly useful for users who need a **lightweight, high-performance** version of **Granite-3.1-2B-Reasoning** with only a modest accuracy trade-off from quantization.
+
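+ For quick experiments with prompts like these, the high-level `pipeline` API also works (an illustrative sketch, assuming the repo's config carries the bitsandbytes quantization settings):
+
+ ```python
+ from transformers import pipeline
+
+ # Text-generation pipeline over the quantized checkpoint.
+ generator = pipeline(
+     "text-generation",
+     model="ruslanmv/granite-3.1-2b-Reasoning-4bit",
+     device_map="auto",
+ )
+
+ result = generator(
+     "All men are mortal. Socrates is a man. What follows?",
+     max_new_tokens=128,
+ )
+ print(result[0]["generated_text"])
+ ```
+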
+ ---
+
+ ## License & Acknowledgments
+
+ This model is released under the **Apache 2.0** license. It is fine-tuned from IBM's **Granite 3.1-2B-Instruct** model and **quantized using bitsandbytes** for efficiency. Special thanks to the **IBM Granite Team** for developing the base model.
+
+ For more details, visit the [IBM Granite Documentation](https://huggingface.co/ibm-granite).
+
+ ---

+ ### Citation
+
+ If you use this model in your research or applications, please cite:
+
+ ```
+ @misc{ruslanmv2025granite,
+   title={Fine-Tuning and Quantizing Granite-3.1 for Advanced Reasoning},
+   author={Ruslan M.V.},
+   year={2025},
+   url={https://huggingface.co/ruslanmv/granite-3.1-2b-Reasoning-4bit}
+ }
+ ```