---
base_model: ibm-granite/granite-3.1-2b-instruct
tags:
- text-generation
- transformers
- gguf
- english
- granite
- text-generation-inference
- inference-endpoints
- conversational
- 4-bit
- 5-bit
- 8-bit
- ruslanmv
license: apache-2.0
language:
- en
---

# Granite-3.1-2B-Reasoning-GGUF (Quantized for Efficiency)

## Model Overview

This is a **GGUF quantized version** of **ruslanmv/granite-3.1-2b-Reasoning**, fine-tuned from **ibm-granite/granite-3.1-2b-instruct**. The **GGUF format** enables efficient inference on **CPU and GPU**, and this repository provides **4-bit, 5-bit, and 8-bit** quantizations.

- **Developed by:** [ruslanmv](https://huggingface.co/ruslanmv)
- **License:** Apache 2.0
- **Base Model:** [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct)
- **Fine-tuned for:** Logical reasoning, structured problem-solving, long-context tasks
- **Quantized GGUF versions available:**
  - **4-bit:** `Q4_K_M`
  - **5-bit:** `Q5_K_M`
  - **8-bit:** `Q8_0`
- **Supported Languages:** English
- **Architecture:** **Granite**
- **Model Size:** **2.53B params**

---

## Why Use the GGUF Quantized Version?

The **GGUF format** is designed for optimized **CPU and GPU inference**, enabling:

✅ **Lower memory usage** for running on consumer hardware
✅ **Faster inference speeds** without compromising reasoning ability
✅ **Compatibility with popular inference engines** such as llama.cpp, ctransformers, and KoboldCpp

---

## Installation & Usage

To use this model with **llama.cpp** through its Python bindings, install the required dependency:

```bash
pip install llama-cpp-python
```

(A sketch showing how to download the GGUF file directly from the Hugging Face Hub appears at the end of this card.)

### Running the Model

To run the model with **llama-cpp-python**:

```python
from llama_cpp import Llama

model_path = "path/to/ruslanmv/granite-3.1-2b-Reasoning-GGUF.Q4_K_M.gguf"
llm = Llama(model_path=model_path)

input_text = "Can you explain the difference between inductive and deductive reasoning?"
output = llm(input_text, max_tokens=400)

print(output["choices"][0]["text"])
```

Alternatively, using **ctransformers**:

```bash
pip install ctransformers
```

```python
from ctransformers import AutoModelForCausalLM

model_path = "path/to/ruslanmv/granite-3.1-2b-Reasoning-GGUF.Q4_K_M.gguf"
model = AutoModelForCausalLM.from_pretrained(model_path, model_type="llama", gpu_layers=50)

input_text = "What are the key principles of logical reasoning?"
output = model(input_text, max_new_tokens=400)

print(output)
```

---

## Intended Use

Granite-3.1-2B-Reasoning-GGUF is optimized for **efficient inference** while maintaining strong **reasoning capabilities**, making it ideal for:

- **Logical and analytical problem-solving**
- **Text-based reasoning tasks**
- **Mathematical and symbolic reasoning**
- **Advanced instruction-following**

This model is particularly useful for **CPU-based deployments** and users who need **low-memory, high-performance** text generation.

---

## License & Acknowledgments

This model is released under the **Apache 2.0** license. It is fine-tuned from IBM’s **Granite 3.1-2B-Instruct** model and **quantized to GGUF** for efficiency. Special thanks to the **IBM Granite Team** for developing the base model. For more details, visit the [IBM Granite Documentation](https://huggingface.co/ibm-granite).

---

### Citation

If you use this model in your research or applications, please cite:

```
@misc{ruslanmv2025granite,
  title={Fine-Tuning and GGUF Quantization of Granite-3.1 for Advanced Reasoning},
  author={Ruslan M.V.},
  year={2025},
  url={https://huggingface.co/ruslanmv/granite-3.1-2b-Reasoning-GGUF}
}
```
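
---

## Downloading the GGUF File from the Hub

The usage examples above assume the GGUF file is already on disk (`path/to/...`). As a convenience, the sketch below shows one way to fetch the 4-bit file directly from the Hugging Face Hub with `huggingface_hub` and load it with `llama-cpp-python`. The exact filename inside the repository is an assumption here; check the repository's file listing and adjust it if it differs.

```bash
pip install huggingface_hub llama-cpp-python
```

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch the quantized model file from the Hub (cached locally after the first run).
# NOTE: the filename below is assumed; verify it against the repository's file list.
model_path = hf_hub_download(
    repo_id="ruslanmv/granite-3.1-2b-Reasoning-GGUF",
    filename="granite-3.1-2b-Reasoning.Q4_K_M.gguf",  # assumed filename
)

# Load the model; n_ctx (context window) and n_gpu_layers (GPU offload) are tunable.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=0)

output = llm(
    "Explain the difference between inductive and deductive reasoning.",
    max_tokens=400,
)
print(output["choices"][0]["text"])
```

The same downloaded path can also be passed to the `ctransformers` example above in place of the local placeholder.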