Triangle104/Llama3.1-8B-SEP-Chat-Q8_0-GGUF
This model was converted to GGUF format from ruggsea/Llama3.1-8B-SEP-Chat
using llama.cpp via ggml.ai's GGUF-my-repo space.
Refer to the original model card for more details on the model.
Model details:
This model is a LoRA finetune of meta-llama/Meta-Llama-3.1-8B trained on multi-turn philosophical conversations. It is designed to engage in philosophical discussions in a conversational yet rigorous manner, maintaining academic standards while being accessible.
Model description
The model was trained using the TRL (Transformer Reinforcement Learning) library's chat template, enabling it to handle multi-turn conversations in a natural way. It builds on the capabilities of its predecessor, Llama3-stanford-encyclopedia-philosophy-QA, and extends them to more interactive, back-and-forth philosophical discussions.
Chat Format
The model uses the standard chat format with roles:
<|system|>
{{system_prompt}}
<|user|>
{{user_message}}
<|assistant|>
{{assistant_response}}
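As a rough illustration of the layout above, the following sketch renders a list of messages into the role-tagged format. The helper name `render_chat` and the newline separators are assumptions made for illustration; in practice the tokenizer's own chat template (via `apply_chat_template`) remains the authoritative way to build prompts.

```python
# Illustrative only: mirrors the role layout described in this card.
# The newline separators are an assumption; the tokenizer's chat template
# is the authoritative source for the exact prompt string.

def render_chat(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into the role-tagged layout."""
    allowed_roles = {"system", "user", "assistant"}
    parts = []
    for message in messages:
        role = message["role"]
        if role not in allowed_roles:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{message['content']}")
    if add_generation_prompt:
        # Leave an open assistant tag so the model continues from there.
        parts.append("<|assistant|>")
    return "\n".join(parts)


example = render_chat([
    {"role": "system", "content": "You are a Philosophy professor."},
    {"role": "user", "content": "What is a priori knowledge?"},
])
print(example)
```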
Training Details
The model was trained with the following system prompt:
You are an expert and informative yet accessible Philosophy university professor. Students will engage with you in philosophical discussions. Respond to their questions and comments in a correct and rigorous but accessible way, maintaining academic standards while fostering understanding.
Training hyperparameters
The following hyperparameters were used during training:
Learning rate: 2e-5
Train batch size: 1
Gradient accumulation steps: 4
Effective batch size: 4
Optimizer: paged_adamw_8bit
LR scheduler: cosine with warmup
Warmup ratio: 0.03
Training epochs: 5
LoRA config:
r: 256
alpha: 128
Target modules: all-linear
Dropout: 0.05
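The arithmetic behind two of the values above can be checked with a short sketch. It assumes training on a single device and standard LoRA scaling (alpha divided by r, with no rank-stabilized variant); neither assumption is stated in this card.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1
gradient_accumulation_steps = 4
lora_r = 256
lora_alpha = 128

# Gradients from 4 micro-batches of size 1 are accumulated per optimizer step
# (assumption: a single training device).
effective_batch_size = train_batch_size * gradient_accumulation_steps

# Standard LoRA scales the low-rank update by alpha / r
# (assumption: rank-stabilized scaling was not enabled).
lora_scaling = lora_alpha / lora_r

print(effective_batch_size)  # 4
print(lora_scaling)          # 0.5
```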
Framework versions
PEFT 0.10.0
Transformers 4.40.1
PyTorch 2.2.2+cu121
TRL latest
Datasets 2.19.0
Tokenizers 0.19.1
Intended Use
This model is designed for:
Multi-turn philosophical discussions
Academic philosophical inquiry
Teaching and learning philosophy
Exploring philosophical concepts through dialogue
Limitations
The model should not be used as a substitute for professional philosophical advice or formal philosophical education
While the model aims to be accurate, its responses should be verified against authoritative sources
The model may occasionally generate plausible-sounding but incorrect philosophical arguments
As with all language models, it may exhibit biases present in its training data
License
This model is subject to the Meta Llama 2 license agreement. Please refer to Meta's licensing terms for usage requirements and restrictions.
How to use
Here's an example of how to use the model:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("ruggsea/Llama3.1-8B-SEP-Chat")
tokenizer = AutoTokenizer.from_pretrained("ruggsea/Llama3.1-8B-SEP-Chat")

# Example conversation
messages = [
    {"role": "user", "content": "What is the difference between ethics and morality?"}
]

# Format prompt using chat template
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
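Since the model was trained on multi-turn conversations with a fixed system prompt, a conversation history might be managed along these lines. This is a minimal sketch: the helper names `start_conversation` and `add_turn` are hypothetical, and the assistant reply is a placeholder rather than real model output.

```python
# System prompt quoted from the "Training Details" section of this card.
SYSTEM_PROMPT = (
    "You are an expert and informative yet accessible Philosophy university "
    "professor. Students will engage with you in philosophical discussions. "
    "Respond to their questions and comments in a correct and rigorous but "
    "accessible way, maintaining academic standards while fostering understanding."
)


def start_conversation(system_prompt=SYSTEM_PROMPT):
    """Begin a history with the system prompt as the first message."""
    return [{"role": "system", "content": system_prompt}]


def add_turn(messages, role, content):
    """Return a new history with one more turn appended."""
    return messages + [{"role": role, "content": content}]


messages = start_conversation()
messages = add_turn(messages, "user", "What is the difference between ethics and morality?")
# After generating, store the model's reply before adding the follow-up question.
messages = add_turn(messages, "assistant", "<model reply goes here>")
messages = add_turn(messages, "user", "Can you give a concrete example?")
# `messages` can then be passed to tokenizer.apply_chat_template(...).
```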
Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
brew install llama.cpp
Invoke the llama.cpp server or the CLI.
CLI:
llama-cli --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q8_0-GGUF --hf-file llama3.1-8b-sep-chat-q8_0.gguf -p "The meaning to life and the universe is"
Server:
llama-server --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q8_0-GGUF --hf-file llama3.1-8b-sep-chat-q8_0.gguf -c 2048
Note: You can also use this checkpoint directly by following the usage steps listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
git clone https://github.com/ggerganov/llama.cpp
Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1
flag, along with any hardware-specific flags (for example, LLAMA_CUDA=1 for Nvidia GPUs on Linux).
cd llama.cpp && LLAMA_CURL=1 make
Step 3: Run inference through the main binary.
./llama-cli --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q8_0-GGUF --hf-file llama3.1-8b-sep-chat-q8_0.gguf -p "The meaning to life and the universe is"
or
./llama-server --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q8_0-GGUF --hf-file llama3.1-8b-sep-chat-q8_0.gguf -c 2048
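Once llama-server is running, it can be queried over HTTP through its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library and assumes the server is listening on its default address, http://127.0.0.1:8080; adjust `base_url` if the server was started with different host or port options. The helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(messages, base_url="http://127.0.0.1:8080", max_tokens=512):
    """Build a POST request for the server's /v1/chat/completions endpoint."""
    payload = {"messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    [{"role": "user", "content": "What is the difference between ethics and morality?"}]
)
# Uncomment once a llama-server instance is running:
# print(send_chat_request(request))
```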