---
tags:
- text-generation
- storytelling
- transformers
- DeepSeek
---

# Deepseek Uncensored Lore

## Model Overview

Deepseek Uncensored Lore is a fine-tuned 7B language model, built on DeepSeek's LLaMA-style architecture and designed for immersive storytelling and character-driven narrative generation. It uses LoRA (Low-Rank Adaptation) fine-tuning to specialize in generating rich, descriptive, and emotionally engaging stories from structured prompts.

- **Base Model**: [DeepSeek 7B](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat)
- **Fine-Tuned Dataset**: [Character Stories](https://huggingface.co/datasets/luvGPT/CharacterStories)
- **Training Framework**: Hugging Face Transformers with LoRA and PEFT
- **Optimized for**: Text generation, storytelling, narrative creation
- **Primary Use Case**: Enhancing creative writing workflows and interactive storytelling experiences

---

## Fine-Tuning Journey

### Initial Attempts with Full Fine-Tuning

We initially attempted a full fine-tune using DeepSpeed on a 4-GPU A100 instance. However, the combination of dataset size and model scale caused significant overfitting, leading to degraded narrative quality. This highlighted the need for a lighter, more targeted adaptation method.

### Transition to LoRA Fine-Tuning

To address the overfitting, we switched to LoRA fine-tuning (rank 8, with DeepSpeed), targeting specific attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`). This approach preserves the base model's linguistic knowledge while specializing it for storytelling. Fine-tuning took **12–18 hours on a 4-GPU A100 80GB instance**, balancing output quality and computational cost.

---

## Training Details

### Training Progress

We used [Weights & Biases (W&B)](https://wandb.ai/) to track training metrics such as loss and evaluation performance.
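With the Trainer's W&B integration, metrics are streamed automatically once `report_to="wandb"` is set in the training arguments (shown below). The following is only a minimal sketch of the setup, assuming a configured W&B account; the project and run names are illustrative, not the ones we used:

```python
import os
import wandb

# The Hugging Face Trainer's W&B callback reads the project name from this
# environment variable; the value here is illustrative.
os.environ["WANDB_PROJECT"] = "deepseek-lore-lora"

# Optionally start the run explicitly to control the run name.
wandb.init(project="deepseek-lore-lora", name="lora-r8-deepspeed")
```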
Below is the training loss curve, illustrating the model's progression over time:

![Training Loss](./W&B Chart 2_2_2025, 10_47_32 PM.svg)

### Training Parameters

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./lora_finetuned_model",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=6,
    num_train_epochs=5,
    learning_rate=5e-4,
    optim="paged_adamw_32bit",
    fp16=True,
    evaluation_strategy="steps",
    eval_steps=50,
    logging_steps=10,
    max_grad_norm=0.3,
    save_steps=100,
    save_total_limit=2,
    warmup_ratio=0.03,
    report_to="wandb",
    deepspeed="./deepspeed_config.json",
)
```

Our DeepSpeed configuration was as follows:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "stage3_gather_16bit_weights_on_model_save": true,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "none"
    },
    "stage3_param_persistence_threshold": 0
  },
  "gradient_clipping": "auto",
  "activation_checkpointing": {
    "partition_activations": true,
    "contiguous_memory_optimization": true,
    "cpu_checkpointing": false,
    "number_checkpoints": 100,
    "synchronize_checkpoint_boundary": false
  }
}
```

### LoRA Configuration

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                # Rank of the LoRA adapters
    lora_alpha=8,       # Scaling factor for LoRA updates
    lora_dropout=0.1,   # Dropout probability
    bias="none",        # No extra bias parameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

### Hardware Setup

- **Instance**: 4 x A100 80GB GPUs
- **Training Duration**: 12–18 hours
- **Precision**: FP16 for model and LoRA training

---

## Performance

We evaluated the model at multiple checkpoints during training. Below is the evolution of its storytelling capabilities.

### Base Model Output

**Prompt**:

```
Description: Jake Sully, portrayed by Sam Worthington, is a former Marine who becomes part of the Avatar Program in James Cameron's "Avatar."
Scenario: Jake Sully is planning a mission to protect Pandora from an impending human attack.
```

**Generated Text**:

```
Story Arc: The story arc follows three main acts. In Act I (Set-up), we learn about Jake’s involvement in the Avatar program on Earth before being assigned to become one with an alien world inhabited by various species including blue catlike creatures called Navi...
```

(The base model produced generic, predictable output.)

### Checkpoint 100

**Generated Text**:

```
Story Arc: In the pulsating heart of Pandoran wilderness, beneath the starlit canopy, lay the lush sanctuary known as Hometree—a sprawling, verdant oasis hidden away like some ancient secret within its own private realm...
```

### Checkpoint 200

**Generated Text**:

```
Story Arc: In the pulsating heart of Pandora, beneath the shimmering canopy of blue-green leaves, whispers of an imminent assault echoed through the night air like distant drums. The scent of ozone mingled with the heady perfume of alien flora...
```

### Checkpoint 300 (Final Model)

**Generated Text**:

```
Story Arc: In the verdant heart of Pandora, under the watchful gaze of its towering trees and shimmering skies, tensions hung heavy like mist on a morning river. The air was thick with anticipation; whispers carried through the jungle...
```

The progression shows a shift from factual summarization to vivid, immersive storytelling, demonstrating the effect of the LoRA fine-tuning.
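For reference, an intermediate checkpoint can be reloaded for this kind of qualitative check by attaching its LoRA adapters to the base model with PEFT. This is only a minimal sketch, not the exact evaluation harness we used; the adapter path follows the Trainer's default checkpoint naming under `output_dir` and is illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_name = "deepseek-ai/deepseek-llm-7b-chat"
adapter_path = "./lora_finetuned_model/checkpoint-200"  # illustrative checkpoint path

tokenizer = AutoTokenizer.from_pretrained(base_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_name, torch_dtype=torch.float16, device_map="auto"
)

# Attach the LoRA adapters from the training checkpoint to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

prompt = (
    "Description: Jake Sully, portrayed by Sam Worthington, is a former Marine who "
    "becomes part of the Avatar Program in James Cameron's \"Avatar.\"\n"
    "Scenario: Jake Sully is planning a mission to protect Pandora from an impending "
    "human attack.\n\nStory Arc:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
outputs = model.generate(
    **inputs, max_new_tokens=300, do_sample=True, temperature=0.7, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Once a checkpoint is chosen for release, the adapters can also be folded into the base weights with PEFT's `merge_and_unload()` so the model can be served without a PEFT dependency.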
---

## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "deepseek-ai/deepseek-uncensored-lore"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Description: A daring explorer ventures into an ancient forest.\nScenario: She discovers a hidden temple and must unlock its secrets.\n\nStory Arc:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=500, do_sample=True, temperature=0.7, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Limitations

- **Bias**: Outputs may reflect biases present in the base DeepSeek model or the training dataset.
- **Context Length**: Limited to 1,000 tokens per sequence.
- **Specialization**: The model is optimized for storytelling and may underperform on other tasks.

---

## Acknowledgments

Special thanks to the Hugging Face community, the DeepSeek team, and the creators of the [Character Stories](https://huggingface.co/datasets/luvGPT/CharacterStories) dataset.

For questions or collaborations, feel free to contact us via the Hugging Face platform or through [our website](https://www.luv-gpt.com).

---