[中文](README.md)

# Qwen2.5-Sex

## Introduction

Qwen2.5-Sex is a model fine-tuned from Qwen2.5-1.5B-Instruct, trained primarily on a large collection of erotic literature and sensitive datasets. Because the training data is mostly Chinese, the model performs better on Chinese text.

> **Warning**: This model is for research and testing purposes only. Users must comply with local laws and regulations and take responsibility for their actions.

## Model Usage

For **multi-turn (continuous) conversation**, use the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import os

# Adjustable sampling parameters. Higher TOP_P and TOP_K values are recommended for text generation; keep TEMPERATURE low.
TOP_P = 0.9        # Top-p (nucleus sampling), range from 0 to 1
TOP_K = 80         # Top-k sampling value K
TEMPERATURE = 0.3  # Temperature parameter to control randomness in text generation

device = "cuda" if torch.cuda.is_available() else "cpu"

# Get the current script directory; it can also be changed to an absolute path
current_directory = os.path.dirname(os.path.abspath(__file__))

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    current_directory,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(current_directory)

# System instructions (recommended to be empty)
messages = [
    {"role": "system", "content": ""}
]

while True:
    # Get user input
    user_input = input("User: ").strip()

    # Add user input to conversation
    messages.append({"role": "user", "content": user_input})

    # Prepare input text
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)

    # Generate response
    generated_ids = model.generate(
        **model_inputs,  # Pass input_ids together with the attention_mask
        max_new_tokens=512,
        top_p=TOP_P,
        top_k=TOP_K,
        temperature=TEMPERATURE,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id  # Avoid warnings
    )
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    # Decode and print response
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(f"Assistant: {response}")

    # Add the generated response to the conversation
    messages.append({"role": "assistant", "content": response})

```
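The loop above appends every turn to `messages`, so the prompt grows with each exchange and can eventually exceed the model's context window. One way to bound it is sketched below. The helper name `trim_history` and the `max_turns` parameter are hypothetical and not part of this repository; the sketch assumes the message format used above (dicts with `role` and `content` keys).

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_turns: int = 8) -> List[Message]:
    """Return a new list with the system messages plus the most recent turns.

    Hypothetical helper: a "turn" here means one user message and one
    assistant message. The input list is not modified.
    """
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")

    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]

    # Keep at most 2 * max_turns of the latest user/assistant messages.
    kept = dialogue[-2 * max_turns:]

    # Drop leading non-user messages so the window starts with a user message.
    while kept and kept[0]["role"] != "user":
        kept = kept[1:]

    return system + kept
```

In the conversation loop, this could be called right after the user input is appended, for example `messages = trim_history(messages)`. Trimming by turn count is only a rough proxy for prompt length; counting tokens would be more precise.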

## Datasets

The Qwen2.5-Sex model was fine-tuned on a large collection of erotic literature and on sensitive datasets covering themes such as ethics, law, pornography, and violence. Because the fine-tuning data is in Chinese, the model performs better on Chinese text. The datasets are available at the following links:

- [Bad Data](https://huggingface.co/datasets/ystemsrx/bad_data.json)
- [Toxic-All](https://huggingface.co/datasets/ystemsrx/Toxic-All)
- [Erotic Literature Collection](https://huggingface.co/datasets/ystemsrx/Erotic_Literature_Collection)

For details on how to obtain these datasets, visit our [GitHub](https://github.com/ystemsrx).

## GitHub Repository

For detailed information and ongoing updates about this series of models, please visit our GitHub repository:

- [GitHub: ystemsrx/Qwen2.5-Sex](https://github.com/ystemsrx/Qwen2.5-Sex)

## Disclaimer

All content provided by this model is for research and testing purposes only. The model developers are not responsible for any misuse. Users must comply with relevant laws and regulations and bear all responsibilities arising from the use of this model.