This jailbroken LLM is released strictly for academic research purposes in AI safety and model alignment studies. The author bears no responsibility for any misuse or harm resulting from the deployment of this model. Users must comply with all applicable laws and ethical guidelines when conducting research.

This is a jailbroken version of Qwen2.5-7B-Instruct, produced using weight orthogonalization [1].

The model was jailbroken using a combination of the JailBreakBench and Alpaca-cleaned datasets. JailBreakBench samples that come from HarmfulBench were excluded so that they remain available for potential testing.

[1]: Arditi, Andy, et al. "Refusal in Language Models Is Mediated by a Single Direction." arXiv preprint arXiv:2406.11717 (2024).

Model size: 7.62B params (Safetensors). Tensor type: BF16.

Model: cooperleong00/Qwen2.5-7B-Instruct-Jailbroken (base model: Qwen/Qwen2.5-7B).