cooperleong00/Meta-Llama-3-8B-Instruct-Jailbroken

cooperleong00 committed on Dec 16, 2024

Commit 9e5fb70 · verified · 1 Parent(s): fa532af

Update README.md

Files changed (1)

README.md +1 -0

@@ -5,6 +5,7 @@ base_model:
 ---
 A jailbroken Meta-Llama-3-8B-Instruct model using weight orthogonalization[1].
 The model was jailbroken using a combination of the JailBreakBench and Alpaca-cleaned datasets; JailBreakBench samples drawn from HarmfulBench were excluded to allow for potential testing.
 [1]: Arditi, Andy, et al. "Refusal in language models is mediated by a single direction." arXiv preprint arXiv:2406.11717 (2024).
