cooperleong00 committed on
Commit fa532af · verified · 1 Parent(s): d4acaee

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -4,6 +4,7 @@ base_model:
  - meta-llama/Meta-Llama-3-8B-Instruct
  ---
 
- A jailbroken version of Meta-Llama-3-8B-Instruct using weight orthogonalization[1].
+ A jailbroken Meta-Llama-3-8B-Instruct model using weight orthogonalization[1].
+ The model was jailbroken by a combination of JailBreakBench and Alpaca-cleaned datasets, with JailBreakBench samples from HarmfulBench excluded to allow for potential testing.
 
  [1]: Arditi, Andy, et al. "Refusal in language models is mediated by a single direction." arXiv preprint arXiv:2406.11717 (2024).