Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? • arXiv:2404.03411 • Published Apr 4, 2024
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions • arXiv:2404.13208 • Published Apr 19, 2024
A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses • arXiv:2407.02551 • Published Jul 2, 2024