Somnath Banerjee

leowin

AI & ML interests

None yet

Recent Activity

updated a dataset 7 days ago: SoftMINER-Group/NicheHazardQA
updated a dataset 7 days ago: SoftMINER-Group/TechHazardQA

Organizations

Hate-ALERT, SoftMiner Group

leowin's activity

reacted to rimahazra's post with 🔥 2 months ago
🔥🔥 Releasing our new paper on AI safety alignment -- Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations 🎯 with Sayan Layek, Somnath Banerjee and Soujanya Poria.

👉 We propose Safety Arithmetic, a training-free framework enhancing LLM safety across different scenarios: base models, supervised fine-tuned (SFT) models, and edited models. Safety Arithmetic involves Harm Direction Removal (HDR) to avoid harmful content and Safety Alignment to promote safe responses.

👉 Paper: https://arxiv.org/abs/2406.11801v1
👉 Code: https://github.com/declare-lab/safety-arithmetic
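
To make the two components concrete, here is a minimal, illustrative sketch of the general ideas the post describes (task-vector-style weight arithmetic to remove a harm direction, plus an additive activation-steering hook at inference time). This is not the authors' reference implementation; see the linked repository for that. The model names, the layer index, the scaling factors alpha and beta, and the random safety vector are all placeholders chosen for the example.

```python
# Illustrative sketch only (assumptions: placeholder models, random steering vector).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "gpt2"      # placeholder base model
harmful_name = "gpt2"   # placeholder: in practice, a copy fine-tuned on harmful data

base = AutoModelForCausalLM.from_pretrained(base_name)
harmful = AutoModelForCausalLM.from_pretrained(harmful_name)
tokenizer = AutoTokenizer.from_pretrained(base_name)

# --- Harm Direction Removal (HDR), task-arithmetic style ---
# harm_direction = theta_harmful - theta_base; subtract a scaled copy from the model.
alpha = 0.5  # illustrative scaling factor
with torch.no_grad():
    for (_, p_base), (_, p_harm) in zip(base.named_parameters(), harmful.named_parameters()):
        harm_direction = p_harm.data - p_base.data
        p_base.data -= alpha * harm_direction  # edits `base` in place

# --- Safety alignment via test-time activation steering ---
# Add a steering vector (random here; safety-derived in practice) to one layer's hidden states.
hidden_size = base.config.hidden_size
safety_vector = torch.randn(hidden_size) * 0.01  # placeholder steering direction
beta = 1.0

def steer(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + beta * safety_vector.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

layer_idx = 6  # illustrative choice of layer
hook = base.transformer.h[layer_idx].register_forward_hook(steer)

prompt = "How do I stay safe online?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = base.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))

hook.remove()  # restore the unsteered model
```

Because both steps operate on an already-trained model (one edits weights, the other hooks activations at generation time), neither requires any additional training, which is the sense in which the framework is training-free.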