Somnath Banerjee

leowin

AI & ML interests

None yet

Recent Activity

updated a dataset 7 days ago: SoftMINER-Group/NicheHazardQA
updated a dataset 7 days ago: SoftMINER-Group/TechHazardQA

Organizations

Hate-ALERT, SoftMiner Group

leowin's activity

reacted to rimahazra's post with 🔥 2 months ago
🔥🔥 Releasing our new paper on AI safety alignment -- Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations 🎯 with Sayan Layek, Somnath Banerjee and Soujanya Poria.

👉 We propose Safety Arithmetic, a training-free framework enhancing LLM safety across different scenarios: base models, supervised fine-tuned (SFT) models, and edited models. Safety Arithmetic involves Harm Direction Removal (HDR) to avoid harmful content and Safety Alignment to promote safe responses.

👉 Paper: https://arxiv.org/abs/2406.11801v1
👉 Code: https://github.com/declare-lab/safety-arithmetic
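
To make the two components concrete, here is a minimal, illustrative sketch of the general ideas the post describes (task-vector-style weight arithmetic to remove a harm direction, plus an additive activation-steering hook at inference time). This is not the authors' reference implementation; see the linked repository for that. The model names, the layer index, the scaling factors alpha and beta, and the random safety vector are all placeholders chosen for the example.

```python
# Illustrative sketch only (assumptions: placeholder models, random steering vector).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "gpt2"      # placeholder base model
harmful_name = "gpt2"   # placeholder: in practice, a copy fine-tuned on harmful data

base = AutoModelForCausalLM.from_pretrained(base_name)
harmful = AutoModelForCausalLM.from_pretrained(harmful_name)
tokenizer = AutoTokenizer.from_pretrained(base_name)

# --- Harm Direction Removal (HDR), task-arithmetic style ---
# harm_direction = theta_harmful - theta_base; subtract a scaled copy from the model.
alpha = 0.5  # illustrative scaling factor
with torch.no_grad():
    for (_, p_base), (_, p_harm) in zip(base.named_parameters(), harmful.named_parameters()):
        harm_direction = p_harm.data - p_base.data
        p_base.data -= alpha * harm_direction  # edits `base` in place

# --- Safety alignment via test-time activation steering ---
# Add a steering vector (random here; safety-derived in practice) to one layer's hidden states.
hidden_size = base.config.hidden_size
safety_vector = torch.randn(hidden_size) * 0.01  # placeholder steering direction
beta = 1.0

def steer(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + beta * safety_vector.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

layer_idx = 6  # illustrative choice of layer
hook = base.transformer.h[layer_idx].register_forward_hook(steer)

prompt = "How do I stay safe online?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = base.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))

hook.remove()  # restore the unsteered model
```

Because both steps operate on an already-trained model (one edits weights, the other hooks activations at generation time), neither requires any additional training, which is the sense in which the framework is training-free.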