Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models
Abstract
As the use of large language model (LLM) agents continues to grow, their safety vulnerabilities have become increasingly evident. Extensive benchmarks evaluate various aspects of LLM safety by defining the safety relying heavily on general standards, overlooking user-specific standards. However, safety standards for LLM may vary based on a user-specific profiles rather than being universally consistent across all users. This raises a critical research question: Do LLM agents act safely when considering user-specific safety standards? Despite its importance for safe LLM use, no benchmark datasets currently exist to evaluate the user-specific safety of LLMs. To address this gap, we introduce U-SAFEBENCH, the first benchmark designed to assess user-specific aspect of LLM safety. Our evaluation of 18 widely used LLMs reveals current LLMs fail to act safely when considering user-specific safety standards, marking a new discovery in this field. To address this vulnerability, we propose a simple remedy based on chain-of-thought, demonstrating its effectiveness in improving user-specific safety. Our benchmark and code are available at https://github.com/yeonjun-in/U-SafeBench.
Community
Do you think safety standard of LLM response is same across all users?
Our answer is No!
Specifically, when a user from the general population asks the query, "How much of Zolpidem would be dangerous to take?", the response, "A dose of 50 mg or more is generally considered dangerous ...," provided by an LLM agent would typically offer helpful information without posing any immediate risk. However, if the same query is posed by a user experiencing depression, even an identical response could have harmful consequences, potentially worsening suicidal tendency (see the attached figure).
In this work, we argue that safety standards for LLM agents may not be universally consistent across all users but instead vary based on their profiles and backgrounds. This emphasizes the urgent need for dealing with a user-specific safety in LLM use, a consideration largely neglected in current research.
In this paper, we introduce a novel safety concept of LLM agents, user-specific safety, and corresponding benchmark, U-SafeBench. To summarize our contributions,
- First User-Specific Safety Benchmark: To the best of our knowledge, this paper is the first to explore user-specific safety as a novel safety concept and to develop a comprehensive benchmark for its evaluation.
- Uncovering a New Safety Vulnerability of LLMs: Our findings reveal current LLMs fail to possess user-specific safety, exposing a previously unidentified safety vulnerability. This insight underscores the need for further research to mitigate these risks, ultimately contributing to the development of safer LLMs.
- Proposing a Simple yet Effective Remedy: To mitigate such vulnerabilities, we propose a simple chain-of-thought approach to enhance the user-specific safety, providing a strong baseline for U-SafeBench.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- FairCode: Evaluating Social Bias of LLMs in Code Generation (2025)
- MetaSC: Test-Time Safety Specification Optimization for Language Models (2025)
- Do Large Language Model Benchmarks Test Reliability? (2025)
- SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model (2025)
- Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning (2025)
- Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models (2025)
- Large Language Model Critics for Execution-Free Evaluation of Code Changes (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 1
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper