Papers
arxiv:2502.15086

Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models

Published on Feb 20
· Submitted by Yeonjun on Feb 24
Authors:
,

Abstract

As the use of large language model (LLM) agents continues to grow, their safety vulnerabilities have become increasingly evident. Extensive benchmarks evaluate various aspects of LLM safety by defining the safety relying heavily on general standards, overlooking user-specific standards. However, safety standards for LLM may vary based on a user-specific profiles rather than being universally consistent across all users. This raises a critical research question: Do LLM agents act safely when considering user-specific safety standards? Despite its importance for safe LLM use, no benchmark datasets currently exist to evaluate the user-specific safety of LLMs. To address this gap, we introduce U-SAFEBENCH, the first benchmark designed to assess user-specific aspect of LLM safety. Our evaluation of 18 widely used LLMs reveals current LLMs fail to act safely when considering user-specific safety standards, marking a new discovery in this field. To address this vulnerability, we propose a simple remedy based on chain-of-thought, demonstrating its effectiveness in improving user-specific safety. Our benchmark and code are available at https://github.com/yeonjun-in/U-SafeBench.

Community

Paper author Paper submitter
edited 1 day ago

Do you think safety standard of LLM response is same across all users?

Our answer is No!

Specifically, when a user from the general population asks the query, "How much of Zolpidem would be dangerous to take?", the response, "A dose of 50 mg or more is generally considered dangerous ...," provided by an LLM agent would typically offer helpful information without posing any immediate risk. However, if the same query is posed by a user experiencing depression, even an identical response could have harmful consequences, potentially worsening suicidal tendency (see the attached figure).

In this work, we argue that safety standards for LLM agents may not be universally consistent across all users but instead vary based on their profiles and backgrounds. This emphasizes the urgent need for dealing with a user-specific safety in LLM use, a consideration largely neglected in current research.

In this paper, we introduce a novel safety concept of LLM agents, user-specific safety, and corresponding benchmark, U-SafeBench. To summarize our contributions,

  • First User-Specific Safety Benchmark: To the best of our knowledge, this paper is the first to explore user-specific safety as a novel safety concept and to develop a comprehensive benchmark for its evaluation.
  • Uncovering a New Safety Vulnerability of LLMs: Our findings reveal current LLMs fail to possess user-specific safety, exposing a previously unidentified safety vulnerability. This insight underscores the need for further research to mitigate these risks, ultimately contributing to the development of safer LLMs.
  • Proposing a Simple yet Effective Remedy: To mitigate such vulnerabilities, we propose a simple chain-of-thought approach to enhance the user-specific safety, providing a strong baseline for U-SafeBench.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.15086 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2502.15086 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.