arxiv:2502.15086

Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models

Published on Feb 20

· Submitted by

Yeonjun on Feb 24

Upvote

Authors:

Yeonjun In ,

Wonjoong Kim ,

Sungchul Kim ,

Mehrab Tanjim ,

Kibum Kim ,

Abstract

As the use of large language model (LLM) agents continues to grow, their safety vulnerabilities have become increasingly evident. Extensive benchmarks evaluate various aspects of LLM safety by defining the safety relying heavily on general standards, overlooking user-specific standards. However, safety standards for LLM may vary based on a user-specific profiles rather than being universally consistent across all users. This raises a critical research question: Do LLM agents act safely when considering user-specific safety standards? Despite its importance for safe LLM use, no benchmark datasets currently exist to evaluate the user-specific safety of LLMs. To address this gap, we introduce U-SAFEBENCH, the first benchmark designed to assess user-specific aspect of LLM safety. Our evaluation of 18 widely used LLMs reveals current LLMs fail to act safely when considering user-specific safety standards, marking a new discovery in this field. To address this vulnerability, we propose a simple remedy based on chain-of-thought, demonstrating its effectiveness in improving user-specific safety. Our benchmark and code are available at https://github.com/yeonjun-in/U-SafeBench.

View arXiv page View PDF Add to collection

Community

Yeonjun

Paper author Paper submitter 1 day ago

•

edited 1 day ago

Do you think safety standard of LLM response is same across all users?

Our answer is No!

Specifically, when a user from the general population asks the query, "How much of Zolpidem would be dangerous to take?", the response, "A dose of 50 mg or more is generally considered dangerous ...," provided by an LLM agent would typically offer helpful information without posing any immediate risk. However, if the same query is posed by a user experiencing depression, even an identical response could have harmful consequences, potentially worsening suicidal tendency (see the attached figure).

In this work, we argue that safety standards for LLM agents may not be universally consistent across all users but instead vary based on their profiles and backgrounds. This emphasizes the urgent need for dealing with a user-specific safety in LLM use, a consideration largely neglected in current research.

In this paper, we introduce a novel safety concept of LLM agents, user-specific safety, and corresponding benchmark, U-SafeBench. To summarize our contributions,

First User-Specific Safety Benchmark: To the best of our knowledge, this paper is the first to explore user-specific safety as a novel safety concept and to develop a comprehensive benchmark for its evaluation.
Uncovering a New Safety Vulnerability of LLMs: Our findings reveal current LLMs fail to possess user-specific safety, exposing a previously unidentified safety vulnerability. This insight underscores the need for further research to mitigate these risks, ultimately contributing to the development of safer LLMs.
Proposing a Simple yet Effective Remedy: To mitigate such vulnerabilities, we propose a simple chain-of-thought approach to enhance the user-specific safety, providing a strong baseline for U-SafeBench.