---
language:
- en
- zh
- ms
- ta
datasets:
- billcai/ospc-dataset-v2
tags:
- multilingual
- mistral
- sft
- chat
- instruction
- gptq
widget:
- text: Hello World
example_title: Sample prompt
license: apache-2.0
base_model: openchat/openchat-3.5-0106
---

This model achieved 0.7600 AUROC in the Online Safety Prize Challenge organized by AI Singapore. It is fine-tuned from the OpenChat model (`openchat/openchat-3.5-0106`). Note that this model is already quantized to 4-bit GPTQ.
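The AUROC figure above was computed by the challenge organizers. If you want to reproduce the metric on your own labeled set from the per-message unsafe probabilities produced by the usage snippet below, a generic scikit-learn sketch looks like this; the labels and scores are placeholders, not challenge data.

```python
# A minimal sketch, assuming you have binary labels (1 = unsafe, 0 = safe) and
# the model's probability for the 'unsafe' token for each message (prob[0] in
# the usage snippet below). These values are placeholders.
from sklearn.metrics import roc_auc_score

labels = [1, 0, 1, 0]
unsafe_scores = [0.91, 0.12, 0.67, 0.35]
print(roc_auc_score(labels, unsafe_scores))
```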
## Training details
This model was trained on a synthetic dataset generated by Apache 2.0-licensed LLMs (Mistral and Mixtral).
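The dataset listed in the metadata, billcai/ospc-dataset-v2, can be inspected with the `datasets` library. The split name below is an assumption; check the dataset card for the actual splits and columns.

```python
# A minimal sketch for inspecting the training dataset. The "train" split name
# is an assumption; print(ds) reveals the real features and splits.
from datasets import load_dataset

ds = load_dataset("billcai/ospc-dataset-v2", split="train")
print(ds)
print(ds[0])
```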
## How to use
You can use it to classify either plain text, or text combined with an extracted visual description (an image caption plus any text found in the image). Because the checkpoint is GPTQ-quantized, loading it through `transformers` typically also requires a GPTQ backend such as `auto-gptq` together with `optimum`.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.special import softmax
device = "cuda"
model = AutoModelForCausalLM.from_pretrained(
'billcai/openchat-model-ospc',
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained('billcai/openchat-model-ospc')
promptText = "<the prompt to be tested>"
# promptText = "Image: <image caption>, Caption: <text within image>"
prompt = """Check if there is unsafe content in the following message. Unsafe content is defined in the following categories.
<BEGIN UNSAFE CONTENT CATEGORIES>
Contains hate speech and harmful content. Harmful content centres on divisive and contentious subject matters. Typical areas include:
- Racial disparities
- Religious beliefs and practices
- Sexual orientation
- Nationalistic sentiments and xenophobia against migrants
- Socio-economic divides and class struggles
- Age-related biases
- Gender discrimination
- Discrimination based on illnesses and disabilities
These content may be based on Singapore context, including the use of non-English languages like Mandarin, Malay and Tamil.
<END UNSAFE CONTENT CATEGORIES>
This is the message you are checking if it contains unsafe content.
<BEGIN MESSAGE>
{}
<END MESSAGE>
Provide your safety assessment for the message above. Your first line MUST read 'safe' or 'unsafe'.
""".format(
promptText
)
messages = [
{"role": "system", "content": "You are a helpful assistant deciding whether a piece of content is safe or unsafe based on specific criteria. The content may contain irrelevant words as it has been extracted from an image."},
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generation_output = model.generate(
model_inputs.input_ids,
max_new_tokens=10,
temperature=0.1,
output_logits=True,
return_dict_in_generate=True
)
generated_sequences = generation_output['sequences']
generated_logits = generation_output['logits']
# index 1 skips the BOS token added by the tokenizer
unsafeTokenId = tokenizer.encode('unsafe')[1]
safeTokenId = tokenizer.encode('safe')[1]
firstLogit = generated_logits[0].cpu().numpy()
prob = softmax([
firstLogit[0,unsafeTokenId],
firstLogit[0,safeTokenId],
])
print(prob) # first value is the score for the 'unsafe' token.
```
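For the text + visual description case, the commented `promptText` format above ("Image: <image caption>, Caption: <text within image>") has to be produced upstream from the image. The sketch below shows one way to do that with an off-the-shelf captioning model and OCR; the specific models and libraries (`Salesforce/blip-image-captioning-base`, `easyocr`) are assumptions for illustration, not the pipeline used in the challenge submission.

```python
# A minimal sketch for building the "Image: ..., Caption: ..." prompt from an
# image file. The captioning model and OCR engine are assumptions; any tools
# that return plain strings will do.
from PIL import Image
from transformers import pipeline
import easyocr

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
reader = easyocr.Reader(["en"])  # add other language packs as needed; easyocr limits some combinations

def build_prompt_text(image_path):
    image = Image.open(image_path).convert("RGB")
    caption = captioner(image)[0]["generated_text"]             # visual description
    ocr_text = " ".join(reader.readtext(image_path, detail=0))  # text inside the image
    return "Image: {}, Caption: {}".format(caption, ocr_text)

promptText = build_prompt_text("meme.png")  # then run the classification snippet above
```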
## License
Apache 2.0