|
---
language:
- en
- zh
- ms
- ta
datasets:
- billcai/ospc-dataset-v2
tags:
- multilingual
- mistral
- sft
- chat
- instruction
- gptq
widget:
- text: "Hello World"
  example_title: "Sample prompt"
license: apache-2.0
base_model: openchat/openchat-3.5-0106
---
|
|
|
This model achieved 0.7600 AUROC in the [Online Safety Prize Challenge](https://ospc.aisingapore.org/) organized by AI Singapore. It is fine-tuned from the [OpenChat model](https://huggingface.co/openchat/openchat-3.5-0106). Note that this model is already quantized to 4-bit GPTQ.
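Because the checkpoint is GPTQ-quantized, loading it through `transformers` requires a GPTQ backend (e.g. the `optimum` and `auto-gptq` packages) in addition to `torch`. As an optional sanity check, you can inspect the quantization settings stored in the repo's config (a minimal sketch; it assumes the packages above are installed):

```python
# Optional sanity check: confirm the 4-bit GPTQ settings are present in the checkpoint config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("billcai/openchat-model-ospc")
print(config.quantization_config)  # expected to report GPTQ quantization with 4 bits
```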
|
|
|
## Training details |
|
This model was trained on a synthetic dataset (`billcai/ospc-dataset-v2`) generated by Apache 2.0-licensed LLMs (Mistral and Mixtral).
|
|
|
## How to use |
|
|
|
You can use it to classify either plain text, or text combined with an extracted visual description of an image (for example, a meme's caption plus the text within the image).
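For the image case, the message is a single string that combines a visual description with the text extracted from the image, following the `Image: ..., Caption: ...` format used in the example below. A minimal sketch (the caption and extracted text here are hypothetical placeholders produced by whatever captioning/OCR pipeline you use):

```python
# Hypothetical outputs from a separate image captioning / OCR step (not part of this model):
image_caption = "a crowd gathered outside a community centre"
extracted_text = "EVERYONE IS WELCOME HERE"

# Single message string in the format expected by the prompt template below.
promptText = "Image: {}, Caption: {}".format(image_caption, extracted_text)
```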
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.special import softmax  # any equivalent softmax implementation works

device = "cuda"

# The checkpoint is already 4-bit GPTQ quantized; it loads like any other transformers model.
model = AutoModelForCausalLM.from_pretrained(
    'billcai/openchat-model-ospc',
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained('billcai/openchat-model-ospc')

promptText = "<the prompt to be tested>"
# For image content, combine the visual description and extracted text:
# promptText = "Image: <image caption>, Caption: <text within image>"

prompt = """Check if there is unsafe content in the following message. Unsafe content is defined in the following categories.
<BEGIN UNSAFE CONTENT CATEGORIES>
Contains hate speech and harmful content. Harmful content centres on divisive and contentious subject matters. Typical areas include:
- Racial disparities
- Religious beliefs and practices
- Sexual orientation
- Nationalistic sentiments and xenophobia against migrants
- Socio-economic divides and class struggles
- Age-related biases
- Gender discrimination
- Discrimination based on illnesses and disabilities
These content may be based on Singapore context, including the use of non-English languages like Mandarin, Malay and Tamil.
<END UNSAFE CONTENT CATEGORIES>

This is the message you are checking if it contains unsafe content.
<BEGIN MESSAGE>
{}
<END MESSAGE>

Provide your safety assessment for the message above. Your first line MUST read 'safe' or 'unsafe'.
""".format(
    promptText
)

messages = [
    {"role": "system", "content": "You are a helpful assistant deciding whether a piece of content is safe or unsafe based on specific criteria. The content may contain irrelevant words as it has been extracted from an image."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(device)

# return_dict_in_generate + output_logits expose the raw logits of each generated step
# (output_logits requires a recent transformers version).
generation_output = model.generate(
    model_inputs.input_ids,
    max_new_tokens=10,
    temperature=0.1,
    output_logits=True,
    return_dict_in_generate=True
)
generated_sequences = generation_output['sequences']
generated_logits = generation_output['logits']

# Index 1 skips the BOS token the tokenizer prepends.
unsafeTokenId = tokenizer.encode('unsafe')[1]
safeTokenId = tokenizer.encode('safe')[1]

# Softmax over the 'unsafe' vs 'safe' logits of the first generated token.
firstLogit = generated_logits[0].cpu().numpy()
prob = softmax([
    firstLogit[0, unsafeTokenId],
    firstLogit[0, safeTokenId],
])
print(prob)  # first entry is the score for the 'unsafe' token.
```
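The two values printed above are the softmax-normalized scores for the `unsafe` and `safe` tokens at the first generated position. A minimal sketch of turning them into a binary label (the 0.5 threshold is an arbitrary choice for illustration, not a tuned value):

```python
# prob[0] is the probability mass on the 'unsafe' token relative to 'safe'.
unsafe_score = float(prob[0])

# Hypothetical decision rule; pick a threshold appropriate for your use case.
label = "unsafe" if unsafe_score >= 0.5 else "safe"
print(label, unsafe_score)
```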
|
|
|
## License
|
|
|
Apache 2.0 |