This model achieved 0.7600 AUROC in the Online Safety Prize Challenge organized by AI Singapore. It is fine-tuned from the OpenChat model. Note that this model is already quantized to 4-bit GPTQ.
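Because the weights are stored in GPTQ format, loading them through transformers needs the GPTQ integration packages. The package list below is an assumption based on the standard transformers GPTQ workflow; pin versions per the transformers GPTQ documentation for your environment:

pip install transformers accelerate optimum auto-gptq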

Training details

This model was trained on a synthetic dataset generated by Apache 2.0-licensed LLMs (Mistral and Mixtral).

How to use

You can use it to classify either text alone or text combined with an extracted visual description (an image caption plus any text embedded in the image).
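For the text + visual description case, the input string follows the format shown in the commented line inside the snippet below. A hypothetical example of composing it (the caption and embedded text are placeholders; producing them with an image-captioning model and OCR is outside the scope of this card):

imageCaption = "two men arguing at a food court"      # hypothetical captioning-model output
textInImage = "this is why we cannot live together"   # hypothetical OCR output
promptText = "Image: {}, Caption: {}".format(imageCaption, textInImage)

The full classification flow is: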

from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.special import softmax  # used below to convert the two logits into probabilities

device = "cuda"

model = AutoModelForCausalLM.from_pretrained(
    'billcai/openchat-model-ospc',
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained('billcai/openchat-model-ospc')

promptText = "<the prompt to be tested>"
# promptText = "Image: <image caption>, Caption: <text within image>"

prompt = """Check if there is unsafe content in the following message. Unsafe content is defined in the following categories.
    <BEGIN UNSAFE CONTENT CATEGORIES>
    Contains hate speech and harmful content. Harmful content centres on divisive and contentious subject matters. Typical areas include:
    - Racial disparities
    - Religious beliefs and practices
    - Sexual orientation
    - Nationalistic sentiments and xenophobia against migrants
    - Socio-economic divides and class struggles
    - Age-related biases
    - Gender discrimination
    - Discrimination based on illnesses and disabilities
    This content may be based on the Singapore context, including the use of non-English languages such as Mandarin, Malay and Tamil.
    <END UNSAFE CONTENT CATEGORIES>

    This is the message you are checking for unsafe content.
    <BEGIN MESSAGE>
    {}
    <END MESSAGE>

    Provide your safety assessment for the message above. Your first line MUST read 'safe' or 'unsafe'.
    """.format(
            promptText
        )

messages = [
    {"role": "system", "content": "You are a helpful assistant deciding whether a piece of content is safe or unsafe based on specific criteria. The content may contain irrelevant words as it has been extracted from an image."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(device)
generation_output = model.generate(
    model_inputs.input_ids,
    max_new_tokens=10,
    temperature=0.1,  # has no effect unless do_sample=True; decoding here is greedy
    output_logits=True,
    return_dict_in_generate=True
)
generated_sequences = generation_output['sequences']
generated_logits = generation_output['logits']
unsafeTokenId = tokenizer.encode('unsafe')[1]  # [1] skips the BOS token prepended by the tokenizer
safeTokenId = tokenizer.encode('safe')[1]
firstLogit = generated_logits[0].cpu().numpy()  # logits for the first generated token
prob = softmax([
    firstLogit[0, unsafeTokenId],
    firstLogit[0, safeTokenId],
])
print(prob)  # first entry is the probability of the 'unsafe' token
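For scoring many messages, the steps above can be wrapped into a single helper. This is a minimal sketch, not part of the original card: it assumes the model and tokenizer objects loaded above, takes the prompt text from above with its {} placeholder left in as prompt_template, and the name unsafe_probability is hypothetical.

from scipy.special import softmax

def unsafe_probability(message, model, tokenizer, prompt_template):
    # Returns the model's probability that `message` is unsafe,
    # read from the logits of the first generated token as above.
    messages = [
        {"role": "system", "content": "You are a helpful assistant deciding whether a piece of content is safe or unsafe based on specific criteria. The content may contain irrelevant words as it has been extracted from an image."},
        {"role": "user", "content": prompt_template.format(message)},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    out = model.generate(
        inputs.input_ids,
        max_new_tokens=1,  # only the first token ('safe'/'unsafe') is needed
        output_logits=True,
        return_dict_in_generate=True,
    )
    first_logit = out.logits[0][0].cpu().numpy()  # vocab-sized logits for step 1
    unsafe_id = tokenizer.encode("unsafe")[1]  # [1] skips the prepended BOS token
    safe_id = tokenizer.encode("safe")[1]
    p_unsafe, _ = softmax([first_logit[unsafe_id], first_logit[safe_id]])
    return float(p_unsafe)

A per-message probability like this is the kind of continuous score that an AUROC metric, as used in the challenge, can be computed over.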

License

Apache 2.0
