Nemotron-3-8B-Chat-4k-SFT

Model Overview

License

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement.

Description

Nemotron-3-8B-Chat-4k-SFT is a large language model built on an 8B base model and fine-tuned for instruction following with Supervised Fine-Tuning (SFT). It accepts inputs with a context length of up to 4,096 tokens. Enterprises can continue SFT with their own datasets, apply Reinforcement Learning from Human Feedback (RLHF), or use SteerLM.

Nemotron-3-8B-Chat-4k-SFT is part of Nemotron-3, a family of enterprise-ready generative text models compatible with the NVIDIA NeMo Framework. For other models in this collection, see the collections page.

NVIDIA NeMo is an end-to-end, cloud-native framework to build, customize, and deploy generative AI models anywhere. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI. To get access to NeMo Framework, please sign up at this link.

References

Announcement Blog

Model Architecture

Architecture Type: Transformer

Network Architecture: Generative Pre-Trained Transformer (GPT-3)

Prompt Format

Note: For Nemotron-3-8B-Chat-4k-SFT we recommend keeping the system prompt empty.

Single Turn

<extra_id_0>System

<extra_id_1>User
{prompt}
<extra_id_1>Assistant

Multi-Turn or Few-shot

<extra_id_0>System

<extra_id_1>User
{prompt 1}
<extra_id_1>Assistant
{response 1}
<extra_id_1>User
{prompt 2}
<extra_id_1>Assistant
{response 2}
...
<extra_id_1>User
{prompt N}
<extra_id_1>Assistant

Example prompt formation code

PROMPT_TEMPLATE = """<extra_id_0>System
{system}
<extra_id_1>User
{prompt}
<extra_id_1>Assistant
"""
system = ""
prompt = "Write a poem on NVIDIA in the style of Shakespeare"

prompt = PROMPT_TEMPLATE.format(prompt=prompt, system=system)
print(prompt)
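
The multi-turn format shown earlier can be assembled the same way. The following is a minimal sketch, not part of the official tooling: the helper name `build_chat_prompt` and its arguments are hypothetical, and it assumes each turn sits on its own line exactly as in the templates above. With an empty history it produces the same string as the single-turn template.

```python
from typing import List, Tuple

SYSTEM_TAG = "<extra_id_0>System"
TURN_TAG = "<extra_id_1>"


def build_chat_prompt(
    history: List[Tuple[str, str]],
    next_prompt: str,
    system: str = "",
) -> str:
    """Build a multi-turn prompt (hypothetical helper).

    history     -- earlier (user prompt, assistant response) pairs, oldest first
    next_prompt -- the new user message the model should answer
    system      -- system prompt; the card recommends leaving it empty
    """
    lines = [SYSTEM_TAG, system]
    for user_text, assistant_text in history:
        lines += [f"{TURN_TAG}User", user_text,
                  f"{TURN_TAG}Assistant", assistant_text]
    # End with an open Assistant turn so the model continues from there.
    lines += [f"{TURN_TAG}User", next_prompt, f"{TURN_TAG}Assistant", ""]
    return "\n".join(lines)


chat_prompt = build_chat_prompt(
    history=[("What is NeMo?", "NeMo is a framework for generative AI models.")],
    next_prompt="Summarize that in five words.",
)
print(chat_prompt)
```

Keeping the history as a list of pairs makes it straightforward to drop the oldest turns when a conversation approaches the 4,096-token context limit.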

Software Integration

Runtime Engine(s): NVIDIA AI Enterprise

Toolkit: NeMo Framework

To get access to NeMo Framework, please sign up at this link. See the NeMo inference container documentation for details on how to set up and deploy an inference server with NeMo.

Sample Inference Code:

from nemo.deploy import NemoQuery

# In this case, we run inference on the same machine
nq = NemoQuery(url="localhost:8000", model_name="Nemotron-3-8B-Chat-4K-SFT")

# See above for prompt format
output = nq.query_llm(prompts=[prompt], max_output_token=200, top_k=1, top_p=0.0, temperature=0.1)

# NOTE: Chat models require post-processing the output since the `NemoQuery` API
# does not support stopping generation on the special <extra_id_1> token.
output = [[s.split("<extra_id_1>", 1)[0].strip() for s in out] for out in output]

print(output)
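
The post-processing step can be tried without a running server. The sketch below wraps the same split-and-strip logic in two hypothetical helpers (`truncate_at_turn_token` and `postprocess` are illustrative names, not part of `nemo.deploy`) and applies it to a made-up raw output. It assumes the output is a list of lists of strings, as the list comprehension above implies.

```python
from typing import List

TURN_TOKEN = "<extra_id_1>"


def truncate_at_turn_token(text: str, stop: str = TURN_TOKEN) -> str:
    """Keep only the text before the first turn token, without surrounding whitespace."""
    return text.split(stop, 1)[0].strip()


def postprocess(batch: List[List[str]]) -> List[List[str]]:
    """Apply the truncation to every string in a nested list of generations."""
    return [[truncate_at_turn_token(s) for s in out] for out in batch]


# Made-up example: the model kept generating past its own turn.
raw_output = [["Shall I compare thee to a GPU?\n<extra_id_1>User\nAnother one, please"]]
cleaned = postprocess(raw_output)
print(cleaned)  # [['Shall I compare thee to a GPU?']]
```

Text that contains no turn token passes through unchanged apart from the whitespace strip.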

Supported Hardware:

H100
A100 80GB, A100 40GB

Model Version(s)

Nemotron-3-8B-chat-4k-sft-BF16-1

Dataset

NVIDIA models are trained on a diverse set of public and proprietary datasets. This model was trained on a dataset containing 3.5 trillion tokens of text, covering 53 human languages and 37 programming languages. NVIDIA is committed to the responsible development of large language models and conducts reviews of all datasets included in training.

Evaluation

MT Bench Score

Category	Score
Total	5.16
Writing	6.3
Roleplay	6
Extraction	4.75
STEM	6.6
Humanities	9
Reasoning	4.35
Math	1.6
Coding	2.7
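
The Total row is consistent with the unweighted mean of the eight category scores. This is an observation from the numbers in the table, not something the card states explicitly; a quick check:

```python
# Category scores copied from the MT Bench table above.
category_scores = {
    "Writing": 6.3,
    "Roleplay": 6.0,
    "Extraction": 4.75,
    "STEM": 6.6,
    "Humanities": 9.0,
    "Reasoning": 4.35,
    "Math": 1.6,
    "Coding": 2.7,
}

# Unweighted mean across categories (assumed aggregation, see note above).
mean_score = sum(category_scores.values()) / len(category_scores)
print(round(mean_score, 2))  # 5.16
```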

Intended use

The 8B-Chat-SFT model is best for users who want to apply further alignment training or their own RLHF.

Ethical use

Technology can have a profound impact on people and the world, and NVIDIA is committed to enabling trust and transparency in AI development. NVIDIA encourages users to adopt principles of AI ethics and trustworthiness to guide their business decisions by following the guidelines in the NVIDIA AI Foundation Models Community License Agreement.

Limitations

The model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when given toxic prompts.
The model may also generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, and it may produce socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive.
