---
license: mit
pipeline_tag: text-generation
tags:
- chemistry
---
# ChemLLM-7B-Chat: Specialised LLM for Chemistry and Molecule Science
ChemLLM-7B-Chat is the first open-source specialised LLM for chemistry and molecule science, built on InternLM-2 with ❤.

## News
- News report from [Shanghai AI Lab](https://mp.weixin.qq.com/s/u-i7lQxJzrytipek4a87fw). [2024-1-26]
- Chepybara online demo ver 1.0 released: https://chemllm.org/ [2024-1-18]
- ChemLLM-7B-Chat ver 1.0 open-sourced. [2024-1-17]
- Chepybara Demo ver 0.5 and [MoE model](https://huggingface.co/AI4Chem/Zephyr-8x7b) released. [2023-12-24]
- Chepybara Demo ver 0.2 released. [2023-12-9]
## Usage
Try the [online demo](https://chemllm.org/) instantly, or run the model locally.

Install `transformers`:
```
pip install transformers
```
Load `ChemLLM-7B-Chat` and run it:
```
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "AI4Chem/ChemLLM-7B-Chat"

# trust_remote_code=True is required because the model ships custom code.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id, trust_remote_code=True)

prompt = "What is Molecule of Ibuprofen?"

# Move the inputs to the device that device_map="auto" selected for the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,  # keeps only the most likely token, so output is effectively greedy
    temperature=0.9,
    max_new_tokens=500,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
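The generation settings above combine `do_sample=True` with `top_k=1`. The sketch below is a plain-Python illustration, not the `transformers` implementation, of why that combination always picks the single most likely token regardless of `temperature`. The `sample_top_k` helper and the four logit values are made up for this example.

```python
import math
import random

def sample_top_k(logits, top_k, temperature, rng):
    """Sample a token index after temperature scaling and top-k filtering."""
    # Scale logits by temperature (higher temperature flattens the distribution).
    scaled = [x / temperature for x in logits]
    # Keep only the indices of the top_k largest scaled logits.
    keep = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax weights over the surviving logits (max subtracted for stability).
    peak = max(scaled[i] for i in keep)
    weights = [math.exp(scaled[i] - peak) for i in keep]
    # Draw one index according to the renormalised weights.
    return rng.choices(keep, weights=weights, k=1)[0]

logits = [1.2, 3.4, 0.5, 2.9]  # hypothetical scores for a 4-token vocabulary
rng = random.Random(0)

# With top_k=1 only the argmax (index 1) survives filtering, so every draw is identical.
draws = {sample_top_k(logits, top_k=1, temperature=0.9, rng=rng) for _ in range(100)}
print(draws)  # {1}
```

To get varied outputs, one would likely raise `top_k` (or drop it and rely on `temperature` or `top_p`); which values suit ChemLLM-7B-Chat is not stated here and would need experimentation.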
## Dataset

| Section           | Dataset      | Link |
| ----------------- | ------------ | ---- |
| Pretrain Dataset  | ChemPile-2T  |      |
| SFT Dataset       | ChemData-7M  |      |
| Benchmark Dataset | ChemTest-12K |      |
| DPO Dataset       | ChemPref-10k |      |

## Results
### MMLU Highlights

| dataset                | ChatGLM3-6B | Qwen-7B | LLaMA-2-7B | Mistral-7B | InternLM2-7B-Chat | ChemLLM-7B-Chat |
| ---------------------- | ----------- | ------- | ---------- | ---------- | ----------------- | ----------------- |
| college chemistry      | 43.0        | 39.0    | 27.0       | 40.0       | 43.0              | 47.0              |
| college mathematics    | 28.0        | 33.0    | 33.0       | 30.0       | 36.0              | 41.0              |
| college physics        | 32.4        | 35.3    | 25.5       | 34.3       | 41.2              | 48.0              |
| formal logic           | 35.7        | 43.7    | 24.6       | 40.5       | 34.9              | 47.6              |
| moral scenarios        | 26.4        | 35.0    | 24.1       | 39.9       | 38.6              | 44.3              |
| humanities average     | 62.7        | 62.5    | 51.7       | 64.5       | 66.5              | 68.6              |
| stem average           | 46.5        | 45.8    | 39.0       | 47.8       | 52.2              | 52.6              |
| social science average | 68.2        | 65.8    | 55.5       | 68.1       | 69.7              | 71.9              |
| other average          | 60.5        | 60.3    | 51.3       | 62.4       | 63.2              | 65.2              |
| mmlu                   | 58.0        | 57.1    | 48.2       | 59.2       | 61.7              | 63.2              |
*(Results from OpenCompass)
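As a quick reading aid for the table above, the following sketch computes how many points ChemLLM-7B-Chat gains over its base model, InternLM2-7B-Chat, on a few rows. The scores are copied from the table; the variable names are only illustrative.

```python
# (InternLM2-7B-Chat, ChemLLM-7B-Chat) scores copied from the MMLU table.
scores = {
    "college chemistry": (43.0, 47.0),
    "college mathematics": (36.0, 41.0),
    "college physics": (41.2, 48.0),
    "formal logic": (34.9, 47.6),
    "moral scenarios": (38.6, 44.3),
    "mmlu": (61.7, 63.2),
}

# Absolute gain in points, rounded to one decimal to hide float noise.
gains = {name: round(chem - base, 1) for name, (base, chem) in scores.items()}

# Print rows from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} +{gain:.1f}")
```

On these rows the largest gain is on formal logic (+12.7) and the overall MMLU gain is +1.5.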

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/dvqKoPi0il6vrnGcSZp9p.png)


### Chemical Benchmark

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/qFl2h0fTXYTjQsDZXjSx8.png)
*(Scores judged by ChatGPT-4-turbo)

### Professional Translation

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/kVDK3H8a0802HWYHtlHYP.png)


![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/ERbod2Elccw-k_6tEYZjO.png)


You can try it [online](https://chemllm.org/).

## Disclaimer

## Demo
[Agent Chepybara](https://chemllm.org/)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/vsA5MJVP7-XmBp6uFs3tV.png)

## Contact
[AI4Physics Science, Shanghai AI Lab](mailto:support@chemllm.org)