---
license: mit
pipeline_tag: text-generation
tags:
- chemistry
---
# ChemLLM-7B-Chat: Specialised LLM for Chemistry and Molecule Science
ChemLLM-7B-Chat is the first open-source specialised LLM for chemistry and molecular science, built on InternLM-2 with ❤.

## News
- News report from [Shanghai AI Lab](https://mp.weixin.qq.com/s/u-i7lQxJzrytipek4a87fw). [2024-1-26]
- Chepybara online demo ver 1.0 released: https://chemllm.org/ [2024-1-18]
- ChemLLM-7B-Chat ver 1.0 open-sourced. [2024-1-17]
- Chepybara Demo ver 0.5 and [MoE model](https://huggingface.co/AI4Chem/Zephyr-8x7b) released. [2023-12-24]
- Chepybara Demo ver 0.2 released. [2023-12-9]
## Usage
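Try the [online demo](https://chemllm.org/) instantly, or run the model locally.

Install `transformers` (the example below also imports `torch`, and `device_map="auto"` requires `accelerate`):
```
pip install transformers torch accelerate
```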
Load `ChemLLM-7B-Chat` and run:
```
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "AI4Chem/ChemLLM-7B-Chat"

# trust_remote_code=True allows transformers to run the custom model code shipped with the checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id, trust_remote_code=True)

prompt = "What is Molecule of Ibuprofen?"

# Move the inputs to the device on which device_map="auto" placed the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,  # only the single most likely token is kept, so sampling is effectively greedy
    temperature=0.9,
    max_new_tokens=500,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.eos_token_id
)

outputs = model.generate(**inputs, generation_config=generation_config)
# The decoded text contains the prompt followed by the generated answer
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
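
For a causal language model, `generate` returns the prompt tokens followed by the newly generated ones, so the string printed above begins with the prompt itself. The following is a minimal sketch of trimming the prompt off, shown on plain Python lists with made-up token ids so it runs without downloading the model. `strip_prompt` is a hypothetical helper for illustration and is not part of `transformers`.

```python
from typing import List, Sequence


def strip_prompt(output_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Return only the token ids that come after the prompt.

    Hypothetical helper, not a transformers API.
    """
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Toy example: the token ids below are invented for illustration only.
prompt_ids = [1, 101, 102]
output_ids = prompt_ids + [201, 202, 2]

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)
```

With the real tensors from the snippet above, the same idea would be `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.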
## Dataset

| Section           | Dataset      | Link |
| ----------------- | ------------ | ---- |
| Pretrain Dataset  | ChemPile-2T  |      |
| SFT Dataset       | ChemData-7M  |      |
| Benchmark Dataset | ChemTest-12K |      |
| DPO Dataset       | ChemPref-10k |      |

## Results
### MMLU Highlights

| dataset                | ChatGLM3-6B | Qwen-7B | LLaMA-2-7B | Mistral-7B | InternLM2-7B-Chat | ChemLLM-7B-Chat |
| ---------------------- | ----------- | ------- | ---------- | ---------- | ----------------- | ----------------- |
| college chemistry      | 43.0        | 39.0    | 27.0       | 40.0       | 43.0              | 47.0              |
| college mathematics    | 28.0        | 33.0    | 33.0       | 30.0       | 36.0              | 41.0              |
| college physics        | 32.4        | 35.3    | 25.5       | 34.3       | 41.2              | 48.0              |
| formal logic           | 35.7        | 43.7    | 24.6       | 40.5       | 34.9              | 47.6              |
| moral scenarios        | 26.4        | 35.0    | 24.1       | 39.9       | 38.6              | 44.3              |
| humanities average     | 62.7        | 62.5    | 51.7       | 64.5       | 66.5              | 68.6              |
| stem average           | 46.5        | 45.8    | 39.0       | 47.8       | 52.2              | 52.6              |
| social science average | 68.2        | 65.8    | 55.5       | 68.1       | 69.7              | 71.9              |
| other average          | 60.5        | 60.3    | 51.3       | 62.4       | 63.2              | 65.2              |
| mmlu                   | 58.0        | 57.1    | 48.2       | 59.2       | 61.7              | 63.2              |

*(OpenCompass)*
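
To make the table easier to read, the sketch below computes the per-row gain of ChemLLM-7B-Chat over InternLM2-7B-Chat. The scores are copied from the two rightmost columns of the table. Treating InternLM2-7B-Chat as the baseline is an assumption made here for illustration, based on ChemLLM being built on InternLM-2. The names `scores` and `gains` are illustrative only.

```python
# (InternLM2-7B-Chat, ChemLLM-7B-Chat) scores copied from the MMLU table above.
scores = {
    "college chemistry": (43.0, 47.0),
    "college mathematics": (36.0, 41.0),
    "college physics": (41.2, 48.0),
    "formal logic": (34.9, 47.6),
    "moral scenarios": (38.6, 44.3),
    "humanities average": (66.5, 68.6),
    "stem average": (52.2, 52.6),
    "social science average": (69.7, 71.9),
    "other average": (63.2, 65.2),
    "mmlu": (61.7, 63.2),
}


def gains(table):
    """Return ChemLLM score minus baseline score per row, rounded to 0.1."""
    return {name: round(chem - base, 1) for name, (base, chem) in table.items()}


deltas = gains(scores)
best = max(deltas, key=deltas.get)  # row with the largest gain

# Print rows from largest to smallest gain.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {delta:+.1f}")
```

On these numbers the largest gain is on formal logic and the smallest on the STEM average, while the overall MMLU score rises by 1.5 points.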

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/dvqKoPi0il6vrnGcSZp9p.png)


### Chemical Benchmark

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/qFl2h0fTXYTjQsDZXjSx8.png)
*(Scores judged by ChatGPT-4-turbo)*

### Professional Translation

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/kVDK3H8a0802HWYHtlHYP.png)


![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/ERbod2Elccw-k_6tEYZjO.png)


You can try it [online](https://chemllm.org/).

## Disclaimer

## Demo
[Agent Chepybara](https://chemllm.org/)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/vsA5MJVP7-XmBp6uFs3tV.png)

## Contact
AI4Physics Science, Shanghai AI Lab: [email protected]