---
license: mit
pipeline_tag: text-generation
tags:
- chemistry
---
# ChemLLM-7B-Chat: Specialised LLM for Chemistry and Molecule Science
ChemLLM-7B-Chat, the first open-source specialised LLM for chemistry and molecule science, built on InternLM-2 with ❤.
## News
- News report from [Shanghai AI Lab](https://mp.weixin.qq.com/s/u-i7lQxJzrytipek4a87fw) [2024-1-26]
- [Chepybara online demo](https://chemllm.org/) ver 1.0 released. [2024-1-18]
- ChemLLM-7B-Chat ver 1.0 open-sourced. [2024-1-17]
- Chepybara Demo ver 0.5 and [MoE model](https://huggingface.co/AI4Chem/Zephyr-8x7b) released. [2023-12-24]
- Chepybara Demo ver 0.2 released. [2023-12-9]
## Usage
Try [online demo](https://chemllm.org/) instantly, or...
Install `transformers` (and `accelerate`, which `device_map="auto"` requires),
```
pip install transformers accelerate
```
Load `ChemLLM-7B-Chat` and run,
```
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "AI4Chem/ChemLLM-7B-Chat"

# Load in half precision; device_map="auto" lets accelerate place weights on available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id, trust_remote_code=True)

prompt = "What is the molecule of ibuprofen?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,          # top_k=1 makes sampling effectively greedy despite temperature
    temperature=0.9,
    max_new_tokens=500,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
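Since the model is chat-tuned, wrapping the question in the tokenizer's chat template usually gives better-formatted answers. A minimal sketch, assuming the bundled tokenizer ships a chat template (not guaranteed for every revision) and reusing `model`, `tokenizer`, and `generation_config` from above:
```
# Assumption: the trust_remote_code tokenizer provides a chat template;
# no fallback is handled here.
messages = [{"role": "user", "content": "What is the molecule of ibuprofen?"}]
chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(chat_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, generation_config=generation_config)

# Print only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```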
## Dataset
| Section           | Dataset      | Link |
| ----------------- | ------------ | ---- |
| Pretrain Dataset  | ChemPile-2T  |      |
| SFT Dataset       | ChemData-7M  |      |
| Benchmark Dataset | ChemTest-12K |      |
| DPO Dataset       | ChemPref-10k |      |
## Results
### MMLU Highlights
| dataset | ChatGLM3-6B | Qwen-7B | LLaMA-2-7B | Mistral-7B | InternLM2-7B-Chat | ChemLLM-7B-Chat |
| ---------------------- | ----------- | ------- | ---------- | ---------- | ----------------- | ----------------- |
| college chemistry | 43.0 | 39.0 | 27.0 | 40.0 | 43.0 | 47.0 |
| college mathematics | 28.0 | 33.0 | 33.0 | 30.0 | 36.0 | 41.0 |
| college physics | 32.4 | 35.3 | 25.5 | 34.3 | 41.2 | 48.0 |
| formal logic | 35.7 | 43.7 | 24.6 | 40.5 | 34.9 | 47.6 |
| moral scenarios | 26.4 | 35.0 | 24.1 | 39.9 | 38.6 | 44.3 |
| humanities average | 62.7 | 62.5 | 51.7 | 64.5 | 66.5 | 68.6 |
| stem average | 46.5 | 45.8 | 39.0 | 47.8 | 52.2 | 52.6 |
| social science average | 68.2 | 65.8 | 55.5 | 68.1 | 69.7 | 71.9 |
| other average | 60.5 | 60.3 | 51.3 | 62.4 | 63.2 | 65.2 |
| mmlu | 58.0 | 57.1 | 48.2 | 59.2 | 61.7 | 63.2 |
*(Results evaluated with OpenCompass)

### Chemical Benchmark

*(Scores judged by GPT-4-turbo)
### Professional Translation


You can try it [online](https://chemllm.org).
## Disclaimer
## Demo
[Agent Chepybara](https://chemllm.org/)

## Contact
[AI4Physics Science, Shanghai AI Lab](mailto:[email protected])