---
license: mit
pipeline_tag: text-generation
tags:
- chemistry
---
# ChemLLM-7B-Chat: Specialised LLM for Chemistry and Molecule Science

ChemLLM-7B-Chat is the first open-source specialised LLM for chemistry and molecule science, built upon InternLM-2 with ❤.
## News

- News report from Shanghai AI Lab [2024-1-26]
- Chepybara online demo ver 1.0 released: https://chemllm.org/ [2024-1-18]
- ChemLLM-7B-Chat ver 1.0 open-sourced [2024-1-17]
- Chepybara Demo ver 0.5 and the MoE model released [2023-12-24]
- Chepybara Demo ver 0.2 released [2023-12-9]
## Usage

Try the online demo at https://chemllm.org/ instantly, or install `transformers`:

```shell
pip install transformers
```

Then load ChemLLM-7B-Chat and run:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "AI4Chem/ChemLLM-7B-Chat"

# Load the weights in half precision and let device_map="auto" place them on the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id, trust_remote_code=True)

prompt = "What is Molecule of Ibuprofen?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,          # with top_k=1, sampling is effectively greedy
    temperature=0.9,
    max_new_tokens=500,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
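Loading the full fp16 weights of a 7B model needs roughly 14 GB of GPU memory. On smaller GPUs, a 4-bit quantized load is a common alternative; the sketch below is not part of the official instructions and assumes the `bitsandbytes` package is installed (`pip install bitsandbytes`).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name_or_id = "AI4Chem/ChemLLM-7B-Chat"

# 4-bit NF4 quantization with fp16 compute (requires the bitsandbytes package).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id, trust_remote_code=True)
```

Generation then proceeds exactly as in the example above.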
## Dataset

| Section | Dataset | Link |
|---|---|---|
| Pretrain Dataset | ChemPile-2T | |
| SFT Dataset | ChemData-7M | |
| Benchmark Dataset | ChemTest-12K | |
| DPO Dataset | ChemPref-10k | |
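If these corpora are released on the Hugging Face Hub, they can be pulled with the `datasets` library. This is only a minimal sketch; the repository ID `AI4Chem/ChemData-7M` is a hypothetical placeholder, since the actual Hub links are not listed above.

```python
from datasets import load_dataset

# Hypothetical repository ID -- replace with the real Hub ID of the SFT corpus once published.
sft_data = load_dataset("AI4Chem/ChemData-7M", split="train")
print(sft_data[0])
```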
## Results

### MMLU Highlights
dataset | ChatGLM3-6B | Qwen-7B | LLaMA-2-7B | Mistral-7B | InternLM2-7B-Chat | ChemLLM-7B-Chat |
---|---|---|---|---|---|---|
college chemistry | 43.0 | 39.0 | 27.0 | 40.0 | 43.0 | 47.0 |
college mathematics | 28.0 | 33.0 | 33.0 | 30.0 | 36.0 | 41.0 |
college physics | 32.4 | 35.3 | 25.5 | 34.3 | 41.2 | 48.0 |
formal logic | 35.7 | 43.7 | 24.6 | 40.5 | 34.9 | 47.6 |
moral scenarios | 26.4 | 35.0 | 24.1 | 39.9 | 38.6 | 44.3 |
humanities average | 62.7 | 62.5 | 51.7 | 64.5 | 66.5 | 68.6 |
stem average | 46.5 | 45.8 | 39.0 | 47.8 | 52.2 | 52.6 |
social science average | 68.2 | 65.8 | 55.5 | 68.1 | 69.7 | 71.9 |
other average | 60.5 | 60.3 | 51.3 | 62.4 | 63.2 | 65.2 |
mmlu | 58.0 | 57.1 | 48.2 | 59.2 | 61.7 | 63.2 |
*(Evaluated with OpenCompass)*
### Chemical Benchmark

*(Scores judged by ChatGPT-4-turbo)*

### Professional Translation
You can try it online.
## Disclaimer

## Demo

## Contact
AI4Physics Science, Shanghai AI Lab: [email protected]