metadata

license: mit
pipeline_tag: text-generation
tags:
  - chemistry

ChemLLM-7B-Chat: Specialised LLM for Chemistry and Molecule Science

ChemLLM-7B-Chat is the first open-source specialised LLM for chemistry and molecular science, built on InternLM-2 with ❤.

News

News report from Shanghai AI Lab [2024-1-26]
Chepybara online demo ver 1.0 released. https://chemllm.org/ [2024-1-18]
ChemLLM-7B-Chat ver 1.0 open-sourced. [2024-1-17]
Chepybara Demo ver 0.5 and MoE model released. [2023-12-24]
Chepybara Demo ver 0.2 released. [2023-12-9]

Usage

Try the online demo instantly, or...

Install transformers,

pip install transformers

Load ChemLLM-7B-Chat and run,

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "AI4Chem/ChemLLM-7B-Chat"

model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id, trust_remote_code=True)

prompt = "What is Molecule of Ibuprofen?"

# Move the inputs to whichever device device_map="auto" placed the model on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,
    temperature=0.9,
    max_new_tokens=500,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.eos_token_id
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
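A note on the sampling settings above: with `top_k=1`, only the single highest-scoring token survives filtering, so `do_sample=True` and `temperature=0.9` have no effect on which token is chosen and decoding behaves like greedy search. The sketch below illustrates this with a simplified, pure-Python model of temperature plus top-k sampling. It is not the transformers implementation; the function name `sample_next_token` and the toy logits are made up for illustration, and the repetition penalty is left out.

```python
import math
import random


def sample_next_token(logits, top_k, temperature, rng):
    """Pick one token index from raw logits using temperature + top-k sampling."""
    # Scale logits by temperature (a lower temperature sharpens the distribution).
    scaled = [x / temperature for x in logits]
    # Keep only the top_k highest-scoring token indices.
    kept = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax weights over the kept logits (subtract the max for numerical stability).
    peak = max(scaled[i] for i in kept)
    weights = [math.exp(scaled[i] - peak) for i in kept]
    # Draw one index in proportion to its weight.
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [0.1, 2.5, 1.7, -0.3]  # hypothetical scores for a 4-token vocabulary
rng = random.Random(0)

# top_k=1: every draw returns the argmax, regardless of temperature.
greedy_like = {sample_next_token(toy_logits, 1, 0.9, rng) for _ in range(100)}
# top_k=3: several different tokens can be drawn.
wider = {sample_next_token(toy_logits, 3, 0.9, rng) for _ in range(100)}

print(greedy_like)  # {1}
print(sorted(wider))
```

If more varied output is wanted, raising `top_k` in the `GenerationConfig` is the setting to change; the value used above makes the output effectively deterministic.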

Dataset

Section	Dataset	Link
Pretrain Dataset	ChemPile-2T
SFT Dataset	ChemData-7M
Benchmark Dataset	ChemTest-12K
DPO Dataset	ChemPref-10k

Results

MMLU Highlights

dataset	ChatGLM3-6B	Qwen-7B	LLaMA-2-7B	Mistral-7B	InternLM2-7B-Chat	ChemLLM-7B-Chat
college chemistry	43.0	39.0	27.0	40.0	43.0	47.0
college mathematics	28.0	33.0	33.0	30.0	36.0	41.0
college physics	32.4	35.3	25.5	34.3	41.2	48.0
formal logic	35.7	43.7	24.6	40.5	34.9	47.6
moral scenarios	26.4	35.0	24.1	39.9	38.6	44.3
humanities average	62.7	62.5	51.7	64.5	66.5	68.6
stem average	46.5	45.8	39.0	47.8	52.2	52.6
social science average	68.2	65.8	55.5	68.1	69.7	71.9
other average	60.5	60.3	51.3	62.4	63.2	65.2
mmlu	58.0	57.1	48.2	59.2	61.7	63.2
*(OpenCompass)
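To make the size of the differences easier to read, the sketch below computes ChemLLM-7B-Chat's margin over the InternLM2-7B-Chat column for each row. The scores are copied from the table above; choosing InternLM2-7B-Chat as the comparison column is an assumption made here because the model is built on InternLM-2.

```python
# Scores copied from the MMLU table: (InternLM2-7B-Chat, ChemLLM-7B-Chat).
scores = {
    "college chemistry": (43.0, 47.0),
    "college mathematics": (36.0, 41.0),
    "college physics": (41.2, 48.0),
    "formal logic": (34.9, 47.6),
    "moral scenarios": (38.6, 44.3),
    "humanities average": (66.5, 68.6),
    "stem average": (52.2, 52.6),
    "social science average": (69.7, 71.9),
    "other average": (63.2, 65.2),
    "mmlu": (61.7, 63.2),
}

# Point difference per row, rounded to one decimal like the table itself.
gains = {name: round(chem - base, 1) for name, (base, chem) in scores.items()}

# Print rows from largest to smallest margin.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {gain:+.1f}")
```

On these numbers the largest margin is on formal logic (+12.7) and the smallest is on the STEM average (+0.4); the overall MMLU margin is +1.5 points.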

Chemical Benchmark

*(Scores judged by ChatGPT-4-turbo)

Professional Translation

You can try it online.

Disclaimer

Demo

Agent Chepybara

Contact

(AI4Physics Science, Shanghai AI Lab)[[email protected]]