---
license: mit
pipeline_tag: text-generation
tags:
- chemistry
---
# Chepybara-7B-Chat: Specialised LLM for Chemistry and Molecule Science

Chepybara-7B-Chat is the first open-source specialised LLM for chemistry and molecule science, built on InternLM-2.
## News

- Chepybara online demo released: https://chemllm.org/ [2024-1-18]
- Chepybara-7B-Chat ver. 1.0 open-sourced. [2024-1-17]
## Usage

Try the [online demo](https://chemllm.org/) instantly, or run the model locally.

Install transformers:

```shell
pip install transformers
```

Load Chepybara-7B-Chat and run:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "AI4Chem/Chepybara-7B-Chat"

# Load the model in half precision on the GPU.
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.float16, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)

prompt = "What is the molecule of Ibuprofen?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,  # note: top_k=1 restricts sampling to the single most likely token
    temperature=0.9,
    max_new_tokens=500,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.eos_token_id
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
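For interactive use, tokens can be printed as they are generated instead of waiting for the full completion. A minimal sketch using the `TextStreamer` helper from `transformers`, assuming `model`, `tokenizer`, `inputs`, and `generation_config` are already set up as in the snippet above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced; skip_prompt
# suppresses echoing the input prompt back to the console.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**inputs, generation_config=generation_config, streamer=streamer)
```

`generate` still returns the full output tensor, so the decoded text can also be captured afterwards as before.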
## Dataset

| Section | Dataset | Link |
|---|---|---|
| Pretrain Dataset | ChemPile-2T | |
| SFT Dataset | ChemData-7M | |
| Benchmark Dataset | ChemTest-12K | |
| DPO Dataset | ChemPref-10k | |
## Acknowledgements

....
## Disclaimer

## Demo

## Contact
[AI4Physics Science, Shanghai AI Lab](mailto:[email protected])