---
license: mit
pipeline_tag: text-generation
tags:
- chemistry
---
# Chepybara-7B-Chat: Specialised LLM for Chemistry and Molecule Science
Chepybara-7B-Chat is the first open-source specialised LLM for chemistry and molecular science, built on InternLM-2.
## News
- Chepybara online demo released: https://chemllm.org/ [2024-01-18]
- Chepybara-7B-Chat v1.0 open-sourced. [2024-01-17]
## Usage
Try the [online demo](https://chemllm.org/) instantly, or run it locally:
Install `transformers`, plus `torch` and `accelerate` (the latter is required by `device_map` in the snippet below):

```bash
pip install transformers torch accelerate
```

Load `Chepybara-7B-Chat` and run:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "AI4Chem/Chepybara-7B-Chat"

# Load the model in half precision onto the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id, torch_dtype=torch.float16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)

prompt = "What is the molecule of ibuprofen?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,  # with top_k=1, sampling is effectively greedy despite do_sample=True
    temperature=0.9,
    max_new_tokens=500,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
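Chat models derived from InternLM-2 are typically queried through a chat template rather than a bare prompt. Below is a minimal sketch, assuming the tokenizer ships with a chat template (if it does not, `apply_chat_template` will raise an error and the plain-prompt snippet above still applies); the `messages` content is illustrative, and `model`, `tokenizer`, and `generation_config` come from the snippet above.

```python
# Minimal chat-style usage, assuming the tokenizer bundles a chat template.
messages = [
    {"role": "user", "content": "What is the molecular formula of ibuprofen?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids, generation_config=generation_config)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```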
## Dataset
| Section           | Dataset      | Link |
| ----------------- | ------------ | ---- |
| Pretrain Dataset  | ChemPile-2T  |      |
| SFT Dataset       | ChemData-7M  |      |
| Benchmark Dataset | ChemTest-12K |      |
| DPO Dataset       | ChemPref-10K |      |
## Acknowledgements
....
## Disclaimer
## Demo
https://chemllm.org/
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64bce15bafd1e46c5504ad38/vsA5MJVP7-XmBp6uFs3tV.png)
## Contact
[AI4Physics Science, Shanghai AI Lab](mailto:[email protected])