---
license: mit
pipeline_tag: text-generation
tags:
- chemistry
---
# Chepybara-7B-Chat: Specialised LLM for Chemistry and Molecule Science
Chepybara-7B-Chat is the first open-source specialised LLM for chemistry and molecule science, built on InternLM-2.
## News
- Chepybara online demo released: https://chemllm.org/ [2024-1-18]
- Chepybara-7B-Chat v1.0 open-sourced. [2024-1-17]
## Usage
Try the [online demo](https://chemllm.org/) instantly, or run the model locally as follows.
Install `transformers` and `torch`,
```
pip install transformers torch
```
Load `Chepybara-7B-Chat` and run,
```
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "AI4Chem/Chepybara-7B-Chat"

# trust_remote_code is needed because InternLM-2-based models ship their own
# modeling code on the Hub rather than using a built-in transformers class.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id, trust_remote_code=True)

prompt = "What is the molecule of Ibuprofen?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,          # note: top_k=1 makes sampling effectively greedy
    temperature=0.9,
    max_new_tokens=500,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
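For multi-turn conversation, InternLM-2-based chat models usually ship a chat template with the tokenizer. A minimal sketch, assuming `Chepybara-7B-Chat` does the same (not confirmed by this card) and reusing `model`, `tokenizer`, and `generation_config` from above:
```
# Sketch: assumes the tokenizer provides a chat template, as is typical for
# InternLM-2-based chat models.
messages = [
    {"role": "user", "content": "What is the SMILES of Ibuprofen?"},
]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # append the assistant-turn marker
    return_tensors="pt",
).to("cuda")
chat_outputs = model.generate(chat_inputs, generation_config=generation_config)
# decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))
```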
## Dataset
| Section           | Dataset      | Link |
| ----------------- | ------------ | ---- |
| Pretrain Dataset  | ChemPile-2T  |      |
| SFT Dataset       | ChemData-7M  |      |
| Benchmark Dataset | ChemTest-12K |      |
| DPO Dataset       | ChemPref-10k |      |
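Once these datasets are published on the Hugging Face Hub, they could be loaded with the `datasets` library. A sketch; the repository id below is hypothetical, since the Link column above is still empty:
```
# Hypothetical example: the repo id "AI4Chem/ChemData-7M" is assumed,
# not confirmed by this card.
from datasets import load_dataset

sft_data = load_dataset("AI4Chem/ChemData-7M", split="train")
print(sft_data[0])
```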
## Acknowledgements
....
## Disclaimer
## Demo
https://chemllm.org/

## Contact
[AI4Physics Science, Shanghai AI Lab]([email protected])