---
license: mit
pipeline_tag: text-generation
tags:
- chemistry
---
# Chepybara-7B-Chat: Specialised LLM for Chemistry and Molecule Science
Chepybara-7B-Chat is the first open-source specialised LLM for chemistry and molecule science, built on InternLM-2.
## News
- Chepybara online demo released: https://chemllm.org/ [2024-1-18]
- Chepybara-7B-Chat v1.0 open-sourced. [2024-1-17]
## Usage
Try the [online demo](https://chemllm.org/) instantly, or run the model locally as follows.
Install `transformers` and `torch`,
```
pip install transformers torch
```
Load `Chepybara-7B-Chat` and run,
```
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "AI4Chem/Chepybara-7B-Chat"

# trust_remote_code is needed because InternLM-2-based models ship their own
# modeling code on the Hub rather than using a built-in transformers class.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id, trust_remote_code=True)

prompt = "What is the molecule of Ibuprofen?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,          # note: top_k=1 makes sampling effectively greedy
    temperature=0.9,
    max_new_tokens=500,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
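For multi-turn conversation, InternLM-2-based chat models usually ship a chat template with the tokenizer. A minimal sketch, assuming `Chepybara-7B-Chat` does the same (not confirmed by this card) and reusing `model`, `tokenizer`, and `generation_config` from above:
```
# Sketch: assumes the tokenizer provides a chat template, as is typical for
# InternLM-2-based chat models.
messages = [
    {"role": "user", "content": "What is the SMILES of Ibuprofen?"},
]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # append the assistant-turn marker
    return_tensors="pt",
).to("cuda")
chat_outputs = model.generate(chat_inputs, generation_config=generation_config)
# decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))
```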
## Dataset
| Section           | Dataset      | Link |
| ----------------- | ------------ | ---- |
| Pretrain Dataset  | ChemPile-2T  |      |
| SFT Dataset       | ChemData-7M  |      |
| Benchmark Dataset | ChemTest-12K |      |
| DPO Dataset       | ChemPref-10k |      |
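Once these datasets are published on the Hugging Face Hub, they could be loaded with the `datasets` library. A sketch; the repository id below is hypothetical, since the Link column above is still empty:
```
# Hypothetical example: the repo id "AI4Chem/ChemData-7M" is assumed,
# not confirmed by this card.
from datasets import load_dataset

sft_data = load_dataset("AI4Chem/ChemData-7M", split="train")
print(sft_data[0])
```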
## Acknowledgements
....
## Disclaimer
## Demo
https://chemllm.org/

## Contact
[AI4Physics Science, Shanghai AI Lab]([email protected])