Commit b6aad0f
1 Parent(s): 0e55948

Upload 2 files (#11) (354b45d75ec135459f1bc17cfe408e358c31da0f)

Co-authored-by: Jiahuan Zhang <[email protected]>

- README (1).md +49 -0
- USE_POLICY.md +19 -0
README (1).md
ADDED
@@ -0,0 +1,49 @@
---
license: apache-2.0
tags:
- medical
datasets:
- biomed
---
# BioMedGPT-LM-7B

**BioMedGPT-LM-7B** is the first large generative language model in the biomedical domain based on Llama2.
It was fine-tuned from Llama2-7B-Chat on millions of biomedical papers from the [S2ORC corpus](https://github.com/allenai/s2orc/blob/master/README.md). Through this further fine-tuning, BioMedGPT-LM-7B outperforms or is on par with human experts and significantly larger general-purpose foundation models on several biomedical QA benchmarks.

### Training Details

The model was trained with the following hyperparameters:

* Epochs: 5
* Batch size: 192
* Context length: 2048
* Learning rate: 2e-5

BioMedGPT-LM-7B was fine-tuned on over 26 billion tokens highly pertinent to the field of biomedicine. The fine-tuning data were extracted from millions of biomedical papers in the S2ORC corpus, using PubMed Central (PMC) IDs and PubMed IDs as filtering criteria.
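For reference, the hyperparameters above can be gathered into a single configuration dict. This is an illustrative sketch only; the key names and the Hugging Face id for the base model are assumptions, not the authors' actual training code.

```python
# Illustrative fine-tuning configuration mirroring the hyperparameters
# listed above; key names are assumptions, not PharMolix's training code.
training_config = {
    "base_model": "meta-llama/Llama-2-7b-chat-hf",  # assumed HF id for Llama2-7B-Chat
    "epochs": 5,
    "batch_size": 192,
    "context_length": 2048,
    "learning_rate": 2e-5,
}
```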

### Model Developers

PharMolix

### How to Use

BioMedGPT-LM-7B is the generative language model of **[BioMedGPT-10B](https://github.com/PharMolix/OpenBioMed)**, an open-source version of BioMedGPT.
BioMedGPT is an open multimodal generative pre-trained transformer (GPT) for biomedicine, which bridges the natural language modality and diverse biomedical data modalities via large generative language models.
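Since the card gives no code snippet, here is a minimal, hypothetical usage sketch with Hugging Face `transformers`. The model id `PharMolix/BioMedGPT-LM-7B` and the Llama-2-chat-style `[INST]` prompt format are assumptions (the latter because the model was fine-tuned from Llama2-7B-Chat), not documented specifics.

```python
def build_prompt(question: str) -> str:
    # Llama-2-chat style instruction wrapper; assumed to apply here because
    # BioMedGPT-LM-7B was fine-tuned from Llama2-7B-Chat.
    return f"[INST] {question} [/INST]"

def answer(question: str, model_id: str = "PharMolix/BioMedGPT-LM-7B") -> str:
    # Lazy import so the prompt helper works even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Note the 2048-token context length from the training details above when sizing prompts.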

![The architecture of BioMedGPT-10B](BioMedGPT-10B.jpg)

### Technical Report

More technical details of BioMedGPT-LM-7B, BioMedGPT-10B, and BioMedGPT can be found in the technical report: ["BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine"](https://pan.baidu.com/s/1iAMBkuoZnNAylhopP5OgEg?pwd=7a6b).

### GitHub

[https://github.com/PharMolix/OpenBioMed](https://github.com/PharMolix/OpenBioMed)

### Limitations

This repository holds BioMedGPT-LM-7B, and we emphasize the responsible and ethical use of this model. BioMedGPT-LM-7B should NOT be used to provide services to the general public. Generating any content that violates applicable laws and regulations, such as inciting subversion of state power, endangering national security and interests, or propagating terrorism, extremism, ethnic hatred and discrimination, violence, pornography, or false and harmful information, is strictly prohibited. BioMedGPT-LM-7B is not liable for any consequences arising from any content, data, or information provided or published by users.

### Licenses

This repository is licensed under Apache-2.0. Use of the BioMedGPT-LM-7B model is governed by the accompanying [Acceptable Use Policy](USE_POLICY.md).
USE_POLICY.md
ADDED
@@ -0,0 +1,19 @@
## BioMedGPT Acceptable Use Policy

BioMedGPT is only for internal use by registered users. You agree and acknowledge that you will use BioMedGPT solely for internal purposes and undertake not to use it, directly or indirectly, to provide services to the general public within the territory of the PRC. Otherwise, you will be liable for all damages caused to BioMedGPT.

You have the right to use BioMedGPT pursuant to the relevant agreements, but you may not engage in any unlawful activities or disturb the orderly operation of BioMedGPT. You are not allowed to generate any content through BioMedGPT, or induce it to output any speech, containing the following; otherwise, we will block or delete the information in accordance with applicable laws and regulations and report the matter to the relevant authorities:

1. inciting resistance to or undermining the implementation of the Constitution, laws, and administrative regulations;
2. inciting subversion of state power or the overthrow of the political system;
3. inciting separation of the state or undermining the unity of the country;
4. inciting ethnic enmity or discrimination, or undermining the unity of ethnic groups;
5. content involving discrimination on the basis of race, sex, religion, geographical origin, etc.;
6. fabricating or distorting facts, spreading disinformation, or disturbing the public order;
7. propagating heretical teachings or feudal superstitions; disseminating obscenity, pornography, gambling, violence, homicide, or terror; or instigating others to commit crimes;
8. publicly humiliating others, inventing stories to defame others, or committing other malicious attacks;
9. harming the credibility of state organs;
10. violating the public interest or public morality, or content not suitable for publication on BioMedGPT in accordance with the provisions of the relevant BioMedGPT agreements and rules;
11. violating the Constitution, laws, and administrative regulations.

You fully understand and acknowledge that you are responsible for all your activities in using the BioMedGPT services and their consequences, including any content, data, or information you provide or publish. BioMedGPT will not be responsible for any losses thereof.