Update README.md
README.md
CHANGED
@@ -35,12 +35,12 @@ We trained t5 on SMILES from ZINC using the task of masked-language modeling (ML
 ## Intended uses & limitations
 
 This model can be used for the prediction of molecules' properties, reactions, or interactions with proteins by changing the way of finetuning.
-As an example, We finetuned this model to predict products.
+As an example, We finetuned this model to predict products. The model is [here](https://huggingface.co/sagawa/ZINC-t5-productpredicition), and you can use the demo [here](https://huggingface.co/spaces/sagawa/predictproduct-t5).
 Using its encoder, we trained a regression model to predict a reaction yield. You can use this demo [here](https://huggingface.co/spaces/sagawa/predictyield-t5).
 
 ## Training and evaluation data
 
-We downloaded [ZINC data](https://drive.google.com/drive/folders/1lSPCqh31zxTVEhuiPde7W3rZG8kPgp-z) and canonicalized them using RDKit. Then, we
+We downloaded [ZINC data](https://drive.google.com/drive/folders/1lSPCqh31zxTVEhuiPde7W3rZG8kPgp-z) and canonicalized them using RDKit. Then, we dropped duplicates. The total number of data is 22992522, and they were randomly split into train:validation=10:1.
 
 ## Training procedure
 
@@ -59,16 +59,16 @@ The following hyperparameters were used during training:
 
 | Training Loss | Step | Accuracy | Validation Loss |
 |:-------------:|:------:|:--------:|:---------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
+| 0.2471 | 25000 | 0.9843 | 0.2226 |
+| 0.1871 | 50000 | 0.9314 | 0.1783 |
+| 0.1791 | 75000 | 0.9371 | 0.1619 |
+| 0.1596 | 100000 | 0.9401 | 0.1520 |
+| 0.1522 | 125000 | 0.9422 | 0.1449 |
+| 0.1435 | 150000 | 0.9436 | 0.1404 |
+| 0.1421 | 175000 | 0.9447 | 0.1368 |
+| 0.1398 | 200000 | 0.9459 | 0.1322 |
+| 0.1297 | 225000 | 0.9466 | 0.1299 |
+| 0.1324 | 250000 | 0.9473 | 0.1268 |
+| 0.1257 | 275000 | 0.9483 | 0.1244 |
+| 0.1266 | 300000 | 0.9491 | 0.1216 |
+| 0.1301 | 325000 | 0.9497 | 0.1204 |
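
The "Training and evaluation data" text in this change describes three preprocessing steps: canonicalize SMILES with RDKit, drop duplicates, and randomly split into train:validation=10:1. The following is a minimal sketch of what such a pipeline might look like, not the authors' actual script. The function names, the fixed `seed`, the choice to skip unparsable SMILES, and the floor rounding of the validation size are all assumptions made for illustration; the README does not specify any of them.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple


def canonicalize_with_rdkit(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the string cannot be parsed."""
    # Imported lazily so the pure-Python helpers below work without RDKit installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def canonicalize_and_dedupe(
    smiles_list: Iterable[str],
    canonicalize: Callable[[str], Optional[str]] = canonicalize_with_rdkit,
) -> List[str]:
    """Canonicalize each SMILES and keep only the first occurrence of each result."""
    seen = set()
    unique = []
    for smiles in smiles_list:
        canonical = canonicalize(smiles)
        # Assumption: unparsable entries are skipped; the README does not say.
        if canonical is None or canonical in seen:
            continue
        seen.add(canonical)
        unique.append(canonical)
    return unique


def split_train_validation(
    items: Iterable[str],
    train_parts: int = 10,
    valid_parts: int = 1,
    seed: int = 0,  # hypothetical seed, chosen only to make the sketch reproducible
) -> Tuple[List[str], List[str]]:
    """Shuffle the items and split them at a train_parts:valid_parts ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_valid = len(shuffled) * valid_parts // (train_parts + valid_parts)
    return shuffled[n_valid:], shuffled[:n_valid]
```

Under this rounding, a 10:1 split of the 22992522 entries reported above would give about 20.9 million training and about 2.09 million validation SMILES. A call such as `split_train_validation(canonicalize_and_dedupe(raw_smiles))` would chain the steps, where `raw_smiles` is a hypothetical list of SMILES strings read from the downloaded ZINC files.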