---
base_model:
- genbio-ai/AIDO.RNA-1.6B
---

# RNA Inverse Folding
We fully fine-tune the [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B) model on the single-state split from [Das _et al._](https://www.nature.com/articles/nmeth.1433), as preprocessed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). We use the same train, validation, and test splits as their method [gRNAde](https://arxiv.org/abs/2305.14749). The current version of ModelGenerator contains the inference pipeline for RNA inverse folding. Experimental pipelines for other datasets (both training and testing) will be included in the future.

#### Setup:
Install [Model Generator](https://github.com/genbio-ai/modelgenerator).
- It is **required** to use [docker](https://www.docker.com/101-tutorial/) to run our inverse folding pipeline.
- Please set up a docker image using our provided [Dockerfile](https://github.com/genbio-ai/ModelGenerator/blob/main/Dockerfile) and run the inverse folding inference from within the docker container.

#### Running inference:
- Set the environment variable for ModelGenerator's data directory (**Note:** a docker image built from our provided [Dockerfile](https://github.com/genbio-ai/ModelGenerator/blob/main/Dockerfile) will already have it set):
  ```
  export MGEN_DATA_DIR=~/mgen_data # or any other local directory of your choice, if you would like to change it inside the Dockerfile (https://github.com/genbio-ai/ModelGenerator/blob/main/Dockerfile)
  ```
- Download the `model.ckpt` checkpoint from [here](https://huggingface.co/genbio-ai/AIDO.RNAIF-1.6B/blob/main/model.ckpt). Place it inside the local directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/rna_inv_fold/AIDO.RNAIF-1.6B`.
- Download the gRNAde checkpoint named `gRNAde_ARv1_1state_das.h5` from [here](https://github.com/chaitjo/geometric-rna-design/blob/main/checkpoints/gRNAde_ARv1_1state_all.h5). Place it inside the directory `${MGEN_DATA_DIR}/modelgenerator/other_models/rna_inv_fold/`.
- Download the data preprocessed by [Joshi _et al._](https://arxiv.org/abs/2305.14749). In particular, download these two files: [processed.pt.zip](https://drive.google.com/file/d/1gcUUaRxbGZnGMkLdtVwAILWVerVCbu4Y/view) and [processed_df.csv](https://drive.google.com/file/d/1lbdiE1LfWPReo5VnZy0zblvhVl5QhaF4/view). Place them inside the directory `${MGEN_DATA_DIR}/modelgenerator/datasets/rna_inv_fold/raw_data/`.
- From your terminal, change directory to `ModelGenerator/experiments/AIDO.RNA/rna_inverse_folding` and run the script `rna_inverse_folding.sh`:
  ```
  cd experiments/AIDO.RNA/rna_inverse_folding
  bash rna_inverse_folding.sh
  ```

#### Outputs:
- The evaluation score will be printed on the console.
- The generated sequences will be stored in `./rnaIF_outputs/designed_sequences.json`.
  - This file contains:
    1. **`"true_seq"`**: the ground truth sequences,
    2. **`"pred_seq"`**: the sequences predicted by our method,
    3. **`"baseline_seq"`**: the sequences predicted by the baseline method [gRNAde](https://arxiv.org/abs/2305.14749).
- Example file content with two test samples is shown below:
  ```
  {
    "true_seq": [
      "CCCAGUCCACCGGGUGAGAAGGGGGCAGAGAAACACACGACGUGGUGCAUUACCUGCC",
      "UCCCGUCCACCGCGGUGAGAAGGGGGCAGAGAAACACACGAUCGUGGUACAUUACCUGCC"
    ],
    "pred_seq": [
      "UGGGGAGCCCCCGGGGUGAACCAGCCGGUGAAAGGCACCCGGUGAUCGGUCAGCCCAC",
      "GCGGAUGCCCCGCCCGGUCAACCGCAUGGUGAAAUCCACGCGCCUGGUGGGUUAGCCAUG"
    ],
    "baseline_seq": [
      "UGGUGAGCCCCCGGGGUGAACCAGUAGGUGAAAGGCACCCGGUGAUCGGUCAGCCCAC",
      "GCGGAUGCCGGGCCCGGUCCACCGCAUGGUGAAAUUCAGGCGCCUGGAGGGUUAGCCAUG"
    ]
  }
  ```
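
Once the pipeline has written `./rnaIF_outputs/designed_sequences.json`, the two sets of designs can be compared against the ground truth. The following is a minimal Python sketch and is not part of ModelGenerator. It assumes the file holds the three keys described above as index-aligned lists, and that each designed sequence has the same length as its ground truth sequence. It computes a plain per-position match fraction (often called sequence recovery). The helper names `recovery` and `mean_recovery` are hypothetical, and the result may differ from the evaluation score printed by the pipeline, which may be computed differently.

```python
import json
from pathlib import Path


def recovery(true_seq: str, designed_seq: str) -> float:
    """Fraction of positions where the designed base equals the ground truth base."""
    if len(true_seq) != len(designed_seq):
        raise ValueError("sequences must have the same length")
    if not true_seq:
        return 0.0
    matches = sum(t == d for t, d in zip(true_seq, designed_seq))
    return matches / len(true_seq)


def mean_recovery(path, key: str) -> float:
    """Average per-sequence recovery of the list under `key` against "true_seq"."""
    data = json.loads(Path(path).read_text())
    scores = [recovery(t, d) for t, d in zip(data["true_seq"], data[key])]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Output path taken from the Outputs section; nothing runs if it is absent.
    out_file = Path("./rnaIF_outputs/designed_sequences.json")
    if out_file.exists():
        for key in ("pred_seq", "baseline_seq"):
            print(f"{key}: {mean_recovery(out_file, key):.4f}")
```

Run from the `rna_inverse_folding` directory after inference, this prints one averaged value for `pred_seq` and one for `baseline_seq`, which gives a quick side-by-side check of the two methods on the same test samples.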