Commit 22e3abd by Simon Duerr
Parent(s): 8a361d8

fix: train path, update draw samples
- README.md +9 -5
- app.py +7 -4
- checkpoints/allatom.yml +1 -1
- checkpoints/backbone.yml +1 -1
- configs/allatom.yml +1 -1
- configs/backbone.yml +1 -1
- configs/seqdes.yml +1 -1
- core/protein_mpnn.py +3 -3
- draw_samples.py +12 -9
- protpardelle_pymol.py +12 -2
README.md
CHANGED
@@ -10,15 +10,15 @@ pinned: false
 license: mit
 ---
 
-# protpardelle
+# protpardelle
 
 Code for the paper: [An all-atom protein generative model](https://www.biorxiv.org/content/10.1101/2023.05.24.542194v1.full).
 
 The code is under active development and we welcome contributions, feature requests, issues, corrections, and any questions! Where we have used or adapted code from others we have tried to give proper attribution, but please let us know if anything should be corrected.
 
-## Environment
+## Environment and setup
 
-To set up the conda environment, run `conda env create -f configs/environment.yml`.
+To set up the conda environment, run `conda env create -f configs/environment.yml` then `conda activate delle`. You will also need to clone the [ProteinMPNN repository](https://github.com/dauparas/ProteinMPNN) to the same directory that contains the `protpardelle/` repository. You may also need to set the `home_dir` variable in the configs you use to the path to the directory containing the `protpardelle/` directory.
 
 ## Inference
 
@@ -28,12 +28,16 @@ To draw 8 samples per length for lengths in `range(70, 150, 5)` from the backbone
 
 `python draw_samples.py --type backbone --param n_steps --paramval 100 --minlen 70 --maxlen 150 --steplen 5 --perlen 8`
 
-We have also added the ability to provide an input PDB file and a list of (zero-indexed) indices to condition on from the PDB file. We can expect it to do better or worse depending on the problem (better on easier problems such as inpainting, worse on difficult problems such as discontiguous scaffolding).
+We have also added the ability to provide an input PDB file and a list of (zero-indexed) indices to condition on from the PDB file. Note also that current models are single-chain only, so multi-chain PDBs will be treated as single chains (we intend to release multi-chain models in a later update). We can expect it to do better or worse depending on the problem (better on easier problems such as inpainting, worse on difficult problems such as discontiguous scaffolding). Use this command to resample the first 25 and 71st to 80th residues of `my_pdb.pdb`.
 
-`python draw_samples.py --input_pdb --
+`python draw_samples.py --input_pdb my_pdb.pdb --resample_idxs 0-25,70-80`
+
+For more control over the sampling process, including tweaking the sampling hyperparameters and more specific methods of conditioning, you can directly interface with the `model.sample()` function; we have provided examples of how to configure and run these commands in `sampling.py`.
 
 ## Training
 
+Note (Sep 2023): the lab has decided to collect usage statistics on people interested in training their own versions of Protpardelle (for funding and other purposes). To obtain a copy of the repository with training code, please complete [this Google Form](https://docs.google.com/forms/d/1WKMVbydLh6LIegc3HfwMQhgL2_qnrY7ks9FM_ylo4ts) - you will receive a link to a Google Drive zip which contains the repository with training code. After publication, the plan is to include the full training code directly in this repository.
+
 Pretrained model weights are provided, but if you are interested in training your own models, we have provided training code together with some basic online evaluation. You will need to create a Weights & Biases account.
 
 The dataset can be downloaded from [CATH](http://download.cathdb.info/cath/releases/all-releases/v4_3_0/non-redundant-data-sets/), and the train/validation/test splits used can be downloaded with
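The README's `draw_samples.py` command maps its length flags onto a Python `range`, with 8 samples drawn at each length. A minimal sketch of that mapping (the helper name `sample_lengths` is hypothetical, not part of the repo):

```python
def sample_lengths(minlen, maxlen, steplen, perlen):
    """Expand the draw_samples.py length flags into one entry per sample.

    maxlen is exclusive, matching the --maxlen help text
    ("Maximum sequence length, not inclusive").
    """
    lengths = []
    for length in range(minlen, maxlen, steplen):
        # perlen samples are drawn at each sequence length
        lengths.extend([length] * perlen)
    return lengths

# The README command uses --minlen 70 --maxlen 150 --steplen 5 --perlen 8,
# i.e. 16 lengths (70, 75, ..., 145) with 8 samples each = 128 samples.
lengths = sample_lengths(70, 150, 5, 8)
```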
app.py
CHANGED
@@ -303,15 +303,15 @@ def protpardelle(path_to_file, m, resample_idx, modeltype, minlen, maxlen, step
     if args.type == "backbone":
         if args.model_checkpoint:
             checkpoint = f"{args.model_checkpoint}/backbone_state_dict.pth"
-            cfg_path = f"{args.model_checkpoint}/
+            cfg_path = f"{args.model_checkpoint}/backbone_pretrained.yml"
         else:
             checkpoint = (
                 f"{model_directory}/checkpoints/epoch{epoch}_training_state.pth"
             )
             cfg_path = f"{model_directory}/configs/backbone.yml"
-
+        config = utils.load_config(cfg_path)
         weights = torch.load(checkpoint, map_location=device)["model_state_dict"]
-        model = models.Protpardelle(
+        model = models.Protpardelle(config, device=device)
         model.load_state_dict(weights)
         model.to(device)
         model.eval()
@@ -319,7 +319,7 @@ def protpardelle(path_to_file, m, resample_idx, modeltype, minlen, maxlen, step
     elif args.type == "allatom":
         if args.model_checkpoint:
             checkpoint = f"{args.model_checkpoint}/allatom_state_dict.pth"
-            cfg_path = f"{args.model_checkpoint}/
+            cfg_path = f"{args.model_checkpoint}/allatom_pretrained.yml"
         else:
             checkpoint = (
                 f"{model_directory}/checkpoints/epoch{epoch}_training_state.pth"
@@ -345,6 +345,9 @@ def protpardelle(path_to_file, m, resample_idx, modeltype, minlen, maxlen, step
     for k, v in sampling_kwargs_readme:
         f.write(f"{k}\t{v}\n")
 
+    print(f"Model loaded from {checkpoint}")
+    print(f"Beginning sampling for {date_string}...")
+
     # Draw samples
     output_files = draw_and_save_samples(
         model,
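The app.py fix pairs each checkpoint with its matching config: a pretrained state dict now loads `<type>_pretrained.yml` from the same directory, while a training checkpoint keeps using `configs/<type>.yml`. A sketch of just that path logic, factored into a hypothetical helper for illustration:

```python
def resolve_paths(model_type, model_checkpoint, model_directory, epoch):
    """Mirror the patched path selection for model_type in {"backbone", "allatom"}.

    If a pretrained checkpoint directory is given, use its state dict and
    its matching <type>_pretrained.yml; otherwise fall back to a training
    checkpoint and the repo config for that model type.
    """
    if model_checkpoint:
        checkpoint = f"{model_checkpoint}/{model_type}_state_dict.pth"
        cfg_path = f"{model_checkpoint}/{model_type}_pretrained.yml"
    else:
        checkpoint = f"{model_directory}/checkpoints/epoch{epoch}_training_state.pth"
        cfg_path = f"{model_directory}/configs/{model_type}.yml"
    return checkpoint, cfg_path
```

In the real code the config is then loaded with `utils.load_config(cfg_path)` and passed to `models.Protpardelle(config, device=device)` before the weights are applied.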
checkpoints/allatom.yml
CHANGED
@@ -1,5 +1,5 @@
 train:
-  home_dir: '
+  home_dir: ''
   seed: 0
   checkpoint: ['', 0]
   batch_size: 32
checkpoints/backbone.yml
CHANGED
@@ -1,5 +1,5 @@
 train:
-  home_dir: '
+  home_dir: ''
   seed: 0
   checkpoint: ['', 0]
   batch_size: 32
configs/allatom.yml
CHANGED
@@ -1,5 +1,5 @@
 train:
-  home_dir: '
+  home_dir: ''
   seed: 0
   checkpoint: ['', 0]
   batch_size: 32
configs/backbone.yml
CHANGED
@@ -1,5 +1,5 @@
 train:
-  home_dir: '
+  home_dir: ''
   seed: 0
   checkpoint: ['', 0]
   batch_size: 32
configs/seqdes.yml
CHANGED
@@ -1,5 +1,5 @@
 train:
-  home_dir: '
+  home_dir: ''
   seed: 0
   checkpoint: ['', 0]
   batch_size: 32
core/protein_mpnn.py
CHANGED
@@ -55,10 +55,11 @@ def get_mpnn_model(model_name='v_48_020', path_to_model_weights='', ca_only=Fals
     else:
         file_path = os.path.realpath(__file__)
         k = file_path.rfind("/")
+        k = file_path[:k].rfind("/")
         if ca_only:
-            model_folder_path = file_path[:k] + '/ca_model_weights/'
+            model_folder_path = file_path[:k] + '/ProteinMPNN/ca_model_weights/'
         else:
-            model_folder_path = file_path[:k] + '/vanilla_model_weights/'
+            model_folder_path = file_path[:k] + '/ProteinMPNN/vanilla_model_weights/'
 
     checkpoint_path = model_folder_path + f'{model_name}.pt'
     checkpoint = torch.load(checkpoint_path, map_location=device)
@@ -450,7 +451,6 @@ def run_proteinmpnn(model=None, pdb_path='', pdb_path_chains='', path_to_model_w
     print(f'{num_seqs} sequences of length {total_length} generated in {dt} seconds')
     if write_output_files:
         f.close()
-
     return new_mpnn_seqs
 
 
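The weight-path fix above applies `rfind("/")` twice: once to strip the filename, once more to strip `core/`, so the ProteinMPNN weights are looked up relative to the package root rather than inside `core/`. A self-contained sketch of that resolution (the helper name `mpnn_weights_dir` is hypothetical):

```python
def mpnn_weights_dir(file_path, ca_only=False):
    """Reproduce the patched lookup from get_mpnn_model.

    file_path is the realpath of core/protein_mpnn.py; the two rfind calls
    walk up past the filename and past 'core', and the weights are then
    expected under a ProteinMPNN/ directory at that level.
    """
    k = file_path.rfind("/")        # drop 'protein_mpnn.py'
    k = file_path[:k].rfind("/")    # drop 'core'
    sub = 'ca_model_weights' if ca_only else 'vanilla_model_weights'
    return file_path[:k] + f'/ProteinMPNN/{sub}/'

# e.g. /home/user/protpardelle/core/protein_mpnn.py resolves its weights
# under /home/user/protpardelle/ProteinMPNN/
```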
draw_samples.py
CHANGED
@@ -122,18 +122,18 @@ class Manager(object):
             "--perlen", type=int, default=2, help="How many samples per sequence length"
         )
         self.parser.add_argument(
-            "--minlen", type=int,
+            "--minlen", type=int, default=50, help="Minimum sequence length"
         )
         self.parser.add_argument(
             "--maxlen",
             type=int,
-
+            default=60,
             help="Maximum sequence length, not inclusive",
         )
         self.parser.add_argument(
             "--steplen",
             type=int,
-
+            default=5,
             help="How frequently to select sequence length, for steplen 2, would be 50, 52, 54, etc",
         )
         self.parser.add_argument(
@@ -279,15 +279,15 @@ def main():
     if args.type == "backbone":
         if args.model_checkpoint:
             checkpoint = f"{args.model_checkpoint}/backbone_state_dict.pth"
-            cfg_path = f"{args.model_checkpoint}/
+            cfg_path = f"{args.model_checkpoint}/backbone_pretrained.yml"
         else:
             checkpoint = (
                 f"{model_directory}/checkpoints/epoch{epoch}_training_state.pth"
             )
             cfg_path = f"{model_directory}/configs/backbone.yml"
-
+        config = utils.load_config(cfg_path)
         weights = torch.load(checkpoint, map_location=device)["model_state_dict"]
-        model = models.Protpardelle(
+        model = models.Protpardelle(config, device=device)
         model.load_state_dict(weights)
         model.to(device)
         model.eval()
@@ -295,7 +295,7 @@ def main():
     elif args.type == "allatom":
         if args.model_checkpoint:
             checkpoint = f"{args.model_checkpoint}/allatom_state_dict.pth"
-            cfg_path = f"{args.model_checkpoint}/
+            cfg_path = f"{args.model_checkpoint}/allatom_pretrained.yml"
         else:
             checkpoint = (
                 f"{model_directory}/checkpoints/epoch{epoch}_training_state.pth"
@@ -310,8 +310,11 @@ def main():
     model.eval()
     model.device = device
 
+    if config.train.home_dir == '':
+        config.train.home_dir = os.getcwd()
+
     # Sampling
-    with open(
+    with open(save_dir + "/readme.txt", "w") as f:
         f.write(f"Sampling run for {date_string}\n")
         f.write(f"Random seed {seed}\n")
         f.write(f"Model checkpoint: {checkpoint}\n")
@@ -341,7 +344,7 @@ def main():
     print(f"Of this, {sampling_time} seconds were for actual sampling.")
     print(f"{total_num_samples} total samples were drawn.")
 
-    with open(
+    with open(save_dir + "/readme.txt", "a") as f:
         f.write(f"Total job time: {time_elapsed} seconds\n")
         f.write(f"Model run time: {sampling_time} seconds\n")
         f.write(f"Total samples drawn: {total_num_samples}\n")
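The README example `--resample_idxs 0-25,70-80` is described as resampling "the first 25 and 71st to 80th residues", which implies comma-separated, half-open, zero-indexed ranges. A sketch of a parser consistent with that description (the function name and exact parsing are assumptions; the repo's own handling may differ):

```python
def parse_resample_idxs(spec):
    """Expand a spec like '0-25,70-80' into zero-indexed positions.

    Ranges are treated as half-open, so '0-25' covers residues 0..24
    (the first 25) and '70-80' covers 70..79 (the 71st to 80th,
    counting from one), matching the README's wording.
    """
    idxs = []
    for chunk in spec.split(","):
        start, end = chunk.split("-")
        idxs.extend(range(int(start), int(end)))
    return idxs
```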
protpardelle_pymol.py
CHANGED
@@ -15,9 +15,9 @@ except ImportError:
 
 
 if os.environ.get("GRADIO_LOCAL") != None:
-    public_link = "http://127.0.0.1:
+    public_link = "http://127.0.0.1:7860"
 else:
-    public_link = "
+    public_link = "ProteinDesignLab/protpardelle"
 
 
 
@@ -140,6 +140,16 @@ def query_protpardelle_uncond(
 
 
 def setprotpardellelink(link:str):
+    """
+    AUTHOR
+    Simon Duerr
+    https://twitter.com/simonduerr
+    DESCRIPTION
+    Set a public link to use a locally hosted version of this space
+    USAGE
+    protpardelle_setlink link_or_username/spacename
+    """
+
     global public_link
     try:
         client = Client(link)
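The PyMOL plugin's patched default picks between a local Gradio server and the hosted Space id (both forms are accepted by `gradio_client.Client`). A sketch of that selection, factored into a hypothetical helper so the environment can be passed in explicitly:

```python
import os

def default_public_link(env=None):
    """Mirror the plugin's patched default link selection.

    When GRADIO_LOCAL is set in the environment, target a locally hosted
    Gradio server; otherwise fall back to the hosted Space id.
    """
    if env is None:
        env = os.environ
    if env.get("GRADIO_LOCAL") is not None:
        return "http://127.0.0.1:7860"
    return "ProteinDesignLab/protpardelle"
```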