Published December 18, 2019 | Version v3
Journal article Open

Additional files related to ProteinGAN execution and analysis


Description

Prerequisite: the ProteinGAN code has been cloned and the conda environment has been created (see README.md in the ProteinGAN repository).
 

export PYTHONPATH={path_to_repository}/src:{path_to_repository}/src/common:$PYTHONPATH

cd {path_to_repository}

mkdir -p data/protein

cd data/protein/

Download Length_512_lab_test_37_v2.zip (it contains the files used to train ProteinGAN) and unzip it in this folder.
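The directory setup and unzip steps above can also be scripted. This is a minimal Python sketch that assumes the archive has already been downloaded. The helper name `extract_training_data` and its `repo_root` parameter, which stands in for {path_to_repository}, are hypothetical and not part of ProteinGAN.

```python
import os
import zipfile


def extract_training_data(zip_path, repo_root):
    """Unzip the training archive into <repo_root>/data/protein (hypothetical helper).

    Mirrors `mkdir -p data/protein` followed by unzipping inside that folder.
    Returns the sorted list of archive member names that were extracted.
    """
    target_dir = os.path.join(repo_root, "data", "protein")
    # Equivalent of `mkdir -p`: create intermediate folders, ignore existing ones.
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())
```

For example, `extract_training_data("Length_512_lab_test_37_v2.zip", "/path/to/ProteinGAN")` would place the contents where the `--dataset protein/Length_512_lab_test_37_v2` flag below presumably expects them.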

To train:
 

cd {path_to_repository}/src/gan/

python -u -m train_gan --batch_size 64 --name x2 --steps 999999 --shuffle_buffer_size 100000 --loss_type non_saturating --discriminator_learning_rate 0.0001 --generator_learning_rate 0.0001 --dilation_rate 2 --gf_dim 44 --df_dim 30 --dataset protein/Length_512_lab_test_37_v2 --architecture gumbel --pooling conv

To generate sequences:

cd {path_to_repository}/src/gan/

python -u -m generate --batch_size 64 --name x2 --steps 999999 --shuffle_buffer_size 100000 --loss_type non_saturating --discriminator_learning_rate 0.0001 --generator_learning_rate 0.0001 --dilation_rate 2 --n_seqs 1000 --gf_dim 44 --df_dim 30 --dataset protein/Length_512_lab_test_37_v2 --nouse_cpu --architecture gumbel --pooling conv
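The training and generation commands share every model and dataset flag. Generation only adds `--n_seqs 1000` and `--nouse_cpu`. Generation presumably rebuilds the same architecture under the same `--name` to load the trained weights, so keeping a single flag set for both commands avoids mismatches. The Python sketch below illustrates this approach. `COMMON_FLAGS`, `build_command` and `build_env` are hypothetical helpers, not part of ProteinGAN. `build_env` reproduces the PYTHONPATH export from the setup step.

```python
import os

# Flags shared by the train_gan and generate commands shown above.
COMMON_FLAGS = {
    "batch_size": "64",
    "name": "x2",
    "steps": "999999",
    "shuffle_buffer_size": "100000",
    "loss_type": "non_saturating",
    "discriminator_learning_rate": "0.0001",
    "generator_learning_rate": "0.0001",
    "dilation_rate": "2",
    "gf_dim": "44",
    "df_dim": "30",
    "dataset": "protein/Length_512_lab_test_37_v2",
    "architecture": "gumbel",
    "pooling": "conv",
}


def build_command(module, extra_flags=None, switches=()):
    """Return an argv list: ['python', '-u', '-m', module, '--key', 'value', ...].

    extra_flags: additional --key value pairs (they override COMMON_FLAGS on a clash).
    switches: flags passed without a value, e.g. 'nouse_cpu'.
    """
    flags = dict(COMMON_FLAGS)
    flags.update(extra_flags or {})
    argv = ["python", "-u", "-m", module]
    for key, value in flags.items():
        argv += [f"--{key}", value]
    argv += [f"--{switch}" for switch in switches]
    return argv


def build_env(repo_root):
    """Copy of os.environ with <repo>/src and <repo>/src/common prepended to PYTHONPATH."""
    env = dict(os.environ)
    paths = [os.path.join(repo_root, "src"), os.path.join(repo_root, "src", "common")]
    if env.get("PYTHONPATH"):
        paths.append(env["PYTHONPATH"])
    env["PYTHONPATH"] = os.pathsep.join(paths)
    return env


train_cmd = build_command("train_gan")
generate_cmd = build_command("generate", {"n_seqs": "1000"}, switches=["nouse_cpu"])
```

Either list can then be run from {path_to_repository}/src/gan/, for example with `subprocess.run(train_cmd, cwd=..., env=build_env(...), check=True)`. Flag order differs slightly from the commands above, which should not matter to a standard flag parser.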

You can also use a Singularity image to run ProteinGAN.

First, build the container:

sudo singularity build image.sif image.def

Then run the container:

singularity exec --bind weights:/ProteinGAN/weights --nv image.sif sh run_protein_gan.sh

To run ProteinGAN with different parameters, modify run_protein_gan.sh.

 

Latent Space Analysis

A Jupyter notebook for the latent space analysis. The results and all data needed to rerun the analysis are included.

latent_space_analysis.ipynb - Latent space analysis Jupyter notebook.
files\one - Folder containing sequences generated by varying the values of the input vector.
files\train_sequences.fasta - Sequences used to train pGAN.
latent_space_corr.tsv - Correlation values between different properties and the latent space vectors.
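A small FASTA reader is enough to work with the included sequence files outside the notebook. The sketch below pairs the reader with a Pearson correlation between a per-sequence property and a latent input value. This shows one way to compute correlations like those in latent_space_corr.tsv. The choice of Pearson correlation, the example property (fraction of hydrophobic residues), its residue set and all function names are illustrative assumptions, not the notebook's actual method.

```python
import math


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Illustrative property only; this residue set is an assumption, not from the notebook.
HYDROPHOBIC = set("AVILMFWC")


def hydrophobic_fraction(seq):
    """Fraction of residues in seq that are in HYDROPHOBIC (0.0 for an empty sequence)."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq) if seq else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)
```

Suppose the files under files\one are FASTA and `values` holds, in the same order, the input-vector value that produced each sequence. Then `pearson(values, [hydrophobic_fraction(s) for _, s in records])` gives one such correlation.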

Other

Jackhmmer_MDH_profile.hmm - HMM profile produced by jackhmmer. The unbalanced training dataset was used as the target database and E. coli MDH (UniProt ID: P61889) as the query sequence. The profile was used to emit HMM-generated sequences.
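Assuming the profile uses the standard plain-text HMMER3 layout, you can inspect its header directly. That layout places one `TAG value` pair per line before the `HMM` line that starts the emission table. The minimal sketch below reads those fields, for example NAME, LENG and ALPH. The function name is hypothetical. HMMER's `hmmemit` program samples sequences from a profile (for example `hmmemit -N 1000 Jackhmmer_MDH_profile.hmm`), but the record does not state which invocation the authors used.

```python
def read_hmm_header(path):
    """Return the header of the first profile in a HMMER3 text file as {tag: [values]}.

    Reading stops at the 'HMM' line that begins the per-position emission table.
    The format line (e.g. 'HMMER3/f [...]') is stored under its own tag, and
    repeated tags such as STATS keep every value in file order.
    """
    header = {}
    with open(path) as handle:
        for line in handle:
            parts = line.rstrip("\n").split(None, 1)
            if not parts:
                continue
            tag = parts[0]
            if tag == "HMM":
                break
            value = parts[1].strip() if len(parts) > 1 else ""
            header.setdefault(tag, []).append(value)
    return header
```

For instance, `int(read_hmm_header("Jackhmmer_MDH_profile.hmm")["LENG"][0])` would give the model length in match states.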

Files

Length_512_lab_test_37_v2.zip

Files (11.1 MB)

Size        MD5 checksum
684 Bytes   f9ab2e9b5f1d6a4de942dd36c7986564
146.1 kB    07076da3af28c6e9843746aed1a2eefb
6.5 MB      965ba97b23343fc96806dca851a4c7a6
4.5 MB      94dd939c721965c31596c8714037d86c
394 Bytes   9d75ac15a25f7d694c7cd79a6d953b89
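Each download is listed with an MD5 checksum. The minimal Python sketch below checks a downloaded file against a listed value. It reads the file in 1 MiB chunks, so large archives do not need to fit in memory. The helper names are hypothetical. The listing does not pair checksums with file names, so you need to match the expected value to your file yourself, for example by size.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks of chunk_size bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """True if the file's MD5 digest matches expected (whitespace and case ignored)."""
    return md5sum(path) == expected.strip().lower()
```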