Additional files related to ProteinGAN execution and analysis
Description
Prerequisite: the ProteinGAN code has been cloned and the conda environment created (see README.md in the ProteinGAN repository).
export PYTHONPATH={path_to_repository}/src:{path_to_repository}/src/common:$PYTHONPATH
cd {path_to_repository}
mkdir -p data/protein
cd data/protein/
Download Length_512_lab_test_37_v2.zip (it contains the files used to train ProteinGAN) and unzip it in this directory.
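The `mkdir` and unzip steps can also be done from Python if `unzip` is not available. This is a minimal sketch using only the standard library `zipfile` module. It assumes the archive has already been downloaded. `extract_dataset` is a hypothetical helper and is not part of ProteinGAN.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path: str, repo_root: str) -> Path:
    """Unpack the training archive into <repo_root>/data/protein (hypothetical helper)."""
    target = Path(repo_root) / "data" / "protein"
    # Equivalent of `mkdir -p data/protein`
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Equivalent of running `unzip` inside data/protein
        archive.extractall(target)
    return target
```

For example, `extract_dataset("Length_512_lab_test_37_v2.zip", "/path/to/ProteinGAN")`, where the repository path is a placeholder for `{path_to_repository}`.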
To train:
cd {path_to_repository}/src/gan/
python -u -m train_gan --batch_size 64 --name x2 --steps 999999 --shuffle_buffer_size 100000 --loss_type non_saturating --discriminator_learning_rate 0.0001 --generator_learning_rate 0.0001 --dilation_rate 2 --gf_dim 44 --df_dim 30 --dataset protein/Length_512_lab_test_37_v2 --architecture gumbel --pooling conv
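The `export`, `cd`, and training steps can also be driven from a Python launcher. This sketch assumes the repository layout implied by the commands above (`src`, `src/common`, `src/gan`) and assumes that `python` resolves to the conda environment's interpreter. `build_train_command` is a hypothetical helper. The flag values simply mirror the training command above.

```python
import os
from pathlib import Path

# Flag values copied from the training command above.
TRAIN_FLAGS = {
    "batch_size": 64,
    "name": "x2",
    "steps": 999999,
    "shuffle_buffer_size": 100000,
    "loss_type": "non_saturating",
    "discriminator_learning_rate": 0.0001,
    "generator_learning_rate": 0.0001,
    "dilation_rate": 2,
    "gf_dim": 44,
    "df_dim": 30,
    "dataset": "protein/Length_512_lab_test_37_v2",
    "architecture": "gumbel",
    "pooling": "conv",
}


def build_train_command(repo_root: str, flags: dict = TRAIN_FLAGS):
    """Return (argv, env, cwd) for `python -u -m train_gan` (hypothetical helper)."""
    root = Path(repo_root)
    argv = ["python", "-u", "-m", "train_gan"]
    for key, value in flags.items():
        argv += [f"--{key}", str(value)]
    env = dict(os.environ)
    # Same effect as: export PYTHONPATH=<repo>/src:<repo>/src/common:$PYTHONPATH
    parts = [str(root / "src"), str(root / "src" / "common")]
    if env.get("PYTHONPATH"):
        parts.append(env["PYTHONPATH"])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    # Equivalent of `cd {path_to_repository}/src/gan/`
    return argv, env, str(root / "src" / "gan")
```

To launch training, pass the result to `subprocess.run(argv, env=env, cwd=cwd, check=True)`. Keeping the flags in one dictionary makes it easy to reuse the same values for the generation step.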
To generate sequences:
cd {path_to_repository}/src/gan/
python -u -m generate --batch_size 64 --name x2 --steps 999999 --shuffle_buffer_size 100000 --loss_type non_saturating --discriminator_learning_rate 0.0001 --generator_learning_rate 0.0001 --dilation_rate 2 --n_seqs 1000 --gf_dim 44 --df_dim 30 --dataset protein/Length_512_lab_test_37_v2 --nouse_cpu --architecture gumbel --pooling conv
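The generation command repeats the training flags and adds `--n_seqs` and `--nouse_cpu`. Presumably the model-defining flags (such as `--name`, `--gf_dim`, `--df_dim`, and `--architecture`) must match the training run so that the same generator can be rebuilt from its checkpoint. The following sketch uses `shlex` to check this. `parse_flags` and `compare` are hypothetical helpers. They assume absl-style `--flag value` and `--noflag` syntax, and they would misread values that begin with `-`.

```python
import shlex


def parse_flags(command: str) -> dict:
    """Map flag names to values for a `python -u -m module --flag value ...` line."""
    tokens = shlex.split(command)
    # Skip everything up to and including the module name that follows `-m`.
    tokens = tokens[tokens.index("-m") + 2:]
    flags = {}
    i = 0
    while i < len(tokens):
        name = tokens[i].lstrip("-")  # accepts both -flag and --flag
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            flags[name] = tokens[i + 1]
            i += 2
        else:
            # Valueless boolean flag, e.g. --nouse_cpu (absl-style negation).
            flags[name] = True
            i += 1
    return flags


def compare(train_cmd: str, generate_cmd: str):
    """Return (flags only in the generate command, shared flags whose values differ)."""
    train, gen = parse_flags(train_cmd), parse_flags(generate_cmd)
    only_gen = sorted(set(gen) - set(train))
    mismatched = sorted(k for k in set(train) & set(gen) if train[k] != gen[k])
    return only_gen, mismatched
```

With the two commands above, `compare` should report `n_seqs` and `nouse_cpu` as generation-only flags and no mismatched values.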
You can also use a Singularity image to run ProteinGAN.
First, build the container by running:
sudo singularity build image.sif image.def
Second, run the container:
singularity exec --bind weights:/ProteinGAN/weights --nv image.sif sh run_protein_gan.sh
Modify run_protein_gan.sh to run ProteinGAN with different parameters.
Latent Space Analysis
A Jupyter notebook with the latent space analysis. The results and all the data needed to rerun the analysis are included.
latent_space_analysis.ipynb - Jupyter notebook with the latent space analysis.
files\one - Folder containing sequences generated by varying the values of the input vector.
files\train_sequences.fasta - Sequences used to train pGAN.
latent_space_corr.tsv - Correlation values between different properties and latent space vectors.
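The training sequences in `files\train_sequences.fasta` can be loaded without extra dependencies. This is a minimal FASTA reader that uses only the standard library. `read_fasta` is a hypothetical helper, and Biopython's `SeqIO.parse(handle, "fasta")` would do the same job. Records are returned as a list, so duplicate headers are kept.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; wrapped sequence lines are joined."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, use `with open("files/train_sequences.fasta") as fh: seqs = read_fasta(fh)`. On POSIX systems, use a forward slash in the path.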
Other
Jackhmmer_MDH_profile.hmm - HMM profile produced by jackhmmer. The unbalanced training dataset was used as the target database, and E. coli MDH was used as the query sequence (UniProt ID: P61889). The profile was used to emit HMM-generated sequences.
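This record does not say which tool or options were used to emit sequences from the profile. One plausible route, assuming HMMER 3's `hmmemit` is installed, is sketched below. `hmmemit` samples from the core model by default (`-p` would sample from the profile instead), and it is not known which mode the original analysis used. The sequence count, seed, and output file name are placeholders.

```python
import shutil
import subprocess


def hmmemit_command(hmm_path: str, n_seqs: int, out_path: str, seed: int = 42) -> list:
    """Build an hmmemit call that samples n_seqs sequences from a profile HMM."""
    # -N: number of sequences to sample; -o: write the sequences to a file;
    # --seed: fix the random number generator seed for reproducibility.
    return ["hmmemit", "-N", str(n_seqs), "--seed", str(seed), "-o", out_path, hmm_path]


def emit(hmm_path: str, n_seqs: int, out_path: str) -> None:
    """Run hmmemit if it is on PATH; the parameters are placeholders."""
    if shutil.which("hmmemit") is None:
        raise RuntimeError("hmmemit (HMMER) not found on PATH")
    subprocess.run(hmmemit_command(hmm_path, n_seqs, out_path), check=True)
```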
Files (11.1 MB)
MD5 checksum | Size |
---|---|
md5:f9ab2e9b5f1d6a4de942dd36c7986564 | 684 Bytes |
md5:07076da3af28c6e9843746aed1a2eefb | 146.1 kB |
md5:965ba97b23343fc96806dca851a4c7a6 | 6.5 MB |
md5:94dd939c721965c31596c8714037d86c | 4.5 MB |
md5:9d75ac15a25f7d694c7cd79a6d953b89 | 394 Bytes |
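A downloaded file can be checked against the MD5 checksums listed for this record. This sketch uses Python's `hashlib` and reads the file in chunks, so large archives are not loaded into memory. `md5sum` and `verify` are hypothetical helpers. `md5sum <file>` from GNU coreutils prints the same digest.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare with a listed checksum; an optional 'md5:' prefix is ignored."""
    return md5sum(path) == expected.split(":")[-1].strip().lower()
```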