# Change log

v2.9 - 24-01-2020

### Added

### Changed
* Updated safeParallel argument name to list_iterable
* Commented out many deprecated functions in rwgcna_functions
* made FindVariableFeatures default to 1000 and added warning if 'vst' fails
* changed nRepJackStraw default from 300 to 250
* set varsToRegress default NULL
* input has to be normalized matrix in delimited text format, metadata no longer optional, also removed assayUse and slotUse arguments (always perform analysis on normalized data)
* always read and write large files using data.table fread and fwrite

### removed
* removed vec_colAnnot, which added additional annotaiton columns to output but did not change analysis
* removed automatic replacement of nonacceptable characters in annotations: this should be done manually in preprocessing
* detection (for remapping) of gene names - this should be done beforehand

v2.8 - 08-02-2019

### Added 
* user can input expression data in other than seurat object format. User can provide metadata in separate file 
* nFeatures argument to specify how many genes to use
* logs a snapshot of top three most recent git commits

### Changed 
* Upgraded all functions to Seurat 3 while ensuring compatibility with Seurat 2 objects
* user specifies assay and slot to use
* increase kM reassign threshold to 1.2 to avoid reassigning too many genes 
* do not output module left eigenvectors separately as they are already in cell_cluster_gene_module dataframe
* cellModEmbed matrix "ident" column changed name to "cell_cluster" to align with cell_cluster_module_gene csv 
* cellModEmbed matrix now output using data slot to allow the user to scale subsequently as desired.
* pamStage default changed to F
* the number of variable features to find depends on the featuresUse argument since variable features go into PCA and hence limit the gene choice when using PCA gene selection
* the genes to scale depend on featuresUse since only scaled genes can be used in PCA
* changed the jackstraw:
    - Removed the PCAgenes parameter
    - Made it possible to select genes on PC loadings, PC loadings with jackstraw filtering of PCs, or PC loadings using significance as criterion
* when trying several sets of parameters (e.g. pamStage FALSE and TRUE), the script now selects the parameters, for each subset, that produces the highest number of modules (this could be improved in future)
* Made nPC a user argument, defaults to 100 
* changed deepSplit default from 3 to 2
* changed corFnc default from bicor to cor
* changed networkType default from signed to c('signed hybrid')
* removed prefixData default
* reorganised scripts into main WGCNA pipeline, auxillary post-analysis scripts, and old scripts 
* set options in params file only (was duplicated in main script)
* set random seed for R base (e.g. sample function) with set.seed 

### Fixed 
* delete all temporary files
* fixed problem with removing celltypes that have no modules at different stages, undefined variables when none to delete
* filtering genes that failed TOM resampling by using colnames of consTOM failed since consTOM was already a distance matrix without colnames or rownames
* mergeCloseModskIM  now exits when there is only one module left
* Newly introduced bug in mergeCloseModskIM
* Bug in having renamed color argument to modcolor argument in moduleEigengenes_uv function argument without doing so in the body.
* fixed an error in cellModEmbed with kIM whereby the matrix was not being generated; strange 
* duplicate softpower column in sumstats_celltype.tab; n_genes retained although it was now giving original number of genes, so added n_genes_used 

### Removed
* use_imputed parameter
* quit_session parameter
* genes_remove_dir parameter 
* PPI  test
* correlation test
* magma test
* gene set enrichment test
* map_to_ensembl
* matching colors between parametrisations: only for plotting so irrelevant

v2.7 - 24-11-2018

### Added
* n_genes_use argument for when using PCA loadings to select genes
* GSEA on user-specified genelist rather than fixed BMI genes list

### changed
* removed obsolete scale_MEs_by_kIMS from main and from moduleEigengenes, changed moduleEigengenes function name to moduleEigengenes_uv
* increased moduleEigengenes softPower parameter from 6 to 8
* cleaned up commented out code
* output sumstats, params and celltype-wise logs to a new file for each run, to avoid problems in case of a change of arguments
* Improved filtering of cell clusters and updating sNames each time

### Fixed
* consensusTOM tryCatch would redo all the TOMs without resampling, not just the ones where consensusTOM failed.
* the tryCatch backup TOM call would not have acces to list_datExpr, resulting in an error
* safeParallel would, if makeCluster failed, add FUN to args before checking the length of args, resulting in choosing mapply as fall back always.

v2.6 - 12-11-2018 

### Changed 
* made scratch_dir an argument
* refactored (some) functional code to give temporary variables more readable names
* when loading list_dissTOM, use match to find dissTOM that survived filtering, rather than %in% 
* Do no append sumstats and params to table, write to new, to avoid conflicting column numbers if number of params changed

### Added
* Filter out small modules after the t.test 
* use safeParallel for parallelisation; removed n_cores arguments and add RAM_Gb_max instead
* added data and celltype names to cellModEmbedMat

### Fixed
* For kIMs, when doing t-test, was using kMs from  before reassigning genes
* kM_reassign_fnc would return the colors of one iteration before. If a whole module had been reassigned in the subsequent and last iteration, it would result in the colors vector having an additional level compared to the kMs columns. 
* Fix bug in outputting number of t.test significant genes

v2.5 - 19-10-2018

### Fixed
* list_list_PPI_uniq now just list_list_PPI
* sNames_gwas and sNames_meta now ordered the same as sNames and sNames_PPI!
* Fixed PPI test output writing to disk  -needed to convert to matrix first
* Removed double celltype name from metadata correlation matrices
* logical check for gwas_filter_vals when counting enriched modules and outputting sumstats
* fix cellType, not celltype, in mergeCloseModskIM and kM_reassign_fnc
* added logical condition on computing n_modules_meta_enriched, only if using metadata
* cellModEmbed with kIM: if no grey, was returning a 0 dimension matrix
* issues with kM_reassign grey-handling
* FilterGenes now looks in, and filters only, raw.data
* mergeCloseModules with kME now more robust with try() statements. "colors" was wrongly being assigned to.
* Add logic in case there were no significant PPI modules to write to csv
* PPI_outer_for_vec outputs a one-row dataframe even if there was only the grey module
* added [sNames %in% sNames_ok] when ordering list_list_geneMod_t.test_order
* In making colors uniqe, found rare bug when trying to subset modules by removing grey with list_mods [-grep("grey" ... ] when there are no genes assigned to grey! -> do a logical test first.
* remove list_dissTOM_PPI after re-computing kM and pkM after PPI if fuzzyModMembership == kIM. Load it when needed for kIM
* When computing prop genes reassigned, if reassign_log is NULL, output 0 for sumstats. Reliably output columns of length(sNames) with NAs or 0s as required
* MAGMA t.test was running the wrong test.
* Added back normalizeData, was missing..?
* If re-running PCA due to an error, only use var genes
* if fewer than five PCs show up as significant, use top loading genes rather than significant loading genes
* Fixed issue where scaledata would fail due to assigning vars.to.regress after setting up cluster workings, so seurat subset objects would contain the full dataset scale data instead.
* Corrected hgcn to hgnc 
* kIM module u (eigenvector) vectors were being output without module names, unlike their kME equivalents

### Changed
* outfile logs from parallel processes now named with data_prefix and run_prefix
* replaced for loops with Reduce when concatenating data frames / matrices
* Set do.center = T even for bicor
* set excludeGrey = T always since we anyway set pkME of genes in grey module to 0. This saves a lot of computations.
* kM_reassign has a improvement threshold for reassigning of 1.05 to avoid cycling over small differences
* made parpKms into pkMs_fnc
* moves some WGCNA functions into the main script mapply etc calls
* always save dissTOM, also for fuzzyModMembership==kME, for reproducibility
* added better tryCatch errror reporting and handling for RunPCA and Jackstraw
* throttle consensusTOM function at 20 cores rather than 10
* parallelised jackstraw in clusterMap since 1. this means activating the workers only once rather than for each subset 2. if jackstrawnReplicate == 0, we call projectPCA inside the function - serially before!
* changed all parLapply to parLapplyLB; made clusterMap for pkMs with dynamic scheduling
* Made PPI_inner_for_vec output p.values and expected interactions as.numeric;
 PPI_outer_for_vec now uses data.frame rather than as.data.frame on the output
* Do not name list_dissTOM after loading from scratch
* Parallelise final outputs e.g. kMs, faster
* Do not compute embedding matrix, instead output u vectors and compute afterwards
* Updated readme to mention no whitespace in parameter vectors and clarify quotes
* added loom to load_obj
* Removed step of normalizing and scaling the full expression matrix. Can be done in the cell x module heatmap script. 
* empty @scale.data and @data slots from seurat_obj before subsetting.
* Updated load_obj utility function
* output kMs with filename kMs.csv rather than fuzzyModMembership (easier to search for in other scripts)
* resume now loads session image before setting constants, which means that the user may use resume and give new parameters

### Added
* moduleEigengenes now outputs left eigenvectors (svd$u) of length gene. These can be used to compute eigenvector cell embeddings across any other data
* filter genes from modules based on kME t.test
* Delete checkpoint images after saving final session image
* cell x module heatmap script
* load_obj now handles most common text formats plus loom, including if they are compressed with gzip
* flag data_type, either "sc" or "bulk"
* added back the matrix with module embeddings as it is useful for plotting the modules on cell plots
v2.4 - 15-08-2018

### Fixed
* Do not take list_dissTOM_ok tmp file with date, to avoid several copies with different dates, which results in error when using dir() to load them
* added lapply to deletegrey(list_list_kMs) and also added it for kIMs
* save magma.fdr.log and magma.r, and also save sig, rather than naming the full results sig and saving them as sig
* eval(parse(text= .. )) (default arg is 'file=')
* gwas and metadata filtering. The kMs were not being filtered at gwas step, and list_MEs weren't either. Weird bug, seems to have been introduced since this has worked at some point.
* When outputting metadata correlation results, use sNames_PPI rather than sNames_meta, i.e. output also metadata for celltypes where none of the correlations were significant
* Module Eigengenes: list_..._MEs are now always the MEs$eigengenes rather than the list output by moduleEigengenes. Also removed "ME" from colnames in list_MEs_PPI - they were not being found when filtering for GWAS.  Changed parkMEs and script where required.
* When filtering kMs and MEs for metadata correlations using mapply did not specify simplify arg so if only one module dataframe was converted to vector. Now F
* Fixed reference to list_colors_PPI to list_colors_PPI_uniq

### Added
* mergeCloseModskIM which is analogous to mergeCloseModules but uses IM embeddings rather than MEs
* output PPI p-values as .csv 
* allow user to filter out genes in genes_remove_dir
* Normalizes seurat object data - i.e. only uses raw.data (or imputed, if that option is selected)
* compute percent.ribo and percent.mito in the script
* kM reassign, which corresponds to k-means reclustering. 
* added scale_MEs_by_kIMs, testing
* produce ensembl-symbol mapping file in all cases and compute percent mito and percent ribo. 
* added run_prefix 
* Now output two diagnostics files: one for run-level information, one for celltype-level info
* delete individual TOMs from drive
* Compute idx_row_variants in any case (for stats)
* added cell row names to list_MEs_PPI (which are used through rest of script)

### Changed
* Changed checkPPI: renamed to PPI_filter.  Always perform the PPI analysis, but filter on colours only if filter_PPI == T
* For STRINGdb test moved clusterMap inside mapply, which will often be faster if n_cores > n celltypes
* replaced IM embedding computation in metadata association step with the cellModEmbed function
* changed data_prefix default to "sc_wgcna"
* refactored TOM_for_par and dissTOM_for_par - put them back into the main script
* Removed data_prefix, project_dir and flag_date from PPI_outer_from_vec, not used
* Removed flag_date from all main script file saves
* changed gwas_filter_traits test logic to do the filter if ANY rather than if ALL strings are detected
* renamed use.imputed to use_imputed
* removed 'try' around write.csv ... cellModEmbed , just because it seemed superfluous
* delete list_dissTOM_ok and scale_data when not needed 
* parpkMEs now just outputs 0 if the gene's module is grey
* kM_magma now allows for three tests: all gene kME, module gene kME, and module gene t-test. Module gene t-test set as default (in script)
v2.3 - 01-08-2018

Changed
* Save scale data matrix rather than whole RObject to disk for later reuse (cashing)
* Use scratch_dir - HDD hard disk - for temporary checkpoint session images, scale matrix and TOM files
* removed unnecessary named function parGetMEs, totally unnecessary
* changed version control structure to full git workflow: deleted dev versions
* change consensusTOM consensusQuantile to 0.2 as a way to weed out modules depending on relatively few cells

Fixed
* In mergeCloseModules, was computing Eigengenes twice, within the mergeCloseModules function and afterwards. Now only afterwards.
* the script would remove all grey parametrisations BEFORE matching colors, but colors takes n_combs as an (implicit) argument, so this could produce an error if some parametrisations were all grey and were removed

v2.2 - 18-07-18

### Changed
* eigen_mat now uses fast IRLBA algorithm to compute first PC
* simplified kIM_eachMod_norm function
* Use GSEA for rare variants and mendelian, merged as one geneset 
* use kIM projections for metadata correlation 
* in QC  filter out celltype subsets with <= 20 cells 
* added diagnostic output: n cells, n genes, n genes assigned pre-ppi, n genes assigned post-ppi, n modules pre-ppi, n modules post-ppi
* Changed the consensusTOM networkCalibration from "full quantile" to "single quantile". Reduced sampleForCalibrationFactor for TOM calibration from 2000 to 1000 to save time.
* Set maxBlockSize (under blockwiseModules) to 5e3 and increased the blockSizePenaltyPower to 10. Also pickSoftThreshold refers to maxBlockDize rather than hardcoding 5000
* Replaced all parLapply with parLapplyLB and added .scheduling = c("dynamic") to all clusterMap calls to use dynamic load balancing throughout

### Added
* QC: Only analyse subsets with >= 25 samples 
* Added kIMs (average gene-module k) as alternative to kMEs for gene-module membership, module-module correlation (through cell kIM embedded profile) and module-metadata correlation.

### Fixed
* For the cell embedding matrix, use the full seurat object @scale.data. Regress out confounding variables first.
* When recomputing kMEs after PPI and MAGMA 	filter, added "SIMPLIFY=F" to signed_kME clusterMap
* renamed organism variable to organism_data to avoid aliasing
* checkpoint save images were for some reason (regExpr or environment-related, most likely) not saving everything). Now just saves everything (assuming you don't want to change parameters when resuming)
* When loading checkpoint_4, due to logical structure around metadata computation parts, would get 'metadata not found' (if not already set to NULL)
* diagnostics_stats came out with extra columns and counting prop grey rather than 1-prop grey
* cellModEmbed function, which is used for computing eigengene embeddings for all cells in the expression matrix. It was not taking into account direction of the eigenvectors, which are now aligned to average expression, which is default in moduleEigengenes

## v2.1 - 18-07-03

### Fixed
*  rename duplicate modules in dat_gwas_fdr_unique and dat_gwas_corrCoef_uniqe
* if (sum(idx_row_include)==0) save and quit 
* Fixed the log when celltypes have been discarded
* Fixed gene list sorting - was giving duplicates within sets..

### Added
* user can specify whether to save session image at checkpoints
* user can specify whether to quit at a checkpoint
* Compute empirical p-values for kME-GWAS correlation coefficients

### Changed
* When making colors unique, if there are so many modules that there are not enough colors without adding numbers, use all colors including those with number suffixes
* Reorder list_list_module_meta_genes by kME before eigen_mat (no change in eigen_mat). Group all under the section headed "Compute and output tables"
* Move all plotting to a separate notebook 
* MAGMA script now does not loop over sub-folders. User should specify a single sub-folder as gwas dir e.g. ../data/MAGMA/BMI-brain/
* In MAGMA script, moved # Load MAGMA genes and remap to Ensembl gene IDs and # Remapping from human Entrez to human Ensembl gene IDs chunks from kME_magma subroutine out to magma_par main function. Reason: no need to have them in subroutine after abandoning gwas sub-directories
* Dissolved the magma par function 
* moved options(stringsAsFactors = F) from magma_par to beginning of main script
* if magma_filter_traits == NULL, do not filter gwas tables on fdr significance
* moved stop break if there is no magma enrichment into the conditional flow which first conditions on the user having specified magma_filter_traits - i.e. if there are no magma filter traits, do not stop even if there are no significant correlations.
*  streamlined factorToIndicator function
* Reduced maxPOutliers for bicor from 0.1 to 0.05 to avoid discarding much of the data if bimodal, as recommended by the WGCNA authors
* script saves session images at checkpoint but not including parameter values since the user may want to load a session but with new parameters
* Rolled plotfinalcolors, plotdiffparams and plotdiffparamsPPI functions into one: plotDendro_for_vec
* moved the unique colors adjustment to right after PPI to avoid confusion when comparing outputs.
* convert rare variants and gwas output tables: Do numerical transformations, then convert to data.frame, then add columns for module and for celltype
* removed full stops from variable names to avoid mistakes with regular expressions.
* Script now saves metadata_corr_col and metadata_corr_filter_vals since they are used before checkpoint_1

### Removed
* --replace flag: this should not be a user option but based on what's best
* --do.center flag: is TRUE unless corFnc = bicor


## v2.0_dev1 - 18-06-12

### Fixed
* list_colors_gwas <- ... list_colors_gwas changed to list_colors_PPI_ok
* dat_gwas_fdr_unique <- dat_gwas_fdr~~_unique~~
is.null(magma_gwas_dir)
* Added SIMPLIFY=F to list_eigen_meta.data_corr <- mapply(function(x,y) t(WGCNA::cor(x=x$eigengenes, y= .., also for pval calculation
* Fixed adjusted p values for eigengene - metadata correlations so we adjust for modules across all celltypes, rather than within celltype
* Fixed the filtering for metadata correlations. It was referring to eigen_mat colnames, which are only computed after. 

### Changed
* renamed adj.p.val.threshold to p.val.threshold

### Removed
* In magma_par, removed rownames(dat_filter_traits) <- NULL and rownames(r_filter_traits) as it seems unnecessary

## v1.9 - 18-06-11

### Fixed
* had been using PPI_pval_threshold as the PPI_pkME_threshold, i.e. 0.05. May have been submitting some genes that were not that connected to the module.
* hsapiens mapping from hgnc symbol to ensembl_id would fail because hgnc has a typo (hgcn)
* removed giving gene names to colors vectors in list_list_colors_PPI - apparently redundant
* Bumped up prop.freq in jackstraw to 0.018
* if no meta data_corr_col given by user, now sets to NULL (after seurat factor to indicator call)
* Fix(?) max DLLs reached: detach packages, unload namespaces garbage collect at the end of each checkpoint as appropriate. Load packages only when needed
* if there is no session image, save kMEs filename fixed from 'ok' to 'out'

### Changed
* Do not output GWAS fdr and coefficient for all modules; only for modules significantly correlated to selected traits
* do not    rm(list_dissTOM_PPI_ok,  list_list_plot_label_ok_order, list_list_colors_PPI_order)
* FIrst complete all analysis steps to filter out modules of interest. Then make unique colors,  output tables, output plots
* moved checkpoint 4 to the end and removed the final save.image
* If there is no GWAS signal, save the session image
* Instead of calling disableWGCNAThreads() in the parameter script , do so in the main script, at the beginning of every section
* magma_gwas_dir defaults to NULL, allowing the user to skip the step by default
* filter MEs_PPI:   list_MEs_PPI_ok <- deleteGrey(list_MEs_PPI) %>% Filter(f=length)
* wait with computing list_list_module_...genes until needed for eigen_mat
* adopted a single fdr p-value variable, set to 5e-2
* Moved checkpoint 4 to before GSEA; restored final session image

## v1.8 - 18-05-29

### Added
* Allow user to filter modules on enrichment for user-specified gwas traits and/or metadata values
* output hgcn to ensembl mapping dataframe to csv
* Gene Set Enrichment Analysis human
* Fixed metadata - eigengene plot

### Changed
* Moved plotting out of eigenplot function
* Removed 'BMI' or changed to 'gwas'

### Fixed
* in magma_par, added module column to table.kme.cor.r_all 
* Added "quote = F, row.names=F" to write.csv consistently across script
* SeuratFactorToIndicator now distinguishes between numeric and factor data and only makes dummy variables for factors
* Fixed bug in conditional flow around gene mapping to ensembl that resulted in failure for human

## v1.8_dev3 - 18-05-23

### added
* Added complex heatmap: cell / eigengene expression 
* Added eigengene - metadata correlations
* Added eigengene-eigengene correlation across all cell clusters

### changed
* Removing 'final' from names
* Simplify: after plotting the color assignments only process PPI modules downstream
* moved checkpoint 3 and 4 so 4 is after MAGMA step
* MAGMA function plots all GWAS, all modules as well as only BMI modules on BMI gwas
* output a table with cell cluster, module, gene ensembl, gene hgnc
* only output BMI enriched kME/kIM to csv
* do not output mapping table

## v1.8_dev2 - 2018-05-10

### added
* We now use MAGMA as a filtering step to prioritise modules for downstream analysis

### changed
* Rather than give ensemble_database and STRINGdb_species, user now just passes argument 'organism' 
* Convert to ensembl id using both optimal mapping and synonyms, rather than doing so in MAGMA script. Save log of unmapped and duplicate genes removed in the process.
* Output table with gene symbols, ensembl id and color
* Removed 'if (!(is.null(STRINGdb_species))...
* renamed num.replicate to jackstraw.num.replicate and nPermutations to TOM.num.replicate for clarity and consistency
* Setting prop.freq, i..e how much of the dataset to permute in Jackstraw, to max(0.012, 4/length(seurat_obj_sub@var.genes)) # to ensure we have at least 5 samples, but not overdoing it
* When plotting, first try to remove full stops from any unplottable colors before replacing colors with unassigned colors from colors()
* Cleared up commented out code from main and functions scripts
* Moved plotting to functions script (work in progress)
* made names more informative:  _logical_notAllGrey and from _filter to _ok
  
## v1.8_dev1 - 2018-05-07

### added
* 4 checkpoints for saving session image - not only in case of tryCatch error (as this often did not work if the error occurred within a parallelised function or something like that

### fixed
* Remove FilterCells as it is inappropriate for deep sequenced data - should already have been done (deployed)

## v1.7 - 2018-05-03

### fixed
* EDIT_180421_7: changed control flow to catch cases where there are no PPI enriched modules (ensemblID was possibly producing an error by trying to print out a csv that didn't exist)
* EDIT_180421_8: fixed invalid length argument when diffParams_colors is NULL
    ~~pkMEs <- vector(mode="numeric",length=nrow(diffParams_colors))~~
    pkMEs <- vector(mode="numeric",length=length(list_colors_matched[[1]]))
* In RunPCA, set pcs.compute = min(nPC_seurat, nrow(seurat_obj_sub@data) %/% 2, ncol(seurat_obj_sub@data) %/% 2), rather than nrow-1, ncol-2, to avoid that IRLBA fails to converge or getting warnings that we are computing too large a proportion of singular components. If tryCatch catches a warning (from IRLBA) it calls RunPCA again with half the number of PCs to compute, fastpath = f, and maxit * 2.
 * If any subsets failed to produce modules, make an error log 
* Remove tryCatch in functions used in parallelisation as they may(?) cause problems
* Filtergenes seems to have been returning the original object! Fixed. 
* Filter out cells with >7000 genes (Seurat PBMC tutorial sets this at 5000); do not filter on UMI (previously removed cells with < 200 UMI)
* do regress out ribo 
* Major bug fix: remove the early mapping from symbol to Ensembl, while keeping symbols for unmapped genes. It led to a very confusing bug when the MAGMA part tried to map them again but only managed those that failed originally - apparently succeeding with some, however, leading to jumbled results.

### changed
* Comprehensive re-write of the parallelisation structure
* maps gene symbols to ensembl at the beginning rather than after finding modules; for unmapped genes, keeps the symbol
* EDIT 180421_9: added more rm() and gc()
* EDIT_180421_10: cl <- makeCluster(min(n_cores, detectCores()-1), type = "FORK")
* EDIT_180421_12 made sure that load_obj creates a new environment that does not inherit objects from the current one, and that the function removes everything from env afterwards
* makeCluster(outfile-..)
* JackStraw pcs.compute = ncol(seurat_obj_sub@dr$pca@gene.loadings))
  * Due to a bug in Jackstraw (https://github.com/satijalab/seurat/issues/277) use genes with high loading on a significant PC rather than genes with a significant loading on a significant PC. 
* EDIT 180428_1 : don't delete grey kME and kIM until preparing for output. This avoids confusion
* moved filterCellsScaleData into  the main script separated into the individual functions
* renamed wrapFilterGenes to just FilterGenes since there is no original function called that

### Added
* homo sapiens ensembl gene mapping
* EDIT 180425_1 MAGMA script now writes out csv as well as plots
* script now saves four session images as 'checkpoints' to resume in case of a crash
* Option to plot correlations between module cell embeddings and metadata - if factor, between each level
* plot kME-kIM correlations within each sub dataset - diagnostic plot to compare 
* plot kME - kME and kIM-kIM correlations across subsets
* 'resume' option starts computations from one of several checkpoints (e.g. in case the script failed or the user aborted)

## v1.6 - 2018-04-21

### Changed
* Updated readme.txt
*  EDIT_180421_5 if (anti_cor_action == "kME_reassign")  - not just if(!is.null(anti_cor_action
* wrapped magma script call in invisible()

### Fixed
* "kME_reassign_fnc: color**s**

## v1.5 - 2018-04-21

### Changed
* EDIT_180421_3  num.replicate for JackStraw is now a user parameter

### Fixed
* EDIT_180420_8 in FilterCells: 'subset.names', 'low.thresholds', and 'high.thresholds' must all have the same length 
* EDIT_180421_1 removed ~~list_reassigned~~
* EDIT_180421_2 rm(list_list_kME_reassign_in, )

## v1.4 - 2018-04-20

### Changed
* Reintroduced cell filtering based on mito
* Set nPC_seurat, i.e. PCs to calculate, from 75 to 200
* Increased RunPCA maxit parameter for IRLBA from 300 to 1000
* Edit_180420_2 set jackstraw num.replicate from 100 to 1000 (as recommended in Seurat tutorial)
* EDIT_180420_3: if genes_use == "PCA", select genes that load significantly on significant PCs
* EDIT_180420_5: remove hvg option - since hvg just ranks all genes, it only makes sense if we set a number cutoff (and its probably not a good way to pick genes)
Rename plot labels to plot_labels
* Do MAGMA for modules also w/o PPI enrichment filtering

### Fixed 
* kME reassign: make sure old_colors declared before if statement
* Rather than nPC_seurat, use npc.compute (the actual number of PCs computed by RunPCA, given the size of the matrix) downstream. Could lead to problems if nPC_seurat > pcs.compute
* parMagma: renamed dir_tables tables_dir

## v1.3 - 2018-04-20

### fixed
* Plotting error due to giving wrong arguments
* for plotting colors: use matrix rather than as.matrix
* corrected references to list_colors instead of list_colors_matched

### Added
* For testing changes: added save.image checkpoints 
* tryCatch now saves session image to file in case of error

## v1.2 - 2018-04-19

### Changed
* to compare parameters, allow user to input vectors as default for cutreeHybrid parameters. Adapted user input checks accordingly. The script is now built to get multiple parameters without a conditional flow distinguishing betwen a single or multiple sets of parameters. The script is much simplified.
* In compare_params, use number of genes assigned to non-grey modules after filtering for Protein-Protein-Interaction fdr p-value as statistic to choose best parameters
* Reverted back to RunPCA weight.by.var = F. I guess it also affects the gene loadings a lot - to be tested
* increase scale free topology powers to 1:30
*MEs now consistently holds merged$MEGs - so we have to extract the eigengenes with MEs$eigengenes
* delete grey kMEs in each set in list_kMEs in compare_params loop 
* if (length(unique(cutree$labels)) > 1) { ...
* Set nPC_seurat to 75
* Moved plot_permuted below the compare_params part
* kME_reassign is now done in a function
* extract_and_name_colors now done in a function
* parMagma: outputs to same folder as WGCNA project; aligned output names
* parMagma: loops over subdirectories within magma_gwas_dir and does separate analysis on each
*parMagma: commented out labeledHeatmap
* plot file names now have flag_date at the end
* renamed all data path variables
* Moved input data checks to their own section

### Added
* min.cells user parameter for filtering genes
* if file.exists... to remove seurat subsets
* log_dir

### Removed
* tryCatch() around compare_modules (too big)
* names(pkMEs_PPI) <- names(colors_PPI)
* names(pkMEs) <- names(colors)      
* save_plots parameter
* Removed PC significance JackStraw test for selecting genes based on loadings - doesn't produce better results

## Fixed
* Removed FilterCells step - in Campbell AgRP neurons, max genes 5000 was removing half of the neurons, from 2233 to 1274. Likely had enormous impact on results. softPower indices were very much affected.
* bootstrap(replace=T) returns the original dataset plus the desired number of permutated sets rather than the desired number minus one.  In if (plot_permuted == T) chunk, fixed indiv_TOMS <- lapply ... which was also off by one.
*  for (j in 1:length(list_MEs)) { : nested loop used same index 'j' as outer - now k
* in matchLabels, extraLabels = standardColors()

## v1.1 - 2018-04-17

### Fixed
* plot_permuted loop: fixed mapply (removed inappropriate schedule argument)
* coerce goodGenesTOM_idx to logical (was character)


## v1.0 - 2018-04-17

###  Fixed
* MAGMA parallel loop call now loads Matrix and also gets explicit arguments. Tested and works

## v0.9 - 2018-04-16

### Added
* For kME_reassign, save csv file with list of genes moved,  respective modules and pkMEs
* Allow user to specify meta.data_ID to use for subsetting Seurat object

### Fixed
* Made MAGMA a separate parallel loop to reduce concurrent libraries loaded onto each cluster in parallel processing to avoid crashing worker sessions

##v0.8 - 2018-04-13

### Changed
* Set kME_reassign_threshold in params to 1.5 (conservative)
* In the for loop the full list of subsets is now loaded into and removed from memory in each iteration. Slower but much more memory efficient.
* save.image for each iteration rather than an (unwieldy) list of objects
* reintroduced parLapply in MAGMA
* parallelised the whole script since it is now much more memory efficient than before

### Fixed
* kMEs$kMEgrey <- NULL would use tab completion to set kMEs$kMEgrey60 to NULL. Hence changed to kMEs[['kMEgrey']] <- NULL
* STRINGdb loop is now robust (tryCatch within for loop)

## v0.7 - 2018-04-12

### Added
* More messages throughout
* Check genes_use argument
* remove grey module kMEs from output
  
### changed
* Declare `PPI_pval_threshold` in `rwgcna_params.R`
* reverted MAGMA step to lapply to reduce risk of errors (seems unnecessary)
* Rather than choosing scale_data or not, now choose do.center or not (for bicor). Data is in either case scaled and nUMI and percent.mito regressed out.
* moduleEigengenes(...excludegrey = T)

### Removed
* Redundant parameters minModuleSize, deepSplit and nPermutations from built in parameters csv file

### Fixed 
* conditional flow around FilterCells and ScaleData
* removed conditional around ScaleData which would skip the step if scale_data = F. Scale data used for PCA which we use to select genes. Hence if we skip data scaling on the subset, it would actually just use the scaled data already in the seurat_object which would lead to poor PCA and selection of top loading genes - or code failure.
* set do.center = T always to avoid messing up PCA calculations. Instead make bicor == T -> scale_data <- F
* removed grey module from kMEs and kMEs_PPI

## v0.6 - 2018-04-10

### Added
* filter out genes with few cells
* filter cells with mito > 0.2
* do not filter for ribo

### Changed
* scale.data = F
* Seurat: if (scale_data == T) scale, regress out
* Seurat: if (genes_use != "all") ..find variable genes, runPCA
 * FindVariableGenes
 * delete mean.function = ExpMean,
 * delete dispersion.function = LogVMR,
    RunPCA
        Increase nPC_Seurat from 40 to 100 (for RunPCA() )
        seed.use = randomSeed __instead of default value of 42__
        Set fastpath = FALSE
        Set maxit = 300 
    findSoftThreshold
        powers: max softPower values to check up from 12 to 20
    moduleEigengenes 
        excludeGrey = T
        #softpower = softPower
        The power used in soft-thresholding the adjacency matrix. Only used when the hubgene approximation is necessary because the principal component calculation failed. It must be non-negative. The default value should only be changed if there is a clear indication that it leads to incorrect results.
    consensusTOM
        delete checkPower = checkPower - __This was not defined?__
    Move wgcna_params.R to same folder
    
__beginning of change log__

