The following code accompanies Cooper et al.’s assessment of the phylogeography of Cinnyris reichenowi. Throughout this document, the following taxonomy is used:
A medium-sized sunbird from the Albertine Rift mountains of East Africa. Sympatric with some populations of the next species, and with a similar distribution that is discontinuous among large mountain ranges.
A small montane sunbird from East and West Africa, which we subdivide into three groups:
The nominate population from East Africa.
Montane populations from the West African Cameroon Line; chiefly distributed between Bioko Island and Mt. Oku.
Xeric interior populations in Cameroon, the Central African Republic, and possibly Nigeria. Specimens differ slightly in morphometrics from preussi, but are genetically distinct (see below and our paper for a full discussion).
Note that for some sections of this appendix, programs are iterated randomly and sometimes jackknifed. This means that some values may not match those in the manuscript exactly.
This study utilized programs in bash (shell script), python, and R. python programs were accessed and run via bash. Programs used via this interface include:
This document was created using RStudio 1.0.143, R 3.4.4 (R Foundation 2018), and rmarkdown 1.10 (Allaire, Xie, et al. 2018). R packages used throughout this manuscript include:
ape (Paradis & Schliep 2018)
dismo (Hijmans, Phillips, Leathwick & Elith 2017)
ellipse (Murdoch & Chow 2018)
fossil (Vavrek 2011)
ggplot2 (Wickham 2016)
LEA (Frichot & Francois 2014)
maptools (Bivand & Lewin-Koh 2018)
MASS (Venables & Ripley 2002)
raster (Hijmans 2018)
vegan (Oksanen, Blanchet, et al. 2018)
All of these R packages are loaded here in a hidden code chunk that can be viewed in the rmarkdown document.
## Loading required package: raster
## Loading required package: sp
##
## Attaching package: 'raster'
## The following objects are masked from 'package:ape':
##
## rotate, zoom
##
## Attaching package: 'ellipse'
## The following object is masked from 'package:raster':
##
## pairs
## The following object is masked from 'package:graphics':
##
## pairs
## Loading required package: maps
## Loading required package: shapefiles
## Loading required package: foreign
##
## Attaching package: 'shapefiles'
## The following objects are masked from 'package:foreign':
##
## read.dbf, write.dbf
## Checking rgeos availability: TRUE
##
## Attaching package: 'MASS'
## The following objects are masked from 'package:raster':
##
## area, select
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-6
Other programs used in this study (via Windows operating system) include:
Many chunks of code are, after the first run, “hidden” from view to create a document that is easier to read. All code chunks and all code used in this manuscript can be viewed via the rmarkdown document.
The first analyses concern genetic data from the complex. We used sequences from 24 individual Cinnyris sunbirds in this study, from several different major biogeographic areas.
Note: We separate genderuensis here to better visualize which birds are from xeric regions; all birds labeled reichenowi from West Africa refer to populations of preussi.
The following color scheme was used throughout this study for official figures:
#000000
#1f2887
#e31a1c
The following code was formatted for machines at both the University of Chicago and the Field Museum. This code will not run as is, and must be modified for your specific computer.
The first steps are identical to those followed on the PHYLUCE website. However, I will start here with the creation of the taxon-set that was used in this paper.
mkdir -p taxon-sets/cinnyris
phyluce_assembly_get_match_counts \
--locus-db uce-search-results/probe.matches.sqlite \
--taxon-list-config cinnyris.conf \
--taxon-group 'cinnyris' \
--incomplete-matrix \
--output taxon-sets/cinnyris/cinnyris-taxa-incomplete.conf
cd taxon-sets/cinnyris
mkdir log
phyluce_assembly_get_fastas_from_match_counts \
--contigs ../../assemblies_trinity_2017/contigs \
--locus-db ../../uce-search-results/probe.matches.sqlite \
--match-count-output cinnyris-taxa-incomplete.conf \
--output cinnyris-taxa-incomplete.fasta \
--incomplete-matrix cinnyris-taxa-incomplete.incomplete \
--log-path log
According to the counts printed above, the most loci-rich individuals are:
Given that KU132209 has the most loci of any member of the main study group (C. reichenowi), this is the individual to which we will map our reads later.
phyluce_assembly_explode_get_fastas_file --input cinnyris-taxa-incomplete.fasta --output-dir exploded-fastas --by-taxon
phyluce_align_seqcap_align \
--fasta cinnyris-taxa-incomplete.fasta \
--output mafft-nexus-internal-trimmed \
--taxa 24 \
--aligner mafft \
--cores 20 \
--incomplete-matrix \
--output-format fasta \
--no-trim \
--log-path log
phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed \
--alignments mafft-nexus-internal-trimmed \
--output mafft-nexus-internal-trimmed-gblocks \
--cores 20 \
--log-path log
phyluce_align_get_align_summary_data \
--alignments mafft-nexus-internal-trimmed-gblocks \
--cores 20 \
--log-path log
A printout of the summary data follows:
#----------------------- Alignment summary -----------------------
#[Alignments] loci: 4,946
#[Alignments] length: 2,979,047
#[Alignments] mean: 602.31
#[Alignments] 95% CI: 5.64
#[Alignments] min: 120
#[Alignments] max: 2,081
#------------------- Informative Sites summary -------------------
#[Sites] loci: 4,946
#[Sites] total: 18,517
#[Sites] mean: 3.74
#[Sites] 95% CI: 0.13
#[Sites] min: 0
#[Sites] max: 76
#------------------------- Taxon summary -------------------------
#[Taxa] mean: 18.86
#[Taxa] 95% CI: 0.09
#[Taxa] min: 3
#[Taxa] max: 24
#----------------- Missing data from trim summary ----------------
#[Missing] mean: 0.00
#[Missing] 95% CI: 0.00
#[Missing] min: 0.00
#[Missing] max: 0.00
#-------------------- Character count summary --------------------
#[All characters] 56,341,110
#[Nucleotides] 53,710,713
#---------------- Data matrix completeness summary ---------------
#[Matrix 50%] 4754 alignments
#[Matrix 55%] 4709 alignments
#[Matrix 60%] 4565 alignments
#[Matrix 65%] 4417 alignments
#[Matrix 70%] 4160 alignments
#[Matrix 75%] 3761 alignments
#[Matrix 80%] 3237 alignments
#[Matrix 85%] 1693 alignments
#[Matrix 90%] 866 alignments
#[Matrix 95%] 275 alignments
#------------------------ Character counts -----------------------
#[Characters] '-' is present 2,630,397 times
#[Characters] 'A' is present 16,442,611 times
#[Characters] 'C' is present 10,446,382 times
#[Characters] 'G' is present 10,431,151 times
#[Characters] 'T' is present 16,390,569 times
The above plot visualizes the percent coverage and the number of loci with that coverage (i.e., 275 alignments are shared among 95% of the individuals within the dataset). The number of loci declines with increasing coverage, as expected, and drops precipitously between 80% and 85% coverage (from 3,237 to 1,693 alignments). We opted to use 80% coverage because this retains 3,237 loci, or ~65% of the 4,946 possible loci, which still provides a large amount of data for the phylogeographic analyses.
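As a quick check on the proportion of loci retained, the fraction can be recomputed from the summary printout. This is only a sketch: the two counts are copied by hand from the alignment summary above, and the variable names are illustrative rather than part of the original pipeline.

```shell
#!/usr/bin/env bash
# Counts copied from the alignment summary printed above.
total_loci=4946   # [Alignments] loci
loci_80p=3237     # [Matrix 80%] alignments

# Percentage of all loci retained in the 80% matrix, to one decimal place.
pct_80p=$(awk -v kept="$loci_80p" -v total="$total_loci" \
    'BEGIN { printf "%.1f", 100 * kept / total }')

echo "The 80 percent matrix retains ${pct_80p} percent of all loci"
```

The same two lines can be reused for any other row of the completeness table by changing `loci_80p`.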
We then proceeded with the creation of a cleaned 80% matrix for use in RAxML.
phyluce_align_remove_locus_name_from_nexus_lines \
--alignments mafft-nexus-internal-trimmed-gblocks \
--output mafft-nexus-internal-trimmed-gblocks-clean \
--cores 20 \
--log-path log
phyluce_align_get_only_loci_with_min_taxa \
--alignments mafft-nexus-internal-trimmed-gblocks-clean \
--taxa 24 \
--percent 0.80 \
--output mafft-nexus-internal-trimmed-gblocks-clean-80p \
--cores 20 \
--log-path log
phyluce_align_format_nexus_files_for_raxml \
--alignments mafft-nexus-internal-trimmed-gblocks-clean-80p \
--output mafft-nexus-internal-trimmed-gblocks-clean-80p-raxml \
--charsets \
--log-path log
Note that we also tested matrices with more and fewer loci to determine how the models reacted; the overall topology was extremely similar to that of the 80p dataset.
We assessed relationships between all taxa using a bootstrapped RAxML approach, as follows. Note that this code uses -T 20, indicating that we used 20 threads on our machine; this value must be adjusted to the machine on which you are running the program.
cd mafft-nexus-internal-trimmed-gblocks-clean-80p-raxml
raxmlHPC-PTHREADS-SSE3 \
-m GTRGAMMA \
-N 24 \
-p 19877 \
-n best \
-s mafft-nexus-internal-trimmed-gblocks-clean-80p.phylip \
-T 20
raxmlHPC-PTHREADS-SSE3 \
-m GTRGAMMA \
-N autoMRE \
-p 19877 \
-b 7175 \
-n bootreps \
-s mafft-nexus-internal-trimmed-gblocks-clean-80p.phylip \
-T 20
raxmlHPC-SSE3 \
-n resolved \
-m GTRGAMMA \
-f b \
-t RAxML_bestTree.best \
-z RAxML_bootstrap.bootreps
The document could not be rendered with the output below included; it corresponds to Figure 3 in the paper.

The following section relies heavily on the methods and code outlined by Zarza et al. (2016) in their study of Aphelocoma jays.
We did not focus on mitogenomes for this study, as we were working with museum specimens from which we were unable to obtain mitochondrial information. Any mitochondrial studies would have lacked samples from Mt. Cameroon and genderuensis populations of central Cameroon.
Throughout this section, we have used \ to subdivide code into multiple lines so that it is easier to read. For some parts of the code, these will have no effect; for others, they will need to be removed. This code is specifically formatted to work on our machines; it will need to be reformatted for your own machine should you choose to use it.
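To illustrate the line-continuation convention, the sketch below runs the same command once on a single line and once split across lines with a trailing backslash. The command and its arguments are placeholders chosen for illustration; they are not part of the pipeline. The backslash must be the very last character on the line for the continuation to work.

```shell
#!/usr/bin/env bash
# A trailing backslash joins the next line to the current command,
# so these two invocations are equivalent.
one_line=$(printf '%s %s %s' alpha beta gamma)
multi_line=$(printf '%s %s %s' \
    alpha \
    beta \
    gamma)

echo "$one_line"
echo "$multi_line"
```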
As mentioned above, we indexed reads to our best represented ingroup, KU132209 Cinnyris reichenowi with 4276 loci. We did this using the bwa-mem algorithm.
cd
cd uce-cinnyris/taxon-sets/cinnyris/
mkdir map-to-read
cd map-to-read/
cp ../exploded-fastas/KU132209-Cinnyris-reichenowi.unaligned.fasta ./KU132209_Cinnyris_reichenowi.fasta
bwa index -p KU132209_Cinnyris_reichenowi -a is KU132209_Cinnyris_reichenowi.fasta
From here on, I assigned shortcuts to my relevant folders and files to make it easier to run the appropriate code. Many of these abbreviations are the same as those used by Zarza et al. (2016). One of the files referenced below is a simple .txt file with the necessary taxa listed. This file is included with our data download.
READS_FOLDER=~/UCE-Data/UCEs/clean_reads_2017
SUBSET=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED
INDEX=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/map-to-read/KU132209_Cinnyris_reichenowi.fasta
FILES=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/taxalist.txt
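Because every later step depends on these shortcuts, it may help to confirm that each one points at something that exists before starting a long run. The helper below is a hypothetical addition, not part of the original workflow; `missing_paths` is a name invented for this sketch.

```shell
#!/usr/bin/env bash
# Print every argument that does not exist on disk, one per line.
# Prints nothing when all paths are present.
missing_paths() {
    local p
    for p in "$@"; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}

# Example call with the shortcut variables defined above:
# missing_paths "$READS_FOLDER" "$SUBSET" "$INDEX" "$FILES"
```

An empty result suggests the shortcuts are usable on the current machine; any printed path needs to be corrected first.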
We then ran a loop to perform multiple actions on all of our sequences of interest.
while read -r line
do
name="$line"
#Map sequences against the reference sequence using bwa-mem
echo "Processing species: - $name"
eval $(echo "bwa mem -B 10 -M -R '@RG\tID:$name\tSM:$name\tPL:Illumina' \
KU132209_Cinnyris_reichenowi \
$READS_FOLDER/$name/split-adapter-quality-trimmed/$name-READ1.fastq.gz \
$READS_FOLDER/$name/split-adapter-quality-trimmed/$name-READ2.fastq.gz > \
~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/BWAMEM/$name.pair.sam")
eval $(echo "bwa mem -B 10 -M -R '@RG\tID:$name\tSM:$name\tPL:Illumina' \
KU132209_Cinnyris_reichenowi \
$READS_FOLDER/$name/split-adapter-quality-trimmed/$name-READ-singleton.fastq.gz > \
~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/BWAMEM/$name.single.sam")
#We then sorted reads using SAMTOOLS
eval $(echo "samtools view -bS \
~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/BWAMEM/$name.pair.sam | \
samtools sort -m 30000000000 \
- ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/SAM/$name.pair_sorted")
eval $(echo "samtools view \
-bS ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/BWAMEM/$name.single.sam | \
samtools sort -m 30000000000 \
- ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/SAM/$name.single_sorted")
#Mark duplicates using picard
eval $(echo "java -Xmx4g -jar ~/anaconda/jar/MarkDuplicates.jar \
INPUT=$SUBSET/SAM/$name.pair_sorted.bam \
INPUT=$SUBSET/SAM/$name.single_sorted.bam \
OUTPUT=$SUBSET/SAM/$name.All_dedup.bam \
METRICS_FILE=$SUBSET/SAM/$name.All_dedup_metricsfile \
MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=250 ASSUME_SORTED=true \
VALIDATION_STRINGENCY=SILENT REMOVE_DUPLICATES=True")
#Index the resulting '.bam' file
eval $(echo "java -Xmx4g -jar ~/anaconda/jar/BuildBamIndex.jar \
INPUT=$SUBSET/SAM/$name.All_dedup.bam")
eval $(echo "samtools flagstat $SUBSET/SAM/$name.All_dedup.bam > $SUBSET/Picard-Stats/$name.All_dedup_stats.txt")
done < "$FILES"
#Remove files that are no longer needed
rm *.sam
rm *sorted.bam
The next step was the ‘indel realigner’ step. This utilized the Genome Analysis Toolkit (GATK), which uses .dict dictionary files for contig names and sizes and .fai fasta index files to allow for efficient random access to the reference bases.
The first step was to prepare a fasta file to use as a reference with picard and samtools.
java -jar ~/anaconda/pkgs/picard-1.106-0/jar/CreateSequenceDictionary.jar \
R=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/map-to-read/KU132209_Cinnyris_reichenowi.fasta \
O=KU132209_Cinnyris_reichenowi.dict
samtools faidx ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/map-to-read/KU132209_Cinnyris_reichenowi.fasta
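To make the role of the .fai index more concrete: its first two columns hold each sequence name and its length (the remaining columns store byte offsets and line widths). The sketch below reproduces only those first two columns with awk, on a tiny made-up FASTA file; `fasta_lengths` and the demo records are illustrative assumptions, and the real index should still be built with `samtools faidx`.

```shell
#!/usr/bin/env bash
# Print "name<TAB>length" for every record in a FASTA file,
# mirroring the first two columns of a samtools .fai index.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") printf "%s\t%d\n", name, len
            name = substr($1, 2)   # drop the leading ">" and keep the first word
            len = 0
            next
        }
        { len += length($0) }      # sequence may span several lines
        END { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1"
}

# Tiny made-up FASTA for illustration (not real UCE data).
demo_fasta=$(mktemp)
printf '>uce-1 demo\nACGTAC\nGT\n>uce-2\nAAAA\n' > "$demo_fasta"
fasta_lengths "$demo_fasta"
```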
We realigned the mapping produced with bwa-mem, which was run with a mismatch penalty of 10 (-B 10). The minimum number of reads per locus was set to 10.
DEDUP_BAMS=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/SAM/*All_dedup.bam
cd $SUBSET
for sample in $DEDUP_BAMS
do
#Taxon or sample that is presently being processed
echo "Processing $sample"
#Create a variable with the sample name using the name of the "dedup-bam" file
#This uses the "cut" command with "/" as a field delimiter
#In my case, the path splits into 9 fields, and field 9 (the file name) is kept
DEDUPBAMNAME=$(echo $sample | cut -d/ -f9)
DEDUPBASENAME=$(echo $DEDUPBAMNAME | cut -d. -f1)
#Create the name of the intervals file
INTERVALS_NAME=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/GATK/$DEDUPBASENAME'.intervals'
echo $INTERVALS_NAME
#Create the output location for the realigned bam
REALIGNED_NAME=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/GATK/$DEDUPBASENAME'_realigned.bam'
echo $REALIGNED_NAME
#Execute the command in GATK to create intervals and realign reads
eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T RealignerTargetCreator \
-R $INDEX -o $INTERVALS_NAME -I $sample --minReadsAtLocus 10")
eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T IndelRealigner \
-R $INDEX -I $sample -targetIntervals $INTERVALS_NAME -o $REALIGNED_NAME -LOD 3.0")
done
#The loop above realigned the mapping produced with bwa-mem
#bwa-mem mismatch penalty (-B) of 10
#Minimum number of reads per locus = 10
mkdir GCVF
#I set the REFERENCE to equal my INDEX path
#Zarza et al. (2016) used REFERENCE
REFERENCE=$INDEX
#Realigned bams after removing duplicates with picard
REALIGNED_BAMS=$SUBSET/GATK/*realigned.bam
for sample in $REALIGNED_BAMS
do
#Current Sample
echo "Processing $sample"
#Create a variable with the sample name using the name of the realigned ".bam" file
#This uses the "cut" command with "/" as a field delimiter
#In my case, the path splits into 9 fields, and field 9 (the file name) is kept
OUTPUT_BASENAME=$(echo $sample | cut -d/ -f9)
echo $OUTPUT_BASENAME
OUTPUT_NAME=/home/jcooper/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/GATK/$(echo $OUTPUT_BASENAME | cut -d. -f1)'.g.vcf'
echo $OUTPUT_NAME
#Execute the command in GATK for haplotype call
#Variant discovery with HaplotypeCaller
#Normal mode can process all samples merged in one file
#In GVCF mode, samples must be processed one at a time
#This is the mode needed to produce input for GenotypeGVCFs
eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller \
-R $REFERENCE -I $sample -o $OUTPUT_NAME --emitRefConfidence GVCF \
--variant_index_type LINEAR --variant_index_parameter 128000 \
--contamination_fraction_to_filter 0.0002 --min_base_quality_score 20 \
--phredScaledGlobalReadMismappingRate 30 --standard_min_confidence_threshold_for_calling 40.0 \
--standard_min_confidence_threshold_for_emitting 40.0")
done
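The loops above recover the sample name with `cut -d/ -f9`, which only works when the path has exactly the same depth as on our machines. A possible alternative, sketched below under the assumption that the sample name is everything before the first "." in the file name, uses `basename` so that the directory depth no longer matters. `sample_name` and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Derive a sample name from a BAM path without counting "/" fields.
sample_name() {
    local file
    file=$(basename "$1")          # strip the directory part, whatever its depth
    printf '%s\n' "${file%%.*}"    # keep everything before the first "."
}

# Hypothetical path, for illustration only.
sample_name /any/depth/of/folders/SAMPLE01.All_dedup.bam
```

This yields the same result as the `cut -d. -f1` step used in the loops, so the downstream file names are unchanged.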
We now need to get the names of the VCF files for the next step.
ls -d -1 $PWD/GATK/*.g.vcf > gvcf.list
Next, we ran GenotypeGVCFs on all of the variant files produced by HaplotypeCaller. This merges the files and keeps only variable sites.
java -Xmx4g -jar ~/GenomeAnalysisTK.jar -R $REFERENCE -T GenotypeGVCFs \
--standard_min_confidence_threshold_for_calling 40.0 --standard_min_confidence_threshold_for_emitting 40.0 \
-V gvcf.list \
-o $PWD/GCVF/genotyped_X_samples.g.vcf
#Extract the SNPs from the call set
java -jar ~/GenomeAnalysisTK.jar \
-T SelectVariants \
-R $REFERENCE \
-V $PWD/GCVF/genotyped_X_samples.g.vcf \
-selectType SNP \
-o $PWD/GCVF/genotyped_X_samples_snps.vcf
#Extract the indels from the call set
java -jar ~/GenomeAnalysisTK.jar \
-T SelectVariants \
-R $REFERENCE \
-V $PWD/GCVF/genotyped_X_samples.g.vcf \
-selectType INDEL \
-o $PWD/GCVF/genotyped_X_samples_indels.vcf
Following Zarza et al. (2016), we filtered SNP calls around indels and applied quality filters following the methods of Brant Faircloth and the GATK Forums.
java -jar ~/GenomeAnalysisTK.jar \
-T VariantFiltration \
-R $REFERENCE \
-V $PWD/GCVF/genotyped_X_samples_snps.vcf \
--mask $PWD/GCVF/genotyped_X_samples_indels.vcf \
--maskExtension 5 \
--maskName InDel \
--clusterWindowSize 10 \
--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
--filterName "BadValidation" \
--filterExpression "QUAL < 30.0" \
--filterName "LowQual" \
--filterExpression "QD < 5.0" \
--filterName "LowVQCBD" \
--filterExpression "FS > 60" \
--filterName "FisherStrand" \
-o $PWD/GCVF/genotyped_X_samples_filtered_1st.vcf
#Get only the pass SNPs
cat $PWD/GCVF/genotyped_X_samples_filtered_1st.vcf | grep 'PASS\|^#' > $PWD/GCVF/genotyped_X_samples_only_PASS_snp.vcf
#Recalibrate Bases
mkdir recal
for sample in $REALIGNED_BAMS
do
#Current sample
echo "Processing $sample"
#Create a variable with the sample name using the name of the realigned ".bam" file
#This uses the "cut" command with "/" as a field delimiter
#In my case, the path splits into 9 fields, and field 9 (the file name) is kept
FILE_BASENAME=$(echo $sample | cut -d/ -f9)
echo $FILE_BASENAME
TABLE_NAME=$(echo $FILE_BASENAME | cut -d. -f1)'.table'
echo $TABLE_NAME
RECAL_OUT=$(echo $FILE_BASENAME | cut -d. -f1)'_recal.bam'
RECAL_OUT_bai=$(echo $FILE_BASENAME | cut -d. -f1)'_recal.bai'
#Execute GATK command to recalibrate bases
eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T BaseRecalibrator \
-R $REFERENCE -I $sample -knownSites $SUBSET/GCVF/genotyped_X_samples_only_PASS_snp.vcf \
-o $SUBSET/recal/$TABLE_NAME")
eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T PrintReads -R $REFERENCE -I $sample \
-BQSR $SUBSET/recal/$TABLE_NAME -o $SUBSET/recal/$RECAL_OUT")
done
RECAL_BAMS=$SUBSET/recal/*_recal.bam
#Haplotype calling on 1st recalibrated bam
for bam_recal in $RECAL_BAMS
do
#Current sample
echo "Processing $bam_recal"
#Create a variable with the sample name using the name of the recalibrated ".bam" file
#This uses the "cut" command with "/" as a field delimiter
#In my case, the path splits into 9 fields, and field 9 (the file name) is kept
RECAL1_BASENAME=$(echo $bam_recal | cut -d/ -f9)
echo $RECAL1_BASENAME
RECAL1_NAME=$SUBSET/recal/$(echo $RECAL1_BASENAME | cut -d. -f1)'.g.vcf'
echo $RECAL1_NAME
#Execute the GATK command for haplotype call on recalibrated bams.
eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller \
-R $REFERENCE -I $bam_recal -o $RECAL1_NAME --emitRefConfidence GVCF \
--variant_index_type LINEAR --variant_index_parameter 128000 \
--contamination_fraction_to_filter 0.0002 --min_base_quality_score 20 \
--phredScaledGlobalReadMismappingRate 30 \
--standard_min_confidence_threshold_for_calling 40.0 \
--standard_min_confidence_threshold_for_emitting 40.0")
done
We then moved on to the genotyped files to perform multiple loops of identifying and filtering SNPs.
#Get the names of the recal vcf files to be used in the next step
ls -d -1 $SUBSET/recal/*_recal.g.vcf > recal_vcf.list
mkdir genotyped
#Run GenotypeGVCFs on all the variant files produced by HaplotypeCaller; merges files and keeps only variable sites
java -Xmx4g -jar ~/GenomeAnalysisTK.jar -R $REFERENCE -T GenotypeGVCFs \
--standard_min_confidence_threshold_for_calling 40.0 \
--standard_min_confidence_threshold_for_emitting 40.0 \
-V recal_vcf.list \
-o $SUBSET/genotyped/genotyped_X_samples_recal.g.vcf
#Extract the SNPs from the call set
java -jar ~/GenomeAnalysisTK.jar \
-T SelectVariants \
-R $REFERENCE \
-V $SUBSET/genotyped/genotyped_X_samples_recal.g.vcf \
-selectType SNP \
-o $SUBSET/genotyped/genotyped_X_samples_recal_snps.vcf
#Extract the indels from the call set
java -jar ~/GenomeAnalysisTK.jar \
-T SelectVariants \
-R $REFERENCE \
-V $SUBSET/genotyped/genotyped_X_samples_recal.g.vcf \
-selectType INDEL \
-o $SUBSET/genotyped/genotyped_X_samples_recal_indels.vcf
java -jar ~/GenomeAnalysisTK.jar \
-T VariantFiltration \
-R $REFERENCE \
-V $SUBSET/genotyped/genotyped_X_samples_recal_snps.vcf \
--mask $SUBSET/genotyped/genotyped_X_samples_recal_indels.vcf \
--maskExtension 5 \
--maskName InDel \
--clusterWindowSize 10 \
--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
--filterName "BadValidation" \
--filterExpression "QUAL < 30.0" \
--filterName "LowQual" \
--filterExpression "QD < 5.0" \
--filterName "LowVQCBD" \
--filterExpression "FS > 60" \
--filterName "FisherStrand" \
-o $SUBSET/genotyped/genotyped_X_samples_filtered_2nd.vcf
#Only get the passable SNPs
cat $SUBSET/genotyped/genotyped_X_samples_filtered_2nd.vcf | grep 'PASS\|^#' > $SUBSET/genotyped/genotyped_X_samples_only_PASS_snp_2nd.vcf
#Second base recalibration loop on uncalibrated bams
#ANNOTATION WITHIN LOOP STOPPED
mkdir GCVF2
for sample in $REALIGNED_BAMS
do
echo "Processing $sample"
FILE2_BASENAME=$(echo $sample | cut -d/ -f9)
echo $FILE2_BASENAME
TABLE2_NAME=$(echo $FILE2_BASENAME | cut -d. -f1)'2.table'
echo $TABLE2_NAME
RECAL2_OUT=$(echo $FILE2_BASENAME | cut -d. -f1)'_2recal.bam'
RECAL2_OUT_bai=$(echo $FILE2_BASENAME | cut -d. -f1)'_2recal.bai'
eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T BaseRecalibrator \
-R $REFERENCE -I $sample \
-knownSites $SUBSET/genotyped/genotyped_X_samples_only_PASS_snp_2nd.vcf \
-o $SUBSET/GCVF2/$TABLE2_NAME")
eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T PrintReads -R $REFERENCE -I $sample \
-BQSR $SUBSET/GCVF2/$TABLE2_NAME -o $SUBSET/GCVF2/$RECAL2_OUT")
echo $RECAL2_OUT_bai
done
RECAL2_BAMS=$SUBSET/GCVF2/*_2recal.bam
#Haplotype calling on second recalibrated bam
for bam2_recal in $RECAL2_BAMS
do
echo "Processing $bam2_recal"
RECAL2_BASENAME=$(echo $bam2_recal | cut -d/ -f9)
echo $RECAL2_BASENAME
RECAL2_NAME=$(echo $RECAL2_BASENAME | cut -d. -f1)'.g.vcf'
echo $RECAL2_NAME
#Haplotype call on second recalibrated bams
eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller \
-R $REFERENCE -I $bam2_recal -o $SUBSET/GCVF2/$RECAL2_NAME \
--emitRefConfidence GVCF --variant_index_type LINEAR \
--variant_index_parameter 128000 \
--contamination_fraction_to_filter 0.0002 \
--min_base_quality_score 20 --phredScaledGlobalReadMismappingRate 30 \
--standard_min_confidence_threshold_for_calling 40.0 \
--standard_min_confidence_threshold_for_emitting 40.0")
done
#Get filelist for next step
ls -d -1 $SUBSET/GCVF2/*_2recal.g.vcf > recal2_vcf.list
#Run GenotypeGVCFs on all of the variant files; merge files and keep only variable sites
java -Xmx4g -jar ~/GenomeAnalysisTK.jar -R $REFERENCE -T GenotypeGVCFs \
--standard_min_confidence_threshold_for_calling 40.0 \
--standard_min_confidence_threshold_for_emitting 40.0 \
-V recal2_vcf.list \
-o $SUBSET/GCVF2/genotyped_X_samples_2recal.g.vcf
# Extract the SNPs from the call set
java -jar ~/GenomeAnalysisTK.jar \
-T SelectVariants \
-R $REFERENCE \
-V $SUBSET/GCVF2/genotyped_X_samples_2recal.g.vcf \
-selectType SNP \
-o $SUBSET/GCVF2/genotyped_X_samples_2recal_snps.vcf
# Extract the indels from the call set
java -jar ~/GenomeAnalysisTK.jar \
-T SelectVariants \
-R $REFERENCE \
-V $SUBSET/GCVF2/genotyped_X_samples_2recal.g.vcf \
-selectType INDEL \
-o $SUBSET/GCVF2/genotyped_X_samples_2recal_indels.vcf
#Filter SNPs
java -jar ~/GenomeAnalysisTK.jar \
-T VariantFiltration \
-R $REFERENCE \
-V $SUBSET/GCVF2/genotyped_X_samples_2recal_snps.vcf \
--mask $SUBSET/GCVF2/genotyped_X_samples_2recal_indels.vcf \
--maskExtension 5 \
--maskName InDel \
--clusterWindowSize 10 \
--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
--filterName "BadValidation" \
--filterExpression "QUAL < 30.0" \
--filterName "LowQual" \
--filterExpression "QD < 5.0" \
--filterName "LowVQCBD" \
--filterExpression "FS > 60" \
--filterName "FisherStrand" \
-o $SUBSET/GCVF2/genotyped_X_samples_filtered_3rd.vcf
#Keep only passable SNPs
mkdir GCVF3
cat $SUBSET/GCVF2/genotyped_X_samples_filtered_3rd.vcf | grep 'PASS\|^#' > $SUBSET/GCVF3/genotyped_X_samples_only_PASS_snp_3rd.vcf
#Third base recalibration loop
for sample in $REALIGNED_BAMS
do
echo "Processing $sample"
FILE3_BASENAME=$(echo $sample | cut -d/ -f9)
echo $FILE3_BASENAME
TABLE3_NAME=$(echo $FILE3_BASENAME | cut -d. -f1)'3.table'
echo $TABLE3_NAME
RECAL3_OUT=$(echo $FILE3_BASENAME | cut -d. -f1)'_3recal.bam'
RECAL3_OUT_bai=$(echo $FILE3_BASENAME | cut -d. -f1)'_3recal.bai'
#Execute the GATK command for base recalibration
eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T BaseRecalibrator -R $REFERENCE \
-I $sample \
-knownSites $SUBSET/GCVF3/genotyped_X_samples_only_PASS_snp_3rd.vcf \
-o $SUBSET/GCVF3/$TABLE3_NAME")
eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T PrintReads -R $REFERENCE -I $sample \
-BQSR $SUBSET/GCVF3/$TABLE3_NAME -o $SUBSET/GCVF3/$RECAL3_OUT")
done
RECAL3_BAMS=$SUBSET/GCVF3/*_3recal.bam
#Haplotype calling on third recalibrated bam
for bam3_recal in $RECAL3_BAMS
do
echo "Processing $bam3_recal"
RECAL3_BASENAME=$(echo $bam3_recal | cut -d/ -f9)
echo $RECAL3_BASENAME
RECAL3_NAME=$(echo $RECAL3_BASENAME | cut -d. -f1)'.g.vcf'
echo $RECAL3_NAME
#Execute the GATK command for haplotype calling
eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller \
-R $REFERENCE -I $bam3_recal \
-o $SUBSET/GCVF3/$RECAL3_NAME \
--emitRefConfidence GVCF --variant_index_type LINEAR \
--variant_index_parameter 128000 \
--contamination_fraction_to_filter 0.0002 \
--min_base_quality_score 20 \
--phredScaledGlobalReadMismappingRate 30 \
--standard_min_confidence_threshold_for_calling 40.0 \
--standard_min_confidence_threshold_for_emitting 40.0")
done
#Get file list for next step
ls -d -1 $SUBSET/GCVF3/*_3recal.g.vcf > recal3_vcf.list
#Genotyping with GenotypeGVCFs
java -Xmx4g -jar ~/GenomeAnalysisTK.jar -R $REFERENCE -T GenotypeGVCFs \
--standard_min_confidence_threshold_for_calling 40.0 \
--standard_min_confidence_threshold_for_emitting 40.0 \
-V recal3_vcf.list \
-o $SUBSET/GCVF3/genotyped_X_samples_3recal.g.vcf
#Extract the SNPs from the call set
java -jar ~/GenomeAnalysisTK.jar \
-T SelectVariants \
-R $REFERENCE \
-V $SUBSET/GCVF3/genotyped_X_samples_3recal.g.vcf \
-selectType SNP \
-o $SUBSET/GCVF3/genotyped_X_samples_3recal_snps.vcf
#Extract indels from the call set
java -jar ~/GenomeAnalysisTK.jar \
-T SelectVariants \
-R $REFERENCE \
-V $SUBSET/GCVF3/genotyped_X_samples_3recal.g.vcf \
-selectType INDEL \
-o $SUBSET/GCVF3/genotyped_X_samples_3recal_indels.vcf
java -jar ~/GenomeAnalysisTK.jar \
-T VariantFiltration \
-R $REFERENCE \
-V $SUBSET/GCVF3/genotyped_X_samples_3recal_snps.vcf \
--mask $SUBSET/GCVF3/genotyped_X_samples_3recal_indels.vcf \
--maskExtension 5 \
--maskName InDel \
--clusterWindowSize 10 \
--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
--filterName "BadValidation" \
--filterExpression "QUAL < 30.0" \
--filterName "LowQual" \
--filterExpression "QD < 5.0" \
--filterName "LowVQCBD" \
--filterExpression "FS > 60" \
--filterName "FisherStrand" \
-o $SUBSET/GCVF3/genotyped_X_samples_filtered_4th.vcf
#Get the passable SNPs
mkdir GCVF4
cat $SUBSET/GCVF3/genotyped_X_samples_filtered_4th.vcf | grep 'PASS\|^#' > $SUBSET/GCVF4/genotyped_X_samples_only_PASS_snp_4th.vcf
#Fourth and final recalibration
for sample in $REALIGNED_BAMS
do
echo "Processing $sample"
FILE4_BASENAME=$(echo $sample | cut -d/ -f9)
echo $FILE4_BASENAME
TABLE4_NAME=$(echo $FILE4_BASENAME | cut -d. -f1)'4.table'
echo $TABLE4_NAME
RECAL4_OUT=$(echo $FILE4_BASENAME | cut -d. -f1)'_4recal.bam'
RECAL4_OUT_bai=$(echo $FILE4_BASENAME | cut -d. -f1)'_4recal.bai'
#Execute the GATK base recalibration command
eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T BaseRecalibrator \
-R $REFERENCE -I $sample \
-knownSites $SUBSET/GCVF4/genotyped_X_samples_only_PASS_snp_4th.vcf \
-o $SUBSET/GCVF4/$TABLE4_NAME")
eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T PrintReads -R $REFERENCE -I $sample \
-BQSR $SUBSET/GCVF4/$TABLE4_NAME -o $SUBSET/GCVF4/$RECAL4_OUT")
done
RECAL4_BAMS=$SUBSET/GCVF4/*_4recal.bam
#Haplotype calling on fourth recalibrated bam
for bam4_recal in $RECAL4_BAMS
do
echo "Processing $bam4_recal"
RECAL4_BASENAME=$(echo $bam4_recal | cut -d/ -f9)
echo $RECAL4_BASENAME
RECAL4_NAME=$(echo $RECAL4_BASENAME | cut -d. -f1)'.g.vcf'
echo $RECAL4_NAME
#Execute the haplotype call command in GATK
eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller \
-R $REFERENCE -I $bam4_recal \
-o $SUBSET/GCVF4/$RECAL4_NAME \
--emitRefConfidence GVCF --variant_index_type LINEAR \
--variant_index_parameter 128000 \
--contamination_fraction_to_filter 0.0002 \
--min_base_quality_score 20 \
--phredScaledGlobalReadMismappingRate 30 \
--standard_min_confidence_threshold_for_calling 40.0 \
--standard_min_confidence_threshold_for_emitting 40.0")
done
#Get list of files from fourth loop for the next step
ls -d -1 $SUBSET/GCVF4/*_4recal.g.vcf > recal4_vcf.list
#Genotyping with GenotypeGVCFs; merge files and keep only the variable sites
java -Xmx4g -jar ~/GenomeAnalysisTK.jar -R $REFERENCE -T GenotypeGVCFs \
--standard_min_confidence_threshold_for_calling 40.0 \
--standard_min_confidence_threshold_for_emitting 40.0 \
-V recal4_vcf.list \
-o $SUBSET/GCVF4/genotyped_X_samples_4recal.g.vcf
#Extract SNPs from the call set
java -jar ~/GenomeAnalysisTK.jar \
-T SelectVariants \
-R $REFERENCE \
-V $SUBSET/GCVF4/genotyped_X_samples_4recal.g.vcf \
-selectType SNP \
-o $SUBSET/GCVF4/genotyped_X_samples_4recal_snps.vcf
#Extract indels from the call set
java -jar ~/GenomeAnalysisTK.jar \
-T SelectVariants \
-R $REFERENCE \
-V $SUBSET/GCVF4/genotyped_X_samples_4recal.g.vcf \
-selectType INDEL \
-o $SUBSET/GCVF4/genotyped_X_samples_4recal_indels.vcf
java -jar ~/GenomeAnalysisTK.jar \
-T VariantFiltration \
-R $REFERENCE \
-V $SUBSET/GCVF4/genotyped_X_samples_4recal_snps.vcf \
--mask $SUBSET/GCVF4/genotyped_X_samples_4recal_indels.vcf \
--maskExtension 5 \
--maskName InDel \
--clusterWindowSize 10 \
--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
--filterName "BadValidation" \
--filterExpression "QUAL < 30.0" \
--filterName "LowQual" \
--filterExpression "QD < 5.0" \
--filterName "LowVQCBD" \
--filterExpression "FS > 60" \
--filterName "FisherStrand" \
-o $SUBSET/GCVF4/genotyped_X_samples_filtered_5th.vcf
#Get only passable SNPs
mkdir GCVF5
cat $SUBSET/GCVF4/genotyped_X_samples_filtered_5th.vcf | grep 'PASS\|^#' > $SUBSET/GCVF5/genotyped_X_samples_only_PASS_snp_5th.vcf
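The `grep 'PASS\|^#'` step used after each filtering round keeps any line that contains the string PASS anywhere. A stricter variant, sketched below as an assumption rather than as the method we ran, tests only the FILTER column (field 7 of a VCF record). `keep_pass` and the demo VCF are made up for illustration.

```shell
#!/usr/bin/env bash
# Keep header lines plus records whose FILTER column (field 7) is exactly PASS.
keep_pass() {
    awk -F '\t' '/^#/ || $7 == "PASS"' "$1"
}

# Minimal made-up VCF for illustration (not real data).
demo_vcf=$(mktemp)
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'uce-1\t10\t.\tA\tG\t50\tPASS\t.\n'
    printf 'uce-1\t20\t.\tC\tT\t12\tLowQual\t.\n'
    printf 'uce-2\t30\t.\tG\tA\t45\tInDel;LowQual\t.\n'
} > "$demo_vcf"

keep_pass "$demo_vcf"
```

On the demo file this keeps the two header lines and the single PASS record.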
We used the last pass - genotyped_X_samples_only_PASS_snp_5th.vcf - for our downstream analyses. Zarza et al. (2016) note that the *4recal.bam files can be used as input for ANGSD.
Next, we can create a summary text file that will look at the average depth per site.
mkdir $SUBSET/vcftools
cd vcftools
cp ../GCVF5/genotyped_X_samples_only_PASS_snp_5th.vcf .
vcftools --vcf $SUBSET/vcftools/genotyped_X_samples_only_PASS_snp_5th.vcf --depth \
-c > $SUBSET/vcftools/depth_summary.txt
Next, we had to convert from .vcf to SNAPP and STRUCTURE formats.
#Due to previous steps, no data are missing
#--max-missing 1 (no missing data allowed) is still set, just to be safe
vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 900 --hwe 0.1 --max-missing 1 --012 --out nomiss_900_hwe
vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 900 --hwe 0.1 --max-missing 1 --recode --out nomiss_900_hwe
vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 900 --max-missing 1 --012 --out nomiss_900_no-hwe
From the VCFtools output, the Hardy-Weinberg filter removes 246 sites (3616 sites kept without the filter vs. 3370 with it).
#With HWE filter
Parameters as interpreted:
--vcf genotyped_X_samples_only_PASS_snp_5th.vcf
--max-alleles 2
--hwe 0.1
--thin 900
--max-missing 1
--012
--out nomiss_900_hwe
After filtering, kept 24 out of 24 Individuals
Writing 012 matrix files ... Done.
After filtering, kept 3370 out of a possible 69982 Sites
Run Time = 1.00 seconds
VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009
#without HWE filter
Parameters as interpreted:
--vcf genotyped_X_samples_only_PASS_snp_5th.vcf
--thin 900
--max-missing 1
--012
--out nomiss_900_no-hwe
After filtering, kept 24 out of 24 Individuals
Writing 012 matrix files ... 012: Only outputting biallelic loci.
Done.
After filtering, kept 3616 out of a possible 69982 Sites
Run Time = 1.00 seconds
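The site counts in the two logs above were read off the terminal by hand. As a hedged sketch of how that step could be automated, the Python below extracts the "kept N out of a possible M Sites" line from a VCFtools log and compares the two runs. The helper name `kept_sites` is hypothetical; the log text is copied from the output shown above.

```python
import re

# Matches the site summary line, not the "Individuals" line.
SITES_RE = re.compile(r"After filtering, kept (\d+) out of a possible (\d+) Sites")


def kept_sites(log_text):
    """Return (kept, possible) site counts from a VCFtools log."""
    match = SITES_RE.search(log_text)
    if match is None:
        raise ValueError("no site summary line found in log")
    return int(match.group(1)), int(match.group(2))


log_hwe = (
    "After filtering, kept 24 out of 24 Individuals\n"
    "After filtering, kept 3370 out of a possible 69982 Sites\n"
)
log_no_hwe = (
    "After filtering, kept 24 out of 24 Individuals\n"
    "After filtering, kept 3616 out of a possible 69982 Sites\n"
)

kept_hwe, possible = kept_sites(log_hwe)
kept_no_hwe, _ = kept_sites(log_no_hwe)
removed_by_hwe = kept_no_hwe - kept_hwe
print(removed_by_hwe)
```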
Using iterations of the above code, we determined how many SNPs remained for different thinning windows. We used windows in steps of 10 bp from 10 to 260, and further performed reductions at 270, 360, 450, 540, 630, 720, 810, and 900. We included 900 because this is the window that was used by Zarza et al. (2016). We had difficulty determining which window was best, so we used two different files for our analyses: 170 and 900. We used 170 because this is the point at which the number of SNPs removed per step ‘levels out’. This was done before applying the HWE filter, but reflects the overall behavior of the data.
#Note, not writing entire number string to save space in document
#"X" is number string by tens "10 20 30 ... 260 270"
#after 270, by 90s from 270 to 900
#This will rewrite files but print the number of kept SNPs to the terminal screen
for VAR in X
do
vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin $VAR --max-missing 1 --012 --out test
done
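To make the effect of the thinning window concrete, here is a minimal Python sketch of distance-based thinning. It assumes a greedy left-to-right rule (a site is kept if it is at least `window` bp from the previously kept site on the same locus). That rule is an assumption about how such thinning typically works, not a statement about VCFtools internals. The function name and the coordinates are hypothetical.

```python
def thin_sites(sites, window):
    """Greedy thinning of (chrom, pos) pairs sorted by chrom, then pos.

    A site is kept when it is the first on its chromosome/locus or lies at
    least `window` bp after the last kept site on that chromosome/locus.
    """
    kept = []
    last_kept = {}  # chrom -> position of the last kept site
    for chrom, pos in sites:
        if chrom not in last_kept or pos - last_kept[chrom] >= window:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept


# Made-up coordinates for illustration only.
sites = [
    ("uce-1", 10), ("uce-1", 150), ("uce-1", 200), ("uce-1", 950),
    ("uce-2", 5), ("uce-2", 400),
]
kept_170 = thin_sites(sites, 170)
kept_900 = thin_sites(sites, 900)
print(len(kept_170), len(kept_900))
```

Under this rule, wider windows can only keep the same number of sites or fewer, which matches the monotonic decline in the counts reported here.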
#Plot output in R
x1=seq(from=10,to=260,by=10)
x2=seq(from=270,to=900,by=90)
x=c(x1,x2)
#Outputs from VCF program
y=c(10606,9570,8821,8223,7727,7312,
6968,6683,6438,6201,6011,5855,
5714,5575,5444,5317,5223,5126,
5032,4948,4846,4753,4651,4567,
4472,4373,4280,3731,3637,3629,
3622,3617,3616,3616)
#Check if dimensions equal
#length(x)==length(y)
plot(x=x,y=y,pch=19)
plot(x=x[1:9],y=y[1:9],pch=19)
y2=y/69982
x2=x/900
plot(x=x2,y=y2,pch=19)
plot(x=x2[1:9],y=y2[1:9],pch=19)
#Change in number of SNPs kept relative to the previous window
y3=c(0,diff(y))
plot(y=y3[2:27],x=x[2:27],pch=19,
main="SNPs removed",xlab="Thin Window",
ylab="# SNPs removed")
#Print matrix of number removed.
cbind(x,y,y3)
## x y y3
## [1,] 10 10606 0
## [2,] 20 9570 -1036
## [3,] 30 8821 -749
## [4,] 40 8223 -598
## [5,] 50 7727 -496
## [6,] 60 7312 -415
## [7,] 70 6968 -344
## [8,] 80 6683 -285
## [9,] 90 6438 -245
## [10,] 100 6201 -237
## [11,] 110 6011 -190
## [12,] 120 5855 -156
## [13,] 130 5714 -141
## [14,] 140 5575 -139
## [15,] 150 5444 -131
## [16,] 160 5317 -127
## [17,] 170 5223 -94
## [18,] 180 5126 -97
## [19,] 190 5032 -94
## [20,] 200 4948 -84
## [21,] 210 4846 -102
## [22,] 220 4753 -93
## [23,] 230 4651 -102
## [24,] 240 4567 -84
## [25,] 250 4472 -95
## [26,] 260 4373 -99
## [27,] 270 4280 -93
## [28,] 360 3731 -549
## [29,] 450 3637 -94
## [30,] 540 3629 -8
## [31,] 630 3622 -7
## [32,] 720 3617 -5
## [33,] 810 3616 -1
## [34,] 900 3616 0
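The "levelling out" at 170 bp can be checked numerically from the counts in the table above. The sketch below recomputes the number of SNPs removed at each step and reports the first 10 bp step at which fewer than 100 SNPs are removed. The cutoff of 100 is an assumption chosen for illustration, not a criterion stated in the analysis.

```python
# Thinning windows: 10 bp steps from 10 to 260, then 90 bp steps from 270 to 900.
windows = list(range(10, 261, 10)) + list(range(270, 901, 90))

# SNPs kept at each window, copied from the table above.
kept = [
    10606, 9570, 8821, 8223, 7727, 7312,
    6968, 6683, 6438, 6201, 6011, 5855,
    5714, 5575, 5444, 5317, 5223, 5126,
    5032, 4948, 4846, 4753, 4651, 4567,
    4472, 4373, 4280, 3731, 3637, 3629,
    3622, 3617, 3616, 3616,
]
assert len(windows) == len(kept)

# SNPs removed relative to the previous window (0 for the first window).
removed = [0] + [kept[i - 1] - kept[i] for i in range(1, len(kept))]

THRESHOLD = 100  # assumed cutoff for "levelling out"

# Only compare the 10 bp steps (windows 20 through 270); the later
# 90 bp steps are not directly comparable.
first_flat = next(
    w for w, r in zip(windows[1:27], removed[1:27]) if r < THRESHOLD
)
print(first_flat)
```

With this cutoff the first qualifying window is 170, in line with the choice made here; a different cutoff would shift that point.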
Running again for 170 bases.
#Due to previous steps, nothing missing
#No missing flag just to be safe
vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 170 --hwe 0.1 --max-missing 1 --012 --out nomiss_170_hwe
vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 170 --max-missing 1 --012 --out nomiss_170_no-hwe
From the VCFtools output:
#With HWE filter
VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf genotyped_X_samples_only_PASS_snp_5th.vcf
--max-alleles 2
--hwe 0.1
--thin 170
--max-missing 1
--012
--out nomiss_170_hwe
After filtering, kept 24 out of 24 Individuals
Writing 012 matrix files ... Done.
After filtering, kept 4540 out of a possible 69982 Sites
Run Time = 1.00 seconds
#without HWE filter
VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf genotyped_X_samples_only_PASS_snp_5th.vcf
--thin 170
--max-missing 1
--012
--out nomiss_170_hwe
After filtering, kept 24 out of 24 Individuals
Writing 012 matrix files ... 012: Only outputting biallelic loci.
Done.
After filtering, kept 5223 out of a possible 69982 Sites
Run Time = 2.00 seconds
Also save as plink files.
vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 900 --hwe 0.1 --max-missing 1 --plink --out PLINK_nomiss_900_hwe
vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 900 --max-missing 1 --plink --out PLINK_nomiss_900_no-hwe
vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 170 --hwe 0.1 --max-missing 1 --plink --out PLINK_nomiss_170_hwe
vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 170 --max-missing 1 --plink --out PLINK_nomiss_170_no-hwe
The following converts the file to structure format, and is from Zarza et al. (2016).
#This script converts the vcf file coded as 012 (output of vcftools) to 1 line per individual and two columns per locus structure format. It requires the *.indv vcftools output with taxon labels
#delete first column of file, as it contains individual numerical id by printing from 2nd column to last
cut -f 2- nomiss_170_hwe.012 > f170_vcf_012_hwe_wo_id.txt
cut -f 2- nomiss_900_hwe.012 > f900_vcf_012_hwe_wo_id.txt
#replace 012 coding with 01 coding, and -1 with -9 for missing data. This should be done before adding taxa names which might contain numbers in the labels
#Take output from here to create 012 nexus files in Notepad++ or Notepadqq
sed -e 's/-1/-9 -9/g' \
-e 's/0/0 0/g' \
-e 's/1/0 1/g' \
-e 's/2/1 1/g' f170_vcf_012_hwe_wo_id.txt > f170_structure_01_hwe_woID.txt
sed -e 's/-1/-9 -9/g' \
-e 's/0/0 0/g' \
-e 's/1/0 1/g' \
-e 's/2/1 1/g' f900_vcf_012_hwe_wo_id.txt > f900_structure_01_hwe_woID.txt
#optional get number of columns (= number of loci x2 from vcf file).
#head -1 structure_01_woID.txt | wc -w
#paste individual id name from vcf *.indv
paste -d "\t" nomiss_170_hwe.012.indv f170_structure_01_hwe_woID.txt > structure012_170_hwe.txt
paste -d "\t" nomiss_900_hwe.012.indv f900_structure_01_hwe_woID.txt > structure012_900_hwe.txt
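The recoding performed by the `cut`, `sed`, and `paste` steps above can be summarized as a lookup table. The Python sketch below mirrors that mapping (-1 to "-9 -9", 0 to "0 0", 1 to "0 1", 2 to "1 1") for one row of a VCFtools 012 matrix. The function name and sample labels are hypothetical, and this is an illustration of the encoding rather than a replacement for the script from Zarza et al. (2016).

```python
# Two-allele structure coding for each 012 genotype code.
CODES = {"-1": "-9 -9", "0": "0 0", "1": "0 1", "2": "1 1"}


def row_to_structure(label, row_012):
    """Convert one tab-separated 012 row to a structure-format line.

    The first column of the 012 row is the numeric individual index
    written by VCFtools, which is dropped and replaced by `label`.
    """
    genotypes = row_012.rstrip("\n").split("\t")[1:]
    alleles = [CODES[g] for g in genotypes]
    return label + "\t" + "\t".join(alleles)


# Made-up rows and labels for illustration only.
rows = ["0\t0\t1\t2\t-1", "1\t2\t2\t0\t1"]
labels = ["sampleA", "sampleB"]
structure_lines = [row_to_structure(l, r) for l, r in zip(labels, rows)]
print(structure_lines[0])
```

Using a lookup on whole fields avoids the ordering issue in the `sed` version, where -1 has to be replaced before 1.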
ABBA/BABA gene flow tests were performed in \(angsd\), with more details available from the angsd website.
Commands to run, based out of my programs folder and referencing my specific folders:
Merge relevant files to facilitate program:
#merge following this format
cd ~/uce-cinnyris/taxon-sets/cinnyris/abbababa
samtools merge \
./genderuensis-merge.bam \
../recal/FMNH122395_Cinnyris_genderuensis_realigned_recal.bam \
../recal/FMNH189462_Cinnyris_genderuensis_realigned_recal.bam
samtools merge \
./reichenowi-merge.bam \
../recal/FMNH358156_Cinnyris_reichenowi_realigned_recal.bam \
../recal/FMNH358157_Cinnyris_reichenowi_realigned_recal.bam \
../recal/FMNH443947_Cinnyris_reichenowi_realigned_recal.bam \
../recal/FMNH481236_Cinnyris_reichenowi_realigned_recal.bam
#etc to create groups you want
Executing the angsd ABBA/BABA program:
cd ~/uce-cinnyris/taxon-sets/cinnyris/abbababa
angcd=~/programs/angsd
abbacd=~/uce-cinnyris/taxon-sets/cinnyris/abbababa
cd $abbacd
~/programs/angsd/angsd -doAbbababa 1 \
-bam ~/uce-cinnyris/taxon-sets/cinnyris/abbababa/abbababa_fullset.txt \
-doCounts 1 \
-useLast 1 \
-blockSize 500 \
-minQ 30 \
-minmapQ 30 \
-out ~/uce-cinnyris/taxon-sets/cinnyris/abbababa/output/fullset_500 \
-checkBamHeaders 0
~/programs/angsd/angsd -doAbbababa 1 \
-bam ~/uce-cinnyris/taxon-sets/cinnyris/abbababa/abbababa_westset.txt \
-doCounts 1 \
-useLast 1 \
-blockSize 500 \
-minQ 30 \
-minmapQ 30 \
-out ~/uce-cinnyris/taxon-sets/cinnyris/abbababa/output/westset_500 \
-checkBamHeaders 0
Rscript ~/programs/angsd/R/jackKnife.R \
file=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/output/fullset_500.abbababa \
indNames=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/abbababa_fullset_noout.txt \
outfile=fullset_500_jackknife
Rscript ~/programs/angsd/R/jackKnife.R \
file=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/output/westset_500.abbababa \
indNames=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/abbababa_westset_noout.txt \
outfile=westset_500_jackknife
Rscript ~/programs/angsd/R/jackKnife.R \
file=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/output/mountains_500.abbababa \
indNames=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/abbababa_mountains_noout.txt \
outfile=mountains_500_jackknife
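For readers unfamiliar with the statistic, the ABBA/BABA test summarizes site-pattern counts as D = (nABBA - nBABA) / (nABBA + nBABA), and significance is assessed with a block jackknife. The Python sketch below shows an unweighted delete-one block jackknife on made-up per-block counts. It is a simplified illustration under those assumptions and is not a reimplementation of the `jackKnife.R` script; all names and numbers are hypothetical.

```python
import math


def d_statistic(abba, baba):
    """Patterson's D from per-block ABBA and BABA counts."""
    a, b = sum(abba), sum(baba)
    return (a - b) / (a + b)


def jackknife_d(abba, baba):
    """Unweighted delete-one block jackknife for D.

    Returns (D, standard error, Z score).
    """
    n = len(abba)
    d_full = d_statistic(abba, baba)
    # Recompute D with each block left out in turn.
    leave_one_out = [
        d_statistic(abba[:i] + abba[i + 1:], baba[:i] + baba[i + 1:])
        for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    z = d_full / se if se > 0 else float("nan")
    return d_full, se, z


# Made-up block counts for illustration only.
abba_counts = [10, 12, 8, 11, 9]
baba_counts = [5, 6, 4, 5, 7]
d_value, d_se, d_z = jackknife_d(abba_counts, baba_counts)
print(round(d_value, 4))
```

A |Z| well above 3 is the conventional threshold for rejecting the null hypothesis of no gene flow, although the exact cutoff is a judgment call.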
LEA
Population genetic analyses were performed in the \(R\) package LEA.
LEA is not installed like a typical package and requires a special installation procedure (here, through Bioconductor). The following must be run the first time you install LEA on your computer:
##Required installation
##Linux computer
#run first time
#clear working directory
rm(list=ls())
#install required packages
install.packages(c("fields","RColorBrewer","mapplots"))
#download lea from source
source("http://bioconductor.org/biocLite.R")
#install LEA to R
biocLite("LEA")
Every subsequent time LEA is run on your machine, you need to run the following:
#run every time
rm(list=ls())
library(LEA)
##
## Attaching package: 'LEA'
## The following object is masked from 'package:lattice':
##
## barchart
source("http://membres-timc.imag.fr/Olivier.Francois/Conversion.R")
source("http://membres-timc.imag.fr/Olivier.Francois/POPSutilities.R")
## [1] "Loading fields"
## Loading required package: fields
## Loading required package: spam
## Loading required package: dotCall64
## Loading required package: grid
## Spam version 2.5-1 (2019-12-12) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
##
## Attaching package: 'spam'
## The following objects are masked from 'package:base':
##
## backsolve, forwardsolve
## See https://github.com/NCAR/Fields for
## an extensive vignette, other supplements and source code
## [1] "Loading RColorBrewer"
## Loading required package: RColorBrewer
## Warning in helpPops(): Available functions:
##
## HELP:
## * helpPops()
##
##
## SHOW EXAMPLE:
## * Open the R script scriptExample.r
##
##
## CORRELATION UTILITIES:
## Compute correlation between matrix of membership/admixture coefficients (from matrix or from POPS outputs)
## * correlation(matrix1,matrix2,plot=TRUE,colors=defaultPalette)
## * correlationFromPops(file1,file2,nind,nskip1=2,nskip2=2,plot=TRUE,colors=defaultPalette)
##
##
## BARPLOT UTILITIES:
## Display barplot of membership/admixture coefficients (from matrix or from POPS output)
## * barplotCoeff(matrix,colors=defaultPalette,...)
## * barplotFromPops(file1,nind,nskip1=2,colors=defaultPalette,...)
##
##
## MAPS UTILITIES:
## Display maps of membership/admixture coefficients (from matrix or from POPS output)
## * maps(matrix,coord,grid,constraints=NULL,method="treshold",colorGradientsList=lColorGradients,onemap=T,onepage=T,...)
## * mapsFromPops(file,nind,nskip=2,coord,grid,constraints=NULL,method="treshold",colorGradientsList=lColorGradients,onemap=T,onepage=T,...)
## Create grid on which coefficients will be displayed
## * createGrid(min_long,max_long,min_lat,max_lat,npixels_long,npixels_lat)
## * createGridFromAsciiRaster(file)
## * getConstraintsFromAsciiRaster(file,cell_value_min=NULL,cell_value_max=NULL)
## Legend for maps
## * displayLegend(K=NULL,colorGradientsList=lColorGradients)
LEA requires special file formats for performing its analyses; the structure files created in the previous pipeline are acceptable inputs for initiating LEA analyses. I am doing these analyses with and without C. regius included within the dataset.
First run: SNPs are at least 900 bp apart.
x="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/structure012_900_hwe_ordered_noregius.txt"
y="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno"
#Write Geno File
#Data is diploid
#Format = 1; all data in one row
#Extra.row = 0; no extra rows
#Extra column = 1; there is a column of individual IDs
struct2geno(file=x,output.format='geno',output=y,
diploid=T,FORMAT=1,extra.row=0,extra.col=1)
#geno2lfmm struggles with full filepaths
#set directory to run properly
setwd("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/")
geno2lfmm("Cinnyris-900-geno_noregius.geno")
#Repeating for regius included
x="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/structure012_900_hwe_ordered.txt"
y="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno"
#Write Geno File
#Data is diploid
#Format = 1; all data in one row
#Extra.row = 0; no extra rows
#Extra column = 1; there is a column of individual IDs
struct2geno(file=x,output.format='geno',output=y,
diploid=T,FORMAT=1,extra.row=0,extra.col=1)
setwd("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/")
geno2lfmm("Cinnyris-900-geno.geno")
Second run: SNPs are at least 170 bp apart.
x="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/structure012_170_hwe_ordered_noregius.txt"
y="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno"
#Write Geno File
#Data is diploid
#Format = 1; all data in one row
#Extra.row = 0; no extra rows
#Extra column = 1; there is a column of individual IDs
struct2geno(file=x,output.format='geno',output=y,
diploid=T,FORMAT=1,extra.row=0,extra.col=1)
#geno2lfmm struggles with full filepaths
#set directory to run properly
setwd("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/")
geno2lfmm("Cinnyris-170-geno_noregius.geno")
#Repeating for regius included
x="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/structure012_170_hwe_ordered.txt"
y="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno.geno"
#Write Geno File
#Data is diploid
#Format = 1; all data in one row
#Extra.row = 0; no extra rows
#Extra column = 1; there is a column of individual IDs
struct2geno(file=x,output.format='geno',output=y,
diploid=T,FORMAT=1,extra.row=0,extra.col=1)
setwd("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/")
geno2lfmm("Cinnyris-170-geno.geno")
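For orientation, LEA's geno format stores one line per SNP and one character per individual (0, 1, or 2 allele copies, with 9 for missing data). The Python sketch below shows that layout by transposing a small individuals-by-loci 012 matrix. It is an illustration of the file layout only, not the conversion performed by `struct2geno` above, and it passes the allele counts through unchanged. The function name and genotypes are hypothetical.

```python
def matrix012_to_geno(rows):
    """Transpose an individuals-by-loci 012 matrix into geno-style lines.

    `rows` holds one list per individual with values -1 (missing), 0, 1, or 2.
    Each returned string is one locus, with one character per individual.
    """
    n_loci = len(rows[0])
    if any(len(r) != n_loci for r in rows):
        raise ValueError("all individuals must have the same number of loci")
    lines = []
    for j in range(n_loci):
        # Missing genotypes (-1) are written as 9.
        lines.append("".join("9" if r[j] == -1 else str(r[j]) for r in rows))
    return lines


# Made-up genotypes: three individuals, three loci.
genotypes = [
    [0, 1, 2],
    [1, -1, 2],
    [0, 0, 1],
]
geno_lines = matrix012_to_geno(genotypes)
print(geno_lines)
```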
Assign file paths to variables for further analysis.
geno.170="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno.geno"
lfmm.170="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno.lfmm"
geno.900="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno"
lfmm.900="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.lfmm"
noreg.geno.170="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno"
noreg.lfmm.170="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.lfmm"
noreg.geno.900="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno"
noreg.lfmm.900="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.lfmm"
sNMF (sparse non-negative matrix factorization) analyses estimate individual ancestry coefficients for a specified number of ancestral populations (K). We can start with a quick structure analysis with three populations, as the LEA tutorial suggests.
obj.snmf=snmf(geno.900,K=3,alpha=100,project="new")
## The project is saved into :
## Sequences/Cinnyris-900-geno.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 24
## -L (number of loci) 6740
## -K (number of ancestral pops) 3
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K3/run1/Cinnyris-900-geno_r1.3.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K3/run1/Cinnyris-900-geno_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1698003345
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===========]
## Number of iterations: 29
##
## Least-square error: 15619.927600
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K3/run1/Cinnyris-900-geno_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K3/run1/Cinnyris-900-geno_r1.3.G: OK.
##
## The project is saved into :
## Sequences/Cinnyris-900-geno.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
qmatrix=Q(obj.snmf,K=3)
barplot(t(qmatrix),col=c("#000000","#ffa500","#1f2887"),border=NA,space=0,
xlab="INDIVIDUALS",ylab="ADMIXTURE")
obj.snmf=snmf(geno.900,K=4,alpha=100,project="new")
## The project is saved into :
## Sequences/Cinnyris-900-geno.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 4 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 24
## -L (number of loci) 6740
## -K (number of ancestral pops) 4
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K4/run1/Cinnyris-900-geno_r1.4.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K4/run1/Cinnyris-900-geno_r1.4.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 2200132065897
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno: OK.
##
##
## Main algorithm:
## [ ]
## [================]
## Number of iterations: 42
##
## Least-square error: 14730.906300
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K4/run1/Cinnyris-900-geno_r1.4.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K4/run1/Cinnyris-900-geno_r1.4.G: OK.
##
## The project is saved into :
## Sequences/Cinnyris-900-geno.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
qmatrix=Q(obj.snmf,K=4)
barplot(t(qmatrix),col=c("#000000","#ffa500","#1f2887","#e31a1c"),border=NA,space=0,
xlab="INDIVIDUALS",ylab="ADMIXTURE")
obj.snmf=snmf(geno.900,K=5,alpha=100,project="new")
## The project is saved into :
## Sequences/Cinnyris-900-geno.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 5 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 24
## -L (number of loci) 6740
## -K (number of ancestral pops) 5
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K5/run1/Cinnyris-900-geno_r1.5.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K5/run1/Cinnyris-900-geno_r1.5.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 7234318570481357565
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===========]
## Number of iterations: 30
##
## Least-square error: 13459.401039
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K5/run1/Cinnyris-900-geno_r1.5.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K5/run1/Cinnyris-900-geno_r1.5.G: OK.
##
## The project is saved into :
## Sequences/Cinnyris-900-geno.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
qmatrix=Q(obj.snmf,K=5)
barplot(t(qmatrix),col=c("#1f2887","#808080","#ffa500","#000000","#e31a1c"),border=NA,space=0,
xlab="INDIVIDUALS",ylab="ADMIXTURE")
I’m having trouble getting the colors right, but note that the C. regius block is being split before the C. reichenowi groups. Let’s switch to the no regius data.
obj.snmf=snmf(noreg.geno.900,K=2,alpha=100,project="new")
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 2 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 2
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1998485743
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno: OK.
##
##
## Main algorithm:
## [ ]
## [==========]
## Number of iterations: 26
##
## Least-square error: 8992.360308
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G: OK.
##
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
qmatrix=Q(obj.snmf,K=2)
barplot(t(qmatrix),col=c("#000000","#1f2887"),border=NA,space=0,
xlab="INDIVIDUALS",ylab="ADMIXTURE")
obj.snmf=snmf(noreg.geno.900,K=3,alpha=100,project="new")
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 3
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1399607943
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno: OK.
##
##
## Main algorithm:
## [ ]
## [====================]
## Number of iterations: 53
##
## Least-square error: 8028.784397
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G: OK.
##
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
qmatrix=Q(obj.snmf,K=3)
barplot(t(qmatrix),col=c("#000000","#e31a1c","#1f2887"),border=NA,space=0,
xlab="INDIVIDUALS",ylab="ADMIXTURE")
One individual is pure genderuensis; the other individuals appear to be admixed. The reddest individuals, in order, are from 1) Mt. Genderu, Adamawa; 2) Yaounde, Centre; and 3) Babadjou, Ouest.
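The distinction between "pure" and "admixed" individuals can be read off the Q matrix directly rather than from bar colors. As a hedged sketch, the Python below takes rows of ancestry coefficients and reports each individual's dominant cluster and whether its largest coefficient exceeds a cutoff. The cutoff of 0.99 and the Q values are hypothetical and are not taken from the runs in this document.

```python
def summarize_ancestry(q_rows, pure_cutoff=0.99):
    """Return (dominant cluster index, coefficient, is_pure) per individual."""
    summary = []
    for row in q_rows:
        best = max(range(len(row)), key=lambda k: row[k])
        summary.append((best, row[best], row[best] >= pure_cutoff))
    return summary


# Made-up ancestry coefficients for three individuals at K = 3.
q_matrix = [
    [0.9999, 0.0001, 0.0],
    [0.60, 0.35, 0.05],
    [0.10, 0.20, 0.70],
]
ancestry = summarize_ancestry(q_matrix)
for individual, (cluster, coeff, is_pure) in enumerate(ancestry, start=1):
    label = "pure" if is_pure else "admixed"
    print(individual, cluster, coeff, label)
```

Because sNMF cluster labels are arbitrary between runs, a table like this is only comparable across runs after clusters have been matched.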
obj.snmf=snmf(noreg.geno.900,K=4,alpha=100,project="new")
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 4 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 4
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 792621406
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno: OK.
##
##
## Main algorithm:
## [ ]
## [========================]
## Number of iterations: 63
##
## Least-square error: 7101.948983
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G: OK.
##
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
qmatrix=Q(obj.snmf,K=4)
barplot(t(qmatrix),col=c("#000000","#e31a1c","#1f2887","purple"),border=NA,space=0,
xlab="INDIVIDUALS",ylab="ADMIXTURE")
Repeated runs of the above produce different splits: either two populations in the east (variable) or two populations in the west (interior vs. Bioko).
obj.snmf=snmf(noreg.geno.900,K=5,alpha=100,project="new")
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 5 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 5
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 2019936637
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno: OK.
##
##
## Main algorithm:
## [ ]
## [================================]
## Number of iterations: 86
##
## Least-square error: 6102.370974
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G: OK.
##
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
qmatrix=Q(obj.snmf,K=5)
barplot(t(qmatrix),col=c("#000000","#e31a1c","#1f2887","purple","gold"),border=NA,space=0,
xlab="INDIVIDUALS",ylab="ADMIXTURE")
Increasing K further begins to subdivide the eastern population.
We can try this for the 170 bp dataset as well to see how it compares.
obj.snmf=snmf(noreg.geno.170,K=3,alpha=100,project="new")
## The project is saved into :
## Sequences/Cinnyris-170-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 7358
## -K (number of ancestral pops) 3
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K3/run1/Cinnyris-170-geno_noregius_r1.3.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K3/run1/Cinnyris-170-geno_noregius_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 812977239
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno: OK.
##
##
## Main algorithm:
## [ ]
## [==============]
## Number of iterations: 38
##
## Least-square error: 10602.123334
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K3/run1/Cinnyris-170-geno_noregius_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K3/run1/Cinnyris-170-geno_noregius_r1.3.G: OK.
##
## The project is saved into :
## Sequences/Cinnyris-170-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
qmatrix=Q(obj.snmf,K=3)
barplot(t(qmatrix),col=c("#000000","#e31a1c","#1f2887"),border=NA,space=0,
xlab="INDIVIDUALS",ylab="ADMIXTURE")
Results are essentially identical.
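A visual comparison of two barplots can be backed up numerically. Because sNMF assigns cluster labels arbitrarily, the columns of two Q matrices must be matched before they are compared. The sketch below illustrates this on toy matrices with hypothetical values (not the study's results); the function name `match_clusters` is an assumption made for illustration. In practice, `q_a` and `q_b` would be the matrices returned by `Q(obj.snmf,K=3)` for the two datasets.

```r
# Toy ancestry matrices (hypothetical values, not the study's results):
# rows are individuals, columns are clusters. q_b has the same structure
# as q_a, but its cluster columns are in a different order.
q_a <- matrix(c(0.90, 0.05, 0.05,
                0.10, 0.80, 0.10,
                0.05, 0.05, 0.90,
                0.85, 0.10, 0.05),
              ncol = 3, byrow = TRUE)
q_b <- q_a[, c(3, 1, 2)]

# For each cluster in qa, find the most strongly correlated cluster in qb.
# This is a greedy match; it is not guaranteed to be one-to-one when the
# population structure is weak.
match_clusters <- function(qa, qb) {
  apply(cor(qa, qb), 1, which.max)
}

idx <- match_clusters(q_a, q_b)

# Largest absolute difference in ancestry after relabelling the clusters;
# a value near 0 means the two runs agree.
max_diff <- max(abs(q_a - q_b[, idx]))
```

This comparison assumes the individuals are in the same row order in both matrices.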
obj.snmf=snmf(noreg.geno.170,K=5,alpha=100,project="new")
## The project is saved into :
## Sequences/Cinnyris-170-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 5 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 7358
## -K (number of ancestral pops) 5
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K5/run1/Cinnyris-170-geno_noregius_r1.5.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K5/run1/Cinnyris-170-geno_noregius_r1.5.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 104288053
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno: OK.
##
##
## Main algorithm:
## [ ]
## [============]
## Number of iterations: 33
##
## Least-square error: 8108.938590
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K5/run1/Cinnyris-170-geno_noregius_r1.5.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K5/run1/Cinnyris-170-geno_noregius_r1.5.G: OK.
##
## The project is saved into :
## Sequences/Cinnyris-170-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
qmatrix=Q(obj.snmf,K=5)
barplot(t(qmatrix),col=c("#000000","#e31a1c","#1f2887","purple","gold"),border=NA,space=0,
xlab="INDIVIDUALS",ylab="ADMIXTURE")
Again, results are virtually identical.
We can also vary the regularization parameter \(\alpha\) to see how the results compare.
For \(\alpha=1\):
#Alpha=1
obj.snmf=snmf(noreg.geno.900,K=1:6,ploidy=2,entropy=TRUE,alpha=1,project="new")
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] 1936746251
## [1] "*************************************"
## [1] "* create.dataset *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -s (seed random init) 1936746251
## -r (percentage of masked data) 0.05
## -x (genotype file in .geno format) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -o (output file in .geno format) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##
## Write genotype file with masked data, /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
## [1] "*************************************"
## [1] "* sNMF K = 1 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 1
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
## -i (number max of iterations) 200
## -a (regularization parameter) 1
## -s (seed random init) 1936746251
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
##
## Least-square error: 10757.715416
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 1
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.252479
## Cross-Entropy (masked data): 0.586075
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 2 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 2
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
## -i (number max of iterations) 200
## -a (regularization parameter) 1
## -s (seed random init) 1936746251
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [==========]
## Number of iterations: 26
##
## Least-square error: 8993.238989
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 2
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.194479
## Cross-Entropy (masked data): 0.573637
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 3
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 1
## -s (seed random init) 1936746251
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [==============]
## Number of iterations: 37
##
## Least-square error: 8004.652741
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 3
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.174729
## Cross-Entropy (masked data): 0.60261
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 4 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 4
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
## -i (number max of iterations) 200
## -a (regularization parameter) 1
## -s (seed random init) 1936746251
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===========]
## Number of iterations: 30
##
## Least-square error: 6843.762563
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 4
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.147341
## Cross-Entropy (masked data): 0.629595
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 5 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 5
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
## -i (number max of iterations) 200
## -a (regularization parameter) 1
## -s (seed random init) 1936746251
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===========================================================================]
## Number of iterations: 200
##
## Least-square error: 6210.122024
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 5
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.140718
## Cross-Entropy (masked data): 0.718848
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 6 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 6
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
## -i (number max of iterations) 200
## -a (regularization parameter) 1
## -s (seed random init) 36285820462859
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=========]
## Number of iterations: 23
##
## Least-square error: 5112.272207
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 6
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.113556
## Cross-Entropy (masked data): 0.744037
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
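The masked-data cross-entropy values are scattered through the log above, so it can help to collect them in one place. A minimal sketch, using the values printed for \(\alpha=1\) and the usual LEA convention that lower masked cross-entropy indicates a better-supported K; the variable names are illustrative:

```r
# Masked-data cross-entropy values copied from the alpha = 1 log above,
# one per value of K (single repetition each).
ce_masked <- c(K1 = 0.586075, K2 = 0.573637, K3 = 0.60261,
               K4 = 0.629595, K5 = 0.718848, K6 = 0.744037)

# K with the lowest masked cross-entropy in this run
best_K <- names(which.min(ce_masked))

# Difference from the best value, to show how close the alternatives are
delta <- round(ce_masked - min(ce_masked), 6)
print(best_K)
print(delta)
```

For this particular run the minimum falls at K = 2, though with a single repetition per K and a random seed, the ranking may differ between runs.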
For \(\alpha=50\):
#Alpha=50: repeat the K=1-6 scan with cross-entropy estimation
obj.snmf=snmf(noreg.geno.900,K=1:6,ploidy=2,entropy=TRUE,alpha=50,project="new")
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] 2058102490
## [1] "*************************************"
## [1] "* create.dataset *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -s (seed random init) 2058102490
## -r (percentage of masked data) 0.05
## -x (genotype file in .geno format) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -o (output file in .geno format) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##
## Write genotype file with masked data, /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
## [1] "*************************************"
## [1] "* sNMF K = 1 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 1
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
## -i (number max of iterations) 200
## -a (regularization parameter) 50
## -s (seed random init) 2058102490
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
##
## Least-square error: 10704.001133
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 1
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.251244
## Cross-Entropy (masked data): 0.631556
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 2 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 2
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
## -i (number max of iterations) 200
## -a (regularization parameter) 50
## -s (seed random init) 2058102490
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=======]
## Number of iterations: 20
##
## Least-square error: 8947.097701
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 2
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.192978
## Cross-Entropy (masked data): 0.628353
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 3
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 50
## -s (seed random init) 2058102490
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [==========]
## Number of iterations: 26
##
## Least-square error: 7828.695566
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 3
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.166795
## Cross-Entropy (masked data): 0.650764
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 4 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 4
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
## -i (number max of iterations) 200
## -a (regularization parameter) 50
## -s (seed random init) 7236837123484821210
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=========]
## Number of iterations: 24
##
## Least-square error: 6837.193442
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 4
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.146896
## Cross-Entropy (masked data): 0.693897
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 5 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 5
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
## -i (number max of iterations) 200
## -a (regularization parameter) 50
## -s (seed random init) 8647192761586165466
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [==================]
## Number of iterations: 47
##
## Least-square error: 6300.361057
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 5
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.128977
## Cross-Entropy (masked data): 0.732612
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 6 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 6
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
## -i (number max of iterations) 200
## -a (regularization parameter) 50
## -s (seed random init) 2058102490
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=======================]
## Number of iterations: 62
##
## Least-square error: 5225.340499
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 6
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.112312
## Cross-Entropy (masked data): 0.791253
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
# Plot cross-entropy against K for the alpha = 50 run
plot(obj.snmf, col = "black", cex = 1.5, pch = 19, main = "Alpha=50")
For \(\alpha=100\):
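# Alpha = 100: refit sNMF for K = 1 to 6 with cross-entropy estimation.
# project = "new" replaces the project saved by the previous run.
obj.snmf <- snmf(noreg.geno.900, K = 1:6, ploidy = 2, entropy = TRUE, alpha = 100, project = "new")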
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] 1881905503
## [1] "*************************************"
## [1] "* create.dataset *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -s (seed random init) 1881905503
## -r (percentage of masked data) 0.05
## -x (genotype file in .geno format) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -o (output file in .geno format) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##
## Write genotype file with masked data, /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
## [1] "*************************************"
## [1] "* sNMF K = 1 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 1
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 7356074838603831647
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
##
## Least-square error: 10685.286845
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 1
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.251802
## Cross-Entropy (masked data): 0.594754
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 2 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 2
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 4837294914291865951
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=========]
## Number of iterations: 23
##
## Least-square error: 8981.101753
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 2
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.193601
## Cross-Entropy (masked data): 0.599374
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 3
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1881905503
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [==========]
## Number of iterations: 28
##
## Least-square error: 7832.819973
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 3
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.167141
## Cross-Entropy (masked data): 0.621968
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 4 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 4
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1756845967604947295
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=======================]
## Number of iterations: 61
##
## Least-square error: 7083.226215
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 4
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.145969
## Cross-Entropy (masked data): 0.633284
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 5 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 5
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1881905503
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===================================================]
## Number of iterations: 137
##
## Least-square error: 6159.090705
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 5
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.127436
## Cross-Entropy (masked data): 0.704299
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 6 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 6
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1881905503
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [============================]
## Number of iterations: 74
##
## Least-square error: 5338.289703
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 6
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.112088
## Cross-Entropy (masked data): 0.743308
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
plot(obj.snmf,col='black',cex=1.5,pch=19,main="Alpha=100")
For \(\alpha=500\):
#Alpha=500
obj.snmf=snmf(noreg.geno.900,K=1:6,ploidy=2,entropy=T,alpha=500,project="new")
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] 303906406
## [1] "*************************************"
## [1] "* create.dataset *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -s (seed random init) 303906406
## -r (percentage of masked data) 0.05
## -x (genotype file in .geno format) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -o (output file in .geno format) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##
## Write genotype file with masked data, /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
## [1] "*************************************"
## [1] "* sNMF K = 1 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 1
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
## -i (number max of iterations) 200
## -a (regularization parameter) 500
## -s (seed random init) 303906406
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
##
## Least-square error: 10697.143991
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 1
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.251403
## Cross-Entropy (masked data): 0.61752
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 2 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 2
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
## -i (number max of iterations) 200
## -a (regularization parameter) 500
## -s (seed random init) 303906406
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [======]
## Number of iterations: 16
##
## Least-square error: 8977.209101
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 2
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.192563
## Cross-Entropy (masked data): 0.598772
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 3
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 500
## -s (seed random init) 303906406
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [========]
## Number of iterations: 22
##
## Least-square error: 7851.097877
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 3
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.166293
## Cross-Entropy (masked data): 0.619389
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 4 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 4
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
## -i (number max of iterations) 200
## -a (regularization parameter) 500
## -s (seed random init) 303906406
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [======]
## Number of iterations: 17
##
## Least-square error: 6919.801305
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 4
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.146621
## Cross-Entropy (masked data): 0.6787
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 5 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 5
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
## -i (number max of iterations) 200
## -a (regularization parameter) 500
## -s (seed random init) 303906406
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [==========]
## Number of iterations: 26
##
## Least-square error: 5983.418952
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 5
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.12837
## Cross-Entropy (masked data): 0.719034
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 6 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 6
## -x (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## -q (individual admixture file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
## -g (ancestral frequencies file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
## -i (number max of iterations) 200
## -a (regularization parameter) 500
## -s (seed random init) 303906406
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=====================]
## Number of iterations: 56
##
## Least-square error: 5269.647621
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q: OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of ancestral pops) 6
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
## -q (individual admixture) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
## -g (ancestral frequencies) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
## -i (with masked genotypes) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.105715
## Cross-Entropy (masked data): 0.794926
## The project is saved into :
## Sequences/Cinnyris-900-geno_noregius.snmfProject
##
## To load the project, use:
## project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
plot(obj.snmf,col='black',cex=1.5,pch=19,main="Alpha=500")
Two groups (K = 2) is the best-supported scenario at almost all \(\alpha\) levels, with the lowest cross-entropy on the masked data. One group is slightly better supported than three as the next-best option, suggesting that complete separation of the groups is not yet possible.
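The ranking of K values can be read directly from the masked-data cross-entropy values in the log above, where lower values indicate better predictive fit. As a minimal sketch, the values below are transcribed by hand from the \(\alpha=500\) run (one repetition per K); the names `ce.500` and `best.K` are illustrative and not part of the original analysis.

```r
# Masked-data cross-entropy for K = 1..6, transcribed from the alpha = 500 log above
ce.500=c(0.61752,0.598772,0.619389,0.6787,0.719034,0.794926)
names(ce.500)=paste0("K",1:6)

# The lowest cross-entropy marks the best-supported number of ancestral populations
best.K=which.min(ce.500)
print(names(best.K))

# Rank all K values from best to worst supported
print(names(sort(ce.500)))
```

In LEA, the same values should be retrievable from the project object with `cross.entropy(obj.snmf,K=k)`. With a single repetition per K, differences as small as the one between K = 1 and K = 3 should be read cautiously.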
We can also run PCAs on the data to view the inherent variation. We start with the full dataset.
pca.all=pca(lfmm.900)
## [1] "******************************"
## [1] " Principal Component Analysis "
## [1] "******************************"
## summary of the options:
##
## -n (number of individuals) 24
## -L (number of loci) 6740
## -K (number of principal components) 24
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.lfmm
## -a (eigenvalue file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.pca/Cinnyris-900-geno.eigenvalues
## -e (eigenvector file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.pca/Cinnyris-900-geno.eigenvectors
## -d (standard deviation file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.pca/Cinnyris-900-geno.sdev
## -p (projection file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.pca/Cinnyris-900-geno.projections
## -c data centered
summary(pca.all)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 9.1036900 6.95271000 5.17214000 4.71181000 4.61898000
## Proportion of Variance 0.1700916 0.09920992 0.05490212 0.04556417 0.04378659
## Cumulative Proportion 0.1700916 0.26930154 0.32420366 0.36976783 0.41355441
## PC6 PC7 PC8 PC9 PC10
## Standard deviation 4.57264000 4.54291000 4.37712000 4.25993000 4.19482000
## Proportion of Variance 0.04291238 0.04235611 0.03932105 0.03724383 0.03611393
## Cumulative Proportion 0.45646679 0.49882290 0.53814395 0.57538777 0.61150170
## PC11 PC12 PC13 PC14 PC15
## Standard deviation 4.14640000 4.09217000 4.01732000 3.99482000 3.94513000
## Proportion of Variance 0.03528513 0.03436808 0.03312239 0.03275246 0.03194273
## Cumulative Proportion 0.64678683 0.68115490 0.71427730 0.74702976 0.77897248
## PC16 PC17 PC18 PC19 PC20
## Standard deviation 3.80242000 3.78511000 3.75782000 3.68345000 3.65494000
## Proportion of Variance 0.02967344 0.02940399 0.02898138 0.02784567 0.02741622
## Cumulative Proportion 0.80864593 0.83804991 0.86703129 0.89487696 0.92229317
## PC21 PC22 PC23 PC24
## Standard deviation 3.59893000 3.56889000 3.48904000 1.816330e-06
## Proportion of Variance 0.02658245 0.02614052 0.02498385 6.770767e-15
## Cumulative Proportion 0.94887563 0.97501615 1.00000000 1.000000e+00
plot(pca.all)
pca.all.proj=as.data.frame(pca.all$projections)
Next (hidden here), we assign populations to the PCA data.
samples=c("FMNH346623_Cinnyris_regius",
"FMNH346624_Cinnyris_regius",
"FMNH356179_Cinnyris_regius",
"FMNH356181_Cinnyris_regius",
"FMNH385275_Cinnyris_regius",
"FMNH385276_Cinnyris_regius",
"FMNH450580_Cinnyris_regius",
"FMNH450581_Cinnyris_regius",
"FMNH481235_Cinnyris_regius",
"FMNH438857_Cinnyris_regius",
"FMNH358156_Cinnyris_reichenowi",
"FMNH358157_Cinnyris_reichenowi",
"FMNH443947_Cinnyris_reichenowi",
"FMNH481236_Cinnyris_reichenowi",
"FMNH122395_Cinnyris_genderuensis",
"FMNH189462_Cinnyris_genderuensis",
"FMNH273746_Cinnyris_reichenowi",
"FMNH95912_Cinnyris_reichenowi",
"FMNH95913_Cinnyris_reichenowi",
"FMNH95915_Cinnyris_reichenowi",
"FMNH95916_Cinnyris_reichenowi",
"KU131883_Cinnyris_reichenowi",
"KU132209_Cinnyris_reichenowi",
"KU132234_Cinnyris_reichenowi")
pops=c("regius","regius","regius","regius",
"regius","regius","regius","regius",
"regius","regius","reichenowi","reichenowi",
"reichenowi","reichenowi","genderuensis","genderuensis",
"preussi","preussi","preussi","preussi",
"preussi","preussi","preussi","preussi")
# *Cinnyris regius*: gold `#ffd700`
# *Cinnyris r. reichenowi*: black `#000000`
# *Cinnyris reichenowi preussi*: blue `#1f2887`
# *Cinnyris reichenowi genderuensis*: red `#e31a1c`
# *Cinnyris reichenowi parvirostris*: light blue `#1f9eff` (not used below)
kleurs=c("#ffd700","#ffd700","#ffd700","#ffd700",
"#ffd700","#ffd700","#ffd700","#ffd700",
"#ffd700","#ffd700","#000000","#000000",
"#000000","#000000","#e31a1c","#e31a1c",
"#1f2887","#1f2887","#1f2887","#1f2887",
"#1f2887","#1f2887","#1f2887","#1f2887")
pca.all2=cbind(samples,pops,kleurs,pca.all.proj)
# *Cinnyris regius*: gold `#ffd700`
# *Cinnyris r. reichenowi*: black `#000000`
# *Cinnyris reichenowi preussi*: blue `#1f2887`
# *Cinnyris reichenowi genderuensis*: red `#e31a1c`
# *Cinnyris reichenowi parvirostris*: light blue `#1f9eff` (not used below)
colorset=c("#000000","#1f2887","#e31a1c","#ffd700")
names(colorset)=c("reichenowi","preussi","genderuensis","regius")
colScale=scale_color_manual(name="grp",values=colorset)
a=ggplot(data=pca.all2,aes(x=V1,y=V2,colour=pops))
b=geom_point(size=6)
c=theme(panel.background = element_rect(fill="white",color = "grey50"),
axis.title.x = element_text(size=20),
axis.title.y = element_text(size=20),
axis.text.x = element_text(size=15),
axis.text.y = element_text(size=15),
legend.title = element_blank(),
legend.text = element_text(size=15))
d=stat_ellipse()
e=colScale
plotx=a+b+c+d+e
print(plotx)
## Too few points to calculate an ellipse
## Warning: Removed 1 row(s) containing missing values (geom_path).
Cinnyris regius appears to be a fairly cohesive group that strongly skews the direction and magnitude of the principal components.
The two-population arrangement almost appears messier, with genderuensis birds falling “halfway” between the two populations. Below, we repeat the PCA with Cinnyris regius excluded.
pca.all=pca(noreg.lfmm.900)
## [1] "******************************"
## [1] " Principal Component Analysis "
## [1] "******************************"
## summary of the options:
##
## -n (number of individuals) 14
## -L (number of loci) 5475
## -K (number of principal components) 14
## -x (genotype file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.lfmm
## -a (eigenvalue file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.eigenvalues
## -e (eigenvector file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.eigenvectors
## -d (standard deviation file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.sdev
## -p (projection file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.projections
## -c data centered
summary(pca.all)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 9.1052800 6.7773100 6.15467000 5.94936000 5.72978000
## Proportion of Variance 0.1930024 0.1069275 0.08818293 0.08239763 0.07642776
## Cumulative Proportion 0.1930024 0.2999299 0.38811283 0.47051046 0.54693822
## PC6 PC7 PC8 PC9 PC10
## Standard deviation 5.45789000 5.23291000 4.97916000 4.96471000 4.79219000
## Proportion of Variance 0.06934645 0.06374739 0.05771483 0.05738027 0.05346165
## Cumulative Proportion 0.61628468 0.68003207 0.73774689 0.79512716 0.84858881
## PC11 PC12 PC13 PC14
## Standard deviation 4.71584000 4.67519000 4.57645000 0
## Proportion of Variance 0.05177172 0.05088311 0.04875636 0
## Cumulative Proportion 0.90036053 0.95124364 1.00000000 1
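The "Proportion of Variance" row follows from the standard deviations: each component's share is its squared standard deviation divided by the sum of all squared standard deviations. A minimal sketch, using the 14 values printed above (the names `sdev`, `prop.var`, and `cum.var` are illustrative, not from the original code):

```r
# Standard deviations of PC1..PC14, transcribed from the summary above
sdev=c(9.10528,6.77731,6.15467,5.94936,5.72978,5.45789,5.23291,
       4.97916,4.96471,4.79219,4.71584,4.67519,4.57645,0)

# The variance of each PC is sdev^2; its share is that variance over the total
prop.var=sdev^2/sum(sdev^2)
cum.var=cumsum(prop.var)

print(round(prop.var[1:2],4))  # shares of PC1 and PC2
print(round(cum.var[2],4))     # share captured by PC1 and PC2 together
```

The final component has zero variance because centering the data for 14 individuals leaves at most 13 independent axes.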
plot(pca.all)
pca.all.proj=as.data.frame(pca.all$projections)
Next (hidden here), we assign populations to the PCA data.
samples=c("FMNH358156_Cinnyris_reichenowi",
"FMNH358157_Cinnyris_reichenowi",
"FMNH443947_Cinnyris_reichenowi",
"FMNH481236_Cinnyris_reichenowi",
"FMNH122395_Cinnyris_genderuensis",
"FMNH189462_Cinnyris_genderuensis",
"FMNH273746_Cinnyris_reichenowi",
"FMNH95912_Cinnyris_reichenowi",
"FMNH95913_Cinnyris_reichenowi",
"FMNH95915_Cinnyris_reichenowi",
"FMNH95916_Cinnyris_reichenowi",
"KU131883_Cinnyris_reichenowi",
"KU132209_Cinnyris_reichenowi",
"KU132234_Cinnyris_reichenowi")
pops=c("reichenowi","reichenowi",
"reichenowi","reichenowi","genderuensis","genderuensis",
"preussi","preussi","preussi","preussi",
"preussi","preussi","preussi","preussi")
pca.all2=cbind(samples,pops,pca.all.proj)
# *Cinnyris r. reichenowi*: black `#000000`
# *Cinnyris reichenowi preussi*: blue `#1f2887`
# *Cinnyris reichenowi genderuensis*: red `#e31a1c`
# *Cinnyris reichenowi parvirostris*: light blue `#1f9eff`
colorset=c("#000000","#1f2887","#e31a1c")
names(colorset)=c("reichenowi","preussi","genderuensis")
colScale=scale_color_manual(name="grp",values=colorset)
a=ggplot(data=pca.all2,aes(x=V1,y=V2,colour=pops))
b=geom_point(size=6)
c=theme(panel.background = element_rect(fill="white",color = "grey50"),
axis.title.x = element_text(size=20),
axis.title.y = element_text(size=20),
axis.text.x = element_text(size=15),
axis.text.y = element_text(size=15),
legend.title = element_blank(),
legend.text = element_text(size=15))
d=stat_ellipse()
e=colScale
f1=ylab("PC 2")
f2=xlab("PC 1")
plotx=a+b+c+d+e+f1+f2
print(plotx)
## Too few points to calculate an ellipse
## Warning: Removed 1 row(s) containing missing values (geom_path).
a=ggplot(data=pca.all2,aes(x=V3,y=V2,colour=pops))
b=geom_point(size=6)
c=theme(panel.background = element_rect(fill="white",color = "grey50"),
axis.title.x = element_text(size=20),
axis.title.y = element_text(size=20),
axis.text.x = element_text(size=15),
axis.text.y = element_text(size=15),
legend.title = element_blank(),
legend.text = element_text(size=15))
d=stat_ellipse()
e=colScale
f1=ylab("PC 2")
f2=xlab("PC 3")
plotx=a+b+c+d+e+f1+f2
print(plotx)
## Too few points to calculate an ellipse
## Warning: Removed 1 row(s) containing missing values (geom_path).
We can also perform a Tracy-Widom test on these data.
tracy.widom(pca.all)
## [1] "*******************"
## [1] " Tracy-Widom tests "
## [1] "*******************"
## summary of the options:
##
## -n (number of eigenvalues) 14
## -i (input file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.eigenvalues
## -o (output file) /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.tracywidom
## N eigenvalues twstats pvalues effectn percentage
## 1 1 1161.0 3.15300 0.001273 6.350968e+01 0.19300
## 2 2 643.0 1.07300 0.043920 2.127804e+02 0.10690
## 3 3 530.3 -0.44120 0.262000 3.125249e+02 0.08818
## 4 4 495.5 -0.05650 0.178700 3.772761e+02 0.08240
## 5 5 459.6 0.37390 0.109800 4.958396e+02 0.07643
## 6 6 417.0 0.28940 0.121400 7.485579e+02 0.06935
## 7 7 383.4 0.22090 0.131500 1.197853e+03 0.06375
## 8 8 347.1 -1.55600 0.591300 2.100662e+03 0.05771
## 9 9 345.1 -0.05971 0.179300 2.298622e+03 0.05738
## 10 10 321.5 -0.97400 0.408400 5.468815e+03 0.05346
## 11 11 311.3 -1.38900 0.538300 7.957344e+03 0.05177
## 12 12 306.0 -0.92310 0.393100 8.779880e+03 0.05088
## 13 13 293.2 NaN 1.000000 -1.772436e+15 0.04876
As suspected, only the first two principal components are significant (Tracy-Widom p < 0.05), so they are the most informative for examining the data distribution.
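To make the cut-off explicit, the sketch below picks out the components whose Tracy-Widom p-values fall under 0.05. The p-values are transcribed from the table above; the 0.05 threshold and the names `tw.p`, `alpha.tw`, and `sig.pcs` are our assumptions rather than settings from the original analysis.

```r
# Tracy-Widom p-values for eigenvalues 1..13, transcribed from the table above
tw.p=c(0.001273,0.043920,0.262000,0.178700,0.109800,0.121400,0.131500,
       0.591300,0.179300,0.408400,0.538300,0.393100,1.000000)

# Assumed significance threshold (not stated in the original analysis)
alpha.tw=0.05

# Indices of the principal components that pass the threshold
sig.pcs=which(tw.p<alpha.tw)
print(sig.pcs)
```

In LEA, the same vector should be available as the `pvalues` element of the object returned by `tracy.widom()`.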
There are clearly two populations (east and west). Three populations is the second most-supported outcome, although its support is sometimes weaker than, or only roughly equal to, that of the single-population scenario.
Next, we perform a discriminant function analysis (DFA) of the groups to determine how well we can statistically identify them using genetic data. We run this test on the PCA projections output by the aforementioned LEA analyses, using only the statistically significant PCs rather than all PCs to avoid overfitting the model.
#Perform LDA of genetic data
##Perform on PCA values
lda.x=lda(pops~V1+V2,data=pca.all2,CV=T)
print(lda.x)
## $class
## [1] reichenowi reichenowi reichenowi reichenowi preussi
## [6] genderuensis preussi preussi preussi preussi
## [11] preussi preussi preussi preussi
## Levels: genderuensis preussi reichenowi
##
## $posterior
## genderuensis preussi reichenowi
## 1 2.313387e-37 1.273404e-54 1.000000e+00
## 2 3.184895e-40 6.877730e-57 1.000000e+00
## 3 3.206855e-73 1.431829e-103 1.000000e+00
## 4 1.825269e-35 1.051479e-51 1.000000e+00
## 5 7.230216e-23 1.000000e+00 7.795312e-69
## 6 1.000000e+00 8.005585e-18 2.916653e-147
## 7 6.791039e-04 9.993209e-01 1.685666e-58
## 8 7.386062e-08 9.999999e-01 3.346142e-54
## 9 1.320292e-07 9.999999e-01 7.824219e-54
## 10 8.317248e-08 9.999999e-01 2.875153e-52
## 11 9.330919e-08 9.999999e-01 1.272547e-55
## 12 7.086640e-07 9.999993e-01 5.379440e-52
## 13 1.683922e-07 9.999998e-01 6.623579e-53
## 14 7.761592e-13 1.000000e+00 8.585138e-69
##
## $terms
## pops ~ V1 + V2
## attr(,"variables")
## list(pops, V1, V2)
## attr(,"factors")
## V1 V2
## pops 0 0
## V1 1 0
## V2 0 1
## attr(,"term.labels")
## [1] "V1" "V2"
## attr(,"order")
## [1] 1 1
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,".Environment")
## <environment: R_GlobalEnv>
## attr(,"predvars")
## list(pops, V1, V2)
## attr(,"dataClasses")
## pops V1 V2
## "factor" "numeric" "numeric"
##
## $call
## lda(formula = pops ~ V1 + V2, data = pca.all2, CV = T)
##
## $xlevels
## named list()
#Check predictions
ct=table(pca.all2$pops,lda.x$class)
diag(prop.table(ct,1))
## genderuensis preussi reichenowi
## 0.5 1.0 1.0
sum(diag(prop.table(ct)))
## [1] 0.9285714
#Let's try merging two of the subspecies (genderuensis into preussi)
z2=pca.all2
z2$pops[which(z2$pops=="genderuensis")]="preussi"
lda.x2=lda(pops~V1+V2,data=z2,CV=T)
## Warning in lda.default(x, grouping, ...): group genderuensis is empty
#print(lda.x2)
summary(lda.x2)
## Length Class Mode
## class 14 factor numeric
## posterior 28 -none- numeric
## terms 3 terms call
## call 4 -none- call
## xlevels 0 -none- list
#Check predictions
ct=table(z2$pops,lda.x2$class)
print(ct)
##
## genderuensis preussi reichenowi
## genderuensis 0 0 0
## preussi 0 10 0
## reichenowi 0 0 4
diag(prop.table(ct,1))
## genderuensis preussi reichenowi
## NaN 1 1
sum(diag(prop.table(ct)))
## [1] 1
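We are unable to reliably separate genderuensis from preussi (only one of the two genderuensis individuals was classified correctly under leave-one-out cross-validation), but we are 100% able to separate east from west.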
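The cross-validated classification above can be reproduced in outline on simulated data. This is only a sketch: the data frame `toy`, its group labels, and the simulated scores are hypothetical stand-ins for `pca.all2`, and calling `droplevels()` after merging groups is a suggested way to avoid the empty-group warning, not something the original code does.

```r
library(MASS)  # provides lda()

set.seed(1)
# Hypothetical PC scores: group A is far from B and C, which overlap
toy <- data.frame(
  pops = factor(rep(c("A", "B", "C"), each = 10)),
  V1 = c(rnorm(10, mean = -5), rnorm(10, mean = 5), rnorm(10, mean = 5)),
  V2 = c(rnorm(10, mean = 0), rnorm(10, mean = 0), rnorm(10, mean = 0.5))
)

# CV = TRUE returns leave-one-out predictions in $class
fit <- lda(pops ~ V1 + V2, data = toy, CV = TRUE)
ct <- table(observed = toy$pops, predicted = fit$class)
per_group <- diag(prop.table(ct, 1))   # proportion correct within each group
overall <- mean(fit$class == toy$pops) # overall proportion correct

# Merge the two overlapping groups, then drop the now-empty level
merged <- toy
merged$pops[merged$pops == "C"] <- "B"
merged$pops <- droplevels(merged$pops)
fit2 <- lda(pops ~ V1 + V2, data = merged, CV = TRUE)
overall2 <- mean(fit2$class == merged$pops)

print(per_group)
print(c(three_groups = overall, merged_groups = overall2))
```

Dropping the unused level keeps the confusion table to the groups that actually remain, so no row of `NaN` proportions appears.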
Note: the database has been edited to exclude some measurements in which there were errors. These errors were irreversibly biasing the data with respect to bill curvature. Bill curvature indices have been removed from the dataset used here, given their unreliability and the difficulty of obtaining them with handheld calipers. Full notes on the reduction of the data, removal of juvenile birds, etc. can be seen in the rmarkdown file. The data cleaning also involves a PCA of the data using the rda function of \(vegan\), just as we did for the PCAs of the SNP data.
## Genus Species Subspecies Collection
## Cinnyris:572 regius : 22 genderuensis: 23 NHMUK :153
## reichenowi:550 parvirostris: 43 AMNH :120
## preussi :224 ZFMK : 70
## regius : 22 FMNH : 67
## reichenowi :257 MNMH : 52
## Unknown : 3 CM : 39
## (Other): 71
## Catalog Locality Locality2
## 1966.16.2433: 2 Mt. Cameroon : 97 Mt Cameroon : 97
## 1966.16.2438: 2 Bioko : 38 Bamenda Highlands: 47
## 1966.16.2453: 2 Rwenzori Mts. : 30 Bioko : 43
## 1966.16.2461: 2 Mt. Manengouba: 28 Rwenzori Mts : 43
## 1966.16.2469: 2 Mt. Oku : 28 Mt Manengouba : 31
## 209805 : 2 Tshibati : 28 Mt Oku : 28
## (Other) :560 (Other) :323 (Other) :283
## Country Sex Age Right.wing.chord
## Cameroon :246 : 3 : 48 Min. :45.00
## DRC : 96 Femae : 1 Adult :499 1st Qu.:53.00
## Kenya : 69 Female :168 Immature: 10 Median :55.00
## Uganda : 54 Male :399 Juvenile: 14 Mean :54.85
## Equatorial Guinea: 43 Unknown: 1 Unknown : 1 3rd Qu.:57.00
## Burundi : 19 Max. :63.00
## (Other) : 45 NA's :6
## Tail.length X1st.Prim.1st.Secon Culmen.length
## Min. : 8.18 Min. : 2.520 Min. :11.95
## 1st Qu.:36.00 1st Qu.: 5.888 1st Qu.:14.10
## Median :40.00 Median : 6.830 Median :15.29
## Mean :39.44 Mean : 6.874 Mean :15.59
## 3rd Qu.:43.00 3rd Qu.: 7.817 3rd Qu.:17.07
## Max. :54.00 Max. :10.770 Max. :22.13
## NA's :8 NA's :80 NA's :33
## Bill.depth..base.of.feathers.on.mandible.
## Min. :1.870
## 1st Qu.:2.685
## Median :2.870
## Mean :2.842
## 3rd Qu.:3.020
## Max. :3.550
## NA's :29
## Bill.width..base.of.feathers.on.maxilla. Left.Tarsus Kipp.s.Index
## Min. :2.300 Min. : 9.08 Min. :0.0450
## 1st Qu.:4.223 1st Qu.:11.97 1st Qu.:0.1096
## Median :4.490 Median :12.94 Median :0.1260
## Mean :4.475 Mean :12.99 Mean :0.1243
## 3rd Qu.:4.770 3rd Qu.:13.85 3rd Qu.:0.1387
## Max. :5.730 Max. :18.00 Max. :0.1814
## NA's :18 NA's :17 NA's :80
## Notes
## :517
## Left leg : 10
## Measurements from tag: 5
## Bill damaged : 3
## Right leg : 3
## Left tarsus : 2
## (Other) : 32
#Exclude juvenile birds from the analyses
summary(x$Age)
## Adult Immature Juvenile Unknown
## 48 499 10 14 1
x=x[x$Age=="Adult",]
summary(x$Age)
## Adult Immature Juvenile Unknown
## 0 499 0 0 0
Now we have a data frame that contains only adult individuals. We will analyze it both as a whole and split up by sex; there appear to be minor differences between the sexes, so this is necessary to determine whether populations differ in size.
#Fixing a spelling error
x[x$Sex=="Femae",9]="Female"
summary(x$Sex)
## Femae Female Male Unknown
## 0 0 148 351 0
We will start out by looking at Cinnyris reichenowi. I have already identified specimens to meta-population by locality, assuming that birds in the xeric regions of Cameroon are C. r. genderuensis just like the individuals we sampled. Some birds, mostly those at the eastern edge of the Bamenda Highlands, have been left as ‘unknown’.
colnames(x)
## [1] "Genus"
## [2] "Species"
## [3] "Subspecies"
## [4] "Collection"
## [5] "Catalog"
## [6] "Locality"
## [7] "Locality2"
## [8] "Country"
## [9] "Sex"
## [10] "Age"
## [11] "Right.wing.chord"
## [12] "Tail.length"
## [13] "X1st.Prim.1st.Secon"
## [14] "Culmen.length"
## [15] "Bill.depth..base.of.feathers.on.mandible."
## [16] "Bill.width..base.of.feathers.on.maxilla."
## [17] "Left.Tarsus"
## [18] "Kipp.s.Index"
## [19] "Notes"
Several of these measurements are repeats of measurements from the tags by past authorities. We can isolate/remove these here:
tag.measurements=x[x$Notes=="Measurements from tag",]
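#Keep every row that is NOT flagged as a tag measurement
#(a unary minus on a logical vector would only drop the first row)
x=x[x$Notes!="Measurements from tag",]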
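Removing rows by a condition in R is easy to get subtly wrong, so here is a small sketch on a hypothetical data frame (`df` and its values are invented for illustration). Negating a logical vector with a unary minus turns it into 0s and -1s, which drops only the first row, whereas `!` inverts the condition as intended.

```r
# Hypothetical toy data: rows 2 and 4 are flagged
df <- data.frame(
  id = 1:5,
  Notes = c("", "Measurements from tag", "", "Measurements from tag", "")
)

flag <- df$Notes == "Measurements from tag"

# -(TRUE) is -1 and -(FALSE) is 0, so this index is c(0, -1, 0, -1, 0):
# zeros are ignored and only row 1 is removed
minus_result <- df[-flag, ]

# Logical negation keeps exactly the rows where the flag is FALSE
not_result <- df[!flag, ]

print(minus_result$id)
print(not_result$id)
```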
Now I can remove columns that will not be needed for further downstream analyses. Note that I am excluding Kipp’s Index here as it is a covariate of wing length and primary projection; I’m also removing primary projection here as I did not take it for all individuals at each museum.
x2=x[,c("Species","Subspecies","Collection","Catalog","Locality2",
"Sex","Right.wing.chord","Tail.length",
"Culmen.length","Bill.depth..base.of.feathers.on.mandible.",
"Bill.width..base.of.feathers.on.maxilla.","Left.Tarsus")]
I can now subset the data frame by species. First, I remove any row that contains an NA value so that every retained individual has a complete set of measurements.
#colnames(x2)
x1.1=x2[rowSums(is.na(x2))<1,]
x1.1=unique(x1.1) #Just in case there are repeats
This procedure removed individuals from the dataset.
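The filtering step can be illustrated on a hypothetical data frame (`toy` and its values are invented). The sketch shows that `rowSums(is.na(...)) < 1` selects the same rows as base R's `complete.cases()`, and how the number of removed rows could be counted.

```r
# Hypothetical measurements with missing values and one duplicated row
toy <- data.frame(
  id = c("a", "b", "c", "c", "d"),
  wing = c(55, NA, 57, 57, 53),
  tail = c(40, 41, 42, 42, NA)
)

keep_rowsums <- rowSums(is.na(toy)) < 1  # approach used in this document
keep_cc <- complete.cases(toy)           # equivalent base R helper

# Keep complete rows, then drop exact duplicates
complete <- unique(toy[keep_cc, ])
n_removed <- nrow(toy) - nrow(complete)

print(all(keep_rowsums == keep_cc))
print(n_removed)
```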
regius=x1.1[x1.1$Species=="regius",]
reich=x1.1[x1.1$Species=="reichenowi",]
There are 20 individuals of C. regius and 383 individuals of C. reichenowi.
First, a look at C. reichenowi between different areas.
#Perform PCA using VEGAN
rda.x=rda(reich[,7:12],scale=T)
x3=cbind(reich,rda.x$CA$u)
eigs=rda.x$CA$eig
#Calculate eigenvalue contribution
w=NULL
for(i in 1:length(eigs)){
print(eigs[i]/sum(eigs))
w[i]=eigs[i]/sum(eigs)
}
## PC1
## 0.5390102
## PC2
## 0.1662653
## PC3
## 0.1355852
## PC4
## 0.07707038
## PC5
## 0.04499189
## PC6
## 0.03707703
#View relative eigenvector contributions
#summary(rda.x)
plot(y=(eigs/sum(eigs)),x=1:length(eigs),pch=19,ylab="Contribution",xlab="PCA Variable")
PC1 accounts for just over half (about 54%) of the variation within the data, while PC2 and PC3 account for about 17% and 14% of the variation, respectively.
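The proportion of variance per axis can also be computed without a loop. The sketch below uses `prcomp` from base R on simulated data rather than vegan's `rda`, purely so that it runs without extra packages. The data frame `toy` and its trait values are hypothetical.

```r
set.seed(42)
# Hypothetical measurements: 50 individuals, 4 traits driven by a shared size factor
size <- rnorm(50)
toy <- data.frame(
  wing   = 55 + 2.0 * size + rnorm(50, sd = 1),
  tail   = 40 + 2.5 * size + rnorm(50, sd = 2),
  culmen = 15 + 1.0 * size + rnorm(50, sd = 1),
  tarsus = 13 + 0.8 * size + rnorm(50, sd = 1)
)

# PCA on scaled variables (the counterpart of scale=T in the rda call)
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

eigs <- pca$sdev^2        # eigenvalues
prop <- eigs / sum(eigs)  # proportion of variance per axis
cum_prop <- cumsum(prop)  # running total

print(round(rbind(prop, cum_prop), 3))
```

The cumulative row makes it easy to see how many axes are needed to reach a given share of the variation.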
#Relative strength of each variable to the PC
g=rda.x$CA$v
for(i in 1:3){
y=sum(abs(g[,i]))
for(j in 1:nrow(g)){
print(paste0("For ",row.names(g)[j],": PC",i,": ",signif((g[j,i]/y),3)))
}
}
## [1] "For Right.wing.chord: PC1: -0.195"
## [1] "For Tail.length: PC1: -0.155"
## [1] "For Culmen.length: PC1: -0.198"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC1: -0.118"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC1: -0.184"
## [1] "For Left.Tarsus: PC1: -0.15"
## [1] "For Right.wing.chord: PC2: 0.0844"
## [1] "For Tail.length: PC2: 0.21"
## [1] "For Culmen.length: PC2: -0.116"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC2: 0.243"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC2: -0.0806"
## [1] "For Left.Tarsus: PC2: -0.266"
## [1] "For Right.wing.chord: PC3: -0.161"
## [1] "For Tail.length: PC3: -0.259"
## [1] "For Culmen.length: PC3: 0.0719"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC3: 0.313"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC3: 0.149"
## [1] "For Left.Tarsus: PC3: -0.0466"
biplot(rda.x)
This is a biplot of PC1 and PC2 with the average contribution of each variable plotted. No distinct clusters are immediately visible in this plot, so the groups probably overlap.
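The relative loadings printed above divide each variable's loading by the sum of absolute loadings on that axis. As a hedged sketch, the same quantity can be computed for all axes at once with `sweep`. The example uses `prcomp` on simulated data rather than the `rda` object, and the matrix `toy` and its trait names are hypothetical.

```r
set.seed(7)
# Hypothetical data: 50 individuals, 4 traits
toy <- matrix(rnorm(200), ncol = 4,
              dimnames = list(NULL, c("wing", "tail", "culmen", "tarsus")))
toy[, "tail"] <- toy[, "wing"] + 0.5 * toy[, "tail"]  # induce a correlation

pca <- prcomp(toy, scale. = TRUE)
g <- pca$rotation  # loadings: one column per PC

# Divide each column by its sum of absolute loadings, keeping the sign
rel <- sweep(g, 2, colSums(abs(g)), "/")

print(signif(rel[, 1:3], 3))
```

Because the sign of a principal component is arbitrary, only the relative magnitudes and the contrasts in sign within an axis are meaningful.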
# *Cinnyris r. reichenowi*: black `#000000`
# *Cinnyris reichenowi preussi*: blue `#1f2887`
# *Cinnyris reichenowi genderuensis*: red `#e31a1c`
# *Cinnyris reichenowi parvirostris*: light blue `#1f9eff`
colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
names(colorset)=c("reichenowi","preussi","genderuensis","parvirostris","Unknown")
colScale=scale_color_manual(name="grp",values=colorset)
#Plot reichenowi only, flawed loadings
a=ggplot(x3[which(x3$Species=="reichenowi"),],aes(x=PC1,y=PC2,
colour=Subspecies))
b=geom_point(size=1.5)
c=theme(panel.background = element_rect(fill="white",color = "grey50"),
axis.title.x = element_text(size=20),
axis.title.y = element_text(size=20),
axis.text.x = element_text(size=15),
axis.text.y = element_text(size=15),
legend.title = element_blank(),
legend.text = element_text(size=15))
d=stat_ellipse()
e=colScale
f1=ylab("PC 2")
f2=xlab("PC 1")
plotx=a+b+c+d+e+f1+f2
print(plotx)
## Too few points to calculate an ellipse
## Warning: Removed 1 row(s) containing missing values (geom_path).
Let’s find that extreme individual.
x4=x3[order(x3$PC2),]
x4$PC2[1]
## [1] -0.2455511
x4[1,]
## Species Subspecies Collection Catalog Locality2 Sex
## 180 reichenowi preussi NHMUK 1966.16.2424 Rumpi Hills Female
## Right.wing.chord Tail.length Culmen.length
## 180 54 8.18 17.17
## Bill.depth..base.of.feathers.on.mandible.
## 180 2.75
## Bill.width..base.of.feathers.on.maxilla. Left.Tarsus PC1 PC2
## 180 5.13 14.7 0.04378649 -0.2455511
## PC3 PC4 PC5 PC6
## 180 0.2287369 0.03257101 0.1939922 -0.2832348
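There is one outlying female from Rumpi Hills, whose recorded tail length (8.18) is far below that of the rest of the dataset (first quartile 36.00). We will leave it in for the time being.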
x3[x3$PC1>0.015&x3$PC2>-0.015,]
## Species Subspecies Collection Catalog Locality2
## 26 reichenowi genderuensis MNMH 1983.62 Adamawa
## 27 reichenowi genderuensis MNMH 1983.63 Adamawa
## 28 reichenowi genderuensis MNMH 1994.1401 Adamawa
## 29 reichenowi genderuensis NHMUK 1922.11.25.214 Genderu
## 30 reichenowi genderuensis NHMUK 1922.11.25.215 Genderu
## 32 reichenowi genderuensis NHMUK 1922.11.25.217 Tibati
## 38 reichenowi genderuensis RMCA 75-3-A-532 Adamawa
## 39 reichenowi genderuensis RMCA 75-3-A-660 Adamawa
## 60 reichenowi parvirostris NHMUK 1911.12.23.2759 Bioko
## 64 reichenowi parvirostris NHMUK 1936.2.21.795 Bioko
## 97 reichenowi preussi AMNH 415796 Bamenda Highlands
## 104 reichenowi preussi AMNH 688999 Bamenda Highlands
## 122 reichenowi preussi MNMH 1994.1406 Bamenda Highlands
## 126 reichenowi preussi MNMH 1994.1411 Mt Cameroon
## 127 reichenowi preussi MNMH 1994.1412 Mt Cameroon
## 153 reichenowi preussi NHMUK 1911.12.23.4349 Mt Manengouba
## 161 reichenowi preussi NHMUK 1922.11.25.223 Bamenda Highlands
## 313 reichenowi reichenowi AMNH 209805 Mt Meru
## 315 reichenowi reichenowi AMNH 263861 Mt Kenya
## 319 reichenowi reichenowi AMNH 263865 W of Lake Albert
## 321 reichenowi reichenowi AMNH 263866 Rwenzori Mts
## 325 reichenowi reichenowi AMNH 263868 Rwenzori Mts
## 335 reichenowi reichenowi AMNH 688971 Itombwe
## 337 reichenowi reichenowi AMNH 688972 Itombwe
## 339 reichenowi reichenowi AMNH 688973 Itombwe
## 341 reichenowi reichenowi AMNH 688978 Nyungwe
## 343 reichenowi reichenowi AMNH 688980 Idjwa Island
## 345 reichenowi reichenowi AMNH 688984 Rwenzori Mts
## 349 reichenowi reichenowi AMNH 688992 Marakweta
## 351 reichenowi reichenowi AMNH 688994 Buguera
## 355 reichenowi reichenowi AMNH 764989 Tshibati
## 357 reichenowi reichenowi AMNH 764990 Tshibati
## 359 reichenowi reichenowi AMNH 764991 Kivu Highlands
## 361 reichenowi reichenowi AMNH 764992 Tshibati
## 363 reichenowi reichenowi AMNH 764993 Tshibati
## 365 reichenowi reichenowi AMNH 764994 Tshibati
## 367 reichenowi reichenowi AMNH 764995 Tshibati
## 369 reichenowi reichenowi AMNH 764996 Tshibati
## 371 reichenowi reichenowi AMNH 764997 Lwiro
## 375 reichenowi reichenowi AMNH 764999 Tshibati
## 377 reichenowi reichenowi AMNH 765000 Tshibati
## 383 reichenowi reichenowi AMNH 765003 Tshibati
## 387 reichenowi reichenowi AMNH 800359 Mt Kenya
## 391 reichenowi reichenowi AMNH 827224 Cherangani Hills
## 395 reichenowi reichenowi AMNH 827226 Bwindi
## 397 reichenowi reichenowi AMNH 827227 Kakamega
## 399 reichenowi reichenowi AMNH 827372 Cherangani Hills
## 401 reichenowi reichenowi AMNH 827373 Cherangani Hills
## 403 reichenowi reichenowi AMNH 827375 Cherangani Hills
## 405 reichenowi reichenowi AMNH 6888970 Itombwe
## 407 reichenowi reichenowi AMNH 6888987 Mbale
## 411 reichenowi reichenowi CM 139823 Cherangani Hills
## 415 reichenowi reichenowi CM 145817 Kezizi
## 416 reichenowi reichenowi CM 145818 Kezizi
## 419 reichenowi reichenowi CM 145967 Kigezi
## 421 reichenowi reichenowi CM 145993 Kigezi
## 424 reichenowi reichenowi CM 146132 Kigezi
## 427 reichenowi reichenowi CM 147670 Mt Kenya
## 430 reichenowi reichenowi CM 147923 Mt Kenya
## 432 reichenowi reichenowi CM 149092 Nyiro River
## 469 reichenowi reichenowi FMNH 356159 Rwenzori Mts
## 473 reichenowi reichenowi FMNH 356168 Rwenzori Mts
## 481 reichenowi reichenowi FMNH 385271 Bwindi
## 492 reichenowi reichenowi FMNH 481233 Kivu Highlands
## 494 reichenowi reichenowi MNMH 1936.1663 Kivu Highlands
## 495 reichenowi reichenowi MNMH 1988.694 Nyungwe
## 501 reichenowi reichenowi NHMUK 1901.2.22.941 Waso Nanyuki River
## 502 reichenowi reichenowi NHMUK 1904.11.20.332 Rwenzori Mts
## 503 reichenowi reichenowi NHMUK 1906.12.23.682 Rwenzori Mts
## 505 reichenowi reichenowi NHMUK 1906.12.23.684 Rwenzori Mts
## 506 reichenowi reichenowi NHMUK 1906.12.23.685 Rwenzori Mts
## 507 reichenowi reichenowi NHMUK 1906.12.23.686 Rwenzori Mts
## 513 reichenowi reichenowi NHMUK 1910.12.26.406 Mt Elgon
## 514 reichenowi reichenowi NHMUK 1934.1.17.32 Kigezi
## 515 reichenowi reichenowi NHMUK 1935.5.13.169 Kapenguria
## 516 reichenowi reichenowi NHMUK 1939.10.1.47 Didinga Mts
## 519 reichenowi reichenowi NHMUK 1939.10.12.49 Didinga Mts
## 520 reichenowi reichenowi NHMUK 1939.10.2.46 Didinga Mts
## 521 reichenowi reichenowi NHMUK 1939.10.3.257 Didinga Mts
## 522 reichenowi reichenowi NHMUK 1939.10.3.258 Imatong Mts
## 525 reichenowi reichenowi NHMUK 1947.100.303 Didinga Mts
## 529 reichenowi reichenowi NHMUK 1976.9.46 Kidepo
## 530 reichenowi reichenowi RMCA 2994 Rwenzori Mts
## 531 reichenowi reichenowi RMCA 2996 Rwenzori Mts
## 533 reichenowi reichenowi RMCA 29581 Rwenzori Mts
## 534 reichenowi reichenowi RMCA 29582 Rwenzori Mts
## 535 reichenowi reichenowi RMCA 42122 Nioka
## 538 reichenowi reichenowi RMCA 42823 Nioka
## 541 reichenowi reichenowi RMCA 63068 Idjwa Island
## 542 reichenowi reichenowi RMCA 73195 Ituri
## 543 reichenowi reichenowi RMCA 73801 Mt Kabobo
## 544 reichenowi reichenowi RMCA 74354 Nioka
## 548 reichenowi reichenowi RMCA 98858 Rwenzori Mts
## 549 reichenowi reichenowi RMCA 98859 Rwenzori Mts
## 550 reichenowi reichenowi RMCA 98860 Rwenzori Mts
## 555 reichenowi reichenowi RMCA 76-66-A-1183 Ituri
## 556 reichenowi reichenowi ZFMK 66.958 Lwiro
## 559 reichenowi reichenowi ZFMK 78.183 Imatong Mts
## 560 reichenowi reichenowi ZFMK 78.184 Nugishot
## 563 reichenowi reichenowi ZFMK 26.8.68 Lwiro
## 564 reichenowi reichenowi ZFMK 6.9.68 Lwiro
## 565 reichenowi reichenowi ZMB 31769 Angata Anyuk
## 569 reichenowi reichenowi ZMB 2000/7984 Kivu Highlands
## 570 reichenowi Unknown MNMH 2005.995 Unknown
## 571 reichenowi Unknown ZMB 2000/7987 Unknown
## Sex Right.wing.chord Tail.length Culmen.length
## 26 Male 56 41 15.29
## 27 Female 52 39 13.71
## 28 Male 54 38 14.42
## 29 Female 53 36 12.37
## 30 Female 52 35 14.72
## 32 Female 52 33 14.46
## 38 Female 51 34 14.99
## 39 Male 55 40 14.54
## 60 Female 56 33 15.57
## 64 Female 55 30 15.46
## 97 Female 53 39 15.25
## 104 Female 54 39 15.18
## 122 Female 51 34 15.07
## 126 Female 55 37 16.08
## 127 Female 54 35 15.74
## 153 Female 53 34 14.85
## 161 Female 54 38 15.22
## 313 Male 57 41 13.79
## 315 Male 53 40 13.53
## 319 Male 51 37 14.00
## 321 Male 55 41 14.26
## 325 Female 49 34 14.00
## 335 Female 50 37 12.69
## 337 Male 55 50 14.53
## 339 Female 49 37 13.41
## 341 Male 55 39 14.10
## 343 Female 51 37 12.77
## 345 Male 51 38 16.31
## 349 Male 53 39 15.34
## 351 Male 56 40 15.31
## 355 Male 54 37 14.18
## 357 Male 56 43 13.46
## 359 Male 53 39 14.58
## 361 Male 54 39 13.83
## 363 Male 55 42 13.26
## 365 Male 53 41 14.49
## 367 Male 54 42 13.76
## 369 Male 54 41 13.63
## 371 Male 57 40 15.55
## 375 Female 51 37 12.73
## 377 Female 51 37 13.78
## 383 Female 51 34 12.35
## 387 Male 49 37 13.76
## 391 Male 51 42 15.19
## 395 Male 53 38 12.29
## 397 Female 45 30 11.95
## 399 Female 50 36 13.13
## 401 Female 48 34 12.66
## 403 Female 49 34 14.26
## 405 Male 55 35 15.07
## 407 Female 53 35 13.76
## 411 Female 50 36 12.85
## 415 Female 51 33 12.64
## 416 Male 54 37 14.12
## 419 Male 55 42 14.02
## 421 Male 55 39 15.24
## 424 Male 54 40 14.38
## 427 Male 55 41 13.62
## 430 Male 55 35 14.78
## 432 Male 54 44 13.67
## 469 Male 54 39 14.40
## 473 Male 57 37 13.60
## 481 Male 54 37 13.70
## 492 Female 49 36 13.80
## 494 Male 55 45 13.40
## 495 Female 51 37 14.61
## 501 Male 56 36 13.80
## 502 Male 56 43 13.60
## 503 Female 52 35 13.94
## 505 Female 51 36 13.12
## 506 Male 51 37 12.62
## 507 Female 52 34 13.30
## 513 Male 53 40 15.27
## 514 Male 55 43 13.25
## 515 Male 55 42 13.63
## 516 Male 54 37 16.35
## 519 Male 55 42 13.67
## 520 Male 55 43 13.84
## 521 Male 54 37 13.11
## 522 Male 56 41 13.46
## 525 Male 54 41 13.43
## 529 Male 52 37 13.89
## 530 Male 57 42 12.35
## 531 Male 54 40 14.69
## 533 Male 56 42 13.28
## 534 Male 53 38 14.75
## 535 Male 53 41 13.91
## 538 Female 48 33 13.32
## 541 Male 53 41 14.08
## 542 Male 62 39 13.07
## 543 Male 54 43 14.38
## 544 Male 54 38 13.28
## 548 Female 51 35 13.72
## 549 Female 52 35 16.69
## 550 Female 49 37 15.82
## 555 Male 57 42 14.34
## 556 Male 54 37 14.11
## 559 Male 55 42 14.40
## 560 Male 53 41 14.87
## 563 Male 53 37 14.40
## 564 Male 56 44 14.13
## 565 Male 55 38 13.15
## 569 Male 53 36 13.30
## 570 Male 53 39 14.75
## 571 Male 59 38 15.53
## Bill.depth..base.of.feathers.on.mandible.
## 26 2.81
## 27 3.07
## 28 2.84
## 29 3.05
## 30 2.86
## 32 2.86
## 38 3.02
## 39 2.42
## 60 2.74
## 64 2.87
## 97 2.71
## 104 3.26
## 122 3.22
## 126 2.75
## 127 3.03
## 153 2.95
## 161 2.89
## 313 2.68
## 315 2.96
## 319 3.24
## 321 3.01
## 325 2.86
## 335 2.50
## 337 2.83
## 339 3.19
## 341 3.19
## 343 3.03
## 345 2.98
## 349 3.02
## 351 2.75
## 355 3.10
## 357 2.96
## 359 2.95
## 361 2.78
## 363 2.66
## 365 2.68
## 367 2.95
## 369 2.91
## 371 2.69
## 375 3.00
## 377 2.91
## 383 2.85
## 387 2.89
## 391 2.93
## 395 3.07
## 397 2.36
## 399 3.00
## 401 2.40
## 403 2.72
## 405 3.01
## 407 3.03
## 411 2.75
## 415 2.67
## 416 2.94
## 419 2.78
## 421 2.61
## 424 2.55
## 427 2.88
## 430 3.10
## 432 2.58
## 469 2.80
## 473 2.70
## 481 3.00
## 492 2.90
## 494 2.79
## 495 2.73
## 501 3.00
## 502 2.89
## 503 2.65
## 505 2.71
## 506 2.73
## 507 2.64
## 513 2.64
## 514 2.88
## 515 2.64
## 516 2.88
## 519 3.14
## 520 2.78
## 521 3.00
## 522 3.06
## 525 2.85
## 529 2.66
## 530 2.59
## 531 2.95
## 533 2.83
## 534 2.70
## 535 2.66
## 538 2.72
## 541 2.73
## 542 2.70
## 543 2.68
## 544 2.73
## 548 2.76
## 549 2.69
## 550 2.73
## 555 2.65
## 556 2.95
## 559 2.54
## 560 2.66
## 563 2.87
## 564 2.53
## 565 2.90
## 569 2.63
## 570 3.05
## 571 2.76
## Bill.width..base.of.feathers.on.maxilla. Left.Tarsus PC1
## 26 4.14 12.04 0.02236324
## 27 4.07 11.37 0.05297342
## 28 4.41 12.60 0.02992202
## 29 4.20 11.78 0.05923821
## 30 4.58 11.48 0.04584284
## 32 4.11 12.05 0.06216544
## 38 4.36 12.19 0.04558603
## 39 3.90 12.01 0.05538448
## 60 4.64 11.54 0.03023857
## 64 4.39 11.31 0.04696798
## 97 4.17 11.60 0.04541680
## 104 4.26 11.69 0.01922474
## 122 4.30 11.45 0.04578766
## 126 4.32 12.38 0.02379220
## 127 4.31 12.47 0.02472241
## 153 4.36 12.71 0.03690129
## 161 4.68 11.08 0.02631948
## 313 4.23 11.66 0.03421546
## 315 4.48 12.54 0.02989449
## 319 4.36 12.08 0.03900387
## 321 4.52 11.57 0.01908089
## 325 4.62 11.49 0.06410763
## 335 4.19 11.79 0.08599225
## 337 4.24 10.54 0.01919945
## 339 4.69 10.20 0.05811253
## 341 4.42 10.97 0.02650605
## 343 4.40 9.31 0.07642985
## 345 4.59 11.17 0.02917456
## 349 4.34 11.68 0.02808066
## 351 4.26 12.98 0.01542789
## 355 4.45 11.66 0.03126261
## 357 4.20 10.83 0.03373876
## 359 4.35 13.52 0.02097537
## 361 4.34 10.68 0.05159113
## 363 4.72 11.89 0.02763410
## 365 4.65 11.14 0.03641685
## 367 4.31 11.96 0.02975364
## 369 4.06 11.62 0.04480316
## 371 4.29 11.68 0.02133437
## 375 4.24 10.91 0.06977300
## 377 3.56 11.66 0.07994134
## 383 4.05 11.86 0.08296241
## 387 4.28 11.85 0.06527326
## 391 3.91 12.22 0.04205651
## 395 3.34 12.76 0.07278846
## 397 4.03 9.08 0.15929372
## 399 3.75 11.11 0.08650405
## 401 3.77 10.49 0.12796967
## 403 3.95 11.23 0.08963101
## 405 3.60 10.48 0.06400951
## 407 3.97 11.79 0.05903901
## 411 3.90 11.41 0.09031260
## 415 4.38 10.57 0.08979685
## 416 4.27 11.93 0.04061737
## 419 4.59 11.62 0.02401169
## 421 4.53 12.25 0.02480529
## 424 4.63 11.62 0.03686825
## 427 4.21 13.40 0.02303710
## 430 4.27 12.51 0.02620806
## 432 4.00 13.29 0.03743505
## 469 4.00 12.70 0.04082998
## 473 4.50 11.50 0.03730844
## 481 4.10 13.40 0.03494256
## 492 3.70 13.50 0.07137493
## 494 4.47 11.04 0.02951462
## 495 3.99 10.77 0.07421793
## 501 4.10 12.95 0.03219719
## 502 3.70 12.65 0.03576305
## 503 4.51 11.58 0.06022923
## 505 4.28 11.20 0.07585168
## 506 4.70 12.25 0.05521205
## 507 3.93 11.90 0.08271196
## 513 4.42 11.97 0.03478034
## 514 3.64 11.90 0.05052740
## 515 4.33 11.28 0.04243097
## 516 4.05 12.41 0.02924647
## 519 4.42 11.79 0.01778978
## 520 4.30 13.32 0.01815173
## 521 4.38 12.50 0.03796464
## 522 4.52 12.04 0.01541466
## 525 4.23 11.66 0.04285980
## 529 4.07 12.36 0.06271089
## 530 3.91 12.17 0.05121811
## 531 4.49 10.34 0.03517406
## 533 4.04 13.26 0.02726227
## 534 4.43 12.60 0.03578372
## 535 4.02 10.99 0.06178135
## 538 3.72 11.66 0.10638587
## 541 4.24 12.64 0.03809676
## 542 4.08 12.25 0.02335066
## 543 3.66 14.21 0.03404139
## 544 4.03 11.42 0.06320018
## 548 4.00 11.81 0.07567819
## 549 4.38 10.85 0.04854887
## 550 4.09 11.09 0.06771225
## 555 4.37 12.36 0.01899963
## 556 4.40 11.53 0.03959565
## 559 4.45 12.96 0.02311407
## 560 4.40 12.99 0.02706940
## 563 4.42 12.89 0.03270303
## 564 4.36 12.34 0.02451381
## 565 4.46 12.19 0.03490331
## 569 4.30 10.90 0.07116609
## 570 4.23 13.74 0.01812063
## 571 3.97 12.30 0.02050084
## PC2 PC3 PC4 PC5 PC6
## 26 3.951923e-02 -0.0396659133 -0.0030441180 0.0533115450 0.014830398
## 27 7.738691e-02 0.0321893539 -0.0335407769 -0.0089921406 0.028578935
## 28 8.360993e-03 0.0026744328 -0.0053328923 -0.0168496572 -0.025084162
## 29 5.884064e-02 0.0414500638 -0.0402287342 -0.0313431542 -0.079119471
## 30 9.957146e-03 0.0519180321 0.0553214908 -0.0056527061 -0.006267652
## 32 -1.900921e-03 0.0416131939 -0.0194277638 0.0479643547 -0.009918828
## 38 7.049582e-03 0.0792743377 -0.0165453389 0.0064309208 0.020955805
## 39 -4.008815e-03 -0.1043666603 0.0201346566 0.0601006224 0.018220910
## 60 -1.233662e-02 0.0293700245 0.0847350918 0.0836292066 -0.081613287
## 64 -3.653228e-03 0.0667190851 0.0476329960 0.1263514198 -0.078784221
## 97 1.847553e-02 -0.0217847029 0.0295876689 0.0356581048 0.057395289
## 104 8.361657e-02 0.0653171846 -0.0399050278 0.0290220529 0.022861706
## 122 4.941998e-02 0.1129263080 -0.0218332205 0.0295904148 0.030659806
## 126 -9.376833e-03 -0.0060582041 0.0179854053 0.0693557370 0.010300213
## 127 1.124309e-02 0.0567731936 -0.0276089211 0.0598875921 -0.003553123
## 153 -7.012620e-03 0.0533635823 -0.0267274915 0.0193188539 -0.027932472
## 161 3.889385e-02 0.0329809083 0.0812457665 0.0060264923 -0.003916388
## 313 4.448666e-02 -0.0680585778 0.0294583973 0.0228509282 -0.059676622
## 315 3.663567e-02 0.0133542245 -0.0191623144 -0.0817479271 -0.020829847
## 319 5.891441e-02 0.0902999200 -0.0503847475 -0.0481193811 0.014818963
## 321 7.068351e-02 0.0124763465 0.0213846157 -0.0268547300 -0.022803212
## 325 6.274241e-05 0.0744766041 0.0529968456 -0.0633718509 0.017799594
## 335 -1.267386e-02 -0.0368148235 0.0346350305 -0.0652365007 0.009460134
## 337 1.255188e-01 -0.0853752338 0.0545136708 -0.0409750731 0.102095074
## 339 8.883406e-02 0.1135508340 0.0596698658 -0.0906756612 0.028370534
## 341 9.934255e-02 0.0532060470 0.0079526063 0.0089310148 -0.039580322
## 343 1.075682e-01 0.0642706614 0.0806499884 -0.0266305116 -0.009281598
## 345 3.322303e-02 0.0657640876 0.0614760101 0.0040834789 0.099784750
## 349 4.931139e-02 0.0362123408 0.0029966326 0.0131668519 0.046410361
## 351 1.837907e-03 -0.0413933370 -0.0172153209 0.0289451498 -0.007773092
## 355 5.744958e-02 0.0567447473 -0.0022243414 -0.0022706196 -0.041849768
## 357 1.085377e-01 -0.0296131786 0.0159622867 0.0006397415 -0.025910298
## 359 2.249371e-03 0.0135654533 -0.0620617902 -0.0483900050 0.007223584
## 361 5.793627e-02 -0.0113346000 0.0645086273 0.0074855722 -0.017488689
## 363 2.903474e-02 -0.0489104537 0.0722055779 -0.0911615654 -0.058981379
## 365 3.112644e-02 -0.0211833688 0.0982670943 -0.0503657218 0.024407674
## 367 6.483538e-02 -0.0116047719 -0.0126074804 -0.0464031347 0.001144876
## 369 6.933552e-02 -0.0211734759 -0.0221441293 -0.0040355461 0.004973339
## 371 2.611183e-02 -0.0487946201 0.0464019798 0.0743253820 -0.012216054
## 375 6.969303e-02 0.0451909776 0.0056074125 -0.0427212693 -0.011543909
## 377 4.870799e-02 0.0026051498 -0.0780823063 0.0528037666 0.062948511
## 383 2.014325e-02 0.0272398453 -0.0328175628 -0.0235319927 -0.047856273
## 387 2.019562e-02 0.0411931103 -0.0047365826 -0.0642557485 0.057498939
## 391 4.640949e-02 -0.0093721357 -0.0553775838 -0.0043750806 0.135834776
## 395 6.814386e-02 -0.0111050290 -0.1760774087 0.0295960450 -0.015666977
## 397 -7.391643e-03 0.0191283000 0.1343300655 -0.0148750377 0.047081187
## 399 6.483753e-02 0.0375983411 -0.0533530862 0.0176457988 0.042067013
## 401 -5.718155e-03 -0.0334811322 0.0516806313 0.0115417375 0.055828684
## 403 2.526138e-03 0.0249406496 0.0105362688 0.0304874780 0.071629631
## 405 7.965271e-02 0.0225919496 -0.0306544093 0.1812898287 0.010641807
## 407 4.617797e-02 0.0415588771 -0.0531769606 0.0444436717 -0.026183504
## 411 2.637397e-02 0.0007046988 -0.0157991312 -0.0119747539 0.023340865
## 415 1.462434e-02 0.0261518169 0.0800144895 -0.0161289769 -0.057461997
## 416 3.626482e-02 0.0213718633 -0.0106589064 0.0132273527 -0.033081768
## 419 4.735293e-02 -0.0302424614 0.0569923706 -0.0494777845 -0.023715566
## 421 -1.120430e-02 -0.0369379590 0.0590018038 0.0071249207 -0.009928880
## 424 2.909995e-03 -0.0447503998 0.0956503755 -0.0376494487 -0.010704090
## 427 2.293036e-02 -0.0332331323 -0.0693080727 -0.0462158653 -0.037284217
## 430 2.923890e-02 0.0560240187 -0.0493517527 0.0492322922 -0.056022801
## 432 6.896127e-03 -0.1054935535 -0.0476792652 -0.0543933416 0.031452910
## 469 1.572445e-02 -0.0290754741 -0.0490645623 0.0206798770 0.009287034
## 473 2.490237e-02 -0.0256404392 0.0622018830 0.0237053589 -0.121798795
## 481 1.519145e-02 0.0153362053 -0.0968900143 -0.0106988296 -0.047938014
## 492 -1.059154e-02 0.0174756953 -0.1325241052 -0.0209904039 0.073459226
## 494 8.528737e-02 -0.0548694644 0.0595730163 -0.0669966112 -0.004208112
## 495 3.289226e-02 -0.0002961257 0.0338911700 0.0464773474 0.073089765
## 501 2.564365e-02 0.0131471115 -0.0776240039 0.0378466446 -0.092327648
## 502 6.690823e-02 -0.0700962866 -0.0980732762 0.0257056008 -0.001923039
## 503 -1.051120e-02 0.0105533494 0.0673815371 -0.0196366869 -0.027757784
## 505 1.908324e-02 0.0070787646 0.0419117257 -0.0307406206 -0.009544685
## 506 -4.042086e-03 0.0146880695 0.0412779692 -0.1262588973 -0.050614682
## 507 -7.187594e-03 -0.0137303178 -0.0113858931 0.0328289901 -0.026077592
## 513 8.551080e-04 -0.0308006602 0.0524026270 -0.0093605435 0.050593279
## 514 8.453101e-02 -0.0668801782 -0.0774918499 0.0253038340 0.015445327
## 515 4.721669e-02 -0.0646724641 0.0581003075 -0.0210979437 -0.017443775
## 516 6.716529e-03 0.0103309022 -0.0294207185 0.0934506403 0.055547851
## 519 9.271568e-02 0.0187091071 -0.0203513939 -0.0466987590 -0.030029676
## 520 1.985842e-02 -0.0580697356 -0.0415379311 -0.0653831610 -0.013856185
## 521 3.459253e-02 0.0287007720 -0.0357438237 -0.0415151327 -0.079645804
## 522 7.401375e-02 0.0090500409 -0.0085750641 -0.0466641192 -0.074968792
## 525 5.882945e-02 -0.0246934453 0.0018866239 -0.0301416718 -0.012248411
## 529 -7.190881e-03 -0.0243903315 -0.0130891655 -0.0032396209 0.012142193
## 530 4.415586e-02 -0.1124933570 -0.0230788516 0.0008391465 -0.082738174
## 531 8.226693e-02 0.0215033941 0.0762544348 0.0110534831 0.013786347
## 533 3.460916e-02 -0.0624104481 -0.0777998329 -0.0274049917 -0.047748814
## 534 -1.406719e-02 -0.0119759175 0.0179668764 -0.0219778955 0.007074690
## 535 5.041973e-02 -0.0538192131 0.0335646050 0.0111293486 0.044552537
## 538 -4.088040e-03 0.0213594224 -0.0380110149 0.0167850716 0.060801574
## 541 1.340387e-02 -0.0393468582 -0.0140180662 -0.0460975952 0.024692975
## 542 4.500933e-02 -0.0923230996 -0.0142350636 0.0929730215 -0.202270563
## 543 -5.796381e-03 -0.0977749625 -0.1288973417 -0.0057251191 0.060046790
## 544 3.981867e-02 -0.0318620455 0.0054399123 0.0219541310 -0.031963347
## 548 7.229131e-03 0.0101006248 -0.0146923537 0.0148165151 0.014124961
## 549 -4.932343e-03 0.0274901355 0.0946943637 0.0896404112 0.080824649
## 550 8.741610e-03 0.0198233943 0.0399629027 0.0351037762 0.147911104
## 555 2.248631e-02 -0.0743060994 0.0257046032 -0.0014397733 -0.043854330
## 556 4.417442e-02 0.0303604047 0.0175003494 0.0059846229 -0.038826452
## 559 -1.292251e-02 -0.0795947479 0.0260405854 -0.0481920156 -0.008741425
## 560 -1.248343e-02 -0.0416797597 0.0052974824 -0.0492245941 0.040540276
## 563 -3.179696e-03 0.0193075707 -0.0199029483 -0.0299960991 -0.018075325
## 564 1.799686e-02 -0.1032399027 0.0394360319 -0.0357772855 -0.008881934
## 565 3.607650e-02 0.0047377076 -0.0007163052 -0.0373124650 -0.090728387
## 569 2.064519e-02 -0.0144839254 0.0689686269 0.0065726163 -0.042739632
## 570 1.045695e-02 0.0244905338 -0.0963030079 -0.0350429253 0.018103488
## 571 2.214197e-02 -0.0518247427 -0.0207337086 0.1425268189 -0.058927636
It appears that ‘small’ C. r. preussi may be mostly females; at this point, we decided to split everything up by sex for this analysis.
m.reich=reich[reich$Sex=="Male",]
f.reich=reich[reich$Sex=="Female",]
There are 263 males and 120 females. This bias is likely (in part) due to the difficulty of identifying and aging female Cinnyris sunbirds.
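Splitting by sex can also be done in one step with `split`. The sketch below uses a hypothetical data frame (`toy`) whose `Sex` factor carries unused levels, as the real data does after cleaning. Dropping those levels first is a suggestion, not part of the original code.

```r
# Hypothetical data with leftover, unused factor levels
toy <- data.frame(
  Sex = factor(c("Male", "Female", "Male", "Male", "Female"),
               levels = c("", "Femae", "Female", "Male", "Unknown")),
  wing = c(57, 52, 56, 58, 51)
)

toy$Sex <- droplevels(toy$Sex)  # remove levels with no individuals
by_sex <- split(toy, toy$Sex)   # list with one data frame per sex
counts <- vapply(by_sex, nrow, integer(1))

print(counts)
```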
Recalculate PCA values for males only.
summary(m.reich)
## Species Subspecies Collection Catalog
## regius : 0 genderuensis: 13 NHMUK :95 139779 : 1
## reichenowi:263 parvirostris: 27 ZFMK :49 145115 : 1
## preussi :116 AMNH :40 145818 : 1
## regius : 0 MNMH :24 145819 : 1
## reichenowi :104 RMCA :19 145882 : 1
## Unknown : 3 CM :16 145967 : 1
## (Other):20 (Other):257
## Locality2 Sex Right.wing.chord Tail.length
## Mt Cameroon : 40 : 0 Min. :49.00 Min. :35.00
## Bioko : 27 Femae : 0 1st Qu.:55.00 1st Qu.:40.00
## Bamenda Highlands: 25 Female : 0 Median :57.00 Median :42.00
## Rwenzori Mts : 20 Male :263 Mean :56.78 Mean :41.78
## Mt Manengouba : 19 Unknown: 0 3rd Qu.:59.00 3rd Qu.:44.00
## Mt Oku : 19 Max. :63.00 Max. :52.00
## (Other) :113
## Culmen.length Bill.depth..base.of.feathers.on.mandible.
## Min. :12.29 Min. :1.870
## 1st Qu.:14.48 1st Qu.:2.775
## Median :16.26 Median :2.920
## Mean :16.15 Mean :2.903
## 3rd Qu.:17.59 3rd Qu.:3.050
## Max. :22.13 Max. :3.550
##
## Bill.width..base.of.feathers.on.maxilla. Left.Tarsus
## Min. :3.340 Min. :10.34
## 1st Qu.:4.360 1st Qu.:12.30
## Median :4.610 Median :13.29
## Mean :4.632 Mean :13.23
## 3rd Qu.:4.920 3rd Qu.:14.12
## Max. :5.730 Max. :16.49
##
We can now do PCAs for these data.
## PC1
## 0.5299903
## PC2
## 0.1582757
## PC3
## 0.1357479
## PC4
## 0.07996781
## PC5
## 0.06152048
## PC6
## 0.03449792
Unsurprisingly, the PCA contributions for males only are almost identical to those for the whole dataset.
## [1] "For Right.wing.chord: PC1: 0.189"
## [1] "For Tail.length: PC1: 0.149"
## [1] "For Culmen.length: PC1: 0.209"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC1: 0.114"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC1: 0.19"
## [1] "For Left.Tarsus: PC1: 0.15"
## [1] "For Right.wing.chord: PC2: 0.0238"
## [1] "For Tail.length: PC2: 0.171"
## [1] "For Culmen.length: PC2: 0.0825"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC2: 0.367"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC2: 0.0338"
## [1] "For Left.Tarsus: PC2: 0.322"
## [1] "For Right.wing.chord: PC3: 0.171"
## [1] "For Tail.length: PC3: 0.312"
## [1] "For Culmen.length: PC3: 0.0752"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC3: 0.239"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC3: 0.143"
## [1] "For Left.Tarsus: PC3: 0.0597"
biplot(rda.x)
Again, the biplot and contributions are similar to those for all individuals.
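The PCA chunk itself is hidden in this appendix, so the following is only a minimal sketch of how the proportion of variance per axis and the variable loadings could be obtained. It uses base R `prcomp` on made-up data; the column names, the simulated values, and the use of `prcomp` (rather than whatever produced `rda.x`) are all assumptions for illustration.

```r
# Synthetic stand-in for three morphometric characters (values are made up).
set.seed(1)
size <- rnorm(50)  # shared "body size" signal driving all three characters
morph <- data.frame(
  wing   = 56 + 2.0 * size + rnorm(50, sd = 1),
  tail   = 42 + 2.5 * size + rnorm(50, sd = 2),
  culmen = 16 + 1.5 * size + rnorm(50, sd = 1)
)

# Centre and scale so that characters measured on different scales are comparable.
pca <- prcomp(morph, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component.
prop.var <- pca$sdev^2 / sum(pca$sdev^2)
names(prop.var) <- colnames(pca$x)
print(round(prop.var, 3))

# Loadings of each character on the first two axes.
print(round(pca$rotation[, 1:2], 3))
```

The proportions always sum to one and are sorted from the largest axis to the smallest, which is the same layout as the PC1 to PC6 values printed above.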
#Plot reichenowi only, flawed loadings
a=ggplot(x3[which(x3$Species=="reichenowi"),],aes(x=PC1,y=PC2,colour=Subspecies))
b=geom_point()
c=theme_classic()
d=stat_ellipse()
print(a+b+c+d)
## Too few points to calculate an ellipse
## Warning: Removed 1 row(s) containing missing values (geom_path).
colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
names(colorset)=c("reichenowi","preussi","genderuensis","parvirostris","Unknown")
colScale=scale_color_manual(name="grp",values=colorset)
#Plot reichenowi only, males
a=ggplot(x3[which(x3$Species=="reichenowi"),],aes(x=PC1,y=PC2,
colour=Subspecies))
b=geom_point(size=1.5)
c=theme(panel.background = element_rect(fill="white",color = "grey50"),
axis.title.x = element_text(size=20),
axis.title.y = element_text(size=20),
axis.text.x = element_text(size=15),
axis.text.y = element_text(size=15),
legend.title = element_blank(),
legend.text = element_text(size=15))
d=stat_ellipse()
e=colScale
f1=ylab("PC 2")
f2=xlab("PC 1")
plotx=a+b+c+d+e+f1+f2
print(plotx)
## Too few points to calculate an ellipse
## Warning: Removed 1 row(s) containing missing values (geom_path).
x99=x3[x3$Subspecies=="genderuensis",]
x99[order(x99$PC1),c(2:6,13)]
## Subspecies Collection Catalog Locality2 Sex PC1
## 34 genderuensis RMCA 75-3-A-438 Adamawa Male -0.006837937
## 25 genderuensis MNMH 1971.637 Yaounde Male 0.002249162
## 35 genderuensis RMCA 75-3-A-451 Adamawa Male 0.016450666
## 44 genderuensis ZMB 75/80 Yaounde Male 0.019065758
## 33 genderuensis NHMUK 1940.2.8.63 Tibati Male 0.021588830
## 31 genderuensis NHMUK 1922.11.25.216 Tibati Male 0.022491253
## 45 genderuensis ZMB 75/99 Adamawa Male 0.028710801
## 37 genderuensis RMCA 75-3-A-522 Adamawa Male 0.036313941
## 42 genderuensis ZMB 49/252 Genderu Male 0.039492812
## 26 genderuensis MNMH 1983.62 Adamawa Male 0.050241655
## 28 genderuensis MNMH 1994.1401 Adamawa Male 0.064582894
## 41 genderuensis RMCA 75-3-A-727 Adamawa Male 0.072250196
## 39 genderuensis RMCA 75-3-A-660 Adamawa Male 0.092838521
x3[x3$Subspecies=="Unknown",1:5]
## Species Subspecies Collection Catalog Locality2
## 570 reichenowi Unknown MNMH 2005.995 Unknown
## 571 reichenowi Unknown ZMB 2000/7987 Unknown
## 572 reichenowi Unknown ZMB 75/79 Bangwa Highlands
The most extreme C. r. genderuensis individual is a bird collected at Tello, Cameroon (RMCA 75-3-A-438), as part of a larger series at the RMCA. There are several birds that appear to be at the “edge” of C. r. preussi morphometric space.
Interestingly, all three “Unknown” individuals are towards the genderuensis side of the spectrum. But what about the C. r. preussi that are towards the extreme?
x3[x3$Subspecies=="preussi"&x3$PC1>0,c(1:5,13)]
## Species Subspecies Collection Catalog Locality2
## 93 reichenowi preussi AMNH 415793 Bamenda Highlands
## 101 reichenowi preussi AMNH 688997 Bamenda Highlands
## 103 reichenowi preussi AMNH 688998 Bamenda Highlands
## 121 reichenowi preussi MNMH 1994.1405 Bamenda Highlands
## 149 reichenowi preussi NHMUK 1911.12.23.4230 Mt Cameroon
## 152 reichenowi preussi NHMUK 1911.12.23.4348 Mt Manengouba
## 193 reichenowi preussi NHMUK 1966.16.2439 Mt Manengouba
## 194 reichenowi preussi NHMUK 1966.16.2440 Mt Manengouba
## 297 reichenowi preussi ZMB 75/82 Bamenda Highlands
## PC1
## 93 0.013220157
## 101 0.025077182
## 103 0.008237055
## 121 0.006620631
## 149 0.008033713
## 152 0.010611211
## 193 0.013690054
## 194 0.026974751
## 297 0.004422322
Interestingly, some of the most extreme preussi individuals come from the Manengouba area. These may require a little more research. The other “intermediate” birds are from the edge of the Bamenda Highlands; they may be genuinely intermediate or may be misallocated to subspecies. However, we are leaving the assignments as they are, based on locality data.
We can also look character by character and visualize how much each character differs among these subspecies.
colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
ssps.x=c("reichenowi","preussi","genderuensis","parvirostris","Unknown")
names(colorset)=c("reichenowi","preussi","genderuensis","parvirostris","Unknown")
colScale=scale_color_manual(name="grp",values=colorset)
for(i in 7:12){
  #Columns 7-12 hold the six morphometric characters
  ave=colnames(m.reich)[i]
  print(paste0("Working: ",ave))
  #Compare each subspecies with the next one listed in ssps.x
  for(k in 1:(length(ssps.x)-1)){
    ssp1=ssps.x[k]
    ssp2=ssps.x[k+1]
    ssp1.x=m.reich[which(m.reich$Subspecies==ssp1),ave]
    ssp2.x=m.reich[which(m.reich$Subspecies==ssp2),ave]
    mu1=mean(ssp1.x)
    mu2=mean(ssp2.x)
    sd1=sd(ssp1.x)
    sd2=sd(ssp2.x)
    n1=length(ssp1.x)
    n2=length(ssp2.x)
    print(paste0("Summary stats: ",
                 ssp1," vs. ",ssp2))
    print(paste0(ssp1,": ","Avg: ",round(mu1,2)," SD: ",round(sd1,2)," #: ",n1))
    print(paste0(ssp2,": ","Avg: ",round(mu2,2)," SD: ",round(sd2,2)," #: ",n2))
    #Absolute difference of the means, as a percentage of the second mean
    percent.diff=round(abs(((mu1/mu2)*100)-100),2)
    print(paste0("Difference: ",percent.diff,"%"))
  }
  #Boxplot of the character by subspecies
  a=ggplot(m.reich,aes(y=m.reich[,i],x=Subspecies))
  b=geom_boxplot()
  c=theme_classic()
  #print() also echoes the character name to the console
  d=ylab(print(ave))
  #No fill aesthetic is mapped above, so this scale does not alter the boxplots
  e=scale_color_manual(values=colorset,aesthetics = c("fill"))
  print(a+b+c+d+e)
}
## [1] "Working: Right.wing.chord"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 54.47 SD: 1.93 #: 104"
## [1] "preussi: Avg: 58.68 SD: 1.6 #: 116"
## [1] "Difference: 7.17%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 58.68 SD: 1.6 #: 116"
## [1] "genderuensis: Avg: 56 SD: 1.96 #: 13"
## [1] "Difference: 4.79%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 56 SD: 1.96 #: 13"
## [1] "parvirostris: Avg: 58 SD: 2.4 #: 27"
## [1] "Difference: 3.45%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 58 SD: 2.4 #: 27"
## [1] "Unknown: Avg: 55.67 SD: 3.06 #: 3"
## [1] "Difference: 4.19%"
## [1] "Right.wing.chord"
## [1] "Working: Tail.length"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 40.05 SD: 2.77 #: 104"
## [1] "preussi: Avg: 43.38 SD: 2.59 #: 116"
## [1] "Difference: 7.68%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 43.38 SD: 2.59 #: 116"
## [1] "genderuensis: Avg: 40.92 SD: 2.1 #: 13"
## [1] "Difference: 6%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 40.92 SD: 2.1 #: 13"
## [1] "parvirostris: Avg: 42.33 SD: 3.13 #: 27"
## [1] "Difference: 3.33%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 42.33 SD: 3.13 #: 27"
## [1] "Unknown: Avg: 39 SD: 1 #: 3"
## [1] "Difference: 8.55%"
## [1] "Tail.length"
## [1] "Working: Culmen.length"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 14.27 SD: 0.81 #: 104"
## [1] "preussi: Avg: 17.78 SD: 1.21 #: 116"
## [1] "Difference: 19.72%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 17.78 SD: 1.21 #: 116"
## [1] "genderuensis: Avg: 15.31 SD: 0.72 #: 13"
## [1] "Difference: 16.1%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 15.31 SD: 0.72 #: 13"
## [1] "parvirostris: Avg: 16.86 SD: 0.63 #: 27"
## [1] "Difference: 9.18%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 16.86 SD: 0.63 #: 27"
## [1] "Unknown: Avg: 15.03 SD: 0.43 #: 3"
## [1] "Difference: 12.17%"
## [1] "Culmen.length"
## [1] "Working: Bill.depth..base.of.feathers.on.mandible."
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 2.83 SD: 0.22 #: 104"
## [1] "preussi: Avg: 2.98 SD: 0.19 #: 116"
## [1] "Difference: 5.3%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 2.98 SD: 0.19 #: 116"
## [1] "genderuensis: Avg: 2.71 SD: 0.31 #: 13"
## [1] "Difference: 10.1%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 2.71 SD: 0.31 #: 13"
## [1] "parvirostris: Avg: 2.96 SD: 0.16 #: 27"
## [1] "Difference: 8.34%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 2.96 SD: 0.16 #: 27"
## [1] "Unknown: Avg: 2.79 SD: 0.24 #: 3"
## [1] "Difference: 5.87%"
## [1] "Bill.depth..base.of.feathers.on.mandible."
## [1] "Working: Bill.width..base.of.feathers.on.maxilla."
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 4.33 SD: 0.28 #: 104"
## [1] "preussi: Avg: 4.89 SD: 0.34 #: 116"
## [1] "Difference: 11.53%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 4.89 SD: 0.34 #: 116"
## [1] "genderuensis: Avg: 4.39 SD: 0.22 #: 13"
## [1] "Difference: 11.41%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 4.39 SD: 0.22 #: 13"
## [1] "parvirostris: Avg: 4.86 SD: 0.24 #: 27"
## [1] "Difference: 9.78%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 4.86 SD: 0.24 #: 27"
## [1] "Unknown: Avg: 4.25 SD: 0.29 #: 3"
## [1] "Difference: 14.45%"
## [1] "Bill.width..base.of.feathers.on.maxilla."
## [1] "Working: Left.Tarsus"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 12.38 SD: 0.91 #: 104"
## [1] "preussi: Avg: 13.82 SD: 1.14 #: 116"
## [1] "Difference: 10.4%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 13.82 SD: 1.14 #: 116"
## [1] "genderuensis: Avg: 13 SD: 0.97 #: 13"
## [1] "Difference: 6.29%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 13 SD: 0.97 #: 13"
## [1] "parvirostris: Avg: 14.14 SD: 1 #: 27"
## [1] "Difference: 8.07%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 14.14 SD: 1 #: 27"
## [1] "Unknown: Avg: 12.93 SD: 0.74 #: 3"
## [1] "Difference: 9.41%"
## [1] "Left.Tarsus"
It looks like the most extreme divergences (in male sunbirds) are for bill length and bill width, which makes sense as montane Cameroonian birds seem “big billed” in the hand.
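As a worked check of the "Difference" values, the calculation in `percent.diff` can be repeated by hand. The sketch below wraps the same formula in a small helper (`percent_diff` is a hypothetical name, not part of the original code) and feeds it the rounded male wing-chord means printed above. Because the inputs are rounded, other characters may differ from the printed output in the last digit.

```r
# Relative difference between two group means, expressed as a percentage of
# the SECOND mean (same formula as percent.diff in the loop above).
percent_diff <- function(mu1, mu2) {
  round(abs((mu1 / mu2) * 100 - 100), 2)
}

# Rounded male wing-chord means: reichenowi 54.47, preussi 58.68.
d.fwd <- percent_diff(54.47, 58.68)
print(d.fwd)  # 7.17, matching the printed reichenowi vs. preussi difference

# The measure is not symmetric: swapping the groups changes the denominator.
d.rev <- percent_diff(58.68, 54.47)
print(d.rev)  # 7.73
```

Since the result depends on which subspecies is used as the denominator, the order of the names in `ssps.x` affects the reported percentages.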
a=ggplot(m.reich,aes(x=Culmen.length,y=Bill.width..base.of.feathers.on.maxilla.,colour=Subspecies))
b=geom_point()
c=theme_classic()
d=stat_ellipse()
print(a+b+c+d)
## Too few points to calculate an ellipse
## Warning: Removed 1 row(s) containing missing values (geom_path).
Bill information separates out east from west extremely well except for the intermediary birds of C. r. genderuensis and a few extreme individuals.
We can perform iterative Wilcoxon rank-sum tests of the data to understand how distinct these individual variables are for each population.
#colnames(x6)
morphocols=7:12
wilcox.sunbird=function(input,ssp1,ssp2,morphocols){
  #Define groups
  w1=input[which(input$Subspecies==ssp1),]
  w2=input[which(input$Subspecies==ssp2),]
  print(paste0("COMPARISONS OF: ",ssp1, " & ",ssp2))
  for(i in morphocols){
    print(paste0("For ",colnames(input[i]),":"))
    #Test each character
    ##Define vector
    a=w1[,i]
    b=w2[,i]
    #perform test of normality
    ##Null hypothesis is from normal distribution
    a.shapiro=shapiro.test(a)
    if(a.shapiro$p.value>0.05){
      print(paste0("For ",ssp1,": failure to reject normality."))
    }else{
      print(paste0("For ",ssp1,": NON NORMAL."))
    }
    b.shapiro=shapiro.test(b)
    if(b.shapiro$p.value>0.05){
      print(paste0("For ",ssp2,": failure to reject normality."))
    }else{
      print(paste0("For ",ssp2,": NON NORMAL."))
    }
    #Wilcoxon test
    a.b.wilco=wilcox.test(x=a,y=b)
    print(a.b.wilco)
  }
}
We can now run this test function across the data set.
#Subspecies:
##genderuensis
##preussi
##parvirostris
##reichenowi
wilcox.sunbird(input=m.reich,ssp1="genderuensis",ssp2="preussi",morphocols=morphocols)
## [1] "COMPARISONS OF: genderuensis & preussi"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 187.5, p-value = 6.919e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 349, p-value = 0.001428
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 57.5, p-value = 5.165e-08
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: NON NORMAL."
## [1] "For preussi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 300, p-value = 0.0003862
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 144, p-value = 1.849e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 422.5, p-value = 0.009604
## alternative hypothesis: true location shift is not equal to 0
wilcox.sunbird(input=m.reich,ssp1="genderuensis",ssp2="reichenowi",morphocols=morphocols)
## [1] "COMPARISONS OF: genderuensis & reichenowi"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 993, p-value = 0.005256
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 827, p-value = 0.1885
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 1129.5, p-value = 8.53e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: NON NORMAL."
## [1] "For reichenowi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 543.5, p-value = 0.2522
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 748, p-value = 0.5351
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 922, p-value = 0.03323
## alternative hypothesis: true location shift is not equal to 0
wilcox.sunbird(input=m.reich,ssp1="genderuensis",ssp2="parvirostris",morphocols=morphocols)
## [1] "COMPARISONS OF: genderuensis & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 76, p-value = 0.003804
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 131, p-value = 0.1999
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 16, p-value = 4.394e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 72, p-value = 0.002918
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 27.5, p-value = 2.043e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 70, p-value = 0.002428
## alternative hypothesis: true location shift is not equal to 0
wilcox.sunbird(input=m.reich,ssp1="preussi",ssp2="reichenowi",morphocols=morphocols)
## [1] "COMPARISONS OF: preussi & reichenowi"
## [1] "For Right.wing.chord:"
## [1] "For preussi: NON NORMAL."
## [1] "For reichenowi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 11486, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 9792, p-value = 1.097e-15
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 11982, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 8364, p-value = 7.527e-07
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 11002, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 10136, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
wilcox.sunbird(input=m.reich,ssp1="preussi",ssp2="parvirostris",morphocols=morphocols)
## [1] "COMPARISONS OF: preussi & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For preussi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 1876, p-value = 0.1044
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 1888, p-value = 0.09505
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 2341.5, p-value = 6.395e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 1708, p-value = 0.4653
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 1577, p-value = 0.9568
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 1292, p-value = 0.1583
## alternative hypothesis: true location shift is not equal to 0
wilcox.sunbird(input=m.reich,ssp1="reichenowi",ssp2="parvirostris",morphocols=morphocols)
## [1] "COMPARISONS OF: reichenowi & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For reichenowi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 320, p-value = 4.59e-10
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For reichenowi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 831.5, p-value = 0.001063
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 25, p-value = 4.372e-15
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 942, p-value = 0.008626
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For reichenowi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 178, p-value = 3.085e-12
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 277, p-value = 1.456e-10
## alternative hypothesis: true location shift is not equal to 0
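The printed tests above are long, and it can be easier to compare characters when the W statistics and p-values sit in one table. The following is a minimal sketch of that idea, not the function used in this appendix: `wilcox_table`, the toy data frame, and its group and column names are all hypothetical, and the data are simulated.

```r
set.seed(42)
# Synthetic stand-in for m.reich: two made-up groups and two made-up characters.
toy <- data.frame(
  Subspecies = rep(c("A", "B"), each = 30),
  wing   = c(rnorm(30, mean = 54, sd = 2), rnorm(30, mean = 58, sd = 2)),
  tarsus = c(rnorm(30, mean = 13, sd = 1), rnorm(30, mean = 13, sd = 1))
)

# Run one Wilcoxon rank-sum test per character and return one row per test.
wilcox_table <- function(input, ssp1, ssp2, cols) {
  rows <- lapply(cols, function(cl) {
    a <- input[input$Subspecies == ssp1, cl]
    b <- input[input$Subspecies == ssp2, cl]
    wt <- wilcox.test(x = a, y = b)
    data.frame(character = cl,
               W = unname(wt$statistic),
               p.value = wt$p.value)
  })
  do.call(rbind, rows)
}

res <- wilcox_table(toy, "A", "B", c("wing", "tarsus"))
print(res)
```

Selecting columns by name rather than by position (as in `morphocols=7:12`) is a design choice in this sketch; it keeps the table readable if the column order of the input changes.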
We can perform a linear discriminant analysis (LDA) with leave-one-out cross-validation to see how predictable these subspecies are.
#head(x6)
x7=x3[x3$Subspecies!='Unknown',]
x7$Subspecies[which(x7$Subspecies=="parvirostris")]="preussi"
#Remove 'ghost' groups
x7$Subspecies=as.character(x7$Subspecies)
x7$Subspecies=as.factor(x7$Subspecies)
lda.x2=lda(Subspecies~PC1+PC2+PC3,data=x7,CV=T)
#print(lda.x2)
summary(lda.x2)
## Length Class Mode
## class 260 factor numeric
## posterior 780 -none- numeric
## terms 3 terms call
## call 4 -none- call
## xlevels 0 -none- list
#Check predictions
ct=table(x7$Subspecies,lda.x2$class)
print(ct)
##
## genderuensis preussi reichenowi
## genderuensis 1 2 10
## preussi 0 136 7
## reichenowi 0 2 102
diag(prop.table(ct,1))
## genderuensis preussi reichenowi
## 0.07692308 0.95104895 0.98076923
sum(diag(prop.table(ct)))
## [1] 0.9192308
We can also do a test of only genderuensis and preussi.
#head(x6)
x7=x3[x3$Subspecies!='Unknown',]
xy=x7[x7$Subspecies!='reichenowi',]
xy$Subspecies[which(xy$Subspecies=="parvirostris")]="preussi"
#Remove 'ghost' groups
xy$Subspecies=as.character(xy$Subspecies)
xy$Subspecies=as.factor(xy$Subspecies)
lda.x2=lda(Subspecies~PC1+PC2+PC3,data=xy,CV=T)
#print(lda.x2)
summary(lda.x2)
## Length Class Mode
## class 156 factor numeric
## posterior 312 -none- numeric
## terms 3 terms call
## call 4 -none- call
## xlevels 0 -none- list
#Check predictions
ct=table(xy$Subspecies,lda.x2$class)
print(ct)
##
## genderuensis preussi
## genderuensis 7 6
## preussi 0 143
diag(prop.table(ct,1))
## genderuensis preussi
## 0.5384615 1.0000000
sum(diag(prop.table(ct)))
## [1] 0.9615385
Cross-validated assignment is 100% successful for preussi (143 of 143, here including parvirostris) but only about 54% successful for genderuensis (7 of 13). This may be related to the limited representation of genderuensis.
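For readers unfamiliar with the `CV` argument, the sketch below repeats the same pattern (cross-validated `lda`, confusion table, per-group and overall accuracy) on the `iris` dataset that ships with R. The dataset and the object names `fit`, `per.group`, and `overall` are stand-ins chosen for illustration; nothing here uses the sunbird data.

```r
library(MASS)  # provides lda()

# With CV=TRUE, lda() performs leave-one-out cross-validation and returns a
# list holding the predicted class and posterior probabilities for each row,
# rather than a fitted model object.
fit <- lda(Species ~ ., data = iris, CV = TRUE)

# Rows are the true groups, columns are the cross-validated predictions.
ct <- table(observed = iris$Species, predicted = fit$class)
print(ct)

# Proportion of each true group that was assigned correctly.
per.group <- diag(prop.table(ct, 1))
print(per.group)

# Overall proportion of individuals assigned correctly.
overall <- sum(diag(prop.table(ct)))
print(overall)
```

This is why `summary(lda.x2)` above lists only `class`, `posterior`, and a few bookkeeping elements: the cross-validated result has no discriminant coefficients to summarize.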
#summary(xy)
xypreuss=xy[xy$Subspecies=='preussi',]
xygend=xy[xy$Subspecies=='genderuensis',]
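#One row per iteration; the loop below runs 1000 times
jack=as.data.frame(matrix(nrow=1000,ncol=3))
colnames(jack)=c('PREUSS','GEND','SUM')
for(i in 1:1000){
  #Randomly subsample preussi down to the genderuensis sample size
  rows=sample(nrow(xypreuss),nrow(xygend))
  r.x=xypreuss[rows,]
  new.x=rbind(r.x,xygend)
  lda.x2=lda(Subspecies~PC1+PC2+PC3,data=new.x,CV=T)
  ct=table(new.x$Subspecies,lda.x2$class)
  #Proportion correct per group; genderuensis is the first factor level
  x.tab=diag(prop.table(ct,1))
  jack[i,2]=x.tab[1]
  jack[i,1]=x.tab[2]
  #Overall proportion correct
  jack[i,3]=sum(diag(prop.table(ct)))
}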
summary(jack)
## PREUSS GEND SUM
## Min. :0.6154 Min. :0.6923 Min. :0.6923
## 1st Qu.:0.8462 1st Qu.:0.8462 1st Qu.:0.8462
## Median :0.8462 Median :0.9231 Median :0.8846
## Mean :0.8695 Mean :0.9146 Mean :0.8920
## 3rd Qu.:0.9231 3rd Qu.:1.0000 3rd Qu.:0.9231
## Max. :1.0000 Max. :1.0000 Max. :1.0000
j.p=cbind(jack[,1],"preuss")
j.g=cbind(jack[,2],"gend")
j.s=cbind(jack[,3],"sum")
jack2=rbind(j.p,j.g)
jack2=as.data.frame(jack2)
colnames(jack2)=c("Value","Population")
jack2[,1]=as.numeric(as.character(jack2[,1]))
jack2[,2]=as.factor(jack2[,2])
summary(jack2[jack2$Population=='preuss',])
## Value Population
## Min. :0.6154 gend : 0
## 1st Qu.:0.8462 preuss:1000
## Median :0.8462
## Mean :0.8695
## 3rd Qu.:0.9231
## Max. :1.0000
summary(jack2[jack2$Population=='gend',])
## Value Population
## Min. :0.6923 gend :1000
## 1st Qu.:0.8462 preuss: 0
## Median :0.9231
## Mean :0.9146
## 3rd Qu.:1.0000
## Max. :1.0000
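On average, we correctly identify about 87% of preussi and 91% of genderuensis (means of 0.8695 and 0.9146 in the run shown above; exact values vary slightly between runs because the subsampling is random). This indicates that the two groups are separating.
A more informative metric is the overall proportion correct, shown below.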
j.s=as.data.frame(j.s)
colnames(j.s)=c("Value","Population")
j.s[,1]=as.numeric(as.character(j.s[,1]))
j.s[,2]=as.factor(j.s[,2])
summary(j.s)
## Value Population
## Min. :0.6923 sum:1000
## 1st Qu.:0.8462
## Median :0.8846
## Mean :0.8920
## 3rd Qu.:0.9231
## Max. :1.0000
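As shown above, subsampling preussi to the same sample size as genderuensis improves the recovery of genderuensis (from about 54% to about 91% correct), although the overall proportion correct is lower than in the unbalanced test. On average, even with just a few birds per group, we can identify which group they belong to with about 89% accuracy (mean of 0.892 in the run shown above).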
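Because the subsampling is random, the summaries above change slightly each time the document is knitted. A minimal sketch of one way to make such a loop repeatable is given below: it fixes the random seed and sizes the results vector to the number of iterations. The simulated groups, the seed value, and the names `big`, `small`, and `acc` are assumptions for illustration only.

```r
library(MASS)  # provides lda()

set.seed(2024)  # fix the random draws so that reruns give identical numbers

# Synthetic stand-ins: one well-sampled group and one poorly sampled group.
big   <- data.frame(grp = "big",   PC1 = rnorm(100, mean = 0), PC2 = rnorm(100, mean = 0))
small <- data.frame(grp = "small", PC1 = rnorm(15,  mean = 2), PC2 = rnorm(15,  mean = 1))

n.iter <- 200
acc <- numeric(n.iter)  # preallocate one slot per iteration

for (i in seq_len(n.iter)) {
  # Subsample the large group down to the size of the small group.
  keep <- sample(nrow(big), nrow(small))
  sub <- rbind(big[keep, ], small)
  sub$grp <- factor(sub$grp)
  # Leave-one-out cross-validated LDA on the balanced subsample.
  fit <- lda(grp ~ PC1 + PC2, data = sub, CV = TRUE)
  ct <- table(sub$grp, fit$class)
  acc[i] <- sum(diag(prop.table(ct)))  # overall proportion correct
}

print(summary(acc))
```

With a fixed seed, the reported mean accuracy could be quoted in the text without drifting between runs.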
We can also look at the accuracy of diagnosing only two separate groups, for the east and the west.
x8=x7[which(x7$Subspecies!='genderuensis'),]
#Remove 'ghost' group
x8$Subspecies=as.character(x8$Subspecies)
x8$Subspecies=as.factor(x8$Subspecies)
lda.x2=lda(Subspecies~PC1+PC2+PC3,data=x8,CV=T)
#print(lda.x2)
summary(lda.x2)
## Length Class Mode
## class 247 factor numeric
## posterior 741 -none- numeric
## terms 3 terms call
## call 4 -none- call
## xlevels 0 -none- list
#Check predictions
ct=table(x8$Subspecies,lda.x2$class)
print(ct)
##
## parvirostris preussi reichenowi
## parvirostris 0 26 1
## preussi 0 109 7
## reichenowi 0 2 102
diag(prop.table(ct,1))
## parvirostris preussi reichenowi
## 0.0000000 0.9396552 0.9807692
sum(diag(prop.table(ct)))
## [1] 0.854251
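With genderuensis removed, the cross-validated table above gives an overall accuracy of 85.4%, but this figure is depressed because parvirostris is still treated as a separate class in this run and is almost always assigned to preussi. If parvirostris birds assigned to preussi are counted as correct (that is, west versus east), 237 of 247 birds (about 96%) are placed in the correct region, so the two populations separate with over 95% accuracy based on morphological characters.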
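The east versus west accuracy can be checked directly from the confusion table printed above. The sketch below re-enters those counts by hand and collapses parvirostris and preussi into a single western group; the object names are hypothetical, and the calculation only re-scores the printed predictions rather than refitting the LDA with pooled groups.

```r
# Cross-validated confusion table copied from the output above
# (rows = true subspecies, columns = predicted subspecies).
ssp <- c("parvirostris", "preussi", "reichenowi")
ct <- matrix(c(0,  26,   1,
               0, 109,   7,
               0,   2, 102),
             nrow = 3, byrow = TRUE,
             dimnames = list(observed = ssp, predicted = ssp))

# Accuracy as printed, with parvirostris scored as its own class.
as.printed <- sum(diag(ct)) / sum(ct)
print(round(as.printed, 6))  # 0.854251

# Accuracy when any western bird assigned to any western class counts as correct.
west <- c("parvirostris", "preussi")
pooled <- (sum(ct[west, west]) + ct["reichenowi", "reichenowi"]) / sum(ct)
print(round(pooled, 4))  # 0.9595
```

Refitting the model with parvirostris relabelled as preussi could give slightly different counts, so the pooled value should be read as an approximation.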
summary(f.reich)
## Species Subspecies Collection Catalog
## regius : 0 genderuensis: 7 NHMUK :48 118562 : 1
## reichenowi:120 parvirostris:10 AMNH :19 139823 : 1
## preussi :61 ZFMK :16 145815 : 1
## regius : 0 MNMH :14 145817 : 1
## reichenowi :42 RMCA :12 146100 : 1
## Unknown : 0 CM : 5 1887.3.7.35: 1
## (Other): 6 (Other) :114
## Locality2 Sex Right.wing.chord Tail.length
## Mt Cameroon :29 : 0 Min. :45.00 Min. : 8.18
## Bamenda Highlands:12 Femae : 0 1st Qu.:51.00 1st Qu.:33.00
## Bioko :10 Female :120 Median :52.00 Median :35.00
## Rwenzori Mts :10 Male : 0 Mean :52.38 Mean :35.03
## Mt Manengouba : 7 Unknown: 0 3rd Qu.:54.25 3rd Qu.:37.00
## Mt Oku : 6 Max. :57.00 Max. :43.00
## (Other) :46
## Culmen.length Bill.depth..base.of.feathers.on.mandible.
## Min. :11.95 Min. :2.200
## 1st Qu.:13.89 1st Qu.:2.658
## Median :15.48 Median :2.780
## Mean :15.29 Mean :2.805
## 3rd Qu.:16.54 3rd Qu.:2.950
## Max. :19.74 Max. :3.510
##
## Bill.width..base.of.feathers.on.maxilla. Left.Tarsus
## Min. :3.200 Min. : 9.08
## 1st Qu.:4.228 1st Qu.:11.80
## Median :4.460 Median :12.77
## Mean :4.432 Mean :12.71
## 3rd Qu.:4.680 3rd Qu.:13.57
## Max. :5.320 Max. :15.43
##
This section repeats the above analyses for adult female sunbirds only.
## PC1
## 0.4556222
## PC2
## 0.1982091
## PC3
## 0.1423943
## PC4
## 0.08265578
## PC5
## 0.06668558
## PC6
## 0.05443294
Unsurprisingly, the PCA variance contributions for females only are almost identical to those for the whole dataset.
## [1] "For Right.wing.chord: PC1: 0.216"
## [1] "For Tail.length: PC1: 0.0813"
## [1] "For Culmen.length: PC1: 0.223"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC1: 0.106"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC1: 0.201"
## [1] "For Left.Tarsus: PC1: 0.172"
## [1] "For Right.wing.chord: PC2: -0.066"
## [1] "For Tail.length: PC2: -0.302"
## [1] "For Culmen.length: PC2: 0.079"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC2: -0.27"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC2: 0.0458"
## [1] "For Left.Tarsus: PC2: 0.237"
## [1] "For Right.wing.chord: PC3: 0.0627"
## [1] "For Tail.length: PC3: 0.35"
## [1] "For Culmen.length: PC3: 0.0216"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC3: -0.341"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC3: -0.132"
## [1] "For Left.Tarsus: PC3: 0.093"
biplot(rda.x)
Again, the biplot and loadings are similar to those for all individuals.
#Plot reichenowi only, flawed loadings
a=ggplot(x3[which(x3$Species=="reichenowi"),],aes(x=PC1,y=PC2,colour=Subspecies))
b=geom_point()
c=theme_classic()
d=stat_ellipse()
print(a+b+c+d)
We can also look at individual boxplots of the data to see how they behave for females.
for(i in 7:12){
ave=colnames(f.reich)[i]
print(paste0("Working: ",ave))
for(k in 1:(length(ssps.x)-1)){
ssp1=ssps.x[k]
ssp2=ssps.x[k+1]
ssp1.x=f.reich[which(f.reich$Subspecies==ssp1),ave]
ssp2.x=f.reich[which(f.reich$Subspecies==ssp2),ave]
mu1=mean(ssp1.x)
mu2=mean(ssp2.x)
sd1=sd(ssp1.x)
sd2=sd(ssp2.x)
n1=length(ssp1.x)
n2=length(ssp2.x)
print(paste0("Summary stats: ",
ssp1," vs. ",ssp2))
print(paste0(ssp1,": ","Avg: ",round(mu1,2)," SD: ",round(sd1,2)," #: ",n1))
print(paste0(ssp2,": ","Avg: ",round(mu2,2)," SD: ",round(sd2,2)," #: ",n2))
percent.diff=round(abs(((mu1/mu2)*100)-100),2)
print(paste0("Difference: ",percent.diff,"%"))
}
a=ggplot(f.reich,aes(y=f.reich[,i],x=Subspecies))
b=geom_boxplot()
c=theme_classic()
d=ylab(print(ave))
print(a+b+c+d)
}
## [1] "Working: Right.wing.chord"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 49.9 SD: 2.16 #: 42"
## [1] "preussi: Avg: 53.89 SD: 1.64 #: 61"
## [1] "Difference: 7.39%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 53.89 SD: 1.64 #: 61"
## [1] "genderuensis: Avg: 51.71 SD: 0.95 #: 7"
## [1] "Difference: 4.2%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 51.71 SD: 0.95 #: 7"
## [1] "parvirostris: Avg: 54 SD: 1.83 #: 10"
## [1] "Difference: 4.23%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 54 SD: 1.83 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Right.wing.chord"
## [1] "Working: Tail.length"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 34.19 SD: 2.65 #: 42"
## [1] "preussi: Avg: 35.92 SD: 4.36 #: 61"
## [1] "Difference: 4.82%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 35.92 SD: 4.36 #: 61"
## [1] "genderuensis: Avg: 34.71 SD: 2.29 #: 7"
## [1] "Difference: 3.48%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 34.71 SD: 2.29 #: 7"
## [1] "parvirostris: Avg: 33.4 SD: 2.07 #: 10"
## [1] "Difference: 3.93%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 33.4 SD: 2.07 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Tail.length"
## [1] "Working: Culmen.length"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 13.72 SD: 1.01 #: 42"
## [1] "preussi: Avg: 16.4 SD: 1.08 #: 61"
## [1] "Difference: 16.36%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 16.4 SD: 1.08 #: 61"
## [1] "genderuensis: Avg: 14.21 SD: 0.92 #: 7"
## [1] "Difference: 15.44%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 14.21 SD: 0.92 #: 7"
## [1] "parvirostris: Avg: 15.79 SD: 0.63 #: 10"
## [1] "Difference: 10.02%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 15.79 SD: 0.63 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Culmen.length"
## [1] "Working: Bill.depth..base.of.feathers.on.mandible."
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 2.7 SD: 0.22 #: 42"
## [1] "preussi: Avg: 2.87 SD: 0.22 #: 61"
## [1] "Difference: 5.75%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 2.87 SD: 0.22 #: 61"
## [1] "genderuensis: Avg: 2.83 SD: 0.25 #: 7"
## [1] "Difference: 1.24%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 2.83 SD: 0.25 #: 7"
## [1] "parvirostris: Avg: 2.81 SD: 0.22 #: 10"
## [1] "Difference: 0.79%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 2.81 SD: 0.22 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Bill.depth..base.of.feathers.on.mandible."
## [1] "Working: Bill.width..base.of.feathers.on.maxilla."
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 4.15 SD: 0.38 #: 42"
## [1] "preussi: Avg: 4.64 SD: 0.3 #: 61"
## [1] "Difference: 10.5%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 4.64 SD: 0.3 #: 61"
## [1] "genderuensis: Avg: 4.26 SD: 0.2 #: 7"
## [1] "Difference: 8.94%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 4.26 SD: 0.2 #: 7"
## [1] "parvirostris: Avg: 4.5 SD: 0.2 #: 10"
## [1] "Difference: 5.39%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 4.5 SD: 0.2 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Bill.width..base.of.feathers.on.maxilla."
## [1] "Working: Left.Tarsus"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 11.88 SD: 1.14 #: 42"
## [1] "preussi: Avg: 13.4 SD: 1.04 #: 61"
## [1] "Difference: 11.32%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 13.4 SD: 1.04 #: 61"
## [1] "genderuensis: Avg: 11.81 SD: 0.3 #: 7"
## [1] "Difference: 13.42%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 11.81 SD: 0.3 #: 7"
## [1] "parvirostris: Avg: 12.67 SD: 0.89 #: 10"
## [1] "Difference: 6.74%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 12.67 SD: 0.89 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Left.Tarsus"
It looks like the most extreme divergences in female sunbirds are in bill (culmen) length, followed by tarsus length and bill width (mean differences between adjacent groups of roughly 5-16%). There also appears to be more variation in wing chord than there is for males, at least to the naked eye.
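The percent differences printed by the loop are computed relative to the second group's mean, so the value depends on the order of the pair. The sketch below isolates that formula and compares it with a symmetric alternative; the function names and the symmetric version are assumptions added for illustration and are not used in the original script. The inputs are the rounded wing-chord means from the printed output, so the result is close to, but not necessarily identical to, the printed value, which uses unrounded means.

```r
# Percent difference as computed in the loop: relative to the second mean.
pct_diff <- function(mu1, mu2) abs((mu1 / mu2) * 100 - 100)

# Symmetric alternative (an assumption, not used in the original script):
# scale the absolute difference by the average of the two means.
pct_diff_sym <- function(mu1, mu2) abs(mu1 - mu2) / ((mu1 + mu2) / 2) * 100

# Rounded female wing-chord means from the printed output.
wing_reich <- 49.9
wing_preuss <- 53.89

forward <- pct_diff(wing_reich, wing_preuss)    # reichenowi vs. preussi
reverse <- pct_diff(wing_preuss, wing_reich)    # preussi vs. reichenowi
symmetric <- pct_diff_sym(wing_reich, wing_preuss)

print(round(c(forward = forward, reverse = reverse, symmetric = symmetric), 2))
```

Because the forward and reverse values differ, percent differences should be compared only between pairs listed in the same order.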
a=ggplot(f.reich,aes(x=Culmen.length,y=Bill.width..base.of.feathers.on.maxilla.,colour=Subspecies))
b=geom_point()
c=theme_classic()
d=stat_ellipse()
print(a+b+c+d)
Bill differences among females are less pronounced than among males.
We can perform iterative Wilcoxon rank-sum tests of the data to understand how distinct these individual variables are for each population.
#Subspecies:
##genderuensis
##preussi
##parvirostris
##reichenowi
wilcox.sunbird(input=f.reich,ssp1="genderuensis",ssp2="preussi",morphocols=morphocols)
## [1] "COMPARISONS OF: genderuensis & preussi"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 65.5, p-value = 0.00247
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 131.5, p-value = 0.09778
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 13, p-value = 5.426e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 214, p-value = 1
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 63.5, p-value = 0.002547
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 33.5, p-value = 0.0002916
## alternative hypothesis: true location shift is not equal to 0
wilcox.sunbird(input=f.reich,ssp1="genderuensis",ssp2="reichenowi",morphocols=morphocols)
## [1] "COMPARISONS OF: genderuensis & reichenowi"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 241, p-value = 0.006668
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 161.5, p-value = 0.6868
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 207.5, p-value = 0.08641
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 200.5, p-value = 0.1297
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 173.5, p-value = 0.4575
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 143, p-value = 0.9203
## alternative hypothesis: true location shift is not equal to 0
wilcox.sunbird(input=f.reich,ssp1="genderuensis",ssp2="parvirostris",morphocols=morphocols)
## [1] "COMPARISONS OF: genderuensis & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 11.5, p-value = 0.02102
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 43.5, p-value = 0.4275
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test
##
## data: a and b
## W = 3, p-value = 0.0007199
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 38, p-value = 0.8067
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 13.5, p-value = 0.0403
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test
##
## data: a and b
## W = 17, p-value = 0.08782
## alternative hypothesis: true location shift is not equal to 0
wilcox.sunbird(input=f.reich,ssp1="preussi",ssp2="reichenowi",morphocols=morphocols)
## [1] "COMPARISONS OF: preussi & reichenowi"
## [1] "For Right.wing.chord:"
## [1] "For preussi: NON NORMAL."
## [1] "For reichenowi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 2394, p-value = 5.359e-14
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For preussi: NON NORMAL."
## [1] "For reichenowi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 1848, p-value = 0.0001306
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 2450, p-value = 4.432e-15
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 1777.5, p-value = 0.0008702
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 2131.5, p-value = 1.164e-08
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 2137, p-value = 9.391e-09
## alternative hypothesis: true location shift is not equal to 0
wilcox.sunbird(input=f.reich,ssp1="preussi",ssp2="parvirostris",morphocols=morphocols)
## [1] "COMPARISONS OF: preussi & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For preussi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 288.5, p-value = 0.7875
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For preussi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 493, p-value = 0.001815
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 408, p-value = 0.09019
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 370, p-value = 0.2861
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 367, p-value = 0.3093
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 422.5, p-value = 0.05312
## alternative hypothesis: true location shift is not equal to 0
wilcox.sunbird(input=f.reich,ssp1="reichenowi",ssp2="parvirostris",morphocols=morphocols)
## [1] "COMPARISONS OF: reichenowi & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For reichenowi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 28, p-value = 2.051e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Tail.length:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 246, p-value = 0.4055
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Culmen.length:"
## [1] "For reichenowi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 31, p-value = 3.397e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 172.5, p-value = 0.3901
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 81, p-value = 0.002843
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "For Left.Tarsus:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: a and b
## W = 125, p-value = 0.04975
## alternative hypothesis: true location shift is not equal to 0
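The helper wilcox.sunbird() is defined earlier in this document and is not reproduced in this section. Judging only from its printed output, it appears to report a normality check for each group and then a Wilcoxon rank-sum test for each measurement. The sketch below is a hypothetical stand-in written under that assumption: the function name `pairwise_tests`, its arguments, the use of shapiro.test(), and the simulated data are all illustrative, not the original implementation.

```r
# Hypothetical helper: NOT the original wilcox.sunbird().
pairwise_tests <- function(input, group_col, g1, g2, cols, alpha = 0.05) {
  rows <- lapply(cols, function(v) {
    a <- input[input[[group_col]] == g1, v]
    b <- input[input[[group_col]] == g2, v]
    # Shapiro-Wilk test: p < alpha suggests a departure from normality.
    normal_a <- shapiro.test(a)$p.value >= alpha
    normal_b <- shapiro.test(b)$p.value >= alpha
    # exact = FALSE requests the normal approximation, which avoids the
    # "cannot compute exact p-value with ties" warning.
    wt <- wilcox.test(x = a, y = b, exact = FALSE)
    data.frame(variable = v, normal_g1 = normal_a, normal_g2 = normal_b,
               W = unname(wt$statistic), p.value = wt$p.value)
  })
  do.call(rbind, rows)
}

# Simulated measurements (an assumption) for two groups of unequal size.
set.seed(1)
demo <- data.frame(
  Subspecies = rep(c("A", "B"), times = c(10, 30)),
  wing = c(rnorm(10, 52, 1), rnorm(30, 54, 1.5)),
  tail = c(rnorm(10, 35, 2), rnorm(30, 35, 2))
)
res <- pairwise_tests(demo, "Subspecies", "A", "B", c("wing", "tail"))
print(res)
```

Returning one row per measurement, rather than printing each test, makes it easier to scan many pairwise comparisons side by side.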
We can perform a linear discriminant analysis (LDA) with leave-one-out cross-validation to see how predictable these subspecies are.
#head(x6)
x7=x3[x3$Subspecies!='Unknown',]
x7$Subspecies[which(x7$Subspecies=="parvirostris")]="preussi"
#Remove 'ghost' groups
x7$Subspecies=as.character(x7$Subspecies)
x7$Subspecies=as.factor(x7$Subspecies)
lda.x2=lda(Subspecies~PC1+PC2+PC3,data=x7,CV=T)
#print(lda.x2)
summary(lda.x2)
## Length Class Mode
## class 120 factor numeric
## posterior 360 -none- numeric
## terms 3 terms call
## call 4 -none- call
## xlevels 0 -none- list
#Check predictions
ct=table(x7$Subspecies,lda.x2$class)
print(ct)
##
## genderuensis preussi reichenowi
## genderuensis 0 2 5
## preussi 0 68 3
## reichenowi 0 4 38
diag(prop.table(ct,1))
## genderuensis preussi reichenowi
## 0.0000000 0.9577465 0.9047619
sum(diag(prop.table(ct)))
## [1] 0.8833333
As with the males, genderuensis gets lost in the variation of the other two (fairly well-defined) populations: none of the seven females is assigned to its own group.
x8=x7[which(x7$Subspecies!='genderuensis'),]
#Remove 'ghost' group
x8$Subspecies=as.character(x8$Subspecies)
x8$Subspecies=as.factor(x8$Subspecies)
lda.x2=lda(Subspecies~PC1+PC2+PC3,data=x8,CV=T)
#print(lda.x2)
summary(lda.x2)
## Length Class Mode
## class 113 factor numeric
## posterior 226 -none- numeric
## terms 3 terms call
## call 4 -none- call
## xlevels 0 -none- list
#Check predictions
ct=table(x8$Subspecies,lda.x2$class)
print(ct)
##
## preussi reichenowi
## preussi 68 3
## reichenowi 4 38
diag(prop.table(ct,1))
## preussi reichenowi
## 0.9577465 0.9047619
sum(diag(prop.table(ct)))
## [1] 0.9380531
Removing genderuensis, we can separate these two populations with over 90% accuracy (93.8% overall) based on morphological characters.
Now for the random sample part.
We can also do a test of only genderuensis and preussi.
#head(x6)
x7=x3[x3$Subspecies!='Unknown',]
xy=x7[x7$Subspecies!='reichenowi',]
xy$Subspecies[which(xy$Subspecies=="parvirostris")]="preussi"
#Remove 'ghost' groups
xy$Subspecies=as.character(xy$Subspecies)
xy$Subspecies=as.factor(xy$Subspecies)
lda.x2=lda(Subspecies~PC1+PC2+PC3,data=xy,CV=T)
#print(lda.x2)
summary(lda.x2)
## Length Class Mode
## class 78 factor numeric
## posterior 156 -none- numeric
## terms 3 terms call
## call 4 -none- call
## xlevels 0 -none- list
#Check predictions
ct=table(xy$Subspecies,lda.x2$class)
print(ct)
##
## genderuensis preussi
## genderuensis 5 2
## preussi 1 70
diag(prop.table(ct,1))
## genderuensis preussi
## 0.7142857 0.9859155
sum(diag(prop.table(ct)))
## [1] 0.9615385
The tests are 98.6% successful for preussi (70 of 71), but only 71.4% successful for genderuensis (5 of 7). This may be related to limited representation for genderuensis.
#summary(xy)
xypreuss=xy[xy$Subspecies=='preussi',]
xygend=xy[xy$Subspecies=='genderuensis',]
jack=as.data.frame(matrix(nrow=1000,ncol=3))
colnames(jack)=c('PREUSS','GEND','SUM')
for(i in 1:1000){
rows=sample(nrow(xypreuss),nrow(xygend))
r.x=xypreuss[rows,]
new.x=rbind(r.x,xygend)
lda.x2=lda(Subspecies~PC1+PC2+PC3,data=new.x,CV=T)
ct=table(new.x$Subspecies,lda.x2$class)
x.tab=diag(prop.table(ct,1))
jack[i,2]=x.tab[1]
jack[i,1]=x.tab[2]
jack[i,3]=sum(diag(prop.table(ct)))
}
summary(jack)
## PREUSS GEND SUM
## Min. :0.5714 Min. :0.7143 Min. :0.6429
## 1st Qu.:0.7143 1st Qu.:1.0000 1st Qu.:0.8571
## Median :0.8571 Median :1.0000 Median :0.9286
## Mean :0.8573 Mean :0.9823 Mean :0.9198
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000
j.p=cbind(jack[,1],"preuss")
j.g=cbind(jack[,2],"gend")
j.s=cbind(jack[,3],"sum")
jack2=rbind(j.p,j.g)
jack2=as.data.frame(jack2)
colnames(jack2)=c("Value","Population")
jack2[,1]=as.numeric(as.character(jack2[,1]))
jack2[,2]=as.factor(jack2[,2])
summary(jack2[jack2$Population=='preuss',])
## Value Population
## Min. :0.5714 gend : 0
## 1st Qu.:0.7143 preuss:1000
## Median :0.8571
## Mean :0.8573
## 3rd Qu.:1.0000
## Max. :1.0000
summary(jack2[jack2$Population=='gend',])
## Value Population
## Min. :0.7143 gend :1000
## 1st Qu.:1.0000 preuss: 0
## Median :1.0000
## Mean :0.9823
## 3rd Qu.:1.0000
## Max. :1.0000
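The loop above repeatedly draws as many preussi as there are genderuensis and refits the cross-validated LDA, so that unequal sample sizes do not dominate the classification. A minimal, self-contained sketch of that idea follows. The simulated data, group names, seed, and number of iterations are assumptions for illustration only; they do not reproduce the sunbird measurements.

```r
library(MASS)  # provides lda()

set.seed(42)
# Simulated stand-in data (an assumption): 70 "large" vs. 7 "small" group members.
sim <- data.frame(
  grp = factor(rep(c("large", "small"), times = c(70, 7))),
  PC1 = c(rnorm(70, 0, 1), rnorm(7, 2, 1)),
  PC2 = c(rnorm(70, 0, 1), rnorm(7, 1, 1)),
  PC3 = rnorm(77)
)
big <- sim[sim$grp == "large", ]
small <- sim[sim$grp == "small", ]

n_iter <- 200
# Pre-allocate one row per iteration.
acc <- data.frame(large = numeric(n_iter), small = numeric(n_iter),
                  overall = numeric(n_iter))
for (i in seq_len(n_iter)) {
  # Draw as many individuals from the large group as the small group has.
  sub <- rbind(big[sample(nrow(big), nrow(small)), ], small)
  # CV = TRUE returns leave-one-out cross-validated class assignments.
  fit <- lda(grp ~ PC1 + PC2 + PC3, data = sub, CV = TRUE)
  ct <- table(sub$grp, fit$class)
  per_group <- diag(prop.table(ct, 1))  # order follows the factor levels
  acc$large[i] <- per_group[[1]]
  acc$small[i] <- per_group[[2]]
  acc$overall[i] <- sum(diag(prop.table(ct)))
}
summary(acc)
```

Pre-allocating the results table with one row per iteration keeps its size in step with the loop length.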
We do not have very many individuals of regius:
## Femae Female Male Unknown
## 0 0 3 17 0
There are three females and seventeen males in the dataset; thus we will look only at males.
## PC1
## 0.3911925
## PC2
## 0.2027945
## PC3
## 0.1903053
## PC4
## 0.1255034
## PC5
## 0.06243406
## PC6
## 0.02777019
The PCA variation is surprisingly similar to the equivalent plot for reichenowi.
## [1] "For Right.wing.chord: PC1: -0.208"
## [1] "For Tail.length: PC1: -0.19"
## [1] "For Culmen.length: PC1: 0.0964"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC1: 0.221"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC1: -0.194"
## [1] "For Left.Tarsus: PC1: -0.0905"
## [1] "For Right.wing.chord: PC2: 0.0744"
## [1] "For Tail.length: PC2: -0.0699"
## [1] "For Culmen.length: PC2: 0.294"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC2: 0.105"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC2: 0.0784"
## [1] "For Left.Tarsus: PC2: 0.378"
## [1] "For Right.wing.chord: PC3: -0.208"
## [1] "For Tail.length: PC3: 0.103"
## [1] "For Culmen.length: PC3: 0.236"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC3: -0.0684"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC3: 0.233"
## [1] "For Left.Tarsus: PC3: -0.152"
biplot(rda.x)
Again, the biplot and contributions are similar for all individuals.
a=ggplot(x3,aes(x=PC1,y=PC2,colour=Locality2))
b=geom_point()
c=theme_classic()
d=stat_ellipse()
print(a+b+c+d)
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Warning: Removed 2 row(s) containing missing values (geom_path).
There is a lot of morphological overlap between these groups, which is not wholly surprising given that we lack a lot of data and we don’t have robust representation for each mountain range.
for(i in 7:12){
ave=colnames(regius)[i]
a=ggplot(regius,aes(y=regius[,i],x=Locality2))
b=geom_boxplot()
c=theme_classic()
d=ylab(print(ave))
print(a+b+c+d)
}
## [1] "Right.wing.chord"
## [1] "Tail.length"
## [1] "Culmen.length"
## [1] "Bill.depth..base.of.feathers.on.mandible."
## [1] "Bill.width..base.of.feathers.on.maxilla."
## [1] "Left.Tarsus"
The bird from the Rwenzori mountains appears to have a much bigger bill, but we lack data to conclusively see any divergences between this population and others.
Now, to look at the ecology of these populations.
rm(list=ls())
x=read.delim("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Ecological Analysis/ebd_ndcsun2_relAug-2018/ebd_ndcsun2_relAug-2018.txt")
## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
## EOF within quoted string
#colnames(x)[c(6,8,9,13,15,26:28,35,36)]
x2=x[-which(x$EFFORT.DISTANCE.KM>=25),
c("SCIENTIFIC.NAME","SUBSPECIES.SCIENTIFIC.NAME",
"OBSERVATION.COUNT","COUNTRY",
"STATE","LATITUDE",
"LONGITUDE","OBSERVATION.DATE",
"DURATION.MINUTES","EFFORT.DISTANCE.KM")]
x2=unique(x2)
print(paste0("Removed based on distance (uniques): ",nrow(x)-nrow(x2)))
## [1] "Removed based on distance (uniques): 218"
print(paste0("Records remaining (uniques): ",nrow(x2)))
## [1] "Records remaining (uniques): 1005"
rm(x)
x2$SOURCE="eBird"
After removing checklists covering long distances (25 km or more), we have 1005 records left. These are only unique records.
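The distance filter relies on negative indexing with which(). That works when at least one row matches, but it returns an empty table when no row matches. The sketch below shows an equivalent logical filter on a toy table; the object names and values are hypothetical, and the logical form is offered as an alternative rather than as the code that produced the counts above.

```r
# Toy effort data; NA mimics checklists with no recorded distance.
effort <- data.frame(id = 1:5, km = c(1, 30, NA, 25, 10))
cutoff_km <- 25

# Logical filter: keep rows under the cutoff, and keep rows with no distance
# (which() drops NA, so negative which() indexing keeps those rows as well).
keep <- is.na(effort$km) | effort$km < cutoff_km
kept <- effort[keep, ]
print(kept$id)

# Pitfall of negative which(): when nothing matches, which() returns
# integer(0), and the subset then returns zero rows instead of all rows.
short_only <- effort[effort$id %in% c(1, 5), ]
emptied <- short_only[-which(short_only$km >= cutoff_km), ]
print(nrow(emptied))
```

The logical version gives the same rows as the negative index whenever at least one record exceeds the cutoff, and it still behaves sensibly when none do.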
plot(y=x2$LATITUDE,x=x2$LONGITUDE,pch=19,asp=1)
The above is a 1:1 aspect ratio map of the occurrence points; it is immediately obvious that we have the Albertine Rift population, the Kenyan population, and the spread-out West African population. (Strangely, it appears as though north Uganda/Sudan birds may have been removed.)
First, we need to subset these points into the groups that we have observed through the genetic data: genderuensis, preussi, and reichenowi. This is based on the genetic data and the understanding that Bamenda Highlands birds appear to be closest to preussi.
#colnames(x)
x=x2
rm(x2)
#set scientific name to character
x$SUBSPECIES.SCIENTIFIC.NAME=as.character(x$SUBSPECIES.SCIENTIFIC.NAME)
#set all western to preussi
x$SUBSPECIES.SCIENTIFIC.NAME[which(x$LONGITUDE<20)]="preussi"
#set all eastern to reichenowi
x$SUBSPECIES.SCIENTIFIC.NAME[which(x$LONGITUDE>20)]="reichenowi"
#single out genderuensis from preussi
x$SUBSPECIES.SCIENTIFIC.NAME[which(x$LATITUDE<4&x$LATITUDE>3&x$LONGITUDE>10)]="genderuensis"
x$SUBSPECIES.SCIENTIFIC.NAME[which(x$LATITUDE>5&x$LONGITUDE>12&x$LONGITUDE<20)]="genderuensis"
x$SUBSPECIES.SCIENTIFIC.NAME=as.factor(x$SUBSPECIES.SCIENTIFIC.NAME)
summary(x$SUBSPECIES.SCIENTIFIC.NAME)
## genderuensis preussi reichenowi
## 9 169 827
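The assignment above is order-dependent: every western point starts as preussi, and the genderuensis rules then overwrite a subset. The sketch below wraps the same thresholds in a function so the rules can be checked on a few test coordinates. The function name and the example coordinates are hypothetical and were chosen only to exercise each rule.

```r
# Coordinate-based assignment using the thresholds from the code above.
assign_group <- function(lon, lat) {
  grp <- rep(NA_character_, length(lon))
  grp[lon < 20] <- "preussi"
  grp[lon > 20] <- "reichenowi"
  # Later rules overwrite earlier ones, so genderuensis is carved out last.
  # As in the source, the first rule has no upper longitude bound.
  grp[lat < 4 & lat > 3 & lon > 10] <- "genderuensis"
  grp[lat > 5 & lon > 12 & lon < 20] <- "genderuensis"
  factor(grp)
}

# Hypothetical coordinates, one per rule plus one on the 20-degree boundary.
demo <- data.frame(lon = c(9.2, 11.5, 13.5, 36.8, 20),
                   lat = c(4.2, 3.5, 7.0, -1.0, 0))
demo$group <- assign_group(demo$lon, demo$lat)
print(demo)
```

A point at exactly 20 degrees longitude matches neither the western nor the eastern rule and is left unassigned, which is worth checking for in real data.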
eBird data has excellent georeferencing, but there are still many areas from which no one has submitted eBird data. Thus, I am merging some specimen georeferencing into the database below. I am indebted to Pascal Eckhoff and Sylke Frahnert for making data regarding Riggenbach available for this section.
z=read.csv("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Georeference_Cinnyris_Specimens.csv")
z1.5=z[,-c(2:6)]
z1.5=unique(z1.5)
z2=z1.5[,c(1,3,2)]
#z2=unique(z[,c("Subspecies","Long","Lat")])
colnames(z2)=c("SUBSPECIES.SCIENTIFIC.NAME",
"LONGITUDE","LATITUDE")
z2$SUBSPECIES.SCIENTIFIC.NAME=as.factor(as.character(z2$SUBSPECIES.SCIENTIFIC.NAME))
z2$SOURCE="Specimen"
Next, I will reduce the data to unique localities and run a spatial rarefaction (thinning) to reduce spatial bias. This code was provided by Dr. Joe Manthey. We are using 30 arcsecond grid cells, so we will thin the data so that all points are at least 3 km from the nearest point. All populations are spatially separated enough that we can ignore subspecific assignment and run the thinning on the entire dataset here; while some overlap may exist between genderuensis and preussi, the contact zone has been severely deforested and lacks eBird observations.
dist.test=function(point1long,point1lat,point2long,point2lat){
dist.rep=deg.dist(point1long,point1lat,point2long,point2lat)
return(dist.rep)
}
x2=x[,c("SUBSPECIES.SCIENTIFIC.NAME",
"LONGITUDE","LATITUDE","SOURCE")]
x2=rbind(x2,z2)
x2=unique(x2)
x.n=nrow(x2)
output=x2[1,]
test.point=x2[1,]
#Drop the first row (now the test point); index by nrow(x2), not nrow(x)
x2=x2[2:nrow(x2),]
keep_going=T
while(keep_going==T){
kg_test=dist.test(x2[,2],x2[,3],test.point[1,2],test.point[1,3])
x2=x2[kg_test>3,]
if(nrow(x2)>1){
output=rbind(output,x2[1,])
test.point=x2[1,]
x2=x2[2:nrow(x2),]
#writeLines(paste("Points remaining:", nrow(x2)))
}else{
keep_going=F
}
if(nrow(x2)==1){
output=rbind(output,x2[1,])
}
}
output=na.omit(output)
summary(output)
## SUBSPECIES.SCIENTIFIC.NAME LONGITUDE LATITUDE
## genderuensis:10 Min. : 8.50 Min. :-5.0306
## preussi :27 1st Qu.:13.88 1st Qu.:-1.2129
## reichenowi :98 Median :36.66 Median :-0.3858
## Mean :28.50 Mean : 0.8227
## 3rd Qu.:36.85 3rd Qu.: 3.6031
## Max. :37.63 Max. : 8.2104
## SOURCE
## Length:135
## Class :character
## Mode :character
##
##
##
This procedure of getting unique localities and thinning localities by distance has reduced the dataset to 135 points.
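The thinning step can be summarised as a greedy loop: keep one point, discard every remaining point within the minimum distance of it, and repeat. The sketch below illustrates that logic on simulated coordinates. It swaps in a base R haversine function in place of deg.dist() from the fossil package, and the function names, simulated points, and seed are all assumptions; it is not the code that produced the 135 localities.

```r
# Great-circle distance in km (haversine formula, spherical Earth, R = 6371 km).
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Greedy thinning: keep a point, drop everything within min_km of it, repeat.
thin_points <- function(pts, min_km = 3) {
  kept <- pts[0, ]
  while (nrow(pts) > 0) {
    focal <- pts[1, ]
    kept <- rbind(kept, focal)
    d <- haversine_km(pts$lon, pts$lat, focal$lon, focal$lat)
    # The focal point has distance 0, so it is removed here as well.
    pts <- pts[d > min_km, , drop = FALSE]
  }
  kept
}

# Simulated localities (an assumption) in a one-degree square.
set.seed(7)
pts <- data.frame(lon = runif(200, 9, 10), lat = runif(200, 4, 5))
thinned <- thin_points(pts)
print(nrow(thinned))
```

Because the loop always keeps the first remaining point, the result depends on row order; shuffling the rows beforehand would give a different but equally valid thinned set.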
a=ggplot(output,aes(x=LONGITUDE,y=LATITUDE,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()
d=coord_fixed()
plot1=a+b+c+d
print(plot1)
Now, we are going to incorporate some of the ecological layers from the ENVIREM dataset. I am going to extract the environmental layers for each point and then perform a PCA to summarize the variation among them and to determine which variables are the most informative for separating these populations.
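For readers unfamiliar with how the proportion of variance per axis is obtained, the sketch below runs a scaled PCA on simulated data and divides each eigenvalue by the eigenvalue total. It uses prcomp() from base R in place of rda() from vegan, and the variable names and simulated values are assumptions; they are not ENVIREM layers.

```r
set.seed(3)
# Simulated stand-in for extracted environmental values (an assumption):
# 50 localities by 4 variables, three of which share a common gradient.
base <- rnorm(50)
env <- data.frame(
  temp = base + rnorm(50, sd = 0.3),
  precip = -base + rnorm(50, sd = 0.5),
  aridity = base + rnorm(50, sd = 0.8),
  season = rnorm(50)
)

# scale. = TRUE standardises each variable so that units do not dominate.
pca <- prcomp(env, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components.
eig <- pca$sdev^2
prop_var <- eig / sum(eig)
names(prop_var) <- paste0("PC", seq_along(prop_var))

print(round(prop_var, 3))          # proportion of variance per axis
print(round(cumsum(prop_var), 3))  # cumulative proportion
print(round(pca$rotation[, 1:2], 2))  # loadings on the first two axes
```

Variables with large absolute loadings on the first axes are the ones contributing most to the separation seen in a PC1 by PC2 plot.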
##Not run in entirety in markdown document
##This can be run separately if so desired
# Isolate odd numbered files
# rasterpath="path/to/envirem_africa/Africa_current-30s/"
y=list.files(rasterpath,pattern='*.bil')
#The following isolates files if "aux" files are present
#y2=1:length(y)
#y3=y2[y2 %% 2!=0]
#y2=y[y3]
#y2=y2[-10]
#Create raster stack of all objects
setwd(rasterpath)
#bils=stack(y[y3])
bils=stack(y)
#Visualize points on plot
##Ensure coordinates read correctly
plot(bils$current_30arcsec_minTempWarmest)
points(output[,-1],pch=19)
#Extract values for point localities from all layers
ext=extract(x=bils,y=output[,-c(1,4)])
#Free up memory
rm(bils)
#Create entire data frame
x=cbind(output,ext)
write.csv(x,
paste0(filepath,"Ecological Analysis/envirem_extracts.csv"),
quote=F,row.names=F)
#Perform PCA of environmental data
rda.x=rda(ext,scale=T)
rda.x.data=rda.x$CA$u
eigs=rda.x$CA$eig
w=NULL
for(i in 1:length(eigs)){
print(eigs[i]/sum(eigs))
w[i]=eigs[i]/sum(eigs)
}
## PC1
## 0.6328363
## PC2
## 0.173279
## PC3
## 0.08782684
## PC4
## 0.03333912
## PC5
## 0.02709639
## PC6
## 0.02229936
## PC7
## 0.009873212
## PC8
## 0.00562697
## PC9
## 0.003272304
## PC10
## 0.002097139
## PC11
## 0.001088964
## PC12
## 0.0006532066
## PC13
## 0.0003753352
## PC14
## 0.0002194165
## PC15
## 9.323964e-05
## PC16
## 2.322726e-05
plot(x=1:length(w),y=w,pch=19,main="PCA Eigenvalues")
x=cbind(x,rda.x.data)
colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
names(colorset)=c("reichenowi","preussi",
"genderuensis","parvirostris",
"Unknown")
colScale=scale_color_manual(name="grp",values=colorset)
a=ggplot(x,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME,shape=SOURCE))
b=theme(panel.background = element_rect(fill="white",color = "grey50"),
axis.title.x = element_text(size=20),
axis.title.y = element_text(size=20),
axis.text.x = element_text(size=15),
axis.text.y = element_text(size=15),
legend.title = element_blank(),
legend.text = element_text(size=15))
c=geom_point(size=1.5)
d=stat_ellipse()
e=colScale
plot1=a+b+c+d+e
print(plot1)
## Too few points to calculate an ellipse
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Too few points to calculate an ellipse
## Warning: Removed 2 row(s) containing missing values (geom_path).
contrib=rda.x$CA$v
#Isolate most important PC's
contrib2=contrib[,1:3]
xx=rowSums(abs(contrib2))
print(xx[order(xx,decreasing = T)])
## current_30arcsec_continentality
## 1.0103746
## current_30arcsec_PETseasonality
## 0.9334945
## current_30arcsec_PETWettestQuarter
## 0.8003781
## current_30arcsec_minTempWarmest
## 0.7942127
## current_30arcsec_embergerQ
## 0.6920052
## current_30arcsec_aridityIndexThornthwaite
## 0.6633440
## current_30arcsec_climaticMoistureIndex
## 0.6297976
## current_30arcsec_maxTempColdest
## 0.6139360
## current_30arcsec_thermicityIndex
## 0.6115650
## current_30arcsec_growingDegDays0
## 0.6055714
## current_30arcsec_annualPET
## 0.6009393
## current_30arcsec_growingDegDays5
## 0.5993007
## current_30arcsec_PETColdestQuarter
## 0.5790410
## current_30arcsec_PETWarmestQuarter
## 0.4879567
## current_30arcsec_PETDriestQuarter
## 0.4585063
## current_30arcsec_monthCountByTemp10
## 0.3241039
From above, we get that the most important variables for the first three PCs are continentality, PETseasonality, PETWettestQuarter, minTempWarmest, and embergerQ.
Everything after layer 5 plateaus in terms of its contribution.
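The ranking used here sums the absolute loadings of each variable across the first three axes. A minimal sketch of that idea follows, assuming base R `prcomp` as a stand-in for `vegan::rda` and using an invented toy table; the column names are hypothetical and the numbers have no connection to the study data.

```r
set.seed(42)
# Hypothetical toy "environmental" table: correlated columns plus a noise column
n <- 200
temp <- rnorm(n)
env <- data.frame(
  warm  = temp + rnorm(n, sd = 0.1),
  cold  = temp + rnorm(n, sd = 0.1),
  pet   = 0.5 * temp + rnorm(n),
  noise = rnorm(n)
)

# PCA on centered, scaled variables (comparable in spirit to rda(ext, scale=T))
pca <- prcomp(env, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each axis, computed without a loop
prop_var <- pca$sdev^2 / sum(pca$sdev^2)

# Rank variables by summed absolute loadings on the first three axes
loadings <- pca$rotation[, 1:3]
importance <- sort(rowSums(abs(loadings)), decreasing = TRUE)

print(round(prop_var, 3))
print(importance)
```

Summed absolute loadings treat the three axes as equally important; weighting each axis by its proportion of variance is an alternative that this sketch does not attempt.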
We can also look at the correlation between layers to determine what should be removed:
All maintained:
#colnames(x)
ext=x[,5:20]
cor(ext)
## current_30arcsec_annualPET
## current_30arcsec_annualPET 1.0000000
## current_30arcsec_aridityIndexThornthwaite 0.5356523
## current_30arcsec_climaticMoistureIndex -0.7425039
## current_30arcsec_continentality 0.3113893
## current_30arcsec_embergerQ -0.6852135
## current_30arcsec_growingDegDays0 0.7173201
## current_30arcsec_growingDegDays5 0.7234581
## current_30arcsec_maxTempColdest 0.8519527
## current_30arcsec_minTempWarmest 0.5029645
## current_30arcsec_monthCountByTemp10 0.6226440
## current_30arcsec_PETColdestQuarter 0.9509294
## current_30arcsec_PETDriestQuarter 0.8647464
## current_30arcsec_PETseasonality 0.6248540
## current_30arcsec_PETWarmestQuarter 0.9805415
## current_30arcsec_PETWettestQuarter 0.9294288
## current_30arcsec_thermicityIndex 0.7071537
## current_30arcsec_aridityIndexThornthwaite
## current_30arcsec_annualPET 0.5356523
## current_30arcsec_aridityIndexThornthwaite 1.0000000
## current_30arcsec_climaticMoistureIndex -0.4086194
## current_30arcsec_continentality 0.4067714
## current_30arcsec_embergerQ -0.2766005
## current_30arcsec_growingDegDays0 0.6919106
## current_30arcsec_growingDegDays5 0.6990780
## current_30arcsec_maxTempColdest 0.6169803
## current_30arcsec_minTempWarmest 0.6682661
## current_30arcsec_monthCountByTemp10 0.5440015
## current_30arcsec_PETColdestQuarter 0.4951528
## current_30arcsec_PETDriestQuarter 0.5603930
## current_30arcsec_PETseasonality 0.4584248
## current_30arcsec_PETWarmestQuarter 0.5841881
## current_30arcsec_PETWettestQuarter 0.3219686
## current_30arcsec_thermicityIndex 0.6822435
## current_30arcsec_climaticMoistureIndex
## current_30arcsec_annualPET -0.7425039
## current_30arcsec_aridityIndexThornthwaite -0.4086194
## current_30arcsec_climaticMoistureIndex 1.0000000
## current_30arcsec_continentality -0.3819811
## current_30arcsec_embergerQ 0.9159646
## current_30arcsec_growingDegDays0 -0.3649445
## current_30arcsec_growingDegDays5 -0.3715665
## current_30arcsec_maxTempColdest -0.4993619
## current_30arcsec_minTempWarmest -0.1885577
## current_30arcsec_monthCountByTemp10 -0.4027436
## current_30arcsec_PETColdestQuarter -0.6339926
## current_30arcsec_PETDriestQuarter -0.4968806
## current_30arcsec_PETseasonality -0.6734906
## current_30arcsec_PETWarmestQuarter -0.7073433
## current_30arcsec_PETWettestQuarter -0.8012461
## current_30arcsec_thermicityIndex -0.3518836
## current_30arcsec_continentality
## current_30arcsec_annualPET 0.3113893
## current_30arcsec_aridityIndexThornthwaite 0.4067714
## current_30arcsec_climaticMoistureIndex -0.3819811
## current_30arcsec_continentality 1.0000000
## current_30arcsec_embergerQ -0.4153093
## current_30arcsec_growingDegDays0 0.2479743
## current_30arcsec_growingDegDays5 0.2496265
## current_30arcsec_maxTempColdest 0.1203737
## current_30arcsec_minTempWarmest 0.2681165
## current_30arcsec_monthCountByTemp10 0.2090693
## current_30arcsec_PETColdestQuarter 0.1522460
## current_30arcsec_PETDriestQuarter 0.1924481
## current_30arcsec_PETseasonality 0.8129495
## current_30arcsec_PETWarmestQuarter 0.4184588
## current_30arcsec_PETWettestQuarter 0.2405960
## current_30arcsec_thermicityIndex 0.2492855
## current_30arcsec_embergerQ
## current_30arcsec_annualPET -0.685213476
## current_30arcsec_aridityIndexThornthwaite -0.276600501
## current_30arcsec_climaticMoistureIndex 0.915964616
## current_30arcsec_continentality -0.415309279
## current_30arcsec_embergerQ 1.000000000
## current_30arcsec_growingDegDays0 -0.179704348
## current_30arcsec_growingDegDays5 -0.190039152
## current_30arcsec_maxTempColdest -0.342118724
## current_30arcsec_minTempWarmest 0.007497584
## current_30arcsec_monthCountByTemp10 -0.313475643
## current_30arcsec_PETColdestQuarter -0.542920360
## current_30arcsec_PETDriestQuarter -0.446684078
## current_30arcsec_PETseasonality -0.725232433
## current_30arcsec_PETWarmestQuarter -0.659458836
## current_30arcsec_PETWettestQuarter -0.765204883
## current_30arcsec_thermicityIndex -0.163912787
## current_30arcsec_growingDegDays0
## current_30arcsec_annualPET 0.7173201
## current_30arcsec_aridityIndexThornthwaite 0.6919106
## current_30arcsec_climaticMoistureIndex -0.3649445
## current_30arcsec_continentality 0.2479743
## current_30arcsec_embergerQ -0.1797043
## current_30arcsec_growingDegDays0 1.0000000
## current_30arcsec_growingDegDays5 0.9946251
## current_30arcsec_maxTempColdest 0.9406786
## current_30arcsec_minTempWarmest 0.9560046
## current_30arcsec_monthCountByTemp10 0.6704893
## current_30arcsec_PETColdestQuarter 0.7744221
## current_30arcsec_PETDriestQuarter 0.7583540
## current_30arcsec_PETseasonality 0.3029431
## current_30arcsec_PETWarmestQuarter 0.7384720
## current_30arcsec_PETWettestQuarter 0.4821080
## current_30arcsec_thermicityIndex 0.9986232
## current_30arcsec_growingDegDays5
## current_30arcsec_annualPET 0.7234581
## current_30arcsec_aridityIndexThornthwaite 0.6990780
## current_30arcsec_climaticMoistureIndex -0.3715665
## current_30arcsec_continentality 0.2496265
## current_30arcsec_embergerQ -0.1900392
## current_30arcsec_growingDegDays0 0.9946251
## current_30arcsec_growingDegDays5 1.0000000
## current_30arcsec_maxTempColdest 0.9390167
## current_30arcsec_minTempWarmest 0.9468207
## current_30arcsec_monthCountByTemp10 0.7015541
## current_30arcsec_PETColdestQuarter 0.7763068
## current_30arcsec_PETDriestQuarter 0.7611201
## current_30arcsec_PETseasonality 0.3099811
## current_30arcsec_PETWarmestQuarter 0.7423933
## current_30arcsec_PETWettestQuarter 0.4959363
## current_30arcsec_thermicityIndex 0.9927087
## current_30arcsec_maxTempColdest
## current_30arcsec_annualPET 0.8519527
## current_30arcsec_aridityIndexThornthwaite 0.6169803
## current_30arcsec_climaticMoistureIndex -0.4993619
## current_30arcsec_continentality 0.1203737
## current_30arcsec_embergerQ -0.3421187
## current_30arcsec_growingDegDays0 0.9406786
## current_30arcsec_growingDegDays5 0.9390167
## current_30arcsec_maxTempColdest 1.0000000
## current_30arcsec_minTempWarmest 0.8219372
## current_30arcsec_monthCountByTemp10 0.6757719
## current_30arcsec_PETColdestQuarter 0.9168618
## current_30arcsec_PETDriestQuarter 0.8409024
## current_30arcsec_PETseasonality 0.3149568
## current_30arcsec_PETWarmestQuarter 0.8361990
## current_30arcsec_PETWettestQuarter 0.6740010
## current_30arcsec_thermicityIndex 0.9396267
## current_30arcsec_minTempWarmest
## current_30arcsec_annualPET 0.502964461
## current_30arcsec_aridityIndexThornthwaite 0.668266080
## current_30arcsec_climaticMoistureIndex -0.188557678
## current_30arcsec_continentality 0.268116512
## current_30arcsec_embergerQ 0.007497584
## current_30arcsec_growingDegDays0 0.956004592
## current_30arcsec_growingDegDays5 0.946820723
## current_30arcsec_maxTempColdest 0.821937192
## current_30arcsec_minTempWarmest 1.000000000
## current_30arcsec_monthCountByTemp10 0.569843180
## current_30arcsec_PETColdestQuarter 0.585297990
## current_30arcsec_PETDriestQuarter 0.599587572
## current_30arcsec_PETseasonality 0.192523103
## current_30arcsec_PETWarmestQuarter 0.538127313
## current_30arcsec_PETWettestQuarter 0.229093254
## current_30arcsec_thermicityIndex 0.958133142
## current_30arcsec_monthCountByTemp10
## current_30arcsec_annualPET 0.6226440
## current_30arcsec_aridityIndexThornthwaite 0.5440015
## current_30arcsec_climaticMoistureIndex -0.4027436
## current_30arcsec_continentality 0.2090693
## current_30arcsec_embergerQ -0.3134756
## current_30arcsec_growingDegDays0 0.6704893
## current_30arcsec_growingDegDays5 0.7015541
## current_30arcsec_maxTempColdest 0.6757719
## current_30arcsec_minTempWarmest 0.5698432
## current_30arcsec_monthCountByTemp10 1.0000000
## current_30arcsec_PETColdestQuarter 0.6035413
## current_30arcsec_PETDriestQuarter 0.5521891
## current_30arcsec_PETseasonality 0.3423416
## current_30arcsec_PETWarmestQuarter 0.6204620
## current_30arcsec_PETWettestQuarter 0.5285099
## current_30arcsec_thermicityIndex 0.6657654
## current_30arcsec_PETColdestQuarter
## current_30arcsec_annualPET 0.9509294
## current_30arcsec_aridityIndexThornthwaite 0.4951528
## current_30arcsec_climaticMoistureIndex -0.6339926
## current_30arcsec_continentality 0.1522460
## current_30arcsec_embergerQ -0.5429204
## current_30arcsec_growingDegDays0 0.7744221
## current_30arcsec_growingDegDays5 0.7763068
## current_30arcsec_maxTempColdest 0.9168618
## current_30arcsec_minTempWarmest 0.5852980
## current_30arcsec_monthCountByTemp10 0.6035413
## current_30arcsec_PETColdestQuarter 1.0000000
## current_30arcsec_PETDriestQuarter 0.8780569
## current_30arcsec_PETseasonality 0.4095537
## current_30arcsec_PETWarmestQuarter 0.9166130
## current_30arcsec_PETWettestQuarter 0.8460238
## current_30arcsec_thermicityIndex 0.7703611
## current_30arcsec_PETDriestQuarter
## current_30arcsec_annualPET 0.8647464
## current_30arcsec_aridityIndexThornthwaite 0.5603930
## current_30arcsec_climaticMoistureIndex -0.4968806
## current_30arcsec_continentality 0.1924481
## current_30arcsec_embergerQ -0.4466841
## current_30arcsec_growingDegDays0 0.7583540
## current_30arcsec_growingDegDays5 0.7611201
## current_30arcsec_maxTempColdest 0.8409024
## current_30arcsec_minTempWarmest 0.5995876
## current_30arcsec_monthCountByTemp10 0.5521891
## current_30arcsec_PETColdestQuarter 0.8780569
## current_30arcsec_PETDriestQuarter 1.0000000
## current_30arcsec_PETseasonality 0.4567340
## current_30arcsec_PETWarmestQuarter 0.8748009
## current_30arcsec_PETWettestQuarter 0.6890573
## current_30arcsec_thermicityIndex 0.7475316
## current_30arcsec_PETseasonality
## current_30arcsec_annualPET 0.6248540
## current_30arcsec_aridityIndexThornthwaite 0.4584248
## current_30arcsec_climaticMoistureIndex -0.6734906
## current_30arcsec_continentality 0.8129495
## current_30arcsec_embergerQ -0.7252324
## current_30arcsec_growingDegDays0 0.3029431
## current_30arcsec_growingDegDays5 0.3099811
## current_30arcsec_maxTempColdest 0.3149568
## current_30arcsec_minTempWarmest 0.1925231
## current_30arcsec_monthCountByTemp10 0.3423416
## current_30arcsec_PETColdestQuarter 0.4095537
## current_30arcsec_PETDriestQuarter 0.4567340
## current_30arcsec_PETseasonality 1.0000000
## current_30arcsec_PETWarmestQuarter 0.6878161
## current_30arcsec_PETWettestQuarter 0.5954100
## current_30arcsec_thermicityIndex 0.2878791
## current_30arcsec_PETWarmestQuarter
## current_30arcsec_annualPET 0.9805415
## current_30arcsec_aridityIndexThornthwaite 0.5841881
## current_30arcsec_climaticMoistureIndex -0.7073433
## current_30arcsec_continentality 0.4184588
## current_30arcsec_embergerQ -0.6594588
## current_30arcsec_growingDegDays0 0.7384720
## current_30arcsec_growingDegDays5 0.7423933
## current_30arcsec_maxTempColdest 0.8361990
## current_30arcsec_minTempWarmest 0.5381273
## current_30arcsec_monthCountByTemp10 0.6204620
## current_30arcsec_PETColdestQuarter 0.9166130
## current_30arcsec_PETDriestQuarter 0.8748009
## current_30arcsec_PETseasonality 0.6878161
## current_30arcsec_PETWarmestQuarter 1.0000000
## current_30arcsec_PETWettestQuarter 0.8842324
## current_30arcsec_thermicityIndex 0.7299431
## current_30arcsec_PETWettestQuarter
## current_30arcsec_annualPET 0.9294288
## current_30arcsec_aridityIndexThornthwaite 0.3219686
## current_30arcsec_climaticMoistureIndex -0.8012461
## current_30arcsec_continentality 0.2405960
## current_30arcsec_embergerQ -0.7652049
## current_30arcsec_growingDegDays0 0.4821080
## current_30arcsec_growingDegDays5 0.4959363
## current_30arcsec_maxTempColdest 0.6740010
## current_30arcsec_minTempWarmest 0.2290933
## current_30arcsec_monthCountByTemp10 0.5285099
## current_30arcsec_PETColdestQuarter 0.8460238
## current_30arcsec_PETDriestQuarter 0.6890573
## current_30arcsec_PETseasonality 0.5954100
## current_30arcsec_PETWarmestQuarter 0.8842324
## current_30arcsec_PETWettestQuarter 1.0000000
## current_30arcsec_thermicityIndex 0.4735434
## current_30arcsec_thermicityIndex
## current_30arcsec_annualPET 0.7071537
## current_30arcsec_aridityIndexThornthwaite 0.6822435
## current_30arcsec_climaticMoistureIndex -0.3518836
## current_30arcsec_continentality 0.2492855
## current_30arcsec_embergerQ -0.1639128
## current_30arcsec_growingDegDays0 0.9986232
## current_30arcsec_growingDegDays5 0.9927087
## current_30arcsec_maxTempColdest 0.9396267
## current_30arcsec_minTempWarmest 0.9581331
## current_30arcsec_monthCountByTemp10 0.6657654
## current_30arcsec_PETColdestQuarter 0.7703611
## current_30arcsec_PETDriestQuarter 0.7475316
## current_30arcsec_PETseasonality 0.2878791
## current_30arcsec_PETWarmestQuarter 0.7299431
## current_30arcsec_PETWettestQuarter 0.4735434
## current_30arcsec_thermicityIndex 1.0000000
z=cor(ext)
Removing layers:
cor(ext[,-c(1,3,6,7,10,11,14,16)])
## current_30arcsec_aridityIndexThornthwaite
## current_30arcsec_aridityIndexThornthwaite 1.0000000
## current_30arcsec_continentality 0.4067714
## current_30arcsec_embergerQ -0.2766005
## current_30arcsec_maxTempColdest 0.6169803
## current_30arcsec_minTempWarmest 0.6682661
## current_30arcsec_PETDriestQuarter 0.5603930
## current_30arcsec_PETseasonality 0.4584248
## current_30arcsec_PETWettestQuarter 0.3219686
## current_30arcsec_continentality
## current_30arcsec_aridityIndexThornthwaite 0.4067714
## current_30arcsec_continentality 1.0000000
## current_30arcsec_embergerQ -0.4153093
## current_30arcsec_maxTempColdest 0.1203737
## current_30arcsec_minTempWarmest 0.2681165
## current_30arcsec_PETDriestQuarter 0.1924481
## current_30arcsec_PETseasonality 0.8129495
## current_30arcsec_PETWettestQuarter 0.2405960
## current_30arcsec_embergerQ
## current_30arcsec_aridityIndexThornthwaite -0.276600501
## current_30arcsec_continentality -0.415309279
## current_30arcsec_embergerQ 1.000000000
## current_30arcsec_maxTempColdest -0.342118724
## current_30arcsec_minTempWarmest 0.007497584
## current_30arcsec_PETDriestQuarter -0.446684078
## current_30arcsec_PETseasonality -0.725232433
## current_30arcsec_PETWettestQuarter -0.765204883
## current_30arcsec_maxTempColdest
## current_30arcsec_aridityIndexThornthwaite 0.6169803
## current_30arcsec_continentality 0.1203737
## current_30arcsec_embergerQ -0.3421187
## current_30arcsec_maxTempColdest 1.0000000
## current_30arcsec_minTempWarmest 0.8219372
## current_30arcsec_PETDriestQuarter 0.8409024
## current_30arcsec_PETseasonality 0.3149568
## current_30arcsec_PETWettestQuarter 0.6740010
## current_30arcsec_minTempWarmest
## current_30arcsec_aridityIndexThornthwaite 0.668266080
## current_30arcsec_continentality 0.268116512
## current_30arcsec_embergerQ 0.007497584
## current_30arcsec_maxTempColdest 0.821937192
## current_30arcsec_minTempWarmest 1.000000000
## current_30arcsec_PETDriestQuarter 0.599587572
## current_30arcsec_PETseasonality 0.192523103
## current_30arcsec_PETWettestQuarter 0.229093254
## current_30arcsec_PETDriestQuarter
## current_30arcsec_aridityIndexThornthwaite 0.5603930
## current_30arcsec_continentality 0.1924481
## current_30arcsec_embergerQ -0.4466841
## current_30arcsec_maxTempColdest 0.8409024
## current_30arcsec_minTempWarmest 0.5995876
## current_30arcsec_PETDriestQuarter 1.0000000
## current_30arcsec_PETseasonality 0.4567340
## current_30arcsec_PETWettestQuarter 0.6890573
## current_30arcsec_PETseasonality
## current_30arcsec_aridityIndexThornthwaite 0.4584248
## current_30arcsec_continentality 0.8129495
## current_30arcsec_embergerQ -0.7252324
## current_30arcsec_maxTempColdest 0.3149568
## current_30arcsec_minTempWarmest 0.1925231
## current_30arcsec_PETDriestQuarter 0.4567340
## current_30arcsec_PETseasonality 1.0000000
## current_30arcsec_PETWettestQuarter 0.5954100
## current_30arcsec_PETWettestQuarter
## current_30arcsec_aridityIndexThornthwaite 0.3219686
## current_30arcsec_continentality 0.2405960
## current_30arcsec_embergerQ -0.7652049
## current_30arcsec_maxTempColdest 0.6740010
## current_30arcsec_minTempWarmest 0.2290933
## current_30arcsec_PETDriestQuarter 0.6890573
## current_30arcsec_PETseasonality 0.5954100
## current_30arcsec_PETWettestQuarter 1.0000000
Using the above to guide what is most important, I have removed the layers excluded in the call above, which are more than 85% correlated (|r| > 0.85) with another layer of importance within the sample data. This will be compared to the overall patterns of covariation within the 2.5 arcminute data used for distribution modeling.
rasterpath="path/to/envirem_africa/Africa_current_2.5arcmin_generic/"
y=list.files(rasterpath,pattern="*.bil")
y1=paste0(rasterpath,y)
y=stack(y1)
nc=ncell(y)
valsT=extract(x=y,y=seq(from=1,to=nc,by=1))
#Removing those with no correlation to make it easier to read
#Remove non continuous counts
valsT2=na.omit(valsT[,-c(1,3,6,7,10,11,13,14,16)])
cor(valsT2)
abs(cor(valsT2))>.85
Continentality was more important for the data points, so it was kept and seasonality removed.
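Reading a large logical matrix by eye is error-prone, so the correlated pairs can also be listed directly. This is a minimal sketch under assumed toy data; `layer_a`, `layer_b`, and `layer_c` are hypothetical names, and the 0.85 cutoff is the one used in the text.

```r
set.seed(7)
n <- 500
base <- rnorm(n)
# Hypothetical layers: layer_a and layer_b are near-duplicates, layer_c is independent
layers <- data.frame(layer_a = base,
                     layer_b = base + rnorm(n, sd = 0.1),
                     layer_c = rnorm(n))
r <- cor(layers)

# Use only the upper triangle so each pair is reported once
high <- which(abs(r) > 0.85 & upper.tri(r), arr.ind = TRUE)
flagged <- data.frame(layer1 = rownames(r)[high[, "row"]],
                      layer2 = colnames(r)[high[, "col"]],
                      r = r[high],
                      stringsAsFactors = FALSE)
print(flagged)
```

Which member of a flagged pair to drop is still a judgment call; in this document that choice is guided by the PCA contributions.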
Final kept layers are:
The following section outlines the procedure for creating ecological niche models (ENMs) for these taxa, and subsequently testing for niche divergence. Additionally, we will be projecting these models through time to see the likely pathway of colonization for the species.
Current models are restricted to the following area:
## Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read
The species does not currently occur in Angola, so we are excluding it from the training region. Past projections will cover broader areas, in part because we have no way of knowing for certain whether local extinction has occurred.
We need to reduce the current environmental layers to the training area. This will create more accurate models and will reduce the strain on our processors at the same time. The extracts are from the 30 arcsecond data, but the models are made with the 2.5 arcminute data for computing reasons.
y=list.files(rasterpath,pattern="*.bil")[-c(1,3,6,7,10,11,13,14,16)]
Note that we do not have any aux files at the present time, so I do not need to omit them from the file list.
#Set lists for shapefiles of M and for bioclim layers of datasets
m="~/Dropbox/GIS/small-af/cinnyris.shp"
bioclim=paste0(rasterpath,y)
#Reformat variables for the loop code
filelist=bioclim
ShapeFile=m
#set save directory
# SaveDir="path/to/Ecological Analysis/Africa_current-2.5m_CLIPPED/"
#Note that there is a section that must be edited each time in function
CropLoop<-function(filelist=NA,ShapeFile=NA,SaveDir)
{
require(maptools)
require(raster)
Shp1 = readShapePoly(ShapeFile)
for (i in 1:length(filelist))
{
r1 = raster(filelist[i])
cr1 = crop(r1,Shp1)
cr2 = raster::mask(cr1,Shp1) #Avoid confusion with other packages
#Get number of elements in filename
j2=unlist(strsplit(as.character(filelist[i]),"[/]"))
n=length(j2)
#Get filename
j=strsplit(as.character(filelist[i]),"[/]")[[1]][n]
FileName=strsplit(as.character(j),"[.]")[[1]][1]
#Save as ASCII
writeRaster(cr2,paste0(SaveDir,FileName),"ascii",overwrite=T)
#plot(cr2)
print(FileName)
}
}
CropLoop(filelist=filelist,ShapeFile=ShapeFile,SaveDir=SaveDir)
The above code was not run while compiling this document, but did run successfully for all files.
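The filename handling inside the cropping loop can also be written with base R helpers. This is a small sketch only; the input path and the output directory name are hypothetical.

```r
# Hypothetical input path, for illustration only
f <- "layers/current_2-5arcmin_embergerQ.bil"

# basename() drops the directories; file_path_sans_ext() drops the final extension
layer_name <- tools::file_path_sans_ext(basename(f))

# Build an output path for the clipped ASCII raster ("clipped" is a made-up folder)
out_file <- file.path("clipped", paste0(layer_name, ".asc"))
print(out_file)
```

One difference to keep in mind: the loop above keeps the text before the first dot in the filename, whereas `file_path_sans_ext` strips only the last extension, so the two agree only for filenames containing a single dot.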
This code is based on code from Dr. Jorge Soberón. It will create models for all time periods at the same time.
First, we must define the function that calculates the distance from a point p to the centroid m of an ellipsoid with matrix s (the Mahalanobis distance). The parameters are: p, the test point; m, the ellipsoid centroid; s, the inverse of the covariance matrix of the ellipsoid.
#MAJA function
maja=function(p,m,s)((p-m)%*%s%*%t(p-m))^0.5
#Quantile function
##Double check function? divide by 1 is 1...
##changed to 4, for quantiles
NDquantil=function(nD,level){
return(round(nD*level))
}
These minimum volume ellipsoids are less sensitive to point density, and do not rely on pseudoabsence data for determining where species do not occur.
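As a sanity check on the distance function, the sketch below compares `maja` with base R's `stats::mahalanobis`, which returns the squared distance. It uses the ordinary sample mean and covariance of invented toy data rather than a minimum volume ellipsoid from `cov.mve`, so it illustrates the distance only, not the robust fit.

```r
# Same definition as in this document
maja <- function(p, m, s) ((p - m) %*% s %*% t(p - m))^0.5

set.seed(1)
# Hypothetical two-variable "environment" for 100 localities
vals <- matrix(rnorm(200), ncol = 2)
mu <- matrix(colMeans(vals), nrow = 1)  # centroid as a 1 x k matrix
inv_cov <- solve(cov(vals))             # inverse covariance matrix

# Single test point
p <- c(1, -0.5)
d_maja <- as.numeric(maja(p, mu, inv_cov))
d_ref <- sqrt(mahalanobis(p, center = as.numeric(mu),
                          cov = inv_cov, inverted = TRUE))

# Row-by-row loop versus one vectorized call over all rows
d_loop <- apply(vals, 1, function(row) maja(row, mu, inv_cov))
d_vec <- sqrt(mahalanobis(vals, center = as.numeric(mu),
                          cov = inv_cov, inverted = TRUE))

print(c(maja = d_maja, reference = d_ref))
```

For rasters with many cells, the vectorized call is likely to be much faster than looping over rows, though rows containing NA values would need to be handled separately.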
genderu=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="genderuensis"),]
reich=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="reichenowi"),]
preuss=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="preussi"),]
y1=stack(paste0(rasterpath,y))
y=y1
The only difference from the previous look at correlation is that seasonality and continentality are correlated. These are also the two most important variables in the PCA; I am keeping both of them here, as they passed the previous tests of correlation and this is a coarser dataset.
#Create function for individual plot formation
ssp.plot=function(ssp,ssp.text){
vals=extract(x=y,y=ssp[,2:3])
vals=na.omit(vals)
vals=unique(vals)
#vals=vals[,-10]
n1=NDquantil(nrow(vals),0.9)
#for(i in 1:ncol(vals)){print(IQR(vals[,i]))}
mve1=cov.mve(vals,quantile.use=n1)
nc=ncell(y)
mu1=matrix(mve1$center,nrow=1)
s1=mve1$cov
invs1=solve(s1)
dT1=matrix(0,ncol=1,nrow=nc)
#Load values for current time period
valsT=extract(x=y,y=seq(from=1,to=nc,by=1))
#Create current models
valsT1=as.matrix(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y),ncol=ncol(y),
ext=extent(y),resolution=res(y),vals=dT1)
setwd(paste0(filepath,'Ecological Analysis/raw-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
plot(q)
ext=extract(x=q,y=ssp[,2:3])
ext2=na.omit(ext)
#Remove the furthest 20% to reflect issues with plotting in eBird
#threshold based on these values
ext2=ext2[order(ext2)]
cutoff=round(0.8*length(ext2))
ext3=ext2[1:cutoff]
##The following sets binary presence to everything above 1.5 sd below mean of occurrence
#n=max(ext2)
#ND=(round(n*0.95))
#m=c(NA,NA,NA,0,ND,1,ND,Inf,0)
#m=matrix(m,ncol=3,byrow=T)
#rc=reclassify(q,m)
#y2=y[which(ext>ND),]
#Everything up to 1.5 sd above the mean included
#ext2
sdext=sd(ext3)
mext=mean(ext3)
ND=mext+1.5*sdext
m=c(NA,NA,NA,0,ND,1,ND,Inf,0)
#Current threshold
##Used only here; hierarchical for other parts
m=matrix(m,ncol=3,byrow=T)
rc=reclassify(q,m)
y2=y[which(ext>ND),]
#Create color threshold for past models
#color change for every standard deviation
#New threshold on current conditions, then hierarchical
#Created here, executed further down
m2=m
m=c(NA,NA,NA,
0,(mext+(1.5*sdext)),1,
(mext+(1.5*sdext)),(mext+(3*sdext)),2,
(mext+(3*sdext)),(mext+(6*sdext)),3,
(mext+(6*sdext)),(mext+(12*sdext)),4,
(mext+(12*sdext)),(mext+(24*sdext)),5,
(mext+(24*sdext)),Inf,6)
m=matrix(m,ncol=3,byrow=T)
species=ssp.text
if(nrow(y2)!=0){
setwd(paste0(filepath,"Ecological Analysis/threshold-mve/"))
write.csv(y2,file=paste0(species,'_out.csv'),quote=F,row.names=F)
}
pathway=paste0(filepath,"Ecological Analysis/threshold-mve/",
species,".asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
points(ssp[,2:3],pch=19,col="black")
#threshold classify tier
thresh=reclassify(q,m)
pathway=paste0(filepath,"Ecological Analysis/threshold-mve/",
species,"-tier.asc",sep="")
writeRaster(thresh,pathway,overwrite=T)
plot(thresh)
rm(thresh)
#Create color bands of how far it is from center
#Holocene
##CCSM
rm(valsT1)
y.l=stack(paste0(holopath1,holo1))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,'Ecological Analysis/holo-ccsm-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("Holocene CCSM")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,"Ecological Analysis/holo-ccsm-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
##miroc
rm(valsT1)
y.l=stack(paste0(holopath2,holo2))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,'Ecological Analysis/holo-miroc-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("Holocene MIROC")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,"Ecological Analysis/holo-miroc-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
##mpi
rm(valsT)
rm(valsT1)
y.l=stack(paste0(holopath3,holo3))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,'Ecological Analysis/holo-mpi-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("Holocene MPI")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,"Ecological Analysis/holo-mpi-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
#Last Glacial Maximum
##CCSM
rm(valsT1)
y.l=stack(paste0(lgmpath1,lgm1))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,'Ecological Analysis/lgm-ccsm-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("Last Glacial Maximum CCSM")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,"Ecological Analysis/lgm-ccsm-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
##miroc
rm(valsT)
rm(valsT1)
y.l=stack(paste0(lgmpath2,lgm2))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,'Ecological Analysis/lgm-miroc-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("LGM MIROC")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,"Ecological Analysis/lgm-miroc-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
##mpi
rm(valsT)
rm(valsT1)
y.l=stack(paste0(lgmpath3,lgm3))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,'Ecological Analysis/lgm-mpi-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("LGM MPI")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,"Ecological Analysis/lgm-mpi-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
}
Now, to perform individual iterations of the MVE script.
ssp.plot(ssp=preuss,ssp.text="preussi")
ssp.plot(ssp=reich,ssp.text="reichenowi")
ssp.plot(ssp=genderu,ssp.text="genderuensis")
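The thresholding inside ssp.plot can be shown without any rasters. The sketch below assumes invented distances and uses base R `cut` in place of `raster::reclassify`; the variable names are hypothetical. It keeps the closest 80% of occurrence distances, sets the binary threshold at the mean plus 1.5 standard deviations, and builds the doubling tiers (1.5, 3, 6, 12, 24 standard deviations).

```r
set.seed(3)
# Hypothetical Mahalanobis distances at occurrence points, sorted ascending
d_occ <- sort(rexp(50))

# Keep the closest 80% of occurrences before computing the threshold
cutoff <- round(0.8 * length(d_occ))
d_core <- d_occ[1:cutoff]
nd <- mean(d_core) + 1.5 * sd(d_core)

# Tier breaks double at each step above the mean
breaks <- c(0, mean(d_core) + c(1.5, 3, 6, 12, 24) * sd(d_core), Inf)

# Hypothetical distances for four map cells
d_cells <- c(0.1, nd, nd + 0.01, 50)

# Binary map: 1 inside the threshold, 0 outside
binary <- as.integer(d_cells <= nd)

# Tiered map: 1 is closest to the centroid, 6 is farthest
tier <- cut(d_cells, breaks = breaks, labels = FALSE,
            right = TRUE, include.lowest = TRUE)

print(data.frame(distance = d_cells, binary = binary, tier = tier))
```

Because the tiers are defined from the current occurrence points, the same break values can be reused for the Holocene and Last Glacial Maximum projections, which is how the function above applies them.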
Using these variables, we can look at the occupied niche areas of the populations and see how divergent they are. This will be done using custom scripts from Cooper & Barragan (unpublished), based on the methodology of Warren et al.
In QGIS, I created individual shapefiles of the “regions” that each species inhabits. For each of these regions, I want to create 100 “random” niche models to compare, each model created using random points from within each species’ accessible area. These accessible areas are defined by biogeography, and are an attempt to encompass the geographically accessible area around each species.
nc=ncell(y)
valsT=extract(x=y,y=seq(from=1,to=nc,by=1))
valsT1=as.matrix(valsT)
GISpath='~/Dropbox/GIS/small-af/'
randomizer=function(data,type,sp.text){
# make sure GIS path goes to the M files
x=readShapePoly(paste0(GISpath,sp.text,'.shp'))
dT1=matrix(0,ncol=1,nrow=nc)
nx=nrow(data)
for(i in 1:100){
yy=spsample(x=x,n=nx,type=type)
#Alternate method, not as effective
#yy=randomPoints(mask=x,n=nrow(data),
# p=data[,2:3],excludep=T,
# cellnumbers=F,tryf=5)
yy2=as.data.frame(coordinates(yy))
colnames(yy2)=c("Long","Lat")
yy2$Population=sp.text
yy2=yy2[,c('Population','Long','Lat')]
vals=extract(x=y,y=yy2[,2:3])
vals=na.omit(vals)
vals=unique(vals)
#vals=vals[,-10]
n1=NDquantil(nrow(vals),0.9)
#for(i in 1:ncol(vals)){print(IQR(vals[,i]))}
mve1=cov.mve(vals,quantile.use=n1)
mu1=matrix(mve1$center,nrow=1)
s1=mve1$cov
invs1=solve(s1)
dT1=matrix(0,ncol=1,nrow=nc)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y),ncol=ncol(y),
ext=extent(y),resolution=res(y),vals=dT1)
setwd(paste0(filepath,'Ecological Analysis/random/',sp.text,'/'))
sp1=sp.text
write.csv(yy2,
file=paste0(sp.text,"_random-",i,'.csv'),
quote=F,row.names=F)
writeRaster(q,
filename=paste0(sp.text,"_random-",i),
format='ascii',overwrite=T)
#plot(q)
}
}
randomizer(data=reich,type='random',sp.text="reichenowi")
randomizer(data=preuss,type='random',sp.text="preussi")
randomizer(data=genderu,type='random',sp.text="genderuensis")
Now, to compare niche distributions. First, we must reduce the datasets down to the number of points being used to train the above models.
#restrict to closest 80% of points to centroid for comparisons, just like models
#reichenowi
r.q=raster(paste0(filepath,
"Ecological Analysis/raw-mve/reichenowi.asc"))
reich$r.dist=extract(r.q,reich[,2:3])
hist(reich$r.dist)
reich=reich[order(reich$r.dist),]
r.pt=round(nrow(reich)*0.8)
plot(r.q)
points(reich[1:r.pt,2:3],col="black",pch=19)
#excluded points only; negative indexing avoids plotting point r.pt twice
points(reich[-(1:r.pt),2:3],col="red",pch=19)
reich2=reich[1:r.pt,]
#preussi
r.q=raster(paste0(filepath,
"Ecological Analysis/raw-mve/preussi.asc"))
preuss$r.dist=extract(r.q,preuss[,2:3])
hist(preuss$r.dist)
preuss=preuss[order(preuss$r.dist),]
r.pt=round(nrow(preuss)*0.8)
plot(r.q)
points(preuss[1:r.pt,2:3],col="black",pch=19)
#excluded points only; negative indexing avoids plotting point r.pt twice
points(preuss[-(1:r.pt),2:3],col="red",pch=19)
preuss2=preuss[1:r.pt,]
#genderuensis
r.q=raster(paste0(filepath,
"Ecological Analysis/raw-mve/genderuensis.asc"))
genderu$r.dist=extract(r.q,genderu[,2:3])
colnames(genderu)
## [1] "SUBSPECIES.SCIENTIFIC.NAME"
## [2] "LONGITUDE"
## [3] "LATITUDE"
## [4] "SOURCE"
## [5] "current_30arcsec_annualPET"
## [6] "current_30arcsec_aridityIndexThornthwaite"
## [7] "current_30arcsec_climaticMoistureIndex"
## [8] "current_30arcsec_continentality"
## [9] "current_30arcsec_embergerQ"
## [10] "current_30arcsec_growingDegDays0"
## [11] "current_30arcsec_growingDegDays5"
## [12] "current_30arcsec_maxTempColdest"
## [13] "current_30arcsec_minTempWarmest"
## [14] "current_30arcsec_monthCountByTemp10"
## [15] "current_30arcsec_PETColdestQuarter"
## [16] "current_30arcsec_PETDriestQuarter"
## [17] "current_30arcsec_PETseasonality"
## [18] "current_30arcsec_PETWarmestQuarter"
## [19] "current_30arcsec_PETWettestQuarter"
## [20] "current_30arcsec_thermicityIndex"
## [21] "PC1"
## [22] "PC2"
## [23] "PC3"
## [24] "PC4"
## [25] "PC5"
## [26] "PC6"
## [27] "PC7"
## [28] "PC8"
## [29] "PC9"
## [30] "PC10"
## [31] "PC11"
## [32] "PC12"
## [33] "PC13"
## [34] "PC14"
## [35] "PC15"
## [36] "PC16"
## [37] "r.dist"
hist(genderu$r.dist)
genderu=genderu[order(genderu$r.dist),]
r.pt=round(nrow(genderu)*0.8)
plot(r.q)
points(genderu[1:r.pt,2:3],col="black",pch=19)
#excluded points only; negative indexing avoids plotting point r.pt twice
points(genderu[-(1:r.pt),2:3],col="red",pch=19)
genderu2=genderu[1:r.pt,]
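The three blocks above repeat the same trimming steps for each population. As a minimal sketch (not the code used for the manuscript), the same step could be wrapped in a helper. The function `keep_closest`, its arguments, and the toy data frame are hypothetical names introduced here for illustration only.

```r
# Hypothetical helper: keep the rows closest to the niche centroid.
# `dat` is a data frame, `dist_col` names the column of distances,
# and `prop` is the proportion of rows to retain.
keep_closest=function(dat,dist_col="r.dist",prop=0.8){
  dat=dat[order(dat[[dist_col]]),]
  n.keep=round(nrow(dat)*prop)
  dat[seq_len(n.keep),,drop=FALSE]
}

# Toy data standing in for an occurrence table
toy=data.frame(Population="toy",
               Long=1:10,Lat=10:1,
               r.dist=c(5,1,9,3,7,2,8,4,10,6))
toy2=keep_closest(toy)
print(toy2$r.dist)
```

With a helper like this, each population would need a single call once its `r.dist` column has been extracted.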
And now to perform the tests.
# new filepath
filepath2="prev.filepath/Ecological Analysis/random/"
splist=list.files(filepath2)
#comparisons=matrix(nrow=100,ncol=2,data=NA)
truecomps=-99
truelists=matrix(nrow=100,ncol=1,data=-99)
for(i in 1:length(splist)){
sp=splist[i]
splist2=splist[-i]
null.x=raster(paste0(filepath,
"Ecological Analysis/raw-mve/",sp,".asc"))
comparisons=matrix(nrow=100,ncol=2,data=NA)
comparisons=as.data.frame(comparisons)
compvals=NULL
for(j in 1:length(splist2)){
comparelist=list.files(paste0(filepath2,splist2[j],"/"),
pattern="*.asc")
true2=raster(paste0(filepath,"Ecological Analysis/raw-mve/",
splist2[j],".asc"))
compvals=NULL
for(k in 1:length(comparelist)){
rando=raster(paste0(filepath2,splist2[j],"/",comparelist[k]))
compvals[k]=nicheOverlap(x=null.x,y=rando,stat="D")
}
comparisons[,j]=compvals
colnames(comparisons)[j]=paste0(splist[i],"-",splist2[j])
truecomps=c(truecomps,
nicheOverlap(x=null.x,y=true2,stat="D"))
}
truelists=cbind(truelists,comparisons)
}
truecomps2=t(as.data.frame(truecomps))
colnames(truecomps2)=colnames(truelists)
fullcomps=rbind(truecomps2,truelists)
write.csv(fullcomps,file=paste0(filepath,"Schoener-first-row-true.csv"),
quote=F,row.names=F)
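For reference, the statistic requested with stat="D" is Schoener's D. The sketch below shows the usual definition on plain numeric vectors, assuming non-negative cell values that are rescaled to sum to 1; `schoener_d` is a hypothetical helper written for illustration, and nicheOverlap may treat masking and negative values differently.

```r
# Schoener's D for two vectors of non-negative cell values.
# Each vector is rescaled to sum to 1 before comparison.
schoener_d=function(a,b){
  ok=!is.na(a)&!is.na(b)   # use only cells present in both layers
  pa=a[ok]/sum(a[ok])
  pb=b[ok]/sum(b[ok])
  1-0.5*sum(abs(pa-pb))
}

same=schoener_d(c(1,2,3,4),c(2,4,6,8))   # proportional layers
apart=schoener_d(c(1,0,0,0),c(0,0,0,1))  # no shared cells
print(c(same,apart))
```

D runs from 0 (no overlap) to 1 (identical relative values), which is why the density plots later in this section are drawn on a 0 to 1 axis.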
We can now look at and compare the niche models derived from the MVE envelopes of where these species occur.
x=read.csv(paste0(filepath,"Schoener-first-row-true.csv"))
x=x[,-1]
head(x)
## genderuensis.preussi genderuensis.reichenowi preussi.genderuensis
## 1 0.7773179 0.7232936 0.7773179
## 2 0.7455431 0.7553698 0.7725751
## 3 0.7639776 0.7407890 0.6870270
## 4 0.7612553 0.7607214 0.6787750
## 5 0.7916149 0.7433471 0.7775531
## 6 0.7768061 0.7533292 0.6786293
## preussi.reichenowi reichenowi.genderuensis reichenowi.preussi
## 1 0.7449366 0.7232936 0.7449366
## 2 0.7968239 0.6799918 0.8161258
## 3 0.7899377 0.6628084 0.8365100
## 4 0.7938077 0.5893233 0.7848623
## 5 0.8011549 0.7177177 0.8007086
## 6 0.8098695 0.5764343 0.8129598
We know that the first row holds the “true” comparisons. We can therefore compare these values to the full distribution of the random comparisons.
datax=matrix(data=NA,nrow=600,ncol=2)
datax=as.data.frame(datax)
colnames(datax)=c("ID","Value")
trues=x[1,c(1,2,6)]
datax$ID[1:100]="genderuensis.preussi"
datax$ID[101:200]="genderuensis.reichenowi"
datax$ID[201:300]="preussi.genderuensis"
datax$ID[301:400]="preussi.reichenowi"
datax$ID[401:500]="reichenowi.genderuensis"
datax$ID[501:600]="reichenowi.preussi"
datax$Value[1:100]=x[-1,1]
datax$Value[101:200]=x[-1,2]
datax$Value[201:300]=x[-1,3]
datax$Value[301:400]=x[-1,4]
datax$Value[401:500]=x[-1,5]
datax$Value[501:600]=x[-1,6]
datax$ID=as.factor(datax$ID)
datax$Value=as.numeric(datax$Value)
We have created a new data frame that is easier to manipulate in ggplot to look at the results. We can now go through things iteratively.
gen.preus=datax[which(datax$ID=='genderuensis.preussi'|
datax$ID=='preussi.genderuensis'),]
inter=trues$genderuensis.preussi
a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
print(a+b+b.5+c+d)
gen.preus=datax[which(datax$ID=='genderuensis.reichenowi'|
datax$ID=='reichenowi.genderuensis'),]
inter=trues$genderuensis.reichenowi
a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
print(a+b+b.5+c+d)
gen.preus=datax[which(datax$ID=='reichenowi.preussi'|
datax$ID=='preussi.reichenowi'),]
inter=trues$reichenowi.preussi
a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
print(a+b+b.5+c+d)
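We are going to calculate \(P\) values for the observed overlap values (the test statistics) against these null distributions. Each null distribution is approximated as normal, and the reported \(P\) value is the lower-tail probability of the observed value under that approximation.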
dist=unique(datax$ID)
trues=x[1,]
for(i in 1:length(dist)){
distx=dist[i]
print(paste0("Testing ",distx))
datadist=datax[which(datax$ID==distx),]
xbar=trues[,which(colnames(trues)==distx)]
mu=mean(datadist$Value)
sigma=sd(datadist$Value)
n=nrow(datadist)
#z-score of the single observed value against the null distribution
#(not used below)
z=(xbar-mu)/sigma
lowcrit=qnorm(p=0.025,mean=mu,sd=sigma)
hicrit=qnorm(p=0.975,mean=mu,sd=sigma)
if(xbar<lowcrit){
print("Test statistic below low critical value.")
print(paste0(lowcrit,"; statistic = ",xbar))
}
if(xbar>hicrit){
print("Test statistic above high critical value.")
print(paste0(hicrit,"; statistic = ",xbar))
}
print(paste0("P value for ",distx,
" = ",pnorm(xbar,
mean=mu,sd=sigma)))
}
## [1] "Testing genderuensis.preussi"
## [1] "P value for genderuensis.preussi = 0.706751492086142"
## [1] "Testing genderuensis.reichenowi"
## [1] "Test statistic below low critical value."
## [1] "0.736680233179814; statistic = 0.723293582149787"
## [1] "P value for genderuensis.reichenowi = 0.000223524531141092"
## [1] "Testing preussi.genderuensis"
## [1] "P value for preussi.genderuensis = 0.692896339416361"
## [1] "Testing preussi.reichenowi"
## [1] "Test statistic below low critical value."
## [1] "0.783994797585943; statistic = 0.744936612710812"
## [1] "P value for preussi.reichenowi = 5.72761756781268e-11"
## [1] "Testing reichenowi.genderuensis"
## [1] "P value for reichenowi.genderuensis = 0.918497156199243"
## [1] "Testing reichenowi.preussi"
## [1] "P value for reichenowi.preussi = 0.0338531774097366"
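The P values above rely on a normal approximation of each null distribution. As a hedged alternative sketch (not the approach used for the manuscript), a rank-based P value can be read directly from the 100 null values without assuming normality. The helper `empirical_p` and the toy null distribution are hypothetical and for illustration only.

```r
# Hypothetical helper: proportion of null values at or below the
# observed value, counting the observed value itself (+1 correction).
empirical_p=function(observed,null){
  null=null[!is.na(null)]
  (sum(null<=observed)+1)/(length(null)+1)
}

set.seed(1)
null.d=rnorm(100,mean=0.78,sd=0.02)  # toy null distribution
p.low=empirical_p(0.70,null.d)        # observed far below the null
p.mid=empirical_p(median(null.d),null.d)
print(c(p.low,p.mid))
```

With 100 null replicates the smallest attainable value is 1/101, so this approach cannot report the very small P values that a fitted normal curve can.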
One last thing, visualizing the PC plots from the ENVIREM extracts.
x=read.csv(paste0(filepath,
"Ecological Analysis/envirem_extracts_PCA.csv"))
a=ggplot(data=x,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()
print(a+b+c)
a=ggplot(data=x,aes(x=PC3,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()
print(a+b+c)
for(i in 4:19){
x[,i]=as.numeric(x[,i])
nombre=colnames(x)[i]
a=ggplot(data=x,aes(y=x[,i],x=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_boxplot(notch=T)
c=theme_classic()
d=ggtitle(paste(nombre))
print(a+b+c+d)
}
## notch went outside hinges. Try setting notch=FALSE.
Lastly, I am going to do t-tests comparing the two Cameroonian populations to each other to see if they differ significantly in any aspects.
x2=x[x$SUBSPECIES.SCIENTIFIC.NAME=="genderuensis"|
x$SUBSPECIES.SCIENTIFIC.NAME=="preussi",1:19]
rda.x=rda(x2[,-c(1:3)],scale=T)
rda.x.data=rda.x$CA$u
x3=cbind(x2,rda.x.data)
a=ggplot(x3,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()
print(a+b+c)
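Now for the iterative tests, run one variable at a time. I’m assuming random sampling but unequal sample sizes between the two populations; t.test defaults to Welch’s test, which does not assume equal variances.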
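#note: columns 4:19 begin at SOURCE (column 4) and end at
#PETWettestQuarter (column 19), as the printed names below show
for(i in 4:19){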
names=unique(x3$SUBSPECIES.SCIENTIFIC.NAME)
pop1=x3[x3$SUBSPECIES.SCIENTIFIC.NAME==names[1],i]
pop2=x3[x3$SUBSPECIES.SCIENTIFIC.NAME==names[2],i]
print(colnames(x3)[i])
z=t.test(x=pop1,y=pop2,alternative="two.sided",conf.level=0.95)
print(z)
z2=wilcox.test(x=pop1,y=pop2,alternative="two.sided",conf.level=0.95)
print(z2)
}
## [1] "SOURCE"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 4.5826, df = 9, p-value = 0.001323
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.3544498 1.0455502
## sample estimates:
## mean of x mean of y
## 1.7 1.0
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 229.5, p-value = 2.167e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_annualPET"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 7.658, df = 22.159, p-value = 1.153e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 268.4117 467.6660
## sample estimates:
## mean of x mean of y
## 1647.304 1279.265
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 265, p-value = 1.091e-07
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_aridityIndexThornthwaite"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 2.9329, df = 24.676, p-value = 0.007148
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4.474285 25.623938
## sample estimates:
## mean of x mean of y
## 69.44800 54.39889
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 221, p-value = 0.002403
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_climaticMoistureIndex"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = -9.8113, df = 14.521, p-value = 8.568e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.5867430 -0.3768125
## sample estimates:
## mean of x mean of y
## -0.0340000 0.4477778
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 0, p-value = 4.157e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_continentality"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 1.7608, df = 10.013, p-value = 0.1087
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1496194 1.2781379
## sample estimates:
## mean of x mean of y
## 3.105000 2.540741
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 192, p-value = 0.05255
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_embergerQ"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = -7.7642, df = 24.704, p-value = 4.38e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -443.2483 -257.3050
## sample estimates:
## mean of x mean of y
## 346.0630 696.3396
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 11, p-value = 1.114e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_growingDegDays0"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 5.7098, df = 28.498, p-value = 3.765e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 16239.61 34388.26
## sample estimates:
## mean of x mean of y
## 97932.60 72618.67
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 255, p-value = 3.778e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_growingDegDays5"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 5.6426, df = 29.423, p-value = 4.059e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 16280.61 34774.81
## sample estimates:
## mean of x mean of y
## 97932.60 72404.89
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 255, p-value = 3.778e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_maxTempColdest"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 5.8465, df = 19.613, p-value = 1.098e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 45.12414 95.28327
## sample estimates:
## mean of x mean of y
## 262.5000 192.2963
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 257, p-value = 3.232e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_minTempWarmest"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 4.5908, df = 22.989, p-value = 0.0001292
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 29.74997 78.55374
## sample estimates:
## mean of x mean of y
## 183.3000 129.1481
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 243, p-value = 0.0002357
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_monthCountByTemp10"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 1.776, df = 26, p-value = 0.08745
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1807469 2.4770432
## sample estimates:
## mean of x mean of y
## 12.00000 10.85185
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 150, p-value = 0.2949
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_PETColdestQuarter"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 5.4017, df = 13.438, p-value = 0.0001076
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 15.70166 36.51753
## sample estimates:
## mean of x mean of y
## 123.46700 97.35741
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 266, p-value = 6.89e-08
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_PETDriestQuarter"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 6.9832, df = 19.668, p-value = 9.776e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 25.10025 46.51606
## sample estimates:
## mean of x mean of y
## 148.0500 112.2419
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 260, p-value = 7.981e-07
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_PETseasonality"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 5.4913, df = 11.997, p-value = 0.0001383
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 308.3805 714.0727
## sample estimates:
## mean of x mean of y
## 1476.4240 965.1974
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 247, p-value = 2.798e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_PETWarmestQuarter"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 7.0929, df = 22.418, p-value = 3.675e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 23.82179 43.47814
## sample estimates:
## mean of x mean of y
## 152.847 119.197
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 265, p-value = 1.091e-07
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_PETWettestQuarter"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 9.1267, df = 26.74, p-value = 1.061e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 18.20741 28.77451
## sample estimates:
## mean of x mean of y
## 122.21800 98.72704
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 266, p-value = 6.89e-08
## alternative hypothesis: true location shift is not equal to 0
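The printed output above is long. As a minimal sketch on simulated data (the data frame `toy`, its columns, and the group labels are invented for illustration), the same two tests can be run in a loop with only the P values collected into one table for easier comparison.

```r
set.seed(42)
# Toy data: two groups of unequal size and two variables
toy=data.frame(pop=rep(c("a","b"),times=c(10,27)),
               var1=c(rnorm(10,5),rnorm(27,3)),
               var2=c(rnorm(10,0),rnorm(27,0)))
vars=c("var1","var2")
res=data.frame(variable=vars,t.p=NA,w.p=NA)
for(i in seq_along(vars)){
  p1=toy[toy$pop=="a",vars[i]]
  p2=toy[toy$pop=="b",vars[i]]
  # Welch t-test (default) and Wilcoxon rank sum test, both two-sided
  res$t.p[i]=t.test(p1,p2,alternative="two.sided")$p.value
  res$w.p[i]=wilcox.test(p1,p2,alternative="two.sided")$p.value
}
print(res)
```

A table like this also makes it straightforward to apply a multiple-comparison adjustment afterwards, should one be wanted.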
Example of divergence:
a=ggplot(data=x,aes(x=current_30arcsec_PETWettestQuarter,
y=current_30arcsec_embergerQ,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()
print(a+b+c)
Modeling for past climates
In the above loop code, I projected the MVEs of species occurrence into past climates for the Holocene and the Last Glacial Maximum. We can average the three scenarios together to create a “best guess” of the distance of each grid cell to the environmental centroid of a given species.
holo.ccsm=paste0(filepath,"Ecological Analysis/holo-ccsm-mve/",
list.files(paste0(filepath,
"Ecological Analysis/holo-ccsm-mve/"),
pattern="*.asc"))
holo.miroc=paste0(filepath,"Ecological Analysis/holo-miroc-mve/",
list.files(paste0(filepath,
"Ecological Analysis/holo-miroc-mve/"),
pattern="*.asc"))
holo.mpi=paste0(filepath,"Ecological Analysis/holo-mpi-mve/",
list.files(paste0(filepath,
"Ecological Analysis/holo-mpi-mve/"),
pattern="*.asc"))
lgm.ccsm=paste0(filepath,"Ecological Analysis/lgm-ccsm-mve/",
list.files(paste0(filepath,
"Ecological Analysis/lgm-ccsm-mve/"),
pattern="*.asc"))
lgm.miroc=paste0(filepath,"Ecological Analysis/lgm-miroc-mve/",
list.files(paste0(filepath,
"Ecological Analysis/lgm-miroc-mve/"),
pattern="*.asc"))
lgm.mpi=paste0(filepath,"Ecological Analysis/lgm-mpi-mve/",
list.files(paste0(filepath,
"Ecological Analysis/lgm-mpi-mve/"),
pattern="*.asc"))
We now have a list of files for each scenario, sorted in the same order in every case. Next, we average the three scenarios together for each taxon and save the results.
#Plot preussi
#avg holo
holo=stack(holo.ccsm[4],holo.miroc[4],holo.mpi[4])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/holo-all-avg/preussi-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
holo=stack(holo.ccsm[3],holo.miroc[3],holo.mpi[3])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/holo-all-avg/preussi-threshold-avg.asc"),
overwrite=T)
#Plot genderuensis
#avg holo
holo=stack(holo.ccsm[2],holo.miroc[2],holo.mpi[2])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/holo-all-avg/genderuensis-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
holo=stack(holo.ccsm[1],holo.miroc[1],holo.mpi[1])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/holo-all-avg/genderuensis-threshold-avg.asc"),
overwrite=T)
#Plot reichenowi
#avg holo
holo=stack(holo.ccsm[6],holo.miroc[6],holo.mpi[6])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/holo-all-avg/reichenowi-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
holo=stack(holo.ccsm[5],holo.miroc[5],holo.mpi[5])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/holo-all-avg/reichenowi-threshold-avg.asc"),
overwrite=T)
#Plot preussi
#avg lgm
lgm=stack(lgm.ccsm[4],lgm.miroc[4],lgm.mpi[4])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/lgm-all-avg/preussi-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
lgm=stack(lgm.ccsm[3],lgm.miroc[3],lgm.mpi[3])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/lgm-all-avg/preussi-threshold-avg.asc"),
overwrite=T)
#Plot genderuensis
#avg lgm
lgm=stack(lgm.ccsm[2],lgm.miroc[2],lgm.mpi[2])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/lgm-all-avg/genderuensis-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
lgm=stack(lgm.ccsm[1],lgm.miroc[1],lgm.mpi[1])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/lgm-all-avg/genderuensis-threshold-avg.asc"),
overwrite=T)
#Plot reichenowi
#avg lgm
lgm=stack(lgm.ccsm[6],lgm.miroc[6],lgm.mpi[6])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/lgm-all-avg/reichenowi-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
lgm=stack(lgm.ccsm[5],lgm.miroc[5],lgm.mpi[5])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/lgm-all-avg/reichenowi-threshold-avg.asc"),
overwrite=T)
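The blocks above select layers by position in the sorted file list, which depends on how the file names sort on a given system. As a hedged sketch, layers could instead be selected by name. The helper `pick_layer` is hypothetical; the file names follow the `<taxon>.asc` and `<taxon>_threshold.asc` pattern written by the modeling function, and here they are typed in by hand rather than read from disk.

```r
# Hypothetical file names standing in for list.files() output
files=c("genderuensis.asc","genderuensis_threshold.asc",
        "preussi.asc","preussi_threshold.asc",
        "reichenowi.asc","reichenowi_threshold.asc")

# Hypothetical helper: pick one taxon's raw or thresholded layer by name
pick_layer=function(files,taxon,threshold=FALSE){
  suffix=if(threshold) "_threshold.asc" else ".asc"
  files[basename(files)==paste0(taxon,suffix)]
}

raw.preussi=pick_layer(files,"preussi")
thr.preussi=pick_layer(files,"preussi",threshold=TRUE)
print(c(raw.preussi,thr.preussi))
```

Because the match is made on `basename()`, the same helper would work on full paths such as those stored in `holo.ccsm` or `lgm.mpi`.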
Each subspecies tells us something about the colonization path across Africa. We can similarly average these scenarios together to understand where, exactly, the species most likely crossed Africa.
holo=paste0(filepath,"Ecological Analysis/holo-all-avg/",
list.files(paste0(filepath,
"Ecological Analysis/holo-all-avg/")))
lgm=paste0(filepath,"Ecological Analysis/lgm-all-avg/",
list.files(paste0(filepath,
"Ecological Analysis/lgm-all-avg/")))
#Average all occurrence
lgm2=stack(holo[1],holo[3],holo[5])
y=mean(lgm2)
plot(y)
writeRaster(y,paste0(filepath,"Ecological Analysis/holo-all-avg/holo-avg.asc"))
#Average all threshold
lgm2=stack(holo[2],holo[4],holo[6])
y=mean(lgm2)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/holo-all-avg/holo-threshold-avg.asc"))
#Average all occurrence
lgm2=stack(lgm[1],lgm[3],lgm[5])
y=mean(lgm2)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/lgm-all-avg/all-avg.asc"))
#Average all threshold
lgm2=stack(lgm[2],lgm[4],lgm[6])
y=mean(lgm2)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/lgm-all-avg/all-threshold-avg.asc"))
Per the morphological data, the following males are outliers in the genderuensis dataset: RMCA 75-3-A-438 and MNMH 1971.637.
34 genderuensis RMCA 75-3-A-438 Adamawa Male -0.006837937
25 genderuensis MNMH 1971.637 Yaounde Male 0.002249162
35 genderuensis RMCA 75-3-A-451 Adamawa Male 0.016450666
44 genderuensis ZMB 75/80 Yaounde Male 0.019065758
33 genderuensis NHMUK 1940.2.8.63 Tibati Male 0.021588830
31 genderuensis NHMUK 1922.11.25.216 Tibati Male 0.022491253
x=read.csv(paste0(filepath,
"Ecological Analysis/envirem_extracts_PCA.csv"))
colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
names(colorset)=c("reichenowi","preussi",
"genderuensis","parvirostris","Unknown")
colScale=scale_color_manual(name="grp",values=colorset)
a=ggplot(x,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME,shape=SOURCE))
b=theme(panel.background = element_rect(fill="white",color = "grey50"),
axis.title.x = element_text(size=20),
axis.title.y = element_text(size=20),
axis.text.x = element_text(size=15),
axis.text.y = element_text(size=15),
legend.title = element_blank(),
legend.text = element_text(size=15))
c=geom_point(size=1.5)
d=stat_ellipse()
e=colScale
plot1=a+b+c+d+e
print(plot1)
## Too few points to calculate an ellipse
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Too few points to calculate an ellipse
## Warning: Removed 2 row(s) containing missing values (geom_path).
Which environmental points are the outliers? For C. genderuensis, it looks like it is two specimens and an eBird record that have the most overlap:
x2=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="genderuensis"),]
x2=x2[order(x2$PC1,decreasing = T),]
x2[c(1:3,nrow(x2)),c(1,3,2,4,21,22)]
## SUBSPECIES.SCIENTIFIC.NAME LATITUDE LONGITUDE SOURCE PC1
## 127 genderuensis 7.256281 12.06172 Specimen -0.02677265
## 35 genderuensis 3.888752 11.50979 eBird -0.07307286
## 128 genderuensis 2.836032 11.16286 Specimen -0.07366380
## 36 genderuensis 8.210441 13.81760 eBird -0.22744424
## PC2
## 127 0.01861294
## 35 0.16000832
## 128 0.17733891
## 36 0.05241527
The leftmost point (lowest PC1) is in Benoue National Park and may come from a general, park-wide checklist. The other points are the supposed location of Genderu Mountain (the type locality), Yaounde, and Ebolowa, to the south of Yaounde. Since the location of Genderu Mountain is assumed from the notes of a single specimen and the Benoue locality is possibly park-wide, I am removing these two points.
Reload x dataframe from original extract .csv, and then remove rows of interest.
xx=x[-c(36,127),]
ext=xx[,-c(1:4)]
#Perform PCA of environmental data
rda.x=rda(ext,scale=T)
rda.x.data=rda.x$CA$u
eigs=rda.x$CA$eig
#Proportion of variance explained by each axis
w=eigs/sum(eigs)
plot(x=1:length(w),y=w,pch=19,main="PCA Eigenvalues")
Redefining x variable to be new dataset without those two points here.
x=cbind(xx,rda.x.data)
colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
names(colorset)=c("reichenowi","preussi",
"genderuensis","parvirostris","Unknown")
colScale=scale_color_manual(name="grp",values=colorset)
a=ggplot(x,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME,shape=SOURCE))
b=theme(panel.background = element_rect(fill="white",color = "grey50"),
axis.title.x = element_text(size=20),
axis.title.y = element_text(size=20),
axis.text.x = element_text(size=15),
axis.text.y = element_text(size=15),
legend.title = element_blank(),
legend.text = element_text(size=15))
c=geom_point(size=1.5)
d=stat_ellipse()
e=colScale
plot1=a+b+c+d+e
print(plot1)
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Warning: Removed 2 row(s) containing missing values (geom_path).
contrib=rda.x$CA$v
#Isolate most important PC's
contrib2=contrib[,1:3]
xx=rowSums(abs(contrib2))
print(xx[order(xx,decreasing = T)])
## current_30arcsec_continentality
## 1.0071651
## current_30arcsec_PETseasonality
## 0.9241937
## current_30arcsec_minTempWarmest
## 0.8017752
## current_30arcsec_PETWettestQuarter
## 0.7830133
## current_30arcsec_embergerQ
## 0.6886587
## current_30arcsec_aridityIndexThornthwaite
## 0.6650488
## current_30arcsec_thermicityIndex
## 0.6219764
## current_30arcsec_climaticMoistureIndex
## 0.6175901
## current_30arcsec_maxTempColdest
## 0.6173110
## current_30arcsec_growingDegDays0
## 0.6161763
## current_30arcsec_growingDegDays5
## 0.6113469
## current_30arcsec_PETColdestQuarter
## 0.6033732
## current_30arcsec_annualPET
## 0.6008060
## current_30arcsec_PETWarmestQuarter
## 0.4926509
## current_30arcsec_PETDriestQuarter
## 0.4639028
## current_30arcsec_monthCountByTemp10
## 0.3599580
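The ranking above sums the absolute loadings of each variable across the first three axes. The sketch below repeats that idea on simulated data with base R's `prcomp`, assuming its `rotation` matrix plays the same role as `rda.x$CA$v` (the two should agree up to sign for a scaled PCA). All object names and data here are invented for illustration.

```r
set.seed(7)
toy=data.frame(a=rnorm(50),b=rnorm(50),c=rnorm(50))
toy$d=toy$a+rnorm(50,sd=0.1)    # d is nearly a copy of a

pca=prcomp(toy,scale.=TRUE)
prop.var=pca$sdev^2/sum(pca$sdev^2)   # proportion of variance per PC
load3=pca$rotation[,1:3]              # loadings on the first three PCs
contrib.sum=rowSums(abs(load3))       # summed absolute loadings
print(sort(contrib.sum,decreasing=TRUE))
```

Note that this sum weights the three axes equally, regardless of how much variance each one explains.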
From above, we get that the most important variables for the first three PC’s are:
Everything after layer 5 plateaus in terms of its contribution. I am repeating the same steps for removing correlated layers here as I did in the other part of the analysis. The code is executed but hidden. Because we have fewer points this iteration, we will use the first six layers.
We can also look at the correlation between layers to determine what should be removed:
I am modeling this section with the same data layers as the previous modeling iteration, due in part to the similarity of these layers in their importance.
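Since the correlation code itself is hidden, the following is only a sketch of how a pairwise correlation screen might look, not the procedure actually run. The simulated layers, the use of Pearson correlation, and the 0.9 cutoff are all assumptions made for illustration.

```r
set.seed(3)
toy=data.frame(v1=rnorm(100))
toy$v2=toy$v1+rnorm(100,sd=0.05)  # strongly correlated with v1
toy$v3=rnorm(100)                 # independent of the others

cors=cor(toy,method="pearson")
cors[upper.tri(cors,diag=TRUE)]=NA       # keep each pair once
high=which(abs(cors)>0.9,arr.ind=TRUE)   # 0.9 is an assumed cutoff
high.pairs=data.frame(layer1=rownames(cors)[high[,"row"]],
                      layer2=colnames(cors)[high[,"col"]],
                      r=cors[high])
print(high.pairs)
```

Each row of `high.pairs` flags one pair of layers from which a single member could be dropped.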
The following section outlines the procedure for creating ecological niche models (ENMs) for these taxa, and subsequently testing for niche divergence. Additionally, we will be projecting these layers through time to see the likely pathway of colonization for the species.
Current models are restricted to the following area:
## Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read
The species does not currently occur in Angola, so we are excluding it from the training region. Past projections will cover broader areas, in part because we have no way of knowing for certain if local extinction has occurred.
We need to reduce the current environmental layers to the training area. This will create more accurate models and will reduce the strain on our processors at the same time. The extracts are from the 30 arc-second data, but the models are built at 2.5 arc-minutes for computational reasons.
# rasterpath="path/to/envirem_africa/Africa_current_2.5arcmin_generic/"
y=list.files(rasterpath,pattern="*.bil")[-c(1,3,6,7,10,11,13,14,16)]
Note that we do not have any aux files at the present time, so I do not need to omit them from the file list.
This code is based on code from Dr. Jorge Soberón. It will create models for all time periods at the same time.
First, we must define the function that calculates the distance from a point p to an ellipse with centroid m and matrix s. The parameters are: p, the test point; m, the ellipse centroid; and s, the inverse of the covariance matrix of the ellipse.
#MAJA function
maja=function(p,m,s)((p-m)%*%s%*%t(p-m))^0.5
#Quantile function: number of points (out of nD) retained at proportion 'level'
NDquantil=function(nD,level){
return(round(nD*level))
}
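As a quick sanity check on `maja`, the sketch below compares it with base R's `mahalanobis()` on simulated data. The simulated matrix is an illustrative assumption, not study data. `mahalanobis()` returns squared distances, so its square root should match `maja`, and calling it on a whole matrix avoids looping row by row.

```r
# Same definition as in the text: distance from point p to centroid m,
# where s is the inverse of the covariance matrix.
maja <- function(p, m, s) ((p - m) %*% s %*% t(p - m))^0.5

set.seed(42)                           # illustrative data only
vals <- matrix(rnorm(300), ncol = 3)   # 100 simulated points, 3 variables
m <- matrix(colMeans(vals), nrow = 1)  # centroid as a 1 x 3 matrix
s_inv <- solve(cov(vals))              # inverse covariance matrix

# Row-by-row distances, as in the loops used in this appendix
d_loop <- sapply(seq_len(nrow(vals)), function(j) maja(vals[j, ], m, s_inv))

# Vectorized equivalent: mahalanobis() gives squared distances
d_vec <- sqrt(mahalanobis(vals, center = as.numeric(m), cov = s_inv,
                          inverted = TRUE))

maja_matches <- isTRUE(all.equal(d_loop, d_vec))
print(maja_matches)
```

If the two agree, the vectorized call could stand in for the per-cell loops when projecting onto large rasters, although I have not benchmarked it on the layers used here.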
These minimum volume ellipsoids are less sensitive to point density, and do not rely on pseudoabsence data for determining where species do not occur.
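To make the idea concrete, here is a minimal sketch of fitting a minimum volume ellipsoid with `MASS::cov.mve` on simulated data. The variable names `temp` and `precip`, the sample size, and the planted outliers are all hypothetical; only the 0.9 retention rule mirrors the `NDquantil` call used in this appendix.

```r
library(MASS)  # provides cov.mve()

set.seed(1)    # simulated values, not study data
env <- cbind(temp = rnorm(50, 20, 2), precip = rnorm(50, 100, 15))
env[1:3, "temp"] <- env[1:3, "temp"] + 30   # plant three gross outliers

# Retain 90% of the points, the same rule as NDquantil(nD, 0.9)
n_keep <- round(nrow(env) * 0.9)
mve <- cov.mve(env, quantile.used = n_keep)

print(mve$center)     # robust centroid of the ellipsoid
print(mve$cov)        # robust covariance matrix
print(colMeans(env))  # classical mean, for comparison
```

The robust centroid and covariance are the two quantities that feed the distance function defined above.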
genderu=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="genderuensis"),]
reich=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="reichenowi"),]
preuss=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="preussi"),]
y1=stack(paste0(rasterpath,y))
y=y1
The only thing that is different from the previous look at correlation is that seasonality and continentality are correlated. These are also the two most important parts from the PC; I am keeping both of them here, as they passed the previous tests of correlation and this is a coarser dataset.
#Create function for individual plot formation
ssp.plot=function(ssp,ssp.text){
vals=extract(x=y,y=ssp[,2:3])
vals=na.omit(vals)
vals=unique(vals)
#vals=vals[,-10]
n1=NDquantil(nrow(vals),0.9)
#for(i in 1:ncol(vals)){print(IQR(vals[,i]))}
mve1=cov.mve(vals,quantile.used=n1)
nc=ncell(y)
mu1=matrix(mve1$center,nrow=1)
s1=mve1$cov
invs1=solve(s1)
dT1=matrix(0,ncol=1,nrow=nc)
#Load values for current time period
valsT=extract(x=y,y=seq(from=1,to=nc,by=1))
#Create current models
valsT1=as.matrix(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y),ncol=ncol(y),
ext=extent(y),resolution=res(y),vals=dT1)
setwd(paste0(filepath,
'Ecological Analysis/no-out/raw-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
plot(q)
ext=extract(x=q,y=ssp[,2:3])
ext2=na.omit(ext)
#Remove the furthest 20% to reflect issues with plotting in eBird
#threshold based on these values
ext2=ext2[order(ext2)]
cutoff=round(0.8*length(ext2))
ext3=ext2[1:cutoff]
##The following sets binary presence to everything above 1.5 sd below mean of occurrence
#n=max(ext2)
#ND=(round(n*0.95))
#m=c(NA,NA,NA,0,ND,1,ND,Inf,0)
#m=matrix(m,ncol=3,byrow=T)
#rc=reclassify(q,m)
#y2=y[which(ext>ND),]
#Everything up to 1.5 sd above the mean included
#ext2
sdext=sd(ext3)
mext=mean(ext3)
ND=mext+1.5*sdext
m=c(NA,NA,NA,0,ND,1,ND,Inf,0)
#Current threshold
##Used only here; hierarchical for other parts
m=matrix(m,ncol=3,byrow=T)
rc=reclassify(q,m)
y2=y[which(ext>ND),]
#Create color threshold for past models
#color change for every standard deviation
#New threshold on current conditions, then hierarchical
#Created here, executed further down
m2=m
m=c(NA,NA,NA,
0,(mext+(1.5*sdext)),1,
(mext+(1.5*sdext)),(mext+(3*sdext)),2,
(mext+(3*sdext)),(mext+(6*sdext)),3,
(mext+(6*sdext)),(mext+(12*sdext)),4,
(mext+(12*sdext)),(mext+(24*sdext)),5,
(mext+(24*sdext)),Inf,6)
m=matrix(m,ncol=3,byrow=T)
species=ssp.text
if(nrow(y2)!=0){
setwd(paste0(filepath,
"Ecological Analysis/no-out/threshold-mve/"))
write.csv(y2,file=paste0(species,'_out.csv'),quote=F,row.names=F)
}
pathway=paste0(filepath,
"Ecological Analysis/no-out/threshold-mve/",
species,".asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
points(ssp[,2:3],pch=19,col="black")
#threshold classify tier
thresh=reclassify(q,m)
pathway=paste0(filepath,
"Ecological Analysis/no-out/threshold-mve/",
species,"-tier.asc",sep="")
writeRaster(thresh,pathway,overwrite=T)
plot(thresh)
rm(thresh)
#Create color bands of how far it is from center
#Holocene
##CCSM
rm(valsT1)
y.l=stack(paste0(holopath1,holo1))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,
'Ecological Analysis/no-out/holo-ccsm-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("Holocene CCSM")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,
"Ecological Analysis/no-out/holo-ccsm-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
##miroc
rm(valsT1)
y.l=stack(paste0(holopath2,holo2))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,
'Ecological Analysis/no-out/holo-miroc-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("Holocene MIROC")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,
"Ecological Analysis/no-out/holo-miroc-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
##mpi
rm(valsT)
rm(valsT1)
y.l=stack(paste0(holopath3,holo3))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,
'Ecological Analysis/no-out/holo-mpi-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("Holocene MPI")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,
"Ecological Analysis/no-out/holo-mpi-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
#Last Glacial Maximum
##CCSM
rm(valsT1)
y.l=stack(paste0(lgmpath1,lgm1))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,
'Ecological Analysis/no-out/lgm-ccsm-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("Last Glacial Maximum CCSM")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,
"Ecological Analysis/no-out/lgm-ccsm-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
##miroc
rm(valsT)
rm(valsT1)
y.l=stack(paste0(lgmpath2,lgm2))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,
'Ecological Analysis/no-out/lgm-miroc-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("LGM MIROC")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,
"Ecological Analysis/no-out/lgm-miroc-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
##mpi
rm(valsT)
rm(valsT1)
y.l=stack(paste0(lgmpath3,lgm3))
nc=ncell(y.l)
valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
dT1=matrix(0,ncol=1,nrow=nc)
valsT1=as.matrix(valsT)
rm(valsT)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
ext=extent(y.l),resolution=res(y.l),vals=dT1)
setwd(paste0(filepath,
'Ecological Analysis/no-out/lgm-mpi-mve/'))
#sp1=strsplit(sp,'[.]')[[1]][1]
sp1=ssp.text
writeRaster(q,filename=sp1,format='ascii',overwrite=T)
print("LGM MPI")
plot(q)
#Threshold on current conditions
rc=reclassify(q,m)
rm(q)
species=ssp.text
pathway=paste0(filepath,
"Ecological Analysis/no-out/lgm-mpi-mve/",
species,"_threshold.asc",sep="")
writeRaster(rc,pathway,overwrite=T)
plot(rc)
#points(ssp[,2:3],pch=19,col="black")
}
Now, to perform individual iterations of the MVE script.
ssp.plot(ssp=preuss,ssp.text="preussi")
ssp.plot(ssp=reich,ssp.text="reichenowi")
ssp.plot(ssp=genderu,ssp.text="genderuensis")
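The tiered thresholds built inside `ssp.plot` can be illustrated on plain numbers. The sketch below uses base R's `cut()` in place of `reclassify()`, with hypothetical distances; it is meant only to show how the mean and standard deviation of the occurrence distances define the tier boundaries.

```r
# Hypothetical distances of occurrence points to the niche centroid
occ_dist <- c(0.8, 1.1, 1.3, 1.6, 2.0, 2.4)
mext <- mean(occ_dist)
sdext <- sd(occ_dist)

# Tier boundaries: mean + 1.5, 3, 6, 12 and 24 standard deviations
breaks <- c(0, mext + c(1.5, 3, 6, 12, 24) * sdext, Inf)

# Hypothetical distances for a handful of grid cells
cell_dist <- c(0.5, 2.0, 4.0, 10.0, 100.0)

# Intervals are open on the left and closed on the right: (from, to]
tier <- cut(cell_dist, breaks = breaks, labels = FALSE, right = TRUE)
print(data.frame(cell_dist, tier))
```

Tier 1 corresponds to the binary presence threshold, and higher tiers mark cells progressively farther from the centroid.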
Using these variables, we can look at the occupied niche areas of the populations and see how divergent they are. This will be done using custom scripts from Cooper & Barragan (unpublished), based on the methodology of Warren et al.
In QGIS, I created individual shapefiles of the “regions” that each species inhabits. For each of these regions, I want to create 100 “random” niche models to compare, each model created using random points from within each species’ accessible area. These accessible areas are defined by biogeography, and are an attempt to encompass the geographically accessible area around each species.
nc=ncell(y)
valsT=extract(x=y,y=seq(from=1,to=nc,by=1))
valsT1=as.matrix(valsT)
randomizer=function(data,type,sp.text){
# set to GISpath
x=readShapePoly(paste0(GISpath,sp.text,'.shp'))
dT1=matrix(0,ncol=1,nrow=nc)
nx=nrow(data)
for(i in 1:100){
yy=spsample(x=x,n=nx,type=type)
#Alternate method, not as effective
#yy=randomPoints(mask=x,n=nrow(data),
# p=data[,2:3],excludep=T,
# cellnumbers=F,tryf=5)
yy2=as.data.frame(coordinates(yy))
colnames(yy2)=c("Long","Lat")
yy2$Population=sp.text
yy2=yy2[,c('Population','Long','Lat')]
vals=extract(x=y,y=yy2[,2:3])
vals=na.omit(vals)
vals=unique(vals)
#vals=vals[,-10]
n1=NDquantil(nrow(vals),0.9)
#for(i in 1:ncol(vals)){print(IQR(vals[,i]))}
mve1=cov.mve(vals,quantile.used=n1)
mu1=matrix(mve1$center,nrow=1)
s1=mve1$cov
invs1=solve(s1)
dT1=matrix(0,ncol=1,nrow=nc)
mu2=as.matrix(mu1)
invs2=as.matrix(invs1)
for(j in 1:nrow(valsT1)){
dT1[j,1]=maja(valsT1[j,],mu2,invs2)
}
q=raster(nrow=nrow(y),ncol=ncol(y),
ext=extent(y),resolution=res(y),vals=dT1)
setwd(paste0(filepath,
'Ecological Analysis/no-out/random/',sp.text,'/'))
sp1=sp.text
write.csv(yy2,file=paste0(sp.text,"_random-",i,'.csv'),quote=F,row.names=F)
writeRaster(q,filename=paste0(sp.text,"_random-",i),format='ascii',overwrite=T)
#plot(q)
}
}
randomizer(data=reich,type='random',sp.text="reichenowi")
randomizer(data=preuss,type='random',sp.text="preussi")
randomizer(data=genderu,type='random',sp.text="genderuensis")
Now, to compare niche distributions. First, we must reduce the datasets down to the number of points being used to train the above models.
#restrict to closest 80% of points to centroid for comparisons, just like models
#reichenowi
r.q=raster(paste0(filepath,
"Ecological Analysis/no-out/raw-mve/reichenowi.asc"))
reich$r.dist=extract(r.q,reich[,2:3])
hist(reich$r.dist)
reich=reich[order(reich$r.dist),]
r.pt=round(nrow(reich)*0.8)
plot(r.q)
points(reich[1:r.pt,2:3],col="black",pch=19)
points(reich[(r.pt+1):nrow(reich),2:3],col="red",pch=19)
reich2=reich[1:r.pt,]
#preussi
r.q=raster(paste0(filepath,
"Ecological Analysis/no-out/raw-mve/preussi.asc"))
preuss$r.dist=extract(r.q,preuss[,2:3])
hist(preuss$r.dist)
preuss=preuss[order(preuss$r.dist),]
r.pt=round(nrow(preuss)*0.8)
plot(r.q)
points(preuss[1:r.pt,2:3],col="black",pch=19)
points(preuss[(r.pt+1):nrow(preuss),2:3],col="red",pch=19)
preuss2=preuss[1:r.pt,]
#genderuensis
r.q=raster(paste0(filepath,
"Ecological Analysis/no-out/raw-mve/genderuensis.asc"))
genderu$r.dist=extract(r.q,genderu[,2:3])
colnames(genderu)
## [1] "SUBSPECIES.SCIENTIFIC.NAME"
## [2] "LONGITUDE"
## [3] "LATITUDE"
## [4] "SOURCE"
## [5] "current_30arcsec_annualPET"
## [6] "current_30arcsec_aridityIndexThornthwaite"
## [7] "current_30arcsec_climaticMoistureIndex"
## [8] "current_30arcsec_continentality"
## [9] "current_30arcsec_embergerQ"
## [10] "current_30arcsec_growingDegDays0"
## [11] "current_30arcsec_growingDegDays5"
## [12] "current_30arcsec_maxTempColdest"
## [13] "current_30arcsec_minTempWarmest"
## [14] "current_30arcsec_monthCountByTemp10"
## [15] "current_30arcsec_PETColdestQuarter"
## [16] "current_30arcsec_PETDriestQuarter"
## [17] "current_30arcsec_PETseasonality"
## [18] "current_30arcsec_PETWarmestQuarter"
## [19] "current_30arcsec_PETWettestQuarter"
## [20] "current_30arcsec_thermicityIndex"
## [21] "PC1"
## [22] "PC2"
## [23] "PC3"
## [24] "PC4"
## [25] "PC5"
## [26] "PC6"
## [27] "PC7"
## [28] "PC8"
## [29] "PC9"
## [30] "PC10"
## [31] "PC11"
## [32] "PC12"
## [33] "PC13"
## [34] "PC14"
## [35] "PC15"
## [36] "PC16"
## [37] "r.dist"
hist(genderu$r.dist)
genderu=genderu[order(genderu$r.dist),]
r.pt=round(nrow(genderu)*0.8)
plot(r.q)
points(genderu[1:r.pt,2:3],col="black",pch=19)
points(genderu[(r.pt+1):nrow(genderu),2:3],col="red",pch=19)
genderu2=genderu[1:r.pt,]
And now to perform the tests.
filepath2=paste0(filepath,"Ecological Analysis/no-out/random/")
splist=list.files(filepath2)
#comparisons=matrix(nrow=100,ncol=2,data=NA)
truecomps=-99
truelists=matrix(nrow=100,ncol=1,data=-99)
for(i in 1:length(splist)){
sp=splist[i]
splist2=splist[-i]
null.x=raster(paste0(filepath,
"Ecological Analysis/no-out/raw-mve/",sp,".asc"))
comparisons=matrix(nrow=100,ncol=2,data=NA)
comparisons=as.data.frame(comparisons)
compvals=NULL
for(j in 1:length(splist2)){
comparelist=list.files(paste0(filepath2,splist2[j],"/"),
pattern="*.asc")
true2=raster(paste0(filepath,
"Ecological Analysis/no-out/raw-mve/",
splist2[j],".asc"))
compvals=NULL
for(k in 1:length(comparelist)){
rando=raster(paste0(filepath2,splist2[j],"/",comparelist[k]))
compvals[k]=nicheOverlap(x=null.x,y=rando,stat="D")
}
comparisons[,j]=compvals
colnames(comparisons)[j]=paste0(splist[i],"-",splist2[j])
truecomps=c(truecomps,
nicheOverlap(x=null.x,y=true2,stat="D"))
}
truelists=cbind(truelists,comparisons)
}
truecomps2=t(as.data.frame(truecomps))
colnames(truecomps2)=colnames(truelists)
fullcomps=rbind(truecomps2,truelists)
write.csv(fullcomps,file=paste0(filepath,
"Schoener-first-row-true_no-out.csv"),
quote=F,row.names=F)
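For reference, the "D" statistic requested above is Schoener's D. The sketch below computes it by hand on two short numeric vectors so the arithmetic is visible. The helper `schoener_d` and the example vectors are hypothetical, and this is an illustration of the formula rather than a copy of the `dismo` implementation.

```r
# Schoener's D: 1 - 0.5 * sum(|p_x - p_y|), where each layer is first
# rescaled so that its values sum to 1.
schoener_d <- function(x, y) {
  keep <- !is.na(x) & !is.na(y)  # use only cells present in both layers
  px <- x[keep] / sum(x[keep])
  py <- y[keep] / sum(y[keep])
  1 - 0.5 * sum(abs(px - py))
}

a <- c(0.1, 0.4, 0.3, 0.2)  # hypothetical cell values, layer 1
b <- c(0.2, 0.3, 0.3, 0.2)  # hypothetical cell values, layer 2

print(schoener_d(a, a))  # identical layers
print(schoener_d(a, b))  # partially overlapping layers
```

D ranges from 0 (no overlap) to 1 (identical layers), which is why the density plots later in this section are drawn on a 0 to 1 axis.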
We can now look at and compare the niche models derived from the MVE envelopes of where these species occur.
x=read.csv(paste0(filepath,"Schoener-first-row-true_no-out.csv"))
x=x[,-1]
head(x)
## genderuensis.preussi genderuensis.reichenowi preussi.genderuensis
## 1 0.7309587 0.7937367 0.7309587
## 2 0.7617447 0.7717766 0.7252408
## 3 0.7467540 0.7633932 0.6425719
## 4 0.7207890 0.7765976 0.6421447
## 5 0.7433229 0.7750977 0.8138083
## 6 0.7210789 0.7784997 0.5783603
## preussi.reichenowi reichenowi.genderuensis reichenowi.preussi
## 1 0.6985322 0.7937367 0.6985322
## 2 0.7824307 0.5872728 0.7628152
## 3 0.7780641 0.5528911 0.7304288
## 4 0.7759432 0.6623748 0.7123873
## 5 0.7661090 0.6155956 0.7505732
## 6 0.7628315 0.6677287 0.7309420
We know that the first row contains the “true” comparisons. We can therefore compare these to the entire distribution of the randomized comparisons.
datax=matrix(data=NA,nrow=600,ncol=2)
datax=as.data.frame(datax)
colnames(datax)=c("ID","Value")
trues=x[1,c(1,2,6)]
datax$ID[1:100]="genderuensis.preussi"
datax$ID[101:200]="genderuensis.reichenowi"
datax$ID[201:300]="preussi.genderuensis"
datax$ID[301:400]="preussi.reichenowi"
datax$ID[401:500]="reichenowi.genderuensis"
datax$ID[501:600]="reichenowi.preussi"
datax$Value[1:100]=x[-1,1]
datax$Value[101:200]=x[-1,2]
datax$Value[201:300]=x[-1,3]
datax$Value[301:400]=x[-1,4]
datax$Value[401:500]=x[-1,5]
datax$Value[501:600]=x[-1,6]
datax$ID=as.factor(datax$ID)
datax$Value=as.numeric(datax$Value)
We have created a new data frame that is easier to manipulate in ggplot to look at the results. We can now go through things iteratively.
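The same wide-to-long reshaping can be sketched more compactly with base R's `stack()`. The small `wide` data frame below is a hypothetical stand-in for the randomized comparisons, with made-up values.

```r
# Hypothetical stand-in for two columns of randomized comparisons
wide <- data.frame(genderuensis.preussi = c(0.76, 0.75, 0.72),
                   preussi.genderuensis = c(0.73, 0.64, 0.64))

# stack() returns one row per value, with a factor naming the source column
long <- stack(wide)
colnames(long) <- c("Value", "ID")
long <- long[, c("ID", "Value")]
str(long)
```

The result has the same two columns, `ID` and `Value`, that the plotting code expects, without hard-coding row ranges.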
gen.preus=datax[which(datax$ID=='genderuensis.preussi'|datax$ID=='preussi.genderuensis'),]
inter=trues$genderuensis.preussi
a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
print(a+b+b.5+c+d)
gen.preus=datax[which(datax$ID=='genderuensis.reichenowi'|datax$ID=='reichenowi.genderuensis'),]
inter=trues$genderuensis.reichenowi
a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
print(a+b+b.5+c+d)
gen.preus=datax[which(datax$ID=='reichenowi.preussi'|datax$ID=='preussi.reichenowi'),]
inter=trues$reichenowi.preussi
a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
print(a+b+b.5+c+d)
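We are going to calculate \(P\) values by comparing each “true” overlap value (the test statistic) to a normal distribution fitted to the corresponding randomized comparisons. The reported value is the lower-tail probability, so values near 0 or near 1 both indicate a statistic in the tails of the null distribution.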
dist=unique(datax$ID)
trues=x[1,]
for(i in 1:length(dist)){
distx=dist[i]
print(paste0("Testing ",distx))
datadist=datax[which(datax$ID==distx),]
xbar=trues[,which(colnames(trues)==distx)]
mu=mean(datadist$Value)
sigma=sd(datadist$Value)
n=nrow(datadist)
z=(xbar-mu)/(sigma/sqrt(n))
lowcrit=qnorm(p=0.025,mean=mu,sd=sigma)
hicrit=qnorm(p=0.975,mean=mu,sd=sigma)
if(xbar<lowcrit){
print("Test statistic below low critical value.")
print(paste0(lowcrit,"; statistic = ",xbar))
}
if(xbar>hicrit){
print("Test statistic above high critical value.")
print(paste0(hicrit,"; statistic = ",xbar))
}
print(paste0("P value for ",distx,
" = ",pnorm(xbar,
mean=mu,sd=sigma)))
print(paste(" "))
print(paste(" "))
}
## [1] "Testing genderuensis.preussi"
## [1] "P value for genderuensis.preussi = 0.2387014523214"
## [1] " "
## [1] " "
## [1] "Testing genderuensis.reichenowi"
## [1] "Test statistic above high critical value."
## [1] "0.790069175213148; statistic = 0.793736744682844"
## [1] "P value for genderuensis.reichenowi = 0.990289874434537"
## [1] " "
## [1] " "
## [1] "Testing preussi.genderuensis"
## [1] "P value for preussi.genderuensis = 0.570069422343806"
## [1] " "
## [1] " "
## [1] "Testing preussi.reichenowi"
## [1] "Test statistic below low critical value."
## [1] "0.753354646345893; statistic = 0.698532194352503"
## [1] "P value for preussi.reichenowi = 1.36471563895498e-10"
## [1] " "
## [1] " "
## [1] "Testing reichenowi.genderuensis"
## [1] "Test statistic above high critical value."
## [1] "0.720624997483034; statistic = 0.793736744682844"
## [1] "P value for reichenowi.genderuensis = 0.999835050352036"
## [1] " "
## [1] " "
## [1] "Testing reichenowi.preussi"
## [1] "P value for reichenowi.preussi = 0.0397011642748739"
## [1] " "
## [1] " "
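As a complement to the normal approximation above, a rank-based \(P\) value can be computed directly from the randomized replicates, without assuming normality. This is a sketch of that alternative, not the method used for the results above; `empirical_p`, `observed`, and `null_vals` are hypothetical names.

```r
# Rank-based P values from a set of null replicates. Adding 1 to the
# numerator and denominator counts the observed value as one replicate,
# so the estimate can never be exactly zero.
empirical_p <- function(observed, null_vals) {
  n <- length(null_vals)
  lower <- (sum(null_vals <= observed) + 1) / (n + 1)
  upper <- (sum(null_vals >= observed) + 1) / (n + 1)
  c(lower = lower, upper = upper, two_sided = min(1, 2 * min(lower, upper)))
}

set.seed(3)
null_vals <- rnorm(100, mean = 0.75, sd = 0.02)  # simulated null overlaps
res <- empirical_p(observed = 0.70, null_vals = null_vals)
print(res)
```

With 100 replicates, the smallest one-sided value this approach can return is 1/101, so it is coarser than the normal approximation but makes no distributional assumption.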
One last thing, visualizing the PC plots from the ENVIREM extracts.
x=read.csv(paste0(filepath,
"Ecological Analysis/envirem_extracts_no-outlier_PCA.csv"))
a=ggplot(data=x,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()
print(a+b+c)
a=ggplot(data=x,aes(x=PC3,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()
print(a+b+c)
for(i in 4:19){
x[,i]=as.numeric(x[,i])
nombre=colnames(x)[i]
a=ggplot(data=x,aes(y=x[,i],x=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_boxplot(notch=T)
c=theme_classic()
d=ggtitle(paste(nombre))
print(a+b+c+d)
}
## notch went outside hinges. Try setting notch=FALSE.
Lastly, I am going to do t-tests comparing the two Cameroonian populations to each other to see if they differ significantly in any aspects.
x2=x[x$SUBSPECIES.SCIENTIFIC.NAME=="genderuensis"|x$SUBSPECIES.SCIENTIFIC.NAME=="preussi",1:19]
rda.x=rda(x2[,-c(1:3)],scale=T)
rda.x.data=rda.x$CA$u
x3=cbind(x2,rda.x.data)
a=ggplot(x3,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()
print(a+b+c)
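Now for iterative tests. Each variable is compared with Welch's two-sample t-test, which does not assume equal variances and tolerates unequal sample sizes, and with a Wilcoxon rank-sum test as a non-parametric check.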
for(i in 4:19){
names=unique(x3$SUBSPECIES.SCIENTIFIC.NAME)
pop1=x3[x3$SUBSPECIES.SCIENTIFIC.NAME==names[1],i]
pop2=x3[x3$SUBSPECIES.SCIENTIFIC.NAME==names[2],i]
print(colnames(x3)[i])
z=t.test(x=pop1,y=pop2,alternative="two.sided",conf.level=0.95)
print(z)
z2=wilcox.test(x=pop1,y=pop2,alternative="two.sided",conf.level=0.95)
print(z2)
}
## [1] "SOURCE"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 4.5826, df = 7, p-value = 0.002536
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.3629975 1.1370025
## sample estimates:
## mean of x mean of y
## 1.75 1.00
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 189, p-value = 1.283e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_annualPET"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 8.5891, df = 23.927, p-value = 8.991e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 274.2719 447.8102
## sample estimates:
## mean of x mean of y
## 1640.306 1279.265
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 213, p-value = 5.948e-07
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_aridityIndexThornthwaite"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 2.3178, df = 16.784, p-value = 0.03336
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.153794 24.815930
## sample estimates:
## mean of x mean of y
## 67.38375 54.39889
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 167, p-value = 0.01933
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_climaticMoistureIndex"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = -10.515, df = 12.751, p-value = 1.197e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.5640744 -0.3714812
## sample estimates:
## mean of x mean of y
## -0.0200000 0.4477778
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 0, p-value = 2.37e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_continentality"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 1.3196, df = 9.5331, p-value = 0.2178
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1770614 0.6830799
## sample estimates:
## mean of x mean of y
## 2.793750 2.540741
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 139, p-value = 0.2291
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_embergerQ"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = -6.8632, df = 17.381, p-value = 2.442e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -434.0855 -230.2137
## sample estimates:
## mean of x mean of y
## 364.1900 696.3396
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 11, p-value = 1.598e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_growingDegDays0"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 7.504, df = 32.3, p-value = 1.436e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 19237.63 33565.54
## sample estimates:
## mean of x mean of y
## 99020.25 72618.67
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 216, p-value = 8.498e-08
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_growingDegDays5"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 7.3281, df = 32.072, p-value = 2.448e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 19218.00 34012.72
## sample estimates:
## mean of x mean of y
## 99020.25 72404.89
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 216, p-value = 8.498e-08
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_maxTempColdest"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 7.9673, df = 30.754, p-value = 5.705e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 51.57576 87.08165
## sample estimates:
## mean of x mean of y
## 261.6250 192.2963
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 216, p-value = 2.395e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_minTempWarmest"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 6.0773, df = 32.098, p-value = 8.575e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 36.71873 73.73498
## sample estimates:
## mean of x mean of y
## 184.3750 129.1481
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 208, p-value = 9.236e-05
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_monthCountByTemp10"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 1.776, df = 26, p-value = 0.08745
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1807469 2.4770432
## sample estimates:
## mean of x mean of y
## 12.00000 10.85185
## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: pop1 and pop2
## W = 120, p-value = 0.3523
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_PETColdestQuarter"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 9.8209, df = 33, p-value = 2.541e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 18.43850 28.07418
## sample estimates:
## mean of x mean of y
## 120.61375 97.35741
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 216, p-value = 8.498e-08
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_PETDriestQuarter"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 6.8422, df = 15.391, p-value = 4.867e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 24.39796 46.40584
## sample estimates:
## mean of x mean of y
## 147.6438 112.2419
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 208, p-value = 5.693e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_PETseasonality"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 4.8474, df = 9.021, p-value = 0.0009055
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 256.4243 704.8808
## sample estimates:
## mean of x mean of y
## 1445.8500 965.1974
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 193, p-value = 0.0003421
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_PETWarmestQuarter"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 7.3659, df = 21.019, p-value = 2.998e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 23.35025 41.72067
## sample estimates:
## mean of x mean of y
## 151.7325 119.1970
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 212, p-value = 1.02e-06
## alternative hypothesis: true location shift is not equal to 0
##
## [1] "current_30arcsec_PETWettestQuarter"
##
## Welch Two Sample t-test
##
## data: pop1 and pop2
## t = 12.598, df = 29.411, p-value = 2.222e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 19.90955 27.62137
## sample estimates:
## mean of x mean of y
## 122.49250 98.72704
##
##
## Wilcoxon rank sum test
##
## data: pop1 and pop2
## W = 216, p-value = 8.498e-08
## alternative hypothesis: true location shift is not equal to 0
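Because the printed output above is long, it may be easier to gather the \(P\) values into one table. The sketch below does this on simulated data; the data frame `env`, the group labels, and the variable names are hypothetical, and only the two tests match those run above.

```r
set.seed(7)  # simulated values, not study data
env <- data.frame(
  group = rep(c("A", "B"), times = c(8, 27)),  # unequal group sizes
  var1 = c(rnorm(8, 10, 1), rnorm(27, 12, 2)),
  var2 = c(rnorm(8, 5, 1), rnorm(27, 5, 1))
)
vars <- c("var1", "var2")

# One row per variable, holding the P values from both tests
results <- do.call(rbind, lapply(vars, function(v) {
  a <- env[env$group == "A", v]
  b <- env[env$group == "B", v]
  data.frame(
    variable = v,
    welch_p = t.test(a, b, alternative = "two.sided")$p.value,
    wilcox_p = wilcox.test(a, b, alternative = "two.sided")$p.value
  )
}))
print(results)
```

A table like this can be written out with `write.csv` and scanned far more quickly than the individual test printouts.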
Example of divergence:
a=ggplot(data=x,aes(x=current_30arcsec_PETWettestQuarter,
y=current_30arcsec_embergerQ,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()
print(a+b+c)
In the ssp.plot function above, I projected the MVEs of species occurrence into past climates for the Holocene and the Last Glacial Maximum. We can average the three climate model scenarios (CCSM, MIROC, and MPI) together to create a “best guess” of the distance of each grid cell to the environmental centroid of a given species.
holo.ccsm=paste0(filepath,"Ecological Analysis/no-out/holo-ccsm-mve/",
list.files(paste0(filepath,
"Ecological Analysis/no-out/holo-ccsm-mve/"),
pattern="*.asc"))
holo.miroc=paste0(filepath,"Ecological Analysis/no-out/holo-miroc-mve/",
list.files(paste0(filepath,
"Ecological Analysis/no-out/holo-miroc-mve/"),
pattern="*.asc"))
holo.mpi=paste0(filepath,"Ecological Analysis/no-out/holo-mpi-mve/",
list.files(paste0(filepath,
"Ecological Analysis/no-out/holo-mpi-mve/"),
pattern="*.asc"))
lgm.ccsm=paste0(filepath,"Ecological Analysis/no-out/lgm-ccsm-mve/",
list.files(paste0(filepath,
"Ecological Analysis/no-out/lgm-ccsm-mve/"),
pattern="*.asc"))
lgm.miroc=paste0(filepath,"Ecological Analysis/no-out/lgm-miroc-mve/",
list.files(paste0(filepath,
"Ecological Analysis/no-out/lgm-miroc-mve/"),
pattern="*.asc"))
lgm.mpi=paste0(filepath,"Ecological Analysis/no-out/lgm-mpi-mve/",
list.files(paste0(filepath,
"Ecological Analysis/no-out/lgm-mpi-mve/"),
pattern="*.asc"))
We now have a list of files for each scenario, in the same order in every case. Next, we average these together and save the results.
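Selecting files by numeric position depends on how `list.files` happens to order the names, which can vary with locale. A name-based lookup is one sturdier alternative. The helper `pick_layer` below is hypothetical, and the demonstration paths simply follow the `<taxon>.asc` and `<taxon>_threshold.asc` naming used when the rasters were written.

```r
# Return the single path for a taxon's raw or thresholded layer.
pick_layer <- function(paths, taxon, threshold = FALSE) {
  suffix <- if (threshold) "_threshold\\.asc$" else "\\.asc$"
  hit <- grep(paste0("/", taxon, suffix), paths, value = TRUE)
  stopifnot(length(hit) == 1)  # fail loudly on a missing or ambiguous match
  hit
}

# Hypothetical paths mirroring one scenario folder
demo_paths <- paste0("holo-ccsm-mve/",
                     c("genderuensis.asc", "genderuensis_threshold.asc",
                       "preussi.asc", "preussi_threshold.asc",
                       "reichenowi.asc", "reichenowi_threshold.asc"))

raw_path <- pick_layer(demo_paths, "preussi")
thr_path <- pick_layer(demo_paths, "preussi", threshold = TRUE)
print(c(raw_path, thr_path))
```

Used this way, a call such as `pick_layer(holo.ccsm, "preussi")` would replace an index like `holo.ccsm[4]`, and a reordered file list would no longer silently swap raw and thresholded layers.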
#Plot preussi
#avg holo
holo=stack(holo.ccsm[4],holo.miroc[4],holo.mpi[4])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/holo-all-avg/preussi-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
holo=stack(holo.ccsm[3],holo.miroc[3],holo.mpi[3])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/holo-all-avg/preussi-threshold-avg.asc"),
overwrite=T)
#Plot genderuensis
#avg holo
holo=stack(holo.ccsm[2],holo.miroc[2],holo.mpi[2])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/holo-all-avg/genderuensis-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
holo=stack(holo.ccsm[1],holo.miroc[1],holo.mpi[1])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/holo-all-avg/genderuensis-threshold-avg.asc"),
overwrite=T)
#Plot reichenowi
#avg holo
holo=stack(holo.ccsm[6],holo.miroc[6],holo.mpi[6])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/holo-all-avg/reichenowi-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
holo=stack(holo.ccsm[5],holo.miroc[5],holo.mpi[5])
y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/holo-all-avg/reichenowi-threshold-avg.asc"),
overwrite=T)
#Plot preussi
#avg lgm
lgm=stack(lgm.ccsm[4],lgm.miroc[4],lgm.mpi[4])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/lgm-all-avg/preussi-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
lgm=stack(lgm.ccsm[3],lgm.miroc[3],lgm.mpi[3])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/lgm-all-avg/preussi-threshold-avg.asc"),
overwrite=T)
#Plot genderuensis
#avg lgm
lgm=stack(lgm.ccsm[2],lgm.miroc[2],lgm.mpi[2])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/lgm-all-avg/genderuensis-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
lgm=stack(lgm.ccsm[1],lgm.miroc[1],lgm.mpi[1])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/lgm-all-avg/genderuensis-threshold-avg.asc"),
overwrite=T)
#Plot reichenowi
#avg lgm
lgm=stack(lgm.ccsm[6],lgm.miroc[6],lgm.mpi[6])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/lgm-all-avg/reichenowi-avg.asc"),
overwrite=T)
#averaging threshold distance; not as scientific but for visualizing
lgm=stack(lgm.ccsm[5],lgm.miroc[5],lgm.mpi[5])
y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/lgm-all-avg/reichenowi-threshold-avg.asc"),
overwrite=T)
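The blocks above repeat one step twelve times: take the layer at the same position from each of the three model lists, average them cell by cell, and save the result. The sketch below shows that step as a loop. It is an illustration only: small matrices stand in for the rasters so that it runs without the project files, the names `idx`, `make_model`, `models`, and `avg_layers` are hypothetical, and the positions in `idx` are copied from the indices used above.

```r
# Position of each layer in the sorted file lists, as used above.
idx <- c(genderuensis_threshold = 1, genderuensis = 2,
         preussi_threshold = 3, preussi = 4,
         reichenowi_threshold = 5, reichenowi = 6)

# Stand-in data: each "model" is a list of six 2 x 2 "layers".
# In the real workflow these would be raster files read with stack().
make_model <- function(offset) {
  lapply(seq_along(idx), function(i) matrix(i + offset, nrow = 2, ncol = 2))
}
models <- list(ccsm = make_model(0), miroc = make_model(1), mpi = make_model(2))

# Cell-wise mean of equally sized matrices.
avg_layers <- function(layers) Reduce("+", layers) / length(layers)

# For each named position, pull that layer from every model and average.
averages <- lapply(idx, function(i) {
  avg_layers(lapply(models, function(m) m[[i]]))
})
averages$preussi
```

With real rasters, the body of the loop would be the same `stack()`, `mean()`, and `writeRaster()` calls used above, with the output name built from the names of `idx`.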
Each subspecies tells us something about the colonization path across Africa. We can similarly average the subspecies-level results together to understand where, exactly, the species most likely crossed Africa.
holo=paste0(filepath,
"Ecological Analysis/no-out/holo-all-avg/",
list.files(paste0(filepath,
"Ecological Analysis/no-out/holo-all-avg/")))
lgm=paste0(filepath,"Ecological Analysis/no-out/lgm-all-avg/",
list.files(paste0(filepath,"Ecological Analysis/no-out/lgm-all-avg/")))
#Average all occurrence (Holocene)
holo2=stack(holo[1],holo[3],holo[5])
y=mean(holo2)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/holo-all-avg/holo-avg.asc"),
overwrite=T)
#Average all threshold (Holocene)
holo2=stack(holo[2],holo[4],holo[6])
y=mean(holo2)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/holo-all-avg/holo-threshold-avg.asc"),
overwrite=T)
#Average all occurrence (LGM)
lgm2=stack(lgm[1],lgm[3],lgm[5])
y=mean(lgm2)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/lgm-all-avg/all-avg.asc"),
overwrite=T)
#Average all threshold (LGM)
lgm2=stack(lgm[2],lgm[4],lgm[6])
y=mean(lgm2)
plot(y)
writeRaster(y,paste0(filepath,
"Ecological Analysis/no-out/lgm-all-avg/all-threshold-avg.asc"),
overwrite=T)
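One caveat with the positional indices above: the combined averages are written into the same folders that `list.files` reads, so the positions shift if the chunk is rerun once those files exist. A possible alternative, sketched here under assumptions, is to build the input paths from the subspecies names instead. The folder in this example is a temporary stand-in with empty files, and `taxa`, `occurrence`, and `threshold` are hypothetical names.

```r
# Made-up folder contents after the combined averages were written once.
d <- tempfile("holo-all-avg")
dir.create(d)
file.create(file.path(d, c(
  "genderuensis-avg.asc", "genderuensis-threshold-avg.asc",
  "holo-avg.asc", "holo-threshold-avg.asc",
  "preussi-avg.asc", "preussi-threshold-avg.asc",
  "reichenowi-avg.asc", "reichenowi-threshold-avg.asc")))

# Select inputs by name so that extra files in the folder are ignored.
taxa <- c("genderuensis", "preussi", "reichenowi")
occurrence <- file.path(d, paste0(taxa, "-avg.asc"))
threshold  <- file.path(d, paste0(taxa, "-threshold-avg.asc"))
stopifnot(all(file.exists(c(occurrence, threshold))))

# For comparison: positions 1, 3, 5 of the plain listing on this folder.
by_position <- basename(list.files(d))[c(1, 3, 5)]
by_position
```

Here the positional selection includes one of the combined files, while the name-based vectors still point only at the three subspecies rasters and could be passed to `stack()` as before.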