Introduction

The following code accompanies Cooper et al.’s assessment of the phylogeography of Cinnyris reichenowi. Throughout this document, the following taxonomy is used:

Cinnyris regius

A medium-sized sunbird from the Albertine Rift mountains of East Africa. Sympatric with some populations of the next species, and with a similar distribution that is discontinuous among large mountain ranges.

Cinnyris reichenowi

A small montane sunbird from East and West Africa, which we subdivide into three groups:

C. [r.] reichenowi

The nominate population from East Africa.

C. [r.] preussi

Montane populations from the West African Cameroon Line; chiefly distributed between Bioko Island and Mt. Oku.

C. [r.] genderuensis

Xeric interior populations in Cameroon, the Central African Republic, and possibly Nigeria. Specimens differ slightly in morphometrics from preussi, but are genetically distinct (see below and our paper for a full discussion).

Note that some programs in this appendix are run with random iterations and are sometimes jackknifed. As a result, some values reproduced here may not match the manuscript exactly.

Required Software and Packages

This study utilized programs in \(bash\) (shell script), \(python\), and \(R\). \(python\) programs were accessed and run via \(bash\). Programs used via this interface include:

\(angsd\)
\(biopython\)
\(bwa\)
Genome Analysis Toolkit (\(GATK\))
\(illumiprocessor\)
\(phyluce\)
\(picard\)
\(RAxML\)
\(samtools\)
\(vcftools\)

This document was created using RStudio 1.0.143, R 3.4.4 (R Foundation 2018), and rmarkdown 1.10 (Allaire, Xie, et al. 2018). R packages used throughout this manuscript include:

ape (Paradis & Schliep 2018)
dismo (Hijmans, Phillips, Leathwick & Elith 2017)
ellipse (Murdoch & Chow 2018)
fossil (Vavrek 2011)
ggplot2 (Wickham 2016)
LEA (Frichot & Francois 2014)
maptools (Bivand & Lewin-Koh 2018)
MASS (Venables & Ripley 2002)
raster (Hijmans 2018)
vegan (Oksanen, Blanchet, et al. 2018)

I load all of these R packages here in a hidden code chunk that can be viewed in the rmarkdown document.

## Loading required package: raster

## Loading required package: sp

## 
## Attaching package: 'raster'

## The following objects are masked from 'package:ape':
## 
##     rotate, zoom

## 
## Attaching package: 'ellipse'

## The following object is masked from 'package:raster':
## 
##     pairs

## The following object is masked from 'package:graphics':
## 
##     pairs

## Loading required package: maps

## Loading required package: shapefiles

## Loading required package: foreign

## 
## Attaching package: 'shapefiles'

## The following objects are masked from 'package:foreign':
## 
##     read.dbf, write.dbf

## Checking rgeos availability: TRUE

## 
## Attaching package: 'MASS'

## The following objects are masked from 'package:raster':
## 
##     area, select

## Loading required package: permute

## Loading required package: lattice

## This is vegan 2.5-6

Other programs used in this study (via Windows operating system) include:

\(BEAST2\)
\(SNAPP\)
\(structure\)

Many chunks of code are, after the first run, “hidden” from view to create a document that is easier to read. All code chunks and all code used in this manuscript can be viewed via the rmarkdown document.

Genetic Data Cleaning

Samples

The first analyses concern genetic data from the complex. We used sequences from 24 individual Cinnyris sunbirds in this study, from several different major biogeographic areas.

Note: We separate genderuensis here to better visualize which birds are from xeric regions; all birds labeled reichenowi from West Africa refer to populations of preussi.

East Africa

Rwenzori Mountains:

FMNH356179_Cinnyris_regius
FMNH356181_Cinnyris_regius

Kahuzi-Biega Mountains:

FMNH438857_Cinnyris_regius
FMNH443947_Cinnyris_reichenowi
FMNH481235_Cinnyris_regius
FMNH481236_Cinnyris_reichenowi

Bwindi Highlands:

FMNH385275_Cinnyris_regius
FMNH385276_Cinnyris_regius

Rwanda-Burundi Highlands:

FMNH346623_Cinnyris_regius
FMNH346624_Cinnyris_regius
FMNH358156_Cinnyris_reichenowi
FMNH358157_Cinnyris_reichenowi

Mt. Kabobo:

FMNH450580_Cinnyris_regius
FMNH450581_Cinnyris_regius

West Africa

Bioko Island:

KU131883_Cinnyris_reichenowi
KU132209_Cinnyris_reichenowi
KU132234_Cinnyris_reichenowi

Mt. Cameroon:

FMNH95912_Cinnyris_reichenowi
FMNH95913_Cinnyris_reichenowi
FMNH95915_Cinnyris_reichenowi
FMNH95916_Cinnyris_reichenowi

Bamenda Highlands:

FMNH273746_Cinnyris_reichenowi

Xeric Interior Cameroon:

FMNH122395_Cinnyris_genderuensis
FMNH189462_Cinnyris_genderuensis
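As a quick consistency check on the sample list above, the IDs can be tallied by the taxon label embedded in each name. This is only a sketch: the IDs are copied from the list, while the variable names and the parsing approach are our own illustration and not part of the original pipeline.

```shell
# The 24 sample IDs, copied from the list above.
samples="FMNH356179_Cinnyris_regius
FMNH356181_Cinnyris_regius
FMNH438857_Cinnyris_regius
FMNH443947_Cinnyris_reichenowi
FMNH481235_Cinnyris_regius
FMNH481236_Cinnyris_reichenowi
FMNH385275_Cinnyris_regius
FMNH385276_Cinnyris_regius
FMNH346623_Cinnyris_regius
FMNH346624_Cinnyris_regius
FMNH358156_Cinnyris_reichenowi
FMNH358157_Cinnyris_reichenowi
FMNH450580_Cinnyris_regius
FMNH450581_Cinnyris_regius
KU131883_Cinnyris_reichenowi
KU132209_Cinnyris_reichenowi
KU132234_Cinnyris_reichenowi
FMNH95912_Cinnyris_reichenowi
FMNH95913_Cinnyris_reichenowi
FMNH95915_Cinnyris_reichenowi
FMNH95916_Cinnyris_reichenowi
FMNH273746_Cinnyris_reichenowi
FMNH122395_Cinnyris_genderuensis
FMNH189462_Cinnyris_genderuensis"

# The taxon label is the third underscore-delimited field of each ID.
species_counts=$(printf '%s\n' "$samples" |
    awk -F_ 'NF == 3 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort)

# Total number of IDs, as a check against the 24 individuals stated in the text.
total=$(printf '%s\n' "$samples" | grep -c '_Cinnyris_')

printf '%s\n' "$species_counts"
echo "total $total"
```

Note that the reichenowi label here includes both East African birds and West African preussi, as explained in the note above.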

Color Palette

The following color scheme was used throughout this study for official figures:

Cinnyris r. reichenowi: black #000000
Cinnyris reichenowi preussi: blue #1f2887
Cinnyris reichenowi genderuensis: red #e31a1c

Cleaning and Processing UCEs

The following code was formatted for machines at both the University of Chicago and the Field Museum. This code will not run as is, and must be modified for your specific computer.

The first steps are identical to those followed on the PHYLUCE website. However, I will start here with the creation of the taxon-set that was used in this paper.

mkdir -p taxon-sets/cinnyris

phyluce_assembly_get_match_counts \
    --locus-db uce-search-results/probe.matches.sqlite \
    --taxon-list-config cinnyris.conf \
    --taxon-group 'cinnyris' \
    --incomplete-matrix \
    --output taxon-sets/cinnyris/cinnyris-taxa-incomplete.conf
   
cd taxon-sets/cinnyris

mkdir log

phyluce_assembly_get_fastas_from_match_counts \
    --contigs ../../assemblies_trinity_2017/contigs \
    --locus-db ../../uce-search-results/probe.matches.sqlite \
    --match-count-output cinnyris-taxa-incomplete.conf \
    --output cinnyris-taxa-incomplete.fasta \
    --incomplete-matrix cinnyris-taxa-incomplete.incomplete \
    --log-path log

According to the counts printed above, the most locus-rich individuals are:

FMNH438857 Cinnyris regius: 4287 loci
KU132209 C. reichenowi: 4276 loci
FMNH356179 C. regius: 4265 loci
FMNH358156 C. reichenowi: 4264 loci

Given that KU132209 has the most loci of any member of the main study group (C. reichenowi), this is the individual to which we will map our reads later.

phyluce_assembly_explode_get_fastas_file --input cinnyris-taxa-incomplete.fasta --output-dir exploded-fastas --by-taxon
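Once per-taxon FASTA files exist, locus counts per individual can be cross-checked by counting FASTA headers. The sketch below runs on a toy directory; the directory contents and taxon names are invented for illustration, and the `.unaligned.fasta` suffix is assumed from the file name used later in this appendix.

```shell
# Hypothetical toy directory standing in for exploded-fastas/.
toy_dir=$(mktemp -d)
printf '>uce-1_taxonA\nACGT\n>uce-2_taxonA\nGGCC\n>uce-3_taxonA\nTTAA\n' \
    > "$toy_dir/taxonA.unaligned.fasta"
printf '>uce-1_taxonB\nACGT\n' > "$toy_dir/taxonB.unaligned.fasta"

# Count FASTA headers (one per locus) in each file, then rank taxa by count.
locus_ranking=$(
    for fasta in "$toy_dir"/*.unaligned.fasta; do
        printf '%s\t%s\n' "$(grep -c '^>' "$fasta")" \
            "$(basename "$fasta" .unaligned.fasta)"
    done | sort -k1,1nr
)

printf '%s\n' "$locus_ranking"
```

The first row of the ranking is the most locus-rich taxon, which is the criterion we used to choose a mapping reference.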

phyluce_align_seqcap_align \
    --fasta cinnyris-taxa-incomplete.fasta \
    --output mafft-nexus-internal-trimmed \
    --taxa 24 \
    --aligner mafft \
    --cores 20 \
    --incomplete-matrix \
    --output-format fasta \
    --no-trim \
    --log-path log
   
phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed \
    --alignments mafft-nexus-internal-trimmed \
    --output mafft-nexus-internal-trimmed-gblocks \
    --cores 20 \
    --log log
   
phyluce_align_get_align_summary_data \
    --alignments mafft-nexus-internal-trimmed-gblocks \
    --cores 20 \
    --log-path log

A printout of the summary data follows:

#----------------------- Alignment summary -----------------------
#[Alignments] loci:      4,946
#[Alignments] length:    2,979,047
#[Alignments] mean:      602.31
#[Alignments] 95% CI:    5.64
#[Alignments] min:       120
#[Alignments] max:       2,081
#------------------- Informative Sites summary -------------------
#[Sites] loci:   4,946
#[Sites] total:  18,517
#[Sites] mean:   3.74
#[Sites] 95% CI: 0.13
#[Sites] min:    0
#[Sites] max:    76
#------------------------- Taxon summary -------------------------
#[Taxa] mean:            18.86
#[Taxa] 95% CI:  0.09
#[Taxa] min:             3
#[Taxa] max:             24
#----------------- Missing data from trim summary ----------------
#[Missing] mean: 0.00
#[Missing] 95% CI:       0.00
#[Missing] min:  0.00
#[Missing] max:  0.00
#-------------------- Character count summary --------------------
#[All characters]        56,341,110
#[Nucleotides]           53,710,713
#---------------- Data matrix completeness summary ---------------
#[Matrix 50%]            4754 alignments
#[Matrix 55%]            4709 alignments
#[Matrix 60%]            4565 alignments
#[Matrix 65%]            4417 alignments
#[Matrix 70%]            4160 alignments
#[Matrix 75%]            3761 alignments
#[Matrix 80%]            3237 alignments
#[Matrix 85%]            1693 alignments
#[Matrix 90%]            866 alignments
#[Matrix 95%]            275 alignments
#------------------------ Character counts -----------------------
#[Characters] '-' is present 2,630,397 times
#[Characters] 'A' is present 16,442,611 times
#[Characters] 'C' is present 10,446,382 times
#[Characters] 'G' is present 10,431,151 times
#[Characters] 'T' is present 16,390,569 times

The above plot visualizes the percent coverage and the number of loci with that coverage (e.g., 275 alignments are shared among 95% of the individuals within the dataset). The number of loci declines with increasing coverage, as is to be expected, and declines precipitously between 80% and 85% coverage (from 3237 to 1693 alignments). We opted to use 80% coverage because this retains 3237 loci, or ~65% of the 4,946 total loci, and so still provides a large amount of data for the phylogeographic analyses.
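The proportion of loci retained at the 80% threshold, and the size of the drop at 85%, can be checked directly from the summary counts. This is a small arithmetic sketch; the variable names are ours, and the counts are copied from the alignment summary.

```shell
# Counts copied from the alignment summary printed above.
total_loci=4946
loci_80p=3237
loci_85p=1693

# Percent of all loci retained in the 80% matrix
# (LC_ALL=C keeps "." as the decimal mark).
pct_retained=$(LC_ALL=C awk -v kept="$loci_80p" -v total="$total_loci" \
    'BEGIN { printf "%.1f", 100 * kept / total }')

# Loci lost when moving from the 80% to the 85% threshold.
lost_80_to_85=$((loci_80p - loci_85p))

echo "80% matrix retains ${pct_retained}% of loci"
echo "Raising the threshold to 85% drops ${lost_80_to_85} loci"
```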

We then proceeded with the creation of a cleaned 80% matrix for use in RAxML.

phyluce_align_remove_locus_name_from_nexus_lines \
    --alignments mafft-nexus-internal-trimmed-gblocks \
    --output mafft-nexus-internal-trimmed-gblocks-clean \
    --cores 20 \
    --log-path log

phyluce_align_get_only_loci_with_min_taxa \
    --alignments mafft-nexus-internal-trimmed-gblocks-clean \
    --taxa 24 \
    --percent 0.80 \
    --output mafft-nexus-internal-trimmed-gblocks-clean-80p \
    --cores 20 \
    --log-path log
   
phyluce_align_format_nexus_files_for_raxml \
    --alignments mafft-nexus-internal-trimmed-gblocks-clean-80p \
    --output mafft-nexus-internal-trimmed-gblocks-clean-80p-raxml \
    --charsets \
    --log-path log

Note that we also tested datasets with other amounts of data to determine how the models reacted; whether we increased or decreased the number of loci used, the overall topology was extremely similar to that of the 80p dataset.

Phylogeographic Relationships

We assessed relationships between all taxa using a bootstrapped RAxML approach, as follows. Note that this code has -T 20, indicating that we used 20 cores on our machine; this value must be adjusted to the machine on which you are running the program.

cd mafft-nexus-internal-trimmed-gblocks-clean-80p-raxml

raxmlHPC-PTHREADS-SSE3 \
    -m GTRGAMMA \
    -N 24 \
    -p 19877 \
    -n best \
    -s mafft-nexus-internal-trimmed-gblocks-clean-80p.phylip \
    -T 20

raxmlHPC-PTHREADS-SSE3 \
    -m GTRGAMMA \
    -N autoMRE \
    -p 19877 \
    -b 7175 \
    -n bootreps \
    -s mafft-nexus-internal-trimmed-gblocks-clean-80p.phylip \
    -T 20

raxmlHPC-SSE3 \
    -n resolved \
    -m GTRGAMMA \
    -f b \
    -t RAxML_bestTree.best \
    -z RAxML_bootstrap.bootreps
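Because the -T value depends on the machine, one option is to look up the number of available cores before launching RAxML. The following is a sketch only: the THREADS variable is our own name, and whether using every available core is sensible for a given alignment is not something this snippet checks.

```shell
# Detect the number of available cores; fall back to 1 if detection fails.
if command -v nproc >/dev/null 2>&1; then
    THREADS=$(nproc)
else
    THREADS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
fi

# Dry run: report the value that would be passed to the RAxML -T flag.
echo "Would run RAxML with -T $THREADS"
```

The resulting value could then replace the hard-coded 20 in the commands above.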

We were unable to render this document with the figure below included; it corresponds to Figure 3 in the paper.

![RaxML Phylogeny of *Cinnyris*. Diamonds are sized by support, with white centers indicating support over 90%. Note FMNH122395 appears to have contamination.](~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Figure_3.png)

Harvesting Single Nucleotide Polymorphisms

The following section relies heavily on the methods and code outlined by Zarza et al. (2016) in their study of Aphelocoma jays.

We did not focus on mitogenomes for this study, as we were working with museum specimens from which we were unable to obtain mitochondrial information. Any mitochondrial studies would have lacked samples from Mt. Cameroon and genderuensis populations of central Cameroon.

Zarza et al. (2016) Pipeline

Notes about this Section

Throughout this section, we have used \ to split code across multiple lines so that it is easier to read. For some parts of the code, these will have no effect; for others, they will need to be removed. This code is specifically formatted to work on our machines; it will need to be reformatted for your own machine should you choose to use it.
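To illustrate how the line breaks behave in a shell: a backslash only continues a command if it is the very last character on the line, so stray whitespace after it silently ends the command early. The sketch below uses an invented command string (cmd, --a, --b, --c are placeholders) to show both the working case and a simple check for the failure case.

```shell
# A trailing backslash joins the next line onto the current command.
joined=$(printf '%s-%s\n' \
    "first" \
    "second")
echo "$joined"

# A toy three-line command in which the second line has two spaces
# after its backslash, which would break the continuation.
snippet=$(printf 'cmd --a 1 \\\n    --b 2 \\  \n    --c 3\n')

# Count lines that end in a backslash followed by whitespace.
bad_lines=$(printf '%s\n' "$snippet" | grep -c '\\[[:space:]]\{1,\}$')
echo "lines with whitespace after a backslash: $bad_lines"
```

Running the same grep pattern over a script file is a quick way to find such lines after copying code out of a rendered document.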

Pipeline

As mentioned above, we used our best-represented ingroup sample, KU132209 Cinnyris reichenowi (4276 loci), as the reference. We indexed it with bwa so that reads could later be mapped to it using the bwa-mem algorithm.

cd 
cd uce-cinnyris/taxon-sets/cinnyris/

mkdir map-to-read

cd map-to-read/

cp ../exploded-fastas/KU132209-Cinnyris-reichenowi.unaligned.fasta ./KU132209_Cinnyris_reichenowi.fasta

bwa index -p KU132209_Cinnyris_reichenowi -a is KU132209_Cinnyris_reichenowi.fasta

From here on, I assigned shortcuts to my relevant folders and files to make it easier to run the appropriate codes. Many of these abbreviations are the same as used by Zarza et al. (2016). One of the files referenced below is a simple .txt file with the necessary taxa listed. This file is included with our data download.

READS_FOLDER=~/UCE-Data/UCEs/clean_reads_2017
SUBSET=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED
INDEX=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/map-to-read/KU132209_Cinnyris_reichenowi.fasta
FILES=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/taxalist.txt
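Since these paths are specific to our machines, it may help to confirm that the taxa list and the per-sample read folders exist before starting a long run. The helper below is hypothetical (check_inputs is not part of the original pipeline), and the demonstration uses temporary toy paths rather than the real data.

```shell
# check_inputs is a hypothetical helper, not part of the original pipeline.
# Usage: check_inputs READS_FOLDER TAXA_FILE
check_inputs() (
    reads_folder=$1
    taxa_file=$2
    missing=0
    [ -f "$taxa_file" ] || { echo "missing taxa list: $taxa_file" >&2; exit 2; }
    while IFS= read -r name; do
        [ -n "$name" ] || continue          # skip blank lines
        if [ ! -d "$reads_folder/$name" ]; then
            echo "no reads folder for: $name" >&2
            missing=$((missing + 1))
        fi
    done < "$taxa_file"
    [ "$missing" -eq 0 ]                    # succeed only if nothing is missing
)

# Toy demonstration: sampleB has no reads folder.
demo=$(mktemp -d)
mkdir -p "$demo/reads/sampleA"
printf 'sampleA\nsampleB\n' > "$demo/taxalist.txt"

if check_inputs "$demo/reads" "$demo/taxalist.txt"; then
    check_status="ok"
else
    check_status="incomplete"
fi
echo "$check_status"
```

With the variables defined above, the equivalent call would be `check_inputs "$READS_FOLDER" "$FILES"`.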

We then ran a loop to perform multiple actions on all of our sequences of interest.

while read -r line
do
    name="$line"
    
#Map sequences against the reference sequence using bwa-mem

    echo "Processing species: - $name"
    eval $(echo "bwa mem -B 10 -M -R '@RG\tID:$name\tSM:$name\tPL:Illumina' \
                         KU132209_Cinnyris_reichenowi \
                         $READS_FOLDER/$name/split-adapter-quality-trimmed/$name-READ1.fastq.gz \
                         $READS_FOLDER/$name/split-adapter-quality-trimmed/$name-READ2.fastq.gz > \
                         ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/BWAMEM/$name.pair.sam")
    eval $(echo "bwa mem -B 10 -M -R '@RG\tID:$name\tSM:$name\tPL:Illumina' \
                         KU132209_Cinnyris_reichenowi \
                         $READS_FOLDER/$name/split-adapter-quality-trimmed/$name-READ-singleton.fastq.gz > \
                         ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/BWAMEM/$name.single.sam")

#We then sorted reads using SAMTOOLS

    eval $(echo "samtools view -bS \
                          ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/BWAMEM/$name.pair.sam | \
                          samtools sort -m 30000000000 \
                          - ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/SAM/$name.pair_sorted")
    eval $(echo "samtools view \
                          -bS ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/BWAMEM/$name.single.sam | \
                          samtools sort -m 30000000000 \
                          - ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/SAM/$name.single_sorted")

#Mark duplicates using picard

    eval $(echo "java -Xmx4g -jar ~/anaconda/jar/MarkDuplicates.jar \
                      INPUT=$SUBSET/SAM/$name.pair_sorted.bam \
                      INPUT=$SUBSET/SAM/$name.single_sorted.bam \
                      OUTPUT=$SUBSET/SAM/$name.All_dedup.bam \
                      METRICS_FILE=$SUBSET/SAM/$name.All_dedup_metricsfile \
                      MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=250 ASSUME_SORTED=true \
                      VALIDATION_STRINGENCY=SILENT REMOVE_DUPLICATES=True")

#Index the resulting '.bam' file

    eval $(echo "java -Xmx4g -jar ~/anaconda/jar/BuildBamIndex.jar \
                      INPUT=$SUBSET/SAM/$name.All_dedup.bam")

    eval $(echo "samtools flagstat $SUBSET/SAM/$name.All_dedup.bam > $SUBSET/Picard-Stats/$name.All_dedup_stats.txt")
   
done < "$FILES"

#Remove files that are no longer needed

rm *.sam
rm *sorted.bam
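Before committing to a full run, it can be useful to print the commands the loop would issue for each taxon. The sketch below is a dry run only: the run helper, the toy taxa list, and the placeholder reads path are all assumptions for illustration, and only the first bwa mem call is shown.

```shell
# Dry-run helper (hypothetical): prints the command instead of executing it.
# Replacing the body with "$@" would execute the command.
run() {
    printf '%s\n' "$*"
}

# Toy taxa list; the real pipeline reads "$FILES".
taxa_file=$(mktemp)
printf 'sampleA\nsampleB\n' > "$taxa_file"
reads_folder=/path/to/clean_reads   # placeholder path

count=0
while IFS= read -r name; do
    [ -n "$name" ] || continue      # skip blank lines
    run bwa mem -B 10 -M -R "@RG\tID:${name}\tSM:${name}\tPL:Illumina" \
        KU132209_Cinnyris_reichenowi \
        "$reads_folder/$name/split-adapter-quality-trimmed/$name-READ1.fastq.gz" \
        "$reads_folder/$name/split-adapter-quality-trimmed/$name-READ2.fastq.gz"
    count=$((count + 1))
done < "$taxa_file"

echo "planned $count mapping commands"
```

Passing arguments directly to a helper like this also avoids the extra round of parsing introduced by eval, although we did not test that variant on our data.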

The next step was the ‘indel realigner’ step. This utilized the Genome Analysis Toolkit (GATK), which uses .dict dictionary files for contig names and sizes and .fai fasta index files to allow for efficient random access to the reference bases.

The first step was to prepare a fasta file to use as a reference with picard and samtools.

java -jar ~/anaconda/pkgs/picard-1.106-0/jar/CreateSequenceDictionary.jar \
    R=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/map-to-read/KU132209_Cinnyris_reichenowi.fasta \
    O=KU132209_Cinnyris_reichenowi.dict
samtools faidx ~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/map-to-read/KU132209_Cinnyris_reichenowi.fasta
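Both index files record, among other things, the name and length of every contig in the reference. As a rough illustration of that information, the sketch below derives names and lengths from a toy FASTA with awk. It is not a replacement for either tool: a real .fai file also stores byte offsets and line widths, and the toy sequences are invented.

```shell
# Toy reference with two contigs; the first is wrapped over two lines.
toy_fasta=$(mktemp)
printf '>uce-1\nACGTAC\nGT\n>uce-2\nGGCC\n' > "$toy_fasta"

# Print "name<TAB>length" for each contig.
contig_lengths=$(awk '
    /^>/ {
        if (name != "") print name "\t" len   # flush the previous contig
        name = substr($1, 2)                  # header without the ">"
        len = 0
        next
    }
    { len += length($0) }                     # accumulate sequence length
    END { if (name != "") print name "\t" len }
' "$toy_fasta")

printf '%s\n' "$contig_lengths"
```

Comparing such a table against the first two columns of the .fai file is one way to confirm that the index matches the reference actually being used.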

We realigned the mapping produced with bwa-mem, which was run with a mismatch penalty of \(B=10\). The minimum number of reads per locus was set to 10.

DEDUP_BAMS=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/SAM/*All_dedup.bam

cd $SUBSET

for sample in $DEDUP_BAMS
do

#Taxon or sample that is presently being processed

    echo "Processing $sample"

#Create a variable with the sample name using the name of the "dedup-bam" file
#This uses the "cut" command with "/" as a field delimiter
#In my case, this cuts it into 9 fields, keeping #9

    DEDUPBAMNAME=$(echo $sample | cut -d/ -f9)
    DEDUPBASENAME=$(echo $DEDUPBAMNAME | cut -d. -f1)

#Create the name of the intervals file

    INTERVALS_NAME=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/GATK/$DEDUPBASENAME'.intervals'
    echo $INTERVALS_NAME

#Create the output location for the realigned bam

    REALIGNED_NAME=~/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/GATK/$DEDUPBASENAME'_realigned.bam'
    echo $REALIGNED_NAME

#Execute the command in GATK to create intervals and realign reads

   eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T RealignerTargetCreator \
          -R $INDEX -o $INTERVALS_NAME -I $sample --minReadsAtLocus 10")
   eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T IndelRealigner \
          -R $INDEX -I $sample -targetIntervals $INTERVALS_NAME  -o $REALIGNED_NAME -LOD 3.0")
   
done

#Realign the mapping produced with bwa-mem
#Mismatch penalty of 10
#Minimum number of reads per locus = 10

mkdir GCVF

#I set the REFERENCE to equal my INDEX path
#Zarza et al. (2016) used REFERENCE

REFERENCE=$INDEX

#Realigned bams after removing duplicates with picard

REALIGNED_BAMS=$SUBSET/GATK/*realigned.bam

for sample in $REALIGNED_BAMS
do

#Current Sample

    echo "Processing $sample"

#Create a variable with the sample name using the name of the realigned bam file
#This uses the "cut" command with "/" as a field delimiter
#In my case, this cuts it into 9 fields, keeping #9

    OUTPUT_BASENAME=$(echo $sample | cut -d/ -f9)
    echo $OUTPUT_BASENAME
    OUTPUT_NAME=/home/jcooper/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/GATK/$(echo $OUTPUT_BASENAME | cut -d. -f1)'.g.vcf'
    echo $OUTPUT_NAME

#Execute the command in GATK for haplotype call
#Variant discovery with HaplotypeCaller
#Normal mode can process all samples merged in one file
#With gVCF, each sample needs to be processed one at a time
#This is the mode needed to serve as input for GenotypeGVCFs

    eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller \
           -R $REFERENCE -I $sample -o $OUTPUT_NAME --emitRefConfidence GVCF \
           --variant_index_type LINEAR --variant_index_parameter 128000 \
           --contamination_fraction_to_filter 0.0002 --min_base_quality_score 20 \
           --phredScaledGlobalReadMismappingRate 30 --standard_min_confidence_threshold_for_calling 40.0 \
           --standard_min_confidence_threshold_for_emitting 40.0")

done
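The loops above extract file names with `cut -d/ -f9`, which only works when the file sits exactly nine fields deep. A depth-independent alternative is `basename`, sketched below with an invented path; the variable names and the sample ID are hypothetical.

```shell
# Hypothetical path with the same depth as those used in this appendix.
sample=/home/user/UCE-Data/UCEs/taxon-sets/cinnyris-FIXED/SAM/sampleA.All_dedup.bam

# Field 9 happens to be the file name at this depth.
by_cut=$(echo "$sample" | cut -d/ -f9)

# basename returns the file name regardless of depth.
by_basename=$(basename "$sample")

# The same file two directories deeper: field 9 is now a directory name.
deeper=/mnt/data$sample
deeper_cut=$(echo "$deeper" | cut -d/ -f9)
deeper_basename=$(basename "$deeper")

# Strip everything from the first "." to get the sample ID.
sample_id=${by_basename%%.*}

echo "cut: $by_cut | basename: $by_basename"
echo "deeper cut: $deeper_cut | deeper basename: $deeper_basename"
echo "sample ID: $sample_id"
```

If the field number is kept instead, it needs to be recounted whenever the data are moved to a different directory.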

We now need to get the names of the VCF files for the next step.

ls -d -1 $PWD/GATK/*.g.vcf > gvcf.list

Next, we genotyped all of the variant files produced by HaplotypeCaller using GenotypeGVCFs. We merged files and only kept variable sites.

java -Xmx4g -jar ~/GenomeAnalysisTK.jar  -R $REFERENCE -T GenotypeGVCFs \
     --standard_min_confidence_threshold_for_calling 40.0 --standard_min_confidence_threshold_for_emitting 40.0 \
     -V gvcf.list \
     -o $PWD/GCVF/genotyped_X_samples.g.vcf

#Extract the SNPs from the call set
java -jar ~/GenomeAnalysisTK.jar \
     -T SelectVariants \
     -R $REFERENCE  \
     -V $PWD/GCVF/genotyped_X_samples.g.vcf \
     -selectType SNP \
     -o $PWD/GCVF/genotyped_X_samples_snps.vcf

#Extract the indels from the call set
java -jar ~/GenomeAnalysisTK.jar \
     -T SelectVariants \
     -R $REFERENCE  \
     -V $PWD/GCVF/genotyped_X_samples.g.vcf \
     -selectType INDEL \
     -o $PWD/GCVF/genotyped_X_samples_indels.vcf

Following Zarza et al. (2016), we filtered SNP calls around indels and applied quality filters according to the methods of Brant Faircloth and the GATK Forums.

java -jar ~/GenomeAnalysisTK.jar \
     -T VariantFiltration \
     -R $REFERENCE  \
     -V $PWD/GCVF/genotyped_X_samples_snps.vcf \
     --mask $PWD/GCVF/genotyped_X_samples_indels.vcf \
     --maskExtension 5 \
     --maskName InDel \
     --clusterWindowSize 10 \
     --filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
     --filterName "BadValidation" \
     --filterExpression "QUAL < 30.0" \
     --filterName "LowQual" \
     --filterExpression "QD < 5.0" \
     --filterName "LowVQCBD" \
     --filterExpression "FS > 60" \
     --filterName "FisherStrand" \
     -o $PWD/GCVF/genotyped_X_samples_filtered_1st.vcf
     
#Get only the pass SNPs

cat $PWD/GCVF/genotyped_X_samples_filtered_1st.vcf | grep 'PASS\|^#' > $PWD/GCVF/genotyped_X_samples_only_PASS_snp.vcf

#Recalibrate Bases

mkdir recal

for sample in $REALIGNED_BAMS
do

#Current sample

    echo "Processing $sample"

#Create a variable with the sample name using the name of the realigned bam file
#This uses the "cut" command with "/" as a field delimiter
#In my case, this cuts it into 9 fields, keeping #9

    FILE_BASENAME=$(echo $sample | cut -d/ -f9)
    echo $FILE_BASENAME
    TABLE_NAME=$(echo $FILE_BASENAME | cut -d. -f1)'.table'
    echo $TABLE_NAME
    RECAL_OUT=$(echo $FILE_BASENAME | cut -d. -f1)'_recal.bam'
    RECAL_OUT_bai=$(echo $FILE_BASENAME | cut -d. -f1)'_recal.bai'

#Execute GATK command to recalibrate bases

   eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T BaseRecalibrator \
          -R $REFERENCE -I $sample -knownSites $SUBSET/GCVF/genotyped_X_samples_only_PASS_snp.vcf \
          -o $SUBSET/recal/$TABLE_NAME") 
   eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T PrintReads -R $REFERENCE -I $sample \
          -BQSR $SUBSET/recal/$TABLE_NAME -o $SUBSET/recal/$RECAL_OUT")

done

RECAL_BAMS=$SUBSET/recal/*_recal.bam

#Haplotype calling on 1st recalibrated bam

for bam_recal in $RECAL_BAMS
do

#Current sample

    echo "Processing $bam_recal"

#Create a variable with the sample name using the name of the recalibrated bam file
#This uses the "cut" command with "/" as a field delimiter
#In my case, this cuts it into 9 fields, keeping #9

    RECAL1_BASENAME=$(echo $bam_recal | cut -d/ -f9)
    echo $RECAL1_BASENAME
    RECAL1_NAME=$SUBSET/recal/$(echo $RECAL1_BASENAME | cut -d. -f1)'.g.vcf'
    echo $RECAL1_NAME

#Execute the GATK command for haplotype call on recalibrated bams.
  
   eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller \
          -R $REFERENCE -I $bam_recal -o $RECAL1_NAME --emitRefConfidence GVCF \
          --variant_index_type LINEAR --variant_index_parameter 128000 \
          --contamination_fraction_to_filter 0.0002 --min_base_quality_score 20 \
          --phredScaledGlobalReadMismappingRate 30 \
          --standard_min_confidence_threshold_for_calling 40.0 \
          --standard_min_confidence_threshold_for_emitting 40.0")

done
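The PASS step in this pipeline keeps any line containing the text PASS anywhere, plus header lines. A stricter alternative tests only the FILTER column (column 7 of a VCF). The sketch below compares the two on an invented toy VCF; the records and the NOTE field are made up purely to show the difference, and we did not apply this variant to our data.

```shell
# Toy VCF: two header lines and three records, one of which fails its filter
# but carries the text "PASS" inside its INFO field.
toy_vcf=$(mktemp)
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'uce-1\t10\t.\tA\tG\t50\tPASS\tDP=20\n'
    printf 'uce-1\t25\t.\tC\tT\t12\tLowQual\tDP=3\n'
    printf 'uce-2\t40\t.\tG\tA\t15\tLowQual\tNOTE=BYPASS\n'
} > "$toy_vcf"

# Text match, equivalent to grep 'PASS\|^#': keeps the BYPASS record too.
grep_kept=$(grep -c -e 'PASS' -e '^#' "$toy_vcf")

# Column match: keeps headers and records whose FILTER field is exactly PASS.
awk_kept=$(awk -F'\t' '/^#/ || $7 == "PASS"' "$toy_vcf" | wc -l | tr -d ' ')

echo "text match keeps $grep_kept lines; column match keeps $awk_kept lines"
```

In practice the two approaches will usually agree; the column match simply removes the dependence on what other fields happen to contain.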

We then moved on to the genotyped files to perform multiple loops of identifying and filtering SNPs.

#Get the names of the recal vcf files to be used in the next step
ls -d -1 $SUBSET/recal/*_recal.g.vcf > recal_vcf.list

mkdir genotyped

#Genotype all of the gVCF files produced by HaplotypeCaller with GenotypeGVCFs; merges files and keeps only variable sites
java -Xmx4g -jar ~/GenomeAnalysisTK.jar  -R $REFERENCE -T GenotypeGVCFs \
     --standard_min_confidence_threshold_for_calling 40.0 \
     --standard_min_confidence_threshold_for_emitting 40.0 \
     -V recal_vcf.list \
     -o $SUBSET/genotyped/genotyped_X_samples_recal.g.vcf

#Extract the SNPs from the call set
java -jar ~/GenomeAnalysisTK.jar \
          -T SelectVariants \
          -R $REFERENCE  \
          -V $SUBSET/genotyped/genotyped_X_samples_recal.g.vcf \
          -selectType SNP \
          -o $SUBSET/genotyped/genotyped_X_samples_recal_snps.vcf

#Extract the indels from the call set
java -jar ~/GenomeAnalysisTK.jar \
     -T SelectVariants \
     -R $REFERENCE  \
     -V $SUBSET/genotyped/genotyped_X_samples_recal.g.vcf \
     -selectType INDEL \
     -o $SUBSET/genotyped/genotyped_X_samples_recal_indels.vcf

java -jar ~/GenomeAnalysisTK.jar \
     -T VariantFiltration \
     -R $REFERENCE  \
     -V $SUBSET/genotyped/genotyped_X_samples_recal_snps.vcf \
     --mask $SUBSET/genotyped/genotyped_X_samples_recal_indels.vcf \
     --maskExtension 5 \
     --maskName InDel \
     --clusterWindowSize 10 \
     --filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
     --filterName "BadValidation" \
     --filterExpression "QUAL < 30.0" \
     --filterName "LowQual" \
     --filterExpression "QD < 5.0" \
     --filterName "LowVQCBD" \
     --filterExpression "FS > 60" \
     --filterName "FisherStrand" \
     -o $SUBSET/genotyped/genotyped_X_samples_filtered_2nd.vcf
     
#Only get the passable SNPs

cat $SUBSET/genotyped/genotyped_X_samples_filtered_2nd.vcf | grep 'PASS\|^#' > $SUBSET/genotyped/genotyped_X_samples_only_PASS_snp_2nd.vcf

#Second base recalibration loop on uncalibrated bams
#Comments within the loops are abbreviated from here on; the steps mirror the first recalibration loop

mkdir GCVF2

for sample in $REALIGNED_BAMS
do

    echo "Processing $sample"

    FILE2_BASENAME=$(echo $sample | cut -d/ -f9)
    echo $FILE2_BASENAME
    TABLE2_NAME=$(echo $FILE2_BASENAME | cut -d. -f1)'2.table'
    echo $TABLE2_NAME
    RECAL2_OUT=$(echo $FILE2_BASENAME | cut -d. -f1)'_2recal.bam'
    RECAL2_OUT_bai=$(echo $FILE2_BASENAME | cut -d. -f1)'_2recal.bai'

   eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T BaseRecalibrator \
                     -R $REFERENCE -I $sample \
                     -knownSites $SUBSET/genotyped/genotyped_X_samples_only_PASS_snp_2nd.vcf \
                     -o $SUBSET/GCVF2/$TABLE2_NAME") 
   eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T PrintReads -R $REFERENCE -I $sample \
                     -BQSR $SUBSET/GCVF2/$TABLE2_NAME -o $SUBSET/GCVF2/$RECAL2_OUT")

    echo $RECAL2_OUT_bai

done

RECAL2_BAMS=$SUBSET/GCVF2/*_2recal.bam

#Haplotype calling on second recalibrated bam

for bam2_recal in $RECAL2_BAMS
do

    echo "Processing $bam2_recal"

    RECAL2_BASENAME=$(echo $bam2_recal | cut -d/ -f9)
    echo $RECAL2_BASENAME
    RECAL2_NAME=$(echo $RECAL2_BASENAME | cut -d. -f1)'.g.vcf'
    echo $RECAL2_NAME

#Haplotype call on second recalibrated bams

   eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller \
                            -R $REFERENCE -I $bam2_recal -o $SUBSET/GCVF2/$RECAL2_NAME \
                            --emitRefConfidence GVCF --variant_index_type LINEAR \
                            --variant_index_parameter 128000 \
                            --contamination_fraction_to_filter 0.0002 \
                            --min_base_quality_score 20 --phredScaledGlobalReadMismappingRate 30 \
                            --standard_min_confidence_threshold_for_calling 40.0 \
                            --standard_min_confidence_threshold_for_emitting 40.0")
  
done

#Get filelist for next step

ls -d -1 $SUBSET/GCVF2/*_2recal.g.vcf > recal2_vcf.list

#Genotype all of the variant files with GenotypeGVCFs; merge files and keep only variable sites

java -Xmx4g -jar ~/GenomeAnalysisTK.jar -R $REFERENCE -T GenotypeGVCFs \
            --standard_min_confidence_threshold_for_calling 40.0 \
            --standard_min_confidence_threshold_for_emitting 40.0 \
            -V recal2_vcf.list \
            -o $SUBSET/GCVF2/genotyped_X_samples_2recal.g.vcf

# Extract the SNPs from the call set
java -jar ~/GenomeAnalysisTK.jar \
     -T SelectVariants \
     -R $REFERENCE  \
     -V $SUBSET/GCVF2/genotyped_X_samples_2recal.g.vcf \
     -selectType SNP \
     -o $SUBSET/GCVF2/genotyped_X_samples_2recal_snps.vcf


# Extract the indels from the call set
java -jar ~/GenomeAnalysisTK.jar \
     -T SelectVariants \
     -R $REFERENCE  \
     -V $SUBSET/GCVF2/genotyped_X_samples_2recal.g.vcf \
     -selectType INDEL \
     -o $SUBSET/GCVF2/genotyped_X_samples_2recal_indels.vcf

#Filter SNPs
     
java -jar ~/GenomeAnalysisTK.jar \
     -T VariantFiltration \
     -R $REFERENCE  \
     -V $SUBSET/GCVF2/genotyped_X_samples_2recal_snps.vcf \
     --mask $SUBSET/GCVF2/genotyped_X_samples_2recal_indels.vcf \
     --maskExtension 5 \
     --maskName InDel \
     --clusterWindowSize 10 \
     --filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
     --filterName "BadValidation" \
     --filterExpression "QUAL < 30.0" \
     --filterName "LowQual" \
     --filterExpression "QD < 5.0" \
     --filterName "LowVQCBD" \
     --filterExpression "FS > 60" \
     --filterName "FisherStrand" \
     -o $SUBSET/GCVF2/genotyped_X_samples_filtered_3rd.vcf

#Keep only passable SNPs

mkdir GCVF3

cat $SUBSET/GCVF2/genotyped_X_samples_filtered_3rd.vcf | grep 'PASS\|^#' > $SUBSET/GCVF3/genotyped_X_samples_only_PASS_snp_3rd.vcf

#Third base recalibration loop

for sample in $REALIGNED_BAMS
do

    echo "Processing $sample"

    FILE3_BASENAME=$(echo $sample | cut -d/ -f9)
    echo $FILE3_BASENAME
    TABLE3_NAME=$(echo $FILE3_BASENAME | cut -d. -f1)'3.table'
    echo $TABLE3_NAME
    RECAL3_OUT=$(echo $FILE3_BASENAME | cut -d. -f1)'_3recal.bam'
    RECAL3_OUT_bai=$(echo $FILE3_BASENAME | cut -d. -f1)'_3recal.bai'

#Execute the GATK command for base recalibration

   eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T BaseRecalibrator -R $REFERENCE \
                     -I $sample \
                     -knownSites $SUBSET/GCVF3/genotyped_X_samples_only_PASS_snp_3rd.vcf \
                     -o $SUBSET/GCVF3/$TABLE3_NAME") 
   eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T PrintReads -R $REFERENCE -I $sample \
                     -BQSR $SUBSET/GCVF3/$TABLE3_NAME -o $SUBSET/GCVF3/$RECAL3_OUT")

done

RECAL3_BAMS=$SUBSET/GCVF3/*_3recal.bam

#Haplotype calling on third recalibrated bam

for bam3_recal in $RECAL3_BAMS
do

    echo "Processing $bam3_recal"

    RECAL3_BASENAME=$(echo $bam3_recal | cut -d/ -f9)
    echo $RECAL3_BASENAME
    RECAL3_NAME=$(echo $RECAL3_BASENAME | cut -d. -f1)'.g.vcf'
    echo $RECAL3_NAME

#Execute the GATK command for haplotype calling on recalibrated bams
  
   eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller \
                            -R $REFERENCE -I $bam3_recal \
                            -o $SUBSET/GCVF3/$RECAL3_NAME \
                            --emitRefConfidence GVCF --variant_index_type LINEAR \
                            --variant_index_parameter 128000 \
                            --contamination_fraction_to_filter 0.0002 \
                            --min_base_quality_score 20 \
                            --phredScaledGlobalReadMismappingRate 30 \
                            --standard_min_confidence_threshold_for_calling 40.0 \
                            --standard_min_confidence_threshold_for_emitting 40.0")
  
done

#Get file list for next step

ls -d -1 $SUBSET/GCVF3/*_3recal.g.vcf > recal3_vcf.list

#Genotyping with GenotypeGVCFs

java -Xmx4g -jar ~/GenomeAnalysisTK.jar  -R $REFERENCE -T GenotypeGVCFs \
            --standard_min_confidence_threshold_for_calling 40.0 \
            --standard_min_confidence_threshold_for_emitting 40.0 \
            -V recal3_vcf.list \
            -o $SUBSET/GCVF3/genotyped_X_samples_3recal.g.vcf
            
#Extract the SNPs from the call set

java -jar ~/GenomeAnalysisTK.jar \
     -T SelectVariants \
     -R $REFERENCE  \
     -V $SUBSET/GCVF3/genotyped_X_samples_3recal.g.vcf \
     -selectType SNP \
     -o $SUBSET/GCVF3/genotyped_X_samples_3recal_snps.vcf

#Extract indels from the call set

java -jar ~/GenomeAnalysisTK.jar \
     -T SelectVariants \
     -R $REFERENCE  \
     -V $SUBSET/GCVF3/genotyped_X_samples_3recal.g.vcf \
     -selectType INDEL \
     -o $SUBSET/GCVF3/genotyped_X_samples_3recal_indels.vcf
     
#Filter the SNPs, masking sites within 5 bp of an indel

java -jar ~/GenomeAnalysisTK.jar \
     -T VariantFiltration \
     -R $REFERENCE  \
     -V $SUBSET/GCVF3/genotyped_X_samples_3recal_snps.vcf \
     --mask $SUBSET/GCVF3/genotyped_X_samples_3recal_indels.vcf \
     --maskExtension 5 \
     --maskName InDel \
     --clusterWindowSize 10 \
     --filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
     --filterName "BadValidation" \
     --filterExpression "QUAL < 30.0" \
     --filterName "LowQual" \
     --filterExpression "QD < 5.0" \
     --filterName "LowVQCBD" \
     --filterExpression "FS > 60" \
     --filterName "FisherStrand" \
     -o $SUBSET/GCVF3/genotyped_X_samples_filtered_4th.vcf

#Get the passable SNPs

mkdir -p $SUBSET/GCVF4

cat $SUBSET/GCVF3/genotyped_X_samples_filtered_4th.vcf | grep 'PASS\|^#' > $SUBSET/GCVF4/genotyped_X_samples_only_PASS_snp_4th.vcf
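
The `grep 'PASS\|^#'` step keeps any line that contains the text PASS anywhere, not only in the FILTER column. That is probably harmless for these files, but a stricter variant can test the seventh (FILTER) column directly. The sketch below assumes a standard tab-delimited VCF; `pass_only` is a hypothetical helper and the records are toy data.

```shell
#!/usr/bin/env bash
# Sketch: keep header lines and records whose FILTER column (7th) is exactly PASS.
# pass_only is a hypothetical helper; the VCF records below are toy data.
pass_only() {
    awk -F '\t' '/^#/ || $7 == "PASS"' "$1"
}

tmp_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$tmp_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$tmp_vcf"
printf 'uce-1\t10\t.\tA\tG\t50\tPASS\t.\n' >> "$tmp_vcf"
printf 'uce-1\t20\t.\tC\tT\t12\tLowQual\t.\n' >> "$tmp_vcf"
# A failing record that still contains the text PASS in its contig name
printf 'uce-PASS\t30\t.\tG\tA\t15\tLowQual\t.\n' >> "$tmp_vcf"

pass_only "$tmp_vcf"   # prints the two header lines and the single PASS record
rm -f "$tmp_vcf"
```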

#Fourth and final recalibration

for sample in $REALIGNED_BAMS
do

    echo "Processing $sample"

    FILE4_BASENAME=$(echo $sample | cut -d/ -f9)
    echo $FILE4_BASENAME
    TABLE4_NAME=$(echo $FILE4_BASENAME | cut -d. -f1)'4.table'
    echo $TABLE4_NAME
    RECAL4_OUT=$(echo $FILE4_BASENAME | cut -d. -f1)'_4recal.bam'
    RECAL4_OUT_bai=$(echo $FILE4_BASENAME | cut -d. -f1)'_4recal.bai'

#Execute the GATK base recalibration command
  
   eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T BaseRecalibrator \
                     -R $REFERENCE -I $sample \
                     -knownSites $SUBSET/GCVF4/genotyped_X_samples_only_PASS_snp_4th.vcf \
                     -o $SUBSET/GCVF4/$TABLE4_NAME") 
   eval $(echo "java -jar ~/GenomeAnalysisTK.jar -T PrintReads -R $REFERENCE -I $sample \
                     -BQSR $SUBSET/GCVF4/$TABLE4_NAME -o $SUBSET/GCVF4/$RECAL4_OUT")

done

RECAL4_BAMS=$SUBSET/GCVF4/*_4recal.bam

#Haplotype calling on fourth recalibrated bam

for bam4_recal in $RECAL4_BAMS
do

    echo "Processing $bam4_recal"

    RECAL4_BASENAME=$(echo $bam4_recal | cut -d/ -f9)
    echo $RECAL4_BASENAME
    RECAL4_NAME=$(echo $RECAL4_BASENAME | cut -d. -f1)'.g.vcf'
    echo $RECAL4_NAME

#Execute the haplotype call command in GATK

   eval $(echo "java -Xmx4g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller \
                            -R $REFERENCE -I $bam4_recal \
                            -o $SUBSET/GCVF4/$RECAL4_NAME \
                            --emitRefConfidence GVCF --variant_index_type LINEAR \
                            --variant_index_parameter 128000 \
                            --contamination_fraction_to_filter 0.0002 \
                            --min_base_quality_score 20 \
                            --phredScaledGlobalReadMismappingRate 30 \
                            --standard_min_confidence_threshold_for_calling 40.0 \
                            --standard_min_confidence_threshold_for_emitting 40.0")
  
done

#Get list of files from fourth loop for the next step

ls -d -1 $SUBSET/GCVF4/*_4recal.g.vcf > recal4_vcf.list

#Genotyping with GVCF files; merge files and keep only the variable sites

java -Xmx4g -jar ~/GenomeAnalysisTK.jar  -R $REFERENCE -T GenotypeGVCFs \
            --standard_min_confidence_threshold_for_calling 40.0 \
            --standard_min_confidence_threshold_for_emitting 40.0 \
            -V recal4_vcf.list \
            -o $SUBSET/GCVF4/genotyped_X_samples_4recal.g.vcf

#Extract SNPs from the call set

java -jar ~/GenomeAnalysisTK.jar \
     -T SelectVariants \
     -R $REFERENCE  \
     -V $SUBSET/GCVF4/genotyped_X_samples_4recal.g.vcf \
     -selectType SNP \
     -o $SUBSET/GCVF4/genotyped_X_samples_4recal_snps.vcf

#Extract indels from the call set

java -jar ~/GenomeAnalysisTK.jar \
     -T SelectVariants \
     -R $REFERENCE  \
     -V $SUBSET/GCVF4/genotyped_X_samples_4recal.g.vcf \
     -selectType INDEL \
     -o $SUBSET/GCVF4/genotyped_X_samples_4recal_indels.vcf
     
#Filter the SNPs, masking sites within 5 bp of an indel

java -jar ~/GenomeAnalysisTK.jar \
     -T VariantFiltration \
     -R $REFERENCE  \
     -V $SUBSET/GCVF4/genotyped_X_samples_4recal_snps.vcf \
     --mask $SUBSET/GCVF4/genotyped_X_samples_4recal_indels.vcf \
     --maskExtension 5 \
     --maskName InDel \
     --clusterWindowSize 10 \
     --filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
     --filterName "BadValidation" \
     --filterExpression "QUAL < 30.0" \
     --filterName "LowQual" \
     --filterExpression "QD < 5.0" \
     --filterName "LowVQCBD" \
     --filterExpression "FS > 60" \
     --filterName "FisherStrand" \
     -o $SUBSET/GCVF4/genotyped_X_samples_filtered_5th.vcf

#Get only passable SNPs

mkdir -p $SUBSET/GCVF5

cat $SUBSET/GCVF4/genotyped_X_samples_filtered_5th.vcf | grep 'PASS\|^#' > $SUBSET/GCVF5/genotyped_X_samples_only_PASS_snp_5th.vcf

We used the last pass - genotyped_X_samples_only_PASS_snp_5th.vcf - for our downstream analyses. Zarza et al. (2016) note that the *4recal.bam files can be used as input for ANGSD.

Next, we can create a summary text file of the average depth per individual (the vcftools --depth option reports mean depth per individual; --site-mean-depth would report it per site).

mkdir $SUBSET/vcftools

cd $SUBSET/vcftools

cp ../GCVF5/genotyped_X_samples_only_PASS_snp_5th.vcf .

vcftools --vcf $SUBSET/vcftools/genotyped_X_samples_only_PASS_snp_5th.vcf --depth \
      -c > $SUBSET/vcftools/depth_summary.txt
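
The `--depth` summary has one row per individual with the columns INDV, N_SITES, and MEAN_DEPTH. As a rough sketch, the overall mean across individuals can be pulled from that file with awk. `mean_depth` is a hypothetical helper and the depth values below are toy numbers, not results from this study.

```shell
#!/usr/bin/env bash
# Sketch: average the MEAN_DEPTH column of a vcftools --depth summary.
# mean_depth is a hypothetical helper; the table below is toy data.
mean_depth() {
    # skip the header row, then average the third column (MEAN_DEPTH)
    awk 'NR > 1 { total += $3; n++ }
         END { if (n > 0) printf "%.2f\n", total / n }' "$1"
}

tmp_depth=$(mktemp)
printf 'INDV\tN_SITES\tMEAN_DEPTH\n' > "$tmp_depth"
printf 'sampleA\t100\t12.5\n' >> "$tmp_depth"
printf 'sampleB\t100\t20\n' >> "$tmp_depth"
printf 'sampleC\t100\t27.5\n' >> "$tmp_depth"

mean_depth "$tmp_depth"   # prints 20.00
rm -f "$tmp_depth"
```

On the real data the call would be `mean_depth $SUBSET/vcftools/depth_summary.txt`.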

Next, we had to convert from .vcf to SNAPP and structure formats.

#Due to previous steps, nothing is missing
#--max-missing 1 (no missing data allowed) is kept just to be safe

vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 900 --hwe 0.1 --max-missing 1 --012 --out nomiss_900_hwe

vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 900 --hwe 0.1 --max-missing 1 --recode --out nomiss_900_hwe

vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 900 --max-missing 1 --012 --out nomiss_900_no-hwe

From the output of VCFtools, the Hardy-Weinberg filter removes 246 sites.

#With HWE filter

Parameters as interpreted:
        --vcf genotyped_X_samples_only_PASS_snp_5th.vcf
        --max-alleles 2
        --hwe 0.1
        --thin 900
        --max-missing 1
        --012
        --out nomiss_900_hwe

After filtering, kept 24 out of 24 Individuals
Writing 012 matrix files ... Done.
After filtering, kept 3370 out of a possible 69982 Sites
Run Time = 1.00 seconds

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

#without HWE filter

Parameters as interpreted:
        --vcf genotyped_X_samples_only_PASS_snp_5th.vcf
        --thin 900
        --max-missing 1
        --012
        --out nomiss_900_no-hwe

After filtering, kept 24 out of 24 Individuals
Writing 012 matrix files ...    012: Only outputting biallelic loci.
Done.
After filtering, kept 3616 out of a possible 69982 Sites
Run Time = 1.00 seconds
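
The number of sites removed by the Hardy-Weinberg filter is the difference between the two "kept" counts in these logs. A small sketch of that arithmetic follows, using the two log lines shown here; `sites_kept` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: pull the "kept N ... Sites" count out of a vcftools log line
# and compare the runs with and without the HWE filter.
# sites_kept is a hypothetical helper; it reads log text on stdin.
sites_kept() {
    sed -n 's/.*kept \([0-9][0-9]*\) out of a possible [0-9][0-9]* Sites.*/\1/p'
}

with_hwe=$(echo "After filtering, kept 3370 out of a possible 69982 Sites" | sites_kept)
without_hwe=$(echo "After filtering, kept 3616 out of a possible 69982 Sites" | sites_kept)

echo "$((without_hwe - with_hwe)) sites removed by the HWE filter"   # 246
```

The same helper could read a live run, for example `vcftools ... 2>&1 | sites_kept`, assuming the log text is captured from the terminal output.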

Using iterations of the above code, we determined how many SNPs remained for different thinning windows. We used windows in steps of 10 bp from 10 to 260 bp, and then wider windows of 270, 360, 450, 540, 630, 720, 810, and 900 bp. We included 900 bp because this is the window used by Zarza et al. (2016). We had difficulty determining which window was best, so we used two different files for our analyses: 170 and 900 bp. We chose 170 bp as this is the point at which the number of SNPs removed per step ‘levels out’. This comparison was done before applying the HWE filter, but it reflects the overall behavior of the data.

#Note, not writing entire number string to save space in document
#"X" is number string by tens "10 20 30 ... 260 270"
#after 270, by 90s from 270 to 900

#This will overwrite the "test" output files on each pass but print the number of kept SNPs to the terminal screen

for VAR in X
do
  vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin $VAR --max-missing 1 --012 --out test
done
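
The placeholder "X" above stands for the full list of windows. One way to generate that list rather than type it out is sketched below. `thin_windows` is a hypothetical helper, and the loop is a dry run that only prints each command.

```shell
#!/usr/bin/env bash
# Sketch: build the list of thinning windows described in the comments above.
# thin_windows is a hypothetical helper, not part of the original script.
thin_windows() {
    seq 10 10 260    # 10, 20, ..., 260
    seq 270 90 900   # 270, 360, ..., 900
}

for window in $(thin_windows); do
    # Dry run: print the command; remove "echo" to run vcftools itself
    echo vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf \
         --thin "$window" --max-missing 1 --012 --out test
done
```

This yields 34 windows, matching the 34 SNP counts entered into R below.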

#Plot output in R
x1=seq(from=10,to=260,by=10)
x2=seq(from=270,to=900,by=90)
x=c(x1,x2)

#Outputs from VCF program
y=c(10606,9570,8821,8223,7727,7312,
    6968,6683,6438,6201,6011,5855,
    5714,5575,5444,5317,5223,5126,
    5032,4948,4846,4753,4651,4567,
    4472,4373,4280,3731,3637,3629,
    3622,3617,3616,3616)

#Check if dimensions equal
#length(x)==length(y)

plot(x=x,y=y,pch=19)

plot(x=x[1:9],y=y[1:9],pch=19)

y2=y/69982

x2=x/900

plot(x=x2,y=y2,pch=19)

plot(x=x2[1:9],y=y2[1:9],pch=19)

#Difference in SNP count from the previous window (first value set to 0)
y3=c(0,diff(y))

plot(y=y3[2:27],x=x[2:27],pch=19,
     main="SNPs removed",xlab="Thin Window",
     ylab="# SNPs removed")

#Print matrix of number removed.
cbind(x,y,y3)

##         x     y    y3
##  [1,]  10 10606     0
##  [2,]  20  9570 -1036
##  [3,]  30  8821  -749
##  [4,]  40  8223  -598
##  [5,]  50  7727  -496
##  [6,]  60  7312  -415
##  [7,]  70  6968  -344
##  [8,]  80  6683  -285
##  [9,]  90  6438  -245
## [10,] 100  6201  -237
## [11,] 110  6011  -190
## [12,] 120  5855  -156
## [13,] 130  5714  -141
## [14,] 140  5575  -139
## [15,] 150  5444  -131
## [16,] 160  5317  -127
## [17,] 170  5223   -94
## [18,] 180  5126   -97
## [19,] 190  5032   -94
## [20,] 200  4948   -84
## [21,] 210  4846  -102
## [22,] 220  4753   -93
## [23,] 230  4651  -102
## [24,] 240  4567   -84
## [25,] 250  4472   -95
## [26,] 260  4373   -99
## [27,] 270  4280   -93
## [28,] 360  3731  -549
## [29,] 450  3637   -94
## [30,] 540  3629    -8
## [31,] 630  3622    -7
## [32,] 720  3617    -5
## [33,] 810  3616    -1
## [34,] 900  3616     0

Running again for 170 bases.

#Due to previous steps, nothing is missing
#--max-missing 1 (no missing data allowed) is kept just to be safe

vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 170 --hwe 0.1 --max-missing 1 --012 --out nomiss_170_hwe

vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 170 --max-missing 1 --012 --out nomiss_170_no-hwe

From the output of VCFtools:

#With HWE filter

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
        --vcf genotyped_X_samples_only_PASS_snp_5th.vcf
        --max-alleles 2
        --hwe 0.1
        --thin 170
        --max-missing 1
        --012
        --out nomiss_170_hwe

After filtering, kept 24 out of 24 Individuals
Writing 012 matrix files ... Done.
After filtering, kept 4540 out of a possible 69982 Sites
Run Time = 1.00 seconds

#without HWE filter

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
        --vcf genotyped_X_samples_only_PASS_snp_5th.vcf
        --thin 170
        --max-missing 1
        --012
        --out nomiss_170_hwe

After filtering, kept 24 out of 24 Individuals
Writing 012 matrix files ...    012: Only outputting biallelic loci.
Done.
After filtering, kept 5223 out of a possible 69982 Sites
Run Time = 2.00 seconds

Also save as plink files.

vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 900 --hwe 0.1 --max-missing 1 --plink --out PLINK_nomiss_900_hwe

vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 900 --max-missing 1 --plink --out PLINK_nomiss_900_no-hwe

vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 170 --hwe 0.1 --max-missing 1 --plink --out PLINK_nomiss_170_hwe

vcftools --vcf genotyped_X_samples_only_PASS_snp_5th.vcf --thin 170 --max-missing 1 --plink --out PLINK_nomiss_170_no-hwe

Convert .012 to alternative formats

The following converts the file to structure format, and is from Zarza et al. (2016).

#This script converts the vcf file coded as 012 (output of vcftools) to 1 line per individual and two columns per locus structure format. It requires the *.indv vcftools output with taxon labels

#delete first column of file, as it contains individual numerical id by printing from 2nd column to last

cut -f 2- nomiss_170_hwe.012 > f170_vcf_012_hwe_wo_id.txt
cut -f 2- nomiss_900_hwe.012 > f900_vcf_012_hwe_wo_id.txt

#recode 012 genotypes as two 0/1 allele columns, and replace -1 with -9 for missing data. This should be done before adding taxon names, which might contain numbers in the labels
 
#Take the output from here to create 012 nexus files in a text editor (Notepad++ or Notepadqq)
 
sed -e 's/-1/-9 -9/g' \
-e 's/0/0 0/g' \
-e 's/1/0 1/g' \
-e 's/2/1 1/g' f170_vcf_012_hwe_wo_id.txt > f170_structure_01_hwe_woID.txt

sed -e 's/-1/-9 -9/g' \
-e 's/0/0 0/g' \
-e 's/1/0 1/g' \
-e 's/2/1 1/g' f900_vcf_012_hwe_wo_id.txt > f900_structure_01_hwe_woID.txt

#optional get number of columns (= number of loci x2 from vcf file).
#head -1 structure_01_woID.txt | wc -w

#paste individual id name from vcf *.indv

paste -d "\t" nomiss_170_hwe.012.indv f170_structure_01_hwe_woID.txt > structure012_170_hwe.txt
paste -d "\t" nomiss_900_hwe.012.indv f900_structure_01_hwe_woID.txt > structure012_900_hwe.txt
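
The order of the `sed` expressions above matters: the missing-data code -1 has to be rewritten first, otherwise the later rule for 1 would alter it. The sketch below runs the same four substitutions on a single toy genotype row to show the result. `to_structure` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Sketch: the same four substitutions as above, applied to one toy row.
# to_structure is a hypothetical wrapper; it reads 012-coded text on stdin.
to_structure() {
    sed -e 's/-1/-9 -9/g' \
        -e 's/0/0 0/g' \
        -e 's/1/0 1/g' \
        -e 's/2/1 1/g'
}

# One individual, four loci: homozygous ref, heterozygous, homozygous alt, missing
printf '0\t1\t2\t-1\n' | to_structure
# prints (tab-separated pairs): 0 0   0 1   1 1   -9 -9

# Each locus becomes two allele columns, so 4 loci give 8 words
printf '0\t1\t2\t-1\n' | to_structure | wc -w
```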

ABBA/BABA Gene Flow

ABBA/BABA gene flow tests were performed in \(angsd\), with more details available from the angsd website.
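
For orientation, the statistic behind the test is Patterson's D, the normalized difference between the counts of the two discordant site patterns: D = (nABBA - nBABA) / (nABBA + nBABA). The sketch below only illustrates that arithmetic. `d_statistic` is a hypothetical helper and the counts are made-up numbers, not output from angsd.

```shell
#!/usr/bin/env bash
# Sketch: Patterson's D from ABBA and BABA site-pattern counts.
# d_statistic is a hypothetical helper; the counts below are toy numbers.
d_statistic() {
    awk -v abba="$1" -v baba="$2" 'BEGIN {
        if (abba + baba == 0) { print "NA"; exit }
        printf "%.4f\n", (abba - baba) / (abba + baba)
    }'
}

d_statistic 60 40   # prints 0.2000
```

A D near zero is the expectation without gene flow; whether an observed value is significant is what the block jackknife step further below addresses.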

The commands below are run from my programs folder and reference my own directory layout.

Merge the relevant BAM files for each group before running the program:

#merge following this format

cd ~/uce-cinnyris/taxon-sets/cinnyris/abbababa

samtools merge \
    ./genderuensis-merge.bam \
    ../recal/FMNH122395_Cinnyris_genderuensis_realigned_recal.bam \
    ../recal/FMNH189462_Cinnyris_genderuensis_realigned_recal.bam

samtools merge \
    ./reichenowi-merge.bam \
    ../recal/FMNH358156_Cinnyris_reichenowi_realigned_recal.bam \
    ../recal/FMNH358157_Cinnyris_reichenowi_realigned_recal.bam \
    ../recal/FMNH443947_Cinnyris_reichenowi_realigned_recal.bam \
    ../recal/FMNH481236_Cinnyris_reichenowi_realigned_recal.bam

#etc to create groups you want
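
Because the recalibrated BAM names carry the taxon label, the merge command for each group can also be assembled from a file pattern instead of listing every file by hand. This is a sketch under that naming assumption; `merge_command` is a hypothetical helper that only prints the command, and the demo uses empty placeholder files.

```shell
#!/usr/bin/env bash
# Sketch: print the samtools merge command for one taxon (dry run only).
# merge_command is a hypothetical helper; it assumes files are named
# <ID>_Cinnyris_<taxon>_realigned_recal.bam as in the commands above.
merge_command() {
    local taxon=$1 recal_dir=$2
    set -- "$recal_dir"/*_Cinnyris_"${taxon}"_realigned_recal.bam
    [ -e "$1" ] || { echo "no BAMs found for $taxon" >&2; return 1; }
    echo samtools merge "./${taxon}-merge.bam" "$@"
}

# Demo with empty placeholder files in a temporary folder
demo_dir=$(mktemp -d)
touch "$demo_dir/FMNH122395_Cinnyris_genderuensis_realigned_recal.bam" \
      "$demo_dir/FMNH189462_Cinnyris_genderuensis_realigned_recal.bam"
merge_command genderuensis "$demo_dir"
rm -rf "$demo_dir"
```

Checking the printed command before running it guards against a pattern that silently picks up the wrong individuals.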

Executing the angsd ABBA/BABA program:

cd ~/uce-cinnyris/taxon-sets/cinnyris/abbababa

angcd=~/programs/angsd
abbacd=~/uce-cinnyris/taxon-sets/cinnyris/abbababa

cd $abbacd

~/programs/angsd/angsd -doAbbababa 1 \
  -bam ~/uce-cinnyris/taxon-sets/cinnyris/abbababa/abbababa_fullset.txt \
  -doCounts 1 \
  -useLast 1 \
  -blockSize 500 \
  -minQ 30 \
  -minmapQ 30 \
  -out ~/uce-cinnyris/taxon-sets/cinnyris/abbababa/output/fullset_500 \
  -checkBamHeaders 0
 

~/programs/angsd/angsd -doAbbababa 1 \
  -bam ~/uce-cinnyris/taxon-sets/cinnyris/abbababa/abbababa_westset.txt \
  -doCounts 1 \
  -useLast 1 \
  -blockSize 500 \
  -minQ 30 \
  -minmapQ 30 \
  -out ~/uce-cinnyris/taxon-sets/cinnyris/abbababa/output/westset_500 \
  -checkBamHeaders 0

Rscript ~/programs/angsd/R/jackKnife.R \
        file=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/output/fullset_500.abbababa \
        indNames=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/abbababa_fullset_noout.txt \
        outfile=fullset_500_jackknife
Rscript ~/programs/angsd/R/jackKnife.R \
        file=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/output/westset_500.abbababa \
        indNames=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/abbababa_westset_noout.txt \
        outfile=westset_500_jackknife
Rscript ~/programs/angsd/R/jackKnife.R \
        file=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/output/mountains_500.abbababa \
        indNames=~/uce-cinnyris/taxon-sets/cinnyris/abbababa/abbababa_mountains_noout.txt \
        outfile=mountains_500_jackknife

Genetic Analyses in `LEA`

Population genetic analyses were performed in the \(R\) package LEA.

Loading `LEA`

LEA cannot be loaded like a normal package and requires a special installation procedure. The following must be run the first time you install LEA on your computer:

##Required installation
##Linux computer

#run first time

#clear workspace
rm(list=ls())

#install required packages
install.packages(c("fields","RColorBrewer","mapplots"))

#download lea from source
source("http://bioconductor.org/biocLite.R")

#install LEA to R
biocLite("LEA")

Every subsequent time LEA is run on your machine, you need to run the following:

#run every time

rm(list=ls())
library(LEA)

## 
## Attaching package: 'LEA'

## The following object is masked from 'package:lattice':
## 
##     barchart

source("http://membres-timc.imag.fr/Olivier.Francois/Conversion.R")
source("http://membres-timc.imag.fr/Olivier.Francois/POPSutilities.R")

## [1] "Loading fields"

## Loading required package: fields

## Loading required package: spam

## Loading required package: dotCall64

## Loading required package: grid

## Spam version 2.5-1 (2019-12-12) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction 
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.

## 
## Attaching package: 'spam'

## The following objects are masked from 'package:base':
## 
##     backsolve, forwardsolve

## See https://github.com/NCAR/Fields for
##  an extensive vignette, other supplements and source code

## [1] "Loading RColorBrewer"

## Loading required package: RColorBrewer

## Warning in helpPops(): Available functions: 
## 
##  HELP: 
##    * helpPops() 
## 
##  
##  SHOW EXAMPLE: 
##    * Open the R script scriptExample.r 
## 
##  
##  CORRELATION UTILITIES: 
##  Compute correlation between matrix of membership/admixture coefficients (from matrix or from POPS outputs) 
##    * correlation(matrix1,matrix2,plot=TRUE,colors=defaultPalette) 
##    * correlationFromPops(file1,file2,nind,nskip1=2,nskip2=2,plot=TRUE,colors=defaultPalette) 
## 
##  
##  BARPLOT UTILITIES: 
##  Display barplot of membership/admixture coefficients (from matrix or from POPS output) 
##  * barplotCoeff(matrix,colors=defaultPalette,...) 
##  * barplotFromPops(file1,nind,nskip1=2,colors=defaultPalette,...) 
## 
##  
##  MAPS UTILITIES: 
##  Display maps of membership/admixture coefficients (from matrix or from POPS output) 
##    * maps(matrix,coord,grid,constraints=NULL,method="treshold",colorGradientsList=lColorGradients,onemap=T,onepage=T,...) 
##    * mapsFromPops(file,nind,nskip=2,coord,grid,constraints=NULL,method="treshold",colorGradientsList=lColorGradients,onemap=T,onepage=T,...) 
##  Create grid on which coefficients will be displayed 
##    * createGrid(min_long,max_long,min_lat,max_lat,npixels_long,npixels_lat) 
##    * createGridFromAsciiRaster(file) 
##    * getConstraintsFromAsciiRaster(file,cell_value_min=NULL,cell_value_max=NULL) 
##  Legend for maps 
##    * displayLegend(K=NULL,colorGradientsList=lColorGradients)

LEA requires special file formats for performing its analyses; the structure files created in the previous pipeline are acceptable inputs for initiating LEA analyses. I am doing these analyses with and without C. regius included within the dataset.
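
The .geno format that LEA works from is plain text with one line per SNP and one character per individual (0, 1, or 2, with 9 for missing data). A quick shell check of a converted file can confirm its dimensions before starting an analysis. The sketch below assumes that layout; `geno_dims` is a hypothetical helper and the demo file is toy data.

```shell
#!/usr/bin/env bash
# Sketch: report the dimensions of a .geno file and flag malformed lines.
# geno_dims is a hypothetical helper; the demo file below is toy data.
geno_dims() {
    awk 'NR == 1 { n = length($0) }       # individuals = characters per line
         length($0) != n { bad = 1 }      # every line must have the same width
         /[^0129]/ { bad = 1 }            # only 0, 1, 2 and 9 are allowed
         END { if (bad) { print "malformed"; exit 1 }
               print NR " loci, " n " individuals" }' "$1"
}

tmp_geno=$(mktemp)
printf '0129\n2100\n0011\n' > "$tmp_geno"
geno_dims "$tmp_geno"   # prints: 3 loci, 4 individuals
rm -f "$tmp_geno"
```

The reported counts should agree with the -n and -L values that sNMF prints when it reads the same file.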

First run: SNPs are at least 900 bp apart.

x="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/structure012_900_hwe_ordered_noregius.txt"
y="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno"

#Write Geno File
#Data is diploid
#Format = 1; all data in one row
#Extra.row = 0; no extra rows
#Extra column = 1; there is a column of individual IDs

struct2geno(file=x,output.format='geno',output=y,
            diploid=T,FORMAT=1,extra.row=0,extra.col=1)

#geno2lfmm struggles with full filepaths
#set directory to run properly

setwd("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/")
geno2lfmm("Cinnyris-900-geno_noregius.geno")

#Repeating for regius included
x="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/structure012_900_hwe_ordered.txt"
y="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno"

#Write Geno File
#Data is diploid
#Format = 1; all data in one row
#Extra.row = 0; no extra rows
#Extra column = 1; there is a column of individual IDs

struct2geno(file=x,output.format='geno',output=y,
            diploid=T,FORMAT=1,extra.row=0,extra.col=1)

setwd("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/")
geno2lfmm("Cinnyris-900-geno.geno")

Second run: SNPs are at least 170 bp apart.

x="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/structure012_170_hwe_ordered_noregius.txt"
y="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno"

#Write Geno File
#Data is diploid
#Format = 1; all data in one row
#Extra.row = 0; no extra rows
#Extra column = 1; there is a column of individual IDs

struct2geno(file=x,output.format='geno',output=y,
            diploid=T,FORMAT=1,extra.row=0,extra.col=1)

#geno2lfmm struggles with full filepaths
#set directory to run properly

setwd("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/")
geno2lfmm("Cinnyris-170-geno_noregius.geno")

#Repeating for regius included
x="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/structure012_170_hwe_ordered.txt"
y="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno.geno"

#Write Geno File
#Data is diploid
#Format = 1; all data in one row
#Extra.row = 0; no extra rows
#Extra column = 1; there is a column of individual IDs

struct2geno(file=x,output.format='geno',output=y,
            diploid=T,FORMAT=1,extra.row=0,extra.col=1)

setwd("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/")
geno2lfmm("Cinnyris-170-geno.geno")

Assign the file paths to variables for further analysis.

geno.170="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno.geno"
lfmm.170="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno.lfmm"

geno.900="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno"
lfmm.900="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.lfmm"

noreg.geno.170="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno"
noreg.lfmm.170="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.lfmm"

noreg.geno.900="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno"
noreg.lfmm.900="~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.lfmm"

SNMF Analyses

sNMF (sparse non-negative matrix factorization) analyses estimate individual ancestry coefficients for a chosen number of ancestral populations (K). We can start with a quick structure analysis with three populations, as the LEA tutorial suggests.

obj.snmf=snmf(geno.900,K=3,alpha=100,project="new")

## The project is saved into :
##  Sequences/Cinnyris-900-geno.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             24
##         -L (number of loci)                    6740
##         -K (number of ancestral pops)          3
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K3/run1/Cinnyris-900-geno_r1.3.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K3/run1/Cinnyris-900-geno_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1698003345
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno:      OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [===========]
## Number of iterations: 29
## 
## Least-square error: 15619.927600
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K3/run1/Cinnyris-900-geno_r1.3.Q:     OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K3/run1/Cinnyris-900-geno_r1.3.G:  OK.
## 
## The project is saved into :
##  Sequences/Cinnyris-900-geno.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")

qmatrix=Q(obj.snmf,K=3)

barplot(t(qmatrix),col=c("#000000","#ffa500","#1f2887"),border=NA,space=0,
        xlab="INDIVIDUALS",ylab="ADMIXTURE")

obj.snmf=snmf(geno.900,K=4,alpha=100,project="new")

## The project is saved into :
##  Sequences/Cinnyris-900-geno.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 4  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             24
##         -L (number of loci)                    6740
##         -K (number of ancestral pops)          4
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K4/run1/Cinnyris-900-geno_r1.4.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K4/run1/Cinnyris-900-geno_r1.4.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  2200132065897
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno:      OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [================]
## Number of iterations: 42
## 
## Least-square error: 14730.906300
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K4/run1/Cinnyris-900-geno_r1.4.Q:     OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K4/run1/Cinnyris-900-geno_r1.4.G:  OK.
## 
## The project is saved into :
##  Sequences/Cinnyris-900-geno.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")

qmatrix=Q(obj.snmf,K=4)

barplot(t(qmatrix),col=c("#000000","#ffa500","#1f2887","#e31a1c"),border=NA,space=0,
        xlab="INDIVIDUALS",ylab="ADMIXTURE")

obj.snmf=snmf(geno.900,K=5,alpha=100,project="new")

## The project is saved into :
##  Sequences/Cinnyris-900-geno.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 5  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             24
##         -L (number of loci)                    6740
##         -K (number of ancestral pops)          5
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K5/run1/Cinnyris-900-geno_r1.5.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K5/run1/Cinnyris-900-geno_r1.5.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  7234318570481357565
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.geno:      OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [===========]
## Number of iterations: 30
## 
## Least-square error: 13459.401039
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K5/run1/Cinnyris-900-geno_r1.5.Q:     OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.snmf/K5/run1/Cinnyris-900-geno_r1.5.G:  OK.
## 
## The project is saved into :
##  Sequences/Cinnyris-900-geno.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno.snmfProject")

qmatrix=Q(obj.snmf,K=5)

barplot(t(qmatrix),col=c("#1f2887","#808080","#ffa500","#000000","#e31a1c"),border=NA,space=0,
        xlab="INDIVIDUALS",ylab="ADMIXTURE")

I’m having trouble getting the colors right, but note that the C. regius block is split before the C. reichenowi groups are. Let’s switch to the dataset without C. regius.

obj.snmf=snmf(noreg.geno.900,K=2,alpha=100,project="new")

## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 2  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          2
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1998485743
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [==========]
## Number of iterations: 26
## 
## Least-square error: 8992.360308
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G:    OK.
## 
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")

qmatrix=Q(obj.snmf,K=2)

barplot(t(qmatrix),col=c("#000000","#1f2887"),border=NA,space=0,
        xlab="INDIVIDUALS",ylab="ADMIXTURE")
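The color trouble noted earlier is probably a label-switching effect: sNMF numbers its clusters arbitrarily in each run, so a fixed color vector can land on different groups from one run to the next. The sketch below shows one possible workaround in base R. The helper `order_clusters_by_anchor` and the toy matrix are hypothetical and not part of LEA; the approach assumes you can name one "anchor" individual per cluster whose group membership is not in doubt.

```r
# Hypothetical helper (not part of LEA): reorder the columns of a Q matrix so
# that column i is the cluster best represented in the i-th anchor individual.
order_clusters_by_anchor <- function(qmatrix, anchors) {
  stopifnot(length(anchors) == ncol(qmatrix))
  col_order <- integer(0)
  for (a in anchors) {
    # Only consider columns that have not been assigned to an earlier anchor.
    remaining <- setdiff(seq_len(ncol(qmatrix)), col_order)
    best <- remaining[which.max(qmatrix[a, remaining])]
    col_order <- c(col_order, best)
  }
  qmatrix[, col_order, drop = FALSE]
}

# Toy Q matrix with made-up values: 3 individuals (rows) x 3 clusters (columns).
toy_q <- rbind(c(0.1, 0.9, 0.0),
               c(0.8, 0.1, 0.1),
               c(0.0, 0.2, 0.8))

# Individuals 2, 3 and 1 serve as anchors for the first, second and third color.
aligned_q <- order_clusters_by_anchor(toy_q, anchors = c(2, 3, 1))
print(aligned_q)
```

The aligned matrix can then be passed to the same `barplot(t(...))` call used in this section, so that each color follows the same anchor individual across runs.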

obj.snmf=snmf(noreg.geno.900,K=3,alpha=100,project="new")

## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          3
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1399607943
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [====================]
## Number of iterations: 53
## 
## Least-square error: 8028.784397
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G:    OK.
## 
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")

qmatrix=Q(obj.snmf,K=3)

barplot(t(qmatrix),col=c("#000000","#e31a1c","#1f2887"),border=NA,space=0,
        xlab="INDIVIDUALS",ylab="ADMIXTURE")

One individual is pure genderuensis; the other individuals appear to be admixed. The three reddest individuals, in order, are from 1) Mt. Genderu, Adamawa; 2) Yaounde, Centre; and 3) Babadjou, Ouest.
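The ranking of the "most red" individuals can also be read directly from the Q matrix rather than from the plot. The following is a minimal sketch in base R; `rank_by_cluster`, the toy matrix, and the sample labels are hypothetical, and which column corresponds to the red cluster has to be checked for each run because cluster numbering is arbitrary.

```r
# Hypothetical helper: rank individuals by their ancestry coefficient in one cluster.
rank_by_cluster <- function(qmatrix, cluster, labels = seq_len(nrow(qmatrix))) {
  ord <- order(qmatrix[, cluster], decreasing = TRUE)
  data.frame(individual = labels[ord],
             coefficient = qmatrix[ord, cluster],
             stringsAsFactors = FALSE)
}

# Toy Q matrix with made-up values: 4 individuals x 3 clusters.
toy_q <- rbind(c(0.70, 0.30, 0.00),
               c(0.00, 1.00, 0.00),
               c(0.10, 0.45, 0.45),
               c(0.00, 0.05, 0.95))
toy_labels <- c("ind_A", "ind_B", "ind_C", "ind_D")

# Rank by the second column, assumed here to be the cluster drawn in red.
ranking <- rank_by_cluster(toy_q, cluster = 2, labels = toy_labels)
print(ranking)
```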

obj.snmf=snmf(noreg.geno.900,K=4,alpha=100,project="new")

## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 4  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          4
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  792621406
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [========================]
## Number of iterations: 63
## 
## Least-square error: 7101.948983
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G:    OK.
## 
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")

qmatrix=Q(obj.snmf,K=4)

barplot(t(qmatrix),col=c("#000000","#e31a1c","#1f2887","purple"),border=NA,space=0,
        xlab="INDIVIDUALS",ylab="ADMIXTURE")

Repeated runs of the above give variable results: the additional cluster splits either the eastern population in two (the exact split varies) or the western population in two (interior vs. Bioko).
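One way to summarize this run-to-run variability is a co-assignment matrix: for each pair of individuals, the proportion of runs in which both receive the same most-likely cluster. The measure ignores cluster numbering, so label switching between runs does not matter. The sketch below uses base R only; `coassignment` and the three toy Q matrices are hypothetical stand-ins for Q matrices collected from repeated runs.

```r
# Hypothetical helper: proportion of runs in which each pair of individuals
# shares the same hard (most-likely) cluster assignment.
coassignment <- function(q_list) {
  n <- nrow(q_list[[1]])
  co <- matrix(0, n, n)
  for (q in q_list) {
    hard <- max.col(q, ties.method = "first")  # most-likely cluster per individual
    co <- co + outer(hard, hard, "==")         # 1 where a pair shares a cluster
  }
  co / length(q_list)
}

# Three toy runs with made-up values (4 individuals x 2 clusters).
run_1 <- rbind(c(0.9, 0.1), c(0.8, 0.2), c(0.1, 0.9), c(0.2, 0.8))
run_2 <- rbind(c(0.2, 0.8), c(0.3, 0.7), c(0.9, 0.1), c(0.6, 0.4))  # labels swapped
run_3 <- rbind(c(0.9, 0.1), c(0.4, 0.6), c(0.1, 0.9), c(0.2, 0.8))  # individual 2 moves

co_matrix <- coassignment(list(run_1, run_2, run_3))
print(round(co_matrix, 2))
```

Pairs with values near 1 are grouped together in nearly every run, while intermediate values flag the individuals responsible for the unstable splits.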

obj.snmf=snmf(noreg.geno.900,K=5,alpha=100,project="new")

## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 5  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          5
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  2019936637
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [================================]
## Number of iterations: 86
## 
## Least-square error: 6102.370974
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G:    OK.
## 
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")

qmatrix=Q(obj.snmf,K=5)

barplot(t(qmatrix),col=c("#000000","#e31a1c","#1f2887","purple","gold"),border=NA,space=0,
        xlab="INDIVIDUALS",ylab="ADMIXTURE")

Four populations begin subdividing the eastern population.

We can try this for the 170 bp dataset as well to see how it compares.

obj.snmf=snmf(noreg.geno.170,K=3,alpha=100,project="new")

## The project is saved into :
##  Sequences/Cinnyris-170-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    7358
##         -K (number of ancestral pops)          3
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K3/run1/Cinnyris-170-geno_noregius_r1.3.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K3/run1/Cinnyris-170-geno_noregius_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  812977239
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [==============]
## Number of iterations: 38
## 
## Least-square error: 10602.123334
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K3/run1/Cinnyris-170-geno_noregius_r1.3.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K3/run1/Cinnyris-170-geno_noregius_r1.3.G:    OK.
## 
## The project is saved into :
##  Sequences/Cinnyris-170-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")

qmatrix=Q(obj.snmf,K=3)

barplot(t(qmatrix),col=c("#000000","#e31a1c","#1f2887"),border=NA,space=0,
        xlab="INDIVIDUALS",ylab="ADMIXTURE")

Results are essentially identical.
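A visual comparison like this can be backed by a number. One label-independent option is to compare the pairwise co-ancestry matrices (Q multiplied by its transpose) of the two runs, since these do not change when cluster columns are permuted. The sketch below is in base R; `q_similarity` and the toy matrices are hypothetical, and it assumes both Q matrices list the same individuals in the same row order.

```r
# Hypothetical helper: correlation between the pairwise co-ancestry values of
# two Q matrices. Invariant to the order of the cluster columns.
q_similarity <- function(q_a, q_b) {
  stopifnot(nrow(q_a) == nrow(q_b))
  co_a <- tcrossprod(q_a)   # individuals x individuals
  co_b <- tcrossprod(q_b)
  idx <- upper.tri(co_a)    # each pair once, diagonal excluded
  cor(co_a[idx], co_b[idx])
}

# Toy matrices with made-up values (4 individuals x 2 clusters).
q_a <- rbind(c(1.0, 0.0), c(0.9, 0.1), c(0.1, 0.9), c(0.0, 1.0))
q_swapped <- q_a[, 2:1]                                        # same result, labels swapped
q_other <- rbind(c(1, 0), c(0, 1), c(1, 0), c(0, 1))           # a different grouping

similarity_swapped <- q_similarity(q_a, q_swapped)
similarity_other <- q_similarity(q_a, q_other)
print(c(swapped = similarity_swapped, other = similarity_other))
```

A value close to 1 between the two datasets' Q matrices would support the statement that the results are essentially identical.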

obj.snmf=snmf(noreg.geno.170,K=5,alpha=100,project="new")

## The project is saved into :
##  Sequences/Cinnyris-170-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 5  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    7358
##         -K (number of ancestral pops)          5
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K5/run1/Cinnyris-170-geno_noregius_r1.5.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K5/run1/Cinnyris-170-geno_noregius_r1.5.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  104288053
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [============]
## Number of iterations: 33
## 
## Least-square error: 8108.938590
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K5/run1/Cinnyris-170-geno_noregius_r1.5.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-170-geno_noregius.snmf/K5/run1/Cinnyris-170-geno_noregius_r1.5.G:    OK.
## 
## The project is saved into :
##  Sequences/Cinnyris-170-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-170-geno_noregius.snmfProject")

qmatrix=Q(obj.snmf,K=5)

barplot(t(qmatrix),col=c("#000000","#e31a1c","#1f2887","purple","gold"),border=NA,space=0,
        xlab="INDIVIDUALS",ylab="ADMIXTURE")

Again, results are virtually identical.

We can also vary the regularization parameter \(\alpha\) to see how the results compare.
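When cross-entropy is estimated, a common way to compare values of K is to pick the one with the lowest cross-entropy on the masked genotypes. The sketch below applies that rule to the masked-data values printed in the \(\alpha=1\) log that follows, for K = 1 to 4 only. It is a single repetition on 14 individuals, so the ranking should be treated as indicative rather than conclusive; the variable names are illustrative.

```r
# Masked-data cross-entropy values copied from the alpha = 1 log (K = 1 to 4).
masked_ce <- c(K1 = 0.586075, K2 = 0.573637, K3 = 0.60261, K4 = 0.629595)

# Lower masked cross-entropy indicates better prediction of held-out genotypes.
best_k <- names(masked_ce)[which.min(masked_ce)]

# Difference from the best value, to show how close the alternatives are.
print(round(masked_ce - min(masked_ce), 4))
print(best_k)
```

Values for other \(\alpha\) levels or further K can be appended to the same vector and compared in the same way.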

For \(\alpha=1\):

## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] 1936746251
## [1] "*************************************"
## [1] "*          create.dataset            *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)                 14
##         -L (number of loci)                        5475
##         -s (seed random init)                      1936746251
##         -r (percentage of masked data)             0.05
##         -x (genotype file in .geno format)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -o (output file in .geno format)           /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## 
##  Write genotype file with masked data, /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## [1] "*************************************"
## [1] "* sNMF K = 1  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          1
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          1
##         -s (seed random init)                  1936746251
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
## 
## Least-square error: 10757.715416
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      1
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.252479
## Cross-Entropy (masked data):  0.586075
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 2  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          2
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          1
##         -s (seed random init)                  1936746251
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [==========]
## Number of iterations: 26
## 
## Least-square error: 8993.238989
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      2
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.194479
## Cross-Entropy (masked data):  0.573637
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          3
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          1
##         -s (seed random init)                  1936746251
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [==============]
## Number of iterations: 37
## 
## Least-square error: 8004.652741
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      3
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.174729
## Cross-Entropy (masked data):  0.60261
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 4  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          4
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          1
##         -s (seed random init)                  1936746251
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [===========]
## Number of iterations: 30
## 
## Least-square error: 6843.762563
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      4
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.147341
## Cross-Entropy (masked data):  0.629595
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 5  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          5
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          1
##         -s (seed random init)                  1936746251
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [===========================================================================]
## Number of iterations: 200
## 
## Least-square error: 6210.122024
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      5
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.140718
## Cross-Entropy (masked data):  0.718848
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 6  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          6
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          1
##         -s (seed random init)                  36285820462859
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=========]
## Number of iterations: 23
## 
## Least-square error: 5112.272207
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      6
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.113556
## Cross-Entropy (masked data):  0.744037
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")

For \(\alpha=50\):

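# Alpha = 50: rerun sNMF for K = 1 to 6 with cross-entropy estimation.
# project = "new" discards the project saved by the previous run.
obj.snmf = snmf(noreg.geno.900, K = 1:6, ploidy = 2, entropy = TRUE, alpha = 50, project = "new")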

## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] 2058102490
## [1] "*************************************"
## [1] "*          create.dataset            *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)                 14
##         -L (number of loci)                        5475
##         -s (seed random init)                      2058102490
##         -r (percentage of masked data)             0.05
##         -x (genotype file in .geno format)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -o (output file in .geno format)           /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## 
##  Write genotype file with masked data, /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## [1] "*************************************"
## [1] "* sNMF K = 1  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          1
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          50
##         -s (seed random init)                  2058102490
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
## 
## Least-square error: 10704.001133
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      1
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.251244
## Cross-Entropy (masked data):  0.631556
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 2  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          2
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          50
##         -s (seed random init)                  2058102490
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=======]
## Number of iterations: 20
## 
## Least-square error: 8947.097701
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      2
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.192978
## Cross-Entropy (masked data):  0.628353
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          3
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          50
##         -s (seed random init)                  2058102490
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [==========]
## Number of iterations: 26
## 
## Least-square error: 7828.695566
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      3
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.166795
## Cross-Entropy (masked data):  0.650764
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 4  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          4
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          50
##         -s (seed random init)                  7236837123484821210
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=========]
## Number of iterations: 24
## 
## Least-square error: 6837.193442
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      4
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.146896
## Cross-Entropy (masked data):  0.693897
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 5  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          5
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          50
##         -s (seed random init)                  8647192761586165466
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [==================]
## Number of iterations: 47
## 
## Least-square error: 6300.361057
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      5
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.128977
## Cross-Entropy (masked data):  0.732612
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 6  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          6
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          50
##         -s (seed random init)                  2058102490
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=======================]
## Number of iterations: 62
## 
## Least-square error: 5225.340499
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      6
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.112312
## Cross-Entropy (masked data):  0.791253
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")

# Plot the cross-entropy criterion against K for the alpha = 50 run.
plot(obj.snmf, col = 'black', cex = 1.5, pch = 19, main = "Alpha=50")
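
The plot summarizes the masked-data cross-entropy values printed in the log for \(\alpha=50\). As a minimal sketch, the same comparison can be done by hand in base R. The numbers below are copied from the log above. The object names `masked_ce`, `best_k`, and `delta_ce` are illustrative and are not part of the original analysis. With the live project, LEA's `cross.entropy()` accessor should return the same quantities. Because each K was run with a single repetition and a random mask, a rerun may give slightly different values.

```r
# Masked-data cross-entropy for K = 1..6, copied from the alpha = 50 log above.
# `masked_ce`, `best_k`, and `delta_ce` are illustrative names only.
masked_ce <- c(0.631556, 0.628353, 0.650764, 0.693897, 0.732612, 0.791253)
names(masked_ce) <- paste0("K", 1:6)

# The K with the lowest masked-data cross-entropy is the best supported.
best_k <- which.min(masked_ce)
print(names(best_k))

# Difference from the minimum, to show how close neighbouring K values are.
delta_ce <- round(masked_ce - min(masked_ce), 6)
print(delta_ce)
```

In this run K = 2 has the lowest masked-data cross-entropy (0.628353), only marginally below K = 1 (0.631556), and the values rise steadily from K = 3 onward.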

For \(\alpha=100\):

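# Alpha = 100: rerun sNMF for K = 1 to 6 with cross-entropy estimation.
# project = "new" discards the project saved by the alpha = 50 run.
obj.snmf = snmf(noreg.geno.900, K = 1:6, ploidy = 2, entropy = TRUE, alpha = 100, project = "new")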

## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] 1881905503
## [1] "*************************************"
## [1] "*          create.dataset            *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)                 14
##         -L (number of loci)                        5475
##         -s (seed random init)                      1881905503
##         -r (percentage of masked data)             0.05
##         -x (genotype file in .geno format)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -o (output file in .geno format)           /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## 
##  Write genotype file with masked data, /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## [1] "*************************************"
## [1] "* sNMF K = 1  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          1
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  7356074838603831647
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
## 
## Least-square error: 10685.286845
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      1
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.251802
## Cross-Entropy (masked data):  0.594754
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 2  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          2
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  4837294914291865951
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=========]
## Number of iterations: 23
## 
## Least-square error: 8981.101753
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      2
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.193601
## Cross-Entropy (masked data):  0.599374
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          3
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1881905503
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [==========]
## Number of iterations: 28
## 
## Least-square error: 7832.819973
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      3
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.167141
## Cross-Entropy (masked data):  0.621968
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 4  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          4
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1756845967604947295
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=======================]
## Number of iterations: 61
## 
## Least-square error: 7083.226215
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      4
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.145969
## Cross-Entropy (masked data):  0.633284
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 5  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          5
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1881905503
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [===================================================]
## Number of iterations: 137
## 
## Least-square error: 6159.090705
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      5
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.127436
## Cross-Entropy (masked data):  0.704299
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 6  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          6
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1881905503
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [============================]
## Number of iterations: 74
## 
## Least-square error: 5338.289703
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      6
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.112088
## Cross-Entropy (masked data):  0.743308
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")

# Plot the cross-entropy criterion against K for the alpha = 100 run
plot(obj.snmf,col='black',cex=1.5,pch=19,main="Alpha=100")
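
The plot is read by looking for the K with the lowest cross-entropy on the masked genotypes. As a minimal sketch, the snippet below tabulates the masked-data values printed in the \(\alpha=100\) log above and picks the minimum. The names `ce_masked` and `best_k` are illustrative and are not part of the original analysis. Because each K was fitted with a single repetition on 14 individuals, the result should be treated as indicative only.

```r
# Masked-data cross-entropy values copied from the alpha = 100 log above
# (one repetition per K). `ce_masked` and `best_k` are hypothetical names.
ce_masked <- c(K1 = 0.594754, K2 = 0.599374, K3 = 0.621968,
               K4 = 0.633284, K5 = 0.704299, K6 = 0.743308)

# The K with the lowest masked cross-entropy is the best-supported value
best_k <- which.min(ce_masked)
print(names(best_k))

# Differences from the minimum show how quickly support falls off with K
print(round(ce_masked - min(ce_masked), 6))

# With the fitted project still in memory, LEA should return the same values
# directly (not run here; assumes `obj.snmf` from the alpha = 100 call):
# sapply(1:6, function(k) min(cross.entropy(obj.snmf, K = k)))
```

The all-data cross-entropy in the same log decreases steadily as K increases, so it is less useful for choosing K. The comparison above uses the masked-data values only.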

For \(\alpha=500\):

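# Alpha = 500: refit K = 1 to 6 with stronger regularization; project="new" replaces the previous sNMF project
obj.snmf=snmf(noreg.geno.900,K=1:6,ploidy=2,entropy=TRUE,alpha=500,project="new")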

## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] 303906406
## [1] "*************************************"
## [1] "*          create.dataset            *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)                 14
##         -L (number of loci)                        5475
##         -s (seed random init)                      303906406
##         -r (percentage of masked data)             0.05
##         -x (genotype file in .geno format)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -o (output file in .geno format)           /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
## 
##  Write genotype file with masked data, /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## [1] "*************************************"
## [1] "* sNMF K = 1  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          1
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          500
##         -s (seed random init)                  303906406
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
## 
## Least-square error: 10697.143991
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      1
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K1/run1/Cinnyris-900-geno_noregius_r1.1.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.251403
## Cross-Entropy (masked data):  0.61752
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 2  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          2
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          500
##         -s (seed random init)                  303906406
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [======]
## Number of iterations: 16
## 
## Least-square error: 8977.209101
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      2
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K2/run1/Cinnyris-900-geno_noregius_r1.2.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.192563
## Cross-Entropy (masked data):  0.598772
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          3
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          500
##         -s (seed random init)                  303906406
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [========]
## Number of iterations: 22
## 
## Least-square error: 7851.097877
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      3
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K3/run1/Cinnyris-900-geno_noregius_r1.3.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.166293
## Cross-Entropy (masked data):  0.619389
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 4  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          4
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          500
##         -s (seed random init)                  303906406
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [======]
## Number of iterations: 17
## 
## Least-square error: 6919.801305
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      4
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K4/run1/Cinnyris-900-geno_noregius_r1.4.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.146621
## Cross-Entropy (masked data):  0.6787
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 5  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          5
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          500
##         -s (seed random init)                  303906406
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [==========]
## Number of iterations: 26
## 
## Least-square error: 5983.418952
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      5
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K5/run1/Cinnyris-900-geno_noregius_r1.5.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.12837
## Cross-Entropy (masked data):  0.719034
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 6  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             14
##         -L (number of loci)                    5475
##         -K (number of ancestral pops)          6
##         -x (input file)                        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         -q (individual admixture file)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
##         -g (ancestral frequencies file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          500
##         -s (seed random init)                  303906406
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=====================]
## Number of iterations: 56
## 
## Least-square error: 5269.647621
## Write individual ancestry coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q:       OK.
## Write ancestral allele frequency coefficient file /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G:    OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         14
##         -L (number of loci)                5475
##         -K (number of ancestral pops)      6
##         -x (genotype file)                 /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.geno
##         -q (individual admixture)          /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.Q
##         -g (ancestral frequencies)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/K6/run1/Cinnyris-900-geno_noregius_r1.6.G
##         -i (with masked genotypes)         /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.snmf/masked/Cinnyris-900-geno_noregius_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.105715
## Cross-Entropy (masked data):  0.794926
## The project is saved into :
##  Sequences/Cinnyris-900-geno_noregius.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("Sequences/Cinnyris-900-geno_noregius.snmfProject")

plot(obj.snmf,col='black',cex=1.5,pch=19,main="Alpha=500")

Two groups is the best-supported scenario at almost all \(\alpha\) levels. A single group is slightly better supported than three groups as the next alternative, suggesting that complete separation of the groups is not yet possible.
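As a minimal sketch of how the values of K are compared, the cross-entropies printed in the log above (K = 3 to 6, repetition 1, \(\alpha\) = 500) can be ranked directly; lower masked-data values indicate better predictive fit. The names `ce_masked`, `ce_all`, and `best_k` are introduced here for illustration only. The values for K = 1 and K = 2 are not part of the output shown above, so this covers only the upper end of the comparison. Within a loaded LEA project, the same values can be retrieved with `cross.entropy()`.

```r
# Cross-entropy values copied from the sNMF log above (repetition 1, alpha = 500).
ce_masked <- c("3" = 0.619389, "4" = 0.6787, "5" = 0.719034, "6" = 0.794926)
ce_all    <- c("3" = 0.166293, "4" = 0.146621, "5" = 0.12837, "6" = 0.105715)

# The masked-data criterion penalises over-fitting: choose the K that minimises it.
best_k <- as.integer(names(which.min(ce_masked)))

# The all-data value keeps decreasing as K grows, so it cannot be used to choose K.
print(best_k)
print(diff(ce_all))
```

Among the values shown, the masked cross-entropy rises steadily with K, which is consistent with the lower values of K being preferred.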

PCA Analyses

We can also perform principal component analyses (PCA) on the data to view the inherent variation. We will start with the full dataset.

pca.all=pca(lfmm.900)

## [1] "******************************"
## [1] " Principal Component Analysis "
## [1] "******************************"
## summary of the options:
## 
##         -n (number of individuals)          24
##         -L (number of loci)                 6740
##         -K (number of principal components) 24
##         -x (genotype file)                  /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.lfmm
##         -a (eigenvalue file)                /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.pca/Cinnyris-900-geno.eigenvalues
##         -e (eigenvector file)               /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.pca/Cinnyris-900-geno.eigenvectors
##         -d (standard deviation file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.pca/Cinnyris-900-geno.sdev
##         -p (projection file)                /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno.pca/Cinnyris-900-geno.projections
##         -c data centered

summary(pca.all)

## Importance of components:

##                              PC1        PC2        PC3        PC4        PC5
## Standard deviation     9.1036900 6.95271000 5.17214000 4.71181000 4.61898000
## Proportion of Variance 0.1700916 0.09920992 0.05490212 0.04556417 0.04378659
## Cumulative Proportion  0.1700916 0.26930154 0.32420366 0.36976783 0.41355441
##                               PC6        PC7        PC8        PC9       PC10
## Standard deviation     4.57264000 4.54291000 4.37712000 4.25993000 4.19482000
## Proportion of Variance 0.04291238 0.04235611 0.03932105 0.03724383 0.03611393
## Cumulative Proportion  0.45646679 0.49882290 0.53814395 0.57538777 0.61150170
##                              PC11       PC12       PC13       PC14       PC15
## Standard deviation     4.14640000 4.09217000 4.01732000 3.99482000 3.94513000
## Proportion of Variance 0.03528513 0.03436808 0.03312239 0.03275246 0.03194273
## Cumulative Proportion  0.64678683 0.68115490 0.71427730 0.74702976 0.77897248
##                              PC16       PC17       PC18       PC19       PC20
## Standard deviation     3.80242000 3.78511000 3.75782000 3.68345000 3.65494000
## Proportion of Variance 0.02967344 0.02940399 0.02898138 0.02784567 0.02741622
## Cumulative Proportion  0.80864593 0.83804991 0.86703129 0.89487696 0.92229317
##                              PC21       PC22       PC23         PC24
## Standard deviation     3.59893000 3.56889000 3.48904000 1.816330e-06
## Proportion of Variance 0.02658245 0.02614052 0.02498385 6.770767e-15
## Cumulative Proportion  0.94887563 0.97501615 1.00000000 1.000000e+00

plot(pca.all)

pca.all.proj=as.data.frame(pca.all$projections)

Next (hidden here), we assign populations to the PCA data.

samples=c("FMNH346623_Cinnyris_regius",
          "FMNH346624_Cinnyris_regius",
          "FMNH356179_Cinnyris_regius",
          "FMNH356181_Cinnyris_regius",
          "FMNH385275_Cinnyris_regius",
          "FMNH385276_Cinnyris_regius",
          "FMNH450580_Cinnyris_regius",
          "FMNH450581_Cinnyris_regius",
          "FMNH481235_Cinnyris_regius",
          "FMNH438857_Cinnyris_regius",
          "FMNH358156_Cinnyris_reichenowi",
          "FMNH358157_Cinnyris_reichenowi",
          "FMNH443947_Cinnyris_reichenowi",
          "FMNH481236_Cinnyris_reichenowi",
          "FMNH122395_Cinnyris_genderuensis",
          "FMNH189462_Cinnyris_genderuensis",
          "FMNH273746_Cinnyris_reichenowi",
          "FMNH95912_Cinnyris_reichenowi",
          "FMNH95913_Cinnyris_reichenowi",
          "FMNH95915_Cinnyris_reichenowi",
          "FMNH95916_Cinnyris_reichenowi",
          "KU131883_Cinnyris_reichenowi",
          "KU132209_Cinnyris_reichenowi",
          "KU132234_Cinnyris_reichenowi")

pops=c("regius","regius","regius","regius",
       "regius","regius","regius","regius",
       "regius","regius","reichenowi","reichenowi",
       "reichenowi","reichenowi","genderuensis","genderuensis",
       "preussi","preussi","preussi","preussi",
       "preussi","preussi","preussi","preussi")

# *Cinnyris r. reichenowi*: black `#000000`
# *Cinnyris reichenowi preussi*: blue `#1f2887`
# *Cinnyris reichenowi genderuensis*: red `#e31a1c`
# *Cinnyris reichenowi parvirostris*: light blue `#1f9eff`

kleurs=c("#ffd700","#ffd700","#ffd700","#ffd700",
       "#ffd700","#ffd700","#ffd700","#ffd700",
       "#ffd700","#ffd700","#000000","#000000",
       "#000000","#000000","#e31a1c","#e31a1c",
       "#1f2887","#1f2887","#1f2887","#1f2887",
       "#1f2887","#1f2887","#1f2887","#1f2887")

pca.all2=cbind(samples,pops,kleurs,pca.all.proj)

# *Cinnyris r. reichenowi*: black `#000000`
# *Cinnyris reichenowi preussi*: blue `#1f2887`
# *Cinnyris reichenowi genderuensis*: red `#e31a1c`
# *Cinnyris reichenowi parvirostris*: light blue `#1f9eff`

colorset=c("#000000","#1f2887","#e31a1c","#ffd700")
names(colorset)=c("reichenowi","preussi","genderuensis","regius")
colScale=scale_color_manual(name="grp",values=colorset)

a=ggplot(data=pca.all2,aes(x=V1,y=V2,colour=pops))

b=geom_point(size=6)
c=theme(panel.background = element_rect(fill="white",color = "grey50"),
        axis.title.x = element_text(size=20),
        axis.title.y = element_text(size=20),
        axis.text.x = element_text(size=15),
        axis.text.y = element_text(size=15),
        legend.title = element_blank(),
        legend.text = element_text(size=15))
d=stat_ellipse()
e=colScale

plotx=a+b+c+d+e
print(plotx)

## Too few points to calculate an ellipse

## Warning: Removed 1 row(s) containing missing values (geom_path).

Cinnyris regius appears to be a pretty cohesive group that is greatly skewing the directionality and magnitude of the PCAs.

The two-population scenario almost appears messier, with genderuensis birds falling “halfway” between the two populations.

pca.all=pca(noreg.lfmm.900)

## [1] "******************************"
## [1] " Principal Component Analysis "
## [1] "******************************"
## summary of the options:
## 
##         -n (number of individuals)          14
##         -L (number of loci)                 5475
##         -K (number of principal components) 14
##         -x (genotype file)                  /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.lfmm
##         -a (eigenvalue file)                /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.eigenvalues
##         -e (eigenvector file)               /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.eigenvectors
##         -d (standard deviation file)        /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.sdev
##         -p (projection file)                /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.projections
##         -c data centered

summary(pca.all)

## Importance of components:

##                              PC1       PC2        PC3        PC4        PC5
## Standard deviation     9.1052800 6.7773100 6.15467000 5.94936000 5.72978000
## Proportion of Variance 0.1930024 0.1069275 0.08818293 0.08239763 0.07642776
## Cumulative Proportion  0.1930024 0.2999299 0.38811283 0.47051046 0.54693822
##                               PC6        PC7        PC8        PC9       PC10
## Standard deviation     5.45789000 5.23291000 4.97916000 4.96471000 4.79219000
## Proportion of Variance 0.06934645 0.06374739 0.05771483 0.05738027 0.05346165
## Cumulative Proportion  0.61628468 0.68003207 0.73774689 0.79512716 0.84858881
##                              PC11       PC12       PC13 PC14
## Standard deviation     4.71584000 4.67519000 4.57645000    0
## Proportion of Variance 0.05177172 0.05088311 0.04875636    0
## Cumulative Proportion  0.90036053 0.95124364 1.00000000    1
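The variance rows in the summary above follow directly from the standard deviations. As a small worked check, using only base R and the values copied from the table (the names `sdev`, `prop_var`, and `cum_var` are illustrative), each proportion is the squared standard deviation divided by the sum of all squared standard deviations:

```r
# Standard deviations of PC1-PC13 copied from the summary above
# (PC14 is zero and is omitted).
sdev <- c(9.10528, 6.77731, 6.15467, 5.94936, 5.72978, 5.45789, 5.23291,
          4.97916, 4.96471, 4.79219, 4.71584, 4.67519, 4.57645)

# The variance of each component is its squared standard deviation.
prop_var <- sdev^2 / sum(sdev^2)
cum_var  <- cumsum(prop_var)

print(round(prop_var[1:3], 4))
print(round(cum_var[2], 4))
```

The first two components together account for roughly 30% of the variance, so the two-dimensional plots below summarise only part of the genetic variation.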

plot(pca.all)

pca.all.proj=as.data.frame(pca.all$projections)

Next (hidden here), we assign populations to the PCA data.

samples=c("FMNH358156_Cinnyris_reichenowi",
          "FMNH358157_Cinnyris_reichenowi",
          "FMNH443947_Cinnyris_reichenowi",
          "FMNH481236_Cinnyris_reichenowi",
          "FMNH122395_Cinnyris_genderuensis",
          "FMNH189462_Cinnyris_genderuensis",
          "FMNH273746_Cinnyris_reichenowi",
          "FMNH95912_Cinnyris_reichenowi",
          "FMNH95913_Cinnyris_reichenowi",
          "FMNH95915_Cinnyris_reichenowi",
          "FMNH95916_Cinnyris_reichenowi",
          "KU131883_Cinnyris_reichenowi",
          "KU132209_Cinnyris_reichenowi",
          "KU132234_Cinnyris_reichenowi")

pops=c("reichenowi","reichenowi",
       "reichenowi","reichenowi","genderuensis","genderuensis",
       "preussi","preussi","preussi","preussi",
       "preussi","preussi","preussi","preussi")

pca.all2=cbind(samples,pops,pca.all.proj)

# *Cinnyris r. reichenowi*: black `#000000`
# *Cinnyris reichenowi preussi*: blue `#1f2887`
# *Cinnyris reichenowi genderuensis*: red `#e31a1c`
# *Cinnyris reichenowi parvirostris*: light blue `#1f9eff`

colorset=c("#000000","#1f2887","#e31a1c")
names(colorset)=c("reichenowi","preussi","genderuensis")
colScale=scale_color_manual(name="grp",values=colorset)

a=ggplot(data=pca.all2,aes(x=V1,y=V2,colour=pops))

b=geom_point(size=6)
c=theme(panel.background = element_rect(fill="white",color = "grey50"),
        axis.title.x = element_text(size=20),
        axis.title.y = element_text(size=20),
        axis.text.x = element_text(size=15),
        axis.text.y = element_text(size=15),
        legend.title = element_blank(),
        legend.text = element_text(size=15))
d=stat_ellipse()
e=colScale
f1=ylab("PC 2")
f2=xlab("PC 1")

plotx=a+b+c+d+e+f1+f2
print(plotx)

## Too few points to calculate an ellipse

## Warning: Removed 1 row(s) containing missing values (geom_path).

a=ggplot(data=pca.all2,aes(x=V3,y=V2,colour=pops))

b=geom_point(size=6)
c=theme(panel.background = element_rect(fill="white",color = "grey50"),
        axis.title.x = element_text(size=20),
        axis.title.y = element_text(size=20),
        axis.text.x = element_text(size=15),
        axis.text.y = element_text(size=15),
        legend.title = element_blank(),
        legend.text = element_text(size=15))
d=stat_ellipse()
e=colScale
f1=ylab("PC 2")
f2=xlab("PC 3")

plotx=a+b+c+d+e+f1+f2
print(plotx)

## Too few points to calculate an ellipse

## Warning: Removed 1 row(s) containing missing values (geom_path).

We can also perform a Tracy-Widom test on these data.

tracy.widom(pca.all)

## [1] "*******************"
## [1] " Tracy-Widom tests "
## [1] "*******************"
## summary of the options:
## 
##         -n (number of eigenvalues)          14
##         -i (input file)                     /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.eigenvalues
##         -o (output file)                    /home/kupeornis/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Sequences/Cinnyris-900-geno_noregius.pca/Cinnyris-900-geno_noregius.tracywidom

##     N eigenvalues  twstats  pvalues       effectn percentage
## 1   1      1161.0  3.15300 0.001273  6.350968e+01    0.19300
## 2   2       643.0  1.07300 0.043920  2.127804e+02    0.10690
## 3   3       530.3 -0.44120 0.262000  3.125249e+02    0.08818
## 4   4       495.5 -0.05650 0.178700  3.772761e+02    0.08240
## 5   5       459.6  0.37390 0.109800  4.958396e+02    0.07643
## 6   6       417.0  0.28940 0.121400  7.485579e+02    0.06935
## 7   7       383.4  0.22090 0.131500  1.197853e+03    0.06375
## 8   8       347.1 -1.55600 0.591300  2.100662e+03    0.05771
## 9   9       345.1 -0.05971 0.179300  2.298622e+03    0.05738
## 10 10       321.5 -0.97400 0.408400  5.468815e+03    0.05346
## 11 11       311.3 -1.38900 0.538300  7.957344e+03    0.05177
## 12 12       306.0 -0.92310 0.393100  8.779880e+03    0.05088
## 13 13       293.2      NaN 1.000000 -1.772436e+15    0.04876

As suspected, the first two principal components are the most significant for looking at the data distribution.
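The selection of significant components can be sketched in base R from the p-values in the Tracy-Widom table above. The 0.05 threshold is an assumption made for this illustration (no threshold is stated in the text), and the names `tw_p`, `alpha_level`, and `significant_pcs` are hypothetical.

```r
# p-values for eigenvalues 1-13, copied from the Tracy-Widom table above.
tw_p <- c(0.001273, 0.043920, 0.262000, 0.178700, 0.109800, 0.121400,
          0.131500, 0.591300, 0.179300, 0.408400, 0.538300, 0.393100, 1.000000)

alpha_level <- 0.05  # assumed significance threshold, not stated in the text

# Indices of the principal components whose eigenvalues pass the test.
significant_pcs <- which(tw_p < alpha_level)
print(significant_pcs)
```

Under this threshold only PC1 and PC2 are retained, which matches the two components carried forward into the analyses that follow.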

There are definitely two populations (east-west). Three populations is the second most-supported outcome, although its support is occasionally weaker than, or roughly equal to, that of the single-population scenario.

Discriminant Function Analysis of Groups

We perform a discriminant function analysis (DFA) of the groups to determine how well we can statistically identify them using genetic data. The test is run on the PCA outputs from the aforementioned analyses in LEA. We use only the statistically significant PCs, rather than all of the PCs, to avoid overfitting the model.

#Perform LDA of genetic data
##Perform on PCA values

lda.x=lda(pops~V1+V2,data=pca.all2,CV=T)

print(lda.x)

## $class
##  [1] reichenowi   reichenowi   reichenowi   reichenowi   preussi     
##  [6] genderuensis preussi      preussi      preussi      preussi     
## [11] preussi      preussi      preussi      preussi     
## Levels: genderuensis preussi reichenowi
## 
## $posterior
##    genderuensis       preussi    reichenowi
## 1  2.313387e-37  1.273404e-54  1.000000e+00
## 2  3.184895e-40  6.877730e-57  1.000000e+00
## 3  3.206855e-73 1.431829e-103  1.000000e+00
## 4  1.825269e-35  1.051479e-51  1.000000e+00
## 5  7.230216e-23  1.000000e+00  7.795312e-69
## 6  1.000000e+00  8.005585e-18 2.916653e-147
## 7  6.791039e-04  9.993209e-01  1.685666e-58
## 8  7.386062e-08  9.999999e-01  3.346142e-54
## 9  1.320292e-07  9.999999e-01  7.824219e-54
## 10 8.317248e-08  9.999999e-01  2.875153e-52
## 11 9.330919e-08  9.999999e-01  1.272547e-55
## 12 7.086640e-07  9.999993e-01  5.379440e-52
## 13 1.683922e-07  9.999998e-01  6.623579e-53
## 14 7.761592e-13  1.000000e+00  8.585138e-69
## 
## $terms
## pops ~ V1 + V2
## attr(,"variables")
## list(pops, V1, V2)
## attr(,"factors")
##      V1 V2
## pops  0  0
## V1    1  0
## V2    0  1
## attr(,"term.labels")
## [1] "V1" "V2"
## attr(,"order")
## [1] 1 1
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,".Environment")
## <environment: R_GlobalEnv>
## attr(,"predvars")
## list(pops, V1, V2)
## attr(,"dataClasses")
##      pops        V1        V2 
##  "factor" "numeric" "numeric" 
## 
## $call
## lda(formula = pops ~ V1 + V2, data = pca.all2, CV = T)
## 
## $xlevels
## named list()

#Check predictions

ct=table(pca.all2$pops,lda.x$class)

diag(prop.table(ct,1))

## genderuensis      preussi   reichenowi 
##          0.5          1.0          1.0

sum(diag(prop.table(ct)))

## [1] 0.9285714
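The two accuracy figures above can be recomputed by hand from the leave-one-out predictions printed in `$class`. The sketch below uses base R only; the names `actual`, `predicted`, and `ct_demo` are illustrative and the vectors are typed in from the output above rather than taken from the fitted object.

```r
# Known groups and leave-one-out predictions, in the sample order used above.
actual <- factor(c(rep("reichenowi", 4), rep("genderuensis", 2), rep("preussi", 8)))
predicted <- factor(c(rep("reichenowi", 4), "preussi", "genderuensis", rep("preussi", 8)),
                    levels = levels(actual))

# Rows are the known groups, columns are the predicted groups.
ct_demo <- table(actual, predicted)
per_group <- diag(prop.table(ct_demo, 1))    # proportion correct within each group
overall   <- sum(diag(prop.table(ct_demo)))  # proportion correct across all samples

print(ct_demo)
print(per_group)
print(overall)
```

One of the two genderuensis samples is assigned to preussi, which gives 13 correct assignments out of 14 overall.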

#Let's try merging two of the subspecies (genderuensis into preussi)

z2=pca.all2

z2$pops[which(z2$pops=="genderuensis")]="preussi"

lda.x2=lda(pops~V1+V2,data=z2,CV=T)

## Warning in lda.default(x, grouping, ...): group genderuensis is empty

#print(lda.x2)
summary(lda.x2)

##           Length Class  Mode   
## class     14     factor numeric
## posterior 28     -none- numeric
## terms      3     terms  call   
## call       4     -none- call   
## xlevels    0     -none- list

#Check predictions

ct=table(z2$pops,lda.x2$class)
print(ct)

##               
##                genderuensis preussi reichenowi
##   genderuensis            0       0          0
##   preussi                 0      10          0
##   reichenowi              0       0          4

diag(prop.table(ct,1))

## genderuensis      preussi   reichenowi 
##          NaN            1            1

sum(diag(prop.table(ct)))

## [1] 1

We are unable to separate genderuensis from preussi, but we are 100% able to separate east from west.
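The "group genderuensis is empty" warning and the `NaN` row in the table above arise because a factor keeps its levels after relabelling, even when no observations remain in one of them. A minimal base R sketch, using a hypothetical factor `pops_demo` with the same group sizes as our data, shows how `droplevels()` removes the unused level:

```r
# Hypothetical factor with the same group sizes as the 14 samples above.
pops_demo <- factor(c(rep("reichenowi", 4), rep("genderuensis", 2), rep("preussi", 8)))

# Relabel genderuensis as preussi; the old level remains but is now empty.
pops_demo[pops_demo == "genderuensis"] <- "preussi"
print(table(pops_demo))

# Dropping unused levels leaves only the two groups that still have members.
pops_merged <- droplevels(pops_demo)
print(table(pops_merged))
```

Dropping the empty level does not change the classification itself; it only removes the empty row and column from the resulting tables.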

Morphological Analyses

Note: the database has been edited to exclude some measurements in which there were errors. These errors were irreversibly biasing the data with respect to bill curvature. Bill curvature indices have been removed from the dataset used here, given their unreliability and the difficulty of obtaining them with handheld calipers. Full notes on the reduction of the data, removal of juvenile birds, etc. can be seen in the rmarkdown file. The data cleaning also involves a PCA of the data using the rda function of \(vegan\), similar to the PCAs performed on the SNP data.

##       Genus           Species           Subspecies    Collection 
##  Cinnyris:572   regius    : 22   genderuensis: 23   NHMUK  :153  
##                 reichenowi:550   parvirostris: 43   AMNH   :120  
##                                  preussi     :224   ZFMK   : 70  
##                                  regius      : 22   FMNH   : 67  
##                                  reichenowi  :257   MNMH   : 52  
##                                  Unknown     :  3   CM     : 39  
##                                                     (Other): 71  
##          Catalog              Locality               Locality2  
##  1966.16.2433:  2   Mt. Cameroon  : 97   Mt Cameroon      : 97  
##  1966.16.2438:  2   Bioko         : 38   Bamenda Highlands: 47  
##  1966.16.2453:  2   Rwenzori Mts. : 30   Bioko            : 43  
##  1966.16.2461:  2   Mt. Manengouba: 28   Rwenzori Mts     : 43  
##  1966.16.2469:  2   Mt. Oku       : 28   Mt Manengouba    : 31  
##  209805      :  2   Tshibati      : 28   Mt Oku           : 28  
##  (Other)     :560   (Other)       :323   (Other)          :283  
##               Country         Sex            Age      Right.wing.chord
##  Cameroon         :246          :  3           : 48   Min.   :45.00   
##  DRC              : 96   Femae  :  1   Adult   :499   1st Qu.:53.00   
##  Kenya            : 69   Female :168   Immature: 10   Median :55.00   
##  Uganda           : 54   Male   :399   Juvenile: 14   Mean   :54.85   
##  Equatorial Guinea: 43   Unknown:  1   Unknown :  1   3rd Qu.:57.00   
##  Burundi          : 19                                Max.   :63.00   
##  (Other)          : 45                                NA's   :6       
##   Tail.length    X1st.Prim.1st.Secon Culmen.length  
##  Min.   : 8.18   Min.   : 2.520      Min.   :11.95  
##  1st Qu.:36.00   1st Qu.: 5.888      1st Qu.:14.10  
##  Median :40.00   Median : 6.830      Median :15.29  
##  Mean   :39.44   Mean   : 6.874      Mean   :15.59  
##  3rd Qu.:43.00   3rd Qu.: 7.817      3rd Qu.:17.07  
##  Max.   :54.00   Max.   :10.770      Max.   :22.13  
##  NA's   :8       NA's   :80          NA's   :33     
##  Bill.depth..base.of.feathers.on.mandible.
##  Min.   :1.870                            
##  1st Qu.:2.685                            
##  Median :2.870                            
##  Mean   :2.842                            
##  3rd Qu.:3.020                            
##  Max.   :3.550                            
##  NA's   :29                               
##  Bill.width..base.of.feathers.on.maxilla.  Left.Tarsus     Kipp.s.Index   
##  Min.   :2.300                            Min.   : 9.08   Min.   :0.0450  
##  1st Qu.:4.223                            1st Qu.:11.97   1st Qu.:0.1096  
##  Median :4.490                            Median :12.94   Median :0.1260  
##  Mean   :4.475                            Mean   :12.99   Mean   :0.1243  
##  3rd Qu.:4.770                            3rd Qu.:13.85   3rd Qu.:0.1387  
##  Max.   :5.730                            Max.   :18.00   Max.   :0.1814  
##  NA's   :18                               NA's   :17      NA's   :80      
##                    Notes    
##                       :517  
##  Left leg             : 10  
##  Measurements from tag:  5  
##  Bill damaged         :  3  
##  Right leg            :  3  
##  Left tarsus          :  2  
##  (Other)              : 32

#Exclude juvenile birds from the analyses

summary(x$Age)

##             Adult Immature Juvenile  Unknown 
##       48      499       10       14        1

x=x[x$Age=="Adult",]
summary(x$Age)

##             Adult Immature Juvenile  Unknown 
##        0      499        0        0        0

Now, we have a data frame that is only adult individuals. We will be analyzing this as a whole and split up by sex; there appear to be minor differences between sexes, so this is necessary to determine if populations differ in size.

#Fixing a spelling error
x[x$Sex=="Femae","Sex"]="Female"
summary(x$Sex)

##           Femae  Female    Male Unknown 
##       0       0     148     351       0

We will start out by looking at Cinnyris reichenowi. I have already identified specimens to meta-population by locality, assuming that birds in the xeric regions of Cameroon are C. r. genderuensis just like the individuals we sampled. Some birds, mostly those at the eastern edge of the Bamenda Highlands, have been left as ‘unknown’.

colnames(x)

##  [1] "Genus"                                    
##  [2] "Species"                                  
##  [3] "Subspecies"                               
##  [4] "Collection"                               
##  [5] "Catalog"                                  
##  [6] "Locality"                                 
##  [7] "Locality2"                                
##  [8] "Country"                                  
##  [9] "Sex"                                      
## [10] "Age"                                      
## [11] "Right.wing.chord"                         
## [12] "Tail.length"                              
## [13] "X1st.Prim.1st.Secon"                      
## [14] "Culmen.length"                            
## [15] "Bill.depth..base.of.feathers.on.mandible."
## [16] "Bill.width..base.of.feathers.on.maxilla." 
## [17] "Left.Tarsus"                              
## [18] "Kipp.s.Index"                             
## [19] "Notes"

Several of these records repeat measurements taken from specimen tags by past authorities. We can isolate and remove them here:

tag.measurements=x[x$Notes=="Measurements from tag",]
#Drop the tag-derived rows with a logical filter
x=x[x$Notes!="Measurements from tag",]
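
The split between tag-derived and newly measured records can be sketched on a toy data frame. The data frame `toy` and its values are hypothetical; only the `Notes` column name mirrors the data above. Rows are selected with a logical mask, because unary minus on a logical vector yields 0/-1 indices rather than a row filter.

```r
# Hypothetical toy data; only the column name `Notes` mirrors the real table
toy <- data.frame(
  Catalog = c("A1", "A2", "A3", "A4"),
  Notes   = c("", "Measurements from tag", "Left leg", "Measurements from tag"),
  stringsAsFactors = FALSE
)

# Logical mask marking rows whose measurements were copied from a tag
from_tag <- toy$Notes == "Measurements from tag"

tag_rows   <- toy[from_tag, ]   # rows copied from tags
fresh_rows <- toy[!from_tag, ]  # everything else

# By contrast, -(from_tag) is c(0, -1, 0, -1), which drops only row 1
minus_rows <- toy[-(from_tag), ]

nrow(tag_rows)
nrow(fresh_rows)
nrow(minus_rows)
```

Comparing the row counts before and after such a filter is a quick way to confirm that the intended number of records was removed.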

Now I can remove columns that will not be needed for further downstream analyses. Note that I am excluding Kipp’s Index here as it is a covariate of wing length and primary projection; I’m also removing primary projection here as I did not take it for all individuals at each museum.

x2=x[,c("Species","Subspecies","Collection","Catalog","Locality2",
        "Sex","Right.wing.chord","Tail.length",
        "Culmen.length","Bill.depth..base.of.feathers.on.mandible.",
        "Bill.width..base.of.feathers.on.maxilla.","Left.Tarsus")]

I can now subset the data frame into each superspecies. First, I need to remove any row containing NA values so that every retained individual has the full set of measurements.

#colnames(x2)
x1.1=x2[rowSums(is.na(x2))<1,]
x1.1=unique(x1.1) #Just in case there are repeats
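
As a small illustration of the filtering step above, base R's `complete.cases()` selects the same rows as `rowSums(is.na(...))<1`. The data frame `toy` is invented for this sketch and is not part of the specimen data.

```r
# Hypothetical toy data with one missing value and one duplicated row
toy <- data.frame(
  Catalog     = c("A1", "A2", "A3", "A3"),
  Tail.length = c(40, NA, 37, 37),
  Left.Tarsus = c(12.1, 11.8, 12.5, 12.5),
  stringsAsFactors = FALSE
)

keep_a <- rowSums(is.na(toy)) < 1  # approach used in this document
keep_b <- complete.cases(toy)      # base R equivalent

all(keep_a == keep_b)

# Keep complete rows, then drop exact duplicates
complete <- unique(toy[keep_b, ])
nrow(complete)
```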

This procedure removed individuals from the dataset.

regius=x1.1[x1.1$Species=="regius",]
reich=x1.1[x1.1$Species=="reichenowi",]

There are 20 individuals of C. regius and 383 individuals of C. reichenowi.

First, a look at C. reichenowi between different areas.

#Perform PCA using VEGAN
rda.x=rda(reich[,7:12],scale=TRUE)

x3=cbind(reich,rda.x$CA$u)

eigs=rda.x$CA$eig

#Calculate eigenvalue contribution

w=NULL
for(i in 1:length(eigs)){
  print(eigs[i]/sum(eigs))
  w[i]=eigs[i]/sum(eigs)
}

##       PC1 
## 0.5390102 
##       PC2 
## 0.1662653 
##       PC3 
## 0.1355852 
##        PC4 
## 0.07707038 
##        PC5 
## 0.04499189 
##        PC6 
## 0.03707703
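
Each proportion above is an eigenvalue divided by the sum of all eigenvalues. A minimal vectorized sketch follows, using base R `prcomp()` on simulated data as a stand-in for the `rda()` call; the matrix `m` and its column names are assumptions for illustration only.

```r
set.seed(1)
# Simulated measurements (hypothetical): 50 individuals by 4 traits
m <- matrix(rnorm(200), ncol = 4,
            dimnames = list(NULL, c("wing", "tail", "culmen", "tarsus")))

pca  <- prcomp(m, scale. = TRUE)  # PCA on standardized variables
eigs <- pca$sdev^2                # eigenvalues of the correlation matrix

# Proportion of variance per axis, computed without a loop
w <- eigs / sum(eigs)
round(w, 3)

# Cumulative proportion, useful for deciding how many axes to interpret
cumsum(w)
```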

#View relative eigenvector contributions
#summary(rda.x)
plot(y=(eigs/sum(eigs)),x=1:length(eigs),pch=19,ylab="Contribution",xlab="PCA Variable")

PC1 accounts for roughly 54% of the variation within the data, while PC2 and PC3 account for about 17% and 14%, respectively.

#Relative strength of each variable to the PC

g=rda.x$CA$v
for(i in 1:3){
  y=sum(abs(g[,i]))
  for(j in 1:nrow(g)){
    print(paste0("For ",row.names(g)[j],": PC",i,": ",signif((g[j,i]/y),3)))
  }
}

## [1] "For Right.wing.chord: PC1: -0.195"
## [1] "For Tail.length: PC1: -0.155"
## [1] "For Culmen.length: PC1: -0.198"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC1: -0.118"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC1: -0.184"
## [1] "For Left.Tarsus: PC1: -0.15"
## [1] "For Right.wing.chord: PC2: 0.0844"
## [1] "For Tail.length: PC2: 0.21"
## [1] "For Culmen.length: PC2: -0.116"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC2: 0.243"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC2: -0.0806"
## [1] "For Left.Tarsus: PC2: -0.266"
## [1] "For Right.wing.chord: PC3: -0.161"
## [1] "For Tail.length: PC3: -0.259"
## [1] "For Culmen.length: PC3: 0.0719"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC3: 0.313"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC3: 0.149"
## [1] "For Left.Tarsus: PC3: -0.0466"
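
The values printed above are each loading divided by the sum of the absolute loadings on that axis, so the absolute values for a given PC sum to 1. The same scaling can be sketched without nested loops using `sweep()`; the loading matrix `g` below is a small made-up example, not the output of `rda()`.

```r
# Hypothetical loading matrix: 3 variables (rows) by 2 axes (columns)
g <- matrix(c(-0.6, -0.5, -0.4,
               0.2,  0.7, -0.1),
            ncol = 2,
            dimnames = list(c("wing", "tail", "tarsus"), c("PC1", "PC2")))

# Divide each column by the sum of its absolute values
rel <- sweep(g, 2, colSums(abs(g)), "/")

signif(rel, 3)
colSums(abs(rel))  # each column sums to 1 in absolute value
```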

biplot(rda.x)

A biplot of PC1 and PC2 with the contribution of each variable plotted. No distinct clusters are immediately visible in this plot, so the groups probably overlap in morphospace.

# *Cinnyris reichenowi reichenowi*: black `#000000`
# *Cinnyris reichenowi preussi*: blue `#1f2887`
# *Cinnyris reichenowi genderuensis*: red `#e31a1c`
# *Cinnyris reichenowi parvirostris*: light blue `#1f9eff`
# Unknown: light pink `#ffb6c1`

colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
names(colorset)=c("reichenowi","preussi","genderuensis","parvirostris","Unknown")
colScale=scale_color_manual(name="grp",values=colorset)

#Plot reichenowi only, flawed loadings
a=ggplot(x3[which(x3$Species=="reichenowi"),],aes(x=PC1,y=PC2,
                                                  colour=Subspecies))

b=geom_point(size=1.5)
c=theme(panel.background = element_rect(fill="white",color = "grey50"),
        axis.title.x = element_text(size=20),
        axis.title.y = element_text(size=20),
        axis.text.x = element_text(size=15),
        axis.text.y = element_text(size=15),
        legend.title = element_blank(),
        legend.text = element_text(size=15))
d=stat_ellipse()
e=colScale
f1=ylab("PC 2")
f2=xlab("PC 1")

plotx=a+b+c+d+e+f1+f2
print(plotx)

## Too few points to calculate an ellipse

## Warning: Removed 1 row(s) containing missing values (geom_path).

Let’s find the individual with the most extreme (lowest) PC2 score.

x4=x3[order(x3$PC2),]

x4$PC2[1]

## [1] -0.2455511

x4[1,]

##        Species Subspecies Collection      Catalog   Locality2    Sex
## 180 reichenowi    preussi      NHMUK 1966.16.2424 Rumpi Hills Female
##     Right.wing.chord Tail.length Culmen.length
## 180               54        8.18         17.17
##     Bill.depth..base.of.feathers.on.mandible.
## 180                                      2.75
##     Bill.width..base.of.feathers.on.maxilla. Left.Tarsus        PC1        PC2
## 180                                     5.13        14.7 0.04378649 -0.2455511
##           PC3        PC4       PC5        PC6
## 180 0.2287369 0.03257101 0.1939922 -0.2832348

There is one outlying female from Rumpi Hills, with a recorded tail length (8.18) far below that of the other specimens listed below. We will leave it in for the time being.
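
The same lookup can be done with `which.min()` instead of sorting the whole table. This is a sketch on invented scores; the data frame `scores` and its values are hypothetical.

```r
# Hypothetical PCA scores for four specimens
scores <- data.frame(
  Catalog = c("A1", "A2", "A3", "A4"),
  PC1 = c(0.02, 0.04, -0.01, 0.03),
  PC2 = c(0.01, -0.25, 0.03, -0.02),
  stringsAsFactors = FALSE
)

# Row with the smallest PC2 score
extreme <- scores[which.min(scores$PC2), ]

# Equivalent to taking the first row after ordering by PC2
by_order <- scores[order(scores$PC2), ][1, ]

extreme
```

Sorting is still handy when several extreme individuals need to be inspected at once, since `head()` on the ordered table returns the lowest few rows.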

x3[x3$PC1>0.015&x3$PC2>-0.015,]

##        Species   Subspecies Collection         Catalog          Locality2
## 26  reichenowi genderuensis       MNMH         1983.62            Adamawa
## 27  reichenowi genderuensis       MNMH         1983.63            Adamawa
## 28  reichenowi genderuensis       MNMH       1994.1401            Adamawa
## 29  reichenowi genderuensis      NHMUK  1922.11.25.214            Genderu
## 30  reichenowi genderuensis      NHMUK  1922.11.25.215            Genderu
## 32  reichenowi genderuensis      NHMUK  1922.11.25.217             Tibati
## 38  reichenowi genderuensis       RMCA      75-3-A-532            Adamawa
## 39  reichenowi genderuensis       RMCA      75-3-A-660            Adamawa
## 60  reichenowi parvirostris      NHMUK 1911.12.23.2759              Bioko
## 64  reichenowi parvirostris      NHMUK   1936.2.21.795              Bioko
## 97  reichenowi      preussi       AMNH          415796  Bamenda Highlands
## 104 reichenowi      preussi       AMNH          688999  Bamenda Highlands
## 122 reichenowi      preussi       MNMH       1994.1406  Bamenda Highlands
## 126 reichenowi      preussi       MNMH       1994.1411        Mt Cameroon
## 127 reichenowi      preussi       MNMH       1994.1412        Mt Cameroon
## 153 reichenowi      preussi      NHMUK 1911.12.23.4349      Mt Manengouba
## 161 reichenowi      preussi      NHMUK  1922.11.25.223  Bamenda Highlands
## 313 reichenowi   reichenowi       AMNH          209805            Mt Meru
## 315 reichenowi   reichenowi       AMNH          263861           Mt Kenya
## 319 reichenowi   reichenowi       AMNH          263865   W of Lake Albert
## 321 reichenowi   reichenowi       AMNH          263866       Rwenzori Mts
## 325 reichenowi   reichenowi       AMNH          263868       Rwenzori Mts
## 335 reichenowi   reichenowi       AMNH          688971            Itombwe
## 337 reichenowi   reichenowi       AMNH          688972            Itombwe
## 339 reichenowi   reichenowi       AMNH          688973            Itombwe
## 341 reichenowi   reichenowi       AMNH          688978            Nyungwe
## 343 reichenowi   reichenowi       AMNH          688980       Idjwa Island
## 345 reichenowi   reichenowi       AMNH          688984       Rwenzori Mts
## 349 reichenowi   reichenowi       AMNH          688992          Marakweta
## 351 reichenowi   reichenowi       AMNH          688994            Buguera
## 355 reichenowi   reichenowi       AMNH          764989           Tshibati
## 357 reichenowi   reichenowi       AMNH          764990           Tshibati
## 359 reichenowi   reichenowi       AMNH          764991     Kivu Highlands
## 361 reichenowi   reichenowi       AMNH          764992           Tshibati
## 363 reichenowi   reichenowi       AMNH          764993           Tshibati
## 365 reichenowi   reichenowi       AMNH          764994           Tshibati
## 367 reichenowi   reichenowi       AMNH          764995           Tshibati
## 369 reichenowi   reichenowi       AMNH          764996           Tshibati
## 371 reichenowi   reichenowi       AMNH          764997              Lwiro
## 375 reichenowi   reichenowi       AMNH          764999           Tshibati
## 377 reichenowi   reichenowi       AMNH          765000           Tshibati
## 383 reichenowi   reichenowi       AMNH          765003           Tshibati
## 387 reichenowi   reichenowi       AMNH          800359           Mt Kenya
## 391 reichenowi   reichenowi       AMNH          827224   Cherangani Hills
## 395 reichenowi   reichenowi       AMNH          827226             Bwindi
## 397 reichenowi   reichenowi       AMNH          827227           Kakamega
## 399 reichenowi   reichenowi       AMNH          827372   Cherangani Hills
## 401 reichenowi   reichenowi       AMNH          827373   Cherangani Hills
## 403 reichenowi   reichenowi       AMNH          827375   Cherangani Hills
## 405 reichenowi   reichenowi       AMNH         6888970            Itombwe
## 407 reichenowi   reichenowi       AMNH         6888987              Mbale
## 411 reichenowi   reichenowi         CM          139823   Cherangani Hills
## 415 reichenowi   reichenowi         CM          145817             Kezizi
## 416 reichenowi   reichenowi         CM          145818             Kezizi
## 419 reichenowi   reichenowi         CM          145967             Kigezi
## 421 reichenowi   reichenowi         CM          145993             Kigezi
## 424 reichenowi   reichenowi         CM          146132             Kigezi
## 427 reichenowi   reichenowi         CM          147670           Mt Kenya
## 430 reichenowi   reichenowi         CM          147923           Mt Kenya
## 432 reichenowi   reichenowi         CM          149092        Nyiro River
## 469 reichenowi   reichenowi       FMNH          356159       Rwenzori Mts
## 473 reichenowi   reichenowi       FMNH          356168       Rwenzori Mts
## 481 reichenowi   reichenowi       FMNH          385271             Bwindi
## 492 reichenowi   reichenowi       FMNH          481233     Kivu Highlands
## 494 reichenowi   reichenowi       MNMH       1936.1663     Kivu Highlands
## 495 reichenowi   reichenowi       MNMH        1988.694            Nyungwe
## 501 reichenowi   reichenowi      NHMUK   1901.2.22.941 Waso Nanyuki River
## 502 reichenowi   reichenowi      NHMUK  1904.11.20.332       Rwenzori Mts
## 503 reichenowi   reichenowi      NHMUK  1906.12.23.682       Rwenzori Mts
## 505 reichenowi   reichenowi      NHMUK  1906.12.23.684       Rwenzori Mts
## 506 reichenowi   reichenowi      NHMUK  1906.12.23.685       Rwenzori Mts
## 507 reichenowi   reichenowi      NHMUK  1906.12.23.686       Rwenzori Mts
## 513 reichenowi   reichenowi      NHMUK  1910.12.26.406           Mt Elgon
## 514 reichenowi   reichenowi      NHMUK    1934.1.17.32             Kigezi
## 515 reichenowi   reichenowi      NHMUK   1935.5.13.169         Kapenguria
## 516 reichenowi   reichenowi      NHMUK    1939.10.1.47        Didinga Mts
## 519 reichenowi   reichenowi      NHMUK   1939.10.12.49        Didinga Mts
## 520 reichenowi   reichenowi      NHMUK    1939.10.2.46        Didinga Mts
## 521 reichenowi   reichenowi      NHMUK   1939.10.3.257        Didinga Mts
## 522 reichenowi   reichenowi      NHMUK   1939.10.3.258        Imatong Mts
## 525 reichenowi   reichenowi      NHMUK    1947.100.303        Didinga Mts
## 529 reichenowi   reichenowi      NHMUK       1976.9.46             Kidepo
## 530 reichenowi   reichenowi       RMCA            2994       Rwenzori Mts
## 531 reichenowi   reichenowi       RMCA            2996       Rwenzori Mts
## 533 reichenowi   reichenowi       RMCA           29581       Rwenzori Mts
## 534 reichenowi   reichenowi       RMCA           29582       Rwenzori Mts
## 535 reichenowi   reichenowi       RMCA           42122              Nioka
## 538 reichenowi   reichenowi       RMCA           42823              Nioka
## 541 reichenowi   reichenowi       RMCA           63068       Idjwa Island
## 542 reichenowi   reichenowi       RMCA           73195              Ituri
## 543 reichenowi   reichenowi       RMCA           73801          Mt Kabobo
## 544 reichenowi   reichenowi       RMCA           74354              Nioka
## 548 reichenowi   reichenowi       RMCA           98858       Rwenzori Mts
## 549 reichenowi   reichenowi       RMCA           98859       Rwenzori Mts
## 550 reichenowi   reichenowi       RMCA           98860       Rwenzori Mts
## 555 reichenowi   reichenowi       RMCA    76-66-A-1183              Ituri
## 556 reichenowi   reichenowi       ZFMK          66.958              Lwiro
## 559 reichenowi   reichenowi       ZFMK          78.183        Imatong Mts
## 560 reichenowi   reichenowi       ZFMK          78.184           Nugishot
## 563 reichenowi   reichenowi       ZFMK         26.8.68              Lwiro
## 564 reichenowi   reichenowi       ZFMK          6.9.68              Lwiro
## 565 reichenowi   reichenowi        ZMB           31769       Angata Anyuk
## 569 reichenowi   reichenowi        ZMB       2000/7984     Kivu Highlands
## 570 reichenowi      Unknown       MNMH        2005.995            Unknown
## 571 reichenowi      Unknown        ZMB       2000/7987            Unknown
##        Sex Right.wing.chord Tail.length Culmen.length
## 26    Male               56          41         15.29
## 27  Female               52          39         13.71
## 28    Male               54          38         14.42
## 29  Female               53          36         12.37
## 30  Female               52          35         14.72
## 32  Female               52          33         14.46
## 38  Female               51          34         14.99
## 39    Male               55          40         14.54
## 60  Female               56          33         15.57
## 64  Female               55          30         15.46
## 97  Female               53          39         15.25
## 104 Female               54          39         15.18
## 122 Female               51          34         15.07
## 126 Female               55          37         16.08
## 127 Female               54          35         15.74
## 153 Female               53          34         14.85
## 161 Female               54          38         15.22
## 313   Male               57          41         13.79
## 315   Male               53          40         13.53
## 319   Male               51          37         14.00
## 321   Male               55          41         14.26
## 325 Female               49          34         14.00
## 335 Female               50          37         12.69
## 337   Male               55          50         14.53
## 339 Female               49          37         13.41
## 341   Male               55          39         14.10
## 343 Female               51          37         12.77
## 345   Male               51          38         16.31
## 349   Male               53          39         15.34
## 351   Male               56          40         15.31
## 355   Male               54          37         14.18
## 357   Male               56          43         13.46
## 359   Male               53          39         14.58
## 361   Male               54          39         13.83
## 363   Male               55          42         13.26
## 365   Male               53          41         14.49
## 367   Male               54          42         13.76
## 369   Male               54          41         13.63
## 371   Male               57          40         15.55
## 375 Female               51          37         12.73
## 377 Female               51          37         13.78
## 383 Female               51          34         12.35
## 387   Male               49          37         13.76
## 391   Male               51          42         15.19
## 395   Male               53          38         12.29
## 397 Female               45          30         11.95
## 399 Female               50          36         13.13
## 401 Female               48          34         12.66
## 403 Female               49          34         14.26
## 405   Male               55          35         15.07
## 407 Female               53          35         13.76
## 411 Female               50          36         12.85
## 415 Female               51          33         12.64
## 416   Male               54          37         14.12
## 419   Male               55          42         14.02
## 421   Male               55          39         15.24
## 424   Male               54          40         14.38
## 427   Male               55          41         13.62
## 430   Male               55          35         14.78
## 432   Male               54          44         13.67
## 469   Male               54          39         14.40
## 473   Male               57          37         13.60
## 481   Male               54          37         13.70
## 492 Female               49          36         13.80
## 494   Male               55          45         13.40
## 495 Female               51          37         14.61
## 501   Male               56          36         13.80
## 502   Male               56          43         13.60
## 503 Female               52          35         13.94
## 505 Female               51          36         13.12
## 506   Male               51          37         12.62
## 507 Female               52          34         13.30
## 513   Male               53          40         15.27
## 514   Male               55          43         13.25
## 515   Male               55          42         13.63
## 516   Male               54          37         16.35
## 519   Male               55          42         13.67
## 520   Male               55          43         13.84
## 521   Male               54          37         13.11
## 522   Male               56          41         13.46
## 525   Male               54          41         13.43
## 529   Male               52          37         13.89
## 530   Male               57          42         12.35
## 531   Male               54          40         14.69
## 533   Male               56          42         13.28
## 534   Male               53          38         14.75
## 535   Male               53          41         13.91
## 538 Female               48          33         13.32
## 541   Male               53          41         14.08
## 542   Male               62          39         13.07
## 543   Male               54          43         14.38
## 544   Male               54          38         13.28
## 548 Female               51          35         13.72
## 549 Female               52          35         16.69
## 550 Female               49          37         15.82
## 555   Male               57          42         14.34
## 556   Male               54          37         14.11
## 559   Male               55          42         14.40
## 560   Male               53          41         14.87
## 563   Male               53          37         14.40
## 564   Male               56          44         14.13
## 565   Male               55          38         13.15
## 569   Male               53          36         13.30
## 570   Male               53          39         14.75
## 571   Male               59          38         15.53
##     Bill.depth..base.of.feathers.on.mandible.
## 26                                       2.81
## 27                                       3.07
## 28                                       2.84
## 29                                       3.05
## 30                                       2.86
## 32                                       2.86
## 38                                       3.02
## 39                                       2.42
## 60                                       2.74
## 64                                       2.87
## 97                                       2.71
## 104                                      3.26
## 122                                      3.22
## 126                                      2.75
## 127                                      3.03
## 153                                      2.95
## 161                                      2.89
## 313                                      2.68
## 315                                      2.96
## 319                                      3.24
## 321                                      3.01
## 325                                      2.86
## 335                                      2.50
## 337                                      2.83
## 339                                      3.19
## 341                                      3.19
## 343                                      3.03
## 345                                      2.98
## 349                                      3.02
## 351                                      2.75
## 355                                      3.10
## 357                                      2.96
## 359                                      2.95
## 361                                      2.78
## 363                                      2.66
## 365                                      2.68
## 367                                      2.95
## 369                                      2.91
## 371                                      2.69
## 375                                      3.00
## 377                                      2.91
## 383                                      2.85
## 387                                      2.89
## 391                                      2.93
## 395                                      3.07
## 397                                      2.36
## 399                                      3.00
## 401                                      2.40
## 403                                      2.72
## 405                                      3.01
## 407                                      3.03
## 411                                      2.75
## 415                                      2.67
## 416                                      2.94
## 419                                      2.78
## 421                                      2.61
## 424                                      2.55
## 427                                      2.88
## 430                                      3.10
## 432                                      2.58
## 469                                      2.80
## 473                                      2.70
## 481                                      3.00
## 492                                      2.90
## 494                                      2.79
## 495                                      2.73
## 501                                      3.00
## 502                                      2.89
## 503                                      2.65
## 505                                      2.71
## 506                                      2.73
## 507                                      2.64
## 513                                      2.64
## 514                                      2.88
## 515                                      2.64
## 516                                      2.88
## 519                                      3.14
## 520                                      2.78
## 521                                      3.00
## 522                                      3.06
## 525                                      2.85
## 529                                      2.66
## 530                                      2.59
## 531                                      2.95
## 533                                      2.83
## 534                                      2.70
## 535                                      2.66
## 538                                      2.72
## 541                                      2.73
## 542                                      2.70
## 543                                      2.68
## 544                                      2.73
## 548                                      2.76
## 549                                      2.69
## 550                                      2.73
## 555                                      2.65
## 556                                      2.95
## 559                                      2.54
## 560                                      2.66
## 563                                      2.87
## 564                                      2.53
## 565                                      2.90
## 569                                      2.63
## 570                                      3.05
## 571                                      2.76
##     Bill.width..base.of.feathers.on.maxilla. Left.Tarsus        PC1
## 26                                      4.14       12.04 0.02236324
## 27                                      4.07       11.37 0.05297342
## 28                                      4.41       12.60 0.02992202
## 29                                      4.20       11.78 0.05923821
## 30                                      4.58       11.48 0.04584284
## 32                                      4.11       12.05 0.06216544
## 38                                      4.36       12.19 0.04558603
## 39                                      3.90       12.01 0.05538448
## 60                                      4.64       11.54 0.03023857
## 64                                      4.39       11.31 0.04696798
## 97                                      4.17       11.60 0.04541680
## 104                                     4.26       11.69 0.01922474
## 122                                     4.30       11.45 0.04578766
## 126                                     4.32       12.38 0.02379220
## 127                                     4.31       12.47 0.02472241
## 153                                     4.36       12.71 0.03690129
## 161                                     4.68       11.08 0.02631948
## 313                                     4.23       11.66 0.03421546
## 315                                     4.48       12.54 0.02989449
## 319                                     4.36       12.08 0.03900387
## 321                                     4.52       11.57 0.01908089
## 325                                     4.62       11.49 0.06410763
## 335                                     4.19       11.79 0.08599225
## 337                                     4.24       10.54 0.01919945
## 339                                     4.69       10.20 0.05811253
## 341                                     4.42       10.97 0.02650605
## 343                                     4.40        9.31 0.07642985
## 345                                     4.59       11.17 0.02917456
## 349                                     4.34       11.68 0.02808066
## 351                                     4.26       12.98 0.01542789
## 355                                     4.45       11.66 0.03126261
## 357                                     4.20       10.83 0.03373876
## 359                                     4.35       13.52 0.02097537
## 361                                     4.34       10.68 0.05159113
## 363                                     4.72       11.89 0.02763410
## 365                                     4.65       11.14 0.03641685
## 367                                     4.31       11.96 0.02975364
## 369                                     4.06       11.62 0.04480316
## 371                                     4.29       11.68 0.02133437
## 375                                     4.24       10.91 0.06977300
## 377                                     3.56       11.66 0.07994134
## 383                                     4.05       11.86 0.08296241
## 387                                     4.28       11.85 0.06527326
## 391                                     3.91       12.22 0.04205651
## 395                                     3.34       12.76 0.07278846
## 397                                     4.03        9.08 0.15929372
## 399                                     3.75       11.11 0.08650405
## 401                                     3.77       10.49 0.12796967
## 403                                     3.95       11.23 0.08963101
## 405                                     3.60       10.48 0.06400951
## 407                                     3.97       11.79 0.05903901
## 411                                     3.90       11.41 0.09031260
## 415                                     4.38       10.57 0.08979685
## 416                                     4.27       11.93 0.04061737
## 419                                     4.59       11.62 0.02401169
## 421                                     4.53       12.25 0.02480529
## 424                                     4.63       11.62 0.03686825
## 427                                     4.21       13.40 0.02303710
## 430                                     4.27       12.51 0.02620806
## 432                                     4.00       13.29 0.03743505
## 469                                     4.00       12.70 0.04082998
## 473                                     4.50       11.50 0.03730844
## 481                                     4.10       13.40 0.03494256
## 492                                     3.70       13.50 0.07137493
## 494                                     4.47       11.04 0.02951462
## 495                                     3.99       10.77 0.07421793
## 501                                     4.10       12.95 0.03219719
## 502                                     3.70       12.65 0.03576305
## 503                                     4.51       11.58 0.06022923
## 505                                     4.28       11.20 0.07585168
## 506                                     4.70       12.25 0.05521205
## 507                                     3.93       11.90 0.08271196
## 513                                     4.42       11.97 0.03478034
## 514                                     3.64       11.90 0.05052740
## 515                                     4.33       11.28 0.04243097
## 516                                     4.05       12.41 0.02924647
## 519                                     4.42       11.79 0.01778978
## 520                                     4.30       13.32 0.01815173
## 521                                     4.38       12.50 0.03796464
## 522                                     4.52       12.04 0.01541466
## 525                                     4.23       11.66 0.04285980
## 529                                     4.07       12.36 0.06271089
## 530                                     3.91       12.17 0.05121811
## 531                                     4.49       10.34 0.03517406
## 533                                     4.04       13.26 0.02726227
## 534                                     4.43       12.60 0.03578372
## 535                                     4.02       10.99 0.06178135
## 538                                     3.72       11.66 0.10638587
## 541                                     4.24       12.64 0.03809676
## 542                                     4.08       12.25 0.02335066
## 543                                     3.66       14.21 0.03404139
## 544                                     4.03       11.42 0.06320018
## 548                                     4.00       11.81 0.07567819
## 549                                     4.38       10.85 0.04854887
## 550                                     4.09       11.09 0.06771225
## 555                                     4.37       12.36 0.01899963
## 556                                     4.40       11.53 0.03959565
## 559                                     4.45       12.96 0.02311407
## 560                                     4.40       12.99 0.02706940
## 563                                     4.42       12.89 0.03270303
## 564                                     4.36       12.34 0.02451381
## 565                                     4.46       12.19 0.03490331
## 569                                     4.30       10.90 0.07116609
## 570                                     4.23       13.74 0.01812063
## 571                                     3.97       12.30 0.02050084
##               PC2           PC3           PC4           PC5          PC6
## 26   3.951923e-02 -0.0396659133 -0.0030441180  0.0533115450  0.014830398
## 27   7.738691e-02  0.0321893539 -0.0335407769 -0.0089921406  0.028578935
## 28   8.360993e-03  0.0026744328 -0.0053328923 -0.0168496572 -0.025084162
## 29   5.884064e-02  0.0414500638 -0.0402287342 -0.0313431542 -0.079119471
## 30   9.957146e-03  0.0519180321  0.0553214908 -0.0056527061 -0.006267652
## 32  -1.900921e-03  0.0416131939 -0.0194277638  0.0479643547 -0.009918828
## 38   7.049582e-03  0.0792743377 -0.0165453389  0.0064309208  0.020955805
## 39  -4.008815e-03 -0.1043666603  0.0201346566  0.0601006224  0.018220910
## 60  -1.233662e-02  0.0293700245  0.0847350918  0.0836292066 -0.081613287
## 64  -3.653228e-03  0.0667190851  0.0476329960  0.1263514198 -0.078784221
## 97   1.847553e-02 -0.0217847029  0.0295876689  0.0356581048  0.057395289
## 104  8.361657e-02  0.0653171846 -0.0399050278  0.0290220529  0.022861706
## 122  4.941998e-02  0.1129263080 -0.0218332205  0.0295904148  0.030659806
## 126 -9.376833e-03 -0.0060582041  0.0179854053  0.0693557370  0.010300213
## 127  1.124309e-02  0.0567731936 -0.0276089211  0.0598875921 -0.003553123
## 153 -7.012620e-03  0.0533635823 -0.0267274915  0.0193188539 -0.027932472
## 161  3.889385e-02  0.0329809083  0.0812457665  0.0060264923 -0.003916388
## 313  4.448666e-02 -0.0680585778  0.0294583973  0.0228509282 -0.059676622
## 315  3.663567e-02  0.0133542245 -0.0191623144 -0.0817479271 -0.020829847
## 319  5.891441e-02  0.0902999200 -0.0503847475 -0.0481193811  0.014818963
## 321  7.068351e-02  0.0124763465  0.0213846157 -0.0268547300 -0.022803212
## 325  6.274241e-05  0.0744766041  0.0529968456 -0.0633718509  0.017799594
## 335 -1.267386e-02 -0.0368148235  0.0346350305 -0.0652365007  0.009460134
## 337  1.255188e-01 -0.0853752338  0.0545136708 -0.0409750731  0.102095074
## 339  8.883406e-02  0.1135508340  0.0596698658 -0.0906756612  0.028370534
## 341  9.934255e-02  0.0532060470  0.0079526063  0.0089310148 -0.039580322
## 343  1.075682e-01  0.0642706614  0.0806499884 -0.0266305116 -0.009281598
## 345  3.322303e-02  0.0657640876  0.0614760101  0.0040834789  0.099784750
## 349  4.931139e-02  0.0362123408  0.0029966326  0.0131668519  0.046410361
## 351  1.837907e-03 -0.0413933370 -0.0172153209  0.0289451498 -0.007773092
## 355  5.744958e-02  0.0567447473 -0.0022243414 -0.0022706196 -0.041849768
## 357  1.085377e-01 -0.0296131786  0.0159622867  0.0006397415 -0.025910298
## 359  2.249371e-03  0.0135654533 -0.0620617902 -0.0483900050  0.007223584
## 361  5.793627e-02 -0.0113346000  0.0645086273  0.0074855722 -0.017488689
## 363  2.903474e-02 -0.0489104537  0.0722055779 -0.0911615654 -0.058981379
## 365  3.112644e-02 -0.0211833688  0.0982670943 -0.0503657218  0.024407674
## 367  6.483538e-02 -0.0116047719 -0.0126074804 -0.0464031347  0.001144876
## 369  6.933552e-02 -0.0211734759 -0.0221441293 -0.0040355461  0.004973339
## 371  2.611183e-02 -0.0487946201  0.0464019798  0.0743253820 -0.012216054
## 375  6.969303e-02  0.0451909776  0.0056074125 -0.0427212693 -0.011543909
## 377  4.870799e-02  0.0026051498 -0.0780823063  0.0528037666  0.062948511
## 383  2.014325e-02  0.0272398453 -0.0328175628 -0.0235319927 -0.047856273
## 387  2.019562e-02  0.0411931103 -0.0047365826 -0.0642557485  0.057498939
## 391  4.640949e-02 -0.0093721357 -0.0553775838 -0.0043750806  0.135834776
## 395  6.814386e-02 -0.0111050290 -0.1760774087  0.0295960450 -0.015666977
## 397 -7.391643e-03  0.0191283000  0.1343300655 -0.0148750377  0.047081187
## 399  6.483753e-02  0.0375983411 -0.0533530862  0.0176457988  0.042067013
## 401 -5.718155e-03 -0.0334811322  0.0516806313  0.0115417375  0.055828684
## 403  2.526138e-03  0.0249406496  0.0105362688  0.0304874780  0.071629631
## 405  7.965271e-02  0.0225919496 -0.0306544093  0.1812898287  0.010641807
## 407  4.617797e-02  0.0415588771 -0.0531769606  0.0444436717 -0.026183504
## 411  2.637397e-02  0.0007046988 -0.0157991312 -0.0119747539  0.023340865
## 415  1.462434e-02  0.0261518169  0.0800144895 -0.0161289769 -0.057461997
## 416  3.626482e-02  0.0213718633 -0.0106589064  0.0132273527 -0.033081768
## 419  4.735293e-02 -0.0302424614  0.0569923706 -0.0494777845 -0.023715566
## 421 -1.120430e-02 -0.0369379590  0.0590018038  0.0071249207 -0.009928880
## 424  2.909995e-03 -0.0447503998  0.0956503755 -0.0376494487 -0.010704090
## 427  2.293036e-02 -0.0332331323 -0.0693080727 -0.0462158653 -0.037284217
## 430  2.923890e-02  0.0560240187 -0.0493517527  0.0492322922 -0.056022801
## 432  6.896127e-03 -0.1054935535 -0.0476792652 -0.0543933416  0.031452910
## 469  1.572445e-02 -0.0290754741 -0.0490645623  0.0206798770  0.009287034
## 473  2.490237e-02 -0.0256404392  0.0622018830  0.0237053589 -0.121798795
## 481  1.519145e-02  0.0153362053 -0.0968900143 -0.0106988296 -0.047938014
## 492 -1.059154e-02  0.0174756953 -0.1325241052 -0.0209904039  0.073459226
## 494  8.528737e-02 -0.0548694644  0.0595730163 -0.0669966112 -0.004208112
## 495  3.289226e-02 -0.0002961257  0.0338911700  0.0464773474  0.073089765
## 501  2.564365e-02  0.0131471115 -0.0776240039  0.0378466446 -0.092327648
## 502  6.690823e-02 -0.0700962866 -0.0980732762  0.0257056008 -0.001923039
## 503 -1.051120e-02  0.0105533494  0.0673815371 -0.0196366869 -0.027757784
## 505  1.908324e-02  0.0070787646  0.0419117257 -0.0307406206 -0.009544685
## 506 -4.042086e-03  0.0146880695  0.0412779692 -0.1262588973 -0.050614682
## 507 -7.187594e-03 -0.0137303178 -0.0113858931  0.0328289901 -0.026077592
## 513  8.551080e-04 -0.0308006602  0.0524026270 -0.0093605435  0.050593279
## 514  8.453101e-02 -0.0668801782 -0.0774918499  0.0253038340  0.015445327
## 515  4.721669e-02 -0.0646724641  0.0581003075 -0.0210979437 -0.017443775
## 516  6.716529e-03  0.0103309022 -0.0294207185  0.0934506403  0.055547851
## 519  9.271568e-02  0.0187091071 -0.0203513939 -0.0466987590 -0.030029676
## 520  1.985842e-02 -0.0580697356 -0.0415379311 -0.0653831610 -0.013856185
## 521  3.459253e-02  0.0287007720 -0.0357438237 -0.0415151327 -0.079645804
## 522  7.401375e-02  0.0090500409 -0.0085750641 -0.0466641192 -0.074968792
## 525  5.882945e-02 -0.0246934453  0.0018866239 -0.0301416718 -0.012248411
## 529 -7.190881e-03 -0.0243903315 -0.0130891655 -0.0032396209  0.012142193
## 530  4.415586e-02 -0.1124933570 -0.0230788516  0.0008391465 -0.082738174
## 531  8.226693e-02  0.0215033941  0.0762544348  0.0110534831  0.013786347
## 533  3.460916e-02 -0.0624104481 -0.0777998329 -0.0274049917 -0.047748814
## 534 -1.406719e-02 -0.0119759175  0.0179668764 -0.0219778955  0.007074690
## 535  5.041973e-02 -0.0538192131  0.0335646050  0.0111293486  0.044552537
## 538 -4.088040e-03  0.0213594224 -0.0380110149  0.0167850716  0.060801574
## 541  1.340387e-02 -0.0393468582 -0.0140180662 -0.0460975952  0.024692975
## 542  4.500933e-02 -0.0923230996 -0.0142350636  0.0929730215 -0.202270563
## 543 -5.796381e-03 -0.0977749625 -0.1288973417 -0.0057251191  0.060046790
## 544  3.981867e-02 -0.0318620455  0.0054399123  0.0219541310 -0.031963347
## 548  7.229131e-03  0.0101006248 -0.0146923537  0.0148165151  0.014124961
## 549 -4.932343e-03  0.0274901355  0.0946943637  0.0896404112  0.080824649
## 550  8.741610e-03  0.0198233943  0.0399629027  0.0351037762  0.147911104
## 555  2.248631e-02 -0.0743060994  0.0257046032 -0.0014397733 -0.043854330
## 556  4.417442e-02  0.0303604047  0.0175003494  0.0059846229 -0.038826452
## 559 -1.292251e-02 -0.0795947479  0.0260405854 -0.0481920156 -0.008741425
## 560 -1.248343e-02 -0.0416797597  0.0052974824 -0.0492245941  0.040540276
## 563 -3.179696e-03  0.0193075707 -0.0199029483 -0.0299960991 -0.018075325
## 564  1.799686e-02 -0.1032399027  0.0394360319 -0.0357772855 -0.008881934
## 565  3.607650e-02  0.0047377076 -0.0007163052 -0.0373124650 -0.090728387
## 569  2.064519e-02 -0.0144839254  0.0689686269  0.0065726163 -0.042739632
## 570  1.045695e-02  0.0244905338 -0.0963030079 -0.0350429253  0.018103488
## 571  2.214197e-02 -0.0518247427 -0.0207337086  0.1425268189 -0.058927636

It appears that ‘small’ C. r. preussi may be mostly females, so we split the data by sex for the remainder of this analysis.

m.reich=reich[reich$Sex=="Male",]
f.reich=reich[reich$Sex=="Female",]

There are 263 males and 120 females. This bias is likely (in part) due to the difficulty of identifying and aging female Cinnyris sunbirds.
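The summary of the male subset below lists a "Femae" level in the Sex factor, which looks like a typo in the source spreadsheet. Any rows carrying that label would be skipped by a filter on Sex=="Female". The following is a minimal sketch of how such a level could be checked and merged before splitting by sex. It uses a made-up data frame: toy, m.toy, f.toy, and all values are hypothetical and are not the study data. Whether the real data contain any "Femae" rows is not shown in this document.

```r
# Hypothetical toy data; the real `reich` data frame is not reproduced here.
toy <- data.frame(
  Sex = factor(c("Male", "Female", "Femae", "Male", "Unknown")),
  Culmen.length = c(16.2, 15.1, 15.4, 17.0, 15.8)
)

# Inspect the labels actually present before filtering
print(table(toy$Sex))

# Merge the misspelled label into "Female" and rebuild the factor
sex.chr <- as.character(toy$Sex)
sex.chr[sex.chr == "Femae"] <- "Female"
toy$Sex <- factor(sex.chr)

# Split by sex, dropping unused factor levels from each subset
m.toy <- droplevels(toy[toy$Sex == "Male", ])
f.toy <- droplevels(toy[toy$Sex == "Female", ])

print(nrow(m.toy))
print(nrow(f.toy))
```

Dropping unused levels also removes the zero-count rows (for example regius) from later summaries of each subset.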

Male Sunbirds

Recalculate PCA values for males only.

summary(m.reich)

##        Species           Subspecies    Collection    Catalog   
##  regius    :  0   genderuensis: 13   NHMUK  :95   139779 :  1  
##  reichenowi:263   parvirostris: 27   ZFMK   :49   145115 :  1  
##                   preussi     :116   AMNH   :40   145818 :  1  
##                   regius      :  0   MNMH   :24   145819 :  1  
##                   reichenowi  :104   RMCA   :19   145882 :  1  
##                   Unknown     :  3   CM     :16   145967 :  1  
##                                      (Other):20   (Other):257  
##              Locality2        Sex      Right.wing.chord  Tail.length   
##  Mt Cameroon      : 40          :  0   Min.   :49.00    Min.   :35.00  
##  Bioko            : 27   Femae  :  0   1st Qu.:55.00    1st Qu.:40.00  
##  Bamenda Highlands: 25   Female :  0   Median :57.00    Median :42.00  
##  Rwenzori Mts     : 20   Male   :263   Mean   :56.78    Mean   :41.78  
##  Mt Manengouba    : 19   Unknown:  0   3rd Qu.:59.00    3rd Qu.:44.00  
##  Mt Oku           : 19                 Max.   :63.00    Max.   :52.00  
##  (Other)          :113                                                 
##  Culmen.length   Bill.depth..base.of.feathers.on.mandible.
##  Min.   :12.29   Min.   :1.870                            
##  1st Qu.:14.48   1st Qu.:2.775                            
##  Median :16.26   Median :2.920                            
##  Mean   :16.15   Mean   :2.903                            
##  3rd Qu.:17.59   3rd Qu.:3.050                            
##  Max.   :22.13   Max.   :3.550                            
##                                                           
##  Bill.width..base.of.feathers.on.maxilla.  Left.Tarsus   
##  Min.   :3.340                            Min.   :10.34  
##  1st Qu.:4.360                            1st Qu.:12.30  
##  Median :4.610                            Median :13.29  
##  Mean   :4.632                            Mean   :13.23  
##  3rd Qu.:4.920                            3rd Qu.:14.12  
##  Max.   :5.730                            Max.   :16.49  
##

We can now do PCAs for these data.

##       PC1 
## 0.5299903 
##       PC2 
## 0.1582757 
##       PC3 
## 0.1357479 
##        PC4 
## 0.07996781 
##        PC5 
## 0.06152048 
##        PC6 
## 0.03449792
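The code that produced these proportions is hidden in the rendered document. As a rough sketch only, per-axis proportions of variance like those above could be obtained as follows. This assumes base R prcomp on log-transformed measurements, whereas the object name rda.x used later suggests the original relied on vegan's rda. The data frame toy, its columns, and the log transform are assumptions for illustration.

```r
# Hypothetical measurements sharing a common "size" signal
set.seed(1)
n <- 50
size <- rnorm(n)
toy <- data.frame(
  wing   = exp(4.0 + 0.05 * size + rnorm(n, sd = 0.02)),
  tail   = exp(3.7 + 0.06 * size + rnorm(n, sd = 0.03)),
  culmen = exp(2.8 + 0.08 * size + rnorm(n, sd = 0.03))
)

# PCA on the covariance matrix of the log-transformed variables
pca <- prcomp(log(toy), center = TRUE, scale. = FALSE)

# Proportion of total variance explained by each axis
prop.var <- pca$sdev^2 / sum(pca$sdev^2)
names(prop.var) <- colnames(pca$x)
print(round(prop.var, 3))
```

The proportions sum to one across axes, which is a quick sanity check when comparing the males-only values with those for the whole dataset.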

Unsurprisingly, the PCA contributions for males only are almost identical to those for the whole dataset.

## [1] "For Right.wing.chord: PC1: 0.189"
## [1] "For Tail.length: PC1: 0.149"
## [1] "For Culmen.length: PC1: 0.209"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC1: 0.114"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC1: 0.19"
## [1] "For Left.Tarsus: PC1: 0.15"
## [1] "For Right.wing.chord: PC2: 0.0238"
## [1] "For Tail.length: PC2: 0.171"
## [1] "For Culmen.length: PC2: 0.0825"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC2: 0.367"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC2: 0.0338"
## [1] "For Left.Tarsus: PC2: 0.322"
## [1] "For Right.wing.chord: PC3: 0.171"
## [1] "For Tail.length: PC3: 0.312"
## [1] "For Culmen.length: PC3: 0.0752"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC3: 0.239"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC3: 0.143"
## [1] "For Left.Tarsus: PC3: 0.0597"

biplot(rda.x)

Again, the biplot and contributions are similar to those for all individuals.

#Plot reichenowi only, flawed loadings
a=ggplot(x3[which(x3$Species=="reichenowi"),],aes(x=PC1,y=PC2,colour=Subspecies))
b=geom_point()
c=theme_classic()
d=stat_ellipse()

print(a+b+c+d)

## Too few points to calculate an ellipse

## Warning: Removed 1 row(s) containing missing values (geom_path).
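The "Too few points" message most likely comes from the "Unknown" group, which has only three males. A minimal sketch of one way to avoid it is to draw ellipses only for groups with enough points while still plotting every individual. The threshold of four points is an assumption about ggplot2's internals and may need adjusting. The data frame toy and the names n.per.group, enough, and ellipse.data are hypothetical.

```r
# Hypothetical PCA scores; group sizes chosen so one group is too small
set.seed(2)
toy <- data.frame(
  Subspecies = c(rep("preussi", 10), rep("genderuensis", 6), rep("Unknown", 3)),
  PC1 = rnorm(19),
  PC2 = rnorm(19)
)

# Keep only groups with at least four points for the ellipse layer
# (assumed minimum for stat_ellipse)
n.per.group <- table(toy$Subspecies)
enough <- names(n.per.group)[n.per.group >= 4]
ellipse.data <- toy[toy$Subspecies %in% enough, ]

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = PC1, y = PC2, colour = Subspecies)) +
    geom_point() +                       # all individuals are shown
    stat_ellipse(data = ellipse.data) +  # ellipses for large groups only
    theme_classic()
  # print(p) draws the figure
}
```

Passing a filtered data frame to the ellipse layer leaves the points layer untouched, so small groups remain visible on the plot.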

colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
names(colorset)=c("reichenowi","preussi","genderuensis","parvirostris","Unknown")
colScale=scale_color_manual(name="grp",values=colorset)

#Plot reichenowi only, males
a=ggplot(x3[which(x3$Species=="reichenowi"),],aes(x=PC1,y=PC2,
                                                  colour=Subspecies))

b=geom_point(size=1.5)
c=theme(panel.background = element_rect(fill="white",color = "grey50"),
        axis.title.x = element_text(size=20),
        axis.title.y = element_text(size=20),
        axis.text.x = element_text(size=15),
        axis.text.y = element_text(size=15),
        legend.title = element_blank(),
        legend.text = element_text(size=15))
d=stat_ellipse()
e=colScale
f1=ylab("PC 2")
f2=xlab("PC 1")

plotx=a+b+c+d+e+f1+f2
print(plotx)

## Too few points to calculate an ellipse

## Warning: Removed 1 row(s) containing missing values (geom_path).

x99=x3[x3$Subspecies=="genderuensis",]
x99[order(x99$PC1),c(2:6,13)]

##      Subspecies Collection        Catalog Locality2  Sex          PC1
## 34 genderuensis       RMCA     75-3-A-438   Adamawa Male -0.006837937
## 25 genderuensis       MNMH       1971.637   Yaounde Male  0.002249162
## 35 genderuensis       RMCA     75-3-A-451   Adamawa Male  0.016450666
## 44 genderuensis        ZMB          75/80   Yaounde Male  0.019065758
## 33 genderuensis      NHMUK    1940.2.8.63    Tibati Male  0.021588830
## 31 genderuensis      NHMUK 1922.11.25.216    Tibati Male  0.022491253
## 45 genderuensis        ZMB          75/99   Adamawa Male  0.028710801
## 37 genderuensis       RMCA     75-3-A-522   Adamawa Male  0.036313941
## 42 genderuensis        ZMB         49/252   Genderu Male  0.039492812
## 26 genderuensis       MNMH        1983.62   Adamawa Male  0.050241655
## 28 genderuensis       MNMH      1994.1401   Adamawa Male  0.064582894
## 41 genderuensis       RMCA     75-3-A-727   Adamawa Male  0.072250196
## 39 genderuensis       RMCA     75-3-A-660   Adamawa Male  0.092838521

x3[x3$Subspecies=="Unknown",1:5]

##        Species Subspecies Collection   Catalog        Locality2
## 570 reichenowi    Unknown       MNMH  2005.995          Unknown
## 571 reichenowi    Unknown        ZMB 2000/7987          Unknown
## 572 reichenowi    Unknown        ZMB     75/79 Bangwa Highlands

The most extreme C. r. genderuensis individual, with the lowest PC1 score, is a bird collected at Tello, Cameroon (RMCA 75-3-A-438), part of a larger series held at the RMCA. Several birds appear to sit at the “edge” of C. r. preussi morphometric space.

Interestingly, all three “Unknown” individuals are towards the genderuensis side of the spectrum. But what about the C. r. preussi that are towards the extreme?

x3[x3$Subspecies=="preussi"&x3$PC1>0,c(1:5,13)]

##        Species Subspecies Collection         Catalog         Locality2
## 93  reichenowi    preussi       AMNH          415793 Bamenda Highlands
## 101 reichenowi    preussi       AMNH          688997 Bamenda Highlands
## 103 reichenowi    preussi       AMNH          688998 Bamenda Highlands
## 121 reichenowi    preussi       MNMH       1994.1405 Bamenda Highlands
## 149 reichenowi    preussi      NHMUK 1911.12.23.4230       Mt Cameroon
## 152 reichenowi    preussi      NHMUK 1911.12.23.4348     Mt Manengouba
## 193 reichenowi    preussi      NHMUK    1966.16.2439     Mt Manengouba
## 194 reichenowi    preussi      NHMUK    1966.16.2440     Mt Manengouba
## 297 reichenowi    preussi        ZMB           75/82 Bamenda Highlands
##             PC1
## 93  0.013220157
## 101 0.025077182
## 103 0.008237055
## 121 0.006620631
## 149 0.008033713
## 152 0.010611211
## 193 0.013690054
## 194 0.026974751
## 297 0.004422322

Interestingly, some of the most extreme preussi individuals come from the Manengouba area and may require further research. The other “intermediate” birds are from the edge of the Bamenda Highlands, and may be genuinely intermediate or misallocated to subspecies. However, we are leaving the subspecies assignments as they are, based on locality data.

We can also look character by character and visualize how much each measurement differs among these subspecies.

colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
ssps.x=c("reichenowi","preussi","genderuensis","parvirostris","Unknown")
names(colorset)=c("reichenowi","preussi","genderuensis","parvirostris","Unknown")
colScale=scale_color_manual(name="grp",values=colorset)

for(i in 7:12){
  ave=colnames(m.reich)[i]
  
  print(paste0("Working: ",ave))
  
  for(k in 1:(length(ssps.x)-1)){
    ssp1=ssps.x[k]
    ssp2=ssps.x[k+1]
    
    ssp1.x=m.reich[which(m.reich$Subspecies==ssp1),ave]
    ssp2.x=m.reich[which(m.reich$Subspecies==ssp2),ave]
    
    mu1=mean(ssp1.x)
    mu2=mean(ssp2.x)
    
    sd1=sd(ssp1.x)
    sd2=sd(ssp2.x)
    
    n1=length(ssp1.x)
    n2=length(ssp2.x)
    
    print(paste0("Summary stats: ",
                 ssp1," vs. ",ssp2))
    print(paste0(ssp1,": ","Avg: ",round(mu1,2)," SD: ",round(sd1,2)," #: ",n1))
    print(paste0(ssp2,": ","Avg: ",round(mu2,2)," SD: ",round(sd2,2)," #: ",n2))
    
    percent.diff=round(abs(((mu1/mu2)*100)-100),2)
    
    print(paste0("Difference: ",percent.diff,"%"))
  }
  
  a=ggplot(m.reich,aes(y=m.reich[,i],x=Subspecies))
  b=geom_boxplot()
  c=theme_classic()
  d=ylab(print(ave))
  e=scale_color_manual(values=colorset,aesthetics = c("fill"))

  print(a+b+c+d+e)
}

## [1] "Working: Right.wing.chord"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 54.47 SD: 1.93 #: 104"
## [1] "preussi: Avg: 58.68 SD: 1.6 #: 116"
## [1] "Difference: 7.17%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 58.68 SD: 1.6 #: 116"
## [1] "genderuensis: Avg: 56 SD: 1.96 #: 13"
## [1] "Difference: 4.79%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 56 SD: 1.96 #: 13"
## [1] "parvirostris: Avg: 58 SD: 2.4 #: 27"
## [1] "Difference: 3.45%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 58 SD: 2.4 #: 27"
## [1] "Unknown: Avg: 55.67 SD: 3.06 #: 3"
## [1] "Difference: 4.19%"
## [1] "Right.wing.chord"

## [1] "Working: Tail.length"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 40.05 SD: 2.77 #: 104"
## [1] "preussi: Avg: 43.38 SD: 2.59 #: 116"
## [1] "Difference: 7.68%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 43.38 SD: 2.59 #: 116"
## [1] "genderuensis: Avg: 40.92 SD: 2.1 #: 13"
## [1] "Difference: 6%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 40.92 SD: 2.1 #: 13"
## [1] "parvirostris: Avg: 42.33 SD: 3.13 #: 27"
## [1] "Difference: 3.33%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 42.33 SD: 3.13 #: 27"
## [1] "Unknown: Avg: 39 SD: 1 #: 3"
## [1] "Difference: 8.55%"
## [1] "Tail.length"

## [1] "Working: Culmen.length"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 14.27 SD: 0.81 #: 104"
## [1] "preussi: Avg: 17.78 SD: 1.21 #: 116"
## [1] "Difference: 19.72%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 17.78 SD: 1.21 #: 116"
## [1] "genderuensis: Avg: 15.31 SD: 0.72 #: 13"
## [1] "Difference: 16.1%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 15.31 SD: 0.72 #: 13"
## [1] "parvirostris: Avg: 16.86 SD: 0.63 #: 27"
## [1] "Difference: 9.18%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 16.86 SD: 0.63 #: 27"
## [1] "Unknown: Avg: 15.03 SD: 0.43 #: 3"
## [1] "Difference: 12.17%"
## [1] "Culmen.length"

## [1] "Working: Bill.depth..base.of.feathers.on.mandible."
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 2.83 SD: 0.22 #: 104"
## [1] "preussi: Avg: 2.98 SD: 0.19 #: 116"
## [1] "Difference: 5.3%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 2.98 SD: 0.19 #: 116"
## [1] "genderuensis: Avg: 2.71 SD: 0.31 #: 13"
## [1] "Difference: 10.1%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 2.71 SD: 0.31 #: 13"
## [1] "parvirostris: Avg: 2.96 SD: 0.16 #: 27"
## [1] "Difference: 8.34%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 2.96 SD: 0.16 #: 27"
## [1] "Unknown: Avg: 2.79 SD: 0.24 #: 3"
## [1] "Difference: 5.87%"
## [1] "Bill.depth..base.of.feathers.on.mandible."

## [1] "Working: Bill.width..base.of.feathers.on.maxilla."
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 4.33 SD: 0.28 #: 104"
## [1] "preussi: Avg: 4.89 SD: 0.34 #: 116"
## [1] "Difference: 11.53%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 4.89 SD: 0.34 #: 116"
## [1] "genderuensis: Avg: 4.39 SD: 0.22 #: 13"
## [1] "Difference: 11.41%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 4.39 SD: 0.22 #: 13"
## [1] "parvirostris: Avg: 4.86 SD: 0.24 #: 27"
## [1] "Difference: 9.78%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 4.86 SD: 0.24 #: 27"
## [1] "Unknown: Avg: 4.25 SD: 0.29 #: 3"
## [1] "Difference: 14.45%"
## [1] "Bill.width..base.of.feathers.on.maxilla."

## [1] "Working: Left.Tarsus"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 12.38 SD: 0.91 #: 104"
## [1] "preussi: Avg: 13.82 SD: 1.14 #: 116"
## [1] "Difference: 10.4%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 13.82 SD: 1.14 #: 116"
## [1] "genderuensis: Avg: 13 SD: 0.97 #: 13"
## [1] "Difference: 6.29%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 13 SD: 0.97 #: 13"
## [1] "parvirostris: Avg: 14.14 SD: 1 #: 27"
## [1] "Difference: 8.07%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 14.14 SD: 1 #: 27"
## [1] "Unknown: Avg: 12.93 SD: 0.74 #: 3"
## [1] "Difference: 9.41%"
## [1] "Left.Tarsus"

It looks like the most extreme divergences (in male sunbirds) are in culmen (bill) length and bill width, which makes sense as montane Cameroonian birds seem “big billed” in the hand.
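As a worked example of the percent differences printed above, the same formula can be wrapped in a small helper. This sketch uses the rounded wing chord means from the printed output rather than the raw data, so results for other characters may differ in the second decimal place from the values shown. The names percent.diff, wing.reich.preussi, and wing.preussi.reich are hypothetical.

```r
# Percent difference of mean 1 relative to mean 2, as computed in the loop
percent.diff <- function(mu1, mu2) {
  round(abs(((mu1 / mu2) * 100) - 100), 2)
}

# Rounded male wing chord means taken from the printed summary
wing.reich.preussi <- percent.diff(54.47, 58.68)
print(wing.reich.preussi)

# The measure is not symmetric: swapping the groups changes the denominator
wing.preussi.reich <- percent.diff(58.68, 54.47)
print(wing.preussi.reich)
```

Because the second mean is the denominator, the reported percentage depends on the order in which the two subspecies are compared.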

a=ggplot(m.reich,aes(x=Culmen.length,y=Bill.width..base.of.feathers.on.maxilla.,colour=Subspecies))
b=geom_point()
c=theme_classic()
d=stat_ellipse()

print(a+b+c+d)

## Too few points to calculate an ellipse

## Warning: Removed 1 row(s) containing missing values (geom_path).

Bill measurements separate eastern from western birds extremely well, except for the intermediate C. r. genderuensis and a few extreme individuals.

We can perform pairwise Wilcoxon rank-sum tests on the data to assess how distinct each individual variable is between populations.
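Before the full function, here is a minimal single-variable sketch of the two tests it relies on: a Shapiro-Wilk test of normality for each group, followed by a Wilcoxon rank-sum test between groups. The vectors grp1 and grp2 are simulated and hypothetical, and the results are collected in a one-row data frame (result) instead of being printed test by test.

```r
# Hypothetical measurements for two groups of unequal size
set.seed(42)
grp1 <- rnorm(30, mean = 17.8, sd = 1.2)
grp2 <- rnorm(13, mean = 15.3, sd = 0.7)

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
norm1 <- shapiro.test(grp1)
norm2 <- shapiro.test(grp2)

# Wilcoxon rank-sum test: the null hypothesis is no location shift
wt <- wilcox.test(x = grp1, y = grp2)

# Collect the key numbers in one row
result <- data.frame(
  shapiro.p.1 = norm1$p.value,
  shapiro.p.2 = norm2$p.value,
  W           = unname(wt$statistic),
  wilcox.p    = wt$p.value
)
print(result)
```

Returning a data frame makes it easier to stack results across characters and subspecies pairs than reading the printed test summaries one by one.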

#colnames(x6)
morphocols=7:12

wilcox.sunbird=function(input,ssp1,ssp2,morphocols){
  #Define groups
  w1=input[which(input$Subspecies==ssp1),]
  w2=input[which(input$Subspecies==ssp2),]
  
  print(paste0("COMPARISONS OF: ",ssp1, " & ",ssp2))
    
  for(i in morphocols){
    print(paste0("For ",colnames(input[i]),":"))
    
    #Test each character
    ##Define vector
    a=w1[,i]
    b=w2[,i]
    
    #perform test of normality
    ##Null hypothesis is from normal distribution
    
    a.shapiro=shapiro.test(a)
    if(a.shapiro$p.value>0.05){
      print(paste0("For ",ssp1,": failure to reject normality."))}else{
        print(paste0("For ",ssp1,": NON NORMAL."))
      }
    b.shapiro=shapiro.test(b)
    if(b.shapiro$p.value>0.05){
      print(paste0("For ",ssp2,": failure to reject normality."))}else{
        print(paste0("For ",ssp2,": NON NORMAL."))
      }
    
    #Wilcoxon test
    
    a.b.wilco=wilcox.test(x=a,y=b)
    print(a.b.wilco)
  }
}

We can now run this test function across the data set.

#Subspecies:
##genderuensis
##preussi
##parvirostris
##reichenowi

wilcox.sunbird(input=m.reich,ssp1="genderuensis",ssp2="preussi",morphocols=morphocols)

## [1] "COMPARISONS OF: genderuensis & preussi"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 187.5, p-value = 6.919e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 349, p-value = 0.001428
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 57.5, p-value = 5.165e-08
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: NON NORMAL."
## [1] "For preussi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 300, p-value = 0.0003862
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 144, p-value = 1.849e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 422.5, p-value = 0.009604
## alternative hypothesis: true location shift is not equal to 0

wilcox.sunbird(input=m.reich,ssp1="genderuensis",ssp2="reichenowi",morphocols=morphocols)

## [1] "COMPARISONS OF: genderuensis & reichenowi"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 993, p-value = 0.005256
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 827, p-value = 0.1885
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 1129.5, p-value = 8.53e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: NON NORMAL."
## [1] "For reichenowi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 543.5, p-value = 0.2522
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 748, p-value = 0.5351
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 922, p-value = 0.03323
## alternative hypothesis: true location shift is not equal to 0

wilcox.sunbird(input=m.reich,ssp1="genderuensis",ssp2="parvirostris",morphocols=morphocols)

## [1] "COMPARISONS OF: genderuensis & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 76, p-value = 0.003804
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 131, p-value = 0.1999
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 16, p-value = 4.394e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 72, p-value = 0.002918
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 27.5, p-value = 2.043e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 70, p-value = 0.002428
## alternative hypothesis: true location shift is not equal to 0

wilcox.sunbird(input=m.reich,ssp1="preussi",ssp2="reichenowi",morphocols=morphocols)

## [1] "COMPARISONS OF: preussi & reichenowi"
## [1] "For Right.wing.chord:"
## [1] "For preussi: NON NORMAL."
## [1] "For reichenowi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 11486, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 9792, p-value = 1.097e-15
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 11982, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 8364, p-value = 7.527e-07
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 11002, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 10136, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

wilcox.sunbird(input=m.reich,ssp1="preussi",ssp2="parvirostris",morphocols=morphocols)

## [1] "COMPARISONS OF: preussi & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For preussi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 1876, p-value = 0.1044
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 1888, p-value = 0.09505
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 2341.5, p-value = 6.395e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 1708, p-value = 0.4653
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 1577, p-value = 0.9568
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 1292, p-value = 0.1583
## alternative hypothesis: true location shift is not equal to 0

wilcox.sunbird(input=m.reich,ssp1="reichenowi",ssp2="parvirostris",morphocols=morphocols)

## [1] "COMPARISONS OF: reichenowi & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For reichenowi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 320, p-value = 4.59e-10
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For reichenowi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 831.5, p-value = 0.001063
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 25, p-value = 4.372e-15
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 942, p-value = 0.008626
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For reichenowi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 178, p-value = 3.085e-12
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 277, p-value = 1.456e-10
## alternative hypothesis: true location shift is not equal to 0

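We can perform a linear discriminant analysis (LDA) with leave-one-out cross-validation to see how predictable these subspecies are. For this test, parvirostris is merged into preussi.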

#head(x6)

x7=x3[x3$Subspecies!='Unknown',]

x7$Subspecies[which(x7$Subspecies=="parvirostris")]="preussi"

#Remove 'ghost' groups
x7$Subspecies=as.character(x7$Subspecies)
x7$Subspecies=as.factor(x7$Subspecies)

lda.x2=lda(Subspecies~PC1+PC2+PC3,data=x7,CV=T)

#print(lda.x2)
summary(lda.x2)

##           Length Class  Mode   
## class     260    factor numeric
## posterior 780    -none- numeric
## terms       3    terms  call   
## call        4    -none- call   
## xlevels     0    -none- list

#Check predictions

ct=table(x7$Subspecies,lda.x2$class)
print(ct)

##               
##                genderuensis preussi reichenowi
##   genderuensis            1       2         10
##   preussi                 0     136          7
##   reichenowi              0       2        102

diag(prop.table(ct,1))

## genderuensis      preussi   reichenowi 
##   0.07692308   0.95104895   0.98076923

sum(diag(prop.table(ct)))

## [1] 0.9192308
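A note on the CV option: with CV=TRUE, lda from MASS returns leave-one-out cross-validated assignments (class) and posterior probabilities (posterior) rather than a fitted model, which is why summary lists only those components. The sketch below walks through the same steps on R's built-in iris data as a stand-in; the iris variables are not part of this study.

```r
library(MASS)

# Stand-in data: the built-in iris measurements (not the sunbird data)
data(iris)

# CV = TRUE returns leave-one-out predictions instead of a fitted model
fit <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length,
           data = iris, CV = TRUE)

# Confusion table: rows are observed groups, columns are predicted groups
ct <- table(observed = iris$Species, predicted = fit$class)
print(ct)

# Proportion of each observed group that was assigned correctly
per_group <- diag(prop.table(ct, 1))
print(per_group)

# Overall proportion of individuals assigned correctly
overall <- sum(diag(prop.table(ct)))
print(overall)
```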

We can also do a test of only genderuensis and preussi.

#head(x6)

x7=x3[x3$Subspecies!='Unknown',]
xy=x7[x7$Subspecies!='reichenowi',]
xy$Subspecies[which(xy$Subspecies=="parvirostris")]="preussi"

#Remove 'ghost' groups
xy$Subspecies=as.character(xy$Subspecies)
xy$Subspecies=as.factor(xy$Subspecies)

lda.x2=lda(Subspecies~PC1+PC2+PC3,data=xy,CV=T)

#print(lda.x2)
summary(lda.x2)

##           Length Class  Mode   
## class     156    factor numeric
## posterior 312    -none- numeric
## terms       3    terms  call   
## call        4    -none- call   
## xlevels     0    -none- list

#Check predictions

ct=table(xy$Subspecies,lda.x2$class)
print(ct)

##               
##                genderuensis preussi
##   genderuensis            7       6
##   preussi                 0     143

diag(prop.table(ct,1))

## genderuensis      preussi 
##    0.5384615    1.0000000

sum(diag(prop.table(ct)))

## [1] 0.9615385
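These proportions can be checked by hand from the printed table. The sketch below rebuilds the table from the counts shown above and also computes a simple reference point that is not part of the original analysis: the accuracy obtained by assigning every bird to the larger group (preussi).

```r
# Counts copied from the confusion table printed above
ct <- matrix(c(7,   6,
               0, 143),
             nrow = 2, byrow = TRUE,
             dimnames = list(observed  = c("genderuensis", "preussi"),
                             predicted = c("genderuensis", "preussi")))

# Per-group accuracy: 7/13 for genderuensis and 143/143 for preussi
per_group <- diag(ct) / rowSums(ct)
print(per_group)

# Overall accuracy: 150 of 156 birds
overall <- sum(diag(ct)) / sum(ct)
print(overall)

# Reference point: always guessing the larger group gets 143 of 156 right
baseline <- max(rowSums(ct)) / sum(ct)
print(baseline)
```

Because preussi makes up 143 of the 156 birds, the overall value is dominated by that group.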

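The cross-validated assignments are 100% correct for preussi, but only ~54% correct (7 of 13) for genderuensis. This may be related to the limited representation of genderuensis (13 birds versus 143 preussi), so below I repeatedly subsample preussi down to the genderuensis sample size and rerun the LDA.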

#summary(xy)

xypreuss=xy[xy$Subspecies=='preussi',]
xygend=xy[xy$Subspecies=='genderuensis',]

#One row per iteration of the loop below
jack=as.data.frame(matrix(nrow=1000,ncol=3))
colnames(jack)=c('PREUSS','GEND','SUM')

for(i in 1:1000){
  rows=sample(nrow(xypreuss),nrow(xygend))
  r.x=xypreuss[rows,]
  
  new.x=rbind(r.x,xygend)
  lda.x2=lda(Subspecies~PC1+PC2+PC3,data=new.x,CV=T)

  ct=table(new.x$Subspecies,lda.x2$class)
  
  x.tab=diag(prop.table(ct,1))
  jack[i,2]=x.tab[1]
  jack[i,1]=x.tab[2]
  jack[i,3]=sum(diag(prop.table(ct)))
}

summary(jack)

##      PREUSS            GEND             SUM        
##  Min.   :0.6154   Min.   :0.6923   Min.   :0.6923  
##  1st Qu.:0.8462   1st Qu.:0.8462   1st Qu.:0.8462  
##  Median :0.8462   Median :0.9231   Median :0.8846  
##  Mean   :0.8695   Mean   :0.9146   Mean   :0.8920  
##  3rd Qu.:0.9231   3rd Qu.:1.0000   3rd Qu.:0.9231  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000

j.p=cbind(jack[,1],"preuss")
j.g=cbind(jack[,2],"gend")
j.s=cbind(jack[,3],"sum")

jack2=rbind(j.p,j.g)

jack2=as.data.frame(jack2)
colnames(jack2)=c("Value","Population")
jack2[,1]=as.numeric(as.character(jack2[,1]))
jack2[,2]=as.factor(jack2[,2])

summary(jack2[jack2$Population=='preuss',])

##      Value         Population  
##  Min.   :0.6154   gend  :   0  
##  1st Qu.:0.8462   preuss:1000  
##  Median :0.8462                
##  Mean   :0.8695                
##  3rd Qu.:0.9231                
##  Max.   :1.0000

summary(jack2[jack2$Population=='gend',])

##      Value         Population  
##  Min.   :0.6923   gend  :1000  
##  1st Qu.:0.8462   preuss:   0  
##  Median :0.9231                
##  Mean   :0.9146                
##  3rd Qu.:1.0000                
##  Max.   :1.0000

On average, we correctly identify 87.0% of preussi and 91.5% of genderuensis. This is a good indication that these groups are separating.

A more informative metric is the overall proportion of birds assigned correctly, shown below.

j.s=as.data.frame(j.s)

colnames(j.s)=c("Value","Population")
j.s[,1]=as.numeric(as.character(j.s[,1]))
j.s[,2]=as.factor(j.s[,2])

summary(j.s)

##      Value        Population
##  Min.   :0.6923   sum:1000  
##  1st Qu.:0.8462             
##  Median :0.8846             
##  Mean   :0.8920             
##  3rd Qu.:0.9231             
##  Max.   :1.0000

As shown above, subsampling preussi to the same sample size as genderuensis greatly improves the recovery of genderuensis (from ~54% to ~91% of birds correctly assigned), at the cost of some accuracy for preussi. On average, if we have just a few birds, we can identify which group they belong to with 89.2% accuracy.
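Because sample() draws different rows on each run, these summaries change slightly every time the document is knit. A minimal sketch of the same subsampling idea with a fixed seed is given here; it uses the built-in iris data as a stand-in, and the function name balanced_lda_accuracy and its arguments are hypothetical illustrations rather than code used for the manuscript.

```r
library(MASS)

# Hypothetical helper: subsample the larger group down to the size of the
# smaller group, run a leave-one-out LDA, and return the overall accuracy
balanced_lda_accuracy <- function(large, small, formula, group_col) {
  rows <- sample(nrow(large), nrow(small))
  dat <- rbind(large[rows, ], small)
  # Drop unused factor levels so that empty groups are not passed to lda
  dat[[group_col]] <- droplevels(as.factor(dat[[group_col]]))
  fit <- lda(formula, data = dat, CV = TRUE)
  mean(as.character(fit$class) == as.character(dat[[group_col]]))
}

# Stand-in data: 50 versicolor versus only 13 virginica
large <- iris[iris$Species == "versicolor", ]
small <- iris[iris$Species == "virginica", ][1:13, ]

set.seed(42)  # fixing the seed makes the resampling reproducible
acc <- replicate(200, balanced_lda_accuracy(large, small,
                                            Species ~ Sepal.Length + Petal.Length,
                                            "Species"))
summary(acc)
```

Calling set.seed before the loop in this appendix would make the reported means repeatable in the same way.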

We can also look at the accuracy of diagnosing only two separate groups, for the east and the west.

x8=x7[which(x7$Subspecies!='genderuensis'),]

#Remove 'ghost' group
x8$Subspecies=as.character(x8$Subspecies)
x8$Subspecies=as.factor(x8$Subspecies)

lda.x2=lda(Subspecies~PC1+PC2+PC3,data=x8,CV=T)

#print(lda.x2)
summary(lda.x2)

##           Length Class  Mode   
## class     247    factor numeric
## posterior 741    -none- numeric
## terms       3    terms  call   
## call        4    -none- call   
## xlevels     0    -none- list

#Check predictions

ct=table(x8$Subspecies,lda.x2$class)
print(ct)

##               
##                parvirostris preussi reichenowi
##   parvirostris            0      26          1
##   preussi                 0     109          7
##   reichenowi              0       2        102

diag(prop.table(ct,1))

## parvirostris      preussi   reichenowi 
##    0.0000000    0.9396552    0.9807692

sum(diag(prop.table(ct)))

## [1] 0.854251

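Note that x7 was rebuilt above without merging parvirostris into preussi, so the table still lists parvirostris as its own group, and the printed overall value (85.4%) counts every parvirostris assigned to preussi as an error. Treating both as the western population, 135 of 143 western birds and 102 of 104 eastern birds are assigned correctly (237 of 247, or 96.0%). Removing genderuensis, we can therefore separate these two populations with over 95% accuracy based on morphological characters.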
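The printed table keeps parvirostris as its own level, whereas the east/west question treats parvirostris and preussi as a single western group. As a check, the sketch here rebuilds the table from the printed counts and collapses it by region; the region labels are shorthand introduced only for this illustration.

```r
# Counts copied from the confusion table printed above
groups <- c("parvirostris", "preussi", "reichenowi")
ct <- matrix(c(0,  26,   1,
               0, 109,   7,
               0,   2, 102),
             nrow = 3, byrow = TRUE,
             dimnames = list(observed = groups, predicted = groups))

# Shorthand: west = preussi plus parvirostris, east = reichenowi
region <- c(parvirostris = "west", preussi = "west", reichenowi = "east")

# Sum rows by observed region, then columns by predicted region
by_obs <- rowsum(ct, group = region[rownames(ct)])
collapsed <- t(rowsum(t(by_obs), group = region[colnames(ct)]))
print(collapsed)

# Accuracy per region and overall after collapsing
per_region <- diag(collapsed) / rowSums(collapsed)
overall <- sum(diag(collapsed)) / sum(collapsed)
print(per_region)
print(overall)
```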

Female Sunbirds

summary(f.reich)

##        Species           Subspecies   Collection        Catalog   
##  regius    :  0   genderuensis: 7   NHMUK  :48   118562     :  1  
##  reichenowi:120   parvirostris:10   AMNH   :19   139823     :  1  
##                   preussi     :61   ZFMK   :16   145815     :  1  
##                   regius      : 0   MNMH   :14   145817     :  1  
##                   reichenowi  :42   RMCA   :12   146100     :  1  
##                   Unknown     : 0   CM     : 5   1887.3.7.35:  1  
##                                     (Other): 6   (Other)    :114  
##              Locality2       Sex      Right.wing.chord  Tail.length   
##  Mt Cameroon      :29          :  0   Min.   :45.00    Min.   : 8.18  
##  Bamenda Highlands:12   Femae  :  0   1st Qu.:51.00    1st Qu.:33.00  
##  Bioko            :10   Female :120   Median :52.00    Median :35.00  
##  Rwenzori Mts     :10   Male   :  0   Mean   :52.38    Mean   :35.03  
##  Mt Manengouba    : 7   Unknown:  0   3rd Qu.:54.25    3rd Qu.:37.00  
##  Mt Oku           : 6                 Max.   :57.00    Max.   :43.00  
##  (Other)          :46                                                 
##  Culmen.length   Bill.depth..base.of.feathers.on.mandible.
##  Min.   :11.95   Min.   :2.200                            
##  1st Qu.:13.89   1st Qu.:2.658                            
##  Median :15.48   Median :2.780                            
##  Mean   :15.29   Mean   :2.805                            
##  3rd Qu.:16.54   3rd Qu.:2.950                            
##  Max.   :19.74   Max.   :3.510                            
##                                                           
##  Bill.width..base.of.feathers.on.maxilla.  Left.Tarsus   
##  Min.   :3.200                            Min.   : 9.08  
##  1st Qu.:4.228                            1st Qu.:11.80  
##  Median :4.460                            Median :12.77  
##  Mean   :4.432                            Mean   :12.71  
##  3rd Qu.:4.680                            3rd Qu.:13.57  
##  Max.   :5.320                            Max.   :15.43  
##

This section repeats the above analyses for adult female sunbirds only.

##       PC1 
## 0.4556222 
##       PC2 
## 0.1982091 
##       PC3 
## 0.1423943 
##        PC4 
## 0.08265578 
##        PC5 
## 0.06668558 
##        PC6 
## 0.05443294

Unsurprisingly, the PCA contributions for females only are almost identical to those for the whole dataset.
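To see how much variation the first few axes capture together, the proportions printed above can be accumulated. The values in this sketch are copied from that output; the first three axes are the ones used as predictors in the LDA formula earlier in this document.

```r
# Proportion of variance per axis, copied from the output above
prop_var <- c(PC1 = 0.4556222, PC2 = 0.1982091, PC3 = 0.1423943,
              PC4 = 0.08265578, PC5 = 0.06668558, PC6 = 0.05443294)

# Running total across axes; PC1 to PC3 together account for about 79.6%
cum_var <- cumsum(prop_var)
print(round(cum_var, 3))
```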

## [1] "For Right.wing.chord: PC1: 0.216"
## [1] "For Tail.length: PC1: 0.0813"
## [1] "For Culmen.length: PC1: 0.223"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC1: 0.106"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC1: 0.201"
## [1] "For Left.Tarsus: PC1: 0.172"
## [1] "For Right.wing.chord: PC2: -0.066"
## [1] "For Tail.length: PC2: -0.302"
## [1] "For Culmen.length: PC2: 0.079"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC2: -0.27"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC2: 0.0458"
## [1] "For Left.Tarsus: PC2: 0.237"
## [1] "For Right.wing.chord: PC3: 0.0627"
## [1] "For Tail.length: PC3: 0.35"
## [1] "For Culmen.length: PC3: 0.0216"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC3: -0.341"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC3: -0.132"
## [1] "For Left.Tarsus: PC3: 0.093"

biplot(rda.x)

Again, the biplot and contributions are similar to those for all individuals.

#Plot reichenowi only, flawed loadings
a=ggplot(x3[which(x3$Species=="reichenowi"),],aes(x=PC1,y=PC2,colour=Subspecies))
b=geom_point()
c=theme_classic()
d=stat_ellipse()

print(a+b+c+d)

We can also look at individual boxplots of the data to see how they behave for females.

for(i in 7:12){
  ave=colnames(f.reich)[i]
  
    print(paste0("Working: ",ave))
  
  for(k in 1:(length(ssps.x)-1)){
    ssp1=ssps.x[k]
    ssp2=ssps.x[k+1]
    
    ssp1.x=f.reich[which(f.reich$Subspecies==ssp1),ave]
    ssp2.x=f.reich[which(f.reich$Subspecies==ssp2),ave]
    
    mu1=mean(ssp1.x)
    mu2=mean(ssp2.x)
    
    sd1=sd(ssp1.x)
    sd2=sd(ssp2.x)
    
    n1=length(ssp1.x)
    n2=length(ssp2.x)
    
    print(paste0("Summary stats: ",
                 ssp1," vs. ",ssp2))
    print(paste0(ssp1,": ","Avg: ",round(mu1,2)," SD: ",round(sd1,2)," #: ",n1))
    print(paste0(ssp2,": ","Avg: ",round(mu2,2)," SD: ",round(sd2,2)," #: ",n2))
    
    percent.diff=round(abs(((mu1/mu2)*100)-100),2)
    
    print(paste0("Difference: ",percent.diff,"%"))
  }

  a=ggplot(f.reich,aes(y=f.reich[,i],x=Subspecies))
  b=geom_boxplot()
  c=theme_classic()
  d=ylab(print(ave))

  print(a+b+c+d)
}

## [1] "Working: Right.wing.chord"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 49.9 SD: 2.16 #: 42"
## [1] "preussi: Avg: 53.89 SD: 1.64 #: 61"
## [1] "Difference: 7.39%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 53.89 SD: 1.64 #: 61"
## [1] "genderuensis: Avg: 51.71 SD: 0.95 #: 7"
## [1] "Difference: 4.2%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 51.71 SD: 0.95 #: 7"
## [1] "parvirostris: Avg: 54 SD: 1.83 #: 10"
## [1] "Difference: 4.23%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 54 SD: 1.83 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Right.wing.chord"

## [1] "Working: Tail.length"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 34.19 SD: 2.65 #: 42"
## [1] "preussi: Avg: 35.92 SD: 4.36 #: 61"
## [1] "Difference: 4.82%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 35.92 SD: 4.36 #: 61"
## [1] "genderuensis: Avg: 34.71 SD: 2.29 #: 7"
## [1] "Difference: 3.48%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 34.71 SD: 2.29 #: 7"
## [1] "parvirostris: Avg: 33.4 SD: 2.07 #: 10"
## [1] "Difference: 3.93%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 33.4 SD: 2.07 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Tail.length"

## [1] "Working: Culmen.length"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 13.72 SD: 1.01 #: 42"
## [1] "preussi: Avg: 16.4 SD: 1.08 #: 61"
## [1] "Difference: 16.36%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 16.4 SD: 1.08 #: 61"
## [1] "genderuensis: Avg: 14.21 SD: 0.92 #: 7"
## [1] "Difference: 15.44%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 14.21 SD: 0.92 #: 7"
## [1] "parvirostris: Avg: 15.79 SD: 0.63 #: 10"
## [1] "Difference: 10.02%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 15.79 SD: 0.63 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Culmen.length"

## [1] "Working: Bill.depth..base.of.feathers.on.mandible."
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 2.7 SD: 0.22 #: 42"
## [1] "preussi: Avg: 2.87 SD: 0.22 #: 61"
## [1] "Difference: 5.75%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 2.87 SD: 0.22 #: 61"
## [1] "genderuensis: Avg: 2.83 SD: 0.25 #: 7"
## [1] "Difference: 1.24%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 2.83 SD: 0.25 #: 7"
## [1] "parvirostris: Avg: 2.81 SD: 0.22 #: 10"
## [1] "Difference: 0.79%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 2.81 SD: 0.22 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Bill.depth..base.of.feathers.on.mandible."

## [1] "Working: Bill.width..base.of.feathers.on.maxilla."
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 4.15 SD: 0.38 #: 42"
## [1] "preussi: Avg: 4.64 SD: 0.3 #: 61"
## [1] "Difference: 10.5%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 4.64 SD: 0.3 #: 61"
## [1] "genderuensis: Avg: 4.26 SD: 0.2 #: 7"
## [1] "Difference: 8.94%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 4.26 SD: 0.2 #: 7"
## [1] "parvirostris: Avg: 4.5 SD: 0.2 #: 10"
## [1] "Difference: 5.39%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 4.5 SD: 0.2 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Bill.width..base.of.feathers.on.maxilla."

## [1] "Working: Left.Tarsus"
## [1] "Summary stats: reichenowi vs. preussi"
## [1] "reichenowi: Avg: 11.88 SD: 1.14 #: 42"
## [1] "preussi: Avg: 13.4 SD: 1.04 #: 61"
## [1] "Difference: 11.32%"
## [1] "Summary stats: preussi vs. genderuensis"
## [1] "preussi: Avg: 13.4 SD: 1.04 #: 61"
## [1] "genderuensis: Avg: 11.81 SD: 0.3 #: 7"
## [1] "Difference: 13.42%"
## [1] "Summary stats: genderuensis vs. parvirostris"
## [1] "genderuensis: Avg: 11.81 SD: 0.3 #: 7"
## [1] "parvirostris: Avg: 12.67 SD: 0.89 #: 10"
## [1] "Difference: 6.74%"
## [1] "Summary stats: parvirostris vs. Unknown"
## [1] "parvirostris: Avg: 12.67 SD: 0.89 #: 10"
## [1] "Unknown: Avg: NaN SD: NA #: 0"
## [1] "Difference: NaN%"
## [1] "Left.Tarsus"
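The percent difference printed in the loop above is computed relative to the second group's mean, so it is not symmetric: swapping the two groups gives a slightly different value. The sketch here isolates that calculation in a small helper. The names percent_diff and percent_diff_sym are hypothetical, and the symmetric variant is only an illustrative alternative, not something used in the manuscript. It also shows why the comparisons against the empty 'Unknown' level print NaN.

```r
# Same formula as in the loop: difference expressed relative to mu2
percent_diff <- function(mu1, mu2) {
  abs((mu1 / mu2) * 100 - 100)
}

# Illustrative symmetric alternative: difference relative to the pair mean
percent_diff_sym <- function(mu1, mu2) {
  abs(mu1 - mu2) / ((mu1 + mu2) / 2) * 100
}

# Made-up means chosen only to make the asymmetry obvious
percent_diff(50, 40)      # relative to 40
percent_diff(40, 50)      # relative to 50
percent_diff_sym(50, 40)  # identical whichever group is listed first

# The mean of an empty group is NaN, which propagates to the difference
empty_mean <- mean(numeric(0))
percent_diff(50, empty_mean)
```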

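It looks like the most extreme divergences (in female sunbirds) are for bill length and bill width, but there is also more variation in wing length than there is for males (or so it appears to the naked eye). The percent differences printed above also show sizeable divergence in tarsus length (roughly 7-13% between adjacent groups).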

a=ggplot(f.reich,aes(x=Culmen.length,y=Bill.width..base.of.feathers.on.maxilla.,colour=Subspecies))
b=geom_point()
c=theme_classic()
d=stat_ellipse()

print(a+b+c+d)

Bill differences among females are not as pronounced as those among males.

We can perform iterative Wilcoxon rank-sum tests of the data to understand how distinct these individual variables are for each population.

#Subspecies:
##genderuensis
##preussi
##parvirostris
##reichenowi

wilcox.sunbird(input=f.reich,ssp1="genderuensis",ssp2="preussi",morphocols=morphocols)

## [1] "COMPARISONS OF: genderuensis & preussi"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 65.5, p-value = 0.00247
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 131.5, p-value = 0.09778
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 13, p-value = 5.426e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 214, p-value = 1
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 63.5, p-value = 0.002547
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For preussi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 33.5, p-value = 0.0002916
## alternative hypothesis: true location shift is not equal to 0
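The normality messages printed before each test come from wilcox.sunbird, which is defined earlier in this document. As a rough illustration only, and assuming the check is a Shapiro-Wilk test at a 0.05 threshold (the function definition should be consulted for the test actually used), the labelling step could be sketched as follows. The helper name normality_label and the example vectors are hypothetical.

```r
# Hypothetical helper mirroring the two messages printed above
normality_label <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value
  if (p < alpha) "NON NORMAL." else "failure to reject normality."
}

# Made-up examples: evenly spaced normal quantiles versus a skewed sample
normal_like <- qnorm(ppoints(30))
skewed <- c(rep(1, 18), 50, 100)

normality_label(normal_like)
normality_label(skewed)
```

In every comparison shown here, a Wilcoxon rank-sum test is reported regardless of the normality result.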

wilcox.sunbird(input=f.reich,ssp1="genderuensis",ssp2="reichenowi",morphocols=morphocols)

## [1] "COMPARISONS OF: genderuensis & reichenowi"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 241, p-value = 0.006668
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 161.5, p-value = 0.6868
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 207.5, p-value = 0.08641
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 200.5, p-value = 0.1297
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 173.5, p-value = 0.4575
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 143, p-value = 0.9203
## alternative hypothesis: true location shift is not equal to 0

wilcox.sunbird(input=f.reich,ssp1="genderuensis",ssp2="parvirostris",morphocols=morphocols)

## [1] "COMPARISONS OF: genderuensis & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 11.5, p-value = 0.02102
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 43.5, p-value = 0.4275
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test
## 
## data:  a and b
## W = 3, p-value = 0.0007199
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 38, p-value = 0.8067
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 13.5, p-value = 0.0403
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For genderuensis: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test
## 
## data:  a and b
## W = 17, p-value = 0.08782
## alternative hypothesis: true location shift is not equal to 0
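The ties warning and the two different test titles in this block have the same cause. As I understand wilcox.test in the R version used here, when both samples have fewer than 50 observations it attempts an exact p-value; if the pooled data contain tied values, it instead falls back to the normal approximation with continuity correction and issues the warning. Comparisons without ties (culmen length and tarsus above) are reported simply as "Wilcoxon rank sum test". The made-up vectors in this sketch illustrate both cases; setting exact=FALSE requests the approximation directly.

```r
# Made-up measurements; the value 2 occurs in both samples (a tie)
x <- c(1, 2, 2, 3)
y <- c(2, 4, 5, 6)

# Request the normal approximation directly; no exact p-value is attempted
approx_res <- wilcox.test(x, y, exact = FALSE)
print(approx_res$p.value)

# Without ties and with small samples, an exact p-value is computed
x2 <- c(1.1, 2.3, 3.2)
y2 <- c(4.5, 5.7, 6.4)
exact_res <- wilcox.test(x2, y2)
print(exact_res$p.value)
```

The p-values that carry the warning are therefore approximate rather than exact, which matters most for the smallest groups.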

wilcox.sunbird(input=f.reich,ssp1="preussi",ssp2="reichenowi",morphocols=morphocols)

## [1] "COMPARISONS OF: preussi & reichenowi"
## [1] "For Right.wing.chord:"
## [1] "For preussi: NON NORMAL."
## [1] "For reichenowi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 2394, p-value = 5.359e-14
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For preussi: NON NORMAL."
## [1] "For reichenowi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 1848, p-value = 0.0001306
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 2450, p-value = 4.432e-15
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 1777.5, p-value = 0.0008702
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 2131.5, p-value = 1.164e-08
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For preussi: failure to reject normality."
## [1] "For reichenowi: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 2137, p-value = 9.391e-09
## alternative hypothesis: true location shift is not equal to 0

wilcox.sunbird(input=f.reich,ssp1="preussi",ssp2="parvirostris",morphocols=morphocols)

## [1] "COMPARISONS OF: preussi & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For preussi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 288.5, p-value = 0.7875
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For preussi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 493, p-value = 0.001815
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 408, p-value = 0.09019
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 370, p-value = 0.2861
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 367, p-value = 0.3093
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For preussi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 422.5, p-value = 0.05312
## alternative hypothesis: true location shift is not equal to 0

wilcox.sunbird(input=f.reich,ssp1="reichenowi",ssp2="parvirostris",morphocols=morphocols)

## [1] "COMPARISONS OF: reichenowi & parvirostris"
## [1] "For Right.wing.chord:"
## [1] "For reichenowi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 28, p-value = 2.051e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Tail.length:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 246, p-value = 0.4055
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Culmen.length:"
## [1] "For reichenowi: NON NORMAL."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 31, p-value = 3.397e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.depth..base.of.feathers.on.mandible.:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 172.5, p-value = 0.3901
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Bill.width..base.of.feathers.on.maxilla.:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: NON NORMAL."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 81, p-value = 0.002843
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "For Left.Tarsus:"
## [1] "For reichenowi: failure to reject normality."
## [1] "For parvirostris: failure to reject normality."

## Warning in wilcox.test.default(x = a, y = b): cannot compute exact p-value with
## ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 125, p-value = 0.04975
## alternative hypothesis: true location shift is not equal to 0

We can perform a linear discriminant analysis (LDA) with leave-one-out cross-validation to see how predictable these subspecies are.

#head(x6)

x7=x3[x3$Subspecies!='Unknown',]

x7$Subspecies[which(x7$Subspecies=="parvirostris")]="preussi"

#Remove 'ghost' groups
x7$Subspecies=as.character(x7$Subspecies)
x7$Subspecies=as.factor(x7$Subspecies)

lda.x2=lda(Subspecies~PC1+PC2+PC3,data=x7,CV=T)

#print(lda.x2)
summary(lda.x2)

##           Length Class  Mode   
## class     120    factor numeric
## posterior 360    -none- numeric
## terms       3    terms  call   
## call        4    -none- call   
## xlevels     0    -none- list

#Check predictions

ct=table(x7$Subspecies,lda.x2$class)
print(ct)

##               
##                genderuensis preussi reichenowi
##   genderuensis            0       2          5
##   preussi                 0      68          3
##   reichenowi              0       4         38

diag(prop.table(ct,1))

## genderuensis      preussi   reichenowi 
##    0.0000000    0.9577465    0.9047619

sum(diag(prop.table(ct)))

## [1] 0.8833333

Similar to the males, genderuensis gets lost in the variation of the other two (fairly well defined) populations.
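The per-group and overall success rates above follow directly from the cross-validated confusion table. As a check, the sketch below rebuilds that table by hand and repeats the two calculations. The object names (`ct_check`, `per_group`, `overall`) are ours and are not part of the original analysis.

```r
# Rebuild the leave-one-out confusion table printed above
# (rows = assigned subspecies, columns = LDA prediction)
groups <- c("genderuensis", "preussi", "reichenowi")
ct_check <- matrix(c(0,  2,  5,
                     0, 68,  3,
                     0,  4, 38),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(observed = groups, predicted = groups))

# Proportion of each observed group that was classified correctly
per_group <- diag(prop.table(ct_check, 1))

# Proportion of all individuals that were classified correctly
overall <- sum(diag(prop.table(ct_check)))

print(round(per_group, 4))
print(round(overall, 4))
```

Because the row proportions are taken within each observed group, the seven genderuensis individuals contribute a success rate of zero even though the overall rate stays high.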

x8=x7[which(x7$Subspecies!='genderuensis'),]

#Remove 'ghost' group
x8$Subspecies=as.character(x8$Subspecies)
x8$Subspecies=as.factor(x8$Subspecies)

lda.x2=lda(Subspecies~PC1+PC2+PC3,data=x8,CV=T)

#print(lda.x2)
summary(lda.x2)

##           Length Class  Mode   
## class     113    factor numeric
## posterior 226    -none- numeric
## terms       3    terms  call   
## call        4    -none- call   
## xlevels     0    -none- list

#Check predictions

ct=table(x8$Subspecies,lda.x2$class)
print(ct)

##             
##              preussi reichenowi
##   preussi         68          3
##   reichenowi       4         38

diag(prop.table(ct,1))

##    preussi reichenowi 
##  0.9577465  0.9047619

sum(diag(prop.table(ct)))

## [1] 0.9380531

Removing genderuensis, over 90% of individuals are correctly assigned to one of these two populations based on morphological characters.

Now for the random sample part.

We can also do a test of only genderuensis and preussi.

#head(x6)

x7=x3[x3$Subspecies!='Unknown',]
xy=x7[x7$Subspecies!='reichenowi',]
xy$Subspecies[which(xy$Subspecies=="parvirostris")]="preussi"

#Remove 'ghost' groups
xy$Subspecies=as.character(xy$Subspecies)
xy$Subspecies=as.factor(xy$Subspecies)

lda.x2=lda(Subspecies~PC1+PC2+PC3,data=xy,CV=T)

#print(lda.x2)
summary(lda.x2)

##           Length Class  Mode   
## class      78    factor numeric
## posterior 156    -none- numeric
## terms       3    terms  call   
## call        4    -none- call   
## xlevels     0    -none- list

#Check predictions

ct=table(xy$Subspecies,lda.x2$class)
print(ct)

##               
##                genderuensis preussi
##   genderuensis            5       2
##   preussi                 1      70

diag(prop.table(ct,1))

## genderuensis      preussi 
##    0.7142857    0.9859155

sum(diag(prop.table(ct)))

## [1] 0.9615385

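The tests are ~99% successful for preussi (70 of 71), but only ~71% successful for genderuensis (5 of 7). This may be related to limited representation for genderuensis, so the loop below repeatedly subsamples preussi down to the genderuensis sample size.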

#summary(xy)

xypreuss=xy[xy$Subspecies=='preussi',]
xygend=xy[xy$Subspecies=='genderuensis',]

jack=as.data.frame(matrix(nrow=1000,ncol=3))
colnames(jack)=c('PREUSS','GEND','SUM')

for(i in 1:1000){
  rows=sample(nrow(xypreuss),nrow(xygend))
  r.x=xypreuss[rows,]
  
  new.x=rbind(r.x,xygend)
  lda.x2=lda(Subspecies~PC1+PC2+PC3,data=new.x,CV=T)

  ct=table(new.x$Subspecies,lda.x2$class)
  
  x.tab=diag(prop.table(ct,1))
  jack[i,2]=x.tab[1]
  jack[i,1]=x.tab[2]
  jack[i,3]=sum(diag(prop.table(ct)))
}

summary(jack)

##      PREUSS            GEND             SUM        
##  Min.   :0.5714   Min.   :0.7143   Min.   :0.6429  
##  1st Qu.:0.7143   1st Qu.:1.0000   1st Qu.:0.8571  
##  Median :0.8571   Median :1.0000   Median :0.9286  
##  Mean   :0.8573   Mean   :0.9823   Mean   :0.9198  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000

j.p=cbind(jack[,1],"preuss")
j.g=cbind(jack[,2],"gend")
j.s=cbind(jack[,3],"sum")

jack2=rbind(j.p,j.g)

jack2=as.data.frame(jack2)
colnames(jack2)=c("Value","Population")
jack2[,1]=as.numeric(as.character(jack2[,1]))
jack2[,2]=as.factor(jack2[,2])

summary(jack2[jack2$Population=='preuss',])

##      Value         Population  
##  Min.   :0.5714   gend  :   0  
##  1st Qu.:0.7143   preuss:1000  
##  Median :0.8571                
##  Mean   :0.8573                
##  3rd Qu.:1.0000                
##  Max.   :1.0000

summary(jack2[jack2$Population=='gend',])

##      Value         Population  
##  Min.   :0.7143   gend  :1000  
##  1st Qu.:1.0000   preuss:   0  
##  Median :1.0000                
##  Mean   :0.9823                
##  3rd Qu.:1.0000                
##  Max.   :1.0000
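The balanced resampling used above can be sketched on simulated data. Everything in this example is an assumption made for illustration: the group labels (`large`, `small`), the simulated PC scores, the fixed seed, and the choice of 100 iterations. It is not the original dataset. The logic is the same as in the loop above: draw as many individuals from the well-sampled group as exist in the poorly sampled group, fit a cross-validated LDA, and record the success rates.

```r
library(MASS)  # provides lda()

set.seed(1)  # fixed seed so the sketch is repeatable

# Simulated stand-in data: 71 'large' and 7 'small' group individuals
# scored on three PC axes, with the small group shifted along PC1
sim <- data.frame(
  Subspecies = factor(rep(c("large", "small"), times = c(71, 7))),
  PC1 = c(rnorm(71, 0), rnorm(7, 2)),
  PC2 = rnorm(78),
  PC3 = rnorm(78)
)
big   <- sim[sim$Subspecies == "large", ]
small <- sim[sim$Subspecies == "small", ]

n_iter <- 100
acc <- data.frame(LARGE = numeric(n_iter), SMALL = numeric(n_iter),
                  OVERALL = numeric(n_iter))

for (i in seq_len(n_iter)) {
  # Draw as many 'large' individuals as there are 'small' ones
  sub <- rbind(big[sample(nrow(big), nrow(small)), ], small)

  # Leave-one-out cross-validated LDA on the balanced subsample
  fit <- lda(Subspecies ~ PC1 + PC2 + PC3, data = sub, CV = TRUE)
  ct  <- table(sub$Subspecies, fit$class)

  # Per-group and overall proportions classified correctly
  per_group <- diag(prop.table(ct, 1))
  acc$LARGE[i]   <- per_group[["large"]]
  acc$SMALL[i]   <- per_group[["small"]]
  acc$OVERALL[i] <- sum(diag(prop.table(ct)))
}

summary(acc)
```

Preallocating one row per iteration keeps the size of the results table in step with the loop counter.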

Cinnyris regius

We do not have very many individuals of regius:

##           Femae  Female    Male Unknown 
##       0       0       3      17       0

There are three females and seventeen males in the dataset; thus we will look only at males.

##       PC1 
## 0.3911925 
##       PC2 
## 0.2027945 
##       PC3 
## 0.1903053 
##       PC4 
## 0.1255034 
##        PC5 
## 0.06243406 
##        PC6 
## 0.02777019

The PCA variation is surprisingly similar to the equivalent plot for reichenowi.

## [1] "For Right.wing.chord: PC1: -0.208"
## [1] "For Tail.length: PC1: -0.19"
## [1] "For Culmen.length: PC1: 0.0964"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC1: 0.221"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC1: -0.194"
## [1] "For Left.Tarsus: PC1: -0.0905"
## [1] "For Right.wing.chord: PC2: 0.0744"
## [1] "For Tail.length: PC2: -0.0699"
## [1] "For Culmen.length: PC2: 0.294"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC2: 0.105"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC2: 0.0784"
## [1] "For Left.Tarsus: PC2: 0.378"
## [1] "For Right.wing.chord: PC3: -0.208"
## [1] "For Tail.length: PC3: 0.103"
## [1] "For Culmen.length: PC3: 0.236"
## [1] "For Bill.depth..base.of.feathers.on.mandible.: PC3: -0.0684"
## [1] "For Bill.width..base.of.feathers.on.maxilla.: PC3: 0.233"
## [1] "For Left.Tarsus: PC3: -0.152"

biplot(rda.x)

Again, the biplot and contributions are similar for all individuals.

a=ggplot(x3,aes(x=PC1,y=PC2,colour=Locality2))
b=geom_point()
c=theme_classic()
d=stat_ellipse()

print(a+b+c+d)

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Warning: Removed 2 row(s) containing missing values (geom_path).

There is a lot of morphological overlap between these groups, which is not wholly surprising given that we lack a lot of data and we don’t have robust representation for each mountain range.

for(i in 7:12){
  ave=colnames(regius)[i]
  a=ggplot(regius,aes(y=regius[,i],x=Locality2))
  b=geom_boxplot()
  c=theme_classic()
  d=ylab(print(ave))

  print(a+b+c+d)
}

## [1] "Right.wing.chord"

## [1] "Tail.length"

## [1] "Culmen.length"

## [1] "Bill.depth..base.of.feathers.on.mandible."

## [1] "Bill.width..base.of.feathers.on.maxilla."

## [1] "Left.Tarsus"

The bird from the Rwenzori mountains appears to have a much bigger bill, but we lack data to conclusively see any divergences between this population and others.

Ecological Analyses

Now, to look at the ecology of these populations.

rm(list=ls())

x=read.delim("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Ecological Analysis/ebd_ndcsun2_relAug-2018/ebd_ndcsun2_relAug-2018.txt")

## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
## EOF within quoted string

#colnames(x)[c(6,8,9,13,15,26:28,35,36)]
x2=x[-which(x$EFFORT.DISTANCE.KM>=25),
     c("SCIENTIFIC.NAME","SUBSPECIES.SCIENTIFIC.NAME",
       "OBSERVATION.COUNT","COUNTRY",
       "STATE","LATITUDE",
       "LONGITUDE","OBSERVATION.DATE",
       "DURATION.MINUTES","EFFORT.DISTANCE.KM")]

x2=unique(x2)
print(paste0("Removed based on distance (uniques): ",nrow(x)-nrow(x2)))

## [1] "Removed based on distance (uniques): 218"

print(paste0("Records remaining (uniques): ",nrow(x2)))

## [1] "Records remaining (uniques): 1005"

rm(x)
x2$SOURCE="eBird"

After removing long distances (25 km or more), we have 1005 records left. These are only unique records.
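One caveat with the `-which()` idiom used in the filter above: if no row meets the condition, `which()` returns an empty vector and the negative index drops every row. That did not happen here, since records were retained, but a logical index avoids the problem altogether. The toy table below is hypothetical; only the column name is borrowed from the eBird data.

```r
# Toy effort table; the column name mirrors the eBird field used above
toy <- data.frame(id = 1:4, EFFORT.DISTANCE.KM = c(1, 5, NA, 10))

# No row is >= 25 km, so which() is empty and the negative index keeps nothing
dropped_all <- toy[-which(toy$EFFORT.DISTANCE.KM >= 25), ]
print(nrow(dropped_all))

# Logical indexing keeps short and missing distances, which matches what
# -which() does whenever at least one row exceeds the cutoff
keep <- is.na(toy$EFFORT.DISTANCE.KM) | toy$EFFORT.DISTANCE.KM < 25
kept <- toy[keep, ]
print(nrow(kept))
```

Note that records with a missing distance are retained under both approaches, so it may be worth deciding explicitly whether those should stay.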

plot(y=x2$LATITUDE,x=x2$LONGITUDE,pch=19,asp=1)

The above is a map of the occurrence points at a 1:1 aspect ratio; it is immediately obvious that we have the Albertine Rift population, the Kenyan population, and the spread-out West African population. (Strangely, it appears as though birds from northern Uganda and Sudan may have been removed.)

First, we need to subset these points into the groups that we have observed through the genetic data: genderuensis, preussi, and reichenowi. This is based on the genetic data and the understanding that Bamenda Highlands birds appear to be closest to preussi.

#colnames(x)

x=x2
rm(x2)

#set scientific name to character
x$SUBSPECIES.SCIENTIFIC.NAME=as.character(x$SUBSPECIES.SCIENTIFIC.NAME)

#set all western to preussi
x$SUBSPECIES.SCIENTIFIC.NAME[which(x$LONGITUDE<20)]="preussi"

#set all eastern to reichenowi
x$SUBSPECIES.SCIENTIFIC.NAME[which(x$LONGITUDE>20)]="reichenowi"

#single out genderuensis from preussi
x$SUBSPECIES.SCIENTIFIC.NAME[which(x$LATITUDE<4&x$LATITUDE>3&x$LONGITUDE>10)]="genderuensis"
x$SUBSPECIES.SCIENTIFIC.NAME[which(x$LATITUDE>5&x$LONGITUDE>12&x$LONGITUDE<20)]="genderuensis"

x$SUBSPECIES.SCIENTIFIC.NAME=as.factor(x$SUBSPECIES.SCIENTIFIC.NAME)
summary(x$SUBSPECIES.SCIENTIFIC.NAME)

## genderuensis      preussi   reichenowi 
##            9          169          827
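The assignment rules above can be collected into one function, which makes the thresholds easier to audit. This is a sketch only: `assign_group` is a hypothetical helper that is not part of the original script, and the four test coordinates are invented. It mirrors the cut-offs exactly as written, including that a longitude of exactly 20 is left unassigned.

```r
# Hypothetical helper reproducing the longitude/latitude rules used above
assign_group <- function(lon, lat) {
  grp <- rep(NA_character_, length(lon))

  # West of 20 degrees E is preussi, east of it is reichenowi
  grp[which(lon < 20)] <- "preussi"
  grp[which(lon > 20)] <- "reichenowi"

  # genderuensis: latitude 3 to 4 with longitude > 10
  # (no upper longitude bound, as in the original rule)
  grp[which(lat < 4 & lat > 3 & lon > 10)] <- "genderuensis"

  # genderuensis: latitude > 5 with longitude between 12 and 20
  grp[which(lat > 5 & lon > 12 & lon < 20)] <- "genderuensis"

  factor(grp)
}

# Invented coordinates: one per group, plus one on the 20 degree boundary
g <- assign_group(lon = c(9.2, 36.8, 13.5, 20),
                  lat = c(4.2, -1.3, 7.3, 0))
print(g)
```

Later rules overwrite earlier ones, so the order of the statements matters in the same way it does in the original code.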

Adding specimen data from sparse regions

eBird data has excellent georeferencing, but there are still many areas from which no one has submitted eBird data. Thus, I am merging some specimen georeferencing into the database below. I am indebted to Pascal Eckhoff and Sylke Frahnert for making data regarding Riggenbach available for this section.

z=read.csv("~/Dropbox/Manuscripts/Phylogeography-Cinnyris-reichenowi/Georeference_Cinnyris_Specimens.csv")
z1.5=z[,-c(2:6)]
z1.5=unique(z1.5)
z2=z1.5[,c(1,3,2)]
#z2=unique(z[,c("Subspecies","Long","Lat")])
colnames(z2)=c("SUBSPECIES.SCIENTIFIC.NAME",
       "LONGITUDE","LATITUDE")

z2$SUBSPECIES.SCIENTIFIC.NAME=as.factor(as.character(z2$SUBSPECIES.SCIENTIFIC.NAME))

z2$SOURCE="Specimen"

Extracting environmental variables

Next, I will reduce to unique localities and run a rarefaction to remove spatial bias. This code was provided by Dr. Joe Manthey. We are using 30 arcsecond grid cells, so we will thin the data so that all points are more than 3 km from the nearest point. All populations are spatially separated enough to ignore subspecific assignment and run the rarefaction on the entire dataset here; while it is possible that some overlap may exist between genderuensis and preussi, the contact zone has been severely deforested and lacks eBird observations.

dist.test=function(point1long,point1lat,point2long,point2lat){
    dist.rep=deg.dist(point1long,point1lat,point2long,point2lat)
    return(dist.rep)
}

x2=x[,c("SUBSPECIES.SCIENTIFIC.NAME",
       "LONGITUDE","LATITUDE","SOURCE")]
x2=rbind(x2,z2)
x2=unique(x2)
x.n=nrow(x2)

output=x2[1,]
test.point=x2[1,]
#Drop the first row, which is now the test point
x2=x2[2:nrow(x2),]

keep_going=T

while(keep_going==T){
  kg_test=dist.test(x2[,2],x2[,3],test.point[1,2],test.point[1,3])
  x2=x2[kg_test>3,]
  if(nrow(x2)>1){
    output=rbind(output,x2[1,])
    test.point=x2[1,]
    x2=x2[2:nrow(x2),]
    #writeLines(paste("Points remaining:", nrow(x2)))
  }else{
    keep_going=F
  }
  if(nrow(x2)==1){
    output=rbind(output,x2[1,])
  }
}

output=na.omit(output)

summary(output)

##  SUBSPECIES.SCIENTIFIC.NAME   LONGITUDE        LATITUDE      
##  genderuensis:10            Min.   : 8.50   Min.   :-5.0306  
##  preussi     :27            1st Qu.:13.88   1st Qu.:-1.2129  
##  reichenowi  :98            Median :36.66   Median :-0.3858  
##                             Mean   :28.50   Mean   : 0.8227  
##                             3rd Qu.:36.85   3rd Qu.: 3.6031  
##                             Max.   :37.63   Max.   : 8.2104  
##     SOURCE         
##  Length:135        
##  Class :character  
##  Mode  :character  
##                    
##                    
##

This procedure of getting unique localities and thinning localities by distance leaves us with 135 points.
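The thinning procedure above is a greedy filter: take the first point, discard everything within the distance threshold of it, then repeat with the next surviving point. The sketch below shows the same idea in base R under stated assumptions. It substitutes a haversine great-circle distance for `deg.dist` so that it runs without the fossil package, and `haversine_km`, `thin_points`, and the four toy coordinates are all hypothetical.

```r
# Great-circle distance in km (haversine formula; stand-in for deg.dist)
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Greedy thinning: keep a point, then drop all points within min_km of it
thin_points <- function(pts, min_km = 3) {
  keep <- pts[0, ]
  remaining <- pts
  while (nrow(remaining) > 0) {
    focal <- remaining[1, ]
    keep <- rbind(keep, focal)
    d <- haversine_km(remaining$lon, remaining$lat, focal$lon, focal$lat)
    # The focal point is at distance 0, so it is removed here as well
    remaining <- remaining[d > min_km, , drop = FALSE]
  }
  keep
}

# Toy localities: points 2 and 4 lie roughly 1 km from point 1
pts <- data.frame(lon = c(9.20, 9.21, 9.50, 9.201),
                  lat = c(4.20, 4.20, 4.20, 4.21))
thinned <- thin_points(pts)
print(thinned)
```

The result depends on row order, because whichever point comes first in a cluster is the one retained.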

a=ggplot(output,aes(x=LONGITUDE,y=LATITUDE,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()
d=coord_fixed()

plot1=a+b+c+d
print(plot1)

Now, we are going to incorporate some of the ecological layers from the ENVIREM dataset. I am going to extract the environmental layers for each point and then perform a PCA to summarize the variation within these layers and to determine which variables are the most informative for separating these populations.

##Not run in entirety in markdown document
##This can be run separately if so desired

# Isolate odd numbered files
# rasterpath="path/to/envirem_africa/Africa_current-30s/"

#Anchor the regular expression so only files ending in ".bil" are listed
y=list.files(rasterpath,pattern='\\.bil$')

#The following isolates files if "aux" files are present
#y2=1:length(y)
#y3=y2[y2 %% 2!=0]

#y2=y[y3]
#y2=y2[-10]

#Create raster stack of all objects
setwd(rasterpath)
#bils=stack(y[y3])
bils=stack(y)

#Visualize points on plot
##Ensure coordinates read correctly
plot(bils$current_30arcsec_minTempWarmest)
points(output[,-1],pch=19)

#Extract values for point localities from all layers
ext=extract(x=bils,y=output[,-c(1,4)])

#Free up memory
rm(bils)

#Create entire data frame
x=cbind(output,ext)

write.csv(x,
          paste0(filepath,"Ecological Analysis/envirem_extracts.csv"),
          quote=F,row.names=F)

#Perform PCA of environmental data
rda.x=rda(ext,scale=T)
rda.x.data=rda.x$CA$u

eigs=rda.x$CA$eig
w=NULL
for(i in 1:length(eigs)){
  print(eigs[i]/sum(eigs))
  w[i]=eigs[i]/sum(eigs)
}

##       PC1 
## 0.6328363 
##      PC2 
## 0.173279 
##        PC3 
## 0.08782684 
##        PC4 
## 0.03333912 
##        PC5 
## 0.02709639 
##        PC6 
## 0.02229936 
##         PC7 
## 0.009873212 
##        PC8 
## 0.00562697 
##         PC9 
## 0.003272304 
##        PC10 
## 0.002097139 
##        PC11 
## 0.001088964 
##         PC12 
## 0.0006532066 
##         PC13 
## 0.0003753352 
##         PC14 
## 0.0002194165 
##         PC15 
## 9.323964e-05 
##         PC16 
## 2.322726e-05

plot(x=1:length(w),y=w,pch=19,main="PCA Eigenvalues")
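As a quick check on how much variation the leading axes capture, the proportions printed above can be accumulated. The values below are copied from that output; `prop_var`, `cum_var`, and `eigs_toy` are names introduced here for illustration. The first three axes together account for roughly 89% of the variance.

```r
# Proportions of variance for the first six environmental PCs,
# copied from the output above
prop_var <- c(PC1 = 0.6328363, PC2 = 0.173279, PC3 = 0.08782684,
              PC4 = 0.03333912, PC5 = 0.02709639, PC6 = 0.02229936)

# Running total of variance explained
cum_var <- cumsum(prop_var)
print(round(cum_var, 3))

# The same proportions can be computed without a loop;
# eigs_toy is a made-up eigenvalue vector used only to show the form
eigs_toy <- c(4, 3, 2, 1)
print(eigs_toy / sum(eigs_toy))
```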

x=cbind(x,rda.x.data)

colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
names(colorset)=c("reichenowi","preussi",
                  "genderuensis","parvirostris",
                  "Unknown")
colScale=scale_color_manual(name="grp",values=colorset)

a=ggplot(x,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME,shape=SOURCE))
b=theme(panel.background = element_rect(fill="white",color = "grey50"),
        axis.title.x = element_text(size=20),
        axis.title.y = element_text(size=20),
        axis.text.x = element_text(size=15),
        axis.text.y = element_text(size=15),
        legend.title = element_blank(),
        legend.text = element_text(size=15))
c=geom_point(size=1.5)
d=stat_ellipse()
e=colScale

plot1=a+b+c+d+e
print(plot1)

## Too few points to calculate an ellipse

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Too few points to calculate an ellipse

## Warning: Removed 2 row(s) containing missing values (geom_path).

contrib=rda.x$CA$v

#Isolate most important PC's
contrib2=contrib[,1:3]

xx=rowSums(abs(contrib2))
print(xx[order(xx,decreasing = T)])

##           current_30arcsec_continentality 
##                                 1.0103746 
##           current_30arcsec_PETseasonality 
##                                 0.9334945 
##        current_30arcsec_PETWettestQuarter 
##                                 0.8003781 
##           current_30arcsec_minTempWarmest 
##                                 0.7942127 
##                current_30arcsec_embergerQ 
##                                 0.6920052 
## current_30arcsec_aridityIndexThornthwaite 
##                                 0.6633440 
##    current_30arcsec_climaticMoistureIndex 
##                                 0.6297976 
##           current_30arcsec_maxTempColdest 
##                                 0.6139360 
##          current_30arcsec_thermicityIndex 
##                                 0.6115650 
##          current_30arcsec_growingDegDays0 
##                                 0.6055714 
##                current_30arcsec_annualPET 
##                                 0.6009393 
##          current_30arcsec_growingDegDays5 
##                                 0.5993007 
##        current_30arcsec_PETColdestQuarter 
##                                 0.5790410 
##        current_30arcsec_PETWarmestQuarter 
##                                 0.4879567 
##         current_30arcsec_PETDriestQuarter 
##                                 0.4585063 
##       current_30arcsec_monthCountByTemp10 
##                                 0.3241039

From the above, the most important variables for the first three PCs are:

continentality
PETseasonality
PETWettestQuarter
minTempWarmest
embergerQ
aridityIndexThornthwaite
climaticMoistureIndex
maxTempColdest
thermicityIndex

Everything after layer 5 plateaus in terms of its contribution.
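The ranking above sums the absolute loadings of each variable across the first three axes. A minimal sketch of that calculation follows, assuming base R's `prcomp` as a stand-in for `rda` (so that it runs without vegan) and using simulated layers with hypothetical names (`layerA` to `layerD`).

```r
set.seed(42)  # fixed seed so the sketch is repeatable

# Simulated stand-in for the extracted climate layers (names are hypothetical)
n <- 50
temp <- rnorm(n)
env <- data.frame(layerA = temp + rnorm(n, sd = 0.1),
                  layerB = temp + rnorm(n, sd = 0.5),
                  layerC = rnorm(n),
                  layerD = rnorm(n))

# Scaled PCA; prcomp()$rotation holds the variable loadings
pca <- prcomp(env, scale. = TRUE)
loadings <- pca$rotation[, 1:3]

# Sum of absolute loadings per variable across the first three axes,
# sorted from most to least influential
importance <- sort(rowSums(abs(loadings)), decreasing = TRUE)
print(importance)
```

Summing unweighted loadings treats the three axes as equally important; weighting each axis by its share of the variance would be one alternative if the first axis dominates, as it does here.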

We can also look at the correlation between layers to determine what should be removed:

All maintained:

#colnames(x)
ext=x[,5:20]
cor(ext)

##                                           current_30arcsec_annualPET
## current_30arcsec_annualPET                                 1.0000000
## current_30arcsec_aridityIndexThornthwaite                  0.5356523
## current_30arcsec_climaticMoistureIndex                    -0.7425039
## current_30arcsec_continentality                            0.3113893
## current_30arcsec_embergerQ                                -0.6852135
## current_30arcsec_growingDegDays0                           0.7173201
## current_30arcsec_growingDegDays5                           0.7234581
## current_30arcsec_maxTempColdest                            0.8519527
## current_30arcsec_minTempWarmest                            0.5029645
## current_30arcsec_monthCountByTemp10                        0.6226440
## current_30arcsec_PETColdestQuarter                         0.9509294
## current_30arcsec_PETDriestQuarter                          0.8647464
## current_30arcsec_PETseasonality                            0.6248540
## current_30arcsec_PETWarmestQuarter                         0.9805415
## current_30arcsec_PETWettestQuarter                         0.9294288
## current_30arcsec_thermicityIndex                           0.7071537
##                                           current_30arcsec_aridityIndexThornthwaite
## current_30arcsec_annualPET                                                0.5356523
## current_30arcsec_aridityIndexThornthwaite                                 1.0000000
## current_30arcsec_climaticMoistureIndex                                   -0.4086194
## current_30arcsec_continentality                                           0.4067714
## current_30arcsec_embergerQ                                               -0.2766005
## current_30arcsec_growingDegDays0                                          0.6919106
## current_30arcsec_growingDegDays5                                          0.6990780
## current_30arcsec_maxTempColdest                                           0.6169803
## current_30arcsec_minTempWarmest                                           0.6682661
## current_30arcsec_monthCountByTemp10                                       0.5440015
## current_30arcsec_PETColdestQuarter                                        0.4951528
## current_30arcsec_PETDriestQuarter                                         0.5603930
## current_30arcsec_PETseasonality                                           0.4584248
## current_30arcsec_PETWarmestQuarter                                        0.5841881
## current_30arcsec_PETWettestQuarter                                        0.3219686
## current_30arcsec_thermicityIndex                                          0.6822435
##                                           current_30arcsec_climaticMoistureIndex
## current_30arcsec_annualPET                                            -0.7425039
## current_30arcsec_aridityIndexThornthwaite                             -0.4086194
## current_30arcsec_climaticMoistureIndex                                 1.0000000
## current_30arcsec_continentality                                       -0.3819811
## current_30arcsec_embergerQ                                             0.9159646
## current_30arcsec_growingDegDays0                                      -0.3649445
## current_30arcsec_growingDegDays5                                      -0.3715665
## current_30arcsec_maxTempColdest                                       -0.4993619
## current_30arcsec_minTempWarmest                                       -0.1885577
## current_30arcsec_monthCountByTemp10                                   -0.4027436
## current_30arcsec_PETColdestQuarter                                    -0.6339926
## current_30arcsec_PETDriestQuarter                                     -0.4968806
## current_30arcsec_PETseasonality                                       -0.6734906
## current_30arcsec_PETWarmestQuarter                                    -0.7073433
## current_30arcsec_PETWettestQuarter                                    -0.8012461
## current_30arcsec_thermicityIndex                                      -0.3518836
##                                           current_30arcsec_continentality
## current_30arcsec_annualPET                                      0.3113893
## current_30arcsec_aridityIndexThornthwaite                       0.4067714
## current_30arcsec_climaticMoistureIndex                         -0.3819811
## current_30arcsec_continentality                                 1.0000000
## current_30arcsec_embergerQ                                     -0.4153093
## current_30arcsec_growingDegDays0                                0.2479743
## current_30arcsec_growingDegDays5                                0.2496265
## current_30arcsec_maxTempColdest                                 0.1203737
## current_30arcsec_minTempWarmest                                 0.2681165
## current_30arcsec_monthCountByTemp10                             0.2090693
## current_30arcsec_PETColdestQuarter                              0.1522460
## current_30arcsec_PETDriestQuarter                               0.1924481
## current_30arcsec_PETseasonality                                 0.8129495
## current_30arcsec_PETWarmestQuarter                              0.4184588
## current_30arcsec_PETWettestQuarter                              0.2405960
## current_30arcsec_thermicityIndex                                0.2492855
##                                           current_30arcsec_embergerQ
## current_30arcsec_annualPET                              -0.685213476
## current_30arcsec_aridityIndexThornthwaite               -0.276600501
## current_30arcsec_climaticMoistureIndex                   0.915964616
## current_30arcsec_continentality                         -0.415309279
## current_30arcsec_embergerQ                               1.000000000
## current_30arcsec_growingDegDays0                        -0.179704348
## current_30arcsec_growingDegDays5                        -0.190039152
## current_30arcsec_maxTempColdest                         -0.342118724
## current_30arcsec_minTempWarmest                          0.007497584
## current_30arcsec_monthCountByTemp10                     -0.313475643
## current_30arcsec_PETColdestQuarter                      -0.542920360
## current_30arcsec_PETDriestQuarter                       -0.446684078
## current_30arcsec_PETseasonality                         -0.725232433
## current_30arcsec_PETWarmestQuarter                      -0.659458836
## current_30arcsec_PETWettestQuarter                      -0.765204883
## current_30arcsec_thermicityIndex                        -0.163912787
##                                           current_30arcsec_growingDegDays0
## current_30arcsec_annualPET                                       0.7173201
## current_30arcsec_aridityIndexThornthwaite                        0.6919106
## current_30arcsec_climaticMoistureIndex                          -0.3649445
## current_30arcsec_continentality                                  0.2479743
## current_30arcsec_embergerQ                                      -0.1797043
## current_30arcsec_growingDegDays0                                 1.0000000
## current_30arcsec_growingDegDays5                                 0.9946251
## current_30arcsec_maxTempColdest                                  0.9406786
## current_30arcsec_minTempWarmest                                  0.9560046
## current_30arcsec_monthCountByTemp10                              0.6704893
## current_30arcsec_PETColdestQuarter                               0.7744221
## current_30arcsec_PETDriestQuarter                                0.7583540
## current_30arcsec_PETseasonality                                  0.3029431
## current_30arcsec_PETWarmestQuarter                               0.7384720
## current_30arcsec_PETWettestQuarter                               0.4821080
## current_30arcsec_thermicityIndex                                 0.9986232
##                                           current_30arcsec_growingDegDays5
## current_30arcsec_annualPET                                       0.7234581
## current_30arcsec_aridityIndexThornthwaite                        0.6990780
## current_30arcsec_climaticMoistureIndex                          -0.3715665
## current_30arcsec_continentality                                  0.2496265
## current_30arcsec_embergerQ                                      -0.1900392
## current_30arcsec_growingDegDays0                                 0.9946251
## current_30arcsec_growingDegDays5                                 1.0000000
## current_30arcsec_maxTempColdest                                  0.9390167
## current_30arcsec_minTempWarmest                                  0.9468207
## current_30arcsec_monthCountByTemp10                              0.7015541
## current_30arcsec_PETColdestQuarter                               0.7763068
## current_30arcsec_PETDriestQuarter                                0.7611201
## current_30arcsec_PETseasonality                                  0.3099811
## current_30arcsec_PETWarmestQuarter                               0.7423933
## current_30arcsec_PETWettestQuarter                               0.4959363
## current_30arcsec_thermicityIndex                                 0.9927087
##                                           current_30arcsec_maxTempColdest
## current_30arcsec_annualPET                                      0.8519527
## current_30arcsec_aridityIndexThornthwaite                       0.6169803
## current_30arcsec_climaticMoistureIndex                         -0.4993619
## current_30arcsec_continentality                                 0.1203737
## current_30arcsec_embergerQ                                     -0.3421187
## current_30arcsec_growingDegDays0                                0.9406786
## current_30arcsec_growingDegDays5                                0.9390167
## current_30arcsec_maxTempColdest                                 1.0000000
## current_30arcsec_minTempWarmest                                 0.8219372
## current_30arcsec_monthCountByTemp10                             0.6757719
## current_30arcsec_PETColdestQuarter                              0.9168618
## current_30arcsec_PETDriestQuarter                               0.8409024
## current_30arcsec_PETseasonality                                 0.3149568
## current_30arcsec_PETWarmestQuarter                              0.8361990
## current_30arcsec_PETWettestQuarter                              0.6740010
## current_30arcsec_thermicityIndex                                0.9396267
##                                           current_30arcsec_minTempWarmest
## current_30arcsec_annualPET                                    0.502964461
## current_30arcsec_aridityIndexThornthwaite                     0.668266080
## current_30arcsec_climaticMoistureIndex                       -0.188557678
## current_30arcsec_continentality                               0.268116512
## current_30arcsec_embergerQ                                    0.007497584
## current_30arcsec_growingDegDays0                              0.956004592
## current_30arcsec_growingDegDays5                              0.946820723
## current_30arcsec_maxTempColdest                               0.821937192
## current_30arcsec_minTempWarmest                               1.000000000
## current_30arcsec_monthCountByTemp10                           0.569843180
## current_30arcsec_PETColdestQuarter                            0.585297990
## current_30arcsec_PETDriestQuarter                             0.599587572
## current_30arcsec_PETseasonality                               0.192523103
## current_30arcsec_PETWarmestQuarter                            0.538127313
## current_30arcsec_PETWettestQuarter                            0.229093254
## current_30arcsec_thermicityIndex                              0.958133142
##                                           current_30arcsec_monthCountByTemp10
## current_30arcsec_annualPET                                          0.6226440
## current_30arcsec_aridityIndexThornthwaite                           0.5440015
## current_30arcsec_climaticMoistureIndex                             -0.4027436
## current_30arcsec_continentality                                     0.2090693
## current_30arcsec_embergerQ                                         -0.3134756
## current_30arcsec_growingDegDays0                                    0.6704893
## current_30arcsec_growingDegDays5                                    0.7015541
## current_30arcsec_maxTempColdest                                     0.6757719
## current_30arcsec_minTempWarmest                                     0.5698432
## current_30arcsec_monthCountByTemp10                                 1.0000000
## current_30arcsec_PETColdestQuarter                                  0.6035413
## current_30arcsec_PETDriestQuarter                                   0.5521891
## current_30arcsec_PETseasonality                                     0.3423416
## current_30arcsec_PETWarmestQuarter                                  0.6204620
## current_30arcsec_PETWettestQuarter                                  0.5285099
## current_30arcsec_thermicityIndex                                    0.6657654
##                                           current_30arcsec_PETColdestQuarter
## current_30arcsec_annualPET                                         0.9509294
## current_30arcsec_aridityIndexThornthwaite                          0.4951528
## current_30arcsec_climaticMoistureIndex                            -0.6339926
## current_30arcsec_continentality                                    0.1522460
## current_30arcsec_embergerQ                                        -0.5429204
## current_30arcsec_growingDegDays0                                   0.7744221
## current_30arcsec_growingDegDays5                                   0.7763068
## current_30arcsec_maxTempColdest                                    0.9168618
## current_30arcsec_minTempWarmest                                    0.5852980
## current_30arcsec_monthCountByTemp10                                0.6035413
## current_30arcsec_PETColdestQuarter                                 1.0000000
## current_30arcsec_PETDriestQuarter                                  0.8780569
## current_30arcsec_PETseasonality                                    0.4095537
## current_30arcsec_PETWarmestQuarter                                 0.9166130
## current_30arcsec_PETWettestQuarter                                 0.8460238
## current_30arcsec_thermicityIndex                                   0.7703611
##                                           current_30arcsec_PETDriestQuarter
## current_30arcsec_annualPET                                        0.8647464
## current_30arcsec_aridityIndexThornthwaite                         0.5603930
## current_30arcsec_climaticMoistureIndex                           -0.4968806
## current_30arcsec_continentality                                   0.1924481
## current_30arcsec_embergerQ                                       -0.4466841
## current_30arcsec_growingDegDays0                                  0.7583540
## current_30arcsec_growingDegDays5                                  0.7611201
## current_30arcsec_maxTempColdest                                   0.8409024
## current_30arcsec_minTempWarmest                                   0.5995876
## current_30arcsec_monthCountByTemp10                               0.5521891
## current_30arcsec_PETColdestQuarter                                0.8780569
## current_30arcsec_PETDriestQuarter                                 1.0000000
## current_30arcsec_PETseasonality                                   0.4567340
## current_30arcsec_PETWarmestQuarter                                0.8748009
## current_30arcsec_PETWettestQuarter                                0.6890573
## current_30arcsec_thermicityIndex                                  0.7475316
##                                           current_30arcsec_PETseasonality
## current_30arcsec_annualPET                                      0.6248540
## current_30arcsec_aridityIndexThornthwaite                       0.4584248
## current_30arcsec_climaticMoistureIndex                         -0.6734906
## current_30arcsec_continentality                                 0.8129495
## current_30arcsec_embergerQ                                     -0.7252324
## current_30arcsec_growingDegDays0                                0.3029431
## current_30arcsec_growingDegDays5                                0.3099811
## current_30arcsec_maxTempColdest                                 0.3149568
## current_30arcsec_minTempWarmest                                 0.1925231
## current_30arcsec_monthCountByTemp10                             0.3423416
## current_30arcsec_PETColdestQuarter                              0.4095537
## current_30arcsec_PETDriestQuarter                               0.4567340
## current_30arcsec_PETseasonality                                 1.0000000
## current_30arcsec_PETWarmestQuarter                              0.6878161
## current_30arcsec_PETWettestQuarter                              0.5954100
## current_30arcsec_thermicityIndex                                0.2878791
##                                           current_30arcsec_PETWarmestQuarter
## current_30arcsec_annualPET                                         0.9805415
## current_30arcsec_aridityIndexThornthwaite                          0.5841881
## current_30arcsec_climaticMoistureIndex                            -0.7073433
## current_30arcsec_continentality                                    0.4184588
## current_30arcsec_embergerQ                                        -0.6594588
## current_30arcsec_growingDegDays0                                   0.7384720
## current_30arcsec_growingDegDays5                                   0.7423933
## current_30arcsec_maxTempColdest                                    0.8361990
## current_30arcsec_minTempWarmest                                    0.5381273
## current_30arcsec_monthCountByTemp10                                0.6204620
## current_30arcsec_PETColdestQuarter                                 0.9166130
## current_30arcsec_PETDriestQuarter                                  0.8748009
## current_30arcsec_PETseasonality                                    0.6878161
## current_30arcsec_PETWarmestQuarter                                 1.0000000
## current_30arcsec_PETWettestQuarter                                 0.8842324
## current_30arcsec_thermicityIndex                                   0.7299431
##                                           current_30arcsec_PETWettestQuarter
## current_30arcsec_annualPET                                         0.9294288
## current_30arcsec_aridityIndexThornthwaite                          0.3219686
## current_30arcsec_climaticMoistureIndex                            -0.8012461
## current_30arcsec_continentality                                    0.2405960
## current_30arcsec_embergerQ                                        -0.7652049
## current_30arcsec_growingDegDays0                                   0.4821080
## current_30arcsec_growingDegDays5                                   0.4959363
## current_30arcsec_maxTempColdest                                    0.6740010
## current_30arcsec_minTempWarmest                                    0.2290933
## current_30arcsec_monthCountByTemp10                                0.5285099
## current_30arcsec_PETColdestQuarter                                 0.8460238
## current_30arcsec_PETDriestQuarter                                  0.6890573
## current_30arcsec_PETseasonality                                    0.5954100
## current_30arcsec_PETWarmestQuarter                                 0.8842324
## current_30arcsec_PETWettestQuarter                                 1.0000000
## current_30arcsec_thermicityIndex                                   0.4735434
##                                           current_30arcsec_thermicityIndex
## current_30arcsec_annualPET                                       0.7071537
## current_30arcsec_aridityIndexThornthwaite                        0.6822435
## current_30arcsec_climaticMoistureIndex                          -0.3518836
## current_30arcsec_continentality                                  0.2492855
## current_30arcsec_embergerQ                                      -0.1639128
## current_30arcsec_growingDegDays0                                 0.9986232
## current_30arcsec_growingDegDays5                                 0.9927087
## current_30arcsec_maxTempColdest                                  0.9396267
## current_30arcsec_minTempWarmest                                  0.9581331
## current_30arcsec_monthCountByTemp10                              0.6657654
## current_30arcsec_PETColdestQuarter                               0.7703611
## current_30arcsec_PETDriestQuarter                                0.7475316
## current_30arcsec_PETseasonality                                  0.2878791
## current_30arcsec_PETWarmestQuarter                               0.7299431
## current_30arcsec_PETWettestQuarter                               0.4735434
## current_30arcsec_thermicityIndex                                 1.0000000

z=cor(ext)

Removing layers:

cor(ext[,-c(1,3,6,7,10,11,14,16)])

##                                           current_30arcsec_aridityIndexThornthwaite
## current_30arcsec_aridityIndexThornthwaite                                 1.0000000
## current_30arcsec_continentality                                           0.4067714
## current_30arcsec_embergerQ                                               -0.2766005
## current_30arcsec_maxTempColdest                                           0.6169803
## current_30arcsec_minTempWarmest                                           0.6682661
## current_30arcsec_PETDriestQuarter                                         0.5603930
## current_30arcsec_PETseasonality                                           0.4584248
## current_30arcsec_PETWettestQuarter                                        0.3219686
##                                           current_30arcsec_continentality
## current_30arcsec_aridityIndexThornthwaite                       0.4067714
## current_30arcsec_continentality                                 1.0000000
## current_30arcsec_embergerQ                                     -0.4153093
## current_30arcsec_maxTempColdest                                 0.1203737
## current_30arcsec_minTempWarmest                                 0.2681165
## current_30arcsec_PETDriestQuarter                               0.1924481
## current_30arcsec_PETseasonality                                 0.8129495
## current_30arcsec_PETWettestQuarter                              0.2405960
##                                           current_30arcsec_embergerQ
## current_30arcsec_aridityIndexThornthwaite               -0.276600501
## current_30arcsec_continentality                         -0.415309279
## current_30arcsec_embergerQ                               1.000000000
## current_30arcsec_maxTempColdest                         -0.342118724
## current_30arcsec_minTempWarmest                          0.007497584
## current_30arcsec_PETDriestQuarter                       -0.446684078
## current_30arcsec_PETseasonality                         -0.725232433
## current_30arcsec_PETWettestQuarter                      -0.765204883
##                                           current_30arcsec_maxTempColdest
## current_30arcsec_aridityIndexThornthwaite                       0.6169803
## current_30arcsec_continentality                                 0.1203737
## current_30arcsec_embergerQ                                     -0.3421187
## current_30arcsec_maxTempColdest                                 1.0000000
## current_30arcsec_minTempWarmest                                 0.8219372
## current_30arcsec_PETDriestQuarter                               0.8409024
## current_30arcsec_PETseasonality                                 0.3149568
## current_30arcsec_PETWettestQuarter                              0.6740010
##                                           current_30arcsec_minTempWarmest
## current_30arcsec_aridityIndexThornthwaite                     0.668266080
## current_30arcsec_continentality                               0.268116512
## current_30arcsec_embergerQ                                    0.007497584
## current_30arcsec_maxTempColdest                               0.821937192
## current_30arcsec_minTempWarmest                               1.000000000
## current_30arcsec_PETDriestQuarter                             0.599587572
## current_30arcsec_PETseasonality                               0.192523103
## current_30arcsec_PETWettestQuarter                            0.229093254
##                                           current_30arcsec_PETDriestQuarter
## current_30arcsec_aridityIndexThornthwaite                         0.5603930
## current_30arcsec_continentality                                   0.1924481
## current_30arcsec_embergerQ                                       -0.4466841
## current_30arcsec_maxTempColdest                                   0.8409024
## current_30arcsec_minTempWarmest                                   0.5995876
## current_30arcsec_PETDriestQuarter                                 1.0000000
## current_30arcsec_PETseasonality                                   0.4567340
## current_30arcsec_PETWettestQuarter                                0.6890573
##                                           current_30arcsec_PETseasonality
## current_30arcsec_aridityIndexThornthwaite                       0.4584248
## current_30arcsec_continentality                                 0.8129495
## current_30arcsec_embergerQ                                     -0.7252324
## current_30arcsec_maxTempColdest                                 0.3149568
## current_30arcsec_minTempWarmest                                 0.1925231
## current_30arcsec_PETDriestQuarter                               0.4567340
## current_30arcsec_PETseasonality                                 1.0000000
## current_30arcsec_PETWettestQuarter                              0.5954100
##                                           current_30arcsec_PETWettestQuarter
## current_30arcsec_aridityIndexThornthwaite                          0.3219686
## current_30arcsec_continentality                                    0.2405960
## current_30arcsec_embergerQ                                        -0.7652049
## current_30arcsec_maxTempColdest                                    0.6740010
## current_30arcsec_minTempWarmest                                    0.2290933
## current_30arcsec_PETDriestQuarter                                  0.6890573
## current_30arcsec_PETseasonality                                    0.5954100
## current_30arcsec_PETWettestQuarter                                 1.0000000

Using the above to guide what is most important, I have removed the following, which are more than 85% correlated with another layer of importance within the sample data. This will be compared to the overall patterns of covariation within the 2.5 arcminute data used for distribution modeling.

monthCountByTemp10 (count, not continuous)
growingDegDays0 (count, not continuous)
growingDegDays5 (count, not continuous)
annualPET (correlated with PETWettestQuarter)
thermicityIndex (correlated with minTempWarmest)
climaticMoistureIndex (correlated with embergerQ)
PETWarmestQuarter (correlated with PETWettestQuarter)
PETColdestQuarter (correlated with maxTempColdest)

Covariation at 2.5 ArcMinutes

rasterpath="path/to/envirem_africa/Africa_current_2.5arcmin_generic/"

y=list.files(rasterpath,pattern="*.bil")
y1=paste0(rasterpath,y)

y=stack(y1)

nc=ncell(y)

valsT=extract(x=y,y=seq(from=1,to=nc,by=1))

#Removing those with no correlation to make it easier to read
#Remove non continuous counts
valsT2=na.omit(valsT[,-c(1,3,6,7,10,11,13,14,16)])
cor(valsT2)
abs(cor(valsT2))>.85
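
A logical matrix of this size is hard to scan by eye. As a minimal sketch (not the code used for the study), the pairs exceeding the 0.85 cutoff can be listed directly from the upper triangle of the correlation matrix. The simulated layers below are hypothetical stand-ins for the raster values; only the layer names are borrowed from this appendix.

```r
set.seed(1)
n <- 200
temp <- rnorm(n)

# Hypothetical layers: two strongly correlated, one independent
layers <- cbind(
  maxTempColdest  = temp,
  thermicityIndex = temp + rnorm(n, sd = 0.2),
  embergerQ       = rnorm(n)
)

cm <- cor(layers)

# Look only at the upper triangle so each pair is reported once
high <- which(abs(cm) > 0.85 & upper.tri(cm), arr.ind = TRUE)

high_pairs <- data.frame(
  layer1 = rownames(cm)[high[, "row"]],
  layer2 = colnames(cm)[high[, "col"]],
  r      = cm[high]
)
high_pairs
```

Each row of high_pairs names two layers of which only one would normally be kept.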

Climatic moisture index still correlated with embergerQ
PET coldest still correlated with maxTempColdest
Thermicity still correlated with maxTempColdest
Continentality and seasonality are 96% correlated!

Continentality was more important for the data points, so it was kept and seasonality removed.

Final kept layers are:

aridityIndexThornthwaite
continentality
embergerQ
maxTempColdest
minTempWarmest
PETDriestQuarter
PETWettestQuarter

Ecological Niche Modeling

The following section outlines the procedure for creating ecological niche models (ENMs) for these taxa, and subsequently testing for niche divergence. Additionally, we will be projecting these layers through time to see the likely pathway of colonization for the species.

Current models are restricted to the following area:

## Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read

The species does not currently occur in Angola, so we are excluding it from the training region. Past projections will cover broader areas, in part because we have no way of knowing for certain whether local extinction has occurred.

Current Environmental Conditions

We need to reduce the current environmental layers to the training area. This will create more accurate models and will reduce the strain on our processors at the same time. The extracted values come from the 30 arcsecond data, but the models are built with the 2.5 arcminute layers for computational reasons.

y=list.files(rasterpath,pattern="*.bil")[-c(1,3,6,7,10,11,13,14,16)]

Note that we do not have any aux files at the present time, so I do not need to omit them from the file list.
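
If auxiliary files do appear later, or if the layer order changes, selecting files by position becomes fragile. The following is a minimal sketch of an alternative that filters by name. The file names are hypothetical stand-ins for the output of list.files(), and the two dropped layers are chosen only for illustration.

```r
# Hypothetical file names standing in for the output of list.files()
files <- c(
  "current_2-5arcmin_annualPET.bil",
  "current_2-5arcmin_annualPET.bil.aux.xml",
  "current_2-5arcmin_continentality.bil",
  "current_2-5arcmin_embergerQ.bil",
  "current_2-5arcmin_PETseasonality.bil"
)

# Anchored regular expression: keep only names that end in ".bil"
bil <- grep("\\.bil$", files, value = TRUE)

# Drop layers by name instead of by position
drop <- c("annualPET", "PETseasonality")
keep <- bil[!grepl(paste(drop, collapse = "|"), bil)]
keep
```

The same anchored expression can be passed as the pattern argument of list.files(), which interprets it as a regular expression rather than a shell wildcard.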

#Set lists for shapefiles of M and for bioclim layers of datasets
m="~/Dropbox/GIS/small-af/cinnyris.shp"

bioclim=paste0(rasterpath,y)

#Reformat variables for the loop code
filelist=bioclim
ShapeFile=m

#set save directory
# SaveDir="path/to/Ecological Analysis/Africa_current-2.5m_CLIPPED/"

#Note that there is a section that must be edited each time in function
CropLoop<-function(filelist=NA,ShapeFile=NA,SaveDir)
{
  require(maptools)
  require(raster)
  Shp1 = readShapePoly(ShapeFile)
  for (i in 1:length(filelist))
  {
    r1 = raster(filelist[i])
    cr1 = crop(r1,Shp1)
    cr2 = raster::mask(cr1,Shp1) #Avoid confusion with other packages
    
    #Get filename without its directory or extension
    FileName=tools::file_path_sans_ext(basename(filelist[i]))
    
    #Save as ASCII
    writeRaster(cr2,paste0(SaveDir,FileName),"ascii",overwrite=T)
    #plot(cr2)
    print(FileName)
  }
}

CropLoop(filelist=filelist,ShapeFile=ShapeFile,SaveDir=SaveDir)

The above code was not run while concatenating this document, but it did run successfully for all files.

Minimum Volume Ellipsoids

This code is based on code from Dr. Jorge Soberón. It will create models for all time periods at the same time.

First, we must define the function that calculates the distance from a point p to the centroid of an ellipsoid with centroid m and matrix s (the Mahalanobis distance). The parameters are: p, the test point; m, the ellipsoid centroid; and s, the inverse of the covariance matrix of the ellipsoid.

#MAJA function
maja=function(p,m,s)((p-m)%*%s%*%t(p-m))^0.5

#Quantile function
##Returns the number of points that corresponds to a proportion (level) of nD points
NDquantil=function(nD,level){
  return(round(nD*level))
}
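
As a quick check of the maja function, the sketch below compares it with stats::mahalanobis, which returns the squared distance. The environmental values are simulated and purely hypothetical, and an ordinary covariance matrix is used here in place of the minimum volume ellipsoid estimate.

```r
# Distance function as defined above
maja <- function(p, m, s) ((p - m) %*% s %*% t(p - m))^0.5

set.seed(1)
# Hypothetical environmental values: 50 points, 3 variables
env <- matrix(rnorm(150), ncol = 3)

centre  <- matrix(colMeans(env), nrow = 1)  # ellipsoid centroid (1 x 3)
inv_cov <- solve(cov(env))                  # inverse covariance matrix

p <- env[1, ]                               # test point

d_maja <- as.numeric(maja(p, centre, inv_cov))

# stats::mahalanobis returns the squared distance, so take the square root
d_base <- sqrt(mahalanobis(p, center = colMeans(env), cov = cov(env)))

all.equal(d_maja, d_base)
```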

These minimum volume ellipsoids are less sensitive to point density, and do not rely on pseudoabsence data for determining where species do not occur.
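
To illustrate how the ellipsoid is fitted, here is a minimal sketch using MASS::cov.mve on simulated values. The data are hypothetical; quantile.used is the full name of the cov.mve argument that sets how many points the ellipsoid must contain.

```r
library(MASS)  # provides cov.mve

# Number of points for a given proportion, as defined above
NDquantil <- function(nD, level) round(nD * level)

set.seed(123)
# Hypothetical environmental values at 60 occurrence points, 3 variables
vals <- matrix(rnorm(180), ncol = 3)
# Shift a few points far from the rest to act as outliers
vals[1:3, ] <- vals[1:3, ] + 8

# The ellipsoid should contain 90% of the points
n1 <- NDquantil(nrow(vals), 0.9)

mve <- cov.mve(vals, quantile.used = n1)

mve$center  # robust centroid
mve$cov     # robust covariance matrix
```

The centroid and covariance returned here play the roles of m and (after inversion with solve) s in the maja function.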

genderu=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="genderuensis"),]
reich=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="reichenowi"),]
preuss=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="preussi"),]

y1=stack(paste0(rasterpath,y))
y=y1

The only thing that is different from the previous look at correlation is that seasonality and continentality are correlated. These are also the two most important parts from the PC; I am keeping both of them here, as they passed the previous tests of correlation and this is a coarser dataset.

#Create function for individual plot formation
ssp.plot=function(ssp,ssp.text){
  vals=extract(x=y,y=ssp[,2:3])
  vals=na.omit(vals)
  vals=unique(vals)
  #vals=vals[,-10]

  n1=NDquantil(nrow(vals),0.9)

  #for(i in 1:ncol(vals)){print(IQR(vals[,i]))}

  mve1=cov.mve(vals,quantile.used=n1)

  nc=ncell(y)
  
  mu1=matrix(mve1$center,nrow=1)
  s1=mve1$cov
  invs1=solve(s1)

  dT1=matrix(0,ncol=1,nrow=nc)
  
  #Load values for current time period
  
  valsT=extract(x=y,y=seq(from=1,to=nc,by=1))
  
  #Create current models
  
  valsT1=as.matrix(valsT)
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
      
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y),ncol=ncol(y),
         ext=extent(y),resolution=res(y),vals=dT1)
  setwd(paste0(filepath,'Ecological Analysis/raw-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  plot(q)
  
  ext=extract(x=q,y=ssp[,2:3])
  ext2=na.omit(ext)

  #Remove the furthest 20% to reflect issues with plotting in eBird
  #threshold based on these values
  
  ext2=ext2[order(ext2)]
  cutoff=round(0.8*length(ext2))
  ext3=ext2[1:cutoff]
  
  ##The following sets binary presence to everything above 1.5 sd below mean of occurrence
  #n=max(ext2)
  #ND=(round(n*0.95))
  #m=c(NA,NA,NA,0,ND,1,ND,Inf,0)
  #m=matrix(m,ncol=3,byrow=T)
  #rc=reclassify(q,m)
  #y2=y[which(ext>ND),]

  #Everything up to 1.5 sd above the mean included

  #ext2
  sdext=sd(ext3)
  mext=mean(ext3)
  ND=mext+1.5*sdext
  m=c(NA,NA,NA,0,ND,1,ND,Inf,0)
  
  #Current threshold
  ##Used only here; hierarchical for other parts
  
  m=matrix(m,ncol=3,byrow=T)
  rc=reclassify(q,m)
  #Occurrence points that fall outside the threshold
  y2=ssp[which(ext>ND),]

  #Create color threshold for past models
  #color change for every standard deviation
  
  #New threshold on current conditions, then hierarchical
  #Created here, executed further down
  
  m2=m
  
  m=c(NA,NA,NA,
      0,(mext+(1.5*sdext)),1,
      (mext+(1.5*sdext)),(mext+(3*sdext)),2,
      (mext+(3*sdext)),(mext+(6*sdext)),3,
      (mext+(6*sdext)),(mext+(12*sdext)),4,
      (mext+(12*sdext)),(mext+(24*sdext)),5,
      (mext+(24*sdext)),Inf,6)
  
  m=matrix(m,ncol=3,byrow=T)
  
  species=ssp.text

  if(nrow(y2)!=0){
    setwd(paste0(filepath,"Ecological Analysis/threshold-mve/"))
    write.csv(y2,file=paste0(species,'_out.csv'),quote=F,row.names=F)
  }
  pathway=paste0(filepath,"Ecological Analysis/threshold-mve/",
                 species,".asc")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  points(ssp[,2:3],pch=19,col="black")

  #threshold classify tier
  thresh=reclassify(q,m)
  pathway=paste0(filepath,"Ecological Analysis/threshold-mve/",
                 species,"-tier.asc")
  writeRaster(thresh,pathway,overwrite=T)
  plot(thresh)
  rm(thresh)
  
  #Create color bands of how far it is from center
    
  #Holocene
  ##CCSM
  
  rm(valsT1)
  
  y.l=stack(paste0(holopath1,holo1))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,'Ecological Analysis/holo-ccsm-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("Holocene CCSM")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,"Ecological Analysis/holo-ccsm-mve/",
                 species,"_threshold.asc")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
  
  ##miroc
  
  rm(valsT1)
  
  y.l=stack(paste0(holopath2,holo2))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,'Ecological Analysis/holo-miroc-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("Holocene MIROC")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,"Ecological Analysis/holo-miroc-mve/",
                 species,"_threshold.asc")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
  
  ##mpi
  
  rm(valsT1)
  
  y.l=stack(paste0(holopath3,holo3))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,'Ecological Analysis/holo-mpi-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("Holocene MPI")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,"Ecological Analysis/holo-mpi-mve/",
                 species,"_threshold.asc")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
  
  #Last Glacial Maximum
  ##CCSM
  
  rm(valsT1)
  
  y.l=stack(paste0(lgmpath1,lgm1))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,'Ecological Analysis/lgm-ccsm-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("Last Glacial Maximum CCSM")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,"Ecological Analysis/lgm-ccsm-mve/",
                 species,"_threshold.asc")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
  
  ##miroc
  
  rm(valsT1)
  
  y.l=stack(paste0(lgmpath2,lgm2))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,'Ecological Analysis/lgm-miroc-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("LGM MIROC")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,"Ecological Analysis/lgm-miroc-mve/",
                 species,"_threshold.asc")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
  
  ##mpi
  
  rm(valsT1)
  
  y.l=stack(paste0(lgmpath3,lgm3))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,'Ecological Analysis/lgm-mpi-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("LGM MPI")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,"Ecological Analysis/lgm-mpi-mve/",
                 species,"_threshold.asc")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
}
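
The function above repeats the same distance calculation for each time period. As a minimal sketch (not the code used for the study), the per-cell loop can be replaced by one vectorized call to stats::mahalanobis, and the time periods can be handled by looping over a named list. Plain matrices stand in for the values extracted from each raster stack; mve_distance, periods, and the simulated data are all hypothetical.

```r
# Distance of every row of `vals` to the ellipsoid centre, vectorized.
# `inv_cov` is already inverted, hence inverted = TRUE.
mve_distance <- function(vals, centre, inv_cov) {
  sqrt(mahalanobis(vals, center = centre, cov = inv_cov, inverted = TRUE))
}

set.seed(42)
# Hypothetical stand-ins for the cell values of each time period
periods <- list(
  current  = matrix(rnorm(300), ncol = 3),
  holocene = matrix(rnorm(300, mean = 0.5), ncol = 3),
  lgm      = matrix(rnorm(300, mean = 1), ncol = 3)
)

# Ellipsoid fitted on current values only (plain covariance for brevity)
centre  <- colMeans(periods$current)
inv_cov <- solve(cov(periods$current))

# One vector of distances per time period
distances <- lapply(periods, mve_distance, centre = centre, inv_cov = inv_cov)
sapply(distances, length)
```

Each element of distances could then be written back into a raster with the same dimensions as its source stack, as is done for q above.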

Now, to perform individual iterations of the MVE script.

ssp.plot(ssp=preuss,ssp.text="preussi")

ssp.plot(ssp=reich,ssp.text="reichenowi")

ssp.plot(ssp=genderu,ssp.text="genderuensis")
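
The thresholds applied inside ssp.plot can be illustrated on plain vectors. In this minimal sketch, the distances are simulated and hypothetical, and base R's cut is used in place of raster::reclassify, so the handling of values exactly on a class boundary may differ slightly from the rasters produced above.

```r
set.seed(7)
# Hypothetical distances from occurrence points to the ellipsoid centre
occ_dist <- sort(rexp(50, rate = 0.5))

# Keep the closest 80% of occurrences, as in ssp.plot
kept <- occ_dist[seq_len(round(0.8 * length(occ_dist)))]

# Binary threshold: mean plus 1.5 standard deviations of the kept distances
ND <- mean(kept) + 1.5 * sd(kept)

# Hypothetical distances for every cell in a landscape
cell_dist <- rexp(1000, rate = 0.3)

# Binary map: 1 inside the threshold, 0 outside
binary <- as.integer(cell_dist <= ND)

# Tiered map: class boundaries at 1.5, 3, 6, 12 and 24 sd above the mean
breaks <- c(0, mean(kept) + c(1.5, 3, 6, 12, 24) * sd(kept), Inf)
tier <- cut(cell_dist, breaks = breaks, labels = FALSE, include.lowest = TRUE)

table(tier)
```

Tier 1 matches the binary presence area; higher tiers mark cells progressively further from the centroid.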

Niche comparisons of the different populations

Using these variables, we can look at the occupied niche areas of the populations and see how divergent they are. This will be done using custom scripts from Cooper & Barragan (unpublished), based on the methodology of Warren et al.

In QGIS, I created individual shapefiles of the “regions” that each species inhabits. For each of these regions, I want to create 100 “random” niche models to compare, each model created using random points from within each species’ accessible area. These accessible areas are defined by biogeography, and are an attempt to encompass the geographically accessible area around each species.

nc=ncell(y)
valsT=extract(x=y,y=seq(from=1,to=nc,by=1))
valsT1=as.matrix(valsT)

GISpath='~/Dropbox/GIS/small-af/'

randomizer=function(data,type,sp.text){
  # make sure GIS path goes to the M files
  x=readShapePoly(paste0(GISpath,sp.text,'.shp'))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  nx=nrow(data)
  for(i in 1:100){
    yy=spsample(x=x,n=nx,type=type)
    
    #Alternate method, not as effective
    #yy=randomPoints(mask=x,n=nrow(data),
    #                p=data[,2:3],excludep=T,
    #                cellnumbers=F,tryf=5)
    
    yy2=as.data.frame(coordinates(yy))
    colnames(yy2)=c("Long","Lat")
    yy2$Population=sp.text
    yy2=yy2[,c('Population','Long','Lat')]
    
    vals=extract(x=y,y=yy2[,2:3])
    vals=na.omit(vals)
    vals=unique(vals)
    #vals=vals[,-10]

    n1=NDquantil(nrow(vals),0.9)

    #for(i in 1:ncol(vals)){print(IQR(vals[,i]))}

    mve1=cov.mve(vals,quantile.used=n1)

    mu1=matrix(mve1$center,nrow=1)
    s1=mve1$cov
    invs1=solve(s1)
    
    dT1=matrix(0,ncol=1,nrow=nc)
    
    mu2=as.matrix(mu1)
    invs2=as.matrix(invs1)
      
    for(j in 1:nrow(valsT1)){
      dT1[j,1]=maja(valsT1[j,],mu2,invs2)
    }
    
    q=raster(nrow=nrow(y),ncol=ncol(y),
         ext=extent(y),resolution=res(y),vals=dT1)
    setwd(paste0(filepath,'Ecological Analysis/random/',sp.text,'/'))
    sp1=sp.text
    write.csv(yy2,
              file=paste0(sp.text,"_random-",i,'.csv'),
              quote=F,row.names=F)
    writeRaster(q,
                filename=paste0(sp.text,"_random-",i),
                format='ascii',overwrite=T)
    
    #plot(q)
  }
}

randomizer(data=reich,type='random',sp.text="reichenowi")
randomizer(data=preuss,type='random',sp.text="preussi")
randomizer(data=genderu,type='random',sp.text="genderuensis")

Now, to compare niche distributions. First, we must reduce the datasets down to the number of points being used to train the above models.
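
As a minimal sketch of this reduction step, the example below orders a small, hypothetical occurrence table by distance to the centroid and splits it into the closest 80% and the remainder. The column names mirror those used in this appendix, but the values are simulated.

```r
set.seed(3)
# Hypothetical occurrence table with a distance-to-centroid column
occ <- data.frame(
  LONGITUDE = runif(10, 8, 12),
  LATITUDE  = runif(10, 3, 7),
  r.dist    = rexp(10)
)

# Order from closest to furthest
occ <- occ[order(occ$r.dist), ]
n_keep <- round(nrow(occ) * 0.8)

closest  <- occ[seq_len(n_keep), ]   # retained for niche comparisons
furthest <- occ[-seq_len(n_keep), ]  # excluded from comparisons

c(kept = nrow(closest), dropped = nrow(furthest))
```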

#restrict to closest 80% of points to centroid for comparisons, just like models

#reichenowi
r.q=raster(paste0(filepath,
                  "Ecological Analysis/raw-mve/reichenowi.asc"))
reich$r.dist=extract(r.q,reich[,2:3])
hist(reich$r.dist)

reich=reich[order(reich$r.dist),]
r.pt=round(nrow(reich)*0.8)

plot(r.q)
points(reich[1:r.pt,2:3],col="black",pch=19)
points(reich[(r.pt+1):nrow(reich),2:3],col="red",pch=19) # excluded points only

reich2=reich[1:r.pt,]

#preussi
r.q=raster(paste0(filepath,
                  "Ecological Analysis/raw-mve/preussi.asc"))
preuss$r.dist=extract(r.q,preuss[,2:3])
hist(preuss$r.dist)

preuss=preuss[order(preuss$r.dist),]
r.pt=round(nrow(preuss)*0.8)

plot(r.q)
points(preuss[1:r.pt,2:3],col="black",pch=19)
points(preuss[(r.pt+1):nrow(preuss),2:3],col="red",pch=19) # excluded points only

preuss2=preuss[1:r.pt,]

#genderuensis
r.q=raster(paste0(filepath,
                  "Ecological Analysis/raw-mve/genderuensis.asc"))
genderu$r.dist=extract(r.q,genderu[,2:3])
colnames(genderu)

##  [1] "SUBSPECIES.SCIENTIFIC.NAME"               
##  [2] "LONGITUDE"                                
##  [3] "LATITUDE"                                 
##  [4] "SOURCE"                                   
##  [5] "current_30arcsec_annualPET"               
##  [6] "current_30arcsec_aridityIndexThornthwaite"
##  [7] "current_30arcsec_climaticMoistureIndex"   
##  [8] "current_30arcsec_continentality"          
##  [9] "current_30arcsec_embergerQ"               
## [10] "current_30arcsec_growingDegDays0"         
## [11] "current_30arcsec_growingDegDays5"         
## [12] "current_30arcsec_maxTempColdest"          
## [13] "current_30arcsec_minTempWarmest"          
## [14] "current_30arcsec_monthCountByTemp10"      
## [15] "current_30arcsec_PETColdestQuarter"       
## [16] "current_30arcsec_PETDriestQuarter"        
## [17] "current_30arcsec_PETseasonality"          
## [18] "current_30arcsec_PETWarmestQuarter"       
## [19] "current_30arcsec_PETWettestQuarter"       
## [20] "current_30arcsec_thermicityIndex"         
## [21] "PC1"                                      
## [22] "PC2"                                      
## [23] "PC3"                                      
## [24] "PC4"                                      
## [25] "PC5"                                      
## [26] "PC6"                                      
## [27] "PC7"                                      
## [28] "PC8"                                      
## [29] "PC9"                                      
## [30] "PC10"                                     
## [31] "PC11"                                     
## [32] "PC12"                                     
## [33] "PC13"                                     
## [34] "PC14"                                     
## [35] "PC15"                                     
## [36] "PC16"                                     
## [37] "r.dist"

hist(genderu$r.dist)

genderu=genderu[order(genderu$r.dist),]
r.pt=round(nrow(genderu)*0.8)

plot(r.q)
points(genderu[1:r.pt,2:3],col="black",pch=19)
points(genderu[(r.pt+1):nrow(genderu),2:3],col="red",pch=19) # excluded points only

genderu2=genderu[1:r.pt,]
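
The three blocks above repeat the same ordering and truncation step. As a minimal sketch, that step could be wrapped in a helper; closest_fraction and the toy table below are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: keep the rows whose distance to the niche centroid
# falls within the closest `frac` of points, and return the rest separately.
closest_fraction <- function(df, dist_col = "r.dist", frac = 0.8) {
  ordered <- df[order(df[[dist_col]]), , drop = FALSE]
  n_keep <- round(nrow(ordered) * frac)
  is_kept <- seq_len(nrow(ordered)) <= n_keep
  list(kept = ordered[is_kept, , drop = FALSE],
       dropped = ordered[!is_kept, , drop = FALSE])
}

# Toy occurrence table standing in for reich, preuss, or genderu
toy <- data.frame(Population = "toy",
                  Long = 1:10, Lat = 10:1,
                  r.dist = c(5, 1, 9, 3, 7, 2, 8, 4, 10, 6))

split_toy <- closest_fraction(toy)
print(nrow(split_toy$kept))
print(nrow(split_toy$dropped))
```

Returning the kept and dropped rows separately keeps the two plotted sets from sharing a boundary point.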

And now to perform the tests.

# new filepath
filepath2="prev.filepath/Ecological Analysis/random/"

splist=list.files(filepath2)

#comparisons=matrix(nrow=100,ncol=2,data=NA)

truecomps=-99
truelists=matrix(nrow=100,ncol=1,data=-99)

for(i in 1:length(splist)){
  sp=splist[i]
  splist2=splist[-i]
  
  null.x=raster(paste0(filepath,
                       "Ecological Analysis/raw-mve/",sp,".asc"))
  comparisons=matrix(nrow=100,ncol=2,data=NA)
 
  comparisons=as.data.frame(comparisons)
  
  compvals=NULL
  
  for(j in 1:length(splist2)){
    comparelist=list.files(paste0(filepath2,splist2[j],"/"),
                           pattern="*.asc")
    
    true2=raster(paste0(filepath,"Ecological Analysis/raw-mve/",
                        splist2[j],".asc"))
      
    compvals=NULL
  
    for(k in 1:length(comparelist)){
     rando=raster(paste0(filepath2,splist2[j],"/",comparelist[k]))
     compvals[k]=nicheOverlap(x=null.x,y=rando,stat="D")
    }
    
    comparisons[,j]=compvals
    colnames(comparisons)[j]=paste0(splist[i],"-",splist2[j])
    
    truecomps=c(truecomps,
                nicheOverlap(x=null.x,y=true2,stat="D"))
    
  }
  
  truelists=cbind(truelists,comparisons)
}

truecomps2=t(as.data.frame(truecomps))
colnames(truecomps2)=colnames(truelists)

fullcomps=rbind(truecomps2,truelists)

write.csv(fullcomps,file=paste0(filepath,"Schoener-first-row-true.csv"),
          quote=F,row.names=F)
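
For orientation, Schoener's D can be written out directly. The sketch below works on plain numeric vectors rather than rasters and assumes each surface is first rescaled to sum to 1, which is my understanding of what nicheOverlap does for stat="D". The function name schoener_d is hypothetical.

```r
# Schoener's D for two non-negative surfaces on the same grid, given as
# numeric vectors; cells that are NA in either surface are dropped.
schoener_d <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  pa <- a[ok] / sum(a[ok])   # rescale so each surface sums to 1
  pb <- b[ok] / sum(b[ok])
  1 - 0.5 * sum(abs(pa - pb))
}

# Proportional surfaces overlap completely; disjoint surfaces not at all
d_same <- schoener_d(c(1, 2, 3, NA), c(2, 4, 6, 1))
d_disjoint <- schoener_d(c(1, 0, 0), c(0, 0, 1))
print(c(d_same, d_disjoint))
```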

We can now compare the niche models derived from the minimum volume ellipsoids (MVEs) fitted to where these species occur.

x=read.csv(paste0(filepath,"Schoener-first-row-true.csv"))
x=x[,-1]
head(x)

##   genderuensis.preussi genderuensis.reichenowi preussi.genderuensis
## 1            0.7773179               0.7232936            0.7773179
## 2            0.7455431               0.7553698            0.7725751
## 3            0.7639776               0.7407890            0.6870270
## 4            0.7612553               0.7607214            0.6787750
## 5            0.7916149               0.7433471            0.7775531
## 6            0.7768061               0.7533292            0.6786293
##   preussi.reichenowi reichenowi.genderuensis reichenowi.preussi
## 1          0.7449366               0.7232936          0.7449366
## 2          0.7968239               0.6799918          0.8161258
## 3          0.7899377               0.6628084          0.8365100
## 4          0.7938077               0.5893233          0.7848623
## 5          0.8011549               0.7177177          0.8007086
## 6          0.8098695               0.5764343          0.8129598

The first row contains the “true” comparisons. We can therefore compare these values to the full distribution of randomized comparisons.

datax=matrix(data=NA,nrow=600,ncol=2)
datax=as.data.frame(datax)

colnames(datax)=c("ID","Value")

trues=x[1,c(1,2,6)]

datax$ID[1:100]="genderuensis.preussi"
datax$ID[101:200]="genderuensis.reichenowi"
datax$ID[201:300]="preussi.genderuensis"
datax$ID[301:400]="preussi.reichenowi"
datax$ID[401:500]="reichenowi.genderuensis"
datax$ID[501:600]="reichenowi.preussi"

datax$Value[1:100]=x[-1,1]
datax$Value[101:200]=x[-1,2]
datax$Value[201:300]=x[-1,3]
datax$Value[301:400]=x[-1,4]
datax$Value[401:500]=x[-1,5]
datax$Value[501:600]=x[-1,6]

datax$ID=as.factor(datax$ID)
datax$Value=as.numeric(datax$Value)
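
The wide-to-long reshaping above can also be sketched with base R's stack(), which avoids the hard-coded row ranges. The toy table and its column names are hypothetical stand-ins for the comparison table.

```r
# Toy stand-in for the table read from the comparison .csv:
# first row = "true" values, remaining rows = randomized replicates
set.seed(1)
toy <- as.data.frame(matrix(runif(3 * 5), nrow = 5))
colnames(toy) <- c("a.b", "a.c", "b.a")

# Drop the first ("true") row, then stack the columns into long format
long <- stack(toy[-1, ])
colnames(long) <- c("Value", "ID")
long <- long[, c("ID", "Value")]

print(head(long))
```

stack() returns the column names as a factor, so no separate as.factor() step is needed.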

We have created a long-format data frame that is easier to plot with ggplot. We can now go through each pairwise comparison in turn; in each plot, the dashed vertical line marks the observed (“true”) overlap.

gen.preus=datax[which(datax$ID=='genderuensis.preussi'|
                        datax$ID=='preussi.genderuensis'),]

inter=trues$genderuensis.preussi

a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
  
print(a+b+b.5+c+d)

gen.preus=datax[which(datax$ID=='genderuensis.reichenowi'|
                        datax$ID=='reichenowi.genderuensis'),]

inter=trues$genderuensis.reichenowi

a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
  
print(a+b+b.5+c+d)

gen.preus=datax[which(datax$ID=='reichenowi.preussi'|
                        datax$ID=='preussi.reichenowi'),]

inter=trues$reichenowi.preussi

a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
  
print(a+b+b.5+c+d)

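We now calculate \(P\) values by comparing each observed (“true”) overlap value to a normal distribution fitted to its randomized comparisons. The reported \(P\) value is the lower-tail probability of the observed value, and the critical values bound the central 95% of the fitted distribution.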

dist=unique(datax$ID)

trues=x[1,]

for(i in 1:length(dist)){
  distx=dist[i]
  print(paste0("Testing ",distx))
  datadist=datax[which(datax$ID==distx),]
  
  xbar=trues[,which(colnames(trues)==distx)]
  
  mu=mean(datadist$Value)
  sigma=sd(datadist$Value)
  n=nrow(datadist)
  
  z=(xbar-mu)/(sigma/sqrt(n))
  
  lowcrit=qnorm(p=0.025,mean=mu,sd=sigma)
  hicrit=qnorm(p=0.975,mean=mu,sd=sigma)
  
  if(xbar<lowcrit){
    print("Test statistic below low critical value.")
    print(paste0(lowcrit,"; statistic = ",xbar))
    }
  if(xbar>hicrit){
    print("Test statistic above high critical value.")
    print(paste0(hicrit,"; statistic = ",xbar))
    }
  
  print(paste0("P value for ",distx,
               " = ",pnorm(xbar,
                           mean=mu,sd=sigma)))
  
}

## [1] "Testing genderuensis.preussi"
## [1] "P value for genderuensis.preussi = 0.706751492086142"
## [1] "Testing genderuensis.reichenowi"
## [1] "Test statistic below low critical value."
## [1] "0.736680233179814; statistic = 0.723293582149787"
## [1] "P value for genderuensis.reichenowi = 0.000223524531141092"
## [1] "Testing preussi.genderuensis"
## [1] "P value for preussi.genderuensis = 0.692896339416361"
## [1] "Testing preussi.reichenowi"
## [1] "Test statistic below low critical value."
## [1] "0.783994797585943; statistic = 0.744936612710812"
## [1] "P value for preussi.reichenowi = 5.72761756781268e-11"
## [1] "Testing reichenowi.genderuensis"
## [1] "P value for reichenowi.genderuensis = 0.918497156199243"
## [1] "Testing reichenowi.preussi"
## [1] "P value for reichenowi.preussi = 0.0338531774097366"
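
As a cross-check on the normal approximation used above, a rank-based lower-tail \(P\) value can be computed directly from the replicates. This is a sketch of an alternative, not the method used in the analysis; the function name and toy values are hypothetical.

```r
# obs: observed overlap; null: overlap values from randomized replicates
lower_tail_p <- function(obs, null) {
  # Normal approximation, as in the loop above
  normal <- pnorm(obs, mean = mean(null), sd = sd(null))
  # Empirical version: share of replicates at or below the observed value,
  # with +1 in numerator and denominator so the result is never exactly 0
  empirical <- (sum(null <= obs) + 1) / (length(null) + 1)
  c(normal = normal, empirical = empirical)
}

# Deterministic toy null distribution of 100 overlap values
null <- seq(0.74, 0.82, length.out = 100)

p_low <- lower_tail_p(0.70, null)        # observed value below every replicate
p_mid <- lower_tail_p(mean(null), null)  # observed value at the null mean
print(p_low)
print(p_mid)
```

The empirical version makes no distributional assumption, but its resolution is limited by the number of replicates.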

Next, we visualize the PC plots from the ENVIREM extracts.

x=read.csv(paste0(filepath,
                   "Ecological Analysis/envirem_extracts_PCA.csv"))

a=ggplot(data=x,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()

print(a+b+c)

a=ggplot(data=x,aes(x=PC3,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()

print(a+b+c)

for(i in 4:19){
  x[,i]=as.numeric(x[,i])
  
  nombre=colnames(x)[i]
  
  a=ggplot(data=x,aes(y=x[,i],x=SUBSPECIES.SCIENTIFIC.NAME))
  b=geom_boxplot(notch=T)
  c=theme_classic()
  d=ggtitle(paste(nombre))
  
  print(a+b+c+d)
}

## notch went outside hinges. Try setting notch=FALSE.


Lastly, I compare the two Cameroonian populations (genderuensis and preussi) to see whether they differ significantly in any environmental variable, starting with a PCA restricted to these two groups.

x2=x[x$SUBSPECIES.SCIENTIFIC.NAME=="genderuensis"|
       x$SUBSPECIES.SCIENTIFIC.NAME=="preussi",1:19]

rda.x=rda(x2[,-c(1:3)],scale=T)
rda.x.data=rda.x$CA$u

x3=cbind(x2,rda.x.data)

a=ggplot(x3,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()

print(a+b+c)

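Now for the iterative tests. I assume random sampling but allow for unequal sample sizes and variances, so each variable is compared with a Welch two-sample t-test and a Wilcoxon rank-sum test. Note that the column range used here begins at SOURCE (column 4), so the first block of output is not an environmental comparison.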

for(i in 4:19){
  names=unique(x3$SUBSPECIES.SCIENTIFIC.NAME)
  pop1=x3[x3$SUBSPECIES.SCIENTIFIC.NAME==names[1],i]
  pop2=x3[x3$SUBSPECIES.SCIENTIFIC.NAME==names[2],i]
  print(colnames(x3)[i])
  z=t.test(x=pop1,y=pop2,alternative="two.sided",conf.level=0.95)
  
  print(z)
  
  z2=wilcox.test(x=pop1,y=pop2,alternative="two.sided",conf.level=0.95)
  
  print(z2)
}

## [1] "SOURCE"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 4.5826, df = 9, p-value = 0.001323
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3544498 1.0455502
## sample estimates:
## mean of x mean of y 
##       1.7       1.0

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 229.5, p-value = 2.167e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_annualPET"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 7.658, df = 22.159, p-value = 1.153e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  268.4117 467.6660
## sample estimates:
## mean of x mean of y 
##  1647.304  1279.265 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 265, p-value = 1.091e-07
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_aridityIndexThornthwaite"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 2.9329, df = 24.676, p-value = 0.007148
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   4.474285 25.623938
## sample estimates:
## mean of x mean of y 
##  69.44800  54.39889 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 221, p-value = 0.002403
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_climaticMoistureIndex"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = -9.8113, df = 14.521, p-value = 8.568e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5867430 -0.3768125
## sample estimates:
##  mean of x  mean of y 
## -0.0340000  0.4477778

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 0, p-value = 4.157e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_continentality"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 1.7608, df = 10.013, p-value = 0.1087
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1496194  1.2781379
## sample estimates:
## mean of x mean of y 
##  3.105000  2.540741

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 192, p-value = 0.05255
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_embergerQ"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = -7.7642, df = 24.704, p-value = 4.38e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -443.2483 -257.3050
## sample estimates:
## mean of x mean of y 
##  346.0630  696.3396 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 11, p-value = 1.114e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_growingDegDays0"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 5.7098, df = 28.498, p-value = 3.765e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  16239.61 34388.26
## sample estimates:
## mean of x mean of y 
##  97932.60  72618.67 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 255, p-value = 3.778e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_growingDegDays5"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 5.6426, df = 29.423, p-value = 4.059e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  16280.61 34774.81
## sample estimates:
## mean of x mean of y 
##  97932.60  72404.89 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 255, p-value = 3.778e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_maxTempColdest"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 5.8465, df = 19.613, p-value = 1.098e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  45.12414 95.28327
## sample estimates:
## mean of x mean of y 
##  262.5000  192.2963

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 257, p-value = 3.232e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_minTempWarmest"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 4.5908, df = 22.989, p-value = 0.0001292
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  29.74997 78.55374
## sample estimates:
## mean of x mean of y 
##  183.3000  129.1481

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 243, p-value = 0.0002357
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_monthCountByTemp10"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 1.776, df = 26, p-value = 0.08745
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1807469  2.4770432
## sample estimates:
## mean of x mean of y 
##  12.00000  10.85185

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 150, p-value = 0.2949
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_PETColdestQuarter"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 5.4017, df = 13.438, p-value = 0.0001076
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  15.70166 36.51753
## sample estimates:
## mean of x mean of y 
## 123.46700  97.35741 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 266, p-value = 6.89e-08
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_PETDriestQuarter"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 6.9832, df = 19.668, p-value = 9.776e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  25.10025 46.51606
## sample estimates:
## mean of x mean of y 
##  148.0500  112.2419 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 260, p-value = 7.981e-07
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_PETseasonality"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 5.4913, df = 11.997, p-value = 0.0001383
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  308.3805 714.0727
## sample estimates:
## mean of x mean of y 
## 1476.4240  965.1974 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 247, p-value = 2.798e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_PETWarmestQuarter"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 7.0929, df = 22.418, p-value = 3.675e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  23.82179 43.47814
## sample estimates:
## mean of x mean of y 
##   152.847   119.197 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 265, p-value = 1.091e-07
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_PETWettestQuarter"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 9.1267, df = 26.74, p-value = 1.061e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  18.20741 28.77451
## sample estimates:
## mean of x mean of y 
## 122.21800  98.72704 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 266, p-value = 6.89e-08
## alternative hypothesis: true location shift is not equal to 0
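
With this many tests, it can help to collect the \(P\) values into one table and, optionally, adjust them for multiple comparisons. The sketch below uses toy data and Holm's method via p.adjust; the adjustment is my suggestion rather than part of the original analysis, and all names are hypothetical.

```r
# Toy data: two groups of unequal size and two hypothetical variables
set.seed(7)
toy <- data.frame(group = rep(c("g1", "g2"), times = c(10, 12)),
                  v1 = c(rnorm(10, mean = 5), rnorm(12, mean = 0)),
                  v2 = rnorm(22))
vars <- c("v1", "v2")

# Run both tests per variable and keep only the p-values
p_for <- function(v, test) {
  g1 <- toy[[v]][toy$group == "g1"]
  g2 <- toy[[v]][toy$group == "g2"]
  test(x = g1, y = g2, alternative = "two.sided")$p.value
}

res <- data.frame(variable = vars,
                  t_p = sapply(vars, p_for, test = t.test),
                  w_p = sapply(vars, p_for, test = wilcox.test))

# Holm adjustment across the t-test p-values (an assumption, see above)
res$t_p_holm <- p.adjust(res$t_p, method = "holm")
print(res)
```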

Example of divergence:

a=ggplot(data=x,aes(x=current_30arcsec_PETWettestQuarter,
                    y=current_30arcsec_embergerQ,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()

print(a+b+c)

Modeling for past climates

In the loop code above, I projected the MVEs of species occurrence into past climates for the Holocene and the Last Glacial Maximum (LGM). For each period, we can average the projections from the three climate models (CCSM, MIROC, and MPI) to create a “best guess” of the distance of each grid cell to the environmental centroid of a given species.

holo.ccsm=paste0(filepath,"Ecological Analysis/holo-ccsm-mve/",
                 list.files(paste0(filepath,
                                   "Ecological Analysis/holo-ccsm-mve/"),
                            pattern="*.asc"))

holo.miroc=paste0(filepath,"Ecological Analysis/holo-miroc-mve/",
                 list.files(paste0(filepath,
                                   "Ecological Analysis/holo-miroc-mve/"),
                            pattern="*.asc"))

holo.mpi=paste0(filepath,"Ecological Analysis/holo-mpi-mve/",
                 list.files(paste0(filepath,
                                   "Ecological Analysis/holo-mpi-mve/"),
                            pattern="*.asc"))

lgm.ccsm=paste0(filepath,"Ecological Analysis/lgm-ccsm-mve/",
                 list.files(paste0(filepath,
                                   "Ecological Analysis/lgm-ccsm-mve/"),
                            pattern="*.asc"))

lgm.miroc=paste0(filepath,"Ecological Analysis/lgm-miroc-mve/",
                 list.files(paste0(filepath,
                                   "Ecological Analysis/lgm-miroc-mve/"),
                            pattern="*.asc"))

lgm.mpi=paste0(filepath,"Ecological Analysis/lgm-mpi-mve/",
                 list.files(paste0(filepath,
                                   "Ecological Analysis/lgm-mpi-mve/"),
                            pattern="*.asc"))

We now have a list of files for each climate model and period, with the files in the same order in every list. Next, we average these together and save the results.
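
The file lists above can be built more compactly, since list.files can return full paths itself. The sketch below is hedged: the helper name and demo directory are hypothetical. It also relies on the fact that pattern is a regular expression rather than a glob, so the extension is anchored explicitly.

```r
# Hypothetical helper: full paths to the .asc files in one scenario folder
scenario_files <- function(dir) {
  sort(list.files(dir, pattern = "\\.asc$", full.names = TRUE))
}

# Demonstration in a temporary directory with made-up file names
tmp <- file.path(tempdir(), "holo-demo-mve")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("b.asc", "a.asc", "notes.txt"))))

found <- basename(scenario_files(tmp))
print(found)
```

Sorting explicitly makes the assumption that every scenario lists its files in the same order easier to check.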

Holocene Visualizations

#Plot preussi

#avg holo

holo=stack(holo.ccsm[4],holo.miroc[4],holo.mpi[4])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/holo-all-avg/preussi-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

holo=stack(holo.ccsm[3],holo.miroc[3],holo.mpi[3])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/holo-all-avg/preussi-threshold-avg.asc"),
            overwrite=T)

#Plot genderuensis

#avg holo

holo=stack(holo.ccsm[2],holo.miroc[2],holo.mpi[2])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/holo-all-avg/genderuensis-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

holo=stack(holo.ccsm[1],holo.miroc[1],holo.mpi[1])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/holo-all-avg/genderuensis-threshold-avg.asc"),
            overwrite=T)

#Plot reichenowi

#avg holo

holo=stack(holo.ccsm[6],holo.miroc[6],holo.mpi[6])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/holo-all-avg/reichenowi-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

holo=stack(holo.ccsm[5],holo.miroc[5],holo.mpi[5])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/holo-all-avg/reichenowi-threshold-avg.asc"),
            overwrite=T)

Last Glacial Maximum Visualizations

#Plot preussi

#avg lgm

lgm=stack(lgm.ccsm[4],lgm.miroc[4],lgm.mpi[4])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/lgm-all-avg/preussi-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

lgm=stack(lgm.ccsm[3],lgm.miroc[3],lgm.mpi[3])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/lgm-all-avg/preussi-threshold-avg.asc"),
            overwrite=T)

#Plot genderuensis

#avg lgm

lgm=stack(lgm.ccsm[2],lgm.miroc[2],lgm.mpi[2])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/lgm-all-avg/genderuensis-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

lgm=stack(lgm.ccsm[1],lgm.miroc[1],lgm.mpi[1])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/lgm-all-avg/genderuensis-threshold-avg.asc"),
            overwrite=T)

#Plot reichenowi

#avg lgm

lgm=stack(lgm.ccsm[6],lgm.miroc[6],lgm.mpi[6])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/lgm-all-avg/reichenowi-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

lgm=stack(lgm.ccsm[5],lgm.miroc[5],lgm.mpi[5])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/lgm-all-avg/reichenowi-threshold-avg.asc"),
            overwrite=T)
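
Each average above is a cell-by-cell mean across the three climate-model projections. As a small illustration of that arithmetic, the sketch below uses plain matrices in place of raster layers; the function name and values are hypothetical.

```r
# Cell-wise mean of equally sized layers (matrices stand in for rasters)
cellwise_mean <- function(layers) {
  Reduce(`+`, layers) / length(layers)
}

# Three toy 2 x 2 "projections" of distance to the niche centroid
m1 <- matrix(c(1, 2, 3, 4), nrow = 2)
m2 <- matrix(c(3, 2, 5, 4), nrow = 2)
m3 <- matrix(c(2, 2, 1, 4), nrow = 2)

avg <- cellwise_mean(list(m1, m2, m3))
print(avg)
```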

Path of most likely colonization

Each subspecies tells us something about the colonization path across Africa. We can similarly average these scenarios together to understand where, exactly, the species most likely crossed Africa.

holo=paste0(filepath,"Ecological Analysis/holo-all-avg/",
            list.files(paste0(filepath,
                              "Ecological Analysis/holo-all-avg/")))

lgm=paste0(filepath,"Ecological Analysis/lgm-all-avg/",
            list.files(paste0(filepath,
                              "Ecological Analysis/lgm-all-avg/")))

HOLOCENE

#Average all occurrence

holo2=stack(holo[1],holo[3],holo[5])

y=mean(holo2)
plot(y)
writeRaster(y,paste0(filepath,"Ecological Analysis/holo-all-avg/holo-avg.asc"))

#Average all threshold

holo2=stack(holo[2],holo[4],holo[6])

y=mean(holo2)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/holo-all-avg/holo-threshold-avg.asc"))

LGM

#Average all occurrence

lgm2=stack(lgm[1],lgm[3],lgm[5])

y=mean(lgm2)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/lgm-all-avg/all-avg.asc"))

#Average all threshold

lgm2=stack(lgm[2],lgm[4],lgm[6])

y=mean(lgm2)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/lgm-all-avg/all-threshold-avg.asc"))

Outlier Identification

Per the morphological data, the following males are outliers in the genderuensis dataset: RMCA 75-3-A-438 and MNMH 1971.637.

34  genderuensis  RMCA   75-3-A-438      Adamawa  Male  -0.006837937
25  genderuensis  MNMH   1971.637        Yaounde  Male   0.002249162
35  genderuensis  RMCA   75-3-A-451      Adamawa  Male   0.016450666
44  genderuensis  ZMB    75/80           Yaounde  Male   0.019065758
33  genderuensis  NHMUK  1940.2.8.63     Tibati   Male   0.021588830
31  genderuensis  NHMUK  1922.11.25.216  Tibati   Male   0.022491253

x=read.csv(paste0(filepath,
                  "Ecological Analysis/envirem_extracts_PCA.csv"))

colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
names(colorset)=c("reichenowi","preussi",
                  "genderuensis","parvirostris","Unknown")
colScale=scale_color_manual(name="grp",values=colorset)

a=ggplot(x,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME,shape=SOURCE))
b=theme(panel.background = element_rect(fill="white",color = "grey50"),
        axis.title.x = element_text(size=20),
        axis.title.y = element_text(size=20),
        axis.text.x = element_text(size=15),
        axis.text.y = element_text(size=15),
        legend.title = element_blank(),
        legend.text = element_text(size=15))
c=geom_point(size=1.5)
d=stat_ellipse()
e=colScale

plot1=a+b+c+d+e
print(plot1)

## Too few points to calculate an ellipse

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Too few points to calculate an ellipse

## Warning: Removed 2 row(s) containing missing values (geom_path).

Which environmental points are the outliers? For C. [r.] genderuensis, two specimens and an eBird record appear to show the most overlap:

x2=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="genderuensis"),]
x2=x2[order(x2$PC1,decreasing = T),]
x2[c(1:3,nrow(x2)),c(1,3,2,4,21,22)]

##     SUBSPECIES.SCIENTIFIC.NAME LATITUDE LONGITUDE   SOURCE         PC1
## 127               genderuensis 7.256281  12.06172 Specimen -0.02677265
## 35                genderuensis 3.888752  11.50979    eBird -0.07307286
## 128               genderuensis 2.836032  11.16286 Specimen -0.07366380
## 36                genderuensis 8.210441  13.81760    eBird -0.22744424
##            PC2
## 127 0.01861294
## 35  0.16000832
## 128 0.17733891
## 36  0.05241527

The leftmost point (lowest PC1) is in Benoue National Park, and may come from a general park checklist. The other points are the supposed location of Genderu Mountain (the type locality), Yaounde, and Ebolowa, to the south of Yaounde. Since the location of Genderu Mountain is assumed from the notes of a single specimen and the Benoue locality is possibly park-wide, I am removing these two points.

Reload the x data frame from the original extract .csv, and then remove the two rows of interest.

xx=x[-c(36,127),]
ext=xx[,-c(1:4)]

#Perform PCA of environmental data
rda.x=rda(ext,scale=T)
rda.x.data=rda.x$CA$u

eigs=rda.x$CA$eig
w=NULL
for(i in 1:length(eigs)){
  #print(eigs[i]/sum(eigs))
  w[i]=eigs[i]/sum(eigs)
}

plot(x=1:length(w),y=w,pch=19,main="PCA Eigenvalues")
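
The proportion of variance per axis can also be computed without a loop. The sketch below uses prcomp on toy data as a stand-in for the rda call, on the assumption that an unconstrained, scaled rda is an ordinary PCA of the correlation matrix. The toy layers are hypothetical.

```r
# Toy environmental table with four hypothetical layers
set.seed(3)
env <- as.data.frame(matrix(rnorm(50 * 4), ncol = 4))
colnames(env) <- paste0("layer", 1:4)

# Scaled PCA; squared standard deviations are the eigenvalues
pca <- prcomp(env, center = TRUE, scale. = TRUE)
eigs <- pca$sdev^2

# Vectorized proportion of variance explained by each axis
w <- eigs / sum(eigs)
print(round(w, 3))
```

The same one-line division would work on the eigenvalue vector extracted above, since the proportions do not depend on how the eigenvalues are scaled.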

Here, x is redefined as the new dataset without those two points.

x=cbind(xx,rda.x.data)

colorset=c("#000000","#1f2887","#e31a1c","#1f9eff","#ffb6c1")
names(colorset)=c("reichenowi","preussi",
                  "genderuensis","parvirostris","Unknown")
colScale=scale_color_manual(name="grp",values=colorset)

a=ggplot(x,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME,shape=SOURCE))
b=theme(panel.background = element_rect(fill="white",color = "grey50"),
        axis.title.x = element_text(size=20),
        axis.title.y = element_text(size=20),
        axis.text.x = element_text(size=15),
        axis.text.y = element_text(size=15),
        legend.title = element_blank(),
        legend.text = element_text(size=15))
c=geom_point(size=1.5)
d=stat_ellipse()
e=colScale

plot1=a+b+c+d+e
print(plot1)

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Warning: Removed 2 row(s) containing missing values (geom_path).

contrib=rda.x$CA$v

#Isolate most important PC's
contrib2=contrib[,1:3]

xx=rowSums(abs(contrib2))
print(xx[order(xx,decreasing = T)])

##           current_30arcsec_continentality 
##                                 1.0071651 
##           current_30arcsec_PETseasonality 
##                                 0.9241937 
##           current_30arcsec_minTempWarmest 
##                                 0.8017752 
##        current_30arcsec_PETWettestQuarter 
##                                 0.7830133 
##                current_30arcsec_embergerQ 
##                                 0.6886587 
## current_30arcsec_aridityIndexThornthwaite 
##                                 0.6650488 
##          current_30arcsec_thermicityIndex 
##                                 0.6219764 
##    current_30arcsec_climaticMoistureIndex 
##                                 0.6175901 
##           current_30arcsec_maxTempColdest 
##                                 0.6173110 
##          current_30arcsec_growingDegDays0 
##                                 0.6161763 
##          current_30arcsec_growingDegDays5 
##                                 0.6113469 
##        current_30arcsec_PETColdestQuarter 
##                                 0.6033732 
##                current_30arcsec_annualPET 
##                                 0.6008060 
##        current_30arcsec_PETWarmestQuarter 
##                                 0.4926509 
##         current_30arcsec_PETDriestQuarter 
##                                 0.4639028 
##       current_30arcsec_monthCountByTemp10 
##                                 0.3599580

From above, we get that the most important variables for the first three PC’s are:

Continentality
PETseasonality
minTempWarmest
PETWettestQuarter
embergerQ
aridityIndexThornthwaite
thermicity
climaticMoistureIndex
maxTempColdest
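The ranking above sums the absolute loadings of each variable across the first three axes. Below is a minimal sketch of the same idea on simulated data, assuming base R's prcomp as a stand-in for the vegan::rda call used in this appendix. The toy variables v1 through v4 are hypothetical.

```r
# Toy data: four hypothetical variables, two of them strongly correlated
set.seed(42)
n <- 50
shared <- rnorm(n)
toy <- data.frame(v1 = shared,
                  v2 = shared + rnorm(n, sd = 0.1),
                  v3 = rnorm(n),
                  v4 = rnorm(n))

# Scaled PCA; the columns of $rotation are the variable loadings
pca <- prcomp(toy, scale. = TRUE)

# Proportion of variance explained by each axis
prop_var <- pca$sdev^2 / sum(pca$sdev^2)

# Summed absolute loadings over the first three axes, largest first
contrib <- rowSums(abs(pca$rotation[, 1:3]))
contrib <- contrib[order(contrib, decreasing = TRUE)]

print(round(prop_var, 3))
print(contrib)
```

Summing absolute loadings treats the three axes as equally important; weighting each axis by its proportion of variance would be another reasonable choice.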

Everything after layer 5 plateaus in terms of its contribution. I am repeating the same steps for removing correlated layers here as I did in the other part of the analysis. The code is executed but hidden. Because we have fewer points in this iteration, we will use the first six layers.

We can also look at the correlation between layers to determine what should be removed:

I am modeling this section with the same data layers as the previous modeling iteration, due in part to the similarity of these layers in their importance.

monthCountByTemp10 (count, not continuous)
growingDegDays0 (count, not continuous)
growingDegDays5 (count, not continuous)
annualPET (% with wettestquarter)
thermicity (% with minTempWarmest)
climaticMoistureIndex (% with embergerQ)
PETWarmestQuarter (% with PETWettestQuarter)
PETColdestQuarter (% with maxTempColdest)
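The correlation screening mentioned above is run in a hidden chunk. Below is a minimal sketch of how such a screen might look on simulated data. The 0.8 cutoff and the layer names are assumptions for illustration, not values taken from this analysis.

```r
set.seed(1)
n <- 100
shared <- rnorm(n)
layers <- data.frame(
  layerA = shared,
  layerB = shared + rnorm(n, sd = 0.05),  # nearly duplicates layerA
  layerC = rnorm(n),
  layerD = rnorm(n)
)

# Pairwise Pearson correlations between layers
cors <- cor(layers, method = "pearson")

# Keep each pair once (upper triangle) and flag |r| above a hypothetical cutoff
cutoff <- 0.8
idx <- which(abs(cors) > cutoff & upper.tri(cors), arr.ind = TRUE)
flagged <- data.frame(var1 = rownames(cors)[idx[, "row"]],
                      var2 = colnames(cors)[idx[, "col"]],
                      r = cors[idx])
print(flagged)
```

For each flagged pair, one layer would then be dropped, for example the one with the lower summed loading.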

Ecological Niche Modeling

Current models are restricted to the following area:

## Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read

Current Environmental Conditions

# rasterpath="path/to/envirem_africa/Africa_current_2.5arcmin_generic/"

y=list.files(rasterpath,pattern="\\.bil$")[-c(1,3,6,7,10,11,13,14,16)]

Note that we do not have any aux files at the present time, so I do not need to omit them from the file list.

Minimum Volume Ellipsoids

This code is based on code from Dr. Jorge Soberón. It will create models for all time periods at the same time.

#Mahalanobis distance function ("maja")
#p = point, m = centroid, s = inverse of the covariance matrix
maja=function(p,m,s)((p-m)%*%s%*%t(p-m))^0.5

#Quantile function
#Returns the number of points (out of nD) that corresponds to the proportion "level"
NDquantil=function(nD,level){
  return(round(nD*level))
}

These minimum volume ellipsoids are less sensitive to point density, and do not rely on pseudoabsence data for determining where species do not occur.
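As a minimal sketch of how the ellipsoid and the distance function fit together, the example below fits an MVE to simulated values and checks the row-by-row distances against base R's mahalanobis(). The simulated matrix and the names n_use, d_loop, and d_vec are hypothetical.

```r
library(MASS)  # provides cov.mve

# Same distance helper as above: s must be an inverse covariance matrix
maja <- function(p, m, s) ((p - m) %*% s %*% t(p - m))^0.5

set.seed(10)
# Hypothetical environmental values at 60 occurrence points, 3 variables
vals <- matrix(rnorm(180), ncol = 3)

# Search for the smallest ellipsoid covering 90% of the points
n_use <- round(nrow(vals) * 0.9)
mve <- cov.mve(vals, quantile.used = n_use)

mu <- matrix(mve$center, nrow = 1)
inv_cov <- solve(mve$cov)

# Row-by-row distances, in the same way as the loops in this appendix
d_loop <- apply(vals, 1, function(p) maja(p, mu, inv_cov))

# Vectorized equivalent: mahalanobis() returns squared distances
d_vec <- sqrt(mahalanobis(vals, center = mve$center, cov = mve$cov))

print(all.equal(as.numeric(d_loop), as.numeric(d_vec)))
```

On large rasters the vectorized form should be considerably faster than looping over cells, although rows containing NA would need to be handled first.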

genderu=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="genderuensis"),]
reich=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="reichenowi"),]
preuss=x[which(x$SUBSPECIES.SCIENTIFIC.NAME=="preussi"),]

y1=stack(paste0(rasterpath,y))
y=y1

#Create function for individual plot formation
ssp.plot=function(ssp,ssp.text){
  vals=extract(x=y,y=ssp[,2:3])
  vals=na.omit(vals)
  vals=unique(vals)
  #vals=vals[,-10]

  n1=NDquantil(nrow(vals),0.9)

  #for(i in 1:ncol(vals)){print(IQR(vals[,i]))}

  mve1=cov.mve(vals,quantile.use=n1)

  nc=ncell(y)
  
  mu1=matrix(mve1$center,nrow=1)
  s1=mve1$cov
  invs1=solve(s1)

  dT1=matrix(0,ncol=1,nrow=nc)
  
  #Load values for current time period
  
  valsT=extract(x=y,y=seq(from=1,to=nc,by=1))
  
  #Create current models
  
  valsT1=as.matrix(valsT)
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
      
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y),ncol=ncol(y),
         ext=extent(y),resolution=res(y),vals=dT1)
  setwd(paste0(filepath,
               'Ecological Analysis/no-out/raw-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  plot(q)
  
  ext=extract(x=q,y=ssp[,2:3])
  ext2=na.omit(ext)

  #Remove the furthest 20% to reflect issues with plotting in eBird
  #threshold based on these values
  
  ext2=ext2[order(ext2)]
  cutoff=round(0.8*length(ext2))
  ext3=ext2[1:cutoff]
  
  ##The following sets binary presence to everything above 1.5 sd below mean of occurrence
  #n=max(ext2)
  #ND=(round(n*0.95))
  #m=c(NA,NA,NA,0,ND,1,ND,Inf,0)
  #m=matrix(m,ncol=3,byrow=T)
  #rc=reclassify(q,m)
  #y2=y[which(ext>ND),]

  #Everything up to 1.5 sd above the mean included

  #ext2
  sdext=sd(ext3)
  mext=mean(ext3)
  ND=mext+1.5*sdext
  m=c(NA,NA,NA,0,ND,1,ND,Inf,0)
  
  #Current threshold
  ##Used only here; hierarchical for other parts
  
  m=matrix(m,ncol=3,byrow=T)
  rc=reclassify(q,m)
  y2=y[which(ext>ND),]

  #Create color threshold for past models
  #color change for every standard deviation
  
  #New threshold on current conditions, then hierarchical
  #Created here, executed further down
  
  m2=m
  
  m=c(NA,NA,NA,
      0,(mext+(1.5*sdext)),1,
      (mext+(1.5*sdext)),(mext+(3*sdext)),2,
      (mext+(3*sdext)),(mext+(6*sdext)),3,
      (mext+(6*sdext)),(mext+(12*sdext)),4,
      (mext+(12*sdext)),(mext+(24*sdext)),5,
      (mext+(24*sdext)),Inf,6)
  
  m=matrix(m,ncol=3,byrow=T)
  
  species=ssp.text

  if(nrow(y2)!=0){
    setwd(paste0(filepath,
                 "Ecological Analysis/no-out/threshold-mve/"))
    write.csv(y2,file=paste0(species,'_out.csv'),quote=F,row.names=F)
  }
  pathway=paste0(filepath,
                 "Ecological Analysis/no-out/threshold-mve/",
                 species,".asc",sep="")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  points(ssp[,2:3],pch=19,col="black")

  #threshold classify tier
  thresh=reclassify(q,m)
  pathway=paste0(filepath,
                 "Ecological Analysis/no-out/threshold-mve/",
                 species,"-tier.asc",sep="")
  writeRaster(thresh,pathway,overwrite=T)
  plot(thresh)
  rm(thresh)
  
  #Create color bands of how far it is from center
    
  #Holocene
  ##CCSM
  
  rm(valsT1)
  
  y.l=stack(paste0(holopath1,holo1))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,
               'Ecological Analysis/no-out/holo-ccsm-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("Holocene CCSM")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,
                 "Ecological Analysis/no-out/holo-ccsm-mve/",
                 species,"_threshold.asc",sep="")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
  
  ##miroc
  
  rm(valsT1)
  
  y.l=stack(paste0(holopath2,holo2))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,
               'Ecological Analysis/no-out/holo-miroc-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("Holocene MIROC")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,
                 "Ecological Analysis/no-out/holo-miroc-mve/",
                 species,"_threshold.asc",sep="")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
  
  ##mpi
  
  rm(valsT1)
  
  y.l=stack(paste0(holopath3,holo3))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,
        'Ecological Analysis/no-out/holo-mpi-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("Holocene MPI")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,
                 "Ecological Analysis/no-out/holo-mpi-mve/",
                 species,"_threshold.asc",sep="")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
  
  #Last Glacial Maximum
  ##CCSM
  
  rm(valsT1)
  
  y.l=stack(paste0(lgmpath1,lgm1))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,
               'Ecological Analysis/no-out/lgm-ccsm-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("Last Glacial Maximum CCSM")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,
                 "Ecological Analysis/no-out/lgm-ccsm-mve/",
                 species,"_threshold.asc",sep="")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
  
  ##miroc
  
  rm(valsT1)
  
  y.l=stack(paste0(lgmpath2,lgm2))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,
        'Ecological Analysis/no-out/lgm-miroc-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("LGM MIROC")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,
                 "Ecological Analysis/no-out/lgm-miroc-mve/",
                 species,"_threshold.asc",sep="")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
  
  ##mpi
  
  rm(valsT1)
  
  y.l=stack(paste0(lgmpath3,lgm3))
  nc=ncell(y.l)
  
  valsT=extract(x=y.l,y=seq(from=1,to=nc,by=1))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  valsT1=as.matrix(valsT)
  rm(valsT)
  
  mu2=as.matrix(mu1)
  invs2=as.matrix(invs1)
  
  for(j in 1:nrow(valsT1)){
    dT1[j,1]=maja(valsT1[j,],mu2,invs2)
  }
    
  q=raster(nrow=nrow(y.l),ncol=ncol(y.l),
         ext=extent(y.l),resolution=res(y.l),vals=dT1)
  setwd(paste0(filepath,
               'Ecological Analysis/no-out/lgm-mpi-mve/'))
  #sp1=strsplit(sp,'[.]')[[1]][1]
  sp1=ssp.text
  writeRaster(q,filename=sp1,format='ascii',overwrite=T)
  print("LGM MPI")
  plot(q)

  #Threshold on current conditions
  rc=reclassify(q,m)
  
  rm(q)
  
  species=ssp.text

  pathway=paste0(filepath,
                 "Ecological Analysis/no-out/lgm-mpi-mve/",
                 species,"_threshold.asc",sep="")
  writeRaster(rc,pathway,overwrite=T)

  plot(rc)
  #points(ssp[,2:3],pch=19,col="black")
}
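The thresholding inside ssp.plot can be sketched on a plain numeric vector, without any rasters. This is a minimal illustration on simulated distances, assuming intervals that are closed on the right (which I understand to be the default behavior of raster::reclassify). The vector d and the other names are hypothetical.

```r
set.seed(3)
# Hypothetical distances-to-centroid at 50 occurrence points (all positive)
d <- rexp(50)

# Keep the closest 80% of points before computing the threshold
d_sorted <- sort(d)
kept <- d_sorted[seq_len(round(0.8 * length(d_sorted)))]

m_kept <- mean(kept)
s_kept <- sd(kept)
ND <- m_kept + 1.5 * s_kept

# Binary map: 1 inside the threshold, 0 beyond it
binary <- as.integer(d <= ND)

# Tiered map: band edges double with each tier (1.5, 3, 6, 12, 24 sd)
breaks <- c(0, m_kept + c(1.5, 3, 6, 12, 24) * s_kept, Inf)
tier <- cut(d, breaks = breaks, labels = FALSE, right = TRUE)

print(table(binary))
print(table(tier))
```

Tier 1 covers the same interval as the binary presence class, which is why the tiered matrix can be reused to threshold the Holocene and Last Glacial Maximum projections on current conditions.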

Now, to perform individual iterations of the MVE script.

ssp.plot(ssp=preuss,ssp.text="preussi")

ssp.plot(ssp=reich,ssp.text="reichenowi")

ssp.plot(ssp=genderu,ssp.text="genderuensis")

Niche comparisons of the different populations

nc=ncell(y)
valsT=extract(x=y,y=seq(from=1,to=nc,by=1))
valsT1=as.matrix(valsT)

randomizer=function(data,type,sp.text){
  # set to GISpath
  x=readShapePoly(paste0(GISpath,sp.text,'.shp'))
  
  dT1=matrix(0,ncol=1,nrow=nc)

  nx=nrow(data)
  for(i in 1:100){
    yy=spsample(x=x,n=nx,type=type)
    
    #Alternate method, not as effective
    #yy=randomPoints(mask=x,n=nrow(data),
    #                p=data[,2:3],excludep=T,
    #                cellnumbers=F,tryf=5)
    
    yy2=as.data.frame(coordinates(yy))
    colnames(yy2)=c("Long","Lat")
    yy2$Population=sp.text
    yy2=yy2[,c('Population','Long','Lat')]
    
    vals=extract(x=y,y=yy2[,2:3])
    vals=na.omit(vals)
    vals=unique(vals)
    #vals=vals[,-10]

    n1=NDquantil(nrow(vals),0.9)

    #for(i in 1:ncol(vals)){print(IQR(vals[,i]))}

    mve1=cov.mve(vals,quantile.use=n1)

    mu1=matrix(mve1$center,nrow=1)
    s1=mve1$cov
    invs1=solve(s1)
    
    dT1=matrix(0,ncol=1,nrow=nc)
    
    mu2=as.matrix(mu1)
    invs2=as.matrix(invs1)
      
    for(j in 1:nrow(valsT1)){
      dT1[j,1]=maja(valsT1[j,],mu2,invs2)
    }
    
    q=raster(nrow=nrow(y),ncol=ncol(y),
         ext=extent(y),resolution=res(y),vals=dT1)
    setwd(paste0(filepath,
                 'Ecological Analysis/no-out/random/',sp.text,'/'))
    sp1=sp.text
    write.csv(yy2,file=paste0(sp.text,"_random-",i,'.csv'),quote=F,row.names=F)
    writeRaster(q,filename=paste0(sp.text,"_random-",i),format='ascii',overwrite=T)
    
    #plot(q)
  }
}

randomizer(data=reich,type='random',sp.text="reichenowi")
randomizer(data=preuss,type='random',sp.text="preussi")
randomizer(data=genderu,type='random',sp.text="genderuensis")

Now, to compare niche distributions. First, we must reduce the datasets down to the number of points being used to train the above models.

#restrict to closest 80% of points to centroid for comparisons, just like models

#reichenowi
r.q=raster(paste0(filepath,
                  "Ecological Analysis/no-out/raw-mve/reichenowi.asc"))
reich$r.dist=extract(r.q,reich[,2:3])
hist(reich$r.dist)

reich=reich[order(reich$r.dist),]
r.pt=round(nrow(reich)*0.8)

plot(r.q)
points(reich[1:r.pt,2:3],col="black",pch=19)
points(reich[r.pt:nrow(reich),2:3],col="red",pch=19)

reich2=reich[1:r.pt,]

#preussi
r.q=raster(paste0(filepath,
                  "Ecological Analysis/no-out/raw-mve/preussi.asc"))
preuss$r.dist=extract(r.q,preuss[,2:3])
hist(preuss$r.dist)

preuss=preuss[order(preuss$r.dist),]
r.pt=round(nrow(preuss)*0.8)

plot(r.q)
points(preuss[1:r.pt,2:3],col="black",pch=19)
points(preuss[r.pt:nrow(preuss),2:3],col="red",pch=19)

preuss2=preuss[1:r.pt,]

#genderuensis
r.q=raster(paste0(filepath,
                  "Ecological Analysis/no-out/raw-mve/genderuensis.asc"))
genderu$r.dist=extract(r.q,genderu[,2:3])
colnames(genderu)

##  [1] "SUBSPECIES.SCIENTIFIC.NAME"               
##  [2] "LONGITUDE"                                
##  [3] "LATITUDE"                                 
##  [4] "SOURCE"                                   
##  [5] "current_30arcsec_annualPET"               
##  [6] "current_30arcsec_aridityIndexThornthwaite"
##  [7] "current_30arcsec_climaticMoistureIndex"   
##  [8] "current_30arcsec_continentality"          
##  [9] "current_30arcsec_embergerQ"               
## [10] "current_30arcsec_growingDegDays0"         
## [11] "current_30arcsec_growingDegDays5"         
## [12] "current_30arcsec_maxTempColdest"          
## [13] "current_30arcsec_minTempWarmest"          
## [14] "current_30arcsec_monthCountByTemp10"      
## [15] "current_30arcsec_PETColdestQuarter"       
## [16] "current_30arcsec_PETDriestQuarter"        
## [17] "current_30arcsec_PETseasonality"          
## [18] "current_30arcsec_PETWarmestQuarter"       
## [19] "current_30arcsec_PETWettestQuarter"       
## [20] "current_30arcsec_thermicityIndex"         
## [21] "PC1"                                      
## [22] "PC2"                                      
## [23] "PC3"                                      
## [24] "PC4"                                      
## [25] "PC5"                                      
## [26] "PC6"                                      
## [27] "PC7"                                      
## [28] "PC8"                                      
## [29] "PC9"                                      
## [30] "PC10"                                     
## [31] "PC11"                                     
## [32] "PC12"                                     
## [33] "PC13"                                     
## [34] "PC14"                                     
## [35] "PC15"                                     
## [36] "PC16"                                     
## [37] "r.dist"

hist(genderu$r.dist)

genderu=genderu[order(genderu$r.dist),]
r.pt=round(nrow(genderu)*0.8)

plot(r.q)
points(genderu[1:r.pt,2:3],col="black",pch=19)
points(genderu[r.pt:nrow(genderu),2:3],col="red",pch=19)

genderu2=genderu[1:r.pt,]

And now to perform the tests.

filepath2=paste0(filepath,"Ecological Analysis/no-out/random/")
splist=list.files(filepath2)

#comparisons=matrix(nrow=100,ncol=2,data=NA)

truecomps=-99
truelists=matrix(nrow=100,ncol=1,data=-99)

for(i in 1:length(splist)){
  sp=splist[i]
  splist2=splist[-i]
  
  null.x=raster(paste0(filepath,
                       "Ecological Analysis/no-out/raw-mve/",sp,".asc"))
  comparisons=matrix(nrow=100,ncol=2,data=NA)
 
  comparisons=as.data.frame(comparisons)
  
  compvals=NULL
  
  for(j in 1:length(splist2)){
    comparelist=list.files(paste0(filepath2,splist2[j],"/"),
                           pattern="*.asc")
    
    true2=raster(paste0(filepath,
                        "Ecological Analysis/no-out/raw-mve/",
                        splist2[j],".asc"))
      
    compvals=NULL
  
    for(k in 1:length(comparelist)){
     rando=raster(paste0(filepath2,splist2[j],"/",comparelist[k]))
     compvals[k]=nicheOverlap(x=null.x,y=rando,stat="D")
    }
    
    comparisons[,j]=compvals
    colnames(comparisons)[j]=paste0(splist[i],"-",splist2[j])
    
    truecomps=c(truecomps,
                nicheOverlap(x=null.x,y=true2,stat="D"))
    
  }
  
  truelists=cbind(truelists,comparisons)
}

truecomps2=t(as.data.frame(truecomps))
colnames(truecomps2)=colnames(truelists)

fullcomps=rbind(truecomps2,truelists)

write.csv(fullcomps,file=paste0(filepath,
                                "Schoener-first-row-true_no-out.csv"),
          quote=F,row.names=F)
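For reference, the statistic requested with stat="D" is Schoener's D, which is one minus half the summed absolute difference between two surfaces after each is scaled to sum to one. Below is a minimal sketch on plain vectors, assuming that standard definition. The helper schoener_d is hypothetical and is not part of dismo.

```r
# Schoener's D for two non-negative surfaces given as numeric vectors.
# Cells that are NA in either surface are dropped before normalizing.
schoener_d <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  pa <- a[ok] / sum(a[ok])
  pb <- b[ok] / sum(b[ok])
  1 - 0.5 * sum(abs(pa - pb))
}

s1 <- c(0.1, 0.4, 0.3, 0.2, NA)
s2 <- c(0.2, 0.3, 0.3, 0.2, 0.5)

print(schoener_d(s1, s1))            # identical surfaces
print(schoener_d(c(1, 0), c(0, 1)))  # surfaces with no shared cells
print(schoener_d(s1, s2))
```

Note that in this appendix the rasters being compared are the raw distance surfaces written to raw-mve and random, so the overlap values describe similarity among distance surfaces rather than among suitability scores.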

We can now look at and compare the niche models derived from the MVE envelopes of where these species occur.

x=read.csv(paste0(filepath,"Schoener-first-row-true_no-out.csv"))
x=x[,-1]
head(x)

##   genderuensis.preussi genderuensis.reichenowi preussi.genderuensis
## 1            0.7309587               0.7937367            0.7309587
## 2            0.7617447               0.7717766            0.7252408
## 3            0.7467540               0.7633932            0.6425719
## 4            0.7207890               0.7765976            0.6421447
## 5            0.7433229               0.7750977            0.8138083
## 6            0.7210789               0.7784997            0.5783603
##   preussi.reichenowi reichenowi.genderuensis reichenowi.preussi
## 1          0.6985322               0.7937367          0.6985322
## 2          0.7824307               0.5872728          0.7628152
## 3          0.7780641               0.5528911          0.7304288
## 4          0.7759432               0.6623748          0.7123873
## 5          0.7661090               0.6155956          0.7505732
## 6          0.7628315               0.6677287          0.7309420

We know that the first row holds the “true” comparisons. We can therefore compare these to the full distribution of the randomized comparisons.

datax=matrix(data=NA,nrow=600,ncol=2)
datax=as.data.frame(datax)

colnames(datax)=c("ID","Value")

trues=x[1,c(1,2,6)]

datax$ID[1:100]="genderuensis.preussi"
datax$ID[101:200]="genderuensis.reichenowi"
datax$ID[201:300]="preussi.genderuensis"
datax$ID[301:400]="preussi.reichenowi"
datax$ID[401:500]="reichenowi.genderuensis"
datax$ID[501:600]="reichenowi.preussi"

datax$Value[1:100]=x[-1,1]
datax$Value[101:200]=x[-1,2]
datax$Value[201:300]=x[-1,3]
datax$Value[301:400]=x[-1,4]
datax$Value[401:500]=x[-1,5]
datax$Value[501:600]=x[-1,6]

datax$ID=as.factor(datax$ID)
datax$Value=as.numeric(datax$Value)

We have created a new data frame that is easier to manipulate in ggplot to look at the results. We can now go through things iteratively.
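The same wide-to-long reshaping can be sketched more compactly with utils::stack, which should avoid hard-coding row ranges. This is a minimal illustration on a toy table. The column names a.b and b.a and the values are hypothetical.

```r
# Toy stand-in for x: first row "true" values, remaining rows null replicates
toy <- data.frame(a.b = c(0.70, 0.61, 0.65, 0.68),
                  b.a = c(0.70, 0.72, 0.75, 0.71))

# Drop the first row, then stack the columns into (values, ind) long format
long <- utils::stack(toy[-1, ])
colnames(long) <- c("Value", "ID")
long <- long[, c("ID", "Value")]

str(long)
```

The ID column comes back as a factor whose levels follow the original column order, so no separate as.factor step is needed.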

gen.preus=datax[which(datax$ID=='genderuensis.preussi'|datax$ID=='preussi.genderuensis'),]

inter=trues$genderuensis.preussi

a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
  
print(a+b+b.5+c+d)

gen.preus=datax[which(datax$ID=='genderuensis.reichenowi'|datax$ID=='reichenowi.genderuensis'),]

inter=trues$genderuensis.reichenowi

a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
  
print(a+b+b.5+c+d)

gen.preus=datax[which(datax$ID=='reichenowi.preussi'|datax$ID=='preussi.reichenowi'),]

inter=trues$reichenowi.preussi

a=ggplot(data=gen.preus,aes(x=Value,fill=ID))
b=scale_x_continuous(limits=c(0,1))
b.5=geom_density(alpha=0.6)
c=theme_classic()
d=geom_vline(xintercept=inter,colour="black",linetype="dashed")
  
print(a+b+b.5+c+d)

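We are going to calculate \(P\) values for these distributions and compare them to the test statistic. Each null distribution is approximated as normal, using the mean and standard deviation of its 100 randomized overlaps. The reported value is the lower-tail probability of the observed overlap, so values near 0 and values near 1 both indicate an observed overlap outside the null expectation.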

dist=unique(datax$ID)

trues=x[1,]

for(i in 1:length(dist)){
  distx=dist[i]
  print(paste0("Testing ",distx))
  datadist=datax[which(datax$ID==distx),]
  
  xbar=trues[,which(colnames(trues)==distx)]
  
  mu=mean(datadist$Value)
  sigma=sd(datadist$Value)
  n=nrow(datadist)
  
  z=(xbar-mu)/(sigma/sqrt(n))
  
  lowcrit=qnorm(p=0.025,mean=mu,sd=sigma)
  hicrit=qnorm(p=0.975,mean=mu,sd=sigma)
  
  if(xbar<lowcrit){
    print("Test statistic below low critical value.")
    print(paste0(lowcrit,"; statistic = ",xbar))
    }
  if(xbar>hicrit){
    print("Test statistic above high critical value.")
    print(paste0(hicrit,"; statistic = ",xbar))
    }
  
  print(paste0("P value for ",distx,
               " = ",pnorm(xbar,
                           mean=mu,sd=sigma)))
  print(paste(" "))
  print(paste(" "))
}

## [1] "Testing genderuensis.preussi"
## [1] "P value for genderuensis.preussi = 0.2387014523214"
## [1] " "
## [1] " "
## [1] "Testing genderuensis.reichenowi"
## [1] "Test statistic above high critical value."
## [1] "0.790069175213148; statistic = 0.793736744682844"
## [1] "P value for genderuensis.reichenowi = 0.990289874434537"
## [1] " "
## [1] " "
## [1] "Testing preussi.genderuensis"
## [1] "P value for preussi.genderuensis = 0.570069422343806"
## [1] " "
## [1] " "
## [1] "Testing preussi.reichenowi"
## [1] "Test statistic below low critical value."
## [1] "0.753354646345893; statistic = 0.698532194352503"
## [1] "P value for preussi.reichenowi = 1.36471563895498e-10"
## [1] " "
## [1] " "
## [1] "Testing reichenowi.genderuensis"
## [1] "Test statistic above high critical value."
## [1] "0.720624997483034; statistic = 0.793736744682844"
## [1] "P value for reichenowi.genderuensis = 0.999835050352036"
## [1] " "
## [1] " "
## [1] "Testing reichenowi.preussi"
## [1] "P value for reichenowi.preussi = 0.0397011642748739"
## [1] " "
## [1] " "
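The values above rely on a normal approximation of each null distribution. An alternative that avoids that assumption is an empirical, rank-based \(P\) value. The sketch below is a swapped-in technique shown for comparison only, not the calculation used in this appendix. The function empirical_p and the toy null values are hypothetical.

```r
# Empirical P values for one observed statistic against null replicates.
# The +1 terms count the observed value as one of the replicates.
empirical_p <- function(obs, null) {
  n <- length(null)
  lower <- (sum(null <= obs) + 1) / (n + 1)
  upper <- (sum(null >= obs) + 1) / (n + 1)
  c(lower = lower, upper = upper,
    two.sided = min(1, 2 * min(lower, upper)))
}

# 20 hypothetical null overlap values
null_vals <- seq(0.60, 0.79, by = 0.01)

print(empirical_p(obs = 0.595, null = null_vals))  # below every null value
print(empirical_p(obs = 0.80, null = null_vals))   # above every null value
```

With 100 randomizations, the smallest one-sided value this approach can return is 1/101, so it cannot produce extremely small probabilities like some of those printed above.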

One last thing, visualizing the PC plots from the ENVIREM extracts.

x=read.csv(paste0(filepath,
                  "Ecological Analysis/envirem_extracts_no-outlier_PCA.csv"))

a=ggplot(data=x,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()

print(a+b+c)

a=ggplot(data=x,aes(x=PC3,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()

print(a+b+c)

for(i in 4:19){
  x[,i]=as.numeric(x[,i])
  
  nombre=colnames(x)[i]
  
  a=ggplot(data=x,aes(y=x[,i],x=SUBSPECIES.SCIENTIFIC.NAME))
  b=geom_boxplot(notch=T)
  c=theme_classic()
  d=ggtitle(paste(nombre))
  
  print(a+b+c+d)
}

## notch went outside hinges. Try setting notch=FALSE.

Lastly, I am going to do t-tests comparing the two Cameroonian populations to each other to see if they differ significantly in any aspects.

x2=x[x$SUBSPECIES.SCIENTIFIC.NAME=="genderuensis"|x$SUBSPECIES.SCIENTIFIC.NAME=="preussi",1:19]

rda.x=rda(x2[,-c(1:3)],scale=T)
rda.x.data=rda.x$CA$u

x3=cbind(x2,rda.x.data)

a=ggplot(x3,aes(x=PC1,y=PC2,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()

print(a+b+c)

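Now for iterative tests. I’m assuming random distribution, but unequal population sizes; each variable is compared using Welch’s t-test (the default in t.test, which does not assume equal variances) and a Wilcoxon rank sum test. Column 4 is SOURCE, a categorical field, so the first pair of tests below is not an environmental comparison.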
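As a minimal sketch of this kind of comparison, the example below runs both tests on simulated data and collects the \(P\) values in one table. The toy variables and group sizes are hypothetical. The Holm adjustment is an added suggestion for handling many variables at once, not part of the original analysis.

```r
set.seed(7)
# Two hypothetical populations of unequal size, three toy variables
toy <- data.frame(
  pop = rep(c("A", "B"), times = c(8, 27)),
  v1 = c(rnorm(8, mean = 5), rnorm(27, mean = 0)),  # large true difference
  v2 = rnorm(35),
  v3 = rnorm(35)
)

vars <- c("v1", "v2", "v3")
res <- data.frame(variable = vars, welch.p = NA_real_, wilcox.p = NA_real_)

for (k in seq_along(vars)) {
  g1 <- toy[toy$pop == "A", vars[k]]
  g2 <- toy[toy$pop == "B", vars[k]]
  # t.test defaults to Welch's test (var.equal = FALSE)
  res$welch.p[k] <- t.test(g1, g2, alternative = "two.sided")$p.value
  res$wilcox.p[k] <- wilcox.test(g1, g2, alternative = "two.sided")$p.value
}

# Holm adjustment across the variables tested
res$welch.holm <- p.adjust(res$welch.p, method = "holm")
print(res)
```

Collecting the results in a data frame makes it easier to scan many variables at once than reading the full printed test output.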

for(i in 4:19){
  names=unique(x3$SUBSPECIES.SCIENTIFIC.NAME)
  pop1=x3[x3$SUBSPECIES.SCIENTIFIC.NAME==names[1],i]
  pop2=x3[x3$SUBSPECIES.SCIENTIFIC.NAME==names[2],i]
  print(colnames(x3)[i])
  z=t.test(x=pop1,y=pop2,alternative="two.sided",conf.level=0.95)
  
  print(z)
  
  z2=wilcox.test(x=pop1,y=pop2,alternative="two.sided",conf.level=0.95)
  
  print(z2)
}

## [1] "SOURCE"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 4.5826, df = 7, p-value = 0.002536
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3629975 1.1370025
## sample estimates:
## mean of x mean of y 
##      1.75      1.00

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 189, p-value = 1.283e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_annualPET"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 8.5891, df = 23.927, p-value = 8.991e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  274.2719 447.8102
## sample estimates:
## mean of x mean of y 
##  1640.306  1279.265 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 213, p-value = 5.948e-07
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_aridityIndexThornthwaite"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 2.3178, df = 16.784, p-value = 0.03336
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   1.153794 24.815930
## sample estimates:
## mean of x mean of y 
##  67.38375  54.39889 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 167, p-value = 0.01933
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_climaticMoistureIndex"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = -10.515, df = 12.751, p-value = 1.197e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5640744 -0.3714812
## sample estimates:
##  mean of x  mean of y 
## -0.0200000  0.4477778

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 0, p-value = 2.37e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_continentality"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 1.3196, df = 9.5331, p-value = 0.2178
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1770614  0.6830799
## sample estimates:
## mean of x mean of y 
##  2.793750  2.540741

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 139, p-value = 0.2291
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_embergerQ"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = -6.8632, df = 17.381, p-value = 2.442e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -434.0855 -230.2137
## sample estimates:
## mean of x mean of y 
##  364.1900  696.3396 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 11, p-value = 1.598e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_growingDegDays0"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 7.504, df = 32.3, p-value = 1.436e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  19237.63 33565.54
## sample estimates:
## mean of x mean of y 
##  99020.25  72618.67 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 216, p-value = 8.498e-08
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_growingDegDays5"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 7.3281, df = 32.072, p-value = 2.448e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  19218.00 34012.72
## sample estimates:
## mean of x mean of y 
##  99020.25  72404.89 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 216, p-value = 8.498e-08
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_maxTempColdest"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 7.9673, df = 30.754, p-value = 5.705e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  51.57576 87.08165
## sample estimates:
## mean of x mean of y 
##  261.6250  192.2963

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 216, p-value = 2.395e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_minTempWarmest"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 6.0773, df = 32.098, p-value = 8.575e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  36.71873 73.73498
## sample estimates:
## mean of x mean of y 
##  184.3750  129.1481

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 208, p-value = 9.236e-05
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_monthCountByTemp10"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 1.776, df = 26, p-value = 0.08745
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1807469  2.4770432
## sample estimates:
## mean of x mean of y 
##  12.00000  10.85185

## Warning in wilcox.test.default(x = pop1, y = pop2, alternative = "two.sided", :
## cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  pop1 and pop2
## W = 120, p-value = 0.3523
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_PETColdestQuarter"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 9.8209, df = 33, p-value = 2.541e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  18.43850 28.07418
## sample estimates:
## mean of x mean of y 
## 120.61375  97.35741 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 216, p-value = 8.498e-08
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_PETDriestQuarter"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 6.8422, df = 15.391, p-value = 4.867e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  24.39796 46.40584
## sample estimates:
## mean of x mean of y 
##  147.6438  112.2419 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 208, p-value = 5.693e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_PETseasonality"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 4.8474, df = 9.021, p-value = 0.0009055
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  256.4243 704.8808
## sample estimates:
## mean of x mean of y 
## 1445.8500  965.1974 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 193, p-value = 0.0003421
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_PETWarmestQuarter"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 7.3659, df = 21.019, p-value = 2.998e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  23.35025 41.72067
## sample estimates:
## mean of x mean of y 
##  151.7325  119.1970 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 212, p-value = 1.02e-06
## alternative hypothesis: true location shift is not equal to 0
## 
## [1] "current_30arcsec_PETWettestQuarter"
## 
##  Welch Two Sample t-test
## 
## data:  pop1 and pop2
## t = 12.598, df = 29.411, p-value = 2.222e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  19.90955 27.62137
## sample estimates:
## mean of x mean of y 
## 122.49250  98.72704 
## 
## 
##  Wilcoxon rank sum test
## 
## data:  pop1 and pop2
## W = 216, p-value = 8.498e-08
## alternative hypothesis: true location shift is not equal to 0
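
Several of the Wilcoxon tests above warn that an exact p-value cannot be computed with ties, and the output then reports the test "with continuity correction". A minimal sketch of how that fallback can be requested explicitly is given here; `pop1` and `pop2` hold made-up values, not the study data, and `exact = FALSE` is our suggestion rather than a setting used in the original analysis.

```r
# Synthetic stand-ins for pop1 and pop2 (hypothetical values, not the study data);
# the repeated values create ties, as in the month-count variable above.
pop1 <- rep(12, 8)
pop2 <- c(12, 12, 11, 10, 12, 9, 12, 12, 10, 12)

# exact = FALSE asks for the normal approximation up front, so the
# "cannot compute exact p-value with ties" warning is not raised.
fit <- wilcox.test(x = pop1, y = pop2, alternative = "two.sided", exact = FALSE)

print(fit$method)
print(fit$p.value)
```

Requesting the approximation directly keeps the knitted output free of warnings while leaving the test itself unchanged for tied data.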

Example of divergence:

a=ggplot(data=x,aes(x=current_30arcsec_PETWettestQuarter,
                    y=current_30arcsec_embergerQ,colour=SUBSPECIES.SCIENTIFIC.NAME))
b=geom_point()
c=theme_classic()

print(a+b+c)

Modeling for past climates

# list.files() takes a regular expression rather than a glob, so "\\.asc$"
# matches only files whose names end in ".asc"

holo.ccsm=paste0(filepath,"Ecological Analysis/no-out/holo-ccsm-mve/",
                 list.files(paste0(filepath,
                                   "Ecological Analysis/no-out/holo-ccsm-mve/"),
                            pattern="\\.asc$"))

holo.miroc=paste0(filepath,"Ecological Analysis/no-out/holo-miroc-mve/",
                  list.files(paste0(filepath,
                                    "Ecological Analysis/no-out/holo-miroc-mve/"),
                             pattern="\\.asc$"))

holo.mpi=paste0(filepath,"Ecological Analysis/no-out/holo-mpi-mve/",
                list.files(paste0(filepath,
                                  "Ecological Analysis/no-out/holo-mpi-mve/"),
                           pattern="\\.asc$"))

lgm.ccsm=paste0(filepath,"Ecological Analysis/no-out/lgm-ccsm-mve/",
                list.files(paste0(filepath,
                                  "Ecological Analysis/no-out/lgm-ccsm-mve/"),
                           pattern="\\.asc$"))

lgm.miroc=paste0(filepath,"Ecological Analysis/no-out/lgm-miroc-mve/",
                 list.files(paste0(filepath,
                                   "Ecological Analysis/no-out/lgm-miroc-mve/"),
                            pattern="\\.asc$"))

lgm.mpi=paste0(filepath,"Ecological Analysis/no-out/lgm-mpi-mve/",
               list.files(paste0(filepath,
                                 "Ecological Analysis/no-out/lgm-mpi-mve/"),
                          pattern="\\.asc$"))

We now have a list of files for each scenario, with the files in the same order in every list. Next, we average the corresponding files across climate models and save the results.
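
Because the averaging code selects files by position, it relies on every scenario folder listing its files in the same order. The sketch below shows one way to check that assumption before averaging. The temporary folders and the file names in `layer_files` are hypothetical placeholders, not the names used in the study.

```r
# Hypothetical scenario folders created in a temporary directory
base_dir <- file.path(tempdir(), "no-out-order-check")
scenarios <- c("holo-ccsm-mve", "holo-miroc-mve", "holo-mpi-mve")

# Hypothetical layer names; the real file names are not shown in this document
layer_files <- c("taxonA.asc", "taxonA_threshold.asc",
                 "taxonB.asc", "taxonB_threshold.asc")

for (s in scenarios) {
  dir.create(file.path(base_dir, s), recursive = TRUE, showWarnings = FALSE)
  invisible(file.create(file.path(base_dir, s, layer_files)))
}

# List the .asc files in each folder (names only)
listings <- lapply(file.path(base_dir, scenarios),
                   list.files, pattern = "\\.asc$")

# TRUE only if every folder has the same files in the same order
same_order <- all(vapply(listings, identical, logical(1), listings[[1]]))
stopifnot(same_order)
```

If the check fails, positional indices such as `[4]` would refer to different layers in different scenarios, so it is worth running before any rasters are stacked.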

Holocene Visualizations

#Plot preussi

#avg holo

holo=stack(holo.ccsm[4],holo.miroc[4],holo.mpi[4])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/holo-all-avg/preussi-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

holo=stack(holo.ccsm[3],holo.miroc[3],holo.mpi[3])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/holo-all-avg/preussi-threshold-avg.asc"),
            overwrite=T)

#Plot genderuensis

#avg holo

holo=stack(holo.ccsm[2],holo.miroc[2],holo.mpi[2])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/holo-all-avg/genderuensis-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

holo=stack(holo.ccsm[1],holo.miroc[1],holo.mpi[1])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/holo-all-avg/genderuensis-threshold-avg.asc"),
            overwrite=T)

#Plot reichenowi

#avg holo

holo=stack(holo.ccsm[6],holo.miroc[6],holo.mpi[6])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/holo-all-avg/reichenowi-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

holo=stack(holo.ccsm[5],holo.miroc[5],holo.mpi[5])

y=mean(holo)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/holo-all-avg/reichenowi-threshold-avg.asc"),
            overwrite=T)

Last Glacial Maximum Visualizations

#Plot preussi

#avg lgm

lgm=stack(lgm.ccsm[4],lgm.miroc[4],lgm.mpi[4])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/lgm-all-avg/preussi-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

lgm=stack(lgm.ccsm[3],lgm.miroc[3],lgm.mpi[3])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/lgm-all-avg/preussi-threshold-avg.asc"),
            overwrite=T)

#Plot genderuensis

#avg lgm

lgm=stack(lgm.ccsm[2],lgm.miroc[2],lgm.mpi[2])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/lgm-all-avg/genderuensis-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing
  
lgm=stack(lgm.ccsm[1],lgm.miroc[1],lgm.mpi[1])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/lgm-all-avg/genderuensis-threshold-avg.asc"),
            overwrite=T)

#Plot reichenowi

#avg lgm

lgm=stack(lgm.ccsm[6],lgm.miroc[6],lgm.mpi[6])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/lgm-all-avg/reichenowi-avg.asc"),
            overwrite=T)

#averaging threshold distance; not as scientific but for visualizing

lgm=stack(lgm.ccsm[5],lgm.miroc[5],lgm.mpi[5])

y=mean(lgm)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/lgm-all-avg/reichenowi-threshold-avg.asc"),
            overwrite=T)
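
The Holocene and LGM blocks above repeat the same steps for each taxon: stack one layer from each of the three climate models, then take the cell-wise mean. A compact sketch of that pattern as a loop follows. The small in-memory rasters and their values are invented for illustration, and `make_layer` is a hypothetical helper rather than part of the original workflow.

```r
library(raster)

# Hypothetical helper: build a 2 x 2 raster holding the supplied values
make_layer <- function(vals) {
  r <- raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2, ymn = 0, ymx = 2)
  values(r) <- vals
  r
}

# Invented model outputs: one list per climate model, one layer per taxon
ccsm  <- list(preussi = make_layer(c(1, 2, 3, 4)),
              reichenowi = make_layer(c(0, 0, 1, 1)))
miroc <- list(preussi = make_layer(c(2, 3, 4, 5)),
              reichenowi = make_layer(c(0, 1, 1, 1)))
mpi   <- list(preussi = make_layer(c(3, 4, 5, 6)),
              reichenowi = make_layer(c(0, 2, 1, 4)))

# For each taxon, stack the three model layers and average them cell by cell
averages <- setNames(lapply(names(ccsm), function(taxon) {
  mean(stack(ccsm[[taxon]], miroc[[taxon]], mpi[[taxon]]))
}), names(ccsm))

print(values(averages$preussi))
```

Looking layers up by taxon name rather than by position keeps each average tied to the correct taxon even if the file order changes.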

Path of most likely colonization

Each subspecies tells us something about the colonization path across Africa. We can similarly average these scenarios together to understand where, exactly, the species most likely crossed Africa.

holo=paste0(filepath,
            "Ecological Analysis/no-out/holo-all-avg/",
            list.files(paste0(filepath,
                              "Ecological Analysis/no-out/holo-all-avg/")))

lgm=paste0(filepath,"Ecological Analysis/no-out/lgm-all-avg/",
            list.files(paste0(filepath,"Ecological Analysis/no-out/lgm-all-avg/")))
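
The two listings above are indexed by position in the sections that follow, and the combined averages are written back into the same folders. If this code is rerun after those combined files exist, the positions may shift. The sketch below builds the paths from taxon names instead. The temporary folder and empty placeholder files are assumptions that mirror the output names used in this document.

```r
# Hypothetical folder standing in for ".../holo-all-avg/"
avg_dir <- file.path(tempdir(), "holo-all-avg-demo")
dir.create(avg_dir, showWarnings = FALSE)

taxa <- c("genderuensis", "preussi", "reichenowi")

# Empty placeholder files: the six per-taxon outputs plus the two combined
# averages that a previous run would have left behind
placeholders <- c(paste0(taxa, "-avg.asc"),
                  paste0(taxa, "-threshold-avg.asc"),
                  "holo-avg.asc", "holo-threshold-avg.asc")
invisible(file.create(file.path(avg_dir, placeholders)))

# Positional indexing now picks up a combined file by mistake
by_position <- list.files(avg_dir)[c(1, 3, 5)]

# Name-based paths are unaffected by the extra files
occurrence_files <- file.path(avg_dir, paste0(taxa, "-avg.asc"))
threshold_files  <- file.path(avg_dir, paste0(taxa, "-threshold-avg.asc"))
stopifnot(all(file.exists(c(occurrence_files, threshold_files))))

print(by_position)
print(basename(occurrence_files))
```

Writing the combined averages to a separate folder would avoid the problem as well; the name-based lookup simply makes the selection explicit.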

HOLOCENE

#Average all occurrence

holo2=stack(holo[1],holo[3],holo[5])

y=mean(holo2)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/holo-all-avg/holo-avg.asc"),
            overwrite=T)

#Average all threshold

holo2=stack(holo[2],holo[4],holo[6])

y=mean(holo2)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/holo-all-avg/holo-threshold-avg.asc"),
            overwrite=T)

LGM

#Average all occurrence

lgm2=stack(lgm[1],lgm[3],lgm[5])

y=mean(lgm2)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/lgm-all-avg/all-avg.asc"),
            overwrite=T)

#Average all threshold

lgm2=stack(lgm[2],lgm[4],lgm[6])

y=mean(lgm2)
plot(y)
writeRaster(y,paste0(filepath,
                     "Ecological Analysis/no-out/lgm-all-avg/all-threshold-avg.asc"),
            overwrite=T)

Phylogeography of Cinnyris reichenowi

21 May 2020
