A whole genome atlas of 81 Psilocybe genomes as a resource for psilocybin production.

The Psilocybe genus is well known for the synthesis of valuable psychoactive compounds such as Psilocybin, Psilocin, Baeocystin and Aeruginascin. The ubiquity of Psilocybin synthesis in Psilocybe has been attributed to a horizontal gene transfer mechanism of a ~20Kb gene cassette. A recently published highly contiguous reference genome derived from long read single molecule sequencing has underscored interesting variation in this Psilocybin synthesis gene cassette. This reference genome has also enabled the shotgun sequencing of spores from many Psilocybe strains to better catalog the genomic diversity in the Psilocybin synthesis pathway. Here we present the de novo assembly of genomes of 81 Psilocybe genomes compared to the P.Envy reference genome. Surprisingly, the genomes of Psilocybe galindoi , Psilocybe tampanensis and Psilocybe azurescens lack sequence coverage over the previously described Psilocybin synthesis pathway but do demonstrate amino acid sequence homology to an alternative pathway and may illuminate previously proposed convergent evolution of Psilocybin synthesis.


Introduction
Psilocybin has recently been awarded breakthrough drug status by the FDA (Tullis 2021). Psilocybin has also been shown to be a safe alternative to traditional Selective Serotonin Reuptake Inhibitors (SSRIs) for the treatment of depression (Griffiths et al. 2016;Ross et al. 2016;Bogenschutz et al. 2018;Carhart-Harris et al. 2021;Davis et al. 2021). SSRI's have also been shown to be effective at reducing SARs-CoV-2 viral load (Schloer et al. 2020;Creeden et al. 2021;Dechaumes et al. 2021;Schloer et al. 2021;Zimniak et al. 2021). The mechanism of the viral interference is hypothesized to be related to the co-expression of ACE2 and Dopa Decarboxylase (DDC) which catalyzes the synthesis of serotonin (Attademo and Bernardini 2021). It is possible psilocybin's serotonin agonism will show promise in COViD treatment and prevention in the future, thus understanding the biological mechanism of synthesis is of increasing importance.
Two models have been proposed for psilocybin production in Psilocybe. Horizontal gene transfer has been thoroughly described by Reynolds et al. and Fricke et al., while convergent evolution has been proposed by Awan et al. (Fricke et al. 2017;Awan 2018;Reynolds et al. 2018). Given the broad geographic distribution of psilocybe producing mushrooms and the history of human cultivation and selection for psilocybin producing strains, these models may not be mutually exclusive.
To address this question, we whole genome sequenced 81 Psilocybe spore samples to assess the sequence coverage over the previously well characterized Psilocybe synthesis cassette.

Results
In a whole genome sequencing survey of 81 Psilocybe genomes we noticed a lack of sequence coverage over the Psilocybe gene cluster described by Reynolds et al. for 5 of the 81 Psilocybin producing genomes (3 Psilocybe tampanensis, 1 Psilocybe galindoi, 1 Psilocybe azurescens). These genomes also exhibited genome-wide poor read mapping efficiency to the P.Envy reference genome with the exception of the mitochondria and ITS regions. These non-conical psilocybe synthesizing genomes share 99%, 100%, and 91% ITS sequence identity to a Psilocybe tampanensis sequence described by Rockefeller et al. (NCBI accession number: MH220315.1). The sequence from Rockefeller et al. was obtained in the wild and verified with photography of morphological features. The sequences described in this study were derived from spores that can be legally sold for taxonomy purposes but cannot be legally cultured to obtain morphological or chemical verification.
One other genus was labelled as Panaeolus copelandia but delivered two 100% identical ITS sequence to Psilocybe cubensis and Aspergillus fumigatus. This samples was omitted from further analysis. Sequencing and variant calling statistics are displayed in Figure 1 and Supplement Table 1. Despite the low read mapping rates for Psilocybe tampanensis, Psilocybe galindoi and Psilocybe azurescens, the de novo assemblies of these genomes have high BUSCO completeness scores (93%) and matching ITS sequences suggesting they are indeed Psilocybe but differ enough at the species level to produce low cross species read mapping rates.
An alternative hypothesis is that these libraries are metagenomic or contaminated with non-Psilocybe fungi but still provide enough sequence coverage of the ITS region due to its high copy number. Contaminated libraries are usually detectable with bimodal sequencing coverage as its rare for the organisms to be equimolar. All 5 non-cubensis Psilocybe have very uniform sequence coverage across the contigs and high BUSCO completion scores with strong ITS sequence implying clean assemblies.
In combination, these data demonstrate close relatedness of Psilocybe tampanensis and Psilocybe galindoi but distant relatedness to the Psilocybe cubensis (P.Envy) reference genome. Given the known psilocybin production in these alternative Psilocybe species, these data also imply an alternative synthesis pathway must exist in these species as suggested by Awan et al.

Sequence Coverage Analysis
Uniform sequence coverage and high read mapping rates were observed over most of the of Psilocybe P.Envy genome for most of the strains (50 of the 81 are displayed in Figure 2 and Figure 3). A few notable higher coverage exceptions were the ITS region on Scaffold_9 and the mitochondrial genome represented by Scaffold_26. This is not a surprising result given the increased copy number of ITS and Mitochondrial sequences in fungal genomes.
A few smaller, repeat rich contigs (scaffolds 27-31) demonstrated more variable coverage across the Psilocybe genus. The ITS region on Scaffold_9 appears to be a tandem repeat which is collapsed in the reference sequence. Scaffolds 27, 28, 31, and 32 have telomeric ends and are small (171Kb, 117Kb, 69Kb, 33Kb). While these smaller contigs have variable coverage in Psilocybe, it is important to understand they represent less than 2% of the genome. Scaffolds 29 and 30 lack telomeric sequence and are 74Kb and 69Kb respectively. Three scaffolds have telomeric sequences on both ends. (Scaffold_2 = 4.3Mb, Scaffold_3 = 4.1Mb, Scaffold_9 = 2.3Mb) suggesting these are tip to tip chromosome assemblies.
The genomes most diverged from the P.Envy reference genome (P. tampanensis, P.galindoi, P.azurescens) exhibited very low read mapping rates. This is demonstrated by P.azurescens (OR-Coast) in Figure 3. This genome-wide low mapping rate was also observed in the Psilocybin production cassette on Scaffold_7 (Figure 4 and Figure 5).
Despite the low read mapping rates to P.Envy, P.tampanensis assembled into 23Kb N50 genome with a 93% BUSCO completeness score and a 99% identical match to Rockefeller et al. ITS sequence for Psilocybe tampanensis. These low mapping rates are reflective of a highly diverged species at the nucleotide level but to exhaust the exploration of psilocybin production, amino acid level homology was further explored.

Figure 2.
Read mapping efficiency of species specific reads mapped to the P.Envy reference genome. Due to size limitations, only 50 genomes are presented. The remaining read mapping results can be seen in Supplement Table 1.

Figure 3.
Coverage plot of 50 genomes mapped to P.Envy reference. Coverage (log) on the Y axis and scaffold 1-32 on the X axis. Scaffold labeled with a T have a single ~200bp terminal telomeric sequence. Scaffolds with TT have telomeric sequence on both ends and are assumed to be tip to tip telomeric chromosomes. Scaffolds are ordered from largest to smallest from left to right. P.azurascens (OR-Coast) has low mapping frequency due to divergence.  To search for pathway redundancy, we utilized tBLASTn to identify alternative Psilocybin producing enzymes in the reference P.Envy genome. PsiK exists in the conical cluster on scaffold_7 but also has a close homolog on scaffold_1. The copy in the conical cluster on scaffold_7 has a unique SnpEFF high impact variant at p.Arg173Gly that only exists in P.Envy ( Figure 6B). The copy on Scaffold_1 is missing the 5' Exon in P.tampanensis, P.galindoi and P.azurenscens, and is significantly diverged. The most closely related scaffold_7 PsiK amino acid sequences are seen in a multiple sequence alignment seen in Figure 6A. Further cloning and expression is required to confirm if this additional copy provides any pathway redundancy. Of interest, these alternative alignments are only partially clustered with PsiM and often PsiM is located in the middle of 80Kb contigs in the absence of other pathway related genes (Table1). This demonstrates a non-clustered Psilocybin synthesis pathway in the non-Psilocybe cubensis strains. Figure 6A. Multiple sequence alignment using CLUSTALW of the most closely related PsiK genes in P.tampanensis, P.galindoi and P.azurenscens. The R173G mutation is also unique to P.envy when considering non-cubensis species. Table 1. tBLASTn search of P.tampanensis, P.galindoi and P.azurenscens for PsiK, PsiH, PsiD and PsiM reveals a fragmented Psilocybin synthesis cluster with PsiM most often being on large but independent contigs disrupting the contiguity of the Psilocybin synthesis cluster. Figure 6B. Variation map of PsiK, PsiD, PsiM, PsiH, PsiR -Scaffold_7. Grey is no alignment, Blue is reference allele, Aqua is low impact variant and Red represent High Impact SNPs as according to SNPeff.
We constructed a phylogenetic tree (Figure 7) of the various genomes. Many of the genomes share strain names but were acquired from different spore banks (PS vs SW vs ITW vs Mush). These phylogenetically cluster together as expected. Many samples were sequenced multiple times as biological replicas and are represented in the sample nomenclature as Name_replica_vendor. Panaeolus Copelandia samples has an Psilocybe cubensis ITS sequence is likely a mislabeled sample.

Conclusions
These data support both Reynolds et al. and Awan et al. in that Psilocybin synthesis appears to have evolved both a conserved ~20Kb cassette seen in many P.cubensis fungi but also a less clustered pathway in regards to a non-contiguous PsiM gene that still needs further characterization and scrutiny. Alternative psilocybin production has also been suggested in the cicada infecting Massospora fungi (Boyce et al. 2019). Taken together, these data underscore the need for further exploration of psilocybin genomics for alternative or redundant synthesis pathways. Given the divergence of the other Psilocybin producing mushrooms, simply mapping reads from other Psilocybe species to the P.Envy reference genome can be misleading. Until more complete references exist for the P.tampanensis, P.azurescens and P.galindoi, searching for conserved amino acid sequences more tolerant to synonymous DNA mutations will be required. The genetic variation in the species is substantial and these genomes will help to further correlate genotype to chemotype associations for this nascent field.

DNA Isolation
Spores were obtained from 4 vendors (Sporeworks.com, Premiumspores.com, Mushroom.com and InnoculateTheWorld.com). Spore preparations utilized a modified DNA isolation procedure described in McKernan et al. Briefly 1.4ml of spores were centrifuged, decanted and resuspended in 200ul of ddH20. 25 µl of a Thaumatin-like protein was added and incubated at 37°C (Medicinal Genomics part #420206) for 30 minutes. 12.5 µl of MGC lysis buffer was added and incubated at 65°C for 30 minutes with 9 steel beads. Vortexing was performed every 7 minutes. Lysed sample were micro-centrifuged and 200µl of supernatant was aspirated and added to 250µl of Medicinal Genomics (MGC) binding buffer (MGC part# 420001) for magnetic bead isolation. The samples were incubated with the MGC magnetic bead mixture for 10 minutes, magnetically separated and washed two times with 70% ethanol. The beads were dried at 37°C for 5 minutes to remove excess ethanol and eluted with 25µl of ddH20 Library construction for whole genome sequencing. Fragmentation Genomic DNA (gDNA) was quantified with a Qubit (Thermo Fisher Scientific) and normalized to reflect 4-8ng/µl in 13µl of TE buffer. Libraries were generated using enzymatic fragmentation with the NEB Ultra II kits (NEB part # E7103). Briefly, 3.5µl of 5X NEB fragmentation buffer and 1µl of Ultra II fragmentation enzyme mix are added to 13µl of DNA. This reaction was tipmixed 10 times, vortexed, and quickly centrifuged. Fragmentation was performed in a BioRad CFX96 thermocycler at 3.5 minutes at 37°C, 30 minutes at 65°C. The reaction was kept on ice until ready for adaptor ligation.

Adaptor ligation
The master mix for ligation was prepared on ice using 0.75µl of Agilent SureSelect Adaptor Oligo Mix, 0.5µl of ddH20, 15µl of New England Biolabs (NEB) Ultra II Ligation Master Mix, 0.5µl of NEB Ligation enhancer for a total reaction volume of 16.75µl.
Ligation was performed by the addition of 16.75µl of ligation master mix to the 17.5µl Fragmentation/End Prep DNA reaction mixture, incubate for 15 minutes at 20°C. To purify excess adaptors and adaptor dimers, vortex at room temp AMPure XP beads (Beckman Coulter #A63881) to resuspend and add 16μl (approximately 0.45X) of resuspended AMPure XP beads to the ligation reactions. Mix well by pipetting up and down at least 10 times. Incubate the mixture for 5 minutes at 25°C. Put the PCR plate on an appropriate magnetic stand (Medicinal Genomics #420202) to separate beads from supernatant. After the solution is clear (about 5 minutes), carefully remove and discard the supernatant. Be careful not to disturb the beads that contain DNA targets. Wash the magnetic beads by adding 200μl of 70% ethanol to the PCR plate while in the magnetic stand. Incubate at room temperature for 30 seconds, and then carefully remove and discard the supernatant. Repeat the ethanol wash once for a total of 2 washes. Remove trace amounts of ethanol from the beads. Air dry beads for ~ 7 minutes while the PCR plate is on the magnetic stand with the lid open. Remove PCR plate from magnet and elute DNA target from beads into 10μl of H20, transfer 9µl cleaned DNA to a fresh well.

PCR amplification
Add 12.5µl 2x NEBNext Q5 Hot Start Master Mix (New England Biolabs #M0492S) to 9µl ligated DNA, then add 3.5µl NEB 8bp index primer/universal primer. The reaction was run the cycling program using 98°C for 30 seconds as an initial denaturization step. 6 cycles of denaturization, annealing and extension were performed cycling between 98°C for 10 seconds and 65°C for 75 seconds. A final 65°C for 5 minutes was performed with a final 4°C forever step.

PCR reaction cleanup
Resuspend room temp AMPure XP beads with a brief vortex. Add 15μl of resuspended AMPure XP beads to the PCR reactions (~ 25μl). Mix well by pipetting up and down at least 10 times. Incubate mixture for 5 minutes at room temperature. Put the PCR plate on an appropriate magnetic stand to separate beads from supernatant. After the solution is clear (about 5 minutes), carefully remove and discard the supernatant. Be careful not to disturb the beads that contain DNA targets. Add 200μl of 70% ethanol to the PCR plate while in the magnetic stand. Incubate at room temperature for 30 seconds, and then carefully remove and discard the supernatant. Repeat the ethanol wash once. Air dry beads for 7 minutes while the PCR plate is on the magnetic stand with the lid open. Elute DNA target from beads into 15µl of nuclease free H2O, transfer 15µl into a fresh well.

Sample quality control and sequencing
Libraries were evaluated on an Agilent Tape Station prior to pooling for Illumina sequencing. Sequencing was performed by GeneWiz, Cambridge MA. A total of 690 million paired reads (2 x 150bp) were generated, averaging over 12 million read pairs per sample and a total sequence of 207Gb.

Read Mapping parameters
Reads were mapped to the P.Envy reference and to their own assemblies to generate BAM files and coverage statistics using bwa-mem (version 0.7.17-r1188).