2024-03-29T08:57:23Z
https://zenodo.org/oai2d
oai:zenodo.org:35100
2020-01-20T17:11:08Z
user-intercrossing
openaire
Alekhin, Alexey
2014-10-08
<p>The Bio4j bioinformatics graph database is modular and customizable, allowing you to import just the data you are interested in. There are, however, dependencies among these resources that must be taken into account. This is where Statika comes in: a set of Scala libraries that lets you declare dependencies between the components of any modular system and check their correctness using the Scala type system. Thanks to this, it is now possible to deploy only selected components of the integrated data sets, with Amazon Web Services deployments on hardware specifically configured for them.</p>
https://archive.fosdem.org/2014/schedule/event/graphdevroom_bio4j_1/
https://doi.org/10.5281/zenodo.35100
oai:zenodo.org:35100
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
FOSDEM, Free and Open source Software Developers’ European Meeting, Brussels, Belgium, 1-2 February 2014
graph databases
dependency management
Bio4j + Statika: Managing module dependencies on the type level
info:eu-repo/semantics/lecture
oai:zenodo.org:35104
2020-01-20T13:30:04Z
user-intercrossing
openaire
Alekhin, Alexey
2014-09-18
<p>Next Generation Sequencing has revolutionized the bioinformatics landscape, reshaping fields such as genomics and transcriptomics by offering huge amounts of data about previously inaccessible domains in a cheap and scalable way. Biological data analysis therefore demands, more than ever, high-performance computing architectures. Cloud computing, a comparable breakthrough in the IT world, holds promise as the foundation on which a solution could be built (as already demonstrated by pioneering efforts such as Galaxy or CloudBioLinux). It provides a perfect framework for high-throughput data analysis: architectures can be deployed with as much computing capacity as needed, scaled horizontally, and scaled down in real time to match computing needs, all under a pay-as-you-go model.</p>
<p>However, fast and cost-effective data analysis in the cloud at such scale remains elusive. High-throughput analysis, where many resources are used and paid for, critically needs the ability to manage both tools and data in a robust, reproducible and automated way. Because a complex and unstable chain of dependencies often underlies tools and data in bioinformatics analysis, knowing beforehand that all the resources to be used are properly configured is invaluable.</p>
<p>Statika (http://ohnosequences.com/statika) aims to be a basic tool for the declaration and automated deployment of composable cloud infrastructures for the bioinformatics space. With Statika, data, tools and infrastructure are treated on an equal basis through an expressive domain-specific language that allows the user to express complex dependency relationships. Statika automatically checks for possible version conflicts and chooses a safe resource creation order.</p>
<p>Statika has been applied in different scenarios: from a cloud-based system for scalable and composable parallel computations in the bioinformatics domain, as in the Nispero tool, to modular automated deployments of complex databases such as Bio4j. Bio4j (bio4j.com) is a graph database integrating all data from key resources in the bioinformatics data space, including UniProt, Gene Ontology, the NCBI Taxonomy and UniRef. We use Statika internally for the integration and automated deployment of all sorts of bioinformatics tools and data.</p>
<p>Statika is open source, available under the AGPLv3 license.</p>
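<p>The two checks described above (detecting version conflicts and choosing a safe creation order) can be illustrated with a minimal sketch. It is not Statika's API: Statika does this at the Scala type level, whereas this Python stand-in checks at run time, and the bundle names and the functions <code>creation_order</code> and <code>version_conflicts</code> are hypothetical.</p>

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical bundles: each name maps to the set of bundles it depends on.
dependencies = {
    "bio4j-uniprot": {"bio4j-go", "bio4j-ncbi-taxonomy"},
    "bio4j-go": {"titan-db"},
    "bio4j-ncbi-taxonomy": {"titan-db"},
    "titan-db": set(),
}


def creation_order(deps):
    """Return an order in which every bundle comes after its dependencies."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


def version_conflicts(requirements):
    """Return the resources for which bundles request different versions.

    `requirements` maps a bundle name to a {resource: version} dictionary.
    """
    requested = {}
    for wanted in requirements.values():
        for resource, version in wanted.items():
            requested.setdefault(resource, set()).add(version)
    return {res: vers for res, vers in requested.items() if len(vers) > 1}


order = creation_order(dependencies)
```

<p>In this sketch a deployment would proceed only if <code>version_conflicts</code> returns an empty dictionary, and would then create resources in the order given by <code>order</code>.</p>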
https://doi.org/10.5281/zenodo.35104
oai:zenodo.org:35104
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
JdBI2014, The XII Symposium on Bioinformatics, Sevilla, Spain, 21-24 September 2014
Statika: managing bioinformatics tools and resources in the cloud
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:35248
2020-01-20T16:45:39Z
user-intercrossing
openaire
user-eu
Sollars, Elizabeth
Zohren, Jasmin
Clark, Jo
Boshier, David
Joecker, Anika
Buggs, Richard
2014-01-10
<p><em>Fraxinus excelsior</em> (European ash) is a common tree in Europe with about 80 million individuals in the UK, and is of great ecological and economic value as a key forest species. Ash is diploid, with 2n = 46 chromosomes. Ash trees in Europe are currently threatened by the fungal pathogen <em>Hymenoscyphus pseudoalbidus</em>, which causes ash dieback. It is estimated that around 99% of the ash trees in the UK are at risk of infection. Natural genetic variation within ash populations includes a small percentage of low-susceptibility individuals, as has been confirmed by a field study in Denmark. The British Ash Tree Genome Project is providing a reference <em>F. excelsior</em> genome sequence, to make the identification of genetic variants conferring low susceptibility possible on a genome-wide scale.</p>
<p>We extracted DNA from a young ash tree produced by the self-pollination of a low-heterozygosity tree from the Cotswolds in England. Its 1C genome size was measured with flow cytometry as 880 Mb. The genome was sequenced <em>de novo</em> using a combination of Illumina HiSeq2000 and Roche 454 FLX+ technologies at Eurofins. After comparing the results of several <em>de novo</em> assemblers, we selected the best assembly, consisting of 90k scaffolds with an N50 of almost 100 kb. It is available for download on the project’s website, www.ashgenome.org, along with the raw sequencing data. We are also annotating the assembly with RNA-seq data from five different tissues.</p>
<p>Our work lays foundations for the study of genome-wide polymorphisms in European ash, and selection of genes for resistance to <em>Hymenoscyphus pseudoalbidus</em>. This reference genome may also assist scientists working on the emerald ash borer infestation of <em>F. pennsylvanica</em> and <em>F. americana</em> in North America.</p>
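<p>The N50 figure quoted for the assembly above is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of the calculation follows; the function name <code>n50</code> and the example lengths are illustrative and are not taken from the project's pipeline.</p>

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length of
    the scaffold at which the running total first reaches half of the
    assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


# Illustrative lengths only (not ash genome data).
example = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

<p>For the illustrative lengths the total is 32, so the running sum reaches the halfway mark of 16 at the second scaffold of length 8.</p>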
https://doi.org/10.5281/zenodo.35248
oai:zenodo.org:35248
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
PopGroup, 46 Population Genetics Group Meeting, Bath, UK, 7-10 January 2014
whole genome sequence, ash dieback, Fraxinus
Sequencing the genome of Fraxinus excelsior (European ash)
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:35361
2020-01-20T14:48:45Z
user-intercrossing
openaire
Blanckaert, Alexandre
Hermisson, Joachim
2015-12-09
<p>Interest in speciation research has experienced a recent shift from the classical problem of “When does it happen?” to a more process-oriented question: “How does it happen?” This is of relevance, in particular, for parapatric speciation, where the build-up of pre- or postzygotic barriers to gene flow is a gradual process. The standard mechanism for the evolution of postzygotic isolation is the accumulation of Dobzhansky-Muller incompatibilities (DMIs).<br />
While this process is reasonably well understood for allopatric speciation, one can ask how it unfolds in the face of gene flow. In a recent paper, Bank et al. (2012) studied the very first step of this process and described the conditions for a first two-locus DMI to appear and be maintained. Other authors propose mechanisms purely based on divergent selection. However, during these early steps of divergence, one can expect that the amount of local adaptation between the two splitting populations is quite restricted.<br />
Here, we want to understand how the amount of local adaptation affects this first step and whether it limits the strength of any genetic barrier formed by DMIs. We use a three-locus continent-island model with potential epistasis between these loci to address this question. We will demonstrate that, whereas with classic DMIs local adaptation limits the strength of a genetic barrier, this is no longer true if more complex epistasis patterns are involved. By complex, we mean a combination of positive and negative epistasis, or three-locus epistasis.<br />
In addition, we will show that having three mutations, with uniquely negative epistasis and codominance both at the single-locus effect and the epistatic effect, is a sufficient condition, if the incompatibilities are lethal, for reproductive isolation and therefore speciation.</p>
https://doi.org/10.5281/zenodo.35361
oai:zenodo.org:35361
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
EBM, 19th Evolutionary Biology Meeting, Marseille, France, 15-18 September 2015
Local Adaptation: a Limit to Early Speciation?
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:49562
2020-01-20T15:10:01Z
user-intercrossing
user-eu
Zohren, Jasmin
Wang, Nian
Kardailsky, Igor
Borrell, James S
Joecker, Anika
Nichols, Richard A
Buggs, Richard JA
2016-04-11
<p>Hybridisation may lead to introgression of genes among species. Introgression may be bidirectional or unidirectional, depending on factors such as the demography of the hybridising species, or the nature of reproductive barriers between them. Previous microsatellite studies suggested bidirectional introgression between diploid <em>Betula nana</em> (dwarf birch) and tetraploid <em>B. pubescens</em> (downy birch) and also between <em>B. pubescens</em> and diploid <em>B. pendula</em> (silver birch) in Britain. Here we analyse introgression among these species using 51,237 variants in restriction-site associated (RAD) markers in 194 individuals, called with allele dosages in the tetraploids. In contrast to the microsatellite study, we found unidirectional introgression into <em>B. pubescens</em> from both of the diploid species. This pattern fits better with the expected nature of the reproductive barrier between diploids and tetraploids. As in the microsatellite study, introgression into <em>B. pubescens</em> showed clear clines with increasing introgression from <em>B. nana</em> in the north and from <em>B. pendula</em> in the south. Unlike <em>B. pendula</em> alleles, introgression of <em>B. nana</em> alleles was found far from the current area of sympatry or allopatry between <em>B. nana</em> and <em>B. pubescens</em>. This pattern fits a shifting zone of hybridisation due to Holocene reduction in the range of <em>B. nana</em>, and expansion in the range of <em>B. pubescens</em>.</p>
Special Issue on Genomics of Hybridization
https://doi.org/10.1111/mec.13644
oai:zenodo.org:49562
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
Molecular Ecology, 25(11), 2413–2426, (2016-04-11)
climate change
genotyping
hybridisation
introgression
polyploidy
Unidirectional diploid-tetraploid introgression among British birch trees with shifting ranges shown by RAD markers
info:eu-repo/semantics/article
oai:zenodo.org:45049
2020-01-20T15:18:04Z
user-intercrossing
Sylvie Larrat
Kulkarni Om
Jean-Baptiste Claude
Réjane Beugnot
Michaël G. B. Blum
Katia Fusillier
Julien Lupo
Pauline Tremeaux
Agnès Plages
Alice Marlu
Hervé Duborjal
Anne Signori-Schmuck
Olivier Francois
Jean-Pierre Zarski
Patrice Morand
Vincent Leroy
2014-11-19
<p>Despite the gain in sustained virological responses (SVR) provided by protease inhibitors (PIs), failures still occur. The aim of this study was to determine if a baseline analysis of the NS3 region using ultradeep pyrosequencing (UDPS) can help to predict an SVR. Serum samples from 40 patients with previously nonresponding genotype 1 chronic hepatitis C who were retreated with triple therapy, including a PI, were analyzed. Baseline UDPS of the NS3 gene was performed on plasma and peripheral blood mononuclear cells (PBMC). Mutations conferring resistance to PIs were sought. The overall diversity of the quasispecies was evaluated by calculating the Shannon entropy (SE). Resistance mutations were found in plasma and PBMC but were not discriminating enough to predict an SVR. NS3 quasispecies heterogeneity was significantly lower at baseline in patients achieving an SVR than in those not achieving an SVR (SE of 26.98 ± 16.64 × 10^3 versus 44.93 ± 19.58 × 10^3, <em>P</em> = 0.0047). With multivariate analysis, the independent predictors of an SVR were fibrosis of stage F &lt; 2 (odds ratio [OR], 13.3; 95% confidence interval [CI], 1.25 to 141.096; <em>P</em> &lt; 0.03) and SE below the median (OR, 5.4; 95% CI, 1.22 to 23.87; <em>P</em> &lt; 0.03). More than the presence of minor mutations at the baseline in plasma or in PBMC, the NS3 viral heterogeneity determined by UDPS is an independent factor for an SVR in previously treated patients receiving triple therapy that includes a PI.</p>
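<p>The Shannon entropy used above as the measure of quasispecies diversity can be sketched as follows. This is only an illustration of the general formula H = −Σ p_i ln p_i over the frequencies of distinct variants; the natural logarithm, the absence of normalisation, and the function name <code>shannon_entropy</code> are assumptions, since the abstract does not state how the study computed or scaled the value.</p>

```python
import math
from collections import Counter


def shannon_entropy(reads):
    """Shannon entropy (natural log) of the variant frequencies in `reads`.

    `reads` is any iterable of hashable variant labels, for example the
    distinct NS3 sequences observed in one sample.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads given")
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# A sample with a single variant has zero entropy; two equally frequent
# variants give ln(2).
homogeneous = shannon_entropy(["v1"] * 4)
balanced = shannon_entropy(["v1", "v2"])
```

<p>Lower values correspond to a more homogeneous viral population, which is the direction the abstract associates with an SVR.</p>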
https://doi.org/10.1128/JCM.02547-14
oai:zenodo.org:45049
Zenodo
https://zenodo.org/communities/intercrossing
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
HCV
NS3
Entropy
Ultradeep Pyrosequencing of NS3 To Predict Response to Triple Therapy with Protease Inhibitors in Previously Treated Chronic Hepatitis C Patients
info:eu-repo/semantics/article
oai:zenodo.org:35082
2020-01-20T14:55:33Z
user-intercrossing
openaire
user-eu
Marquardt, Jeannine
Leitch, Andrew
Nichols, Richard
Schneider, Harald
2015-12-09
<p>The native British bluebell, Hyacinthoides non-scripta, is a spring-flowering lily that is well known in natural, old forests across the British Isles. In the 17th century another bluebell taxon from the Iberian Peninsula was introduced as an ornamental plant. Since the beginning of the last century there have been reports of it in the wild, and it has recently been found to form fully fertile hybrids with the native bluebell. Both hybrids and the alien taxa are spreading, but mostly close to urban areas, probably because of garden escapes. Several surveys and studies of the distribution of these taxa have suggested that they may be putting the native British bluebell at risk by outcompeting and replacing it in its natural habitats. However, our understanding of the environmental drivers of alien invasion is confounded by human impact, including ongoing plantings and changing land use.<br />
To better understand the dynamics between alien and native taxa, I study a natural hybrid zone between the British bluebell and its sister species, Hyacinthoides hispanica, in Spain, where there is minimal human influence. In this natural hybrid zone, the hybrids’ flower morphologies and additional chloroplast data show that the parents contribute symmetrically to the intermediate forms, and the hybrids show no evident hybrid deficiencies. Gene flow is mainly mediated through pollen exchange, because clonal buds and seed dispersal are only important in the immediate vicinity.<br />
We will use genome-wide markers to get an understanding of patterns of introgression across hundreds of loci.</p>
https://doi.org/10.5281/zenodo.35082
oai:zenodo.org:35082
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
SMBE, Society for Molecular Biology and Evolution, Vienna, 12-16 July 2015
Hybridisation between the British and the Spanish bluebell
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:35141
2020-01-20T14:52:23Z
user-intercrossing
openaire
Sollars, Elizabeth
Zohren, Jasmin
Boshier, David
Clark, Jo
Joecker, Anika
Buggs, Richard
2014-01-06
<p>Poster presented at Plant and Animal Genomics Conference January 2014.</p>
https://doi.org/10.5281/zenodo.35141
oai:zenodo.org:35141
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
ash tree
ash dieback
genome assembly
genome annotation
Sequencing the Genome of Fraxinus Excelsior (European Ash)
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:45050
2020-01-20T15:28:04Z
user-intercrossing
openaire
Kulkarni Om
Francois Olivier
Morand Patrice
Michaël G. B. Blum
Larrat Sylvie
2016-01-20
<p>Viruses are highly diverse in their genetic constitution, both within and between infected hosts. They adapt to the host’s environment with a high mutation rate and make identification and consequent treatment very difficult, especially if the treatment targets a particular genetic feature.</p>
<p>Hepatitis C Virus (HCV), which causes nearly 200 million chronic infections worldwide, is being studied using various bioinformatics approaches. It is curable with treatments which vary in dosage and duration based on the genotype of the virus. It is hypothesized that direct antiviral therapies which target viral proteins create a selection pressure in the virus. This ultimately leads to the emergence of viral strains in which the gene coding for the target protein is mutated, further complicating treatment.</p>
<p>There is a need for highly accurate genotyping capabilities, so that variants are correctly identified and appropriate treatment can be administered. We propose a case study in which viral samples from 45 patients with chronic HCV are sequenced using a Roche GS Junior at different stages of treatment. Using variant callers such as GS Amplicon Variant Analyzer and other tools, the variants in the viral population are observed over the course of treatment. It is of interest to know whether an increase in viral diversity is linked with the onset of treatment, and how the viral population differs for each treatment outcome (responders vs non-responders). This would ultimately help in understanding the evolution of the virus and give further insights into drug design and personalized treatment.</p>
https://doi.org/10.5281/zenodo.45050
oai:zenodo.org:45050
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
HCV
Diversity
NS3
Estimation of genetic diversity in NS3 region of Hepatitis C virus using 454 pyrosequencing
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:35162
2020-01-20T14:48:09Z
user-intercrossing
Vatsiou Alexandra
Bazin Eric
Gaggiotti Oscar
2015-08-25
<p>Identifying genomic regions targeted by positive selection has been a long-standing interest of evolutionary biologists. This objective was difficult to achieve until the recent emergence of next-generation sequencing, which is fostering the development of large-scale catalogues of genetic variation for an increasing number of species. Several statistical methods have been recently developed to analyse these rich data sets, but there is still a poor understanding of the conditions under which these methods produce reliable results. This study aims at filling this gap by assessing the performance of genome-scan methods that explicitly consider the physical linkage among SNPs surrounding a selected variant. Our study compares the performance of seven recent methods for the detection of selective sweeps (iHS, nSL, EHHST, xp-EHH, XP-EHHST, XPCLR and hapFLK). We use an individual-based simulation approach to investigate the power and accuracy of these methods under a wide range of population models under both hard and soft sweeps. Our results indicate that XPCLR and hapFLK perform best and can detect soft sweeps under simple population structure scenarios if migration rate is low. All methods perform poorly with moderate-to-high migration rates, or with weak selection, and very poorly under a hierarchical population structure. Finally, no single method is able to detect both starting and nearly completed selective sweeps. However, combining several methods (XPCLR or hapFLK with iHS or nSL) can greatly increase the power to pinpoint the selected region.</p>
Mol Ecol. 2015 Aug 28. doi: 10.1111/mec.13360. [Epub ahead of print]
PMID:26314386
https://doi.org/10.5281/zenodo.35162
oai:zenodo.org:35162
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Detection of selective sweeps in structured populations: a comparison of recent methods.
info:eu-repo/semantics/article
oai:zenodo.org:35229
2020-01-20T13:33:04Z
user-intercrossing
Sestak, Martin Sebastijan
Bozicevic, Vedran
Bakaric, Robert
Dunjko, Vedran
Domazet-Loso, Tomislav
2013-04-12
<p>Background</p>
<p>The vertebrate head is a highly derived trait with a heavy concentration of sophisticated sensory organs that allow complex behaviour in this lineage. The head sensory structures arise during vertebrate development from cranial placodes and the neural crest. It is generally thought that derivatives of these ectodermal embryonic tissues played a central role in the evolutionary transition at the onset of vertebrates. Despite the obvious importance of head sensory organs for vertebrate biology, their evolutionary history is still uncertain.</p>
<p>Results</p>
<p>To give a fresh perspective on the adaptive history of the vertebrate head sensory organs, we applied genomic phylostratigraphy to large-scale <em>in situ</em> expression data of the developing zebrafish <em>Danio rerio</em>. Contrary to traditional predictions, we found that dominant adaptive signals in the analyzed sensory structures largely precede the evolutionary advent of vertebrates. The leading adaptive signals at the bilaterian-chordate transition suggested that the visual system was the first sensory structure to evolve. The olfactory, vestibuloauditory, and lateral line sensory organs displayed a strong link with the urochordate-vertebrate ancestor. The only structures that qualified as genuine vertebrate innovations were the neural crest derivatives, trigeminal ganglion and adenohypophysis. We also found evidence that the cranial placodes evolved before the neural crest despite their proposed embryological relatedness.</p>
<p>Conclusions</p>
<p>Taken together, our findings reveal pre-vertebrate roots and a stepwise adaptive history of the vertebrate sensory systems. This study also underscores that large genomic and expression datasets are rich sources of macroevolutionary information that can be recovered by phylostratigraphic mining.</p>
https://doi.org/10.1186/1742-9994-10-18
oai:zenodo.org:35229
Zenodo
https://zenodo.org/communities/intercrossing
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
Genomic phylostratigraphy
Macroevolution
Sensory systems
Vertebrates
Placodes
Neural crest
Zebrafish
Phylostratigraphic profiles reveal a deep evolutionary history of the vertebrate head sensory systems
info:eu-repo/semantics/article
oai:zenodo.org:34941
2020-01-20T14:21:06Z
user-intercrossing
openaire
user-eu
Kovach, Evdokim
Alekhin, Alexey
Manrique, Marina
Pareja-Tobes, Pablo
Pareja, Eduardo
Tobes, Raquel
Pareja-Tobes, Eduardo
2015-06-25
<p>The poster was presented at the Wellcome Trust conference Exploring Human Host-Microbiome Interactions in Health on 30.06.2015.</p>
https://doi.org/10.5281/zenodo.34941
oai:zenodo.org:34941
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Exploring Human Host-Microbiome Interactions in Health, Hinxton, UK, 29.06.2015-1.07.2015
Metapasta: scalable tool for microbial community profiling
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:35368
2020-01-20T17:32:33Z
user-intercrossing
openaire
user-eu
Evdokim Kovach
Alexey Alekhin
Eduardo Pareja Tobes
Raquel Tobes
Eduardo Pareja
Marina Manrique
2014-04-01
<p>It is now widely accepted that bioinformatics data analysis is a real bottleneck in many research activities related to the life sciences. High-throughput technologies like Next Generation Sequencing (NGS) have completely reshaped the biology and bioinformatics landscape. NGS has undoubtedly allowed important progress in many life-science fields, but it has also presented interesting challenges in terms of computation capabilities and algorithms. Many kinds of tasks related to NGS data analysis, as well as other bioinformatics data analysis, can be computed in a parallel, independent way; taking maximum advantage of this can help relieve the analysis bottleneck.</p>
<p>Given the way NGS data is generated, scalability also plays an important role in its analysis. NGS data is not generated continuously but in batches, so the computation needs can be dramatically different at different points in time.</p>
<p>Cloud computing provides a perfect framework for systems with these two requirements: parallel and scalable. Besides, it allows adjusting the computation power on demand, and thus not being attached to (and paying for) a fixed compute infrastructure. </p>
<p>Nispero is a Scala library for declaring stateless computations and scaling them using cloud computing, in particular a combination of services from AWS (Amazon Web Services). Some highlights are: </p>
<ul>
<li>strongly typed configuration based on Scala code </li>
<li>CRDT-like semantics (a nispero instance is essentially a morphism between idempotent commutative monoids) </li>
<li>automatic deploy/undeploy </li>
</ul>
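<p>The "morphism between idempotent commutative monoids" mentioned in the highlights above can be illustrated with sets under union, which form such a monoid (union is associative, commutative and idempotent, with the empty set as identity). The sketch below is an assumption-laden illustration in Python, not Nispero code; <code>lift</code> is a hypothetical helper.</p>

```python
def lift(f):
    """Turn a per-item function into a map between sets of items.

    The resulting map preserves the empty set and unions, so it is a
    morphism between the monoids (sets, union, empty set).
    """
    def on_sets(items):
        return frozenset(f(x) for x in items)
    return on_sets


# Hypothetical per-task computation: the length of each sequence.
sequence_lengths = lift(len)

batch_a = frozenset({"ACGT", "GG"})
batch_b = frozenset({"GG", "TTT"})

# Processing the merged batches equals merging the processed batches, and
# receiving the same message twice changes nothing (idempotence).
merged_then_processed = sequence_lengths(batch_a | batch_b)
processed_then_merged = sequence_lengths(batch_a) | sequence_lengths(batch_b)
```

<p>These properties are what make it safe, in a design of this kind, for workers to process batches independently and in any order, and for a task to be delivered more than once.</p>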
<p>Nispero relies on the EC2 service (Elastic Compute Cloud) to carry out the computations, on the S3 service (Simple Storage Service) for data storage and on SQS (Simple Queue Service) and SNS (Simple Notification Service) for communication between the different system components. </p>
<p>A Nispero system is composed of: </p>
<ul>
<li>a "console" instance that tracks at any moment the status of the whole system giving the user the opportunity to check at any point the current status of the computations, workers, etc. </li>
<li>a "manager" instance that is in charge of deploying and undeploying the group of workers </li>
<li>a set of "workers" that perform the computations/tasks in a parallel, independent way </li>
<li>SQS queues for "input", "output" and "error" messages </li>
<li>S3 objects for "input" and "output" files </li>
</ul>
<p>The lifecycle of a Nispero system is simple but robust. It starts with the launch of the "console" and "manager" instances. The "manager" then takes the tasks from an S3 object, publishes them in an SQS queue and launches the workers. The workers take the messages with the tasks from the corresponding SQS queue (i.e. the "input" queue) in an independent, parallel way. Once they have finished a computation, they put its results in S3 objects, publish a message in the "output" SQS queue and delete the input message of the corresponding task from the "input" queue.</p>
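<p>The lifecycle above can be sketched with in-memory stand-ins for the AWS services. This is a single-process simulation under stated assumptions, not Nispero's implementation: the deques stand in for the SQS queues, the dictionary for S3 output objects, workers are simulated round-robin rather than in parallel, and the use of the "error" queue for failed tasks is a guess, since the text lists that queue without describing it. The name <code>run_nispero_like</code> is hypothetical.</p>

```python
from collections import deque


def run_nispero_like(tasks, compute, n_workers=3):
    """Simulate the manager/worker message flow for a list of tasks."""
    # The "manager" publishes every task as a message in the input queue.
    input_queue = deque(enumerate(tasks))
    output_queue = deque()
    error_queue = deque()
    results = {}  # stands in for the S3 output objects

    worker = 0
    while input_queue:
        # A worker reads the next message without deleting it yet.
        task_id, payload = input_queue[0]
        try:
            results[task_id] = compute(payload)
            output_queue.append((task_id, f"worker-{worker}"))
        except Exception as err:  # assumption: failures go to the error queue
            error_queue.append((task_id, repr(err)))
        # The input message is deleted only after the task has been handled.
        input_queue.popleft()
        worker = (worker + 1) % n_workers

    return results, list(output_queue), list(error_queue)


# Hypothetical tasks: the third one fails because len(None) raises TypeError.
results, outputs, errors = run_nispero_like(["ACGT", "GG", None], len)
```

<p>Deleting the input message last means that a worker that dies mid-task leaves the message in the queue for another worker, which fits the "simple but robust" description.</p>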
<p>Nispero is an open-source project released under the AGPLv3 license. The source code is available at https://github.com/ohnosequences/nispero</p>
<p>This project is funded in part by the ITN FP7 project INTERCROSSING (Grant 289974).</p>
https://doi.org/10.5281/zenodo.35368
oai:zenodo.org:35368
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
MIT License
https://opensource.org/licenses/MIT
IWBBIO 2014, 2nd International Work-Conference on Bioinformatics and Biomedical Engineering, 1-4 April 2014
AWS
bioinformatics
Nispero: a cloud-computing based Scala tool specially suited for bioinformatics data processing
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:45145
2020-01-20T16:24:56Z
user-intercrossing
openaire
Jan, Habib
Jan, Habib
2016-01-25
<p>Presentation on Genome-wide Selection in Plant Breeding.</p>
https://doi.org/10.5281/zenodo.45145
oai:zenodo.org:45145
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Genomic Selection, Canola, Plant Breeding
Genomic Selection in Plant Breeding
info:eu-repo/semantics/lecture
oai:zenodo.org:35366
2020-01-25T07:21:19Z
user-intercrossing
software
user-eu
Evdokim Kovach
2015-12-14
<p>Scala library for executing parallel computations expressed as monoid homomorphisms.</p>
https://doi.org/10.5281/zenodo.35366
oai:zenodo.org:35366
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
MIT License
https://opensource.org/licenses/MIT
cloud computing
Scala
Compota
info:eu-repo/semantics/other
oai:zenodo.org:34950
2020-01-20T13:50:37Z
user-intercrossing
user-eu
Robert Verity
Jeannine Marquardt
Andrea Hatlen
Jasmin Zohren
2013-01-28
<p>Shortly before Christmas 2012, the 46th PopGroup (Population Genetics Group) meeting was held at Glasgow University, UK. Over 180 scientists attended from 19 different countries, with speakers from diverse research areas and ranging from PhD students to retired professors. Some talks dealt with the conservation of exotic species on remote islands, while others took a more theoretical approach. Almost all made use of, or anticipated, the volume and ever-reducing cost of data from next-generation sequencing, which is opening up access to research topics and study organisms that were previously off limits. Here, we report on some of the main themes of the conference, and the directions in which next-generation sequencing technology is taking us.</p>
https://doi.org/10.1186/gb-2013-14-1-301
oai:zenodo.org:34950
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
genetics
evolution
coalescent theory
selective sweeps
blogging
gynodioecy
androdioecy
From hybrids to hermaphrodites in population genetics
info:eu-repo/semantics/article
oai:zenodo.org:35102
2020-01-20T17:10:15Z
user-intercrossing
openaire
Pareja-Tobes, Pablo
Kovach, Evdokim
Manrique, Marina
Pareja, Eduardo
Tobes, Raquel
Pareja-Tobes, Eduardo
Alekhin, Alexey
2014-04-08
<p>Bio4j (http://bio4j.com) is a high-performance cloud-enabled graph-based bioinformatics data platform. It is one of the first and most important graph databases for biological data, specially designed to cope with and manage the huge amount of data brought by NGS technologies: it integrates most data available in UniProt KB (SwissProt + Trembl), Gene Ontology (GO), UniRef (50, 90, 100), RefSeq, NCBI taxonomy, and Expasy Enzyme DBs. Data is organized in a way semantically equivalent to what it represents by taking advantage of the graph structure; in this paradigm it is easy to have many different types of relationships and nodes, which makes it perfect for highly interconnected complex data (as is the case with biological data). From a performance point of view, relational databases with their tabular data structure are not able to respond to some complex queries that can be resolved using the graph paradigm; graph databases give you fast local access to all the elements related to each entity, through the edges that connect them with others. In this way, queries that would be impossible to perform with a standard relational database take just a couple of seconds with Bio4j.</p>
<p>This year has seen important updates and new developments on Bio4j; it now includes 1,216,993,547 relationships and 190,625,351 nodes, close to tripling the figures from one year ago. We have introduced a new level of abstraction for the domain model, by decoupling the inner database implementation from the relationships among entities themselves. Interfaces have been developed for each node and relationship present in the database, including methods to access the properties of the entity each represents and utility methods that make it easy to navigate to the entities linked to it.</p>
<p>Implementing that set of interfaces, we have developed another layer for the domain model using Blueprints, the de-facto standard for graph data modeling, thus making the domain model independent from the choice of database technology. Building on that, we now offer specifically tuned binary data distributions for TitanDB, yielding a dramatic increase in performance due to vertex-local edge-typed indexes.</p>
<p>The introduction of a module system based on Statika now makes it possible to deploy only selected components of the integrated data sets, with Amazon Web Services deployments on hardware specifically configured for them.</p>
<p>Bio4j is open source, available under the AGPLv3 license.</p>
https://doi.org/10.5281/zenodo.35102
oai:zenodo.org:35102
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
IWBBIO, 2nd International Work-Conference on Bioinformatics and Biomedical Engineering, Granada, Spain, 7-9 April 2014
Bio4j: bigger, faster, leaner
info:eu-repo/semantics/lecture
oai:zenodo.org:35105
2020-01-20T14:57:14Z
user-intercrossing
openaire
Alekhin, Alexey
2014-09-17
<p>Bio4j (http://bio4j.com) is a cloud-based high-performance bioinformatics graph database. It is one of the first and most important graph databases for biological data with a special focus on data integration: it integrates most data available in UniProt KB (SwissProt + Trembl), Gene Ontology (GO), UniRef (50, 90, 100), RefSeq, NCBI taxonomy, and Expasy Enzyme DBs. All this data in Bio4j is organized in a semantically equivalent graph structure. It allows many different types of nodes and relationships, making it well suited to highly interconnected complex biological data. Graph databases allow fast local access to all the elements related to each entity, through the edges that connect them with others. So, from a performance point of view, queries that would be impractical or even impossible to perform with a standard relational database take no more than a couple of seconds with Bio4j.</p>
<p>Bio4j is in active development and is growing rapidly: it now includes 1,216,993,547 relationships and 190,625,351 nodes, which is close to tripling the figures from one year ago. A flexible module system based on Statika is provided with Bio4j, enabling the user to build and deploy only the modules needed for the analysis.</p>
<p>Bio4j is now based on an abstract domain model which decouples the inner database implementation from the relationships among entities themselves. This allowed us to have a default implementation using Blueprints, the de-facto standard for graph data modeling, thus making the domain model independent from the choice of database technology. Building on that, we now offer binary distributions for Neo4j and TitanDB backends, yielding a dramatic increase in performance through backend-specific optimizations, such as vertex-local edge-typed indexes in TitanDB.</p>
<p>Bio4j is open source, available under the AGPLv3 license.</p>
https://doi.org/10.5281/zenodo.35105
oai:zenodo.org:35105
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
JdBI2014, The XII Symposium on Bioinformatics, Sevilla, Spain, 21-24 September 2014
Bio4j: the bioinformatics data platform
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:35247
2020-01-20T14:53:10Z
user-intercrossing
openaire
user-eu
Zohren, Jasmin
Sollars, Elizabeth
Clark, Jo
Boshier, David
Joecker, Anika
Buggs, Richard
2013-08-19
<p><em>Fraxinus excelsior</em> (European Ash) is a common tree in Europe with about 80 million individuals in the UK, and is of great ecological and economic value. Ash is diploid, with 2n = 46 chromosomes. Ash trees in Europe are now threatened by the fungal pathogen <em>Hymenoscyphus pseudoalbidus</em>; it is estimated that around 99% of the ash trees in the UK are at risk of infection. Natural genetic variation within ash populations includes a small percentage of low-susceptibility individuals, as has been confirmed by a field study in Denmark. However, until now no reference sequence has been available, which has made it impossible to study on a genome-wide scale the genetic variants that confer a low-susceptibility phenotype.</p>
<p>We extracted DNA from a young ash tree produced by the self-pollination of a low heterozygosity tree from the Cotswolds in England. We measured its 1C genome size with flow cytometry as 880 Mb. We are sequencing this <em>de novo</em> using a combination of Illumina HiSeq2000 and Roche 454 FLX+ technologies at Eurofins. We are assembling the reads with the CLC Genomics Workbench and other <em>de novo</em> assemblers for comparison and annotating it using MAKER.</p>
<p>This work lays foundations for the study of genome-wide polymorphisms in European ash, and selection of genes for resistance to <em>Hymenoscyphus pseudoalbidus</em>.</p>
https://doi.org/10.5281/zenodo.35247
oai:zenodo.org:35247
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
ESEB, European Society of Evolutionary Biology Meeting, Lisbon, Portugal, 19-24 August 2013
whole genome, ash dieback, Fraxinus
Sequencing the genome of Fraxinus excelsior (European Ash)
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:35166
2020-01-20T16:13:35Z
user-intercrossing
Bozicevic, Vedran
Hutter, Stephan
Stephan, Wolfgang
Wollstein, Andreas
2015-11-11
<p>We studied <em>Drosophila melanogaster</em> populations from Europe (the Netherlands and France) and Africa (Rwanda and Zambia) to uncover genetic evidence of adaptation to cold. We present here four lines of evidence for genes involved in cold adaptation: (1) the frequencies of SNPs at genes previously known to be associated with chill-coma recovery time (CCRT), startle reflex (SR), and resistance to starvation stress (RSS) vary along environmental gradients and therefore among populations; (2) SNPs of genes that correlate significantly with latitude and altitude in African and European populations overlap with SNPs that correlate with a latitudinal cline from North America; (3) at the genome-wide level, the top candidate genes are enriched in gene ontology (GO) terms that are related to cold tolerance; (4) GO enriched terms from North American clinal genes overlap significantly with those from Africa and Europe. Each SNP was tested in 10 independent runs of <em>Bayenv2</em>, using the median Bayes factors to ascertain candidate genes. None of the candidate genes were found close to the breakpoints of cosmopolitan inversions, and only four candidate genes were linked to QTLs related to CCRT. To overcome the limitation that we used only four populations to test correlations with environmental gradients, we performed simulations to estimate the power of our approach for detecting selection. Based on our results we propose a novel network of genes that is involved in cold adaptation.</p>
https://doi.org/10.1111/mec.13464
oai:zenodo.org:35166
Zenodo
https://zenodo.org/communities/intercrossing
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
local adaptation
polygenic traits
Population genetic evidence for cold adaptation in European Drosophila melanogaster populations
info:eu-repo/semantics/article
oai:zenodo.org:35145
2020-01-20T14:13:52Z
user-intercrossing
openaire
Sollars, Elizabeth
Kelly, Laura
Swarbreck, David
Zohren, Jasmin
Boshier, David
Clark, Jo
Joecker, Anika
Caccamo, Mario
Buggs, Richard
2015-01-09
<p>Slides presented at Plant and Animal Genomics conference January 2015.</p>
https://doi.org/10.5281/zenodo.35145
oai:zenodo.org:35145
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
ash tree
ash dieback
fraxinus
genome assembly
genome annotation
The genome of Fraxinus excelsior (European Ash)
info:eu-repo/semantics/lecture
oai:zenodo.org:45404
2020-01-20T16:53:24Z
user-intercrossing
openaire
Soumya Ranganathan, Fabian Grandke, Dirk Metzler
2016-02-01
<p>We compare two polyploid phasing tools with different approaches: a statistical method called 'Polyhap' and a combinatorial method.</p>
https://doi.org/10.5281/zenodo.45404
oai:zenodo.org:45404
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
An overview of phasing methods for polyploids
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:45406
2020-01-20T15:23:54Z
user-intercrossing
Jan, Habib
Abbadi, Amine
Lücke, Sophie
Nichols, Richard
Snowdon, Rod
2016-01-29
<p>Genome-wide selection in hybrid canola (Brassica napus)</p>
https://doi.org/10.1371/journal.pone.0147769
oai:zenodo.org:45406
Zenodo
https://zenodo.org/communities/intercrossing
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Genomic selection, hybrid canola, Plant breeding, GCA
Genomic Prediction of Testcross Performance in Canola (Brassica napus)
info:eu-repo/semantics/article
oai:zenodo.org:45144
2017-09-06T07:01:07Z
user-intercrossing
Jan, Habib
2016-01-25
<p>Public outreach activity at NHM London Live.</p>
https://doi.org/10.5281/zenodo.45144
oai:zenodo.org:45144
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Genomic Selection, Genetics, Plant Breeding
NHM London Live
info:eu-repo/semantics/other
oai:zenodo.org:35101
2020-01-20T15:23:19Z
user-intercrossing
openaire
Kovach, Evdokim
Pareja-Tobes, Pablo
Manrique, Marina
Pareja, Eduardo
Tobes, Raquel
Pareja-Tobes, Eduardo
Alekhin, Alexey
2014-04-03
<p>Next Generation Sequencing (NGS) has brought a revolution to the bioinformatics landscape, reshaping fields such as genomics and transcriptomics by offering huge amounts of data about previously inaccessible domains in a cheap and scalable way. Thus biological data analysis demands, more than ever, high performance computing architectures; in particular, Cloud Computing, a comparable breakthrough in the IT world, holds promise for being the foundation on which a solution could be built (as already demonstrated by pioneering efforts such as Galaxy or CloudBioLinux). It provides a perfect framework for high throughput data analysis: the ability to deploy architectures with as much computing capacity as needed, to scale horizontally, and to scale down in real time to match computing needs, together with the pay-as-you-go model, makes for a strong case.</p>
<p>However, fast, reproducible, and cost-effective data analysis in the cloud at such scale remains elusive. Certainly, one fundamental prerequisite for achieving this is the ability to manage both the tools and the data to be used in a robust, reproducible, and automated way. High throughput analysis, where a lot of resources are to be used and paid for, needs a robust configuration system to rely on. In the cloud computing world, due to its on-demand nature, automated resource configuration is a critical factor. This is even more so in bioinformatics analysis, where quite often an intricate and unstable chain of dependencies underlies tools and data; knowing beforehand that all the resources to be used are properly configured is invaluable.</p>
<p>Statika (http://ohnosequences.com/statika) aims to be a basic tool for the declaration and deployment of composable, versioned and reproducible cloud infrastructures for the bioinformatics space.</p>
<p>Data, tools and infrastructure are treated on an equal footing, and an expressive domain specific language allows the user to express complex dependency relationships, check for possible version conflicts and automatically choose a safe resource creation order.</p>
<p>By making use of advanced features of the Scala programming language such as dependent types and type-level computations, a great deal of structure can be expressed abstractly, and checked at compile time before any cost is incurred. A strong versioning system where both data and tools are included makes reproducibility not only possible but actually enforced.</p>
<p>Statika has been put to work in scenarios as different as Nispero, a cloud-based system for scaling inherently parallel computations in the bioinformatics domain, and versioned, modular automated deployments of Bio4j, a graph database integrating all data from key resources in the bioinformatics data space, including UniProt, Gene Ontology, the NCBI Taxonomy and UniRef. We use it internally for the integration and automated deployment of all sorts of bioinformatics tools and data.</p>
<p>Statika is open source, available under the AGPLv3 license.</p>
https://doi.org/10.5281/zenodo.35101
oai:zenodo.org:35101
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
IWBBIO, 2nd International Work-Conference on Bioinformatics and Biomedical Engineering, Granada, Spain, 7-9 April 2014
Statika: managing cloud resources, bioinformatics tools and data
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:34948
2020-01-20T15:42:14Z
user-intercrossing
Lu MA
Andrea Hatlen
Laura J. Kelly
Hannes Becher
Wencai Wang
Ales Kovarik
Ilia Leitch
Andrew Leitch
2015-09-02
<p>The RNA-directed DNA methylation pathway (RdDM) can be divided into three phases: (i) small interfering RNA biogenesis, (ii) <em>de novo </em>methylation and (iii) chromatin modification. To determine the degree of conservation of this pathway we searched for key genes amongst land plants. We used OrthoMCL and the OrthoMCL Viridiplantae database to analyse proteomes of species in bryophytes, lycophytes, monilophytes, gymnosperms and angiosperms. We also analysed small RNA size categories and, in two gymnosperms, cytosine methylation in ribosomal DNA.</p>
<p>Six proteins were restricted to angiosperms, these being NRPD4/NRPE4, RDM1, DMS3, SHH1, KTF1 and SUVR2, although we failed to find the latter three proteins in <em>Fritillaria persica</em>, a species with a giant genome. Small RNAs of 24 nucleotides in length were abundant only in angiosperms. Phylogenetic analyses of Dicer-like (DCL) proteins showed that DCL2 was restricted to seed plants, although it was absent in <em>Gnetum gnemon </em>and <em>Welwitschia mirabilis</em>.</p>
<p>The data suggest that phases (i) and (ii) of the RdDM pathway, described for model angiosperms, evolved with angiosperms. The absence of some features of RdDM in <em>F. persica</em> may be associated with its large genome. Phase (iii) is probably the most conserved part of the pathway across land plants. DCL2, involved in virus defence and interaction with the canonical RdDM pathway to facilitate methylation of CHH, is absent outside seed plants. Its absence in <em>G. gnemon</em> and <em>W. mirabilis</em>, coupled with distinctive patterns of CHH methylation, suggests a secondary loss of DCL2 following the divergence of Gnetales.</p>
https://doi.org/10.1093/gbe/evv171
oai:zenodo.org:34948
Zenodo
https://zenodo.org/communities/intercrossing
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
chromatin modification
DNA methylation
evolution
RNA dependent DNA methylation
seed plants
ANGIOSPERMS ARE UNIQUE AMONGST LAND PLANT LINEAGES IN THE OCCURRENCE OF KEY GENES IN THE RNA DEPENDENT DNA METHYLATION (RDDM) PATHWAY
info:eu-repo/semantics/article
oai:zenodo.org:34707
2020-01-20T15:06:42Z
user-intercrossing
AMELIE BONNET-GARNIER, PRISCA FEUERSTEIN, MARTINE CHEBROUT, RENAUD FLEUROT, HABIB-ULLAH JAN, PASCALE DEBEY and NATHALIE BEAUJEAN
2015-12-04
<p>Genome organization and Epigenetics in Mouse oocytes.</p>
https://doi.org/10.5281/zenodo.34707
oai:zenodo.org:34707
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Zero v1.0 Universal
https://creativecommons.org/publicdomain/zero/1.0/legalcode
Int. Journal of Developmental Biology, 56, 877-887, (2015-12-04)
Epigenetics, Mouse, Oocytes
Genome organization and epigenetic marks in mouse germinal vesicle oocytes
info:eu-repo/semantics/article
oai:zenodo.org:35269
2020-01-20T14:47:19Z
user-intercrossing
openaire
Vatsiou Alexandra
2015-08-19
<p>Uncovering signatures of positive selection has been a long-standing interest in the field of genomics. The high prevalence of metabolic diseases such as diabetes has been suggested to be associated with positive selective pressures. As advantageous alleles increase in frequency, linked deleterious mutations nearby rise in frequency as well, which may explain the high prevalence of many diseases. High-density SNP maps of the human genome enable us to look for such regions involved in the susceptibility to diseases, particularly diabetes, obesity and metabolic syndrome. Firstly, we conduct a sensitivity analysis to evaluate the performance of several existing methods to detect positive selection. Out of the 7 methods (EHHST, XPEHHST, XP-EHH, iHS, nSL, XPCLR and hapFLK) that were compared under various demographic scenarios, XPCLR and iHS were found to perform best. These two methods were used for a genome scan of the HapMap Phase II database. Based on these results, we carried out an enrichment analysis to uncover signals enriched for positive selection. Two methods to conduct the enrichment analysis were used: the SUMSTAT statistic and Gowinda, an already available tool. String, Intact and Bio4j databases were also used to extract information about possible Protein-Protein Interactions associated with the ‘interesting genes’. Our results indicate that selection has substantially affected the evolution of diseases in human history. More specifically, 64 pathways were discovered to have undergone selection and a total of 16 positively selected genes were found to have direct or indirect links with diabetes, obesity or metabolic syndrome.</p>
http://epidemiology.conferenceseries.com/abstract/2015/pathways-and-genes-under-positive-selection-in-metabolic-diseases
https://doi.org/10.5281/zenodo.35269
oai:zenodo.org:35269
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Pathways and Genes under positive selection in metabolic diseases
info:eu-repo/semantics/lecture
oai:zenodo.org:35271
2020-01-20T15:10:48Z
user-intercrossing
openaire
Vatsiou Alexandra
Bazin Eric
Gaggiotti Oscar
2014-09-17
<p>Abstract: Motivation: The identification of genetic variations that contribute to a genetic disease remains a major challenge in human genetics research. One of the factors that could have shaped the genetic diversity in a population is natural selection. Many studies investigate the effects of mutations and genes on the phenotype independently and therefore do not account for large functional and genomic effects that could originate from multiple small ones. Assessing and analyzing large-scale genomic data at the level of gene sets, or in networks of gene sets, could provide further biological insight. Our objective in this work is to detect signals enriched for positive selection at the biological pathway level, which could improve the biological interpretation of these effects. Methods & Results: We used the HapMap phase II data for genetic data http://hapmap.ncbi.nlm.nih.gov/downloads/phasing/2007-08_rel22/, the human genes from coding regions from the Entrez NCBI database (data downloaded on 5/2014) and the human pathways from the Biosystems database http://www.ncbi.nlm.nih.gov/biosystems (data downloaded on 5/2014). The analysis consists of the following steps: 1. Analyze the haplotype and SNP data using a composite likelihood method to test for positive selection between populations. The method XPCLR was chosen as it was shown to detect a wider range of signals than other methods (Vatsiou et al. in prep). Three comparisons were conducted: a. CEU-YRI, b. CEU-CHB/JPT and c. YRI-CHB/JPT. 2. Match the SNPs to 27081 genes according to their start and end positions, extracted from the Entrez database. We also included 50kb upstream and downstream of each gene to account for intergenic regions. 3. Take the highest XPCLR normalized score as the representative of the whole gene. A further normalization was made to account for genes with a large number of SNPs. 4. Test for enrichment signals for positive selection by calculating the sum of all the scores in each of the gene sets (total number of gene sets: 2362) and estimating the significance taking into account the different gene set sizes. Conclusions: The exact signals we observe in the human genome pathways, and the way they could be involved in adaptation events, remain to be shown. Even though many enrichment analyses have been conducted before (Daud et al. 2013), our study may reveal more biological pathways enriched for positive selection, as it is mainly based on pairwise comparisons between populations using XPCLR, and such an analysis has not yet been considered.</p>
http://eccb14.loria.fr/poster_proceedings/poster_proc_track_F.pdf
https://doi.org/10.5281/zenodo.35271
oai:zenodo.org:35271
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Signals enriched for positive selection
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:35158
2020-01-20T17:23:48Z
user-intercrossing
Foll Matthieu
Gaggiotti Oscar
Josephine T. Daub
Vatsiou Alexandra
Laurent Excoffier
2014-09-25
<p>Living at high altitude is one of the most difficult challenges that humans had to cope with during their evolution. Whereas several genomic studies have revealed some of the genetic bases of adaptations in Tibetan, Andean, and Ethiopian populations, relatively little evidence of convergent evolution to altitude in different continents has accumulated. This lack of evidence can be due to truly different evolutionary responses, but it can also be due to the low power of former studies that have mainly focused on populations from a single geographical region or performed separate analyses on multiple pairs of populations to avoid problems linked to shared histories between some populations. We introduce here a hierarchical Bayesian method to detect local adaptation that can deal with complex demographic histories. Our method can identify selection occurring at different scales, as well as convergent adaptation in different regions. We apply our approach to the analysis of a large SNP data set from low- and high-altitude human populations from America and Asia. The simultaneous analysis of these two geographic areas allows us to identify several candidate genome regions for altitudinal selection, and we show that convergent evolution among continents has been quite common. In addition to identifying several genes and biological processes involved in high-altitude adaptation, we identify two specific biological pathways that could have evolved in both continents to counter toxic effects induced by hypoxia.</p>
American Journal Human Genetics, 2014 October, 4:95, DOI: http://dx.doi.org/10.1016/j.ajhg.2014.09.002, PMID: 25262650
https://doi.org/10.5281/zenodo.35158
oai:zenodo.org:35158
Zenodo
https://doi.org/10.1016/j.ajhg.2014.09.002
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Widespread signals of convergent adaptation to high altitude in Asia and America
info:eu-repo/semantics/article
oai:zenodo.org:35273
2020-01-20T14:33:12Z
user-intercrossing
openaire
Vatsiou Alexandra
Bazin Eric
Gaggiotti Oscar
2013-08-20
<p>Motivation: A main topic of research in human genetics is the identification of genes and mutations that contribute to a genetic disease. One of the factors that can influence the genetic diversity in a population is natural selection. In this work, we will compare the existing long-range haplotype methods to detect selection. Our primary objective is to obtain a clear view of their power and validity under complex demographic scenarios. Methods & Results: Literature review: A systematic review revealed five haplotype methods with available software to identify loci that have undergone selection. The five methods are the following: LRH (Sabeti, 2002; Nature 419:832-837), iHS (Voight, 2006; PLoS Biol 4:72), xp-EHH (Sabeti, 2007; Nature 449:913-8), EHHST (Zhong, 2010; Hum Genet 18:1148-59) and xp-EHHST (Zhong, 2011; Stat & Its Interface 4:51–63). All of them used simulations to test their performance with the ms (Hudson, 2002; Bioinf 18:337-8) and SelSim programs (Spencer, 2004; Bioinf 20:3673-5). Sensitivity Analysis: The ms program can consider different demographic models without selection, whereas SelSim provides simulations under natural selection with a simple population structure. To determine the best method, we will generate simulated data using SimuPOP. SimuPOP is a forward-in-time simulation program that can construct models with selection under complex evolutionary scenarios. We will begin with an island model, then a stepping stone model incorporating an environmental gradient, and then more complex scenarios including a hierarchically structured population. Conclusions: We will thoroughly investigate the behavior of these methods under complex scenarios. Our study sets the basis to identify the advantages and disadvantages of each method under each modeling assumption. Here, we restrict ourselves to the comparison of the methods, but an extension to a model that detects selection across N populations could be developed at a later stage.</p>
http://www.eseb2013.com/delegates/avatsiou
https://doi.org/10.5281/zenodo.35273
oai:zenodo.org:35273
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Comparison of Haplotype Methods to detect selection
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:34951
2020-01-20T17:40:51Z
user-intercrossing
Grandke, Fabian
Ranganathan, Soumya
Czech, Andrzej
De Haan, Jorn
Metzler, Dirk
2014-08-14
<p>Overview of bioinformatic tools for polyploid crops.</p>
https://doi.org/10.17265/2161-6264/2014.08.001
oai:zenodo.org:34951
Zenodo
https://zenodo.org/communities/intercrossing
info:eu-repo/semantics/openAccess
Other (Open)
Polyploid
Bioinformatics
Bioinformatic Tools for Polyploid Crops
info:eu-repo/semantics/article
oai:zenodo.org:45405
2020-01-20T14:15:59Z
user-intercrossing
openaire
Ranganathan Soumya, Metzler Dirk
2013-10-23
<p>Summary of R packages which facilitate data analysis for bioinformatics.</p>
https://doi.org/10.5281/zenodo.45405
oai:zenodo.org:45405
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Data Visualization (DV) with R
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:35367
2020-01-24T19:27:28Z
user-intercrossing
software
user-eu
Evdokim Kovach
Eduardo Pareja Tobes
Raquel Tobes
Marina Manrique
2015-12-14
<p>Metapasta is an open-source, fast and horizontally scalable tool for community profiling based on the analysis of 16S metagenomics data. It is entirely cloud-based and specifically designed to take advantage of the cloud: it performs the community profiling of a sample starting from raw Illumina reads in approximately 1 hour, and needs approximately the same time to do the same for hundreds of samples. It uses BLAST or LAST, but other mapping solutions can be integrated. The taxonomic assignment is done using a best hit and a lowest common ancestor paradigm, taking the NCBI taxonomy as reference. As output, Metapasta generates the frequencies of all the identified taxa in any of the samples in tab-separated value text files. This output includes direct assignment frequencies and cumulative frequencies based on the hierarchical structure of the taxonomy tree. Report formats can be configured using a DSL similar to spreadsheet formulas. PDF files showing the assigned taxonomy tree can be rendered.</p>
<p>Metapasta is implemented in Scala and based on cloud computing (Amazon Web Services). The graph data platform Bio4j is used for retrieving taxonomy-related information and the tool Compota is used for distributing and coordinating compute tasks.</p>
https://doi.org/10.5281/zenodo.35367
oai:zenodo.org:35367
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
MIT License
https://opensource.org/licenses/MIT
metagenomics
16S
AWS
Metapasta
info:eu-repo/semantics/other
oai:zenodo.org:35252
2020-01-20T14:22:50Z
user-intercrossing
openaire
user-eu
Wang, Nian
Kardailsky, Igor
Borrell, James
Joecker, Anika
Nichols, Richard
Buggs, Richard
Zohren, Jasmin
2015-11-11
<p>Hybridisation may lead to introgression of genes among species. Introgression may be bidirectional or unidirectional, depending on factors such as the demography of the hybridising species, or the nature of reproductive barriers between them. Previous microsatellite studies suggested bidirectional introgression between diploid <em>Betula nana</em> and tetraploid <em>B. pubescens</em> and also between <em>B. pubescens </em>and diploid <em>B. pendula</em>. Here we analyse introgression among these species using 76,587 variants in restriction-site associated (RAD) markers in 196 individuals. We found unidirectional introgression into <em>B. pubescens </em>from both of the diploid species, in clear clines with greater <em>B. nana </em>introgression in the north and greater <em>B. pendula </em>introgression in the south. These patterns of introgression fit with the nature of the reproductive barrier between diploids and tetraploids, and historical range shifts of the three species since glaciation. To facilitate our analysis we developed a new script for genotyping polyploids based on read counts and average base quality, with an algorithm that calculates the maximum likelihood variant configuration at each locus. This is implemented in R and, besides population genetics, should be of use in cancer research and haplotype based analyses.</p>
https://doi.org/10.5281/zenodo.35252
oai:zenodo.org:35252
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
LERN, London Evolutionary Research Network Conference, London, UK, 11 November 2015
climate change, genotyping polyploids, hybridisation, introgression
INTROGRESSION AMONG BRITISH BIRCH TREES
info:eu-repo/semantics/lecture
oai:zenodo.org:35251
2020-01-20T15:36:49Z
user-intercrossing
openaire
user-eu
Zohren, Jasmin
Kardailsky, Igor
Nielsen, Kare Lehmann
Joecker, Anika
Buggs, Richard
2015-01-12
<p>Genotyping of SNP loci in polyploids has always been challenging, but may become easier using high-coverage sequencing. The production of such data in turn requires the development of new methods and software that can be used to analyse it in user-friendly software such as the CLC Genomics Workbench. The Fixed Ploidy Variant Detection tool of the CLC Genomics Workbench already takes higher ploidy levels into account, but an explicit evaluation of the locus configuration is currently not provided. We have developed an algorithm that uses read counts and average base quality to estimate the most likely allelic configuration at each variant position in polyploid samples. The underlying model uses a log-likelihood approach and its applications are very broad. Not only population genetics, but also cancer research and haplotype-based analyses can benefit from this. We are applying the algorithm to a <em>Betula</em> RAD data set to investigate introgression and gene flow in three species of the genus. Independently, a read-based haplotype caller is currently being developed at CLC bio. The number of called haplotypes and their support are expected to correlate with the locus configuration and will serve as a validation of our approach.</p>
https://doi.org/10.5281/zenodo.35251
oai:zenodo.org:35251
Zenodo
https://zenodo.org/communities/intercrossing
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial No Derivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
PAG XXIII, Plant and Animal Genome XXIII Conference, San Diego, CA, USA, 10-14 January 2015
genotyping polyploids, allele dosage, Betula
Assessing Allelic Configuration Models in Fixed Ploidy Variant Calling Using R
info:eu-repo/semantics/conferencePoster
oai:zenodo.org:45452
2020-01-20T17:25:55Z
user-intercrossing
openaire
Ranganathan, Soumya
2015-04-28
<p>Many agricultural crops, such as cotton, potato and alfalfa, and many flowering plants are polyploid. To manipulate these crops genetically so that they have superior traits, such as higher nutritive value or improved tolerance of harsh climates, it is important to characterize the genetic markers associated with these traits. This requires identifying their genotypes and commonly found haplotypes. There are some reliable methods for genotyping and phasing diploid species, but few such methods are available for polyploids. When we tested different programs for genotyping tetraploids, we learnt that the available methods are very specific to the ploidy and do not agree with one another for a given set of data. My PhD project aims to develop and implement a phasing method that may be applied to species of any ploidy. Our phasing method, which is very close to completion, is based on the observation that haplotypes tend to form clusters and that the similarity of the clusters varies along the genome according to a recombination rate. This idea is motivated by the model behind fastPHASE, a tool to phase diploids, detailed in Scheet & Stephens (2006). We use a hidden Markov model (HMM) based approach and then use numerical optimization to estimate the parameters. In the end, we use backward sampling to perform the phasing, which is a standard method with HMM-based approaches. To address the performance load typically imposed by HMM methods with large state spaces, we employ the particle filter method to sample and filter the states, which keeps the size of our state space manageable. The implementation (all the modules) is complete, but it is yet to be tested rigorously on various kinds of data sets; we then intend to perform a comparative analysis against currently available phasing methods.</p>
https://doi.org/10.5281/zenodo.45452
oai:zenodo.org:45452
Zenodo
https://zenodo.org/communities/intercrossing
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Haplotype-phasing in polyploids
info:eu-repo/semantics/lecture