Dataset Open Access

Searching for anthrax in the New York City subway metagenome.

Petit III, Robert A.; Ezewudo, Matthew; Joseph, Sandeep J.; Read, Timothy D.

You can view the write up at the following link:

This data set includes the scripts and write up of the following GitHub repository:


In January 2015 Chris Mason and his team published [1] an in-depth analysis of metagenomic [2] data (environmental shotgun DNA sequence) from samples isolated from public surfaces in the New York City (NYC) subway system. Along with a ton of really interesting findings, the authors claimed to have detected DNA from the bacterial biothreat pathogens Bacillus anthracis (which causes anthrax) and Yersinia pestis (which causes plague) in some of the samples. This predictably led to huge interest from the press and from scientists on social media. The authors followed up with a re-analysis of the data on microbe.net [3], where they showed results suggesting that the tools they were using for species identification overcalled anthrax and plague.

B. anthracis is a Gram-positive bacterium that forms tough spores as part of its life cycle. Its 5.2 megabase (Mb) main chromosome is very similar to those of other species in what is informally called the ‘Bacillus cereus group’ [4] (including B. cereus, B. thuringiensis and B. mycoides). Bacillus cereus group strains are commonly found in soil, but B. anthracis itself is very rare and is generally associated with livestock grazing sites that have a history of anthrax.

What sets B. anthracis apart from its close relatives is the presence of two plasmids: pXO1 (181 kb), which carries the lethal toxin genes, and pXO2 (94 kb), which includes genes for a protective capsule. If either plasmid is missing, B. anthracis is considered attenuated in virulence and unable to cause classic anthrax. Other B. cereus group bacteria can have plasmids very similar to pXO1 and pXO2 but missing the important virulence genes. Rarely, other B. cereus group strains carry pXO1 and appear to cause anthrax-like disease. It's a confusing situation, not helped by the current overly narrow species definitions. This recent review [5] gives more information.
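The plasmid logic in the paragraph above can be written down as a small decision rule. The Python sketch below is only an illustration of that reasoning, not part of the scripts in this data set: the `Evidence` class, its field names, and the `interpret` function are hypothetical, and the sketch assumes the isolate is already known to belong to the B. cereus group and that chromosome and plasmid gene content have been determined by some other means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical summary of findings for one B. cereus group isolate."""
    anthracis_chromosome: bool  # chromosome identified as B. anthracis (assumed known)
    pxo1_toxin_genes: bool      # pXO1 present and carrying the lethal toxin genes
    pxo2_capsule_genes: bool    # pXO2 present and carrying the capsule genes


def interpret(e: Evidence) -> str:
    """Map plasmid evidence to the categories described in the text."""
    if e.anthracis_chromosome:
        if e.pxo1_toxin_genes and e.pxo2_capsule_genes:
            return "B. anthracis, both virulence plasmids present"
        # Missing either plasmid means attenuated virulence.
        return "B. anthracis, attenuated"
    if e.pxo1_toxin_genes:
        # Rare case: a non-anthracis strain carrying pXO1.
        return "other B. cereus group, pXO1 carrier"
    # Look-alike plasmids without the virulence genes fall through to here.
    return "other B. cereus group"


if __name__ == "__main__":
    print(interpret(Evidence(True, True, False)))
```

The sketch deliberately takes gene content rather than plasmid presence as input, because a plasmid that merely resembles pXO1 or pXO2 but lacks the virulence genes should not change the outcome.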

The NYC subway metagenome study raised very timely questions about using unbiased DNA sequencing for pathogen detection. We were interested in this dataset as soon as the publication appeared, and we started looking deeper into why the analysis software gave false-positive results and what exactly was found in the subway samples. We decided to wrap up the results of our preliminary analysis and put them on this site. This report focuses on the results for B. anthracis, but we also did some preliminary work on Y. pestis and may follow up on this later.

This data set corresponds to release 1.0 of the GitHub repository.
Files (474.7 MB)