We’re in the midst of rebuilding our website: please email if you have any questions about our research or openings.
In July 2021, the lab packed up our >10,000 bacteria stocks and >3,000 seed stocks in Chicago and moved across the country to our new home at NYU. We’ve been establishing our lab in the Center for Genomics and Systems Biology (CGSB) building, NYU’s recently renovated science facility located near Washington Square.
We are so thrilled to congratulate Joy on this seminal accomplishment! The good news was celebrated by the lab and the department at large, and covered in a UChicago news article.
The lab joins the department in celebrating Joy’s election
Joy with Mercedes Pascual, Tim Morton, and Grey Dwyer
In prior studies, our lab found that the costs of resistance to pathogens in the absence of disease were ~5-10% for the resistance (R) genes Rps5 and Rpm1, respectively. However, Arabidopsis thaliana has 149 R genes, so it is unlikely that many of them incur such a high cost. The now-published research of former PhD student Alice MacQueen focuses on Rps2, which exists as an ancient balanced polymorphism with two long-lived clades of alleles. Alice conducted field trials showing that Arabidopsis thaliana plants with a resistant Rps2 allele are no less fit than those with a susceptible Rps2 allele in the absence of disease. Both resistant and susceptible Rps2 alleles contribute to controlling defense and stress gene expression, so pleiotropy offers an explanation for the maintenance of both alleles.
“These results demonstrate how profoundly the magnitude of fitness costs associated with disease resistance may be shaped by genomic architecture and pleiotropy… These findings shed much-needed light on how the full repertoire of R genes is maintained in the A. thaliana genome. More broadly, these results show that the nature of fitness costs and trade-offs of disease resistance vary among loci even within the same host. Such information is crucial for crop breeding, where the challenge lies in producing high-yield crops while minimizing the cost of disease control.”
We illustrated this post with Sir John Tenniel’s drawing of the Red Queen and Alice from Lewis Carroll’s Through the Looking-Glass. The Red Queen tells Alice: “Now, here, you see, it takes all the running you can do, to keep in the same place”. This is commonly used as an analogy for co-evolution, a concept introduced in Leigh Van Valen’s 1973 article: hosts and parasites have to rapidly adapt to each other so as not to lose the race. The rate of this co-evolutionary arms race is expected to be constrained by fitness costs.
Alice MacQueen performed fitness experiments as part of her doctoral dissertation and is now a postdoctoral researcher with the Juenger lab in Austin, Texas.
Xiaoqin Sun worked with the Bergelson lab from 2007 to 2009 and is now at the Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing.
The 1001 Genomes Consortium set out to provide detailed whole-genome sequences of at least 1001 genotypes of the model plant Arabidopsis thaliana. In a worldwide collaboration, including (past) lab members Angela Hancock, Matthew Horton, Wayan Muliyati, Gianluca Sperone and Joy Bergelson, the consortium released 1,135 genome sequences of A. thaliana. The joint effort results in a publicly available, invaluable resource to study phenotypic variation and adaptation in plants.
The release of the genomes in Cell 2016, 166, provides a fascinating insight into A. thaliana’s global population structure, migration patterns, and evolutionary history. When combined with the RegMap panel, we now have 2,029 natural A. thaliana genotypes with high-quality polymorphism data that will greatly expand our ability to study how wild plants adapt to biotic and abiotic environments.
Origins of the 1001 Genomes Accessions (A) Collection locations of the 1001 Genomes accessions by diversity set (colors correspond to Venn diagram in B). (B) Relationships between 1001 Genomes accessions and other A. thaliana diversity sets (Nordborg et al., 2005; Cao et al., 2011; Horton et al., 2012; Long et al., 2013; Schmitz et al., 2013).
In scientific manuscripts, we tell stories of our research, generally in straight-line fashion with clear motivations and results. This type of research is rare (in my experience), with stories, motivations, and applications only realized post hoc. This is the nature of science, and our recent ISMEJ publication is no different.
With “16Stimator: statistical estimation of ribosomal gene copy numbers from draft genome assemblies”, we introduce an exciting method to generate 16S rRNA gene (16S) copy number estimates for bacterial genomes based on comparison of sequencing read depths of ribosomal and single-copy gene regions. Application of this method resulted in 16S copy number estimates for hundreds of bacterial species without closed genome representatives. This extended database of known 16S copy numbers, combined with phylogeny-based normalization methods for 16S amplicon sequencing studies (Kembel et al., 2012; PICRUSt, Langille et al., 2013), will lead to more accurate organismal abundance measurements. Note: Our article is not open access but the code is freely available.
These are valid and important motivations and applications, but really, we just wanted to know the 16S copy numbers for a handful of isolates so we could properly measure their abundances by amplicon sequencing in controlled community studies.
So here is the actual development route of 16Stimator:
A caveat of 16S amplicon sequencing studies is that, due to variation in bacterial 16S copy number, sequencing read and organismal abundances are not equivalent. For our controlled community experiments using leaf endophytic bacteria originally isolated from Arabidopsis thaliana, we needed to determine each isolate’s 16S copy number. We chose whole genome sequencing and assembly for this task. That was a horrible choice.
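To make the caveat concrete, here is a minimal sketch of how read counts could be rescaled by 16S copy number to approximate relative organismal abundances. The function name, isolate labels, and all numbers are hypothetical and chosen only for illustration; this is not the correction used in our community experiments.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Convert 16S read counts to estimated relative organismal abundances.

    Both arguments are dicts keyed by taxon (hypothetical labels here).
    """
    # Dividing reads by 16S copies per genome gives a value proportional
    # to the number of genomes (cells) that contributed those reads.
    genome_equivalents = {
        taxon: reads / copy_numbers[taxon]
        for taxon, reads in read_counts.items()
    }
    total = sum(genome_equivalents.values())
    return {taxon: value / total for taxon, value in genome_equivalents.items()}


# Hypothetical two-member community: equal read counts, unequal copy numbers.
reads = {"isolate_A": 700, "isolate_B": 700}
copies = {"isolate_A": 7, "isolate_B": 1}
abundances = correct_for_copy_number(reads, copies)
print(abundances)
```

In this made-up case the two isolates contribute the same number of reads, yet after correction the single-copy isolate accounts for most of the community.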
Current assembly algorithms do a poor job resolving repetitive genomic regions. Longer reads or larger insert sizes can overcome this limitation, but alas, we had short-read Illumina sequencing libraries with insert sizes smaller than ribosomal RNA gene regions. After assembly, the 16S rRNA gene was found in one to a few contigs. When we mapped reads back to the assembly, the coverage of the 16S contig was much greater than the average genomic coverage, so we sought to use read depths to resolve 16S copy numbers. By statistically comparing the coverage of 16S to that of single-copy, conserved genes, we were able to accurately estimate copy numbers.
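The core of the read-depth idea can be sketched in a few lines, assuming per-base coverage values have already been extracted for the 16S region and for a set of single-copy genes. The function name and the coverage values below are hypothetical; the actual pipeline lives in the 16Stimator repository and does considerably more than this.

```python
from statistics import median


def estimate_copy_number(cov_16s, cov_single_copy):
    """Estimate 16S copy number as a ratio of median read depths."""
    if not cov_16s or not cov_single_copy:
        raise ValueError("both coverage lists must be non-empty")
    single = median(cov_single_copy)
    if single <= 0:
        raise ValueError("single-copy median coverage must be positive")
    # A collapsed 16S contig accumulates reads from every genomic copy,
    # so its depth relative to single-copy genes approximates copy number.
    return median(cov_16s) / single


# Hypothetical per-base coverage values.
cov_16s = [290, 310, 300, 305, 295]
cov_single = [48, 52, 50, 51, 49, 50]
estimate = estimate_copy_number(cov_16s, cov_single)
print(estimate)
```

Medians are used here rather than means because they are less sensitive to a few positions with unusually high or low depth.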
16Stimator pipeline overview.
Though the focus of the paper is on the sequencing read-depth approach, we did confirm 16S copy numbers experimentally, using an efficient qPCR approach. We compared amplification of 16S to single copy, conserved genes to determine copy number. The IDT-DNA gBlocks provided a convenient alternative to plasmid construction for creating standards with a 1:1 ratio of 16S to single copy gene.
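As a rough illustration of the qPCR comparison, the sketch below inverts a standard curve of the usual form Ct = slope × log10(copies) + intercept for each target and takes the ratio of the two quantities. The function names, curve parameters, and Ct values are all hypothetical; the post does not describe the exact analysis we used.

```python
import math


def copies_from_ct(ct, slope, intercept):
    """Invert a standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def qpcr_copy_number(ct_16s, ct_single, curve_16s, curve_single):
    """Ratio of 16S template copies to single-copy gene template copies."""
    return copies_from_ct(ct_16s, *curve_16s) / copies_from_ct(ct_single, *curve_single)


# Hypothetical curves with perfect doubling per cycle (slope = -1 / log10(2)).
slope = -1 / math.log10(2)
curve = (slope, 38.0)

# 16S crossing the threshold two cycles earlier implies about four-fold more template.
ratio = qpcr_copy_number(ct_16s=23.0, ct_single=25.0, curve_16s=curve, curve_single=curve)
print(round(ratio, 3))
```

In practice each target would have its own fitted curve, which is where a standard carrying both targets at a 1:1 ratio is convenient.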
16S copy number estimates from de novo assemblies. For each endophytic isolate, paired-end sequencing reads (R1, R2) were generated on the Illumina HiSeq 2000 from short (~250 bp) and long (~2500 bp) insert libraries (Short_Ins and Long_Ins, respectively). For closed-genome controls, similarly generated sequencing reads were downloaded from SRA: Escherichia coli TY-2482 (GCA_000217695.2, SRR292678, SRR292862), Bacteroides fragilis HMW 615 (GCA_000297735.1, SRR488169, SRR488170), Pseudomonas aeruginosa PAO1 (GCA_000006765.1, SRR032420, SRR032832) and Staphylococcus aureus KPL1828 (GCA_000507725.1, SRR835799, SRR958927). The 16Stimator pipeline was used to estimate 16S copy number as the ratio of median coverage for 16S and single-copy genes. Confidence intervals (95%) were calculated either as in Price and Bonett (2002) (PB) or via permutations (Perm). For endophytic isolates, 16S copy numbers were independently verified by absolute quantification via qPCR, with the mean and standard deviation of technical replicates shown. For closed-genome controls, each horizontal line marks the rrnDB (Stoddard et al., 2015) consensus 16S copy number for each species. Note: the short-insert library for MEDvA23 and the long-insert library for MEB061 did not meet quality thresholds. 16S copy number was not experimentally determined by qPCR for E. coli TY-2482, B. fragilis HMW 615, P. aeruginosa PAO1 and S. aureus KPL1828.
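For readers curious how an interval around a ratio of medians can be obtained by resampling, here is a minimal percentile-bootstrap sketch. Note that this is a swapped-in technique for illustration: it is neither the Price and Bonett (2002) interval nor the permutation procedure used in the paper, and the function name, coverage values, and defaults are hypothetical.

```python
import random
from statistics import median


def bootstrap_ratio_ci(cov_16s, cov_single_copy, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for median(16S) / median(single-copy)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ratios = []
    for _ in range(n_resamples):
        # Resample each coverage vector with replacement, independently.
        resampled_16s = rng.choices(cov_16s, k=len(cov_16s))
        resampled_single = rng.choices(cov_single_copy, k=len(cov_single_copy))
        ratios.append(median(resampled_16s) / median(resampled_single))
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_resamples)]
    upper = ratios[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-base coverage values (all positive).
cov_16s = [290, 310, 300, 305, 295]
cov_single = [48, 52, 50, 51, 49, 50]
lower, upper = bootstrap_ratio_ci(cov_16s, cov_single)
print(lower, upper)
```

Neighboring positions in real coverage data are correlated, so an interval from naive resampling like this would likely be too narrow; treat it only as a way to see the mechanics.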
Only after resolving 16S copy numbers for our isolates of interest did we realize that this method could be applied to thousands of other draft genomes. We scaled 16Stimator to process tens of thousands of sequencing libraries deposited in SRA, resulting in 16S copy number estimations for hundreds of species without closed genome representatives. A large and diverse database of 16S copy numbers combined with methods to correct for copy number bias in 16S amplicon sequencing studies will ultimately result in more accurate abundance and diversity estimates. If sequencing reads are publicly deposited along with draft genome sequences, then the database can continue to grow.
Though we did not initially intend to create a method to estimate 16S copy numbers from draft genomes, science threw us a curveball and 16Stimator was our response. All the scripts and data are publicly available at https://bitbucket.org/perisin/16stimator. We look forward to feedback on our method to continue to improve and generate 16Stimates!
Kembel, S. W., Wu, M., Eisen, J. A., & Green, J. L. (2012). Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol, 8(10), e1002743. https://doi.org/10.1371/journal.pcbi.1002743
Langille, M. G. I., Zaneveld, J., Caporaso, J. G., McDonald, D., Knights, D., Reyes, J. A., Clemente, J. C., Burkepile, D. E., Vega Thurber, R. L., Knight, R., Beiko, R. G., & Huttenhower, C. (2013). Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology, 31(9), 814–821. https://doi.org/10.1038/nbt.2676
Perisin, Matthew, Madlen Vetter, Jack A. Gilbert, and Joy Bergelson. 2015. “16Stimator: Statistical Estimation of Ribosomal Gene Copy Numbers from Draft Genome Assemblies.” The ISME Journal. https://doi.org/10.1038/ismej.2015.161.
Stoddard, Steven F., Byron J. Smith, Robert Hein, Benjamin R. K. Roller, and Thomas M. Schmidt. 2015. “RrnDB: Improved Tools for Interpreting RRNA Gene Abundance in Bacteria and Archaea and a New Foundation for Future Development.” Nucleic Acids Research 43 (D1): D593–D598. https://doi.org/10.1093/nar/gku1201.
Price, Robert M., and Douglas G. Bonett. 2002. “Distribution-Free Confidence Intervals for Difference and Ratio of Medians.” Journal of Statistical Computation and Simulation 72 (2): 119–124. https://doi.org/10.1080/00949650212140.
Durable resistance in agriculture is difficult to achieve, and in fact most resistance factors that are introduced into crops are effective for fewer than five years. In contrast, resistance polymorphisms in nature often persist for thousands, if not millions, of years. Why are these dynamics so different?
In this work, Talia Karasov, with recent members of the Bergelson group and in collaboration with Richard Hudson and Roger Innes, investigated how polymorphisms in resistance (R) genes are maintained over long time scales.
By dissecting a resistance polymorphism in nature, the authors show that the complexity inherent in ecological communities is key to its longevity. This suggests that the simplicity of agricultural communities may not be conducive to long-term resistance. Our study highlights the value of understanding natural species interactions for resistance management.
Karasov, T. L., Kniskern, J. M., Gao, L., DeYoung, B. J., Ding, J., Dubiella, U., … & Bergelson, J. (2014). The long-term maintenance of a resistance polymorphism through diffuse interactions. Nature, 512(7515), 436-440.