Detection of strain-level variation in the microbiome

A paper I recently contributed to is now accepted at Genome Research:

An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography

Stephen Nayfach, Beltran Rodriguez-Mueller, Nandita Garud, Katherine S Pollard
In this paper, we introduce a new software tool, MIDAS, which can identify single-nucleotide polymorphisms (SNPs) and copy-number variants (CNVs) in shotgun metagenomic data. We then apply the software to a mother-infant data set and show that while infant gut microbiomes come to resemble their mothers’ microbiomes over time at the species level, the majority of strain transmissions from mother to infant occur close to birth rather than later in life. We also apply MIDAS to ocean metagenomic data and show that there is substructure at the strain level across geographic regions. MIDAS offers the ability to track strain-level variation in the microbiome, making it possible to delve more deeply into the evolutionary forces shaping the microbiome.

Paper accepted: Elevation of linkage disequilibrium above neutral expectations in ancestral and derived populations of Drosophila melanogaster

My paper with Dmitri Petrov, Elevation of linkage disequilibrium above neutral expectations in ancestral and derived populations of Drosophila melanogaster, has been accepted at Genetics. In this paper we show that signatures of soft sweeps are common to multiple populations of D. melanogaster.

In our previous paper in PLoS Genetics, we showed that soft sweeps are common in the Raleigh population of D. melanogaster. However, many questions were raised about whether soft sweeps are specific to the North Carolina population we studied. We sought to address two factors that could confound the North Carolina results. First, North American flies have experienced extensive admixture, whose effects on LD are largely unknown. Second, the Raleigh data set was generated by extensive inbreeding, which could also affect LD. Dmitri and I therefore analyzed a sample of >100 fully sequenced strains from Zambia, an ancestral population that has experienced little to no admixture. This data set was generated by sequencing haploid embryos rather than inbred strains. Our results revealed that soft sweeps are common to both Raleigh and Zambia. In addition, in Zambia we found evidence for some hard sweeps.

A copy of our paper is available here and will be available on the Genetics website soon.

Figure 3 from our paper: Haplotype frequency spectra for the top 25 H12 peaks in Zambian and Raleigh data. Shown are haplotype frequency spectra for the top 25 peaks in the Zambian H12 scan conducted in 801-SNP windows down-sampled to 401 SNPs (A) and the Raleigh H12 scan conducted in 401-SNP windows (B). For each peak, the frequency spectrum corresponding to the analysis window with the highest H12 value was plotted. The height of the uppermost shaded region (light blue) in each bar indicates the frequency of the most prevalent haplotype in the sample of 145 individuals, and the heights of subsequent colored regions indicate the frequencies of the second, third, and successively less frequent haplotypes in the sample. Grey bars indicate singletons. In Zambia, sweeps reach a smaller partial frequency than in Raleigh. Many peaks in the Zambian data have multiple haplotypes present at high frequency, indicative of soft sweeps, and many peaks have a single haplotype dominating the haplotype spectra, indicative of hard sweeps. In Raleigh, all sweeps have multiple haplotypes at high frequency, consistent with signatures of soft sweeps.


Videos on our review on adaptation in pathogens

Along with Pleuni Pennings, Ben Wilson, Alison Feder, and Zoe Assaf, I recently published a review on adaptation in pathogens in Molecular Ecology. In this review we discuss state-of-the-art population genetic analyses conducted in a wide array of pathogens, including P. falciparum (the malaria-causing pathogen), HIV, tuberculosis, Staphylococcus aureus, and influenza. Please check out our paper here.

We made short videos highlighting the different pathogens that we wrote about.

This is me discussing adaptation in P. falciparum:

Ben on influenza:

Here’s Pleuni talking about HIV:

Alison talking about tuberculosis:

And Zoe sharing work on Staphylococcus aureus:

 

Review accepted: The population genetics of drug resistance evolution in natural populations of viral, bacterial, and eukaryotic pathogens.

My co-authors, Pleuni Pennings, Zoe Assaf, Alison Feder, and Ben Wilson, and I recently wrote a review: The population genetics of drug resistance evolution in natural populations of viral, bacterial, and eukaryotic pathogens. Our review will be coming out in Molecular Ecology.

Paper on elevation of LD in Drosophila on bioRxiv

I posted my latest paper with Dmitri Petrov, Elevation of linkage disequilibrium above neutral expectations in ancestral and derived populations of Drosophila melanogaster, on bioRxiv. In this paper, we show that elevated LD and haplotype homozygosity are common in multiple populations of D. melanogaster and that signatures of partial soft sweeps are generic to multiple populations. We welcome any feedback or questions about the paper.

SMBE talk: Pervasive long-range linkage disequilibrium in D. melanogaster

I had the opportunity to present my latest paper draft on long-range linkage disequilibrium in D. melanogaster at the SMBE 2015 meeting held in Vienna, Austria. In this paper, Dmitri Petrov and I show that levels of LD at both short and long distances are elevated above neutral expectations in both Raleigh and Zambian populations of D. melanogaster. Furthermore, we find that levels of haplotype homozygosity are also elevated in both populations. Examination of the haplotype frequency spectra in the two populations reveals that signatures of soft sweeps are common in both populations, suggesting that soft sweeps are generic to multiple populations of Drosophila.

Here is a picture that Alex Cagan drew of me and my talk!

[Image: Alex Cagan’s drawing of the talk]

Enhancing the mathematical properties of new haplotype homozygosity statistics for the detection of selective sweeps

I’m really pleased to share that my paper with Noah Rosenberg on the mathematical properties of H12 and H2/H1 is now published in Theoretical Population Biology. In this paper we introduce a normalization for the H2/H1 statistic as a function of H12 and show that the two statistics must be used in conjunction with each other to be able to differentiate hard and soft sweeps.
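For readers unfamiliar with these statistics, here is a minimal sketch of how H1, H12, and the raw H2/H1 ratio can be computed from the haplotypes observed in one analysis window. It uses the standard definitions in terms of sorted haplotype frequencies. The function name and the toy samples are made up for illustration, and the sketch does not implement the normalization of H2/H1 introduced in the paper.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Return (H1, H12, H2/H1) for the haplotypes sampled in one window.

    `haplotypes` is a non-empty list of hashable haplotype labels
    (for example, strings of alleles), one entry per sampled chromosome.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Haplotype frequencies, sorted from most to least common.
    freqs = sorted((c / n for c in counts.values()), reverse=True)

    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0

    # H1: expected haplotype homozygosity.
    h1 = sum(p * p for p in freqs)
    # H12: as H1, but the two most common haplotypes are pooled.
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])
    # H2: as H1, but excluding the most common haplotype.
    h2 = h1 - p1 * p1
    return h1, h12, h2 / h1


# Toy samples (illustrative only): one dominant haplotype vs. two.
hard_like = ["A"] * 8 + ["B", "C"]
soft_like = ["A"] * 4 + ["B"] * 4 + ["C", "D"]
print(haplotype_homozygosity(hard_like))
print(haplotype_homozygosity(soft_like))
```

In the toy samples, both windows have high H12, but H2/H1 is much lower when a single haplotype dominates than when two haplotypes share high frequency, which is why the two statistics are read together.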

SMBE Young Investigator Travel Award

I am very grateful to have received the SMBE Young Investigator Travel Award for my oral presentation in Vienna this July on Long-range linkage disequilibrium in multiple natural populations of D. melanogaster.

Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps

Our paper on detecting hard and soft sweeps in D. melanogaster population genomic data from North America is finally published in PLoS Genetics!

Check it out here.

Current Issues In Genetics: Pervasive long-range linkage disequilibrium in natural populations of D. melanogaster

On Friday, November 14, I will be giving a talk to the Stanford Genetics department about a paper I am writing on pervasive long-range linkage disequilibrium (LD) in natural populations of D. melanogaster. LD is a measure of the correlation between pairs of polymorphisms in the data, commonly quantified as r^2, the squared correlation coefficient. The expectation is that polymorphisms far apart from one another should have low LD because recombination and mutation events should break up any structure in the genome. However, I show that there is actually a very high amount of LD even at long ranges where neutral expectations suggest there should be little to no LD. I suggest that a plausible explanation for the genome-wide elevation in LD is repeatable selective events in the Drosophila genome.
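As a small illustration of the LD measure described above, the sketch below computes r^2 between two biallelic sites from phased haplotypes, using the standard formula r^2 = D^2 / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB. The function name and the 0/1 allele coding are assumptions made for this example and are not taken from the analysis code behind the talk.

```python
def r_squared(site_a, site_b):
    """Squared correlation (r^2) between two biallelic sites.

    `site_a` and `site_b` are equal-length sequences of 0/1 alleles,
    one entry per sampled haplotype, in the same haplotype order.
    """
    if len(site_a) != len(site_b) or len(site_a) == 0:
        raise ValueError("sites must be non-empty and of equal length")

    n = len(site_a)
    p_a = sum(site_a) / n  # frequency of allele 1 at site A
    p_b = sum(site_b) / n  # frequency of allele 1 at site B
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


# Perfectly associated sites versus independent sites (toy data).
print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))
print(r_squared([0, 1, 0, 1], [0, 0, 1, 1]))
```

Averaging this quantity over many pairs of sites binned by physical distance gives the kind of LD decay curve that can be compared against neutral expectations.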