Enhancing the mathematical properties of new haplotype homozygosity statistics for the detection of selective sweeps

I’m really pleased to share that my paper with Noah Rosenberg on the mathematical properties of H12 and H2/H1 is now published in Theoretical Population Biology. In this paper we introduce a normalization for the H2/H1 statistic as a function of H12 and show that the two statistics must be used together to differentiate hard sweeps from soft sweeps.

SMBE Young Investigator Travel Award

I am very grateful to have received the SMBE Young Investigator Travel Award for my oral presentation in Vienna this July on Long-range linkage disequilibrium in multiple natural populations of D. melanogaster.

Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps

Our paper on detecting hard and soft sweeps in D. melanogaster population genomic data from North America is finally published in PLoS Genetics!

Check it out here.

Current Issues In Genetics: Pervasive long-range linkage disequilibrium in natural populations of D. melanogaster

On Friday, November 14, I will be giving a talk to the Stanford Genetics department about a paper I am writing on pervasive long-range linkage disequilibrium (LD) in natural populations of D. melanogaster. LD is a measure of the amount of correlation between pairs of polymorphisms in the data, also known in statistics as R^2. The expectation is that polymorphisms far apart from one another should have low amounts of LD because recombination and mutation events should break up any structure in the genome. However, I show that there is actually a very high amount of LD even at long ranges, where neutral expectations suggest there should be little to no LD. I suggest that a plausible explanation for the genome-wide elevation in LD is repeatable selective events in the Drosophila genome.
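To make the LD measure concrete, here is a minimal sketch of the standard r^2 calculation for a single pair of biallelic sites. It assumes phased haplotypes with alleles coded as 0 and 1; the function name r_squared and the toy data are hypothetical and are not taken from my analysis code.

```python
def r_squared(site_a, site_b):
    """Return r^2 between two biallelic sites.

    site_a and site_b are equal-length sequences of 0/1 alleles,
    one entry per sampled haplotype (an assumed encoding).
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")

    # Frequency of the '1' allele at each site.
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n

    # Frequency of haplotypes carrying '1' at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n

    # D is the deviation from the frequency expected under independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Two sites that always co-occur versus two sites that are independent.
linked = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
unlinked = r_squared([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy example the first pair is perfectly correlated and the second pair is uncorrelated. A genome-wide analysis would repeat this over many pairs of sites binned by the distance between them.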

Github repository: SelectionHapStats

SelectionHapStats is a repository of Python scripts written to identify natural selection events in the genome and R scripts written to visualize the signatures of selected sites. The Python code calculates the haplotype homozygosity statistics H12 and H2/H1 in a genome-wide scan and identifies H12 peaks in genomic data. The R code visualizes the haplotype frequency spectra for the top peaks in the data and the genome-wide scan of H12.

The code presented in this repository is based on the arXiv preprint, Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps (http://arxiv.org/abs/1303.0906).

Check out my blog post for a further description of the project and examples of visual output from the code!

URL: https://github.com/ngarud/SelectionHapStats

Video of my talk on the detection of hard and soft sweeps

Here is a video of me giving a talk at the Bay Area Population Genetics meeting at Stanford University in January 2013 on my project on the detection of hard and soft selective sweeps.

Stanford CEHG Fellow

I have been awarded a fellowship from the Stanford Center for Human Genomics to support my PhD studies for the upcoming Fall and Winter quarters.

SMBE oral presentation: Disentangling the effects of demography and selection on haplotype structure in Drosophila melanogaster

This year at the Society for Molecular Biology and Evolution meeting I am presenting a talk on my work on “Disentangling the effects of demography and selection on haplotype structure in Drosophila melanogaster”. In my paper, I show that current demographic models that have been fit to neutral regions of the genome fit some summary statistics that assume independence between polymorphic sites, such as S and π, but fail to fit other summary statistics that take into account correlation in the data, such as long-range linkage disequilibrium and genome-wide haplotype homozygosity levels.

In addition, I am a co-author on Philipp Messer’s talk on “New statistical methods detect both hard and soft sweeps in malaria parasites.” In this paper, Philipp and I apply several different, but related, haplotype homozygosity statistics to the malaria genome and show that we have great power to recover several positive controls, depending on the method used.

We will both be presenting on Wednesday, June 11, in the session titled “Detecting selection in natural populations: making sense of genome scans and towards alternative solutions.”

Bay Area Population Genetics X

I presented a poster at the tenth Bay Area Population Genetics meeting based on my work with Dr. Noah Rosenberg on the mathematical properties of the H12 and H2/H1 statistics. The H12 statistic is a haplotype homozygosity statistic used to identify regions of the genome under positive selection, and the H2/H1 statistic is used to distinguish whether a candidate region under selection shows signatures of a hard or a soft sweep. In our paper, Noah and I show that there is an upper bound for H2/H1 as a function of the corresponding H12 value. We apply this upper bound to data and show that it can help with the interpretation of H12 and H2/H1 measured in heterogeneous data sets with varying sample sizes and missing data rates.
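For readers unfamiliar with these statistics, here is a minimal sketch of how they can be computed from the haplotypes observed in a single analysis window. It uses the usual definitions: H1 is the sum of squared haplotype frequencies, H2 is H1 without the contribution of the most common haplotype, and H12 pools the two most common haplotypes into one before squaring. The function name haplotype_homozygosity and the toy windows are hypothetical; this is not the code in SelectionHapStats, and it does not implement the upper bound or the normalization from the paper.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Compute H1, H2, H12 and H2/H1 for one window.

    haplotypes is a sequence of hashable haplotype labels
    (for example strings of alleles), one per sampled individual.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    # Haplotype frequencies, most common first.
    counts = Counter(haplotypes)
    freqs = sorted((c / n for c in counts.values()), reverse=True)

    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0

    h1 = sum(p * p for p in freqs)   # expected haplotype homozygosity
    h2 = h1 - p1 * p1                # H1 without the top haplotype
    h12 = h1 + 2 * p1 * p2           # equals (p1 + p2)^2 + sum of the rest
    return {"H1": h1, "H2": h2, "H12": h12, "H2/H1": h2 / h1}


# One dominant haplotype (hard-sweep-like) versus two common
# haplotypes (soft-sweep-like) in a sample of ten.
hard_like = haplotype_homozygosity(["A"] * 9 + ["B"])
soft_like = haplotype_homozygosity(["A"] * 4 + ["B"] * 4 + ["C"] * 2)
```

In these toy windows both samples have elevated H12, but H2/H1 is much larger when two haplotypes are common than when a single haplotype dominates, which is the contrast the two statistics are meant to capture.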

Here is a copy of my poster:

normalization_poster_052114

Stanford CEHG Evolgenome speaker seminar

This spring quarter at Stanford, I am co-organizing the Stanford CEHG Evolgenome speaker seminar. This is a weekly seminar with speakers from around campus who are part of the Center for Evolution and Human Genomics, as well as visitors from nearby institutions in the Bay Area. Check out our exciting lineup of speakers:

cehgPoster