GitHub repository: SelectionHapStats

SelectionHapStats is a repository of Python scripts for identifying natural selection events in the genome and R scripts for visualizing the signatures of selection at candidate sites. The Python code calculates the haplotype homozygosity statistics H12 and H2/H1 in a genome-wide scan and identifies H12 peaks in genomic data. The R code visualizes the haplotype frequency spectra for the top peaks in the data, as well as the genome-wide scan of H12.

The code in this repository is based on our paper posted on arXiv, Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps (http://arxiv.org/abs/1303.0906).

Check out my blog post for a further description of the project and examples of the visual output from the code!

URL: https://github.com/ngarud/SelectionHapStats
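To make the two statistics concrete, here is a minimal Python sketch of H12 and H2/H1 in a single window, following the definitions in the paper: with haplotype frequencies p1 ≥ p2 ≥ … in the window, H1 = Σ p_i², H12 = (p1 + p2)² + Σ_{i>2} p_i², and H2 = H1 − p1². This is not the repository's actual code. The function names, the encoding of haplotypes as equal-length strings, and the simple SNP-window scan are assumptions made for illustration.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (H1, H12, H2/H1) for the haplotypes observed in one window."""
    n = len(haplotypes)
    # Haplotype frequencies, most common first: p[0] is p1, p[1] is p2.
    p = sorted((count / n for count in Counter(haplotypes).values()), reverse=True)
    p1 = p[0]
    p2 = p[1] if len(p) > 1 else 0.0
    # H1: chance that two haplotypes drawn with replacement are identical.
    h1 = sum(x * x for x in p)
    # H12: like H1, but the two most frequent haplotypes are pooled into one class.
    h12 = (p1 + p2) ** 2 + sum(x * x for x in p[2:])
    # H2: homozygosity left after removing the most frequent haplotype.
    h2 = h1 - p1 * p1
    return h1, h12, h2 / h1


def h12_scan(haplotypes, window, step):
    """Hypothetical scan: slide a window of `window` SNPs, `step` SNPs at a time,
    and return (start_index, H12) for each window."""
    n_snps = len(haplotypes[0])
    results = []
    for start in range(0, n_snps - window + 1, step):
        sub = [h[start:start + window] for h in haplotypes]
        results.append((start, haplotype_stats(sub)[1]))
    return results
```

For example, a window where haplotypes A, B, and C occur 6, 3, and 1 times gives H1 ≈ 0.46, H12 ≈ 0.82, and H2/H1 ≈ 0.22. Pooling the two most common haplotypes is what lets H12 pick up soft sweeps, in which more than one haplotype rises to high frequency.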

Video of my talk on the detection of hard and soft sweeps

Here is a video of the talk I gave at the Bay Area Population Genetics meeting at Stanford University in January 2013 about my project on detecting hard and soft selective sweeps.

Stanford CEHG Fellow

I have been awarded a fellowship from the Stanford Center for Human Genomics to support my PhD studies for the upcoming Fall and Winter quarters.

SMBE oral presentation: Disentangling the effects of demography and selection on haplotype structure in Drosophila melanogaster

This year at the Society for Molecular Biology and Evolution (SMBE) meeting, I am presenting a talk on my work, “Disentangling the effects of demography and selection on haplotype structure in Drosophila melanogaster.” In my paper, I show that current demographic models fit to neutral regions of the genome reproduce summary statistics that treat polymorphic sites independently, such as S and Pi, but fail to reproduce statistics that account for correlations among sites, such as long-range linkage disequilibrium and genome-wide haplotype homozygosity.
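Why can a model match S and Pi yet miss haplotype structure? Both S and Pi depend only on the allele frequencies at each site, whereas haplotype homozygosity also depends on how alleles are combined along chromosomes. The minimal Python sketch below illustrates this with two toy samples. Both samples have identical per-site allele frequencies, so S and Pi are identical, but their haplotype homozygosity H1 differs. The function names and samples are illustrative assumptions, not code or data from the paper, and Pi is computed per sequence rather than per site.

```python
from collections import Counter
from itertools import combinations


def segregating_sites(seqs):
    """S: number of aligned columns carrying more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Pi (per sequence): average number of differing positions over all pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def haplotype_homozygosity(seqs):
    """H1: sum of squared frequencies of the distinct haplotypes."""
    n = len(seqs)
    return sum((count / n) ** 2 for count in Counter(seqs).values())


# Same allele frequencies at both sites, different linkage between the sites.
linked = ["AA", "AA", "TT", "TT"]
unlinked = ["AT", "TA", "AA", "TT"]
for name, sample in [("linked", linked), ("unlinked", unlinked)]:
    print(name, segregating_sites(sample),
          mean_pairwise_differences(sample), haplotype_homozygosity(sample))
```

Both samples give S = 2 and Pi = 4/3, while H1 is 0.5 for the linked sample and 0.25 for the unlinked one.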

In addition, I am a co-author on Philipp Messer’s talk, “New statistical methods detect both hard and soft sweeps in malaria parasites.” In this paper, Philipp and I apply several different but related haplotype homozygosity statistics to the malaria genome and show that they have high power to recover several positive controls, with the power depending on the method used.

We will both be presenting on Wednesday, June 11, in the session titled “Detecting selection in natural populations: making sense of genome scans and towards alternative solutions.”

Bay Area Population Genetics X

I presented a poster at the tenth Bay Area Population Genetics meeting based on my work with Dr. Noah Rosenberg on the mathematical properties of the H12 and H2/H1 statistics. H12 is a haplotype homozygosity statistic used to identify regions of the genome under positive selection, and H2/H1 is used to determine whether a candidate region under selection shows the signature of a hard or a soft sweep. In our paper, Noah and I show that H2/H1 has an upper bound that depends on the corresponding H12 value. Applying this bound to data, we show that it helps in interpreting H12 and H2/H1 values measured in heterogeneous data sets with varying sample sizes and missing-data rates.

Here is a copy of my poster:

normalization_poster_052114
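The upper bound on H2/H1 can also be explored numerically. The sketch below is a brute-force illustration under our own assumptions, not the method or formula from the paper. It fixes a sample size n, enumerates every possible set of haplotype counts (every integer partition of n), computes H12 and H2/H1 exactly with fractions, and records the largest H2/H1 reached at each H12 value. The function names and the choice n = 20 are hypothetical.

```python
from fractions import Fraction


def partitions(n, largest=None):
    """Yield every partition of n as a tuple of haplotype counts, largest first."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def h12_and_ratio(counts, n):
    """Exact (H12, H2/H1) for haplotype counts sorted in decreasing order."""
    p = [Fraction(c, n) for c in counts]
    p1 = p[0]
    p2 = p[1] if len(p) > 1 else Fraction(0)
    h1 = sum(x * x for x in p)
    h12 = (p1 + p2) ** 2 + sum(x * x for x in p[2:])
    return h12, (h1 - p1 * p1) / h1


def max_ratio_by_h12(n):
    """Map each attainable H12 in a sample of n haplotypes to the largest H2/H1
    attainable at that H12 (a brute-force, sample-size-specific bound)."""
    best = {}
    for counts in partitions(n):
        h12, ratio = h12_and_ratio(counts, n)
        if ratio > best.get(h12, Fraction(-1)):
            best[h12] = ratio
    return best


bound = max_ratio_by_h12(20)
print(bound[Fraction(1)])
```

For n = 20, the largest H2/H1 at H12 = 1 is 1/2, reached when two haplotypes each have frequency 0.5. Because the set of attainable frequencies changes with n, this brute-force bound also changes with sample size.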

Stanford CEHG Evolgenome speaker seminar

This spring quarter at Stanford, I am co-organizing the Stanford CEHG Evolgenome speaker seminar. This weekly seminar features speakers from across campus who are part of the Center for Evolution and Human Genomics, as well as local visitors from nearby institutions in the Bay Area. Check out our exciting lineup of speakers:

cehgPoster

Simons Institute for the Theory of Computing

I attended the Simons Institute workshop titled Computation-Intensive Probabilistic and Statistical Methods for Large-Scale Population Genomics. There was a great lineup of speakers and some opportunities to meet colleagues!

Talk at the Biomedical Computation at Stanford Conference (BCATS)

Today I gave a talk at BCATS on my work on detecting hard and soft sweeps in Drosophila. I presented new work inferring the softness of the sweeps in Drosophila, showing that, on average, the number of sweeping haplotypes is compatible with an adaptive theta of around 12.8. The talk was well received, and I appreciate all the questions and comments from the audience.

Recent selective sweeps in Drosophila were abundant and primarily soft

We recently rewrote our paper on identifying soft selective sweeps in Drosophila and posted version 2 on arXiv. In this new version, we (i) focus much more on the possibility that complex demographic scenarios generate the signatures we detect, (ii) carry out extensive approximate Bayesian computation (ABC) to estimate the most likely adaptive theta for each of our peaks, and (iii) investigate the power of our statistics to detect sweeps of varying “softness,” arising either from different adaptive theta values or from different initial frequencies of the sweeping allele. In the end, our conclusions remain the same: recent selective sweeps in Drosophila were abundant and primarily soft.

Please feel free to check out version 2 and send us your comments!
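Rejection sampling is the simplest form of ABC: draw a parameter from its prior, simulate data, and keep the parameter if a summary statistic of the simulated data lies close to the observed value. The Python sketch below is a deliberately simplified stand-in, not the simulations used in the paper. In it, a Hoppe urn (the Ewens sampling formula) assigns sampled lineages to independent origins of the adaptive allele with parameter theta, and H12 is the only summary statistic. The log-uniform prior, the tolerance, the sample size, and all function names are assumptions.

```python
import random
from collections import Counter


def hoppe_urn(n, theta, rng):
    """Toy simulator: label n sampled lineages by their independent origin of the
    adaptive allele using a Hoppe urn (Ewens sampling formula with parameter theta)."""
    labels = []
    for i in range(n):
        if rng.random() < theta / (theta + i):
            labels.append(i)  # a new origin, labelled by its position
        else:
            labels.append(labels[rng.randrange(i)])  # copy an existing lineage
    return labels


def h12(labels):
    """H12 of the haplotype classes in one simulated sample."""
    n = len(labels)
    p = sorted((c / n for c in Counter(labels).values()), reverse=True)
    p2 = p[1] if len(p) > 1 else 0.0
    return (p[0] + p2) ** 2 + sum(x * x for x in p[2:])


def abc_rejection(observed_h12, n, n_sims, tolerance, seed=0):
    """Rejection ABC: draw theta from a log-uniform prior on [0.01, 100], simulate,
    and keep theta whenever the simulated H12 is within `tolerance` of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = 10 ** rng.uniform(-2, 2)
        if abs(h12(hoppe_urn(n, theta, rng)) - observed_h12) <= tolerance:
            accepted.append(theta)
    return accepted


posterior = sorted(abc_rejection(observed_h12=0.2, n=50, n_sims=5000, tolerance=0.05))
print(len(posterior), posterior[len(posterior) // 2])
```

In this toy setting, smaller observed H12 values shift the accepted theta values upward, which matches the intuition that softer sweeps involve more independent adaptive origins.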

CEHG symposium

Today I had the opportunity to attend the Stanford CEHG symposium and present a poster on my work on soft sweeps. It was great to learn some new ideas and meet researchers interested in scanning various genomes for soft sweeps!