The effects of sample size on population genomic analyses: Implications for the tests of neutrality
File version
Version of Record (VoR)
Author(s)
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)
Date
Size
File type(s)
Location
Abstract
Background: One of the fundamental measures of molecular genetic variation is the Watterson’s estimator (θ), which is based on the number of segregating sites. The estimation of θ is unbiased only under neutrality and constant population growth. It is well known that the estimation of θ is biased when these assumptions are violated. However, the effects of sample size in modulating the bias was not well appreciated. Results: We examined this issue in detail based on large-scale exome data and robust simulations. Our investigation revealed that sample size appreciably influences θ estimation and this effect was much higher for constrained genomic regions than that of neutral regions. For instance, θ estimated for synonymous sites using 512 human exomes was 1.9 times higher than that obtained using 16 exomes. However, this difference was 2.5 times for the nonsynonymous sites of the same data. We observed a positive correlation between the rate of increase in θ estimates (with respect to the sample size) and the magnitude of selection pressure. For example, θ estimated for the nonsynonymous sites of highly constrained genes (dN/dS < 0.1) using 512 exomes was 3.6 times higher than that estimated using 16 exomes. In contrast this difference was only 2 times for the less constrained genes (dN/dS > 0.9). Conclusions: The results of this study reveal the extent of underestimation owing to small sample sizes and thus emphasize the importance of sample size in estimating a number of population genomic parameters. Our results have serious implications for neutrality tests such as Tajima D, Fu-Li D and those based on the McDonald and Kreitman test: Neutrality Index and the fraction of adaptive substitutions. For instance, use of 16 exomes produced 2.4 times higher proportion of adaptive substitutions compared to that obtained using 512 exomes (24 % vs 10 %).
Journal Title
BMC Genomics
Conference Title
Book Title
Edition
Volume
17
Issue
Thesis Type
Degree Program
School
Publisher link
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Rights Statement
© 2016 Subramanian. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/ zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Item Access Status
Note
Access the data
Related item(s)
Subject
Genomics
Biological Sciences
Information and Computing Sciences
Medical and Health Sciences