Impacts of low coverage depths and post-mortem DNA damage on variant calling: A simulation study
File version
Version of Record (VoR)
Author(s)
Parks, Matthew
Lambert, David
Griffith University Author(s)
Year published
2015
Abstract
Background:
Massively parallel sequencing platforms, featuring high throughput and relatively short read lengths, are well suited to ancient DNA (aDNA) studies. Variant identification from short-read alignment could be hindered, however, by low DNA concentrations common to historic samples, which constrain sequencing depths, and post-mortem DNA damage patterns.
Results:
We simulated pairs of sequences to act as reference and sample genomes at varied GC contents and divergence levels. Short-read sequence pools were generated from sample sequences, and subjected to varying levels of “post-mortem” damage by adjusting levels of fragmentation and fragmentation biases, transition rates at sequence ends, and sequencing depths. Mapping of sample read pools to reference sequences revealed several trends, including decreased alignment success with increased read length and decreased variant recovery with increased divergence. Variants were generally called with high accuracy; however, identification of SNPs (single-nucleotide polymorphisms) was less accurate for high damage/low divergence samples. Modest increases in sequencing depth resulted in rapid gains in total variant recovery, and limited improvements to recovery of heterozygous variants.
Conclusions:
This in silico study suggests aDNA-associated damage patterns minimally impact variant call accuracy and recovery from short-read alignment, while modest increases in sequencing depth can greatly improve variant recovery.
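The simulation design summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: every function name and parameter value below is hypothetical, fragment lengths are drawn uniformly (so fragmentation bias is ignored), and the "transition rates at sequence ends" are assumed to mean C-to-T changes near the 5' end and G-to-A changes near the 3' end, the pattern commonly reported for aDNA.

```python
import random

BASES = "ACGT"

def random_genome(length, gc_content, rng):
    """Draw a random sequence with the requested expected GC fraction."""
    at, gc = (1 - gc_content) / 2, gc_content / 2
    return "".join(rng.choices(BASES, weights=[at, gc, gc, at], k=length))

def diverge(reference, divergence, rng):
    """Copy the reference, substituting each site with probability `divergence`.

    Returns the sample sequence and the list of substituted positions.
    """
    sample, variants = list(reference), []
    for i, base in enumerate(reference):
        if rng.random() < divergence:
            sample[i] = rng.choice([b for b in BASES if b != base])
            variants.append(i)
    return "".join(sample), variants

def damaged_reads(sample, depth, mean_len, end_rate, rng, end_window=3):
    """Fragment the sample into short reads and damage the read ends.

    Assumption: damage is modelled as C->T within `end_window` bases of the
    5' end and G->A within `end_window` bases of the 3' end, each applied
    with probability `end_rate`.
    """
    n_reads = int(depth * len(sample) / mean_len)
    reads = []
    for _ in range(n_reads):
        length = rng.randint(mean_len // 2, mean_len * 3 // 2)
        start = rng.randrange(len(sample) - length + 1)
        read = list(sample[start:start + length])
        for i in range(end_window):
            if read[i] == "C" and rng.random() < end_rate:
                read[i] = "T"
            if read[-1 - i] == "G" and rng.random() < end_rate:
                read[-1 - i] = "A"
        reads.append((start, "".join(read)))
    return reads

rng = random.Random(0)  # fixed seed so the run is repeatable
reference = random_genome(2000, gc_content=0.5, rng=rng)
sample, variants = diverge(reference, divergence=0.01, rng=rng)
reads = damaged_reads(sample, depth=2.0, mean_len=50, end_rate=0.3, rng=rng)
print(len(variants), "simulated variants;", len(reads), "reads")
```

Varying `gc_content`, `divergence`, `depth` and `end_rate` in a sketch like this mirrors the kinds of parameters the study reports adjusting. Read mapping and variant calling would follow as separate steps and are not shown here.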
Journal Title
BMC Genomics
Volume
16
Issue
1
Copyright Statement
© Parks and Lambert; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Note
Page numbers are not for citation purposes. Instead, this article has the unique article number of 19.
Subject
Biological sciences
Other biological sciences not elsewhere classified
Information and computing sciences
Biomedical and clinical sciences