Today's guest post is by Jarrett Philips. He writes about a project from his Master's studies in Bioinformatics, which he also presented as a poster at the 6th International Barcode of Life Conference last August. Enjoy, and many thanks to Jarrett for his contribution.
The ability of DNA barcodes to detect meaningful genetic variation within and between species is strongly influenced by the scale of specimen sampling. Unfortunately, global barcoding efforts have been only partially successful in this regard, because most studies forgo deep sampling within species in favour of maximizing the number of taxa sampled.
A practical sample size of five individuals per species is common in barcoding studies, but such a strategy is by no means sufficient to capture variation within a species. This has led to sampling schemes in which many more specimens per species are collected.
A group of researchers from the University of Guelph wished to test whether current sampling efforts are adequate to document standing genetic variation at the species level.
To do this, they developed a simple quantitative model that predicts the total sample size needed for a species from three inputs: the number of specimens sampled so far, the number of COI haplotypes observed, and an estimate of the species' total haplotype diversity. The model rests on one very important assumption: that haplotypes occur at equal frequency within species populations. This assumption is not biologically realistic, since species abundances are often skewed geographically.
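To get a feel for why the equal-frequency assumption matters, here is a minimal simulation sketch in Python. It is not the authors' model: the function names, the choice of 10 haplotypes, and the skewed frequency vector are all hypothetical, picked only for illustration. It draws specimens at random until every haplotype has been seen at least once, and compares the average number of draws under equal and under skewed haplotype frequencies.

```python
import random


def draws_to_see_all(freqs, rng):
    """Draw specimens one at a time until every haplotype has been observed.

    freqs: haplotype frequencies (hypothetical values, should sum to 1).
    Returns the number of specimens drawn.
    """
    labels = list(range(len(freqs)))
    seen = set()
    draws = 0
    while len(seen) < len(freqs):
        # Sample one specimen; its haplotype is chosen in proportion to freqs.
        seen.add(rng.choices(labels, weights=freqs, k=1)[0])
        draws += 1
    return draws


def mean_draws(freqs, reps=2000, seed=1):
    """Average draws_to_see_all over many simulated sampling campaigns."""
    rng = random.Random(seed)
    return sum(draws_to_see_all(freqs, rng) for _ in range(reps)) / reps


# Assumed toy species with 10 haplotypes (illustrative numbers only).
H = 10
equal = [1 / H] * H             # every haplotype equally common
skewed = [0.55] + [0.05] * 9    # one dominant haplotype, nine rare ones

print("equal frequencies: ", mean_draws(equal))
print("skewed frequencies:", mean_draws(skewed))
```

In this toy setting the skewed case needs noticeably more specimens on average than the equal-frequency case, because the rare haplotypes take longer to turn up. A sample size worked out under equal frequencies will therefore tend to be too small when real frequencies are skewed.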
The authors found that widely varying sample sizes (from 150 to 5400 individuals per species) are likely needed to uncover all haplotype diversity across 18 selected species comprising freshwater, marine and migratory ray-finned fishes (Chordata: Actinopterygii). This is a far cry from the sampling intensities currently employed in many barcoding initiatives. However, such numbers may not be practical, and further investigation will be required to fully probe the extent of sampling necessary to gauge existing species genetic diversity in this group and others.
The final paragraph of their study is particularly motivating:
We recognize that estimates of N* calculated from our model likely represent underestimates of the true number of individuals of a given species which should be sampled. Many more specimens should therefore be sampled in order to ensure a sufficient number of haplotypes have been recovered. Equal haplotype frequencies are rarely observed in natural populations, and we suggest the development of more sophisticated models should explore the use of data simulations to evolve models that explicitly account for variation in species haplotype frequencies.