Wednesday, February 6, 2013

The seven deadly sins of DNA Barcoding (5)

If there is one sin on Collins and Cruickshank's list that I would dare to call a true sin, it would be this one:

Inappropriate use of bootstrap sampling

The use of bootstrap sampling in most DNA Barcoding publications is actually meaningless. In my last post I discussed at length that trees are not the best method for depicting DNA Barcoding data sets unless you want to do an explicit phylogenetic analysis. Using trees to visualize DNA Barcoding data has led to many misunderstandings, and adding bootstrap values has made it even worse.
I remember that such trees with bootstrap values first showed up maybe a year after the first DNA Barcoding papers. The intention might have been to lend more credibility to the findings (especially species groups), and in some cases bootstrapping was suggested or even demanded by reviewers. I have seen this happen, and more recently I have sometimes found myself justifying at length why I don't want to include a bootstrap analysis. Sometimes I win, sometimes not, although from a methodological standpoint I am fully on the side of Collins and Cruickshank:
However, the use of bootstrapping for specimen identification is somewhat perplexing. The aim here is to maximize congruence with a priori defined species, viz. the taxonomic names from a morphological identification process. A species with low bootstrap support does not falsify a species hypothesis when this assessment was based on independent data (i.e. morphology from the original description). In many cases, recently diverged sister species on short branches will have low support and therefore fail to be identified, even if they are morphologically distinct and diagnosable by unique mutations. [...] On top of this, bootstrap resampling does not make an assessment of the uncertainty in identification; an unknown can group with a reference specimen at 100% bootstrap support, and yet be an entirely different species.
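Both problems in this quote can be reproduced with a small toy sketch. It makes several simplifying assumptions: the sequences are random and hypothetical, the helper names (`mutate`, `mismatches`, `bootstrap_nearest_support`) are made up for illustration, and "support" is measured as how often a query's single nearest reference (by mismatch count) stays the same across bootstrap replicates. This stands in for clustering on a tree; a real bootstrap would rebuild a tree for every replicate. What it does share with the real procedure is the resampling step, where alignment columns are drawn with replacement. In the first case, a query that is 40 mismatches (about 6.7%) away from its closest reference still gets 100% support, because every replicate keeps that reference nearest. In the second case, two sister species differ at a single diagnostic site. A given site is left out of a replicate with probability (1 - 1/n)^n ≈ e^-1 ≈ 0.37, so the correct species only gets roughly 63% support even though the site diagnoses it perfectly.

```python
import random

def mutate(seq, positions):
    """Return a copy of seq with a different base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    s = list(seq)
    for i in positions:
        s[i] = swap[s[i]]
    return "".join(s)

def mismatches(a, b, cols):
    """Count mismatching bases between a and b over the sampled column indices."""
    return sum(1 for i in cols if a[i] != b[i])

def bootstrap_nearest_support(query, refs, n_reps=1000, seed=1):
    """Fraction of bootstrap replicates in which each reference is the unique
    nearest neighbour of the query (ties count for no reference).
    Toy stand-in for tree-based bootstrap support, not a real phylogenetic method."""
    rng = random.Random(seed)
    n_sites = len(query)
    wins = {name: 0 for name in refs}
    for _ in range(n_reps):
        # Nonparametric bootstrap: resample alignment columns with replacement.
        cols = rng.choices(range(n_sites), k=n_sites)
        dists = {name: mismatches(query, seq, cols) for name, seq in refs.items()}
        best = min(dists.values())
        nearest = [name for name, d in dists.items() if d == best]
        if len(nearest) == 1:
            wins[nearest[0]] += 1
    return {name: w / n_reps for name, w in wins.items()}

# Hypothetical 600 bp "barcode" alignment built from random bases.
rng = random.Random(42)
base = "".join(rng.choice("ACGT") for _ in range(600))

# Case 1: the query is 40 mismatches from sp_A and 100 from sp_B.
# It is far from any plausible conspecific, yet sp_A wins every replicate.
query = base
sp_a = mutate(base, range(0, 40))
sp_b = mutate(base, range(0, 100))
support_far = bootstrap_nearest_support(query, {"sp_A": sp_a, "sp_B": sp_b})
print(support_far)  # {'sp_A': 1.0, 'sp_B': 0.0}

# Case 2: sister species sp_X and sp_Y differ at one diagnostic site only.
# The query is identical to sp_X, but whenever site 0 is not resampled,
# both references are equally close, so sp_X gets only about 63% support.
sp_x = base
sp_y = mutate(base, [0])
support_sister = bootstrap_nearest_support(query, {"sp_X": sp_x, "sp_Y": sp_y})
print(support_sister)
```

Neither number says anything about whether the query actually belongs to the species it groups with. High support can come with a large distance, and low support can come with a perfectly diagnostic character.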
Furthermore, a COI tree of a DNA Barcoding dataset with bootstrap values gives the false impression that nodes above the species level carry meaning. Often they do not, simply because the barcode sequence carries too little information to resolve those deeper relationships.
My recommendation, in short: if you want to use trees to visualize your DNA Barcoding dataset, state clearly that this is all they are. Avoid any phylogenetic terminology and stay away from tree evaluation methods such as bootstrapping. There are better methods for assessing the coherence of groups in your dataset.
