If there is any item on Collins and Cruickshank's list that I would dare to call a sin, it is this one:
Inappropriate use of bootstrap sampling
The use of bootstrap sampling in most DNA Barcoding publications is actually meaningless. In my last post I discussed at length that trees are not the best way to depict DNA Barcoding datasets unless you want to do an explicit phylogenetic analysis. The use of trees as a DNA Barcoding visualization method has led to many misunderstandings, and the inclusion of bootstrap values has made it even worse.
I remember that such trees with bootstrap values first showed up maybe a year after the first DNA Barcoding papers. The intention might have been to give more credibility to the findings (especially species groups), and in some cases bootstrap values were actually suggested or even demanded by reviewers. I have seen this myself, and more recently I have sometimes found myself justifying at length why I don't want to include a bootstrap analysis. Sometimes I win, sometimes not, although from a methodological standpoint I am fully on the side of Collins and Cruickshank:
However,
the use of bootstrapping for specimen identification is somewhat
perplexing. The aim here is to maximize congruence with a priori defined
species, viz. the taxonomic names from a morphological
identification process. A species with low bootstrap support does not
falsify a species hypothesis when this assessment was based on
independent data (i.e. morphology from the original description). In
many cases, recently diverged sister species on short branches will have
low support and therefore fail to be identified, even if they are
morphologically distinct and diagnosable by unique mutations.
[...] On top of this, bootstrap resampling does not make
an assessment of the uncertainty in identification; an unknown can group
with a reference specimen at 100% bootstrap support, and yet be an
entirely different species.
Furthermore, a tree of a DNA Barcoding dataset (e.g. a COI tree) with bootstrap values gives the false impression that nodes above the species level are meaningful. Often they are not, simply because the amount of sequence data is insufficient to resolve them.
My recommendation, in short: if you want to use trees to visualize your DNA Barcoding dataset, state this clearly. Avoid any phylogenetic terminology and stay away from tree evaluation methods such as the bootstrap. There are better methods to assess the coherence of groups in your dataset.
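The last point of the quote above (an unknown grouping with a reference at 100% bootstrap support and still being a different species) can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions: the three sequences are made up, the "grouping" is a simplified nearest-neighbour comparison standing in for a full tree bootstrap, and ASSUMED_CUTOFF is an arbitrary illustrative distance, not a recommended threshold.

```python
import random


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_support(unknown, ref_a, ref_b, replicates=1000, seed=1):
    """Fraction of bootstrap replicates in which `unknown` is strictly
    closer to ref_a than to ref_b (sites resampled with replacement)."""
    rng = random.Random(seed)
    n = len(unknown)
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(n) for _ in range(n)]
        d_a = sum(unknown[i] != ref_a[i] for i in cols)
        d_b = sum(unknown[i] != ref_b[i] for i in cols)
        if d_a < d_b:
            hits += 1
    return hits / replicates


# Made-up 600 bp "barcodes" (hypothetical data, for illustration only).
ref_a = "A" * 600                  # reference specimen of species A
ref_b = "G" * 240 + "A" * 360      # reference specimen of species B
unknown = "C" * 60 + "A" * 540     # query: 10% from ref_a, 40% from ref_b

ASSUMED_CUTOFF = 0.03  # hypothetical distance cutoff, an assumption

support = bootstrap_support(unknown, ref_a, ref_b)
dist = p_distance(unknown, ref_a)

print(f"bootstrap support for unknown + ref_a: {support:.2f}")
print(f"p-distance unknown vs ref_a: {dist:.2f}")
print(f"exceeds assumed cutoff: {dist > ASSUMED_CUTOFF}")
```

In this toy case the unknown groups with ref_a in every replicate, yet it differs from ref_a at 10% of all sites. The support value only says that ref_a is consistently the closest sequence in the dataset. It says nothing about whether the unknown is close enough to belong to the same species.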