Tuesday, February 12, 2013

The seven deadly sins of DNA Barcoding (7)

We've reached the last sin of the Collins and Cruickshank paper:

Incorrectly interpreting the barcoding gap

The classic (Meyer and Paulay, 2005)
And here is the problem: Many DNA barcoding studies present histograms showing frequency distributions of both intra- and interspecific divergences for all pooled species analysed in a study. Overlap between the two distributions can be interpreted as a failure of DNA barcoding, but the only failure demonstrated in this case is that of defining a universal cut-off value. In this regard, and as stated previously, it is widely acknowledged that coalescent depths vary among species, and substantial overlap between intra- and interspecific distances may be the rule, rather than the exception. Therefore, for specimen identification purposes this type of presentation is wholly uninformative, as intraspecific distances for one species can exceed interspecific distances for other species in the analysis, but without necessarily compromising identification success (the local gap).
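To make that last point concrete, here is a minimal Python sketch on invented toy data (the specimen names, species labels, and positions are all hypothetical, and absolute differences along one axis merely stand in for real pairwise genetic distances). It pools all intra- and interspecific distances the way a histogram would, then checks whether each specimen's nearest neighbour still belongs to its own species.

```python
from itertools import combinations

# Hypothetical toy data: specimen -> (species, position on an arbitrary axis).
# Distance = absolute difference, a stand-in for a pairwise genetic distance.
specimens = {
    "a1": ("A", 0), "a2": ("A", 1),    # species A: shallow intraspecific divergence
    "b1": ("B", 3), "b2": ("B", 4),    # species B: close relative of A
    "c1": ("C", 50), "c2": ("C", 55),  # species C: deep intraspecific divergence
    "d1": ("D", 70), "d2": ("D", 75),  # species D: deep intraspecific divergence
}

def dist(x, y):
    return abs(specimens[x][1] - specimens[y][1])

# Pool every pairwise distance into the two classes a histogram would show.
intra, inter = [], []
for x, y in combinations(specimens, 2):
    same_species = specimens[x][0] == specimens[y][0]
    (intra if same_species else inter).append(dist(x, y))

# The pooled distributions overlap if any intraspecific distance
# exceeds the smallest interspecific distance.
pooled_overlap = max(intra) > min(inter)

def nearest_neighbour_species(query):
    """Species of the specimen closest to `query` (excluding itself)."""
    others = [s for s in specimens if s != query]
    return specimens[min(others, key=lambda s: dist(query, s))][0]

# Identification succeeds when the nearest neighbour is a conspecific.
correct = sum(nearest_neighbour_species(s) == specimens[s][0] for s in specimens)

print("pooled distributions overlap:", pooled_overlap)
print("correctly identified:", correct, "of", len(specimens))
```

In this toy set the intraspecific distances within C and D are larger than the interspecific distances between A and B, so the pooled histogram overlaps, yet every specimen is still closest to a member of its own species.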


I think that is a safe statement, so we should have a look at alternative displays. Collins and Cruickshank suggest a dot plot in which, for each individual, the distance to the furthest conspecific is plotted against the distance to the nearest non-conspecific neighbour, with a 1:1 line marking the points at which the difference between the two is zero. The example they refer to is this (Robinson et al. 2009):

Another dot plot approach was suggested by Mark Stoeckle on his blog back in 2006. He proposed a half-logarithmic dot plot of genetic distances within each species against genetic distances to the nearest neighbor. For each species there is one dot showing the intraspecific distance and another one directly above or below it showing the distance to the nearest neighbor. Sorting by intra- and interspecific distance makes the relative distances for each species easy to see. Here is an example from my Barcoding Nemo paper:



The main point of Collins and Cruickshank's criticism here is that two objectives of DNA barcoding are being conflated: specimen identification and species discovery. The barcoding gap as proposed by Meyer & Paulay (2005) can represent two distinct scenarios: one for specimen identification, with an individual being closer to a member of its own species than to a member of a different species (i.e. a ‘local’ barcoding gap); and one for species discovery, a distance that equates to a threshold applicable to all species (i.e. a ‘global’ barcoding gap).
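The two kinds of gap can be told apart with a few lines of Python. This is only a sketch under stated assumptions: the labels and distance matrix are invented, the function names (`gap_points`, `has_local_gap`, `has_global_gap`) are my own, and "local gap" is taken to mean that every individual lies above the 1:1 line when its nearest non-conspecific distance is compared with its furthest conspecific distance, while "global gap" is taken to mean that a single threshold separates all of them.

```python
def gap_points(labels, dist):
    """Per individual: (furthest conspecific, nearest non-conspecific) distance."""
    points = []
    for i, species in enumerate(labels):
        intra = [dist[i][j] for j, other in enumerate(labels)
                 if j != i and other == species]
        inter = [dist[i][j] for j, other in enumerate(labels) if other != species]
        if intra and inter:  # singletons have no conspecific to compare with
            points.append((max(intra), min(inter)))
    return points

def has_local_gap(points):
    # Every individual is closer to all conspecifics than to any other species.
    return all(nearest_other > furthest_own for furthest_own, nearest_other in points)

def has_global_gap(points):
    # One cut-off fits all: the largest intraspecific distance is still
    # smaller than the smallest interspecific distance.
    return max(p[0] for p in points) < min(p[1] for p in points)

# Hypothetical toy data: positions on an arbitrary axis stand in for sequences.
labels = ["X", "X", "Y", "Y", "Z", "Z", "W"]
positions = [0, 1, 4, 5, 60, 68, 90]
dist = [[abs(a - b) for b in positions] for a in positions]

points = gap_points(labels, dist)
print("local gap: ", has_local_gap(points))
print("global gap:", has_global_gap(points))
```

With these numbers the local gap holds for every individual, but no universal threshold exists, because the divergence within Z is larger than the distance between X and Y. The `points` list also happens to be the set of coordinates one would need for the kind of dot plot Collins and Cruickshank suggest.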
