Tuesday, May 24, 2016

Yes, I trust BOLD!

Another week, another paper on which I feel the need to comment. This time it is not about quality, as I am convinced the authors know what they are talking about. In fact, they are taxonomic experts for a family of true bugs (Cydnidae), but that doesn't save them from my blog post, as I think they unfairly criticized BOLD for doing its job.

Numerous mistakes in taxonomy, the relevance of the taxa names, and species misidentifications in BOLD version 3 were found and, more importantly, similar errors were detected in BOLD version 4 as well. We suggest that if the BOLD system is presumed to be taxonomically trustworthy, it can’t exist without an adequate a priori identification of barcoded specimens. Otherwise, the erroneous data deposited onto the BOLD platform will have a negative impact on studies in which molecular data imported from BOLD are utilized.

To clarify, BOLD versions 3 and 4 are simply different user interfaces with different sets of tools. The underlying database is the same. But what are we talking about? What's the extent of the problem?

Our search revealed 220 specimens, including 106 specimens with barcodes, so the percent of misidentified specimens is 3.78% for specimens with sequences, and 1.81% for all specimens. If nomenclatural issues are added to the cases of misidentifications, the percentages are 7.55% (specimens with barcodes) and 3.64% (all specimens).
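To make those percentages tangible, they can be turned back into specimen counts. Below is a minimal Python sketch of that arithmetic. The counts are my own inference from rounding the quoted percentages against the quoted totals, not figures stated in the paper, and `implied_count` is just a hypothetical helper name.

```python
# Totals as quoted from the paper.
ALL_SPECIMENS = 220
BARCODED_SPECIMENS = 106


def implied_count(percent, total):
    """Back out a whole-specimen count from a quoted percentage.

    Assumption: the paper's percentages are rounded values of
    count / total, so rounding percent * total recovers the count.
    """
    return round(percent / 100 * total)


# Percentages as quoted; the resulting counts are inferred, not quoted.
quoted = [
    ("misidentified, with barcodes", 3.78, BARCODED_SPECIMENS),
    ("misidentified, all specimens", 1.81, ALL_SPECIMENS),
    ("incl. nomenclature, with barcodes", 7.55, BARCODED_SPECIMENS),
    ("incl. nomenclature, all specimens", 3.64, ALL_SPECIMENS),
]

for label, percent, total in quoted:
    count = implied_count(percent, total)
    print(f"{label}: {percent}% of {total} is about {count} specimens")
```

Under that assumption, the misidentifications come to about four specimens, and about eight once the nomenclatural issues are added.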

It is not my intention to belittle the issue the authors point out. They came across some errors, and it is good to point those out, although I think there are different ways to do that (see below), and certainly without such bold statements as the one shown at the top of this post. To put it into perspective: 3.78% of 106 barcodes (about four specimens) out of 5 million were misidentified. Too much for the authors: "any information connected with DNA barcodes and deposited into databases such as GenBank and BOLD should be beyond any suspicion of inaccuracy or unreliability."

This means error is not an option in the work of a taxonomist. Such comments are grist to the mill of those who anxiously keep their BOLD data private instead of working with the community to make the data better, afraid that somebody will find a mistake they made and that they will be pilloried for being wrong.

BOLD is a workbench and not the holder of the ultimate taxonomic truth. Actually, BOLD's accuracy depends on its user community, and its quality grows with the number of experts who help build and maintain it. All the errors listed are welcome if meant as improvements to the database, but such errors are by no means different from a misidentified specimen in a museum drawer that might be used to identify other specimens by direct morphological comparison. Would that justify a paper title such as "In (fill in museum of choice) we trust?"

BOLD needs the input of the user community to minimize error and stay on top of nomenclatural and systematic changes and revisions. To me, the most useful thing to do is to stop complaining and start helping. If one finds errors that might hamper future use of the barcode reference library, the best course is to engage with the community and especially the data owner. BOLD was developed for this kind of interaction. It facilitates dialogue and the resulting data improvement. It even has vetted vocabulary and taxonomy controlled by humans. I don't know of any other tool better suited for that, so why a public display of errors in the form of a paper?

But first and foremost: stop bashing BOLD for errors and mistakes made by its users, and for enabling the community to actually identify and rectify such errors. Without the level of detail available through this interface, the authors of this paper would never have been able to identify all the problems they listed, let alone have a chance to contribute to any solution.

