A new week and a new paper I need to comment on. This one comes from one of DNA barcoding's old critics, Malte Ebach, and he and his co-authors take on Big Data and, of course, DNA barcoding, although the latter is not necessarily Big Data science yet. Actually, the barcoding critique is pretty much out of context and, as per usual, incorrect:
DNA barcoding has had minor impact in taxonomy, with some benefit for phylogeneticists, ecologists and conservationists; it has largely been superseded by technological advances, namely Next Generation Sequencing (NGS), in which whole genomes can now be routinely sequenced.
Where do I start? Well, DNA barcoding ain't dead, and it had a big impact on taxonomy and still has. Actually, I think it is fair to say that it has become a regular tool in many taxonomists' toolboxes. After reading the above statement I did a meta-analysis of the DNA barcoding literature, searching for species descriptions that involved barcoding as a diagnostic tool. I knew I would find some, but I was surprised to be looking at some 500 papers over the last decade. That is anything but a minor impact.
Ebach also completely ignores other societal benefits of DNA barcoding, e.g. food authentication, in his futile attempt to downplay a method he never liked. However, the best part is his claim that new technology has superseded barcoding. Nope, that's not true. Actually, barcoding and high-throughput sequencing (HTS) complement each other very well, and metabarcoding was probably DNA barcoding's most successful spin-off. Many colleagues around the world are already using HTS (that acronym will soon supersede NGS) in conjunction with organismal barcode reference libraries.
All this doesn't sound like DNA barcoding's swan song to me.
However, this wasn't even the main story of the paper, which is rather a diatribe on the use of Big Data in biology. I will not attempt to criticize the paper from an informatics standpoint. This should be done by an expert, but it speaks volumes that Ebach's examples for what he refers to as "Big Data hubris" are 6 and 14 years old, which in computer science is ages. Even worse, one is an opinion piece taken from Wired, and another is a New York Times piece that today would trigger a lot of questions about its almost homophobic connotation.
There is nothing wrong with cautioning users about the vast number of information sources on the internet, and Ebach actually nicely illustrates the problem of taking information provided by search engines such as Google Scholar at face value. Our new age of communication with ubiquitous information requires new critical thinking skills, especially when it comes to interpreting such results. To me, however, that is not an issue of the use of Big Data in science, as proper research always questions results and correlations. It is actually our job as scientists to question them as much as possible, and a new scientific finding should be considered likely only if it endures such scrutiny. And it is our job as educators to teach the new generation of students how to critically evaluate information and analytical results. So, if the intention was to write a piece that should caution us, I am afraid that this publication failed to do so, and the choice to include DNA barcoding can at best be considered mission drift or at worst missionary.