Thursday, July 11, 2013

The Barcode Index Number (BIN) System

Frequent users of the DNA barcoding platform BOLD already know them: Barcode Index Numbers, or BINs for short. BOLD has featured BIN pages for a while now, and I know from my own work with a multitude of projects in this system how helpful they can be. Communication with colleagues working on the same set of species has become much easier because we are operating on the same platform, which provides us with all relevant information on a single web page. A BIN page is often the starting point of extensive communication that results in corrections or revisions, mostly prompted by taxonomic discordance discovered through BINs. Aside from introducing a new algorithm for assigning individuals to operational groups, this system is simply the best collaborative tool in biodiversity science and taxonomy available at this point.

It took a while for a paper to appear that summarizes the underlying algorithm and database structure as well as their use on BOLD. However, after a first read of the publication I would say it was worth the wait. In reaction to this publication, Rod Page wrote on his blog: "Might be time to revisit the dark taxa idea"

Want to know more? I will let the authors speak for themselves. Here is the abstract:

Because many animal species are undescribed, and because the identification of known species is often difficult, interim taxonomic nomenclature has often been used in biodiversity analysis. By assigning individuals to presumptive species, called operational taxonomic units (OTUs), these systems speed investigations into the patterning of biodiversity and enable studies that would otherwise be impossible. Although OTUs have conventionally been separated through their morphological divergence, DNA-based delineations are not only feasible, but have important advantages. OTU designation can be automated, data can be readily archived, and results can be easily compared among investigations. This study exploits these attributes to develop a persistent, species-level taxonomic registry for the animal kingdom based on the analysis of patterns of nucleotide variation in the barcode region of the cytochrome c oxidase I (COI) gene. It begins by examining the correspondence between groups of specimens identified to a species through prior taxonomic work and those inferred from the analysis of COI sequence variation using one new (RESL) and four established (ABGD, CROP, GMYC, jMOTU) algorithms. It subsequently describes the implementation, and structural attributes of the Barcode Index Number (BIN) system. Aside from a pragmatic role in biodiversity assessments, BINs will aid revisionary taxonomy by flagging possible cases of synonymy, and by collating geographical information, descriptive metadata, and images for specimens that are likely to belong to the same species, even if it is undescribed. More than 274,000 BIN web pages are now available, creating a biodiversity resource that is positioned for rapid growth.

1 comment:

  1. I think it's an interesting combination of older (single-linkage clustering, silhouettes) and newer (profile hidden Markov models, Markov clustering) methods in a meaningful pipeline. Very, very promising as a diagnostic tool. However, mismatches as high as 14.6% relative to the "real" taxonomy in their test datasets invite some caution...