Wednesday, April 9, 2014

Barcodes to validate Mitogenomes

Today I found an article published in Mitochondrial DNA that tackles a problem I have encountered myself on a couple of occasions. Unfortunately, the article is hidden behind a paywall, even for me at a university with rather good library access. This is particularly frustrating given how important the message and recommendations of the publication are.

The researchers address the misidentification of biological samples used to generate complete mitogenomes, which results in mitogenomes being attributed to the wrong species. This has even more profound implications if the misidentified sequence ends up as the reference mitochondrial genome for the species in a curated public database such as the RefSeq section of GenBank. Large genomic databases are often used to annotate unknown genes, so errors propagate quickly and a wrong species ID can spread across the entire database. That is not the fault of the people who design and operate the databases but rather of data submitters who often fail to do their due diligence. Other problems that can reduce the quality of mitogenomic data are NUMTs (nuclear copies of mitochondrial DNA) and contamination, both of which are also known issues in DNA Barcoding research.

The example the authors use is a recently published complete mitochondrial genome of a bat called Leschenault’s rousette (Rousettus leschenaultii), allegedly the second mitogenome for this genus of pteropodid bats after that of the Egyptian fruit bat (Rousettus aegyptiacus). By re-analyzing the mitogenome alongside available mitochondrial sequences, the authors were able to show that it does not belong to Rousettus leschenaultii and that it is most probably a second mitochondrial genome for Rousettus aegyptiacus.

I can relate to this because I have had the same experience. When I started working on an analysis of mitochondrial fish genes, the first step was to download all available mitogenomes, or rather all of their coding genes. As my analysis also included a comparison of divergence values for different lengths of COI, I started with identification runs on BOLD for each sequence, and sure enough I found two mitogenomes that were not identified correctly. The identifications were actually way off and could not be explained by any variation in the gene region. I knew the sequences on BOLD had been properly identified by experts, and when I looked at the original publication that used the mitogenomic data I couldn't find any information on where the fish were collected, let alone any voucher information. In the end I had to leave those sequences out of any further analysis.
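The check I ran on BOLD boils down to asking which expert-identified reference barcode a COI sequence is closest to. Below is a minimal Python sketch of that idea. It is not BOLD's actual identification engine: it assumes the COI fragments are already aligned to the same length and uses uncorrected p-distances. The function names, species labels and toy sequences are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical reference library: id -> (expert-identified species, aligned COI fragment)
References = Dict[str, Tuple[str, str]]


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Positions with a gap or ambiguity code in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # ignore gaps and ambiguous bases
        compared += 1
        diffs += x != y  # True counts as 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def nearest_reference(query: str, refs: References) -> Tuple[str, str, float]:
    """Return (reference id, species, distance) of the closest reference barcode."""
    if not refs:
        raise ValueError("empty reference library")
    return min(
        ((ref_id, species, p_distance(query, seq)) for ref_id, (species, seq) in refs.items()),
        key=lambda hit: hit[2],
    )


def label_is_suspicious(query: str, claimed_species: str, refs: References) -> bool:
    """Flag the sequence if its nearest reference belongs to a different species."""
    _, species, _ = nearest_reference(query, refs)
    return species != claimed_species


if __name__ == "__main__":
    refs = {
        "REF1": ("Species A", "ACGTACGTACGTACGTACGT"),
        "REF2": ("Species B", "ACGTTCGAACGTTCGAACGT"),
    }
    mito_coi = "ACGTACGTACGTACGTACGA"  # COI extracted from a downloaded mitogenome
    print(nearest_reference(mito_coi, refs))  # ('REF1', 'Species A', 0.05)
    print(label_is_suspicious(mito_coi, "Species B", refs))  # True
```

A mismatch between the claimed species and the nearest reference, as in the toy example, is the kind of red flag that made me drop the two fish mitogenomes. A large distance to every reference is a weaker signal, since the species may simply be missing from the library.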

The problem is not new and is supposedly well known in the scientific community, although it might have been underestimated. What makes this paper unique, though, is its set of recommendations for quality control, and the short list that follows should be hanging over the desk of every genomics researcher:

(1) Provide detailed information on the origin of the sample used for mitogenomic sequencing. 
Ideally the sample should be attached to a specimen voucher deposited in a recognized museum and accessible through multi-institution, multi-collection databases.

(2) Conduct a phylogenetic analysis of the new mitogenome in the context of closely related species.
We therefore suggest using the 20 phylogenetically closest taxa that should allow for a clear depiction of both the evolutionary affinities of the new mitogenome and the degree of divergence as compared to its closest relatives.

(3) Provide a barcoding identification assessment of the sample thanks to an ML tree based on the closest available sequences.
...the strength of these databases [BOLD and GenBank] relies on the detection of misidentified sequences provided that sequences are available for the same marker for different individuals and populations of a given taxon.
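Recommendation (2) starts with picking the closest relatives of the new mitogenome. Here is a rough Python sketch of that first step, assuming an aligned set of reference sequences with species labels is already at hand. It keeps the closest sequence per species and returns the k nearest species (k = 20 following the paper's suggestion). Distance to the query is only a proxy for phylogenetic closeness, and the function names and toy data are hypothetical; the resulting taxon set would still have to go into a proper maximum-likelihood analysis.

```python
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences, skipping gaps/ambiguities."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def closest_taxa(query: str, refs: List[Tuple[str, str]], k: int = 20) -> List[Tuple[str, float]]:
    """Return up to k (species, distance) pairs, nearest first.

    refs holds (species, aligned sequence) pairs; a species may appear several
    times, in which case only its closest sequence counts.
    """
    best: Dict[str, float] = {}
    for species, seq in refs:
        d = p_distance(query, seq)
        if species not in best or d < best[species]:
            best[species] = d
    # Sort by distance, breaking ties alphabetically so the output is deterministic
    return sorted(best.items(), key=lambda item: (item[1], item[0]))[:k]


if __name__ == "__main__":
    refs = [("Sp A", "ACGT"), ("Sp A", "ACGA"), ("Sp B", "TCGT"), ("Sp C", "TTTT")]
    print(closest_taxa("ACGT", refs, k=2))  # [('Sp A', 0.0), ('Sp B', 0.25)]
```

Keeping one representative per species stops a densely sampled species from crowding out the other relatives in the top 20.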
