Tuesday, February 25, 2014

Genome codes aren't revolutionary

A new naming structure proposed by an American researcher moves beyond the Linnaeus system to one based on the genetic sequence of each individual organism. This creates a more robust, precise, and informative name for any organism, be it a bacterium, fungus, plant, or animal. Coded names could be permanent, as opposed to the shifting of names typical in the current biological classification system. Codes could also be assigned without the current lengthy process that is required by analyzing one organism's physical traits compared to another's. Lastly, the sequence could be assigned to viruses, bacteria, fungi, plants, and animals and would provide a standardized naming system for all life on Earth.

This news bit comes with the headline "Revolutionary naming system for all life on earth proposed". I have to admit that such claims make me skeptical, and I immediately had a closer look at the paper in PLoS ONE. Indeed, the more I read through the publication, the more I thought the claims were not that new or revolutionary, and frankly I have no idea why the authors submitted a provisional patent for this method (U.S. Patent Application number 61/774,030).

The proposed naming process begins by sampling and sequencing the genome of an organism. Note that we are talking about bacterial genomes here; in the case of eukaryotes we would be looking at their mitochondrial DNA alone. The sequence is then used to generate a code unique to that individual organism based on its similarity to all previously sequenced organisms. The approach is theoretically based on DNA-DNA hybridization (DDH), which was first implemented in the mid-sixties as a genome-to-genome comparison value in an attempt to find an objective numerical measure of what a species could be, and which has since been regarded as the gold standard in prokaryotic classification. However, in the age of genomics there was a need to replace DDH with a database-driven method that provides equivalent information. The Average Nucleotide Identity (ANI) among the conserved genes of a pair of genomes was considered a first reliable attempt. The most modern approach to calculating ANI no longer uses defined sections of the genome; instead, it cuts the query genome into 1020-nucleotide fragments and BLASTs each of them against the subject genome to calculate the average nucleotide identity. The authors used ANI and the percentage of aligned fragments to assign a so-called genome code, which they propose could serve as an alternative to Linnaean names.
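
To make the fragment-based calculation concrete, here is a minimal Python sketch. It is not the authors' pipeline: real ANI tools run BLAST, whereas this toy version slides each fragment along the subject and counts ungapped matches. The function names and the `min_identity` cutoff are my own assumptions for illustration; only the 1020-nucleotide fragment size comes from the description above.

```python
def fragments(seq, size=1020):
    """Cut seq into consecutive, non-overlapping pieces of `size`; a short tail is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def best_identity(fragment, subject):
    """Best ungapped identity of `fragment` at any offset in `subject`.

    Toy stand-in for a BLAST hit: no gaps, no reverse strand, no scoring matrix.
    """
    n = len(fragment)
    best = 0.0
    for start in range(len(subject) - n + 1):
        window = subject[start:start + n]
        matches = sum(a == b for a, b in zip(fragment, window))
        best = max(best, matches / n)
    return best


def ani(query, subject, size=1020, min_identity=0.7):
    """Return (average identity of aligned fragments, fraction of fragments aligned).

    `min_identity` is an assumed cutoff deciding whether a fragment counts as aligned.
    """
    frags = fragments(query, size)
    scores = [best_identity(f, subject) for f in frags]
    aligned = [s for s in scores if s >= min_identity]
    if not aligned:
        return 0.0, 0.0
    return sum(aligned) / len(aligned), len(aligned) / len(frags)
```

The two returned numbers correspond to the two quantities mentioned above: the ANI value itself and the percentage of fragments that aligned at all. The brute-force search is far too slow for real genomes and is only meant to show the logic.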

The authors do not seem to have given much thought to the size of the task. Millions of species would require a lot of computing power and more sophisticated algorithms. The only reference to this I could find was this one: "ANI will not need to be calculated against all genomes that already have a code. Instead, the group of genomes that is most similar to the new genome could be identified using only a few genes, and then ANI is calculated only against the most similar genomes to precisely identify the most similar genome and the corresponding ANI value." Well, the use of a few genes suggested here as a pre-assignment system sounds very much like DNA Barcoding to me, and since we have been showing that DNA Barcodes alone can discriminate species, why would we want to sequence full genomes instead? Admittedly, the method requires sequencing only a mitochondrial genome. But for the sole purpose of building a library of all life using the new name/code, this seems a bit too much. Although sequencing has become less of a technical issue in the last few years, it will always be more costly than using shorter, standard DNA fragments, aka DNA Barcodes, to do an even better job.
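
The two-step idea quoted above (shortlist with a few genes, then run the expensive comparison only against the shortlist) can be sketched as follows. This is purely illustrative and rests on my own assumptions: the k-mer Jaccard pre-screen, the position-wise `identity` function standing in for ANI, and the dictionary layout with `marker` and `genome` keys are all hypothetical and not taken from the paper.

```python
def kmer_set(seq, k=4):
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def marker_similarity(a, b, k=4):
    """Cheap pre-screen: Jaccard similarity of the k-mer sets of two marker sequences."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def identity(a, b):
    """Toy stand-in for the expensive whole-genome comparison (e.g. ANI)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def closest_genome(new, database, shortlist=2):
    """Rank all entries by marker similarity, then compare genomes only for the top few.

    `new` and every database value are dicts with 'marker' and 'genome' strings.
    Returns (best name, its genome identity, names that were fully compared).
    """
    if not database:
        raise ValueError("database is empty")
    ranked = sorted(
        database,
        key=lambda name: marker_similarity(new["marker"], database[name]["marker"]),
        reverse=True,
    )
    candidates = ranked[:shortlist]
    scores = {name: identity(new["genome"], database[name]["genome"]) for name in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best], candidates
```

Note that the pre-screen is itself a short-marker comparison, which is exactly why it reminds me of DNA Barcoding: the full-genome step only refines a shortlist that a few genes have already produced.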

Because animals are much more closely related to each other than bacteria, mitochondrial genomes of all members of the chordata can be aligned with each other using BLAST and thus all chordata mitochondria share the same code at position A. Chordates represent only a very small fraction of the animals, and I am quite sure that alignments across all animal groups won't be as easy as proposed, simply because mtDNA has been rearranged repeatedly. I also have to express my dislike for the equating of animals with chordates here. All chordates are indeed animals, but not all animals are chordates.

Let's face it: there is already a very pragmatic and functional system in place, and it is based on one or a few genes (DNA Barcodes). We call the resulting code a Barcode Identification Number, or simply BIN, and it overcomes all the challenges listed by the authors of this publication. It is disappointing that the authors of the present study did not bother to cite any DNA Barcoding study at all, thereby ignoring more than 10 years of work on the same problem. The real breakthrough has already been achieved, and I also note a difference in style. Ratnasingham and Hebert published their BIN paper last summer with much less ballyhoo and after quite a few years of careful evaluation.

All this paper represents is an advancement of the ANI method, proposing a more formal system to register species (and I would like to add 'prokaryotic' species). It is a great achievement that builds consistently on previous work, but there is no need to oversell it, especially not when an entire line of parallel research is ignored at the same time.
