Genome sequencing is a powerful tool that helps us understand the complexity of organisms and their evolutionary history. However, decades after the so-called genomics revolution, half of the known eukaryotic lineages remain unstudied at the genomic level. There is a significant bias against 'less popular' but potentially genetically rich single-celled organisms. This lack of microbial representation leaves a world of genetic potential untapped, according to an exhaustive survey of ongoing genomics projects conducted by an international team of researchers.
It is not surprising that the first and main bias in the study of eukaryotes arises from our anthropocentric view of life. More than 96% of the described eukaryotic species are either Metazoa (animals), Fungi, or Embryophyta (land plants) – which we call the ‘big three’ of multicellular organisms (even though the Fungi also include unicellular members such as the yeasts). However, these lineages represent only 62% of the 18S rDNA sequences in GenBank, which is of course a biased sample, and only 23% of all operational taxonomic units (OTUs) in environmental surveys.
This problem is by no means new, and DNA Barcoding shows a similarly skewed picture. One reason is that research has historically focused on multicellular organisms from the three large kingdoms. There is no doubt that they are important, but according to the authors, another reason is simply that they are more conspicuous and familiar to us. To date, some 85% of the completed or projected genome projects belong to this group of three. For DNA Barcodes the picture gets worse: only 0.2% of all DNA Barcodes on BOLD come from organisms that are neither animal, nor plant, nor fungus. That reminds me of a quote I heard in Kunming last year during the DNA Barcoding Conference. Protist expert Jan Pawlowski summarized it as follows:
"Ultra-deep sequencing leads to ultra-deep frustration in protists with sometimes >80% unassigned OTUs"
The new study also emphasizes that there are biases within the dominant groups. For example, many invertebrate groups are not represented at all in the list of genomes that have been sequenced or are yet to be sequenced. The DNA Barcode world looks better in that respect: 92% of the animals barcoded are invertebrates, which is about 76% of all barcoded species.
The authors argue that this needs to change and they propose a phylogeny-driven initiative to cover the full eukaryotic diversity because:
"This makes for a pitiful future if we aim to understand and appreciate the complete eukaryotic tree of life. If we do not change this trend we risk neglecting the majority of eukaryotic diversity in future genomic or metagenomic-based ecological and evolutionary studies. This would provide us with a far from realistic picture."