Wednesday, April 30, 2014

The others

Genome sequencing is a powerful tool that helps us understand the complexity of organisms and their evolutionary history. However, decades after the so-called genomics revolution, half of the known eukaryotic lineages remain unstudied at the genomic level. There is a significant bias against 'less popular', but potentially genetically rich, single-celled organisms. According to an exhaustive survey of ongoing genomics projects conducted by an international team of researchers, this lack of microbial representation leaves a world of untapped genetic potential undiscovered.

It is not surprising that the first and main bias in the study of eukaryotes arises from our anthropocentric view of life. More than 96% of the described eukaryotic species are Metazoa (animals), Fungi, or Embryophyta (land plants), which we call the ‘big three’ of multicellular organisms (even though the Fungi also include unicellular members such as the yeasts). However, these lineages represent only 62% of the 18S rDNA GenBank sequences (which are, of course, themselves a biased sample) and only 23% of all operational taxonomic units (OTUs) in environmental surveys.

This problem is by no means new, and DNA Barcoding shows a similarly skewed picture. One reason is that research has historically focused on multicellular organisms from the three large kingdoms. There is no doubt that they are important, but according to the authors another reason is simply that they are more conspicuous and familiar to us. To date, some 85% of the completed or planned genome projects belong to this group of three. For DNA Barcodes the picture is even worse: only 0.2% of all DNA Barcodes on BOLD come from organisms that are neither animals, plants, nor fungi. That reminds me of a quote I heard in Kunming last year during the DNA Barcoding Conference, where protist expert Jan Pawlowski summarized it as follows:

"Ultra-deep sequencing leads to ultra-deep frustration in protists with sometimes >80% unassigned OTUs"

The new study also emphasizes that there are biases within the dominant groups. For example, many invertebrate groups are not represented at all in the list of sequenced or soon-to-be-sequenced genomes. The DNA Barcode world looks better in this respect: 92% of the barcoded animals are invertebrates, which means invertebrates make up about 76% of all barcoded species.

The authors argue that this needs to change, and they propose a phylogeny-driven initiative to cover the full eukaryotic diversity because:

This makes for a pitiful future if we aim to understand and appreciate the complete eukaryotic tree of life. If we do not change this trend we risk neglecting the majority of eukaryotic diversity in future genomic or metagenomic-based ecological and evolutionary studies. This would provide us with a far from realistic picture.

Tuesday, April 29, 2014

A Tale That Morphology Fails to Tell

Nudibranchs occur in oceans worldwide, from the tropics to Antarctica. They live from the intertidal zone to depths of well over 700 m; a new nudibranch species was recently discovered at a depth of about 2,500 m. However, the greatest diversity of nudibranchs is found on warm, shallow reefs.

Nudibranchs are soft-bodied marine snails that shed their shell after the larval stage. They are noted for their often extraordinary colors and striking forms. There are more than 3,000 described species, and one of the largest groups within the order is the Aeolidida. I believe this group has the status of an infraorder; it contains about 560 species.

A new phylogenetic study provides insights into the relationships of some members of this group. The researchers obtained 90 species from seven families of the Aeolidida. They were specifically interested in the phylogenetic relationships within one of these families, the Aeolidiidae, and used two mitochondrial genes (COI and 16S rRNA) and one nuclear gene (H3). This is always good news for us, as the inclusion of COI means that more DNA Barcodes become available to the community, and these come with added value because they are part of a study of the systematics of this group. The authors not only resolved some previously unknown relationships and classifications, they also discovered new members:

Our results also suggest the existence of four sibling-species complexes within Aeolidiidae, which may increase to a total of 115 species, including 18 undescribed species and the resurrection of six species previously in synonymy. 

Given that this is just a subset of the Aeolidida, I am convinced that similar surprises are waiting for the malacologists of the world.

Monday, April 28, 2014

Climb the CN Tower to support biodiversity

I guess it is like preaching to the choir when I attempt to explain to you what the World Wildlife Fund (WWF) is. But just in case: it is the world's largest non-governmental conservation organization, with more than 5 million supporters worldwide, working in more than 100 countries and supporting some 1,300 conservation and environmental projects. WWF is a foundation, which means that most of its funding comes from donations. In fact, more than half of its money comes from individual donations; the remainder comes from governmental and corporate sources.

WWF focuses on the conservation of the three biomes that contain most of the world's biodiversity: oceans, forests, and freshwater ecosystems. Traditionally, it has been concerned with endangered species, pollution and climate change, but as the organisation grew, its mission broadened:

Over time, our work has evolved from protecting particular wildlife species and habitats to protecting life on Earth – including our own. Today, our work is about life, because everything we do is about securing the future of healthy, thriving ecosystems. And living, because the choices we make will decide that future—for us and for all species.

Overall, WWF has done so much good work that supporting it seems natural. There are always fundraising campaigns worldwide, and many of them are very creative. A very big event here in the region is the annual Canada Life CN Tower Climb.

The CN Tower is Toronto's landmark. It was completed in 1976, and at 553 m it became the world's tallest free-standing structure and the world's tallest tower at the time. It held both records for 34 years, until the completion of the Burj Khalifa and the Canton Tower. The tower has a restaurant and an observation deck with a glass floor, not a place for people with acrophobia (fear of heights). One can do a lot of crazy things on the tower, the newest addition being the Edge walk: a full-circle, hands-free walk on a 1.5 m wide ledge encircling the top of the tower’s main pod, 356 m above the ground.

The annual stair climb to raise money for WWF is not as challenging as the Edge walk, but it certainly requires some courage and a fair bit of endurance. This popular event draws thousands of people who climb the 1,776 stairs leading to the observation platform. Normally, visitors take one of the elevators, which get you there in less than a minute. Climbing all those stairs (145 flights) takes about 15-25 min; the really fast people make it in 5-10 min. I am fairly certain that I would need at least half an hour, if I made it at all. That alone earns my highest respect for everyone who will climb up there this week. The daughter of friends of our family will be among the climbers, and I am shamelessly promoting her climb and her fundraising efforts here: Go for it, Rowena!

Friday, April 25, 2014

Five opah

Despite its commercial value and common occurrence in pelagic fisheries, surprisingly little is known about the basic biology and ecology of the opah (Lampris guttatus), also known as moonfish. Opah is becoming increasingly popular in seafood markets, a trend that started in the late 1980s when it began to be served as sushi and sashimi. I find this particularly interesting given that only about 35% of an opah's weight is consumable meat, while the remaining 65% is bone and skin.

A group of researchers from the US took a closer look at the diversity of the opah:

In the course of collecting additional life history data on this species, observations were made by the authors and other researchers that suggested the presence of two distinct morphotypes in the North Pacific. We subsequently initiated a study to investigate these differences and began collecting tissue samples for genetic analysis and compiling standard morphometric data. 

The results are quite interesting:

Sequencing of the mitochondrial cytochrome c oxidase I gene (655bp) for these morphotypes and other specimens collected worldwide (n=480) produced five strongly diverged and well-supported clades. Additional sequence data from the cytochrome b gene (1141bp) as well as the nuclear recombination activating gene 1 (1323bp) corroborated these results, suggesting these five clades likely represent separate species. Our conclusion that opah is a complex of five separate species has implications for management and indicates a need to gather additional data on these poorly understood fishes.
Fig 1: Subset of a lampriform NJ tree based on the COI sequences available about 4 years ago.

I recalled that a few years ago I helped generate a tree for a monograph about a relative of the opah, the king of herrings (Regalecus glesne). To provide some more context, I had included some publicly available sequences for both species of the genus Lampris. Even with the limited dataset available at that time, I found two very distinct clades within Lampris guttatus (see figure 1), but we didn't pursue this further. Reading the new publication, I immediately became curious how the old and new data would fit together.

Fig 2: NJ tree of Lampris guttatus COI sequences; labels show BIN number and lineage assignment according to Hyde et al.

The authors provided GenBank accession numbers for each of the haplotypes they found for each marker. I fetched the COI sequences from GenBank and combined them with what I could find on BOLD (and GenBank) for the same species. After a bit of editing, I had an alignment for a quick NJ tree (see figure 2). I wasn't surprised to retrieve all five lineages presented in the paper. What I found surprising was that, despite a much smaller sample size (<20), four of them were already present on BOLD and had BINs assigned to them. Because BOLD assigns BINs automatically on a regular basis, most of this cryptic diversity was already hiding in the database. Once more, BINs seem to be a good proxy for species. I am not saying this will always be the case, as I also know examples where BINs and species are not congruent, but there is a good chance that many more such hidden stories are waiting to be discovered by digging deeper into BOLD's barcode treasure chest. Some 3 million DNA Barcodes certainly justify a few 'treasure hunts'.
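For anyone curious what building "a quick NJ tree" from an alignment involves, here is a minimal Python sketch of the standard neighbor-joining algorithm. It is not necessarily how the tree in figure 2 was built, and it rests on simplifying assumptions: the sequences are already aligned to equal length, gaps and Ns are skipped, and distances are uncorrected p-distances rather than a model-corrected distance such as Kimura 2-parameter. The function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: share of compared sites that differ.

    Assumes aligned sequences; sites with a gap ('-') or an 'N' in either
    sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in "-N" and y not in "-N"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise p-distances keyed by (name_a, name_b)."""
    d = {}
    for a, b in combinations(seqs, 2):
        d[a, b] = d[b, a] = p_distance(seqs[a], seqs[b])
    return d


def neighbor_joining(names: list[str], d: dict[tuple[str, str], float]) -> str:
    """Build an unrooted neighbor-joining tree and return it as Newick text."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three sequences")
    d = dict(d)                     # copy: distances to new internal nodes are added
    nodes = list(names)
    newick = {n: n for n in nodes}  # Newick text of the subtree below each node
    step = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[i, k] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent node u
        bi = d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        bj = d[i, j] - bi
        step += 1
        u = f"#{step}"              # internal label; assumes no sequence name starts with '#'
        newick[u] = f"({newick[i]}:{bi:g},{newick[j]}:{bj:g})"
        for k in nodes:
            if k not in (i, j):
                d[u, k] = d[k, u] = (d[i, k] + d[j, k] - d[i, j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last three nodes to one central node (unrooted tree)
    a, b, c = nodes
    la = (d[a, b] + d[a, c] - d[b, c]) / 2
    lb, lc = d[a, b] - la, d[a, c] - la
    return f"({newick[a]}:{la:g},{newick[b]}:{lb:g},{newick[c]}:{lc:g});"
```

With aligned sequences in a dict such as seqs = {"specimen1": "ACTG...", ...}, the tree is obtained with neighbor_joining(list(seqs), distance_matrix(seqs)). On real data NJ can return slightly negative branch lengths, which many programs reset to zero; this sketch leaves them as computed.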

Wednesday, April 23, 2014

Slipper animalcule

Image taken from micro*scope

Paramecia were among the first ciliates to be seen through early microscopes in the late 17th century. The first description appears in a letter written by Christiaan Huygens in 1678. Huygens was a prominent Dutch mathematician and scientist. Interestingly, he is not known for this particular discovery but rather for his telescopic studies of the rings of Saturn and his discovery of its moon Titan, which the European Space Agency (ESA) honored by naming an atmospheric entry probe after him. The probe landed successfully on Titan in 2005. Huygens was clearly one of those all-round talents typical of scientists of that time; he also invented the pendulum clock, for example.

But back to the paramecia: in 1718, the French mathematics teacher Louis Joblot published a description and illustration of what he called the "slipper animalcule". In some countries this phrase remains the common name for members of the genus Paramecium; the widely used German term "Pantoffeltierchen" literally means slipper animal. The name refers to the typically ovoid, elongate, foot- or cigar-shaped cell, usually 50 to 300 micrometres long.

Paramecia are widespread in freshwater, brackish and marine environments, and are often very abundant in stagnant basins and ponds. Because some species are readily cultivated and easily induced to conjugate and divide, Paramecium has been widely used in classrooms and laboratories to study biological processes. I am fairly confident that many of my readers encountered Paramecium at school or university. Consequently, some Paramecium species, such as Paramecium aurelia or Paramecium caudatum, are among the best-known protozoans. Other species are almost unexplored, and in most cases we know little about their DNA sequence variation, which seems fundamental for determining the boundaries between species. Some studies have revealed reproductively isolated groups within Paramecium as well as in other ciliates. These were initially called syngens, which stands for “generating together”. In some cases syngens were recognized as cryptic sibling species, but in other cases the isolation between syngens is imperfect, so not every syngen is equivalent to a true cryptic species.

One of these lesser-known species is Paramecium putrinum. It is one of the smallest members of the genus, a cosmopolitan freshwater species that prefers cold and temperate regions. A neat new study takes a closer look at this species and the potential existence of cryptic variation within it:

Herein we present an assessment of molecular variation in 27 strains collected from widely separated populations by using two selected DNA fragments (ITS1-5.8S-ITS2-5′LSU rDNA and COI mtDNA). Both the trees and haplotype networks reconstructed for both genome fragments show that the studied strains of P. putrinum form five main haplogroups. The mean distance between the studied strains is p-distance = 0.007/0.068 (rDNA/COI) and exhibits similar variability as that between P. bursaria syngens. Based on these data, one could hypothesize that the clusters revealed in the present study may correspond to previously reported syngens and that there are at least five cryptic species within P. putrinum.
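The distances quoted above (0.007 for rDNA, 0.068 for COI) are mean pairwise p-distances: the average proportion of differing nucleotide sites over all pairs of strains. Here is a minimal Python sketch of that calculation, assuming aligned sequences and simply skipping gaps and Ns (the study may have handled them differently). The helper mean_between is hypothetical and compares two groups, for example two haplogroups, in the way one might compare them with distances between syngens.

```python
from itertools import combinations, product
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Share of differing sites between two aligned sequences, skipping gaps and Ns."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in "-N" and y not in "-N"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_pairwise(strains: list[str]) -> float:
    """Mean p-distance over all pairs of strains in one set."""
    return mean(p_distance(a, b) for a, b in combinations(strains, 2))


def mean_between(group_a: list[str], group_b: list[str]) -> float:
    """Mean p-distance between every member of group_a and every member of group_b."""
    return mean(p_distance(a, b) for a, b in product(group_a, group_b))
```

Running mean_pairwise once on the rDNA alignment and once on the COI alignment would give one value per marker, which is how the two numbers in the quote are reported.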

Tuesday, April 22, 2014

Are vouchers always necessary?

We live in times with a heightened sense of urgency to confirm the return of animals thought to be extinct, or the presence of newly discovered species. Global climate change and rapidly disappearing habitats are endangering species, and we are becoming increasingly concerned about the consequences of their disappearance. The standard approach in biology is to go out and collect specimens, either to confirm that a species still exists in the wild or to describe a new one. However, this fieldwork may sometimes pose a risk to vulnerable animal populations already on the brink of extinction:

Cases such as the extinction of the great auk remind us what is at stake in taking animals from small and declining populations. The last wild great auk (Pinguinus impennis) was sighted in 1844 on Eldey Island, Iceland. Centuries of exploitation for food and feathers, and, to some degree, a changing climate, had stressed the species, but overzealous museum collectors also played a role in its extinction. As the bird's numbers dwindled in the 19th century, ornithologists and curators increasingly prized great auk skins and eggs, with museums and universities sending out collection parties to procure specimens. On Eldey, fishermen killed the final breeding pair of the flightless birds and sold them to a local chemist, who stuffed the specimens and preserved them in spirits. Their internal organs now reside at the Zoological Museum in Copenhagen.

The great auk's disappearance predates the rise of a robust societal ethic of conservation and the emergence of a scientific concern for global biodiversity decline in the late 20th century. Yet, there is still a strong and widespread impulse to procure specimens of rare or rediscovered species for scientific purposes.

Researchers at Arizona State University and Plymouth University in the United Kingdom want to change the way biologists think about the standard practice of collecting a voucher specimen for species description and, often, identification. They argue that modern technologies can be just as effective in identifying an organism while avoiding an increased extinction risk for small and isolated populations. The researchers suggest using a combination of modern, non-lethal techniques to confirm a species' existence, including high-resolution photography and audio recordings of sounds or mating calls. DNA samples taken with mouth or skin swabs also offer a way to identify an animal without removing it from the wild. This last suggestion in particular has DNA Barcoding written all over it, and I am convinced that the barcoding community will wholeheartedly support this request:

For this system to work, the DNA of relict populations and newly discovered species must be sequenced and the data made publicly available. This would, for example, make future population rediscoveries easier to document.

The discussion about replacing lethal collection with less-invasive identification techniques is part of a more complex issue. Balancing ecological impact against the value of improved scientific understanding of threatened species for conservation is a touchy subject. However, I agree with the authors that changing our standard practices for scientific description would have more advantages than we might think:

The multivariate description of a species that results from combining high-resolution photographs, sonograms (as appropriate), molecular samples, and other characteristics that do not require taking a specimen from the wild can be just as accurate as the collection of a voucher specimen without increasing the extinction risk. Clearly there remains a long-running debate over the appropriate standards for scientific description absent a voucher specimen. The benefits and costs of verification-driven specimen collection, however, should be more openly and systematically addressed by scientific societies, volunteer naturalist groups, and museums. Sharing of specimen information, including obligations to store genetic information from voucher specimens in widely accessible digital repositories, can also help to reduce the future need to collect animals from the wild.

h/t Claudia Kleint-Steinke

Thursday, April 17, 2014

Stream Bioassessment with DNA Barcodes

Bioassessments measure both the physical condition of a water body and the integrity of the associated biological communities. Adding such physical and biological metrics to standard chemical and toxicological assessments provides a more comprehensive evaluation of the condition of a given body of water. Resident organisms can be better indicators of overall environmental health than measurements of individual stressors (such as toxic chemicals or other pollutants) or more general ecosystem attributes.

This form of assessment provides information on the condition of a site based on the taxonomic composition and a priori knowledge about tolerances of some taxa to pollution or other stressors. However, this can be a roadblock as the use of coarse taxonomic resolution can obscure patterns in bioassessment metrics and hinder detection of biological impacts. Thus, fine-scale taxonomic resolution is desirable to maximize the diagnostic capability of assessment tools.

I am sure by now every regular reader knows where this is going. Obtaining such detailed taxonomic data is notoriously challenging because identifications are typically based on morphological characteristics. This raises a number of issues, especially for standardized and repeated assessments: limited taxonomic resources, cryptic species, small size, damaged specimens, and polymorphism, to name just a few.

A group of US researchers has now compared how well several commonly used bioassessment metrics, calculated from morphological data and from DNA Barcoding data, detect differences in stream condition at 6 paired sites in southern California with relatively subtle habitat impacts. Their paper has now been officially published in Freshwater Science, formerly the Journal of the North American Benthological Society.

The results of this study are very interesting because the authors focused on the level of sensitivity a DNA barcoding approach provides rather than on whether it works in general. They found that the increased metric sensitivity associated with barcoding was most pronounced at high-quality (i.e., relatively unimpacted) sites, which often have higher species richness and are inhabited by undescribed, cryptic, or regionally rare species. For example, 43% of the additional taxa identified through barcoding consisted of 1 or 2 individuals and occurred at only 1 stream. The presence or absence of rare species may be diagnostic of specific environmental changes, so the increased information provided by barcoding at taxon-rich sites allows finer-scale resolution of sources of stress and increases our ability to detect subtle changes in environmental quality.
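The 43% figure is a simple count over the barcode-level results: of the taxa found only through barcoding, how many were represented by 1 or 2 individuals at a single stream. Here is a minimal Python sketch of that count and of per-site taxon richness, one of the simplest bioassessment metrics. It assumes each specimen is recorded as a (site, taxon) pair and that taxa also recognized by morphology can be matched by label; the function names, labels and the max_individuals parameter are hypothetical, and this is not the authors' metric pipeline.

```python
from collections import Counter, defaultdict


def richness_by_site(records: list[tuple[str, str]]) -> dict[str, int]:
    """Taxon richness (number of distinct taxa) at each site."""
    taxa = defaultdict(set)
    for site, taxon in records:
        taxa[site].add(taxon)
    return {site: len(found) for site, found in taxa.items()}


def rare_share_of_additional(barcode_records: list[tuple[str, str]],
                             morphology_taxa: set[str],
                             max_individuals: int = 2) -> float:
    """Fraction of barcode-only taxa with <= max_individuals specimens at a single site."""
    individuals = Counter(taxon for _, taxon in barcode_records)
    sites = defaultdict(set)
    for site, taxon in barcode_records:
        sites[taxon].add(site)
    # Taxa detected by barcoding but not by morphology (hypothetical label matching)
    additional = set(individuals) - set(morphology_taxa)
    if not additional:
        return 0.0
    rare = [t for t in additional
            if individuals[t] <= max_individuals and len(sites[t]) == 1]
    return len(rare) / len(additional)
```

Comparing richness_by_site on morphology-based and barcode-based records for the same samples shows directly where the finer resolution adds taxa, which in this study was mostly at the high-quality sites.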

The conclusions are fairly positive:
The DNA barcoding approach can improve existing BMI [benthic macroinvertebrate]-based bioassessment programs by enabling development of new or improved metrics based on taxonomic groups that currently are under-described and underused. Additional benefits include applications for quality control, taxonomic standardization, and improvement of taxonomic keys (Pilgrim et al. 2011, Sweeney et al. 2011). Barcoding probably will be used with increasing frequency to augment or support existing methods and to provide cost-effective improvement of taxonomic capacity.

Potential challenges and solutions to meet them are also discussed:
However, full integration of barcode data in routine bioassessment will be challenging. First, a robust barcode reference library must be developed and vouchered. Standard handling and quality-control procedures must be developed to reduce risk of loss of samples because of contamination or DNA degradation (as happened for 1 of the sites in our study). Improved primers are needed for certain taxonomic groups to minimize bias caused by differential amplification. More research is needed on the effect of short-sequence reads on conclusions about taxonomic resolution.

Overall this paper is a very good read. It is clear, concise, and provides an objective assessment of a new approach to freshwater bioassessments.

Wednesday, April 16, 2014

Hydrological niche segregation

Mountain Fynbos
Fynbos is a natural shrubland or heathland vegetation occurring in a small belt of the Western Cape of South Africa, mainly in areas with a Mediterranean climate. It is known for its exceptional degree of biodiversity and endemism. Because this plant community can regenerate after fire, it provides an opportunity to study the genesis of a variety of ecological phenomena such as hydrological niche segregation.

Species in plant communities normally segregate along fine-scale hydrological gradients: different species settle in different ecological niches depending on the availability of water in the soil. One open question is at which stage of a plant's life history this segregation actually happens. One hypothesis is that it starts at the seedling stage, because this is the most vulnerable stage, most prone to drought, competition, herbivory and disease. A group of researchers from the UK and Switzerland put this hypothesis to the test with a soil translocation experiment in the South African fynbos, carried out after a fire and before seed germination had started:

Following wildfires at two field sites where we had previously mapped the vegetation and monitored the hydrology, seeds were moved experimentally in >2500 intact soil cores up and down soil-moisture gradients to test the hypothesis that hydrological niche segregation is established during the seedling phase of the life cycle. Seedling numbers and growth were then monitored and they were identified using DNA Barcoding, the first use of this technology for an experiment of this kind.

The study focused on endemic species in the family Restionaceae because it is species-rich, ubiquitous and contains many keystone species, and because most of its species have been sequenced for the matK gene, one of the DNA Barcode markers for plants.
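How exactly the seedling sequences were matched to the matK reference sequences isn't described here; similarity searches such as BLAST are a common choice. As an illustration of the underlying idea, here is a minimal nearest-reference sketch in Python. The 2% distance cut-off, the function names and the tie rule (no identification when different species match equally well, which can happen because matK does not always separate closely related species) are assumptions, not the study's protocol.

```python
from typing import Optional


def p_distance(a: str, b: str) -> float:
    """Share of differing sites between two aligned sequences, skipping gaps and Ns."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in "-N" and y not in "-N"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def identify(query: str, references: dict[str, list[str]],
             max_distance: float = 0.02) -> tuple[Optional[str], float]:
    """Assign a query to the species of its closest reference sequence.

    references maps species name -> aligned reference sequences (e.g. matK).
    Returns (species, distance), or (None, distance) if the best match is
    farther than max_distance or if different species tie for the best match.
    """
    scored = [(p_distance(query, seq), species)
              for species, seqs in references.items() for seq in seqs]
    best = min(distance for distance, _ in scored)
    best_species = {species for distance, species in scored if distance == best}
    if best > max_distance or len(best_species) > 1:
        return None, best
    return best_species.pop(), best
```

Returning None rather than forcing a name keeps ambiguous seedlings out of the analysis instead of silently assigning them to the wrong species.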

According to the results of the study, seedling growth on hydrological gradients in the field is affected by soil moisture status and by root competition. This means that hydrological niche segregation could indeed originate in the seedling stage. In particular, below-ground competition seems to be decisive in determining a species' hydrological niche. Fynbos species, as in other fire-prone plant communities, divide between those that regenerate from seed and those that resprout. The resprouters were probably the chief source of below-ground competition for seedlings in our experiment.

Tuesday, April 15, 2014


Regular readers of my blog might have noticed that I have a weak spot for the DIY movement, especially in biology and biotechnology. I try to follow the news in this sector and have given it a regular spot in the DNA Barcoding Bulletin that we produce quarterly. That being said, I was surprised that I had not come across the BioCoder newsletter published by O’Reilly until now.

O'Reilly is an American media company that mostly publishes books on computer technology. Its distinctive brand features a woodcut of an animal on many of its book covers. If you don't know what I am talking about, I suggest you search for "O’Reilly covers" in Google Images. The animal illustrations are quite beautiful. I have a few of their books on my shelf, and my favorite is a book on BLAST with a coelacanth on the cover (see image below).

So what is BioCoder? Here is what their website has to say about that:
We’re at the start of a revolution that will transform our lives as radically as the computer revolution of the 70s. The biological revolution will touch every aspect of our lives: food and health, certainly, but also art, recreation, law, business, and much more.

BioCoder is the newsletter of that revolution. It’s about biology as it moves from research labs into startup incubators, hacker spaces, and even homes. It’s about a very old programming language that we’re just beginning to understand, and that’s written in a code made up of organic chemicals. It’s the product of a sharing community of scientists that stretches from grade school to post docs and university faculty.

The new spring issue contains an article about DNA Barcoding of fungi from a DIY lab in Victoria, here in Canada. The choice of organisms clearly tells me that DIY people are up for challenges and not necessarily aiming for the low-hanging fruit. The article is also interesting from a technical standpoint, and part II will follow in the next issue.

A great new resource. Well, not that new: this was their third issue. So it is new to me, but I am sure it is not news to the DIY biohacker community.

Monday, April 14, 2014

When pharmaceuticals become too effective

Sepsid fly
The veterinary pharmaceutical ivermectin has been used for more than thirty years all over the world to combat parasites such as roundworms, lice and mites in livestock and pets. The active ingredient belongs to the chemical group of avermectins, which generally disrupt cell transport. However, when ivermectin is used at high doses, excess quantities are excreted in the faeces of treated animals, where they also harm beneficial dung-degrading insects such as dung beetles and dung flies. This has a profound impact on the functioning of the surrounding ecosystems. In extreme cases, the dung is not decomposed and the pasture is destroyed.

Since 2000, regulators in many countries have therefore mandated standardized safety tests for the use of avermectin derivatives. A research team of scientists from the University of Zurich and an ecotoxicology company in Germany has now shown that the currently used safety tests cannot sufficiently prevent environmental damage: even closely related dung organisms react to the same veterinary pharmaceutical with varying degrees of sensitivity.

The group examined 23 species of sepsid flies that typically live in cow dung. It turns out that individual species vary by a factor of 500 in their sensitivity to ivermectin. Standardized safety tests typically performed in toxicology laboratories today are based on single, arbitrarily selected dung organisms. This poses the considerable risk that more sensitive species will continue to be harmed by ivermectin and that important ecosystem functions will suffer long-term damage as a consequence. In order to prevent this, safety tests should be extended to include a representative selection of all dung-degrading organisms, if not the entire community:

We close by reiterating that sepsid flies are very well suited as test organisms for any toxic residues in the dung of livestock or other large vertebrates, due to their ease and speed of rearing and handling. While the choice of a particular species will be crucial because species vary strongly in sensitivity, use of several local species can offset the arbitrariness of choice to some degree, rendering overall representative results. Sepsids as ecotoxicological test organisms could be particularly useful and economical in the tropics, where high-tech laboratory equipment is often not available.

Including more species in the tests would increase the costs of the authorization process, especially because all relevant organisms would need to be properly identified. For that reason, the authors suggest including DNA Barcoding in the test protocol, as it would represent only a rather modest increase in costs.
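The "factor of 500" is the ratio between the toxicity thresholds of the least and the most sensitive of the sepsid species tested. The Python sketch below makes that arithmetic explicit and shows why a single, arbitrarily chosen test species can be misleading. The species names and threshold values are hypothetical, not data from the study, and a lower threshold is taken to mean higher sensitivity.

```python
def fold_range(sensitivity: dict[str, float]) -> float:
    """Ratio between the highest and lowest toxicity threshold across species.

    A lower threshold (e.g. the dung concentration causing a given effect)
    means a more sensitive species.
    """
    values = sensitivity.values()
    return max(values) / min(values)


def more_sensitive_than(sensitivity: dict[str, float], test_species: str) -> list[str]:
    """Species harmed at concentrations that the chosen test species still tolerates."""
    threshold = sensitivity[test_species]
    return sorted(sp for sp, value in sensitivity.items() if value < threshold)


# Hypothetical thresholds, for illustration only
thresholds = {"species A": 0.5, "species B": 10.0, "species C": 250.0}
print(fold_range(thresholds))                        # 500.0
print(more_sensitive_than(thresholds, "species C"))  # ['species A', 'species B']
```

With these made-up values, a safety test run only on species C would approve concentrations that already harm species A and B, which is exactly the risk of basing regulation on one arbitrarily selected organism.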

Friday, April 11, 2014

A different take on Escargot

Gastropod shells and bodies extracted after microwaving
And today for something completely different. Let's start with a description of the problem:
Extracting DNA from gastropods presents particular difficulties due to the capacity of the living animal to retract into the shell, resulting in poor penetration of the ethanol into the tissues. Because the shell is essential to establish the link between sequences and traditional taxonomic identity, cracking the shell to facilitate fixation is not ideal. 

This sounds very familiar to me. While working on my master's project, I had to remove tissue from the coiled shells of a number of terrestrial gastropods, and some of those specimens were quite small and delicate. Most of the time I worked with a dissecting probe whose tip I had bent to reach the fully retracted animal: a very tedious and not always successful way to retrieve a tiny tissue sample for DNA analysis. Over the years, a variety of methods for retrieving tissue without damaging the shell have been developed, but most of them suffer from the same problem: they all take a fair bit of time, which makes them impractical for large-scale surveys or expeditions.

In a new paper, a group of French researchers present an alternative method for easy, efficient and nondestructive tissue removal from shells. It involves a regular microwave oven. Microwaves are actually not new to molecular biology; they have been used to extract DNA from viruses, bacteria, soil micro-organisms, and animal tissue. The researchers placed living gastropods in a microwave oven, where the electromagnetic radiation very quickly heats both the animal and the water trapped inside the shell, causing the muscles that anchor the animal to the shell to detach. If done properly, the body can be removed intact and the shell voucher remains undamaged. The authors conducted comparative tests to find out whether microwaving affects DNA extraction or subsequent PCRs, and could not detect any difference in DNA quantity or quality.

The method was then implemented on a large scale during expeditions, resulting in a higher percentage of successful DNA extractions. Microwaving is also effective for quickly and easily removing other molluscs from their shells, namely bivalves and scaphopods. Workflows implementing the microwave technique show a three- to fivefold increase in productivity compared with other methods.

That seems to be worth the effort. I wish we had thought of that 12 years ago.

Thursday, April 10, 2014

...and another record

Paul Hebert documenting the exciting find
Today's post shows how sampling campaigns launched for barcoding programs can deliver unexpected surprises.

Back in 2006 our institute decided to take part in the International Polar Year, a large scientific program that focused on the Arctic and the Antarctic and officially covered two full annual cycles, from March 2007 to March 2009. Our contribution was to develop a comprehensive biodiversity inventory for a sub-arctic region, in our case Churchill, Manitoba. There were several reasons to choose this spot over others in Canada. Churchill is situated on the Hudson Bay coast where three major biomes meet: marine, northern boreal forest, and tundra, which makes it biologically very interesting. Furthermore, Churchill is home to an accessible and active research centre which provides accommodations, meals, equipment rentals, and logistical support to researchers. A lot of stations in Canada's North have been closed over the past years, and only recently was it decided to build a new one in the High Arctic. Another contributing factor was that our department ran arctic ecology courses at this particular station, which enabled us to engage students in the inventory work. The goal was rather simple: a comprehensive inventory of all life in the Churchill region, explicitly using DNA Barcoding to accomplish it.

I had the chance to participate in two expeditions, and I vividly remember the first one, in 2006. We had about 20 highly motivated students and at least 10 senior researchers. One day we encountered a moth that is a rather rare visitor to the region, and a recent paper now shows that it was the most northerly record ever. The Black Witch Moth (Ascalapha odorata) is a seasonal migrant to more northerly regions of North America, but it had never been found that far north. It is thought to breed in Central America and the southernmost United States.

So after the German altitude record not long ago we have another record for the books.

Wednesday, April 9, 2014

Barcodes to validate Mitogenomes

Today I found an article published in Mitochondrial DNA which tackles a problem I have encountered myself in a couple of situations. Unfortunately, the article is behind a paywall, even for me at a university with rather good library access. This is particularly frustrating given the important message and recommendations the publication provides.

The researchers address the problems arising from misidentified biological samples used to generate entire mitogenomes. As a consequence, mitogenomes are attributed to the wrong species. This can have even more profound implications if the misidentified sequence ends up as the reference mitochondrial genome for the species in public curated databases such as the RefSeq section of GenBank. Large genomic databases are often used to annotate unknown genes, so errors propagate quickly and a wrong species ID can spread across the entire database. This is not the fault of the people who designed and operate the databases but rather of data submitters who often fail to do their due diligence. Other problems that can reduce the quality of mitogenomic data are NUMTs (nuclear copies of mitochondrial sequences) and contamination, both also well-known issues in DNA Barcoding research.

The example the colleagues use concerns a recently published complete mitochondrial genome of a bat, Leschenault’s rousette (Rousettus leschenaultii), which allegedly provided the second mitogenome for this genus of pteropodid bats, in addition to the available one for the Egyptian fruit bat (Rousettus aegyptiacus). By re-analyzing the mitogenome in comparison with available mitochondrial sequences, the authors were able to show that this sequence does not belong to Rousettus leschenaultii and is most probably a second mitochondrial genome for Rousettus aegyptiacus.

I can relate to this from my own experience. When I started working on an analysis of mitochondrial fish genes, the first step was to download all available mitogenomes, or rather all of their coding genes. As my analysis also included a comparison of divergence values for different lengths of COI, I started with identification runs on BOLD for each sequence, and sure enough I found two mitogenomes that were not identified correctly. The identifications were way off and could not be explained by any variation in the gene region. I knew the sequences on BOLD had been properly identified by experts, and when I looked at the original publication that used the mitogenomic data I couldn't find any information on where the fish had been collected, let alone any voucher information. In the end I had to leave these sequences out of any further analysis.

The problem is not new and is supposedly well known in the scientific community, although it may have been underestimated. What makes this paper unique, though, is its set of recommendations for quality control, and the short list that follows should hang over the desk of every genomics researcher:

(1) Provide detailed information on the origin of the sample used for mitogenomic sequencing. 
Ideally the sample should be attached to a specimen voucher deposited in a recognized museum and accessible through multi-institution, multi-collection databases.

(2) Conduct a phylogenetic analysis of the new mitogenome in the context of closely related species.
We therefore suggest using the 20 phylogenetically closest taxa that should allow for a clear depiction of both the evolutionary affinities of the new mitogenome and the degree of divergence as compared to its closest relatives.

(3) Provide a barcoding identification assessment of the sample thanks to a ML tree based on the closest available sequences.
...the strength of these databases [BOLD and Genbank] relies on the detection of misidentified sequences provided that sequences are available for the same marker for different individuals and populations of a given taxon.
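The kind of check I ran on BOLD for my fish mitogenomes can be sketched in a few lines. This is a minimal, hypothetical example, not the paper's ML-tree procedure: it swaps in a simple nearest-match search using uncorrected p-distances, assumes the query COI and the expert-identified reference barcodes are already aligned to the same length, and the function names are my own.

```python
# Hypothetical sketch: nearest-match identity check of a query COI sequence
# against expert-identified reference barcodes (a simpler stand-in for the
# ML-tree assessment recommended in the paper).

def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance over aligned sites where both bases are A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":  # skip gaps and ambiguity codes
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return diffs / compared


def check_label(query_label, query_seq, references):
    """Return (closest_species, distance, label_agrees).

    references: list of (species_name, aligned_sequence) tuples.
    """
    best_species, best_dist = None, float("inf")
    for species, seq in references:
        d = p_distance(query_seq, seq)
        if d < best_dist:  # keep the first of any tied matches
            best_species, best_dist = species, d
    return best_species, best_dist, best_species == query_label
```

With made-up ten-base toy sequences, `check_label("Rousettus leschenaultii", "ACGTACGTAT", [("Rousettus aegyptiacus", "ACGTACGTAC"), ("Rousettus leschenaultii", "ACGAACTTGC")])` returns the aegyptiacus reference as the closest match, so the submitted label would be flagged for a closer look.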

Tuesday, April 8, 2014

New tools review

Today's post is about new developments from the world of DNA barcoding informatics. I selected three publications from the last few months that each provide a new package worth testing by the community. Without further ado, here is my little collection of new bioinformatics releases.

The first approach starts from the notion that a modern DNA barcoding approach should incorporate the multispecies coalescent. The multispecies coalescent model was developed as a framework to infer species phylogenies from multilocus genetic data collected from multiple individuals. It assumes that speciation occurs at a specific point in time, after which the two new species evolve in total isolation. In reality, however, speciation may take place over an extended period of time, during which the two sister lineages likely remain in some sort of contact. Inferring phylogenies of multiple species under those conditions is very difficult and requires a fair amount of computation time. Applying the approach to DNA Barcode data is a little simpler, as one element of complexity is removed by using only one gene region, or, as in this publication, just two; the authors justify this with a bold statement: recent developments make a barcoding approach that utilizes a single locus outdated. I beg to differ, as I don't see the point of sequencing a plethora of loci when one already does the trick, but that is a topic for another post. The nice thing about this approach is that it uses existing software and algorithms. Anyone familiar with these programs should find it easy to follow the authors' recipe, which uses *BEAST to produce a guide tree for a subsequent analysis in BPP (Yang and Rannala 2010).
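To illustrate why a single gene tree need not match the species tree under the multispecies coalescent, here is a sketch of a classic textbook result for three species ((A,B),C). It assumes strict isolation after each split, a constant effective population size Ne on the internal branch, and a diploid nuclear locus (coalescence rate 1/(2Ne) per generation for a pair of lineages). This is not part of the *BEAST/BPP recipe, just the model intuition behind it.

```python
import math


def p_concordant(internal_branch_generations: float, ne: float) -> float:
    """Probability that a single-locus gene tree matches the species tree ((A,B),C).

    The A and B lineages coalesce on the internal branch with probability
    1 - exp(-T / (2 Ne)). If they fail to, all three topologies are equally
    likely in the ancestral population, so only 1/3 of those cases match.
    """
    if internal_branch_generations < 0 or ne <= 0:
        raise ValueError("need T >= 0 and Ne > 0")
    no_coalescence = math.exp(-internal_branch_generations / (2 * ne))
    return 1 - (2 / 3) * no_coalescence
```

A very short internal branch pushes the probability towards 1/3, a long one towards 1. For a maternally inherited mitochondrial marker the effective size is roughly a quarter of the nuclear value (assuming an equal sex ratio), so its gene tree catches up with the species tree faster, which is one reason single-locus mitochondrial barcodes work as well as they do.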

This coalescent-based *BEAST/BPP approach was used to identify species boundaries. The colleagues used a test set of Sarcophaga species to compare a distance-based approach with their new method: We found that, of the 31 species of Sarcophaga examined..., 27 could be reliably distinguished by barcoding when a 4% sequence divergence threshold was applied. The four problematic taxa were S. megafilosia, S. meiofilosia, S. crassipalpis and S. ruficornis. S. megafilosia and S. meiofilosia had an interspecific divergence of 2.81%, while S. crassipalpis and S. ruficornis had an interspecific divergence of 3.75%. The success rate of barcoding for this set of taxa is thus 87%, while the *BEAST/BPP approach had a success rate of 100%.

The only question I have is why divergence values of 2.81% and 3.75% were considered problematic in the first place. That can only happen if a fixed value is used to define species boundaries. Who does that?
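How much the "success rate" depends on the chosen cut-off is easy to show with the numbers quoted above. This is a hypothetical sketch: it only uses the nearest interspecific distances of the four problematic taxa and the total of 31 species, and it assumes the remaining 27 species clear any threshold up to 4%. It ignores intraspecific distances, so a lower threshold that might split variable species is not assessed here.

```python
# Nearest interspecific distances (%) of the four problematic Sarcophaga taxa
# as quoted in the post; the other 27 species only enter through the total.
NEAREST_INTERSPECIFIC = {
    "S. megafilosia": 2.81,
    "S. meiofilosia": 2.81,
    "S. crassipalpis": 3.75,
    "S. ruficornis": 3.75,
}
TOTAL_SPECIES = 31


def success_rate(threshold, nearest=NEAREST_INTERSPECIFIC, total=TOTAL_SPECIES):
    """Share of species distinguished when divergences below `threshold` are lumped."""
    failed = [sp for sp, d in nearest.items() if d < threshold]
    return (total - len(failed)) / total, failed
```

At 4% all four taxa fail and the rate is 27/31, about 87%, matching the paper. At 3% only S. megafilosia and S. meiofilosia fail, and at 2.5% none of the four does, which is exactly why a fixed, one-size-fits-all threshold is a questionable yardstick.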

ExcaliBAR is a small routine to facilitate one important initial step in DNA Barcoding analyses, namely the determination of the barcoding gap between pairwise genetic distances among and within species, based on original distance matrices computed by MEGA software. In addition, the software is able to rename sequences downloaded via the standard user interfaces of public databases such as GenBank, without the need of developing and applying specific scripts for this purpose.
This is an interesting little tool, although I have to admit that, aside from the very useful renaming of sequences, which makes the resulting file compatible with other software, I don't see its full advantage. From what I understood from the paper, the routine takes a MEGA output containing a pairwise distance matrix. ExcaliBAR then extracts the intra- and interspecific pairwise distances, which can be exported, e.g. into Excel, to determine a threshold above which sequences are likely to represent different species. The authors claim that the program actually performs better than other software such as ABGD or SpideR. They even have the nerve to state that, like the other programs, the ‘Barcode Gap Analysis’ option on BOLD was not devised to handle large datasets. I beg to differ, as BOLD is probably still the best option available for dealing with large datasets, even though it has been criticized for using distance-based methods to do so. ExcaliBAR still needs about 30 minutes to process a matrix generated from 5000 DNA Barcodes. Not bad, but is it really better than the others?
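For readers wondering what a barcode gap analysis actually computes, here is a minimal sketch. It is not ExcaliBAR's code: it assumes the MEGA distance matrix has already been parsed into a square, symmetric list of lists with one species label per row, and the function name `barcode_gap` is hypothetical. For each species it reports the largest intraspecific and the smallest interspecific distance; a gap exists when the latter exceeds the former.

```python
from itertools import combinations


def barcode_gap(labels, matrix):
    """Per-species (max_intra, min_inter, has_gap) from a pairwise distance matrix.

    labels: species name for each sequence, in the same order as the matrix rows.
    matrix: square, symmetric list of lists of pairwise distances.
    has_gap is None when a species lacks intra- or interspecific comparisons.
    """
    intra = {sp: [] for sp in labels}
    inter = {sp: [] for sp in labels}
    for i, j in combinations(range(len(labels)), 2):
        d = matrix[i][j]
        if labels[i] == labels[j]:
            intra[labels[i]].append(d)
        else:
            inter[labels[i]].append(d)
            inter[labels[j]].append(d)
    result = {}
    for sp in intra:
        max_intra = max(intra[sp]) if intra[sp] else None
        min_inter = min(inter[sp]) if inter[sp] else None
        has_gap = None if max_intra is None or min_inter is None else min_inter > max_intra
        result[sp] = (max_intra, min_inter, has_gap)
    return result
```

Singletons get `None` rather than a verdict, since a gap cannot be assessed without at least two conspecific sequences. Looping over all pairs is quadratic, which is fine for a few thousand barcodes but explains why large datasets take a while.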

One criticism raised in the ExcaliBAR publication is that some programs require R. R is a free programming language and software environment for statistical computing and graphics, and yes, there is a bit of a learning curve involved in mastering it.

Adhoc is a new method for dealing with incomplete reference libraries of DNA barcodes. It is based on ad hoc distance thresholds that are calculated for each library, taking into account the estimated probability of relative identification errors. By using each sequence of a reference library as a query against all other reference sequences, the program calculates the relative identification error (RE) of the best close match method. Before that, Adhoc generates some basic descriptive statistics of the imported dataset, providing two tables containing species names, full sequence identifiers, and numbers of sequences and haplotypes for each species. It also returns the length of each reference sequence, calculates all pairwise distances and separates intra- and interspecific pairwise comparisons. In their publication the authors also provide a very important disclaimer:
This method has been developed for specimen identification. It is intended to optimise the identification success rate by adapting the distance threshold according to a RE estimated from a particular reference library. Hence, using this method for species delimitation requires a careful interpretation of the output.

I think that disclaimer should accompany every bioinformatics tool for DNA Barcoding.
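The leave-one-out idea behind Adhoc can be sketched as follows. This is a rough, hypothetical illustration, not the program itself: I assume RE is the share of incorrect identifications among all queries that find a best close match within the threshold, I simplify the handling of tied matches, and the acceptable RE of 5% is an arbitrary example value. All function names are my own.

```python
def best_close_match(query_idx, labels, matrix, threshold):
    """Leave-one-out best close match: 'correct', 'incorrect', 'ambiguous' or 'no_id'."""
    others = [j for j in range(len(labels)) if j != query_idx]
    best = min(matrix[query_idx][j] for j in others)
    if best > threshold:
        return "no_id"  # nothing close enough in the library
    best_species = {labels[j] for j in others if matrix[query_idx][j] == best}
    if best_species == {labels[query_idx]}:
        return "correct"
    if labels[query_idx] in best_species:
        return "ambiguous"  # tie between the right and a wrong species
    return "incorrect"


def relative_error(labels, matrix, threshold):
    """Assumed RE: incorrect identifications / queries identified within the threshold."""
    outcomes = [best_close_match(i, labels, matrix, threshold) for i in range(len(labels))]
    identified = [o for o in outcomes if o != "no_id"]
    if not identified:
        return 0.0
    return identified.count("incorrect") / len(identified)


def adhoc_threshold(labels, matrix, candidates, max_re=0.05):
    """Largest candidate threshold whose RE stays at or below max_re (hypothetical rule)."""
    ok = [t for t in sorted(candidates) if relative_error(labels, matrix, t) <= max_re]
    return max(ok) if ok else None
```

A species represented by a single sequence mimics an incomplete library: its query can only ever match another species, so a generous threshold turns it into an incorrect identification while a tighter one leaves it unidentified. That trade-off is what a library-specific threshold tries to balance, and it is also why the output says little about species delimitation.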

Friday, April 4, 2014

Cover cropping

Cover crops are typically defined as crops used to protect agricultural soils and to improve soil productivity. Historically, farmers have relied on green manure crops to add nutrients and organic matter to their soil. Green manure crops are usually grown for a specific period of time and then plowed under and incorporated into the soil while green or shortly after flowering. Cover crops have also been used to protect the soil from wind and water erosion, to interrupt disease cycles and to suppress weeds, and sometimes as supplemental feed for livestock or as an additional food source for pollinators and other beneficial insects.

This traditional form of plant diversification may also promote natural regulation of agricultural pests by supporting alternative prey, which in turn allows the densities and diversity of generalist arthropod predators to increase. The higher the densities of these predators, the higher the consumption of herbivorous pests, provided that the pest remains the favorite prey. However, the changes in predator diet composition induced by cover cropping are poorly understood.

A group of French researchers used a metabarcoding approach to assess the diet of eight ground-dwelling predators commonly found in banana plantations in Martinique. They used a short COI fragment amplified from the predators' gut contents to identify their prey and to find out which predators feed on the major banana pest, Cosmopolites sordidus. The researchers were particularly interested in differences in diet composition between a bare-soil plot and a cover-cropped plot of the plantation, as the cover crop Brachiaria decumbens is increasingly used to control weeds and improve the physical properties of the soil. They were able to demonstrate that the use of a cover crop in banana plantations altered the arthropod food web, with significant changes in the frequency of consumption of some of the prey. An increase in alternative prey in the predators' diet induces a diet shift that seems to dampen the positive effects of cover crops on pest regulation: the predators consume more non-pests without consuming more pests.
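"Frequency of consumption" in diet metabarcoding is commonly expressed as frequency of occurrence: the share of gut samples in which a prey taxon was detected. Here is a minimal sketch of that calculation per plot. The function name and the toy detections (with placeholder taxa) are invented for illustration, and the statistical test the study used to compare plots is not reproduced.

```python
from collections import Counter


def frequency_of_occurrence(samples):
    """Share of gut samples in which each prey taxon was detected.

    samples: list of sets, one set of detected prey taxa per predator gut.
    """
    if not samples:
        return {}
    counts = Counter(taxon for prey in samples for taxon in set(prey))
    return {taxon: n / len(samples) for taxon, n in counts.items()}


# Invented toy detections for two plots (not data from the study).
bare_soil = [{"pest"}, {"collembolan"}, {"pest", "collembolan"}, set()]
cover_crop = [{"collembolan"}, {"collembolan", "dipteran"}, {"pest"}, {"dipteran"}]
```

Computing `frequency_of_occurrence` for each plot separately gives one value per prey taxon and plot; comparing those values across plots is the kind of shift the authors tested for.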

The study closes with a general assessment of the use of metabarcoding for research on trophic interactions:
In conclusion, it is essential to disentangle trophic interactions in order to achieve a better understanding of ecosystem resilience and persistence following disturbances, such as plant diversification. DNA metabarcoding allows direct inference of trophic interactions and enables the assessment of arthropod diet. Although the method has limitations, including the inability to discriminate between direct predation, secondary predation, and scavenging, it has the potential to be very useful for describing arthropod food webs. Here, we identified new and unexpected trophic interactions in the predator–prey system in banana plantations. The accurate determination of trophic networks will challenge current models of trophic interactions and will contribute to food web theory and ecosystem management. In addition to its application to individual food webs, DNA metabarcoding could be used to link different food webs, such as those that describe micro-organisms, plants, arthropods, and larger animals.

Thursday, April 3, 2014

Apes’ insect buffet

Fruit and leaves are known to be the main components of the diet of apes such as gorillas, chimpanzees, bonobos, orangutans and gibbons. These non-human primates are also known to feast on insects, but this behavior has been difficult to understand and to track in the wild. Until now, insect consumption by apes has been reported based on direct observations or trail signs in feces.

Now, a group of researchers of Aix-Marseille Université in Marseilles and the University of Montpellier have gained further insight into apes’ insect-eating habits by using DNA Barcoding.

The researchers analyzed fecal samples from gorillas, bonobos, and chimpanzees and were able to identify 106 different species from 32 families of insects, including flies, beetles, butterflies, moths, mosquitoes, and termites. Surprisingly, they did not find any ant or bee species in the samples, although they noted that this could have been a result of DNA degradation in the primates' guts.

Compared with behavioral observations and/or analysis of trail signs in ape feces, we found many previously unknown insect families that are consumed by African great apes. Many insects, such as species in the orders Coleoptera and Lepidoptera or caterpillars detected in this study have strong associations with plants. Some of these insect species, such as members of the family Chrysomelidae, feed on different plant parts. Consequently, they could be eaten incidentally (secondary predation) when African great apes feed on plants. Thus, one advantage of using a molecular approach to examine insects consumed is the inclusion of those consumed via secondary predation. These species may not have been otherwise detected through classical approaches, but they are still components of the diet that have nutritional value. These indirectly eaten insects may also contribute to an understanding of the feeding ecology and foraging strategy of a species.

Tuesday, April 1, 2014

Taxonomy is slow

A frequent misconception of the discovery process is that new species are recognized as new in the field. This is not the case, most newly collected specimens are archived in museums and herbaria: collections thus act as a reservoir of potential new species. Vast collections of plants, vertebrates and insects have already been accumulated in museum vaults, representing a huge amount of unstudied material — this probably explains why these taxa have longer shelf lives than, for instance, fungi and invertebrates excluding insects, which are comparatively underrepresented in museum collections.

This paragraph is from a recent study published in Current Biology. A reality in taxonomy is that it takes time to get from the first collection of a specimen of a new species to its formal description and naming in the scientific literature. The authors of the study refer to this time span as 'shelf life'. It is common knowledge among researchers in biodiversity science and taxonomy that this shelf life can be quite long and varies between groups, but so far nobody had tried to estimate these time spans systematically.

Although the numbers don't surprise me very much, they are still alarming. According to the study, the average 'shelf life' between field discovery of a new species and its formal description is twenty-one years, ranging from zero to 206 years. The authors also looked at variation among taxonomic groups and at demographic factors that might influence the shelf life. The figure on the left shows a couple of those comparisons.

Such high values are clearly a symptom of the so-called taxonomic impediment, in particular the shortage of taxonomists, but technological and methodological restrictions on data analysis, as well as publication norms, are also frequently highlighted as handicaps to rapid species delineation and description.

However, we might be well on our way to addressing the latter. Researchers and the public can now have immediate access to data underlying discovery of new species of life on Earth, under a new streamlined system linking taxonomic research with open data publication. A new partnership paves the way for unlocking and preserving a wealth of 'small data' backing up research conclusions, which often become lost within a few years of an article's publication in an academic journal:

A group of scientists and students discovered a new species of spider during a field course in Borneo. The species was described and submitted online from the field to the Biodiversity Data Journal through a satellite internet connection, along with the underlying data. The manuscript was peer-reviewed and published within two weeks of submission. On the day of publication, the Global Biodiversity Information Facility (GBIF) and the Encyclopedia of Life (EOL) have harvested and included the data in their respective platforms.

This is a radically shortened workflow, with data shared and ready for re-use almost immediately. The full description is also published in an open access journal, making all information fully accessible to everyone without any fee barriers. I am very curious how this approach will develop and how it will be received by the taxonomic community. It certainly promises to become one factor that could bring down the average shelf life from 21 years to more reasonable figures.