Wednesday, June 28, 2017

Shelf Life Webseries

Today a post on an interesting video series produced by the American Museum of Natural History. I have to admit I wasn't aware of it at all until a few days ago, when the newest episode on cryptic species was shared with me:

The web series Shelf Life highlights different aspects of the museum's work, and it is certainly not the only series out there that is produced by a museum itself. The Smithsonian, for example, has its own channel with countless short videos on many topics. What I like about the AMNH series is the right mixture of length (never over 10 min), topic and production quality. It all started about three years ago with this one:

h/t Paul Hebert

Monday, June 26, 2017

Image Data Resource

Much of the published research in the life sciences is based on image data sets that sample 3D space, time and the spectral characteristics of detected signal to provide quantitative measures of cell, tissue and organismal processes and structures. The sheer size of biological image data sets makes data submission, handling and publication challenging. An image-based genome-wide 'high-content' screen (HCS) may contain more than 1 million images, and new 'virtual slide' and 'light sheet' tissue imaging technologies generate individual images that contain gigapixels of data showing tissues or whole organisms at subcellular resolutions. At the same time, published versions of image data are often mere illustrations: they are presented in processed, compressed formats that cannot convey the measurements and multiple dimensions contained in the original image data and cannot easily be reanalyzed. Furthermore, conventional publications do not include the metadata that define imaging protocols, biological systems and perturbations or the processing and analytic outputs that convert the image data into quantitative measurements.

There are many resources worldwide in which people publish imaging data, but none of these repositories is both generic and linked to other relevant bio-molecular data. This means that for all the effort that goes into them, it is difficult to reuse these datasets in new studies. There are many reasons why sharing imaging data has been so difficult until now, most notably the heterogeneity and complexity of the image data, but also the lack of a critical mass of storage, compute and curation expertise.

To address this challenge, scientists at the University of Dundee, the European Bioinformatics Institute (EMBL-EBI), the University of Bristol and the University of Cambridge have launched a prototype repository for imaging data: the Image Data Resource (IDR). The new resource integrates imaging data with molecular and phenotype data. IDR includes information on experimental protocols: parameters, analyses and the effects scientists have observed in cells and features, for example.

To demonstrate the power of the new repository, the researchers used data deposited in the IDR to identify genes from different studies that, when mutated or removed, caused cells to elongate and stretch out. Information from several different studies was used to build a gene network, which provides insights into how these genes affect cell shape, an important property to consider in metastatic cancer.

The prototype public image repository contains a broad range of data, including:

  • High-content screening
  • Super-resolution microscopy
  • Time-lapse imaging
  • Digital pathology imaging
  • Experimental protocol metadata
  • Observed effects in cells and features
  • Cross references with molecular archives

The next step is to secure the support and investment needed to transform the prototype into a production-ready imaging infrastructure. IDR's software and technology are open source, so they can be accessed and built into other image data publication systems. At this point the new project focuses on microscopic imaging, but why not expand into images of entire organisms or specific traits?

Friday, June 23, 2017

Weekend reads

New reading material for the weekend or, for those of you blessed with better weather, perhaps for Monday morning back at work.

Bird remains that are difficult to identify taxonomically using morphological methods are common in the palaeontological record. Other types of challenging avian material include artefacts and food items from endangered taxa, as well as remains from aircraft strikes. We here present a DNA-based method that enables taxonomic identification of bird remains, even from material where the DNA is heavily degraded. The method is based on the amplification and sequencing of two short variable parts of the 16S region in the mitochondrial genome. To demonstrate the applicability of this approach, we evaluated the method on a set of Holocene and Late Pleistocene postcranial bird bones from several palaeontological and archaeological sites in Europe with good success.

Community-level data, the type generated by an increasing number of metabarcoding studies, is often graphed as stacked bar charts or pie graphs that use color to represent taxa. These graph types do not convey the hierarchical structure of taxonomic classifications and are limited by the use of color for categories. As an alternative, we developed metacoder, an R package for easily parsing, manipulating, and graphing publication-ready plots of hierarchical data. Metacoder includes a dynamic and flexible function that can parse most text-based formats that contain taxonomic classifications, taxon names, taxon identifiers, or sequence identifiers. Metacoder can then subset, sample, and order this parsed data using a set of intuitive functions that take into account the hierarchical nature of the data. Finally, an extremely flexible plotting function enables quantitative representation of up to 4 arbitrary statistics simultaneously in a tree format by mapping statistics to the color and size of tree nodes and edges. Metacoder also allows exploration of barcode primer bias by integrating functions to run digital PCR. Although it has been designed for data from metabarcoding research, metacoder can easily be applied to any data that has a hierarchical component such as gene ontology or geographic location data. Our package complements currently available tools for community analysis and is provided open source with an extensive online user manual.

Studying taxonomic and ecological diversity of phytoplankton assemblages is often difficult because morphological analysis cannot provide a complete description of their composition. Therefore, more robust and feasible approaches have to be chosen to elucidate the interactions between environmental and human pressures and phytoplankton assemblages. The Ocean Sampling Day (OSD) allowed collecting seawater samples from a wide range of oceanic regions including the Mediterranean Sea. In this study, a total of 754,167 V4-18S ribosomal DNA (rDNA) metabarcodes derived from 20 plankton samples collected at 19 sampling sites across the coastal areas of the Mediterranean Sea were analyzed to explore the relationships between phytoplankton assemblages' composition, sub-regional environmental features and human pressures. We reduced the whole set of autotroph plankton (1398 OTUs) to a smaller number of ecologically relevant entities (205 taxa) and used the latter for analysing the structure of phytoplankton assemblages. Chaetoceros was the only genus occurring in all the samples, while the number of taxa was maximum in the W Mediterranean. Based on the assigned OTUs, the structure of E Mediterranean phytoplankton was the most homogeneous. Further, phytoplankton assemblages from the three Mediterranean sub-regions (Western, Adriatic and Eastern) were significantly different (R=0.25, p=0.0136) based on Jaccard similarity. We also observed that phytoplankton diversity and human impact on marine ecosystems were not significantly related to each other based on Mantel's test.
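For readers less familiar with the Jaccard similarity mentioned in this abstract: it compares two samples purely on presence/absence, dividing the number of shared taxa by the number of taxa found in either sample. The snippet below is only a minimal sketch of that idea, not the authors' analysis; the function name and the two example taxon lists are made up for illustration.

```python
def jaccard_similarity(taxa_a, taxa_b):
    """Shared taxa divided by all taxa seen in either sample (presence/absence only)."""
    a, b = set(taxa_a), set(taxa_b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty samples are identical
    return len(a & b) / len(a | b)


# Made-up example samples, not data from the study
sample_west = {"Chaetoceros", "Thalassiosira", "Emiliania"}
sample_east = {"Chaetoceros", "Emiliania", "Prorocentrum", "Synechococcus"}

similarity = jaccard_similarity(sample_west, sample_east)
print(round(similarity, 2))  # 2 shared taxa out of 5 in total -> 0.4
```

A significance test such as the one reported in the abstract would then be run on the matrix of all pairwise similarities, which this sketch does not attempt.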

Human impact on marine benthic communities has traditionally been assessed using visible morphological traits and has focused on the macrobenthos, whereas the ecologically important organisms of the meio- and microbenthos have received less attention. DNA metabarcoding offers an alternative to this approach and enables a larger fraction of the biodiversity in marine sediments to be monitored in a cost-efficient manner. Although this methodology remains poorly standardised and challenged by biases inherent to rRNA copy number variation, DNA extraction, PCR, and limitations related to taxonomic identification, it has been shown to be semi-quantitative and useful for comparing taxon abundances between samples. Here, we evaluate the effect of replicating genomic DNA extraction in order to counteract small scale spatial heterogeneity and improve diversity and community structure estimates in metabarcoding-based monitoring. For this purpose, we used ten technical replicates from three different marine sediment samples. The effect of sequence depth was also assessed, and in silico pooling of DNA extraction replicates carried out in order to maintain the number of reads constant. Our analyses demonstrated that both sequencing depth and DNA extraction replicates could improve diversity estimates as well as the ability to separate samples with different characteristics. We could not identify a "sufficient" replicate number or sequence depth, where further improvements had a less significant effect. Based on these results, we consider replication an attractive alternative to directly increasing the amount of sample used for DNA extraction and strongly recommend it for future metabarcoding studies and routine assessments of sediment biodiversity.

Terrestrial animals must have frequent contact with water to survive, implying that environmental DNA (eDNA) originating from those animals should be detectable from places containing water in terrestrial ecosystems. Aiming to detect the presence of terrestrial mammals using forest water samples, we applied a set of universal PCR primers (MiMammal, a modified version of fish universal primers) for metabarcoding mammalian eDNA. The versatility of MiMammal primers was tested in silico and by amplifying DNAs extracted from tissues. The results suggested that MiMammal primers are capable of amplifying and distinguishing a diverse group of mammalian species. In addition, analyses of water samples from zoo cages of mammals with known species composition suggested that MiMammal primers could successfully detect mammalian species from water samples in the field. Then, we performed an experiment to detect mammals from natural ecosystems by collecting five 500-ml water samples from ponds in two cool-temperate forests in Hokkaido, northern Japan. MiMammal amplicon libraries were constructed using eDNA extracted from water samples, and sequences generated by Illumina MiSeq were subjected to data processing and taxonomic assignment. We thereby detected multiple species of mammals common to the sampling areas, including deer (Cervus nippon), mouse (Mus musculus), vole (Myodes rufocanus), raccoon (Procyon lotor), rat (Rattus norvegicus) and shrew (Sorex unguiculatus). Many previous applications of the eDNA metabarcoding approach have been limited to aquatic/semiaquatic systems, but the results presented here show that the approach is also promising even for forest mammal biodiversity surveys.

Benthic communities are key components of aquatic ecosystems' biomonitoring. However, morphology-based species identifications remain a low-throughput, and sometimes ambiguous, approach. Although metabarcoding methodologies have been applied for above-species taxa inventories in marine meiofaunal communities, a comprehensive approach providing species-level identifications for estuarine macrobenthic communities is still lacking. Here we report a combination of experimental and field studies that demonstrate the aptitude of cytochrome oxidase I (COI) metabarcoding to provide robust species-level identifications within a framework of high-throughput monitoring of estuarine macrobenthic communities. To investigate the ability to recover DNA barcodes from all species present in a bulk community DNA extract, we assembled experimentally 3 phylogenetically diverse communities, and used in each 4 different primer pairs to generate an equal number of different PCR products of the COI barcode region. Between 78 and 83% of the species in the tested communities were recovered through multi-primer high throughput sequencing (HTS). Two primer pairs were sufficient to attain these recovery rates. Subsequently, we compared morphology and metabarcoding-based approaches to determine the species composition of macrobenthos from four distinct sites of the Sado estuary, Portugal. Our results indicate that the species richness would be considerably underestimated if only morphological methods were used. Although further refinement is required for improving the efficiency and output of this approach, here we show the great aptitude of COI-multi-primer metabarcoding to provide high quality and auditable species identifications in macrobenthos monitoring.

Wednesday, June 21, 2017

2017 GBIF Ebbe Nielsen Challenge

For the third year GBIF is running its Ebbe Nielsen Challenge. Developers and data scientists have three months to create and submit tools capable of liberating species records from open data repositories for scientific discovery and reuse. Here are some more details:

This year's Challenge will seek to leverage the growth of open data policies among scientific journals and research funders, which require researchers to make the data underlying their findings publicly available. Adoption of these policies represents an important first step toward increasing openness, transparency and reproducibility across all scientific domains, including biodiversity-related research.

To abide by these requirements, researchers often deposit datasets in public open-access repositories. Potential users are then able to find and access the data through repositories as well as data aggregators like OpenAIRE and DataONE. Many of these datasets are already structured in tables that contain the basic elements of biodiversity information needed to build species occurrence records: scientific names, dates, and geographic locations, among others.

However, the practices adopted by most repositories, funders and journals do not yet encourage the use of standardized formats. This approach significantly limits the interoperability and reuse of these datasets. As a result, the wider reuse of data implied if not stated by many open data policies falls short, even in cases where open licensing designations (like those provided through Creative Commons) seem to encourage it.

The challenge
The 2017 GBIF Ebbe Nielsen Challenge seeks submissions that repurpose these datasets and adapt them into the Darwin Core Archive format (DwC-A), the interoperable and reusable standard that powers the publication of almost 800 million species occurrence records from the nearly 1,000 worldwide institutions now active in the GBIF network.

The 2017 Ebbe Nielsen Challenge will task developers and data scientists to create web applications, scripts or other tools that automate the discovery and extraction of relevant biodiversity data from open data repositories. Such tools might generate datasets ready for publication by:

  • Automating searches of open data available in public repositories
  • Effectively mining the information needed to generate checklists, species occurrence and sampling-event datasets (e.g. scientific names, date and location of occurrence, etc.) from datasets in these repositories
  • Mapping datasets’ column headings and/or contents with standardized Darwin Core terms
  • Routinely converting the reformatted data into Darwin Core archive formats ready for publication
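To give a flavour of the third bullet (mapping column headings to Darwin Core terms), here is a rough sketch of what a first pass might look like. The Darwin Core term names (scientificName, eventDate, decimalLatitude, decimalLongitude, locality, recordedBy) are real, but the synonym table, the function names and the example CSV are all assumptions made up for illustration; a real submission would need a far richer mapping and would also have to inspect column contents.

```python
import csv
import io

# Hypothetical synonym table: normalised column headings -> Darwin Core terms.
HEADING_TO_DWC = {
    "species": "scientificName",
    "scientific name": "scientificName",
    "taxon": "scientificName",
    "date": "eventDate",
    "collection date": "eventDate",
    "lat": "decimalLatitude",
    "latitude": "decimalLatitude",
    "lon": "decimalLongitude",
    "long": "decimalLongitude",
    "longitude": "decimalLongitude",
    "site": "locality",
    "collector": "recordedBy",
}


def normalise(heading):
    """Lower-case a heading, turn underscores into spaces and collapse whitespace."""
    return " ".join(heading.strip().lower().replace("_", " ").split())


def map_headings(headings):
    """Return (mapped, unmapped): a heading -> term dict and the headings left over."""
    mapped, unmapped = {}, []
    for heading in headings:
        term = HEADING_TO_DWC.get(normalise(heading))
        if term:
            mapped[heading] = term
        else:
            unmapped.append(heading)
    return mapped, unmapped


def to_darwin_core_rows(csv_text):
    """Re-key each CSV row with Darwin Core terms, dropping unmapped columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapped, _ = map_headings(reader.fieldnames or [])
    return [{mapped[k]: v for k, v in row.items() if k in mapped} for row in reader]


# Made-up example table, not taken from any repository
example = (
    "Species,Collection_Date,Lat,Long,Notes\n"
    "Parus major,2016-05-14,51.48,-0.61,juvenile\n"
)
rows = to_darwin_core_rows(example)
print(rows)
```

Note that this only re-keys a table. A complete Darwin Core Archive also bundles a meta.xml descriptor (and usually an EML metadata file) alongside the data file, which the sketch leaves out.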

Friday, June 16, 2017

Weekend reads

Hot off the press - more reading material from the DNA barcoding community. Not as many as last week, when I had a lot of catching up to do. Nevertheless, very interesting reads.

Thirty-four species of Culicidae are present in the UK, of which 15 have been implicated as potential vectors of arthropod-borne viruses such as West Nile virus. Identification of mosquito feeding preferences is paramount to the understanding of vector-host-pathogen interactions which, in turn, would assist in the control of disease outbreaks. Results are presented on the application of DNA barcoding for vertebrate species identification in blood-fed female mosquitoes in rural locations. Blood-fed females (n = 134) were collected in southern England from rural sites and identified based on morphological criteria. Blood meals from 59 specimens (44%) were identified as feeding on eight hosts: European rabbit, cow, human, barn swallow, dog, great tit, magpie and blackbird. Analysis of the cytochrome c oxidase subunit I mtDNA barcoding region and the internal transcribed spacer 2 rDNA region of the specimens morphologically identified as Anopheles maculipennis s.l. revealed the presence of An. atroparvus and An. messeae. A similar analysis of specimens morphologically identified as Culex pipiens/Cx. torrentium showed all specimens to be Cx. pipiens (typical form). This study demonstrates the importance of using molecular techniques to support species-level identification in blood-fed mosquitoes to maximize the information obtained in studies investigating host feeding patterns.

We used a 227-bp fragment of the mitochondrial gene cytochrome oxidase I (DNA "barcode") in conjunction with morphological data to study specimens of the Neotropical genus Orthocomotis Dognin, 1906, acquired from natural history collections. We examined over 20 species of Orthocomotis from 17 localities in Colombia, Ecuador, and Peru. The analysis identified 32 haplotypes among the 62 specimens and found no haplotypes shared among species. The molecular study revealed not only the usefulness of short COI sequences in discriminating among Orthocomotis species but also showed distinctness of four clusters which correspond to those based on morphological (genitalia) characters. Moreover, the molecular results suggest the occurrence of rapid speciation in Orthocomotis. We hypothesize that this may be linked to the great biodiversity of potential host plants in Neotropical ecosystems.

Taxonomic identification of pollen has historically been accomplished via light microscopy but requires specialized knowledge and reference collections, particularly when identification to lower taxonomic levels is necessary. Recently, next-generation sequencing technology has been used as a cost-effective alternative for identifying bee-collected pollen; however, this novel approach has not been tested on a spatially or temporally robust number of pollen samples. Here, we compare pollen identification results derived from light microscopy and DNA sequencing techniques with samples collected from honey bee colonies embedded within a gradient of intensive agricultural landscapes in the Northern Great Plains throughout the 2010-2011 growing seasons. We demonstrate that at all taxonomic levels, DNA sequencing was able to discern a greater number of taxa, and was particularly useful for the identification of infrequently detected species. Importantly, substantial phenological overlap did occur for commonly detected taxa using either technique, suggesting that DNA sequencing is an appropriate, and enhancing, substitutive technique for accurately capturing the breadth of bee-collected species of pollen present across agricultural landscapes. We also show that honey bees located in high and low intensity agricultural settings forage on dissimilar plants, though with overlap of the most abundantly collected pollen taxa. We highlight practical applications of utilizing sequencing technology, including addressing ecological issues surrounding land use, climate change, importance of taxa relative to abundance, and evaluating the impact of conservation program habitat enhancement efforts.

Claims abound that the Transvaal red milkwood, Mimusops zeyheri, indigenous to areas with tropical and subtropical commercial fruit trees and fruiting vegetables in South Africa, is relatively pest free owing to its copious concentrations of latex in the above-ground organs. On account of observed fruit fly damage symptoms, a study was conducted to determine whether M. zeyheri was a host to the notorious quarantined Mediterranean fruit fly (Ceratitis capitata).
Fruit samples were kept for 16-21 days in plastic pots containing moist steam-pasteurised growing medium with tops covered with a mesh sheath capable of retaining emerging flies. Microscopic diagnosis of the trapped flies suggested that the morphological characteristics were congruent with those of C. capitata, which was confirmed through cytochrome c oxidase I (COI) gene sequence alignment with a 100% bootstrap value and 99% confidence probability when compared with those from the National Centre for Biotechnology Information database.
This study demonstrated that M. zeyheri is a host of C. capitata. Therefore, C. capitata from infestation reservoirs of M. zeyheri fruit trees could be a major threat to the tropical and subtropical fruit industries in South Africa owing to the fruit-bearing nature of the new host.

International agreements mandate the expansion of Earth's protected-area network as a bulwark against the continued extinction of wild populations, species, and ecosystems. Yet many protected areas are underfunded, poorly managed, and ecologically damaged; the conundrum is how to increase their coverage and effectiveness simultaneously. Innovative restoration and rewilding programmes in Costa Rica's Area de Conservacion Guanacaste and Mozambique's Parque Nacional da Gorongosa highlight how degraded ecosystems can be rehabilitated, expanded, and woven into the cultural fabric of human societies. Worldwide, enormous potential for biodiversity conservation can be realized by upgrading existing nature reserves while harmonizing them with the needs and aspirations of their constituencies.

Seed dispersal constitutes a pivotal process in an increasingly fragmented world, promoting population connectivity, colonization and range shifts in plants. Unveiling how multiple frugivore species disperse seeds through fragmented landscapes, operating as mobile links, has remained elusive owing to methodological constraints for monitoring seed dispersal events. We combine for the first time DNA barcoding and DNA microsatellites to identify, respectively, the frugivore species and the source trees of animal-dispersed seeds in forest and matrix of a fragmented landscape. We found a high functional complementarity among frugivores in terms of seed deposition at different habitats (forest vs. matrix), perches (isolated trees vs. electricity pylons) and matrix sectors (close vs. far from the forest edge), cross-habitat seed fluxes, dispersal distances, and canopy-cover dependency. Seed rain at the landscape-scale, from forest to distant matrix sectors, was characterized by turnovers in the contribution of frugivores and source-tree habitats: open-habitat frugivores replaced forest-dependent frugivores, whereas matrix trees replaced forest trees. As a result of such turnovers, the magnitude of seed rain was evenly distributed between habitats and landscape sectors. We thus uncover key mechanisms behind 'biodiversity-ecosystem function' relationships, in this case, the relationship between frugivore diversity and landscape-scale seed dispersal. Our results reveal the importance of open-habitat frugivores, isolated fruiting trees, and anthropogenic perching sites (infrastructures) in generating seed dispersal events far from the remnant forest, highlighting their potential to drive regeneration dynamics through the matrix. This study helps to broaden the 'mobile link' concept in seed dispersal studies by providing a comprehensive and integrative view of the way in which multiple frugivore species disseminate seeds through real-world landscapes.

Thursday, June 15, 2017

Plants and climate change

Plants provide us with food, pastures for livestock, and places for recreation and wellbeing. They also directly and indirectly provide numerous invaluable ecosystem services such as water regulation, carbon sequestration and flood prevention. As a result, it is imperative that we understand how plant populations are responding to climate constraints now, and use that information to predict how they are likely to respond to climatic changes in the future.

In fact it might be very important to assess the persistence strategies of plants in any given habitat. Noting a species' mere presence does not paint a very useful picture: a species may be found in a particular area, but that doesn't mean it is making much of a living there; it may just be making ends meet for the time being. An international group of ecologists tested the links between climate suitability and persistence strategies for nearly 100 populations of over 30 species of trees and herbs growing in 16 countries on 3 continents. Some of these data were gathered over the duration of a decade, allowing the researchers to identify emergent patterns linked to climate change with greater confidence.

What they found is that while many species are able to persist in less favourable climate conditions, those same species often do so by adopting last-stand strategies such as shrinking in size and temporarily suspending reproductive and vegetative growth. This merely helps them to survive and makes them more vulnerable to further changes and to disturbances such as wildfires or pest outbreaks. Many such disturbances are more likely today due to changing climates.

Not all plants have the life strategies to persist for extended periods of time in less favourable climates but our research is already helping to pinpoint those that do. One of the next steps is to design management strategies to help support these species and to safeguard the ecosystem services that they provide us.

Wednesday, June 14, 2017

Invasive species hotspots

Human-mediated transport beyond biogeographic barriers has led to the introduction and establishment of alien species in new regions worldwide. However, we lack a global picture of established alien species richness for multiple taxonomic groups. 

The number of established alien species varies across the world, and the question is where most of them can be found and which factors influence their distribution. An international team created a database for eight animal and plant groups (mammals, birds, amphibians, reptiles, fishes, spiders, ants and vascular plants) that were found to occur in regions outside their original habitat. Studying the distribution of these species across a total of 186 islands and 423 mainland regions allowed the research team to illustrate the global distribution of established alien species.

The highest number of alien species can be found on islands and in the coastal regions of continents. The island of Hawaii was found to have the most alien species, followed by the North Island of New Zealand and the Lesser Sunda Islands of Indonesia. What these places have in common is that they are remote islands that used to be very isolated, lacking some taxa altogether, e.g. mammals. Today, these island regions are economically highly developed and maintain intense trade relationships with the mainlands.

We found the number of alien species to be particularly high in densely populated areas as well as in economically highly developed ones. These factors increase the likelihood of humans introducing many new species to an area. This almost invariably results in the destruction of natural habitats, which in turn allows non-indigenous species to spread. Islands and coastal regions seem to be particularly vulnerable because they occupy leading roles in global overseas trade. There is yet another considerable risk besides the introduction of new alien species. Many of the alien plants and animals that, until now, have been kept in people's homes and gardens and are not yet to be found in the wild might well spread in the future. Given the worldwide effects of climate change, this is in fact a distinct possibility.

Tuesday, June 13, 2017


Number of samples in the NCBI GEO
Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data.

Researchers routinely deposit data in online repositories. But they are only human, and it's not rare that they forget to inform a repository to release their data once a paper is published. Open data is a vital pillar of open science, enabling other researchers to reproduce results and use the same datasets to produce novel discoveries. While many scientific journals now require published authors to make the data underlying their findings publicly available, these policies often go unenforced. The challenge is substantial: the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus repository (GEO) alone contains 80,985 public datasets, spanning hundreds of tissue types in thousands of organisms, and the rapid growth in data makes it difficult for journals or data repositories to "police" whether datasets that should be made publicly available actually are.

A new tool, developed by University of Washington and Microsoft researchers, automatically identifies datasets overdue for public release by applying text mining to dataset references in published articles and parsing query results from repositories to determine if the datasets remain private. The system is called Wide-Open and is available under an open source license on GitHub.
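The basic idea is easy to sketch, even though the snippet below is not the Wide-Open code and makes no claim about how that tool is implemented. GEO series accessions have the form "GSE" plus digits and SRA study accessions look like "SRP" (or "ERP"/"DRP") plus digits, so a regular expression can pull them out of article text. Whether a dataset is public would then be checked against the repository; here that check is a hypothetical callable `is_public` stubbed with a set, and the accession numbers in the example sentence are placeholders.

```python
import re

# GEO series accessions: "GSE" followed by digits.
GEO_SERIES = re.compile(r"\bGSE\d+\b")
# SRA study accessions: "SRP", "ERP" or "DRP" followed by at least six digits.
SRA_STUDY = re.compile(r"\b[SED]RP\d{6,}\b")


def find_accessions(text):
    """Collect the unique GEO and SRA accessions mentioned in a piece of text."""
    return {
        "GEO": sorted(set(GEO_SERIES.findall(text))),
        "SRA": sorted(set(SRA_STUDY.findall(text))),
    }


def overdue(accessions, is_public):
    """Keep accessions cited in a published article that are not yet public.

    `is_public` is a hypothetical callable; a real tool would query the repository.
    """
    return [acc for acc in accessions if not is_public(acc)]


# Placeholder sentence and accession numbers, for illustration only
article_text = (
    "Expression data are available under GSE12345 and GSE67890; "
    "raw reads are in SRP012345."
)
found = find_accessions(article_text)

# Stub standing in for a repository lookup
publicly_released = {"GSE12345", "SRP012345"}
still_private = overdue(found["GEO"] + found["SRA"], lambda acc: acc in publicly_released)
print(found)
print(still_private)
```

Any accession that is cited in a published paper but still private would be a candidate for the kind of overdue list described above.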

The colleagues tested their tool on two popular data repositories maintained by the NCBI: GEO and the Sequence Read Archive (SRA). Wide-Open identified a large number of overdue datasets, which spurred repository administrators to respond by releasing 400 datasets within one week.

Monday, June 12, 2017

Defying Muller's Ratchet?

Meloidogyne incognita in action
For most animal species sexual reproduction is favored over asexual reproduction. A proposed mechanism to explain this is Muller's ratchet which assumes that the genomes of an asexual population accumulate deleterious mutations in an irreversible manner. However, this negative effect may not be prevalent in organisms which, while they reproduce asexually, also undergo other forms of recombination. 

Root-knot nematodes (Meloidogyne spp.) exhibit a diversity of reproductive modes ranging from obligatory sexual to fully asexual reproduction. Intriguingly, the most widespread and devastating species to global agriculture are those that reproduce asexually, without meiosis.  Instead of hitting an evolutionary dead-end, these plant pests have a wider geographic range and can infect greater numbers of crops than sexual species. 

To investigate the reasons behind their success, researchers sequenced and assembled the genomes of the three most damaging root-knot nematodes and compared them to a sexual relative. The asexual genomes are large, with numerous duplicated regions resulting from past reproduction events where at least two individual genomes recently hybridized together. They detected signs of positive selection between these gene copies and confirmed functional divergence at the expression pattern level. The colleagues think that it is this peculiar hybrid genome structure that provides these nematodes with a potential for adaptation and plasticity and explains the paradoxical success in the absence of sex:

By analyzing and comparing their genomes, we provide large-scale evidence that these asexual nematodes underwent hybridization and are polyploid. Their duplicated hybrid genome architectures provide these nematodes with multi-copy genes showing diverged sequence and expression patterns where their sexual relatives have very closely related alleles. We suspect these multiple copies provide a reservoir to adapt to different environments and plant hosts, and constitute an evolutionary advantage over their sexual relatives (at least in the short term). Their intriguing parasitic success despite absence of sex could thus be due to their hybrid origin where they combined multiple genomes of adapted parasitic nematodes in one single species.

In addition, transposable elements (TE) cover a ~1.7 times higher proportion of the genomes of the ameiotic asexual Meloidogyne compared to the sexual relative and might also participate in their plasticity. The intriguing parasitic success of asexually-reproducing Meloidogyne species could be partly explained by their TE-rich composite genomes, resulting from allopolyploidization events, and promoting plasticity and functional divergence between gene copies in the absence of sex and meiosis.

It becomes paramount to understand under what conditions these hybrids came to be. It is a scary thought that similar conditions could favor the rise of even more aggressive and devastating new hybrids.

Friday, June 9, 2017

Weekend reads

After a longer silence due to some job changes and travel I am slowly picking up posting duties again. I have decided to move the barcoding paper suggestions to Friday and rename these posts. If you happen to have nothing else to do on the weekend or in case you need some good reads for a quiet moment, here they are:

Wine is a complex beverage, comprising hundreds of metabolites produced through the action of yeasts and bacteria in fermenting grape must. Commercially, there is now a growing trend away from using wine yeast (Saccharomyces) starter cultures, towards the historic practice of uninoculated or "wild" fermentation, where the yeasts and bacteria associated with the grapes and/or winery perform the fermentation. It is the varied metabolic contributions of these numerous non-Saccharomyces species that are thought to impart complexity and desirable taste and aroma attributes to wild ferments in comparison to their inoculated counterparts. To map the microflora of spontaneous fermentation, metagenomic techniques were employed to characterize and monitor the progression of fungal species in five different wild fermentations. Both amplicon-based ribosomal DNA internal transcribed spacer (ITS) phylotyping and shotgun metagenomics were used to assess community structure across different stages of fermentation. While providing a sensitive and highly accurate means of characterizing the wine microbiome, the shotgun metagenomic data also uncovered a significant over-abundance bias in the ITS phylotyping abundance estimations for the common non-Saccharomyces wine yeast genus Metschnikowia. By identifying biases such as that observed for Metschnikowia, abundance measurements from future ITS-phylotyping datasets can be corrected to provide more accurate species representation. Ultimately, as more shotgun metagenomic and single-strain de novo assemblies for key wine species become available, the accuracy of both ITS-amplicon and shotgun studies will greatly increase, providing a powerful methodology for deciphering the influence of the microbial community on the wine flavor and aroma.

Genetic barcodes of arctic medusae and meiobenthic cnidarians have uncovered a fortuitous connection between the medusa Plotocnide borealis Wagner, 1885 and the minute, mud-dwelling polyp Boreohydra simplex Westblad, 1937. Little to no sequence differences exist among independently collected samples identified as Boreohydra simplex and Plotocnide borealis, showing that the two different forms represent a single species that is henceforth known by the older name Plotocnide borealis Wagner, 1885. The polyp form has been observed to produce bulges previously hypothesized to be gonophores, and the results here are consistent with that view. Interestingly, the polyp has also been reported to produce egg cells in the epiderm, a surprising phenomenon that we document here for only the second time. Thus, P. borealis produces eggs in two different life stages, polyp and medusa. This is the first documented case of a metagenetic medusozoan species being able to produce gametes in both the medusa and polyp stage. It remains unclear what environmental/ecological conditions modulate the production of eggs and/or medusa buds in the polyp stage. Similarly, sperm production, fertilization and development are unknown, warranting further studies.

The mosquito family (Diptera: Culicidae) constitutes the most medically important group of arthropods because certain species are vectors of human pathogens. In some parts of the world, the diversity is so high that the accurate delimitation and/or identification of species is challenging. A DNA-based identification system for all animals has been proposed, the so-called DNA barcoding approach. In this study, our objectives were (i) to establish DNA barcode libraries for the mosquitoes of French Guiana based on the COI and the 16S markers, (ii) to compare distance-based and tree-based methods of species delimitation to traditional taxonomy, and (iii) to evaluate the accuracy of each marker in identifying specimens. A total of 266 specimens belonging to 75 morphologically identified species or morphospecies were analyzed allowing us to delimit 86 DNA clusters with only 21 of them already present in the BOLD database. We thus provide a substantial contribution to the global mosquito barcoding initiative. Our results confirm that DNA barcodes can be successfully used to delimit and identify mosquito species with only a few cases where the marker could not distinguish closely related species. Our results also validate the presence of new species identified based on morphology, plus potential cases of cryptic species. We found that both COI and 16S markers performed very well, with successful identifications at the species level of up to 98% for COI and 97% for 16S when compared to traditional taxonomy. This shows great potential for the use of metabarcoding for vector monitoring and eco-epidemiological studies.

Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences ('mislabels') using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at

1. In recent years, large-scale DNA barcoding campaigns have generated an enormous amount of COI barcodes, which are usually stored in NCBI's GenBank and the official Barcode of Life database (BOLD). BOLD data are generally associated with more detailed and better curated meta-data, because a great proportion is based on expert-verified and vouchered material, accessible in public collections. In the course of the initiative German Barcode of Life (GBOL), data were generated for the reference library of 2,846 species of Coleoptera from 13,516 individuals.
2. Confronted with the high effort associated with the identification, verification and data validation, a bioinformatic pipeline, “TaxCI” was developed that i) identifies taxonomic inconsistencies in a given tree topology (optionally including a reference data set), ii) discriminates between different cases of incongruence in order to identify contamination or misidentified specimens, iii) graphically marks those cases in the tree, which finally can be checked again and, if needed, corrected or removed from the dataset. For this, “TaxCI” may use DNA-based species delimitations from other approaches (e.g., mPTP) or may perform implemented threshold-based clustering.
3. The data-processing pipeline was tested on a newly generated set of barcodes, using the available BOLD records as a reference. A data revision based on the first run of the TaxCI tool resulted in the second TaxCI analysis in a taxonomic match ratio very similar to the one recorded from the reference set (92 vs 94%). The revised dataset improved by nearly 20% through this procedure compared to the original, uncorrected one.
4. Overall, the new processing pipeline for DNA barcode data allows for the rapid and easy identification of inconsistencies in large datasets, which can be dealt with before submitting them to public data repositories like BOLD or GenBank. Ultimately, this will increase the quality of submitted data and the speed of data submission, while primarily avoiding the deterioration of the accuracy of the data repositories due to ambiguously identified or contaminated specimens.
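The kind of check described in point 2 can be illustrated with a small sketch. This is not the TaxCI pipeline. It assumes that every specimen already carries a morphological species name and a cluster ID from some DNA-based delimitation, and it simply reports clusters that contain more than one species name and species names that are spread over more than one cluster. The function name and the toy records are made up for illustration.

```python
from collections import defaultdict


def find_inconsistencies(records):
    """Compare species names with DNA-based clusters.

    `records` is an iterable of (specimen_id, species_name, cluster_id).
    Returns two dicts:
      merged: cluster_id -> species names sharing that cluster (more than one)
      split:  species_name -> cluster IDs that species falls into (more than one)
    """
    species_by_cluster = defaultdict(set)
    clusters_by_species = defaultdict(set)
    for _specimen, species, cluster in records:
        species_by_cluster[cluster].add(species)
        clusters_by_species[species].add(cluster)
    merged = {c: sorted(s) for c, s in species_by_cluster.items() if len(s) > 1}
    split = {s: sorted(c) for s, c in clusters_by_species.items() if len(c) > 1}
    return merged, split


# Hypothetical specimens: species_B turns up in two clusters, and cluster 1
# holds two names, which could point to a misidentification or contamination.
records = [
    ("spec1", "species_A", 1),
    ("spec2", "species_A", 1),
    ("spec3", "species_B", 1),
    ("spec4", "species_B", 2),
    ("spec5", "species_C", 3),
]
merged, split = find_inconsistencies(records)
print(merged)
print(split)
```

Flagged cases still need a human to look at them. The code only tells you where names and clusters disagree, not which of the two is wrong.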

Food trade globalization and the growing demand for selected food varieties have led to the intensification of adulteration cases, especially in the form of species substitution/mixing with cheaper taxa. This phenomenon has acquired a huge economic impact and sometimes even public health implications. DNA barcoding represents a well-proven molecular tool to assess the authenticity of food items, although its diffusion is hampered by analytical constraints and timeframes that are often prohibitive for the food market. To address such issues, we have introduced a new technology, named NanoTracer, which allows for rapid and naked-eye molecular traceability of any food, employing limited instrumentation and cost-effective reagents. Moreover, unlike sequencing, this method allows identification not only of the substitution of a fine ingredient, but also of its dilution with cheaper ones.

In this study, we used several molecular techniques to develop a fast and reliable protocol (DNA Verity Test, DVT) for the characterization and confirmation of the species or taxa present in herbal infusions. As a model plant for this protocol, Camellia sinensis, a traditional tea plant, was selected due to the following reasons: its historical popularity as a (healthy) beverage, its high selling value, the importation of barely recognizable raw product (i.e., crushed), and the scarcity of studies concerning adulterants or contamination. The DNA Verity Test includes both the sequencing of DNA barcoding markers and genotyping of labeled-PCR DNA barcoding fragments for each sample analyzed. This protocol (DVT) was successively applied to verify the authenticity of 32 commercial teas (simple or admixture), and the main results can be summarized as follows: (1) the DVT protocol is suitable to detect adulteration in tea matrices (contaminations or absence of certified ingredients), and the method can be exported for the study of other similar systems; (2) based on the BLAST analysis of the sequences of rbcL+matK±rps7-trnV(GAC) chloroplast markers, C. sinensis can be taxonomically characterized; (3) rps7-trnV(GAC) can be employed to discriminate C. sinensis from C. pubicosta; (4) ITS2 is not an ideal DNA barcode for tea samples, reflecting potential incomplete lineage sorting and hybridization/introgression phenomena in C. sinensis taxa; (5) the genotyping approach is an easy, inexpensive and rapid pre-screening method to detect anomalies in the tea templates using the trnH(GUG)-psbA barcoding marker; (6) two herbal companies provided no authentic products with a contaminant or without some of the listed ingredients; and (7) the leaf matrices present in some teabags could be constituted using an admixture of different C. sinensis haplotypes and/or allied species (C. pubicosta).

A large-scale comprehensive reference library of DNA barcodes for European marine fishes was assembled, allowing the evaluation of taxonomic uncertainties and species genetic diversity that were otherwise hidden in geographically restricted studies. A total of 4118 DNA barcodes were assigned to 358 species generating 366 Barcode Index Numbers (BIN). Initial examination revealed as many as 141 BIN discordances (more than one species in each BIN). After implementing an auditing and five-grade (A-E) annotation protocol, the number of discordant species BINs was reduced to 44 (13% grade E), while concordant species BINs amounted to 271 (78% grades A and B) and 14 others had insufficient data (grade D). Fifteen species displayed comparatively high intraspecific divergences ranging from 2·6 to 18·5% (grade C), which is biologically paramount information to be considered in fish species monitoring and stock assessment. On balance, this compilation contributed to the detection of 59 European fish species probably in need of taxonomic clarification or re-evaluation. The generalized implementation of an auditing and annotation protocol for reference libraries of DNA barcodes is recommended.
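For readers wondering how an intraspecific divergence like the ones quoted above is obtained, here is a minimal sketch. It uses the simple uncorrected p-distance (percentage of differing sites between two aligned sequences), which is an assumption on my part and not necessarily the distance measure used in the paper. The sequences and species names are made up.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Percentage of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous 'N' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


def max_intraspecific_divergence(barcodes):
    """Largest pairwise p-distance within each species.

    `barcodes` maps species name -> list of aligned sequences. Species with
    fewer than two sequences are left out of the result.
    """
    result = {}
    for species, seqs in barcodes.items():
        dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
        if dists:
            result[species] = max(dists)
    return result


# Toy alignment, far shorter than a real COI barcode.
barcodes = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["ACGTTCGTAC"],
}
print(max_intraspecific_divergence(barcodes))
```

A species whose maximum value is unusually high is a candidate for a closer taxonomic look, which is essentially what the grade C annotation in the abstract flags.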

Thursday, June 8, 2017

Kickstarter campaign

Save Coral Reefs: Costa Rica Calling

Have you ever heard of the Area de Conservacion Guanacaste (ACG)? As an ecologist or conservation biologist you might have, as a barcoder it is very likely you have. Home to 2.4% of the world’s terrestrial biodiversity, the ACG in Costa Rica is the only protected area in the Neotropics that sweeps from Pacific Ocean waters up over the volcanic mountain range of the continental divide and down into the lowlands of the Atlantic rain forest. In the mid-1980s, world renowned ecologist Dan Janzen and his wife and research partner, Winnie Hallwachs, had grown so alarmed at the rapid rate at which forests were disappearing in the region that they threw themselves into an immense conservation project, co-founded the ACG, and pretty much dedicated their lives to turning an area of 120,000 terrestrial hectares and 43,000 marine hectares into permanently conserved, government-owned, and managed wildlands. The ACG has also been the stage for a lot of groundbreaking work on DNA barcoding, starting with the famous case of Astraptes fulgerator, which was followed by numerous studies focusing on different aspects of the region and its biodiversity. Dan and Winnie are likely the strongest supporters of DNA barcoding outside our institute and a lot of the success of the past decade can be attributed to their tireless efforts to spread the word and convince colleagues of the value of DNA-based biodiversity research. When I was asked to spread the word about a Kickstarter campaign to support a new research project on corals in the ACG's marine sector, there was no hesitation to provide some modest help by posting on my blog: