Friday, June 29, 2018

Weekend reads

Long weekend for all the Canadians, which means more time to read and certainly no post on Monday.

Knowledge of community structure within an ecosystem is essential when trying to understand the function and importance of the system and when making related management decisions. Within the larger ecosystem, microhabitats play an important role by providing inhabitants with a subset of available resources. On coral reefs, cryptobenthic fishes encompass many groups and make up an important proportion of the biodiversity. However, these fishes are relatively small, exhibit extreme visual or behavioral camouflage, and, therefore, are often overlooked. We examined the differences in fish community structure between three common reef microhabitats (live hard coral, dead coral rubble, and sand) using ichthyocide stations in the central Red Sea. Using a combination of morphological and genetic (cytochrome oxidase I (COI) barcoding) techniques, we identified 326 individuals representing 73 species spread across 17 families, from fifteen 1 m² quadrats. Fish assemblages in the three microhabitats were significantly different from each other. Rubble microhabitats yielded the highest levels of fish abundance, richness, and diversity, followed by hard coral, and then sand. The results show that benthic composition, even at a small scale, influences cryptobenthic communities. This study also provides new COI sequence data to public databases, in order to further the research of cryptobenthic fishes in the Red Sea region.

The cytochrome c oxidase subunit I (cox1) gene is the main mitochondrial molecular marker playing a pivotal role in phylogenetic research and is a crucial barcode sequence. Folmer's "universal" primers designed to amplify this gene in metazoan invertebrates allowed quick and easy barcode and phylogenetic analysis. On the other hand, the increase in the number of studies on barcoding leads to more frequent publishing of incorrect sequences, due to amplification of non-target taxa, and insufficient analysis of the obtained sequences. Consequently, some sequences deposited in genetic databases are incorrectly described as obtained from invertebrates, while being in fact bacterial sequences. In our study, in which we used Folmer's primers to amplify COI sequences of the crustacean fairy shrimp Branchipus schaefferi (Fischer 1834), we also obtained COI sequences of microbial contaminants from Aeromonas sp. However, when we searched the GenBank database for sequences closely matching these contaminations we found entries described as representatives of Gastrotricha and Mollusca. When these entries were compared with other sequences bearing the same names in the database, the genetic distance between the incorrect and correct sequences amplified from the same species was ca. 65%. Although the responsibility for the correct molecular identification of species rests on researchers, the errors found in already published sequence data have not been re-evaluated so far. On the basis of the standard sampling technique we have estimated with 95% probability that the chances of finding incorrectly described metazoan sequences in GenBank depend on the systematic group, and vary from less than 1% (Mollusca and Arthropoda) up to 6.9% (Gastrotricha). Consequently, the increasing popularity of DNA barcoding and metabarcoding analysis may lead to overestimation of species diversity. Finally, the study also discusses the sources of the problems with amplification of non-target sequences.

DNA metabarcoding is increasingly used in dietary studies to estimate diversity, composition, and frequency of occurrence of prey items. However, few studies have assessed how technical and biological replication affect the accuracy of diet estimates. This study addresses these issues using the European free-tailed bat Tadarida teniotis, involving high-throughput sequencing of a small fragment of the COI gene in 15 separate faecal pellets and a 15-pellet pool per each of 20 bats. We investigated how diet descriptors were affected by variability among (i) individuals, (ii) pellets of each individual, and (iii) PCRs of each pellet. In addition, we investigated the impact of (iv) analysing separate pellets versus pellet pools. We found that diet diversity estimates increased steadily with the number of pellets analysed per individual, with seven pellets required to detect ~80% of prey species. Most variation in diet composition was associated with differences among individual bats, followed by pellets per individual, and PCRs per pellet. The accuracy of frequency of occurrence estimates increased with the number of pellets analysed per bat, with the highest error rates recorded for prey consumed infrequently by many individuals. Pools provided poor estimates of diet diversity and frequency of occurrence, which were comparable to analysing a single pellet per individual, and consistently missed the less common prey items. Overall, our results stress that maximizing biological replication is critical in dietary metabarcoding studies, and emphasize that analysing several samples per individual rather than pooled samples produces more accurate results.
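The pellet result above is essentially a species accumulation curve. Here is a rough sketch of how such a curve could be computed for one bat. It is not the authors' analysis: the function name `mean_richness` and the toy prey lists are made up for illustration, and the curve is simply the average number of prey taxa seen over every possible subset of k pellets.

```python
from itertools import combinations


def mean_richness(pellets, k):
    """Average number of distinct prey taxa found in any k pellets."""
    totals = [len(set().union(*combo)) for combo in combinations(pellets, k)]
    return sum(totals) / len(totals)


# Hypothetical prey detections in four pellets from a single bat.
pellets = [
    {"moth_a", "moth_b"},
    {"moth_a", "beetle"},
    {"moth_a", "lacewing"},
    {"moth_b"},
]

# Accumulation curve: expected richness for 1, 2, 3 and 4 pellets.
curve = [mean_richness(pellets, k) for k in range(1, len(pellets) + 1)]
print(curve)
```

With real data one would read off the smallest k at which the curve reaches the desired share of the total prey list, which is how a statement like "seven pellets for ~80% of prey species" can be derived.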

DNA metabarcoding is a technique used to survey biodiversity in many ecological settings, but there are doubts about whether it can provide quantitative results, i.e. the proportions of each species in the mixture as opposed to a species list. While there are several experimental studies that report quantitative metabarcoding results, there are a similar number that fail to do so. Here we provide the rationale to understand under what circumstances the technique can be quantitative. Basically, we simulate a mixture of DNA of S species with a defined initial abundance distribution. In the simulated PCR, each species increases its concentration following a certain amplification efficiency. The final DNA concentration will reflect the initial one when the efficiency is similar for all species; otherwise, the initial and final DNA concentrations would be poorly related. Although there are many known factors that modulate amplification efficiency, we focused on the number of primer-template mismatches, arguably the most important one. We used 15 common primer pairs targeting the mitochondrial COI region and the mitogenomes of ca. 1200 insect species. The results showed that some primer pairs produced quantitative results under most circumstances, whereas some other primers failed to do so. Many species, and a high diversity within the mixture, helped the metabarcoding to be quantitative. In conclusion, depending on the primer pair used in the PCR amplification and on the characteristics of the mixture analysed (i.e., high species richness, low evenness), DNA metabarcoding can provide a quantitative estimate of the relative abundances of different species.
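The efficiency argument is easy to play with numerically. The snippet below is only an illustration and not the authors' simulation: it assumes idealised exponential amplification in which each template grows by a factor of (1 + efficiency) per cycle, and the function name, the 30 cycles and the efficiency values are all invented for the example.

```python
def simulate_pcr(initial, efficiencies, cycles=30):
    """Relative abundances after idealised exponential amplification."""
    amplified = [n0 * (1.0 + e) ** cycles for n0, e in zip(initial, efficiencies)]
    total = sum(amplified)
    return [a / total for a in amplified]


# Hypothetical starting proportions of three species in the DNA mixture.
initial = [0.5, 0.3, 0.2]

# Same efficiency for every species: proportions are preserved.
equal = simulate_pcr(initial, [0.9, 0.9, 0.9])

# Slightly different efficiencies (e.g. due to primer mismatches):
# the final proportions no longer mirror the initial ones.
skewed = simulate_pcr(initial, [0.9, 0.8, 0.95])

print(equal)
print(skewed)
```

In the second run the species that started as the rarest overtakes the one that started second, which is the kind of distortion the abstract describes when efficiencies differ between species.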

Marine meiofauna comprises up to 22 phyla. Its morphological identification requires time and taxonomists' expertise, and molecular tools can make this task faster. We aim to disentangle meiofaunal diversity patterns at Araçá Bay by applying a model selection approach and estimating the effectiveness of metabarcoding (18S rDNA) and morphological methods for estimating the response of meiofauna diversity in small-scale interactions with environmental variables. A rarefaction curve indicated that ten samples were sufficient for estimating the total number of meiofauna OTUs in a tidal flat. In both approaches, richness was predicted by mean sand percentage, sediment sorting, and bacteria concentration. Nematode genera composition differed significantly between approaches, the result of taxonomic mismatch in the genetic database. The similarity between the model selected for diversity descriptors, the richness of nematode genera and meiofauna composition emphasized the utility of predictive models for metabarcoding estimates to detect small-scale interactions of these organisms.

Background: In light of the current biodiversity crisis, DNA barcoding is developing into an essential tool to quantify state shifts in global ecosystems. Current barcoding protocols often rely on short amplicon sequences, which yield accurate identification of biological entities in a community, but provide limited phylogenetic resolution across broad taxonomic scales. However, the phylogenetic structure of communities is an essential component of biodiversity. Consequently, a barcoding approach is required that unites robust taxonomic assignment power and high phylogenetic utility. A possible solution is offered by sequencing long ribosomal DNA (rDNA) amplicons on the MinION platform (Oxford Nanopore Technologies). Results: Using a dataset of various animal and plant species, with a focus on arthropods, we assemble a pipeline for long rDNA barcode analysis and introduce a new software (MiniBar) to demultiplex dual indexed nanopore reads. We find excellent phylogenetic and taxonomic resolution offered by long rDNA sequences across broad taxonomic scales. We highlight the simplicity of our approach by field barcoding with a miniaturized, mobile laboratory in a remote rainforest. We also test the utility of long rDNA amplicons for analysis of community diversity through metabarcoding and find that they recover highly skewed diversity estimates. Conclusions: Sequencing dual indexed, long rDNA amplicons on the MinION platform is a straightforward, cost effective, portable and universal approach for eukaryote DNA barcoding. Long rDNA amplicons scale up DNA barcoding by enabling the accurate recovery of taxonomic and phylogenetic diversity. However, bulk community analyses using long-read approaches may introduce biases and will require further exploration.

Background. Knowledge on the globally outstanding Amazonian biodiversity and its environmental determinants stems almost exclusively from aboveground organisms, notably plants. In contrast, the environmental factors and habitat preferences that drive diversity patterns for micro-organisms in the ground remain elusive, despite the fact that micro-organisms constitute the overwhelming majority of life forms in any given location, in terms of both diversity and abundance. Here we address how the diversity and community turnover of operational taxonomic units (OTU) of micro-organisms in soil and litter respond to soil physicochemical properties; whether OTU diversities and community composition in soil and litter are correlated with each other; and whether they respond in a similar way to soil properties. Methods. We used recently inferred OTUs from high-throughput metabarcoding of the 16S (prokaryotes) and 18S (eukaryotes) genes to estimate OTU diversity (OTU richness and effective number of OTUs) and community composition for prokaryotes and eukaryotes in soil and litter across four localities in Brazilian Amazonia. All analyses were run separately for prokaryote and eukaryote OTUs, and for each group using both presence-absence and abundance data. Combining these with novel data on soil chemical and physical properties, we identify abiotic correlates of soil and litter micro-organism diversity and community structure using regression, ordination, and variance partitioning analysis. Results. Soil organic carbon content was the strongest factor explaining OTU diversity (negative correlation) and pH was the strongest factor explaining turnover for prokaryotes and eukaryotes in both soil and litter. We found significant effects also for other soil variables, including both chemical and physical properties. 
The correlation between OTU diversity in litter and in soil was non-significant for eukaryotes and weak for prokaryotes, suggesting that diversity in one substrate should not be used as a proxy for diversity in the other. The community compositions of both prokaryotes and eukaryotes were more separated for habitat type than for substrate (soil and litter). Discussion. In spite of the limited sampling (four localities, 39 plots), our results provide a broad-scale view of the physical and chemical correlations of soil and litter biodiversity in a longitudinal transect across the world’s largest rainforest. Our methods help to understand links between soil properties, OTU diversity patterns, and community composition and turnover. The lack of strong correlation between OTU diversity in litter and in soil suggests independence of diversity drivers of these substrates and highlights the importance of including both measures in biodiversity assessments. Massive sequencing of soil and litter samples holds the potential to complement traditional biological inventories in advancing our understanding of the factors affecting tropical diversity.

Constructing networks has become an indispensable approach in understanding how different taxa interact. However, methodologies vary widely among studies, potentially limiting our ability to meaningfully compare results. In particular, how network architecture is influenced by the extent to which nodes are resolved to either taxa or taxonomic units is poorly understood. To address this, here we collate nine datasets of ecological interactions, from both observations and DNA metabarcoding, and construct networks under a range of commonly-used node resolutions. We demonstrate that small changes in node resolution can cause wide variation in almost all key metric values, including robustness and nestedness. Moreover, relative values of metrics such as robustness were seen to fluctuate continuously with node resolution, thereby potentially confounding comparisons of networks, as well as interpretations concerning their constituent ecological interactions. These findings highlight the need for care when comparing networks, especially where these differ with respect to node resolution.

Tuesday, June 26, 2018

Three opportunities to learn about metabarcoding

In case you are looking for ways to learn about metabarcoding, there are actually three different courses offered this year. All of them differ in approach, content focus, and venue, but what they have in common is that they provide participants with a comprehensive package that enables them to make informed decisions when it comes to organizing experiments, field work and analytics.

This program will provide an overview of the state of current technology and the various platforms used. The course consists of a series of online lectures and research exercises introducing different aspects of metabarcoding and environmental DNA research. We will also touch on the suite of bioinformatics tools available for sequence analysis and data interpretation.

This course will focus on eDNA metabarcoding; however, targeted single-species detection and other alternatives will also be explored, as they can sometimes be suitable alternatives to metabarcoding.

The lectures will cover different aspects of DNA metabarcoding. The bioinformatics practicals will introduce data analysis from raw sequences to basic ecological conclusions. The molecular ecology practical will present basic techniques for DNA extraction in the field and DNA amplification by PCR.

Monday, June 25, 2018

DNA barcoding for pollen forecasting

PollerGEN is a group of interdisciplinary researchers funded by NERC to understand grass pollen deposition. We aim to revolutionise the way that pollen is measured, model spatial and temporal deposition from different species of grass pollen and identify linkages to human health.

DNA barcoding of pollen is not a new invention. It is not easy either but has been shown to provide extremely valuable information, e.g. for understanding plant-pollinator interactions, honey bee foraging, or the characterization of honeybee pollen pellets. It should come as no surprise that researchers are also working on an application that intends to improve a forecasting system that has become more and more important for a large portion of the human population - pollen forecasting for hay fever and other allergic reactions.

At this point most forecasts are built using data from a network of pollen traps which operate throughout the main pollen seasons. These traps measure how many pollen grains are present on a daily basis and identifications of species are done using morphology-based methods. The latter is extremely challenging when it comes to species with very uniform appearance, e.g. grasses. However, the species identity often makes a big difference. It is fairly rare that somebody is allergic to all grass pollen, but we have difficulty telling which pollen in the mix is the culprit.

PollerGEN, a project run out of Bangor University, wants to change this by using a DNA-based approach.

The colleagues are now working on a way to detect airborne pollen from different species of allergenic grass. We’re also developing new pollen source maps, and modelling how pollen grains likely move across landscapes, as well as identifying which species are linked with the exacerbation of asthma and hay fever.

We’re going to be using a new UK plant DNA barcode library, as well as environmental genomic technologies to identify complex mixtures of tree and grass pollens from a molecular genetic perspective. By combining this information with detailed source maps and aerobiological modelling, we hope to redefine how pollen forecasts are measured and reported in the future.

We have just started the third year of pollen collection and hope to road test the combined forecasting methods over the next year. In the long run, our vision is to be able to provide specific pollen forecasts for grass, and unravel which species of grass pollen are most likely causing allergic responses. More broadly, we also want to provide information to healthcare professionals and charities, who can translate this information to help pollen allergy sufferers live healthier and more productive lives.

Pretty cool.

Friday, June 22, 2018

Weekend reads

Here we go again, another week has passed quickly. Light on posting, mainly because I had some days off and no chance to do digging for blog posts. Nevertheless, here is your weekly dose of interesting papers. Really good stuff.

Genetic taxonomic assignment can be more sensitive than morphological taxonomic assignment, particularly for small, cryptic or rare species. Sequence processing is essential to taxonomic assignment, but can also produce errors because optimal parameters are not known a priori. Here, we explored how sequence processing parameters influence taxonomic assignment of 18S sequences from bulk zooplankton samples produced by 454 pyrosequencing. We optimized a sequence processing pipeline for two common research goals, estimation of species richness and early detection of aquatic invasive species (AIS), and then tested most optimal models' performances through simulations. We tested 1,050 parameter sets on 18S sequences from 20 AIS to determine optimal parameters for each research goal. We tested optimized pipelines' performances (detectability and sensitivity) by computationally inoculating sequences of 20 AIS into ten bulk zooplankton samples from ports across Canada. We found that optimal parameter selection generally depends on the research goal. However, regardless of research goal, we found that metazoan 18S sequences produced by 454 pyrosequencing should be trimmed to 375-400 bp and sequence quality filtering should be relaxed (1.5 ≤ maximum expected error ≤ 3.0, Phred score = 10). Clustering and denoising were only viable for estimating species richness, because these processing steps made some species undetectable at low sequence abundances which would not be useful for early detection of AIS. With parameter sets optimized for early detection of AIS, 90% of AIS were detected with fewer than 11 target sequences, regardless of whether clustering or denoising was used. Despite developments in next-generation sequencing, sequence processing remains an important issue owing to difficulties in balancing false-positive and false-negative errors in metabarcoding data.
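For readers who have not met the "maximum expected error" filter mentioned above: the expected number of errors in a read is the sum of the per-base error probabilities, each obtained from the Phred score Q as 10^(-Q/10). The sketch below shows the idea only. It is not the pipeline from the paper; the function names and the toy reads are hypothetical, while the 400 bp trim length and the maximum expected error of 3.0 are borrowed from the ranges quoted in the abstract.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores, max_ee=3.0, trim_to=400):
    """Trim the read to trim_to bases, then apply the expected-error cutoff."""
    trimmed = phred_scores[:trim_to]
    return expected_errors(trimmed) <= max_ee


# Hypothetical reads: 450 bases of uniform quality.
good_read = [30] * 450   # about 0.4 expected errors after trimming
poor_read = [20] * 450   # about 4.0 expected errors after trimming

print(passes_filter(good_read))
print(passes_filter(poor_read))
```

Relaxing the filter simply means raising `max_ee`, so that more reads with a few likely errors are kept, which matters when the target species is represented by only a handful of sequences.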

DNA metabarcoding has been introduced as a revolutionary way to identify organisms and monitor ecosystems. However, the potential of this approach for biomonitoring remains partially unfulfilled because a significant part of the sampled DNA cannot be affiliated to species due to incomplete reference libraries. Thus, biotic indices which are based on the estimated abundances of species in a community and their ecological profiles can be inaccurate. We propose to compute biotic indices using phylogenetic imputation of OTUs' ecological profiles (OTU-PITI approach). Firstly, OTUs sequences are inserted within a reference phylogeny. Secondly, OTUs' ecological profiles are estimated on the basis of their phylogenetic relationships with reference species whose ecology is known. Based on these ecological profiles, biotic indices can be computed using all available OTUs. Using freshwater diatoms as a case study, we show that short DNA barcodes can be placed accurately within a phylogeny and their ecological preferences estimated with a satisfactory level of precision. In light of these results, we tested the approach with a dataset of 139 environmental samples of benthic river diatoms for which the same biotic index (IPS) was calculated using (i) traditional microscopy, (ii) OTUs with taxonomic assignment approach, (iii) OTUs with phylogenetic estimation of ecological profiles (OTU-PITI), and (iv) OTU with taxonomic assignment completed by the phylogenetic approach (OTU-PITI) for unclassified OTUs. Using traditional microscopy as a reference, we found that the combination of the OTUs' taxonomic assignment completed by the phylogenetic method performed satisfactorily and substantially better than the other methods tested.

BACKGROUND: High throughput DNA sequencing of bulk invertebrate samples or metabarcoding is becoming increasingly used to provide profiles of biological communities for environmental monitoring. As metabarcoding becomes more widely applied, new reference DNA barcodes linked to individual specimens identified by taxonomists are needed. This can be achieved through using DNA extraction methods that are not only suitable for metabarcoding but also for building reference DNA barcode libraries.
METHODS: In this study, we test the suitability of a rapid non-destructive DNA extraction method for metabarcoding of freshwater invertebrate samples.
RESULTS: This method resulted in detection of taxa from many taxonomic groups, comparable to results obtained with two other tissue-based extraction methods. Most taxa could also be successfully used for subsequent individual-based DNA barcoding and taxonomic identification. The method was successfully applied to field-collected invertebrate samples stored for taxonomic studies in 70% ethanol at room temperature, a commonly used storage method for freshwater samples.
DISCUSSION: With further refinement and testing, non-destructive extraction has the potential to rapidly characterise species biodiversity in invertebrate samples, while preserving specimens for taxonomic investigation.

Marine plankton populate 70% of Earth's surface, providing the energy that fuels ocean food webs and contributing to global biogeochemical cycles. Plankton communities are extremely diverse and geographically variable, and are overwhelmingly composed of low-abundance species. The role of this rare biosphere and its ecological underpinnings are however still unclear. Here, we analyse the extensive dataset generated by the Tara Oceans expedition for marine microbial eukaryotes (protists) and use an adaptive algorithm to explore how metabarcoding-based abundance distributions vary across plankton communities in the global ocean. We show that the decay in abundance of non-dominant operational taxonomic units, which comprise over 99% of local richness, is commonly governed by a power-law. Despite the high spatial turnover in species composition, the power-law exponent varies by less than 10% across locations and shows no biogeographical signature, but is weakly modulated by cell size. Such striking regularity suggests that the assembly of plankton communities in the dynamic and highly variable ocean environment is governed by large-scale ubiquitous processes. Understanding their origin and impact on plankton ecology will be important for evaluating the resilience of marine biodiversity in a changing ocean.

MOTIVATION: Correct taxonomic identification of DNA sequences is central to studies of biodiversity using both shotgun metagenomic and metabarcoding approaches. However, no genetic marker gives sufficient performance across all the biological kingdoms, hampering studies of taxonomic diversity in many groups of organisms. This has led to the adoption of a range of genetic markers for DNA metabarcoding. While many taxonomic classification software tools can be re-trained on these genetic markers, they are often designed with assumptions that impair their utility on genes other than the SSU and LSU rRNA. Here, we present an update to Metaxa2 that enables the use of any genetic marker for taxonomic classification of metagenome and amplicon sequence data.
RESULTS: We evaluated the Metaxa2 Database Builder on eleven commonly used barcoding regions and found that while there are wide differences in performance between different genetic markers, our software performs satisfactorily provided that the input taxonomy and sequence data are of high quality.
AVAILABILITY: Freely available on the web as part of the Metaxa2 package at

BACKGROUND: The world's herbaria contain millions of specimens, collected and named by thousands of researchers, over hundreds of years. However, this treasure has remained largely inaccessible to genetic studies, because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates.
RESULTS: As a practical test of routine recovery of rDNA and plastid genome sequences from herbarium specimens, we sequenced 25 herbarium specimens up to 80 years old from 16 different Angiosperm families. Paired-end reads were generated, yielding successful plastid genome assemblies for 23 species and nuclear rDNAs for 24 species, respectively. These data showed that genome skimming can be used to generate genomic information from herbarium specimens as old as 80 years and using as little as 500 pg of degraded starting DNA.
CONCLUSIONS: The routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome-enrichment approaches), and can be performed with limited sample destruction.

The increasing popularity of cytochrome c oxidase subunit 1 (COI) DNA metabarcoding warrants a careful look at the underlying reference databases used to make high-throughput taxonomic assignments. The objectives of this study are to document trends and assess the future usability of COI records for metabarcode identification. Over 2.5 million COI sequences were found in GenBank, half of which were fully identified to the species rank. From 2003 to 2017, the number of COI Eukaryote records deposited has grown by two orders of magnitude representing a nearly 42-fold increase in unique species. For fully identified records, 92% are at least 500 bp in length, 74% have a country annotation, and 51% have latitude-longitude annotations. To ensure the future usability of COI records in GenBank we suggest: 1) Improving the geographic representation of COI records, 2) Improving the cross-referencing of COI records in the Barcode of Life Data System and GenBank to facilitate consolidation and incorporation into existing bioinformatic pipelines, 3) Adherence to the minimum information about a marker gene sequence guidelines, and 4) Integrating metabarcodes from eDNA and mixed community studies with existing sequences. COI metabarcoders are normally considered consumers of taxonomic data. Here we discuss the potential for taxonomists to reverse this pattern and instead mine metabarcode data to guide species discovery. The growth of COI reference records over the past 15 years has been substantial and is likely to be a resource across many fields for years to come.

Thursday, June 14, 2018


Ever seen anything in relation to the hashtag #BadStockPhotosOfMyJob? If not, you should check out Twitter or search for it on Google because it really shows some ridiculously funny photos that exhibit some of the worst stereotypes people have when thinking about others' jobs. Especially the perception of what scientists do is almost tragic. I thought it's a good idea to show a few examples including ironic comments by the real scientists. It's funny indeed but sometimes also just sad to see what others think we scientists do for a living.


  I have no words for those four.

Wednesday, June 13, 2018

Interview with a vampire

In this study, we show for the first time that it is possible to use DNA meta-barcoding to generate data on both diet and the predator's population structure. And we more or less get this additional information for free because the vampire bat's DNA is found in the DNA that we extract from blood meal and faecal samples.

When the sun sets in South and Central America, the vampire bats wake up and fly out in search of prey. The vampire bat's diet consists of blood. It prefers to feed on domestic animals such as cows and pigs, but when it does so, there is a risk of transmitting pathogens such as rabies. In order to control rabies transmitted by vampire bats, it is crucial to have a method that allows large-scale assessment of vampire bat prey. A study published back in April, led by researchers from Denmark and the UK, shows that metabarcoding can do just that.

The colleagues analysed vampire bat blood meal and faecal samples collected in Peru, along the coast, in the Andes and in the Amazon. In diet studies, metabarcoding is normally only used to assess diet, but in this study, the researchers went one step further and gathered information on the vampire bat's population structure. The latter is an approach very similar to work my group has been doing in collaboration with researchers in Germany. This 'free of charge' data can help researchers understand how the landscape influences the connectivity of vampire bat populations, which could influence the spread of pathogens.

We are slowly beginning to understand that all the metabarcoding data we generate to better understand community composition of a given environment contains several layers of information. It is perhaps much richer than an OTU table. That being said, it is an entirely different story how to release, let alone disentangle, all that information.

It is great to gain insight into both predator and prey from DNA in droppings and blood meals. Apart from feeding on domestic animals, vampire bats occasionally took blood from wild tapirs, so the method may be useful for determining the distribution of elusive mammal prey. It is also of note that we found no evidence of vampire bats feeding on humans from the DNA left over from their dinners.

Tuesday, June 12, 2018

Citizen science vs giant slugs

Citizen science is a powerful tool to combat the challenges created by invasive species. Our study emphasizes the importance of collaborations between researchers, government administration, and citizen volunteers. 

The giant slug Limax maximus is an invasive species which made its way from northern Europe all the way to Japan and other regions of the world. It is a notorious pest of horticultural and agricultural crops. 

Recently a Japanese research team found that a certain set of weather conditions could be a reliable short-term indicator of how often giant slugs would appear on a set mountain path. The findings showed that the slugs were more likely to appear on days with higher humidity, lower windspeed and lower precipitation than the 20-year average. These observations can be used to predict future outbreaks of the pest.

This study was actually made possible by citizen science. In order to survey the number of slugs present on the mountain path chosen for the study (Mt. Maruyama route, in Sapporo, Japan), a volunteer naturalist hiked the path at 5:00 AM nearly every day for two years. The colleagues collected weather data obtained from a nearby meteorological station and combined them with observational data to calculate correlations between slug appearances and complex weather conditions.

Friday, June 8, 2018

Weekend readings

Need some readings for a sunny weekend? Not enough papers on the pile on your desk? Here is a solution for you. A couple of interesting journal articles I came across this week. Enjoy.

The genus Amara Bonelli, 1810 is a very speciose and taxonomically difficult genus of the Carabidae. The identification of many of the species is accomplished with considerable difficulty, in particular for females and immature stages. In this study the effectiveness of DNA barcoding, the most popular method for molecular species identification, was examined to discriminate various species of this genus from Central Europe. DNA barcodes from 690 individuals and 47 species were analysed, including sequences from previous studies and more than 350 newly generated DNA barcodes. Our analysis revealed unique BINs for 38 species (81%). Interspecific K2P distances below 2.2% were found for three species pairs and one species trio, including haplotype sharing between Amara alpina/Amara torrida and Amara communis/Amara convexior/Amara makolskii. This study represents another step in generating an extensive reference library of DNA barcodes for carabids, highly valuable bioindicators for characterizing disturbances in various habitats.
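
For readers who have not met the K2P distance mentioned in this abstract: it is the Kimura 2-parameter distance, which treats transitions and transversions separately. Below is a minimal sketch of the standard formula for two aligned sequences. The function name and the choice to skip gaps and ambiguous bases are my own assumptions, not taken from the paper.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    # Standard K2P formula; math.log raises ValueError for extremely
    # divergent pairs where the argument is not positive.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A pair of barcodes would fall under the 2.2% threshold quoted above when `k2p_distance(a, b) < 0.022`.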

The correct identification of species in the highly divergent group of plants is crucial for several forensic investigations. Previous works had difficulties in the establishment of a rapid and robust method for the identification of plants. For instance, DNA barcoding requires the analysis of two or three different genomic regions to attain reasonable levels of discrimination. Therefore, new methods for the molecular identification of plants are clearly needed. Here we tested the utility of variable-length sequences in the chloroplast DNA (cpDNA) as a way to identify plant species. The SPInDel (Species Identification by Insertions/Deletions) approach targets hypervariable genomic regions that contain multiple insertions/deletions (indels) and length variability, which are found interspersed with highly conserved regions. The combination of fragment lengths defines a unique numeric profile for each species, allowing its identification. We analysed more than 44,000 sequences retrieved from public databases belonging to 206 different plant families. Four target regions were identified as suitable for the SPInDel concept: atpF-atpH, psbA-trnH, trnL CD and trnL GH. When considered alone, the discrimination power of each region was low, varying from 5.18% (trnL GH) to 42.54% (trnL CD). However, the discrimination power reached more than 90% when the length of some of these regions is combined. We also observed low diversity in intraspecific data sets for all target regions, suggesting they can be used for identification purposes. Our results demonstrate the utility of the SPInDel concept for the identification of plants.
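
The idea of a numeric length profile is easy to sketch. In the toy example below the four region names come from the abstract, but the species names and all fragment lengths are invented placeholders, and `unique_fraction` is only my rough stand-in for "discrimination power" (the paper may define it differently).

```python
from collections import defaultdict

REGIONS = ("atpF-atpH", "psbA-trnH", "trnL CD", "trnL GH")

# Hypothetical fragment lengths in bp, one value per region in REGIONS.
# These numbers are made up for illustration only.
reference = {
    "species_A": (512, 340, 301, 48),
    "species_B": (512, 355, 301, 48),
    "species_C": (498, 340, 310, 52),
    "species_D": (498, 340, 310, 52),
}


def identify(profile, reference):
    """Return all reference species whose length profile matches exactly."""
    return sorted(name for name, ref in reference.items() if ref == tuple(profile))


def unique_fraction(reference, region_indices):
    """Fraction of species with a profile shared by no other species,
    using only the regions listed in region_indices."""
    groups = defaultdict(list)
    for name, profile in reference.items():
        key = tuple(profile[i] for i in region_indices)
        groups[key].append(name)
    unique = sum(1 for names in groups.values() if len(names) == 1)
    return unique / len(reference)
```

In this toy data the first region alone separates nothing (`unique_fraction(reference, [0])` is 0.0), while adding the second region makes half of the species unique. That mirrors the abstract's point that single regions discriminate poorly but combinations do much better.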

Environmental DNA (eDNA) metabarcoding has been increasingly applied to biodiversity surveys in stream ecosystems. In stream networks, the accuracy of eDNA-based biodiversity assessment depends on whether the upstream eDNA influx affects downstream detection. Biodiversity assessment in low-discharge streams should be less influenced by eDNA transport than in high-discharge streams. We estimated α- and β-diversity of the fish community from eDNA samples collected in a small Michigan (USA) stream from its headwaters to its confluence with a larger river. We found that α-diversity increased from upstream to downstream and, as predicted, we found a significant positive correlation between β-diversity and physical distance (stream length) between locations indicating species turnover along the longitudinal stream gradient. Sample replicates and different genetic markers showed similar species composition, supporting the consistency of the eDNA metabarcoding approach to estimate α- and β-diversity of fishes in low-discharge streams.
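
As a rough illustration of the two diversity measures in this abstract, here is a sketch that takes α-diversity as species richness per site and β-diversity as pairwise Jaccard dissimilarity, then correlates the latter with stream distance. These metric choices, the site positions and the species labels are my assumptions. The sketch also does not test for significance, which would need something like a Mantel test.

```python
from itertools import combinations
import math

# Hypothetical sites: position along the stream (km) -> species detected
sites = {
    0.0: {"a", "b"},
    1.0: {"a", "b", "c"},
    2.0: {"b", "c", "d", "e"},
}


def alpha_diversity(species):
    """Species richness at one site."""
    return len(species)


def jaccard_dissimilarity(x, y):
    """1 - |intersection| / |union| for two species sets."""
    return 1 - len(x & y) / len(x | y)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


positions = sorted(sites)
pairs = list(combinations(positions, 2))
distances = [abs(p2 - p1) for p1, p2 in pairs]
betas = [jaccard_dissimilarity(sites[p1], sites[p2]) for p1, p2 in pairs]
r = pearson(distances, betas)
```

A positive `r` here corresponds to the pattern the authors report: sites further apart along the stream share fewer species.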

The use of environmental DNA (eDNA) has become an applicable non-invasive tool with which to obtain information about biodiversity. A sub-discipline of eDNA is iDNA (invertebrate-derived DNA), where genetic material ingested by invertebrates is used to characterise the biodiversity of the species that served as hosts. While promising, these techniques are still in their infancy, as they have only been explored on limited numbers of samples from only a single or a few different locations. In this study, we investigate the suitability of iDNA extracted from more than 3,000 haematophagous terrestrial leeches as a tool for detecting a wide range of terrestrial vertebrates across five different geographical regions on three different continents. These regions cover almost the full geographical range of haematophagous terrestrial leeches, thus representing all parts of the world where this method might apply. We identify host taxa through metabarcoding coupled with high-throughput sequencing on Illumina and IonTorrent sequencing platforms to decrease economic costs and workload and thereby make the approach attractive for practitioners in conservation management. We identified hosts in four different taxonomic vertebrate classes: mammals, birds, reptiles, and amphibians, belonging to at least 42 different taxonomic families. We find that vertebrate blood ingested by haematophagous terrestrial leeches throughout their distribution is a viable source of DNA with which to examine a wide range of vertebrates. Thus, this study provides encouraging support for the potential of haematophagous terrestrial leeches as a tool for detecting and monitoring terrestrial vertebrate biodiversity.

Advances in DNA sequencing technology have revolutionised the field of molecular analysis of trophic interactions and it is now possible to recover counts of food DNA sequences from a wide range of dietary samples. But what do these counts mean? To obtain an accurate estimate of a consumer's diet should we work strictly with datasets summarising frequency of occurrence of different food taxa, or is it possible to use relative number of sequences? Both approaches are applied to obtain semi-quantitative diet summaries, but occurrence data is often promoted as a more conservative and reliable option due to taxa-specific biases in recovery of sequences. We explore representative dietary metabarcoding datasets and point out that diet summaries based on occurrence data often overestimate the importance of food consumed in small quantities (potentially including low-level contaminants) and are sensitive to the count threshold used to define an occurrence. Our simulations indicate that using relative read abundance (RRA) information often provide a more accurate view of population-level diet even with moderate recovery biases incorporated; however, RRA summaries are sensitive to recovery biases impacting common diet taxa. Both approaches are more accurate when the mean number of food taxa in samples is small. The ideas presented here highlight the need to consider all sources of bias and to justify the methods used to interpret count data in dietary metabarcoding studies. We encourage researchers to continue addressing methodological challenges, and acknowledge unanswered questions to help spur future investigations in this rapidly developing area of research.
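
The difference between the two kinds of summary is easiest to see with numbers. The sketch below computes frequency of occurrence and mean relative read abundance (RRA) from per-sample read counts. The function name, the `min_reads` parameter and the toy counts are my own, not from the paper.

```python
def diet_summaries(samples, min_reads=1):
    """Return (frequency of occurrence, mean relative read abundance)
    per food taxon. `samples` is a list of dicts mapping taxon -> read
    count; every sample must contain at least one read."""
    taxa = sorted({taxon for sample in samples for taxon in sample})
    n = len(samples)
    foo, rra = {}, {}
    for taxon in taxa:
        # occurrence: share of samples with at least min_reads of the taxon
        foo[taxon] = sum(1 for s in samples if s.get(taxon, 0) >= min_reads) / n
        # RRA: per-sample proportion of reads, averaged over samples
        rra[taxon] = sum(s.get(taxon, 0) / sum(s.values()) for s in samples) / n
    return foo, rra


# Hypothetical read counts for three scat samples
samples = [
    {"fish": 990, "squid": 10},
    {"fish": 950, "squid": 50},
    {"fish": 1000},
]

foo, rra = diet_summaries(samples)
foo_strict, _ = diet_summaries(samples, min_reads=20)
```

With these toy counts, squid occurs in two of three samples but makes up only about 2% of reads, and raising the occurrence threshold to 20 reads drops it to one of three samples. Those are exactly the two sensitivities of occurrence data that the authors point out.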

DNA metabarcoding is a rapidly growing technique for obtaining detailed dietary information. Current metabarcoding methods for herbivory, using a single locus, can lack taxonomic resolution for some applications. We present novel primers for the second internal transcribed spacer of nuclear ribosomal DNA (ITS2) designed for dietary studies in Mauritius and the UK, which have the potential to give unrivalled taxonomic coverage and resolution from a short-amplicon barcode. In silico testing used three databases of plant ITS2 sequences from UK and Mauritian floras (native and introduced) totalling 6561 sequences from 1790 species across 174 families. Our primers were well-matched in silico to 88% of species, providing taxonomic resolution of 86.1%, 99.4% and 99.9% at the species, genus and family levels, respectively. In vitro, the primers amplified 99% of Mauritian (n = 169) and 100% of UK (n = 33) species, and co-amplified multiple plant species from degraded faecal DNA from reptiles and birds in two case studies. For the ITS2 region, we advocate taxonomic assignment based on best sequence match instead of a clustering approach. With short amplicons of 187-387 bp, these primers are suitable for metabarcoding plant DNA from faecal samples, across a broad geographic range, whilst delivering unparalleled taxonomic resolution.

The implementation of HTS (high-throughput sequencing) approaches is rapidly changing our understanding of the lichen symbiosis, by uncovering high bacterial and fungal diversity, which is often host-specific. Recently, HTS methods revealed the presence of multiple photobionts inside a single thallus in several lichen species. This differs from Sanger technology, which typically yields a single, unambiguous algal sequence per individual. Here we compared HTS and Sanger methods for estimating the diversity of green algal symbionts within lichen thalli using 240 lichen individuals belonging to two species of lichen-forming fungi. According to HTS data, Sanger technology consistently yielded the most abundant photobiont sequence in the sample. However, if the second most abundant photobiont exceeded 30% of the total HTS reads in a sample, Sanger sequencing generally failed. Our results suggest that most lichen individuals in the two analyzed species, Lasallia hispanica and L. pustulata, indeed contain a single, predominant green algal photobiont. We conclude that Sanger sequencing is a valid approach to detect the dominant photobionts in lichen individuals and populations. We discuss which research areas in lichen ecology and evolution will continue to benefit from Sanger sequencing, and which areas will profit from HTS approaches to assessing symbiont diversity.

Thursday, June 7, 2018

Who owns ocean biodiversity?

Within national jurisdiction, the Nagoya Protocol protects countries from exploitative bioprospecting, and is meant to foster greater equity. But there's a huge missing piece, because two-thirds of the ocean exists beyond national jurisdiction. That's roughly half the Earth's surface with no regulations on accessing or using genetic resources.

Marine organisms have evolved to thrive in various ocean environments, resulting in unique adaptations that make them the object of commercial interest, particularly for biomedical and industrial applications. Researchers from the Stockholm Resilience Centre and University of British Columbia have now identified 862 marine species, with a total of 12,998 genetic sequences that are associated with a patent. They found that a single transnational corporation (BASF, the world's largest chemical manufacturer) has registered 47% of these sequences. Public and private universities accounted for another 12%, while entities such as governmental bodies, individuals, hospitals, and nonprofit research institutes registered the remaining 4%. Overall, entities located in only 10 countries accounted for 98% of the patents. 

A considerable portion of all patent sequences (11%) are derived from species associated with deep sea and hydrothermal vent ecosystems (91 species, 1650 sequences), many of which are found in unregulated areas beyond national jurisdiction.

Establishing a legal framework for marine genetic resources will be a core issue when international negotiations on a new UN treaty on the conservation and sustainable use of biodiversity in areas beyond national jurisdiction (BBNJ) begin in earnest in September 2018. By 2025, the global market for marine biotechnology is expected to reach $6.4 billion and span a broad range of commercial purposes for pharmaceutical, biofuel, and chemical industries. It is clear that these industry leaders must be involved in the upcoming BBNJ treaty negotiations, if only by virtue of their ownership of such a large share of the marine genetic sequence patents.

Wednesday, June 6, 2018

Deep learning to identify and count wild animals

This technology lets us accurately, unobtrusively and inexpensively collect wildlife data, which could help catalyze the transformation of many fields of ecology, wildlife biology, zoology, conservation biology and animal behavior into 'big data' sciences. This will dramatically improve our ability to both study and conserve wildlife and precious ecosystems.

Motion sensor 'camera traps' unobtrusively take pictures of animals in their natural environment, oftentimes yielding images not otherwise observable. The information in these photographs is only useful once it has been converted into numerical data. For years, the best method for extracting such information was to involve crowdsourced teams of human volunteers to label each image manually.

A team of researchers from the US and the UK has developed a system to automatically extract such information from images by using deep neural networks. The result is a system that can automate animal identification for up to 99.3 percent of images while still performing at the same 96.6 percent accuracy rate of crowdsourced teams of human volunteers. Deep neural networks are artificial neural networks with multiple hidden layers between the input and output layers. They require vast amounts of training data to work well, and the data must be accurately labeled (e.g., each image being correctly tagged with which species of animal is present, how many there are, etc.). For this study such data was available through Snapshot Serengeti, a citizen science project. Snapshot Serengeti has deployed a large number of camera traps in Tanzania that collect millions of images of animals in their natural habitat, such as lions, leopards, cheetahs and elephants. For this study 3.2 million labeled images tagged by more than 50,000 human volunteers over several years were used as the training set.

Not only does the artificial intelligence system tell you which of 48 different species of animal is present, but it also tells you how many there are and what they are doing. It will tell you if they are eating, sleeping, if babies are present, etc. We estimate that the deep learning technology pipeline we describe would save more than eight years of human labeling effort for each additional 3 million images. That is a lot of valuable volunteer time that can be redeployed to help other projects.

Tuesday, June 5, 2018

What a few rabbits can do

Azorella selago
Understanding the full impact of an invasive species on an environment is very difficult as it involves many factors, one of which is generally a long timescale. A team of researchers from France, Italy and Norway has found a natural historical record of the impact of an invasive species of rabbit on a remote Indian Ocean island. They used an environment with few interacting variables and a natural historical record - DNA found in a lake bottom.

A type of rabbit was introduced to the Kerguelen Islands, situated in a remote southern part of the Indian Ocean. In 1874 a group of scientists that were studying the transit of Venus brought the animals with them as a food source and when they disembarked they left behind several rabbits that quickly multiplied because there were no natural predators. Since then, the rabbits spread across much of the main island of Grande Terre, wreaking havoc on a delicate ecosystem.

To learn more about the impact the rabbits had on the island, the colleagues collected sediment samples from the bottom of a lake, which contained plant DNA. They found samples dating back several hundred years, and were able to reconstruct the events after the scientists left the island. The region had been relatively stable for hundreds of years prior to the arrival of the rabbits. Then, in the early 1940s, when the rabbits made their way to the part of the island where the lake is located, things changed. Prior to their arrival, the dominant plant was Azorella selago; after their arrival, plant diversity plummeted and Azorella selago disappeared quickly. They also noted that erosion dramatically increased. Although it did eventually level off, the ecosystem was left unstable and remains so to this day in spite of efforts to eradicate the rabbits. Instead, as the result of increased human presence in the area, other invasive species have made their way to the islands. 

Monday, June 4, 2018

1000 posts

Wow - who would have thought back in 2012 that I would ever reach such a high number of posts. 

All the prouder I am to have reached this milestone. The blog is still alive and kicking and I have every intention of keeping it that way. 

A big shoutout to all my readers. Without them there would be no blog. Thank you!

Off to the next thousand.

Measuring plant diversity using spectral imaging

We have known for decades that the chemical composition of plants can be estimated from reflectance spectra. What we found is that the spectral dissimilarity, or the overall differences in spectral reflectance, among plant species increases with their functional dissimilarity and evolutionary divergence time.

The value of ecological biodiversity for maintaining ecosystem stability and function is well established, but how do we measure it at larger scales? We need novel approaches that are rapid, repeatable and scalable, in particular in ecosystems for which information about species identity and the number of species is difficult to acquire.

A group of US researchers is proposing to measure plant diversity using spectral data in an attempt to improve efforts to predict how well ecosystems function. Using a field spectrometer, the colleagues measured the light reflectance of plants in 35 plots at Cedar Creek, a field station north of Minneapolis famous for long-term ecological experiments. The spectrometer allows the researchers to evaluate how much light plants reflect at the leaf level across a range of wavelengths. Using the leaf-level data, the team found that the spectral diversity of a plant community predicted aboveground productivity, a critical ecosystem function, to a similar or higher degree than measures of species functional differences, their phylogenetic distances or species richness in a plant community.

Seeing that the ecosystem effect of plant diversity can be effectively evaluated using spectrometry, the team also wanted to know if their method could scale. They used an imaging spectrometer mounted three meters above ground at the same 35 plots at Cedar Creek. Their scans showed that the spectral diversity metric performed similarly when calculated from such spectral images.

The findings indicate that spectral diversity provides a powerful, integrative method of assessing several dimensions of biodiversity relevant to ecosystem function. The rapid changes in the Earth's biodiversity that are underway require novel means of continuous and global detection. This study demonstrates that we can detect plant biodiversity using spectral measurements from plant leaves or from the sky, which opens a whole new range of possibilities.

I guess all that needs to be shown is how well it really scales when it comes to remote sensing technology but this is really promising especially when taking into account the breadth of additional information the colleagues were able to obtain.

Friday, June 1, 2018

Weekend reads

This week a hopefully eclectic collection of reads. I also hope I posted something for everyone.

Microeukaryotic plankton (0.2-200 μm) are critical components of aquatic ecosystems and key players in global ecological processes. High-throughput sequencing is currently revolutionizing their study on an unprecedented scale. However, it is currently unclear whether we can accurately, effectively and quantitatively depict the microeukaryotic plankton communities using traditional size-fractionated filtering combined with molecular methods. To address this, we analysed the eukaryotic plankton communities both with, and without, prefiltering with a 200 μm pore-size sieve by using SSU rDNA-based high-throughput sequencing on 16 samples with three replicates in each sample from two subtropical reservoirs sampled from January to October in 2013. We found that ~25% reads were classified as metazoan in both size groups. The species richness, alpha and beta diversity of plankton community and relative abundance of reads in 99.2% eukaryotic OTUs showed no significant changes after prefiltering with a 200 μm pore-size sieve. We further found that both >0.2 μm and 0.2-200 μm eukaryotic plankton communities, especially the abundant plankton subcommunities, exhibited very similar, and synchronous, spatiotemporal patterns and processes associated with almost identical environmental drivers. The lack of an effect on community structure from prefiltering suggests that environmental DNA from larger metazoa is introduced into the smaller size class. Therefore, size-fractionated filtering with 200 μm is insufficient to discriminate between the eukaryotic plankton size groups in metabarcoding approaches. Our results also highlight the importance of sequencing depth, and strict quality filtering of reads, when designing studies to characterize microeukaryotic plankton communities.

Understanding the geographical distribution and community composition of species is crucial to monitor species persistence and define effective conservation strategies. Environmental DNA (eDNA) has emerged as a powerful noninvasive tool for species detection. However, most eDNA survey methods have been developed and applied in temperate zones. We tested the feasibility of using eDNA to survey anurans in tropical streams in the Brazilian Atlantic forest and compared the results with short-term visual and audio surveys. We detected all nine species known to inhabit our focal streams with one single visit for eDNA sampling. We found a higher proportion of sequence reads and larger number of positive PCR replicates for more common species and for those with life cycles closely associated with the streams, factors that may contribute to increased release of DNA in the water. However, less common species were also detected in eDNA samples, demonstrating the detection power of this method. Filtering larger volumes of water resulted in a higher probability of detection. Our data also show it is important to sample multiple sites along streams, particularly for detection of target species with lower population densities. For the three focal species in our study, the eDNA metabarcoding method had a greater capacity of detection per sampling event than our rapid field surveys, and thus, has the potential to circumvent some of the challenges associated with traditional approaches. Our results underscore the utility of eDNA metabarcoding as an efficient method to survey anuran species in tropical streams of the highly biodiverse Brazilian Atlantic forest.

Next-generation deep amplicon sequencing, or metabarcoding, has revolutionized the study of microbial communities in humans, animals and the environment. However, such approaches have yet to be applied to parasitic helminth communities. We recently described the first example of such a method - nemabiome sequencing - based on deep-amplicon sequencing of internal transcribed spacer 2 (ITS-2) rDNA, and validated its ability to quantitatively assess the species composition of cattle gastro-intestinal nematode (GIN) communities. Here, we present the first application of this approach to explore GIN species diversity and the impact of anthelmintic drug treatments. First, we investigated GIN species diversity in cow-calf beef cattle herds in several different regions, using coproculture derived L3s. A screen of 50 Canadian beef herds revealed parasite species diversity to be low overall. The majority of parasite communities were comprised of just two species; Ostertagia ostertagi and Cooperia oncophora. Cooperia punctata was present at much lower levels overall, but nevertheless comprised a substantive part of the parasite community of several herds in eastern Canada. In contrast, nemabiome sequencing revealed higher GIN species diversity in beef calves sampled from central/south-eastern USA and Sao Paulo State, Brazil. In these regions C. punctata predominated in most herds with Haemonchus placei predominating in a few cases. Ostertagia ostertagi and C. oncophora were relatively minor species in these regions in contrast to the Canadian herds. We also examined the impact of routine macrocyclic lactone pour-on treatments on GIN communities in the Canadian beef herds. Low treatment effectiveness was observed in many cases, and nemabiome sequencing revealed an overall increase in the proportion of Cooperia spp. relative to O. ostertagi post-treatment. 
This work demonstrates the power of nemabiome metabarcoding to provide a detailed picture of GIN parasite community structure in large sample sets and illustrates its potential use in research, diagnostics and surveillance.

DNA metabarcoding is an increasingly popular method to characterize and quantify biodiversity in environmental samples. Metabarcoding approaches simultaneously amplify a short, variable genomic region, or "barcode," from a broad taxonomic group via the polymerase chain reaction (PCR), using universal primers that anneal to flanking conserved regions. Results of these experiments are reported as occurrence data, which provide a list of taxa amplified from the sample, or relative abundance data, which measure the relative contribution of each taxon to the overall composition of amplified product. The accuracy of both occurrence and relative abundance estimates can be affected by a variety of biological and technical biases. For example, taxa with larger biomass may be better represented in environmental samples than those with smaller biomass. Here, we explore how polymerase choice, a potential source of technical bias, might influence results in metabarcoding experiments. We compared potential biases of six commercially available polymerases using a combination of mixtures of amplifiable synthetic sequences and real sedimentary DNA extracts. We find that polymerase choice can affect both occurrence and relative abundance estimates and that the main source of this bias appears to be polymerase preference for sequences with specific GC contents. We further recommend an experimental approach for metabarcoding based on results of our synthetic experiments.

Molecular gut-content analysis has revolutionized the study of food webs and feeding interactions, allowing the detection of prey DNA within the gut of many organisms. However, successful prey detection is a challenging procedure in which many factors affect every step, starting from the DNA extraction process. Spiders are liquid feeders with branched gut diverticula extending into their legs and throughout the prosoma, thus digestion takes place in different parts of the body and simple gut dissection is not possible. In this study, we investigated differences in prey detectability in DNA extracts from different parts of the spider's body: legs, prosoma and opisthosoma, using prey-specific PCR and metabarcoding approaches. We performed feeding trials with the woodlouse hunter spider Dysdera verneaui Simon, 1883 (Dysderidae) to estimate the time at which prey DNA is detectable within the predator after feeding. Although we found that all parts of the spider body are suitable for gut-content analysis when using prey-specific PCR approach, results based on metabarcoding suggested the opisthosoma is optimal for detection of predation in spiders because it contained the highest concentration of prey DNA for longer post feeding periods. Other spiders may show different results compared to D. verneaui, but given similarities in the physiology and digestion in different families, it is reasonable to assume this to be common across species and this approach having broad utility across spiders.

Tropical animals and plants are known to have high alpha diversity within forests, but low beta diversity between forests. By contrast, it is unknown if microbes inhabiting the same ecosystems exhibit similar biogeographic patterns. To evaluate the biogeographies of tropical protists, we used metabarcoding data of species sampled in the soils of three lowland Neotropical rainforests. Taxa-area and distance-decay relationships for three of the dominant protist taxa and their subtaxa were estimated at both the OTU- and phylogenetic-levels, with presence-absence and abundance based measures. These estimates were compared to null models. High local alpha and low regional beta diversity patterns were consistently found for both the parasitic Apicomplexa and the largely free-living Cercozoa and Ciliophora. Similar to animals and plants, the protists showed spatial structures between forests at the OTU- and phylogenetic-levels, and only at the phylogenetic level within forests. These results suggest that the biogeographies of macro- and micro-organismal eukaryotes in lowland Neotropical rainforests are partially structured by the same general processes. However, and unlike the animals and plants, the protist OTUs did not exhibit spatial structures within forests, which hinders our ability to estimate local and regional diversity of protists in tropical forests.

Maximizing the delivery of key ecosystem services such as biological control through the management of natural enemy communities is one of the major challenges for modern agriculture. The main obstacle lies in our yet limited capacity of identifying the factors that drive the dynamics of trophic interactions within multi-species assemblages. Invertebrate generalist predators like carabid beetles are known for their dynamic feeding behaviour. Yet, to what extent different carabid species contribute to the regulation of animal and plant pests within agroecosystems is currently unknown. Here, we developed a DNA metabarcoding approach for characterizing the full diet spectrum of a community of fourteen very common carabid species inhabiting an intensively managed Western-European agroecosystem. We then investigated how diet and biological control potential within the carabid community varies with the sampling field location and the crop type (wheat vs oilseed rape). DNA metabarcoding diet analysis allowed the detection of a wide variety of animal and plant taxa from carabid gut contents thus confirming their generalist feeding behaviour. The most common prey categories detected were arachnids, insects, earthworms and several plant families potentially including many weed species. Our results also show that the field location and the crop type are much stronger determinants than the species regarding carabid dietary choice: significantly more trophic links involving dipteran prey were observed in wheat, whereas more collembolan and plant prey was consumed in oilseed rape by the same carabid community. We speculate that structural differences in the habitats provided by these two crop types drive differences in resource availability cascading up the trophic chain, and we assume that specific carabid taxa could hardly be used to infer levels of ecosystem services (biological control) or disservices (e.g. intraguild predation). 
However, as this is the first study to report the use of DNA metabarcoding diet analysis in predatory carabid beetles we urge caution over the interpretation of our results. For instance, overall detection rates were rather low (31% of the individuals analysed tested positive for at least one prey category) most likely due to the overwhelming amplification of the carabid host DNA. Therefore, we acknowledge that more studies are required in order to confirm our observations and conclude with few recommendations for further improvements of the community-level DNA metabarcoding analysis of carabid diet.