Friday, May 15, 2020

Weekend reads - Week 20/2020

Here in Canada we are having a long weekend, which means that for some there is even more time to read. No worries, I won't add more than usual to this blog post, although quite a few new papers have been published over the past two weeks. Here we go:

A clear insight into the large-scale community structure of planktonic copepods is critical to understanding the mechanisms controlling diversity and biogeography of marine taxa in terms of their high abundance, ubiquity, and sensitivity to environmental changes. Here, we applied a 28S metabarcoding approach to large-scale communities of epipelagic and mesopelagic copepods at 70 stations across the Pacific Ocean and three stations in the Arctic Ocean. Major patterns of community structure and diversity, influenced by water mass structures, agreed with results from previous morphology-based studies. However, a large-scale metabarcoding approach could detect community changes even under stable environmental conditions, including changes in the north/south subtropical gyres and east/west areas within each subtropical gyre. There were strong effects of the epipelagic environment on mesopelagic communities, and community subdivisions were observed in the environmentally stable mesopelagic layer. In each sampling station, higher operational taxonomic unit (OTU) numbers and lower phylogenetic diversity were observed in the mesopelagic layer than in the epipelagic layer, indicating a recent rapid increase in species numbers in the mesopelagic layer. The phylogenetic analysis utilizing representative sequences of OTUs revealed trends of recent emergence of cold-water OTUs, which are mainly distributed at high latitudes with low water temperatures. Conversely, the high diversity of copepods at low latitudes was suggested to have been formed through long evolution under high water temperature conditions. The metabarcoding results suggest that evolutionary processes have strong impacts on current patterns of copepod diversity, and support the "out of the tropics" theory explaining latitudinal diversity gradients of copepods. 
Diversity patterns in both epipelagic and mesopelagic copepods were highly correlated with sea surface temperature; thus, predicted global warming may have a significant impact on copepod diversity in both layers.

Biological conclusions based on DNA barcoding and metabarcoding analyses can be strongly influenced by the methods utilized for data generation and curation, leading to varying levels of success in the separation of biological variation from experimental error. The 5' region of cytochrome c oxidase subunit I (COI-5P) is the most common barcode gene for animals, with conserved structure and function that allows for biologically informed error identification. Here, we present coil, an R package for the pre-processing and frameshift error assessment of COI-5P animal barcode and metabarcode sequence data. The package contains functions for placement of barcodes into a common reading frame, accurate translation of sequences to amino acids, and highlighting insertion and deletion errors. The analysis of 10 000 barcode sequences of varying quality demonstrated how coil can place barcode sequences in reading frame and distinguish sequences containing indel errors from error-free sequences with greater than 97.5% accuracy. Package limitations were tested through the analysis of COI-5P sequences from the plant and fungal kingdoms as well as the analysis of potential contaminants: nuclear mitochondrial pseudogenes and Wolbachia COI-5P sequences. Results demonstrated that coil is a strong technical error identification method but is not reliable for detecting all biological contaminants.
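To make the reading-frame idea a bit more concrete, here is a minimal Python sketch. It is not coil's method and does not use coil's API. It is a much cruder heuristic that assumes the invertebrate mitochondrial genetic code (NCBI translation table 5, where TAA and TAG are the stop codons), counts stop codons in each forward frame, and flags a sequence when even the best frame contains one. The function names are made up for this illustration.

```python
# Stop codons under the invertebrate mitochondrial code (NCBI translation table 5).
# Assumption: the barcode comes from an invertebrate; other codes use other stops.
STOP_CODONS = {"TAA", "TAG"}


def stops_per_frame(seq):
    """Count stop codons in each of the three forward reading frames."""
    seq = seq.upper()
    counts = []
    for offset in range(3):
        # Walk the sequence codon by codon, ignoring an incomplete trailing codon.
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        counts.append(sum(codon in STOP_CODONS for codon in codons))
    return counts


def flag_sequence(seq):
    """Return (best_frame, suspicious).

    best_frame is the frame offset (0, 1 or 2) with the fewest stop codons;
    suspicious is True when even that frame contains a stop codon, which is
    one possible symptom of an indel (or of a pseudogene).
    """
    counts = stops_per_frame(seq)
    best = min(range(3), key=counts.__getitem__)
    return best, counts[best] > 0


if __name__ == "__main__":
    print(flag_sequence("ATGGCATTAGGC"))  # toy sequence, not real COI data
```

A stop-codon check like this will miss indels that do not happen to create a stop, so treat it as a teaching aid rather than a replacement for a dedicated tool.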

The meiofauna is an important part of the marine ecosystem, but its composition and distribution patterns are relatively unexplored. Here we assessed the biodiversity and community structure of meiofauna from five locations on the Swedish western and southern coasts using a high-throughput DNA sequencing (metabarcoding) approach. The mitochondrial cytochrome oxidase 1 (COI) mini-barcode and nuclear 18S small ribosomal subunit (18S) V1-V2 region were amplified and sequenced using Illumina MiSeq technology. Our analyses revealed a higher number of species than previously found in other areas: thirteen samples comprising 6.5 dm³ sediment revealed 708 COI and 1,639 18S metazoan OTUs. Across all sites, the majority of the metazoan biodiversity was assigned to Arthropoda, Nematoda and Platyhelminthes. Alpha and beta diversity measurements showed that community composition differed significantly amongst sites. OTUs initially assigned to Acoela, Gastrotricha and the two Platyhelminthes sub-groups Macrostomorpha and Rhabdocoela were further investigated and assigned to species using a phylogeny-based taxonomy approach. Our results demonstrate that there is great potential for discovery of new meiofauna species even in some of the most extensively studied locations.

The complexity and natural variability of ecosystems present a challenge for reliable detection of change due to anthropogenic influences. This issue is exacerbated by necessary trade-offs that reduce the quality and resolution of survey data for assessments at large scales. The Peace–Athabasca Delta (PAD) is a large inland wetland complex in northern Alberta, Canada. Despite its geographic isolation, the PAD is threatened by encroachment of oil sands mining in the Athabasca watershed and hydroelectric dams in the Peace watershed. Methods capable of reliably detecting changes in ecosystem health are needed to evaluate and manage risks. Between 2011 and 2016, aquatic macroinvertebrates were sampled across a gradient of wetland flood frequency, applying both microscope-based morphological identification and DNA metabarcoding. By using multispecies occupancy models, we demonstrate that DNA metabarcoding detected a much broader range of taxa and more taxa per sample compared to traditional morphological identification and was essential to identifying significant responses to flood and thermal regimes. We show that family-level occupancy masks high variation among genera and quantify the bias of barcoding primers on the probability of detection in a natural community. Interestingly, patterns of community assembly were nearly random, suggesting a strong role of stochasticity in the dynamics of the metacommunity. This variability seriously compromises effective monitoring at local scales but also reflects resilience to hydrological and thermal variability. Nevertheless, simulations showed the greater efficiency of metabarcoding, particularly at a finer taxonomic resolution, provided the statistical power needed to detect change at the landscape scale.

Better knowledge of food webs and related ecological processes is fundamental to understanding the functional role of biodiversity in ecosystems. This is particularly true for pest regulation by natural enemies in agroecosystems. However, it is generally difficult to decipher the impact of predators, as they often leave no direct evidence of their activity. Metabarcoding via high-throughput sequencing (HTS) offers new opportunities for unraveling trophic linkages between generalist predators and their prey, and ultimately identifying key ecological drivers of natural pest regulation. Here, this approach proved effective in deciphering the diet composition of key predatory arthropods (nine species; 27 prey taxa), insectivorous birds (one species; 13 prey taxa) and bats (one species; 103 prey taxa) sampled in a millet-based agroecosystem in Senegal. Such information makes it possible to identify the diet breadth and preferences of predators (e.g., mainly moths for bats), to design a qualitative trophic network, and to identify patterns of intraguild predation across arthropod predators, insectivorous vertebrates and parasitoids. Appropriateness and limitations of the proposed molecular-based approach for assessing the diet of crop pest predators and trophic linkages are discussed.


Increasing evidence for global insect declines is prompting a renewed interest in the survey of whole insect communities. DNA metabarcoding can contribute to assessing diverse insect communities over a range of spatial and temporal scales, but efforts are still needed to optimise and standardise procedures, from field sampling, through laboratory analysis, to bioinformatic processing.
Here we describe and test a methodological pipeline for surveying nocturnal flying insects, combining a customised automatic light trap and DNA metabarcoding. We optimised laboratory procedures and then tested the methodological pipeline using 12 field samples collected in northern Portugal in 2017. We focused on Lepidoptera to compare metabarcoding results with those from morphological identification, using three types of bulks produced from each sample (individuals, legs and the unsorted mixture).
The customised trap was highly efficient at collecting nocturnal flying insects, allowing a small team to operate several traps per night, and a fast field processing of samples for subsequent metabarcoding with low contamination risks. Morphological processing yielded 871 identifiable individuals of 102 Lepidoptera species. Metabarcoding detected a total of 528 taxa, most of which were Lepidoptera (31.1%), Diptera (26.1%) and Coleoptera (14.7%). There was a reasonably high matching in community composition between morphology and metabarcoding when considering the ‘individuals’ and ‘legs’ bulk samples, with few errors mostly associated with morphological misidentification of small microlepidoptera. Regarding the ‘mixture’ bulk sample, metabarcoding identified nearly four times more Lepidoptera species than morphological examination.
Our study provides a methodological metabarcoding pipeline that can be used in standardised surveys of nocturnal flying insects, showing that it can overcome limitations and potential shortcomings of traditional methods based on morphological identification. Our approach efficiently collects highly diverse taxonomic groups such as nocturnal Lepidoptera that are poorly represented when using Malaise traps and other widely used field methods. To enhance the potential of this pipeline in ecological studies, efforts are needed to test its effectiveness and potential biases across habitat types and to extend the DNA barcode databases for important groups such as Diptera.

Modern ecosystem models have the potential to greatly enhance our capacity to predict community responses to change, but they demand comprehensive spatial distribution information, creating the need for new approaches to gather and synthesize biodiversity data. Metabarcoding or metagenomics can generate comprehensive biodiversity data sets at species-level resolution but they are limited to point samples. CommDivMap contains a number of functions that can be used to turn OTU tables resulting from metabarcoding runs of bulk samples into species richness maps. We tested the method on a series of arthropod bulk samples obtained from various experimental agricultural plots. The script runs smoothly and is reasonably fast. We hope that our assemble first, predict later approach to statistical modelling of species richness will set the stage for the transition from data-rich but finite sets of point samples to spatially continuous biodiversity maps.
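As a rough illustration of the very first step in such a workflow (going from an OTU table to one richness value per point sample), here is a small Python sketch. It does not use CommDivMap and says nothing about how that package is implemented; the table layout, the `min_reads` threshold and the function name are assumptions made for this example, and the spatial prediction step that turns point values into a map is not shown.

```python
def richness_per_sample(otu_table, min_reads=1):
    """Count the OTUs detected in each sample.

    otu_table maps a sample name to a dict of {OTU id: read count}.
    An OTU counts as present when it has at least min_reads reads
    (a hypothetical threshold; choose one that suits your data).
    """
    return {
        sample: sum(1 for reads in counts.values() if reads >= min_reads)
        for sample, counts in otu_table.items()
    }


if __name__ == "__main__":
    # Toy data: two bulk samples, three OTUs.
    table = {
        "plot_A": {"OTU1": 120, "OTU2": 0, "OTU3": 4},
        "plot_B": {"OTU1": 15, "OTU2": 33, "OTU3": 1},
    }
    print(richness_per_sample(table))
    print(richness_per_sample(table, min_reads=5))
```

Raising `min_reads` is one simple way to see how sensitive the richness values are to low-abundance OTUs before any mapping is attempted.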

The task of recognizing species names in scientific articles is a quintessential step for a large number of applications in high-throughput text mining and data analytics, such as species-specific information collection, construction of species food networks and trophic relationship extraction. These tasks become even more important in fast-paced species-discovery areas such as entomology, where an impressive number of new arthropod species are discovered each year. This article explores the use of two-character n-grams (bigrams) in machine learning models for arthropod species name recognition. This particular method has been previously applied successfully to the task of language identification but the application to species name identification had yet to be explored.
Arthropod species names, regular English words used in scientific publications and person names were collected from the public domain and bigrams were extracted and used as classifier features. A number of learning classifiers spanning 7 algorithmic categories (tree-based, rule-based, artificial neural network, Bayesian, boosting, lazy and kernel-based) were tested and the highest accuracies were consistently obtained with LIBLINEAR, Bayesian Logistic Regression, the Multilayer Perceptron, Random Forest, and the LIBSVM classifiers. When compared with dictionary-based external software tools such as GNRD and TaxonFinder, our top-3 classifiers were insensitive to word capitalization and were able to correctly classify novel species names that are absent in dictionary-based approaches with accuracies between 88.6% and 91.6%.
Our results suggest that character bigram-based classification is a suitable method for distinguishing arthropod species names from regular English words and person names commonly found in scientific literature. Moreover, our method can also be used to reduce the number of false positives produced by dictionary-based methods. 
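For readers who have not worked with character n-grams, here is a minimal Python sketch of what extracting bigram features from a name could look like. It is an illustration under my own assumptions and not the authors' code: the function name is hypothetical, and lowercasing the input is just one possible way to make features independent of capitalization.

```python
from collections import Counter


def char_bigrams(word, lowercase=True):
    """Return counts of overlapping two-character substrings in word.

    Lowercasing is optional and is an assumption of this sketch,
    not a documented step of the published method.
    """
    text = word.lower() if lowercase else word
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


if __name__ == "__main__":
    # A species name mentioned elsewhere in this blog and an ordinary English word.
    print(char_bigrams("Locusta migratoria"))
    print(char_bigrams("sampling"))
```

Counts like these can be turned into a fixed-length feature vector (one position per possible bigram) and handed to any standard classifier.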

Friday, May 1, 2020

Weekend reads -- Week 18/2020

I am revitalizing an older tradition of this blog. A weekly (very subjective) collection of papers relating to DNA barcoding, metabarcoding and everything related:

Insects form an established part of the diet in many parts of the world and insect food products are emerging into the European and North American marketplaces. Consumer confidence in product is key in developing this market, and accurate labelling of content identity is an important component of this. We used DNA barcoding to assess the accuracy of insect food products sold in the UK. We purchased insects sold for human consumption from online retailers in the UK and compared the identity of the material ascertained from DNA barcoding to that stated on the product packaging. To this end, the COI sequence of mitochondrial DNA was amplified and sequenced, and the sequences produced were compared to reference sequences in NCBI and the Barcode of Life Data System (BOLD). The barcode identity of all insects that were farmed was consistent with the packaging label. In contrast, disparity between barcode identity and package contents was revealed in two cases of foraged material (mopane worm and winged termites). One case of very broad family-level description was also highlighted, where material described as grasshopper was identified as Locusta migratoria from DNA barcode. Overall these data indicate the need to establish tight protocols to validate product identity in this developing market. Maintaining biosafety and consumer confidence relies on accurate and consistent product labelling that provides a clear chain of information from producer to consumer.

Walnut (Juglans regia L.) is one of the most widely cultivated nuts. Walnut milk beverage is very popular in China due to its nutritional value. However, adulterated walnut milk ingredients have been detected in the Chinese market. Peanut and soybean are sold at much lower prices than walnut and are reported to be commonly used for adulteration in the industrial chain of walnut milk production. The purpose of this study is therefore to develop an accurate and efficient method for detecting the authenticity of the raw materials used in walnut milk beverage. DNA barcoding and high‐resolution melting (HRM) analyses were used to identify common adulterated raw ingredients such as peanut and soybean in commercial walnut milk beverage samples. The chloroplast psbA‐trnH gene was used for sequencing, and HRM analysis was performed. We also prepared experimental mixtures, in the laboratory, with different quantities of walnut, peanut, and soybean. High‐resolution melting analysis of the experimental mixtures clearly distinguished all of them. The results revealed that most of the walnut milk beverage samples fell in the same cluster of walnut species. Several samples fell in the peanut cluster, confirming that they were adulterated products. The results revealed that HRM analysis based on the psbA‐trnH barcode sequence can be used to identify raw ingredients in walnut milk beverages. 

Accurate and cost-effective methods for tracking changes in arthropod communities are needed to develop integrative environmental monitoring programs in the Arctic. To date, even baseline data on their species composition at established ecological monitoring sites are severely lacking. We present the results of a pilot assessment of non-marine arthropod diversity in a middle arctic tundra area near Ikaluktutiak (Cambridge Bay), Victoria Island, Nunavut, undertaken in 2018 using DNA barcodes. A total of 1264 Barcode Index Number (BIN) clusters, used as a proxy for species, were recorded. The efficacy of widely used sampling methods was assessed. Yellow pan traps captured 62% of the entire BIN diversity at the study sites. When complemented with soil and leaf litter sifting, the coverage rose up to 74.6%. Combining community-based data collection with high-throughput DNA barcoding has the potential to overcome many of the logistic, financial, and taxonomic obstacles for large-scale monitoring of the Arctic arthropod fauna.

Improved taxonomic methods are needed to quantify declining populations of insect pollinators. This study devises a high‐throughput DNA barcoding protocol for a regional fauna (United Kingdom) of bees (Apiformes), consisting of reference library construction, a proof‐of‐concept monitoring scheme, and the deep barcoding of individuals to assess potential artefacts and organismal associations. A reference database of cytochrome oxidase c subunit 1 (cox1) sequences including 92.4% of 278 bee species known from the UK showed high congruence with morphological taxon concepts, but molecular species delimitations resulted in numerous split and (fewer) lumped entities within the Linnaean species. Double tagging permitted deep Illumina sequencing of 762 separate individuals of bees from a UK‐wide survey. Extracting the target barcode from the amplicon mix required a new protocol employing read abundance and phylogenetic position, which revealed 180 molecular entities of Apiformes identifiable to species. An additional 72 entities were ascribed to nuclear pseudogenes based on patterns of read abundance and phylogenetic relatedness to the reference set. Clustering of reads revealed a range of secondary operational taxonomic units (OTUs) in almost all samples, resulting from traces of insect species caught in the same traps, organisms associated with the insects including a known mite parasite of bees, and the common detection of human DNA, besides evidence for low‐level cross‐contamination in pan traps and laboratory procedures. Custom scripts were generated to conduct critical steps of the bioinformatics protocol. The resources built here will greatly aid DNA‐based monitoring to inform management and conservation policies for the protection of pollinators.

Freshwaters face some of the highest rates of species loss, caused by strong human impact. To decrease or even revert this strong impact, ecological restorations are increasingly applied to restore and maintain the natural ecological status of freshwaters. Their ecological status can be determined by assessing the presence of indicator species (e.g., certain fish species), which is called biomonitoring. However, traditional biomonitoring of fish, such as electrofishing, is often challenging and invasive. To augment traditional biomonitoring of fish, the analysis of environmental DNA (eDNA) has recently been proposed as an alternative, sensitive approach. The present study employed this modern approach to monitor the Rhine sculpin (Cottus rhenanus), a fish species that has been reintroduced into a recently restored stream within the Emscher catchment in Germany, in order to validate the success of the applied restorations and to monitor the species’ dispersal. We monitored the dispersal of the Rhine sculpin using replicated 12S end-point nested PCR eDNA surveillance at a fine spatial and temporal scale. In that way, we investigated if eDNA analysis can be applied for freshwater assessments. We also performed traditional electrofishing in one instance to validate our eDNA-based approach. We could track the dispersal of the Rhine sculpin and showed a higher dispersal potential of the species than we assumed. eDNA detection indicated the species’ dispersal across a potential dispersal barrier and showed a steep increase of positive detections once the reintroduced population had established. In contrast to that, false negative eDNA results occurred at early reintroduction stages. Our results show that eDNA detection can be used to confirm and monitor reintroductions and to contribute to the assessment and modeling of the ecological status of streams.

Environmental DNA (eDNA) is usually defined as genetic material obtained directly from environmental samples, such as soil, water, or ice. Coupled to DNA metabarcoding, eDNA is a powerful tool in biodiversity assessments. Results from the eDNA approach have provided valuable insights into studies of past and contemporary biodiversity in terrestrial and aquatic environments. However, the state and fate of eDNA are still being investigated, and knowledge about the form of eDNA (i.e., extracellular vs. intracellular) or about DNA degradation under different environmental conditions is limited. Here, we tackle this issue by analyzing foraminiferal sedimentary DNA (sedDNA) from different size fractions of marine sediments: >500 µm, 500-100 µm, 100-63 µm, and <63 µm. Surface sediment samples were collected at 15 sampling stations located in the Svalbard archipelago. Sequences of the foraminifera-specific 37f region were generated using Illumina technology. The presented data may be used as a reference for a wide range of eDNA-based studies, including biomonitoring and biodiversity assessments across time and space.


Environmental DNA (eDNA) analysis utilises trace DNA released by organisms into their environment for species detection and is revolutionising non-invasive species monitoring. The use of this technology requires rigorous validation - from field sampling to interpretation of PCR-based results - for meaningful application and interpretation. Assays targeting eDNA released by individual species are typically validated with no predefined criteria to answer specific research questions in one ecosystem. Their general applicability, uncertainties and limitations often remain undetermined. The absence of clear guidelines prevents targeted eDNA assays from being incorporated into species monitoring and policy, thus their establishment will be key for the future implementation of eDNA-based surveys. We describe the measures and tests necessary for successful validation of targeted eDNA assays and the associated pitfalls to form the basis of guidelines. A list of 122 variables was compiled and consolidated into a scale to assess the validation status of individual assays. These variables were evaluated for 546 published single-species assays. The resulting dataset was used to provide an overview of current validation practices and test the applicability of the validation scale for future assay rating. The 122 variables representing assay validation status were classified into 14 thematic blocks, such as "in silico analysis", and arranged on a 5-level validation scale from "incomplete" to "operational". Additionally, minimum validation criteria were defined for each level. The majority (30%) of investigated assays were classified as Level 1 (incomplete), and 15% did not achieve this first level. These assays were characterised by minimal in silico and in vitro testing, but their share in annually published eDNA assays has declined since 2014. The total number of reported variables ranged from 20% to 76% and deviated both between and within levels. 
The meta-analysis demonstrates the suitability of the 5-level validation scale for assessing targeted eDNA assays. It is a user-friendly tool to evaluate previously published assays for future research and routine monitoring, while also enabling appropriate interpretation of results. Finally, it provides guidance on validation and reporting standards for newly developed assays.

We used two large-scale metabarcoding datasets to evaluate phylogenetic signals at global marine and regional terrestrial scales using co-occurrence and co-exclusion networks. Phylogenetic relatedness was estimated using either global pairwise sequence distance or phylogenetic distance, and the significance of observed patterns relating networks and phylogenies was evaluated against two null models. In all datasets, we found that phylogenetically close OTUs significantly co-occurred more often, and OTUs with intermediate phylogenetic relatedness co-occurred less often, than expected by chance. Phylogenetically close OTUs co-excluded less often than expected by chance in the marine datasets only. A simultaneous excess of co-occurrences and co-exclusions was observed in the inversion zone between close and intermediate phylogenetic distance classes in the marine surface. Similar patterns were observed by using either pairwise sequence or phylogenetic distances, and by using both null models. These results suggest that environmental filtering and dispersal limitation are the preponderant forces driving co-occurrence of protists in both environments, while a signal of competitive exclusion was only detected in the marine surface environment. The discrepancy in the co-exclusion pattern is potentially linked to the individual environments: water bodies are more homogeneous, while tropical forest soils contain a myriad of nutrient-rich micro-environments, reducing the strength of mutual exclusion.

The Bees@Schools Program

Some years after the last run of our successful School Malaise Trap Program we started thinking about new ways to involve citizen scientists at schools in our research. We pitched an idea to the Natural Sciences and Engineering Research Council of Canada (NSERC) and were granted some funds to set it up and start with a few runs.

The Bees@Schools project initially involved 100 school classrooms in discerning critical information on the changing geographic distributions of plant-pollinator interactions across Canada. By combining state-of-the-art DNA barcoding of bees, and the pollen they carry, with distribution and climate change data, we are collecting data to show how distributions of Canada’s bee species are changing along with climate. The project will also help to determine how pollination services shift across Canada, with impacts on food production. The ultimate hope is to provide landscape management advice to improve vital species' chances of persisting in agricultural landscapes and to alleviate pollination deficits.

Each participating school receives a wild bee nest box (perhaps better known as a bee hotel) in the spring, which stays installed throughout the summer. In the fall, the nest boxes are sent back to our institute. Here, the contents of the nest boxes are analyzed using DNA barcoding of the larvae we find and metabarcoding of the pollen that was provisioned for them.

The project is run by one of my grad students, Sage Handler. She is doing pretty much everything, from communication with schools and the public to the laboratory work and data analysis. As you can imagine, her planning for this year's run was thrown into chaos once the COVID-19 pandemic caused major lockdowns, including the closure of schools here in Canada. However, we were able to shift gears and run the program regardless. Thanks to the support of so many teachers, 200 traps are currently deployed across Canada. You'll find them in teachers' backyards, school yards or public spaces, and we are now developing material (videos, activities for kids at home, etc.) to keep this as educational as possible amidst the school closures. This video is just one example of how kids can learn from and interact with the program.