Biodiversity Science: Weekend reads

Here we go again; another week has passed quickly. Posting was light, mainly because I had some days off and no chance to dig for blog posts. Nevertheless, here is your weekly dose of interesting papers. Really good stuff.

Optimization and performance testing of a sequence processing pipeline applied to detection of nonindigenous species.

Genetic taxonomic assignment can be more sensitive than morphological taxonomic assignment, particularly for small, cryptic or rare species. Sequence processing is essential to taxonomic assignment, but can also produce errors because optimal parameters are not known a priori. Here, we explored how sequence processing parameters influence taxonomic assignment of 18S sequences from bulk zooplankton samples produced by 454 pyrosequencing. We optimized a sequence processing pipeline for two common research goals, estimation of species richness and early detection of aquatic invasive species (AIS), and then tested most optimal models' performances through simulations. We tested 1,050 parameter sets on 18S sequences from 20 AIS to determine optimal parameters for each research goal. We tested optimized pipelines' performances (detectability and sensitivity) by computationally inoculating sequences of 20 AIS into ten bulk zooplankton samples from ports across Canada. We found that optimal parameter selection generally depends on the research goal. However, regardless of research goal, we found that metazoan 18S sequences produced by 454 pyrosequencing should be trimmed to 375-400 bp and sequence quality filtering should be relaxed (1.5 ≤ maximum expected error ≤ 3.0, Phred score = 10). Clustering and denoising were only viable for estimating species richness, because these processing steps made some species undetectable at low sequence abundances which would not be useful for early detection of AIS. With parameter sets optimized for early detection of AIS, 90% of AIS were detected with fewer than 11 target sequences, regardless of whether clustering or denoising was used. Despite developments in next-generation sequencing, sequence processing remains an important issue owing to difficulties in balancing false-positive and false-negative errors in metabarcoding data.
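To make the trimming and quality-filtering step a bit more concrete, here is a minimal Python sketch of a maximum-expected-error filter. It is not the authors' pipeline: the function names, the default values, and the choice to discard reads shorter than the trim length are my own assumptions for illustration, and the Phred score = 10 setting from the abstract is not modeled. It relies only on the standard relation between a Phred score Q and the per-base error probability, p = 10^(-Q/10).

```python
def expected_errors(phred_scores):
    """Expected number of errors in a read: the sum of per-base
    error probabilities, where p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def trim_and_filter(seq, phred_scores, trim_len=400, max_ee=3.0):
    """Trim a read to trim_len bases and keep it only if the expected
    number of errors in the trimmed part is at most max_ee.

    Returns the trimmed sequence, or None if the read is rejected.
    Assumption (not from the paper): reads shorter than trim_len
    are discarded rather than kept at their original length.
    """
    if len(seq) < trim_len:
        return None
    if expected_errors(phred_scores[:trim_len]) > max_ee:
        return None
    return seq[:trim_len]
```

With these hypothetical defaults, a 450 bp read with Q30 at every base has 0.4 expected errors in its first 400 bases and is kept, while a read with Q20 throughout has 4.0 expected errors and is dropped.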

Boosting DNA metabarcoding for biomonitoring with phylogenetic estimation of OTUs' ecological profiles.

DNA metabarcoding has been introduced as a revolutionary way to identify organisms and monitor ecosystems. However, the potential of this approach for biomonitoring remains partially unfulfilled because a significant part of the sampled DNA cannot be affiliated to species due to incomplete reference libraries. Thus, biotic indices which are based on the estimated abundances of species in a community and their ecological profiles can be inaccurate. We propose to compute biotic indices using phylogenetic imputation of OTUs' ecological profiles (OTU-PITI approach). Firstly, OTUs sequences are inserted within a reference phylogeny. Secondly, OTUs' ecological profiles are estimated on the basis of their phylogenetic relationships with reference species whose ecology is known. Based on these ecological profiles, biotic indices can be computed using all available OTUs. Using freshwater diatoms as a case study, we show that short DNA barcodes can be placed accurately within a phylogeny and their ecological preferences estimated with a satisfactory level of precision. In light of these results, we tested the approach with a dataset of 139 environmental samples of benthic river diatoms for which the same biotic index (IPS) was calculated using (i) traditional microscopy, (ii) OTUs with taxonomic assignment approach, (iii) OTUs with phylogenetic estimation of ecological profiles (OTU-PITI), and (iv) OTU with taxonomic assignment completed by the phylogenetic approach (OTU-PITI) for unclassified OTUs. Using traditional microscopy as a reference, we found that the combination of the OTUs' taxonomic assignment completed by the phylogenetic method performed satisfactorily and substantially better than the other methods tested.
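The two-step idea (estimate an OTU's ecological profile from related reference species, then compute an index over all OTUs) can be sketched roughly as follows. This is a simplified illustration, not the OTU-PITI method and not the actual IPS formula: the inverse-distance weighting, the single-number "sensitivity" profile, and all function names are assumptions of mine.

```python
def impute_profile(distances, ref_profiles):
    """Estimate an OTU's ecological value as the inverse-distance-weighted
    mean of the values of reference species.

    distances:    {species: phylogenetic distance from the OTU}, all > 0
    ref_profiles: {species: known ecological value}
    Assumption: inverse-distance weighting stands in for whatever
    phylogenetic estimation the authors actually use.
    """
    weights = {sp: 1.0 / d for sp, d in distances.items() if d > 0}
    total = sum(weights.values())
    return sum(w * ref_profiles[sp] for sp, w in weights.items()) / total


def biotic_index(abundances, sensitivities):
    """Abundance-weighted mean sensitivity across all OTUs in a sample.

    abundances:    {otu: read count or relative abundance}
    sensitivities: {otu: assigned or imputed ecological value}
    """
    total = sum(abundances.values())
    return sum(abundances[o] * sensitivities[o] for o in abundances) / total
```

The point of the sketch is only that an OTU without a species name can still contribute to the index once it has an imputed value, instead of being dropped from the calculation.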

Can non-destructive DNA extraction of bulk invertebrate samples be used for metabarcoding?

BACKGROUND: High throughput DNA sequencing of bulk invertebrate samples or metabarcoding is becoming increasingly used to provide profiles of biological communities for environmental monitoring. As metabarcoding becomes more widely applied, new reference DNA barcodes linked to individual specimens identified by taxonomists are needed. This can be achieved through using DNA extraction methods that are not only suitable for metabarcoding but also for building reference DNA barcode libraries.

METHODS: In this study, we test the suitability of a rapid non-destructive DNA extraction method for metabarcoding of freshwater invertebrate samples.

RESULTS: This method resulted in detection of taxa from many taxonomic groups, comparable to results obtained with two other tissue-based extraction methods. Most taxa could also be successfully used for subsequent individual-based DNA barcoding and taxonomic identification. The method was successfully applied to field-collected invertebrate samples stored for taxonomic studies in 70% ethanol at room temperature, a commonly used storage method for freshwater samples.

DISCUSSION: With further refinement and testing, non-destructive extraction has the potential to rapidly characterise species biodiversity in invertebrate samples, while preserving specimens for taxonomic investigation.

Ubiquitous abundance distribution of non-dominant plankton across the global ocean.

Marine plankton populate 70% of Earth's surface, providing the energy that fuels ocean food webs and contributing to global biogeochemical cycles. Plankton communities are extremely diverse and geographically variable, and are overwhelmingly composed of low-abundance species. The role of this rare biosphere and its ecological underpinnings are however still unclear. Here, we analyse the extensive dataset generated by the Tara Oceans expedition for marine microbial eukaryotes (protists) and use an adaptive algorithm to explore how metabarcoding-based abundance distributions vary across plankton communities in the global ocean. We show that the decay in abundance of non-dominant operational taxonomic units, which comprise over 99% of local richness, is commonly governed by a power-law. Despite the high spatial turnover in species composition, the power-law exponent varies by less than 10% across locations and shows no biogeographical signature, but is weakly modulated by cell size. Such striking regularity suggests that the assembly of plankton communities in the dynamic and highly variable ocean environment is governed by large-scale ubiquitous processes. Understanding their origin and impact on plankton ecology will be important for evaluating the resilience of marine biodiversity in a changing ocean.
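For readers wondering what estimating a power-law exponent looks like in practice, here is a minimal Python sketch that fits a straight line to log(abundance) against log(rank). This is a deliberately simple stand-in, not the adaptive algorithm used in the paper; treating the data as a rank-abundance curve is my assumption, and plain log-log least squares is known to be a crude estimator on real, noisy data.

```python
import math


def fit_power_law_exponent(abundances):
    """Estimate gamma in abundance ~ rank ** (-gamma) by ordinary least
    squares on the log-log rank-abundance curve.

    abundances: iterable of positive OTU abundances (needs at least two).
    Returns the exponent as a positive number for a decaying curve.
    """
    ranked = sorted((a for a in abundances if a > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(a) for a in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying curve; report its magnitude.
    return -sxy / sxx
```

Comparing the fitted exponent across samples would be the analogue of the paper's observation that the exponent varies by less than 10% across locations.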

Metaxa2 Database Builder: Enabling taxonomic identification from metagenomic or metabarcoding data using any genetic marker.

MOTIVATION: Correct taxonomic identification of DNA sequences is central to studies of biodiversity using both shotgun metagenomic and metabarcoding approaches. However, no genetic marker gives sufficient performance across all the biological kingdoms, hampering studies of taxonomic diversity in many groups of organisms. This has led to the adoption of a range of genetic markers for DNA metabarcoding. While many taxonomic classification software tools can be re-trained on these genetic markers, they are often designed with assumptions that impair their utility on genes other than the SSU and LSU rRNA. Here, we present an update to Metaxa2 that enables the use of any genetic marker for taxonomic classification of metagenome and amplicon sequence data.

RESULTS: We evaluated the Metaxa2 Database Builder on eleven commonly used barcoding regions and found that while there are wide differences in performance between different genetic markers, our software performs satisfactorily provided that the input taxonomy and sequence data are of high quality.

AVAILABILITY: Freely available on the web as part of the Metaxa2 package at http://microbiology.se/software/metaxa2/.

Genome skimming herbarium specimens for DNA barcoding and phylogenomics.

BACKGROUND: The world's herbaria contain millions of specimens, collected and named by thousands of researchers, over hundreds of years. However, this treasure has remained largely inaccessible to genetic studies, because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates.

RESULTS: As a practical test of routine recovery of rDNA and plastid genome sequences from herbarium specimens, we sequenced 25 herbarium specimens up to 80 years old from 16 different Angiosperm families. Paired-end reads were generated, yielding successful plastid genome assemblies for 23 species and nuclear rDNAs for 24 species, respectively. These data showed that genome skimming can be used to generate genomic information from herbarium specimens as old as 80 years and using as little as 500 pg of degraded starting DNA.

CONCLUSIONS: The routine plastome sequencing from herbarium specimens is feasible and cost-effective (compare with Sanger sequencing or plastome-enrichment approaches), and can be performed with limited sample destruction.

Over 2.5 million COI sequences in GenBank and growing (Preprint)

The increasing popularity of cytochrome c oxidase subunit 1 (COI) DNA metabarcoding warrants a careful look at the underlying reference databases used to make high-throughput taxonomic assignments. The objectives of this study are to document trends and assess the future usability of COI records for metabarcode identification. Over 2.5 million COI sequences were found in GenBank, half of which were fully identified to the species rank. From 2003 to 2017, the number of COI Eukaryote records deposited has grown by two orders of magnitude representing a nearly 42-fold increase in unique species. For fully identified records, 92% are at least 500 bp in length, 74% have a country annotation, and 51% have latitude-longitude annotations. To ensure the future usability of COI records in GenBank we suggest: 1) Improving the geographic representation of COI records 2) Improving the cross-referencing of COI records in the Barcode of Life Data System and GenBank to facilitate consolidation and incorporation into existing bioinformatic pipelines, 3) Adherence to the minimum information about a marker gene sequence guidelines, and 4) Integrating metabarcodes from eDNA and mixed community studies with existing sequences. COI metabarcoders are normally considered consumers of taxonomic data. Here we discuss the potential for taxonomists to reverse this pattern and instead mine metabarcode data to guide species discovery. The growth of COI reference records over the past 15 years has been substantial and is likely to be a resource across many fields for years to come.

Friday, June 22, 2018
