Biodiversity Science: April 2018

Lots of work and distractions keep me from blogging these days. I hope to get back to my old routine in the coming weeks. Meanwhile, here are some more papers to read:

Performance of amplicon and shotgun sequencing for accurate biomass estimation in invertebrate community samples

New applications of DNA and RNA sequencing are expanding the field of biodiversity discovery and ecological monitoring, yet questions remain regarding precision and efficiency. Due to primer bias, the ability of metabarcoding to accurately depict biomass of different taxa from bulk communities remains unclear, while PCR-free whole mitochondrial genome (mitogenome) sequencing may provide a more reliable alternative. Here we used a set of documented mock communities comprising 13 species of freshwater macroinvertebrates of estimated individual biomass, to compare the detection efficiency of COI metabarcoding (3 different amplicons) and shotgun mitogenome sequencing. Additionally, we used individual COI barcoding and de novo mitochondrial genome sequencing, to provide reference sequences for OTU assignment and metagenome mapping (mitogenome-skimming) respectively. We found that even though both methods occasionally failed to recover very low abundance species, metabarcoding was less consistent, by failing to recover some species with higher abundances, probably due to primer bias. Shotgun sequencing results provided highly significant correlations between read number and biomass in all but one species. Conversely, the read-biomass relationships obtained from metabarcoding varied across amplicons. Specifically, we found significant relationships for 8 out of 13 (amplicons B1FR-450bp, FF130R-130bp) or 4 out of 13 (amplicon FFFR, 658bp) species. Combining the results of all three COI amplicons (multi-amplicon approach) improved the read-biomass correlations for some of the species. Overall, mitogenomic sequencing yielded more informative predictions of biomass content from bulk macroinvertebrate communities than metabarcoding. However, for large scale ecological studies, metabarcoding currently remains the most commonly used approach for diversity assessment.
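The read-biomass relationships described above can be illustrated with a small correlation check. This is only a minimal sketch: the abstract does not say which statistic the authors used, so Pearson's r is an assumption here, and all numbers are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for ONE species across five mock communities
# (not taken from the paper).
biomass_mg = [1.0, 2.0, 4.0, 8.0, 16.0]
shotgun_reads = [110, 190, 420, 790, 1610]    # tracks biomass closely
amplicon_reads = [900, 40, 1200, 300, 700]    # distorted, e.g. by primer bias

r_shotgun = pearson_r(biomass_mg, shotgun_reads)
r_amplicon = pearson_r(biomass_mg, amplicon_reads)
print(f"shotgun r = {r_shotgun:.3f}, amplicon r = {r_amplicon:.3f}")
```

In a real analysis this would be repeated per species and per amplicon, with a significance test on each correlation.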

Estimating intraspecific genetic diversity from community DNA metabarcoding data

BACKGROUND: DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors in high-throughput sequencing instruments are fairly common, usually requiring reads to be clustered into operational taxonomic units (OTUs), losing information on intraspecific diversity in the process. While Cytochrome c oxidase subunit I (COI) haplotype information is limited in resolving intraspecific diversity, it is nevertheless often useful, e.g. in a phylogeographic context, helping to formulate hypotheses on taxon distribution and dispersal.

METHODS: This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotype information from freshwater macroinvertebrate metabarcoding datasets. This novel approach was added to the R package "JAMP" and can be applied to COI amplicon datasets. We tested our haplotyping method by sequencing (i) a single-species mock community composed of 31 individuals with 15 different haplotypes spanning three orders of magnitude in biomass and (ii) 18 monitoring samples each amplified with four different primer sets and two PCR replicates.

RESULTS: We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removes most unexpected haplotypes, but also can discard expected haplotypes mainly from the small specimens. In the monitoring samples, the different primer sets detected 177-200 OTUs, each containing an average of 2.40-3.30 haplotypes per OTU. The derived intraspecific diversity data showed population structures that were consistent between replicates and similar between primer pairs but resolution depended on the primer length. A closer look at abundant taxa in the dataset revealed various population genetic patterns, e.g. the stonefly Taeniopteryx nebulosa and the caddisfly Hydropsyche pellucidula showed a distinct north-south cline with respect to haplotype distribution, while the beetle Oulimnius tuberculatus and the isopod Asellus aquaticus displayed no clear population pattern but differed in genetic diversity.

DISCUSSION: We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate metabarcoding data. It needs to be stressed that at this point this metabarcoding-informed haplotyping is not capable of capturing the full diversity present in such samples, due to variation in specimen size, primer bias and loss of sequence variants with low abundance. Nevertheless, for a high number of species intraspecific diversity was recovered, identifying potentially isolated populations and taxa for further more detailed phylogeographic investigation. While we are currently lacking large-scale metabarcoding datasets to fully take advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that not only seek information about species diversity but also underlying genetic diversity.
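The trade-off between relaxed and rigorous filtering described in the abstract above can be sketched with a simple relative-abundance cutoff. This is not the JAMP implementation (JAMP is an R package, and its denoising step is omitted here). The function name, the threshold values and the read counts are all hypothetical.

```python
def filter_haplotypes(counts, min_rel_abundance=0.01):
    """Keep sequence variants whose share of the sample's reads meets the threshold.

    counts: dict mapping a variant label (or sequence) to its read count.
    min_rel_abundance: hypothetical cutoff, as a fraction of all reads in the sample.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n for seq, n in counts.items() if n / total >= min_rel_abundance}


# Invented read counts for four variants of one species in one sample.
sample = {"hap_A": 9000, "hap_B": 900, "hap_C": 90, "hap_D": 5}

relaxed = filter_haplotypes(sample, min_rel_abundance=0.001)
strict = filter_haplotypes(sample, min_rel_abundance=0.05)

print(sorted(relaxed))  # keeps the rare hap_C, which may be real or an error
print(sorted(strict))   # drops hap_C, even if it came from a small specimen
```

A stricter cutoff removes more spurious variants but also discards genuine haplotypes that contributed few reads, which is the loss the authors report for small specimens.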

Phylogeographic structure in three North American tent caterpillar species (Lepidoptera: Lasiocampidae): Malacosoma americana, M. californica, and M. disstria.

While phylogeographic structure has been examined in many North American vertebrate species, insects have received much less attention despite their central ecological roles. The moth genus Malacosoma (Hübner, 1820), is an important group of forestry pests responsible for large-scale defoliation across much of the Nearctic and Palearctic. The present study uses sequence variation in the mitochondrial cytochrome c oxidase 1 (COI) gene to examine the population genetic structure of the three widespread Malacosoma species (M. americana, M. californica, and M. disstria). Populations of all three species showed highest diversity in the south, suggesting that modern populations derived from southern refugia with loss of variation as these lineages dispersed northwards. However, despite similar life histories and dispersal abilities, the extent of regional variation varied among the taxa. M. americana, a species restricted to eastern North America, showed much less genetic structure than the western M. californica or the widespread M. disstria. The regional differentiation in the latter reflects the likely derivation of modern lineages from several refugia, as well as taxonomic uncertainty in M. californica. In these respects, the three species of Malacosoma share phylogeographic patterns similar to those detected in vertebrates which are characterised by greater phylogeographic breaks in the western half of the continent and limited structure in the east.

DNA metabarcoding and microscopic analyses of sea turtles biofilms: Complementary to understand turtle behavior

Sea turtles are distributed in tropical and subtropical seas worldwide. They play several ecological roles and are considered important indicators of the health of marine ecosystems. Studying epibiotic diatoms living on turtle shells appears to have great potential for the study of turtle behavior because diatoms are always present. However, diatom identification at the species level is time consuming, requires well-trained specialists, and there is a high probability of finding new taxa growing on turtle shells, which makes identification tricky. An alternative approach based on DNA barcoding and high throughput sequencing (HTS), metabarcoding, has been developed in recent years to identify species at the community level by using a DNA reference library. The suitability of the morphological and molecular approaches was compared. Diatom assemblages were sampled from seven juvenile green turtles (Chelonia mydas) from Mayotte Island, France. The structures of the epibiotic diatom assemblages differed between the two approaches. This resulted in different clustering of the turtles based on their diatom communities. Metabarcoding allowed better discrimination between turtles based on their epibiotic diatom assemblages and revealed the presence of cryptic diatom diversity. Microscopy, for its part, provided more ecological information about sea turtles based on historical bibliographical data and the abundances of ecological guilds of the diatom species present in the samples. This study shows the complementary nature of these two methods for studying turtle behavior.

Real-time DNA barcoding in a rainforest using nanopore sequencing: opportunities for rapid biodiversity assessments and local capacity building

BACKGROUND: Advancements in portable scientific instruments provide promising avenues to expedite field work in order to understand the diverse array of organisms that inhabit our planet. Here, we tested the feasibility for in situ molecular analyses of endemic fauna using a portable laboratory fitting within a single backpack in one of the world's most imperiled biodiversity hotspots, the Ecuadorian Chocó rainforest. We used portable equipment, including the MinION nanopore sequencer (Oxford Nanopore Technologies) and the miniPCR (miniPCR), to perform DNA extraction, polymerase chain reaction amplification, and real-time DNA barcoding of reptile specimens in the field.

FINDINGS: We demonstrate that nanopore sequencing can be implemented in a remote tropical forest to quickly and accurately identify species using DNA barcoding, as we generated consensus sequences for species resolution with an accuracy of >99% in less than 24 hours after collecting specimens. The flexibility of our mobile laboratory further allowed us to generate sequence information at the Universidad Tecnológica Indoamérica in Quito for rare, endangered, and undescribed species. This includes the recently rediscovered Jambato toad, which was thought to be extinct for 28 years. Sequences generated on the MinION required as few as 30 reads to achieve high accuracy relative to Sanger sequencing, and with further multiplexing of samples, nanopore sequencing can become a cost-effective approach for rapid and portable DNA barcoding.

CONCLUSIONS: Overall, we establish how mobile laboratories and nanopore sequencing can help to accelerate species identification in remote areas to aid in conservation efforts and be applied to research facilities in developing countries. This opens up possibilities for biodiversity studies by promoting local research capacity building, teaching nonspecialists and students about the environment, tackling wildlife crime, and promoting conservation via research-focused ecotourism.

Implementation options for DNA-based identification into ecological status assessment under the European Water Framework Directive

Assessment of ecological status for the European Water Framework Directive (WFD) is based on "Biological Quality Elements" (BQEs), namely phytoplankton, benthic flora, benthic invertebrates and fish. Morphological identification of these organisms is a time-consuming and expensive procedure. Here, we assess the options for complementing and, perhaps, replacing morphological identification with procedures using eDNA, metabarcoding or similar approaches. We rate the applicability of DNA-based identification for the individual BQEs and water categories (rivers, lakes, transitional and coastal waters) against eleven criteria, summarised under the headlines representativeness (for example suitability of current sampling methods for DNA-based identification, errors from DNA-based species detection), sensitivity (for example capability to detect sensitive taxa, unassigned reads), precision of DNA-based identification (knowledge about uncertainty), comparability with conventional approaches (for example sensitivity of metrics to differences in DNA-based identification), cost effectiveness and environmental impact. Overall, suitability of DNA-based identification is particularly high for fish, as eDNA is a well-suited sampling approach which can replace expensive and potentially harmful methods such as gill-netting, trawling or electrofishing. Furthermore, there are attempts to replace absolute by relative abundance in metric calculations. For invertebrates and phytobenthos, the main challenges include the modification of indices and completing barcode libraries. For phytoplankton, the barcode libraries are even more problematic, due to the high taxonomic diversity in plankton samples. If current assessment concepts are kept, DNA-based identification is least appropriate for macrophytes (rivers, lakes) and angiosperms/macroalgae (transitional and coastal waters), which are surveyed rather than sampled. We discuss general implications of implementing DNA-based identification into standard ecological assessment, in particular considering any adaptations to the WFD that may be required to facilitate the transition to molecular data.

High-throughput terrestrial biodiversity assessments: mitochondrial metabarcoding, metagenomics or metatranscriptomics?

Consensus on the optimal high-throughput sequencing (HTS) approach to examine biodiversity in mixed terrestrial arthropod samples has not been reached. Metatranscriptomics could increase the proportion of taxonomically informative mitochondrial reads in HTS outputs but has not been investigated for terrestrial arthropod samples. We compared the efficiency of 16S rRNA metabarcoding, metagenomics and metatranscriptomics for detecting species in a mixed terrestrial arthropod sample (pooled DNA/RNA from 38 taxa). 16S rRNA metabarcoding and nuclear rRNA-depleted metatranscriptomics had the highest detection rate with 97% of input species detected. Based on cytochrome c oxidase I, metagenomics had the highest detection rate with 82% of input species detected, but metatranscriptomics produced a larger proportion of reads matching (Sanger) reference sequences. Metatranscriptomics with nuclear rRNA depletion may offer advantages over metabarcoding through reducing the number of spurious operational taxonomic units while retaining high detection rates, and offers natural enrichment of mitochondrial sequences which may enable increased species detection rates compared with metagenomics.

Environmental DNA filtration techniques affect recovered biodiversity

Freshwater metazoan biodiversity assessment using environmental DNA (eDNA) captured on filters offers new opportunities for water quality management. Filtering of water in the field is a logistical advantage compared to transport of water to the nearest lab, and thus, appropriate filter preservation becomes crucial for maximum DNA recovery. Here, the effect of four different filter preservation strategies, two filter types, and pre-filtration were evaluated by measuring metazoan diversity and community composition, using eDNA collected from a river and a lake ecosystem. The filters were preserved cold on ice, in ethanol, in lysis buffer and dry in silica gel. Our results show that filters preserved either dry or in lysis buffer give the most consistent community composition. In addition, mixed cellulose ester filters yield more consistent community composition than polyethersulfone filters, while the effect of pre-filtration remained ambiguous. Our study facilitates development of guidelines for aquatic community-level eDNA biomonitoring, and we advocate filtering in the field, using mixed cellulose ester filters and preserving the filters either dry or in lysis buffer.

Counting with DNA in metabarcoding studies: how should we convert sequence reads to dietary data? (Preprint)

Advances in DNA sequencing technology have revolutionised the field of molecular analysis of trophic interactions and it is now possible to recover counts of food DNA barcode sequences from a wide range of dietary samples. But what do these counts mean? To obtain an accurate estimate of the overall diet of a consumer should we work strictly with datasets summarising the frequency of occurrence of different food taxa, or is it possible to use the relative number of sequences? Both approaches are applied in the dietary metabarcoding literature, but occurrence data is often promoted as a more conservative and reliable option due to taxa-specific biases in recovery of sequences. Here, we point out that diet summaries based on occurrence data overestimate the importance of food consumed in small quantities (potentially including low-level contaminants) and are sensitive to the count threshold used to define an occurrence. Our simulations indicate that even with recovery biases incorporated, using relative read abundance (RRA) information can provide a more accurate view of population-level diet in many scenarios. The ideas presented here highlight the need to consider all sources of bias and to justify the methods used to interpret count data in dietary metabarcoding studies. We encourage researchers to continue addressing methodological challenges, and acknowledge unanswered questions to help spur future investigations in this rapidly developing area of research.
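The two kinds of diet summary contrasted above can be sketched on a toy dataset. The definitions below are assumptions made for illustration: occurrence is the percentage of samples with at least `min_reads` reads of a taxon, and RRA is the mean per-sample read proportion. The function names and read counts are hypothetical.

```python
def percent_frequency_of_occurrence(samples, min_reads=1):
    """Percent of samples in which each taxon has at least min_reads reads."""
    taxa = sorted({taxon for sample in samples for taxon in sample})
    n = len(samples)
    return {
        taxon: 100.0 * sum(1 for s in samples if s.get(taxon, 0) >= min_reads) / n
        for taxon in taxa
    }


def relative_read_abundance(samples):
    """Mean per-sample proportion of reads for each taxon, in percent."""
    taxa = sorted({taxon for sample in samples for taxon in sample})
    non_empty = [s for s in samples if sum(s.values()) > 0]
    if not non_empty:
        raise ValueError("no sample contains any reads")
    return {
        taxon: 100.0
        * sum(s.get(taxon, 0) / sum(s.values()) for s in non_empty)
        / len(non_empty)
        for taxon in taxa
    }


# Invented read counts for four dietary samples (e.g. scats).
samples = [
    {"krill": 950, "fish": 50},
    {"krill": 990, "fish": 10},
    {"krill": 900, "fish": 100},
    {"krill": 1000},
]

foo = percent_frequency_of_occurrence(samples)
foo_strict = percent_frequency_of_occurrence(samples, min_reads=20)
rra = relative_read_abundance(samples)
print(foo, foo_strict, rra)
```

In this toy case fish occurs in 75% of samples but contributes only about 4% of reads, and raising the occurrence threshold to 20 reads cuts its occurrence to 50%. That shows both the overweighting of minor items and the threshold sensitivity the authors point out.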

Plant DNA metabarcoding of lake sediments: How does it represent the contemporary vegetation

Metabarcoding of lake sediments has been shown to reveal current and past biodiversity, but little is known about the degree to which taxa growing in the vegetation are represented in environmental DNA (eDNA) records. We analysed composition of lake and catchment vegetation and vascular plant eDNA at 11 lakes in northern Norway. Out of 489 records of taxa growing within 2 m from the lake shore, 17-49% (mean 31%) of the identifiable taxa recorded were detected with eDNA. Of the 217 eDNA records of 47 plant taxa in the 11 lakes, 73% and 12% matched taxa recorded in vegetation surveys within 2 m and up to about 50 m away from the lakeshore, respectively, whereas 16% were not recorded in the vegetation surveys of the same lake. The latter include taxa likely overlooked in the vegetation surveys or growing outside the survey area. The percentages detected were 61, 47, 25, and 15 for dominant, common, scattered, and rare taxa, respectively. Similar numbers for aquatic plants were 88, 88, 33 and 62%, respectively. Detection rate and taxonomic resolution varied among plant families and functional groups with good detection of e.g. Ericaceae, Rosaceae, deciduous trees, ferns, club mosses and aquatics. The representation of terrestrial taxa in eDNA depends on both their distance from the sampling site and their abundance and is sufficient for recording vegetation types. For aquatic vegetation, eDNA may be comparable with, or even superior to, in-lake vegetation surveys and may therefore be used as a tool for biomonitoring. For reconstruction of terrestrial vegetation, technical improvements and more intensive sampling are needed to detect a higher proportion of rare taxa, although DNA of some taxa may never reach the lake sediments due to taphonomical constraints. Nevertheless, eDNA performs similarly to conventional methods of pollen and macrofossil analyses and may therefore be an important tool for reconstruction of past vegetation.

Tuesday, April 24, 2018

Heineken Prize for Paul Hebert

Friday, April 20, 2018

Weekend reads