Friday, August 24, 2018

Weekend reads

A lot of papers have piled up in the past few weeks, and there is plenty to recommend for thorough reading. Let's start with some that were written by colleagues at our institute:

PREMISE OF THE STUDY:
The detection of environmental DNA (eDNA) using high-throughput sequencing has rapidly emerged as a method to detect organisms from environmental samples. However, eDNA studies of aquatic biomes have focused on surveillance of animal species with less emphasis on plants. Pondweeds are important bioindicators of freshwater ecosystems, although their diversity is underestimated due to difficulties in morphological identification and monitoring.
METHODS:
A protocol was developed to detect pondweeds in water samples using atpB-rbcL and ITS2 markers. The water samples were collected from the Grand River within the rare Charitable Research Reserve, Ontario (RARE). Short fragments were amplified using primers targeting pondweeds, sequenced on an Ion Torrent Personal Genome Machine, and assigned to the taxonomy using a local DNA reference library and GenBank.
RESULTS:
We detected two species earlier documented at the experimental site during ecological surveys (Potamogeton crispus and Stuckenia pectinata) and three species new to the RARE checklist (P. foliosus, S. filiformis, and Zannichellia palustris).
DISCUSSION:
Our targeted approach to track the species composition of pondweeds in freshwater ecosystems revealed underestimation of their diversity. This result suggests that eDNA is an effective tool for monitoring plant diversity in aquatic habitats.

Characterisation of freshwater benthic biodiversity using DNA metabarcoding may allow more cost-effective environmental assessments than the current morphological-based assessment methods. DNA metabarcoding methods where sorting or pre-sorting of samples are avoided altogether are especially interesting, since the time between sampling and taxonomic identification is reduced. Due to the presence of non-target material like plants and sediments in crude samples, DNA extraction protocols become important for maximising DNA recovery and sample replicability. We sampled freshwater invertebrates from six river and lake sites and extracted DNA from homogenised bulk samples in quadruplicate subsamples, using a published method and two commercially available kits: HotSHOT approach, Qiagen DNeasy Blood & Tissue Kit and Qiagen DNeasy PowerPlant Pro Kit. The performance of the selected extraction methods was evaluated by measuring DNA yield and applying DNA metabarcoding to see if the choice of DNA extraction method affects DNA yield and metazoan diversity results. The PowerPlant Kit extractions resulted in the highest DNA yield and a strong significant correlation between sample weight and DNA yield, while the DNA yields of the Blood & Tissue Kit and HotSHOT method did not correlate with the sample weights. Metazoan diversity measures were more repeatable in samples extracted with the PowerPlant Kit compared to those extracted with the HotSHOT method or the Blood & Tissue Kit. Subsampling using Blood & Tissue Kit and HotSHOT extraction failed to describe the same community in the lake samples. Our study exemplifies that the choice of DNA extraction protocol influences the DNA yield as well as the subsequent community analysis. 
Based on our results, low specimen abundance samples will likely provide more stable results if specimens are sorted prior to DNA extraction and DNA metabarcoding, but the repeatability of the DNA extraction and DNA metabarcoding results was close to ideal in high specimen abundance samples.

Background
Macroinvertebrates such as non-biting midges (Chironomidae: Diptera) are important components of freshwater ecosystems. However, they are often neglected in biodiversity and conservation research because invertebrate species richness is difficult and expensive to quantify with traditional methods. We here demonstrate that Next Generation Sequencing barcodes (“NGS barcodes”) can provide relief because they allow for fast and large-scale species-level sorting of large samples at low cost.
Results
We used NGS barcoding to investigate the midge fauna of Singapore’s swamp forest remnant (Nee Soon Swamp Forest). Based on >14,000 barcoded specimens, we find that the swamp forest maintains an exceptionally rich fauna composed of an observed number of 289 species (estimated 336 species) in a very small area (90 ha). We furthermore barcoded the chironomids from three surrounding reservoirs that are located in close proximity. Although the swamp forest remnant is much smaller than the combined size of the freshwater reservoirs in the study (90 ha vs. >450 ha), the latter only contains 33 (estimated 61) species. We show that the resistance of the swamp forest species assemblage is high because only 8 of the 314 species are shared despite the close proximity. Moreover, shared species are not very abundant (3% of all specimens). A redundancy analysis revealed that ~21% of the compositional variance of midge communities within the swamp forest was explained by a range of variables with conductivity, stream order, stream width, temperature, latitude (flow direction), and year being significant factors influencing community structure. An LME analysis demonstrates that the total species richness decreased with increasing conductivity.
Conclusion
Our study demonstrates that midge diversity of a swamp forest can be so high that it questions global species diversity estimates for Chironomidae, which are an important component of many freshwater ecosystems. We furthermore demonstrate that small and natural habitat remnants can have high species turnover and can be very resistant to the invasion of species from neighboring reservoirs. Lastly, the study shows how NGS barcodes can be used to integrate specimen- and species-rich invertebrate taxa in biodiversity and conservation research.
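As a quick sanity check on the species counts quoted in the abstract above, the following minimal sketch recomputes the combined species total from the two observed richness values and the number of shared species. The Jaccard index at the end is my own illustration of how small the overlap is; the abstract does not say that the authors used it, and the function name is hypothetical.

```python
def combined_and_jaccard(richness_a, richness_b, shared):
    """Return (total species across both habitats, Jaccard similarity).

    Illustrative helper only; not taken from the paper.
    """
    # Species found in both habitats must be counted once, not twice.
    total = richness_a + richness_b - shared
    return total, shared / total


# Observed values quoted in the abstract: 289 swamp forest species,
# 33 reservoir species, 8 species shared between the two.
total_species, jaccard = combined_and_jaccard(289, 33, 8)
print(total_species)       # 314, matching "8 of the 314 species"
print(round(jaccard, 3))   # 0.025
```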

The hyporheic zone, i.e. the ecotone between surface water and the groundwater, is a rarely studied freshwater ecosystem. Hyporheic taxa are often meiofaunal (<1 mm) in size and difficult to identify based on morphology. Metabarcoding approaches are promising for the study of these environments and taxa, but it is yet unclear if commonly applied metabarcoding primers and replication strategies can be used. In this study, we took sediment cores from two near natural upstream (NNU) and two ecologically improved downstream (EID) sites in the Boye catchment (Emscher River, Germany), metabarcoding their meiofaunal communities. We evaluated the usability of a commonly used, highly degenerate COI primer pair (BF2/BR2) and tested how sequencing three PCR replicates per sample and removing MOTUs present in only one out of three replicates impacts the inferred community composition. A total of 22,514 MOTUs were detected, of which only 263 were identified as Metazoa. Our results highlight the gaps in reference databases for meiofaunal taxa and the potential problems of using highly degenerate primers for studying samples containing a high number of non-metazoan taxa. Alpha diversity was higher in EID sites and showed higher community similarity when compared to NNU sites. Beta diversity analyses showed that removing MOTUs detected in only one out of three replicates per site greatly increased community similarity in samples. Sequencing three sample replicates and removing rare MOTUs is seen as a good compromise between retaining too many false-positives and introducing too many false-negatives. We conclude that metabarcoding hyporheic communities using highly degenerate COI primers can provide valuable first insights into the diversity of these ecosystems and highlight some potential application scenarios.
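The replicate filter described in the abstract above (dropping MOTUs that occur in only one of three PCR replicates) can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the authors' pipeline: the input layout (one dictionary of read counts per replicate), the function name, and the MOTU identifiers are all hypothetical.

```python
from collections import Counter


def filter_motus_by_replicates(replicates, min_replicates=2):
    """Keep MOTUs detected in at least `min_replicates` PCR replicates.

    `replicates` is a list of dicts mapping MOTU id -> read count,
    one dict per PCR replicate of the same sample (assumed layout).
    Returns summed read counts for the retained MOTUs.
    """
    # Count in how many replicates each MOTU has at least one read.
    presence = Counter()
    for rep in replicates:
        for motu, reads in rep.items():
            if reads > 0:
                presence[motu] += 1

    kept = sorted(m for m, n in presence.items() if n >= min_replicates)
    # Sum reads across replicates for the retained MOTUs.
    return {m: sum(rep.get(m, 0) for rep in replicates) for m in kept}


# Hypothetical read counts for three PCR replicates of one sample.
sample = [
    {"MOTU_1": 120, "MOTU_2": 3, "MOTU_3": 0},
    {"MOTU_1": 98, "MOTU_3": 0, "MOTU_4": 7},
    {"MOTU_1": 110, "MOTU_4": 5},
]
filtered = filter_motus_by_replicates(sample)
print(filtered)  # {'MOTU_1': 328, 'MOTU_4': 12}
```

MOTU_2 appears in a single replicate and is discarded as a likely false positive. Raising `min_replicates` to 3 would be stricter still, at the cost of more false negatives, which is the trade-off the abstract refers to.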

Metabarcoding of complex metazoan communities is increasingly being used to measure biodiversity in terrestrial, freshwater, and marine ecosystems, revolutionizing our ability to observe patterns and infer processes regarding the origin and conservation of biodiversity. A fundamentally important question is which genetic marker to amplify, and although the mitochondrial cytochrome oxidase subunit I (COI) gene is one of the more widely used markers in metabarcoding for the Metazoa, doubts have recently been raised about its suitability. We argue that (i) the extensive coverage of reference-sequence databases for COI, (ii) the variation it presents, (iii) the comparative advantages for denoising protein coding genes, and (iv) recent advances in DNA sequencing protocols argue in favour of standardising for the use of COI for metazoan community samples. We also highlight where research efforts should focus to maximise the utility of metabarcoding.

Resource variation along abiotic gradients influences subsequent trophic interactions and these effects can be transmitted through entire food webs. Interactions along abiotic gradients can provide clues as to how organisms will face changing environmental conditions, such as future range shifts. However, it is challenging to find replicated systems to study these effects. Phytotelmata, such as those found in carnivorous plants, are isolated aquatic communities and thus form a good model for the study of replicated food webs. Due to the degraded nature of the prey, molecular techniques provide a useful tool to study these communities. We studied the pitcher plant Sarracenia purpurea L. in allochthonous populations along an elevational gradient in the Alps and Jura. We predicted that invertebrate richness in the contents of the pitcher plants would decrease with increasing elevation, reflecting harsher environmental conditions. Using metabarcoding of the COI gene, we sequenced the invertebrate contents of these pitcher plants. We assigned Molecular Operational Taxonomic Units at ordinal level as well as recovering species-level data. We found small but significant changes in community composition with elevation. These recovered sequences could belong to invertebrate prey, rotifer inquilines, pollinators and other animals possibly living inside the pitchers. However, we found no directional trend or site-based differences in MOTU richness with elevational gradient. Use of molecular techniques for dietary or contents analysis is a powerful way to examine numerous degraded samples, although factors such as DNA persistence and the relationship to species presence still have to be completely determined. 

DNA metabarcoding is widely used to study prokaryotic and eukaryotic microbial diversity. Technological constraints limit most studies to marker lengths below 600 base pairs (bp). Longer sequencing reads of several thousand bp are now possible with third-generation sequencing. Increased marker lengths provide greater taxonomic resolution and allow for phylogenetic methods of classification, but longer reads may be subject to higher rates of sequencing error and chimera formation. In addition, most bioinformatics tools for DNA metabarcoding were designed for short reads and are therefore unsuitable. Here we used Pacific Biosciences circular consensus sequencing (CCS) to DNA-metabarcode environmental samples using a ca. 4,500 bp marker that included most of the eukaryote SSU and LSU rRNA genes and the complete ITS region. We developed an analysis pipeline that reduced error rates to levels comparable to short-read platforms. Validation using a mock community indicated that our pipeline detected 98% of chimeras de novo. We recovered 947 OTUs from water and sediment samples from a natural lake, 848 of which could be classified to phylum, 397 to genus, and 330 to species. By allowing for the simultaneous use of three databases (Unite, SILVA, RDP LSU), long-read DNA metabarcoding provided better taxonomic resolution than any single marker. We foresee the use of long reads enabling the cross-validation of reference sequences and the synthesis of ribosomal rRNA gene databases. The universal nature of the rRNA operon and our recovery of >100 non-fungal OTUs indicate that long-read DNA metabarcoding holds promise for studies of eukaryotic diversity more broadly.

BACKGROUND:
Taxonomic identification of plants and insects is a hard process that demands expert taxonomists and time, and species are often difficult to distinguish on morphology alone. DNA barcodes allow rapid species discovery and identification and have been widely used for taxonomic identification by targeting known gene regions that discriminate these species. DNA barcode sequence analysis is usually carried out with processes and tools that still demand a high degree of interaction with the user or researcher. To minimise such interaction, we proposed PIPEBAR, a pipeline for the analysis of DNA chromatograms from Sanger sequencing, ensuring high-quality consensus sequences along with efficient running time. We also proposed a paired-end read assembly tool, OverlapPER, which can be used in sequence with or independently of PIPEBAR.
RESULTS:
PIPEBAR is a command-line tool that automates the processing of large numbers of trace files. It is as accurate as the proprietary Geneious tool and faster than most popular software for barcoding analysis: it is 7 times faster than Geneious and 14 times faster than SeqTrace when processing hundreds of barcoding sequences. OverlapPER is a novel tool for accurately overlapping paired-end reads that accepts both substitution and indel errors and returns both overlapped and non-overlapped regions between a pair of reads. OverlapPER obtained the best results compared to currently used tools when merging 1,000,000 simulated paired-end reads.
CONCLUSIONS:
PIPEBAR and OverlapPER run on most operating systems and are freely available, along with supporting code and documentation.
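For readers unfamiliar with what "overlapping paired-end reads" involves, here is a deliberately naive sketch of the general idea. It is not OverlapPER's algorithm: it tolerates substitutions only (no indels), ignores base qualities, and all names, thresholds, and sequences are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(forward, reverse, min_overlap=5, max_mismatch_rate=0.1):
    """Merge a read pair by its longest acceptable overlap, or return None.

    Naive sketch: compares the end of the forward read with the start of
    the reverse-complemented reverse read, allowing substitutions only.
    """
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first, then shorter ones.
    for length in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        tail, head = forward[-length:], rc[:length]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * length:
            # Keep the forward read and append the non-overlapping remainder.
            return forward + rc[length:]
    return None


# Hypothetical 21 bp fragment sequenced from both ends with 14 bp reads,
# so the two reads overlap by 7 bp.
fragment = "ACGTTGCAAGGCTTACGATCG"
forward = fragment[:14]
reverse = reverse_complement(fragment[7:])
merged = merge_pair(forward, reverse)
print(merged == fragment)  # True
```

A pair with no acceptable overlap returns `None`; according to the abstract, OverlapPER itself also reports the non-overlapped regions, which this sketch does not attempt.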

Tuesday, August 21, 2018

From the inbox: Young researcher position

Here is a call for a 30-month young researcher position in the InterReg AlpineSpace project “Eco-AlpsWater”, in the field of DNA biomonitoring of freshwater ecosystems. The position will be hosted by Nico Salmaso, a member of DNAqua-Net.