Monday, January 23, 2017

Monday reads

Welcome to another week. This weekend saw the birth of 'alternative facts', an expression most people know better as 'falsehood', but never mind. We also witnessed a very powerful expression of democracy in the Women's March on Washington and all the sister marches around the world.

It's Monday, time for science again, so here are some good reads for the week.

Alert! Shameless self-promotion to follow. The first two reads include me as co-author. Number one is the result of collaborative work by colleagues around the Mediterranean Sea.
Cartilaginous fish are particularly vulnerable to anthropogenic stressors and environmental change because of their K-selected reproductive strategy. Accurate data from scientific surveys and landings are essential to assess conservation status and to develop robust protection and management plans. Currently available data are often incomplete or incorrect as a result of inaccurate species identifications, due to a high level of morphological stasis, especially among closely related taxa. Moreover, several diagnostic characters clearly visible in adult specimens are less evident in juveniles. Here we present results generated by the ELASMOMED Consortium, a regional network aiming to sample and DNA-barcode the Mediterranean Chondrichthyans with the ultimate goal to provide a comprehensive DNA barcode reference library. This library will support and improve the molecular taxonomy of this group and the effectiveness of management and conservation measures. We successfully barcoded 882 individuals belonging to 42 species (17 sharks, 24 batoids and one chimaera), including four endemic and several threatened ones. Morphological misidentifications were found across most orders, further confirming the need for a comprehensive DNA barcoding library as a valuable tool for the reliable identification of specimens in support of taxonomists who are reviewing current identification keys. Despite low intraspecific variation among their barcode sequences and reduced sample size, five species showed preliminary evidence of phylogeographic structure. Overall, the ELASMOMED initiative further emphasizes the key role accurate DNA barcoding libraries play in establishing reliable diagnostic species-specific features in otherwise taxonomically problematic groups for biodiversity management and conservation actions.

Number two is the result of some follow-up emails to a blog post I wrote a while ago.
No abstract, as it is a commentary paper.

And now to all the good ones.
Climate change may result in ecological futures with novel species assemblages, trophic mismatch, and mass extinction. Alaska has a limited taxonomic workforce to address these changes. We are building a DNA barcode library to facilitate a metabarcoding approach to monitoring non-marine arthropods. Working with the Canadian Centre for DNA Barcoding, we obtained DNA barcodes from recently collected and authoritatively identified specimens in the University of Alaska Museum (UAM) Insect Collection and the Kenai National Wildlife Refuge collection. We submitted tissues from 4776 specimens, of which 81% yielded DNA barcodes representing 1662 species and 1788 Barcode Index Numbers (BINs), of primarily terrestrial, large-bodied arthropods. This represents 84% of the species available for DNA barcoding in the UAM Insect Collection. There are now 4020 Alaskan arthropod species represented by DNA barcodes, after including all records in Barcode of Life Data Systems (BOLD) of species that occur in Alaska - i.e., 48.5% of the 8277 Alaskan, non-marine-arthropod, named species have associated DNA barcodes. An assessment of the identification power of the library in its current state yielded fewer species-level identifications than expected, but the results were not discouraging. We believe we are the first to deliberately begin development of a DNA barcode library of the entire arthropod fauna for a North American state or province. Although far from complete, this library will become increasingly valuable as more species are added and costs to obtain DNA sequences fall.

DNA metabarcoding is a promising approach for rapidly surveying biodiversity and is likely to become an important tool for measuring ecosystem responses to environmental change. Metabarcoding markers need sufficient taxonomic coverage to detect groups of interest, sufficient sequence divergence to resolve species, and will ideally indicate relative abundance of taxa present. We characterized zooplankton assemblages with three different metabarcoding markers (nuclear 18S rDNA, mitochondrial COI, and mitochondrial 16S rDNA) to compare their performance in terms of taxonomic coverage, taxonomic resolution, and correspondence between morphology- and DNA-based identification. COI amplicons sequenced on separate runs showed that operational taxonomic units representing >0.1% of reads per sample were highly reproducible, although slightly more taxa were detected using a lower annealing temperature. Mitochondrial COI and nuclear 18S showed similar taxonomic coverage across zooplankton phyla. However, mitochondrial COI resolved up to threefold more taxa to species compared to 18S. All markers revealed similar patterns of beta-diversity, although different taxa were identified as the greatest contributors to these patterns for 18S. For calanoid copepod families, all markers displayed a positive relationship between biomass and sequence reads, although the relationship was typically strongest for 18S. The use of COI for metabarcoding has been questioned due to lack of conserved primer-binding sites. However, our results show the taxonomic coverage and resolution provided by degenerate COI primers, combined with a comparatively well-developed reference sequence database, make them valuable metabarcoding markers for biodiversity assessment.
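The abstract above mentions that OTUs representing more than 0.1% of reads per sample were highly reproducible. As a rough illustration of what such a cutoff looks like in practice, here is a minimal Python sketch. The function name, the OTU labels, and the read counts are all made up for this example and are not taken from the paper or its pipeline.

```python
def filter_otus_by_relative_abundance(read_counts, min_fraction=0.001):
    """Keep OTUs whose share of the sample's reads exceeds min_fraction.

    read_counts: dict mapping an OTU label to its read count in ONE sample.
    min_fraction: 0.001 corresponds to the 0.1% figure quoted in the abstract.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}  # an empty sample has nothing to keep
    return {
        otu: count
        for otu, count in read_counts.items()
        if count / total > min_fraction
    }


# Hypothetical sample with 10,000 reads in total.
sample = {"OTU_1": 9500, "OTU_2": 495, "OTU_3": 5}
kept = filter_otus_by_relative_abundance(sample)
print(kept)  # OTU_3 holds 0.05% of the reads and is dropped
```

Applying the threshold per sample rather than across the whole run keeps a taxon that is rare overall but common in a single sample.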

Understanding the diversity and composition of species assemblages and identifying underlying biotic and abiotic determinants represent great ecological challenges. Addressing some of these issues, we investigated the α-diversity and phylogenetic composition of species-rich geometrid moth (Lepidoptera: Geometridae) assemblages in the mature temperate forest on Changbai Mountain. A total of 9285 geometrid moths representing 131 species were collected, with many species displaying wide elevational distribution ranges. Moth α-diversity decreased monotonically, while the standardized effect size of mean pairwise phylogenetic distances (MPD) and phylogenetic diversity (PD) increased significantly with increasing elevation. At high elevations, the insect assemblages consisted largely of habitat generalists that were individually more phylogenetically distinct from co-occurring species than species in assemblages at lower altitudes. This could hint at higher speciation rates in more favourable low-elevation environments generating a species-rich geometrid assemblage, while exclusion of phylogenetically closely related species becomes increasingly important in shaping moth assemblages at higher elevations. Overall, it appears likely that high-elevation temperate moth assemblages are strongly resilient to environmental change, and that they contain a much larger proportion of the genetic diversity encountered at low-elevation assemblages in comparison to tropical geometrid communities.

...and some bioinformatics
DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5'-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way.
We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe.
DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.
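To make the filtering idea more concrete, here is a minimal Python sketch of keeping only sequences that reach a minimum copy number in a minimum number of PCR replicates. This is not DAMe's code or its interface: the function name, parameter names, and toy sequences are assumptions made for illustration only.

```python
from collections import Counter


def filter_by_replicates(replicates, min_copies=2, min_replicates=2):
    """Return sequences supported by enough PCR replicates of one sample.

    replicates: list of dicts, one per PCR replicate, mapping a unique
                sequence to its copy number in that replicate.
    A replicate "supports" a sequence if it holds at least min_copies copies.
    A sequence is kept if at least min_replicates replicates support it.
    The returned value per sequence is its total copy number over all replicates.
    """
    support = Counter()
    for replicate in replicates:
        for sequence, copies in replicate.items():
            if copies >= min_copies:
                support[sequence] += 1

    return {
        sequence: sum(rep.get(sequence, 0) for rep in replicates)
        for sequence, n_supporting in support.items()
        if n_supporting >= min_replicates
    }


# Hypothetical sample amplified in three PCR replicates.
pcr_replicates = [
    {"ACGT": 10, "ACGA": 1, "TTTT": 3},
    {"ACGT": 8, "TTTT": 1},
    {"ACGT": 12, "ACGA": 4},
]
retained = filter_by_replicates(pcr_replicates)
print(retained)  # only "ACGT" passes both thresholds
```

The two thresholds trade sensitivity for stringency: raising either one removes more PCR and sequencing errors but also risks discarding genuinely rare taxa.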

This study presents a machine learning method that increases the number of identified bases in Sanger sequencing. The system post-processes a KB basecalled chromatogram. It selects a recoverable subset of N-labels in the KB-called chromatogram to replace with basecalls (A,C,G,T). An N-label correction is defined given an additional read of the same sequence, and a human finished sequence. Corrections are added to the dataset when an alignment determines the additional read and human agree on the identity of the N-label. KB must also rate the replacement with a quality value of > 60 in the additional read. Corrections are only available during system training. To develop the system, nearly 850 000 N-labels were obtained from Barcode of Life Data Systems, the premier database of genetic markers called DNA Barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and assures analysis correctness. Keeping with barcoding standards, our system maintains an error rate of < 1%. Our system only applies corrections when it estimates a low rate of error. Tested on this data, our automation selects and recovers: 79% of N-labels from COI (animal barcode); 80% from matK and rbcL (plant barcodes); and 58% from non-protein-coding sequences (across eukaryotes).
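The rule for building the training corrections can be sketched in a few lines of Python. This is only an illustration of the agreement-plus-quality criterion described in the abstract, not the authors' system. It assumes the three sequences have already been aligned to the same coordinates with no gaps, and every name and value below is hypothetical.

```python
VALID_BASES = {"A", "C", "G", "T"}


def collect_n_corrections(primary, additional, additional_quals, finished,
                          min_quality=60):
    """Map each correctable N position in `primary` to its replacement base.

    primary:          basecalled read containing N-labels
    additional:       second read of the same sequence
    additional_quals: per-base quality values for `additional`
    finished:         human-finished sequence
    All four inputs are assumed to be pre-aligned and of equal length.
    """
    lengths = {len(primary), len(additional), len(additional_quals), len(finished)}
    if len(lengths) != 1:
        raise ValueError("inputs must be aligned to the same length")

    corrections = {}
    for position, base in enumerate(primary):
        if base != "N":
            continue
        candidate = additional[position]
        agrees = candidate in VALID_BASES and candidate == finished[position]
        # The abstract requires a quality value strictly greater than 60.
        if agrees and additional_quals[position] > min_quality:
            corrections[position] = candidate
    return corrections


# Toy example: three N-labels, only one meets both conditions.
found = collect_n_corrections(
    primary="ACNTNGN",
    additional="ACGTAGC",
    additional_quals=[70, 70, 65, 70, 40, 70, 80],
    finished="ACGTAGT",
)
print(found)
```

In the toy input, position 2 is corrected, position 4 fails the quality check, and position 6 fails because the additional read and the finished sequence disagree.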
