Friday, June 9, 2017

Weekend reads

After a longer silence due to some changes in my job and travel, I am slowly picking up posting duties again. I have decided to move the barcoding paper suggestions to Friday and rename these posts. If you happen to have nothing else to do on the weekend, or in case you need some good reads for a quiet moment, here they are:

Wine is a complex beverage, comprising hundreds of metabolites produced through the action of yeasts and bacteria in fermenting grape must. Commercially, there is now a growing trend away from using wine yeast (Saccharomyces) starter cultures, towards the historic practice of uninoculated or "wild" fermentation, where the yeasts and bacteria associated with the grapes and/or winery perform the fermentation. It is the varied metabolic contributions of these numerous non-Saccharomyces species that are thought to impart complexity and desirable taste and aroma attributes to wild ferments in comparison to their inoculated counterparts. To map the microflora of spontaneous fermentation, metagenomic techniques were employed to characterize and monitor the progression of fungal species in five different wild fermentations. Both amplicon-based ribosomal DNA internal transcribed spacer (ITS) phylotyping and shotgun metagenomics were used to assess community structure across different stages of fermentation. While providing a sensitive and highly accurate means of characterizing the wine microbiome, the shotgun metagenomic data also uncovered a significant over-abundance bias in the ITS phylotyping abundance estimations for the common non-Saccharomyces wine yeast genus Metschnikowia. By identifying biases such as that observed for Metschnikowia, abundance measurements from future ITS-phylotyping datasets can be corrected to provide more accurate species representation. Ultimately, as more shotgun metagenomic and single-strain de novo assemblies for key wine species become available, the accuracy of both ITS-amplicon and shotgun studies will greatly increase, providing a powerful methodology for deciphering the influence of the microbial community on the wine flavor and aroma.

Genetic barcodes of arctic medusae and meiobenthic cnidarians have uncovered a fortuitous connection between the medusa Plotocnide borealis Wagner, 1885 and the minute, mud-dwelling polyp Boreohydra simplex Westblad, 1937. Little to no sequence differences exist among independently collected samples identified as Boreohydra simplex and Plotocnide borealis, showing that the two different forms represent a single species that is henceforth known by the older name Plotocnide borealis Wagner, 1885. The polyp form has been observed to produce bulges previously hypothesized to be gonophores, and the results here are consistent with that view. Interestingly, the polyp has also been reported to produce egg cells in the epiderm, a surprising phenomenon that we document here for only the second time. Thus, P. borealis produces eggs in two different life stages, polyp and medusa. This is the first documented case of a metagenetic medusozoan species being able to produce gametes in both the medusa and polyp stage. It remains unclear what environmental/ecological conditions modulate the production of eggs and/or medusa buds in the polyp stage. Similarly, sperm production, fertilization and development are unknown, warranting further studies.

The mosquito family (Diptera: Culicidae) constitutes the most medically important group of arthropods because certain species are vectors of human pathogens. In some parts of the world, the diversity is so high that the accurate delimitation and/or identification of species is challenging. A DNA-based identification system for all animals has been proposed, the so-called DNA barcoding approach. In this study, our objectives were (i) to establish DNA barcode libraries for the mosquitoes of French Guiana based on the COI and the 16S markers, (ii) to compare distance-based and tree-based methods of species delimitation to traditional taxonomy, and (iii) to evaluate the accuracy of each marker in identifying specimens. A total of 266 specimens belonging to 75 morphologically identified species or morphospecies were analyzed allowing us to delimit 86 DNA clusters with only 21 of them already present in the BOLD database. We thus provide a substantial contribution to the global mosquito barcoding initiative. Our results confirm that DNA barcodes can be successfully used to delimit and identify mosquito species with only a few cases where the marker could not distinguish closely related species. Our results also validate the presence of new species identified based on morphology, plus potential cases of cryptic species. We found that both COI and 16S markers performed very well, with successful identifications at the species level of up to 98% for COI and 97% for 16S when compared to traditional taxonomy. This shows great potential for the use of metabarcoding for vector monitoring and eco-epidemiological studies.

Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences ('mislabels') using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa.
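To make the sensitivity and precision figures quoted above a little more concrete, here is a minimal Python sketch of how such values are typically computed from a set of known mislabels and a set of flagged sequences. This is not SATIVA's code; the function name and the sequence identifiers are hypothetical and only serve as an illustration.

```python
def sensitivity_and_precision(true_mislabels, flagged):
    """Compare flagged sequences against known mislabels.

    sensitivity = true positives / all true mislabels
    precision   = true positives / all flagged sequences
    """
    true_mislabels = set(true_mislabels)
    flagged = set(flagged)
    true_positives = len(true_mislabels & flagged)
    sensitivity = true_positives / len(true_mislabels) if true_mislabels else 0.0
    precision = true_positives / len(flagged) if flagged else 0.0
    return sensitivity, precision


# Hypothetical example: four sequences are truly mislabeled,
# the method flags four sequences, three of them correctly.
known = {"seq1", "seq2", "seq3", "seq4"}
reported = {"seq1", "seq2", "seq3", "seq9"}
sens, prec = sensitivity_and_precision(known, reported)
print(f"sensitivity={sens:.2f}, precision={prec:.2f}")
```

In a simulation study like the one described, the set of true mislabels is known because the labels were altered on purpose, which is what makes both values measurable.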

1. In recent years, large-scale DNA barcoding campaigns have generated an enormous amount of COI barcodes, which are usually stored in NCBI's GenBank and the official Barcode of Life database (BOLD). BOLD data are generally associated with more detailed and better curated meta-data, because a great proportion is based on expert-verified and vouchered material, accessible in public collections. In the course of the initiative German Barcode of Life (GBOL), data were generated for the reference library of 2,846 species of Coleoptera from 13,516 individuals.
2. Confronted with the high effort associated with the identification, verification and data validation, a bioinformatic pipeline, “TaxCI”, was developed that i) identifies taxonomic inconsistencies in a given tree topology (optionally including a reference data set), ii) discriminates between different cases of incongruence in order to identify contamination or misidentified specimens, iii) graphically marks those cases in the tree, which finally can be checked again and, if needed, corrected or removed from the dataset. For this, “TaxCI” may use DNA-based species delimitations from other approaches (e.g., mPTP) or may perform implemented threshold-based clustering.
3. The data-processing pipeline was tested on a newly generated set of barcodes, using the available BOLD records as a reference. A data revision based on the first run of the TaxCI tool yielded, in the second TaxCI analysis, a taxonomic match ratio very similar to the one recorded from the reference set (92 vs 94%). The revised dataset improved by nearly 20% through this procedure compared to the original, uncorrected one.
4. Overall, the new processing pipeline for DNA barcode data allows for the rapid and easy identification of inconsistencies in large datasets, which can be dealt with before submitting them to public data repositories like BOLD or GenBank. Ultimately, this will increase the quality of submitted data and the speed of data submission, while primarily avoiding the deterioration of the accuracy of the data repositories due to ambiguously identified or contaminated specimens.

Food trade globalization and the growing demand for selected food varieties have led to the intensification of adulteration cases, especially in the form of species substitution/mixing with cheaper taxa. This phenomenon has acquired a huge economic impact and sometimes even public health implications. DNA barcoding represents a well-proven molecular tool to assess the authenticity of food items, although its diffusion is hampered by analytical constraints and timeframes that are often prohibitive for the food market. To address such issues, we have introduced a new technology, named NanoTracer, which allows for rapid and naked-eye molecular traceability of any food, employing limited instrumentation and cost-effective reagents. Moreover, unlike sequencing, this method makes it possible to identify not only the substitution of a fine ingredient, but also its dilution with cheaper ones.

In this study, we used several molecular techniques to develop a fast and reliable protocol (DNA Verity Test, DVT) for the characterization and confirmation of the species or taxa present in herbal infusions. As a model plant for this protocol, Camellia sinensis, a traditional tea plant, was selected due to the following reasons: its historical popularity as a (healthy) beverage, its high selling value, the importation of barely recognizable raw product (i.e., crushed), and the scarcity of studies concerning adulterants or contamination. The DNA Verity Test includes both the sequencing of DNA barcoding markers and genotyping of labeled-PCR DNA barcoding fragments for each sample analyzed. This protocol (DVT) was successively applied to verify the authenticity of 32 commercial teas (simple or admixture), and the main results can be summarized as follows: (1) the DVT protocol is suitable to detect adulteration in tea matrices (contaminations or absence of certified ingredients), and the method can be exported for the study of other similar systems; (2) based on the BLAST analysis of the sequences of rbcL+matK±rps7-trnV(GAC) chloroplast markers, C. sinensis can be taxonomically characterized; (3) rps7-trnV(GAC) can be employed to discriminate C. sinensis from C. pubicosta; (4) ITS2 is not an ideal DNA barcode for tea samples, reflecting potential incomplete lineage sorting and hybridization/introgression phenomena in C. sinensis taxa; (5) the genotyping approach is an easy, inexpensive and rapid pre-screening method to detect anomalies in the tea templates using the trnH(GUG)-psbA barcoding marker; (6) two herbal companies provided non-authentic products with a contaminant or without some of the listed ingredients; and (7) the leaf matrices present in some teabags could be constituted using an admixture of different C. sinensis haplotypes and/or allied species (C. pubicosta).

A large-scale comprehensive reference library of DNA barcodes for European marine fishes was assembled, allowing the evaluation of taxonomic uncertainties and species genetic diversity that were otherwise hidden in geographically restricted studies. A total of 4118 DNA barcodes were assigned to 358 species generating 366 Barcode Index Numbers (BIN). Initial examination revealed as many as 141 BIN discordances (more than one species in each BIN). After implementing an auditing and five-grade (A-E) annotation protocol, the number of discordant species BINs was reduced to 44 (13% grade E), while concordant species BINs amounted to 271 (78% grades A and B) and 14 others had insufficient data (grade D). Fifteen species displayed comparatively high intraspecific divergences ranging from 2.6 to 18.5% (grade C), which is biologically paramount information to be considered in fish species monitoring and stock assessment. On balance, this compilation contributed to the detection of 59 European fish species probably in need of taxonomic clarification or re-evaluation. The generalized implementation of an auditing and annotation protocol for reference libraries of DNA barcodes is recommended.
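The notion of a BIN discordance used above (more than one species name sharing a single BIN) is easy to check in a few lines. The following Python sketch assumes the records are available as simple (BIN, species name) pairs; the function name, BIN identifiers and species names are placeholders and not taken from the study, and the auditing and grading steps the authors describe are not covered here.

```python
from collections import defaultdict


def find_discordant_bins(records):
    """Return BINs that contain more than one species name.

    records: iterable of (bin_id, species_name) pairs.
    """
    species_by_bin = defaultdict(set)
    for bin_id, species in records:
        species_by_bin[bin_id].add(species)
    # A BIN is discordant if it groups two or more different species names.
    return {
        bin_id: sorted(names)
        for bin_id, names in species_by_bin.items()
        if len(names) > 1
    }


# Hypothetical records for illustration only.
records = [
    ("BIN-0001", "Species A"),
    ("BIN-0001", "Species A"),
    ("BIN-0002", "Species B"),
    ("BIN-0002", "Species C"),
    ("BIN-0003", "Species D"),
]
discordant = find_discordant_bins(records)
print(discordant)
```

A check like this only flags candidates. Whether a discordance reflects a misidentification, a synonym or a real biological signal still needs the kind of manual auditing described in the abstract.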
