Thursday, March 8, 2018

Weekend reads

Another week, another pile of reading material. This time with some bioinformatics. Enjoy!

In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are of particular importance in large-scale surveys, as they promote fast species discovery and biodiversity estimates. Among these, distance-based methods are the most common choice because they scale well to large datasets; however, they are sensitive to similarity threshold parameters and they ignore evolutionary relationships. The recently introduced "Poisson Tree Processes" (PTP) method is a phylogeny-aware approach that does not rely on such thresholds. Yet PTP has two weaknesses that limit its accuracy and practicality on large datasets: it does not account for divergent intraspecific variation, and it is slow for large numbers of sequences.
We introduce the multi-rate PTP (mPTP), an improved method that alleviates the theoretical and technical shortcomings of PTP. It incorporates different levels of intraspecific genetic diversity arising from differences in either the evolutionary history or the sampling of each species. Results on empirical data suggest that mPTP is superior to PTP and to popular distance-based methods, as it consistently yields delimitations that agree more closely with the taxonomy (i.e., it identifies more taxonomic species and infers species numbers closer to the taxonomic count). Moreover, mPTP does not require any similarity threshold as input. The novel dynamic programming algorithm attains a speedup of at least five orders of magnitude compared to PTP, allowing it to delimit species in large (meta-)barcoding datasets. In addition, Markov chain Monte Carlo sampling provides a comprehensive evaluation of the inferred delimitation in just a few seconds for millions of steps, independently of tree size.
mPTP is implemented in C and is available for download at under the GNU Affero 3 license. A web-service is available at
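To make the single-rate versus multi-rate distinction concrete, here is a minimal sketch, not the mPTP code. It assumes, as the PTP family of models is usually described, that branch lengths within each process are exponentially distributed: PTP fits one rate to speciation branches and one shared rate to all within-species (coalescent) branches, while the multi-rate idea gives each species its own coalescent rate. The function names are hypothetical, and the sketch only scores a delimitation that has already been chosen; it does not implement the dynamic programming search or the MCMC sampling.

```python
import math
from typing import List, Sequence


def exp_loglik(lengths: Sequence[float]) -> float:
    """Maximised log-likelihood of i.i.d. exponential branch lengths.

    With the maximum-likelihood rate lam = n / sum(x), the exponential
    log-likelihood n*log(lam) - lam*sum(x) reduces to n*log(n/sum(x)) - n.
    """
    n = len(lengths)
    if n == 0:
        return 0.0  # an empty class contributes nothing
    total = sum(lengths)
    if total <= 0:
        raise ValueError("branch lengths must sum to a positive value")
    return n * math.log(n / total) - n


def single_rate_loglik(speciation: Sequence[float],
                       coalescent_by_species: List[Sequence[float]]) -> float:
    """PTP-style score: one speciation rate plus one coalescent rate shared by all species."""
    pooled = [x for species in coalescent_by_species for x in species]
    return exp_loglik(speciation) + exp_loglik(pooled)


def multi_rate_loglik(speciation: Sequence[float],
                      coalescent_by_species: List[Sequence[float]]) -> float:
    """Multi-rate score (assumed mPTP-like): a separate coalescent rate per species."""
    return exp_loglik(speciation) + sum(exp_loglik(s) for s in coalescent_by_species)
```

Because the shared rate is a special case of per-species rates, `multi_rate_loglik` is never lower than `single_rate_loglik` for the same delimitation. A higher maximised likelihood alone therefore does not justify the extra parameters, and comparing models in practice needs some form of model selection.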

The body of DNA sequence data lacking taxonomically informative sequence headers is rapidly growing in user and public databases (e.g. sequences lacking identification and contaminants). In the context of systematics studies, sorting such sequence data for taxonomic curation and/or molecular diversity characterization (e.g. crypticism) often requires the building of exploratory phylogenetic trees with reference taxa. The subsequent step of segregating DNA sequences of interest based on observed topological relationships can represent a challenging task, especially for large datasets.
We have written TREE2FASTA, a Perl script that enables and expedites the sorting of FASTA-formatted sequence data from exploratory phylogenetic trees. TREE2FASTA takes advantage of the interactive, rapid point-and-click color selection and/or annotations of tree leaves in the popular Java tree-viewer FigTree to segregate groups of FASTA sequences of interest to separate files. TREE2FASTA allows for both simple and nested segregation designs to facilitate the simultaneous preparation of multiple data sets that may overlap in sequence content.
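The core idea of colour-based sorting can be sketched in a few lines. This is a hypothetical Python illustration, not TREE2FASTA itself (which is a Perl script and also supports annotations and nested designs). It assumes that a tree saved from FigTree lists coloured leaves in the NEXUS `taxlabels` block as `name[&!color=#...]`, one taxon per line, and that each leaf name matches either a full FASTA header or its first word.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed FigTree annotation shape: 'quoted name' or bare_name, then [&...!color=#...]
COLOR_RE = re.compile(r"^\s*(?:'([^']*)'|([^\s\[;']+))\s*\[&[^\]]*?!color=(#-?[0-9A-Fa-f]+)")


def colors_from_nexus(lines: Iterable[str]) -> Dict[str, str]:
    """Map leaf name -> colour string, read from the taxlabels block only."""
    colors: Dict[str, str] = {}
    in_taxlabels = False
    for line in lines:
        stripped = line.strip()
        if stripped.lower().startswith("taxlabels"):
            in_taxlabels = True
            continue
        if in_taxlabels:
            if stripped.startswith(";"):
                break  # end of the taxlabels list
            m = COLOR_RE.match(line)
            if m:
                name = m.group(1) if m.group(1) is not None else m.group(2)
                colors[name] = m.group(3).lower()
    return colors


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, str]] = []
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:].strip(), []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_by_color(records: List[Tuple[str, str]],
                   colors: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group records by leaf colour; unmatched records go under 'uncoloured'."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for header, seq in records:
        words = header.split()
        first_word = words[0] if words else header
        color = colors.get(header) or colors.get(first_word) or "uncoloured"
        groups[color].append((header, seq))
    return dict(groups)
```

Each resulting group can then be written to its own FASTA file, for example one file per colour key. Real FigTree output may differ from the assumed annotation layout, so the regular expression is the part most likely to need adjusting.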

Biological soil crusts (BSCs) are amalgamations of autotrophic, heterotrophic and saprotrophic organisms. In polar regions, these unique communities perform essential ecological functions such as primary production, nitrogen fixation and ecosystem engineering. Here we present the first molecular survey of BSCs from the Arctic and Antarctica that covers both eukaryotes and prokaryotes as well as passive and active biodiversity. Considering sequence abundance, Bryophyta was among the most abundant taxa in all analyzed BSCs, suggesting that the crusts were in a late successional stage. In terms of algal and cyanobacterial biodiversity, the genera Chloromonas, Coccomyxa, Elliptochloris and Nostoc were identified in all samples regardless of origin, confirming their ubiquitous distribution. For the first time, we found the chrysophyte Spumella to be common in polar BSCs: it was present in all analyzed samples. Co-occurrence analysis revealed the presence of sulfur-metabolizing microbes, indicating that BSCs also play an important role in the sulfur cycle. In general, phototrophs were the most abundant organisms within the BSCs, but there was also a diverse community of heterotrophs and saprotrophs. Our results show that BSCs are unique microecosystems in polar environments with an unexpectedly high biodiversity.

eDNA metabarcoding represents a new tool for community biodiversity assessment in a broad range of aquatic and terrestrial habitats. However, much of the existing literature focuses on methodological development rather than on testing ecological hypotheses. Here, we use presence-absence data generated by eDNA metabarcoding of over 500 UK ponds to examine: 1) species associations between the great crested newt (Triturus cristatus) and other vertebrates, 2) determinants of great crested newt occurrence at the pondscape, and 3) determinants of vertebrate species richness at the pondscape. The great crested newt was significantly associated with nine vertebrate species. Its occurrence in ponds was broadly reduced by the presence of more fish species but enhanced by more waterfowl and other amphibian species. Abiotic determinants (including pond area, depth, and terrestrial habitat) were identified that both corroborate and contradict existing literature on great crested newt ecology. Some of these abiotic factors (pond outflow) also determined species richness at the pondscape, but other factors were unique either to the great crested newt (pond area, depth, and ruderal habitat) or to the wider biological community (pond density, macrophyte cover, terrestrial overhang, rough grass habitat, and overall terrestrial habitat quality). The great crested newt Habitat Suitability Index correlated positively with both eDNA-based great crested newt occupancy and vertebrate species richness. Our study is one of the first to use eDNA metabarcoding to test abiotic and biotic determinants of pond biodiversity. eDNA metabarcoding provided new insights at scales that were previously unattainable using established methods. This tool holds enormous potential for testing ecological hypotheses alongside biodiversity monitoring and pondscape management.

A method for the extraction of nucleic acids from a wide range of environmental samples was developed. The method consists of several modules, which can be individually modified to maximize yields in extractions of DNA and RNA or in separations of DNA pools. Modules were designed based on extensive tests in which permutations of all nucleic acid extraction steps were compared. The final modular protocol is suitable for extractions from igneous rock, air, water, and sediments. The sediments tested range from high-biomass, organic-rich coastal samples to samples from the most oligotrophic region of the world's oceans and from the deepest borehole ever studied by scientific ocean drilling. Extraction yields of DNA and RNA are higher than with widely used commercial kits, indicating an advantage to optimizing extraction procedures to match specific sample characteristics. The ability to separate soluble extracellular DNA pools, without cell lysis, from intracellular and particle-complexed DNA pools may enable new insights into the cycling and preservation of DNA in environmental samples. A general protocol is outlined, along with recommendations for optimizing it for specific sample types and research goals.

