Tuesday, July 17, 2018

Temperature and mtDNA selection

Mitochondrial DNA (mtDNA) has traditionally been used in population genetic and biogeographic studies as a maternally inherited and evolutionarily neutral genetic marker. However, it is now clear that polymorphisms within the mtDNA sequence are routinely non-neutral, and furthermore several studies have suggested that such mtDNA polymorphisms are also sensitive to thermal selection.

A team of researchers from Japan, Australia, and the UK studied two naturally occurring mtDNA variants that are carried by fruit flies inhabiting the east coast of Australia. One of these variants is more common in the sub-tropical, northern part of the country, where temperatures are higher. The other variant is more common in the temperate, southern part, which tends to be colder. The colleagues collected flies from both sites and interbred them to get a series of populations with equally mixed genes. Each mixed population was subdivided into four subpopulations. Each of those was maintained under different conditions, some at a constant temperature (19ºC or 25ºC), some at fluctuating temperatures to simulate the thermal conditions at the two sites where the flies were collected. After three months, the researchers sequenced the mtDNA of flies from all these subpopulations.

In addition, they examined how the presence of bacteria such as Wolbachia, which commonly infects fruit flies, affects mtDNA selection. Some of the fly populations were treated with antibiotics to kill off any Wolbachia infections that they might be harboring.

The researchers found that in flies reared under warm laboratory conditions, one of the two mtDNA variants became more common than the other. The same mtDNA variant was also found to be widely prevalent in flies from the warmer northern parts of Australia. A similar pattern was observed with the other mtDNA variant in flies reared under cold laboratory conditions. However, researchers only observed this effect in populations where Wolbachia infections had been wiped out. Moreover, the variation patterns they observed in males didn't always match up with the ones in females.

The results show that temperature shapes how mtDNA varies in nature. They also suggest that additional factors, such as sex and infection with microbes, might influence how the mitochondrial genome evolves in the wild.

Our results strengthen the emerging view that intra-specific mtDNA variants are sensitive to selection, and suggest spatial distributions of mtDNA variants in natural populations of metazoans might reflect adaptation to climatic environments rather than within-population coalescence and diffusion of selectively-neutral haplotypes across populations.

Friday, July 13, 2018

Weekend reads

Back on track with the weekend read posts.  Maybe not as many as usual but still quite interesting to read.

Ecological and taxonomic knowledge is important for conservation and utilization of biodiversity. Biodiversity and ecology of fungi in Mediterranean ecosystems is poorly understood. Here, we examined the diversity and spatial distribution of fungi along an elevational gradient in a Mediterranean ecosystem, using DNA metabarcoding. This study provides novel information about diversity of all ecological and taxonomic groups of fungi along an elevational gradient in a Mediterranean ecosystem. Our analyses revealed that among all biotic and abiotic variables tested, host species identity is the main driver of the fungal richness and fungal community composition. Fungal richness was strongly associated with tree richness and peaked in Quercus-dominated habitats and Cistus-dominated habitats. The highest taxonomic richness of ectomycorrhizal fungi was observed under Quercus ilex, whereas the highest taxonomic richness of saprotrophs was found under Pinus. Our results suggest that the effect of plant diversity on fungal richness and community composition may override that of abiotic variables across environmental gradients.

Although insects dominate the terrestrial fauna, sampling constraints and the poor taxonomic knowledge of many groups have limited assessments of their diversity. Passive sampling techniques and DNA-based species assignments now make it possible to overcome these barriers. For example, Malaise traps collect specimens with minimal intervention while the Barcode Index Number (BIN) system automates taxonomic assignments. The present study employs Malaise traps and DNA barcoding to extend understanding of insect diversity in one of the least known zoogeographic regions, the Saharo-Arabian. Insects were collected at four sites in three countries (Egypt, Pakistan, Saudi Arabia) by deploying Malaise traps. The collected specimens were analyzed by sequencing 658 bp of cytochrome oxidase I (DNA barcode) and assigning BINs on the Barcode of Life Data Systems. The year-long deployment of a Malaise trap in Pakistan and briefer placements at two Egyptian sites and at one in Saudi Arabia collected 53,092 specimens. They belonged to 17 insect orders with Diptera and Hymenoptera dominating the catch. Barcode sequences were recovered from 44,432 (84%) of the specimens, revealing the occurrence of 3,682 BINs belonging to 254 families. Many of these taxa were uncommon as 25% of the families and 50% of the BINs from Pakistan were only present in one sample. Family and BIN counts varied significantly through the year, but diversity indices did not. Although more than 10,000 specimens were analyzed from each nation, just 2% of BINs were shared by Pakistan and Saudi Arabia, 4% by Egypt and Pakistan, and 7% by Egypt and Saudi Arabia. The present study demonstrates how the BIN system can circumvent the barriers imposed by limited access to taxonomic specialists and by the fact that many insect species in the Saharo-Arabian region are undescribed.

Fungal spores and mycelium fragments are particles which become and remain airborne and have been subjects of aerobiological studies. The presence and the abundance of taxa in aerobiological samples can be very variable and impaired by changeable climatic conditions. Because many fungi produce mycotoxins and both their mycelium fragments and spores are potential allergens, monitoring the presence of these taxa is of key importance. So far data on exposure and sensitization to fungal allergens are mainly based on the assessment of few, easily identifiable taxa and focused only on certain environments. The microscopic method used to analyze aerobiological samples and the inconspicuous fungal characters do not allow an in-depth taxonomical identification. Here, we present a first assessment of fungal diversity from airborne samples using a DNA metabarcoding analysis. The nuclear ITS2 region was selected as barcode to catch fungal diversity in mixed airborne samples gathered during two weeks in four sites of North-Eastern and Central Italy. We assessed the taxonomic composition and diversity within and among the sampled sites and compared the molecular data with those obtained by traditional microscopy. The molecular analyses provide a tenfold more comprehensive determination of the taxa than the traditional morphological inspections. Our results prove that the metabarcoding analysis is a promising approach to increase the quality and sensitivity of aerobiological monitoring. The laboratory and bioinformatic workflow implemented here is now suitable for routine, high-throughput, regional analyses of airborne fungi.

Throughout the years, DNA barcoding has gained in importance in forensic entomology as it leads to fast and reliable species determination. High-quality results, however, can only be achieved with a comprehensive DNA barcode reference database at hand. In collaboration with the Bavarian State Criminal Police Office, we have initiated at the Bavarian State Collection of Zoology the establishment of a reference library containing arthropods of potential forensic relevance to be used for DNA barcoding applications. CO1-5P' DNA barcode sequences of hundreds of arthropods were obtained via DNA extraction, PCR and Sanger Sequencing, leading to the establishment of a database containing 502 high-quality sequences which provide coverage for 88 arthropod species. Furthermore, we demonstrate an application example of this library using it as a backbone to a high throughput sequencing analysis of arthropod bulk samples collected from human corpses, which enabled the identification of 31 different arthropod Barcode Index Numbers.

Environmental DNA (eDNA) analysis is a rapid, cost-effective, non-invasive biodiversity monitoring tool which utilises DNA left behind in the environment by organisms for species detection. The method is used as a species-specific survey tool for rare or invasive species across a broad range of ecosystems. Recently, eDNA and "metabarcoding" have been combined to describe whole communities rather than focusing on single target species. However, whether metabarcoding is as sensitive as targeted approaches for rare species detection remains to be evaluated. The great crested newt Triturus cristatus is a flagship pond species of international conservation concern and the first UK species to be routinely monitored using eDNA. We evaluate whether eDNA metabarcoding has comparable sensitivity to targeted real-time quantitative PCR (qPCR) for T. cristatus detection. Extracted eDNA samples (N = 532) were screened for T. cristatus by qPCR and analysed for all vertebrate species using high-throughput sequencing technology. With qPCR and a detection threshold of 1 of 12 positive qPCR replicates, newts were detected in 50% of ponds. Detection decreased to 32% when the threshold was increased to 4 of 12 positive qPCR replicates. With metabarcoding, newts were detected in 34% of ponds without a detection threshold, and in 28% of ponds when a threshold (0.028%) was applied. Therefore, qPCR provided greater detection than metabarcoding but metabarcoding detection with no threshold was equivalent to qPCR with a stringent detection threshold. The proportion of T. cristatus sequences in each sample was positively associated with the number of positive qPCR replicates (qPCR score) suggesting eDNA metabarcoding may be indicative of eDNA concentration. eDNA metabarcoding holds enormous potential for holistic biodiversity assessment and routine freshwater monitoring. 
We advocate this community approach to freshwater monitoring to guide management and conservation, whereby entire communities can be initially surveyed to best inform use of funding and time for species-specific surveys.
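To make the effect of the qPCR detection threshold a bit more tangible, here is a minimal Python sketch. It assumes each pond is summarised by the number of positive replicates out of 12 and calls a pond "detected" when that count reaches a chosen threshold. The pond names and counts are made up for illustration and are not data from the study, and the function name is my own.

```python
from typing import Dict

# Number of qPCR replicates per pond, as described in the abstract.
REPLICATES = 12


def detection_rate(positives_per_pond: Dict[str, int], min_positive: int) -> float:
    """Fraction of ponds with at least `min_positive` positive replicates."""
    if not positives_per_pond:
        raise ValueError("need at least one pond")
    if not 1 <= min_positive <= REPLICATES:
        raise ValueError("threshold must be between 1 and the replicate count")
    detected = sum(1 for n in positives_per_pond.values() if n >= min_positive)
    return detected / len(positives_per_pond)


# Hypothetical counts of positive replicates (out of 12) for six ponds.
toy_ponds = {
    "pond_A": 0,
    "pond_B": 1,
    "pond_C": 3,
    "pond_D": 12,
    "pond_E": 0,
    "pond_F": 5,
}

lenient = detection_rate(toy_ponds, min_positive=1)  # 1 of 12 is enough
strict = detection_rate(toy_ponds, min_positive=4)   # needs 4 of 12
print(f"lenient: {lenient:.2f}, strict: {strict:.2f}")
```

Ponds with only one or a few positive replicates are the ones that drop out when the threshold is raised, which is the same direction of change the authors report between their 1-of-12 and 4-of-12 thresholds.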

Thursday, July 12, 2018

How predictable is evolution?

Imagine 500 to 1,000 species of cichlids living in one of the African Great Lakes, one of the largest freshwater habitats in the world. The degree of complexity is unimaginable. Even the genealogical relationships between the cichlid species living in these lakes have only partially been resolved.

For every two species of mammal there is one species of cichlid fish, which shows that biodiversity is distributed rather unevenly among animals. The question is why, and to what extent this can be predicted. We know that a number of factors play a role in the evolution of this diversity. One of them is ecological conditions, i.e. the number of different habitats and the similarity of the ecological niches available. The demographic history of a population can also influence biodiversity. A higher level of genetic variation in a population is beneficial in the sense that it allows, given sufficient time, adaptation to more ecological niches. Quantifying all potential factors that contribute to biological diversity, even for only one group of animals, is not easy, not to mention that comparing mammals with a group of fish would be like comparing apples and oranges.

A new study coming from the lab of my PhD supervisor, Axel Meyer, shows some of the factors that contribute to recurrent patterns of diversity and similarity in cichlids. The colleagues aimed to determine factors that led to similar outcomes and thereby help to predict evolution. As any African Great Lake harbours an incredible species diversity, the team studied a simpler system involving parallel species of Midas cichlids, which occur in two great lakes as well as in a chain of crater lakes in Nicaragua.

The more similar the habitat of the crater lake is to that of the large source lake, the more similar the fish are to each other. Habitat seems to be more important than demographic criteria when it comes to predictability of diversity. The data collected by the colleagues also show that, compared to the source population, the morphology of all crater lake populations has diversified mostly in the same direction. Crater lake fish all very quickly evolved body shapes that are longer and more slender than those of their cousins from the bigger lakes. In addition, body shapes of the crater lake populations correlate with the average depth of the lakes. It makes sense. The deeper a lake is, the more likely it is to provide various ecological niches, including in the deep open water.

In summary, parallel morphological divergence in allopatry and the propensity for diversification in sympatry across the entire Midas cichlid fish radiation are partly predictable and mostly driven by ecology.

Monday, July 9, 2018

(Post-) Weekend reads

Last Friday I was so invested in some data analysis that I forgot everything around me, and that included my Friday blog post with weekend reading material. My apologies for that. Nevertheless, here are my weekly favourites for some throughout-the-week reading.

Efficient DNA extraction is fundamental to molecular studies. However, commercial kits are expensive when a large number of samples need to be processed. Here we present a simple, modular and adaptable DNA extraction ‘toolkit’ for the isolation of high purity DNA from multiple sample types (modular universal DNA extraction method or Mu-DNA). We compare the performance of our method to that of widely used commercial kits across a range of soil, stool, tissue and water samples. Mu-DNA produced DNA extractions of similar or higher yield and purity to that of the commercial kits. As a proof of principle, we carried out replicate fish metabarcoding of aquatic eDNA extractions, which confirmed that the species detection efficiency of our method is similar to that of the most frequently used commercial kit. Our results demonstrate the reliability of Mu-DNA along with its modular adaptability to challenging sample types and sample collection methods. Mu-DNA can substantially reduce the costs and increase the scope of experiments in molecular studies.

The monitoring of impacts of anthropic activities in marine environments, such as aquaculture, oil-drilling platforms or deep-sea mining, relies on Benthic Biotic Indices (BBI). Several indices have been formalised to reduce the multivariate composition data into a single continuous value that is ascribed to a discrete ecological quality status. Such composition data is traditionally obtained from macrofaunal inventories, which is time-consuming and expertise-demanding. Important efforts are ongoing towards using High-Throughput Sequencing of environmental DNA (eDNA metabarcoding) to replace or complement morpho-taxonomic surveys for routine biomonitoring. The computation of BBI from such composition data is usually being undertaken by practitioners with excel spreadsheets or through custom script. Hence, the updating of reference morpho-taxonomic tables and cross studies comparison could be hampered. Here we introduce the R package BBI for the computation of BBI from composition data, either obtained from traditional morpho-taxonomic inventories or from metabarcoding data. Its aim is to provide an open-source, transparent and centralised method to compute BBI for routine biomonitoring.

The degradation of freshwater ecosystems has become a common ecological and environmental problem globally. Owing to the complexity of biological communities, there remain tremendous technical challenges for investigating influence of environmental stressors (e.g. chemical pollution) on biological communities. High-throughput sequencing-based metabarcoding provides a powerful tool to reveal complex interactions between environments and biological communities. Among many technical issues, the clustering strategies for Operational Taxonomic Units (OTUs) which are crucial for assessing biodiversity of communities, may affect final conclusions. Here, we used zooplankton communities along an environmental pollution gradient in the Chaobai River in Northern China to test different clustering strategies, including non-clustering and clustering with varied thresholds. Our results showed that though the number of OTUs estimated by non-clustering strategies and clustering strategies with divergence thresholds of 99-97% largely varied, they were able to identify the same set of significant environmental and spatial variables responsible for geographical distributions of zooplankton communities. In addition, the ecological conclusions obtained by clustering thresholds of 99-97% were consistent with non-clustering strategies, where for all eight clustering scenarios we detected that species sorting predicted by environmental variables overrode dispersal as the dominant factor in structuring zooplankton communities. However, clustering with the divergence thresholds of <95% affected the environmental and spatial variables identified. We conclude that both newly developed non-clustering methods and traditional clustering methods with divergence thresholds ≥97% were reliable to reveal mechanisms of complex community-environment interactions, although different clustering strategies could lead to largely varied biodiversity estimates such as those for α-diversity.
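For readers who wonder why the number of OTUs depends so much on the threshold, here is a toy Python sketch of greedy, centroid-based clustering. It is not the pipeline used in the study: it assumes pre-aligned sequences of equal length, measures identity as the fraction of matching positions, and uses made-up 20 bp sequences. A threshold of 1.0 stands in for "non-clustering", where every distinct sequence is its own unit.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: List[str], threshold: float) -> Dict[str, List[str]]:
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    Returns a mapping from centroid sequence to its cluster members.
    """
    clusters: Dict[str, List[str]] = {}
    for seq in seqs:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


# Hypothetical aligned reads (20 bp each), invented for illustration.
toy_reads = [
    "ACGTACGTACGTACGTACGT",  # reference-like read
    "ACGTACGTACGTACGTACGA",  # 1 mismatch to the first read (95% identity)
    "ACGTACGTACGTACGTACAA",  # 2 mismatches to the first read (90% identity)
    "TTTTTTTTTTGGGGGGGGGG",  # very different read
]

for t in (1.0, 0.95, 0.90):
    print(t, len(greedy_cluster(toy_reads, t)))
```

With these toy reads the unit count shrinks as the threshold is relaxed, while the clearly distinct read stays separate throughout. That is the intuition behind the finding that richness estimates vary a lot between strategies even when the broader ecological signal is stable.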

Sediment bypass tunnels (SBTs) are guiding structures used to reduce sediment accumulation in reservoirs during high flows by transporting sediments to downstream reaches during operation. Previous studies monitoring the ecological effects of SBT operations on downstream reaches suggest a positive influence of SBTs on riverbed sediment conditions and macroinvertebrate communities based on traditional morphology-based surveys. Morphology-based macroinvertebrate assessments are costly and time-consuming, and the large number of morphologically cryptic, small-sized and undescribed species usually results in coarse taxonomic identification. Here, we used DNA metabarcoding analysis to assess the influence of SBT operations on macroinvertebrates downstream of SBT outlets by estimating species diversity and pairwise community dissimilarity between upstream and downstream locations in dam-fragmented rivers with operational SBTs in comparison to dam-fragmented (i.e., no SBTs) and free-flowing rivers (i.e., no dam). We found that macroinvertebrate community dissimilarity decreases with increasing operation time and frequency of SBTs. These factors of SBT operation influence changes in riverbed features, e.g. sediment relations, that subsequently affect the recovery of downstream macroinvertebrate communities to their respective upstream communities. Macroinvertebrate abundance using morphologically-identified specimens was positively correlated to read abundance using metabarcoding. This supports and reinforces the use of quantitative estimates for diversity analysis with metabarcoding data.

Metabarcoding is a popular application which warrants continued methods optimization. To maximize barcoding inferences, hierarchy-based sequence classification methods are increasingly common. We present methods for the construction and curation of a database designed for hierarchical classification of a 157 bp barcoding region of the arthropod cytochrome c oxidase subunit I (COI) locus. We produced a comprehensive arthropod COI amplicon dataset including annotated arthropod COI sequences and COI sequences extracted from arthropod whole mitochondrion genomes, the latter of which provided the only source of representation for Zoraptera, Callipodida and Holothyrida. The database contains extracted sequences of the target amplicon from all major arthropod clades, including all insect orders, all arthropod classes and Onychophora, Tardigrada and Mollusca outgroups. During curation, we extracted the COI region of interest from approximately 81 percent of the input sequences, corresponding to 73 percent of the genus-level diversity found in the input data. Further, our analysis revealed a high degree of sequence redundancy within the NCBI nucleotide database, with a mean of approximately 11 sequence entries per species in the input data. The curated, low-redundancy database is included in the Metaxa2 sequence classification software. Using this database with the Metaxa2 classifier, we performed a cross-validation analysis to characterize the relationship between the Metaxa2 reliability score, an estimate of classification confidence, and classification error probability. We used this analysis to select a reliability score threshold which minimized error. We then estimated classification sensitivity, false discovery rate and overclassification, the propensity to classify sequences from taxa not represented in the reference database. Our work will help researchers design and evaluate classification databases and conduct metabarcoding on arthropods and alternate taxa.

Honeydew produced from the excretion of plant-sucking insects (order Hemiptera) is a carbohydrate-rich material that is foraged by honey bees to integrate their diets. In this study, we used DNA extracted from honey as a source of environmental DNA to disclose its entomological signature determined by honeydew producing Hemiptera that was recovered not only from honeydew honey but also from blossom honey. We designed PCR primers that amplified a fragment of mitochondrial cytochrome c oxidase subunit 1 (COI) gene of Hemiptera species using DNA isolated from unifloral, polyfloral and honeydew honeys. Ion Torrent next generation sequencing metabarcoding data analysis assigned Hemiptera species using a customized bioinformatic pipeline. The forest honeydew honeys reported the presence of high abundance of Cinara pectinatae DNA, confirming their silver fir forest origin. In all other honeys, most of the sequenced reads were from the planthopper Metcalfa pruinosa for which it was possible to evaluate the frequency of different mitotypes. Aphids of other species were identified from honeys of different geographical and botanical origins. This unique entomological signature derived by environmental DNA contained in honey opens new applications for honey authentication and to disclose and monitor the ecology of plant-sucking insects in agricultural and forest landscapes.

Introduced species of mammals in New Zealand have had catastrophic effects on populations of diverse native species. Quantifying the diets of these omnivorous and predatory species is critical for understanding which native species are most impacted, and to prioritize which mammal species and locations should be targeted with control programmes. A variety of methods have been applied to quantify diet components in animals, including visual inspection of gut contents (Daniel 1973; Pierce and Boyle 1991), stable isotope analysis (Major et al. 2007; Carreon-Martinez and Heath 2010), and time-lapse video (Brown and Brown 1997; Dunlap and Pawlik 1996). Increasingly, DNA-based metabarcoding methods are being used (King et al. 2008; Soininen et al. 2009). These metabarcoding methods require a PCR step using primers that bind to highly conserved genomic regions (e.g. mitochondrial COI) to amplify specific regions for sequencing. This step introduces significant bias, primarily due to the lack of a universal primer set (King et al. 2008). Here we show that direct metagenomic sequencing using the Oxford Nanopore Minion allows rapid quantification of rat diets. Using a sample of rats collected from within 100km of Auckland, NZ, we show that these rats consume a wide variety of plant, invertebrate, vertebrate, and fungal taxa, with substantial differences in diet content between locales. We then show that, based on diet content alone, it is possible to pinpoint the sampling location of an individual rat within tens of kilometres. We expect that the rapidly increasing accuracy and throughput of nanopore-based sequencing, as well as increases in the species diversity of genomic databases, will soon allow rapid and unbiased assessments of animal diets in field settings.

Thursday, July 5, 2018

One decade of ZooKeys

One decade of ZooKeys - not bad at all. That means one decade of Open Access Taxonomy. Descriptions that are not hiding behind a paywall, a no-brainer if you ask me. How can we talk about democratization and equal access to information if a large part of the primary literature is still hidden from a substantial group of researchers simply because they or their institution can't afford a subscription? Especially for taxonomy that is pretty aggravating. So, my best wishes and congratulations to Pensoft (in particular Lubo) for this success story.

The Pensoft blog has some more details. Here is the first part:

So here we are, 10 years from that very first issue of ours published on a very special date – the 4th of July – and the result of a seemingly ordinary breakfast conversation between two respected entomologists, Prof Lyubomir Penev and Dr Terry Erwin, during the Entomological Society of America meeting in San Diego, USA, seven months earlier.

Then and there, under the California sun, an idea about a brand new taxonomic journal meant to revolutionise the scholarly publishing in zoology – in terms of both openness and technological innovation, was born. The rest, like they say, is history.

Ten years in, we stand as the most prolific open-access journal in zoology with a total of 4,103 published articles, 45 newly described animal families, 650 genera and 8,977 species, authored by a total of 5,720 researchers coming from 131 different countries. We also take pride in having set an excellent example for the rest of the academic titles in Pensoft’s already extensive portfolio of open access journals.

Wednesday, July 4, 2018

Tour of Flanders video footage shows climate change impact on trees

Predicting how the timing of cyclic life history events, such as leafing and flowering, responds to climate change is of paramount importance due to the cascading impacts of vegetation phenology on species and ecosystem fitness. However, progress in this field is hampered by the relative scarcity, and geographic and phylogenetic bias, of long-term phenology datasets.

By analyzing nearly four decades of archive footage from the cycling Tour of Flanders, researchers from Ghent University have been able to detect climate change impacts on trees. Focusing on trees and shrubs growing around recognisable climbs and other landmarks along the route of this major annual road cycling race in Belgium, the colleagues looked at video footage from 1981 to 2016 obtained from a Flemish broadcaster. They visually estimated how many leaves and flowers were present on the day of the race (usually in early April) and linked their scores to climate data.

They found that the trees had advanced the timing of leafing and flowering in response to recent temperature changes. Before 1990, almost no trees had grown leaves at the time of the spring race. After that year, more and more trees visible in the television footage, in particular magnolia, hawthorn, hornbeam and birch trees, were already in full leaf. These shifts were most strongly related to warmer average temperatures in the area, which have increased by 1.5°C since 1980.

Early-leafing trees can be good news for some species as they grow faster and produce more wood. However, their leaves also cast shadows. When trees flush earlier in the year, they shadow for a longer period of time, affecting other animals and plants, and even whole ecosystems. Some of the flowers growing under these trees may not be able to receive enough sunlight to bloom. As a result, insects can go without nectar and may struggle to find enough spots to sunbathe.

Phenology (the study of natural phenomena that recur periodically such as leafing and flowering) is mostly based on long-term observations and repeat photography, with data often being biased towards common species or geographical regions. In this study, archive footage allowed the researchers to use previously unexploited records of twelve tree species in the Flanders region in order to build long-term datasets of phenological responses.

Our method could also be used to collect data on other aspects important for ecological or evolutionary research, such as tree health, water levels in rivers and lakes, and the spread of invasive species. Only by compiling data from the past will we be able to predict the future effects of climate change on species and ecosystems.

Tuesday, July 3, 2018

Scale matters

Photo: Setophaga discolor (prairie warbler) - Credit: Julie Hart

Biodiversity is changing all around us and worldwide. Local species disappear and sometimes other species invade. Studying birds in the U.S. and worldwide, we show that patterns and implications of this ongoing change vary strongly with the scale.

A minor loss or gain of species richness or functional diversity at the local or county level can look like a major gain at the state or national level, and yet be a net loss when viewed at a global scale. Researchers at Yale University studied 50 years of data about nesting birds in North America and tracked biodiversity changes at different scales. They found significant differences in how much change had occurred, based upon how wide a geographic net they cast. In addition, taxonomic diversity and functional diversity increased over all but the global scale. At larger scales, the change in taxonomic diversity was higher than the change in functional diversity, which suggests strong trait redundancy at those scales. Also, insectivorous birds (like the prairie warbler in the photo) showed the most drastic declines across all geographic scales, from local to continental.

Any reporting and interpretation of biodiversity change thus needs scale as a key qualifier. Better yet, researchers and practitioners of biodiversity science should adopt a multi-scale framework and consider all geographic scales simultaneously.
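The scale effect is easy to reproduce with a toy example. The Python sketch below uses three hypothetical sites and invented species labels (nothing here comes from the bird data) and compares mean richness per site with the richness of all sites pooled, before and after a change in which two widespread species spread while two site-restricted species disappear.

```python
from typing import Dict, Set

Community = Dict[str, Set[str]]


def mean_local_richness(sites: Community) -> float:
    """Average number of species per site."""
    return sum(len(species) for species in sites.values()) / len(sites)


def pooled_richness(sites: Community) -> int:
    """Number of distinct species across all sites combined."""
    return len(set().union(*sites.values()))


# Hypothetical communities: species "d" and "f" vanish, "a" and "g" spread.
before: Community = {
    "site1": {"a", "b"},
    "site2": {"c", "d"},
    "site3": {"e", "f"},
}
after: Community = {
    "site1": {"a", "b", "g"},
    "site2": {"a", "c", "g"},
    "site3": {"a", "e", "g"},
}

print("local :", mean_local_richness(before), "->", mean_local_richness(after))
print("pooled:", pooled_richness(before), "->", pooled_richness(after))
```

Every site gains a species on average, yet the pooled list gets shorter. The same set of changes reads as a gain or a loss depending on where the boundary is drawn, which is why scale has to be stated alongside any reported trend.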

Friday, June 29, 2018

Weekend reads

Long weekend for all the Canadians, which means more time to read and certainly no post on Monday.

Knowledge of community structure within an ecosystem is essential when trying to understand the function and importance of the system and when making related management decisions. Within the larger ecosystem, microhabitats play an important role by providing inhabitants with a subset of available resources. On coral reefs, cryptobenthic fishes encompass many groups and make up an important proportion of the biodiversity. However, these fishes are relatively small, exhibit extreme visual or behavioral camouflage, and, therefore, are often overlooked. We examined the differences in fish community structure between three common reef microhabitats (live hard coral, dead coral rubble, and sand) using ichthyocide stations in the central Red Sea. Using a combination of morphological and genetic (cytochrome oxidase I (COI) barcoding) techniques, we identified 326 individuals representing 73 species spread across 17 families, from fifteen 1 m2 quadrats. Fish assemblages in the three microhabitats were significantly different from each other. Rubble microhabitats yielded the highest levels of fish abundance, richness, and diversity, followed by hard coral, and then sand. The results show that benthic composition, even at a small scale, influences cryptobenthic communities. This study also provides new COI sequence data to public databases, in order to further the research of cryptobenthic fishes in the Red Sea region.

The cytochrome c oxidase subunit I (cox1) gene is the main mitochondrial molecular marker playing a pivotal role in phylogenetic research and is a crucial barcode sequence. Folmer's "universal" primers designed to amplify this gene in metazoan invertebrates allowed quick and easy barcode and phylogenetic analysis. On the other hand, the increase in the number of studies on barcoding leads to more frequent publishing of incorrect sequences, due to amplification of non-target taxa, and insufficient analysis of the obtained sequences. Consequently, some sequences deposited in genetic databases are incorrectly described as obtained from invertebrates, while being in fact bacterial sequences. In our study, in which we used Folmer's primers to amplify COI sequences of the crustacean fairy shrimp Branchipus schaefferi (Fischer 1834), we also obtained COI sequences of microbial contaminants from Aeromonas sp. However, when we searched the GenBank database for sequences closely matching these contaminations we found entries described as representatives of Gastrotricha and Mollusca. When these entries were compared with other sequences bearing the same names in the database, the genetic distance between the incorrect and correct sequences amplified from the same species was ca. 65%. Although the responsibility for the correct molecular identification of species rests on researchers, the errors found in already published sequence data have not been re-evaluated so far. On the basis of the standard sampling technique we have estimated with 95% probability that the chances of finding incorrectly described metazoan sequences in the GenBank depend on the systematic group, and vary from less than 1% (Mollusca and Arthropoda) up to 6.9% (Gastrotricha). Consequently, the increasing popularity of DNA barcoding and metabarcoding analysis may lead to overestimation of species diversity. Finally, the study also discusses the sources of the problems with amplification of non-target sequences.

DNA metabarcoding is increasingly used in dietary studies to estimate diversity, composition, and frequency of occurrence of prey items. However, few studies have assessed how technical and biological replication affect the accuracy of diet estimates. This study addresses these issues using the European free-tailed bat Tadarida teniotis, involving high-throughput sequencing of a small fragment of the COI gene in 15 separate faecal pellets and a 15-pellet pool for each of 20 bats. We investigated how diet descriptors were affected by variability among (i) individuals, (ii) pellets of each individual, and (iii) PCRs of each pellet. In addition, we investigated the impact of (iv) analysing separate pellets versus pellet pools. We found that diet diversity estimates increased steadily with the number of pellets analysed per individual, with seven pellets required to detect ~80% of prey species. Most variation in diet composition was associated with differences among individual bats, followed by pellets per individual, and PCRs per pellet. The accuracy of frequency of occurrence estimates increased with the number of pellets analysed per bat, with the highest error rates recorded for prey consumed infrequently by many individuals. Pools provided poor estimates of diet diversity and frequency of occurrence, which were comparable to analysing a single pellet per individual, and consistently missed the less common prey items. Overall, our results stress that maximizing biological replication is critical in dietary metabarcoding studies, and emphasize that analysing several samples per individual rather than pooled samples produces more accurate results.

DNA metabarcoding is a technique used to survey biodiversity in many ecological settings, but there are doubts about whether it can provide quantitative results, i.e. the proportions of each species in the mixture as opposed to a species list. While there are several experimental studies that report quantitative metabarcoding results, there are a similar number that fail to do so. Here we provide the rationale to understand under what circumstances the technique can be quantitative. Basically, we simulate a mixture of DNA of S species with a defined initial abundance distribution. In the simulated PCR, each species increases its concentration following a certain amplification efficiency. The final DNA concentration will reflect the initial one when the efficiency is similar for all species; otherwise, the initial and final DNA concentrations would be poorly related. Although there are many known factors that modulate amplification efficiency, we focused on the number of primer-template mismatches, arguably the most important one. We used 15 common primer pairs targeting the mitochondrial COI region and the mitogenomes of ca. 1200 insect species. The results showed that some primer pairs produced quantitative results under most circumstances, whereas some other primers failed to do so. Many species, and a high diversity within the mixture, helped the metabarcoding to be quantitative. In conclusion, depending on the primer pair used in the PCR amplification and on the characteristics of the mixture analysed (i.e., high species richness, low evenness), DNA metabarcoding can provide a quantitative estimate of the relative abundances of different species.
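
The amplification argument in the abstract above can be illustrated with a few lines of Python. This is only a sketch under simple assumptions: every species amplifies exponentially with a fixed per-cycle efficiency, and the function names, abundances, efficiencies and cycle number are made up for illustration rather than taken from the paper.

```python
def proportions(counts):
    """Convert raw amounts into relative abundances that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def simulate_pcr(initial, efficiencies, cycles=30):
    """Relative abundances after `cycles` rounds of idealised PCR.

    Assumption: species i grows as N_i * (1 + e_i) ** cycles, where e_i is
    its per-cycle amplification efficiency (0 = no copying, 1 = doubling).
    """
    final = [n * (1.0 + e) ** cycles for n, e in zip(initial, efficiencies)]
    return proportions(final)


# Hypothetical starting mixture of four species (arbitrary units).
initial = [100, 50, 25, 25]

# Same efficiency for every species: proportions are preserved.
equal = simulate_pcr(initial, [0.9, 0.9, 0.9, 0.9])

# Third species amplifies less efficiently (e.g. primer mismatches):
# its share of the final product collapses.
skewed = simulate_pcr(initial, [0.9, 0.9, 0.6, 0.9])

print("initial:", [round(p, 4) for p in proportions(initial)])
print("equal  :", [round(p, 4) for p in equal])
print("skewed :", [round(p, 4) for p in skewed])
```

With equal efficiencies the read proportions mirror the starting mixture, while a modest efficiency deficit in one species is compounded over the cycles and nearly removes it from the final product.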

Marine meiofauna comprises up to 22 phyla. Its morphological identification requires time and taxonomists' expertise, and molecular tools can make this task faster. We aim to disentangle meiofaunal diversity patterns at Araçá Bay by applying a model selection approach and estimating the effectiveness of metabarcoding (18S rDNA) and morphological methods for estimating the response of meiofauna diversity in small-scale interactions with environmental variables. A rarefaction curve indicated that ten samples were sufficient for estimating the total number of meiofauna OTUs in a tidal flat. In both approaches, richness was predicted by mean sand percentage, sediment sorting, and bacteria concentration. Nematode genera composition differed significantly between approaches, the result of taxonomic mismatch in the genetic database. The similarity between the model selected for diversity descriptors, the richness of nematode genera and meiofauna composition emphasized the utility of predictive models for metabarcoding estimates to detect small-scale interactions of these organisms.

Background: In light of the current biodiversity crisis, DNA barcoding is developing into an essential tool to quantify state shifts in global ecosystems. Current barcoding protocols often rely on short amplicon sequences, which yield accurate identification of biological entities in a community, but provide limited phylogenetic resolution across broad taxonomic scales. However, the phylogenetic structure of communities is an essential component of biodiversity. Consequently, a barcoding approach is required that unites robust taxonomic assignment power and high phylogenetic utility. A possible solution is offered by sequencing long ribosomal DNA (rDNA) amplicons on the MinION platform (Oxford Nanopore Technologies). Results: Using a dataset of various animal and plant species, with a focus on arthropods, we assemble a pipeline for long rDNA barcode analysis and introduce a new software (MiniBar) to demultiplex dual indexed nanopore reads. We find excellent phylogenetic and taxonomic resolution offered by long rDNA sequences across broad taxonomic scales. We highlight the simplicity of our approach by field barcoding with a miniaturized, mobile laboratory in a remote rainforest. We also test the utility of long rDNA amplicons for analysis of community diversity through metabarcoding and find that they recover highly skewed diversity estimates. Conclusions: Sequencing dual indexed, long rDNA amplicons on the MinION platform is a straightforward, cost effective, portable and universal approach for eukaryote DNA barcoding. Long rDNA amplicons scale up DNA barcoding by enabling the accurate recovery of taxonomic and phylogenetic diversity. However, bulk community analyses using long-read approaches may introduce biases and will require further exploration.

Background. Knowledge on the globally outstanding Amazonian biodiversity and its environmental determinants stems almost exclusively from aboveground organisms, notably plants. In contrast, the environmental factors and habitat preferences that drive diversity patterns for micro-organisms in the ground remain elusive, despite the fact that micro-organisms constitute the overwhelming majority of life forms in any given location, in terms of both diversity and abundance. Here we address how the diversity and community turnover of operational taxonomic units (OTU) of micro-organisms in soil and litter respond to soil physicochemical properties; whether OTU diversities and community composition in soil and litter are correlated with each other; and whether they respond in a similar way to soil properties. Methods. We used recently inferred OTUs from high-throughput metabarcoding of the 16S (prokaryotes) and 18S (eukaryotes) genes to estimate OTU diversity (OTU richness and effective number of OTUs) and community composition for prokaryotes and eukaryotes in soil and litter across four localities in Brazilian Amazonia. All analyses were run separately for prokaryote and eukaryote OTUs, and for each group using both presence-absence and abundance data. Combining these with novel data on soil chemical and physical properties, we identify abiotic correlates of soil and litter micro-organism diversity and community structure using regression, ordination, and variance partitioning analysis. Results. Soil organic carbon content was the strongest factor explaining OTU diversity (negative correlation) and pH was the strongest factor explaining turnover for prokaryotes and eukaryotes in both soil and litter. We found significant effects also for other soil variables, including both chemical and physical properties. 
The correlation between OTU diversity in litter and in soil was non-significant for eukaryotes and weak for prokaryotes, suggesting that diversity in one substrate should not be used as a proxy for diversity in the other. The community compositions of both prokaryotes and eukaryotes were more separated for habitat type than for substrate (soil and litter). Discussion. In spite of the limited sampling (four localities, 39 plots), our results provide a broad-scale view of the physical and chemical correlations of soil and litter biodiversity in a longitudinal transect across the world’s largest rainforest. Our methods help to understand links between soil properties, OTU diversity patterns, and community composition and turnover. The lack of strong correlation between OTU diversity in litter and in soil suggests independence of diversity drivers of these substrates and highlights the importance of including both measures in biodiversity assessments. Massive sequencing of soil and litter samples holds the potential to complement traditional biological inventories in advancing our understanding of the factors affecting tropical diversity.

Constructing networks has become an indispensable approach in understanding how different taxa interact. However, methodologies vary widely among studies, potentially limiting our ability to meaningfully compare results. In particular, how network architecture is influenced by the extent to which nodes are resolved to either taxa or taxonomic units is poorly understood. To address this, here we collate nine datasets of ecological interactions, from both observations and DNA metabarcoding, and construct networks under a range of commonly-used node resolutions. We demonstrate that small changes in node resolution can cause wide variation in almost all key metric values, including robustness and nestedness. Moreover, relative values of metrics such as robustness were seen to fluctuate continuously with node resolution, thereby potentially confounding comparisons of networks, as well as interpretations concerning their constituent ecological interactions. These findings highlight the need for care when comparing networks, especially where these differ with respect to node resolution.

Tuesday, June 26, 2018

Three opportunities to learn about metabarcoding

In case you are looking for ways to learn about metabarcoding, there are actually three different courses offered this year. All of them differ in approach, content focus, and venue, but what they have in common is that they provide participants with a comprehensive package that enables them to make informed decisions when it comes to organizing experiments, field work and analytics.

This program will provide an overview of the state of current technology and the various platforms used. The course consists of a series of online lectures and research exercises introducing different aspects of metabarcoding and environmental DNA research. We will also touch on the suite of bioinformatics tools available for sequence analysis and data interpretation.

This course will focus on eDNA metabarcoding; however, targeted single-species detection and other approaches will also be explored, as they can sometimes be suitable alternatives to metabarcoding.

The lectures will cover different aspects of DNA metabarcoding. The bioinformatics practicals will introduce data analysis from raw sequences to basic ecological conclusions. The molecular ecology practical will present basic techniques for DNA extraction in the field and DNA amplification by PCR.

Monday, June 25, 2018

DNA barcoding for pollen forecasting

PollerGEN is a group of interdisciplinary researchers funded by NERC to understand grass pollen deposition. We aim to revolutionise the way that pollen is measured, model spatial and temporal deposition from different species of grass pollen and identify linkages to human health.

DNA barcoding of pollen is not a new invention. It is not easy either, but it has been shown to provide extremely valuable information, e.g. for understanding plant-pollinator interactions, honey bee foraging, or the characterization of honeybee pollen pellets. It should come as no surprise that researchers are also working on an application intended to improve a forecasting system that has become more and more important for a large portion of the human population - pollen forecasting for hay fever and other allergic reactions.

At this point most forecasts are built using data from a network of pollen traps which operate throughout the main pollen seasons. These traps measure how many pollen grains are present on a daily basis, and species are identified using morphology-based methods. The latter is extremely challenging when it comes to species with very uniform appearance, e.g. grasses. However, the species identity often makes a big difference. It is fairly rare that somebody is allergic to all grass pollen, but it is difficult to tell which pollen in the mix is the culprit.

PollerGEN, a project run out of Bangor University, wants to change this by using a DNA-based approach.

The colleagues are now working on a way to detect airborne pollen from different species of allergenic grass. We’re also developing new pollen source maps, and modelling how pollen grains likely move across landscapes, as well as identifying which species are linked with the exacerbation of asthma and hay fever.

We’re going to be using a new UK plant DNA barcode library, as well as environmental genomic technologies to identify complex mixtures of tree and grass pollens from a molecular genetic perspective. By combining this information with detailed source maps and aerobiological modelling, we hope to redefine how pollen forecasts are measured and reported in the future.

We have just started the third year of pollen collection and hope to road test the combined forecasting methods over the next year. In the long run, our vision is to be able to provide specific pollen forecasts for grass, and unravel which species of grass pollen are most likely causing allergic responses. More broadly, we also want to provide information to healthcare professionals and charities, who can translate this information to help pollen allergy sufferers live healthier and more productive lives.

Pretty cool.

Friday, June 22, 2018

Weekend reads

Here we go again, another week has passed quickly. Light on posting, mainly because I had some days off and no chance to do any digging for blog posts. Nevertheless, here is your weekly dose of interesting papers. Really good stuff.

Genetic taxonomic assignment can be more sensitive than morphological taxonomic assignment, particularly for small, cryptic or rare species. Sequence processing is essential to taxonomic assignment, but can also produce errors because optimal parameters are not known a priori. Here, we explored how sequence processing parameters influence taxonomic assignment of 18S sequences from bulk zooplankton samples produced by 454 pyrosequencing. We optimized a sequence processing pipeline for two common research goals, estimation of species richness and early detection of aquatic invasive species (AIS), and then tested most optimal models' performances through simulations. We tested 1,050 parameter sets on 18S sequences from 20 AIS to determine optimal parameters for each research goal. We tested optimized pipelines' performances (detectability and sensitivity) by computationally inoculating sequences of 20 AIS into ten bulk zooplankton samples from ports across Canada. We found that optimal parameter selection generally depends on the research goal. However, regardless of research goal, we found that metazoan 18S sequences produced by 454 pyrosequencing should be trimmed to 375-400 bp and sequence quality filtering should be relaxed (1.5 ≤ maximum expected error ≤ 3.0, Phred score = 10). Clustering and denoising were only viable for estimating species richness, because these processing steps made some species undetectable at low sequence abundances which would not be useful for early detection of AIS. With parameter sets optimized for early detection of AIS, 90% of AIS were detected with fewer than 11 target sequences, regardless of whether clustering or denoising was used. Despite developments in next-generation sequencing, sequence processing remains an important issue owing to difficulties in balancing false-positive and false-negative errors in metabarcoding data.
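
The quality-filtering thresholds mentioned in this abstract are easier to picture with a small example. The sketch below assumes the usual definition of a read's expected errors (the sum of the per-base error probabilities implied by the Phred scores) and the common convention of discarding reads shorter than the trim length; the function names and default values are illustrative choices, not the authors' pipeline.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores, trim_to=400, max_ee=3.0):
    """Trim a read to `trim_to` bases, then apply a maximum-expected-error cut.

    Assumptions (not from the paper): reads shorter than `trim_to` are
    discarded, and the defaults sit at the upper end of the ranges quoted
    in the abstract (375-400 bp, maximum expected error 1.5-3.0).
    """
    if len(phred_scores) < trim_to:
        return False
    return expected_errors(phred_scores[:trim_to]) <= max_ee


# Hypothetical reads with uniform quality, for illustration only.
good_read = [30] * 450   # Q30 everywhere: 400 * 0.001 = 0.4 expected errors
poor_read = [20] * 450   # Q20 everywhere: 400 * 0.01  = 4.0 expected errors
short_read = [40] * 300  # high quality but shorter than the trim length

for name, read in [("good", good_read), ("poor", poor_read), ("short", short_read)]:
    print(name, passes_filter(read))
```

Relaxing `max_ee` keeps more reads from rare taxa at the cost of more sequencing errors, which is the trade-off between false negatives and false positives that the abstract describes.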

DNA metabarcoding has been introduced as a revolutionary way to identify organisms and monitor ecosystems. However, the potential of this approach for biomonitoring remains partially unfulfilled because a significant part of the sampled DNA cannot be affiliated to species due to incomplete reference libraries. Thus, biotic indices which are based on the estimated abundances of species in a community and their ecological profiles can be inaccurate. We propose to compute biotic indices using phylogenetic imputation of OTUs' ecological profiles (OTU-PITI approach). Firstly, OTUs sequences are inserted within a reference phylogeny. Secondly, OTUs' ecological profiles are estimated on the basis of their phylogenetic relationships with reference species whose ecology is known. Based on these ecological profiles, biotic indices can be computed using all available OTUs. Using freshwater diatoms as a case study, we show that short DNA barcodes can be placed accurately within a phylogeny and their ecological preferences estimated with a satisfactory level of precision. In light of these results, we tested the approach with a dataset of 139 environmental samples of benthic river diatoms for which the same biotic index (IPS) was calculated using (i) traditional microscopy, (ii) OTUs with taxonomic assignment approach, (iii) OTUs with phylogenetic estimation of ecological profiles (OTU-PITI), and (iv) OTU with taxonomic assignment completed by the phylogenetic approach (OTU-PITI) for unclassified OTUs. Using traditional microscopy as a reference, we found that the combination of the OTUs' taxonomic assignment completed by the phylogenetic method performed satisfactorily and substantially better than the other methods tested.

BACKGROUND: High throughput DNA sequencing of bulk invertebrate samples or metabarcoding is becoming increasingly used to provide profiles of biological communities for environmental monitoring. As metabarcoding becomes more widely applied, new reference DNA barcodes linked to individual specimens identified by taxonomists are needed. This can be achieved through using DNA extraction methods that are not only suitable for metabarcoding but also for building reference DNA barcode libraries.
METHODS: In this study, we test the suitability of a rapid non-destructive DNA extraction method for metabarcoding of freshwater invertebrate samples.
RESULTS: This method resulted in detection of taxa from many taxonomic groups, comparable to results obtained with two other tissue-based extraction methods. Most taxa could also be successfully used for subsequent individual-based DNA barcoding and taxonomic identification. The method was successfully applied to field-collected invertebrate samples stored for taxonomic studies in 70% ethanol at room temperature, a commonly used storage method for freshwater samples.
DISCUSSION: With further refinement and testing, non-destructive extraction has the potential to rapidly characterise species biodiversity in invertebrate samples, while preserving specimens for taxonomic investigation.

Marine plankton populate 70% of Earth's surface, providing the energy that fuels ocean food webs and contributing to global biogeochemical cycles. Plankton communities are extremely diverse and geographically variable, and are overwhelmingly composed of low-abundance species. The role of this rare biosphere and its ecological underpinnings are however still unclear. Here, we analyse the extensive dataset generated by the Tara Oceans expedition for marine microbial eukaryotes (protists) and use an adaptive algorithm to explore how metabarcoding-based abundance distributions vary across plankton communities in the global ocean. We show that the decay in abundance of non-dominant operational taxonomic units, which comprise over 99% of local richness, is commonly governed by a power-law. Despite the high spatial turnover in species composition, the power-law exponent varies by less than 10% across locations and shows no biogeographical signature, but is weakly modulated by cell size. Such striking regularity suggests that the assembly of plankton communities in the dynamic and highly variable ocean environment is governed by large-scale ubiquitous processes. Understanding their origin and impact on plankton ecology will be important for evaluating the resilience of marine biodiversity in a changing ocean.

MOTIVATION: Correct taxonomic identification of DNA sequences is central to studies of biodiversity using both shotgun metagenomic and metabarcoding approaches. However, no genetic marker gives sufficient performance across all the biological kingdoms, hampering studies of taxonomic diversity in many groups of organisms. This has led to the adoption of a range of genetic markers for DNA metabarcoding. While many taxonomic classification software tools can be re-trained on these genetic markers, they are often designed with assumptions that impair their utility on genes other than the SSU and LSU rRNA. Here, we present an update to Metaxa2 that enables the use of any genetic marker for taxonomic classification of metagenome and amplicon sequence data.
RESULTS: We evaluated the Metaxa2 Database Builder on eleven commonly used barcoding regions and found that while there are wide differences in performance between different genetic markers, our software performs satisfactorily provided that the input taxonomy and sequence data are of high quality.
AVAILABILITY: Freely available on the web as part of the Metaxa2 package at http://microbiology.se/software/metaxa2/.

BACKGROUND: The world's herbaria contain millions of specimens, collected and named by thousands of researchers, over hundreds of years. However, this treasure has remained largely inaccessible to genetic studies, because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates.
RESULTS: As a practical test of routine recovery of rDNA and plastid genome sequences from herbarium specimens, we sequenced 25 herbarium specimens up to 80 years old from 16 different Angiosperm families. Paired-end reads were generated, yielding successful plastid genome assemblies for 23 species and nuclear rDNAs for 24 species, respectively. These data showed that genome skimming can be used to generate genomic information from herbarium specimens as old as 80 years and using as little as 500 pg of degraded starting DNA.
CONCLUSIONS: The routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome-enrichment approaches), and can be performed with limited sample destruction.

The increasing popularity of cytochrome c oxidase subunit 1 (COI) DNA metabarcoding warrants a careful look at the underlying reference databases used to make high-throughput taxonomic assignments. The objectives of this study are to document trends and assess the future usability of COI records for metabarcode identification. Over 2.5 million COI sequences were found in GenBank, half of which were fully identified to the species rank. From 2003 to 2017, the number of COI Eukaryote records deposited has grown by two orders of magnitude representing a nearly 42-fold increase in unique species. For fully identified records, 92% are at least 500 bp in length, 74% have a country annotation, and 51% have latitude-longitude annotations. To ensure the future usability of COI records in GenBank we suggest: 1) Improving the geographic representation of COI records, 2) Improving the cross-referencing of COI records in the Barcode of Life Data System and GenBank to facilitate consolidation and incorporation into existing bioinformatic pipelines, 3) Adherence to the minimum information about a marker gene sequence guidelines, and 4) Integrating metabarcodes from eDNA and mixed community studies with existing sequences. COI metabarcoders are normally considered consumers of taxonomic data. Here we discuss the potential for taxonomists to reverse this pattern and instead mine metabarcode data to guide species discovery. The growth of COI reference records over the past 15 years has been substantial and is likely to be a resource across many fields for years to come.

Thursday, June 14, 2018


Ever seen anything in relation to the hashtag #BadStockPhotosOfMyJob? If not, you should check out Twitter or search for it on Google, because it shows some ridiculously funny photos that exhibit some of the worst stereotypes people have when thinking about others' jobs. The perception of what scientists do, in particular, is almost tragic. I thought it would be a good idea to show a few examples, including ironic comments by the real scientists. It's funny indeed, but sometimes also just sad to see what others think we scientists do for a living.


I have no words for those four.

Wednesday, June 13, 2018

Interview with a vampire

In this study, we show for the first time that it is possible to use DNA meta-barcoding to generate data on both diet and the predator's population structure. And we more or less get this additional information for free because the vampire bat's DNA is found in the DNA that we extract from blood meal and faecal samples.

When the sun sets in South and Central America, the vampire bats wake up and fly out in search of prey. The vampire bat's diet consists of blood. It prefers to feed on domestic animals such as cows and pigs, but when it does so, there is a risk of transmitting pathogens such as rabies. In order to control rabies transmitted by vampire bats, it is crucial to have a method that allows large-scale assessment of vampire bat prey. A study published back in April, led by researchers from Denmark and the UK, shows that metabarcoding can do just that.

The colleagues analysed vampire bat blood meal and faecal samples collected in Peru, along the coast, in the Andes and in the Amazon. In diet studies, metabarcoding is normally only used to assess diet, but in this study, the researchers went one step further and gathered information on the vampire bat's population structure. The latter is an approach very similar to work my group has been doing in collaboration with researchers in Germany. This 'free of charge' data can help researchers understand how the landscape influences the connectivity of vampire bat populations, which could influence the spread of pathogens.

We are slowly beginning to understand that all the metabarcoding data we generate to better understand the community composition of a given environment contains several layers of information. It is perhaps much richer than an OTU table. That being said, it is an entirely different story how to extract, let alone disentangle, all that information.

It is great to gain insight into both predator and prey from DNA in droppings and blood meals. Apart from feeding on domestic animals, vampire bats occasionally took blood from wild tapirs, so the method may be useful for determining the distribution of elusive mammal prey. It is also of note that we found no evidence of vampire bats feeding on humans from the DNA left over from their dinners.

Tuesday, June 12, 2018

Citizen science vs giant slugs

Citizen science is a powerful tool to combat the challenges created by invasive species. Our study emphasizes the importance of collaborations between researchers, government administration, and citizen volunteers. 

The giant slug Limax maximus is an invasive species which made its way from northern Europe all the way to Japan and other regions of the world. It is a notorious pest of horticultural and agricultural crops. 

Recently a Japanese research team found that a certain set of weather conditions could be a reliable short-term indicator of how often giant slugs would appear on a set mountain path. The findings showed that the slugs were more likely to appear on days with higher humidity, lower wind speed and lower precipitation than the 20-year average. These observations can be used to predict future outbreaks of the pest.

This study was actually made possible by citizen science. In order to survey the number of slugs present on the mountain path chosen for the study (Mt. Maruyama route, in Sapporo, Japan), a volunteer naturalist hiked the path at 5:00 AM nearly every day for two years. The colleagues collected weather data obtained from a nearby meteorological station and combined them with observational data to calculate correlations between slug appearances and complex weather conditions.

Friday, June 8, 2018

Weekend readings

Need some readings for a sunny weekend? Not enough papers on the pile on your desk? Here is a solution for you. A couple of interesting journal articles I came across this week. Enjoy.

The genus Amara Bonelli, 1810 is a very speciose and taxonomically difficult genus of the Carabidae. The identification of many of the species is accomplished with considerable difficulty, in particular for females and immature stages. In this study the effectiveness of DNA barcoding, the most popular method for molecular species identification, was examined to discriminate various species of this genus from Central Europe. DNA barcodes from 690 individuals and 47 species were analysed, including sequences from previous studies and more than 350 newly generated DNA barcodes. Our analysis revealed unique BINs for 38 species (81%). Interspecific K2P distances below 2.2% were found for three species pairs and one species trio, including haplotype sharing between Amara alpina/Amara torrida and Amara communis/Amara convexior/Amara makolskii. This study represents another step in generating an extensive reference library of DNA barcodes for carabids, highly valuable bioindicators for characterizing disturbances in various habitats.
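
For readers who have not met K2P distances before, here is a minimal Python sketch of the standard Kimura two-parameter formula, d = -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions between two aligned sequences. The function name and the toy sequences are made up for illustration; they are not data from the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if q < 0.5 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


# Toy example: 100 aligned sites differing by a single transition (A -> G).
reference = "A" * 100
variant = "G" + "A" * 99
d = k2p_distance(reference, variant)
print(f"K2P distance: {d:.4f} ({'below' if d < 0.022 else 'above'} 2.2%)")
```

The 2.2% cut-off in the last line simply echoes the value quoted in the abstract above.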

The correct identification of species in the highly divergent group of plants is crucial for several forensic investigations. Previous works had difficulties in the establishment of a rapid and robust method for the identification of plants. For instance, DNA barcoding requires the analysis of two or three different genomic regions to attain reasonable levels of discrimination. Therefore, new methods for the molecular identification of plants are clearly needed. Here we tested the utility of variable-length sequences in the chloroplast DNA (cpDNA) as a way to identify plant species. The SPInDel (Species Identification by Insertions/Deletions) approach targets hypervariable genomic regions that contain multiple insertions/deletions (indels) and length variability, which are found interspersed with highly conserved regions. The combination of fragment lengths defines a unique numeric profile for each species, allowing its identification. We analysed more than 44,000 sequences retrieved from public databases belonging to 206 different plant families. Four target regions were identified as suitable for the SPInDel concept: atpF-atpH, psbA-trnH, trnL CD and trnL GH. When considered alone, the discrimination power of each region was low, varying from 5.18% (trnL GH) to 42.54% (trnL CD). However, the discrimination power reached more than 90% when the lengths of some of these regions were combined. We also observed low diversity in intraspecific data sets for all target regions, suggesting they can be used for identification purposes. Our results demonstrate the utility of the SPInDel concept for the identification of plants.
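
The idea of a numeric length profile is easy to illustrate. The sketch below treats each species as a tuple of fragment lengths and counts how many species end up with a profile shared by no other species. The species names and fragment lengths are invented toy values, and "share of species with a unique profile" is only one plausible reading of discrimination power, not necessarily the exact definition used in the paper.

```python
from collections import Counter


def discrimination_power(profiles, regions):
    """Fraction of species whose length profile over `regions` is unique.

    `profiles` maps species name -> {region name: fragment length in bp}.
    Hypothetical helper for illustration only.
    """
    keys = {
        species: tuple(lengths[region] for region in regions)
        for species, lengths in profiles.items()
    }
    counts = Counter(keys.values())
    unique = sum(1 for key in keys.values() if counts[key] == 1)
    return unique / len(keys)


# Invented toy data: lengths are NOT from the paper.
toy_profiles = {
    "species_a": {"atpF-atpH": 310, "psbA-trnH": 402, "trnL CD": 255},
    "species_b": {"atpF-atpH": 310, "psbA-trnH": 415, "trnL CD": 255},
    "species_c": {"atpF-atpH": 298, "psbA-trnH": 402, "trnL CD": 255},
    "species_d": {"atpF-atpH": 298, "psbA-trnH": 415, "trnL CD": 260},
}

single = discrimination_power(toy_profiles, ["atpF-atpH"])
combined = discrimination_power(toy_profiles, ["atpF-atpH", "psbA-trnH"])
```

In this toy set no species can be told apart by atpF-atpH alone, while the two-region profile separates all four, which mirrors the pattern the abstract reports for real data.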

Environmental DNA (eDNA) metabarcoding has been increasingly applied to biodiversity surveys in stream ecosystems. In stream networks, the accuracy of eDNA-based biodiversity assessment depends on whether the upstream eDNA influx affects downstream detection. Biodiversity assessment in low-discharge streams should be less influenced by eDNA transport than in high-discharge streams. We estimated α- and β-diversity of the fish community from eDNA samples collected in a small Michigan (USA) stream from its headwaters to its confluence with a larger river. We found that α-diversity increased from upstream to downstream and, as predicted, we found a significant positive correlation between β-diversity and physical distance (stream length) between locations indicating species turnover along the longitudinal stream gradient. Sample replicates and different genetic markers showed similar species composition, supporting the consistency of the eDNA metabarcoding approach to estimate α- and β-diversity of fishes in low-discharge streams.
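
As a quick reminder of what the two diversity measures capture, here is a small sketch using species richness for α-diversity and pairwise Jaccard dissimilarity for β-diversity. Both are common choices but are my assumption here; the abstract does not say which indices the authors used. The site names, river positions and species lists are invented.

```python
from itertools import combinations


def alpha_diversity(species):
    """Species richness: number of distinct taxa detected at one site."""
    return len(set(species))


def jaccard_dissimilarity(site_a, site_b):
    """1 - (shared taxa / all taxa found at either site)."""
    a, b = set(site_a), set(site_b)
    union = a | b
    if not union:
        return 0.0  # two empty sites are treated as identical
    return 1 - len(a & b) / len(union)


# Invented toy detections: (stream position in km, taxa detected).
sites = {
    "headwater": (0.0, ["brook trout", "sculpin"]),
    "middle": (4.0, ["brook trout", "sculpin", "creek chub"]),
    "confluence": (9.0, ["sculpin", "creek chub", "white sucker", "bass"]),
}

richness = {name: alpha_diversity(taxa) for name, (_, taxa) in sites.items()}

# Pairwise (stream distance, beta-diversity) values between sites.
pairs = [
    (abs(sites[x][0] - sites[y][0]),
     jaccard_dissimilarity(sites[x][1], sites[y][1]))
    for x, y in combinations(sites, 2)
]
```

In the toy data both richness and pairwise dissimilarity grow from the headwater to the confluence, which is the kind of pattern the study reports.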

The use of environmental DNA (eDNA) has become an applicable non-invasive tool with which to obtain information about biodiversity. A sub-discipline of eDNA is iDNA (invertebrate-derived DNA), where genetic material ingested by invertebrates is used to characterise the biodiversity of the species that served as hosts. While promising, these techniques are still in their infancy, as they have only been explored on limited numbers of samples from only a single or a few different locations. In this study, we investigate the suitability of iDNA extracted from more than 3,000 haematophagous terrestrial leeches as a tool for detecting a wide range of terrestrial vertebrates across five different geographical regions on three different continents. These regions cover almost the full geographical range of haematophagous terrestrial leeches, thus representing all parts of the world where this method might apply. We identify host taxa through metabarcoding coupled with high-throughput sequencing on Illumina and IonTorrent sequencing platforms to decrease economic costs and workload and thereby make the approach attractive for practitioners in conservation management. We identified hosts in four different taxonomic vertebrate classes: mammals, birds, reptiles, and amphibians, belonging to at least 42 different taxonomic families. We find that vertebrate blood ingested by haematophagous terrestrial leeches throughout their distribution is a viable source of DNA with which to examine a wide range of vertebrates. Thus, this study provides encouraging support for the potential of haematophagous terrestrial leeches as a tool for detecting and monitoring terrestrial vertebrate biodiversity.

Advances in DNA sequencing technology have revolutionised the field of molecular analysis of trophic interactions and it is now possible to recover counts of food DNA sequences from a wide range of dietary samples. But what do these counts mean? To obtain an accurate estimate of a consumer's diet should we work strictly with datasets summarising frequency of occurrence of different food taxa, or is it possible to use relative number of sequences? Both approaches are applied to obtain semi-quantitative diet summaries, but occurrence data is often promoted as a more conservative and reliable option due to taxa-specific biases in recovery of sequences. We explore representative dietary metabarcoding datasets and point out that diet summaries based on occurrence data often overestimate the importance of food consumed in small quantities (potentially including low-level contaminants) and are sensitive to the count threshold used to define an occurrence. Our simulations indicate that using relative read abundance (RRA) information often provides a more accurate view of population-level diet even with moderate recovery biases incorporated; however, RRA summaries are sensitive to recovery biases impacting common diet taxa. Both approaches are more accurate when the mean number of food taxa in samples is small. The ideas presented here highlight the need to consider all sources of bias and to justify the methods used to interpret count data in dietary metabarcoding studies. We encourage researchers to continue addressing methodological challenges, and acknowledge unanswered questions to help spur future investigations in this rapidly developing area of research.
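
The difference between the two summaries is easiest to see on a tiny example. The sketch below computes frequency of occurrence and relative read abundance from per-sample read counts. The function names, the `min_reads` threshold parameter and the read counts are my own illustrative assumptions, and RRA is taken here as the mean of per-sample read proportions.

```python
def frequency_of_occurrence(samples, min_reads=1):
    """Share of samples in which each taxon reaches `min_reads` reads."""
    taxa = sorted({taxon for sample in samples for taxon in sample})
    return {
        taxon: sum(1 for s in samples if s.get(taxon, 0) >= min_reads) / len(samples)
        for taxon in taxa
    }


def relative_read_abundance(samples):
    """Mean per-sample proportion of reads for each taxon."""
    taxa = sorted({taxon for sample in samples for taxon in sample})
    totals = {taxon: 0.0 for taxon in taxa}
    used = 0
    for sample in samples:
        depth = sum(sample.values())
        if depth == 0:
            continue  # a sample without reads carries no information
        used += 1
        for taxon, count in sample.items():
            totals[taxon] += count / depth
    if used == 0:
        raise ValueError("no sample contains any reads")
    return {taxon: value / used for taxon, value in totals.items()}


# Invented read counts for three scat samples.
toy_samples = [
    {"krill": 990, "squid": 10},
    {"krill": 950, "squid": 50},
    {"krill": 1000},
]

foo = frequency_of_occurrence(toy_samples)
foo_strict = frequency_of_occurrence(toy_samples, min_reads=20)
rra = relative_read_abundance(toy_samples)
```

Here squid occurs in two of three samples but contributes only about 2% of reads, and raising the occurrence threshold from 1 to 20 reads halves its frequency of occurrence. That is the sensitivity to rare items and to the count threshold that the abstract points out.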

DNA metabarcoding is a rapidly growing technique for obtaining detailed dietary information. Current metabarcoding methods for herbivory, using a single locus, can lack taxonomic resolution for some applications. We present novel primers for the second internal transcribed spacer of nuclear ribosomal DNA (ITS2) designed for dietary studies in Mauritius and the UK, which have the potential to give unrivalled taxonomic coverage and resolution from a short-amplicon barcode. In silico testing used three databases of plant ITS2 sequences from UK and Mauritian floras (native and introduced) totalling 6561 sequences from 1790 species across 174 families. Our primers were well-matched in silico to 88% of species, providing taxonomic resolution of 86.1%, 99.4% and 99.9% at the species, genus and family levels, respectively. In vitro, the primers amplified 99% of Mauritian (n = 169) and 100% of UK (n = 33) species, and co-amplified multiple plant species from degraded faecal DNA from reptiles and birds in two case studies. For the ITS2 region, we advocate taxonomic assignment based on best sequence match instead of a clustering approach. With short amplicons of 187-387 bp, these primers are suitable for metabarcoding plant DNA from faecal samples, across a broad geographic range, whilst delivering unparalleled taxonomic resolution.

The implementation of HTS (high-throughput sequencing) approaches is rapidly changing our understanding of the lichen symbiosis, by uncovering high bacterial and fungal diversity, which is often host-specific. Recently, HTS methods revealed the presence of multiple photobionts inside a single thallus in several lichen species. This differs from Sanger technology, which typically yields a single, unambiguous algal sequence per individual. Here we compared HTS and Sanger methods for estimating the diversity of green algal symbionts within lichen thalli using 240 lichen individuals belonging to two species of lichen-forming fungi. According to HTS data, Sanger technology consistently yielded the most abundant photobiont sequence in the sample. However, if the second most abundant photobiont exceeded 30% of the total HTS reads in a sample, Sanger sequencing generally failed. Our results suggest that most lichen individuals in the two analyzed species, Lasallia hispanica and L. pustulata, indeed contain a single, predominant green algal photobiont. We conclude that Sanger sequencing is a valid approach to detect the dominant photobionts in lichen individuals and populations. We discuss which research areas in lichen ecology and evolution will continue to benefit from Sanger sequencing, and which areas will profit from HTS approaches to assessing symbiont diversity.