Tuesday, January 24, 2017

Tree identification by laser scanning

LiDAR, which stands for Light Detection and Ranging, is a remote sensing method that uses pulsed laser light to measure variable distances.

LiDAR provides measurements of the horizontal and vertical vegetation structure of ecosystems. The light pulses generate precise, three-dimensional information that, alone or in combination with satellite multispectral images, makes it possible to detect single trees and to predict forest characteristics such as tree height, stem diameter, basal area, stem volume and biomass automatically and accurately.

A joint research project by the Tampere University of Technology's mathematics laboratory and the Natural Resources Institute Finland has developed a new method of identifying tree species based on laser scanning measurements. 

With their method, individual trees can be extracted from forest plot level point cloud data, and the structure of their crowns can be reconstructed as comprehensive 3D models. The created tree models consist of consecutive cylinders, which determine the structure of the tree stem and branches as well as the branching structure.

Previously, it was only possible to make a rough distinction between the stem and the crown, based on the point cloud. Now, the researchers are able to make out individual branches and analyse their diameters, volumes and branch angles.
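To make the cylinder idea a bit more tangible, here is a minimal sketch of how volume and branch angle could be derived once a tree is described as a set of cylinders. The `Cylinder` class, its fields and the two example cylinders are my own assumptions for illustration; the post does not describe the researchers' actual data structures.

```python
import math
from dataclasses import dataclass


@dataclass
class Cylinder:
    """One segment of a stem or branch (hypothetical representation)."""
    radius: float  # metres
    length: float  # metres
    axis: tuple    # direction vector (x, y, z), need not be unit length


def volume(c):
    """Volume of a single cylinder in cubic metres."""
    return math.pi * c.radius ** 2 * c.length


def angle_between(a, b):
    """Angle in degrees between two axis directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


# Made-up example: a vertical stem segment and one branch growing out at 45 degrees.
stem = Cylinder(radius=0.10, length=2.0, axis=(0.0, 0.0, 1.0))
branch = Cylinder(radius=0.02, length=1.0, axis=(1.0, 0.0, 1.0))

total_volume = volume(stem) + volume(branch)
branch_angle = angle_between(stem.axis, branch.axis)
print(f"volume: {total_volume:.4f} m^3, branch angle: {branch_angle:.1f} degrees")
```

Summing such values over all cylinders of a crown gives the kind of per-tree numbers that could feed into classification features.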

For species identification, the researchers defined 15 classification features, the values of which were then calculated for each tree. Some of these features are completely new and some have been used in previous studies. The new aspect is that now their value can be calculated more accurately, as the colleagues were able to utilize information about the tree's entire crown. They tested three of the most common tree species in Finland, birch, pine and spruce, but they already plan to extend their test set to more species.

According to the results, automatic species recognition is possible with more than 95% accuracy. The purpose was not to find the best possible combination of features, but only to prove that classification based on detailed tree models is possible. However, several combinations produced good results and all the classification methods had a maximum accuracy over 95%. The results also showed that just 30 trees per species are enough training material for the classification.
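For readers who have not seen feature-based classification before, here is a small sketch of the general principle. The post does not say which classifiers were used, so I use a nearest-centroid classifier as a stand-in, and both the two features and all values are invented toy numbers (the study used 15 features and real trees).

```python
import math

# Invented toy feature vectors per species:
# (mean branch angle in degrees, crown-volume-to-height ratio).
training = {
    "birch": [(55.0, 0.9), (60.0, 1.1), (58.0, 1.0)],
    "pine": [(80.0, 0.5), (85.0, 0.6), (78.0, 0.4)],
    "spruce": [(100.0, 0.7), (105.0, 0.8), (98.0, 0.6)],
}


def centroid(vectors):
    """Component-wise mean of a list of equally long feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


centroids = {species: centroid(vs) for species, vs in training.items()}


def classify(features):
    """Assign the species whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda s: math.dist(features, centroids[s]))


print(classify((82.0, 0.55)))
```

In a real setting the features would need to be put on a common scale first, otherwise the one with the largest numeric range dominates the distance.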

Future tests will also include measurements taken from more diverse forests. The tree models calculated based on the laser scanning data will be stored in a database, which can be accessed for even more accurate species recognition when the number of included samples grows.

Monday, January 23, 2017

Monday reads

Welcome to another week. This weekend saw the birth of 'alternative facts', an expression better known to most people as 'falsehood', but never mind. We also witnessed a very powerful expression of democracy in the Women's March on Washington and all the sister marches around the world.

It's Monday, time for science again, so here are some good reads for the week.

Alert! Shameless self-promotion to follow. The first two reads include me as co-author. Number one is the result of collaborative work by colleagues around the Mediterranean Sea.
Cartilaginous fish are particularly vulnerable to anthropogenic stressors and environmental change because of their K-selected reproductive strategy. Accurate data from scientific surveys and landings are essential to assess conservation status and to develop robust protection and management plans. Currently available data are often incomplete or incorrect as a result of inaccurate species identifications, due to a high level of morphological stasis, especially among closely related taxa. Moreover, several diagnostic characters clearly visible in adult specimens are less evident in juveniles. Here we present results generated by the ELASMOMED Consortium, a regional network aiming to sample and DNA-barcode the Mediterranean Chondrichthyans with the ultimate goal to provide a comprehensive DNA barcode reference library. This library will support and improve the molecular taxonomy of this group and the effectiveness of management and conservation measures. We successfully barcoded 882 individuals belonging to 42 species (17 sharks, 24 batoids and one chimaera), including four endemic and several threatened ones. Morphological misidentifications were found across most orders, further confirming the need for a comprehensive DNA barcoding library as a valuable tool for the reliable identification of specimens in support of taxonomists who are reviewing current identification keys. Despite low intraspecific variation among their barcode sequences and reduced sample size, five species showed preliminary evidence of phylogeographic structure. Overall, the ELASMOMED initiative further emphasizes the key role accurate DNA barcoding libraries play in establishing reliable diagnostic species-specific features in otherwise taxonomically problematic groups for biodiversity management and conservation actions.

Number two is the result of some follow-up emails to a blog post I wrote a while ago.
(No abstract, as it is a commentary paper.)

And now to all the good ones.
Climate change may result in ecological futures with novel species assemblages, trophic mismatch, and mass extinction. Alaska has a limited taxonomic workforce to address these changes. We are building a DNA barcode library to facilitate a metabarcoding approach to monitoring non-marine arthropods. Working with the Canadian Centre for DNA Barcoding, we obtained DNA barcodes from recently collected and authoritatively identified specimens in the University of Alaska Museum (UAM) Insect Collection and the Kenai National Wildlife Refuge collection. We submitted tissues from 4776 specimens, of which 81% yielded DNA barcodes representing 1662 species and 1788 Barcode Index Numbers (BINs), of primarily terrestrial, large-bodied arthropods. This represents 84% of the species available for DNA barcoding in the UAM Insect Collection. There are now 4020 Alaskan arthropod species represented by DNA barcodes, after including all records in Barcode of Life Data Systems (BOLD) of species that occur in Alaska - i.e., 48.5% of the 8277 Alaskan, non-marine-arthropod, named species have associated DNA barcodes. An assessment of the identification power of the library in its current state yielded fewer species-level identifications than expected, but the results were not discouraging. We believe we are the first to deliberately begin development of a DNA barcode library of the entire arthropod fauna for a North American state or province. Although far from complete, this library will become increasingly valuable as more species are added and costs to obtain DNA sequences fall.

DNA metabarcoding is a promising approach for rapidly surveying biodiversity and is likely to become an important tool for measuring ecosystem responses to environmental change. Metabarcoding markers need sufficient taxonomic coverage to detect groups of interest, sufficient sequence divergence to resolve species, and will ideally indicate relative abundance of taxa present. We characterized zooplankton assemblages with three different metabarcoding markers (nuclear 18S rDNA, mitochondrial COI, and mitochondrial 16S rDNA) to compare their performance in terms of taxonomic coverage, taxonomic resolution, and correspondence between morphology- and DNA-based identification. COI amplicons sequenced on separate runs showed that operational taxonomic units representing >0.1% of reads per sample were highly reproducible, although slightly more taxa were detected using a lower annealing temperature. Mitochondrial COI and nuclear 18S showed similar taxonomic coverage across zooplankton phyla. However, mitochondrial COI resolved up to threefold more taxa to species compared to 18S. All markers revealed similar patterns of beta-diversity, although different taxa were identified as the greatest contributors to these patterns for 18S. For calanoid copepod families, all markers displayed a positive relationship between biomass and sequence reads, although the relationship was typically strongest for 18S. The use of COI for metabarcoding has been questioned due to lack of conserved primer-binding sites. However, our results show the taxonomic coverage and resolution provided by degenerate COI primers, combined with a comparatively well-developed reference sequence database, make them valuable metabarcoding markers for biodiversity assessment.

Understanding the diversity and composition of species assemblages and identifying underlying biotic and abiotic determinants represent great ecological challenges. Addressing some of these issues, we investigated the α-diversity and phylogenetic composition of species-rich geometrid moth (Lepidoptera: Geometridae) assemblages in the mature temperate forest on Changbai Mountain. A total of 9285 geometrid moths representing 131 species were collected, with many species displaying wide elevational distribution ranges. Moth α-diversity decreased monotonically, while the standardized effect size of mean pairwise phylogenetic distances (MPD) and phylogenetic diversity (PD) increased significantly with increasing elevation. At high elevations, the insect assemblages consisted largely of habitat generalists that were individually more phylogenetically distinct from co-occurring species than species in assemblages at lower altitudes. This could hint at higher speciation rates in more favourable low-elevation environments generating a species-rich geometrid assemblage, while exclusion of phylogenetically closely related species becomes increasingly important in shaping moth assemblages at higher elevations. Overall, it appears likely that high-elevation temperate moth assemblages are strongly resilient to environmental change, and that they contain a much larger proportion of the genetic diversity encountered at low-elevation assemblages in comparison to tropical geometrid communities.

...and some bioinformatics
DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5'-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way.
We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe.
DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.
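The filtering idea described in this abstract (keep a sequence only if it reaches a minimum copy number and shows up reproducibly across PCR replicates) can be sketched in a few lines. To be clear, this is not DAMe's code or interface: the function name, the parameter names, the default thresholds and the read counts are all made up for illustration.

```python
from collections import Counter


def filter_sequences(replicates, min_copies=2, min_replicates=2):
    """Keep sequences seen with >= min_copies in >= min_replicates replicates.

    `replicates` is a list with one dict per PCR replicate of a sample,
    mapping each unique sequence to its copy number.
    """
    support = Counter()
    for counts in replicates:
        for seq, copies in counts.items():
            if copies >= min_copies:
                support[seq] += 1
    return {seq for seq, n in support.items() if n >= min_replicates}


# Three PCR replicates of one sample (made-up sequences and copy numbers).
replicates = [
    {"ACGT": 120, "ACGA": 1, "TTGC": 15},
    {"ACGT": 98, "TTGC": 1},
    {"ACGT": 110, "TTGC": 9, "GGGG": 3},
]

kept = filter_sequences(replicates)
print(sorted(kept))
```

Here the singleton `ACGA` and the sequence `GGGG`, which occurs in only one replicate, are dropped, which is how likely PCR or sequencing errors get removed under such a rule.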

This study presents a machine learning method that increases the number of identified bases in Sanger Sequencing. The system post-processes a KB basecalled chromatogram. It selects a recoverable subset of N-labels in the KB-called chromatogram to replace with basecalls (A,C,G,T). An N-label correction is defined given an additional read of the same sequence, and a human finished sequence. Corrections are added to the dataset when an alignment determines the additional read and human agree on the identity of the N-label. KB must also rate the replacement with a quality value of > 60 in the additional read. Corrections are only available during system training. Developing the system, nearly 850 000 N-labels are obtained from Barcode of Life Data Systems, the premier database of genetic markers called DNA Barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and assures analysis correctness. Keeping with barcoding standards, our system maintains an error rate of < 1%. Our system only applies corrections when it estimates a low rate of error. Tested on this data, our automation selects and recovers: 79% of N-labels from COI (animal barcode); 80% from matK and rbcL (plant barcodes); and 58% from non-protein-coding sequences (across eukaryotes).
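The rule for building the training set, as I read the abstract, is: an N in the KB call becomes a labelled correction only when the additional read and the human-finished sequence agree on the base and the additional read has a quality value above 60 at that position. A rough sketch of that rule follows. The function and argument names are hypothetical, and I assume the three sequences are already aligned to the same coordinates, which the real system has to establish by alignment first.

```python
def collect_corrections(kb_call, extra_read, extra_quality, finished, min_quality=60):
    """Return {position: base} for N-labels that qualify as training corrections.

    Assumes kb_call, extra_read and finished are already aligned, so that
    index i refers to the same base position in all of them.
    """
    corrections = {}
    for i, base in enumerate(kb_call):
        if base != "N":
            continue
        agrees = extra_read[i] == finished[i] and extra_read[i] in "ACGT"
        if agrees and extra_quality[i] > min_quality:
            corrections[i] = finished[i]
    return corrections


# Made-up example with two N-labels at positions 2 and 4.
kb_call = "ACNTNG"
extra_read = "ACGTAG"
finished = "ACGTTG"
extra_quality = [62, 62, 62, 62, 62, 62]

corrections = collect_corrections(kb_call, extra_read, extra_quality, finished)
print(corrections)
```

Only position 2 qualifies in this example, because at position 4 the additional read and the finished sequence disagree.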

Friday, January 20, 2017

Metabarcoding and Metagenomics - a new journal, have your say

Some of you might have heard about the EU-COST Action DNAqua-Net. It is based on a fundamental problem in morpho-taxonomic approaches of aquatic biodiversity assessment and monitoring. These have many drawbacks such as being time consuming, limited in temporal and spatial resolution, and error-prone due to variation of individual taxonomic expertise of analysts. Novel genomic tools could circumvent many of the aforementioned problems and thereby complement traditional strategies. Yet, a plethora of protocols are independently developed in different institutions, thereby hampering any concerted routine application.

DNAqua-Net brings together researchers across disciplines in order to identify gold-standard genomic tools (aka barcodes) and novel eco-genomic indices and metrics for routine application for biodiversity assessments and biomonitoring of European water bodies. Furthermore, DNAqua-Net will provide a platform for training of the coming generation of researchers preparing them for the new technologies. Jointly with water managers, politicians and other stakeholders, the group will develop a conceptual framework for the standard application of eco-genomic tools as part of legally binding assessments, and here we are talking about things like the EU-Water Framework Directive and the Marine Strategy Framework Directive.

I am a co-proposer of the project and representative for Canada. The project's core focus is of course the EU but the project leads wanted to ensure that their efforts are embedded in the international community. After all we are talking about a shift in the way we do biomonitoring in general. 

The project is only a few months old and the kick-off conference will not happen until March, but the group is already very active. Together with Pensoft Publishers the team is planning to launch a new journal, named Metabarcoding and Metagenomics. The journal would feature a set of tailored publication types to address the needs of the community. The hope is that it will serve as a community-organizing focal point far beyond the scope and duration of the DNAqua-Net project.

In order to turn this idea into a perfect fit for the needs of the metabarcoding and metagenomic community, the team is currently soliciting opinions on the features that they envision. 

They would greatly appreciate it if you took a few minutes to complete a short questionnaire and provide them with your feedback and ideas.

Thursday, January 19, 2017

More sticklebacks through climate change?

It has been speculated that conditions brought on by a warming climate may allow animals to breed more often in a single year. However, this has only been empirically shown in insects. The problem is that such predictions and vulnerability assessments require comprehensive and high-quality long-term datasets, which are not widely available. Where such long-term studies exist, they can deliver some exciting and perhaps concerning results.

A good example is a new study that for the first time documented multiple breeding cycles for fish in a single season due to climate change. Research conducted by the University of Washington showed that one of Alaska's most abundant freshwater fish species, the three-spined stickleback (Gasterosteus aculeatus), is altering its breeding patterns in response to earlier spring ice breakup and longer ice-free summers.

The data were collected from 1963 to 2015 in Alaska's Lake Aleknagik, home to one of the University of Washington's Alaska Salmon Program research stations. For 52 years, the abundance of juvenile sockeye salmon and other fish that live in the region's freshwater lakes was recorded by capturing fish along the lakeshore at 10 different sites every seven days between June and September. All fish were identified and measured.

While the program's monitoring was designed to track the commercially important sockeye salmon population, scientists also meticulously recorded every other fish present, including three-spined stickleback. Stickleback represent almost half of the fish found in Lake Aleknagik, with juvenile sockeye salmon nearly matching that percentage. Three-spined stickleback make up a large percentage of the fish communities in many northern lakes, so these findings could be relevant throughout the region.

Stickleback are born near the shore, then move to the middle of the lake to feed on zooplankton. Adults return to the shore in the summer to spawn; males will build the nest and attract a female, who then lays the eggs. Males guard the nest until the fish hatch, usually after about two weeks. This behaviour made them great study objects for fish nerds like me. I remember keeping some of them in a tank at home fascinated by textbook knowledge becoming reality in front of my eyes.

By analyzing decades of data showing fish sizes throughout each summer, the colleagues could determine roughly when certain fish were born: a larger fish captured in August was indicative of an early season brood, while a smaller fish captured on the same day likely came from a brood that hatched later in the summer. Using these data and additional environmental data, they found that the fish spawned earlier in years when ice breakup occurred earlier, and in some years, the fish produced more than one brood. Given the short summers in Alaska, most stickleback have time and stamina for only one brood, but increasingly they are rearing two broods a summer as climate change ushers in earlier springs.
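The logic of reading birth dates from body size can be illustrated with a simple back-calculation. The post does not say how the study actually modelled growth, so the sketch below assumes constant linear growth, and both constants (length at hatching and daily growth) as well as the fish lengths are invented purely for illustration.

```python
from datetime import date, timedelta

HATCH_LENGTH_MM = 5.0     # assumed length at hatching, illustration only
GROWTH_MM_PER_DAY = 0.3   # assumed constant growth rate, illustration only


def estimated_hatch_date(capture_date, length_mm):
    """Back-calculate a hatch date from length at capture under linear growth."""
    age_days = (length_mm - HATCH_LENGTH_MM) / GROWTH_MM_PER_DAY
    return capture_date - timedelta(days=round(age_days))


capture = date(2015, 8, 15)
early = estimated_hatch_date(capture, 23.0)  # larger fish
late = estimated_hatch_date(capture, 11.0)   # smaller fish caught the same day

print(early, late, (late - early).days)
```

Two clearly separated clusters of estimated hatch dates within one summer would be the kind of signal that points to a second brood.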

If stickleback are increasing in abundance because of their modified reproduction strategy, this can have ecosystem implications for the productivity of species we commercially care about, like sockeye salmon. We don't know exactly what this means for demographics of this species. It could also mean that fish are living shorter lives because there's a higher physiological cost to breeding more than once. In the lower-latitude extent of their range, fish mature earlier and die earlier.

Tuesday, January 17, 2017

From the Inbox: BioSyst.EU conference Barcoding Symposium

Dear colleagues, I would like to draw your attention to the upcoming BioSyst.EU conference in August 2017 in Gothenburg, Sweden, where we will organize a half-day symposium on “DNA-barcoding and the future of biodiversity monitoring”, with a special focus on metabarcoding approaches for biodiversity monitoring. We are aiming at bringing together the experts on this topic and would like to cover as many aspects as possible in a balanced fashion, given the time limitation of approx. 1 plenary talk and 8 regular talks. There is of course room for poster contributions as well. Registration for presentations closes on February 1st. It would be great if you let us know whether you are interested in bringing the audience up to date with the latest findings from your field of research.

Application for presenters is done by sending an e-mail. The e-mail should contain the following information:
- Which symposium you would prefer to be assigned to
- The title of your presentation
- Whether it is an oral presentation or a poster
- An abstract in doc, pdf or txt format

Please find further details on the conference here.

Monday, January 16, 2017

Monday reads

Another week, another suite of interesting papers to read.

I wrote about this one in a separate blog post:

Their relatively slow rates of molecular evolution, as well as frequent exposure to hybridization and introgression, often make it difficult to discriminate species of vascular plants with the standard barcode markers (rbcL, matK, ITS2). Previous studies have examined these constraints in narrow geographic or taxonomic contexts, but the present investigation expands analysis to consider the performance of these gene regions in discriminating the species in local floras at sites across Canada. To test identification success, we employed a DNA barcode reference library with sequence records for 96% of the 5108 vascular plant species known from Canada, but coverage varied from 94% for rbcL to 60% for ITS2 and 39% for matK. Using plant lists from 27 national parks and one scientific reserve, we tested the efficacy of DNA barcodes in identifying the plants in simulated species assemblages from six biogeographic regions of Canada using BLAST and mothur. Mean pairwise distance (MPD) and mean nearest taxon distance (MNTD) were strong predictors of barcode performance for different plant families and genera, and both metrics supported ITS2 as possessing the highest genetic diversity. All three genes performed strongly in assigning the taxa present in local floras to the correct genus with values ranging from 91% for rbcL to 97% for ITS2 and 98% for matK. However, matK delivered the highest species discrimination (~81%) followed by ITS2 (~72%) and rbcL (~44%). Despite the low number of plant taxa in the Canadian Arctic, DNA barcodes had the least success in discriminating species from this biogeographic region with resolution ranging from 36% with rbcL to 69% with matK. Species resolution was higher in the other settings, peaking in the Woodland region at 52% for rbcL and 87% for matK. 
Our results indicate that DNA barcoding is very effective in identifying Canadian plants to a genus, and that it performs well in discriminating species in regions where floristic diversity is highest.
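Since MPD and MNTD come up repeatedly in these abstracts, here is a small sketch of their standard definitions: MPD averages the distances over all pairs of taxa, while MNTD averages, for each taxon, the distance to its closest relative in the set. The three taxa and their distances are made up, and the abstract does not spell out exactly which distances the study fed into these metrics.

```python
from itertools import combinations


def mpd(dist, taxa):
    """Mean pairwise distance over all unordered pairs of taxa."""
    pairs = list(combinations(taxa, 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def mntd(dist, taxa):
    """Mean distance from each taxon to its nearest other taxon."""
    nearest = [min(dist[frozenset((t, u))] for u in taxa if u != t) for t in taxa]
    return sum(nearest) / len(nearest)


# Made-up symmetric distances between three taxa.
dist = {
    frozenset(("A", "B")): 0.02,
    frozenset(("A", "C")): 0.10,
    frozenset(("B", "C")): 0.12,
}
taxa = ["A", "B", "C"]

print(f"MPD = {mpd(dist, taxa):.4f}, MNTD = {mntd(dist, taxa):.4f}")
```

A group with low MNTD contains close relatives that are hard to tell apart with a barcode, which fits the abstract's point that these metrics predict barcode performance.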

Ornamental horticulture has been identified as an important threat to plant biodiversity and the major pathway for plant invasions worldwide. In this context, the family Cactaceae is particularly interesting and challenging for three main reasons: it is considered the fifth most threatened major taxonomic group in the world; several cactus species are amongst the most widespread and damaging invasive species; and Cactaceae is one of the most popular horticultural plant groups. Based on CITES trade data and the eleven main auction sites selling cacti on the internet we document a substantial global trade from and to almost all continents. While less than 20% of this trade involves threatened species, and less than 3% involves known invasive species, many species are sold without a valid scientific name. Importantly, however, hardly any of the globally traded cacti are collected from wild populations. In order to provide an in-depth look at the dynamics of the industry, we surveyed the businesses involved in the cactus trade in South Africa (one of the main hotspots of cactus trade and invasions). Despite a large commercial network, all South African imports (of which only 15% and 1.5% were of species listed as threatened and invasive, respectively) came from the same source. We purchased seeds of every available species and, based on DNA-barcoding techniques, could only identify 24% of the species to genus level. If trade restrictions are placed on the small proportion of cacti that are invasive and there is no major increase in harvesting of native populations, the commercial cactus horticultural trade will pose a negligible environmental threat. However, there are currently no effective methods for easily identifying which cacti are traded, and both the illicit harvesting of cacti from the wild and the informal trade in invasive taxa pose on-going conservation challenges.

This one made some news headlines:
Seafood mislabeling is common in both domestic and international markets. Previous studies on seafood fraud often report high rates of mislabeling (e.g. >70%), but these studies have been limited to a single sampling year, making it difficult to assess the impact of stricter governmental truth-in-labeling regulations. This study uses DNA barcoding to assess seafood mislabeling in Los Angeles over a four-year period. Sushi restaurants had a consistently high percentage of mislabeling (47%) from 2012 to 2015, yet mislabeling was not homogenous across species. Menu-listed halibut, red snapper, yellowfin tuna, and yellowtail had consistently high occurrences of mislabeling, whereas mislabeling of salmon and mackerel was typically low. All sampled sushi restaurants had at least one case of mislabeling. Mislabeling of sushi-grade fish from high-end grocers was also identified in red snapper, yellowfin tuna, and yellowtail, but at a slightly lower frequency (42%) than sushi restaurants. Results show that despite increased regulatory measures and media attention, seafood mislabeling continues to be prevalent.

Fungal pathogens severely impact global food and fibre crop security. Fungal species that cause plant diseases have mostly been recognized based on their morphology. In general, morphological descriptions remain disconnected from crucially important knowledge such as mating types, host specificity, life cycle stages and population structures. The majority of current fungal species descriptions lack even the most basic genetic data that could address at least some of these issues. Such information is essential for accurate fungal identifications, to link critical metadata and to understand the real and potential impact of fungal pathogens on production and natural ecosystems. Because international trade in plant products and introduction of pathogens to new areas is likely to continue, the manner in which fungal pathogens are identified should urgently be reconsidered. The technologies that would provide appropriate information for biosecurity and quarantine already exist, yet the scientific community and the regulatory authorities are slow to embrace them. International agreements are urgently needed to enforce new guidelines for describing plant pathogenic fungi (including key DNA information), to ensure availability of relevant data and to modernize the phytosanitary systems that must deal with the risks relating to trade-associated plant pathogens. This article is part of the themed issue 'Tackling emerging fungal threats to animal health, food security and ecosystem resilience'.

Wednesday, January 11, 2017

Reference Library for Canadian vascular plants

Just finished reading a paper by some of my colleagues here at the institute. They summarize the results of a large study which employed a DNA barcode library for the vascular plants of Canada to determine the method with the best species resolution and the barcode marker (rbcL, matK, ITS2) with the highest performance. 

The colleagues built a barcode reference library for 4923 of the 5108 species of non-hybrid origin (~96%) with coverage for all 1153 genera and 171 families in the Database of Vascular Plants of Canada. Of course coverage for the three markers differs. The rbcL dataset is most complete with almost 94% coverage. The ITS2 library includes almost 60% of the species and the matK dataset 39%. Overall, 78% of the species possess records for some combination of two markers, but only 1074 species (22%) have data for all three. Despite such gaps the results are more than promising and certainly very impressive. For almost all vascular plants in Canada the library contains barcode sequences for at least one marker and given their individual effectiveness it is possible to make species and genus assignments at a considerable level:

Analyses based on this library indicate that any one of the three barcode regions is very effective (>90%) in delivering a generic assignment while species resolution is often possible with ITS2 (72%) and matK (80%). BLAST demonstrated higher performance than mothur in assigning specimens to a species in all datasets, including those at a community level and for 1074 species with data for all three barcode regions. The higher performance of BLAST reflects its consideration of indel variation and absolute length of the marker, leading matK to deliver the highest resolution. Although ITS2 showed slightly lower performance, it has two important advantages: its short length makes it suitable for HTS-based applications, and it is readily recovered from diverse taxa, including vascular plants and fungi.
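As a quick sanity check on the coverage figures quoted in this post, the percentages can be recomputed from the species counts. The variable names are mine, and I am assuming that the 22% for species with all three markers is relative to the 4923 barcoded species rather than all 5108.

```python
total_species = 5108      # non-hybrid species in the Canadian checklist
with_any_barcode = 4923   # species with at least one barcode marker
with_all_three = 1074     # species with rbcL, matK and ITS2 records

coverage_any = with_any_barcode / total_species
share_all_three = with_all_three / with_any_barcode

print(f"library coverage: {coverage_any:.1%}")
print(f"all three markers: {share_all_three:.1%}")
```

Both values round to the figures given above (about 96% and 22%).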