Tuesday, July 16, 2013

Universal primers for marine invertebrates

In all the years I have been part of the marine barcoding movement I have always thought: "Folmer et al. must have been damn lucky!" That does not at all devalue the work of these researchers back in 1994. To this day their primers are used with many organisms to retrieve a COI fragment that, nine years after the primers were developed, was declared the DNA Barcode region for animals. However, these primers are not really universal in applicability. The so-called Folmer primers often fail or perform poorly, and a large range of organisms, marine invertebrates in particular, can't be sequenced with them. Unfortunately, the general reaction to the problem was to develop more taxon-specific primers and adapt laboratory protocols accordingly. Nevertheless, truly universal primers were always deemed desirable.

Now it seems we have a new set that is ready to do the job. Maybe not across all animal life, but at least for the large fraction that lives in the ocean and has no backbone. US researchers have now published a set of new primers, called jgHCO2198 and jgLCO1490, which are well suited for routine DNA barcoding, all-taxon surveys and metazoan metagenomics.

Although I am truly happy about this advance, on which I congratulate the authors, I have two issues with the paper, and both relate to data sharing and open access.

Number one is a more general one and not necessarily a criticism of this particular paper alone. Why does the scientific community always have to wait years before such breakthroughs are shared and primer sequences published? I don't see why primers are not shared right after they have been developed. Their success is perhaps not guaranteed, but sharing them in some sort of general database would greatly increase the number of experiments and provide researchers with results that in turn show them how universal or specific their newly designed oligo is.

It is a shame that for most primers we still have to wait for the associated publication to come out before we 'discover' them. I wonder how many primer combinations have been developed repeatedly and independently of each other. What a waste of precious time, and only because we are still largely paranoid and self-serving in our community. I am not excluding myself from this, as I am also guilty of holding back such information for years, waiting for all results to come in to write the big paper that would give me the impact factor I thought I deserved. Well, that nonsense is over. A few years back I decided to apply the open access concept to the few primers I developed for DNA Barcoding. Those are at least available through the BOLD primer database. They might never end up in a publication. I don't see a problem in releasing them before a paper is submitted.

So, why don't we just put primers up as soon as we have designed them? There are a couple of databases out there, although none of them has a community interface that would allow feedback on success. At least for primers relevant to the DNA Barcoding community, BOLD could extend its interface. Just a thought.

My second major criticism is summarized in one question: Where are the sequences relating to this paper? Some 7600 sequences of marine invertebrates have been recovered using this new primer set, so why are they not shared via GenBank or BOLD? This is an old problem in the entire DNA Barcoding movement. There is a barcoding twilight zone, and its proportions are largely unknown. It is home to sequences that reside on individual researchers' hard disks or in institutional databases. Both are locked up, and the community can only hope that some day the data will be released. This seriously hampers any effort to build DNA Barcode libraries effectively, as nobody really knows what has been sequenced and what has not. The best solution to this problem is the paradigm shift many people silently hope for - full, immediate data release that also allows for community tagging and commenting, as realized in BOLD. Unfortunately, there are also many people who would rather lock up their data indefinitely to make sure that everything is correct before releasing it. They are concerned about spreading partially erroneous data. That is an understandable point, but it is the typical argument against Wikipedia. I think we researchers should start to embrace the community approaches out there. Say what you will - Wikipedia got better as the number of contributors grew, especially those who do a proper job. Time to do the same and stop sitting on data.

