Tuesday, October 1, 2013

Sharing, sharing, sharing

As I've mentioned in a couple of earlier blog posts, I am a big proponent of open access, not only for any sort of publication but also for data generated through research. Data and information sharing makes all our lives as researchers easier.

The early release of DNA sequence data is common practice in some communities. Genome projects, for example, do it on a regular basis, and it is becoming the rule rather than the exception that the community works on the annotation of the data. Unfortunately, this is not common practice in the barcoding community, and many colleagues would prefer more time to work with their data before releasing them. There are a variety of reasons for that, ranging from the fear of being scooped by overly competitive colleagues all the way to the fear of being embarrassed for providing wrong annotations (identifications in our business) in the first place. In the latter case, a more reasonable concern seems to be that wrong IDs would pollute a database such as BOLD and in some instances lead to wrong identifications. Indeed, this could cause serious error propagation, a phenomenon long known from GenBank. However, diligent work and data maintenance by barcode data owners can certainly overcome this, and BOLD offers a variety of tools to help with data curation and cleansing.

Maybe the results of a new study published in PeerJ will help to overcome some of those concerns, as it points out the advantages of open data. Two researchers from Duke University looked at gene expression publications between 2001 and 2009 and calculated that, on average, studies with publicly available data are cited about 9% more often than others. It seems to pay off to have data out there in the public. In addition, the study shows that researchers tend to use their own data for up to two years, on average, after initially publishing them. Their colleagues often refer to them up to six years after the data went public. Overall, the number of datasets used for a single publication went up from 1-2 in the years 2002-2004 to 3 or more in the year 2010.

