Much of the published research in the life sciences is based on image data sets that sample 3D space, time and the spectral characteristics of detected signal to provide quantitative measures of cell, tissue and organismal processes and structures. The sheer size of biological image data sets makes data submission, handling and publication challenging. An image-based genome-wide 'high-content' screen (HCS) may contain more than 1 million images, and new 'virtual slide' and 'light sheet' tissue imaging technologies generate individual images that contain gigapixels of data showing tissues or whole organisms at subcellular resolutions. At the same time, published versions of image data are often mere illustrations: they are presented in processed, compressed formats that cannot convey the measurements and multiple dimensions contained in the original image data and cannot easily be reanalyzed. Furthermore, conventional publications do not include the metadata that define imaging protocols, biological systems and perturbations or the processing and analytic outputs that convert the image data into quantitative measurements.
There are many resources worldwide in which people publish imaging data, but none of these repositories is both generic and linked to other relevant bio-molecular data. This means that for all the effort that goes into them, it is difficult to reuse these datasets in new studies. There are many reasons why sharing imaging data has been so difficult until now, most notably the heterogeneity and complexity of the image data, but also the lack of a critical mass of storage, compute and curation expertise.
To address this challenge, scientists at the University of Dundee, the European Bioinformatics Institute (EMBL-EBI), the University of Bristol and the University of Cambridge have launched a prototype repository for imaging data: the Image Data Resource (IDR). The new resource integrates imaging data with molecular and phenotype data. IDR also records experimental metadata, such as protocol parameters, analyses and the effects scientists have observed in cells and features.
To demonstrate the power of the new repository, the researchers used data deposited in the IDR to identify genes from different studies that, when mutated or removed, caused cells to elongate and stretch out. Information from several different studies was used to build a gene network, which provides insights into how these genes affect cell shape, an important property to consider in metastatic cancer.
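To make the idea of combining studies a little more concrete, here is a minimal sketch in Python. It is not the researchers' actual method and it does not use the IDR software: the study names, gene names, phenotype labels and interaction pairs are all made-up placeholders, and the two helper functions are hypothetical. It only shows the general pattern of pooling phenotype annotations across studies and then linking the resulting genes into a network.

```python
from collections import defaultdict

# Hypothetical annotations: (study, gene, observed phenotype).
# All names are placeholders, not real IDR records.
ANNOTATIONS = [
    ("study_1", "geneA", "elongated cells"),
    ("study_1", "geneB", "round cells"),
    ("study_2", "geneA", "elongated cells"),
    ("study_2", "geneC", "elongated cells"),
    ("study_3", "geneD", "elongated cells"),
]

# Hypothetical gene-gene interactions, e.g. taken from a molecular database.
INTERACTIONS = [("geneA", "geneC"), ("geneC", "geneD"), ("geneB", "geneD")]


def genes_with_phenotype(annotations, phenotype):
    """Map each gene showing the phenotype to the set of studies reporting it."""
    studies_by_gene = defaultdict(set)
    for study, gene, observed in annotations:
        if observed == phenotype:
            studies_by_gene[gene].add(study)
    return dict(studies_by_gene)


def build_network(genes, interactions):
    """Keep only interactions whose two endpoints are both in `genes`."""
    network = {gene: set() for gene in genes}
    for a, b in interactions:
        if a in network and b in network:
            network[a].add(b)
            network[b].add(a)
    return network


# Pool the phenotype hits from all studies, then connect them.
hits = genes_with_phenotype(ANNOTATIONS, "elongated cells")
network = build_network(hits, INTERACTIONS)
```

In this toy example, `hits` records which studies reported each gene, and `network` contains only the links between genes that share the elongated phenotype, so `geneB` is left out.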
The prototype public image repository contains a broad range of data, including:
- High-content screening
- Super-resolution microscopy
- Time-lapse imaging
- Digital pathology imaging
- Experimental protocol metadata
- Observed effects in cells and features
- Cross references with molecular archives
The next step is to secure the support and investment needed to transform the prototype into a production-ready imaging infrastructure. IDR's software and technology are open source, so they can be accessed and built into other image data publication systems. At this point the project focuses on microscopic imaging, but why not expand into images of entire organisms or specific traits?