PLoS Comput Biol
January 1, 2018;
New methods for computational decomposition of whole-mount in situ images enable effective curation of a large, highly redundant collection of Xenopus images.
The precise anatomical location of gene expression is an essential component of the study of gene function. For most model organisms this task is usually undertaken via visual inspection of gene expression images by interested researchers. Computational analysis of gene expression has been developed in several model organisms, notably in Drosophila which exhibits a uniform shape and outline in the early stages of development. Here we address the challenge of computational analysis of gene expression in Xenopus, where the range of developmental stages of interest encompasses a wide range of embryo size and shape. Embryos may have different orientation across images, and, in addition, embryos have a pigmented epidermis that can mask or confuse underlying gene expression. Here we report the development of a set of computational tools capable of processing large image sets with variable characteristics. These tools efficiently separate the Xenopus embryo from the background, separately identify both histochemically stained and naturally pigmented regions within the embryo, and can sort images from the same gene and developmental stage according to similarity of gene expression patterns without information about relative orientation. We tested these methods on a large, but highly redundant, collection of 33,289 in situ hybridization images, allowing us to select representative images of expression patterns at different embryo orientations. This has allowed us to put a much smaller subset of these images into the public domain in an effective manner. The "isimage" module and the scripts developed are implemented in Python and freely available on https://pypi.python.org/pypi/isimage/.
Fig 1. Overview of image analysis pipeline.
(Upper panel) Schematic representations of the stages of image analysis. Text boxes contain brief descriptions (see text for more detail); Roman numerals correspond to steps in the workflow. Arrows show where data are extracted from the image for analysis. (A) Orthogonal projection of whitened 18-dimensional data extracted from the image. Points are coloured by cluster assignment; crosses and ellipses represent the centres and covariances of the identified clusters. (B) Example representation of pixel colour density in the 3D colour space, showing identification of the vectors corresponding to in situ stain, pigmented embryo and un-pigmented embryo, which are used to identify regions of the embryo expressing the gene in question. (C) Example histogram of stain distribution. The data are modelled as a mixture of two Gaussians. The threshold is the smaller of the two components' values of μ + 2σ and is shown as a solid green line. Dashed red lines mark the range of values [0.25, 0.67] that the threshold is allowed to take.
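The thresholding rule in panel (C) can be sketched as follows. This is a minimal illustration and not the isimage implementation: the function names, the plain EM fit, and the reading of the allowed range as simple clipping to [0.25, 0.67] are assumptions made for the example.

```python
import math

def fit_two_gaussians(values, iterations=100):
    """Fit a 1D two-component Gaussian mixture with plain EM (illustrative only)."""
    data = sorted(values)
    n = len(data)
    # Initialise each component from the lower or upper half of the sorted data.
    halves = (data[: n // 2], data[n // 2:])
    mus = [sum(h) / len(h) for h in halves]
    sigmas = [max(1e-3, math.sqrt(sum((x - m) ** 2 for x in h) / len(h)))
              for h, m in zip(halves, mus)]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in data:
            p = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                 for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = max(1e-3, math.sqrt(var))
    return list(zip(mus, sigmas))

def stain_threshold(components, lower=0.25, upper=0.67):
    """Smaller of mu + 2*sigma over the components, clipped to [lower, upper]."""
    candidate = min(mu + 2 * sigma for mu, sigma in components)
    return min(max(candidate, lower), upper)

# Hypothetical components given as (mu, sigma) pairs.
print(stain_threshold([(0.10, 0.05), (0.60, 0.10)]))  # 0.25 (0.20 clipped up to the lower bound)
```

Clipping keeps the threshold within the stated range when one mixture component sits very close to zero or when the two components overlap heavily.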
Fig 2. Graphical depiction of image analysis workflow for selected images with embryos of different shapes and a range of different background colours and textures.
Images show that embryo detection and stain colour analysis are effective across a wide range of variation in image background and embryo characteristics. All images were analysed without changing initial parameters. Note image (d), where even the human eye struggles to distinguish the upper border of the embryo from the background.
Fig 3. Rational selection of representative images to reduce redundancy.
Unsorted images from a large collection for a given gene/development stage are first classified into cleared (grey background) and un-cleared (orange/red background) images. Embryo boundaries were detected within the image and embryo pixel colours analysed to yield predicted in situ stain, pigmentation or unmarked embryo. Embryos with predicted outline touching the image border were excluded (unless the outline in all images in the group touched the border). Images were sorted within groups by stain content for selection, and cropped for display where needed. (A) Lateral view images, NF stage 20+. (B) Quasi-spherical development stages (up to NF stage 20): images are clustered according to expression pattern similarity under rotational and other transformations (see also Fig 4), and the most stained image is selected from each well-separated group.
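The border-exclusion and stain-ranking rule described in this caption can be sketched as below. It is a hedged illustration only: the record fields `name`, `touches_border` and `stain_fraction` and the function `rank_group` are hypothetical names chosen for the example, not part of the isimage module.

```python
def rank_group(images):
    """Rank one gene/stage group of image records by stain content.

    Each record is a dict with the hypothetical keys 'name',
    'touches_border' (bool) and 'stain_fraction' (float).
    """
    # Drop images whose predicted outline touches the image border ...
    inside = [im for im in images if not im["touches_border"]]
    # ... unless every image in the group touches the border.
    candidates = inside if inside else list(images)
    # Most stained first, so the first entry is the natural representative.
    return sorted(candidates, key=lambda im: im["stain_fraction"], reverse=True)

group = [
    {"name": "a", "touches_border": True, "stain_fraction": 0.9},
    {"name": "b", "touches_border": False, "stain_fraction": 0.4},
    {"name": "c", "touches_border": False, "stain_fraction": 0.6},
]
print([im["name"] for im in rank_group(group)])  # ['c', 'b']
```

The fallback to the full group preserves at least one candidate for a gene and stage where no image shows the whole embryo.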
Fig 4. Simplified example of spherical stage image similarity clustering.
Early development stages are routinely photographed from different directions to maximise information about the expression pattern. Reference and comparison images for the same gene and stage are compared under multiple transformations (scale, rotation, shear) to identify the set of transformations that minimises their dissimilarity. Here we see that a 91.5° rotation and a 1.01 scaling suggest the most likely transformation between these two images. All images from the same gene and stage are compared with each other to identify images from (probably) the same view-point. See also Fig 3B.
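The idea of searching for the transformation that minimises dissimilarity can be sketched with an exhaustive grid search. This is an assumption-laden toy, not the published method: it covers only rotation and scale (no shear), uses nearest-neighbour resampling on small square grids, measures dissimilarity as mean squared difference, and all function names and parameter grids are hypothetical.

```python
import math

def transform(image, angle_deg, scale):
    """Rotate and scale a square 2D grid about its centre (nearest neighbour)."""
    n = len(image)
    c = (n - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = (x - c) / scale, (y - c) / scale
            sx = round(c + cos_a * dx + sin_a * dy)
            sy = round(c - sin_a * dx + cos_a * dy)
            if 0 <= sx < n and 0 <= sy < n:
                out[y][x] = image[sy][sx]
    return out

def dissimilarity(a, b):
    """Mean squared pixel difference between two equally sized grids."""
    n = len(a)
    return sum((a[y][x] - b[y][x]) ** 2 for y in range(n) for x in range(n)) / (n * n)

def best_transform(reference, comparison,
                   angles=range(0, 360, 5), scales=(0.9, 1.0, 1.1)):
    """Return (angle, scale, score) minimising dissimilarity to the comparison."""
    best = None
    for angle in angles:
        for scale in scales:
            score = dissimilarity(transform(reference, angle, scale), comparison)
            if best is None or score < best[2]:
                best = (angle, scale, score)
    return best
```

A pair of images whose best score stays high under every tested transformation would be treated as showing different views; a low best score suggests the same view-point up to rotation and scale. A practical version would use a finer search or a numerical optimiser and would also include shear.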
Fig 5. Developmental progression of expression for selected genes prdm1, ank1, and hoxb3 during Xenopus embryo development.
This is the result of applying our suite of image analysis tools to (in this case) the 334 original images for these three genes, and reducing them to a representative set of 65 images, including multiple views of expression patterns from the early spherical stages (pre-Stage 22).