January 1, 2015
Finding Our Way through Phenotypes.
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
Figure 2. Phenotypes shared across biology. Phenotype data are relevant to many different domains, but they are currently isolated in data “silos.” Research from a broad array of seemingly disconnected domains, as outlined here, can be dramatically accelerated with a computable data store. (A) Domains: Diverse fields such as evolutionary biology, human disease and medicine, and climate change relate to phenotypes. (B) Phenotypes: Insects, vertebrates, plants, and even forests all have features that are branched in some way, but they are described using different terms. For a computer to discover this, the phenotypes must be annotated with unique identifiers from ontologies that are logically linked. Under “shape” in the PATO quality ontology, “branchiness” is an encompassing parent term with subtypes “branched” and “increased branchiness.” From left to right in the top layer, insects, vertebrates, and plants include species that show phenotypes for which the genetic basis is not known. Their companion model species, however, often have experimental genetic work that is relevant to proposing candidate genes and gene networks. Insects (1): An evolutionary novelty in bees (top layer) is the presence of branched setae used for pollen collection. Nothing is known about the genetic basis of this feature. One clue to the origin of this evolutionary feature comes from studies of Drosophila (bottom layer), where Mical overexpression in unbranched wild-type bristles generates a branched morphology. Mical directly links semaphorins and their plexin receptors to the precise control of actin filament dynamics. Vertebrates (2): In humans, aberrant angiogenesis, including excessive blood vessel branching (top layer), is one of the six central hallmarks of cancer. Candidate genes have been identified using data from model organisms. In zebrafish (middle layer), studies of the control of sprouting in blood vessel development show that signaling via semaphorins and their plexin receptors is required for proper abundance and distribution; disruption of plxnd1 results in increased branching. In mouse (bottom layer), branching of salivary glands is dependent on semaphorin signaling, as is the branching of various other epithelial organs. Plants (3): The uppermost canopy of rainforest trees (top layer) undergoes a marked increase in branching associated with climate change. Nothing is known about the genetic basis of this feature. The branching of plant trichomes (bottom layer), tiny outgrowths with a variety of functions including seed dispersal, has been studied in the model Arabidopsis thaliana. Branching occurs in association with many MYB-domain genes, transcription factors that are found in both plants and animals. (C) Environment: Diverse input from the environment influences organismal phenotype. (D) Genes: At the genetic level, previously unknown associations between various types of “branchiness” in insects and vertebrates are here linked to a possible common core or network of genes (the semaphorin-plexin signaling network). No association between genes associated with plant branching (Myb transcription factors) and animal branching is obvious from the literature. Image credit: Anya Broverman-Wray.
Figure 1. How to discover branching phenotypes? (Bottom panel) Phenotype data exhibiting various forms of branchiness are not easily discerned from diverse natural language descriptions. (A) Bee hairs are different from most other insect hairs in that they are plumose, which facilitates pollen collection. (B) A mutant of Drosophila melanogaster exhibits forked bristles, due to a variation in mical. (C) In zebrafish larvae (Danio rerio), angiogenesis begins with vessels branching. (D) Plant trichomes take on many forms, including trifurcation. (Top panel) Phenotypes involving some type of “branched” are easily recovered when they are represented with ontologies. In a semantic graph, free text descriptions are converted into phenotype statements involving an anatomy term from animal or plant ontologies, and a quality term from a quality ontology, connected by a logical expression (“inheres_in some”). Anatomy (purple) and quality (green) terms (ontology IDs beneath) relate phenotype statements from different species by virtue of the logic inherent in the ontologies, e.g., plumose, bifurcated, branched, and tripartite are all subtypes of “branched.” Image credits: bumble bee with pollen by Thomas Bresson, seta with pollen by István Mikó, Arabidopsis plants with hair-like structures (trichomes) by Annkatrin Rose, Drosophila photo by John Tann, Drosophila bristles redrawn from , scanning electron micrograph of Arabidopsis trichome by István Mikó, zebrafish embryos by MichianaSTEM, zebrafish blood vessels from . Figure assembled by Anya Broverman-Wray.
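The retrieval idea in the Figure 1 caption can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation and not the real ontologies: the `IS_A` table is a hand-written stand-in for a quality ontology, the term labels are plain strings rather than ontology IDs, the pairing of each species with a quality term is an illustrative reading of panels A to D, and the names `Phenotype`, `is_subtype`, and `find_by_quality` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Toy is_a hierarchy over quality terms (child -> parent).
# ASSUMPTION: a hand-written stand-in; labels follow the caption,
# but this is not the actual quality ontology and uses no real IDs.
IS_A = {
    "plumose": "branched",
    "bifurcated": "branched",
    "tripartite": "branched",
    "branched": "shape",
    "unbranched": "shape",
}


@dataclass(frozen=True)
class Phenotype:
    """A phenotype statement: a quality that inheres in an anatomical entity."""
    species: str
    entity: str   # anatomy term (plain label here, not an ontology ID)
    quality: str  # quality term (plain label here, not an ontology ID)

    def expression(self) -> str:
        # Render the statement in the "inheres_in some" form from the caption.
        return f"{self.quality} and (inheres_in some {self.entity})"


def is_subtype(term: Optional[str], ancestor: str) -> bool:
    """Walk up the is_a chain; a term counts as a subtype of itself."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


def find_by_quality(phenotypes, ancestor):
    """Return statements whose quality is the ancestor or one of its subtypes."""
    return [p for p in phenotypes if is_subtype(p.quality, ancestor)]


# ASSUMPTION: illustrative annotations loosely based on panels A-D,
# plus one unbranched statement as a negative control.
PHENOTYPES = [
    Phenotype("bee", "seta", "plumose"),
    Phenotype("Drosophila melanogaster", "bristle", "bifurcated"),
    Phenotype("Danio rerio", "blood vessel", "branched"),
    Phenotype("Arabidopsis thaliana", "trichome", "tripartite"),
    Phenotype("Drosophila melanogaster (wild type)", "bristle", "unbranched"),
]

branched = find_by_quality(PHENOTYPES, "branched")
for p in branched:
    print(f"{p.species}: {p.expression()}")
```

A single query for “branched” recovers the bee, fly, zebrafish, and Arabidopsis statements even though each uses a different descriptive term, while the unbranched statement is excluded. A real system would delegate the subtype test to an ontology reasoner rather than a hand-coded lookup table.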