XB-ART-56471
Nucleic Acids Res January 1, 2020; 48 (D1): D776-D782.
Xenbase: deep integration of GEO & SRA RNA-seq and ChIP-seq data in a model organism database.
Xenbase (www.xenbase.org) is a knowledge base for researchers and biomedical scientists who employ the amphibian Xenopus as a model organism in biomedical research to gain a deeper understanding of developmental and disease processes. Through expert curation and automated data provisioning from various sources, Xenbase strives to integrate the body of knowledge on Xenopus genomics and biology together with the visualization of biologically significant interactions. Most current studies utilize next generation sequencing (NGS), but until now the results of different experiments were difficult to compare and were not integrated with other Xenbase content. Xenbase has developed a suite of tools, interfaces and data processing pipelines that transforms NCBI Gene Expression Omnibus (GEO) NGS content into deeply integrated gene expression and chromatin data, mapping all aligned reads to the most recent genome builds. This content can be queried and visualized via multiple tools and also provides the basis for future automated "gene expression as a phenotype" and gene regulatory network analyses.
PubMed ID: 31733057
PMC ID: PMC7145613
Article link: Nucleic Acids Res
Genes referenced: foxh1.2
Article Images:
|Figure 1. GEO and SRA data collection and processing. (A) Automated systems detect new data and load metadata for curators to use in manual annotation processes. The results are then used to generate a run file that feeds the raw sequencing data from the SRA in the appropriate manner into the GEO bioinformatics pipeline. The pipeline output then supplies files to various Xenbase resources, such as the FTP repository, the JBrowse genome browser and the Xenbase database. (B) Details of the RNA-seq and ChIP-seq data processing pipelines utilizing CSBB (see https://github.com/csbbcompbio/CSBB-v3.0).|
|Figure 2. The GEO and SRA simple search interface and search results. (A) The Simple Search Interface. Any common search term, such as a GEO ID, gene symbol, tissue, author or reagent, can be queried. In this example the query was ‘wnt8a’, which returned two GSE results. (B) The GSE page, opened by selecting the search result of interest in (A), shows the metadata and other information on the GSE together with a table providing access to the processed data. Results of the DEG analysis between samples can be viewed in table format by selecting the DEG link (red arrow), or by selecting multiple check boxes under ‘Compare’ (blue arrow) and then selecting the ‘Compare’ button. Alternatively, bigWig tracks can be viewed by choosing those of interest and selecting the ‘Load in JBrowse’ button.|
|Figure 3. Visualization of GEO data. (A) An example of a heatmap displaying TPM values for a set of six experiments from within a single GSE. This view is generated using the check boxes described in Figure 2B. Using various tools in the interface, users can change the color map and sort results by log fold change, FDR and more. Cutoffs/thresholds can also be changed, as can the number of DEGs displayed. Mousing over any individual tile displays its values. (B) The table view of a set of DEGs from a GSM, compared to its controls. Results can be sorted by any column (e.g. LogFC, FDR etc.) using the up or down arrowheads. (C) Results loaded into JBrowse displaying ChIP results as a bigWig, the corresponding peak call, or RNA-Seq data illustrating the impact on a target gene of injecting a morpholino against foxH1. These data can be loaded using the ‘Load in JBrowse’ tool illustrated in Figure 2, or via the hierarchical menu on the left side of the JBrowse window. In this example results from multiple different studies are compared. (D) Alternatively, tracks of interest can be loaded using the faceted track viewer available for both the Xenopus laevis and Xenopus tropicalis genomes. In this example the ‘GEO tracks’ button in the top left of panel C was selected, then the data were filtered by entering MO into the ‘Contains text’ box to restrict the tracks displayed to morpholino data. This method also allows users to combine results loaded via the GEO results interface with additional results from other studies (GSEs), which can also be loaded via the faceted track viewer. The selected tracks are viewed by clicking on the ‘Back to browser’ button. While different studies cannot be stringently compared, the same pipeline and thresholds were used across all datasets, so such comparisons are useful for hypothesis generation and further experimentation.|
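
The threshold and sorting controls described for the DEG table in Figure 3 can be illustrated offline with a minimal Python sketch. It is not the Xenbase implementation: the `DegRow` fields, the default cutoffs and the example values are all hypothetical, chosen only to show how an FDR cutoff and a log fold change threshold might be applied before sorting by a chosen column.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DegRow:
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log_fc: float  # log fold change versus control
    fdr: float     # false discovery rate


def filter_and_sort(
    rows: List[DegRow],
    max_fdr: float = 0.05,        # assumed cutoff, not taken from the source
    min_abs_log_fc: float = 1.0,  # assumed cutoff, not taken from the source
    sort_by: str = "log_fc",
    descending: bool = True,
) -> List[DegRow]:
    """Keep rows passing both thresholds, then sort by the chosen column."""
    kept = [
        r for r in rows
        if r.fdr <= max_fdr and abs(r.log_fc) >= min_abs_log_fc
    ]
    return sorted(kept, key=lambda r: getattr(r, sort_by), reverse=descending)


# Invented example values, for illustration only.
rows = [
    DegRow("gene_a", 2.5, 0.001),
    DegRow("gene_b", -1.8, 0.02),
    DegRow("gene_c", 0.4, 0.0005),  # fails the fold change threshold
    DegRow("gene_d", 3.1, 0.2),     # fails the FDR cutoff
]

for row in filter_and_sort(rows):
    print(row.gene, row.log_fc, row.fdr)
```

Relaxing `max_fdr` or `min_abs_log_fc` mirrors changing the cutoffs in the interface, and passing `sort_by="fdr", descending=False` corresponds to sorting the table by the FDR column in ascending order.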