May 28, 2013
A genome-wide survey of maternal and embryonic transcripts during Xenopus tropicalis development.
The dynamics of polyadenylation versus deadenylation determine the fate of several developmentally regulated genes. Decay of a subset of maternal mRNAs and new transcription define the maternal-to-zygotic transition, but the full complement of polyadenylated and deadenylated coding and non-coding transcripts has not yet been assessed in Xenopus embryos. To analyze the dynamics and diversity of coding and non-coding transcripts during development, both polyadenylated mRNA and ribosomal RNA-depleted total RNA were harvested across six developmental stages and subjected to high-throughput sequencing. The maternally loaded transcriptome is highly diverse and consists of both polyadenylated and deadenylated transcripts. Many maternal genes peak in expression in the oocyte, among them known key regulators of events such as oocyte maturation and fertilization. Of all the embryonic genes whose transcripts increase in abundance between the early blastula and larval stages, about 30% are induced fourfold or more by the late blastula stage and another 35% by late gastrulation. Using a gene model validation and discovery pipeline, we identified novel transcripts and putative long non-coding RNAs (lncRNAs). These lncRNA transcripts were stringently selected as spliced transcripts generated from independent promoters, with limited coding potential and a codon bias characteristic of non-coding sequences. Many lncRNAs are conserved and expressed in a developmental stage-specific fashion. These data reveal the dynamics of transcriptome polyadenylation and abundance and provide a high-confidence catalogue of novel and long non-coding RNAs.
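The four lncRNA selection criteria named above can be read as a conjunction of filters. The sketch below is a hypothetical illustration, not the study's pipeline: the `GeneModel` fields, the ORF-length cutoff (`max_orf_nt`) and the codon-bias cutoff (`max_llr`) are assumptions chosen only to make the logic concrete, and an H3K4me3 peak at the transcription start site is used as an assumed proxy for an independent promoter.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical summary of one candidate transcript; field names are illustrative."""
    name: str
    n_exons: int
    has_promoter_mark: bool   # assumed proxy for an independent promoter (e.g. H3K4me3 peak at the TSS)
    longest_orf_nt: int       # length of the longest ORF, in nucleotides
    codon_llr: float          # codon-bias log-likelihood ratio; higher means more coding-like


def is_lncrna_candidate(g: GeneModel, max_orf_nt: int = 300, max_llr: float = 0.0) -> bool:
    """Keep a model only if it passes all four filters (cutoffs are hypothetical defaults)."""
    spliced = g.n_exons >= 2
    independent_promoter = g.has_promoter_mark
    limited_coding_potential = g.longest_orf_nt < max_orf_nt
    noncoding_codon_bias = g.codon_llr < max_llr
    return spliced and independent_promoter and limited_coding_potential and noncoding_codon_bias


models = [
    GeneModel("ngm_a", 3, True, 180, -12.5),
    GeneModel("ngm_b", 1, True, 150, -8.0),    # single exon: not spliced
    GeneModel("ngm_c", 4, True, 1200, 35.0),   # long ORF with coding-like codon bias
    GeneModel("ngm_d", 2, False, 210, -3.1),   # no promoter mark
]
kept = [g.name for g in models if is_lncrna_candidate(g)]
print(kept)  # ['ngm_a']
```

Because each criterion is a separate boolean, any one of them can be tightened or relaxed independently to see how the candidate set changes.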
Figure 1. Generation of RNA-sequencing libraries. (a) Developmental stages of Xenopus tropicalis. (b) RPKM distribution across six developmental time points. Labels on the x-axis are Xenopus tropicalis Nieuwkoop and Faber developmental stages: oocyte (Oo), stage 6, stage 9, stage 12, stage 16 and stage 30. (c) Heat map showing the Pearson correlation of expression (RPKM) between all nine RNA-seq libraries. (d) Scatter plots showing the stage-specific Pearson correlation between RNA-seq data generated with the two library preparation methods. Log2 RPKM values for the two methods are plotted on the x and y axes, respectively. PolyA+ (RNA harvested with double polyA+ selection), RZ (ribosomal RNA-depleted total RNA).
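Panels (b) to (d) rest on RPKM (reads per kilobase of transcript per million mapped reads) and on the Pearson correlation of log2 RPKM values. The sketch below shows both calculations on toy numbers; the pseudocount of 1 added before taking log2 (so that genes with zero reads stay finite) is an assumption, not a value taken from the study.

```python
import math


def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """RPKM = reads * 1e9 / (transcript length in bp * total mapped reads in the library)."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def log2_rpkm(values, pseudocount: float = 1.0):
    """log2(RPKM + pseudocount); the pseudocount is an assumed choice."""
    return [math.log2(v + pseudocount) for v in values]


# Toy example: three genes in a PolyA+ and an RZ library, each with one million mapped reads.
lengths = [1000, 2000, 500]
polya = [rpkm(c, L, 1_000_000) for c, L in zip([100, 400, 50], lengths)]
rz = [rpkm(c, L, 1_000_000) for c, L in zip([120, 350, 10], lengths)]
print(polya)  # [100.0, 200.0, 100.0]
print(round(pearson(log2_rpkm(polya), log2_rpkm(rz)), 3))
```

Normalizing by length and library size is what makes the same gene comparable across the nine libraries before correlating them.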
Figure 2. Total and Polyadenylated RNA profiles of the Maternal Transcriptome. (a) Bar plots showing the gene-specific distribution of log2 RPKM ratios during early development. (b) Heat map showing a stage-specific comparison between PolyA+ and RZ data. The bar plots to the right of the figure show the average PolyA+ and RZ ratios per stage for the cluster with the same number on the left of the heat map. Gene names are representative examples from the corresponding cluster. (c) Heat map showing the abundance of polyadenylated maternal genes at six developmental time points. Gene names are representative examples from the corresponding cluster. The heat maps in (b) and (c) show scaled expression values (the sum of expression per gene across all stages is set to one). PolyA+ (RNA harvested with double polyA+ selection), RZ (ribosomal RNA-depleted total RNA).
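Two quantities in this figure can be stated precisely: the per-gene scaling used in the heat maps (each gene's values across stages divided by their sum, so they add up to one) and a log2 PolyA+/RZ ratio of the kind compared in panel (b). In the minimal sketch below, the stage labels and the pseudocount in the ratio are assumptions. A strongly negative ratio would be consistent with a transcript that is present in total RNA but poorly captured by polyA+ selection, for example because of a short poly(A) tail.

```python
import math

STAGES = ["Oo", "st6", "st9", "st12", "st16", "st30"]  # labels assumed for illustration


def scale_to_unit_sum(values):
    """Divide a gene's expression at each stage by its total, so the scaled values sum to one."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]  # unexpressed gene: nothing to scale
    return [v / total for v in values]


def log2_polya_rz_ratio(polya_rpkm: float, rz_rpkm: float, pseudocount: float = 0.1) -> float:
    """log2((PolyA+ + c) / (RZ + c)); the pseudocount c is an assumed choice."""
    return math.log2((polya_rpkm + pseudocount) / (rz_rpkm + pseudocount))


gene = [40.0, 30.0, 20.0, 10.0, 0.0, 0.0]  # toy maternal profile that decays after the oocyte
print(dict(zip(STAGES, scale_to_unit_sum(gene))))
print(log2_polya_rz_ratio(10.0, 40.0, pseudocount=0.0))  # -2.0
```

Scaling to a unit sum discards absolute abundance, which is why highly and weakly expressed genes can share a cluster when their temporal shapes match.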
Figure 3. Overview of the Embryonic Transcriptome. (a) Density plot showing the distribution of polyA+ vs. RZ expression (RPKM) ratios for maternal-embryonic (grey) and embryonic (red) genes at stage 9. (b) Heat map showing the dynamic expression of 2,481 polyadenylated embryonic genes. The scale represents log2-transformed RPKM values. Gene names are representative examples from the corresponding cluster. (c) Pie chart showing the percentage of genes whose expression is increased fourfold or more relative to the oocyte. (d) Heat map showing scaled expression (the sum of expression per gene across all stages is set to one) of the 2,481 polyadenylated embryonic genes. Gene names are representative examples from the corresponding cluster. (e) Pie chart showing the percentage of embryonic genes that peak in expression at each stage.
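Panels (c) and (e) classify each gene by two simple rules: the stage at which its expression first reaches at least four times the oocyte level (if it ever does), and the stage at which it peaks. A minimal sketch of both rules follows; the stage labels, the toy RPKM values and the pseudocount that prevents division by zero for genes absent from the oocyte are assumptions.

```python
from collections import Counter

STAGES = ["Oo", "st6", "st9", "st12", "st16", "st30"]  # oocyte first; labels are illustrative


def first_induced_stage(rpkm, fold=4.0, pseudocount=1.0):
    """First post-oocyte stage whose (RPKM + c) is at least `fold` times the oocyte (RPKM + c)."""
    baseline = rpkm[0] + pseudocount
    for stage, value in zip(STAGES[1:], rpkm[1:]):
        if (value + pseudocount) / baseline >= fold:
            return stage
    return None  # never reaches the fold-change threshold


def peak_stage(rpkm):
    """Stage with the highest expression (the earliest one wins a tie)."""
    return STAGES[max(range(len(rpkm)), key=rpkm.__getitem__)]


genes = {
    "g1": [0, 1, 20, 50, 60, 40],
    "g2": [1, 1, 3, 30, 25, 10],
    "g3": [2, 2, 2, 5, 40, 80],
    "g4": [10, 10, 10, 12, 11, 9],
}
induced = {g: first_induced_stage(v) for g, v in genes.items()}
peaks = Counter(peak_stage(v) for v in genes.values())
print(induced)  # {'g1': 'st9', 'g2': 'st12', 'g3': 'st16', 'g4': None}
print({s: 100 * n / len(genes) for s, n in peaks.items()})  # percentage of genes peaking per stage
```

Counting genes by `first_induced_stage` gives the kind of per-stage breakdown summarized in the abstract (induction by late blastula versus late gastrulation), while `peak_stage` gives the per-stage shares shown in panel (e).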
Figure 4. Gene Ontology Analysis of the Embryonic Transcriptome. (a) GO term enrichment analysis from DAVID. Bar plots for (i) stage 12, (ii) stage 16 and (iii) stage 30 show stage-specific, significantly enriched Biological Process terms, with their -log P-values plotted on the x-axis. (b) Clustering and visualization of DAVID-derived GO terms from developmental stages 9, 12, 16 and 30 using the R package clusterProfiler with a p-value cutoff of 0.01. The DAVID GO terms were derived from the Biological Process annotation of Xenopus tropicalis genes.
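The P-values in panel (a) come from DAVID, which reports a modified Fisher exact test (the EASE score). As an illustration of the underlying idea only, the sketch below computes the plain one-sided hypergeometric P-value (equivalent to a one-sided Fisher exact test for over-representation) for each GO term, converts it to -log10 P for plotting, and keeps terms below a 0.01 cutoff. The term names and counts are invented, the log base is an assumption, and DAVID's EASE adjustment is not reproduced.

```python
import math


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N background genes, K annotated to the term, n in the list."""
    denom = math.comb(N, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / denom


def enriched_terms(terms, n: int, N: int, cutoff: float = 0.01):
    """Return (term, -log10 P) for terms with P < cutoff, most significant first."""
    out = []
    for name, k, K in terms:  # k = list genes with the term, K = background genes with the term
        p = hypergeom_upper_tail(k, n, K, N)
        if p < cutoff:
            out.append((name, -math.log10(p)))
    return sorted(out, key=lambda t: t[1], reverse=True)


# Invented counts: 300 genes in a stage-specific list, 20,000 genes in the background.
terms = [("term_A", 12, 150), ("term_B", 3, 200), ("term_C", 25, 400)]
for name, score in enriched_terms(terms, n=300, N=20_000):
    print(name, round(score, 1))
```

Exact integer arithmetic via `math.comb` avoids overflow in the binomial coefficients; only the final division produces a float.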
Figure 5. Analysis of Novel transcripts. (a) Subsets of gene models from the updated Xenopus tropicalis gene annotation pipeline. (b and c) Cumulative frequency charts showing the distributions of codon bias (LLR score) and ORF length for new gene models (NGM), all gene models (GM), new gene models with validation support (NGM-vv), random genomic sequences (Genomic seq.) and Xenbase-extracted X. tropicalis mRNAs (X.trop mRNA).
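ORF length and its cumulative frequency curve can be computed as sketched below: find the longest ATG-to-stop ORF in each sequence, then build an empirical cumulative distribution function (ECDF) over those lengths. Scanning only the three forward frames and ignoring ORFs that run off the end of the sequence are simplifying assumptions; the study's own ORF definition may differ.

```python
from bisect import bisect_right

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq: str) -> int:
    """Length (nt, including the stop codon) of the longest ATG-initiated ORF on the forward strand."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # ORF closed; look for the next ATG in this frame
    return best


def ecdf(values):
    """Return F(x) = fraction of values <= x (the cumulative frequency curve)."""
    xs = sorted(values)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


lengths = [longest_orf_nt(s) for s in ["ATGAAATAG", "CCATGTTTGGGTAACC", "GGGCCC"]]
print(lengths)  # [9, 12, 0]
F = ecdf(lengths)
print(F(9))  # fraction of sequences whose longest ORF is at most 9 nt
```

Plotting `F` for each group (NGM, GM, genomic sequence, mRNAs) on shared axes gives curves directly comparable to panel (c): groups with short ORFs rise earlier.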
Figure 6. Analysis of NGM-vvo transcripts. (a) Cumulative frequency chart showing the distribution of codon bias (LLR score) for NGM-vvo, random genomic sequences (Genomic seq.) and Xenbase-extracted X. tropicalis mRNAs (X.trop mRNA). (b) Example of an NGM-vvo gene model. The H3K4me3 peak indicates that the gene is transcribed from its own promoter. (c and d) Frequency distributions comparing the number of exons and the transcript length (nt, nucleotides) between all gene models (GM) and new gene models (NGM-vvo).
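Panel (a), like Figure 5, scores codon bias as a log-likelihood ratio (LLR). The study's exact scoring scheme is described in its methods and is not reproduced here. As an assumption, the sketch below uses a common formulation: codon frequencies are estimated, with pseudocounts, from a coding training set (such as known mRNAs) and a background set (such as random genomic sequence), and a sequence's score is the sum over its in-frame codons of log(p_coding / p_background). Positive scores then indicate mRNA-like codon usage and negative scores genomic-like usage.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(seqs, pseudocount: float = 1.0):
    """In-frame (frame 0) codon frequencies with a pseudocount added to all 64 codons."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 2, 3):
            codon = s[i:i + 3]
            if codon in CODON_SET:  # skip codons containing N or other ambiguity codes
                counts[codon] += 1
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def codon_llr(orf: str, coding, background) -> float:
    """Sum over in-frame codons of log(p_coding / p_background)."""
    orf = orf.upper()
    score = 0.0
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in CODON_SET:
            score += math.log(coding[codon] / background[codon])
    return score


coding = codon_frequencies(["ATGGCTGCTGCT"])      # toy "mRNA" training set
background = codon_frequencies(["TTTTTTTTTTTT"])  # toy "genomic" training set
print(codon_llr("GCTGCT", coding, background) > 0)  # True: mRNA-like codon usage
print(codon_llr("TTTTTT", coding, background) < 0)  # True: genomic-like codon usage
```

The pseudocount keeps codons never seen in a training set from producing a zero probability and an infinite log ratio.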
Figure 7. Expression analysis of putative long non-coding RNAs (NGM-vvo). (a) Box plot showing log-transformed expression (RPKM, PolyA+) across six developmental stages in the NGM-vvo subset. (b) Density plot comparing stage 9 expression (RPKM, PolyA+) between all gene models (GM) and the NGM-vvo subset. (c) Heat map showing unsupervised hierarchical clustering of expression (RPKM) in PolyA+ and RZ data across embryogenesis. The color scale represents deviation from mean expression, calculated row-wise. (d) Density plot comparing the distribution of log10-transformed conservation scores (phastCons analysis, see Materials and methods) between random genomic sequences (Genomic Seq), Xenbase-extracted X. tropicalis mRNAs (X.trop mRNA) and the NGM-vvo subset. PolyA+ (RNA harvested with double polyA+ selection), RZ (ribosomal RNA-depleted total RNA).
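The color scale in panel (c) is a row-wise deviation from the mean: each gene's values are expressed relative to that gene's own average across libraries, so the heat map emphasizes when a transcript is high or low rather than its absolute level. A minimal sketch follows; applying it to log2(RPKM + 1) rather than to raw RPKM is an assumption.

```python
import math


def log2_matrix(rows, pseudocount: float = 1.0):
    """log2(RPKM + pseudocount) for every cell; the pseudocount is an assumed choice."""
    return [[math.log2(v + pseudocount) for v in row] for row in rows]


def row_center(rows):
    """Subtract each row's (gene's) mean so values become deviations from that mean."""
    centered = []
    for row in rows:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered


rpkm = [[1.0, 3.0, 7.0], [15.0, 15.0, 15.0]]  # toy genes x libraries
print(row_center(log2_matrix(rpkm)))  # [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

A gene expressed at a constant level in every library centers to all zeros, which is why such genes appear uniformly neutral in this kind of heat map regardless of how abundant they are.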