XB-ART-52286
Dev Biol January 1, 2017; 426 (2): 401-408.
Developmentally regulated long non-coding RNAs in Xenopus tropicalis.
Advances in RNA sequencing technologies have led to the surprising discovery that a vast number of transcripts emanate from regions of the genome that are not part of coding genes. Although some of the smaller ncRNAs such as microRNAs have well-characterized functions, the majority of long ncRNA (lncRNA) functions remain poorly understood. Understanding the significance of lncRNAs is an important challenge facing biology today. A powerful approach to uncovering the function of lncRNAs is to explore temporal and spatial expression profiling. This may be particularly useful for classes of lncRNAs that have developmentally important roles as the expression of such lncRNAs will be expected to be both spatially and temporally regulated during development. Here, we take advantage of our ultra-high frequency (temporal) sampling of Xenopus embryos to analyze gene expression trajectories of lncRNA transcripts over the first 3 days of development. We computationally identify 5689 potential single- and multi-exon lncRNAs. These lncRNAs demonstrate clear dynamic expression patterns. A subset of them displays highly correlative temporal expression profiles with respect to those of the neighboring genes. We also identified spatially localized lncRNAs in the gastrula stage embryo. These results suggest that lncRNAs have regulatory roles during early embryonic development.
PubMed ID: 27418388
PMC ID: PMC5233649
Article link: Dev Biol
Genes referenced: clic3 cops5 foxa2 foxi2 frmd8 gpr155 igf2bp3 loc101731854 pou3f4 rps7 sox2 st6galnac3 tmc5 txnl4a vegt
Article Images:
|Fig. 1. LncRNA discovery pipeline. The output of Cuffmerge (step 1) goes through multiple filtering steps that remove unqualified lncRNA candidates: transcripts with coding potential (step 2), short transcripts (step 3), and miRNAs (step 4). These steps are performed in parallel rounds, once for single time points and once using pooled reads over a sliding window of 5 time points. After these commonly used filtering steps, the remaining transcripts are combined into one set, and one representative transcript model is kept among overlapping transcripts (step 5). Multi-exon and single-exon lncRNA candidates are then separated (step 6). After removing the lncRNA candidates with fewer than 5 consecutive time points of non-zero expression, the SNR threshold is applied (step 7). We remove any lncRNA candidates that could be part of an exon of a neighboring gene (steps 8 and 9). The final lists contain 1336 multi-exon lncRNAs and 4353 single-exon lncRNAs.|
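The consecutive-expression filter in step 7 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `longest_nonzero_run` and `passes_run_filter` and the example profiles are hypothetical, and the sketch assumes each candidate's expression is available as a list of per-time-point values.

```python
def longest_nonzero_run(values):
    """Return the length of the longest run of consecutive non-zero values."""
    best = 0
    current = 0
    for v in values:
        if v != 0:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def passes_run_filter(values, min_run=5):
    """Keep a candidate only if it has at least `min_run` consecutive
    non-zero time points (5, following the Fig. 1 caption)."""
    return longest_nonzero_run(values) >= min_run


# Hypothetical expression profiles (arbitrary units), for illustration only.
profile_kept = [0.0, 1.2, 3.4, 2.2, 0.8, 0.5, 0.0]     # longest run is 5
profile_dropped = [1.0, 2.0, 0.0, 1.5, 2.5, 0.7, 0.0]  # longest run is 3

print(passes_run_filter(profile_kept))
print(passes_run_filter(profile_dropped))
```

A single pass over the time course is enough because only the longest uninterrupted run matters, not the total number of expressed time points.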
|Fig. 2. Temporal expression dynamics of lncRNAs. The expression values of individual candidate lncRNAs are normalized by their maxima. These expression profiles are assigned by k-means clustering to 8 expression clusters. A) The heatmaps show individual normalized expression patterns for all 5689 lncRNAs. B) The plots show the average expression of all genes within individual clusters. The blue bars in panel B mark the egg (E), late blastula (B), gastrula (G), neurula (N), and tailbud (T) stages.|
|Fig. 3. Expression profiles of lncRNAs and the neighboring genes. A) Gene expression values in RPKM are shown for a lncRNA and a neighboring gene during the developmental time course. The blue and red solid lines represent Gaussian process medians and the shaded areas are the 95% confidence intervals of the data. C denotes the Pearson correlation between the lncRNA and neighboring gene expression dynamics. Gene models of lncRNAs are shown in Supplementary Fig. 5. B) Left panel shows the distribution of correlations of pairs of lncRNA – neighboring gene (in blue) and pairs of lncRNA – random gene (green). Right panel shows the distribution of correlations of pairs of lncRNA – neighboring gene (in blue) and pairs of antisense strand lncRNA – neighboring gene (light blue). A Pearson coefficient of 1 indicates perfectly correlated profiles, and −1 indicates perfectly anti-correlated profiles.|
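For readers unfamiliar with the statistic C in Fig. 3, the Pearson correlation between two expression time courses can be computed as in the sketch below. This is an illustrative implementation only: the function name `pearson` and the example profiles are hypothetical, and the caption does not state whether the authors computed C on raw RPKM values or on the fitted Gaussian process medians.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return numerator / denominator


# Hypothetical time courses (arbitrary units), for illustration only.
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
neighbor_up = [2.0, 4.0, 6.0, 8.0, 10.0]    # rises together with the lncRNA
neighbor_down = [10.0, 8.0, 6.0, 4.0, 2.0]  # falls as the lncRNA rises

print(pearson(lncrna, neighbor_up))
print(pearson(lncrna, neighbor_down))
```

Because the coefficient is invariant to scaling, a lncRNA and a neighboring gene with very different RPKM magnitudes can still score close to 1 if their profiles rise and fall together.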
|Fig. 4. LncRNA distribution in gastrula stage embryos. A) Spatial expression of lncRNAs in gastrula stage embryos. The scatter plot in the left panel compares vegetal and animal RPKM values of lncRNAs. The scatter plot in the right panel compares ventral and dorsal expression. Individual points represent the 5689 lncRNAs expressed in gastrula embryos, and the red boxes mark differentially expressed lncRNAs. The black line denotes equal expression between vegetal and animal, or dorsal and ventral, tissue fragments. B) RT-qPCR analysis of lncRNAs using RNA isolated from the designated tissue fragments.|
|Supplementary Figure 1. Scatter plot of 4640 lncRNA candidates after step 6. The x-axis is the log of (signal variance/noise variance), calculated from the signal variance and noise variance hyperparameters of the Gaussian processes. The y-axis is the log of the maximum expression value observed for each transcript. The light purple points (2795) have at least 5 consecutive non-zero expression time points, and those in blue (1845) failed to satisfy this condition. The vertical line marks log10(SNR)=0.6, the threshold we used to eliminate less-qualified transcripts.|
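The SNR criterion described in this caption can be sketched as below. This is a hedged illustration under stated assumptions, not the authors' implementation: the function names `log10_snr` and `passes_snr_threshold` and the example variances are hypothetical, and the sketch assumes that candidates at or above the log10(SNR)=0.6 line are the ones retained, which the caption implies but does not state explicitly.

```python
import math

LOG10_SNR_THRESHOLD = 0.6  # vertical line in Supplementary Figure 1


def log10_snr(signal_variance, noise_variance):
    """log10 of the ratio of the Gaussian process signal variance to the
    noise variance (both assumed to be positive hyperparameters)."""
    if signal_variance <= 0 or noise_variance <= 0:
        raise ValueError("variances must be positive")
    return math.log10(signal_variance / noise_variance)


def passes_snr_threshold(signal_variance, noise_variance,
                         threshold=LOG10_SNR_THRESHOLD):
    """Assumption: a candidate is kept when log10(SNR) >= threshold."""
    return log10_snr(signal_variance, noise_variance) >= threshold


# Hypothetical hyperparameter values, for illustration only.
print(passes_snr_threshold(8.0, 1.0))  # log10(8) is about 0.90
print(passes_snr_threshold(2.0, 1.0))  # log10(2) is about 0.30
```

On the linear scale, log10(SNR)=0.6 corresponds to a signal variance roughly four times the noise variance.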
|Supplementary Figure 2. Examples of lncRNA gene expression patterns at increasing signal-to-noise ratios (SNRs). We set the threshold SNR to be 0.6 for our pipeline analysis (Figure 1).|
|Supplementary Figure 3. Candidate lncRNAs located near the foxa2 and sox2 loci. The Foxa2 lncRNA did not survive our analysis because one paired-end read bridged this lncRNA and a Foxa2 exon. However, we propose that this lncRNA adjacent to FOXA2 is an authentic lncRNA because a syntenic lncRNA is found in both human and mouse. We also found lncRNAs adjacent to Sox2 in both human and mouse.|
|Supplementary Figure 4. MALAT/NEAT2 and Xlsirts-related lncRNAs in Xenopus tropicalis. A) Besides the sequence similarity between lncrna_single_sw_00112474 and MALAT1 in mouse and human, the relative position of frog malat1 with respect to frmd8 also agrees with that found in mouse and human. In all three species, MALAT1 is a single-exon lncRNA gene located downstream of FRMD8. In addition, the NEAT1 lncRNA gene is located between MALAT1 and frmd8 in human and mouse, and we find a lncRNA in a similar position in Xenopus tropicalis. B) For Xlsirts, multiple alignments to our set of lncRNAs were reported by Blast, and two examples are shown. Xlsirt-related genes are found in both Xenopus tropicalis and Xenopus laevis, but not in human or mouse.|
|Supplementary Figure 5. Genome browser view of the lncRNAs and neighboring genes displayed in Figure 3. Red lines represent coding genes. Blue lines denote lncRNAs. Grey lines denote other nearby coding genes around the lncRNA loci.|