|
Figure 1. sisRNAs are identified from GV RNA-seq data according to the Ensembl and RefSeq gene sets in X. tropicalis. a Venn diagram of peaks called by MACS with FDR = 0.01 and determined to be sisRNAs according to the RefSeq and Ensembl gene sets. b UCSC genome browser screenshot of sisRNAs identified in the gene E2F3. Red and blue blocks indicate exons, red peaks indicate mRNA detected in the cytoplasm, blue peaks indicate RNA detected in the GV, grey blocks indicate the identified sisRNAs, and orange lines indicate the summits of sisRNA peaks. c-d Histograms of length distributions for (c) RefSeq and (d) Ensembl sisRNAs.
|
|
Figure 2. sisRNAs are highly expressed from genes with multiple introns. a-b Histograms show (a) the percentage of genes with sisRNAs and (b) the average number of sisRNAs per gene in RefSeq (red) and Ensembl (blue) genes. Genes are grouped by intron number. The dashed line indicates the average value for all genes.
|
|
Figure 3. sisRNA sequences have higher GC and TG content than the genome but are CpG-poor. a-b Observed/expected (O/E) ratios for the occurrence of dinucleotide (a) and trinucleotide (b) combinations in sisRNAs identified from RefSeq genes (red) and from Ensembl genes (blue). c-e Boxplots show the GC% (c), CpG density (d) and CA/TG density (e) of the genome (green) compared with sisRNAs identified from RefSeq (red) and Ensembl (blue) genes.
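The O/E ratios in panels a-b can be illustrated with a minimal sketch. It assumes the common definition in which the expected count of a dinucleotide XY is the product of the single-base frequencies of X and Y times the number of dinucleotide positions; the caption does not state the exact formula used, and the function name `dinucleotide_oe` is hypothetical.

```python
from collections import Counter


def dinucleotide_oe(seq: str) -> dict[str, float]:
    """Observed/expected ratio for every dinucleotide present in seq.

    Assumption (not stated in the caption): expected count of XY is
    freq(X) * freq(Y) * (number of dinucleotide positions).
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return {}
    # Single-base frequencies over the whole sequence.
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    # Overlapping dinucleotide counts: positions 0..n-2.
    observed = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, count in observed.items():
        expected = base_freq[pair[0]] * base_freq[pair[1]] * (n - 1)
        ratios[pair] = count / expected
    return ratios
```

Under this definition, a ratio below 1 for CG would correspond to the CpG depletion described in the caption. The trinucleotide case follows the same pattern with three single-base frequencies.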
|
|
Figure 4. sisRNAs are specific regions of the introns. Boxplots comparing GC% (a), CpG density (b) and CA|TG density (c) of introns from genes without any sisRNA (grey), introns without sisRNAs from host genes with sisRNAs (orange), and introns with sisRNAs (purple) with sisRNAs identified from RefSeq (red) and Ensembl (blue) genes.
|
|
Figure 5. sisRNAs are enriched at both the start and the end of the transcript, preferentially at the 3′ end. a The density of sisRNAs in introns. For each transcript, all introns are concatenated from the 5′ to the 3′ end and then divided into 100 bins. b UCSC genome browser screenshot of the sisRNA distribution along the introns of the gene nasp. A higher number of sisRNAs are located at the 3′ end. c Scatter plot of sisRNA peak signals versus host gene expression level (FPKM).
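The binning described for panel a can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: introns are given as half-open genomic intervals, each sisRNA is represented by its peak summit, and the function name `summit_bin` and its parameters are hypothetical.

```python
def summit_bin(introns, summit, strand="+", n_bins=100):
    """Return the bin (0..n_bins-1) of a summit on the concatenated introns.

    introns: list of (start, end) half-open genomic intervals of one transcript.
    summit:  genomic coordinate of a sisRNA peak summit (assumed representation).
    Returns None when the summit does not fall inside any intron.
    """
    # Order introns 5' to 3' in transcript orientation.
    ordered = sorted(introns, reverse=(strand == "-"))
    total = sum(end - start for start, end in ordered)
    offset = 0  # length of the introns already passed
    for start, end in ordered:
        if start <= summit < end:
            # Distance from the 5' edge of this intron, strand-aware.
            within = summit - start if strand == "+" else end - 1 - summit
            return (offset + within) * n_bins // total
        offset += end - start
    return None
```

Counting the returned bin indices over all transcripts would give a density profile of the kind shown in panel a, with low indices at the 5′ end and high indices at the 3′ end.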
|
|
Figure 6. Specific TFBSs are enriched in sisRNAs. The enrichment of TFBS motifs is plotted for (a) introns without any sisRNAs versus sisRNAs identified from RefSeq genes, and (b) RefSeq sisRNAs versus Ensembl sisRNAs. c UCSC genome browser screenshot of Stat3, showing that it is highly expressed in the cytoplasm.
|
|
Figure 7. sisRNAs are as evolutionarily conserved as introns, but much less so than exons. a-c Average PhastCons conservation scores of sisRNAs and introns within ±150 bp upstream and downstream of the (a) 5′ end, (b) midpoint and (c) 3′ end. d-f Boxplots comparing GC% (d), CpG density (e) and CA|TG density (f) of cytoplasmic sisRNAs from human red blood cells (grey), human HeLa cells (orange), mouse red blood cells (brown), mouse 3T3 cells (yellow), chicken DF1 cells (purple) and Xenopus laevis XTC cells (light blue) with sisRNAs identified from RefSeq (red) and Ensembl (blue) genes. g Venn diagram showing the overlap between host genes with sisRNAs identified from RefSeq genes in the Xenopus tropicalis GV and host genes with cytoplasmic sisRNAs in Xenopus laevis XTC cells.
|
|
Supplementary Figure 1. Longer introns have more sisRNAs. (A) Boxplot of intron lengths grouped by the number of sisRNAs they contain: 0 (green), 1 (orange), 2 (purple), 3 (red) and >3 (blue). (B) Scatter plot of sisRNA number versus intron length. (C) Scatter plot of sisRNA number versus total intron length per gene. (D) Scatter plot of sisRNA number versus normalized intron number per Mbp per gene.
|
|
Supplementary Figure 2. sisRNAs are specific regions of the introns.
Boxplots comparing GC% (A), CpG density (B), CA|TG density (C) and length (D) of the first introns (green), middle introns (orange) and last introns (purple) of genes with more than 3 introns with sisRNAs identified from RefSeq (red) and Ensembl (blue) genes.
|