January 1, 2019;
Modeling of Genome-Wide Polyadenylation Signals in Xenopus tropicalis.
Alternative polyadenylation (APA) is an important post-transcriptional modification event to process messenger RNA (mRNA) for transcriptional termination, transport, and translation. In the present study, we characterized poly(A) signals in Xenopus tropicalis using 70,918 highly confident poly(A) sites derived from 16,511 protein-coding genes to understand their roles in the regulation of embryo development and gender difference. We examined potential factors, including the gene length, the number of introns in a gene, and the intron length, that may affect the prevalence of APA. We observed 12 prominent poly(A) signal patterns, which accounted for approximately 92% of total APA sites in Xenopus tropicalis. Among them, three patterns are specific to X. tropicalis, so they are absent in other animals such as humans or mice. We catalogued APA sites based on their genomic regions and developed a bioinformatics pipeline to identify over-represented signal patterns for each class. Then the schema of cis elements for APA sites in each genomic region was proposed. More importantly, APA usage is dramatically dynamic in embryos along five developmental stages and well-coordinated with the maternal-to-zygotic transition event. We used an entropy-based method to identify developmental stage-specific APA sites and identified significant signal patterns around specific sites and constitutive sites. We found that the APA frequency in different genomic regions varies with developmental stages and that those sites located in intron or coding sequence regions contribute most to the dynamics of gene expression during developmental stages. This study deciphers the characteristics and poly(A) signal patterns for both canonical APA sites and non-canonical APA sites across different developmental stages and gender dimorphisms in X. tropicalis, providing new insights into the dynamic regulation of distal and proximal APA.
Figure 1. Genomic distribution of poly(A) sites in X. tropicalis. (A) Distribution of different types of genes. Protein-coding genes account for more than 97% of annotated genes. (B) Distribution of APA sites in different locations. “3′ end ss” refers to the 3′ UTR and the extended region of the 3′ UTR. “5′ end ss” refers to the 5′ UTR and the extended region of the 5′ UTR. “Ex_3′ UTR” refers to the extended region of the 3′ UTR. “Ex_5′ UTR” refers to the extended region of the 5′ UTR. (C) Distribution of APA frequencies in different genomic regions. The left y-axis denotes the percentage of genes with poly(A) site(s) in the specific region. The right y-axis denotes the APA ratio, which is the ratio between the number of APA sites in the specific region and the number of genes in which these APA sites are located. Proximal sites are defined as poly(A) sites located in non-3′ UTR regions, while distal sites are those in 3′ UTR or extended 3′ UTR regions. “non-APA gene” refers to a gene with no APA event. “rare APA gene” refers to one APA site per gene. “moderate APA gene” refers to two to four APA sites per gene. “abundant APA gene” refers to more than four APA sites per gene. “APA ratio” refers to the average number of APA sites per gene (APA site number/APA gene number). (D) The relationship among the APA frequency, the gene length, the number of introns, etc. “Freq” represents the frequency of APA sites in the gene; “Length” represents the length of a gene; “Inum” represents the number of introns; “5′ UTR,” “CDS,” and “Intron” represent the frequency of APA sites in the respective genomic regions. “proximal” represents the frequency of APA sites in CDS, intron, or 5′ UTR regions. “distal” represents the frequency of APA sites in the 3′ UTR.
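The gene classes and the APA ratio defined in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `(gene_id, region)` input layout, and the region labels are hypothetical choices made for illustration, and only the class boundaries and the ratio definition are taken from the caption.

```python
def classify_apa_gene(n_sites):
    """Map a per-gene APA site count to the classes named in the caption.

    Boundaries follow the caption: 0 -> non-APA, 1 -> rare,
    2 to 4 -> moderate, more than 4 -> abundant.
    """
    if n_sites < 0:
        raise ValueError("site count must be non-negative")
    if n_sites == 0:
        return "non-APA"
    if n_sites == 1:
        return "rare"
    if n_sites <= 4:
        return "moderate"
    return "abundant"


def apa_ratio(sites, region):
    """APA ratio for one region: sites in the region / genes holding them.

    `sites` is an iterable of (gene_id, region) pairs; this layout is an
    assumption, not a format described in the source.
    """
    n_sites = 0
    genes = set()
    for gene_id, site_region in sites:
        if site_region == region:
            n_sites += 1
            genes.add(gene_id)
    return n_sites / len(genes) if genes else 0.0


# Toy input: three intronic sites spread over two genes, one 3' UTR site.
toy_sites = [("g1", "intron"), ("g1", "intron"), ("g2", "intron"), ("g2", "3UTR")]
intron_ratio = apa_ratio(toy_sites, "intron")
```

With the toy input, three intronic sites fall in two genes, so the intronic APA ratio is 3/2.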
Figure 2. Characteristics of poly(A) signals. (A) Nucleotide profiles surrounding poly(A) sites. (B) The difference in frequency between the highest signal and the second highest signal (displayed in the legend) with different pattern sizes. (C) Visual alignment of the top 50 hexamers, as in the sequence graphics view. Each sequence is presented as a single pixel on a horizontal line, and each bright spot represents an occurrence of a signal pattern at its location on that sequence. The patterns are ranked according to their total frequency in the dataset. The higher the ranking, the brighter the point. A continuous vertical band of lines from top to bottom indicates the common location of a signal element. AAUAAA (brightest point) mainly appears around −30 nt to −10 nt (the red dashed box), and signal aggregation was also observed around +20 nt (the blue dashed box). (D) Schematic of cis elements for poly(A) sites in X. tropicalis. Five regions were determined based on the nucleotide composition profile and the signal analysis. The GU-rich element overlaps with the downstream U-rich element.
Figure 3. Top-ranked patterns in different poly(A) signal elements. (A) Top 20 4-nt patterns in USE according to the occurrence number in the upstream region of the poly(A) site (−34 to −100 bp). (B) Top 20 4-nt patterns in CEL according to the occurrence number in the upstream region of the poly(A) site (−3 to −12 bp). (C) Top 20 4-nt patterns in CER according to the occurrence number in the downstream region of the poly(A) site (2 to 34 bp). (D) Top 20 4-nt patterns in the PE region according to the occurrence number in the upstream region of the poly(A) site (−13 to −34 bp).
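Ranking patterns by occurrence number within a window relative to the poly(A) site, as in Figure 3, can be sketched as follows. This is only an illustration under stated assumptions: the function name `count_kmers`, the convention that every sequence has the poly(A) site at the same string index (`site_index`), and the requirement that a pattern lie entirely inside the window are all hypothetical and not taken from the source.

```python
from collections import Counter


def count_kmers(seqs, site_index, start, end, k):
    """Count k-mers lying fully inside the window [start, end].

    `start` and `end` are positions relative to the poly(A) site, which is
    assumed to sit at string index `site_index` in every sequence.
    """
    counts = Counter()
    for seq in seqs:
        lo = max(site_index + start, 0)
        hi = min(site_index + end, len(seq) - 1)
        # The last valid k-mer start is hi - k + 1.
        for i in range(lo, hi - k + 2):
            counts[seq[i:i + k]] += 1
    return counts


# Toy sequence: AAUAAA placed at relative positions -30 to -25.
toy_seq = "C" * 70 + "AAUAAA" + "C" * 24 + "G" * 100
pe_counts = count_kmers([toy_seq], site_index=100, start=-34, end=-13, k=6)
cel_counts = count_kmers([toy_seq], site_index=100, start=-12, end=-3, k=6)
top_patterns = pe_counts.most_common(20)  # ranked as in a "top 20" panel
```

Here the hexamer is found once in the −34 to −13 window and not at all in the −12 to −3 window; passing `k=4` would give the 4-nt ranking instead.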
Figure 4. Signal distribution of non-3′ UTR APA sites in X. tropicalis. (A) Top 20 hexamers according to the occurrence number from −100 to 100 bp around 5′ UTR poly(A) sites. The signal changes dramatically in the −40 nt to −1 nt region. The most significant signal in the PE region is AAUAAA, similar to the PE signal in the 3′ UTR. (B) Top 20 hexamers according to the occurrence number from −100 to 100 bp around CDS poly(A) sites. The most significant hexamer is ACUUAC, and the highly overlapping hexamer is AAGAAA in CE. (C) Top 20 hexamers sorted according to the occurrence number from −100 to 100 bp around intronic poly(A) sites. The most significant signal is AAUAAA in PE, and there is a GA-rich element around CS. (D) Polyadenylation signal models on 5′ UTR, CDS, and intron. The red triangle denotes the poly(A) site. USE, upstream sequence element; PE, positioning element; CE, cleavage element; CEL, left cleavage element; CS, cleavage site; CER, right cleavage element; DSE, downstream sequence element.
Figure 5. Characteristics of poly(A) signals during the development of X. tropicalis. (A) Information entropy distribution of poly(A) sites. The x-axis is the conventional information entropy (H), and the dashed lines represent thresholds of 0.82 and 2.66. The y-axis represents the adjusted information entropy (modeH), and the dashed lines represent thresholds of 2.65 and 2.70. Each point denotes one poly(A) site. Specific sites are colored in blue, and constitutive sites are colored in red. Red points in the top-right and bottom-left corners are sites that were selected based on the entropy value but were not defined as specific or constitutive sites because of their low expression levels (supported by fewer than five reads). AL: all poly(A) sites; SP: specific sites; CP: constitutive sites. (B) Venn diagram showing the overlap of specifically expressed genes at different developmental stages. “F_adult” includes young females and growing females. “M_adult” includes young males and growing males. (C) Percentages of specific poly(A) sites located in different locations across different periods. We randomly selected the same number of specific sites and constitutive sites and calculated the percentage of the genomic regions where these sites are located. In this figure, we only selected six periods (embryo stage 6, embryo stage 28, young female, growing female, young male, and growing male) with sufficient quantity for analysis.
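The idea behind the entropy-based selection in Figure 5A can be sketched with the conventional Shannon entropy of a site's read counts across samples. This is a simplified illustration, not the authors' method: the adjusted entropy (modeH) is not defined in this text and is left out, the base-2 logarithm and the direction of the comparisons are assumptions, and the function names are hypothetical. Only the H thresholds (0.82 and 2.66) and the five-read support cutoff come from the caption.

```python
import math


def shannon_entropy(counts):
    """Conventional entropy H = -sum(p * log2(p)) over per-sample read counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one read is required")
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def label_site(counts, low=0.82, high=2.66, min_reads=5):
    """Label one poly(A) site from its read counts across samples.

    Assumption: low H means usage concentrated in few samples (specific),
    high H means usage spread evenly (constitutive). Only H is used here;
    the modeH criterion from the caption is not reproduced.
    """
    if sum(counts) < min_reads:
        return "low-expression"
    h = shannon_entropy(counts)
    if h <= low:
        return "specific"
    if h >= high:
        return "constitutive"
    return "intermediate"


# Toy read counts over seven samples (hypothetical data).
one_stage_only = label_site([10, 0, 0, 0, 0, 0, 0])
evenly_used = label_site([10, 10, 10, 10, 10, 10, 10])
weakly_supported = label_site([1, 1, 0, 0, 0, 0, 0])
```

A site whose reads all come from one sample has H = 0, while a site used evenly across seven samples has H = log2(7), which lies above the upper threshold.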