XB-ART-52200
Genetics 2016 Jun 01;203(2):683-97. doi: 10.1534/genetics.116.188508.
Accurate Profiling of Gene Expression and Alternative Polyadenylation with Whole Transcriptome Termini Site Sequencing (WTTS-Seq).
Construction of next-generation sequencing (NGS) libraries involves RNA manipulation, which often creates noisy, biased, and artifactual data that contribute to errors in transcriptome analysis. In this study, a total of 19 whole transcriptome termini site sequencing (WTTS-seq) and seven RNA sequencing (RNA-seq) libraries were prepared from Xenopus tropicalis adult and embryo samples to determine the most effective library preparation method to maximize transcriptomics investigation. We strongly suggest that appropriate primers/adaptors are designed to inhibit amplification detours and that PCR overamplification is minimized to maximize transcriptome coverage. Furthermore, genome annotation must be improved so that missing data can be recovered. In addition, a complete understanding of sequencing platforms is critical to limit the formation of false-positive results. Technically, the WTTS-seq method enriches both poly(A)+ RNA and complementary DNA, adds 5'- and 3'-adaptors in one step, pursues strand sequencing and mapping, and profiles both gene expression and alternative polyadenylation (APA). Although RNA-seq is cost prohibitive, tends to produce false-positive results, and fails to detect APA diversity and dynamics, its combination with WTTS-seq is necessary to validate transcriptome-wide APA.
PubMed ID: 27098915
PMC ID: PMC4896187
Article link: Genetics
Species referenced: Xenopus tropicalis
Genes referenced: c4h1orf52 crtc2 ctsd dld lamb1 rpl34 tbxt.2 tsg101 u2af2 ubtd1
GEO Series: GSE74919: Xenbase, NCBI
Article Images:
|Figure 1. Illustration of our finalized WTTS-seq library preparation procedures. Total RNA serves as the starting material, followed by fragmentation and poly(A)+ RNA enrichment. Reverse transcription synthesizes the first-strand cDNA and adds both 5′- and 3′-adaptors into the library. Treatment with RNases I and H removes all RNA molecules and leaves the first-strand cDNA alone for second-strand synthesis by PCR. The library is then size selected and ready for NGS.|
|Figure 2. Effect of adaptor design used for synthesis of second-strand cDNA in seven trials (T) on library quality and number of T nucleotides at the beginning of raw reads. (A) Adaptor design included OP (outer primer), IP (ion primer), BC (barcode), and PAAP [poly(A)-anchored primer] regions. T1 used OPs, T2 used IPs, and T3–T7 tested PAAPs in PCR reactions. Gel images are shown for library outputs (from concentrated bands to smooth distributions). Ladder was the ACTGene DNA marker 100 bp, including 100, 200, 300, 400, 500, 600, 700, 800, 900, and 1000 bp, respectively. (B) Poly(T) length distributions at the beginning of raw reads are plotted for T1–T7. Only T1 used an adaptor containing oligo(dT20) rather than oligo(dT10) for synthesis of the first-strand cDNA by reverse transcription. The percentage on the right is the proportion of reads with zero to three T’s in each trial.|
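The poly(T) statistic described in panel B can be illustrated with a minimal sketch. It counts the consecutive T nucleotides at the start of each raw read and reports the share of reads with zero to three T's. The function names and the toy reads are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter


def leading_t_count(read: str) -> int:
    """Number of consecutive T nucleotides at the start of a read."""
    seq = read.upper()
    return len(seq) - len(seq.lstrip("T"))


def leading_t_distribution(reads):
    """Map each leading poly(T) length to the number of reads showing it."""
    return Counter(leading_t_count(r) for r in reads)


def fraction_with_few_ts(reads, max_t=3):
    """Proportion of reads that start with at most `max_t` T nucleotides."""
    dist = leading_t_distribution(reads)
    total = sum(dist.values())
    few = sum(n for length, n in dist.items() if length <= max_t)
    return few / total if total else 0.0


# Hypothetical toy reads, for illustration only.
toy_reads = ["TTTTTTTTTTACGT", "TTACGGA", "ACGTTT", "TTTTACG"]
print(leading_t_distribution(toy_reads))
print(fraction_with_few_ts(toy_reads))
```

In practice the reads would be streamed from a FASTQ file rather than held in a list; the counting logic stays the same.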
|Figure 3. Examples of genes that produced overwhelming numbers of noisy (A) and biased (B) reads in trial 2. (A) X. tropicalis c1orf52 gene had the highest number of noisy reads (81,938,432) produced because 11 internal nucleotides (red color) upstream of the amplified products (underlined; see NM_001015959.2) were identical to the 3′ end of the sequencing primer (Ion A Adaptor primer). (B) X. tropicalis ctsd gene had the highest number of biased reads (16,493,789) because it had 15 nucleotides highly similar to the 3′ end of the Ion P1 Adaptor with only one nucleotide mismatch (red color). The amplified product is underlined (see NM_203633.1). Reads from trial 2 (T2) and trial 7 (T7) are not proportionally visualized by the Integrative Genomics Viewer (IGV) program.|
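The kind of primer similarity described in this caption can be screened for with a simple scan. The sketch below slides the 3′ tail of a primer along a transcript and reports every position that matches within a chosen number of mismatches. The sequences are made-up toy strings, not the Ion adaptor sequences, and the function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_primer_tail_sites(transcript, primer, tail_len, max_mismatches=0):
    """Return (position, mismatches) for each transcript window that
    resembles the last `tail_len` bases (3' end) of the primer."""
    tail = primer[-tail_len:].upper()
    seq = transcript.upper()
    hits = []
    for i in range(len(seq) - tail_len + 1):
        mm = hamming(seq[i:i + tail_len], tail)
        if mm <= max_mismatches:
            hits.append((i, mm))
    return hits


# Hypothetical toy sequences, for illustration only.
toy_primer = "GGCCAAGGCGATCAG"
toy_transcript = "AAAGATCAGTTTGATCTGCC"

# Exact identity to the primer tail (the situation in panel A).
print(find_primer_tail_sites(toy_transcript, toy_primer, tail_len=6))
# One mismatch allowed (the situation in panel B).
print(find_primer_tail_sites(toy_transcript, toy_primer, tail_len=6,
                             max_mismatches=1))
```

Running such a scan over the annotated transcripts before ordering primers is one way to flag sequences likely to cause amplification detours.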
|Figure 4. Incomplete genome assembly (A), incomplete gene annotation (B), and missing data for WTTS-seq analysis. (A) Because of incomplete exon sequencing of the X. tropicalis tsg101 gene, the last exon region was not marked in the current genome assembly or in our merged data sets. A search of the NCBI database for the X. tropicalis tsg101 gene revealed a 1637-bp full-length mRNA sequence [NM_203935.1, including 60 bp of poly(A) tail] but only 1041 bp or 66% (94–706 and 1150–1577 bp) of this sequence aligned with the current genome assembly. Because the alignment cutoff criterion (80%) was not met for this gene, the Cuffmerge program did not replace XetroK02827 (681 bp in length) with the longer NCBI sequence. Therefore, the tsg101 gene was detected only by RNA-seq, even though WTTS reads were mapped to that region of the X. tropicalis genome. (B) The X. tropicalis crtc2 gene was not completely annotated and was missing the 3′-UTR sequence. Both RNA-seq and WTTS-seq reads provided clear evidence that this gene sequence can be extended another 920 to 6907 bp in length (File S7). In fact, an expressed sequence tag (EST) entry (CX401749.1) in the NCBI database with a poly(A) signal site (ATTAAA) and a poly(A) tail supports this unannotated 3′-UTR (File S7).|
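The tsg101 arithmetic in panel A can be checked with a short calculation. The caption's 66% figure is reproduced when the 60-bp poly(A) tail is excluded from the denominator; that choice of denominator is an inference made here, not something the caption states. The function name is hypothetical.

```python
def aligned_fraction(blocks, mrna_len, polya_len=0):
    """Sum inclusive 1-based aligned blocks and divide by the mRNA length
    without its poly(A) tail (an assumption about the denominator)."""
    aligned = sum(end - start + 1 for start, end in blocks)
    return aligned, aligned / (mrna_len - polya_len)


# Values from the caption: blocks 94-706 and 1150-1577 of a 1637-bp mRNA
# that includes 60 bp of poly(A) tail.
aligned, frac = aligned_fraction([(94, 706), (1150, 1577)],
                                 mrna_len=1637, polya_len=60)
print(aligned, round(frac, 2))

# The caption gives 80% as the alignment cutoff.
CUTOFF = 0.80
print("meets cutoff" if frac >= CUTOFF else "below cutoff")
```

The two blocks cover 613 and 428 bp, which sum to the 1041 bp quoted in the caption and fall short of the 80% cutoff.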
|Figure 5. An example of artifactual reads produced for XetroG01729 because of poly(T) stretches in the u2af2 gene. (A) XetroG01729 and u2af2 overlaps visualized by IGV. The WTTS-seq library produced two clusters of reads (read 1 and read 2) with opposite directions. The RNA-seq library had reads that covered the entire exon. (B) Based on u2af2 mRNA sequences (NM_001016998.2), we postulated that these two clusters of WTTS-seq reads were potentially derived from one gene (u2af2) rather than from each of these overlapped genes (XetroG01729 and u2af2). That is, the read 1 cluster originated from poly(T) stretches, while the read 2 cluster was derived from poly(A) junction sites. However, strand mapping assigned the read 1 cluster (artifacts) to XetroG01729 without evidence from the RNA-seq library. (C) Potential mechanism involved in production of artifactual reads with poly(T) stretches.|
|Figure 6. Biological replicate test. Spearman’s rank correlations between estimated log2 counts in embryos collected from family A and family B, shown for WTTS-seq at five developmental stages (6, 8, 11, 15, and 28) and for RNA-seq at three developmental stages (6, 8, and 11).|
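For readers unfamiliar with the replicate test, the sketch below computes a Spearman rank correlation between two families' counts in plain Python: values are log2-transformed, converted to average ranks, and the ranks are correlated with Pearson's formula. The pseudocount of 1 and the toy counts are assumptions for illustration; the caption does not say how zero counts were handled.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_log2(counts_a, counts_b, pseudocount=1.0):
    """Spearman correlation of log2(count + pseudocount) values."""
    la = [math.log2(c + pseudocount) for c in counts_a]
    lb = [math.log2(c + pseudocount) for c in counts_b]
    return pearson(average_ranks(la), average_ranks(lb))


# Hypothetical per-gene counts for one stage in two families.
family_a = [0, 10, 100, 1000]
family_b = [1, 5, 200, 900]
print(spearman_log2(family_a, family_b))
```

Because the log2 transform is monotonic, it does not change the ranks, so the same coefficient would be obtained from the untransformed counts.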
|Figure 7. Comparisons of embryo transcriptome distributions at stages 6, 8, and 11 between WTTS-seq and RNA-seq data sets in two families (A and B). The solid black curves represent gene expression detected by WTTS-seq, and the blue dotted curves represent gene expression detected by RNA-seq.|
|Figure 8. APA patterns during embryo development are revealed by WTTS-seq but not by RNA-seq. Partial genomic region of X. tropicalis ubtd1 gene including the last two exons is shown. WTTS-seq revealed that the distal APA site was dominant at stage 6, but usage switched to the proximal site at stage 11. At stage 28, however, both sites were used equally. Unfortunately, RNA-seq failed to reveal any differences in usage of proximal or distal APA sites among these five stages. The poly(A) site signals were presented proportionally for each family at each stage but disproportionally among different stages.|
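The site switching described in this caption comes down to comparing the share of reads at each poly(A) site within a stage. The sketch below shows that calculation on invented counts chosen only to mimic the pattern in the caption; the numbers and function names are hypothetical and do not come from the data set.

```python
def site_usage(read_counts):
    """Fraction of a gene's poly(A) site reads falling at each site."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {site: n / total for site, n in read_counts.items()}


def dominant_site(read_counts):
    """Site with the highest usage, or None when usage is tied or empty."""
    usage = site_usage(read_counts)
    if not usage:
        return None
    top = max(usage.values())
    winners = [site for site, u in usage.items() if u == top]
    return winners[0] if len(winners) == 1 else None


# Hypothetical read counts per stage, for illustration only.
toy_counts = {
    6: {"proximal": 20, "distal": 80},
    11: {"proximal": 75, "distal": 25},
    28: {"proximal": 50, "distal": 50},
}
for stage, counts in toy_counts.items():
    print(stage, site_usage(counts), dominant_site(counts))
```

Normalizing within each stage mirrors the caption's note that signals are shown proportionally within a stage but not across stages.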
|Figure 9. Expression of short transcript is well detected by WTTS-seq but biased by RNA-seq. The X. tropicalis rpl34 gene has an mRNA sequence of 449 bp in the genome. WTTS-seq revealed that expression of rpl34 increased from stage 6 to stage 28 based on RPM values. However, rpl34 was not fully covered due to biases in RNA-seq libraries (see RPM values in the figure).|
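The RPM values mentioned in this caption are reads per million mapped reads. A minimal sketch of the standard calculation follows; the counts are invented, and the authors' exact normalization may differ in detail.

```python
def rpm(gene_reads, total_mapped_reads):
    """Reads per million: gene read count scaled to a library of 1e6 reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return gene_reads * 1_000_000 / total_mapped_reads


# Hypothetical example: 250 reads for a gene in a library of 5 million
# mapped reads.
print(rpm(250, 5_000_000))
```

Scaling by library size lets expression of the same gene be compared between stages whose libraries were sequenced to different depths.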
|Figure 10. Expression patterns of overlapping genes are well detected by WTTS-seq but not by RNA-seq. Partial genome regions of X. tropicalis lamb1 and dld genes overlap in opposite directions. The WTTS-seq libraries produced at least two major clusters of reads also with opposite directions in the overlapping region. The blue reads were derived from lamb1 and the red reads from dld. Reads in the RNA-seq libraries covered the overlapping region, but there was no way to allocate them to each gene. Furthermore, RNA-seq mapping quality in the overlapping region was quite low (see reads pointed out with arrows).|