XB-ART-51804Cell Rep. January 26, 2016; 14 (3): 632-47.
Measuring Absolute RNA Copy Numbers at High Temporal Resolution Reveals Transcriptome Kinetics in Development.
Transcript regulation is essential for cell function, and misregulation can lead to disease. Despite technologies to survey the transcriptome, we lack a comprehensive understanding of transcript kinetics, which limits quantitative biology. This is an acute challenge in embryonic development, where rapid changes in gene expression dictate cell fate decisions. By ultra-high-frequency sampling of Xenopus embryos and absolute normalization of sequence reads, we present smooth gene expression trajectories in absolute transcript numbers. During a developmental period approximating the first 8 weeks of human gestation, transcript kinetics vary by eight orders of magnitude. Ordering genes by expression dynamics, we find that "temporal synexpression" predicts common gene function. Remarkably, a single parameter, the characteristic timescale, can classify transcript kinetics globally and distinguish genes regulating development from those involved in cellular metabolism. Overall, our analysis provides unprecedented insight into the reorganization of maternal and embryonic transcripts and redefines our ability to perform quantitative biology.
PubMed ID: 26774488
PMC ID: PMC4731879
Article link: Cell Rep.
Grant support: R01 GM099149 NIGMS NIH HHS , P40 OD010997 NIH HHS , R01 HD073179 NICHD NIH HHS , 5R01GM099149 NIGMS NIH HHS , Medical Research Council , Cancer Research UK, R01 GM099149 NIGMS NIH HHS , P40 OD010997 NIH HHS , R01 HD073179 NICHD NIH HHS , 5R01GM099149 NIGMS NIH HHS , Medical Research Council , Cancer Research UK, R01 GM099149 NIGMS NIH HHS , P40 OD010997 NIH HHS , R01 HD073179 NICHD NIH HHS , 5R01GM099149 NIGMS NIH HHS , Medical Research Council , Cancer Research UK, R01 GM099149 NIGMS NIH HHS , P40 OD010997 NIH HHS
Genes referenced: admp bicd2 cirbp ckb dhx32 eef1a1 eef1a1o foxh1.2 gapdh gnao1 klf17 klf17.2 krt5.5 krt5.7 lima1 mixer nek2 nodal3.1 nodal3.2 nodal3.3 nodal3.4 nodal5 nodal5.3 nodal5.4 nodal5.5 nodal6 odc1 sia1 sia2 ventx3.1 ventx3.2
Article Images: [+] show captions
|Isoform Differential Expression in Clutch A, see also Fig S5 A – Isoform switching event examples: maternal to maternal; maternal to zygotic; and zygotic to zygotic switching events (left). Histogram of isoform switching events by category and time (right). B – bicd2 isoforms switch in PolyA+ data (circles), and do not switch in rdRNA data (squares) indicating differential polyadenylation. PolyA+ and rdRNA abundances agree within uncertainty of the absolute normalization (Supplemental Experimental Procedures). C – Normalized read depth on bicd2 locus in Clutch A polyA+ between 0–4.5 hpf. Note temporal dynamics shared by the final three bicd2(1) exons. Heatmap and depth of pile ups corrected for changing total mRNA. D – Spatial expression of three genes with isoforms with differential dynamics. All isoforms show different temporal and spatial expression domains. Grey lines and points mark total expression for each gene (sum of the red and blue isoforms).|
|Transcript copy numbers and kinetics per embryo in Clutch A A – Transcript copy number histogram in polyA+ sequencing at 0.0 (egg), 4.5 hpf (early stage 9), 10 hpf (stage 12.5, end of gastrulation), 66 hpf (tadpole stage 42). Detection limits give number of transcripts required to generate a single read averaged over all transcript lengths (Fig S2E). Transcript copy number distributions transition smoothly between these timepoints (data not shown). B – Histogram of maximum rates of increase in transcripts/hour between any consecutive measurement points calculated from Gaussian process medians for each gene. Blue – PolyA+ 0–23.5 hpf; Red – rdRNA 0–23.5 hpf; Green – PolyA+ 24–66 hpf. Vertical lines mark the gene with max. rate of increase for each category. C – As B but kb/hour. D – Maximum rate of increase in transcripts/hour (left) and kb/hour (right) for 0–3.5 hpf and 4–23.5 hpf time intervals. PolyA+ and rdRNA distributions are discrepant for 0–3.5 hpf reflecting polyadenylation of maternal RNAs. Distributions are identical over 4–23.5 hpf.|
|Accumulation kinetics of pri-mir427 per allele in Clutch A rdRNA, see also Fig S7 A – pri-mir427 expression over first 12 hpf in rdRNA data and in situ hybridizations. pri-mir427 show ubiquitous expression matching RNA-seq temporal data. Examples of most abundant transcripts shown for comparison. Inset: first detection of pri-mir427 above detection limit at 2.0 hpf (8–16 cell transition). B – pri-mir427 abundance in kb per cell. Line marks Gaussian process median, error bar is 95% CI (Supplemental Experimental Procedures). Peak per cell abundance occurs after 11th division at 4.4 hpf. C – pri-mir427 accumulation rate in kb/min/allele with time. Line is median of differential of Gaussian Process, error bar gives 95% CI (Supplemental Experimental Procedures). Peak accumulation rate is achieved at 4.2 hpf just after 10th cell division. D – Organization of pri-mir427 locus on X. tropicalis Chromosome 3 (X. tropicalis v8 genome, Supplemental Experimental Procedures). Clusters of mir427 hairpins appear in tandem with a repeated sequence arranged symmetrically on opposite strands around a gap in the genome assembly.|
|Fig S1 – RNA-seq data quality, related to Fig 1, Supplemental Experimental Procedures (A) Spearman correlation between all pairs of all samples for relative normalized TPM abundances. Neighboring samples are as correlated as replicates. (B) Scatter plots of all TPM abundances for all samples. Color indicates density of points. Horizontal and vertical strips indicate measurements only present in one sample. (C) RNA standard performance. Scatter plots compare RNA standard TPM abundances between all pairs of datasets. Color indicates spike-in species. Red boxes mark ArrayControl spikes present only in Clutch B PolyA+. Although, we find small numbers of reads that align uniquely to ArrayControl sequences when ArrayControl spikes are not present in the sample. We note that this is not true for ERCC spikes, we never find ERCC reads when ERCC spikes are not present (data not shown). ERCC PolyA+ and rdRNA comparisons lie off the diagonal indicating poor PolyA+ performance. ERCC-00116 has the worst PolyA+ performance and is not used in absolute normalization. (D) Identification of poorly performing libraries. Normalized leave-one-out score (Supplemental Experimental Procedures), higher scores indicate a poorly performing sample. Samples adjacent to a bad sample can also receive high scores. Red and blue samples in Clutch A PolyA+ are selected for repeat library construction and sequencing (E). Red samples excluded from all analysis. (E) Technical replicates in Clutch A PolyA+ of poorly performing samples reveal that the poor performance is highly reproducible indicating that performance issues are not due to library construction and sequencing. Red boxes marked those discarded, black boxes marked those retained. (F) Projections onto the first two principle components for Clutch A PolyA+ (left), Clutch B PolyA+ (center), and Clutch A rdRNA (right). Note strong temporal correlation between samples, major changes in the transcriptome are visible.|
|Fig S2 – Statistics of Absolute Normalization, related to Fig 1, Supplemental Experimental Procedures (A) Absolute normalization of RNA standards. Left: relative normalization in TPM on linear and log scale for the three datasets. The trend shared by all spikes is clearly visible and differs between PolyA+ and rdRNA samples. Right: absolute normalization in transcripts per embryo on log and linear scale. Trend in spikes is removed, the sample noise of the spikes is retained and does not contaminate the absolute normalization of native transcripts. (B) Absolute normalization factors. Circles mark per sample factors, lines mark Gaussian Process smoothed factors. Left: Estimates of 𝛽𝑗 (Supplemental Experimental Procedures), for three datasets. Center-left: per dataset correction factors e−𝛽𝑗, note discrepancy in magnitude but not in trend between datasets. Center-right: correction for PolyA+ bias, and averaging Clutch A and Clutch B spike levels results in excellent agreement between the correction factors. Right: Smoothed correction factors calculated with non-stationary Gaussian Process (Supplemental Experimental Procedures). (C) Absolute normalization consistency and accuracy. Scatter plots of ERCC standards spiked vs calculated for Clutch A PolyA+, Clutch A rdRNA, Clutch B PolyA+ and Clutch A/Clutch B PolyA+ combined. Points are averaged over all samples in time to remove spikein sample noise. ArrayControl spikes are given with fold change errors for Clutch B PolyA+, fold changes range from 1.11-1.25. ArrayControl spikes EC02 and EC12 have are spiked in with the same copy number and are marked with the same vertical line.Combined Clutch A/Clutch B gives 95% confidence interval for a linear model with Gaussan noise, with slope unity and with standard deviation 𝜎𝑠 = 0.25. Note that residuals do not depend on expression level. (D) Absolute normalization uncertainty model. Left: model as described in C (far-right), with 25%, 50%, 75%, 95% confidence intervals for 106 transcripts. Right: Propagation of uncertainty model to absolute normalization of eef1a1o Clutch A polyA+, again 25%, 50% 75% and 95% of confidence intervals for the true number of eef1a1o transcripts/embryo marked. (E) RNA-seq detection limits – Transcript abundance required to produce a single read with time, detection limits increase as RNA content in the embryo increases. Top row: Detection limits in kb, points give the per sample detection limits, lines and shaded area give Gaussian Process median and 95% confidence interval. Bottom: Detection limit in transcripts/embryo, calculated by averaging kb Gaussian Process distribution over all transcript lengths. (F) The depletion of non-ribosomal RNAs in rdRNA sequencing. Comparisons between Clutch A PolyA+ and rdRNA sequencing (top). Examples of extreme depletion: mixer and foxh1.2.|
|Fig S3 – Spatial expression of selected genes by in situ, related to Fig 1 All panels give Clutch A PolyA+ (circles), Clutch B PolyA+ (triangles) and Gaussian Process median line and 95% confidence intervals. Numbers on in situ images indicate the Gaussian process median number of transcripts for the given stage. (A) Genes involved in muscle induction. (B) Genes with onset midway through the timecourse. (C) Genes with onset late in the timecourse.|
|Fig S5 – Differential Isoform Dynamics in Clutch A polyA+, related to Fig 3. (A) Explanation and detection of differential isoform dynamics. Left: isoform expression in transcripts/embryo. Center: Joint model of all isoforms assumed to have the same expression on z-normalized scale (see Supplemental Experimental Methods). Right: Models of all isoforms considered separately. If Bayesian Information Criterion (BIC) favors the joint model, the gene has differential abundance, whereas if the BIC favors the individual models a gene has differential dynamics. eef1a1 has differential abundance, nek2 and lima1 have differential dynamics. (B) Differential dynamics of groups of isoforms, BIC graph (left) between all lima1 isoforms. lima2(1) and lima2(2) have differential abundance, and they have differential dynamics with lima2(3) defining two groups of isoforms (right).|
|Fig S6 – Temporal synexpression and gene ontology in Clutch A PolyA+, related to Fig 4. (A) Sliding window gene ontology enrichment. Bottom: Heatmap of all genes (Fig 4) arranged by similarity. Middle: The location of all transcripts factors, blue line marks the number of transcription factors per 500 gene window. Red line marks the expected number of transcripts for any 500 gene window. Top: -log(p) chi-squared test for the association between the window position and the transcription factor density. A region that exceeds –log(p) > 5 is considered for annotation (Supplemental Experimental Procedures). (B) Somite Synexpression. Examples from the somite temporal synexpresssion group. Ranks give similarity to ckb between 34-66 hpf (Supplemental Experimental Procedures). Image source given at bottom.|
|Fig S7 – Early transcription events, related to Fig 6. (A) Stage 9 RNA pol II ChIP - Pile ups on mir427, eef1a1o, eef1a1, krt5.7 and cirbp loci of RNA pol II ChIP-seq reads (van Heeringen et al., 2014). See Supplemental Experimental Procedures for calculation of mir427 pile up. (B) Accumulation rates in kb/min/allele averaged over all cells in the embryo – for nodal3, nodal5, nodal6, sia1, sia2, klf17, admp and ventx3.1. Left column: kb/embryo. Center column: Detection of activation of each gene during the cleavage stages. Note square root vertical scale. Vertical gray dashed lines indicate cell divisions. Horizontal dashed line gives average detection limit of 970kb for the cleavage stages. Right column: Accumulation rate in kb/min/allele, rates account for exponential increase in cell numbers and are correct if each transcript were transcribed in all cells (Supplemental Experimental Procedures).|