Duplicate gene evolution and expression in the wake of vertebrate allopolyploidization.
The mechanism by which duplicate genes originate - whether by duplication of a whole genome or of a genomic segment - influences their genetic fates. To study events that trigger duplicate gene persistence after whole genome duplication in vertebrates, we have analyzed molecular evolution and expression of hundreds of persistent duplicate gene pairs in allopolyploid clawed frogs (Xenopus and Silurana). We collected comparative data that allowed us to tease apart the molecular events that occurred soon after duplication from those that occurred later on. We also quantified expression profile divergence of hundreds of paralogs during development and in different tissues. Our analyses indicate that persistent duplicates generated by allopolyploidization are subjected to strong purifying selection soon after duplication. The level of purifying selection is relaxed compared to a singleton ortholog, but not significantly variable over a period spanning about 40 million years. Despite persistent functional constraints, however, analysis of paralogous expression profiles indicates that quantitative aspects of their expression diverged substantially during this period. These results offer clues into how vertebrate transcriptomes are sculpted in the wake of whole genome duplication (WGD), such as those that occurred in our early ancestors. That functional constraints were relaxed relative to a singleton ortholog but not significantly different in the early compared to the later stage of duplicate gene evolution suggests that the timescale for a return to pre-duplication levels is drawn out over tens of millions of years - beyond the age of these tetraploid species. Quantitative expression divergence can occur soon after WGD and with a magnitude that is not correlated with the rate of protein sequence divergence. 
On a coarse scale, quantitative expression divergence appears to be more prevalent than spatial and temporal expression divergence, and also faster or more frequent than other processes that operate at the protein level, such as some types of neofunctionalization.
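
The comparison of functional constraint between early and later stages rests on Ka/Ks ratios, and the Figure 2 caption states that ratios greater than 2 (including undefined ratios) were set to 2 and that the regression of Ka/Ks on Ks was forced through the origin. The following Python sketch illustrates those two steps only. The function names and the sample values are hypothetical and are not taken from the study, and the authors' actual estimation pipeline is not described here.

```python
CAP = 2.0  # cap on Ka/Ks stated in the Figure 2 caption


def capped_ka_ks(ka, ks, cap=CAP):
    """Return Ka/Ks, assigning the cap to undefined (Ks == 0) or larger ratios."""
    if ks == 0:
        return cap  # undefined ratio is treated as the cap
    return min(ka / ks, cap)


def slope_through_origin(xs, ys):
    """Least-squares slope b of y = b * x, with the intercept fixed at zero."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero; slope is undefined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


# Hypothetical (Ka, Ks) pairs, for illustration only.
early_stage = [(0.01, 0.10), (0.02, 0.08), (0.03, 0.0)]
ks_values = [ks for _, ks in early_stage]
ratios = [capped_ka_ks(ka, ks) for ka, ks in early_stage]
slope = slope_through_origin(ks_values, ratios)
```

Fitting one such slope per stage gives two numbers that can be compared, which is the spirit of the early-versus-later comparison; a significance test on the difference would need to be added separately.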
PubMed ID: 18261230
PMC ID: PMC2275784
Article link: BMC Evol Biol.
Article Images:
|Figure 1. Phylogenetic and genealogical relationships of species and paralogs in this study. Phylogenetic relationships are depicted among species, orthologs, and paralogs of a diploid with 20 chromosomes, S. tropicalis (ST), two tetraploids with 40 chromosomes, S. epitropicalis (EP) and S. new tetraploid (NT), and four tetraploids with 36 chromosomes, Xenopus laevis (XL), X. borealis (XB), X. gilli (XG), and X. muelleri (XM). (A) Clawed frogs speciate by allopolyploidization and by regular speciation without a change in genome size. Allotetraploidization occurred independently in Xenopus and in Silurana and produced two paralogs in the resulting tetraploid ancestor – α and β – that are indicated as brown and green lineages respectively. After allopolyploidization, some of the diploid lineages probably went extinct, and this is indicated by a dagger. As a result of these extinctions, the portion of some paralogous lineages that evolved in a diploid, indicated as dashed lines, cannot be dissected apart from the portion that evolved in an allopolyploid. Numbered nodes indicate (0) divergence of the genera Xenopus and Silurana, (1) divergence of the diploid (2n = 18) ancestors of Xenopus, (2) allotetraploidization in Xenopus, (3) the first speciation event of the tetraploid ancestor of extant Xenopus, (4 and 5) more recent speciation events of Xenopus tetraploids, (6) divergence of the diploid (2n = 20) ancestors of Silurana, (7) allotetraploidization in Silurana, (8) speciation of a tetraploid Silurana without change in genome size. Sequences from individual paralogs were used to construct genealogies in order to compare (B) an early to a later stage of evolution after WGD in XLα, (C) an early to a later stage of evolution after WGD of EPα and (D) an intermediate to a later stage after WGD in XLα. Depending on the paralog for which data were obtained, sometimes NTα was considered in (C) or XBα was considered in (D).|
|Figure 2. Functional constraints are similar in early and later stages of duplicate gene evolution in X. laevis paralogs. (A) Binned Ka/Ks of early (blue) and later (red) stages of duplicate gene evolution. (B) Regression of Ka/Ks versus Ks in the early and later stages indicates that selection (relaxed purifying + positive) is not more common in the early stage of duplicate gene evolution (blue dots) than in the later stage (red dots). The Y-intercept of these regression lines was set to zero, and Ka/Ks ratios greater than 2 (including undefined ratios) were given a value of 2. In (A) and (B), a dashed line indicates the neutral expectation. Fragments with Ka/Ks > 2 are, on average, half the size of those with Ka/Ks < 2. Ka/Ks ratios above 2 may therefore be attributable in part to stochastic variance in Ks.|
|Figure 3. Expression of both paralogs is generally detected in the same treatments, irrespective of the probe specificity (the degree to which each probe matches one but not the other paralog) or the detection threshold (the minimum raw intensity scored as expressed). These data are based on (A) "Standard" and (B) "Conservative" threshold levels for detection of expression, and three probe specificities, labeled low, medium, and high, were compared (see Methods). We report paralogous profiles whose presence/absence scores in all five treatments were identical in the medium and high specificity analyses (shaded in gray on the left of each chart). 1789 and 1462 genes had consistent present/absent expression profiles in the medium and high specificity analyses using the standard and conservative thresholds, respectively. These sets of genes included 841 and 632 paralogous pairs, respectively. The tables on the right compare paralogous profiles by tabulating whether both paralogs are present and absent in the same treatments (identical), whether the expression profile of one overlaps entirely with that of the other (overlap), or whether each duplicate has a unique component (distinct).|
|Figure 4. Binned expression profile correlations between 841 pairs of paralogs over five developmental stages or adult tissue types in the medium specificity analysis. Bars show the proportion of Pearson correlation coefficients between non-paralogous expression profiles (white bars) and between paralogous expression profiles (black bars). Ninety percent of the non-paralogous expression profiles have a Pearson correlation coefficient that is greater than -0.861 but less than 0.865. The Pearson correlation coefficients of 62% of the paralogous expression profiles are less than 0.865, and 0.3% of them are less than -0.861.|
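
The correlation analysis in Figure 4 compares each pair of expression profiles across five treatments using the Pearson correlation coefficient and then reads the result against the range observed for non-paralogous pairs. A minimal Python sketch of that calculation follows. The function names and the example intensities are hypothetical, and the default threshold of 0.865 is simply the upper bound quoted in the caption, not a value derived here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def below_nonparalog_upper_bound(r, upper=0.865):
    """True if r is below the upper bound reported for non-paralogous pairs."""
    return r < upper


# Hypothetical intensities for two paralogs over five treatments.
alpha_profile = [120.0, 340.0, 560.0, 410.0, 90.0]
beta_profile = [100.0, 150.0, 600.0, 80.0, 300.0]
r = pearson(alpha_profile, beta_profile)
not_more_similar_than_random = below_nonparalog_upper_bound(r)
```

With only five treatments per profile, individual coefficients are noisy, which is one reason to compare them against an empirical distribution from non-paralogous pairs rather than against a fixed theoretical cutoff.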