January 1, 2017
Regulatory remodeling in the allo-tetraploid frog Xenopus laevis.
BACKGROUND: Genome duplication has played a pivotal role in the evolution of many eukaryotic lineages, including the vertebrates. A relatively recent vertebrate genome duplication is that in Xenopus laevis, which resulted from the hybridization of two closely related species about 17 million years ago. However, little is known about the consequences of this duplication at the level of the genome, the epigenome, and gene expression.
RESULTS: The X. laevis genome consists of two subgenomes, referred to as L (long chromosomes) and S (short chromosomes), that originated from distinct diploid progenitors. Of the two parental subgenomes, the S chromosomes have degraded faster than the L chromosomes from the time of genome duplication until the present day. Deletions appear to have the largest effect on pseudogene formation and on the loss of regulatory regions. Deleted regions are enriched for long DNA repeats, and the flanking regions have high alignment scores, suggesting that non-allelic homologous recombination has played a significant role in the loss of DNA. To assess innovations in the X. laevis subgenomes, we examined p300-bound enhancer peaks that are unique to one subgenome and absent from X. tropicalis. A large majority of these new enhancers comprise transposable elements. Finally, to dissect early and late events following interspecific hybridization, we examined the epigenome and the enhancer landscape in X. tropicalis × X. laevis hybrid embryos. Strikingly, young X. tropicalis DNA transposons are derepressed and recruit p300 in hybrid embryos.
CONCLUSIONS: The results show that the erosion of X. laevis genes and functional regulatory elements is associated with repeats and non-allelic homologous recombination, and furthermore that young repeats have contributed to the p300-bound regulatory landscape following hybridization and whole-genome duplication.
Fig. 1. Alignment of a region on chromosome 8 in X. tropicalis and the X. laevis L and S subgenomes, annotated with experimental ChIP-seq data (gastrula-stage embryos; NF stage 10.5). Shown are the gene annotation (black), repeats (gray), and ChIP-seq profiles for H3K4me3 (green), p300 (yellow), RNA Polymerase II (RNAPII; brown), and H3K36me3 (dark green). Sequence conservation is indicated by gray lines. Conserved H3K4me3 and p300 peaks are denoted by green and yellow lines, respectively. The anp32e gene is expressed in X. tropicalis and in both the L and S subgenomes of X. laevis. The plekho1 gene, on the other hand, has lost promoter and enhancer activity at the X. laevis S locus and shows no experimental evidence of being expressed.
Fig. 2. a Scatterplot of the expression level (log2 TPM) of L and S homeologs for pairs in which both copies are expressed. The expression levels of homeologous genes are generally similar (Pearson R = 0.60, p < 1e-300). b Fraction of epigenetic signals (“peaks”) conserved in X. laevis compared to X. tropicalis. Promoters appear more conserved than enhancers; S has lost more epigenetic elements than L. c Active functional elements are equally conserved in L and S relative to X. tropicalis. The background level of sequence conservation at fourfold degenerate sites in coding sequences, relative to X. tropicalis, is 78.4% in L and 77.7% in S.
Fig. 3. The S subgenome has more and larger deletions than L. a Size frequency distribution of deletions (top) and size ratio of LΔS deletions relative to SΔL deletions as a function of deletion size (bottom). b An example of a gene (grlx2) that has lost its promoter on the S subgenome due to a deletion. Shown are the gene annotation (black) and ChIP-seq profiles for H3K4me3 (green), RNAPII (brown), and H3K36me3 (dark green). Sequence conservation is indicated by gray lines. c The log2 fold difference between the observed number of deleted base pairs and the expected number (mean of 1000 randomizations), calculated per chromosome and summarized in a boxplot. Categories: Intergenic: 1 kb distance from a gene; Intronic: introns; Exonic: UTRs + CDS; IntronicTx: introns of actively transcribed genes; ExonicTx: exons of actively transcribed genes; p300: genomic fragments having a p300 peak; H3K4me3: genomic fragments having an H3K4me3 peak. Asterisks mark significant differences between the L and S chromosomes (p < 0.001, Mann–Whitney U test). d Retained regions associated with deletions are enriched for relatively long repeats (p < 1e-52 for both LΔS and SΔL; Mann–Whitney U test). e The 1 kb flanks of the retained regions are more similar to each other than random genomic regions of the same size (p < 1e-114 and p < 1e-83 for LΔS and SΔL, respectively; Mann–Whitney U test).
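Panel c of Fig. 3 compares the observed number of deleted base pairs in each category with the mean of 1000 randomizations. The sketch below shows one way to compute that log2 ratio for a single chromosome. The null model (each deletion moved, length unchanged, to a uniformly random position) and the 1 bp pseudocount are assumptions; the caption does not specify how the randomizations were done. All interval data are hypothetical.

```python
import math
import random


def overlap_bp(intervals, features):
    """Total base pairs of `intervals` falling inside `features`.

    Both arguments are lists of half-open (start, end) tuples on one
    chromosome; `features` are assumed not to overlap each other.
    """
    total = 0
    for start, end in intervals:
        for f_start, f_end in features:
            total += max(0, min(end, f_end) - max(start, f_start))
    return total


def log2_obs_over_exp(deletions, features, chrom_len, n_rand=1000, seed=0):
    """log2(observed / expected) deleted bp inside `features`.

    The expected value is the mean overlap after placing every deletion,
    with its length unchanged, at a uniformly random position on the
    chromosome (assumed null model). A 1 bp pseudocount avoids log2(0).
    """
    rng = random.Random(seed)
    observed = overlap_bp(deletions, features)
    total = 0
    for _ in range(n_rand):
        shuffled = []
        for start, end in deletions:
            length = end - start
            new_start = rng.randrange(0, chrom_len - length + 1)
            shuffled.append((new_start, new_start + length))
        total += overlap_bp(shuffled, features)
    expected = total / n_rand
    return math.log2((observed + 1) / (expected + 1))


# Toy chromosome of 100 kb with one 1 kb feature at its start.
features = [(0, 1_000)]
deletions = [(100, 300), (500, 800)]  # both inside the feature
print(overlap_bp(deletions, features))  # 500
print(log2_obs_over_exp(deletions, features, 100_000, n_rand=200, seed=1) > 0)  # True
```

A positive value means the category loses more sequence than expected by chance; computing one value per chromosome gives the distribution summarized in the boxplot.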
Fig. 4. The S subgenome has a higher mutation rate than L. Only genes for which neither the L nor the S copy falls into the pseudogene category are considered. a Ks distribution per subgenome in X. laevis. b Ka/Ks in X. laevis and X. tropicalis.
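Fig. 4 reports Ks and Ka/Ks. The caption does not say which estimator was used, so the sketch below is only an illustration of the general idea: given already-counted nonsynonymous and synonymous sites and differences (for example from the Nei–Gojobori counting method, which is not shown here), apply a Jukes–Cantor correction to each proportion and take the ratio. The counts in the example are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes–Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from precomputed site and difference counts.

    Counting synonymous and nonsynonymous sites is assumed to have been
    done already; only the correction and the ratio are shown here.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ZeroDivisionError("Ks is zero; Ka/Ks is undefined")
    return ka, ks, ka / ks


# Hypothetical counts for one L/S homeolog pair.
ka, ks, ratio = ka_ks(nonsyn_diffs=10, nonsyn_sites=300, syn_diffs=10, syn_sites=100)
print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ratio:.3f}")
```

A Ka/Ks value below 1 is conventionally read as purifying selection on the coding sequence.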
Fig. 5. The pseudogenization rate increased after hybridization. a Number of likely pseudogenes (i.e., genes having one or more pseudogene features and no expression while their homeolog is expressed), binned by the predicted date of the pseudogenization event. b Pseudogenes with different (non-exclusive) pseudogene features and their sum over time. c Left: fraction of genes that have a nonsense variant in the population. Right: fraction of mutations in coding regions that introduce a premature stop codon. d Expression of genes with and without a nonsense variant present in the population. e Distribution of predicted pseudogenization times (including one-to-one orthologs of human, mouse, and chicken) for genes with a single pseudogene feature and tenfold lower expression than their homeolog (top), for genes with a nonsense variant present in the X. laevis population (middle), and for genes that show no pseudogenization features and whose expression differs less than twofold between homeologs (bottom).
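The Fig. 5 caption defines two gene classes with simple rules. The sketch below encodes them as predicates. The `Gene` record, the `n_pseudogene_features` count, the gene names, and the 1 TPM expression cut-off are all hypothetical; the caption names neither the specific features nor the threshold for "no expression".

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    tpm: float
    n_pseudogene_features: int  # hypothetical count of pseudogene features


EXPRESSED_TPM = 1.0  # assumed expression cut-off, not taken from the source


def is_likely_pseudogene(gene, homeolog, expressed_tpm=EXPRESSED_TPM):
    """At least one pseudogene feature, no expression, expressed homeolog."""
    return (
        gene.n_pseudogene_features >= 1
        and gene.tpm < expressed_tpm
        and homeolog.tpm >= expressed_tpm
    )


def single_feature_tenfold_lower(gene, homeolog):
    """Exactly one pseudogene feature and at least tenfold lower expression."""
    return (
        gene.n_pseudogene_features == 1
        and homeolog.tpm > 0
        and gene.tpm * 10 <= homeolog.tpm
    )


s_copy = Gene("geneX.S", tpm=0.0, n_pseudogene_features=2)
l_copy = Gene("geneX.L", tpm=25.0, n_pseudogene_features=0)
print(is_likely_pseudogene(s_copy, l_copy))  # True
print(is_likely_pseudogene(l_copy, s_copy))  # False
```

Requiring an expressed homeolog separates loss of one copy from genes that are simply silent at this stage in both subgenomes.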
Fig. 6. Subgenome-specific recruitment of p300 is associated with TEs. Subgenome-specific p300 peaks are enriched for TEs carrying transcription factor (TF) motifs active in early development. a Differential regulation of the slc2a2 homeologs at stage 10.5. Shown are the genomic profiles of the H3K4me3 (green), RNA Polymerase II (RNAPII; purple), H3K36me3 (blue), and p300 (yellow) ChIP-seq tracks, as well as DNA methylation levels determined by WGBS (gray). The top panel shows slc2a2.L, which is highly expressed, as evidenced by RNAPII and H3K36me3, and has a number of active enhancers (a–g), while slc2a2.S, shown in the bottom panel, is expressed at a lower level. The conservation between the L and S genomic sequences is shown in gray between the panels. The enhancers highlighted in yellow illustrate lost enhancer function (a, b), conserved enhancer function (c–e), and deleted enhancers (f, g). b Subgenome-specific p300 peaks are associated with DNA transposon repeats (threshold p ≤ 10e-4, twofold enrichment compared to all X. laevis peaks, and present in at least 15% of the peaks). The bar plots show the frequency of occurrence of each of the three repeat types per megabase in the three (sub)genomes. The percentage of subgenome-specific peaks overlapping the corresponding repeat is shown above the bars. c TF motifs found to be enriched in the subgenome-specific p300 peaks (threshold p ≤ 10e-4, threefold enrichment compared to all X. laevis peaks, and present in at least 20% of the peaks).
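The Fig. 6 captions select repeats and motifs with three combined criteria: a p-value threshold, a minimum fold enrichment over all peaks, and a minimum fraction of peaks containing the element. A minimal sketch of such a filter follows. Defining fold enrichment as the ratio of target to background frequency is an assumption, the repeat names are hypothetical, and `max_p=1e-4` reads the caption's "10e-4" as 10^-4, which is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class EnrichmentHit:
    name: str
    p_value: float
    frac_target: float      # fraction of subgenome-specific peaks containing it
    frac_background: float  # fraction of all peaks (background) containing it


def fold_enrichment(hit):
    """Ratio of target to background frequency (assumed definition)."""
    if hit.frac_background == 0:
        return math.inf
    return hit.frac_target / hit.frac_background


def passes_filter(hit, max_p, min_fold, min_frac):
    """Apply the three caption criteria: p-value, fold enrichment, prevalence."""
    return (
        hit.p_value <= max_p
        and fold_enrichment(hit) >= min_fold
        and hit.frac_target >= min_frac
    )


hits = [
    EnrichmentHit("repeat_A", p_value=1e-6, frac_target=0.5, frac_background=0.125),
    EnrichmentHit("repeat_B", p_value=1e-6, frac_target=0.1, frac_background=0.01),
    EnrichmentHit("repeat_C", p_value=1e-2, frac_target=0.5, frac_background=0.1),
]
# Panel b settings: twofold enrichment, present in at least 15% of peaks.
kept = [h.name for h in hits if passes_filter(h, max_p=1e-4, min_fold=2.0, min_frac=0.15)]
print(kept)  # ['repeat_A']
```

Panel c would use `min_fold=3.0, min_frac=0.20` with the same function. The hybrid-peak caption that follows uses stricter p-values and, for repeats, a strict "> 20" fold cut-off, which would need `>` instead of `>=`.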
Fig. 7. a Changes in p300 recruitment in LETS hybrids. In the X. tropicalis genome there are new hybridization-induced peaks as well as peaks that disappear after hybridization; in the X. laevis genome there are no changes. b Newly introduced peaks appear to be repressed by H3K9me3 in X. tropicalis embryos. c Bottom: a significant number of hybrid-specific peaks are associated with DNA transposon repeats (threshold p ≤ 10e-6, > 20-fold enrichment compared to all X. tropicalis peaks, and present in at least 10% of the peaks). Top: the bar plots show the frequency of occurrence of Motif:lcl|rnd-1_family-451_DNA, Motif:rnd-1_family-203 and Motif:lcl|rnd-1_family-189_DNA_PiggyBac repeats per megabase in the three (sub)genomes. These repeats are X. tropicalis-specific, as they occur more often in X. tropicalis than in the X. laevis subgenomes. d Profiles of p300 in X. tropicalis embryos and in LETS hybrids at X. tropicalis hybridization-induced peak loci. New peaks overlap with DNA transposon repeats. e TF DNA-binding motifs found to be enriched in newly introduced peaks (threshold p ≤ 10e-6, fivefold enrichment compared to all X. tropicalis peaks, and present in at least 10% of the peaks). The TFs that can bind these motifs include Homeobox factors, C2H2 zinc finger proteins (CTCF, ZNF232), PAX4, TERF, and T-box factors. The AATC motif, marked by an asterisk, is annotated in TRANSFAC as a GATA1 motif but closely resembles a Paired Homeobox consensus motif. f Differentially methylated regions (DMRs) in hybrid embryos. g DNA methylation profiles showing DNA methylation instability in LETS hybrids.