EDF 8. GENE EXPRESSION ANALYSIS

Pairwise Pearson correlation distributions between homeologous genes (red) and all genes (blue). The left histogram is for developmental stage data; the right is for adult data. The x-axis is the correlation; the y-axis is the percent of data. The homeologous genes have a correlation distribution closer to one because they were recently the same locus. X. laevis TPM values < 0.5 were set to 0. Any gene with no TPM > 0 was removed from the analysis. We then added 0.1 to all TPM values and log10-transformed them.

Scatterplot of genes binned by their median X. tropicalis expression (ref. 64) against the retention rate of their X. laevis (co-)orthologs. Error bars are the standard deviation of the whole data set divided by the square root of the number of genes analyzed in a bin. We assessed significance by a Wilcoxon test of the homeologous and singleton distributions (p-value = 6.31E-113).

Complete version of the boxplot shown in Fig. 4c. The difference between subgenomes is difficult to see at this magnification, illustrating that many loci deviate from the whole-genome median preference for the L homeolog. Some L outliers are expressed 10^4 times as much as their S homeologs, whereas no S gene shows such a strong trend. These differences are discussed in more detail in Supplemental Note 12.

Boxplot of 4DTv (four-fold degenerate transversions) by homeolog class, as defined in Supplemental Note 12.4. Significant differences are marked by a red asterisk (Wilcoxon p<1E-5). The HCSE group shows less sequence change than the others (p=3.7E-12) and the NCDE group shows high rates of sequence change (p=5.6E-14).

Boxplot of CDS length difference between X. laevis homeologs by homeolog class, as defined in Supplemental Note 12.4. Significant differences are marked by a red asterisk (Wilcoxon p<1E-5). The HCSE group shows smaller CDS length differences than the others (p=2.4E-13) and the NCDE group shows large differences in homeolog CDS length (p=2.1E-32).

Boxplot of Ka/Ks between X. laevis homeologs by homeolog class, as defined in Supplemental Note 12.4. Significant differences are marked by a red asterisk (t-test p<1E-5). The HCSE group shows less non-synonymous sequence change than the others (p=8.2E-19) and the NCDE and NCSE groups show higher rates of non-synonymous sequence change (p=2.0E-12 and p=7.0E-9, respectively).

RNA-seq analysis of six6.L (red) and six6.S (blue) during X. laevis development (left panel) and in adult tissues (right panel). Expression levels of six6.S were lower than those of six6.L at most developmental stages and in adult tissues.

Diagram of the Homo sapiens, X. tropicalis and X. laevis six6 loci (upper panel). Magenta and black boxes indicate CNEs and exons, respectively. Phylogenetic trees of the H. sapiens, X. tropicalis and X. laevis six6 CNEs (lower left panel) and Six6 proteins (lower right panel). Notably, six6.S is more diverged from X. tropicalis six6 than six6.L is, both in the encoded protein sequence and in conserved non-coding elements (CNEs) within 3 kb of the transcription start site. Materials, methods and the CNE locations on the genome assemblies are described in the Supplementary Materials (Supplementary Note 13.1).

On the basis of chromatin state properties, a Random Forest machine-learning algorithm can accurately predict L versus S expression bias. The classification is based on all genes with a greater than 3-fold expression difference at NF stage 10.5 (a set of 1,129 genes). The mean area under the ROC curve (dotted black line) is 0.778 (10-fold cross-validation). Features were selected using Linear Support Vector Classification and are shown in Extended Data Fig. 8j.

Relative importance (based on Gini impurity) of the selected features used in the Random Forest classification. All features used in the classification are shown. Among the variables, the ratios of H3K4me3 and DNA methylation at the promoter contributed most to the decision tree model. A difference in p300 binding in the genomic region surrounding the gene also contributed to the Random Forest classification, as did the presence or absence of a number of specific transcription factor motifs in the promoter.
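
The TPM preprocessing, the Pearson correlation and the error-bar formula described in the caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that values below 0.5 TPM are the ones set to 0, and the gene names, TPM values and function names are hypothetical.

```python
import math

def preprocess_tpm(expr, floor=0.5, pseudocount=0.1):
    """Floor low TPMs to 0, drop unexpressed genes, then log10(TPM + pseudocount).

    expr maps a gene name to a list of TPM values, one per sample.
    The floor and pseudocount defaults follow the caption; the direction of
    the floor (values below 0.5 become 0) is an assumption.
    """
    processed = {}
    for gene, values in expr.items():
        floored = [0.0 if v < floor else v for v in values]
        # Remove any gene with no TPM > 0 in any sample.
        if not any(v > 0 for v in floored):
            continue
        processed[gene] = [math.log10(v + pseudocount) for v in floored]
    return processed

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def bin_error_bar(sd_all, n_genes_in_bin):
    """Error bar as in the caption: whole-data-set SD / sqrt(genes in the bin)."""
    return sd_all / math.sqrt(n_genes_in_bin)

# Hypothetical TPM values for one homeolog pair and one unexpressed gene.
expr = {
    "geneA.L": [10.0, 20.0, 40.0],
    "geneA.S": [5.0, 10.0, 20.0],
    "geneB.L": [0.2, 0.4, 0.1],  # all below the floor, so it is dropped
}
processed = preprocess_tpm(expr)
r = pearson(processed["geneA.L"], processed["geneA.S"])
```

Here `r` is the correlation for a single hypothetical homeolog pair; the histograms in the figure would collect such values over all pairs.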

Image published in: Session AM et al. (2016)

Image downloaded from an Open Access article in PubMed Central. Image reproduced on Xenbase with permission of the publisher and the copyright holder.

XB-IMG-153338