|
Fig 1. Only 52 of the 256 possible genotypes between PfCRT3D7 and the CQ-transporting PfCRTDd2 isoform have measured phenotypes. A) The table shows the amino acid residue differences between the wild-type (PfCRT3D7, 00000000) and mutant (PfCRTDd2, 11111111) isoforms of PfCRT. The network shows the complete set of genotypes between PfCRT3D7 and PfCRTDd2. Each node represents a different genotype with a unique number and combination of the 8 amino acid residue differences between the two isoforms. Each edge connects genotypes that differ by a single mutation. Genotypes are in sorted combinatorial order, left to right, such that the second row contains 10000000, 01000000, …, 00000010, 00000001, the third row contains 11000000, 10100000, …, 00000101, 00000011, the fourth row contains 11100000, 11010000, …, 00001011, 00000111, etc. The gray nodes indicate PfCRT genotypes that have not had their CQ transport activities measured. The colors of the remaining nodes indicate experimentally determined CQ transport activities relative to the activity of PfCRTDd2 [24]. Values range from <5% (blue) to 130% (red). In addition to PfCRT3D7 and PfCRTDd2, the names and binary codes of five other field isoforms of PfCRT (“106/1”, “GB4”, “K1”, “783”, and “China e”) are indicated. B) One possible evolutionary trajectory from PfCRT3D7 to PfCRTDd2 that passes through only measured phenotypes. The mutations at each step are indicated next to the relevant edge, along with the effect on CQ transport activity. This trajectory passes through the PfCRTChina e and PfCRTK1 isoforms. Five of the eight steps increase CQ transport activity, two have no effect, and the final step causes a decrease.
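The layout described in the Fig 1 caption can be reproduced with a short sketch. This is an illustration, not the authors' plotting code: the function names `genotype_rows` and `single_mutation_edges` are hypothetical, and the row ordering is an assumption inferred from the examples given in the caption (mutated positions enumerated left to right).

```python
from itertools import combinations

N_SITES = 8  # amino acid differences between PfCRT3D7 and PfCRTDd2


def genotype_rows(n_sites=N_SITES):
    """Group binary genotype codes by number of mutations (one row per count).

    Within a row, mutated positions are enumerated left to right, which
    gives the ordering quoted in the caption (assumed, not confirmed).
    """
    rows = []
    for k in range(n_sites + 1):
        row = []
        for positions in combinations(range(n_sites), k):
            bits = ["0"] * n_sites
            for p in positions:
                bits[p] = "1"
            row.append("".join(bits))
        rows.append(row)
    return rows


def single_mutation_edges(n_sites=N_SITES):
    """List every pair of genotypes that differ by exactly one added mutation."""
    edges = []
    for g in range(2 ** n_sites):
        for site in range(n_sites):
            h = g | (1 << site)
            if h != g:  # the site was unmutated in g
                edges.append((format(g, f"0{n_sites}b"),
                              format(h, f"0{n_sites}b")))
    return edges


rows = genotype_rows()
edges = single_mutation_edges()
print(sum(len(r) for r in rows), "genotypes;", len(edges), "edges")
print("second row:", rows[1])
```

Each row `k` holds the genotypes carrying `k` of the eight mutations, so the first row is the wild type alone and the last row is the fully mutated isoform alone.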
|
|
Fig 2. Models can be trained on measured phenotypes to predict uncharacterized phenotypes. The panels show the observed phenotype (Pobs) versus the predicted phenotype (Pmodel) for different input data and models. The vertical error bars represent experimental uncertainty (95% confidence interval on the mean) in the observed phenotypes. The resulting R² value is shown within each plot and the dashed line depicts a 1:1 relationship (i.e. perfect agreement). The top row (panels A-E) shows the quality of fit for the training dataset (the 52 published phenotypes [24]) and the bottom row (panels F-J) shows the quality of fit for the test dataset (24 newly measured phenotypes). The models are arranged from left to right in order of increasing sophistication: additive (A,F), additive+classifier (B,G), additive+classifier+nonlinear (C,H), additive+classifier+nonlinear+pairwise epistasis (D,I), and additive+classifier+nonlinear+all epistatic orders (E,J). Symbol colors denote the probability a genotype belongs to either the CQ-transporter (red) or the non-CQ-transporter (blue) class. Gray symbols (panels A, F) show the additive model, prior to the application of a classifier. The blue arrows in panel A indicate data points for which the observed phenotype is zero and the predicted phenotype is nonzero. The red arrow indicates data points for which the observed phenotype is nonzero. The yellow line in panel B shows the spline fit to the data.
|
|
Fig 3. Experimental characterization of the 24 combinatorial variants. (A) The level of PfCRT protein in the oocyte membrane was semi-quantified using an established western blot approach [24]. The analysis included PfCRTDd2 as a positive control, to which the other band intensity values were normalized. Sample loading and transfer of the proteins were evaluated by total protein staining of the nitrocellulose membrane. The data are the mean + SEM of at least three independent experiments (performed using oocytes from different frogs), within which measurements were averaged from two independent replicates. There were no significant differences in expression levels between the 24 variants of PfCRT (P > 0.05; one-way ANOVA). That is, all of the PfCRT variants were present at similar levels in the oocyte membrane and any differences in CQ transport activity between these proteins are thus attributable to the mutations they carry, rather than differences in expression levels. (B) [3H]CQ transport was measured at pH 5.5 and in the presence of 15 μM unlabeled CQ. The direction of CQ transport in oocytes is equivalent to the direction of CQ transport in the malaria parasite (S1 Fig). The PfCRT-mediated component of CQ transport was calculated by subtracting the level of CQ accumulation detected in non-expressing oocytes (the negative control) from that measured in oocytes expressing a PfCRT variant. The rate of PfCRT-mediated transport was then expressed relative to that measured in oocytes expressing PfCRTDd2 (the positive control). The data are the mean + SEM of at least three independent experiments (performed using oocytes from different frogs), within which measurements were made from 10 oocytes per treatment. The asterisks denote a significant difference in CQ transport between oocytes expressing PfCRT3D7 (the wild-type isoform) and oocytes expressing another variant of PfCRT: *, P < 0.05; ***, P < 0.001 (one-way ANOVA or Student's t-test).
In both panels, the data for IEKSESII (i.e. the “106/1” isoform of PfCRT) were taken from Richards et al., 2016 [75]. The presence at the oocyte plasma membrane of the variants of PfCRT lacking significant CQ transport activity was confirmed via immunofluorescence assay (S2 Fig, S3 Fig).
|
|
Fig 4. The additive+classifier+nonlinear model captures the most variation without over-fitting. In all panels, the lines denote R²train (black) and R²test (red). In panels A and B, the complexity of the model increases from left to right: additive model, plus classifier, plus a nonlinear spline, plus pairwise epistasis, and plus high-order epistasis. A) R²train was calculated from the 52 phenotypes used to train the model and R²test was calculated from the 24 newly measured phenotypes. B) R²train was calculated for pseudoreplicate training sets of 52 genotypes sampled from the 76 characterized genotypes, with R²test calculated from the matched 24-genotype test datasets. C) The means of R²train and R²test converge as the number of observations in the pseudoreplicate training sets increases, reaching a plateau at ~60 genotypes (calculated with the additive+classifier+nonlinear model).
|
|
Fig 5. Epistasis as uncertainty. A) Schematic of a generic, partially characterized map. The nodes represent genotypes, some of which have been measured (blue) whereas others have not (white). Lines represent single point mutations. Given these observations, we measure the effect of “mutation 1” in five different backgrounds (red arrows) and can thus calculate the mean and variance in its effect across the map (⟨β1⟩ and σ1). B) R²test versus the average number of times each mutation is seen in sampled genotype-phenotype maps with epistasis responsible for 10% (blue) to 60% (brown) of the variation in the maps. Points indicate where the R²test is within 5% of the maximum predictive power of the additive model. C) A calibration curve indicating how many times, on average, one must observe each mutation in a map to resolve the additive coefficients with different fractions of epistasis.
|
|
Fig 6. CQ transport activity increases across the genotype-phenotype map. A) The complete genotype-phenotype map with 76 measured phenotypes and 180 predicted phenotypes. As in Fig 1A, each node represents a genotype and each edge a mutation. The nodes outlined in black are the 52 genotypes that were measured by Summers et al., 2014 [24]; the genotypes measured in this study are outlined in yellow. B) Levels of CQ transport activity moving from PfCRT3D7 (no mutations) to PfCRTDd2 (eight mutations). Each data point represents a genotype; the line is the mean across genotypes that have the indicated number of mutations. C) The contribution of each mutation to the classifier, where the x-axis is the probability that the mutation is found in genotypes that transport CQ. D) The complete genotype-phenotype map estimated using the simple K76T classifier. The nodes outlined in yellow indicate phenotypes that are predicted to have <5% CQ transport activity with this classifier, but not in the initial classifier from panel A. E) Predicted CQ transport activity as a function of the number of mutations calculated using the simple K76T classifier.
|
|
Fig 7. Selection for increased CQ transport activity strongly constrains trajectories through the map. A) Calculated trajectories through the complete genotype-phenotype map. The edges indicate the probability of a mutation over all possible trajectories from low (thin lines) to high (thick lines). The nodes indicate CQ transport activity as defined in Fig 1. B) Six example trajectories through the map between N75E,K76T-PfCRT3D7 and PfCRTDd2. C) The rank-order and mutational routes of the six trajectories shown in panel B. The Ptraj value indicates the relative probability of the trajectory. Panels D and E show CQ transport activity as a function of mutation step for the trajectories presented in panels B and C.
|
|
Fig 8. A predictive, additive model can be trained on a large genotype-phenotype map. A) Summary of the genotype-phenotype map [55]. The map consists of 23 sites, each with four possible bases; the base frequencies at each site are shown in the sequence logo. The total map has 7 × 10¹³ genotypes; the phenotypes of 59,394 genotypes have been measured. B) Raw Pobs vs. Padd plot for the map. Each point is a genotype. The fit residuals are shown below the main plot. We fit a 5th-order spline to linearize the map (red curve). C) The linearized form of the map, with epistasis removed using the spline shown in panel B. D) A predictive model can be trained using approximately 4,000 genotypes. The bottom x-axis shows the number of unique genotypes used to train the model (sampled randomly); the top x-axis shows the fewest number of times any mutation was seen in that sample. R²test was measured against the remaining 50,000+ genotypes not used to train the model.
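The map size quoted in the Fig 8 caption follows from 23 sites with four bases each. The arithmetic can be checked with a few lines; the variable names are illustrative only, and the sampled fraction is a derived quantity that the caption does not state.

```python
N_SITES = 23        # sites in the map (from the caption)
N_BASES = 4         # possible bases per site (from the caption)
N_MEASURED = 59_394  # genotypes with measured phenotypes (from the caption)

n_genotypes = N_BASES ** N_SITES          # size of the full map, 4^23
fraction_measured = N_MEASURED / n_genotypes

print(f"total genotypes: {n_genotypes:.2e}")
print(f"fraction measured: {fraction_measured:.1e}")
```

The measured genotypes therefore cover less than one billionth of the full map.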
|