
Figure 1. Theoretical models of how gRNA-specific efficiencies and frameshift gene editing outcome probabilities influence the cellular composition and percentage of protein knockout cells in a mosaic F0 animal model. (A) There is a nonlinear relationship between the gRNA-specific probability of obtaining a frameshift gene editing outcome (x-axis) and the probability of obtaining a biallelic frameshift gene editing outcome in a single cell (y-axis). E.g. at a gRNA-specific frameshift frequency of 80%, the probability that a single biallelically edited cell is biallelic frameshift mutant is 64% (0.80 × 0.80; grey demarcation). (B) Examples of theoretical outcomes of gene editing (presuming 100% on-target efficiency) in an F0 mosaic, varying one parameter: the gRNA-specific probability of frameshift editing. (C) Examples of theoretical outcomes of gene editing in an F0 mosaic, varying two parameters: the gRNA-specific probability of frameshift editing and the gRNA-specific on-target efficiency. E.g. for a 100% efficient gRNA with an 80% gRNA-specific probability of frameshift editing, we expect 64% of the cells to be biallelic frameshift mutant (see grey demarcation in A). Please note that blue circles represent cells that are biallelically gene edited but retain at least one in-frame mutation, and therefore cannot be considered complete protein knockouts. (D) Flowchart representing the pipeline for investigating the correlations between experimentally observed in vivo gene editing outcomes and gene editing outcomes projected by computational prediction models.
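The arithmetic behind panels A to C can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes that the on-target efficiency is the fraction of cells edited on both alleles and that the two alleles acquire frameshifts independently. The function and key names are hypothetical.

```python
def mosaic_composition(frameshift_prob: float, on_target_eff: float = 1.0) -> dict:
    """Expected cell fractions in an F0 mosaic under a simple two-parameter model.

    Assumptions (not stated in the legend): `on_target_eff` is the fraction of
    cells edited on both alleles, and each edited allele is a frameshift
    independently with probability `frameshift_prob`.
    """
    for name, value in (("frameshift_prob", frameshift_prob),
                        ("on_target_eff", on_target_eff)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")

    # Both alleles frameshifted: the squared (nonlinear) relationship of panel A.
    biallelic_frameshift = on_target_eff * frameshift_prob ** 2
    # Edited on both alleles, but at least one allele remains in-frame
    # (the blue circles in panel C).
    edited_with_inframe = on_target_eff * (1.0 - frameshift_prob ** 2)
    unedited = 1.0 - on_target_eff

    return {
        "biallelic_frameshift": biallelic_frameshift,
        "edited_with_inframe": edited_with_inframe,
        "unedited": unedited,
    }


# Worked example from the legend: 100% efficiency, 80% frameshift probability.
example = mosaic_composition(0.80, 1.0)
print(f"{example['biallelic_frameshift']:.2%} biallelic frameshift mutant cells")
```

Lowering `on_target_eff` scales both edited fractions proportionally, which is how panel C varies a second parameter.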


Figure 2. The InDelphi prediction model, trained in mESC cells, accurately predicts CRISPR/Cas9 gene editing outcomes and outperforms several other prediction models in X. tropicalis embryos. (A) Scatter plot of model-predicted cumulative frameshift gene editing frequencies against experimentally observed cumulative frameshift gene editing frequencies, for each sgRNA (n = 28) separately, in X. tropicalis embryos. Black demarcated lines show the perfect correlation r = 1. Light-grey areas show the standard error of the best-fit linear regression line. (B) Scatter plot of model-predicted INDEL patterns against experimentally observed INDEL patterns, for all gRNAs simultaneously. Black lines show linear regression models of all correlations. Black demarcated lines show the perfect correlation r = 1. (C) Correlations between model-predicted and experimentally observed INDEL patterns, for each gRNA separately. Error bars represent mean ± SD. (***p < 0.001; **p < 0.01; *p < 0.05; ns = not significant; Shapiro–Wilk (p > 0.05); Levene (p < 0.05); one-way Welch ANOVA to adjust for unequal variances (p < 0.001), with Games-Howell multiple comparisons) (Table S2). (D) Violin plots of the residuals (predicted frequency − observed frequency) between the model-predicted and experimentally observed frequency of the +1 insertion gene editing outcome. (E) The SEM of the mean residual difference (predicted frequency − observed frequency) between the model-predicted and experimentally observed frequency of all deletion variants modeled.


Figure 3. The InDelphi-mESC model accurately predicts CRISPR/Cas9 gene editing outcomes in X. tropicalis, X. laevis and zebrafish embryos, which can be exploited to identify high-frameshift-frequency gRNAs. (A–F) Scatter plots of InDelphi-mESC-predicted cumulative frameshift gene editing frequencies against experimentally observed cumulative frameshift gene editing frequencies, for each sgRNA separately, in X. tropicalis (n = 14) (Panel A), X. laevis (n = 6) (Panel B) and zebrafish (n = 15) (Panel C) embryos. Scatter plots of InDelphi-mESC-predicted INDEL patterns against experimentally observed INDEL patterns, for all gRNAs simultaneously, in X. tropicalis (n = 14) (Panel D), X. laevis (n = 6) (Panel E) and zebrafish (n = 15) (Panel F) embryos. Black demarcated lines show the perfect correlation r = 1. Light-grey areas show the standard error of the best-fit linear regression line. Black lines show the linear regression model. (G) Correlations between model-predicted and experimentally observed INDEL patterns, for each gRNA separately. Correlations for X. tropicalis embryos (n = 14) (dark blue) and X. laevis embryos (n = 6) (middle blue) were analyzed by Sanger sequencing and sequence trace decomposition. Correlations for zebrafish embryos (n = 15) (light blue) were analyzed by targeted amplicon sequencing (TAS). (H) Using the distribution of the expected frameshift frequency for a large dataset of SpCas9 human target sites in mESC cells from Shen et al. 2018 (ref. 27; black line, monoallelic), we draw the derived distribution of the probability that a randomly designed gRNA generates biallelic frameshift editing. This distribution is shown for different editing efficiencies within the F0 mosaic animal: 100%, 50% and 25% (in decreasing intensities of blue; 100 circles, each circle representing a cell within a total mosaic of 100 cells). E.g. the probability that a randomly designed gRNA yields more than 80% biallelic frameshift mutant cells in a developing mosaic, assuming 100% efficiency, is the area under the curve highlighted in pink and represents only a 3.24% probability.
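The threshold implied by the pink area in panel H can be written out explicitly. The following is a sketch under the assumption that the fraction of biallelic frameshift mutant cells equals e·f², where f is the gRNA-specific frameshift frequency and e the editing efficiency. The symbols e, f and t are introduced here for illustration and do not appear in the legend.

```latex
\[
P\!\left(e\,f^{2} > t\right) = P\!\left(f > \sqrt{t/e}\right),
\qquad
e = 1,\ t = 0.80 \;\Rightarrow\; f > \sqrt{0.80} \approx 0.894 .
\]
```

Under this assumption, a gRNA must have a frameshift frequency above roughly 89% to exceed 80% biallelic frameshift mutant cells at full efficiency. At e = 0.5 the required value √1.6 exceeds 1, so the 80% threshold would be unreachable for any gRNA.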


Figure 4. Integrating CRISPRscan and the InDelphi-mESC model allows identification of efficient high-frameshift-frequency gRNAs in X. tropicalis. (A) Scatter plot with marginal histograms showing, for 339,693 gRNAs across the coding sequences of 4,860 X. tropicalis genes, the relationships between the calculated CRISPRscan score, the InDelphi-mESC-predicted frequency of MMEJ repair and the InDelphi-mESC-predicted knockout-score (KO-score). The KO-score is defined as the predicted percentage of cells with biallelic out-of-frame mutations within the pool of all mutant cells (i.e. in-frame and out-of-frame; mono- and biallelic) in the mosaic mutant embryo, and is calculated as the square of the frameshift frequency predicted by InDelphi-mESC. For each gene (n = 4,860), the gRNA with the highest predicted KO-score (highest-in-class) is highlighted in blue, while the gRNA with the lowest predicted KO-score (lowest-in-class) is highlighted in orange. Demarcations illustrate the quadrants in which gRNAs satisfy certain cut-off thresholds. Ideally, designed gRNAs fall within the aquamarine demarcation (high predicted KO-score, high CRISPRscan score), but not within the orange (low predicted KO-score, high CRISPRscan score) or purple demarcation (high predicted KO-score, low CRISPRscan score). (B) Violin plot illustrating that highest-in-class gRNAs and lowest-in-class gRNAs have a higher predicted percentage of repair by microhomology-mediated end joining than a random selection of guides (****p < 0.001; Table S2). (C) No distinct difference in calculated CRISPRscan scores between highest-in-class gRNAs, lowest-in-class gRNAs and a random selection of gRNAs. (D) Comparison of three pairs of gRNAs targeting the second exon of the tyrosinase gene, which is responsible for pigmentation in X. tropicalis. As these three pairs of guides have very similar genome editing efficiencies, as determined by targeted amplicon sequencing, the impact of differential predicted KO-scores on phenotypic penetrance is revealed. (D, E) Phenotypic scoring is based on retinal pigmentation at Nieuwkoop-Faber stage 38, and a trend is observed in which guides with higher predicted KO-scores yield a higher phenotypic score under very similar genome editing efficiencies.
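The KO-score definition and the per-gene highest-in-class selection described in panel A can be sketched as follows. This is an illustrative example only. The `Guide` record, the function names, the CRISPRscan cut-off and the example values are all hypothetical and are not taken from the study or from the InDelphi or CRISPRscan software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    gene: str
    guide_id: str
    frameshift_pct: float  # predicted frameshift frequency, 0-100 (hypothetical input)
    crisprscan: float      # CRISPRscan score (hypothetical input)


def ko_score(frameshift_pct: float) -> float:
    """KO-score in percent: the square of the predicted frameshift frequency."""
    fraction = frameshift_pct / 100.0
    return 100.0 * fraction * fraction


def highest_in_class(guides, min_crisprscan: float = 0.0) -> dict:
    """Return, per gene, the guide with the highest KO-score.

    Guides below `min_crisprscan` are skipped; the cut-off value is an
    illustrative assumption, not a threshold from the figure.
    """
    best = {}
    for guide in guides:
        if guide.crisprscan < min_crisprscan:
            continue
        current = best.get(guide.gene)
        if current is None or ko_score(guide.frameshift_pct) > ko_score(current.frameshift_pct):
            best[guide.gene] = guide
    return best


# Made-up example guides, for illustration only.
guides = [
    Guide("geneA", "geneA_g1", 90.0, 60.0),
    Guide("geneA", "geneA_g2", 70.0, 80.0),
    Guide("geneB", "geneB_g1", 50.0, 30.0),
]

for gene, guide in highest_in_class(guides).items():
    print(gene, guide.guide_id, round(ko_score(guide.frameshift_pct), 1))
```

Raising `min_crisprscan` mimics restricting the selection to the high-CRISPRscan quadrants, at the cost of leaving some genes without a qualifying guide.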


Fig. S1: ezh2 CRISPR/Cas9 gene editing outcomes can be accurately predicted via the online prediction algorithm InDelphi. (A) Column graphs showing an overlay of variant calls (%) between in vivo observations and in silico predictions. (B) Pearson correlation with significance interval between in vivo observations and in silico predictions for the ezh2 gRNA.


Fig. S2: Pearson correlations between in vivo observed (obtained by targeted amplicon sequencing) and respective in silico predicted variant frequencies for 28 gRNAs injected in X. tropicalis embryos. gRNAs are injected as Cas9/gRNA ribonucleoprotein complexes at early developmental stages (2- to 8-cell stage). Target regions are PCR amplified and sequenced using MiSeq sequencing (Illumina), and raw data is processed using the BATCH-GE analysis software. In silico predictions are generated by the InDelphi software algorithm. Plots show correlations between in vivo observed and in silico predicted variant frequencies. x_g1, x_g2 and x_g3 refer to different guide RNAs against the same gene. (****p < 0.0001; ***p < 0.001; **p < 0.01).


Fig. S3: Pearson correlations between in vivo observations (generated by Sanger sequencing and sequence trace deconvolution) and respective in silico predictions for 14 gRNAs injected in X. tropicalis embryos. gRNAs are injected as Cas9/gRNA ribonucleoprotein complexes at early developmental stages (1-cell stage). Target regions are PCR amplified, sequenced using Sanger sequencing and deconvoluted using the Inference of CRISPR Edits (ICE) algorithm. In silico predictions are generated by the InDelphi software algorithm. Plots show correlations between in vivo observed and in silico predicted variant frequencies. x_g1 and x_g2 refer to different guide RNAs against the same gene. (****p < 0.0001; ***p < 0.001; **p < 0.01; *p < 0.05; ns = not significant).


Fig. S4: Pearson correlations between in vivo observations (generated by Sanger sequencing and sequence trace deconvolution) and respective in silico predictions for 10 gRNAs injected in X. laevis embryos. gRNAs are injected as Cas9/gRNA ribonucleoprotein complexes at early developmental stages (1-cell stage). Target regions are PCR amplified, sequenced using Sanger sequencing and deconvoluted using the Inference of CRISPR Edits (ICE) algorithm. In silico predictions are generated by the InDelphi software algorithm. Plots show correlations between in vivo observed and in silico predicted variant frequencies. Gene name_S and gene name_L refer to the two homeologues of a particular gene present on the small and large chromosome, respectively. (****p < 0.0001; **p < 0.01; *p < 0.05; ns = not significant).


Fig. S5: Pearson correlations between in vivo observations (generated by targeted amplicon sequencing) and respective in silico predictions for 15 gRNAs injected in zebrafish embryos. gRNAs are injected as Cas9/gRNA ribonucleoprotein complexes at early developmental stages (1-cell stage). Target regions are PCR amplified and sequenced using MiSeq sequencing (Illumina), and raw data is processed using the BATCH-GE analysis software. In silico predictions are generated by the InDelphi software algorithm. Plots show correlations between in vivo observed and in silico predicted variant frequencies. x_g1, x_g2 and x_g3 refer to different guide RNAs against the same gene. (****p < 0.0001; ***p < 0.001; **p < 0.01).


Fig. S6: Images of the eyes of tyrosinase mutant embryos with the associated threshold masks used for quantification.
