DNA Res. 2019 Apr 01;26(2):157-170. doi: 10.1093/dnares/dsy046.
Search for potential reading frameshifts in cds from Arabidopsis thaliana and other genomes.
Suvorova YM, Korotkova MA, Skryabin KG, Korotkov EV.
Abstract
A new mathematical method was developed for detecting potential reading frameshifts in protein-coding sequences (cds). The algorithm adapts to the triplet periodicity (TP) of each analysed sequence using dynamic programming and a genetic algorithm, and it does not require any preliminary training. Using this method, cds from the Arabidopsis thaliana genome were analysed. In total, the algorithm found 9,930 sequences containing one or more potential reading frameshifts, which is ∼21% of all analysed sequences of the genome. The Type I and Type II error rates were estimated as 11% and 30%, respectively. Similar results were obtained for the genomes of Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Rattus norvegicus and Xenopus tropicalis. The algorithm was also tested on 17 bacterial genomes, and the results were compared with previously obtained data on potential reading frameshifts in these genomes. This study discusses the possibility that reading frameshifts are relatively frequent mutations and that such mutations could participate in the creation of new genes and proteins.
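The abstract names dynamic programming as the core of the search. The sketch below is not the authors' algorithm; it is a simplified Viterbi-style DP over three phase states, assuming a fixed 3×4 weight matrix `w` (rows = codon position, columns = a, c, g, t) and a hypothetical constant `penalty` charged for every phase change. Positions where the optimal phase changes are candidate frameshifts. The function name `frameshift_dp` is illustrative only.

```python
from typing import List, Sequence, Tuple

NUC = {"a": 0, "c": 1, "g": 2, "t": 3}  # assumed column order of w


def frameshift_dp(seq: str, w: Sequence[Sequence[float]], penalty: float) -> Tuple[float, List[int]]:
    """Best-scoring phase path through seq (hypothetical simplification).

    In phase f the base at position i is scored as codon position (i - f) % 3,
    i.e. with row w[(i - f) % 3]. Switching phase between adjacent bases costs
    `penalty`. Returns the best total score and the phase at every position.
    """
    n = len(seq)
    if n == 0:
        return 0.0, []

    def emit(i: int, f: int) -> float:
        col = NUC.get(seq[i].lower())
        return 0.0 if col is None else w[(i - f) % 3][col]  # ambiguous bases score 0

    score = [emit(0, f) for f in range(3)]
    back: List[List[int]] = []  # back[i - 1][f] = best phase at i - 1 given phase f at i
    for i in range(1, n):
        new_score, ptr = [], []
        for f in range(3):
            best_prev, best_val = f, score[f]  # staying in phase is free
            for g in range(3):
                if g != f and score[g] - penalty > best_val:
                    best_prev, best_val = g, score[g] - penalty
            new_score.append(best_val + emit(i, f))
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)

    f = max(range(3), key=lambda k: score[k])
    best = score[f]
    path = [f]
    for ptr in reversed(back):  # trace the phase path back to position 0
        f = ptr[f]
        path.append(f)
    path.reverse()
    return best, path


s = "atc" * 25 + "g" + "atc" * 25  # {atc}^50 with g inserted in the middle
w = [[1, -1, -1, -1],   # codon position 1 favours a
     [-1, -1, -1, 1],   # codon position 2 favours t
     [-1, 1, -1, -1]]   # codon position 3 favours c
score, path = frameshift_dp(s, w, penalty=5)
print(score)  # 144
```

Here the path starts in phase 0 and switches once to phase 1 at the inserted g, matching the phase change described for Figure 1; the penalty keeps isolated mismatches from being reported as shifts.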
PubMed ID: 30726896
PMC ID: PMC6476729
Figure 1. (A) The TP matrix for the sequence S = {atc}^50 (the triplet atc repeated 50 times). (B) The TP phase in a fragment of the sequence S in which a base g has been inserted in the middle. Before the insertion of g, the phase is Fa = 0, because the columns of the matrix agree with the codon positions as 1 => 1, 2 => 2, 3 => 3. After the insertion of g, the phase is Fa = 1, because the agreement becomes 1 => 2, 2 => 3, 3 => 1. (C) The TP matrix for the sequence S = {atcgga}^25.
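The phase change in Figure 1(B) can be reproduced with a few lines of code. This is a minimal sketch under assumptions: the TP matrix is stored as 3 rows (codon positions) × 4 columns (a, c, g, t), which may be the transpose of the layout in the figure, and the phase of a window is taken as the cyclic row shift that best agrees with a reference matrix. The helper names `tp_matrix` and `tp_phase` are hypothetical.

```python
from typing import List

NUC = "acgt"  # assumed column order


def tp_matrix(seq: str, offset: int = 0) -> List[List[int]]:
    """Row r counts bases at positions p with (p + offset) % 3 == r.

    `offset` is the absolute start of `seq` inside a longer sequence, so a
    window keeps the reading frame of the full sequence.
    """
    m = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq.lower()):
        col = NUC.find(base)
        if col >= 0:  # skip ambiguous bases such as 'n'
            m[(i + offset) % 3][col] += 1
    return m


def tp_phase(window: List[List[int]], ref: List[List[int]]) -> int:
    """Cyclic shift k in {0, 1, 2} for which pairing window row (r + k) % 3
    with reference row r gives the largest dot product."""
    def agreement(k: int) -> int:
        return sum(window[(r + k) % 3][c] * ref[r][c] for r in range(3) for c in range(4))
    return max(range(3), key=agreement)


s = "atc" * 50
mutant = s[:75] + "g" + s[75:]  # base g inserted in the middle
ref = tp_matrix(s)
print(tp_phase(tp_matrix(mutant[:75]), ref))             # 0 (before the insertion)
print(tp_phase(tp_matrix(mutant[76:], offset=76), ref))  # 1 (after the insertion)
```

A shift of k = 1 means codon position 1 now lands in matrix row 2, position 2 in row 3 and position 3 in row 1, which is the 1 => 2, 2 => 3, 3 => 1 agreement described for Fa = 1.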
Figure 2. Block diagram of the algorithm for optimizing the random matrices M(3,16) from the MR set, used to search for the matrix with the largest mFmax.
Figure 3. The idea of the algorithm: create a random matrix Wr (or a set of random matrices), optimize it with a genetic algorithm, and obtain the matrix W1 as the result of the optimization.
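Figures 2 and 3 describe optimizing random matrices with a genetic algorithm. The sketch below shows only the general shape of such a loop and rests on several assumptions: matrices are 3×4 rather than the M(3,16) of Figure 2, the objective `fitness` (best single-phase score of a training sequence) is a hypothetical stand-in for mFmax, entries are clipped to [-1, 1] so the objective stays bounded, and tournament selection, uniform crossover, Gaussian mutation and elitism are generic choices, not operators taken from the paper.

```python
import random
from typing import List, Tuple

NUC = {"a": 0, "c": 1, "g": 2, "t": 3}
Matrix = List[List[float]]


def fitness(w: Matrix, seq: str) -> float:
    """Hypothetical stand-in for mFmax: best single-phase score of seq under w."""
    cols = [NUC[b] for b in seq.lower() if b in NUC]
    return max(sum(w[(i - f) % 3][c] for i, c in enumerate(cols)) for f in range(3))


def random_matrix(rng: random.Random) -> Matrix:
    """A random matrix Wr with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]


def optimise(seq: str, pop_size: int = 30, generations: int = 40,
             mut_rate: float = 0.1, sigma: float = 0.3,
             seed: int = 0) -> Tuple[Matrix, float, float]:
    """Evolve random matrices; return (W1, fitness of W1, best initial fitness)."""
    rng = random.Random(seed)
    pop = [random_matrix(rng) for _ in range(pop_size)]
    fits = [fitness(w, seq) for w in pop]
    initial_best = max(fits)

    def tournament() -> Matrix:
        # the fittest of three randomly chosen individuals
        idx = max(rng.sample(range(pop_size), 3), key=lambda k: fits[k])
        return pop[idx]

    for _ in range(generations):
        elite = pop[max(range(pop_size), key=lambda k: fits[k])]
        new_pop = [[row[:] for row in elite]]  # elitism: the best never gets worse
        while len(new_pop) < pop_size:
            a, b = tournament(), tournament()
            # uniform crossover, then Gaussian mutation clipped to [-1, 1]
            child = [[a[r][c] if rng.random() < 0.5 else b[r][c] for c in range(4)]
                     for r in range(3)]
            for r in range(3):
                for c in range(4):
                    if rng.random() < mut_rate:
                        child[r][c] = min(1.0, max(-1.0, child[r][c] + rng.gauss(0.0, sigma)))
            new_pop.append(child)
        pop = new_pop
        fits = [fitness(w, seq) for w in pop]

    best = max(range(pop_size), key=lambda k: fits[k])
    return pop[best], fits[best], initial_best


w1, best_fit, start_fit = optimise("atc" * 50, seed=1)
print(best_fit >= start_fit)  # True, guaranteed by elitism
```

Keeping an unchanged copy of the best matrix in every generation makes the best fitness non-decreasing, so the returned W1 is never worse than the best random starting matrix Wr.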
Figure 4. Dependence of the d value (see Equation 2) on the TP of the analysed sequence. The TP was calculated using the TP matrix (ref. 33), is expressed as an argument of the normal distribution, and is shown along the x-axis.
Figure 5. Scheme of the division of mFmax into V1, V2 and V3 (see Section 2.4).
Figure 6. Distribution of shift positions along the gene sequence. The x-axis shows the distance from the beginning of the gene as a percentage of gene length (in steps of 5%); the y-axis shows the percentage of shifts in each 5% interval. Black bars show the data from ref. 21, and white bars show the data from this work. The leftmost and rightmost bars show the number of frameshifts from ref. 21 that were found outside the cds.
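The binning behind Figure 6 can be illustrated with a short helper. This is a sketch, assuming each shift is given as a pair (shift position, gene length) and covering only shifts inside the cds; the extra bars for shifts outside the cds are not modelled. The name `shift_position_histogram` is hypothetical.

```python
from typing import Iterable, List, Tuple


def shift_position_histogram(shifts: Iterable[Tuple[int, int]],
                             step: float = 5.0) -> List[float]:
    """Percentage of shifts per `step`-percent interval of gene length.

    Each item is (shift_position, gene_length) with 0 <= shift_position < gene_length.
    """
    nbins = int(round(100.0 / step))
    counts = [0] * nbins
    for pos, length in shifts:
        if not 0 <= pos < length:
            raise ValueError(f"shift at {pos} lies outside a gene of length {length}")
        rel = 100.0 * pos / length  # distance from the gene start, in percent
        counts[min(int(rel // step), nbins - 1)] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


h = shift_position_histogram([(0, 100), (4, 100), (50, 100), (99, 100)])
print(h[0], h[10], h[19])  # 50.0 25.0 25.0
```

Normalising by the total number of shifts makes the bars comparable between data sets of different sizes, as in the comparison with ref. 21.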
Figure 7. Distribution of the distance between paired compensating shifts of the TP phase in the Arabidopsis thaliana genome.
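Figure 7 concerns pairs of compensating TP phase shifts. As an assumption for illustration, the sketch below treats a compensating pair as a shift away from a baseline phase followed by the first later shift that restores it, and reads the distance off a per-position phase path (for example, one produced by a DP). The function name `compensating_shift_distances` is hypothetical, and intermediate shifts between the two ends of a pair are ignored.

```python
from typing import List, Sequence


def compensating_shift_distances(phases: Sequence[int], baseline: int = 0) -> List[int]:
    """Distances between a shift away from `baseline` and the shift that restores it.

    An unpaired shift (the phase never returns to baseline) is not counted.
    """
    distances: List[int] = []
    start = None  # position of the pending shift away from baseline
    for i in range(1, len(phases)):
        if phases[i] == phases[i - 1]:
            continue
        if phases[i - 1] == baseline and start is None:
            start = i                       # phase leaves the baseline
        elif phases[i] == baseline and start is not None:
            distances.append(i - start)     # phase restored: record the pair
            start = None
    return distances


path = [0] * 10 + [1] * 5 + [0] * 10 + [2] * 3 + [0] * 2
print(compensating_shift_distances(path))  # [5, 3]
```

A histogram of these distances over all genes would give a distribution of the kind shown in Figure 7.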
References
Antonov. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences. 2013.
Antonov. Identification of the nature of reading frame transitions observed in prokaryotic genomes. 2013.
Antonov. Genetack: frameshift identification in protein-coding sequences by the Viterbi algorithm. 2010.
Azad. Probabilistic methods of identifying genes in prokaryotic genomes: connections to the HMM theory. 2004.
Berget. Exon recognition in vertebrate splicing. 1995.
Chechetkin. Search of hidden periodicities in DNA sequences. 1995.
Chung. Novel frameshift mutation in Troponin C (TNNC1) associated with hypertrophic cardiomyopathy and sudden death. 2011.
Cunningham. Ensembl 2015. 2015.
Du. Improve homology search sensitivity of PacBio data by correcting frameshifts. 2016.
Frenkel. Using triplet periodicity of nucleotide sequences for finding potential reading frame shifts in genes. 2009.
Frenkel. Classification analysis of triplet periodicity in protein-coding regions of genes. 2008.
Frenkel'. [Classification of triplet periodicity in DNA sequences of genes taken from KEGG databank]. 2008.
Gao. Protein coding sequence identification by simultaneously characterizing the periodic and random features of DNA sequences. 2005.
Gilbert. Why genes in pieces? 1978.
Girdea. Back-translation for discovering distant protein homologies in the presence of frameshift mutations. 2010.
Gouzy. FrameDP: sensitive peptide detection on noisy matured sequences. 2009.
Gutiérrez. On the origin of the periodicity of three in protein coding DNA sequences. 1994.
Hiller. Creation and disruption of protein features by alternative splicing -- a novel mechanism to modulate function. 2005.
Iannuzzi. Two frameshift mutations in the cystic fibrosis gene. 1991.
Ketteler. On programmed ribosomal frameshifting: the alternative proteomes. 2012.
Koonin. Orthologs, paralogs, and evolutionary genomics. 2005.
Korotkov. Study of the triplet periodicity phase shifts in genes. 2010.
Korotkova. An approach for searching insertions in bacterial genes leading to the phase shift of triplet periodicity. 2011.
Laskin. [The locally optimal method of cyclic alignment to reveal latent periodicities in genetic texts. The NAD-binding protein sites]. 2003.
Mironov. Pro-Frame: similarity-based gene recognition in eukaryotic DNA sequences with errors. 2001.
Needleman. A general method applicable to the search for similarities in the amino acid sequence of two proteins. 1970.
Ochman. Lateral and oblique gene transfer. 2001.
Ogura. A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. 2001.
Okamura. Frequent appearance of novel protein-coding sequences by frameshift translation. 2006.
Pugacheva. Search of latent periodicity in amino acid sequences by means of genetic algorithm and dynamic programming. 2016.
Raes. Functional divergence of proteins through frameshift mutations. 2005.
Rho. FragGeneScan: predicting genes in short and error-prone reads. 2010.
Schiex. FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences. 2003.
Sheetlin. Frameshift alignment: statistics and post-genomic applications. 2014.
Thomson. Fusion of the human gene for the polyubiquitination coeffector UEV1 with Kua, a newly identified gene. 2000.
Wang. Localizing triplet periodicity in DNA and cDNA sequences. 2010.
Xu. Identification of somatic mutations in human prostate cancer by RNA-Seq. 2013.
Yin. Prediction of protein coding regions by the 3-base periodicity analysis of a DNA sequence. 2007.
Zhang. HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors. 2011.