BMC Genomics
2006 May 10;7:174. doi: 10.1186/1471-2164-7-174.
Automated SNP detection from a large collection of white spruce expressed sequences: contributing factors and approaches for the categorization of SNPs.
Pavy N, Parsons LS, Paule C, MacKay J, Bousquet J.
Abstract
High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies. As a first step toward this goal, we aimed to develop a resource of candidate single nucleotide polymorphisms (SNPs) in white spruce (Picea glauca [Moench] Voss), a softwood tree of major economic importance.
A white spruce SNP resource encompassing 12,264 SNPs was constructed from a set of 6,459 contigs derived from expressed sequence tags (ESTs) using the Bayesian statistical software PolyBayes. Several parameters influencing SNP prediction were analysed, including the a priori expected polymorphism rate, the probability score (PSNP), and contig depth and length. SNP detection in 3' and 5' reads from the same clones revealed a level of inconsistency between overlapping sequences as low as 1%. A subset of 245 predicted SNPs were verified through independent resequencing of genomic DNA from a genotype also used to prepare the cDNA libraries. The validation rate reached a maximum of 85% for SNPs predicted with either PSNP ≥ 0.95 or PSNP ≥ 0.99. A total of 9,310 SNPs were detected using PSNP ≥ 0.95 as the criterion. These SNPs were distributed among 3,590 contigs encompassing a broad array of functional categories, with an overall frequency of 1 SNP per 700 nucleotide sites. Experimental and statistical approaches were used to evaluate the proportion of paralogous SNPs, with estimates in the range of 8 to 12%. The 3,789 coding SNPs identified through coding region annotation and ORF prediction were distributed into 39% nonsynonymous and 61% synonymous substitutions. Overall, there were 0.9 SNPs per 1,000 nonsynonymous sites and 5.2 SNPs per 1,000 synonymous sites, for a genome-wide nonsynonymous to synonymous substitution rate ratio (Ka/Ks) of 0.17.
We integrated the SNP data in the ForestTreeDB database along with functional annotations to provide a tool facilitating the choice of candidate genes for mapping purposes or association studies.
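The Ka/Ks figure quoted in the abstract follows directly from the two per-site SNP densities. The short sketch below reproduces that arithmetic only; the function name `rate_ratio` is a hypothetical helper chosen for illustration and is not part of PolyBayes or ForestTreeDB.

```python
def rate_ratio(nonsyn_per_1000: float, syn_per_1000: float) -> float:
    """Ratio of nonsynonymous to synonymous SNP densities (both per 1,000 sites)."""
    if syn_per_1000 <= 0:
        raise ValueError("synonymous density must be positive")
    return nonsyn_per_1000 / syn_per_1000


# Densities reported in the abstract: 0.9 and 5.2 SNPs per 1,000 sites.
ka_ks = rate_ratio(0.9, 5.2)
print(round(ka_ks, 2))  # 0.17
```

Because both densities share the same denominator (1,000 sites), the ratio is independent of that scaling.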
Figure 1. Number of in silico detected SNPs and of contigs containing SNPs as a function of the prior probability. p_prior stands for the a priori expected polymorphism rate used by PolyBayes to compute the SNP score PSNP. A p_prior value of 0.02 means that one SNP is expected every 50 nt.
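The relation between p_prior and the expected SNP spacing stated in the caption is a simple reciprocal. As a minimal sketch (the function name `expected_spacing_nt` is an assumption made for this example, not a PolyBayes option):

```python
def expected_spacing_nt(p_prior: float) -> float:
    """Expected number of nucleotides per SNP for a given prior polymorphism rate."""
    if not 0 < p_prior <= 1:
        raise ValueError("p_prior must be in the interval (0, 1]")
    return 1.0 / p_prior


spacing = expected_spacing_nt(0.02)
print(round(spacing))  # 50
```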
Figure 2. In silico detected SNPs and experimentally verified SNPs according to PSNP. A subset of the predicted SNPs was verified by the independent resequencing of fragments amplified from genomic DNA extracted from the PG653 genotype. The sequence traces were manually inspected to verify the sites where SNPs were predicted by PolyBayes. Predicted SNPs that were indeed found in the genomic DNA sequence were called "true positives" (in blue on the figure), whereas those that were not verified were called "false positives" (in yellow on the figure).
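The validation rate at a given PSNP threshold can be read as the share of true positives among the resequenced SNPs that pass the threshold. The sketch below illustrates that calculation under stated assumptions: the records in `toy_records` are invented placeholder values, not data from the study, and `validation_rate` is a hypothetical helper.

```python
from typing import Iterable, Optional, Tuple


def validation_rate(
    records: Iterable[Tuple[float, bool]], threshold: float
) -> Optional[float]:
    """Fraction of true positives among resequenced SNPs with PSNP >= threshold.

    Each record is (psnp, verified), where verified is True for a true
    positive and False for a false positive. Returns None if no record
    passes the threshold.
    """
    selected = [verified for psnp, verified in records if psnp >= threshold]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Hypothetical toy data for illustration only (not from the paper).
toy_records = [(0.99, True), (0.97, True), (0.96, False), (0.90, True), (0.60, False)]
rate_95 = validation_rate(toy_records, 0.95)
print(rate_95)
```

Raising the threshold shrinks the set of SNPs considered, so the rate is always computed over the filtered subset rather than over all predictions.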
Figure 3. Number of contigs including in silico SNPs detected with PSNP ≥ 0.95. Mean size of the contigs according to the length of the consensus sequence, or mean size of the alignment per contig according to the number of clones.
Figure 4. ForestTreeDB screenshot showing the result of a query based on Contig4486 (ID: 10387). This page displays the Gene Ontology terms associated with the contig, the SNP data, and the similarity data obtained by Hidden Markov Model searches against the domains and families available in the Pfam and SMART databases. A SNP table displays four SNPs predicted by PolyBayes in Contig4486, with PSNP scores ranging from 0.89 to 0.98. Links also allow retrieval of the members (clones and ESTs) of the studied contig, their sequences, and the read alignment in MSF format.