Pavy N et al. (2006), Automated SNP detection from a large collection...

XB-ART-34304

BMC Genomics 2006 May 10;7:174. doi: 10.1186/1471-2164-7-174.


Automated SNP detection from a large collection of white spruce expressed sequences: contributing factors and approaches for the categorization of SNPs.

Pavy N, Parsons LS, Paule C, MacKay J, Bousquet J.

Abstract
High-throughput genotyping technologies are a highly efficient way to accelerate genetic mapping and enable association studies. As a first step toward this goal, we aimed to develop a resource of candidate single nucleotide polymorphisms (SNPs) in white spruce (Picea glauca [Moench] Voss), a softwood tree of major economic importance. A white spruce SNP resource encompassing 12,264 SNPs was constructed from a set of 6,459 contigs derived from expressed sequence tags (ESTs), using the Bayesian statistical software PolyBayes. Several parameters influencing SNP prediction were analysed, including the a priori expected polymorphism rate, the probability score (PSNP), and contig depth and length. SNP detection in 3' and 5' reads from the same clones revealed a level of inconsistency between overlapping sequences as low as 1%. A subset of 245 predicted SNPs was verified through independent resequencing of genomic DNA from a genotype also used to prepare the cDNA libraries. The validation rate reached a maximum of 85% for SNPs predicted with either PSNP ≥ 0.95 or PSNP ≥ 0.99. A total of 9,310 SNPs were detected using PSNP ≥ 0.95 as the criterion. These SNPs were distributed among 3,590 contigs spanning a range of broad functional categories, with an overall frequency of 1 SNP per 700 nucleotide sites. Experimental and statistical approaches were used to evaluate the proportion of paralogous SNPs, with estimates in the range of 8 to 12%. The 3,789 coding SNPs identified through coding-region annotation and ORF prediction were distributed into 39% nonsynonymous and 61% synonymous substitutions. Overall, there were 0.9 SNPs per 1,000 nonsynonymous sites and 5.2 SNPs per 1,000 synonymous sites, giving a genome-wide nonsynonymous to synonymous substitution rate ratio (Ka/Ks) of 0.17.
We integrated the SNP data into the ForestTreeDB database, along with functional annotations, to provide a tool that facilitates the choice of candidate genes for mapping or association studies.
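The rates quoted above are simple quotients of the reported densities. The Python sketch below reproduces that arithmetic under two stated assumptions: that the Ka/Ks figure is the nonsynonymous SNP density divided by the synonymous SNP density, and that PolyBayes's prior (p_prior in Figure 1) can be read as a uniform per-site probability, so that its reciprocal is the expected spacing between SNPs. The function names are illustrative only and are not part of PolyBayes or ForestTreeDB.

```python
"""Back-of-the-envelope checks on rates quoted in the abstract (illustrative helpers)."""


def rate_ratio(nonsyn_per_1000: float, syn_per_1000: float) -> float:
    """Nonsynonymous / synonymous SNP density (a Ka/Ks-style ratio).

    Both inputs are SNPs per 1,000 sites of their own class, so the
    per-1,000 scaling cancels out.
    """
    if syn_per_1000 <= 0:
        raise ValueError("synonymous density must be positive")
    return nonsyn_per_1000 / syn_per_1000


def spacing_from_rate(per_site_rate: float) -> float:
    """Expected distance in nt between SNPs, assuming a uniform per-site rate."""
    if not 0 < per_site_rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return 1.0 / per_site_rate


def split_counts(total: int, fraction: float) -> tuple:
    """Approximate counts for a percentage split of `total` (percentages are rounded)."""
    first = round(total * fraction)
    return first, total - first


print(f"Ka/Ks ~ {rate_ratio(0.9, 5.2):.2f}")
print(f"p_prior = 0.02 -> one SNP expected every {spacing_from_rate(0.02):.0f} nt")
print(f"1 SNP per 700 nt -> per-site rate ~ {1 / 700:.4f}")
print("coding SNPs (nonsynonymous, synonymous) ~", split_counts(3789, 0.39))
```

Because the 39%/61% split is itself rounded, the per-class counts from `split_counts` are approximations rather than the exact figures behind the paper's percentages.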

PubMed: 16824208
PMC: PMC1557672
Link: BMC Genomics

Species referenced: Xenopus
Genes referenced: prg4


	Figure 1. Number of in silico detected SNPs and of SNP-containing contigs as a function of the prior probability. p_prior stands for the a priori expected polymorphism rate used by PolyBayes to compute the SNP score PSNP. A p_prior value of 0.02 means that one SNP is expected every 50 nt.
	Figure 2. In silico detected SNPs and experimentally verified SNPs according to PSNP. A subset of the predicted SNPs was verified by independent resequencing of fragments amplified from genomic DNA extracted from the PG653 genotype. The sequence traces were manually inspected at the sites where PolyBayes predicted SNPs. Predicted SNPs that were indeed found in the genomic DNA sequence were called "true positives" (blue in the figure), whereas those that were not verified were called "false positives" (yellow in the figure).
	Figure 3. Number of contigs including in silico SNPs detected with PSNP ≥ 0.95. Mean size of the contigs according to the length of the consensus sequence, or mean size of the alignment per contig according to the number of clones.
	Figure 4. ForestTreeDB screenshot showing the result of a query based on Contig4486 (ID: 10387). This page displays the Gene Ontology terms associated with the contig, the SNP data, and the similarity data obtained by Hidden Markov Model searches against the domains and families available in the PFAM and SMART databases. A SNP table displays four SNPs predicted by PolyBayes in Contig4486, with PSNP scores ranging from 0.89 to 0.98. Links also allow retrieval of the members (clones and ESTs) of the contig, their sequences, and the read alignment in MSF format.
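Figures 2 and 3 both rest on filtering PolyBayes predictions by a PSNP cutoff. The Python sketch below shows those two tallies, assuming each prediction is reduced to a contig identifier, a PSNP score, and an optional resequencing outcome; the record layout, the contig IDs, and the toy values are hypothetical and do not come from the paper or from ForestTreeDB. The validation rate is taken here as true positives divided by all resequenced SNPs at or above the cutoff, following the true/false-positive definition in Figure 2.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PredictedSnp:
    contig: str                      # contig identifier (hypothetical IDs below)
    psnp: float                      # PolyBayes SNP probability score
    verified: Optional[bool] = None  # resequencing outcome; None if never tested


def validation_rate(snps: List[PredictedSnp], threshold: float) -> Optional[float]:
    """Fraction of resequenced SNPs with PSNP >= threshold that were true positives."""
    tested = [s for s in snps if s.verified is not None and s.psnp >= threshold]
    if not tested:
        return None  # no resequenced SNP passes this cutoff
    return sum(s.verified for s in tested) / len(tested)


def snps_per_contig(snps: List[PredictedSnp], threshold: float = 0.95) -> Counter:
    """Number of retained SNPs in each contig that keeps at least one."""
    return Counter(s.contig for s in snps if s.psnp >= threshold)


# Toy records invented for illustration; they are NOT data from the study.
toy = [
    PredictedSnp("ctgA", 0.99, True),
    PredictedSnp("ctgA", 0.97, True),
    PredictedSnp("ctgB", 0.96, False),
    PredictedSnp("ctgB", 0.90, True),
    PredictedSnp("ctgC", 0.70, False),
    PredictedSnp("ctgC", 0.55),  # predicted but never resequenced
]

for t in (0.5, 0.9, 0.95, 0.99):
    print(f"PSNP >= {t}: validation rate = {validation_rate(toy, t)}")

per_contig = snps_per_contig(toy)
print("SNPs per contig:", dict(per_contig))
print("contigs by SNP count:", dict(Counter(per_contig.values())))
```

Contigs whose SNPs all fall below the cutoff (ctgC in the toy data) drop out of the per-contig count, which is why the number of SNP-containing contigs depends on the PSNP threshold chosen.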
