Important: Proposed 2nd Generation Xenopus Chip
General Information From Affymetrix
Comments from NCBI on the chip design:
A Xenopus Consortium worked with Affymetrix to create the array.
A call for sequences to be deposited into GenBank by May 2003 was posted
on Xenbase. A UniGene build was assembled (June 2003, build 36) and used
by Affymetrix to seed a clustering and assembly process. This process produced
a consensus sequence for each potential transcript.
The Xenopus laevis GeneChip® will contain 15,500 probe sets representing approximately 13,600 gene clusters and 14,400 gene transcripts. Approximately 3,500 well-characterized genes with mRNA evidence at design time are represented on the array by ~4,200 probe sets (to ensure proper representation, genes and transcripts may be represented by more than one probe set). In addition, ~10,400 EST-based gene clusters are represented on the array by ~11,300 probe sets. All “known” genes with a gene name at design time are represented on the chip.
NCBI made these comments on the design: “The proposed design of the Xenopus laevis chip includes almost all the transcript sequence data present in Xenopus laevis UniGene. Both in scope and in sequence quality, there is excellent representation of characterized mRNA sequences, which are genes of current research interest, and of novel transcripts now available only as 3' EST sequences.”
Gene transcripts are represented by probe sets that consist of multiple 25mer oligonucleotides. Independent measurements provided by multiple probes are required for specificity and sensitivity. Two basic types of probe sets are created: 1) those that represent gene sequences uniquely, 2) those that represent multiple closely related gene sequences.
Affymetrix probe sets typically consist of 11 to 16 probe pairs. For the Xenopus array 16 probe pairs were selected per probe set. A probe pair consists of a 25mer perfect match oligo and a reference 25mer mismatch oligo. For the eukaryotic gene expression assay, labeled antisense cRNA target is hybridized to the complementary 25mer probes.
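The probe pair idea can be sketched in a few lines of Python. This is an illustration only: it assumes the usual Affymetrix convention that the mismatch oligo differs from the perfect match at the central (13th) base, which is replaced by its complement. The text above does not spell this out, and the function name and the example sequence are made up.

```python
# Hypothetical illustration of a perfect match (PM) / mismatch (MM) probe pair.
# Assumption: the MM oligo differs from the PM oligo only at the central
# (13th) base of the 25mer, which is swapped for its complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match: str) -> str:
    """Return the mismatch oligo for a 25mer perfect match oligo."""
    if len(perfect_match) != 25:
        raise ValueError("expected a 25mer")
    mid = 12  # zero-based index of the 13th base
    return (
        perfect_match[:mid]
        + COMPLEMENT[perfect_match[mid]]
        + perfect_match[mid + 1:]
    )


pm = "ATGGCTAGCTAGGATCCGATCGTAC"  # made-up 25mer, not a real probe
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

With 16 such pairs per probe set, each probe set on the Xenopus array would hold 32 oligos.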
A thorough description and diagram of the assay can be found on Affymetrix.com (http://www.affymetrix.com/support/technical/index.affx). Briefly: the starting material is either total RNA (minimum of 5 µg) or poly-A mRNA (minimum of 0.2 µg). The sample is reverse transcribed using a T7-oligo(dT) primer, and double-stranded cDNA is synthesized. The double-stranded cDNA, with the incorporated T7 promoter, is then used as a template in the subsequent in vitro transcription reaction. Biotinylated UTP and CTP are incorporated into cRNA during the in vitro transcription reaction. Labeled cRNA is detected via streptavidin-phycoerythrin staining.
Affymetrix arrays must be processed on GeneChip® Systems. For those who do not have a system installed locally in their department or lab, processing is also available through institutional core labs or service providers.
Annotations will be available via the public NetAffx database at Affymetrix.com. Annotations will be updated on roughly a quarterly basis.
The chip design (i.e., the content) has been completed. The goal is to have GeneChip® probe arrays available by December 2003.
Chip prices are volume-based and are set by institutional agreements.
Academic pricing will be $300 to $500 per chip, depending upon your institution.
A general comment on the cost per experiment: In almost all cases, the biological variation is substantially larger than the variation caused by the chip itself. To determine the level of variation, a pilot experiment consisting of 3 controls and 3 treatments is recommended; its results indicate how many replicates are needed for adequate statistical power.
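As a rough illustration of how pilot data could feed a replicate estimate, the sketch below applies the standard normal-approximation sample size formula for comparing two group means, n = 2(z_{1-α/2} + z_{power})² σ² / δ². It is not an Affymetrix or NCBI procedure. The function name, the signal values, and the choices α = 0.05 and power = 0.8 are all assumptions, and the calculation covers a single probe set with no correction for multiple testing.

```python
from math import ceil
from statistics import NormalDist, mean, variance


def replicates_per_group(controls, treatments, alpha=0.05, power=0.8):
    """Estimate replicates per group from pilot data (normal approximation).

    Assumptions: two groups of equal size, a two-sided test on one probe
    set, and pilot estimates of variance and effect size that are taken
    at face value.
    """
    effect = abs(mean(treatments) - mean(controls))
    if effect == 0:
        raise ValueError("pilot data show no difference between groups")
    # Average of the two sample variances (valid for equal group sizes).
    pooled_var = (variance(controls) + variance(treatments)) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * pooled_var / effect ** 2
    return ceil(n)


# Made-up log2 signal values for one probe set: 3 controls, 3 treatments.
controls = [7.9, 8.3, 8.1]
treatments = [8.8, 9.4, 9.1]
print(replicates_per_group(controls, treatments))
```

With only three samples per group the variance estimate is itself noisy, so a result like this is best read as a lower bound rather than a firm requirement.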
Approximately 13,600 Xenopus UniGene clusters are represented on the chip by 15,500 probe sets. Most EST singletons were not included due to quality and space constraints. However, a search performed by NCBI for putative orthologs resulted in the inclusion of approximately 300 EST singletons based on matches with well characterized genes in other organisms (see following section). The design is 16 probe pairs per probe set with 15,500 probe sets tiled on Affymetrix’s standard 49 format GeneChip® probe array.
Cluster quality was used to rank sequences for representation; the criteria included EST stacks within the assembly, poly-A sites and signals, biological evidence for expression, sequence annotation, and sequence orientation. Not all of the UniGene clusters, particularly EST singletons and doubletons, will be represented. The cluster assembly and probe selection regions (PSR, the contiguous 600 base region of the transcript where probes are selected) were evaluated by NCBI against a set of Xenopus mRNA sequences with both an annotated CDS and poly-A tail.
In general, the same design process used for Affymetrix catalog products was applied (HG-U133, Rat 230, and Mouse 430). A point of note is that probe selection for the Affymetrix gene expression assay is biased towards the 3’ end of transcripts. More details about the design process and probe selection can be obtained from the current human, mouse and rat design technotes available on the Affymetrix website (http://www.affymetrix.com/support/technical/index.affx).
Initial inputs for the Xenopus laevis design were:
|
June 5, 2003 |
The probe sets broken out by cluster type (best cDNA sequence in the subcluster assembly) and PSR tier (the evidence or basis for selecting the particular region of the consensus sequence) for those probe sets tiled on the chip:
tier | type | probe_set_count
----------+--------+-----------------
FL+Stack | FLmRNA | 547
FL | FLmRNA | 1835
Stack | FLmRNA | 123
ConsEnd | FLmRNA | 72
Stack | mRNA | 240
ConsEnd | mRNA | 1358
Stack | EST | 1304
ConsEnd | EST | 10021
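The breakdown above can be cross-checked against the headline figures with a short tally. The counts are copied from the table; the variable names are arbitrary.

```python
from collections import defaultdict

# (tier, cluster type, probe set count), copied from the table above.
rows = [
    ("FL+Stack", "FLmRNA", 547),
    ("FL", "FLmRNA", 1835),
    ("Stack", "FLmRNA", 123),
    ("ConsEnd", "FLmRNA", 72),
    ("Stack", "mRNA", 240),
    ("ConsEnd", "mRNA", 1358),
    ("Stack", "EST", 1304),
    ("ConsEnd", "EST", 10021),
]

# Sum the probe set counts for each cluster type.
by_type = defaultdict(int)
for tier, cluster_type, count in rows:
    by_type[cluster_type] += count

total = sum(by_type.values())
mrna_based = by_type["FLmRNA"] + by_type["mRNA"]

print(dict(by_type))  # {'FLmRNA': 2577, 'mRNA': 1598, 'EST': 11325}
print(mrna_based)     # 4175
print(total)          # 15500
```

The total matches the 15,500 probe sets tiled on the chip, and the mRNA-based (4,175) and EST-based (11,325) subtotals agree with the approximate figures of ~4,200 and ~11,300 quoted earlier.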
There will be 5’, middle, and 3’ probe sets for beta actin and GAPDH to assess sample quality.
A list of the reporter genes can be accessed on: http://bioinfo.affymetrix.com/Community/xenopus/
Paralogous genes represented by a single probe set will not turn out to be a common problem in this dataset.
Data kindly provided by Dr.
| Number of hits | % sequence identity |
| --- | --- |
| 2 | 88 |
| 2 | 89 |
| 2 | 92 |
| 2 | 94 |
| 2 | 98 |
| 4 | 99 |
Looking at a larger sample size, the 1328 CDS-complete high quality sequence records generated for the Xenopus Gene Collection were compared with each other. The following histogram shows the distribution of best matches between any pair of sequences in that dataset:
| Number of hits | % of best matches |
| --- | --- |
| 4 | 82 |
| 2 | 83 |
| 6 | 84 |
| 2 | 85 |
| 6 | 86 |
| 2 | 88 |
| 6 | 89 |
| 12 | 90 |
| 28 | 91 |
| 26 | 92 |
| 30 | 93 |
| 36 | 94 |
| 8 | 95 |
| 6 | 96 |
| 2 | 98 |
| 10 | 99 |
NCBI also looked at the degree of 3' UTR conservation in the dataset:
| Number of hits | % conservation in 3’ UTRs |
| --- | --- |
| 4 | 85 |
| 6 | 86 |
| 8 | 87 |
| 19 | 88 |
| 12 | 89 |
| 12 | 90 |
| 8 | 91 |
| 6 | 92 |
| 10 | 93 |
| 10 | 94 |
| 6 | 95 |
| 4 | 96 |
| 4 | 97 |
| 4 | 99 |
While there are some well-conserved 3' UTRs, these are not very common. This is good, since it means that a single UniGene cluster (or even a single assembled transcript or Affymetrix probe set) will rarely represent two genes.
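One way to put a number on "not very common" is to ask what share of the 3' UTR hits fall at or above a chosen conservation level. The sketch below uses the counts from the 3' UTR table above; the 95% cutoff and the function name are arbitrary choices for illustration and are not part of NCBI's analysis.

```python
# Number of hits at each % conservation level, copied from the 3' UTR table.
utr_hits = {
    85: 4, 86: 6, 87: 8, 88: 19, 89: 12, 90: 12, 91: 8,
    92: 6, 93: 10, 94: 10, 95: 6, 96: 4, 97: 4, 99: 4,
}


def fraction_at_or_above(hist, threshold):
    """Return the share of hits whose conservation is >= threshold (%)."""
    total = sum(hist.values())
    high = sum(n for pct, n in hist.items() if pct >= threshold)
    return high / total


total_hits = sum(utr_hits.values())
share_high = fraction_at_or_above(utr_hits, 95)  # 95 is an assumed cutoff

print(total_hits)           # 113
print(f"{share_high:.1%}")  # 15.9%
```

Only the hits listed in the table are counted here; sequences with no reported hit do not appear in the denominator.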
The representation of singleton sequences in HomoloGene was also analyzed. HomoloGene is NCBI's automated system for detecting putative orthologs on the basis of mRNA reciprocal best matches. If an ortholog is detected, the sequence is unlikely to be an artifact or an uninformative repeat, though the converse does not hold.
Looking at the 3574 singleton clusters in UniGene, 19% (684) are in HomoloGene; this compares with 38% of Xenopus clusters supported by more than 1 EST. If some of the singletons can be rescued, NCBI suggests these 684, since anything that has putative orthology is unlikely to be a sequencing artifact or uninformative repetitive element.
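The percentages quoted above can be reproduced directly from the counts. This is a plain arithmetic check using only the figures in the text; the variable names are arbitrary.

```python
# Figures quoted in the text above.
singleton_clusters = 3574
singletons_in_homologene = 684
multi_est_rate = 0.38  # share of clusters with more than 1 EST in HomoloGene

singleton_rate = singletons_in_homologene / singleton_clusters
relative_rate = singleton_rate / multi_est_rate

print(f"{singleton_rate:.1%}")  # 19.1%
print(f"{relative_rate:.2f}")   # 0.50
```

On these numbers, singleton clusters appear in HomoloGene about half as often as clusters supported by more than one EST.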
Since population variation in Xenopus is still not known, NCBI recommends that it would be safest to move forward with a 16 probes/gene design rather than an 11 probes/gene design.