See the technical manual for the complete details of the algorithm and an expanded discussion of all issues. This document is meant to be a relatively quick introduction to the software. See also the examples for a walk through the usage on actual data.
PaGE is a tool for analyzing microarray data that is used to:
id | c0r1 | c0r2 | c0r3 | c1r1 | c1r2 | c2r1 | c2r2 | c2r3 | c3r1 | c3r2 | c3r3 | c3r4 |
G1 | 0.907 | 0.964 | 1.075 | 2.410 | 1.897 | 2.556 | 3.238 | 2.923 | 0.993 | 0.972 | 0.983 | 1.071 |
G2 | 1.136 | 1.114 | 1.069 | 7.311 | 6.197 | 1.114 | 0.874 | 1.192 | 3.310 | 3.399 | 3.860 | 4.077 |
G3 | 10.288 | 10.008 | 12.023 | 25.507 | 24.694 | 4.459 | 6.234 | 4.234 | 11.332 | 12.323 | 8.243 | 14.230 |
G4 | 5.234 | 6.493 | 2.330 | 4.570 | 8.498 | 4.349 | 8.323 | 6.384 | 21.937 | 25.788 | 18.847 | 14.324 |
Etc... |
id | c1 | c2 | c3 |
G1 | |||
G2 | |||
G3 | |||
G4 |
Positive integers represent upregulation and negative integers represent downregulation. Higher positive numbers represent greater differential expression; however, they are not meant to represent actual fold-change. A zero means there was insufficient evidence to make a differential expression call at the desired confidence level. The difference between levels 1, 2, 3, ... of upregulation and -1, -2, -3, ... of downregulation will be explained below.
The statistical confidence measures used in PaGE are False Discovery Rate (FDR) measures. Therefore what is controlled is the percentage of false predictions in the set of all predictions. Note that this differs fundamentally from a p-value based multiple testing approach which would control the probability of making any false predictions at all. The FDR has largely replaced the p-value in microarray differential expression analysis, since the p-value approach is generally considered too conservative.
As such, it is not unreasonable to use an FDR of .5, while a p-value of .5 would be completely unreasonable. For example, if there are 10,000 elements on an array and only 100 are differentially expressed, then it will be virtually impossible to find them by PCR verification of arbitrarily chosen genes, since only about one gene in a hundred would verify. However, if we can find a set of 200 genes with an FDR of .5, then on average one out of every two genes in this set will verify.
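The arithmetic behind this example can be checked with a short Python sketch. The numbers are the ones used in the text; the variable names are only illustrative and are not part of PaGE.

```python
# Worked arithmetic for the example above (numbers taken from the text).
total_genes = 10_000      # elements on the array
truly_changed = 100       # differentially expressed genes
reported = 200            # size of the predicted set
fdr = 0.5                 # false discovery rate of that set

# Chance that an arbitrarily chosen gene verifies, with no analysis at all.
random_hit_rate = truly_changed / total_genes

# Expected number of false and true predictions in the reported set.
expected_false = fdr * reported
expected_true = reported - expected_false

# Expected hit rate when verifying genes drawn from the reported set.
set_hit_rate = expected_true / reported

print(f"random hit rate: {random_hit_rate:.2%}")
print(f"expected true predictions: {expected_true:.0f} of {reported}")
print(f"enrichment over random: {set_hit_rate / random_hit_rate:.0f}x")
```

Under these numbers, a set with an FDR of .5 is still far more productive to verify than genes picked without any analysis.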
An FDR of .05 or .01 is often lower than necessary, and many genes will be missed. Usually you will want to choose the FDR based on the size of the result set. If at first you get too few or too many genes, you can raise or lower it to find a reasonable number (assuming there are any differentially expressed genes to be found in the data at all).
Unfortunately, microarray analysis is not a push-button exercise: every data set is unique and requires special considerations. Differential expression analysis is best performed interactively, with an algorithm flexible enough to allow looking at the data from different angles. See the technical manual for a more detailed discussion of this.
Microarray data come in many formats and there are many ways to design a microarray experiment when looking for differential expression. PaGE therefore has numerous options that tell it exactly what kind of data and study design you are entering.
PaGE takes as input microarray data from two or more conditions. There must be at least two replicates per condition. Possible designs are:
id c0r1 c0r2 c0r3 c1r1 c1r2 c1r3 c2r1 c2r2
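The header line above encodes the design in its column labels: `c0r1` is condition 0, replicate 1, and so on. As a rough illustration, the following Python sketch parses such a header and checks the stated requirements of at least two conditions and at least two replicates per condition. It assumes whitespace-separated columns with the id column first; the function names are hypothetical and this is not PaGE's own parser.

```python
import re
from collections import defaultdict

# Column labels follow the pattern c<condition>r<replicate>, e.g. "c0r1".
LABEL = re.compile(r"^c(\d+)r(\d+)$")

def replicates_per_condition(header_line):
    """Map each condition number to the list of its replicate numbers."""
    columns = header_line.split()
    groups = defaultdict(list)
    for name in columns[1:]:          # columns[0] is the "id" column
        match = LABEL.match(name)
        if match is None:
            raise ValueError(f"unexpected column label: {name!r}")
        condition, replicate = int(match.group(1)), int(match.group(2))
        groups[condition].append(replicate)
    return dict(groups)

def check_design(groups):
    """Raise ValueError unless there are >= 2 conditions with >= 2 replicates each."""
    if len(groups) < 2:
        raise ValueError("need at least two conditions")
    for condition, reps in sorted(groups.items()):
        if len(reps) < 2:
            raise ValueError(f"condition {condition} has fewer than two replicates")

header = "id c0r1 c0r2 c0r3 c1r1 c1r2 c1r3 c2r1 c2r2"
groups = replicates_per_condition(header)
check_design(groups)
print(groups)
```

For the header shown, this groups the columns into three conditions with three, three, and two replicates, which satisfies the minimum requirements.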
You follow the menu items from left to right.
You start by choosing File > new analysis type from the menu.
You will be asked whether it is 1-channel or 2-channel data. Note that Affymetrix data is considered 1-channel.
If you choose 2-channel you will be asked whether it is a reference design or a direct comparison design.
For both 1-channel and 2-channel data, the algorithm needs to know whether you have already log-transformed the data.
In the case of 1-channel and 2-channel direct comparison designs, you will be asked if the data are paired.
After answering these questions the analysis type has been fully defined. You next have to input the data.
You should see a file browser that will allow you to find the data file on your disk.
This allows you to enter a file of gene descriptions which will be included in the results file.
Choose menu item Data > Open id2url file
This allows you to enter a file of URLs, to be included in the results file as links.
You may choose to set some of the configuration options with the menu choice Options > Configuration (generally you will leave them at their defaults).
To execute the algorithm, choose menu item Run PaGE, and then choose one of the two statistics (generally you will want to start with the t-statistic).
You will now be asked to give the level confidence and the reference group.
This is the most important parameter to adjust.
If one has very few genes differentially expressed, or if the data are very noisy, then a relatively low level confidence might be necessary to find them. A confidence is very different from a p-value, so that even a confidence as low as .5 might be useful.
Conversely, if there is a large number of differentially expressed genes, on the order of thousands, then the user will generally want to set the level confidence higher to see just the most confident predictions. In this case one might wish to raise the level confidence as high as .95, or even .99 in extreme cases.
If appropriate, you will be asked whether to run the algorithm on the logged or the unlogged data.