PaGE 5.1 Documentation

This document gives basic user documentation of the Perl version of PaGE 5.1.

See the technical manual for the complete details of the algorithm and an expanded discussion of all issues. This document is meant to be a relatively quick introduction to the software. See also the examples for a walk-through of usage on actual data.

Introduction

PaGE is a tool for analyzing microarray data that is used to:

The patterns are derived from comparisons to a reference group, so if there are n groups, the patterns have length n-1. For example, suppose there are four groups (c0 through c3, with c0 serving as the reference) with 3, 2, 3, and 4 replicates, respectively. The data might look like this:

id   c0r1    c0r2    c0r3    c1r1    c1r2    c2r1    c2r2    c2r3    c3r1    c3r2    c3r3    c3r4
G1   0.907   0.964   1.075   2.410   1.897   2.556   3.238   2.923   0.993   0.972   0.983   1.071
G2   1.136   1.114   1.069   7.311   6.197   1.114   0.874   1.192   3.310   3.399   3.860   4.077
G3   10.288  10.008  12.023  25.507  24.694  4.459   6.234   4.234   11.332  12.323  8.243   14.230
G4   5.234   6.493   2.330   4.570   8.498   4.349   8.323   6.384   21.937  25.788  18.847  14.324
...
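The column names in this example encode the condition and the replicate (c0r1 is condition 0, replicate 1). Assuming that naming convention, and assuming condition 0 is the reference group, the following Python sketch recovers the number of replicates per condition and the pattern length n-1 from the header line. The function name `replicate_counts` is hypothetical; it is not part of PaGE.

```python
import re
from collections import Counter

# Assumption: column headers look like c<condition>r<replicate>, as in the
# example table, and condition 0 is the reference group.
HEADER_RE = re.compile(r"c(\d+)r(\d+)")


def replicate_counts(header_line):
    """Return {condition: number of replicate columns} from a whitespace-separated header."""
    fields = header_line.split()
    counts = Counter()
    for name in fields[1:]:  # skip the leading "id" column
        match = HEADER_RE.fullmatch(name)
        if match is None:
            raise ValueError(f"unexpected column name: {name}")
        counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))


header = "id c0r1 c0r2 c0r3 c1r1 c1r2 c2r1 c2r2 c2r3 c3r1 c3r2 c3r3 c3r4"
counts = replicate_counts(header)
n_groups = len(counts)
pattern_length = n_groups - 1  # one pattern entry per non-reference group

print(counts)          # {0: 3, 1: 2, 2: 3, 3: 4}
print(pattern_length)  # 3
```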

The patterns attached to these genes might look like this:
id   c1      c2      c3
G1   2       3       0
G2   3       0       1
G3   2       -1      0
G4   0       0       5

Positive integers represent upregulation and negative integers represent downregulation. Higher positive numbers represent greater differential expression; however, they are not meant to represent actual fold-change. A zero means there was insufficient evidence to make a differential expression call at the desired confidence level. The difference between levels 1, 2, 3, ... of upregulation and -1, -2, -3, ... of downregulation will be explained below.
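The sign convention can be made concrete with a small Python sketch that reads the example patterns back as up/down/no-call statements. The helper `describe_pattern` is hypothetical and only restates the sign rules above; it does not reproduce how PaGE assigns the levels themselves.

```python
def describe_pattern(pattern, conditions=("c1", "c2", "c3")):
    """Translate a pattern into readable calls. Levels are not fold-changes."""
    if len(pattern) != len(conditions):
        raise ValueError("pattern length must equal the number of non-reference groups")
    calls = []
    for cond, level in zip(conditions, pattern):
        if level > 0:
            calls.append(f"{cond}: up, level {level}")
        elif level < 0:
            calls.append(f"{cond}: down, level {level}")
        else:
            calls.append(f"{cond}: no call")  # insufficient evidence at this confidence
    return calls


# Patterns from the example table above.
patterns = {"G1": (2, 3, 0), "G2": (3, 0, 1), "G3": (2, -1, 0), "G4": (0, 0, 5)}
for gene, pattern in patterns.items():
    print(gene, "; ".join(describe_pattern(pattern)))
```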

The statistical confidence measures used in PaGE are False Discovery Rate (FDR) type measures. Therefore, what is controlled is the expected percentage of false predictions among all predictions. The FDR approach is widely accepted as being the most appropriate for gene expression analysis.

Note that an FDR is fundamentally different from a p-value. It is not unreasonable to use an FDR of .5, while a p-value of .5 would be unreasonable. For example, if there are 10,000 elements on an array and only 100 are differentially expressed, then it will be virtually impossible to find them by PCR verification: genes picked at random would verify only about 1 time in 100. However, if we can find a set of 200 genes with an FDR of .5, then roughly one out of every two genes in this set will verify.
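The arithmetic behind this example, as a short Python sketch. `expected_true_positives` is an illustrative helper, not a PaGE function; it simply uses the fact that at an FDR of f, a fraction of roughly 1 - f of the selected genes is expected to be truly differentially expressed.

```python
def expected_true_positives(n_selected, fdr):
    """Expected number of true discoveries in a list of n_selected genes at the given FDR."""
    return n_selected * (1 - fdr)


# Numbers from the example above.
n_elements = 10_000
n_diff = 100

# Verifying genes chosen at random: about 1 in 100 would verify.
random_hit_rate = n_diff / n_elements
print(random_hit_rate)  # 0.01

# A 200-gene list at FDR 0.5: about half of it, i.e. 100 genes, should verify.
print(expected_true_positives(200, 0.5))  # 100.0
```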

PaGE works with the "confidence" rather than the FDR. The confidence equals 1 - FDR. A confidence of .99, or even .95, is often higher than necessary, and many genes will be missed. Usually you will want to choose the confidence based on the size of the result set. Therefore PaGE allows you to set this parameter after the confidences have been calculated, and to help you choose, it displays a summary of the number of genes found over a range of confidences.
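As an illustration of the kind of summary described above, the Python sketch below counts how many genes meet each confidence cutoff. It assumes each gene has already been assigned a single confidence value; the gene confidences and the list of cutoffs are made up, and the printed format is not PaGE's own.

```python
def summarize(confidences, levels=(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99)):
    """Count how many genes reach each confidence level (confidence = 1 - FDR)."""
    return {level: sum(1 for c in confidences.values() if c >= level) for level in levels}


# Hypothetical per-gene confidences, for illustration only.
confidences = {"G1": 0.97, "G2": 0.91, "G3": 0.74, "G4": 0.99, "G5": 0.55, "G6": 0.30}

for level, n_genes in summarize(confidences).items():
    print(f"confidence >= {level:.2f} (FDR <= {1 - level:.2f}): {n_genes} genes")
```

Lowering the cutoff can only add genes, so the counts never increase as the confidence level rises.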

Unfortunately, microarray analysis is not a push-button exercise; every data set is unique and requires special considerations. Differential expression analysis is best performed interactively, with an algorithm flexible enough to allow looking at the data from different angles. See the technical manual for a more detailed discussion of this.

Microarray data come in many formats, and there are many ways to design a microarray experiment when looking for differential expression. Therefore PaGE has numerous options for telling it exactly what kind of data and study design you are entering.

Input Data

Study Design

PaGE takes as input microarray data from two or more conditions. There must be at least two replicates per condition. Possible designs are:
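The minimum requirements stated above (two or more conditions, at least two replicates per condition) can be checked before running PaGE. This Python sketch assumes you already know the replicate count for each condition; `check_design` is a hypothetical helper, not part of PaGE.

```python
def check_design(replicates):
    """Return a list of problems with {condition: replicate count}; empty if the design is usable."""
    problems = []
    if len(replicates) < 2:
        problems.append("need at least two conditions")
    for cond, n in replicates.items():
        if n < 2:
            problems.append(f"condition {cond} has only {n} replicate(s)")
    return problems


print(check_design({0: 3, 1: 2, 2: 3, 3: 4}))  # []
print(check_design({0: 3, 1: 1}))              # ['condition 1 has only 1 replicate(s)']
```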

Data and File Format

Running PaGE

PaGE is invoked on the command line with the command
> perl PaGE_5.1.6.pl

It is not necessary to specify any options on the command line; they will be requested while the program is running. However, if you want to include descriptions and links in the output, files mapping IDs to descriptions and IDs to URLs must be specified on the command line.

Including gene information and links in the output

PaGE options and parameters

When you run PaGE, you will be asked for all of the necessary options and parameters. If you are running PaGE repeatedly with many of the same options or parameters then you may want to specify them on the command line to avoid having to answer the same questions over and over.

Use the --help option to get a brief summary of the commands.

> perl PaGE_5.1.6.pl --help

NOTE: The most important parameter to adjust is the level confidence. Every dataset is different, so it is difficult to guess ahead of time what the best level confidence will be. Therefore PaGE allows you to set this parameter after the confidences have been calculated, and it displays a summary of the number of genes found over a range of confidences.

The confidence is very different from a p-value, so even a confidence as low as .5 might be useful if there are very few genes differentially expressed, or if the data are very noisy.

Conversely, if there is a large number of differentially expressed genes, then the user may want to set the level confidence higher to see just the most confident predictions. In such a case one might raise the level confidence as high as .95, or even .99.
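One simple way to act on this advice, sketched in Python: given a summary mapping each confidence level to the number of genes found, pick the highest level that still yields a list of the desired size. The summary numbers are invented for illustration and `pick_level` is a hypothetical helper; in PaGE itself you enter the level confidence when asked.

```python
def pick_level(summary, min_genes):
    """Return the highest confidence level whose gene count is at least min_genes, or None."""
    eligible = [level for level, n in summary.items() if n >= min_genes]
    return max(eligible) if eligible else None


# Hypothetical summary: confidence level -> number of genes found.
summary = {0.5: 420, 0.6: 310, 0.7: 220, 0.8: 150, 0.9: 80, 0.95: 35, 0.99: 6}

print(pick_level(summary, 100))  # 0.8
print(pick_level(summary, 500))  # None
```

Choosing the highest qualifying level keeps the list as reliable as possible while still meeting the target size.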

PaGE commands break into five categories.

Examples