PaGE_5.1 Documentation - the --level_confidence command
This is the most important parameter to use when running PaGE.
The level_confidence is a number between zero and one and gives the confidence at which the levels are generated. Positive levels in PaGE output indicate up-regulation from the reference condition (condition 0), while negative levels indicate down-regulation from the reference condition.
A level_confidence of x means that we expect 100x% of the set of all genes with positive levels to be truly up-regulated, and 100(1-x)% to be false positives. (Similarly for the negative levels and down-regulation.)
Confidence x means that roughly 100x% of the predictions should be true. For those familiar with the False Discovery Rate (FDR), the confidence is x = 1 - FDR.
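The relationship between confidence and FDR can be illustrated with a small sketch. This is not part of PaGE; the function name expected_counts and the set size of 200 genes are hypothetical, chosen only to show the arithmetic.

```python
def expected_counts(confidence, n_predicted):
    """Return (FDR, expected true positives, expected false positives)
    for a prediction set of n_predicted genes at the given confidence."""
    # Same constraint as --level_confidence: strictly between 0 and 1.
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    fdr = 1.0 - confidence                  # confidence x = 1 - FDR
    expected_true = confidence * n_predicted
    expected_false = fdr * n_predicted
    return fdr, expected_true, expected_false


# Hypothetical example: 200 genes with positive levels at confidence 0.8.
fdr, n_true, n_false = expected_counts(0.8, 200)
print(f"FDR = {fdr:.2f}, expected true = {n_true:.0f}, expected false = {n_false:.0f}")
```

So at a level_confidence of .8, roughly four out of five genes with positive levels are expected to be truly up-regulated.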
This command takes one argument of type float, which must be strictly between 0 and 1.
- User Strategy. The lower you set the level_confidence, the more genes will be predicted and the more levels will be generated. The trick is to find a level_confidence that gives a "reasonable" number of genes. PaGE allows you to set this parameter after the confidences have been calculated and the program has displayed a summary of the number of genes found over a range of confidences.
In short, this parameter works like the volume on a radio in reverse: the lower you set it, the more you hear, while at a high setting you only hear the loudest signals.
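The kind of summary described above can be sketched as follows. This is only an illustration, not PaGE's own code: the per-gene confidences are made up, and the sketch assumes a gene is counted at a given level_confidence when its individual confidence meets or exceeds that value.

```python
def count_genes_by_confidence(gene_confidences, thresholds):
    """For each threshold, count the genes whose confidence meets or exceeds it."""
    return {t: sum(1 for c in gene_confidences if c >= t) for t in thresholds}


# Hypothetical per-gene confidences, for illustration only.
confidences = [0.95, 0.91, 0.84, 0.77, 0.62, 0.55, 0.51]
summary = count_genes_by_confidence(confidences, [0.5, 0.6, 0.7, 0.8, 0.9])
for threshold, n_genes in summary.items():
    print(f"level_confidence {threshold:.1f}: {n_genes} genes")
```

Reading such a table from the bottom up shows how quickly the gene list grows as the confidence is relaxed, which is the information needed to pick a "reasonable" setting.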
- The Meaning of the Levels: Levels are generated as follows. If the statistic used is the t-statistic, a value of C is found so that the set of all genes with t-statistic greater than C is expected to have 100x% true positives and 100(1-x)% false positives. Anything with statistic between 0 and C is given level 0, anything with statistic between C and 2C is given level 1, anything with statistic between 2C and 3C is given level 2, and so on. Negative levels are generated similarly.
Levels are generated to make the results easier to digest; however, they only have a rigorous statistical confidence when interpreted as the distinction between level 0 and levels greater than zero, or likewise between level 0 and levels less than zero. Indeed, the set of genes with level "2" will generally have higher confidence than the set with level "1", the set with level "3" higher still, and so on. It is only the set of all genes with positive levels, taken together, that constitutes a set of confidence x. The same holds for down-regulation.
The above discussion was in terms of the t-statistic. If instead the statistic is the ratio of the means, then a similar multiplicative version of this argument gives the levels. In other words, a C is determined so that the set of genes with ratio of means greater than C is expected to have 100x% true positives, and genes whose ratio of means is between 1 and C are given level 0, those between C and C^2 are given level 1, those between C^2 and C^3 are given level 2, and so on. Negative levels are generated similarly.
This is how PaGE generates patterns across conditions.
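The binning of statistics into levels can be sketched as below. This is a minimal illustration under stated assumptions, not PaGE's implementation: the function names are hypothetical, the same cutoff C is assumed for up- and down-regulation, the multiplicative bins are assumed to mirror the additive ones (ratios between 1 and C get level 0), and a statistic exactly on a boundary is placed in the higher bin, which the text above does not specify.

```python
import math


def additive_level(stat, cutoff):
    """Level for a t-statistic: bins of width `cutoff` on either side of 0."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    magnitude = math.floor(abs(stat) / cutoff)   # 0 for |stat| < C, 1 for C..2C, ...
    # Assumption: down-regulation uses the same cutoff, mirrored around 0.
    return magnitude if stat >= 0 else -magnitude


def multiplicative_level(ratio, cutoff):
    """Level for a ratio of means: bins 1..C, C..C^2, C^2..C^3, ..."""
    if ratio <= 0 or cutoff <= 1:
        raise ValueError("ratio must be positive and cutoff greater than 1")
    # Taking logs turns the multiplicative bins into additive ones.
    return additive_level(math.log(ratio), math.log(cutoff))


print(additive_level(3.5, 1.5))        # between 2C and 3C
print(additive_level(-0.4, 1.5))       # between -C and C
print(multiplicative_level(3.0, 1.5))  # between C^2 = 2.25 and C^3 = 3.375
print(multiplicative_level(0.5, 1.5))  # between 1/C^2 and 1/C
```

Working on the log scale for ratios keeps a single binning rule for both statistics, which matches the "multiplicative version of this argument" described above.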
- Gene Confidences: As seen above, for each cutoff C there is a resultant set of predictions and a confidence for that set. Confidences are also assigned to individual genes as follows. If an up-regulated gene has statistic S, then it is given the maximum confidence over all cutoffs C<S, that is, over all cutoffs whose prediction set includes the gene. Down-regulated genes are handled similarly. These are the confidences given in the PaGE report.
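A per-gene confidence of this kind can be sketched as follows. The sketch assumes that the relevant cutoffs are the ones the gene actually passes (cutoffs below its statistic S); the table of cutoffs and confidences is made up, and the function name is hypothetical. See the technical manual for how PaGE actually computes these values.

```python
def gene_confidence(stat, cutoff_confidences):
    """Maximum confidence over all cutoffs strictly below the gene's statistic.

    cutoff_confidences is a list of (cutoff, confidence) pairs.
    Returns None when the gene passes no cutoff at all.
    """
    passed = [conf for cutoff, conf in cutoff_confidences if cutoff < stat]
    return max(passed) if passed else None


# Hypothetical (cutoff, confidence) table for up-regulation, illustration only.
table = [(1.0, 0.60), (1.5, 0.72), (2.0, 0.70), (2.5, 0.85), (3.0, 0.93)]

print(gene_confidence(2.2, table))  # passes cutoffs 1.0, 1.5 and 2.0
print(gene_confidence(3.4, table))  # passes every cutoff
print(gene_confidence(0.5, table))  # passes no cutoff
```

Taking the maximum means a gene's confidence never decreases as its statistic grows, even when the confidence of individual cutoffs dips (as at cutoff 2.0 in the made-up table).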
Note: If you have more than two conditions and you use --level_confidence, then that confidence will be applied to all positions in the pattern. To apply a separate level_confidence to each position in the patterns, use --level_confidence_list.
See the technical manual for more detailed information on how the confidence is determined.
Example:
> perl PaGE_5.1.6.pl --level_confidence .8