PaGE 5.1 Command Line Options Summary
--help
If set, will show this help page.
--usage
Synonym for --help.
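Example. The following is a minimal sketch, not an official invocation, of
how a command line might be assembled from options listed in this summary.
The script name PaGE_5.1.pl, running it through perl, the file name data.txt,
and the "--option value" spelling are all assumptions made for illustration;
check them against your installation and the README.

```python
import shlex

# Assumption: hypothetical script name; use the name of the PaGE program
# file in your own installation.
PAGE_SCRIPT = "PaGE_5.1.pl"


def build_page_command(infile, level_confidence, min_presence):
    """Assemble an argument list from options listed in this summary."""
    return [
        "perl", PAGE_SCRIPT,          # assumption: program is run via perl
        "--infile", infile,
        "--num_channels", "1",        # assumption: value is written as 1 or 2
        "--unpaired",
        "--data_not_logged",
        "--tstat",
        "--use_logged_data",
        "--level_confidence", str(level_confidence),
        "--min_presence", str(min_presence),
    ]


command = build_page_command("data.txt", 0.8, 3)
# Print the command as a single shell-quoted line for inspection.
print(shlex.join(command))
```

Building the arguments as a list keeps each option and its value separate, so
file names containing spaces do not need manual quoting.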
------------------
| File locations |
------------------
--infile
Name of the input file containing the table of data.
This file must conform to the format in the README file.
--outfile
Optional. Name of the output file. If not specified, the output file name
will be derived from the infile name.
--id2info
Optional. Name of the file containing a mapping of gene IDs to names
or descriptions.
--id2url
Optional. Name of the file containing a mapping of gene IDs to URLs.
--id_filter_file
Optional. If you want to run the algorithm on only a subset of the genes in
your data file, put the IDs for those genes in a file, one per line, and
specify that file with this option.
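Example. A minimal sketch of producing such a filter file, one ID per line.
The function name, the gene IDs, and the file name are hypothetical; skipping
blanks and duplicates is a convenience of this sketch, not a documented
requirement.

```python
from pathlib import Path


def write_id_filter_file(gene_ids, path):
    """Write one gene ID per line, skipping blank and duplicate entries."""
    seen = set()
    kept = []
    for gene_id in gene_ids:
        gene_id = gene_id.strip()
        if gene_id and gene_id not in seen:
            seen.add(gene_id)
            kept.append(gene_id)
    # One ID per line, each line terminated by a newline.
    Path(path).write_text("".join(gene_id + "\n" for gene_id in kept))
    return kept
```

The resulting file could then be passed with --id_filter_file.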
------------------
| Output Options |
------------------
--output_gene_confidence_list
Optional. Set this to output a tab delimited file that maps every gene to
its confidence of differential expression.
--output_text
Optional. Set this to output the results also in text format.
--note
Optional. A string that will be included at the top of the output file.
--aux_page_size
Optional. A whole number greater than zero. This specifies the minimum
number of tags there can be in one pattern before the results for that
pattern are written to an auxiliary page (this keeps the main results page
from getting too large). The default is 500.
---------------------------------------------
| Study Design and Nature of the Input Data |
---------------------------------------------
--num_channels
Is your data one or two channels? (note: Affymetrix is considered one
channel).
--design
For two channel data, either set this to "R" for "reference" design,
or "D" for "direct comparisons" (see the documentation for more
information on this setting).
--data_is_logged
Use this option if your data has already been log transformed.
--data_not_logged
Use this option if your data has not been log transformed.
--paired
The data is paired.
--unpaired
The data is not paired.
--missing_value
If you have missing values designated by a string (such as "NA"), specify
that string with this option. Quotes around the string are optional, as
long as the string contains no spaces.
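Example. A minimal sketch of how a missing-value string can be handled when
reading one row of data values. This illustrates the idea only and is not
PaGE's own parsing code; the function name and the use of None as the
missing marker are assumptions.

```python
def parse_values(fields, missing_value="NA"):
    """Convert text fields to floats, mapping the missing string to None."""
    parsed = []
    for field in fields:
        field = field.strip()
        if field == missing_value:
            parsed.append(None)   # keep the position, mark it as missing
        else:
            parsed.append(float(field))
    return parsed
```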
-------------------------------------
| Statistics and Parameter Settings |
-------------------------------------
--level_confidence
A number between 0 and 1. Generate the levels with this confidence.
See the README file for more information on this parameter. This can
be set separately for each group using --level_confidence_list (see
below).
NOTE: This parameter can be set at the end of the run, after the program has
displayed a summary breakdown of how many genes are found with what
confidence. To do this, either set the command line option to "L" (for
"later"), or do not specify this command line option and enter "L" when
the program prompts for the level confidence.
--level_confidence_list
Comma-separated list of confidences. If there are more than two
conditions (or more than one direct comparison), each position in the
pattern can have its own confidence specified by this list. E.g. if
there are 4 conditions, the list might be .8,.7,.9 (note that four
conditions give patterns of length 3).
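Example. A minimal sketch of building this list for the multiple-condition
case, where the list has one entry fewer than the number of conditions (as in
the 4-condition example above). The function name is hypothetical, the direct
comparison case is not covered, and treating 0 and 1 themselves as acceptable
is an assumption.

```python
def format_level_confidence_list(confidences, num_conditions):
    """Return a comma-separated list with one confidence per pattern position."""
    expected = num_conditions - 1   # four conditions give patterns of length 3
    if len(confidences) != expected:
        raise ValueError(
            f"expected {expected} confidences, got {len(confidences)}"
        )
    for confidence in confidences:
        if not 0 <= confidence <= 1:
            raise ValueError(f"confidence out of range: {confidence}")
    return ",".join(f"{confidence:g}" for confidence in confidences)
```

For instance, format_level_confidence_list([0.8, 0.7, 0.9], 4) yields a
string suitable as the value of --level_confidence_list.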
--min_presence
A positive integer specifying the minimum number of values a tag must
have in each condition in order not to be discarded. This can be set
separately for each condition using --min_presence_list.
--min_presence_list
Comma-separated list of positive integers, one for each condition,
specifying the minimum number of values a tag must have, for each
condition, in order not to be discarded. E.g. if there are three
conditions, the list might be 4,6,3.
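Example. A minimal sketch of the presence rule described above: a tag is kept
only if every condition has at least its required number of non-missing
values. This is an illustration, not PaGE's own code; the function name and
the use of None for missing values are assumptions.

```python
def passes_min_presence(values_by_condition, min_presence_list):
    """Return True if each condition has enough non-missing values."""
    if len(values_by_condition) != len(min_presence_list):
        raise ValueError("need one minimum per condition")
    for values, minimum in zip(values_by_condition, min_presence_list):
        present = sum(1 for value in values if value is not None)
        if present < minimum:
            return False   # too few values in this condition: discard the tag
    return True
```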
--use_logged_data
Use this option to run the algorithm on the logged data (you can only
use this option if using the t-statistic as the statistic). Logging the
data usually gives better results, but there is no fixed rule, and
different genes can sometimes be picked up either way. It is generally
best, if using the t-statistic, to go with the logged data. You might try
both ways and see if it makes much difference. Both ways give valid
results; what can be affected is the power.
--use_unlogged_data
Use this option to run the algorithm on the unlogged data. (See the
--use_logged_data option above for more information.)
--tstat
Use the t-statistic as the statistic.
--means
Use the ratio of the means of the two groups as the statistic.
--tstat_tuning_parameter
Optional. The value of the t-statistic tuning parameter. This is set to
a default value determined separately for each data set, but can be
adjusted to possibly increase the power of the results. See the
documentation for more information on this parameter.
--shift
Optional. A real number greater than zero. This number will be added to
all intensities (of the unlogged data). See the documentation for more on
why you might use this parameter.
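Example. A minimal sketch of what adding a shift to unlogged intensities
means. It is only an illustration of the arithmetic, not PaGE's own code; the
function name and the use of None for missing values are assumptions.

```python
def apply_shift(intensities, shift):
    """Add a positive shift to every non-missing unlogged intensity."""
    if shift <= 0:
        raise ValueError("shift must be greater than zero")
    return [None if value is None else value + shift for value in intensities]
```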
-----------------
| Configuration |
-----------------
--silent_mode
Optional. Do not output warning messages or progress to screen.
--num_permutations
Optional. The number of permutations to use. The default is to use all
permutations or 200, whichever is smaller. You might want to lower it to
increase the speed, though at a possible loss of power or accuracy.
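Example. A minimal sketch of the default rule stated above (all permutations
or 200, whichever is smaller). How PaGE counts "all" permutations is not
described in this summary; the helper below counts the distinct ways to
relabel two unpaired groups, which is an assumption made for illustration.
Both function names are hypothetical.

```python
import math


def count_unpaired_relabelings(size_group_1, size_group_2):
    """Assumed count: ways to choose which samples are labeled group 1."""
    return math.comb(size_group_1 + size_group_2, size_group_1)


def default_num_permutations(total_possible, cap=200):
    """Use all permutations or the cap, whichever is smaller."""
    return min(total_possible, cap)
```

Under this assumption, two groups of 4 samples allow 70 relabelings, so all
70 would be used, while two groups of 6 allow 924, so the cap of 200 applies.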
--num_bins
Optional. The number of bins to use in granularizing the statistic over
its range. This is set to a default of 1000 and you probably shouldn't
need to change it.
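Example. A minimal sketch of what dividing the range of a statistic into
bins can look like. Equal-width bins and the clamping of the top value are
assumptions of this sketch; this summary does not describe how PaGE lays out
its bins, and the function name is hypothetical.

```python
def bin_index(value, lowest, highest, num_bins=1000):
    """Map a statistic value to an equal-width bin index in [0, num_bins - 1]."""
    if highest == lowest:
        return 0   # degenerate range: everything falls in the first bin
    fraction = (value - lowest) / (highest - lowest)
    index = int(fraction * num_bins)
    # Clamp so the maximum value lands in the last bin, not one past it.
    return max(0, min(index, num_bins - 1))
```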