This document describes the general usage of STAC Station v1.2, an application for
performing STAC analysis and for viewing the results of STAC analysis.
For a more detailed description see also the STAC Station tutorial.
STAC analysis takes aberration data for each of a number of
different samples and performs a statistical test to identify
locations that are concordantly aberrant across individuals more often than
would be expected by chance. The statistical theory and algorithm are
described in the STAC publication.
The algorithm described in that paper is implemented exactly in version 1.1.
In version 1.2 the algorithm has been modified slightly to run much faster without affecting
the results.
(The details of this modification will be given in a technical document
by Aug 20th; please check back.)
Diskin SJ, Eck T, Greshock J, Mosse YP, Naylor T, Stoeckert CJ Jr., Weber BL, Maris JM and Grant GR. (2006)
STAC: A method for testing the significance of DNA copy-number aberrations across multiple array-CGH experiments.
Genome Research (in press).
STAC Input
STAC analyzes data in the form of binary aberration calls. Gains and losses (or any other type of aberration) are therefore analyzed separately, and a separate input file must be prepared for each.
IMPORTANT: For results local to a chromosome arm, STAC analysis should be run on only a single chromosome arm at a time, omitting centromeres and other regions of poor coverage.
Currently STAC Station takes input data in two possible formats.
- Location data:
- Span data:
- The first line of the file must give the span of the entire chromosome arm, or the span of whatever region is to be analyzed.
- Each subsequent line gives the spans for one experiment, separated by semicolons.
- Example Span Formatted Input
NOTE: The full set of input files used in Diskin et al. (2006) is provided with the STAC Station 1.2 distribution in the subdirectory called "Examples".
See the notes on formatting and pre-processing for more detail.
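As an illustration of how span data relates to binary aberration calls, the following is a minimal sketch, not STAC Station's own code. It assumes spans are half-open (start, end) pairs in base pairs and that a fixed-width location counts as aberrant if any span overlaps it; both are assumptions made for this example, and the function name spans_to_calls is hypothetical.

```python
def spans_to_calls(region, spans, width):
    """Convert aberrant spans into binary calls over fixed-width locations.

    Assumptions (not taken from STAC Station): spans are half-open
    (start, end) pairs, and any overlap marks a location as aberrant.
    """
    region_start, region_end = region
    # Ceiling division so a partial final location is still counted.
    n_locations = -(-(region_end - region_start) // width)
    calls = [0] * n_locations
    for start, end in spans:
        # Clip each span to the analyzed region.
        start = max(start, region_start)
        end = min(end, region_end)
        if start >= end:
            continue  # span lies entirely outside the region
        first = (start - region_start) // width
        last = (end - 1 - region_start) // width
        for i in range(first, last + 1):
            calls[i] = 1
    return calls


# A hypothetical 10-Mb region analyzed at 1-Mb width.
region = (0, 10_000_000)
sample_spans = [(1_500_000, 3_200_000), (7_000_000, 8_000_000)]
calls = spans_to_calls(region, sample_spans, width=1_000_000)
print(calls)
```

Each experiment would yield one such row of 0's and 1's, and gains and losses would be converted separately.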
Usage: STAC Analysis
You can open either a single file or a directory of files to analyze.
Because analysis can take a considerable amount of time, the results for a file that has
already been analyzed can be re-opened for review without re-running the analysis.
If the STAC input file is called "stacinput.txt", the results are
saved as "stacinput.txt.stac". This is why the File > Open menu has two options:
- "Open to Run", which takes a STAC input file to be analyzed, as described above.
- "Open to View", which takes a STAC analysis results file and opens it for viewing only.
STAC Station can open one file at a time to analyze, or a directory of files. If you open a single file, the ANALYZE
button becomes enabled; press ANALYZE to perform STAC analysis on the current file. If you open a directory, the analysis
starts automatically for each appropriately formatted STAC input file in that directory.
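Because results are saved next to the input with ".stac" appended, it is easy to check outside STAC Station which files in a directory still need to be analyzed. The following is a small sketch under that naming convention only. The helper names are hypothetical, and treating every file that does not end in ".stac" as a STAC input file is an assumption of this example.

```python
from pathlib import Path


def results_path(input_path):
    """Return the results file name: the input name with '.stac' appended."""
    p = Path(input_path)
    return p.with_name(p.name + ".stac")


def inputs_without_results(directory):
    """List files in a directory that have no matching '.stac' results file.

    Assumption: any file that does not end in '.stac' is an input file.
    """
    d = Path(directory)
    return sorted(
        p for p in d.iterdir()
        if p.is_file() and p.suffix != ".stac" and not results_path(p).exists()
    )


print(results_path("stacinput.txt").name)
```

Files returned by inputs_without_results would be candidates for "Open to Run"; the others already have results that can be opened with "Open to View".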
STAC Station can also view multiple results files. The number of files you can open is limited
only by memory; if you run out of memory, change the "256" in the
command used to launch STAC Station to a higher value.
NOTE: As you open files for viewing or analysis, they are added to the viewer
list. You may scroll through the files to view the data. Once the analysis for a file is complete, you can also
browse through the results files using the forward and back arrow buttons. The current file number and the total number
of files are shown in the status bar at the bottom of the window, along with the state of key results display options.
Analysis Options
There are two options you can set regarding
how the analysis is performed. These are "number of permutations"
and "resolution". This second option is only meaningful if your data is "span formatted" (see above).
- Number of permutations: STAC is based on a permutation test. This option sets the number of
permutations to use. The default is 500. The run time varies
with the data set, so in some cases 500
permutations will take seconds and in other cases considerably
longer. If your data set runs very quickly, you should
raise the number of permutations to get p-values accurate to more decimal places.
Going from 500 to 1000 can make a small difference and give less variable results. Going higher still,
for example to 10,000, should not make much further difference. There is probably no
point in going over 10,000.
- Resolution: STAC analyzes data in fixed-width genomic locations. This option sets the width of those locations.
For example, with a 1-Mb array you might set the resolution to 1 Mb.
Note: The "search parameter" option present in STAC 1.1 has been deprecated due to the
optimized search strategy available in STAC 1.2.
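To see why more permutations give p-values accurate to more decimal places, consider a generic permutation p-value, sketched below. This is a textbook estimator used for illustration; this document does not state the exact estimator STAC uses, and the function names are hypothetical.

```python
def permutation_p_value(observed, null_statistics):
    """Fraction of permutation statistics at least as large as the observed one.

    A generic estimator for illustration; not necessarily STAC's own.
    """
    hits = sum(1 for s in null_statistics if s >= observed)
    return hits / len(null_statistics)


def p_value_step(n_permutations):
    """Smallest increment by which such a p-value can change."""
    return 1.0 / n_permutations


for n in (500, 1000, 10_000):
    print(n, p_value_step(n))
```

Under this estimator, a p-value from 500 permutations can only be a multiple of 1/500, while 10,000 permutations resolve it to multiples of 1/10,000.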
Display Options
There are many display options.
- Confidences. STAC uses two statistics, the "frequency" and the
"footprint". In most cases the footprint will be more
powerful at all locations. But sometimes the frequency is better, so that option is
available. STAC Station does not display p-values, but rather displays "confidences"
which are one minus the p-values.
The footprint and frequency confidences can be displayed separately or
together. Furthermore, each can be displayed with bars or with a line.
To turn them on or off, use the four confidence buttons on the task bar.
They can also be turned on using the Options > Display menu.
- Frequencies. You can turn on the frequency line from the
"Options" pull-down menu. Note the difference between the
frequency line and the frequency confidence line. The frequency line
simply gives the frequency of aberration at each location. The
frequency confidence line gives the confidences of those
frequencies. A row of data that contains no aberrations is
omitted from the view, but you can still display the frequencies based
on all the data using the "global frequency line" option. If you
use this option, make sure the input file also includes the rows
that consist entirely of 0's.
- Colors. The colors of the frequency line, the global
frequency line, the frequency confidence and the global frequency
confidence can be set using the Options > Display > Colors menu.
- Highlighting Positions. Using the Options > Display > Enter Positions to Highlight
menu item, you can select a position for
which the intervals that contribute to the confidence will be
highlighted. Enter the position using the number of the position given
in parentheses.
- Font Size. You can increase the font size of the experiment IDs and of the
positions independently using the buttons on the task bar. You can also fit
the view to the screen.
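The relationship between the frequency line, the global frequency line and the confidences can be sketched as follows. This is an illustration only. It assumes that frequency means the fraction of rows that are aberrant at a location and that the ordinary frequency line is computed from the displayed rows; the function names and the data are made up for the example.

```python
def location_frequencies(rows):
    """Fraction of rows that are aberrant (1) at each location."""
    n_rows = len(rows)
    return [sum(column) / n_rows for column in zip(*rows)]


def displayed_rows(rows):
    """Drop rows with no aberrations, as the viewer omits them."""
    return [row for row in rows if any(row)]


def confidence(p_value):
    """STAC Station displays confidences, which are one minus the p-values."""
    return 1.0 - p_value


# Four hypothetical experiments over four locations; the third row is all 0's.
rows = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
frequency = location_frequencies(displayed_rows(rows))  # based on 3 rows
global_frequency = location_frequencies(rows)           # based on all 4 rows
print(frequency)
print(global_frequency)
```

The two lines differ only because the all-zero row enlarges the denominator, which is why those rows must be present in the input file for the global frequency line to be meaningful.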
Output
These options are on the File > Save menu.
- Save to jpeg - you can save the current file being viewed to jpeg, or all open files to jpeg.
- Reports - you can export the analyses as report files which give the frequency and footprint p-values for all locations in tab delimited text format.
- Note: STAC analyses are automatically saved to disk so that they can be viewed again later without rerunning
the analysis. If you opened a STAC input file and ran STAC analysis, the results file is saved under the same name as the
STAC input file with ".stac" appended to the end.
General Operating Characteristics
There are several factors that can influence the sensitivity of STAC and the time required for
execution. We summarize each of these here.
- Sample size: In general, increasing sample size will increase the power of STAC to detect significant
concordant aberrations at a cost in run time (dependent upon the number of intervals per sample).
- Location width: In general, decreasing the fixed-width location size (see the notes
on formatting and pre-processing) will allow for finer-resolution mapping
of significant concordant aberrations. This should not have a significant impact on run time. However, we
generally recommend setting the location width approximately equal to the resolution
of the array.
- Number of probes: Increasing the number of probes in
a region will allow for finer-resolution mapping of concordant aberrations and
this should be reflected in the use of a smaller fixed-width location size used
in the input. Increasing the number of probes alone will not increase the run time of STAC.
However, if an inappropriate method for calling aberrations in
individual samples is used (e.g. using thresholds for a SNP array), then you may
see a significant increase in run time due to fragmented intervals in the input.
- Number of intervals: Data sets with fewer intervals of aberration will run faster than those
with many intervals.
- Number of permutations: Up to a point, increasing the number of permutations
will decrease the variability seen in the results. We have seen differences
between 100 and 1000 permutations, far less between 1000 and 10,000
permutations, and only negligible differences beyond 10,000 permutations.
Increasing the number of permutations will increase the run time; the amount of this increase
depends on the sample size and the number of intervals.
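Since run time depends on the number of intervals, it can be useful to count them before running an analysis. The sketch below counts intervals as maximal runs of consecutive 1's in one row of binary calls. That definition, the function name and the example rows are assumptions made for illustration.

```python
from itertools import groupby


def count_intervals(calls):
    """Count maximal runs of consecutive 1's in a row of binary calls."""
    return sum(1 for value, _run in groupby(calls) if value == 1)


# Two hypothetical rows with a similar extent of aberration.
smooth = [0, 1, 1, 1, 1, 1, 0, 0]
fragmented = [0, 1, 0, 1, 1, 0, 1, 0]
print(count_intervals(smooth), count_intervals(fragmented))
```

A row like the second one, with several short intervals where one longer interval might be expected, is the kind of fragmentation that an inappropriate calling method can produce.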
NOTE: STAC 1.2 is greatly optimized in terms of speed over STAC 1.1. However, if STAC is taking a very long time
to run on your data, please contact us at diskin@email.chop.edu or ggrant@pcbi.upenn.edu.
Identification of problematic data sets can help focus further optimization efforts.