STAC Station User's Guide
STAC
STAC (Significance Testing for Aberrant Copy-Number)
tests for significantly concordant aberration across multiple
samples in array CGH data.
This document describes the use of STAC Station, an application for
performing STAC analysis and for viewing the results of STAC analysis.
See also the STAC tutorial.
STAC analysis requires aberration data for each of a number of
different samples. It performs a statistical test to determine
locations that are concordantly aberrant across individuals more often than
would be expected by chance. The statistical theory and algorithm are
described in the STAC technical manual.
Reference
If you use STAC in your research, please reference the following publication:
Diskin SJ, Eck T, Greshock J, Mosse YP, Naylor T, Stoeckert CJ Jr., Weber BL, Maris JM and Grant GR. (2006)
STAC: A method for testing the significance of DNA copy-number aberrations across multiple array-CGH experiments.
Genome Research (in press).
Installation
First make sure you have Java 1.4 or higher installed on your system. You can get Java from Sun at http://java.sun.com/j2se/1.4.2/download.html
- Mac:
Create a directory somewhere on your system
where STAC Station will live.
Download STACStation_v1.1.tar and save it to the directory you created.
Open a terminal shell and move into the directory.
Unpack with the following command:
> tar -xvf STACStation_v1.1.tar
To execute STAC enter the command:
> java -Xmx256M -jar StacStation_v1.1.jar
The STAC window should open. If you do not have 256 MB of memory
available this might not work, in which case try changing 256 to 128.
It is not advisable to go below 128, as STAC Station might run out of memory
while running.
- Windows:
Download STACStation_v1.1.zip and save it to the directory
where STAC Station is to live. Unzip the archive STACStation_v1.1.zip to the
directory.
To execute STAC double click on the StacStation_v1.1.jar file. Alternatively, STAC Station can be started as follows:
Open a command prompt (this should be under Start > Program Files > Accessories > Command Prompt). In the command prompt window, move into the directory that contains the STAC file. You can do this by typing "cd" followed by a space, then dragging and dropping the directory into the command prompt window and pressing Enter.
Then enter the command:
> java -Xmx256M -jar StacStation_v1.1.jar
STAC Input
NOTE: STAC analyzes data in the form of binary aberration calls. So gains and losses are analyzed separately and separate input files must be prepared for each case.
NOTE: For best results STAC analysis should be run on only a single chromosome arm at a time, omitting centromeres and other regions of poor coverage.
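As an illustration of what "binary aberration calls" means, the following Python sketch turns per-probe log-ratios for one sample into separate 0/1 vectors for gains and for losses. The function name `binary_calls` and the thresholds of +0.3 and -0.3 are assumptions made for this example only; they are not part of STAC, and a simple threshold may not be an appropriate calling method for every array platform.

```python
def binary_calls(log_ratios, gain_threshold=0.3, loss_threshold=-0.3):
    """Split per-probe log-ratios into separate 0/1 gain and loss calls.

    The thresholds are illustrative assumptions, not STAC defaults.
    """
    gains = [1 if x >= gain_threshold else 0 for x in log_ratios]
    losses = [1 if x <= loss_threshold else 0 for x in log_ratios]
    return gains, losses


# One hypothetical sample with six probes along a chromosome arm.
sample = [0.05, 0.45, 0.52, -0.10, -0.60, -0.41]
gains, losses = binary_calls(sample)
print(gains)   # [0, 1, 1, 0, 0, 0]
print(losses)  # [0, 0, 0, 0, 1, 1]
```

The two vectors would then go into two separate input files, one for the gain analysis and one for the loss analysis.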
Currently STAC takes as input data in two possible formats.
- Location data:
- Span data:
- The first line of the file must give the span of the entire chromosome arm, or the span of whatever region is to be analyzed.
- Each consecutive line gives the spans for each experiment, separated by semi-colons.
- Example Span Formatted Input
Examples that you can run through STAC are given with the STAC 1.1 distribution in the subdirectory "Examples".
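The span layout described above (one line for the whole region, then one line per experiment with spans separated by semi-colons) can be checked with a short script before loading a file. The Python sketch below is only an illustration: it assumes each span is written as "start-end" with integer positions, which this guide does not specify, so consult the example files in the distribution for the actual notation. The function names are hypothetical.

```python
def parse_span(text):
    """Parse one span written as 'start-end' (an assumed notation)."""
    start, end = text.strip().split("-")
    return int(start), int(end)


def parse_span_file(lines):
    """Return (region, experiments) from span-formatted lines.

    The first line is the span of the whole region; each later line
    holds one experiment's spans separated by semi-colons.
    """
    rows = [ln.strip() for ln in lines if ln.strip()]
    region = parse_span(rows[0])
    experiments = [
        [parse_span(s) for s in row.split(";") if s.strip()]
        for row in rows[1:]
    ]
    return region, experiments


def check_within_region(region, experiments):
    """True if every span is well ordered and lies inside the region."""
    lo, hi = region
    return all(lo <= a <= b <= hi for spans in experiments for a, b in spans)


# Hypothetical content: a region of 1-1000 and two experiments.
example = ["1-1000", "100-250;400-480", "90-300"]
region, experiments = parse_span_file(example)
print(region, experiments, check_within_region(region, experiments))
```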
See the notes on formatting and pre-processing for more details.
Usage
Open a file, or a directory of files, to analyze; the input must be in one of the two formats given above.
Once a file has been analyzed, the STAC results can be re-opened for review
without re-analyzing, because analysis can take a considerable amount of time.
If the file with the STAC input is called "stacinput.txt" then the results will be
saved as stacinput.txt.stac. This is why the File menu has two
Open options:
- "Open to Run" which takes a STAC input file to be analyzed, as
described above.
- "Open to View" which takes a STAC analysis results file and opens it just to view.
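The naming rule for results files can be used to check whether an input file has already been analyzed. The following Python sketch relies only on the rule stated above (results are saved under the input file name with ".stac" appended); the helper names `results_path` and `menu_option` are hypothetical.

```python
from pathlib import Path


def results_path(input_path):
    """Results file name: the input file name with '.stac' appended."""
    p = Path(input_path)
    return p.with_name(p.name + ".stac")


def menu_option(input_path):
    """Suggest a File menu option depending on whether results exist."""
    if results_path(input_path).exists():
        return "Open to View"   # open the existing .stac results file
    return "Open to Run"        # no results yet: analyze the input file


print(results_path("stacinput.txt"))  # stacinput.txt.stac
```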
Note: STAC Station can open one file at a time to analyze, or a directory of files.
Once a file or directory is open to analyze, the "ANALYZE" button becomes enabled.
STAC can also view multiple files (the number of files you can open is limited
only by memory - if you run out of memory change the "256" in the
command used to execute STAC Station to something higher).
The file opened to analyze does not appear in the results viewer until "ANALYZE" has been
pressed and the analysis is complete.
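If memory runs out when many files are open, the heap size in the launch command can be changed as described above. The Python sketch below only assembles the same java command line shown in the Installation section with a different heap size; the helper name `stac_command` is hypothetical, and the lower limit of 128 reflects the advice given there.

```python
def stac_command(heap_mb=256, jar="StacStation_v1.1.jar"):
    """Build the java command line used to start STAC Station."""
    if heap_mb < 128:
        # The Installation section advises against going below 128.
        raise ValueError("heap size below 128 MB is not advisable")
    return ["java", f"-Xmx{heap_mb}M", "-jar", jar]


print(" ".join(stac_command(512)))
```

The returned list can be passed to `subprocess.run` from the directory that contains the jar file.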
As you perform analyses on files, they get added to the viewer
list. Similarly as you open results files to view, they also get added
to the list. You can then browse through the results files using the
forward and back arrow buttons. The current file number and the total number of files are
shown in the status bar at the bottom of the window.
Analysis Options
There are two options you can set regarding
how the analysis is performed. These are "number of permutations"
and "search parameter". If your file is "span formatted" (see above), then
there is a third option called the "resolution".
- Number of permutations. STAC is based on a permutation test. This option sets the number of
permutations to use. The default is 100. The run time can vary
tremendously depending on the data set, so in some cases 100
permutations will take seconds and in other cases it could take much
longer. If your data set is running very quickly you might want to
raise the number of permutations to get p-values accurate to more decimal places.
Going from 100 to 1000 can make some small difference and give less variable results. Going to even more permutations,
for example to 10,000, shouldn't make much difference. There is probably no
point in going over 10,000.
- Search parameter. STAC uses a heuristic approximation, the accuracy of which depends on a setting called the search parameter.
The default is 3,500. The higher this parameter is set, the more powerful
the method will be. However, the default generally gives good results.
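To see why more permutations give p-values that are accurate to more decimal places, consider a generic permutation test in which the p-value is estimated as a fraction of the permutations. The Python sketch below rests on that general assumption and does not reproduce STAC's actual computation, which is described in the technical manual. It shows the smallest possible step between estimated p-values and the approximate Monte Carlo standard error for a true p-value of 0.05.

```python
import math


def p_value_resolution(n_permutations):
    """Smallest step between p-values estimated from n permutations."""
    return 1.0 / n_permutations


def p_value_std_error(p, n_permutations):
    """Approximate Monte Carlo standard error of an estimated p-value."""
    return math.sqrt(p * (1.0 - p) / n_permutations)


for n in (100, 1000, 10000):
    print(n, p_value_resolution(n), round(p_value_std_error(0.05, n), 4))
```

Because the standard error shrinks with the square root of the number of permutations, each tenfold increase buys progressively less, which is consistent with the small gains reported above beyond 1000 permutations.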
Display Options
There are many display options.
- Confidences. STAC uses two statistics, the "frequency" and the
"footprint". In most cases the footprint will be more
powerful at all locations. But sometimes the frequency is better, so that option is
available. STAC Station does not display p-values, but rather displays "confidences"
which are one minus the p-values.
The footprint and frequency confidences can be displayed separately or
together. Furthermore, each can be displayed with bars or with a line.
To turn them on or off use the four confidence buttons on the task bar.
These can also be turned on using the "Options > Display" menu.
- Frequencies. You can turn on the frequency line using
"Options" in the pull-down menu. Note the difference between the
frequency line and the frequency confidence line. The frequency line
simply gives the frequency of aberration for each location. The
frequency confidence line gives the actual confidences of the
frequencies. If a row of data contains no aberrations then it is
omitted from the view; however, you can still display the frequencies based
on all the data using the "global frequency line" option. If this
option is to be used, make sure the input file also includes the rows
that consist of all 0's.
- Colors. The colors of the frequency line, the global
frequency line, the frequency confidence and the global frequency
confidence can be set using the menu Options > Display > Colors.
- Highlighting Positions. Using the Options > Display > Enter Positions to Highlight
menu item, you can select a position for
which the intervals that contribute to the confidence will be
highlighted. Enter the position using the number of the position given
in parentheses.
- Font Size. You can increase the font sizes of the experiment IDs and of the
positions independently using the buttons on the task bar. You can also fit
the view to the screen.
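The difference between the frequency line and the global frequency line, and the relation between p-values and confidences, can be sketched in a few lines of Python. The code reflects one reading of the description above: the ordinary frequency is computed over the rows shown in the view (those with at least one aberration), and the global frequency over all rows. The function names are hypothetical and the code is not taken from STAC Station.

```python
def frequency_line(rows, include_empty_rows=False):
    """Fraction of samples that are aberrant at each location.

    rows: equal-length lists of 0/1 calls, one list per sample.
    With include_empty_rows=False, all-zero rows are dropped first;
    with True they are kept (the "global frequency line").
    """
    used = rows if include_empty_rows else [r for r in rows if any(r)]
    if not used:
        return []
    n = len(used)
    return [sum(column) / n for column in zip(*used)]


def confidence(p_value):
    """Confidence as displayed by STAC Station: one minus the p-value."""
    return 1.0 - p_value


rows = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],  # no aberrations: omitted from the default view
    [0, 1, 0, 0],
]
print(frequency_line(rows))
print(frequency_line(rows, include_empty_rows=True))
print(confidence(0.03))
```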
Output
These options are on the File > Save menu.
- Save to jpeg - you can save the current file being viewed to jpeg, or all open files to jpeg.
- Reports - you can export the analyses as report files which give the frequency and footprint p-values for all locations in tab delimited text format.
- Note: STAC analyses are automatically saved to disk so they can be viewed again later without rerunning
the analysis. If you opened a STAC input file and ran STAC analysis, the results file is saved under the
same name as the STAC input file with ".stac" appended to the end.
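Because the report files are tab-delimited text, they can be processed with standard tools. The Python sketch below lists the locations whose frequency or footprint p-value falls below a chosen cutoff. The column names (`location`, `frequency_p`, `footprint_p`), the sample content and the cutoff of 0.05 are all hypothetical; inspect an actual report file for the real column layout before adapting this.

```python
import csv
import io

# Hypothetical report content: real column names and order may differ.
REPORT = (
    "location\tfrequency_p\tfootprint_p\n"
    "1\t0.40\t0.20\n"
    "2\t0.03\t0.01\n"
    "3\t0.08\t0.04\n"
)


def significant_locations(report_text, alpha=0.05):
    """Locations whose frequency or footprint p-value is below alpha."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    hits = []
    for row in reader:
        best = min(float(row["frequency_p"]), float(row["footprint_p"]))
        if best < alpha:
            hits.append(row["location"])
    return hits


print(significant_locations(REPORT))  # ['2', '3']
```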
General Operating Characteristics
There are several factors that can influence the sensitivity of STAC and the time required for
execution. We summarize each of these here.
- Sample size: In general, increasing sample size will increase the power of STAC to detect significant
concordant aberrations at a cost in run time (dependent upon the number of intervals per sample).
- Location width: In general, decreasing the fixed-width location size (see the notes
on formatting and pre-processing) will allow for finer-resolution mapping
of significant concordant aberrations. This should not have a significant impact on run time; however, we
generally recommend setting the location width approximately equal to the resolution
of the array.
- Number of probes: Increasing the number of probes in
a region will allow for finer-resolution mapping of concordant aberrations and
this should be reflected in the use of a smaller fixed-width location size used
in the input. Increasing the number of probes alone will not increase the run time of STAC.
However, if an inappropriate method for calling aberrations in
individual samples is used (e.g. using thresholds for a SNP array), then you may
see a significant increase in run time due to fragmented intervals in the input.
- Number of intervals: Data sets with fewer intervals of aberration will run faster than those
with many intervals. In extreme cases (an exceptionally large number of intervals compared to the current
"average" data set) execution time can be affected significantly. We are currently evaluating approaches
to better accommodate these extreme situations.
- Number of permutations: Up to a point, increasing the number of permutations
will decrease the variability seen in the results. We have seen differences
between 100 and 1000 permutations, far less between 1000 and 10,000
permutations, and no differences beyond 10,000 permutations. Increasing the number of permutations will
increase the run time; the amount of this increase depends on sample size
and the number of intervals as well as the value of the search parameter.
- Search parameter: In general, increasing the search parameter will increase the power of the
heuristic search implemented in STAC with a slight increase in run-time. However, for very large data sets
containing many samples with many aberrant intervals, increasing the search
parameter significantly (say to 10,000 or more) will increase the run-time
substantially but may produce the same results as obtained with a lower search
parameter. We have found that the default produces good results on the average cancer data set containing 50
samples with a high frequency of aberration.
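Since the number of intervals is one of the main drivers of run time, it can be useful to count the intervals in each sample before starting an analysis. The Python sketch below counts maximal runs of consecutive 1's in a vector of binary calls; the function name `count_intervals` and the two example vectors are hypothetical. A fragmented call vector yields many more intervals than a smooth one covering a similar number of locations.

```python
def count_intervals(calls):
    """Number of maximal runs of consecutive 1's in a 0/1 call vector."""
    count = 0
    previous = 0
    for value in calls:
        if value == 1 and previous == 0:
            count += 1  # a new interval starts here
        previous = value
    return count


smooth = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
fragmented = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
print(count_intervals(smooth), count_intervals(fragmented))  # 1 4
```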
NOTE: If STAC is taking a very long time
to run on your data, please contact us at diskin@email.chop.edu
to obtain a distributed version (STAC-Grid); we will be glad to help you install and configure this.
STAC-Grid parallelizes the permutation portion of the algorithm and in most cases can greatly reduce execution time.