STAC Station notes on file formatting

To use STAC, you must first pre-process your data into binary "calls". We are developing a package that combines STAC with a pre-processing engine (see the MSA website).

Aberrations come in many forms: gains, losses, LOH, high-level gain, etc. Each type of aberration you are interested in requires a separate STAC analysis. In other words, if you are interested in both gain and loss, you have to run them as two separate cases. More generally, the input to STAC must be binary aberration calls of "0" or "1", where "0" means no aberration and "1" means aberration, and the type of aberration is fixed throughout the analysis.
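As an illustration of running gain and loss as two separate cases, here is a minimal Python sketch. It assumes your calls start out in a three-state coding (-1 = loss, 0 = no change, +1 = gain), with one row per sample and one column per probe. That coding, the example values, and the helper name to_binary are all hypothetical and are not part of STAC.

```python
# Hypothetical three-state calls: rows are samples, columns are probes.
# Assumed coding (not defined by STAC): -1 = loss, 0 = no change, +1 = gain.
calls = [
    [0, 1, 1, 0, -1],
    [0, 0, 1, 0, -1],
    [-1, 0, 0, 0, 0],
]

def to_binary(calls, state):
    """Return a 0/1 matrix with 1 wherever the call equals `state`."""
    return [[1 if c == state else 0 for c in row] for row in calls]

# One binary matrix per aberration type; each is analyzed separately.
gain_calls = to_binary(calls, 1)
loss_calls = to_binary(calls, -1)
```

Each resulting matrix holds only "0" and "1" and refers to a single aberration type, so it can be carried forward as its own analysis.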

It is not advisable to analyze an entire chromosome as one stretch; the analysis should be done one arm at a time with the centromere omitted. You can also focus on just a piece of a chromosome arm. It is not necessary to analyze the entire arm, only inadvisable to analyze more than one arm at once.
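One way to restrict the data to a single arm is to select probes by genomic position. The sketch below assumes you know each probe's position and the boundaries of the centromere on the chromosome in question. The function name probes_on_arm, its parameters, and the example coordinates are made up for illustration.

```python
def probes_on_arm(positions, arm, cen_start, cen_end):
    """Return indices of probes on the 'p' or 'q' arm.

    Probes that fall between cen_start and cen_end (inclusive)
    are treated as centromeric and are left out of both arms.
    """
    if arm == "p":
        return [i for i, pos in enumerate(positions) if pos < cen_start]
    if arm == "q":
        return [i for i, pos in enumerate(positions) if pos > cen_end]
    raise ValueError("arm must be 'p' or 'q'")

# Made-up probe positions (Mb) and centromere boundaries.
positions = [5, 20, 50, 60, 90]
p_idx = probes_on_arm(positions, "p", cen_start=45, cen_end=55)
q_idx = probes_on_arm(positions, "q", cen_start=45, cen_end=55)
```

The returned indices can then be used to pull the matching columns out of the binary call matrix, giving one input per arm.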

IMPORTANT: If the spacing on your array is such that there are significant gaps between probes (e.g. 1 Mb), we strongly recommend estimating the aberration state in the regions without direct coverage before running STAC. There are several ways this can be accomplished. In the STAC publication, we elected to use the method employed by both Mosse et al (2005) and Naylor et al (2005). There are several cases which must be handled:

You can enter data in either of two possible formats, depending on what is more convenient for you.

NOTE: The best approach to calling gains and losses in each sample has yet to be determined and depends on the particular array and experimental design. See Lai et al (2005) or Willenbrock and Fridlyand (2005) for recent comparisons of available methods. We have found that unless the data are very clean, the use of ratio thresholds for calling gain and loss often leads to false negatives (missing regions of aberration in individual samples) and can also lead to false positives, depending on experimental design. Concordant bias, such as that which may be introduced by severe sample processing, should be accounted for before running STAC. For example, if the probe distributions are significantly variable, one can hybridize a battery of normal controls (processed identically to the test samples) in order to use a standard deviation criterion instead of a global ratio cutoff. Use of any of the model-based methods to make gain/loss calls for each sample can result in a decrease in resolution, since they tend not to call a region as aberrant unless it is supported by several array elements. Once the concordant bias has been minimized as described above, the single-slide calls should be made fairly liberally, so as to avoid false negatives: the false positives in individual samples will be randomly scattered across the genome, and STAC will not assign significance to these additional aberrations.
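The standard deviation criterion mentioned above could look roughly like the following Python sketch. It is only one possible reading: for each probe, the mean and standard deviation of the normal controls are computed, and a test sample is called aberrant at that probe if its ratio lies more than k standard deviations from the control mean. The function name sd_calls, the multiplier k and its default of 2.0, and the example log ratios are all assumptions, not values taken from the STAC publication.

```python
from statistics import mean, stdev

def sd_calls(test_ratios, normal_ratios, k=2.0):
    """Make per-probe gain and loss calls for one test sample.

    test_ratios: log ratios for the test sample, one per probe.
    normal_ratios: one list per normal control, same probe order.
    k: number of control standard deviations required for a call
       (hypothetical default; a smaller k gives more liberal calls).
    Returns (gain, loss) as two lists of 0/1 calls.
    """
    gain, loss = [], []
    for j, x in enumerate(test_ratios):
        controls = [sample[j] for sample in normal_ratios]
        mu, sd = mean(controls), stdev(controls)
        gain.append(1 if x > mu + k * sd else 0)
        loss.append(1 if x < mu - k * sd else 0)
    return gain, loss

# Made-up log ratios: three normal controls and one test sample, three probes.
normals = [[0.0, 0.1, -0.1], [0.1, -0.1, 0.0], [-0.1, 0.0, 0.1]]
test = [0.5, 0.05, -0.4]
gain, loss = sd_calls(test, normals)
```

Because the threshold is set per probe, a noisy probe needs a larger deviation before it is called, which a single global ratio cutoff cannot do. At least two normal controls are needed for the standard deviation to be defined.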