Summary

RAD schema now available on-line: A simple relational schema browser provides access to all of the tables in the current version of RAD as part of the GUS documentation; scroll down the page to see the RAD tables. A high-level overview of the schema is also available. A manuscript describing the design and implementation of the RAD schema in detail is scheduled to appear in Bioinformatics.

Precise descriptions of experiments: The ability to interpret and compare experimental results is limited by the amount and type of information available. For a database, the ability to query this information is further limited by how that information is structured. RAD seeks to maximize the ability to interpret and query gene expression experiments by providing information, whenever possible, for all categories identified as important by a recent international effort to develop standards for gene expression experiments. Ontologies are used to describe the organism studied, the anatomical structure from which the experimental sample is derived, the developmental stage, and any pathology associated with the sample. Relationships between experiments are also captured, both for the red and green channels of a 2-color microarray experiment (each channel is described as a separate experiment) and for the different members of an experimental study. In the latter case, the experimental "groups" may be ordered (e.g., time series) or unordered (e.g., blood B-cells).

Distinctions between raw and processed data: For array-based experiments, the raw data is considered to be the output of the image analysis software used to quantify the signal for each array element. For SAGE, the raw data is simply the gene tag count. Some type of normalization is usually performed in order to compare results between experiments. Often, replicates of a gene (clone, PCR fragment, etc.) are found on an array and can be averaged to simplify analysis.
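
As a rough illustration of the two steps just mentioned (normalization and replicate averaging), the following Python sketch may help. It is not RAD's implementation: scaling by the array-wide median is only one assumed choice of normalization, and the function names, spot identifiers, and clone names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def normalize_by_median(signals):
    """Scale raw signals so the array-wide median becomes 1.0.

    Median scaling is an assumption made for this sketch; other
    normalization methods are equally possible.
    """
    m = median(signals.values())
    if m == 0:
        raise ValueError("median signal is zero; cannot scale")
    return {element: value / m for element, value in signals.items()}


def average_replicates(signals, element_to_gene):
    """Average the signals of array elements that represent the same gene."""
    grouped = defaultdict(list)
    for element, value in signals.items():
        grouped[element_to_gene[element]].append(value)
    return {gene: mean(values) for gene, values in grouped.items()}


# Hypothetical raw signals for three array elements (spots).
raw = {"spot1": 100.0, "spot2": 300.0, "spot3": 200.0}
# Hypothetical mapping: spot1 and spot2 are replicates of the same clone.
replicate_map = {"spot1": "cloneA", "spot2": "cloneA", "spot3": "cloneB"}

normalized = normalize_by_median(raw)
averaged = average_replicates(normalized, replicate_map)
```

Keeping the raw values and the processed values in separate structures, as here, mirrors the distinction drawn in the text: the raw output is never overwritten, so a different processing choice can be applied later.
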
The array elements actually used in an analysis may depend on their values (FLAG, signal/background, difference between replicate signals). Thus, given raw data, a number of postprocessing steps may be desirable before viewing the data or passing it to an application. It is important to know what those processing steps are and what alternatives may be available. These are captured in RAD as "evidence."

A gene index is used to integrate array elements and gene tags: It is common for different representations of the same gene to be present on the same array, and this is very likely when comparing experiments done with different arrays (either versions or types). For example, different IMAGE clones that encode the same gene may be used, or a GenBank accession for a gene may be available instead of a clone ID. When informative names are available, results can be compared by visual inspection; however, most genes currently studied in array-based experiments and SAGE are unknown. While it is desirable to know which array elements and gene tags represent the same genes, performing this mapping, which is complex and evolving, is beyond the scope of RAD. Therefore, RAD relies on a database built especially for this purpose. The Database of Transcribed Sequences (DoTS) clusters and assembles all publicly available ESTs and mRNAs for human and mouse based on sequence similarity. DoTS also provides annotation for these assemblies, such as chromosomal location (from EST RH mapping data), cellular role (from an algorithm that uses SWISS-PROT keywords), and tissue distribution (from a curated list of cDNA library sources).

Immediate plans are to:

RAD database and interfaces Copyright (2001-2007) Center for Bioinformatics, University of Pennsylvania
Content provided by the Computational Biology and Informatics Laboratory
rad@pcbi.upenn.edu