RAD2 | Home Page

Special Note
RAD3 site now available! This site has been deprecated and replaced by the new version of RAD, RAD3. We will no longer update this site, but we will maintain it as a demo. In particular, the Query page contains some multi-step query functionality not available on the new RAD website.

Summary
RAD (RNA Abundance Database) is a public gene expression database designed to hold data from array-based (microarrays, high-density oligo arrays, macroarrays) and nonarray-based (SAGE) experiments. The ultimate goal is to allow comparative analysis of experiments performed by different laboratories using different platforms and investigating different biological systems. To achieve this goal, RAD contains precise descriptions of the experiments and distinguishes between raw data and processed results. In addition, a gene index is used to integrate array elements and gene tags. The selection of experiments to include in RAD will be directed by research interests, ours and those of our collaborators, such as hematopoiesis.

RAD schema now available on-line:

A simple relational schema browser provides access to all of the tables in the current version of RAD, as part of the GUS documentation. Scroll down the page to see RAD tables. A high-level overview of the schema is also available. A manuscript that describes the RAD schema, its design and implementation in detail is scheduled to appear in Bioinformatics.

Precise descriptions of experiments:

The ability to interpret (and compare) experimental results is limited by the amount and type of information available. For a database, the ability to query this information is also limited by how the available information is structured. RAD seeks to maximize the ability to interpret and query gene expression experiments by providing information, whenever possible, for all categories identified as important by a recent international effort to develop standards for gene expression experiments. Ontologies are used to describe the organism studied, the anatomical structure that the experimental sample is derived from, the developmental stage, and the pathology associated with the sample. Relationships between experiments are also captured. In a 2-color microarray experiment, the red and green channels are each described as a separate experiment and linked to one another. Members of an experimental study are likewise linked, and these experimental "groups" may be ordered (e.g., time series) or unordered (e.g., blood B-cells).

Distinctions between raw and processed data:

For array-based experiments, raw data is considered to be the output of the image analysis software used to quantify the signal for each array element. For SAGE, the raw data is simply the gene tag count. Some type of normalization is usually performed in order to compare results between experiments. Often, replicates of a gene (clone, PCR fragment, etc.) are found on an array and can be averaged to simplify analysis. The array elements actually used in the analysis may depend on their values (FLAG, signal/background, difference between replicate signals). Thus, given raw data, a number of postprocessing steps may be desirable before viewing the data or passing the data to an application. It is important to know what those processing steps are and what alternatives may be available. These are captured in RAD as "evidence."
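
The postprocessing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`clone`, `signal`, `background`, `flag`), the sample values, and the signal/background cutoff of 2.0 are all hypothetical and are not taken from the RAD schema or from any RAD analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records, one per array element, roughly as image
# analysis software might report them. Field names are illustrative only.
raw_spots = [
    {"clone": "cloneA", "signal": 1200.0, "background": 100.0, "flag": False},
    {"clone": "cloneA", "signal": 1000.0, "background": 100.0, "flag": False},
    {"clone": "cloneB", "signal": 150.0, "background": 120.0, "flag": False},
    {"clone": "cloneC", "signal": 900.0, "background": 100.0, "flag": True},
]


def process(spots, min_ratio=2.0):
    """Drop flagged or weak elements, subtract background, average replicates.

    min_ratio is an assumed signal/background cutoff, chosen for the example.
    """
    kept = defaultdict(list)
    for spot in spots:
        if spot["flag"]:
            continue  # element was flagged, so it is excluded
        background = spot["background"]
        ratio = spot["signal"] / background if background > 0 else float("inf")
        if ratio < min_ratio:
            continue  # signal too close to background
        kept[spot["clone"]].append(spot["signal"] - background)
    # Replicates of the same clone are averaged into a single value.
    return {clone: mean(values) for clone, values in kept.items()}


processed = process(raw_spots)
print(processed)
```

With the sample records, only `cloneA` survives filtering, and its two replicates are averaged. Changing `min_ratio` or skipping the averaging step gives a different processed result from the same raw data, which is why recording the steps applied matters.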

A gene index is used to integrate array elements and gene tags:

It is common for different representations of the same gene to be present on the same array, and this is very likely when comparing experiments done with different arrays (either versions or types). For example, different IMAGE clones that encode the same gene may be used, or a GenBank accession for a gene may be available instead of a clone ID. When informative names are available, the results can be compared by visual inspection. However, most genes currently studied in array-based experiments and SAGE are unknown. While it is desirable to know which array elements and gene tags represent the same genes, it is beyond the scope of RAD to perform this mapping, which is complex and evolving. Therefore, RAD relies on a database built especially for this purpose. The Database of Transcribed Sequences (DoTS) clusters and assembles all publicly available ESTs and mRNAs for human and mouse based on sequence similarity. DoTS also provides annotation for these assemblies, such as chromosomal location (from EST RH mapping data), cellular role (from an algorithm that uses SWISS-PROT keywords), and tissue distribution (from a curated list of cDNA library sources).
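
The idea of integrating array elements and gene tags through a gene index can be illustrated with a small Python sketch. Everything named here is an assumption made for the example: the identifiers (`clone_1`, `accession_1`, `tag_1`, `assembly_1`, and so on) are invented placeholders, and the dictionary stands in for whatever lookup DoTS actually provides.

```python
from collections import defaultdict

# Hypothetical lookup from element identifiers (clone IDs, accessions,
# SAGE tags) to a gene-index assembly. All identifiers are invented.
element_to_assembly = {
    "clone_1": "assembly_1",
    "clone_2": "assembly_1",
    "accession_1": "assembly_1",
    "tag_1": "assembly_2",
}


def group_by_assembly(element_ids, index):
    """Group identifiers that map to the same assembly in the gene index."""
    groups = defaultdict(list)
    for element_id in element_ids:
        # Identifiers missing from the index are kept under "unmapped".
        groups[index.get(element_id, "unmapped")].append(element_id)
    return dict(groups)


groups = group_by_assembly(
    ["clone_1", "accession_1", "tag_1", "clone_9"],
    element_to_assembly,
)
print(groups)
```

Here a clone and an accession from two different arrays fall into the same group because both map to `assembly_1`, while an identifier unknown to the index stays visible instead of being silently dropped.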

Immediate plans are to:

  1. Increase the number of experiments and the analysis types available for them. Current and planned analysis types include normalization by fluorescence balancing, percent of total intensity, removal of background and "flagged" elements, normalization to selected controls, normalization to group median, and data transformation (for ratios) to log(base 2) scale.
  2. Retrieve data from more than one experiment and compare them by selected value type or by ratios. Array elements (gene tags) representing the same gene will be grouped using DoTS (Database of Transcribed Sequences) based on clusters and assemblies of ESTs and mRNAs as well as sequence similarity of consensus sequences for the assemblies to GenBank and SWISS-PROT.
  3. Filter the data based on annotation available in DoTS, including chromosomal location, cellular roles, and cDNA library distribution.
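
Three of the analysis types named in item 1 can be sketched as follows. This is a minimal Python illustration, not RAD's implementation: the function names and the sample intensities are hypothetical, and "normalization to group median" is interpreted here as dividing each value by the median of the values supplied, which is only one possible reading.

```python
import math
from statistics import median


def percent_of_total(intensities):
    """Express each intensity as a percentage of the summed intensity."""
    total = sum(intensities)
    return [100.0 * value / total for value in intensities]


def normalize_to_median(intensities):
    """Divide each intensity by the median of the supplied values."""
    mid = median(intensities)
    return [value / mid for value in intensities]


def log2_ratios(red, green):
    """Transform red/green ratios to the log (base 2) scale."""
    return [math.log2(r / g) for r, g in zip(red, green)]


# Invented intensities for four array elements in a 2-color experiment.
red = [200.0, 400.0, 800.0, 600.0]
green = [100.0, 400.0, 200.0, 1200.0]

pct = percent_of_total(red)
scaled = normalize_to_median(red)
ratios = log2_ratios(red, green)
print(pct, scaled, ratios)
```

On the log (base 2) scale a two-fold increase and a two-fold decrease are symmetric around zero (1.0 and -1.0), which is the usual reason for transforming ratios this way.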

Longer-range plans are to:

  1. Allow data submission. We will solicit gene expression data in certain research areas such as hematopoiesis. To enable direct submission we will provide a web-based form to fill out and an XML parser.
  2. Incorporate graphical views of the data. For some experiments, GIF images are available and we will provide selected portions of these images. When data from more than one experiment is retrieved (e.g., a time series group), the ability to graph the data will be provided.
  3. Incorporate analysis programs. A number of software packages are available to cluster gene expression data, and we have developed software to identify differentially expressed genes based on replicate experiments. Selected datasets will be transformed into the appropriate format for these programs, and the ability to run the programs on these datasets will be provided. Results from selected analyses will also be provided.


RAD database and interfaces Copyright (2001-2007) Center for Bioinformatics, University of Pennsylvania
Content provided by the Computational Biology and Informatics Laboratory
rad@pcbi.upenn.edu