MGED Ontology Working Group - Building an Ontology
This page is a cleaned-up version of the mailing sent August 3, 2001 (see the listserv). It sets out the base concepts to be structured and gives pointers to various editing tools for this purpose. Use cases and scenarios are also provided to motivate the ontology. A top-level ontology, which has been slightly modified, is also provided.
Use Cases/ Scenarios:
- Return a summary of all experiments that use a specified type of biosource.
Group the experiments according to treatment.
- Return a summary of all experiments done examining effects of a specified treatment.
Group the experiments according to biosource.
- Return a summary of all experiments measuring the expression of a specified gene.
Indicate when experiments confirm results, provide new information, or conflict.
- Generate a distance metric for experiment types
- Generate an error estimation for experimental descriptions
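As a rough illustration of the first two use cases, the sketch below filters a set of experiment records on one annotation and groups the matches by another. The record layout, field names, and experiment IDs are hypothetical and chosen only for this example; they are not taken from MIAME or MAGE-OM.

```python
from collections import defaultdict

# Hypothetical experiment records. Field names and IDs are illustrative
# assumptions; the biosource types and treatments reuse terms from this page.
EXPERIMENTS = [
    {"id": "E1", "biosource_type": "biopsy", "treatment": "starvation"},
    {"id": "E2", "biosource_type": "biopsy", "treatment": "infection"},
    {"id": "E3", "biosource_type": "paraffin section", "treatment": "starvation"},
    {"id": "E4", "biosource_type": "biopsy", "treatment": "starvation"},
]


def group_experiments(experiments, filter_key, filter_value, group_key):
    """Keep experiments whose filter_key equals filter_value, grouped by group_key."""
    groups = defaultdict(list)
    for exp in experiments:
        if exp[filter_key] == filter_value:
            groups[exp[group_key]].append(exp["id"])
    return dict(groups)


# Use case 1: experiments on a given biosource type, grouped by treatment.
by_treatment = group_experiments(EXPERIMENTS, "biosource_type", "biopsy", "treatment")

# Use case 2: experiments with a given treatment, grouped by biosource type.
by_biosource = group_experiments(EXPERIMENTS, "treatment", "starvation", "biosource_type")
```

Exact string matching like this only works when every annotator uses the same term for the same thing, which is the practical argument for a controlled vocabulary.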
Concepts:
Listed are concepts to be structured in an ontology. Most are defined; I will continue to add concepts and define the rest. Usage of terms such as "biomaterial" and "biosource" comes from the OMG MAGE-OM submission. I see our goal as extending the standard that they have made and modifying it only when absolutely necessary. I have changed "cell source and type" to "biosource provider" because of confusion about the term. MAGE (the combination of MAML and GEML) is guided by MIAME (see the MAGE and MIAME figures at the end of this section). Some of these concepts, such as "organism", "organism part", and "disease state", are references to external controlled vocabularies/ontologies. Established ontologies such as the NCBI taxonomy should be used whenever possible. Links to the NCBI taxonomy browser and to model organism databases such as FlyBase, MGD, SGD, TAIR, and WormBase are available in the ontology resources (on the main OWG page). Thanks to B. Aronow (U. Cincinnati) and M. Ashburner (Cambridge U.) for suggestions.
Biomaterial: The source of the nucleic acid used to generate labelled material for the microarray experiment.
Biosource: The primary source of the nucleic acid used to generate labelled material for the microarray experiment.
Biosample: The biosource after any treatment.
Labeled Extract: The biosample after labeling for detection of the nucleic acids.
Organism: The genus and species (and subspecies) of the organism from which the biomaterial is derived.
Biosource provider: The resource (e.g., company, hospital, geographical location) used to obtain or purchase the biomaterial.
- Biosource donor [Name, Donor ID, Geographic Location, Biomaterial Origin, Consent, Surgical Path #, IRB* Protocol#]
- Biosource Owner [Name, Address, contact information]
- Biosource type: The procurement type of the biomaterial (e.g., paraffin section, biopsy)
Sex: The gender of the organism or the reproductive organs present on the organism (prior to any modification) that the
biomaterial is derived from.
| Term | Definition |
| male | The organism contains only the reproductive organs that produce male gametes (spermatozoa). |
| female | The organism contains only the reproductive organs that produce female gametes (oocytes). |
| both | The organism contains both male and female reproductive organs. |
| none | The organism does not have reproductive organs. |
| unknown | The reproductive organs of the organism are unknown. |
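One way to enforce the five terms in the table is an enumerated type that rejects anything else. This is a minimal sketch: the class name `Sex` and the helper `parse_sex` are hypothetical names introduced for illustration.

```python
from enum import Enum


class Sex(Enum):
    """Controlled vocabulary holding the five terms from the table above."""
    MALE = "male"
    FEMALE = "female"
    BOTH = "both"
    NONE = "none"
    UNKNOWN = "unknown"


def parse_sex(term):
    """Normalise case and whitespace, then look the term up by value.

    Raises ValueError for any term outside the vocabulary.
    """
    return Sex(term.strip().lower())
```

For example, `parse_sex(" Male ")` yields `Sex.MALE`, while a term outside the table, such as `"hermaphrodite"`, raises `ValueError` instead of silently creating a sixth category.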
Age: The time period elapsed since an identifiable point early in the life of an organism. Examples of the identifiable point
include conception, birth, or planting.
| Initial time point | Definition |
| birth | The time point at the end of parturition. |
| fertilization | The time point at which gametes are joined. May also be used for post-coital measurements. |
| hatching | The time point at which the organism leaves the egg. |
| planting | The time point at which a seed is planted. |
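Because an age is only meaningful relative to its initial time point, the two can be stored together. The sketch below assumes a simple value-plus-unit representation; the `Age` class and its `unit` field are hypothetical, and only the four initial time points come from the table above.

```python
from dataclasses import dataclass

# The four initial time points defined in the table above.
INITIAL_TIME_POINTS = {"birth", "fertilization", "hatching", "planting"}


@dataclass(frozen=True)
class Age:
    """Elapsed time since a named initial time point."""
    value: float
    unit: str                # hypothetical free-text unit, e.g. "weeks"
    initial_time_point: str  # must be one of INITIAL_TIME_POINTS

    def __post_init__(self):
        if self.initial_time_point not in INITIAL_TIME_POINTS:
            raise ValueError(f"unknown initial time point: {self.initial_time_point!r}")
        if self.value < 0:
            raise ValueError("age cannot be negative")


mouse_age = Age(6, "weeks", "birth")
seedling_age = Age(10, "days", "planting")
```

Keeping the reference point explicit prevents, for instance, a post-fertilization age from being compared directly with a post-birth age.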
Developmental stage: The developmental stage of the organism's life cycle during which the biomaterial was extracted.
Organism part: The part of the organism's anatomy from which the biomaterial was derived.
Strain or line: Animals or plants that have a single ancestral breeding pair or parent as a result of brother x sister or
parent x offspring matings.
Genetic Variation: The genetic modification introduced into the organism from which the biomaterial was derived. Examples of
genetic variation include specification of a transgene or of the gene knocked out.
Individual: Identifier or name of the individual organism from which the biomaterial was derived.
Individual genetic characteristics: The genotype of the individual organism from which the biomaterial was derived. Individual
genetic characteristics include polymorphisms, disease alleles, and haplotypes.
Disease state: The name of the pathology diagnosed in the organism from which the biomaterial was derived. The disease state is
normal if no disease has been diagnosed.
Targeted cell type: The target cell type is the cell of primary interest. The biomaterial may be derived from a mixed
population of cells although only one cell type is of interest.
Cell line: The identifier for the immortalized cell line if one was used to derive the biomaterial.
Biomaterial preparation: A description of the state and condition of the biomaterial.
- Time of day when the biomaterial was generated (i.e., sampled).
- Pathological staging: pre or post mortem at sampling
- state at start of treatment (age, time of day)
- physico-chemical composition of the sample: amount of material, number of cells, purity
- protocol: method used.
Environmental or experimental history: A description of the conditions the organism has been exposed to that are not one of the
variables under study.
- culture conditions: A description of the isolated environment used to grow organisms or parts of the organism.
- atmosphere: The gases and their concentrations used during culture.
- humidity: The percent humidity.
- temperature: The temperature during culture.
- light: The photoperiod and type (e.g., natural, restricted wavelength) of light exposure.
- nutrients: The food provided to the organism (e.g., chow, fertilizer, DMEM with 10% FBS).
- medium: The physical state or matrix used to provide nutrients to the organism (e.g., liquid, agar, soil)
- density range: The concentration range of the organism.
- contaminant organisms: Organisms present that were not planned as part of the study (e.g., mycoplasma).
- removal of contaminants: Steps taken to eliminate contaminant organisms.
- host organism or organism parts: Organisms or organism parts used as a designed part of the culture (e.g., red
blood cells, stromal cells).
- generations: The number of cell divisions if the organism or organism part that is cultured is unicellular;
otherwise, the number of breedings.
- clinical history: The organism's (i.e., the patient's) medical record.
- Past medical history
- Current disease history
- Clinic treatment history
- Associated Laboratory values
- family history: Relevant aspects of genetic preconditions or family members' clinical history.
- water: additives and treatments
- bedding
- barrier facility
- pathogen test results: both positive and negative.
- preservation: seed dormancy, frozen storage
Treatment: The manipulation of the biomaterial for the purposes of generating one of the variables under study.
- somatic modification: The organism has had parts removed, added, or rearranged.
- genetic modification: The organism has had genes removed, added, or rearranged.
- starvation: The organism (or organism part) has been deprived of nutrients.
- infection: The organism (or organism part) has been exposed to a virus or pathogen.
- behavioral stimulus: The organism is forced to respond to a stimulus with some behavior (e.g., avoidance,
obtaining a reward, etc.)
- agent-based treatment: The treatment is effected by a defined chemical, biological, or physical agent.
- agent type: chemical (drugs), biological (macromolecule), physical (stress from light, temperature,
etc.)
- agent application
- in vivo, in vitro, in situ
- qualitative or quantitative
- treatment protocol: method of treatment
- treatment parameters: constant, variable
- treatment duration: length of treatment
MIAME Sample Description:
The concepts are flattened into a list of attributes. An ontology would provide greater detail in a structured form that would allow computational analysis (e.g., SQL, graph comparisons) between experiments. How much more structure is needed will be driven by the use cases/ scenarios.
MAGE Biomaterial Class Diagram:
As with MIAME, sample descriptions are mostly a flat list of attributes. Treatment has been given attributes (an order, an action) and relationships (measurements of different types). The effects of treatment are the generation of a biosample or labelled extract. Compounds represent everything from culture media components to fluorescent labels.
Ontology tools:
The ontology editors are either open source or licensed for free, at least for academics (let me know if I have misrepresented anything). Thanks to Robert Stevens, U. Manchester, for info on Protege and OILed.
- GKB-Editor: Generic knowledge base editor from SRI that uses frames and slots.
- GO-EDIT: Editor from the Gene Ontology Consortium.
- OILed: OILed is a little editor for creating ontologies in the
OIL and DAML+OIL languages. Like GKB and Protege, it has a frame look and
feel, but with the option of reasoning support. OILed is really a toy tool. It
is reasonably robust, but not supported in any significant fashion. It has
a little manual, but one would have to read a little about OIL in order to
use some of the features fully.
Some examples of
the TAMBIS and GO ontologies in OIL are available (more will appear soon). The advantage
that OILed has is that one can create anything from simple DAGs, like GO, all the way to
full logic models, and migrate from the former to the latter.
- Protege 2000: Protege 2000 is the most widely known and used tool for creating
ontologies and knowledge bases.
Products such as Rational Rose and those from Embarcadero can also be used to generate UML models (class diagrams). These are not free.
Top level ontology:
All relationships are ISA (class/subclass). For example, a biosource is a biomaterial state, which is a
biomaterial description.
Biomaterial description
    Biomaterial state
        Biosource
        Biosample
        Labeled extract
    Biosource descriptions
        MGED concept descriptions
            biosource provider
            sex
            age
        Non-MGED concept descriptions
            organism
            developmental stage
            organism part
            strain or line
            genetic variation
            individual
            individual genetic characteristics
            disease state
            targeted cell type
            cell line
            clinical information
    Biomaterial external influence
        Biomaterial preparation
        Environmental history
        Treatment
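Since every relationship is ISA, the hierarchy can be held as a child-to-parent map and queried by walking upward. The sketch below covers only part of the list. The chain Biosource, Biomaterial state, Biomaterial description is stated in the text above; the remaining links are an assumed reading of the list order, and the names `ISA`, `ancestors`, and `is_a` are hypothetical.

```python
# Child -> parent ISA links for a subset of the concepts listed above.
# Only Biosource -> Biomaterial state -> Biomaterial description is stated
# explicitly in the text; the other links are assumptions from the list order.
ISA = {
    "Biomaterial state": "Biomaterial description",
    "Biosource": "Biomaterial state",
    "Biosample": "Biomaterial state",
    "Labeled extract": "Biomaterial state",
    "Biosource descriptions": "Biomaterial description",
    "MGED concept descriptions": "Biosource descriptions",
    "sex": "MGED concept descriptions",
    "age": "MGED concept descriptions",
    "Non-MGED concept descriptions": "Biosource descriptions",
    "organism": "Non-MGED concept descriptions",
    "Biomaterial external influence": "Biomaterial description",
    "Treatment": "Biomaterial external influence",
}


def ancestors(concept):
    """Return the superclasses of a concept, nearest first."""
    chain = []
    while concept in ISA:
        concept = ISA[concept]
        chain.append(concept)
    return chain


def is_a(concept, candidate):
    """True if candidate is the concept itself or one of its superclasses."""
    return concept == candidate or candidate in ancestors(concept)
```

With this map, `ancestors("Biosource")` walks up through "Biomaterial state" to "Biomaterial description", and `is_a("sex", "Biomaterial state")` is false because the two concepts sit on different branches.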
Last updated August 7, 2001