MGED Ontology Working Group - Building an Ontology

This page is a cleaned-up version of the message sent to the mailing list on August 3, 2001 (see the listserv). It sets out the base concepts to be structured and gives pointers to editing tools for this purpose. Use cases and scenarios are also provided to motivate the ontology, along with a slightly modified top-level ontology.


Use Cases/ Scenarios:

Concepts:

Listed below are concepts to be structured in an ontology. Most are defined; I will continue to add concepts and define the rest. Terms such as "biomaterial" and "biosource" come from the OMG MAGE-OM submission. I see our goal as extending the standard they have made and modifying it only when absolutely necessary. I have changed "cell source and type" to "biosource provider" because the original term caused confusion. MAGE (the combination of MAML and GEML) is guided by MIAME (see the MAGE and MIAME figures at the end of this section).

Some of these concepts, such as "organism", "organism part", and "disease state", are references to external controlled vocabularies/ ontologies. Established ontologies such as the NCBI taxonomy should be used whenever possible. Links to the NCBI taxonomy browser and to model organism databases such as FlyBase, MGD, SGD, TAIR, and WormBase are available in the ontology resources (on the main OWG page). Thanks to B. Aronow (U. Cincinnati) and M. Ashburner (Cambridge U.) for suggestions.

Biomaterial: The source of the nucleic acid used to generate labelled material for the microarray experiment.

Biosource: The primary source of the nucleic acid used to generate labelled material for the microarray experiment.

Biosample: The biosource after any treatment.

Labeled Extract: The biosample after labeling for detection of the nucleic acids.

Organism: The genus and species (and subspecies) of the organism from which the biomaterial is derived.

Biosource provider: The resource (e.g., company, hospital, geographical location) used to obtain or purchase the biomaterial.

Sex: The gender of the organism from which the biomaterial is derived, or the reproductive organs present on the organism prior to any modification.
Term       Definition
male       The organism contains only reproductive organs that produce male gametes (spermatozoa).
female     The organism contains only reproductive organs that produce female gametes (oocytes).
both       The organism contains both male and female reproductive organs.
none       The organism does not have reproductive organs.
unknown    The reproductive organs of the organism are unknown.
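To illustrate how a controlled vocabulary like this constrains annotation, here is a minimal Python sketch. It assumes sample annotations arrive as free-text strings; the `Sex` enum and the `parse_sex` helper are hypothetical names, not part of MAGE-OM or MIAME.

```python
from enum import Enum


class Sex(Enum):
    """Hypothetical encoding of the sex terms defined above."""
    MALE = "male"
    FEMALE = "female"
    BOTH = "both"
    NONE = "none"
    UNKNOWN = "unknown"


def parse_sex(value: str) -> Sex:
    """Map a free-text annotation onto the vocabulary, rejecting other terms."""
    normalized = value.strip().lower()
    try:
        return Sex(normalized)
    except ValueError:
        allowed = ", ".join(term.value for term in Sex)
        raise ValueError(
            f"{value!r} is not a sex term; expected one of: {allowed}"
        ) from None


if __name__ == "__main__":
    print(parse_sex(" Female "))  # Sex.FEMALE
```

Rejecting out-of-vocabulary strings instead of storing them keeps annotations from different laboratories comparable, while a sample of uncertain sex still has an explicit term, "unknown".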

Age: The time period elapsed since an identifiable point early in the life of an organism. Examples of the identifiable point include conception, birth, or planting.
Initial time point    Definition
birth                 The time point at the end of parturition.
fertilization         The time point at which gametes are joined. May also be used for post-coital measurements.
hatching              The time point at which the organism leaves the egg.
planting              The time point at which a seed is planted.
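Because an age is only meaningful relative to its initial time point, a structured record should keep the two together. The following is a minimal Python sketch, assuming the elapsed period is stored as a `datetime.timedelta`; `Age`, `InitialTimePoint`, `between` and `days_older_than` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class InitialTimePoint(Enum):
    """The initial time points listed above."""
    BIRTH = "birth"
    FERTILIZATION = "fertilization"
    HATCHING = "hatching"
    PLANTING = "planting"


@dataclass(frozen=True)
class Age:
    elapsed: timedelta          # time period since the initial time point
    since: InitialTimePoint     # which identifiable point the period starts at

    @classmethod
    def between(cls, start: date, sampled: date, since: InitialTimePoint) -> "Age":
        """Age at sampling, given the calendar date of the initial time point."""
        if sampled < start:
            raise ValueError("sampling date precedes the initial time point")
        return cls(sampled - start, since)

    def days_older_than(self, other: "Age") -> int:
        """Difference in whole days; only defined for the same initial time point."""
        if self.since is not other.since:
            raise ValueError(
                f"cannot compare age since {self.since.value} "
                f"with age since {other.since.value}"
            )
        return (self.elapsed - other.elapsed).days


if __name__ == "__main__":
    a = Age.between(date(2001, 6, 1), date(2001, 7, 13), InitialTimePoint.BIRTH)
    b = Age(timedelta(weeks=4), InitialTimePoint.BIRTH)
    print(a.elapsed.days)        # 42
    print(a.days_older_than(b))  # 14
```

Refusing to compare an age measured from birth with one measured from fertilization is the point of recording the initial time point at all: the same number of days means different things for each.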

Developmental stage: The developmental stage of the organism's life cycle during which the biomaterial was extracted.

Organism part: The part of the organism's anatomy from which the biomaterial was derived.

Strain or line: Animals or plants descended from a single ancestral breeding pair or parent through brother x sister or parent x offspring matings.

Genetic Variation: The genetic modification introduced into the organism from which the biomaterial was derived. Examples include a specified transgene or the gene that was knocked out.

Individual: Identifier or name of the individual organism from which the biomaterial was derived.

Individual genetic characteristics: The genotype of the individual organism from which the biomaterial was derived. Individual genetic characteristics include polymorphisms, disease alleles, and haplotypes.

Disease state: The name of the pathology diagnosed in the organism from which the biomaterial was derived. The disease state is normal if no disease has been diagnosed.

Targeted cell type: The cell type of primary interest. The biomaterial may be derived from a mixed population of cells even though only one cell type is of interest.

Cell line: The identifier for the immortalized cell line if one was used to derive the biomaterial.

Biomaterial preparation: A description of the state and condition of the biomaterial.

Environmental or experimental history: A description of the conditions the organism has been exposed to that are not one of the variables under study.

Treatment: The manipulation of the biomaterial for the purposes of generating one of the variables under study.

MIAME Sample Description:

In MIAME, the concepts are flattened into a list of attributes. An ontology would provide greater detail in a structured form, allowing computational analysis (e.g., SQL queries, graph comparisons) across experiments. How much more structure is needed will be driven by the use cases/ scenarios.

MAGE Biomaterial Class Diagram:

In the MAGE class diagram, as in MIAME, sample descriptions are mostly a flat list of attributes. Treatment, however, has been given attributes (an order and an action) and relationships (measurements of different types). The effect of a treatment is to generate a biosample or labeled extract. Compounds represent everything from culture media components to fluorescent labels.
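The structure described for Treatment (an order, an action, attached measurements, and a resulting biosample or labeled extract) can be sketched as a small chain of records. This is a simplified, hypothetical Python model, not the MAGE-OM classes themselves: `State`, `Measurement`, `Treatment`, `Biomaterial` and `apply_treatments` are illustrative names, and compounds are reduced to an optional name on a measurement.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    """Biomaterial states from the top-level ontology."""
    BIOSOURCE = "biosource"
    BIOSAMPLE = "biosample"
    LABELED_EXTRACT = "labeled extract"


@dataclass(frozen=True)
class Measurement:
    kind: str                       # e.g. "temperature", "amount"
    value: float
    unit: str
    compound: Optional[str] = None  # media component, label, etc.


@dataclass(frozen=True)
class Treatment:
    order: int                      # position in the protocol sequence
    action: str                     # what was done to the biomaterial
    produces: State                 # state of the material afterwards
    measurements: tuple[Measurement, ...] = ()


@dataclass(frozen=True)
class Biomaterial:
    name: str
    state: State
    history: tuple[Treatment, ...] = ()


def apply_treatments(source: Biomaterial, treatments: list[Treatment]) -> Biomaterial:
    """Apply treatments in their stated order and return the final material."""
    if source.state is not State.BIOSOURCE:
        raise ValueError("treatment history must start from a biosource")
    current = source
    for step in sorted(treatments, key=lambda t: t.order):
        # A treatment generates a biosample or labeled extract, never a biosource.
        if step.produces is State.BIOSOURCE:
            raise ValueError(f"treatment {step.action!r} cannot produce a biosource")
        current = Biomaterial(
            name=f"{current.name} after {step.action}",
            state=step.produces,
            history=current.history + (step,),
        )
    return current


if __name__ == "__main__":
    liver = Biomaterial("mouse liver", State.BIOSOURCE)
    steps = [
        Treatment(2, "labeling", State.LABELED_EXTRACT,
                  (Measurement("amount", 2.0, "nmol", compound="Cy5-dUTP"),)),
        Treatment(1, "total RNA extraction", State.BIOSAMPLE,
                  (Measurement("temperature", 4.0, "degC"),)),
    ]
    extract = apply_treatments(liver, steps)
    print(extract.state.value)                  # labeled extract
    print([t.action for t in extract.history])  # ['total RNA extraction', 'labeling']
```

Keeping the full treatment history on the final labeled extract, rather than only its current state, is what would let two experiments be compared step by step instead of through a flat attribute list.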

Ontology tools:

The ontology editors are either open source or licensed for free (at least for academics; let me know if I have misrepresented anything). Thanks to Robert Stevens (U. Manchester) for information on Protege and OilEd. Commercial products such as Rational Rose and those from Embarcadero can also be used to generate UML models (class diagrams), but these are not free.

Top level ontology:

All relationships are ISA (class/ subclass) relationships; e.g., a biosource is a biomaterial state, which is a biomaterial description.
Biomaterial description
        Biomaterial state
                Biosource
                Biosample
                Labeled extract
        Biosource descriptions
                MGED concept descriptions
                        biosource provider
                        sex
                        age
                Non-MGED concept descriptions
                        organism
                        developmental stage
                        organism part
                        strain or line
                        genetic variation
                        individual
                        individual genetic characteristics
                        disease state
                        targeted cell type
                        cell line
                        clinical information
        Biomaterial external influence
                Biomaterial preparation
                Environmental history
                Treatment
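Because every link in the outline is ISA and each concept has a single parent, the top-level ontology is a tree, and subclass checks or the graph comparisons mentioned for MIAME reduce to walking parent links. The following is a minimal Python sketch under that assumption; the `PARENT` dictionary, the lower-cased term names, and the functions `ancestors`, `is_a` and `lowest_common_ancestor` are hypothetical, not part of any MGED tool.

```python
from __future__ import annotations

# Child -> parent ISA links transcribed from the outline above.
PARENT = {
    "biomaterial state": "biomaterial description",
    "biosource descriptions": "biomaterial description",
    "biomaterial external influence": "biomaterial description",
    "biosource": "biomaterial state",
    "biosample": "biomaterial state",
    "labeled extract": "biomaterial state",
    "MGED concept descriptions": "biosource descriptions",
    "non-MGED concept descriptions": "biosource descriptions",
    "biomaterial preparation": "biomaterial external influence",
    "environmental history": "biomaterial external influence",
    "treatment": "biomaterial external influence",
}
for leaf in ("biosource provider", "sex", "age"):
    PARENT[leaf] = "MGED concept descriptions"
for leaf in ("organism", "developmental stage", "organism part", "strain or line",
             "genetic variation", "individual", "individual genetic characteristics",
             "disease state", "targeted cell type", "cell line", "clinical information"):
    PARENT[leaf] = "non-MGED concept descriptions"

TERMS = set(PARENT) | set(PARENT.values())


def ancestors(term: str) -> list[str]:
    """Superclasses of term, nearest first, ending at the root."""
    if term not in TERMS:
        raise KeyError(f"unknown term: {term!r}")
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term: str, ancestor: str) -> bool:
    """True if term equals ancestor or is a (transitive) subclass of it."""
    chain = ancestors(term)
    return term == ancestor or ancestor in chain


def lowest_common_ancestor(a: str, b: str) -> str:
    """Most specific class that both terms belong to."""
    b_line = {b, *ancestors(b)}
    for candidate in [a, *ancestors(a)]:
        if candidate in b_line:
            return candidate
    raise ValueError("terms do not share a root")  # unreachable in a single tree


if __name__ == "__main__":
    print(is_a("biosource", "biomaterial description"))  # True
    print(is_a("treatment", "biosource descriptions"))   # False
    print(ancestors("sex"))
    print(lowest_common_ancestor("sex", "organism"))     # biosource descriptions
```

The lowest common ancestor gives a crude measure of how closely two annotations are related: "sex" and "organism" meet at "biosource descriptions", whereas a biosource description and a treatment meet only at the root.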


Last updated August 7, 2001