( Draft; 4/24/01; by Martin Ringwald )
Introduction:
The preparation of this document was driven by the question: What are the biological parameters (pertinent to samples from mouse) that might influence expression levels, and how should we describe these parameters in databases?
First, we deal with information about organism, strain, genotype, sex, age, developmental stage, and tissue. Annotation is relatively straightforward in the mouse because controlled vocabularies have been developed to describe these parameters. These vocabularies and associated conventions are and will be widely used by the “mouse community” and by community databases such as the Mouse Genome Database (MGD) and the Mouse Gene Expression Database (GXD) (http://www.informatics.jax.org/). Use of the standardized descriptors enables not only data integration (which is important!) but also representation of complex biological experiments such as genetic breeding schemes – see below.
Next, we discuss the description of information on husbandry and experimental treatments of animals. These are areas where standardized descriptors and conventions have not yet been developed.
Experimental treatments are often used to produce phenotypes that are to be correlated with expression data. Some relevant annotation and data integration issues are discussed in the last part of this document.
Organism:
e.g. Mus musculus
Mus musculus (M.m.) constitutes the “house mouse” group. Classical inbred strains are mosaics of M.m. domesticus, M.m. musculus, M.m. castaneus, and perhaps M.m. bactrianus and are therefore best referred to as ‘laboratory mouse’ (used as a descriptor in addition to the taxonomic term ‘Mus musculus’).
Strain:
Inbred strains should be named according to the rules for nomenclature of inbred strains approved by the International Committee on Standardized Genetic Nomenclature for Mice (http://www.informatics.jax.org/mgihome/nomen/strains.shtml). These rules provide standardized descriptors for strains and substrains and, implicitly, breeding schemes.
Because the characteristics of a strain may be altered by environmental influences (e.g. diet, husbandry, physical environment, and the microbiological flora and fauna of the mice), unique laboratory registration codes assigned by the Institute for Laboratory Animal Resources (ILAR; http://www4.nas.edu/cls/afr.nsf) are included in strain names.
Examples:
CBA/HN, a substrain of CBA found by the National Institutes of Health (N) to be genetically different from CBA/H, a substrain originated by Harwell, from which it arose.
CXB, a set of recombinant inbred strains derived from a cross of BALB/c x C57BL
C = BALB/c; maternal progenitor strain
X indicates recombinant inbred strain
B = C57BL; paternal progenitor strain
CXB1, strain 1 from CXB set
DBA/Ha-Myo5ad, coisogenic strain: carries a mutation in the Myo5a gene and is otherwise identical to the parent strain DBA/Ha.
B10.129-H12b is a congenic strain with the genetic background of C57BL/10Sn (=B10), which differs from that strain by the introduction of a differential allele H12b derived from strain 129/J.
More information about mouse strains can be found at http://www.informatics.jax.org/menus/strain_menu.shtml
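The strain examples above suggest that a name can be split into its parts mechanically. The following is a minimal sketch, assuming only the three patterns illustrated in this section (congenic, coisogenic, and substrain names). The function name `parse_strain_name` and the returned field names are hypothetical, and the official nomenclature rules cover many more cases than this sketch handles.

```python
import re

def parse_strain_name(name):
    """Split a strain name into labelled parts (illustrative; not the full rules)."""
    # Congenic: <background>.<donor>-<differential allele>, e.g. B10.129-H12b
    m = re.fullmatch(r"([^./-]+)\.([^.-]+)-(.+)", name)
    if m:
        return {"kind": "congenic", "background": m.group(1),
                "donor": m.group(2), "allele": m.group(3)}
    # Coisogenic: <parent strain>-<mutant allele>, e.g. DBA/Ha-Myo5ad
    m = re.fullmatch(r"([^./-]+/[^-]+)-(.+)", name)
    if m:
        return {"kind": "coisogenic", "parent": m.group(1), "allele": m.group(2)}
    # Substrain: <strain>/<suffix>, e.g. CBA/HN
    m = re.fullmatch(r"([^./-]+)/([^/-]+)", name)
    if m:
        return {"kind": "substrain", "strain": m.group(1), "suffix": m.group(2)}
    # Anything else (e.g. recombinant inbred names such as CXB1) is left as-is.
    return {"kind": "unparsed", "name": name}

for example in ("B10.129-H12b", "DBA/Ha-Myo5ad", "CBA/HN", "CXB1"):
    print(example, "->", parse_strain_name(example))
```

Names that fit none of the three patterns are returned unparsed rather than guessed at, so a curator can review them by hand.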
Genetic Variation (Genotype):
Mutant Alleles should be named according to the rules approved by the International Committee on Standardized Genetic Nomenclature for Mice (http://www.informatics.jax.org/mgihome/nomen/table.shtml or http://www.informatics.jax.org/mgihome/nomen/allmut_help.shtml )
Please note: According to the nomenclature rules, gene and allele symbols should always be italicized in publications. However, since italics can be difficult to read in web pages, depending on the browser, they are not used in this web document.
Examples:
Wnt3avt is the allele symbol for the recessive (indicated by lower case) vestigial tail allele of the Wnt3a gene.
Atp7aMo is the allele symbol for the dominant (indicated by upper case) mottled allele of the Atp7a gene.
T39H is the 39th mutation of T from Harwell (H) (allelic series)
Cspg2Tg(Hoxa1)1Chm, the Tg(Hoxa1)1Chm allele of Cspg2. A construct containing the mouse homeobox a1 gene (Hoxa1) was inserted by microinjection; it is the first one reported by Corey H. Mjaatvedt (Chm).
Tlx1tm1Sjk, 1st targeted mutation of Tlx1 created in the laboratory of Stanley J Korsmeyer.
Alleles for a specific gene can be found by using the quick gene search at http://www.informatics.jax.org/ and following the link ‘Phenotypic Alleles’ on the query result page.
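Because superscripts are not shown in this document, the gene and allele parts of a symbol run together (e.g. Wnt3avt). The sketch below assumes a plain-text notation in which the allele part is written in angle brackets (Wnt3a<vt>); that notation, and the function names, are assumptions of this example rather than something defined in this document. The case-based hint follows the convention illustrated above for the vt and Mo alleles only, and is probably not meaningful for targeted (tm) or transgenic (Tg) designations.

```python
def split_allele_symbol(symbol):
    """Split 'Gene<allele>' into (gene, allele); assumes angle-bracket notation."""
    if not symbol.endswith(">") or "<" not in symbol:
        raise ValueError(f"expected Gene<allele>, got {symbol!r}")
    gene, allele = symbol[:-1].split("<", 1)
    if not gene or not allele:
        raise ValueError(f"empty gene or allele part in {symbol!r}")
    return gene, allele

def inheritance_hint(symbol):
    """Guess 'dominant' or 'recessive' from the case of the allele's first letter."""
    _, allele = split_allele_symbol(symbol)
    return "dominant" if allele[0].isupper() else "recessive"

print(split_allele_symbol("Wnt3a<vt>"), inheritance_hint("Wnt3a<vt>"))
print(split_allele_symbol("Atp7a<Mo>"), inheritance_hint("Atp7a<Mo>"))
```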
Sex:
Male, female, unknown, pooled
‘Unknown’ and ‘pooled’ are particularly relevant for developmental studies where the sex of embryos is not determined or where samples might be pooled from several embryos.
Age:
E19: embryonic day 19, i.e. 19 days post coitum (pc)
Staging convention: the morning / noon on which the vaginal plug is found is counted as day 0.5.
P10: postnatal day 10
Pw10: postnatal week 10, i.e. a 10 week old mouse
These are designations GXD uses. The more general scheme proposed by the MIAME document would work, too.
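The age designations above are regular enough to be parsed into structured values. The following is a minimal sketch; the function names and the returned field names are hypothetical and are not part of GXD. The second function applies the staging convention stated above (the plug day counts as day 0.5) and assumes that the sample is taken at about the same time of day on which the plug was found.

```python
import re
from datetime import date

_AGE = re.compile(r"(E|Pw|P)(\d+(?:\.\d+)?)")
_KINDS = {
    "E": ("embryonic", "days"),
    "P": ("postnatal", "days"),
    "Pw": ("postnatal", "weeks"),
}

def parse_age(designation):
    """Turn 'E19', 'P10' or 'Pw10' into a small structured record."""
    m = _AGE.fullmatch(designation)
    if m is None:
        raise ValueError(f"unrecognized age designation: {designation!r}")
    period, unit = _KINDS[m.group(1)]
    return {"period": period, "value": float(m.group(2)), "unit": unit}

def embryonic_day(plug_date, harvest_date):
    """Days pc on harvest_date, counting the day the plug is found as 0.5."""
    return (harvest_date - plug_date).days + 0.5

print(parse_age("E19"))
print(parse_age("Pw10"))
print(embryonic_day(date(2001, 1, 1), date(2001, 1, 20)))
```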
Developmental stage:
Theiler stages: staging of embryos according to appearance of anatomical structures (http://genex.hgu.mrc.ac.uk/Databases/Anatomy/MAstaging.shtml)
Theiler staging allows for comparison with human (Carnegie stages) and rat (Witschi stages).
Anatomical structure (tissue): (referred to as ‘organism part’ in MIAME document)
Use terms and IDs from the Mouse Anatomical Dictionary:
http://www.informatics.jax.org/mgihome/GXD/AD/
or
http://www.informatics.jax.org/mgihome/GXD/GEN/AD/
Husbandry:
(adapted from recommendations of the Mouse Phenome Project (www.jax.org/phenome) with modifications)
Feed (type/vendor/catalogue number)
Water (specific additives/treatments)
Bedding (type/vendor/catalogue number)
Photoperiod
Temperature
Pathogen free? Yes /No
If yes: state pathogens for which mice are tested
If no: state pathogens
Health status report (general statement from Investigator)
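Since no standardized descriptors exist yet for husbandry, one option is to capture the items listed above as a simple structured record. The following is a minimal sketch, not a proposed standard; the class name, the field names, and the completeness check are hypothetical, and the values in the usage example are placeholders rather than real data.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Husbandry:
    feed: str            # type/vendor/catalogue number
    water: str           # specific additives/treatments
    bedding: str         # type/vendor/catalogue number
    photoperiod: str
    temperature: str
    pathogen_free: bool
    # If pathogen_free: pathogens tested for; otherwise: pathogens present.
    pathogens: List[str] = field(default_factory=list)
    health_status_report: str = ""

    def missing_items(self):
        """Return the names of items that were left empty."""
        required = ("feed", "water", "bedding", "photoperiod", "temperature")
        missing = [name for name in required if not getattr(self, name).strip()]
        if not self.pathogens:
            missing.append("pathogens tested for" if self.pathogen_free
                           else "pathogens present")
        return missing

record = Husbandry(
    feed="<type/vendor/catalogue number>",
    water="<additives/treatments>",
    bedding="<type/vendor/catalogue number>",
    photoperiod="<light/dark cycle>",
    temperature="<room temperature>",
    pathogen_free=True,
)
print(record.missing_items())
```

The check only reports which items are absent; whether an absent item should block submission is left open here.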
Treatment:
a) brief description:
In example 2, the rate at which Angiotensin II was added (by infusion) is listed as a parameter. Because the rate is, in this case, an experimental parameter (i.e. the experiment includes a series of Angiotensin treatments with different rates), the rate information needs to be described in a structured way rather than as part of a ‘plain text’ protocol (assuming, for the sake of argument, that the protocol is a plain text description).
b) expanded description
for example 1:
Type: Diet: atherogenic diet; ref. to description of diet
Time of the day at the beginning of treatment: N/A
Date of the beginning of treatment: 01/12/2001
Duration of treatment: 6 weeks
Annotation of the time of the day at the beginning of treatment may not be necessary in experiments involving long term treatment. Otherwise, this parameter is important due to circadian rhythms.
Annotation of date is important because seasonal effects can influence gene expression in the mouse.
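The expanded description can be represented as a structured record with a check for the two time-related items discussed above. The following is a minimal sketch; the class and field names are hypothetical, and the one-week cutoff below which time of day is treated as required is an assumption of this example, not a value given in this document.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Assumed cutoff: treatments shorter than this need a time-of-day annotation.
TIME_OF_DAY_REQUIRED_BELOW = timedelta(days=7)

@dataclass
class Treatment:
    type: str
    duration: timedelta
    start_date: Optional[str] = None          # date of the beginning of treatment
    start_time_of_day: Optional[str] = None   # None corresponds to N/A

    def annotation_gaps(self):
        """List time-related items that should be annotated but are missing."""
        gaps = []
        if self.start_date is None:
            gaps.append("date of the beginning of treatment")
        short = self.duration < TIME_OF_DAY_REQUIRED_BELOW
        if short and self.start_time_of_day is None:
            gaps.append("time of day at the beginning of treatment")
        return gaps

diet = Treatment("Diet: atherogenic diet", timedelta(weeks=6),
                 start_date="01/12/2001")
print(diet.annotation_gaps())   # long-term treatment: time of day may be N/A

short_treatment = Treatment("food deprivation", timedelta(hours=4))
print(short_treatment.annotation_gaps())
```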
Experiment:
Examine gene expression changes in response to atherogenic diet in congenic strains that are susceptible or resistant to arteriosclerosis.
Relevant disorder: arteriosclerosis; D001161 (MeSH heading; MeSH ID)
Strain: B6.SPRET-Ath1r
Relevant phenotype: resistant to diet-induced arteriosclerosis
Trait measured: level of HDL; ref. to protocol; ref. to data
Criteria for resistance: increased levels of HDL after treatment
Strain: B6.SJL-Ath9s
Trait measured: level of HDL; ref. to protocol; ref. to data
Criteria for susceptibility: unchanged levels of HDL after treatment
Treatment 1:
Type: Diet: atherogenic diet; ref. to description of diet
Age of mouse at the beginning of treatment: Pw8
Date of the beginning of treatment: 01/12/2001
Duration of treatment: 6 weeks
Treatment 2:
Type: food deprivation
Time of day at the beginning of treatment: 6am
Duration of treatment: 4 hours
Treatment 3:
Type: Tissue harvesting
Time of day at the beginning of treatment: 10am
Duration of treatment: 2 min (N/A)
Ref. to protocol
This annotation example is, obviously, not complete. It is shown to illustrate the following points/ideas:
The annotation below would report all the factual data:
Strains used: B6.SPRET-Ath1r, B6.SJL-Ath9s
Treatment: Diet: atherogenic diet; ref. to protocol
(followed by other treatments listed above)
However, people unfamiliar with the specific research area would have difficulties understanding the essence of the experiment.
Indexing with disease terms using MeSH headings and MeSH IDs places the data in a larger context and makes them available for global and biomedically significant searches. If applicable, references to OMIM could be included as well.
Many phenotype data might be available for a specific strain. Therefore, it would be helpful to annotate or reference the specific phenotype that is relevant for the expression study.
Information about the measured trait defines the analyzed phenotype in more detail.
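To illustrate how disease indexing supports searches across experiments, the sketch below stores the example experiment as a small record and retrieves it by MeSH ID. The record layout and the function name are hypothetical; the second record is an invented placeholder included only to show that unindexed experiments are not returned.

```python
experiments = [
    {
        "title": "Expression changes in response to atherogenic diet",
        "disorders": [{"mesh_heading": "arteriosclerosis", "mesh_id": "D001161"}],
        "strains": ["B6.SPRET-Ath1r", "B6.SJL-Ath9s"],
    },
    {
        # Hypothetical placeholder with no disease indexing.
        "title": "Hypothetical unrelated experiment",
        "disorders": [],
        "strains": [],
    },
]

def find_by_mesh_id(records, mesh_id):
    """Return the records indexed with the given MeSH ID."""
    return [r for r in records
            if any(d["mesh_id"] == mesh_id for d in r["disorders"])]

for hit in find_by_mesh_id(experiments, "D001161"):
    print(hit["title"], hit["strains"])
```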