EpoDB (the "erythropoiesis database") is a tool that lets researchers studying red blood cells obtain information on genes of interest. It is also a working project for database management system development, for schema and database design for gene expression information, and for computational analysis of gene sequences and expression data. Accordingly, we have been working on the content and functionality of EpoDB as well as on interface tools for annotating and accessing the database.
A practical guide to the different web pages comprising the EpoDB web site is given in "Description of EpoDB pages." The intent and methodology for creation of EpoDB are described in "Purpose of EpoDB" and "Construction of EpoDB."
The EpoDB server can be viewed with or without "frames", depending on the capabilities of the web browser in use. Both versions have the same content; the frame version additionally lists links to the query pages.
The overall purpose of EpoDB is to provide a powerful tool for understanding the control of gene expression during erythropoiesis. In particular, the database is being built to study the organization of transcription regulatory elements and the dynamics of transcription factor interactions with those elements. To accomplish this, the database will hold both structural and gene expression information, represented in novel schemas and integrated with data analysis tools and algorithms such as BLAST and GenLang. Information is drawn from GenBank, Swiss-Prot, Transfac/TRRD entries, and the literature. The quality of the data obtained from GenBank is improved by removing redundancy and eliminating syntactic and semantic errors. EpoDB supports more powerful queries than GenBank because of its greater structure, consistent and uniform annotation, and controlled vocabularies. Collaborative arrangements that allow for secure data will also be available.
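The exact redundancy-removal procedure is not described here. As a minimal sketch only, assuming that an entry counts as redundant when its sequence is identical to, or an exact substring of, a longer entry's sequence, such a check could look like the following Python. The function name `find_redundant` and the accession strings are hypothetical.

```python
"""Hypothetical sketch: flag redundant nucleotide entries.

Assumption (not EpoDB's documented procedure): an entry is redundant if its
sequence is identical to, or an exact substring of, a longer entry's sequence
on the same strand.
"""
from typing import Dict, List, Tuple


def find_redundant(entries: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (redundant_id, covering_id) pairs.

    `entries` maps a hypothetical accession string to its sequence.
    Longer sequences are examined first, so every redundant entry is paired
    with a kept (non-redundant) entry that contains it. Checking kept entries
    is enough because exact containment is transitive.
    """
    # Sort by descending length, then by id, for deterministic output.
    ordered = sorted(entries.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    redundant: List[Tuple[str, str]] = []
    kept: List[Tuple[str, str]] = []
    for acc, seq in ordered:
        s = seq.upper()  # compare case-insensitively
        cover = next((k for k, ks in kept if s in ks), None)
        if cover is None:
            kept.append((acc, s))
        else:
            redundant.append((acc, cover))
    return redundant
```

Exact containment is a deliberately simple criterion: it ignores the reverse-complement strand and near-identical sequences that differ by sequencing errors, both of which a real curation step would need to handle.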
Currently, EpoDB contains approximately 3715 GenBank and 1500 Swiss-Prot entries. The schematic representation of the entries is still evolving; however, a first-pass schema has been completed. First-pass controlled vocabularies for gene names and gene family names have also been constructed. Present capabilities are to extract features and subsequences (e.g., retrieve the proximal promoter region, -500 to +20 relative to the transcription start site, for all beta-globin genes) and to find transcription factor motifs using TESS. Efforts are underway to remove redundant gene entries, to incorporate "virtual" sequences that link related but non-contiguous sequences, and to increase gene expression annotation. Future plans are to expand the available queries to take advantage of the protein, gene regulation, and gene expression information being entered into EpoDB. EpoDB focuses on gene expression during red cell development and serves as a model database that can easily be extended to other hematopoietic lineages.
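The promoter-extraction capability can be illustrated with a short Python sketch. It assumes the transcription start site (TSS) is known as a 1-based forward-strand coordinate, numbers the TSS as +1 with no position 0, and returns the -500..+20 window 5' to 3' on the gene's own strand. The motif scan is a plain IUPAC consensus match, a hypothetical stand-in chosen for brevity; it is not the weight-matrix method TESS uses. All function names are illustrative.

```python
"""Hypothetical sketch: extract a -500..+20 proximal promoter and scan it
for an IUPAC consensus motif.

Assumptions: the TSS is a 1-based position on the forward strand of `seq`;
+1 is the TSS and -1 the base immediately upstream (no position 0).
The consensus scan is a plain pattern match, NOT the TESS method.
"""
import re
from typing import List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Subset of IUPAC nucleotide codes mapped to regex character classes.
_IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter(seq: str, tss: int, strand: str = "+",
             upstream: int = 500, downstream: int = 20) -> str:
    """Return the region -upstream..+downstream around the TSS, read 5'->3'."""
    if strand == "+":
        # 0-based, half-open slice: TSS sits at index tss - 1.
        start, end = tss - 1 - upstream, tss - 1 + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher forward coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(seq):
        raise ValueError("promoter window runs past the sequence ends")
    region = seq[start:end]
    return region if strand == "+" else revcomp(region)


def scan(region: str, consensus: str) -> List[int]:
    """Return 0-based start offsets of consensus matches in `region`."""
    pattern = "".join(_IUPAC[c] for c in consensus.upper())
    # Zero-width lookahead so overlapping matches are all reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", region.upper())]
```

For example, `scan(promoter(seq, tss, "-"), "WGATAR")` would list candidate GATA-type sites in the promoter of a minus-strand gene; the WGATAR consensus is only an example query.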
Virtual Entries in EpoDB
Virtual entries are the products of merges between two or more GenBank
entries. They are constructed to link the fragments of a gene, and/or the
syntenic members of a gene family, into a single entry. The syntax for
creating virtual entries is essentially the GenBank "join" location operator,
extended with "gap" elements that represent missing sequence of indeterminate
length. Thus, a query for human alpha-spectrin will return a single entry
consisting of 50 ordered exons instead of 50 entries containing individual
exons. Likewise, a query for the human alpha-globin genes will return a single
entry spanning zeta to theta (with intergenic gaps) instead of several entries,
including duplicates. A sample virtual entry for the goat alpha-globin gene
cluster is given below in Prolog format. Each piece of the join is labeled
(za, zb, a1 and a2 for sequence; g1, g2 and g3 for gaps), and feature
locations are written relative to these labels, e.g. 'za:216..356'.
class(entry_id(142526),yes,ref,virt,0,0,0).
na_seq_info(oid(142526),
    [locus_id([]),
     accession([]),
     na_seq_id(sid(142527)),
     length(10000),
     strand(text("ds")),
     mol(text("dna")),
     def(text("goat alpha-globin cluster")),
     extra_accessions([]),
     source([]),
     keywords([]),
     origin([]),
     date(std([year(1995),month(9),day(12)])),
     div(text("mam")),
     taxonomy(text("Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia; Theria; Eutheria; Artiodactyla; Ruminantia; Pecora; Bovidae.")),
     organism(text("Capra hircus"))]).
comments(entry_id(142526),[text("contains the zeta, I alpha, and II alpha-globin genes")]).
na_seq(entry_id(142526),
    na_seq_id(sid(142527)),
    join([ za - gi(946):'1..568',
           g1 - gap,
           zb - gi(948):'1..1247',
           g2 - gap,
           a1 - gi(164123):'1..1894',
           g3 - gap,
           a2 - gi(164125):'1..1691' ]) ).
can_feature( source, [location('za:1..a2:1691'),
    can_feat_id(142554),
    organism('Capra hircus'),
    change(date(1995,9,27),author(cjs),[del(type(new)),add(type(source)),
        add(location('za:1..a2:1691')),add(organism('Capra hircus'))]
    )] ).
can_feature(can_feat_id(142540),
    entry_id(142526),
    type(cluster),
    location('za:1..a2:1691'),
    quals([gene_family_name([globin,'globin, alpha-like']),
           components([can_feat_id(142528),can_feat_id(123846),can_feat_id(123858)])]),
    change([])).
can_feature(can_feat_id(142528),
    entry_id(142526),
    type(transcript_unit),
    location('za:1..zb:1247'),
    quals([gene_name(['zeta-globin']),
           gene_family_name([globin,'globin, alpha-like']),
           spatio_temporal_transcription([]),
           components([can_feat_id(142529),can_feat_id(142530),can_feat_id(142531),can_feat_id(142532),
                       can_feat_id(142533),can_feat_id(142534),can_feat_id(142535),can_feat_id(142536),
                       can_feat_id(142537),can_feat_id(142538),can_feat_id(142539)]),
           parents([])]),
    change([])).
can_feature(can_feat_id(142529),
    entry_id(142526),
    type(mRNA_boundaries),
    location('za:216..zb:1020'),
    quals([]),
    change([])).
can_feature(can_feat_id(142530),
    entry_id(142526),
    type(mRNA),
    location(join('za:216..356','zb:538..742','zb:852..1020')),
    quals([]),
    change([])).
can_feature(can_feat_id(142531),
    entry_id(142526),
    type(exon),
    location('za:216..356'),
    quals([]),
    change([])).
can_feature(can_feat_id(142533),
    entry_id(142526),
    type('CDS'),
    location('za:262..356,zb:366..570,zb:809..934'),
    quals([]),
    change([])).
can_feature(can_feat_id(142534),
    entry_id(142526),
    type('intron'),
    location('za:357..zb:365'),
    quals([]),
    change([])).
can_feature(can_feat_id(142535),
    entry_id(142526),
    type(exon),
    location('zb:366..570'),
    quals([]),
    change([])).
can_feature(can_feat_id(142536),
    entry_id(142526),
    type('intron'),
    location('zb:571..808'),
    quals([]),
    change([])).
can_feature(can_feat_id(142537),
    entry_id(142526),
    type(exon),
    location('zb:809..1020'),
    quals([]),
    change([])).
can_feature(can_feat_id(142532),
    entry_id(142526),
    type('5\'UTR'),
    location('za:216..261'),
    quals([]),
    change([])).
can_feature(can_feat_id(142538),
    entry_id(142526),
    type('3\'UTR'),
    location('zb:935..1020'),
    quals([]),
    change([])).
can_feature(can_feat_id(142539),
    entry_id(142526),
    type(misc_feature),
    location('zb:1015..1020'),
    quals([feat_id(10794),gb_note('polyA signal')]),
    change([])).
Participants in the Project
The EpoDB group at CBIL consists of:
Collaborators at the Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia in creating GERD: