Welcome to EpoDB


Overview

EpoDB stands for "erythropoiesis database." It is a tool that lets researchers studying red blood cells obtain information on genes of interest. It is also a working project for database management system development, schema and database design for gene expression information, and computational analysis of gene sequences and expression data. We have therefore been working on both the content and functionality of EpoDB and on interface tools for annotating and accessing the database.

A practical guide to the different web pages comprising the EpoDB web site is given in "Description of EpoDB pages." The intent and methodology for creation of EpoDB are described in "Purpose of EpoDB" and "Construction of EpoDB."

Description of EpoDB pages

The EpoDB server can be viewed with or without "frames," depending on the capabilities of the web browser in use. Both versions have the same content; the frame version additionally provides a listing of links to the query pages.

The Purpose of EpoDB

The overall purpose of EpoDB is to provide a powerful tool for understanding the control of gene expression during erythropoiesis. In particular, the database is being built to study the organization of transcription regulatory elements and the dynamics of transcription factor interactions with those elements. To accomplish this, the database will contain both structural and gene expression information, represented in novel schemas and integrated with data analysis tools and algorithms such as BLAST and GenLang. Information is drawn from GenBank, Swiss-Prot, Transfac/TRRD entries, and the literature. The quality of the data obtained from GenBank is improved by removing redundancy and eliminating syntactic and semantic errors. EpoDB supports more powerful queries than GenBank because of its greater structure, consistent and uniform annotation, and controlled vocabularies. Collaborations that allow for secure data will also be available.

Construction of EpoDB

Currently, EpoDB contains approximately 3715 GenBank and 1500 Swiss-Prot entries. The schema for representing entries is still evolving, but a first-pass schema has been completed. First-pass controlled vocabularies for gene names and gene family names have also been constructed. Present capabilities are to extract features and subsequences (e.g., retrieve the proximal promoter region, -500 to +20 around the start of transcription, for all beta-globin genes) and to find transcription factor motifs using TESS. Efforts are underway to remove redundant gene entries, to incorporate "virtual" sequences linking related but non-contiguous sequences, and to increase gene expression annotation. Future plans are to expand the available queries to take advantage of the protein, gene regulation, and gene expression information being entered into EpoDB. EpoDB focuses on gene expression during red cell development and serves as a model database that can be easily extended to other hematopoietic lineages.
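
The promoter-region extraction mentioned above can be illustrated with a minimal Prolog sketch. It is not EpoDB's actual implementation: the predicate names (rel_to_abs/3, region/5, proximal_promoter/3) are hypothetical, SWI-Prolog is assumed, and the sketch assumes the usual convention that +1 is the transcription start site and that there is no position 0, so that -500 to +20 covers 520 bases.

```prolog
% Minimal sketch (hypothetical predicate names, not EpoDB's own code).
% Assumes 1-based absolute positions, +1 = transcription start site (TSS),
% and no position 0 in TSS-relative coordinates.

% rel_to_abs(+TSS, +Rel, -Abs): map a TSS-relative coordinate to an
% absolute 1-based position. Fails for Rel = 0, which is not a valid
% coordinate under the assumed convention.
rel_to_abs(TSS, Rel, Abs) :- Rel > 0, Abs is TSS + Rel - 1.
rel_to_abs(TSS, Rel, Abs) :- Rel < 0, Abs is TSS + Rel.

% region(+Seq, +TSS, +RelFrom, +RelTo, -Sub): Sub is the part of the atom
% Seq from RelFrom to RelTo (inclusive), clipped to the ends of Seq.
region(Seq, TSS, RelFrom, RelTo, Sub) :-
    rel_to_abs(TSS, RelFrom, From0),
    rel_to_abs(TSS, RelTo, To0),
    atom_length(Seq, Len),
    From is max(1, From0),
    To is min(Len, To0),
    From =< To,
    Before is From - 1,            % sub_atom/5 counts characters before Sub
    SubLen is To - From + 1,
    sub_atom(Seq, Before, SubLen, _, Sub).

% proximal_promoter(+Seq, +TSS, -Sub): the -500 to +20 window around TSS.
proximal_promoter(Seq, TSS, Sub) :-
    region(Seq, TSS, -500, 20, Sub).
```

With a toy sequence whose transcription start is at position 7, the query `region(aaacccgggttt, 7, -3, 2, S)` binds S to `cccgg`. If fewer than 500 bases are available upstream, the window is simply clipped to the start of the sequence.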


Entries in EpoDB:
Entries were identified by keyword searches of GenBank and Swiss-Prot, both for erythroid genes in general and for specific genes of interest. Entries were individually checked and included only if their products are expressed in, act as ligands for, or are transported into vertebrate red cells. Duplicate entries were removed by taking the set union of the individual searches. To remove some unwanted entries arising from substring matches, separate searches for keywords such as myoglobin, uteroglobin, and nonerythroid were performed, and the set difference (wanted minus unwanted) was then taken.
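
The union and set-difference steps described above can be sketched in Prolog as follows. This is only an illustration under stated assumptions: the search_hits/2 facts and the accession names in them are invented placeholders rather than real search results, the predicate names are hypothetical, and SWI-Prolog's library(lists) is assumed.

```prolog
:- use_module(library(lists)).   % member/2, subtract/3 (SWI-Prolog)

% Hypothetical hit lists returned by individual keyword searches.
% The accession names are placeholders, not real GenBank accessions.
search_hits(erythroid,   [acc_a, acc_b, acc_c]).
search_hits(globin,      [acc_b, acc_d, acc_e, acc_f]).
search_hits(myoglobin,   [acc_e]).   % unwanted substring match on "globin"
search_hits(uteroglobin, [acc_f]).   % unwanted substring match on "globin"

% all_hits(+Keywords, -Set): set union of the hit lists for Keywords.
all_hits(Keywords, Set) :-
    findall(Acc,
            ( member(K, Keywords),
              search_hits(K, Hits),
              member(Acc, Hits) ),
            Accs),
    sort(Accs, Set).                 % sort/2 also removes duplicates

% candidate_entries(+WantedKs, +UnwantedKs, -Entries):
% the wanted set minus the unwanted set.
candidate_entries(WantedKs, UnwantedKs, Entries) :-
    all_hits(WantedKs, Wanted),
    all_hits(UnwantedKs, Unwanted),
    subtract(Wanted, Unwanted, Entries).
```

For the placeholder data, `candidate_entries([erythroid, globin], [myoglobin, uteroglobin], E)` binds E to `[acc_a, acc_b, acc_c, acc_d]`. The individual checking of each remaining entry described above is a manual step and is not modeled here.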


Virtual Entries in EpoDB:
Virtual entries are the products of merges between two or more GenBank entries. They are constructed to link fragments of a gene, and/or syntenic members of a gene family, into a single entry. The syntax for creating virtual entries is essentially the GenBank "join" feature with the addition of "gap"s to represent missing sequence of indeterminate length. Thus, a query for human alpha-spectrin will return a single entry consisting of 50 ordered exons instead of 50 entries containing individual exons. Likewise, a query for human alpha globin genes will return a single entry from zeta to theta (with intergenic gaps) instead of several entries including duplicates. A sample virtual entry for the goat alpha globin gene cluster is given below in Prolog format.

class(entry_id(142526),yes,ref,virt,0,0,0).

na_seq_info(oid(142526),
            [locus_id([]),
             accession([]),
             na_seq_id(sid(142527)),
             length(10000),
             strand(text("ds")),
             mol(text("dna")),
             def(text("goat alpha-globin cluster")),
             extra_accessions([]),
             source([]),
             keywords([]),
             origin([]),
             date(std([year(1995),month(9),day(12)])),
             div(text("mam")),
             taxonomy(text("Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia; Theria; Eutheria; Artiodactyla; Ruminantia; Pecora; Bovidae.")),
             organism(text("Capra hircus"))]).

comments(entry_id(142526),[text("contains the zeta, I alpha, and II alpha-globin genes")]).

na_seq(entry_id(142526),
       na_seq_id(sid(142527)),
       join([ za - gi(946):'1..568',
              g1 - gap,
              zb - gi(948):'1..1247',
              g2 - gap,
              a1 - gi(164123):'1..1894',
              g3 - gap,
              a2 - gi(164125):'1..1691' ]) ).

can_feature( source,         [location('za:1..a2:1691'),
                                can_feat_id(142554),
                                organism('Capra hircus'),
                                change(date(1995,9,27),author(cjs),[del(type(new)),add(type(source)),
                                    add(location('za:1..a2:1691')),add(organism('Capra hircus'))]
                                    )] ).

can_feature(can_feat_id(142540),
            entry_id(142526),
            type(cluster),
            location('za:1..a2:1691'),
            quals([gene_family_name([globin,'globin, alpha-like']),
                   components([can_feat_id(142528),can_feat_id(123846),can_feat_id(123858)])]),
            change([])).

can_feature(can_feat_id(142528),
            entry_id(142526),
            type(transcript_unit),
            location('za:1..zb:1247'),
            quals([gene_name(['zeta-globin']),
                   gene_family_name([globin,'globin, alpha-like']),
                   spatio_temporal_transcription([]),
                   components([can_feat_id(142529),can_feat_id(142530),can_feat_id(142531),can_feat_id(142532),
                               can_feat_id(142533),can_feat_id(142534),can_feat_id(142535),can_feat_id(142536),
                               can_feat_id(142537),can_feat_id(142538),can_feat_id(142539)]),
                   parents([])
                  ]),
            change([])).

can_feature(can_feat_id(142529),
            entry_id(142526),
            type(mRNA_boundaries),
            location('za:216..zb:1020'),
            quals([]),
            change([])).

can_feature(can_feat_id(142530),
            entry_id(142526),
            type(mRNA),
            location(join('za:216..356','zb:538..742','zb:852..1020')),
            quals([]),
            change([])).

can_feature(can_feat_id(142531),
            entry_id(142526),
            type(exon),
            location('za:216..356'),
            quals([]),
            change([])). 

can_feature(can_feat_id(142533),
            entry_id(142526),
            type('CDS'),
            location('za:262..356,zb:366..570,zb:809..934'),
            quals([]),
            change([])).

can_feature(can_feat_id(142534),
            entry_id(142526),
            type('intron'),
            location('za:357..zb:365'),
            quals([]),
            change([])).
   
can_feature(can_feat_id(142535),
            entry_id(142526),
            type(exon),
            location('zb:366..570'),
            quals([]),
            change([])). 

can_feature(can_feat_id(142536),
            entry_id(142526),
            type('intron'),
            location('zb:571..808'),
            quals([]),
            change([])).

can_feature(can_feat_id(142537),
            entry_id(142526),
            type(exon),
            location('zb:809..1020'),
            quals([]),
            change([])). 

can_feature(can_feat_id(142532),
            entry_id(142526),
            type('5\'UTR'),
            location('za:216..261'),
            quals([]),
            change([])).

can_feature(can_feat_id(142538),
            entry_id(142526),
            type('3\'UTR'),
            location('zb:935..1020'),
            quals([]),
            change([])).

can_feature(can_feat_id(142539),
            entry_id(142526),
            type(misc_feature),
            location('zb:1015..1020'),
            quals([feat_id(10794),gb_note('polyA signal')]),
            change([])).
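
As a rough illustration of how facts in this format might be queried, the sketch below repeats the na_seq/3 fact and the three exon features from the sample entry, then parses single-segment locations and maps segment labels back to their source sequences. The helper predicates (split_once/4, parse_span/2, exon_length/3, segment_source/3) are hypothetical names introduced here and are not part of EpoDB. SWI-Prolog is assumed, both for atom_number/2 and for `:` being a predefined operator.

```prolog
:- use_module(library(lists)).   % member/2 (SWI-Prolog)

% Facts copied from the sample entry above.
na_seq(entry_id(142526),
       na_seq_id(sid(142527)),
       join([ za - gi(946):'1..568',
              g1 - gap,
              zb - gi(948):'1..1247',
              g2 - gap,
              a1 - gi(164123):'1..1894',
              g3 - gap,
              a2 - gi(164125):'1..1691' ]) ).

can_feature(can_feat_id(142531), entry_id(142526), type(exon),
            location('za:216..356'), quals([]), change([])).
can_feature(can_feat_id(142535), entry_id(142526), type(exon),
            location('zb:366..570'), quals([]), change([])).
can_feature(can_feat_id(142537), entry_id(142526), type(exon),
            location('zb:809..1020'), quals([]), change([])).

% split_once(+Atom, +Sep, -Left, -Right): split Atom at the first Sep.
split_once(Atom, Sep, Left, Right) :-
    sub_atom(Atom, B, _, A, Sep), !,
    sub_atom(Atom, 0, B, _, Left),
    sub_atom(Atom, _, A, 0, Right).

% parse_span(+Loc, -Span): parse a location of the form 'seg:start..end'.
% Only this single-segment form is handled by the sketch.
parse_span(Loc, span(Seg, Start, End)) :-
    split_once(Loc, ':', Seg, Range),
    split_once(Range, '..', S, E),
    atom_number(S, Start),
    atom_number(E, End).

% exon_length(?EntryId, ?FeatId, -Len): length in bases of an exon feature.
exon_length(EntryId, FeatId, Len) :-
    can_feature(can_feat_id(FeatId), entry_id(EntryId), type(exon),
                location(Loc), _, _),
    parse_span(Loc, span(_, Start, End)),
    Len is End - Start + 1.

% segment_source(?EntryId, ?Seg, -Gi): source sequence behind a segment
% label of the virtual entry; gap segments have no source and are skipped.
segment_source(EntryId, Seg, Gi) :-
    na_seq(entry_id(EntryId), _, join(Parts)),
    member(Seg - (Gi:_Range), Parts).
```

Under these assumptions, `exon_length(142526, F, L)` enumerates the three zeta-globin exons with lengths 141, 205, and 212, and `segment_source(142526, zb, G)` binds G to `gi(948)`.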

Participants in the Project

The EpoDB group at CBIL consists of:

Collaborators at the Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia in creating GERD: