
There are two types of things you can use EpoDB for.
What makes EpoDB different from other databases is that we:
Find entries in EpoDB based on:
So the Result table gave me the mouse erythropoietin receptor. Is there any way I can get the
human?
The answer is pretty much the same as the answer to the previous question. Try the "text search"
query. Type "erythropoietin receptor" (including the quotes) and you'll get 17 DNA entries or
3 protein entries, which include the human gene.
What's with all the id numbers and computerese? Can't you just give us the relevant
information?
Sorry. The next version of EpoDB will get rid of all that stuff. It helps us keep track of
whether things are working right but you don't need to see it.
I'll have to take your word that the graphical version is spiffy because it never worked on
my computer. Any suggestions?
Yes, use Netscape 4.05 on a PC or UNIX workstation. The problem is that the programming language we use
to make the graphical views (Java) needs the latest version to work right; the Mac version
only sort of works.
Why are you showing the GenBank information twice?
The two versions (EpoDB's and the original GenBank) are not the same. For example, the GenBank
version does not come with 5'UTR and 3'UTR features. You have to figure them out, which is what we
did with a computer program. So our version gives you the COMPLETE annotated gene. The GenBank
version is provided so you can see what we based our version on. Sometimes it's an opportunity
to show off because we can deduce a lot from minimal GenBank data. Other times, our program
doesn't work (like on the non-reference genes), so you can at least get the GenBank info.
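To give a feel for what "figuring out" the UTRs involves, here is a minimal Python sketch. It is not the EpoDB program. The function name infer_utrs is made up for illustration, and it assumes the simplest possible case: a single-exon, forward-strand transcript with 1-based inclusive coordinates for the mRNA and its coding region (CDS).

```python
def infer_utrs(mrna_start, mrna_end, cds_start, cds_end):
    """Return (five_prime_utr, three_prime_utr) as 1-based inclusive ranges.

    Hypothetical helper, for illustration only. Assumes one exon on the
    forward strand. A UTR is None when the CDS reaches that end of the mRNA.
    """
    if not (mrna_start <= cds_start <= cds_end <= mrna_end):
        raise ValueError("the CDS must lie inside the mRNA")
    # Everything transcribed before the CDS is 5'UTR.
    five_prime = (mrna_start, cds_start - 1) if cds_start > mrna_start else None
    # Everything transcribed after the CDS is 3'UTR.
    three_prime = (cds_end + 1, mrna_end) if cds_end < mrna_end else None
    return five_prime, three_prime
```

Real genes have introns and may sit on the reverse strand, so a working program has to do this exon by exon.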
Retrieval of sequence for gene and gene family entries
as specified by:
Search gene sequences for:
First we'll get the proximal promoters of fetally-expressed gamma-globin genes, then get the promoters for embryonically-expressed gamma-globin genes, and then align them together.
Ta Dah! You should get an "EpoDB Query Result" page with the information you specified listed at the top.
This is followed by a table with 10 entries; the first two are sheep and goat, followed by G-gamma and A-gamma globin promoters for
chimpanzee (Pan), gorilla, orangutan (Pongo), and human. The gamma-globin gene was duplicated and recruited for fetal
expression in old world apes. The beta-like globin genes in ungulates (sheep, goats, cows)
went through a different type of duplication. In fact, the whole cluster (epsilon, gamma, eta, and beta) was triplicated and one
of the beta gene descendants was recruited for fetal expression.
For our analysis, let's stick to the apes to keep things simple and focused on the gamma-globin gene.
Now you've got a table with 37 entries! As before, we will just focus on the gamma-globin genes.
Note that not all the entries are 200 bp long, even though that's what was requested. The EpoDB query returns
whatever it can in the range specified, so if only 126 bp are available for lemur (#14), that's what you get.
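The "returns whatever it can" behaviour can be sketched in a few lines of Python. This is only an illustration, not EpoDB's query code. The function name upstream_region is hypothetical, and it assumes the promoter is taken as the bases just upstream of a 0-based start index in the stored sequence.

```python
def upstream_region(sequence, start_index, length=200):
    """Return up to `length` bases immediately upstream of `start_index`.

    Hypothetical helper, for illustration only. If the entry holds fewer
    than `length` bases upstream, the result is simply shorter.
    """
    if start_index < 0 or start_index > len(sequence):
        raise ValueError("start_index is outside the sequence")
    begin = max(0, start_index - length)  # clip at the start of the entry
    return sequence[begin:start_index]


# A toy entry with only 126 bases upstream of the start still gives a result.
short_entry = "A" * 126 + "ATGGTG"
print(len(upstream_region(short_entry, 126)))
```

Clipping instead of failing is why the result table can mix 200 bp entries with shorter ones.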
OK. Now you have the embryonic promoters for gamma-globin genes (a rabbit, a lemur, and some monkeys) in a box ready to be aligned. Add the fetal globin promoters to the list and hit the "perform alignment" button.
Presto! You've got an alignment of the gamma promoters; the top 5 sequences are the embryonic ones, followed by the 8 fetal ones. The "+" at the bottom indicates complete conservation, which you can see around the "TATA" (actually the AATAAA box in this case) and upstream of the CCAAT box. Note that just upstream of the TATA box is a sequence "GGCGGCTGGCT" which is well conserved among the fetal 8 but not in the embryonic 5! Could this be something special, like a potential transcription factor binding site?
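If you're curious how a "+" line like that can be produced, here is a minimal Python sketch. It is not the alignment program behind the button. The function name conservation_line is made up, and it assumes the sequences are already aligned to the same length with "-" for gaps.

```python
def conservation_line(aligned):
    """Return a string with '+' under every fully conserved column.

    Hypothetical helper, for illustration only. `aligned` is a list of
    equal-length aligned sequences; a column containing a gap ('-')
    is never counted as conserved.
    """
    if not aligned:
        return ""
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*(seq.upper() for seq in aligned)):
        first = column[0]
        conserved = first != "-" and all(base == first for base in column)
        marks.append("+" if conserved else " ")
    return "".join(marks)


# Toy alignment (made-up sequences, not EpoDB data).
print(conservation_line(["ACGT", "ACGA", "AC-T"]))
```

Running the same function on the 8 fetal sequences alone and then on all 13 is one way to spot a stretch that is conserved only in the fetal group.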
Let's check it out.
This time instead of selecting entries to align we will use the TESS link.
A new browser window pops up with our sequence ready to be scanned for potential transcription factor binding sites using TESS (Transcription Element Search Software).
Wow. Of the 649 human transcription factors in the Transfac database, 416 had binding sites which could be used to search our
sequence. These are listed at the bottom of that page. Let's focus on our region of interest, which is around 00151.
Hey, there's an AP-2 site (T00035) which overlaps our sequence "GGCCGGCGGCgGG." Maybe it's involved in the fetal recruitment.
It's back to the lab to find out, but you get the idea as to why retrieving sequences and analysing them can be useful.
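For the simplest flavour of this kind of scan, here is a small Python sketch that looks for exact copies of a site string in a sequence. This is not how TESS works internally. The function name find_sites is hypothetical, the toy sequence is made up, and a real search against Transfac would also have to allow mismatches and check the reverse strand.

```python
def find_sites(sequence, motif):
    """Return the 1-based start positions of every exact match of `motif`.

    Hypothetical helper, for illustration only. Matching ignores case,
    overlapping matches are reported, and only the given strand is scanned.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    sequence = sequence.upper()
    motif = motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert the 0-based index to 1-based
        start = sequence.find(motif, start + 1)
    return positions


# Toy sequence containing the conserved stretch from the alignment twice.
toy = "ttGGCGGCTGGCTaaGGCGGCTGGCT"
print(find_sites(toy, "GGCGGCTGGCT"))
```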
Let me know if you find out anything!
-Chris Stoecker