Description of the Trypanosoma brucei Database of Clustered ESTs


 

Generation of Database

The Trypanosoma Database was generated by aligning 3983 Trypanosoma sequences contained in dbEST as of November 22, 1998. Before alignment, the sequences were sorted by length, poly(A) tracts were trimmed, and regions at the 5' and 3' ends containing more than 25% N's were removed. Sequences shorter than 20 bp were then eliminated, leaving 3983 sequences for alignment. These sequences were aligned using the cap2 program (Xiaoqiu Huang, 1996, Genomics 33, 21-31). The cap2 output file was then parsed to generate the files containing the library of consensus sequences and the clustering information.
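The cleanup steps described above can be sketched roughly as follows. This is only an illustration, not the code used to build the database: the window size used to measure N content, the minimum poly(A) run length, the handling of leading poly(T) runs, and the longest-first sort order are all assumptions, since the description gives only the 25% N threshold and the 20 bp cutoff. All function names are hypothetical.

```python
import re

# Assumed parameters. Only MAX_N_FRACTION and MIN_LENGTH come from the description.
WINDOW = 20            # hypothetical window size for measuring N content at each end
MAX_N_FRACTION = 0.25  # regions with more than 25% N's are removed
MIN_LENGTH = 20        # sequences shorter than 20 bp are eliminated
MIN_POLYA = 10         # hypothetical minimum run length treated as a poly(A) tract

POLYA_TAIL = re.compile(r"A{%d,}$" % MIN_POLYA)
POLYT_HEAD = re.compile(r"^T{%d,}" % MIN_POLYA)


def trim_polya(seq):
    """Remove a trailing poly(A) run and a leading poly(T) run (assumed handling)."""
    seq = POLYA_TAIL.sub("", seq)
    return POLYT_HEAD.sub("", seq)


def n_rich(window):
    """Return True if more than MAX_N_FRACTION of the window is N."""
    return len(window) > 0 and window.count("N") / len(window) > MAX_N_FRACTION


def trim_n_ends(seq):
    """Drop terminal windows at the 5' and 3' ends while they exceed the N threshold."""
    while seq and n_rich(seq[:WINDOW]):
        seq = seq[WINDOW:]
    while seq and n_rich(seq[-WINDOW:]):
        seq = seq[:-WINDOW]
    return seq


def preprocess(records):
    """Take a dict of id -> sequence and return a list of (id, cleaned sequence)."""
    # Sort by original length first; longest-first is an assumption.
    ordered = sorted(records.items(), key=lambda item: len(item[1]), reverse=True)
    cleaned = []
    for seq_id, seq in ordered:
        seq = trim_n_ends(trim_polya(seq.upper()))
        if len(seq) >= MIN_LENGTH:
            cleaned.append((seq_id, seq))
    return cleaned
```

The cleaned sequences would then be written out in whatever format the assembler expects before running cap2.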


Description of the Database

The Trypanosoma consensus database contains 2496 consensus sequences, 2041 of which are singletons; the remaining 455 were assembled from two or more ESTs. Four different cDNA libraries were used as templates for sequencing.
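Counts like these can be recomputed from the clustering information. The sketch below assumes the clustering has already been loaded into a dictionary mapping each consensus identifier to the list of ESTs assembled into it; that layout and the identifiers shown are hypothetical, since the format of the clustering file is not described here.

```python
def summarize_clusters(clusters):
    """Count consensus sequences, singletons, and multi-member clusters.

    clusters: dict mapping a consensus id to the list of EST ids in that cluster
    (a hypothetical in-memory layout).
    """
    total = len(clusters)
    singletons = sum(1 for members in clusters.values() if len(members) == 1)
    return {
        "consensus": total,
        "singletons": singletons,
        "multi_member": total - singletons,
    }


# Toy data with made-up identifiers, for illustration only.
example = {
    "cons_1": ["est_a", "est_b", "est_c"],
    "cons_2": ["est_d"],
    "cons_3": ["est_e"],
}
summary = summarize_clusters(example)
```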


Analyses of the Database

Several analyses were run in an attempt to assign relatedness or function to the consensus sequences. The results are presented here in text form. The files are quite large, so they are better accessed with a text search or through the individual sequences, which link to the analyses from within the database when information is available.


Questions and complaints to brunkb@cbil.humgen.upenn.edu
Contents by Brian Brunk.