Version: 3.44

Date: 2007/03/01

Information on EPConDB Clone Sequencing

The Endocrine Pancreas Consortium has generated over 100,000 ESTs from over 20 mouse and human pancreatic and islet libraries. This effort is directed toward the identification of novel transcripts and their incorporation into the custom microarray chip (PancChip). Clones have been submitted to the IMAGE consortium for distribution, and all ESTs have been submitted to dbEST.

Genes expressed in pancreatic tissues were identified as follows. All mouse (August 19, 2002) and human (October 2, 2002) ESTs were obtained from dbEST and used, along with identifiable mRNAs in GenBank, as part of the DoTS v.5 build. Sequences were trimmed of poor-quality regions and poly(A) tails, repeats were masked, and the sequences were then clustered by running an all-against-all BLASTN comparison. Clusters were formed by a connected-components analysis of all BLASTN matches meeting minimum cutoffs of 92% identity and 40 bp length. The clusters were assembled into consensus sequences using the CAP4 algorithm packaged with the Paracel Transcript Assembler (www.paracel.com). The resulting consensus sequences were then clustered with BLASTN to form "genes" based on 95% identity and 75 bp overlap. Each assembly consensus sequence is assigned a stable DT. identifier and subjected to a number of automated annotations, including assignment of Gene Ontology functions and BLASTX homology to the non-redundant database (nrdb) of protein sequences at NCBI. Upon completion of the DoTS build, all assemblies containing an EST from one of the Consortium cDNA libraries were identified to generate the data presented here. Clone information was not used during the DoTS build process because errors in clone assignments lead to chimeric assemblies. Clone information was instead used to form assembly groups, which compress DoTS assemblies to better represent transcripts.
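
The connected-components clustering step can be illustrated with a minimal Python sketch. This is not the DoTS build code. It assumes the BLASTN matches have already been parsed into (query, subject, percent identity, match length) tuples, and the function name, tuple layout, and sequence identifiers are hypothetical. Only the 92% identity and 40 bp cutoffs come from the description above.

```python
from collections import defaultdict

MIN_IDENTITY = 92.0  # percent identity cutoff given in the text
MIN_LENGTH = 40      # minimum match length (bp) given in the text


def cluster_by_hits(seq_ids, hits, min_identity=MIN_IDENTITY, min_length=MIN_LENGTH):
    """Group sequences into connected components of qualifying pairwise matches.

    hits: iterable of (query_id, subject_id, percent_identity, match_length).
    Returns a sorted list of clusters, each a sorted list of sequence ids.
    """
    # Union-find forest: every sequence starts as its own cluster.
    parent = {s: s for s in seq_ids}

    def find(x):
        parent.setdefault(x, x)  # tolerate ids that appear only in hits
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, identity, length in hits:
        # Only matches passing both cutoffs link two sequences.
        if identity >= min_identity and length >= min_length:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q

    clusters = defaultdict(set)
    for s in list(parent):
        clusters[find(s)].add(s)
    return sorted(sorted(members) for members in clusters.values())


# Made-up example: est3-est4 fails the identity cutoff, est4-est5 the length cutoff.
example_hits = [
    ("est1", "est2", 98.0, 120),
    ("est2", "est3", 93.5, 45),
    ("est3", "est4", 90.0, 200),
    ("est4", "est5", 99.0, 30),
]
example_clusters = cluster_by_hits(["est1", "est2", "est3", "est4", "est5"], example_hits)
```

In this example est1 and est3 land in the same cluster without matching each other directly, because both match est2. The same function, called with min_identity=95.0 and min_length=75, would give a rough analogue of the later grouping of consensus sequences into "genes", again only as an illustration.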


NIDDK 3.4K clone set

To look at information for the set of clones that have actually been incorporated into the PancChip, please go to the NIDDK 3.4K page.

For information on all of the clones within the libraries, please read on.

EPConDB EST Libraries

To date, information is available for 15 mouse and 7 human libraries. Analysis of these libraries has identified approximately 9,500 mouse and 14,000 human transcripts in pancreatic tissues, of which 1,821 mouse and 2,529 human transcripts were unique to the Consortium-generated libraries. The list of these transcripts with links to the AllGenes site can be viewed through the file described below.

Libraries

More information on how the libraries were constructed and sequencing reports are available below:

ESTs

EST sequence was generated by the Darwin EST Production team (Deana Pape, Senior Coordinator), Genome Sequencing Center, Washington University in St. Louis School of Medicine (WU GSC).

Follow these links for details of EST records submitted to dbEST by the indicated date.

Information on the overlap of the library sequences with the first set of clones chosen for the PancChip was computed using incorporation into, or homology to, DoTS consensus sequences.

DoTS Assemblies

We have collected human and mouse DoTS assemblies into two overlapping sets: assemblies that contain only clones of interest, and assemblies that contain at least one clone of interest.
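
The relationship between the two sets can be sketched in Python as follows. The function name, the mapping from assembly to clones, and all identifiers in the example are hypothetical and chosen only for illustration. The sketch assumes each assembly is available as a collection of clone identifiers.

```python
def classify_assemblies(assembly_clones, clones_of_interest):
    """Split assemblies into the two overlapping sets described above.

    assembly_clones: dict mapping an assembly id to an iterable of clone ids.
    clones_of_interest: set of clone ids from the libraries of interest.
    Returns (only_of_interest, at_least_one), each a sorted list of assembly ids.
    """
    only_of_interest, at_least_one = [], []
    for assembly_id, clones in assembly_clones.items():
        clones = set(clones)
        if clones & clones_of_interest:
            # The assembly contains at least one clone of interest.
            at_least_one.append(assembly_id)
            if clones <= clones_of_interest:
                # Every clone in the assembly is a clone of interest.
                only_of_interest.append(assembly_id)
    return sorted(only_of_interest), sorted(at_least_one)


# Made-up identifiers, for illustration only.
example_assemblies = {
    "DT.1001": ["cloneA", "cloneB"],
    "DT.1002": ["cloneA", "cloneX"],
    "DT.1003": ["cloneX", "cloneY"],
}
only_set, any_set = classify_assemblies(example_assemblies, {"cloneA", "cloneB"})
```

Every assembly in the first set also belongs to the second, which is why the two sets overlap.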

A full breakdown of the summary statistics is available on the Clone Library Assembly Analysis page.

Files are available at the DoTS Transcript Download page.