Penn Center for Bioinformatics

Computational Biology and Informatics Laboratory

Requirements Document

Annotator Interface 2.0

1  Introduction

1.1 Purpose of this document

This specification establishes CBIL’s requirements for the software product named Annotator Interface 2.0 and is identified as CBIL document number XX-XX-XXX. The intended audience includes analysts, programmers, and users of Annotator Interface 2.0.

1.2 Scope of this document

The scope of this document includes functional, interface, and performance requirements.  These requirements were generated from the input of software developers, annotators and management.

1.3 Overview

Annotator Interface 2.0 will be a generic graphical interface that facilitates the manual annotation of genes, RNAs and proteins in the Genomics Unified Schema (GUS).  The application will provide tools to support the creation of curated features and entries in the central dogma, as well as allow users to add new annotations for curated genes, RNAs and proteins. 

Version 2.0 of the Annotator Interface has several objectives.  The main objective is to transition from an “assembly-oriented” annotation process used in the current version of the interface to a more “gene-oriented” approach that makes use of available genomic sequence data and gene predictions.

The (ambitious) goal is to completely replace the current application with a generic, readily expandable application which includes an API that will allow it to be configured for any appropriate database.     

1.4 Definitions, Acronyms and Abbreviations

See Appendix A for Definitions, Acronyms, and Abbreviations.

1.5 References

See Appendix B for Applicable and Reference Documents.

1.6 Document Overview

1) Section 1: Provides a brief description of the product as well as a general overview of this SRS document.

2) Section 2: Describes, in general terms, desired product functions, defined requirements, and the product perspective.

3) Section 3: Outlines all software functional requirements to a level of detail that specifies an approach without specifying one particular design.

4) Section 4: Outlines all software interface requirements to a level of detail that specifies an approach without specifying one particular design.

5) Section 5: Outlines all non-functional software requirements to a level of detail that specifies an approach without specifying one particular design.

2 General Description

2.1 Product Functions

Annotator Interface v2.0 will be a tool used to perform manual annotation of genes, RNAs and proteins.  To that end, the main function of the system will be to provide an interface for annotators to perform their annotation tasks.  This interface will facilitate updates to a database, as well as provide users with any tools that will aid them in making decisions regarding the annotation of a gene, RNA or protein.

 

2.2 Related System Information

For the most part, Annotator Interface v2.0 will be a stand-alone application; however, it will interact with a database. At CBIL, for example, Annotator Interface will use the Genomics Unified Schema (GUS) database. Data will be read from and written to GUS through a Java object layer that interfaces with GUS.
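The separation between the interface and its database can be sketched as a small storage interface. This is a minimal sketch only. It assumes a hypothetical `AnnotationStore` interface and a hypothetical `StoredGene` record, and neither belongs to the GUS object layer. A GUS-backed class written against the Java object layer would implement the same interface. Here, an in-memory class stands in for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical minimal record for a curated gene; real GUS objects carry many more fields.
record StoredGene(long id, String symbol) {}

// Hypothetical storage abstraction: interface code depends only on this API,
// so a GUS-backed implementation could be swapped for one targeting another database.
interface AnnotationStore {
    Optional<StoredGene> findGene(long id);
    void saveGene(StoredGene gene);
}

// In-memory stand-in, useful for exercising interface logic without a database.
class InMemoryAnnotationStore implements AnnotationStore {
    private final Map<Long, StoredGene> genes = new HashMap<>();

    @Override
    public Optional<StoredGene> findGene(long id) {
        return Optional.ofNullable(genes.get(id));
    }

    @Override
    public void saveGene(StoredGene gene) {
        genes.put(gene.id(), gene); // insert, or overwrite an existing entry with the same id
    }
}
```

Keeping interface code dependent only on such an abstraction is one way to pursue the goal of configuring the application for other databases.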

2.3 User Characteristics

Users of this system will most likely be trained biologists.  They are expected to have experience with computer systems, and most will have had experience with manual annotation in the past.

2.4 User Problem Statement

The current version of the Annotator Interface was developed as a prototype to interact with GUS; however, its design is not optimal. Furthermore, additional data has become available since the current version of the interface was released. For example, genomic sequence and translated assemblies are now available. We would like to take advantage of this new data.

 

2.5 General Constraints

Because the annotation group is divided between the US and Russia, speed is a concern.

3. Functional Requirements

This section lists the functional requirements of Annotator Interface v2.0 in ranked order. Functional requirements describe the possible effects of the application; in other words, what the system must accomplish. Other kinds of requirements (such as interface, performance, or reliability requirements) describe how the system accomplishes its functional requirements and what else annotators need in order to perform these functional tasks.

 

The main functional requirements are grouped according to the central dogma: we first list requirements related to genes, followed by RNAs, and then proteins. The main functional requirements correspond to tasks that directly modify one or more entries in the database. Each of these requirements includes a definition of the corresponding evidence that may be added. Each has also been assigned a criticality based on its ‘level of annotation’. The criticality is one of high, medium, or low. The ‘level of annotation’ is either ‘surface’ or ‘deep’, representing surface-level and detailed annotation, respectively.

 

The secondary functional requirements define the tools or information that must be provided so that a user can successfully use a feature described in a main functional requirement. For example, in order for a user to successfully create curated RNAs, a secondary requirement may be that all DoTS assemblies be aligned and available for analysis.

 

3.1 Gene Annotation Functional Requirements

3.1.1  Creation of curated gene model

Description

The interface shall provide the ability to create a curated gene model by selecting existing features for duplication and shall also allow users to modify any feature they create.

A curated gene model includes the following components:

- definition of gene boundaries (creation of a GeneFeature)

- definition of exons (and introns) within the gene (creation of features whose parent is the above GeneFeature)

- any other feature deemed appropriate (TBD)

By defining gene boundaries, an annotator defines the full sequence of a gene. A curated gene model is a representation of a gene that, to the best of the annotator’s ability, defines that gene.

Evidence

Evidence for the GeneFeature includes other features used to create the curated GeneFeature, such as assemblies and gene predictions.

Criticality

This requirement is included in surface level annotation and thus has criticality of high.  This requirement must be fulfilled.

Technical issues

Fulfillment of this requirement must take into account that we may or may not have access to genomic sequence. Furthermore, the available genomic sequence may or may not be accurate and/or complete.

The creation of a curated feature must always result in creation of new entries in the corresponding feature tables.  Under no circumstance shall existing computational features be modified.  The only modification to existing features shall be to link them to a curated gene via the GeneInstance table.
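The following sketch illustrates the rule that curated features are always new entries, with GeneInstance links as the only change that touches existing features. It makes simplifying assumptions. `Feature`, `GeneInstanceLink`, and `CuratedGeneModelBuilder` are hypothetical stand-ins, not GUS object-layer classes. Id assignment is also reduced to a simple counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified feature: genomic coordinates, a curated flag, and an optional parent.
// Real GUS GeneFeature and exon objects differ; only the fields needed here are modeled.
record Feature(long id, int start, int end, boolean curated, Long parentId) {}

// Hypothetical row of the GeneInstance table: links a gene to one of its features.
record GeneInstanceLink(long geneId, long featureId) {}

class CuratedGeneModelBuilder {
    private long nextId; // next unused feature id (id assignment is simplified here)
    final List<Feature> newFeatures = new ArrayList<>();
    final List<GeneInstanceLink> links = new ArrayList<>();

    CuratedGeneModelBuilder(long firstFreeId) {
        this.nextId = firstFreeId;
    }

    /**
     * Builds a curated gene model from exons selected by the annotator. The gene boundaries
     * span the selected exons, and each exon is copied into a new curated child feature.
     * Existing computational features are never changed (records are immutable); the only
     * new rows that refer to them are GeneInstance links.
     */
    Feature build(long geneId, long sourceGeneFeatureId, List<Feature> selectedExons) {
        int start = selectedExons.stream().mapToInt(Feature::start).min().orElseThrow();
        int end = selectedExons.stream().mapToInt(Feature::end).max().orElseThrow();
        Feature curatedGene = new Feature(nextId++, start, end, true, null);
        newFeatures.add(curatedGene);
        for (Feature exon : selectedExons) {
            // Duplicate the exon: new id, curated flag set, parent is the new GeneFeature.
            newFeatures.add(new Feature(nextId++, exon.start(), exon.end(), true, curatedGene.id()));
        }
        links.add(new GeneInstanceLink(geneId, curatedGene.id()));   // the curated model itself
        links.add(new GeneInstanceLink(geneId, sourceGeneFeatureId)); // the computational source
        return curatedGene;
    }
}
```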

Risks

No known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.13 (Display alignment of features and similarities on genomic sequence), as an annotator must be able to view alignments on genomic sequence in order to properly evaluate and define curated gene models.

This requirement is dependent upon the requirement for display of assemblies that do not align to genomic sequence. It may be necessary for an annotator to view the assemblies in a given cluster of assemblies (i.e., a “gene”) that does not align to genomic sequence.

3.1.2  Linkage of existing GeneFeatures to curated gene

Description

The interface shall provide the ability to link pre-existing GeneFeatures to a curated gene via the GeneInstance table. 

Evidence

TBD

Criticality

This requirement is included in surface level annotation and thus has criticality of high.  This requirement must be fulfilled.

Technical issues

None directly related to this requirement; however, this requirement will affect the update process for GUS. Future updates will need to determine which Gene each new prediction should be made an instance of, so that we avoid duplicating genes in the Gene table.

There is still an open discussion regarding this requirement and whether it will be fulfilled computationally, manually, or both.
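One possible policy for future updates is to make a new prediction an instance of the gene whose curated GeneFeature overlaps it most. This policy is an assumption for illustration only. The requirement leaves the actual rule open. All names below (`GeneSpan`, `Prediction`, `GeneAssignment`) are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified view of a curated GeneFeature: which gene it models and where it lies.
record GeneSpan(long geneId, String sequenceId, boolean forwardStrand, int start, int end) {}

// Hypothetical new computational prediction produced by a GUS update run.
record Prediction(String sequenceId, boolean forwardStrand, int start, int end) {}

final class GeneAssignment {
    private GeneAssignment() {}

    // Overlap test with inclusive coordinates, on the same sequence and strand.
    static boolean overlaps(GeneSpan g, Prediction p) {
        return g.sequenceId().equals(p.sequenceId())
                && g.forwardStrand() == p.forwardStrand()
                && g.start() <= p.end()
                && p.start() <= g.end();
    }

    /**
     * Returns the existing gene with the largest overlap, or empty if no curated gene
     * overlaps (in which case the update would create a new Gene).
     */
    static Optional<Long> chooseGene(List<GeneSpan> curated, Prediction p) {
        GeneSpan best = null;
        int bestOverlap = 0;
        for (GeneSpan g : curated) {
            if (!overlaps(g, p)) {
                continue;
            }
            int overlap = Math.min(g.end(), p.end()) - Math.max(g.start(), p.start()) + 1;
            if (overlap > bestOverlap) {
                bestOverlap = overlap;
                best = g;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.geneId());
    }
}
```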

Risks

No known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, as we require a “gene model” prior to linking existing GeneFeatures to a curated gene.

3.1.3  Assignment of category to GeneInstances

Description

The interface shall provide the ability to assign a category from a controlled vocabulary to each instance of a curated gene. 

Evidence

TBD

Criticality

This requirement is included in surface level annotation and thus has criticality of high. 

Technical issues

We need to create a GeneInstanceCategory table and define a controlled vocabulary. We also need to update the GeneInstance (currently GeneSequence) table to include a gene_instance_category_id field.

There is still an open discussion regarding this requirement and whether it will be fulfilled computationally, manually, or both.
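The vocabulary itself is not yet defined, so the sketch below shows only the intended constraint. A gene_instance_category_id may be stored only if it exists in the GeneInstanceCategory vocabulary. `GeneInstanceRow` is hypothetical, and the map stands in for the proposed table.

```java
import java.util.Map;

// Hypothetical GeneInstance row with the proposed nullable gene_instance_category_id column.
final class GeneInstanceRow {
    final long geneId;
    final long featureId;
    private Long geneInstanceCategoryId; // null until a category is assigned

    GeneInstanceRow(long geneId, long featureId) {
        this.geneId = geneId;
        this.featureId = featureId;
    }

    Long geneInstanceCategoryId() {
        return geneInstanceCategoryId;
    }

    /**
     * Assigns a category only if its id exists in the controlled vocabulary
     * (a stand-in for the GeneInstanceCategory table), so unknown ids are never stored.
     */
    void assignCategory(long categoryId, Map<Long, String> vocabulary) {
        if (!vocabulary.containsKey(categoryId)) {
            throw new IllegalArgumentException("Unknown gene instance category id: " + categoryId);
        }
        this.geneInstanceCategoryId = categoryId;
    }
}
```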

Risks

No known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, as we require a “gene model” prior to assigning categories to gene instances.

3.1.4 Assign Gene Name/Symbol

Description

The interface shall provide the ability for users to assign or confirm a gene symbol. For example, this could be the HUGO- or MGI-approved gene symbol for human or mouse genes, respectively.

Evidence

Evidence for the gene name/symbol includes the source database from which the name/symbol was obtained, or the mapping provided by the source database (e.g., MGI-DoTS mapping, GeneCards-DoTS mapping).

Criticality

This requirement is included in surface level annotation and thus has criticality of high. 

Technical issues

There are no known technical issues.

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, as we require a “gene model” prior to assigning a gene name/symbol.

3.1.5 Assign Full Gene Name

Description

The interface shall provide the ability for users to assign a full gene name. For example, this could be the HUGO or MGI full gene name for human or mouse genes, respectively.

Evidence

Evidence for the full gene name includes the source database from which the full gene name was obtained, or the mapping provided by the source database (e.g., MGI-DoTS mapping, GeneCards-DoTS mapping).

Criticality

This requirement is included in surface level annotation and thus has criticality of high. 

Technical issues

We should consider assigning MGI or HUGO gene names computationally, for example based on the MGI-DoTS mapping provided by MGI. NOTE: at the same time, we should store the links to MGI in the newly created Links table, which can be used to join any entry in GUS with any external database.
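The following is a minimal sketch of this computational assignment. It assumes the MGI-DoTS mapping has been loaded into a map keyed by DoTS assembly identifier. It also assumes each curated gene traces back to a single assembly, which is a simplification. The record and class names, the mapping layout, and the shape of the Links row are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical row of a loaded MGI-to-DoTS mapping; the real file layout is not modeled here.
record MgiMapping(String mgiAccession, String symbol, String fullName) {}

// Hypothetical row for the proposed Links table joining a GUS entry to an external database.
record ExternalLink(long gusGeneId, String externalDatabase, String externalId) {}

// Proposed name assignment for a curated gene, recorded together with its evidence source.
record NameAssignment(long geneId, String symbol, String fullName, String evidenceSource) {}

final class MgiNameAssigner {
    private MgiNameAssigner() {}

    /**
     * For each curated gene, looks up its DoTS assembly in the mapping. Mapped genes get a
     * proposed symbol and full name, plus an MGI row appended to linksOut; unmapped genes are skipped.
     */
    static List<NameAssignment> assign(Map<Long, String> assemblyByGene,
                                       Map<String, MgiMapping> mappingByAssembly,
                                       List<ExternalLink> linksOut) {
        List<NameAssignment> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : assemblyByGene.entrySet()) {
            MgiMapping m = mappingByAssembly.get(e.getValue());
            if (m == null) {
                continue;
            }
            result.add(new NameAssignment(e.getKey(), m.symbol(), m.fullName(), "MGI-DoTS mapping"));
            linksOut.add(new ExternalLink(e.getKey(), "MGI", m.mgiAccession()));
        }
        return result;
    }
}
```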

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, as we require a “gene model” prior to assigning a full gene name.

3.1.6 Assign Gene Synonym(s)

Description

The interface shall provide the ability for users to assign one or more gene synonyms to a gene. 

Evidence

Evidence for the gene synonym includes the source database from which the gene synonym was obtained, or the mapping provided by the source database (e.g., MGI-DoTS mapping, GeneCards-DoTS mapping).

Criticality

This requirement is included in surface level annotation and thus has criticality of high.

Technical issues

There are no known technical issues.

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, as we require a “gene model” prior to assigning gene synonyms.

3.1.7 Assign Gene Alias(es)

Description

The interface shall provide the ability for users to assign one or more gene name aliases to a gene.  A gene alias is an alternative full gene name.

Evidence

TBD

Criticality

This requirement is included in surface level annotation and thus has criticality of high.

Technical issues

No outstanding technical issues; the GeneAlias table was created in GUS3.0.

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, as we require a “gene model” prior to assigning a gene alias.

3.1.8 Assign Gene Chromosomal Location

Description

The interface shall provide the ability for users to assign a chromosomal location to a gene.

Evidence

Evidence for gene chromosomal location includes RH mapping data, literature, OMIM, alignments (curated gene model).

Criticality

This requirement is included in surface level annotation and has criticality of medium. 

Technical issues

The plan is to make use of the GeneChromosomalLocation table. We should attempt to populate it based on the gene model.

Note that the chromosomal location can potentially be inferred from alignments; however, a gene may map to multiple locations.
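The sketch below makes the multiple-location case concrete. It assumes each alignment already carries the chromosome of its genomic sequence. It also assumes an identity cutoff that the caller chooses; no particular value is implied by this requirement. `GenomicAlignment` and `ChromosomeInference` are illustrative names only.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical alignment of a curated gene model to a genomic sequence whose chromosome is known.
record GenomicAlignment(String chromosome, double percentIdentity) {}

final class ChromosomeInference {
    private ChromosomeInference() {}

    /**
     * Collects the distinct chromosomes of alignments at or above a minimum identity.
     * A single result could pre-populate GeneChromosomalLocation; several results mean
     * the gene maps to multiple locations and the annotator must decide.
     */
    static Set<String> candidateChromosomes(List<GenomicAlignment> alignments, double minIdentity) {
        Set<String> chromosomes = new TreeSet<>();
        for (GenomicAlignment a : alignments) {
            if (a.percentIdentity() >= minIdentity) {
                chromosomes.add(a.chromosome());
            }
        }
        return chromosomes;
    }
}
```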

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, as we require a “gene model” prior to assigning a chromosomal location.

3.1.10 Assign gene family

Description

The interface shall provide the ability for users to associate a gene with a known gene family.

Evidence

Evidence for gene family is TBD.

Criticality

This requirement is included in deep level annotation and has criticality of low.

Technical issues

This is not currently included in the annotation tasks; however, it should be discussed with the PSU, as they expressed interest in assigning gene families.

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, as we require a “gene model” prior to assigning a gene family.

3.1.11 Assign gene category

Description

The interface shall provide the ability for users to associate a gene with a known gene category. Categories will include such terms as pseudogene, paralog, and non-coding RNA.

Evidence

Evidence for gene category includes literature references and/or similarity to known non-coding RNA.

Criticality

This requirement is included in deep level annotation and has criticality of low.

Technical issues

This is not currently included in the annotation tasks; however, it should be discussed with the PSU, as they expressed interest in assigning gene families.

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, as we require a “gene model” prior to assigning a gene category.

 

3.1.12 Assign OMIM Link

Description

Annotator Interface 2.0 shall provide the ability for users to assign OMIM links to curated genes.

Evidence

No evidence will be associated with the link out to OMIM.

Criticality

This requirement is included in deep level annotation and has criticality of medium. 

Technical issues

This requires the creation of a table to hold OMIM identifiers and descriptions, which will provide the mechanism for assigning them to genes. Annotators should not type in text, or even copy and paste it.

We may be able to assign OMIM identifiers to genes computationally. OMIM is now searchable through Entrez, and links to sequence data and MGD are available. OMIM also provides a mapping of gene symbols to OMIM identifiers, which may be helpful for the automated assignment of OMIM identifiers. The ENZYME database includes mappings of EC numbers to the OMIM identifiers associated with a deficiency in the enzyme.
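The two OMIM points above can be combined in one small class: entry is by selection only, and suggestions come from a gene-symbol mapping. `OmimCatalog` is hypothetical. One map stands in for the proposed OMIM table. The other stands in for OMIM's symbol-to-identifier mapping, whose real file format is not modeled here.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class OmimCatalog {
    private final Map<String, String> descriptionById;   // stand-in for the proposed OMIM table
    private final Map<String, List<String>> idsBySymbol; // stand-in for a symbol-to-OMIM mapping

    OmimCatalog(Map<String, String> descriptionById, Map<String, List<String>> idsBySymbol) {
        this.descriptionById = new TreeMap<>(descriptionById);
        this.idsBySymbol = idsBySymbol;
    }

    /** Suggested OMIM ids for a gene symbol, restricted to ids present in the loaded table. */
    List<String> suggest(String geneSymbol) {
        return idsBySymbol.getOrDefault(geneSymbol, Collections.emptyList()).stream()
                .filter(descriptionById::containsKey)
                .collect(Collectors.toList());
    }

    /** Accepts only an id from the table; anything typed or pasted that is not loaded is rejected. */
    String select(String omimId) {
        String description = descriptionById.get(omimId);
        if (description == null) {
            throw new IllegalArgumentException("Not in the OMIM table: " + omimId);
        }
        return description;
    }
}
```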

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, as we require a “gene model” prior to assigning a link out to OMIM.

3.1.13 Display alignment of features and similarities on genomic sequence

Description

Annotator Interface 2.0 shall display alignments of features and assembly similarities on genomic sequence.

Criticality

This requirement is included in surface level annotation and has criticality of high.

Technical issues

The only technical issue is how best to display alignments.

Risks

There are no known risks.

Dependencies on/with other requirements

There are no known dependencies.

3.1.14 Display of assemblies that do not align to genomic sequence

Description

Annotator Interface 2.0 shall provide a display of all assemblies that are part of a given cluster but do not align to genomic sequence.

 Criticality

This requirement is included in surface level annotation and has criticality of high. 

Technical issues

There are no significant technical issues.

Risks

There are no known risks.

Dependencies on/with other requirements

There are no known dependencies.

3.1.15 Zoom to sequence level for alignments

Description

Annotator Interface 2.0 shall provide the ability to zoom in to the sequence level for all alignments.

Criticality

This requirement is included in surface level annotation and has criticality of high.

Technical issues

The only technical issue is how to display alignments and allow users to zoom in to the sequence level.
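The display question for requirement 3.1.15 largely comes down to how many pixels each base receives. This sketch makes several assumptions. `AlignmentViewport` is hypothetical, and so is the 8-pixel threshold. Sequence letters are drawn only once the visible window is narrow enough, and zooming keeps the window centred.

```java
// Hypothetical viewport for an alignment track: the visible genomic window and the
// number of screen pixels available to draw it.
final class AlignmentViewport {
    // Assumed minimum width in pixels needed to draw one readable base letter.
    static final double MIN_PIXELS_PER_BASE = 8.0;

    final int start;       // first visible base (1-based, inclusive)
    final int end;         // last visible base (inclusive)
    final int widthPixels; // drawable width of the track

    AlignmentViewport(int start, int end, int widthPixels) {
        if (start < 1 || end < start || widthPixels <= 0) {
            throw new IllegalArgumentException("invalid viewport");
        }
        this.start = start;
        this.end = end;
        this.widthPixels = widthPixels;
    }

    int visibleBases() {
        return end - start + 1;
    }

    double pixelsPerBase() {
        return (double) widthPixels / visibleBases();
    }

    /** Sequence letters are drawn only when each base gets enough horizontal space. */
    boolean showSequenceLetters() {
        return pixelsPerBase() >= MIN_PIXELS_PER_BASE;
    }

    /** Zooms about the window centre (factor > 1 zooms in); clamps the start at base 1. */
    AlignmentViewport zoom(double factor) {
        if (factor <= 0) {
            throw new IllegalArgumentException("factor must be positive");
        }
        double centre = (start + end) / 2.0;
        int newBases = Math.max(1, (int) Math.round(visibleBases() / factor));
        int newStart = Math.max(1, (int) Math.round(centre - (newBases - 1) / 2.0));
        return new AlignmentViewport(newStart, newStart + newBases - 1, widthPixels);
    }
}
```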

Risks

There are no known risks.

Dependencies on/with other requirements

There are no known dependencies.

3.1.16 Deletion of curated genes

Description

Annotator Interface 2.0 shall support the deletion of curated genes. 

Evidence

Evidence for the deletion of a curated gene will be a comment selected from a pull-down menu on the interface (e.g., sequence change, mistake made during curation, etc.).

Criticality

This requirement is included in surface level annotation and has criticality of high.  

Technical issues

Deletion of a curated gene must take into account any computationally generated instances of the gene, and also any computationally generated features that are pointing to the gene and/or its RNAs. 

We should discuss what we want to happen to these computationally generated GeneInstances. 

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement depends on requirements 3.2.1 (Creation of curated RNAs) and 3.2.2 (Linkage of existing features to curated RNA).

3.2 RNA Annotation Functional Requirements

3.2.1 Creation of curated RNAs

Description

Annotator Interface v2.0 shall provide the ability for users to create curated RNAs either by duplication (and potential modification) of existing features, or by manual creation. A curated RNA is composed of exons from a curated GeneFeature along with 5’ and 3’ UTRs. Only curated RNAs will be manually annotated further.

Evidence

Evidence for a curated RNA includes the features used to create the RNAFeature (e.g., assemblies).

Evidence for a 5’ UTR feature includes literature references or promoters (known or predicted).

Evidence for a 3’ UTR feature includes literature references or poly-A signal.

Criticality

This requirement is included in surface level annotation and thus has criticality of high. 

Technical issues

Fulfillment of this requirement must take into account that we may or may not have access to genomic sequence. Furthermore, the available genomic sequence may or may not be accurate and/or complete.

In GUS, the creation of a curated RNA involves the creation of RNAFeature, RNAFeatureExon(s), RNAInstance and RNA objects. All curated objects should be marked as such using the review_status field (note: the manually_reviewed field is to be renamed review_status in the next schema update).
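The set of objects listed above can be sketched as a factory that creates them together and marks each one through review_status. The records are simplified, hypothetical stand-ins for the GUS objects. The real object layer, its field names, and its review_status values may differ. For brevity, 5’ and 3’ UTR features are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the GUS objects named above.
record RnaFeature(long id, long geneFeatureId, String reviewStatus) {}
record RnaFeatureExon(long rnaFeatureId, long exonFeatureId, int order, String reviewStatus) {}
record Rna(long id, long geneId, String reviewStatus) {}
record RnaInstance(long rnaId, long rnaFeatureId, String reviewStatus) {}

// Everything created for one curated RNA, so it can be submitted to the database together.
record CuratedRna(RnaFeature feature, Rna rna, List<RnaFeatureExon> exons, RnaInstance instance) {}

final class CuratedRnaFactory {
    static final String CURATED = "curated"; // assumed review_status value
    private long nextId;                     // simplified id assignment

    CuratedRnaFactory(long firstFreeId) {
        this.nextId = firstFreeId;
    }

    /** Creates the RNAFeature, RNA, ordered RNAFeatureExons and RNAInstance for one curated RNA. */
    CuratedRna create(long geneId, long curatedGeneFeatureId, List<Long> exonFeatureIds) {
        RnaFeature feature = new RnaFeature(nextId++, curatedGeneFeatureId, CURATED);
        Rna rna = new Rna(nextId++, geneId, CURATED);
        List<RnaFeatureExon> exons = new ArrayList<>();
        for (int i = 0; i < exonFeatureIds.size(); i++) {
            exons.add(new RnaFeatureExon(feature.id(), exonFeatureIds.get(i), i + 1, CURATED));
        }
        RnaInstance instance = new RnaInstance(rna.id(), feature.id(), CURATED);
        return new CuratedRna(feature, rna, List.copyOf(exons), instance);
    }
}
```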

Risks

No known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.1.1, since one can only create a curated RNA from a curated gene model. There is also a dependency on requirement 3.1.13 (Display alignment of features and similarities on genomic sequence), as we must be able to view these alignments on genomic sequence.

3.2.2  Linkage of existing features to curated RNA

Description

The interface shall provide the ability to link pre-existing features to a curated RNA via the RNAInstance table.

Evidence

TBD

Criticality

This requirement is included in surface level annotation and thus has criticality of high.  This requirement must be fulfilled.

Technical issues

Currently the assemblies do not have a GeneFeature; the notion of a GeneFeature is contained within the BlastSim4 alignment. In the future, however, this information will be promoted to a GeneFeature. Once this happens, care must be taken both to update the RNAInstance for the assembly and to point the assembly to the new curated GeneFeature instead of its original GeneFeature. This is necessary in order to maintain consistency in the database.

NOTES: 

1) We should consider overriding the set methods for these fields so that we automatically update the related fields. 

2) This will impact the update process.
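Note 1 can be illustrated with a subclass that overrides a generated setter. In this sketch, `AssemblyLinkBase` plays the role of a generated object-layer class, and `AssemblyLink` is the hand-written override. Both classes are hypothetical, as is the lookup from a curated RNA to its GeneFeature. Re-pointing the RNAInstance then updates the GeneFeature in the same call, so the two fields cannot drift apart.

```java
import java.util.Map;

// Hypothetical generated object-layer class: plain setters, no consistency rules.
class AssemblyLinkBase {
    protected long rnaId;         // RNA that this assembly's RNAInstance points to
    protected long geneFeatureId; // GeneFeature the assembly is attached to

    public void setRnaId(long rnaId) { this.rnaId = rnaId; }
    public void setGeneFeatureId(long geneFeatureId) { this.geneFeatureId = geneFeatureId; }
    public long getRnaId() { return rnaId; }
    public long getGeneFeatureId() { return geneFeatureId; }
}

// Hand-written subclass: re-pointing to a curated RNA also re-points the GeneFeature,
// looked up from the curated RNA (each curated RNA derives from one curated GeneFeature).
class AssemblyLink extends AssemblyLinkBase {
    private final Map<Long, Long> geneFeatureByCuratedRna;

    AssemblyLink(long rnaId, long geneFeatureId, Map<Long, Long> geneFeatureByCuratedRna) {
        super.setRnaId(rnaId);
        super.setGeneFeatureId(geneFeatureId);
        this.geneFeatureByCuratedRna = geneFeatureByCuratedRna;
    }

    @Override
    public void setRnaId(long rnaId) {
        Long curatedGeneFeatureId = geneFeatureByCuratedRna.get(rnaId);
        if (curatedGeneFeatureId == null) {
            throw new IllegalArgumentException("RNA " + rnaId + " is not a known curated RNA");
        }
        super.setRnaId(rnaId);
        super.setGeneFeatureId(curatedGeneFeatureId); // keep both fields consistent
    }
}
```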

Risks

No known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.2.1, as we require a curated RNA prior to re-linking an assembly.

3.2.3  Assignment of category to RNA Instances

Description

The interface shall provide the ability to assign a category from a controlled vocabulary to each instance of a curated RNA. 

Evidence

TBD

Criticality

This requirement is included in deep level annotation and thus has criticality of medium for manual assignment. 

Technical issues

RNASequenceType should be renamed when the RNASequence table is renamed; we suggest RNAInstanceCategory. These categories will need to be defined. The intent is to capture some notion of the degree of belief that this RNA instance represents the RNA, as well as the reason the two may not be identical at the sequence level.

Risks

No known risks.

Dependencies on/with other requirements

This requirement is dependent upon requirement 3.2.1, as we require a curated RNA prior to assigning RNA instance categories. This requirement is also dependent upon requirement 3.2.2, as we will likely be assigning categories to RNA instances that represent an assembly that has been reassigned to the new curated RNA.

3.2.4  Assign RNA description

Description

The interface shall provide the ability to assign or confirm descriptions for curated RNAs.

Evidence

Evidence for the RNA description will be the gene from which it is derived.

Criticality

This requirement is included in surface level annotation and thus has criticality of high.  

Technical issues

We are considering the computational assignment of curated RNA descriptions based on the gene name/synonym/description.

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement depends on requirement 3.2.1 Create curated RNA, as we only manually annotate curated RNAs.

3.2.4  Assign RNA categories

Description

The interface shall provide the ability to assign categories to RNAs.

Evidence

Evidence for the RNA category includes literature references and/or the definition of the RNA with respect to the gene from which it is derived.

Criticality

This requirement is included in surface level annotation and has criticality of medium.  

Technical issues

There are no known technical issues.

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement depends on requirement 3.2.1 Create curated RNA, as we only manually annotate curated RNAs.

3.2.5  Assign anatomy in which RNA is known to be expressed

Description

The interface shall provide the ability to view anatomy terms and associate them with a curated RNA, specifying where the RNA is known to be expressed.

Evidence

Evidence for the assignment of Anatomy to a curated RNA includes literature references, the assembly anatomy percent from DoTS, and/or RAD experiments. The annotator must also specify the supporting lines of evidence (e.g., RAD confirmed, literature confirmed, etc.).

Criticality

This requirement is included in deep level annotation and has criticality of medium.  

Technical issues

The current plan is to use the Anatomy, AnatomyLOE, RNAAnatomy, and RNAAnatomyLOE tables. Under this schema, the specific evidence (such as the actual literature reference) would have to be associated with the RNAAnatomyLOE entry where the LOE is literature.

We should consider computational assignment of Anatomy to RNAs (DoTS assemblies) based on experiments. This would require someone to define the acceptable rules for association.

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement depends on requirement 3.2.1 Create curated RNA, as we only manually annotate curated RNAs.

3.2.6  Assign GO terms to curated RNA

Description

The interface shall provide the ability to view and assign GO terms to curated RNAs. Some RNAs are not translated but nevertheless have a function; this requirement allows us to capture that information.

Evidence

Evidence for the assignment of GO terms to curated RNA includes source database references and/or literature references.

Criticality

This requirement is included in deep level annotation and has criticality of medium.  

Technical issues

Associations will be made in GUS using the generic GOTermAssociation table. In the future, we could consider ways to generate computational annotations.

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement depends on requirement 3.2.1 Create curated RNA, as we only manually annotate curated RNAs.

3.2.7  Blast RNA Instances against one another

Description

The interface shall provide the ability to blast instances of the same curated RNA against each other.

Criticality

This requirement is included in surface level annotation and has criticality of medium.  

Technical issues

There are no known technical issues.

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement depends on requirements 3.2.1 (Creation of curated RNAs) and 3.2.2 (Linkage of existing features to curated RNA).

3.2.8  Blast RNAs within a ‘gene cluster’ against each other

Description

The interface shall provide the ability to blast RNAs in a ‘gene cluster’ against one another. This is equivalent to the ‘self blast’ feature of the existing interface.
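Blasting all instances of an RNA, or all RNAs in a gene cluster, against one another amounts to enumerating pairs of members. The sketch below is illustrative only. `SequencePair` and `SelfBlastPlanner` are hypothetical, and running BLAST itself is out of scope. It lists each unordered pair once, giving n(n-1)/2 pairs for n members.

```java
import java.util.ArrayList;
import java.util.List;

// A pair of sequence identifiers to be compared in one BLAST comparison.
record SequencePair(String query, String subject) {}

final class SelfBlastPlanner {
    private SelfBlastPlanner() {}

    /** Lists every unordered pair of distinct members, in input order. */
    static List<SequencePair> allPairs(List<String> members) {
        List<SequencePair> pairs = new ArrayList<>();
        for (int i = 0; i < members.size(); i++) {
            for (int j = i + 1; j < members.size(); j++) {
                pairs.add(new SequencePair(members.get(i), members.get(j)));
            }
        }
        return pairs;
    }
}
```

BLAST scores are not guaranteed to be symmetric, so if both directions matter, each pair can also be run in reverse. Alternatively, all members can be formatted as one BLAST database and searched with each member as the query.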

Criticality

This requirement is included in surface level annotation and has criticality of medium.  

Technical issues

There are no known technical issues.

Risks

There are no known risks.

Dependencies on/with other requirements

There are no known dependencies.

3.2.9  Blast curated RNA against all members of a gene cluster containing a specified assembly

Description

The interface shall provide the ability to blast a curated RNA against all members of a specified gene cluster. 

Criticality

This requirement is included in surface level annotation and has criticality of medium.  

Technical issues

There are no known technical issues.

Risks

There are no known risks.

Dependencies on/with other requirements

There are no known dependencies.

3.2.10  Automated annotation of curated RNA

Description

Annotator Interface 2.0 shall initiate the automated annotation of a new curated RNAFeature.  This annotation will include all annotation that is performed on a computationally generated RNAFeature/Assembly.   Automated annotation includes the following:

- TBD: define all computational annotation tasks here. These will likely be the same as for the DoTS assemblies.

Criticality

This requirement is included in surface level annotation and has criticality of high.  

Technical issues

We should attempt to accomplish this as quickly as possible, perhaps by making use of the workflow engine. This would likely require streamlined versions of some of the plug-ins in order to guarantee swift completion. For example, the GO function predictions run some large queries up front that could perhaps be avoided through streamlining.

We should also notify individual annotators when their curated RNAs are ready for annotation.
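The automated tasks are still TBD, so the sketch below shows only the control flow described above. It runs each task for the new curated RNAFeature and then notifies the annotator. `AnnotationTask`, `AutomatedAnnotationRunner`, and the notification channel are all hypothetical. In practice, this might be delegated to the workflow engine.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical automated annotation step; the real task list is still TBD.
interface AnnotationTask {
    String name();
    void run(long rnaFeatureId);
}

final class AutomatedAnnotationRunner {
    private final List<AnnotationTask> tasks;
    private final Consumer<String> notifier; // assumed channel, e.g. e-mail or an in-application message

    AutomatedAnnotationRunner(List<AnnotationTask> tasks, Consumer<String> notifier) {
        this.tasks = List.copyOf(tasks);
        this.notifier = notifier;
    }

    /**
     * Runs each task in order for a new curated RNAFeature, then tells the annotator the RNA
     * is ready. A failing task stops the run, and the failure is reported instead.
     */
    void annotate(long rnaFeatureId, String annotator) {
        for (AnnotationTask task : tasks) {
            try {
                task.run(rnaFeatureId);
            } catch (RuntimeException e) {
                notifier.accept(annotator + ": task " + task.name() + " failed for RNAFeature " + rnaFeatureId);
                return;
            }
        }
        notifier.accept(annotator + ": curated RNAFeature " + rnaFeatureId + " is ready for annotation");
    }
}
```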

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement depends on requirements 3.2.1 (Creation of curated RNAs) and 3.2.2 (Linkage of existing features to curated RNA).

3.2.11  Deletion of curated RNAs

Description

Annotator Interface 2.0 shall support the deletion of curated RNAs. 

Evidence

Evidence for the deletion of a curated RNA will be a comment selected from a pull-down menu on the interface (e.g., sequence change, mistake made during curation, etc.).

Criticality

This requirement is included in surface level annotation and has criticality of high.  

Technical issues

Deletion of a curated RNA must take into account any computationally generated RNA instances that were instances of that curated RNA. 

We should discuss what we want to happen to these computationally generated RNAInstances. Do we create an RNA and point all the computationally generated instances to it? If so, what do we do about its gene_id? (The same issue arises at the feature level.)

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement depends on requirements 3.2.1 (Creation of curated RNAs) and 3.2.2 (Linkage of existing features to curated RNA).

3.3 Protein Annotation Functional Requirements

3.3.1 Create/Confirm curated protein sequence

Description

The interface shall provide the ability to create or confirm a curated protein sequence.

Criticality

This requirement is included in surface level annotation and thus has criticality of high.  This requirement must be fulfilled.

Technical issues

Protein sequence should be generated from the reviewed RNA as part of requirement 3.2.10 (Automated annotation of curated RNA).

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement depends on requirements 3.2.1 (Creation of curated RNAs) and 3.2.10 (Automated annotation of curated RNA).

3.3.2 Assign GO Molecular Function terms

Description

The interface shall provide the ability to confirm, delete, or assign GO molecular function terms for curated proteins.

Evidence

Evidence for the association of GO molecular function terms with a curated protein includes literature references, database references and/or similarities.

Criticality

This requirement is included in surface level annotation and thus has criticality of high.  This requirement must be fulfilled.

Technical issues

There are no known technical issues.

Risks

There are no known risks.

Dependencies on/with other requirements

As written, this requirement depends on requirement 3.3.1 (Create/Confirm curated protein sequence), as we only assign GO molecular function terms to curated proteins.

3.3.3  Assign GO Biological Process Terms

Description

The interface shall provide the ability to add GO biological process terms to, or delete them from, curated proteins.

Evidence

Evidence for the association of GO biological process terms with a curated protein includes literature references, database references and/or similarities.

Criticality

This requirement is included in surface level annotation and thus has criticality of high.  This requirement must be fulfilled.

Technical issues

We need to load the available MGI and human GO Process associations. We can then use the DoTS-MGI mapping to computationally associate process terms with assemblies. We may also be able to use similarities to known human proteins to associate process terms with human assemblies. These ‘predictions’ could aid annotators in assigning GO biological process terms.
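The join described above can be sketched as follows: assembly to MGI gene, then MGI gene to GO process terms. The maps are hypothetical in-memory forms of the DoTS-MGI mapping and the loaded MGI GO Process associations. `GoSuggestion` and `GoProcessTransfer` are illustrative names. Results are kept as suggestions rather than curated associations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// A computationally suggested GO term for an assembly, kept separate from curated
// associations so the interface can present it only as an aid to the annotator.
record GoSuggestion(String assemblyId, String goId, String basis) {}

final class GoProcessTransfer {
    private GoProcessTransfer() {}

    /**
     * Joins assembly -> MGI gene with MGI gene -> GO process terms and returns de-duplicated
     * suggestions, sorted by assembly id and then GO id for a stable display order.
     */
    static List<GoSuggestion> suggest(Map<String, String> mgiByAssembly,
                                      Map<String, Set<String>> goProcessByMgi) {
        List<GoSuggestion> out = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(mgiByAssembly).entrySet()) {
            Set<String> terms = new TreeSet<>(goProcessByMgi.getOrDefault(e.getValue(), Set.of()));
            for (String goId : terms) {
                out.add(new GoSuggestion(e.getKey(), goId, "DoTS-MGI mapping via " + e.getValue()));
            }
        }
        return out;
    }
}
```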

Risks

There are no known risks.

Dependencies on/with other requirements

As written, this requirement depends on requirement 3.3.1 (Create/Confirm curated protein sequence), as we only assign GO biological process terms to curated proteins.

3.3.4  Assign GO Cellular Component Terms

Description

The interface shall provide the ability to add GO cellular component terms to, or delete them from, curated proteins.

Evidence

Evidence for the association of GO cellular component terms with a curated protein includes literature references, database references and/or similarities.

Criticality

This requirement is included in surface-level annotation and thus has high criticality.  This requirement must be fulfilled.

Technical issues

The available MGI and human GO cellular component associations need to be loaded.  The DoTS-MGI mapping can then be used to computationally associate component terms with assemblies, and similarities to known human proteins may allow component terms to be associated with human assemblies.
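
For the similarity-based route mentioned above, one simple option is to transfer cellular component terms from a known human protein to any human assembly that matches it closely enough. This is a sketch only: the `SimilarityHit` shape, the `ComponentTransfer` class, and the percent-identity cutoff are assumptions, and how similarity should actually be scored has not been decided in this document.

```java
import java.util.*;

/** One similarity hit between a human assembly and a known human protein (hypothetical shape). */
record SimilarityHit(String assemblyId, String proteinId, double percentIdentity) {}

final class ComponentTransfer {

    /**
     * Transfers GO cellular component terms from known human proteins to the
     * assemblies that match them. Hits below minIdentity are ignored; the cutoff
     * is an assumption, not a value set by this document.
     */
    static Map<String, SortedSet<String>> transfer(
            List<SimilarityHit> hits,
            Map<String, Set<String>> proteinToComponent,
            double minIdentity) {
        Map<String, SortedSet<String>> result = new TreeMap<>();
        for (SimilarityHit hit : hits) {
            if (hit.percentIdentity() < minIdentity) {
                continue; // too dissimilar to support a prediction
            }
            Set<String> terms = proteinToComponent.getOrDefault(hit.proteinId(), Set.of());
            if (!terms.isEmpty()) {
                // Merge terms from every sufficiently similar protein.
                result.computeIfAbsent(hit.assemblyId(), k -> new TreeSet<>()).addAll(terms);
            }
        }
        return result;
    }
}
```

Raising the cutoff yields fewer but better-supported predictions; whatever value is chosen, the output is meant as a starting point that annotators confirm or delete.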

Risks

There are no known risks.

Dependencies on/with other requirements

As written, this requirement depends on “3.3.1 Create/Confirm curated protein”, as we only assign GO cellular component terms to curated proteins.

3.3.5  Assign Protein Name

Description

The interface shall provide the ability to add or delete a name for a curated protein.

Evidence

Evidence for the assignment of a protein name to a curated protein includes literature references, database references and/or similarities.

Criticality

This requirement is included in surface-level annotation and thus has high criticality.  This requirement must be fulfilled.

Technical issues

There are no known technical issues.

Risks

There are no known risks.

Dependencies on/with other requirements

As written, this requirement depends on “3.3.1 Create/Confirm curated protein”, as we only assign protein names to curated proteins.

3.3.6  Assign Protein Synonyms

Description

The interface shall provide the ability to add or delete synonyms for a curated protein.

Evidence

Evidence for the assignment of protein synonyms to a curated protein includes literature references, database references and/or similarities.

Criticality

This requirement is included in surface-level annotation and thus has high criticality.  This requirement must be fulfilled.

Technical issues

There are no known technical issues.

Risks

There are no known risks.

Dependencies on/with other requirements

As written, this requirement depends on “3.3.1 Create/Confirm curated protein”, as we only assign protein synonyms to curated proteins.

3.3.7  Assign post-translational modification categories

Description

The interface shall provide the ability to assign post-translational modification categories to curated proteins.

Evidence

Evidence for the assignment of a post-translational modification category to a curated protein includes literature references and/or database references.

Criticality

This requirement is included in deep-level annotation and thus has low-medium criticality.

Technical issues

This requires the creation of a ProteinCategory table in GUS, plus either an additional protein attribute or a linking table to associate categories with proteins.  The categories themselves also need to be defined.

This should be discussed in more detail.
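
The two options in the paragraph above differ in cardinality: a single category attribute on the protein row holds at most one category per protein, whereas a linking table holds any number of (protein, category) pairs. The sketch below models the linking-table option in memory; `ProteinCategory`, `ProteinCategoryLink`, and `CategoryIndex` are hypothetical names, because neither the tables nor the categories have been defined yet.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical row of the proposed ProteinCategory table. */
record ProteinCategory(long categoryId, String name) {}

/** Hypothetical row of a linking table: one row per (protein, category) pair. */
record ProteinCategoryLink(long proteinId, long categoryId) {}

final class CategoryIndex {
    private final Map<Long, ProteinCategory> categories = new HashMap<>();
    private final Set<ProteinCategoryLink> links = new HashSet<>();

    void addCategory(ProteinCategory c) {
        categories.put(c.categoryId(), c);
    }

    /** Assigns a category; assigning the same pair twice leaves a single link. */
    void assign(long proteinId, long categoryId) {
        if (!categories.containsKey(categoryId)) {
            throw new IllegalArgumentException("Unknown category " + categoryId);
        }
        links.add(new ProteinCategoryLink(proteinId, categoryId));
    }

    /** Returns the sorted names of all categories linked to a protein. */
    List<String> categoriesOf(long proteinId) {
        return links.stream()
                .filter(l -> l.proteinId() == proteinId)
                .map(l -> categories.get(l.categoryId()).name())
                .sorted()
                .collect(Collectors.toList());
    }
}
```

If a protein can carry several modifications, only the linking-table option represents that without schema changes.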

Risks

There are no known risks.

Dependencies on/with other requirements

This requirement depends on “3.3.1 Create/Confirm curated protein”, as we only assign post-translational modification categories to curated proteins.

3.3.8  Assign protein interactions

Description

The interface shall provide the ability to assign known interactions involving a curated protein.

Evidence

TBD

Criticality

This requirement is included in deep-level annotation and thus has low-medium criticality.

Technical issues

New pathway- and interaction-related tables have been created in GUS3.0.  These include Pathway, PathwayInteraction, Interaction, Complex, and ComplexComponent.

Do we need to capture how the proteins interact, or only that an interaction exists?  The evidence for such assignments also needs to be considered.

We would like to load known interactions into GUS and use them, in some way, to computationally define interactions.  Annotators would then confirm or delete these interactions, and could also add new ones.
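
One way to support the confirm/delete/add workflow described above is to track a review status for each interaction, so that reloading computational predictions never overwrites an annotator's decision. The sketch assumes, as an open choice, that only the existence of an interaction is recorded (so it is treated as undirected); `InteractionReview`, `InteractionKey`, and `ReviewStatus` are hypothetical names and do not describe the GUS3.0 Interaction table.

```java
import java.util.*;

/** Hypothetical review states for an interaction shown to an annotator. */
enum ReviewStatus { PREDICTED, CONFIRMED, DELETED, MANUAL }

/** An undirected interaction; the two ids are stored in sorted order. */
record InteractionKey(String a, String b) {
    InteractionKey {
        // Normalise order so (X, Y) and (Y, X) denote the same interaction.
        if (a.compareTo(b) > 0) {
            String tmp = a;
            a = b;
            b = tmp;
        }
    }
}

final class InteractionReview {
    private final Map<InteractionKey, ReviewStatus> status = new HashMap<>();

    /** Loads a computational prediction unless the pair already has a status. */
    void loadPrediction(String p1, String p2) {
        status.putIfAbsent(new InteractionKey(p1, p2), ReviewStatus.PREDICTED);
    }

    void confirm(String p1, String p2) { review(p1, p2, ReviewStatus.CONFIRMED); }

    void delete(String p1, String p2) { review(p1, p2, ReviewStatus.DELETED); }

    /** Annotators may also add interactions that were never predicted. */
    void addManual(String p1, String p2) {
        status.put(new InteractionKey(p1, p2), ReviewStatus.MANUAL);
    }

    ReviewStatus statusOf(String p1, String p2) {
        return status.get(new InteractionKey(p1, p2));
    }

    // Only pending predictions can be confirmed or deleted.
    private void review(String p1, String p2, ReviewStatus decision) {
        InteractionKey key = new InteractionKey(p1, p2);
        if (status.get(key) != ReviewStatus.PREDICTED) {
            throw new IllegalStateException("No pending prediction for " + key);
        }
        status.put(key, decision);
    }
}
```

Because `loadPrediction` uses `putIfAbsent`, a confirmed or deleted pair keeps its reviewed status when predictions are reloaded.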

Risks

More thought should be given to what we want to do with this; it could be very useful.  Open question: how much should we store in GUS, as opposed to simply linking out to other sites?

Dependencies on/with other requirements

There are no known dependencies.

3.3.9  Assign pathway(s) in which a protein is known to be involved

Description

The interface shall provide the ability to assign the pathways in which a curated protein is involved.

Evidence

TBD

Criticality

This requirement is included in deep-level annotation and thus has low-medium criticality.

Technical issues

New pathway- and interaction-related tables have been created in GUS3.0.  These include Pathway, PathwayInteraction, Interaction, Complex, and ComplexComponent.

We will want to load known pathway information into GUS.  Could it be used to computationally associate proteins with pathways, so that annotators would be confirming or deleting these associations?

More thought is needed regarding the manual assignment of curated proteins to pathways.  We would want to associate a curated protein with a specific protein in the pathway, not simply with the pathway as an entity.  This will require the ability to have multiple instances of a protein in a pathway.  This could possibly be accomplished using the ProteinInstance table, but would require merging proteins, which has not yet been addressed.
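
The need to associate a curated protein with a specific position in a pathway, even when the same protein occurs more than once, can be illustrated with a small in-memory model: each occurrence becomes its own node, and curated proteins link to nodes rather than to the pathway. `PathwayNode` and `PathwayModel` are hypothetical; they stand in for whatever structure (possibly ProteinInstance) is eventually chosen.

```java
import java.util.*;

/** Hypothetical pathway position: one node per occurrence of a protein. */
record PathwayNode(String nodeId, String proteinId) {}

final class PathwayModel {
    private final String name;
    private final Map<String, PathwayNode> nodes = new LinkedHashMap<>();
    // curated protein id -> ids of the nodes it has been associated with
    private final Map<String, Set<String>> curatedLinks = new HashMap<>();

    PathwayModel(String name) {
        this.name = name;
    }

    void addNode(String nodeId, String proteinId) {
        nodes.put(nodeId, new PathwayNode(nodeId, proteinId));
    }

    /** Associates a curated protein with one specific node, not the whole pathway. */
    void associate(String curatedProteinId, String nodeId) {
        if (!nodes.containsKey(nodeId)) {
            throw new IllegalArgumentException("No node " + nodeId + " in " + name);
        }
        curatedLinks.computeIfAbsent(curatedProteinId, k -> new TreeSet<>()).add(nodeId);
    }

    /** Counts how many times a protein occurs in this pathway. */
    long occurrencesOf(String proteinId) {
        return nodes.values().stream()
                .filter(n -> n.proteinId().equals(proteinId))
                .count();
    }

    Set<String> nodesFor(String curatedProteinId) {
        return curatedLinks.getOrDefault(curatedProteinId, Set.of());
    }
}
```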

Risks

More thought is needed regarding the manual assignment of curated proteins with pathways. 

Dependencies on/with other requirements

This requirement may depend on “3.3.8 Assign protein interactions”, depending on how we intend to fulfill this requirement.

3.3.10  Assign protein family

Description

The interface shall provide the ability to associate a curated protein with a protein family.

Evidence

TBD

Criticality

This requirement is included in deep-level annotation and thus has low-medium criticality.

Technical issues

No known technical issues.

Risks

No known risks.

Dependencies on/with other requirements

As written, this requirement depends on “3.3.1 Create/Confirm curated protein”, as we only assign protein families to curated proteins.

3.3.11  Deletion of curated proteins

Description

Annotator Interface 2.0 shall support the deletion of curated proteins. 

Evidence

Evidence for the deletion of a curated protein will be a comment selected from a pull-down menu on the interface (e.g., sequence change, mistake made during curation).

Criticality

This requirement is included in surface-level annotation and has high criticality.

Technical issues

Deletion of a curated protein must take into account any computationally generated instances of the protein. 

What should happen to these computationally generated ProteinInstances remains to be discussed.
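
Since the handling of computationally generated ProteinInstances is still open, the sketch below illustrates only the decision step under two hypothetical policies: block the deletion while such instances exist, or detach them and proceed. It also models the required pull-down reason as an enum; the first two labels come from this section, while `OTHER`, the class names, and the in-memory instance map are assumptions. No database writes are shown.

```java
import java.util.*;

/** Pull-down reasons for deletion; OTHER is an assumed catch-all. */
enum DeletionReason {
    SEQUENCE_CHANGE("Sequence change"),
    CURATION_MISTAKE("Mistake made during curation"),
    OTHER("Other");

    private final String label;

    DeletionReason(String label) {
        this.label = label;
    }

    String label() {
        return label;
    }
}

/** Hypothetical policies for computationally generated instances. */
enum InstancePolicy { BLOCK, DETACH }

/** What a deletion would do, without performing it. */
record DeletionPlan(boolean proceed, List<String> instancesToDetach, DeletionReason reason) {}

final class CuratedProteinDeleter {
    // curated protein id -> ids of computationally generated instances (hypothetical view)
    private final Map<String, List<String>> computedInstances;

    CuratedProteinDeleter(Map<String, List<String>> computedInstances) {
        this.computedInstances = computedInstances;
    }

    DeletionPlan planDeletion(String proteinId, DeletionReason reason, InstancePolicy policy) {
        Objects.requireNonNull(reason, "a deletion reason is required as evidence");
        List<String> instances = computedInstances.getOrDefault(proteinId, List.of());
        if (!instances.isEmpty() && policy == InstancePolicy.BLOCK) {
            return new DeletionPlan(false, List.of(), reason); // nothing changes
        }
        // DETACH, or no instances at all: proceed, listing instances to unlink.
        return new DeletionPlan(true, List.copyOf(instances), reason);
    }
}
```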

Risks

There are no known risks.

Dependencies on/with other requirements

As written, this requirement depends on “3.3.1 Create/Confirm curated protein”, as we only delete curated proteins.

 

4. Interface Requirements

 

This section describes how the software interfaces with other software products or users for input or output. Examples of such interfaces include library routines, token streams, shared memory, data streams, and so forth.

 

4.1 User Interfaces

4.1.1 GUI

This product will include a Graphical User Interface consisting of three main components.  The primary component will be a “Gene Page”, which will provide the main view of the gene, including alignments to genomic sequence when available.  The second component will be an “RNA Page”, which provides information about specific RNAs and allows users to create and modify annotation for a specific RNA.  The third component will be a “Protein Page”, which will allow users to view existing annotation for a protein and assign new annotation as described in the section of this document entitled “Functional Requirements”.

The initial window will be a login window.  After a successful login, the user will be presented with a window for selecting the “genes” or the region of genomic sequence to be annotated.

 

Selecting a gene or region of genomic sequence will cause the interface to display a “Gene Page”.  This window will include alignments of various features to genomic sequence, including, but not limited to, assemblies and gene predictions.  This window will also provide a mechanism that allows the user to create a gene model.

 

TBD - Include mockups here.

 

The “Gene Page” will include menus to control the display.  For example, there will be a menu option to display or hide each feature aligned to genomic sequence.  Another menu option is to display “surface” vs. “deep” annotation options.
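
The display menus described above reduce to a small amount of view state: a visibility flag per aligned feature type and the current annotation level. The sketch below keeps that state separate from any Swing code so the menus' behaviour can be tested on its own; `GenePageViewState`, showing every track by default, and starting at surface level are assumptions.

```java
import java.util.*;

/** Annotation depth offered by the Gene Page menu. */
enum AnnotationLevel { SURFACE, DEEP }

/** Hypothetical view state behind the Gene Page display menus. */
final class GenePageViewState {
    private final Map<String, Boolean> trackVisible = new LinkedHashMap<>();
    private AnnotationLevel level = AnnotationLevel.SURFACE;

    GenePageViewState(Collection<String> featureTracks) {
        for (String track : featureTracks) {
            trackVisible.put(track, true); // every aligned feature starts visible
        }
    }

    /** Backs a show/hide menu item for one aligned feature type. */
    void toggle(String track) {
        if (!trackVisible.containsKey(track)) {
            throw new IllegalArgumentException("Unknown track: " + track);
        }
        trackVisible.put(track, !trackVisible.get(track));
    }

    /** Visible tracks, in the order they were registered. */
    List<String> visibleTracks() {
        List<String> out = new ArrayList<>();
        trackVisible.forEach((track, visible) -> {
            if (visible) {
                out.add(track);
            }
        });
        return out;
    }

    void setLevel(AnnotationLevel level) {
        this.level = level;
    }

    AnnotationLevel level() {
        return level;
    }
}
```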

 

4.1.2 CLI

Annotator Interface 2.0 does not include any command line interfaces.

 

4.1.3 API

TBD

 

4.1.4 Diagnostics

TBD

4.2 Hardware Interfaces

This system does not interface with any hardware devices.

 

4.3 Communications Interfaces

TBD - Describe network interfaces here.

 

4.4 Software Interfaces

4.4.1 Database

The Annotator Interface v2.0 will query an Oracle database through a Java Object Layer.  No SQL will be embedded in the code.

 

Updates to the Oracle database will also be made through the Java Object Layer.  Preliminary plans call for Java RMI, with a single remote object handling database communication and object passing.  Classes for all the GUS tables/views will be created in a manner similar to the Perl Object Layer for GUS.  However, as noted above, database communication will be centralized in a single remote GUSServer object.  A code generator similar to the Perl code generator will be written to produce these classes.
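
A minimal sketch of the single-remote-object design follows, assuming a generic serializable value object rather than the generated per-table classes the code generator would produce. `GusObject`, the two-method `GUSServer` interface, and `InMemoryGUSServer` are hypothetical; the in-memory class exists only so the interface can be exercised without an RMI registry or an Oracle database.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.*;

/** Hypothetical value object passed between client and server; must be Serializable for RMI. */
record GusObject(String table, long id, Map<String, String> attributes) implements Serializable {
    GusObject {
        attributes = Map.copyOf(attributes); // immutable, serializable copy
    }
}

/**
 * Hypothetical remote interface: the single point of database communication.
 * In a real project this would be public and in its own file.
 */
interface GUSServer extends Remote {
    GusObject retrieve(String table, long id) throws RemoteException;

    long submit(GusObject object) throws RemoteException;
}

/** In-memory stand-in for local testing; a real server would talk to Oracle. */
final class InMemoryGUSServer implements GUSServer {
    private final Map<String, GusObject> store = new HashMap<>();
    private long nextId = 1;

    private static String key(String table, long id) {
        return table + ":" + id;
    }

    @Override
    public synchronized GusObject retrieve(String table, long id) {
        return store.get(key(table, id));
    }

    @Override
    public synchronized long submit(GusObject object) {
        // Objects with id 0 are new and receive the next free id.
        long id = object.id() > 0 ? object.id() : nextId;
        nextId = Math.max(nextId, id + 1);
        store.put(key(object.table(), id), new GusObject(object.table(), id, object.attributes()));
        return id;
    }
}
```

In a deployment, the server object would be exported with `UnicastRemoteObject.exportObject(server, 0)` and bound by name via `LocateRegistry.createRegistry(port).rebind(name, stub)`; clients would obtain the stub with `LocateRegistry.getRegistry(host, port).lookup(name)` and call the same methods, handling `RemoteException`.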

 

A separate document should discuss the Java Object Layer.

 

To be added: a discussion of why straight JDBC is not a good idea for updates.  In brief, straight JDBC does not make it easy to capture all of the tracking information that makes GUS so strong, for example algorithm invocation identifiers and user ids.  Capturing this information by hand in every JDBC update would be messy, and the risk of error too high.
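
The tracking argument can be made concrete: if every write goes through a common base class, the tracking values are attached in exactly one place, whereas with straight JDBC every individual INSERT or UPDATE would have to remember to set them. In the sketch below, `TrackedObject`, `TrackingContext`, `ProteinName`, and the column names `user_id` and `alg_invocation_id` are illustrative assumptions, not the actual GUS column names.

```java
import java.util.*;

/** Tracking values the object layer attaches to every write (hypothetical). */
record TrackingContext(long userId, long algInvocationId) {}

/** Hypothetical base class: subclasses set only their own attributes. */
abstract class TrackedObject {
    private static final Set<String> TRACKING_COLUMNS = Set.of("user_id", "alg_invocation_id");
    private final Map<String, Object> attributes = new LinkedHashMap<>();

    protected final void set(String column, Object value) {
        if (TRACKING_COLUMNS.contains(column)) {
            throw new IllegalArgumentException(column + " is set by the object layer");
        }
        attributes.put(column, value);
    }

    /** Builds the full row to write; tracking columns are added here and nowhere else. */
    final Map<String, Object> toRow(TrackingContext ctx) {
        Map<String, Object> row = new LinkedHashMap<>(attributes);
        row.put("user_id", ctx.userId());
        row.put("alg_invocation_id", ctx.algInvocationId());
        return row;
    }
}

/** Example subclass standing in for a generated table class. */
final class ProteinName extends TrackedObject {
    ProteinName(String name) {
        set("name", name);
    }
}
```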

 

Still under consideration:

 

- BioJava objects on top of the GUS objects.  This has the potential to speed up development dramatically, and PSU could help here.  It would likely require some enhancements to the current BioJava objects.  More detailed discussions will take place when the visitors from Sanger are here.  However, a decision should be made before development of client software begins, as it will have a significant impact.

 

4.4.2 Links

The Annotator Interface v2.0 will provide links out to the following web sites:

- allgenes-gusdev
- ProDom
- CDD
- Any source protein database
- OMIM

Others?

URLs for these sites will be stored either in GUS or in a configuration file.  Under no circumstances shall a URL be embedded in code.
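
A minimal sketch of the configuration-file option, using `java.util.Properties` so that each site's URL lives outside the code. The `LinkOutConfig` class, the `{id}` placeholder convention, and the example.org URL are assumptions; real templates for the sites listed above would go in the configuration file (or in GUS).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Resolves link-out URLs from configuration instead of hard-coding them. */
final class LinkOutConfig {
    private final Properties templates = new Properties();

    LinkOutConfig(Reader config) {
        try {
            templates.load(config); // standard key=value properties format
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read link-out configuration", e);
        }
    }

    /** Substitutes the record id into the configured template for a site. */
    String urlFor(String site, String recordId) {
        String template = templates.getProperty(site);
        if (template == null) {
            throw new IllegalArgumentException("No URL configured for " + site);
        }
        return template.replace("{id}", recordId);
    }

    public static void main(String[] args) {
        LinkOutConfig links = new LinkOutConfig(new StringReader(
                "omim=https://example.org/omim?id={id}\n"));
        System.out.println(links.urlFor("omim", "12345")); // prints https://example.org/omim?id=12345
    }
}
```

Changing a site's address then means editing one configuration line rather than rebuilding the client.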

5 Non-functional Requirements

5.1 Performance Requirements

The minimum machine currently used by an annotator is an Intel Pentium (P54) 133 MHz with 64 MB of RAM.

5.3 Security Requirements

Access to Annotator Interface 2.0 shall be password protected.  Users will be required to register a username and password.  This username and password shall be recorded in GUS.  Each user shall be assigned a unique user_id which will be used to track all modifications made to GUS by that individual.
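
The requirement does not say how the password is recorded in GUS. Assuming it should be stored as a salted hash rather than in plain text, the sketch below uses the JDK's built-in PBKDF2 support. The `UserCredential` shape, the `PasswordStore` class, the iteration count, and the salt length are assumptions to be tuned; the user_id is carried alongside so it remains available for tracking modifications.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** What might be recorded in GUS for one registered user (hypothetical shape). */
record UserCredential(long userId, String username, String saltB64, String hashB64) {}

final class PasswordStore {
    private static final int ITERATIONS = 210_000; // assumed work factor
    private static final int KEY_BITS = 256;
    private static final SecureRandom RANDOM = new SecureRandom();

    static UserCredential register(long userId, String username, char[] password) {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt); // fresh random salt per user
        byte[] hash = pbkdf2(password, salt);
        Base64.Encoder b64 = Base64.getEncoder();
        return new UserCredential(userId, username, b64.encodeToString(salt), b64.encodeToString(hash));
    }

    static boolean verify(UserCredential stored, char[] attempt) {
        Base64.Decoder b64 = Base64.getDecoder();
        byte[] expected = b64.decode(stored.hashB64());
        byte[] actual = pbkdf2(attempt, b64.decode(stored.saltB64()));
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }

    private static byte[] pbkdf2(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is unavailable", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }
}
```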

5.5 Business Rules

All business rules shall be clearly defined in an Annotation Protocol.

5.6 User Documentation

A tutorial, user’s guide and annotation protocol shall be provided to all users of the system.

Appendix A: Glossary

To be completed

 

Appendix B: References

To be completed