PERL Object Layer for Oracle or Sybase Relational DBMS


The perl object layer on a relational dbms is intended to simplify database interactivity and manage relationships and submits.  The query capability is currently very limited.  Functionality includes:

NOTE:  There are many more methods in the superclasses including methods for querying (DbiTable) and getting meta data from the db.

Methods by functional groups: (the most useful methods are indicated in red ).  Most of the others are convenience methods or could be considered private methods of the objects.

DbiDatabase login/logou t:  Used to create database login which allows all subsequent functionality.
Getting and Setting attribute values
Retrieving objects from the database
Managing object cache
Setting default values
Increasing the maximum number of objects allowed in RAM
Setting global NO version
Instance level management of versioning and updating:
Checking read and write permissions
Managing global behavior of objects
Managing memory (undefPointerCache!)
Managing valid parents/children
Managing Children
Managing Parents
Managing submits to database:
Managing deletes
Similarity facts for storing sequence similarities
Methods for dealing with sequence
Parsing XML into objects
Comparing attribute values
Retrieving table ids and primary key attributes given name
General methods for dealing with facts
Miscellaneous method
Special objects (hand coded so have extra functionality).

DbiDatabase Methods:  Establish logins, retrieve meta information:

DbiDatabase is the module that is used to create logins to the database and from which all the other objects are generated.  Thus, to start a session, you simply create a new DbiDatabase and you are then able to use all the objects representing specific tables in that database.  DbiDatabase also has methods for getting meta information relevant to that database (such as getTables()) in addition to getTable(tableName) which returns a DbiTable for tableName.  This enables users to access meta information for this specific table such as foreign key relationships, primary key attributes, etc.  These methods, however, will in most cases be unecessary as the Table specific objects take care of all this for the user.

DbiDatabase->new( DBI_DSN, login,  password, verbose,  noInsert, default, dbName)
    This creates a DbiDatabase that allows the use of all the following methods on specific objects.  DBI_DSN is optional if you have your environment variable of this same name set.  login, password, and dbname are required where dbname is case sensitive such as "GUSdev".   Verbose if true prints all SQL statements executed and noInsert if true does not commit to the database although all sql is executed and then rolled back.  The objects manage these attributes of the Database with setVerboseOn() and setCommitOff() (see below ).

    Ends this session and disconnects all the handles generated.

Additional methods are available if needed to retrieve meta information from the database such as a listing of all the tables/views etc.


TableName->new($dbiTableHashRef, $primary_key_list, $dbhandle)
    This will create an object for TableName.  All aguments are optional.  I normally only pass in the hash reference of attribute values if I want to set some values this way.  The primary key list is only necessary if the table does not have a primary key defined (which all GUS objects do) and dbhandle only if you are using a database that is different than that logged into with the DbiDatabase constructor which establishes the necessary database handles for the objects to interact with the database.

Getting and Setting attribute values:
Attribute values can be set in two ways.  The first is by passing in a hash ref of attribute => value pairs to the constructor.  The second is via set methods in each object of the form:

setRowUserId($userId); for the attribute "row_user_id".

Alternatively, one can use the set method in the super class directly.    NOTE: this is in general not a good idea as this bypasses any specific functionality that may be associated with the set method of the specific objects.  For instance, the setSequence method of the sequence objects removes returns, uppercases all characters  and sets  the length attribute so there is a consistent representation in the db.


Get methods follow this same syntax except do not take in the value but rather return it.



Getting substrings from  CLOB attribute types.  This is particularly useful if one hasn't retrieved the CLOB attribute (see below) to save retrieving a very long sequence, for example, into memory.  The method is a generic one to allow retrieving substrings from any CLOB attribute:

getSubstrFromClob($attribute,$start,$length)  Note that the start here is indexed from 1.  To retrieve a sequence substring from a sequence object one would use
    $sequence->getSubstrFromClob('sequence',2334,1000) #would retrieve 1000 characters from the sequence              attribute starting at position 2334.

Retrieving objects from the database:
Enough attributes must be set to ensure a single row is retrieved.

    returns 1 if successful and populates the objects attribute values from the database.  If unsuccessful returns 0 and leaves object unaffected.  If multiple rows are returned then is considered unsuccessful.  The query to the database simply constrains all the attributes that have been set.  One can choose to NOT retrieve specific attributes by passing in an array reference of attribute names that should not  be retrieved.  This can be extremely useful for objects that have very large atts such as sequence if one only wants other attributes or just a substring from the sequence (see previous command).

Managing Object Database Cache:
Objects are cached when retrieved from the database either with retrieveFromDB or getParent/Children methods.  In the case of getParent/Children, if an object is requested that is already in the cache, the object in the cache is returned/added so that only a single object representing the tuple is present at any given time.  RetrieveFromDB always retrieves the object from the database even if there is already a copy in the cache and replaces the current cached copy if the optional $replaceCache variable is used.  Users should check the cache to see if an object already exists before retrieving from the db if there is a chance that they are constructing duplicate objects.   NOTE:  the key of the cache is a concatenated string of the primary key values.  This is generated by the method getConcatPrimKey().  In most cases (when there is a single primary key attribute) this will be just the value of the identifier returned by getId().

    adds objects to cache...this is done automatically when objects are retrieved from the database so shouldn't be called by users.

    returns the object if it exists else undef.

    returns 1 if in the cache and undef otherwise.   Note for testing purposes this is very similar to getFromDbCache.

    removes the object ($object) from the dbCache.

Setting default values:
These methods are used for setting default values which all subsequent objects created in the application will inherit. On submit these values will be set  automatically.

setDefaultProjectId($project_id) (default = 0 (GUS))
setDefaultGroupId($id)  (default = CBIL)
setDefaultUserRead($read)  (default = 1)
setDefaultUserWrite($val)  (default = 1)
setDefaultGroupRead($val)  (default = 1)
setDefaultGroupWrite($val)  (default = 1)
setDefaultOtherRead($val)  (default = 1)
setDefaultOtherWrite($val)  (default = 0)

Increasing the maximum number of objects allowed in RAM at one time (default is 10000):
This method is necessary to prevent memory leaks due to forgetting to call undefPointerCache() in each loop to release objects for garbage collection.  Unless the user needs more than 10000 objects at one time (which seems very unlikely) then these methods should be uneccessary.


Setting global NO version (default = 0):
While this functionality defies the initial design of GUS, it is needed for dealing with things like draft sequence where one wants to simply wipe the database of the current version and replace it with something new.  No objects are versioned.


Instance level management of versioning and updating:
Indvidual objects can be specified to be not versioned or not updateable.  Default is to version and allow updates for all objects.

setVersionable($v) (default = 1)
setUpdateable($u) (default = 1)

Checking read and write permissions:
Methods to check the read/write permissions based on the default group and user.  Enforced when retrieving from database or retrieving children or parents.


Managing global behavior of objects (commit, verbose, debug):
These parameters (once set) take affect on all objects.  Defaults:  commit = on, verbose = off, debug = off.

    Reports debug statements for methods in RelationalRow.

    Turns on printing to stdout all SQL statements.

    All submits will be rolled back rather than committed.  Should not be turned off except for testing on a few loops.

Managing memory (allowing garbage collection of objects that are out of scope):
Necessary because of perls lack of a garbage collector that can deal with circular references.  Only method really necessary as all others are automatically taken care of is undefPointerCache() which must be called in each loop. Note that this removes all objects from the pointer cache so all parent/child relationships are lost and must be re-established if needed.

    If $recursive = 1 then will call method on all children recursively. 

    Add objects to the pointer cache....done automatically when objects are created rarely if ever needs to be called.
    Method to retrieve the object from the pointer cache.
    Removes single object from the cache.
    MUST be called in each loop to allow garbage collection.  Removes all child and parent pointers so they can not be retrieved.

Managing valid parents/children (code generator takes care of this).
The objects keep a list of valid parents and children.  Assuming the code generator is working correctly, these methods should not need to be used.

    @list is an array of array references [ChildTableName,My_primary_key,Childs_foreign_key_to_me]



Managing Children (objects with my primary key as a foreign key):
Methods for managing children including retrieving from the database, getting (returning children) marking children deleted and submitting.  NOTE that submitting children is automatically done whenever an object is submitted and should be unecessary (in fact the methods for managing the transaction may NOT work appropriately if one submits children manually).  Submitting self results in submitting self  and then all children within a single transaction.

getChildren or retrieveChildrenFromDB with one of the ImpClasses will bring back the appropriate subclass children/parent of that ImpClass. For example: getChildren('NAFeatureImp',1) will get any of the NAFeature subclass children such as RNAFeature, GeneFeature, ExonFeature etc.  In this way, one can retrieve all the features associated with a given NASequence entry.

getChildren($className, $retrieveIfNoChildren, $getDeletedToo, $where,\@doNotRetrieveAttributes)
    Returns an array of children in classname.  If $retrieveIfNoChildren = 1 then does a retrieveFromDB if there are currently no children of this className.  If $getDeletedToo = 1 then will also return children that have been marked deleted.  $where is a hash ref where children returned must meet all the attribute (key) => value (value) pairs in the hashref.  If the $className is one of the "Imp" tables such as NASequenceImp, then the subclass objects of the correct type are ExternalNASequence, Assembly as appropriate.  The last argument is an array reference of attbibutes that should not be retrieved should these children need to be retrieved from the $retrieveIfNoChildren = 1.

getAllChildren($retrieve, $getDeletedToo, $where)
    Returns all children.

getChild($className, $retIfNochildren, $getDeletedToo)
    Returns one child (NOTE: only useful if you know there is only one child as returns the first one on the list if more than one).

    Replaces children in this childs class with this child.  Should use addChild for adding children.

retrieveChildrenFromDB($className, $resetIfHave, $where,\@doNotRetrieveAttributes )
    Retrieves children in $className from the database.  If $resetIfHave = 1 will reset any children that you already have to the values from the database.  NOTE that any relationships established on the children that are reset will be lost.  $where hash ref as for getChildren().  Note that using getChildren($className,1) will call retrieveChildrenFromDB if they have not already been retrieved and so save a step.

resetChildrenToDB($className, $where)
    Removes all the children with this className and retrieves them fresh from the db.

retrieveAllChildrenFromDB($recursive, $resetIfHave)
    Retrieves all this objects children.  If $recursive = 1 will do this recursively.  NOTE that this has the potential for retrieving a huge number of objects if called from something like Group.  $resetIfHave resets children you already have to the database.

    Retrieves all children following removal of all existing ones.

addChild($c, $resetIfHave)
    Method of choice for adding a child.

    Adds an array of children.

    Remove child from self.  NOTE that this has NO impact on the database....simply removes pointer between parent and child.




Managing Parents:
Similar to managing children.  Difference is that can only have one parent of each type.

    Sets this object as my parent.  The parent likewise points to me as its child.  Thus calling $p->addChild($self) is equivalent to $self->setParent($p)

getParent($className, $retrieveIfNoParent ,\@doNotRetrieveAttributes)
    if $retrieveIfNoParent = 1 then calls retrieveParentFromDB if does not have that parent.  If the $className is one of the "Imp" tables such as NASequenceImp, then the subclass objects of the correct type are ExternalNASequence, Assembly as appropriate.

getAllParents($retrieveIfNoParent,\@doNotRetrieveAttributes )

retrieveParentFromDB($className ,\@doNotRetrieveAttributes)


    Sets my foreign key to be the Id of this parent.  Note that this occurs automatically upon submit to the database.



Managing submits to database:
Principal here is that the top level object (an object with only children) should be submitted.  That object following submitting itself, will submit all its children recursively.  Simple many-to-many relationships are traversed in order to set the foreign keys of the relation, however, the parent on the far side of the relation will not submit any of its children.  All objects submitting with a single submit command are submitted in a single transaction so that if something fails, all are rolled back together to mantain database integrity.  All tuples that are updated are versioned by default and sequences are stored (and versioned) appropriately.  Evidence  and Similarity facts are also submitted if they have been set (added).  Upon delete, all Evidence and Similarity tuples that relate to the object are also deleted (and versioned of course!).  Additionally, objects can store other objects to be  submitted after they have submitted themselves and all their children via the method addToSubmitList($object).

Note that submit and perhaps manage transaction are the only methods in this group that need to be called by user.  The others are automatically taken care of appropriately.

submit($notDeep, $noTran)
    Automatically submits inside a transaction.  $notDeep = 1 only submits self but not children.  $noTran = 1 does not begin or (when finished) commit a transaction.  Used by all children so are submitted in the initial objects transcation.

manageTransaction($noTran, $task)
    This is dealt with automatically, However, if one wants to submit several top level objects in a single transaction, you can use manageTransaction to start and then commit a transaction and submit with noTran the objects in between.  $task is either 'commit' or 'begin'.

    Add $object to a submit list so that it gets submitted after I submit myself and all children (in the same transaction).  This is the best way to submit objects on far side of a many-to-many relation  that have children and thus need to be submitted explicitly.

    This can be used during an update/submit to force a roll back at end of submit if errors are detected.

submitEvidence($e, $notDeep, $noTran)

submitAllEvidence($notDeep, $noTran)

deleteAllEvidence($notDeep, $noTran)

submitSimilarityFact($fact, $notDeep, $noTran)

submitAllSimilarityFacts($notDeep, $noTran)

deleteSimilarityFact($f, $notDeep, $noTran)

deleteAllSimilarityFacts($notDeep, $noTran)
    All similarity facts of an object are deleted upon submit if that object has been marked deleted.









    Sets all the attributes for which defaults have been set.

Managing deletes:
Deletes occur in two phases.  Objects must first be marked for deletion at wich time they can only be retrieved (with getChildren) if one adds in the bit to getDeletedToo.  Objects are deleted from the database (and versioned) when submitted.  Note that in order to delete a tuple, all it's foreign key relations (children) must also be marked deleted.  The retrieveAllChildrenFromDB(1) and markDeleted(1) are recursive to facillitate this process.

    Mark self deleted.  It $doChildren = 1 then does this recursively.

    Returns 1 if object is marked deleted.


    Mark this child deleted so will be removed from the database on submit.

    Mark all children in this class deleted.

    Mark each of these children deleted.

    Mark all the children I have deleted....recursively if $recursive = 1.

    Undelete child previously marked deleted.




Methods that allow setting and retrieving Evidence.  Evidence gets submitting automatically by "target" object.

addEvidence($fact, $evidence_group_id, $attribute_name)
    $fact is a fact object.  Can optionally pass in an evidence group id to group relevant evidence such as for a gene model one might group the evidence for all the exons.  Also optional is the attribute_name if the evidence is for a particular attribute of the target table such as "name" or "description".

getEvidence($factTableName, $retrieveUnlessHave)
    This method returns an array of Evidence objects.  One can retrieve the fact associated with each with EvidenceObj->getEvidenceFact(1);
        note that the optional argument 1 causes the fact object to be retrieved from the database if this has not already been done.

getEvidenceByGroupId($factTableName, $retrieveUnlessHave)
    method untested.

retrieveEvidenceFromDB($factTableName, $resetIfHave)

retrieveAllEvidenceFromDB($resetIfHave, $targetOrFact)
    Retrieves all the evidence for this object.  It $targetOrFact then will retrieve whether this object is the target table or fact table.  Default behaviour is to only retrieve evidence if target table.  Use getEvidence then to return an array of evidence ojbects.

retrieveEvidenceFactsFromDB($evidence, $resetIfHave)


Similarity facts for storing sequence similarities:
Methods for dealing with Similarity facts generated by blast and perhaps other sequence similarity algorithms.

addSimilarityFact($fact, $resetIfHave)


retrieveSimilarityFactsFromDB($subjectTableName, $getEitherWay, $resetIfHave)

Methods for dealing with sequence:
Get, set, and format sequence.

Returns the sequence as single string.

Takes sequence string, cleans it up a bit (removes returns and non-sequence characters) and then sets.

    All features support this method which retrieves the sequence corresponding to that feature.  Basically retrieves a substring from the sequence to which the feature points corresponding to the span of the feature on the sequence.  RNAFeature generates the sequence from the exons which make it up if it is predicted unless the actual sequence is stored in SplicedNASequence.  NOTE that this method retrieves the sequence object to which it points without retrieving the sequence attribute.  getSubstrFromClob is then used to extract the relevant substring, thus saving much memory in the case of very long sequences.

Returns fasta formatted sequence.  If $type = 1 the id used is the na_sequence_id (or aa_sequence_id)  otherwise it is the source_id.

Printing:  (returns a string that can be printed)
Facility for printing to a string format or to XML.

    Returns in a simple string format the table object (does not indicate parent or child relationshps).

toXML($indent, $suppressDef, $doXmlIds, $family)
    If $suppressDef = 1 then the default attributes below modification_date are suppressed.  $doXmlIds = 1 will print XML ids in the object tags.  $family = 1 will print parent/child relationships in object tags rather than nesting children (I think?).

Parsing XML into objects:

    Objects parse the xml themselves... $xml is an array reference where each line of the XML is an element.






Comparing attribute values:
An object can do simple checks given a hash reference (attributename => value) to see if all the values match ones in the object.  Likewise, an object can take in an object with the same (or a subset of) atributes set and test for equivalence.

compareValues($v1, $v2, $type)

    Test to see if all attribute => value pairs of the hash reference match my values.

    Self test to see if all attributes of $ob match values of self.  Returns 1 if successful.

    For comparing two objects...pass in the smaller object and $self will  compare all attributes to see if values are same.  Returns an array of attribute names that are different.

Retrieving table ids and primary key attributes given name:
For getting the ids and table names from the Evidence and Similarity (mongo many to many) tables.


    Returns the table name for $table_id.

    Returns the table_id for $tableName.

    Returns the primary key attribute for $table_id.

General methods for dealing with facts:
Have NOT been tested and likely do not work...debug if you want to use!!




submitFact($fact, $notDeep, $noTran)

submitAllFacts($notDeep, $noTran)

Miscellaneous methods:

    Returns class name (table name) of self.

    returns the values of the primary key attributes concatentated into a string.  Used by the dbCache as not all objects have a single attribute primary key.


    Returns 1 if $att is a valid attribute of self.

    returns the DbiDatabase object from which one can get the extent and instance handles (and thus do queries) with the methods:
        getDbHandle() ##handle all objects use to interact with the DB.
        getMetaDbHandle() ##handle that retrieve meta information from DB.

    Returns a Dbi database handle that does not interfere with the objects as may be the case with the above methods.  If $autocommit is true, then the handle autocommits on submit to the db.  You always get the same handle as it is cached so if you need yet another handle you can get it from DbiDatabase with:

    returns the total number of rows updated in this session.

    returns the total number of rows inserted in this session.

    returns the total number of rows deleted in this session.
Hand coded objects with extra functionality...only a few are mentioned..

A number of the objects for GUS have been handcoded to have extra functionality.  A couple of these have been mentioned earlier.  For example, sequence objects (and  all subclasses) have setSequence methods that clean up the sequence on submit.  Also, all Feature objects have getFeatureSequence() methods.

NASequence and all subclasses:

    cleans up the sequence a bit before setting.

    returns the sequence in fasta format.  It $type == 1, uses the source_id for the primary identifier, otherwise uses the na_sequence_id.

    returns the sequence entry in CAML format (from Paracel as input into CAP4).

(NA|AA)Feature and all subclasses:

    returns the sequence for the feature.

    gets the location of this feature as an array (startLocation, endLocation, is_reversed)

Assembly and AssemblySequence

These objects have many methods for manipulating assemblies and AssemblySequences from parsing cap2/cap4 output  to reverseComplementing and generating  cap2 alignments.  Also has methods for computing/retrieving from the database putative SNPs.  Only a few will be indicated here...users should look at the object (Objects/GUSdev/hand_edited/ for more complete info.

    returns a cap2 format alignment for the assembly object.  $idType if true uses source_ids for the assembly sequences rather than assembly_sequence_ids.  if $suppressNumbers, does not print the locations on the right margin.  If $print, prints to STDOUT the alignment as it is being generated....useful for very large assemblies as it  starts printing very quickly.

    returns the RNA associated with this assembly.

Assembly->ge tTranslatedAASequences($retrieve,$getDeletedToo)
    returns an array of TranslatedAASequences


    sets the pvalue given the actual pvalue (separates it into pvalue_mant and exp.

    returns the actual P Value

    returns an alignment  for each HSP (SimilaritySpan) generated using $type algorithm where the algorithm is ([blast]|fasta|sim4) with blast being the default.

Last updated: Wed Oct 24 15:19:58 EDT 2001