The perl object layer on a relational dbms is intended to simplify database interactivity and manage relationships and submits. The query capability is currently very limited. Functionality includes:
Methods by functional groups: (the most useful methods are indicated in red ). Most of the others are convenience methods or could be considered private methods of the objects.
DbiDatabase login/logou
t: Used to create database login which allows all subsequent functionality.
Constructor
Getting and Setting attribute values
Retrieving objects from the database
Managing object cache
Setting default values
Increasing the maximum number of objects allowed
in RAM
Setting global NO version
Instance level management of versioning and
updating:
Checking read and write permissions
Managing global behavior of objects
Managing memory (undefPointerCache!)
Managing valid parents/children
Managing Children
Managing Parents
Managing submits to database:
Managing deletes
Evidence
Similarity facts for storing sequence similarities
Methods for dealing with sequence
Printing
Parsing XML into objects
Comparing attribute values
Retrieving table ids and primary key attributes given
name
General methods for dealing with facts
Miscellaneous method
Special objects (hand coded so have extra functionality).
DbiDatabase Methods: Establish logins, retrieve meta information:
DbiDatabase is the module that is used to create logins to the database and from which all the other objects are generated. Thus, to start a session, you simply create a new DbiDatabase and you are then able to use all the objects representing specific tables in that database. DbiDatabase also has methods for getting meta information relevant to that database (such as getTables()) in addition to getTable(tableName) which returns a DbiTable for tableName. This enables users to access meta information for this specific table such as foreign key relationships, primary key attributes, etc. These methods, however, will in most cases be unecessary as the Table specific objects take care of all this for the user.
DbiDatabase->new( DBI_DSN, login, password,
verbose, noInsert, default, dbName)
This creates a DbiDatabase that allows the use of all
the following methods on specific objects. DBI_DSN is optional if you
have your environment variable of this same name set. login, password,
and dbname are required where dbname is case sensitive such as "GUSdev".
Verbose if true prints all SQL statements executed and noInsert if true does
not commit to the database although all sql is executed and then rolled back.
The objects manage these attributes of the Database with setVerboseOn() and
setCommitOff() (see below
).
DbiDatabase->logout()
Ends this session and disconnects all the handles generated.
Additional methods are available if needed to retrieve meta information from the database such as a listing of all the tables/views etc.
TableName->new($dbiTableHashRef, $primary_key_list,
$dbhandle)
This will create an object for TableName. All aguments
are optional. I normally only pass in the hash reference of attribute
values if I want to set some values this way. The primary key list
is only necessary if the table does not have a primary key defined (which
all GUS objects do) and dbhandle only if you are using a database that is
different than that logged into with the DbiDatabase constructor which establishes
the necessary database handles for the objects to interact with the database.
Getting and Setting attribute values:
Attribute values can be set in two ways. The first is by passing in
a hash ref of attribute => value pairs to the constructor. The second
is via set methods in each object of the form:
setRowUserId($userId); for the attribute "row_user_id".
Alternatively, one can use the set method in the super class directly. NOTE: this is in general not a good idea as this bypasses any specific functionality that may be associated with the set method of the specific objects. For instance, the setSequence method of the sequence objects removes returns, uppercases all characters and sets the length attribute so there is a consistent representation in the db.
set('row_user_id',$userId);
Get methods follow this same syntax except do not take in the value but rather return it.
getRowUserId()
get('row_user_id')
Getting substrings from CLOB attribute types. This is particularly useful if one hasn't retrieved the CLOB attribute (see below) to save retrieving a very long sequence, for example, into memory. The method is a generic one to allow retrieving substrings from any CLOB attribute:
getSubstrFromClob($attribute,$start,$length)
Note that the start here is indexed from 1. To
retrieve a sequence substring from a sequence object one would use
$sequence->getSubstrFromClob('sequence',2334,1000)
#would retrieve 1000 characters from the sequence
attribute starting at position 2334.
Retrieving objects from the database:
Enough attributes must be set to ensure a single row is retrieved.
retrieveFromDB(\@attributesToNotRetrieve,$replaceCache)
returns 1 if successful and populates the objects attribute
values from the database. If unsuccessful returns 0 and leaves object
unaffected. If multiple rows are returned then is considered unsuccessful.
The query to the database simply constrains all the attributes that have
been set. One can choose to NOT retrieve specific attributes by passing
in an array reference of attribute names that should not be retrieved.
This can be extremely useful for objects that have very large atts
such as sequence if one only wants other attributes or just a substring from
the sequence (see previous command).
Managing Object Database Cache:
Objects are cached when retrieved from the database either with retrieveFromDB
or getParent/Children methods. In the case of getParent/Children, if
an object is requested that is already in the cache, the object in the cache
is returned/added so that only a single object representing the tuple is
present at any given time. RetrieveFromDB always retrieves the object
from the database even if there is already a copy in the cache and replaces
the current cached copy if the optional $replaceCache variable is used.
Users should check the cache to see if an object already exists before retrieving
from the db if there is a chance that they are constructing duplicate objects.
NOTE: the key of the cache is a concatenated string of the primary
key values. This is generated by the method getConcatPrimKey().
In most cases (when there is a single primary key attribute) this will be
just the value of the identifier returned by getId().
addToDbCache($o,$replace)
adds objects to cache...this is done automatically when
objects are retrieved from the database so shouldn't be called by users.
getFromDbCache($class,$key)
returns the object if it exists else undef.
isInDbCache($class,$key)
returns 1 if in the cache and undef otherwise.
Note for testing purposes this is very similar to getFromDbCache.
removeFromDbCache($object)
removes the object ($object) from the dbCache.
Setting default values:
These methods are used for setting default values which all subsequent objects
created in the application will inherit. On submit these values will be set
automatically.
setDefaultProjectId($project_id) (default = 0 (GUS))
getDefaultProjectId()
setDefaultUserId($id)
getDefaultUserId()
setDefaultGroupId($id) (default = CBIL)
getDefaultGroupId()
setDefaultAlgoInvoId($id)
getDefaultAlgoInvoId()
setDefaultUserRead($read) (default = 1)
getDefaultUserRead()
setDefaultUserWrite($val) (default = 1)
getDefaultUserWrite()
setDefaultGroupRead($val) (default = 1)
getDefaultGroupRead()
setDefaultGroupWrite($val) (default = 1)
getDefaultGroupWrite()
setDefaultOtherRead($val) (default = 1)
getDefaultOtherRead()
setDefaultOtherWrite($val) (default = 0)
getDefaultOtherWrite()
Increasing the maximum number of objects allowed in RAM
at one time (default is 10000):
This method is necessary to prevent memory leaks due to forgetting to call
undefPointerCache() in each loop to release objects for garbage collection.
Unless the user needs more than 10000 objects at one time (which seems very
unlikely) then these methods should be uneccessary.
setMaximumNumberOfObjects($num)
getMaximumNumberOfObjects()
Setting global NO version (default = 0):
While this functionality defies the initial design of GUS, it is needed for
dealing with things like draft sequence where one wants to simply wipe the
database of the current version and replace it with something new. No
objects are versioned.
setGlobalNoVersion($val)
getGlobalNoVersion()
Instance level management of versioning and updating:
Indvidual objects can be specified to be not versioned or not updateable.
Default is to version and allow updates for all objects.
setVersionable($v) (default = 1)
isVersionable()
setUpdateable($u) (default = 1)
isUpdateable()
Checking read and write permissions:
Methods to check the read/write permissions based on the default group and
user. Enforced when retrieving from database or retrieving children
or parents.
checkReadPermission()
checkWritePermission()
Managing global behavior of objects (commit, verbose,
debug):
These parameters (once set) take affect on all objects. Defaults:
commit = on, verbose = off, debug = off.
setDebuggingOn()
Reports debug statements for methods in RelationalRow.
setDebuggingOff()
getDebuggingState()
setVerboseOn()
Turns on printing to stdout all SQL statements.
setVerboseOff()
getVerboseState()
setCommitOff()
All submits will be rolled back rather than committed.
Should not be turned off except for testing on a few loops.
setCommitOn()
getCommitState()
Managing memory (allowing garbage collection of objects
that are out of scope):
Necessary because of perls lack of a garbage collector that can deal with
circular references. Only method really necessary as all others are
automatically taken care of is undefPointerCache() which must be called in
each loop. Note that this removes all objects from the pointer cache so all
parent/child relationships are lost and must be re-established if needed.
removeAllChildPointers($recursive)
If $recursive = 1 then will call method on all children
recursively.
addToPointerCache($ob)
Add objects to the pointer cache....done automatically
when objects are created rarely if ever needs to be called.
getFromPointerCache($object_reference)
Method to retrieve the object from the pointer cache.
removeFromPointerCache($ob)
Removes single object from the cache.
undefPointerCache()
MUST
be called in each loop to allow garbage collection. Removes all child
and parent pointers so they can not be retrieved.
Managing valid parents/children (code generator takes
care of this).
The objects keep a list of valid parents and children. Assuming the
code generator is working correctly, these methods should not need to be
used.
setChildList(@list)
@list is an array of array references [ChildTableName,My_primary_key,Childs_foreign_key_to_me]
addToChildList(@list)
getChildList()
isValidChild($c)
isOnChildList($className)
getChildSelfColumn($className)
getChildColumn($className)
setParentList(@list)
addToParentList(@list)
getParentList()
getParentSelfColumn($className)
getParentColumn($className)
isValidParent($p)
isOnParentList($className)
set255ChildList(@list)
addTo255ChildList(@list)
get255ChildList()
get255ChildListPkAtt($className)
get255ChildListOrderAtt($className)
get255ChildListStringAtt($className)
isValid255ChildName($className)
Managing Children (objects with my primary key as a foreign
key):
Methods for managing children including retrieving from the database, getting
(returning children) marking children deleted and submitting. NOTE
that submitting children is automatically done whenever an object is submitted
and should be unecessary (in fact the methods for managing the transaction
may NOT work appropriately if one submits children manually). Submitting
self results in submitting self and then all children within a single
transaction.
getChildren or retrieveChildrenFromDB with one of the ImpClasses will bring back the appropriate subclass children/parent of that ImpClass. For example: getChildren('NAFeatureImp',1) will get any of the NAFeature subclass children such as RNAFeature, GeneFeature, ExonFeature etc. In this way, one can retrieve all the features associated with a given NASequence entry.
getChildren($className, $retrieveIfNoChildren, $getDeletedToo,
$where,\@doNotRetrieveAttributes)
Returns an array of children in classname. If $retrieveIfNoChildren
= 1 then does a retrieveFromDB if there are currently no children of this
className. If $getDeletedToo = 1 then will also return children that
have been marked deleted. $where is a hash ref where children returned
must meet all the attribute (key) => value (value) pairs in the hashref.
If the $className is one of the "Imp" tables such as NASequenceImp, then
the subclass objects of the correct type are returned....ie ExternalNASequence,
Assembly as appropriate. The last argument is an array reference of
attbibutes that should not be retrieved should these children need to be
retrieved from the DB...ie $retrieveIfNoChildren = 1.
getAllChildren($retrieve, $getDeletedToo, $where)
Returns all children.
getChild($className, $retIfNochildren, $getDeletedToo)
Returns one child (NOTE: only useful if you know
there is only one child as returns the first one on the list if more than
one).
setChild($c)
Replaces children in this childs class with this
child. Should use addChild for adding
children.
retrieveChildrenFromDB($className, $resetIfHave,
$where,\@doNotRetrieveAttributes
)
Retrieves children in $className from the database.
If $resetIfHave = 1 will reset any children that you already have to the values
from the database. NOTE that any relationships established on the children
that are reset will be lost. $where hash ref as for getChildren().
Note that using getChildren($className,1) will call retrieveChildrenFromDB
if they have not already been retrieved and so save a step.
resetChildrenToDB($className, $where)
Removes all the children with this className and retrieves
them fresh from the db.
retrieveAllChildrenFromDB($recursive, $resetIfHave)
Retrieves all this objects children. If $recursive
= 1 will do this recursively. NOTE that this has the potential for retrieving
a huge number of objects if called from something like Group. $resetIfHave
resets children you already have to the database.
resetAllChildrenToDB()
Retrieves all children following removal of all existing
ones.
addChild($c, $resetIfHave)
Method of choice for adding a child.
addChildren(@children)
Adds an array of children.
removeChild($c)
Remove child from self. NOTE that this has NO impact
on the database....simply removes pointer between parent and child.
removeChildren(@ch)
removeChildrenInClass($className)
removeAllChildren()
Managing Parents:
Similar to managing children. Difference is that can only have one
parent of each type.
setParent($p)
Sets this object as my parent. The parent likewise
points to me as its child. Thus calling $p->addChild($self) is equivalent
to $self->setParent($p)
getParent($className, $retrieveIfNoParent
,\@doNotRetrieveAttributes)
if $retrieveIfNoParent = 1 then calls retrieveParentFromDB
if does not have that parent. If the $className is one of the "Imp"
tables such as NASequenceImp, then the subclass objects of the correct type
are returned....ie ExternalNASequence, Assembly as appropriate.
getAllParents($retrieveIfNoParent,\@doNotRetrieveAttributes )
retrieveParentFromDB($className ,\@doNotRetrieveAttributes)
retrieveAllParentsFromDB()
setParentId($p)
Sets my foreign key to be the Id of this parent.
Note that this occurs automatically upon submit to the database.
removeParent($p)
removeAllParents()
Managing submits to database:
Principal here is that the top level object (an object with only children)
should be submitted. That object following submitting itself, will submit
all its children recursively. Simple many-to-many relationships are
traversed in order to set the foreign keys of the relation, however, the
parent on the far side of the relation will not submit any of its children.
All objects submitting with a single submit command are submitted in a single
transaction so that if something fails, all are rolled back together to mantain
database integrity. All tuples that are updated are versioned by default
and sequences are stored (and versioned) appropriately. Evidence
and Similarity facts are also submitted if they have been set (added).
Upon delete, all Evidence and Similarity tuples that relate to the object
are also deleted (and versioned of course!). Additionally, objects
can store other objects to be submitted after they have submitted themselves
and all their children via the method addToSubmitList($object).
Note that submit and perhaps manage transaction are the only methods in this group that need to be called by user. The others are automatically taken care of appropriately.
submit($notDeep, $noTran)
Automatically submits inside a transaction. $notDeep
= 1 only submits self but not children. $noTran = 1 does not begin
or (when finished) commit a transaction. Used by all children so are
submitted in the initial objects transcation.
manageTransaction($noTran, $task)
This is dealt with automatically, However, if one wants
to submit several top level objects in a single transaction, you can use
manageTransaction to start and then commit a transaction and submit with
noTran the objects in between. $task is either 'commit' or 'begin'.
addToSubmitList($object)
Add $object to a submit list so that it gets submitted
after I submit myself and all children (in the same transaction). This
is the best way to submit objects on far side of a many-to-many relation
that have children and thus need to be submitted explicitly.
setRollBack($rb)
This can be used during an update/submit to force a roll
back at end of submit if errors are detected.
submitEvidence($e, $notDeep, $noTran)
submitAllEvidence($notDeep, $noTran)
deleteAllEvidence($notDeep, $noTran)
submitSimilarityFact($fact, $notDeep, $noTran)
submitAllSimilarityFacts($notDeep, $noTran)
deleteSimilarityFact($f, $notDeep, $noTran)
deleteAllSimilarityFacts($notDeep, $noTran)
All similarity facts of an object are deleted upon submit
if that object has been marked deleted.
submitAllChildren()
submitChildren()
submitChildrenInClass($className)
setAllForeignKeys()
allNonNullsSet()
version()
getVersionTableName()
getRollBack()
setDefaultAttributes()
Sets all the attributes for which defaults have been set.
Managing deletes:
Deletes occur in two phases. Objects must first be marked for deletion
at wich time they can only be retrieved (with getChildren) if one adds in
the bit to getDeletedToo. Objects are deleted from the database (and
versioned) when submitted. Note that in order to delete a tuple, all
it's foreign key relations (children) must also be marked deleted.
The retrieveAllChildrenFromDB(1) and markDeleted(1) are recursive to facillitate
this process.
markDeleted($doChildren)
Mark self deleted. It $doChildren = 1 then does
this recursively.
isMarkedDeleted()
Returns 1 if object is marked deleted.
markUnDeleted()
markChildDeleted($c)
Mark this child deleted so will be removed from the database
on submit.
markChildrenInClassDeleted($className)
Mark all children in this class deleted.
markChildrenDeleted(@ch)
Mark each of these children deleted.
markAllChildrenDeleted($recursive)
Mark all the children I have deleted....recursively if
$recursive = 1.
markChildUnDeleted($c)
Undelete child previously marked deleted.
markChildrenInClassUnDeleted($className)
markChildrenUnDeleted(@ch)
markAllChildrenUnDeleted()
Evidence:
Methods that allow setting and retrieving Evidence. Evidence gets submitting
automatically by "target" object.
addEvidence($fact, $evidence_group_id, $attribute_name)
$fact is a fact object. Can optionally pass in an
evidence group id to group relevant evidence such as for a gene model one
might group the evidence for all the exons. Also optional is the attribute_name
if the evidence is for a particular attribute of the target table such as
"name" or "description".
getEvidence($factTableName, $retrieveUnlessHave)
This method returns an array of Evidence objects.
One can retrieve the fact associated with each with EvidenceObj->getEvidenceFact(1);
note that the optional argument
1 causes the fact object to be retrieved from the database if this has not
already been done.
getEvidenceByGroupId($factTableName, $retrieveUnlessHave)
method untested.
retrieveEvidenceFromDB($factTableName, $resetIfHave)
retrieveAllEvidenceFromDB($resetIfHave, $targetOrFact)
Retrieves all the evidence for this object. It $targetOrFact
then will retrieve whether this object is the target table or fact table.
Default behaviour is to only retrieve evidence if target table. Use
getEvidence then to return an array of evidence ojbects.
retrieveEvidenceFactsFromDB($evidence, $resetIfHave)
getAllEvidence()
Similarity facts for storing sequence similarities:
Methods for dealing with Similarity facts generated by blast and perhaps other
sequence similarity algorithms.
addSimilarityFact($fact, $resetIfHave)
getSimilarityFacts($getIfDontHave)
retrieveSimilarityFactsFromDB($subjectTableName, $getEitherWay, $resetIfHave)
Methods for dealing with sequence:
Get, set, and format sequence.
getSequence()
Returns the sequence as single string.
setSequence($sequence)
Takes sequence string, cleans it up a bit (removes returns and non-sequence
characters) and then sets.
getFeatureSequence()
All features support this method which retrieves the sequence
corresponding to that feature. Basically retrieves a substring from
the sequence to which the feature points corresponding to the span of the
feature on the sequence. RNAFeature generates the sequence from the
exons which make it up if it is predicted unless the actual sequence is stored
in SplicedNASequence. NOTE that this method retrieves the sequence
object to which it points without retrieving the sequence attribute. getSubstrFromClob
is then used to extract the relevant substring, thus saving much memory in
the case of very long sequences.
toFasta($type)
Returns fasta formatted sequence. If $type = 1 the id used is the na_sequence_id
(or aa_sequence_id) otherwise it is the source_id.
Printing: (returns a string that can be printed)
Facility for printing to a string format or to XML.
toString()
Returns in a simple string format the table object (does
not indicate parent or child relationshps).
toXML($indent, $suppressDef, $doXmlIds, $family)
If $suppressDef = 1 then the default attributes below modification_date
are suppressed. $doXmlIds = 1 will print XML ids in the object tags.
$family = 1 will print parent/child relationships in object tags rather than
nesting children (I think?).
parseXML($xml)
Objects parse the xml themselves... $xml is an array reference
where each line of the XML is an element.
processXmlAttributes($xml_atts)
getNextXmlId()
getXmlId()
setXmlId($id)
getObjectFromXmlId($xml_id)
Comparing attribute values:
An object can do simple checks given a hash reference (attributename =>
value) to see if all the values match ones in the object. Likewise,
an object can take in an object with the same (or a subset of) atributes set
and test for equivalence.
compareValues($v1, $v2, $type)
testAttributeValues($href)
Test to see if all attribute => value pairs of the hash
reference match my values.
testObjectAttributeValues($ob)
Self test to see if all attributes of $ob match values
of self. Returns 1 if successful.
getAttributeDifferences($ob)
For comparing two objects...pass in the smaller object
and $self will compare all attributes to see if values are same.
Returns an array of attribute names that are different.
Retrieving table ids and primary key attributes given
name:
For getting the ids and table names from the Evidence and Similarity (mongo
many to many) tables.
getTableNameHash()
getTableNameFromTableId($table_id)
Returns the table name for $table_id.
getTableIdFromTableName($tableName)
Returns the table_id for $tableName.
getTablePKFromTableId($table_id)
Returns the primary key attribute for $table_id.
General methods for dealing with facts:
Have NOT been tested and likely do not work...debug if you want to use!!
getFacts($factTableName)
getAllFacts()
addFact($fact)
submitFact($fact, $notDeep, $noTran)
submitAllFacts($notDeep, $noTran)
getClassName()
Returns class name (table name) of self.
getConcatPrimKey()
returns the values of the primary key attributes concatentated
into a string. Used by the dbCache as not all objects have a single
attribute primary key.
haveAllPrimaryKeyValues()
isValidAttribute($att)
Returns 1 if $att is a valid attribute of self.
getDatabase()
returns the DbiDatabase object from which one can get
the extent and instance handles (and thus do queries) with the methods:
getDbHandle() ##handle all objects
use to interact with the DB.
getMetaDbHandle() ##handle that
retrieve meta information from DB.
getQueryHandle($autocommit)
Returns a Dbi database handle that does not interfere with
the objects as may be the case with the above methods. If $autocommit
is true, then the handle autocommits on submit to the db. You always
get the same handle as it is cached so if you need yet another handle you
can get it from DbiDatabase with:
getDatabase()->makeNewHandle($autocommit)
getTotalUpdates()
returns the total number of rows updated in this session.
getTotalInserts()
returns the total number of rows inserted in this session.
getTotalDeletes()
returns the total number of rows deleted in this session.
Hand coded objects with extra functionality...only a few are mentioned..
A number of the objects for GUS have been handcoded to have extra functionality. A couple of these have been mentioned earlier. For example, sequence objects (and all subclasses) have setSequence methods that clean up the sequence on submit. Also, all Feature objects have getFeatureSequence() methods.
NASequence and all subclasses:
setSequence($sequenceString)
cleans up the sequence a bit before setting.
toFasta($type)
returns the sequence in fasta format. It $type ==
1, uses the source_id for the primary identifier, otherwise uses the na_sequence_id.
toCAML()
returns the sequence entry in CAML format (from Paracel
as input into CAP4).
(NA|AA)Feature and all subclasses:
getFeatureSequence()
returns the sequence for the feature.
getFeatureLocation()
gets the location of this feature as an array (startLocation,
endLocation, is_reversed)
Assembly and AssemblySequence
These objects have many methods for manipulating assemblies and AssemblySequences from parsing cap2/cap4 output to reverseComplementing and generating cap2 alignments. Also has methods for computing/retrieving from the database putative SNPs. Only a few will be indicated here...users should look at the object (Objects/GUSdev/hand_edited/Assembly.pm) for more complete info.
Assembly->getCap2Alignment($idType,$suppressNumbers,$print)
returns a cap2 format alignment for the assembly object.
$idType if true uses source_ids for the assembly sequences rather than
assembly_sequence_ids. if $suppressNumbers, does not print the locations
on the right margin. If $print, prints to STDOUT the alignment as it
is being generated....useful for very large assemblies as it starts
printing very quickly.
Assembly->getRNA($retrieve,$getDeletedToo)
returns the RNA associated
with this assembly.
Assembly->ge
tTranslatedAASequences($retrieve,$getDeletedToo)
returns an array of TranslatedAASequences
Similarity:
setPValue($pValue)
sets the pvalue given the actual pvalue (separates it
into pvalue_mant and exp.
getPValue()
returns the actual P Value
generateSimilarityAlignment($type)
returns an alignment for each HSP (SimilaritySpan)
generated using $type algorithm where the algorithm is ([blast]|fasta|sim4)
with blast being the default.
Last updated: Wed Oct 24 15:19:58 EDT 2001