Each section of the results file contains an explanation of how
to interpret it.
Confirmation message
The first e-mail message you receive should be a confirmation message
to let you know that your search request has been received.
You should receive an e-mail message that looks something like this:
Subject: MAST confirmation: alcohol dehydrogenase motifs
Your MAST search request 14019 is being processed:
Motif file: adh
Database to search: SwissProt
If you fail to receive the confirmation message, check your e-mail
address and try resubmitting your MAST request.
Search Results
The second e-mail message you should receive contains the results of the MAST
search. It contains:
Match Scores
The match score of a motif to a position in a sequence is the sum of the
score from each row of the position-dependent scoring matrix corresponding
to the letter at that position in the sequence. For example, if the sequence
is
TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC
   ========
and the motif is represented by the position-dependent scoring matrix (where
each row of the matrix corresponds to a position in the motif)
=========|=================================
POSITION | A C G T
=========|=================================
1 | 1.447 0.188 -4.025 -4.095
2 | 0.739 1.339 -3.945 -2.325
3 | 1.764 -3.562 -4.197 -3.895
4 | 1.574 -3.784 -1.594 -1.994
5 | 1.602 -3.935 -4.054 -1.370
6 | 0.797 -3.647 -0.814 0.215
7 |-1.280 1.873 -0.607 -1.933
8 |-3.076 1.035 1.414 -3.913
=========|=================================
then the match score of the fourth position in the sequence (underlined)
would be found by summing the score for T in position 1, G in
position 2 and so on until G in position 8. So
the match score would be
score = -4.095 + -3.945 + -3.895 + -1.994
+ -4.054 + -0.814 + -1.933 + 1.414
= -19.316
The match scores for other positions in the sequence are calculated
in the same way. Match scores are only calculated if the match completely
fits within the sequence. Match scores are not calculated if the
motif would overhang either end of the sequence.
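The calculation above can be reproduced with a short Python sketch. The
names MOTIF, SEQUENCE, match_score and all_match_scores are illustrative
only and are not part of MAST; the matrix and sequence are the ones from
the example.

```python
# Position-dependent scoring matrix from the example above.
# One row per motif position; columns are A, C, G, T.
ALPHABET = "ACGT"
MOTIF = [
    [ 1.447,  0.188, -4.025, -4.095],
    [ 0.739,  1.339, -3.945, -2.325],
    [ 1.764, -3.562, -4.197, -3.895],
    [ 1.574, -3.784, -1.594, -1.994],
    [ 1.602, -3.935, -4.054, -1.370],
    [ 0.797, -3.647, -0.814,  0.215],
    [-1.280,  1.873, -0.607, -1.933],
    [-3.076,  1.035,  1.414, -3.913],
]
SEQUENCE = "TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC"


def match_score(motif, sequence, position):
    """Score the motif against `sequence` starting at 1-based `position`."""
    start = position - 1
    width = len(motif)
    if start < 0 or start + width > len(sequence):
        # No score is defined when the motif overhangs the sequence.
        raise ValueError("motif would overhang the end of the sequence")
    window = sequence[start:start + width]
    return sum(row[ALPHABET.index(letter)] for row, letter in zip(motif, window))


def all_match_scores(motif, sequence):
    """Scores for every position where the motif fits completely."""
    last_start = len(sequence) - len(motif) + 1
    return [match_score(motif, sequence, p) for p in range(1, last_start + 1)]


print(round(match_score(MOTIF, SEQUENCE, 4), 3))  # -19.316
```

A sequence of length n and a motif of width w therefore yield n - w + 1
match scores.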
P-values
MAST reports all matches of a sequence to a motif or group of
motifs in terms of the p-value of the match. MAST considers four
statistics, each defined below: the position p-value, the sequence
p-value, the combined p-value and the e-value.
All p-values are based on a random sequence model that assumes
each position in a random sequence is generated according to the
average letter frequencies of all sequences in the appropriate
(peptide or nucleotide) non-redundant database
(ftp://ncbi.nlm.nih.gov/blast/db/)
on September 22, 1996.
Position p-value
The p-value of a match of a given position
within a sequence to a motif is defined as the probability
of a randomly selected position in a randomly generated sequence
having a
match score at least as large as that of
the given position.
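The definition can be illustrated by brute force for a very short motif.
The sketch below is not MAST's algorithm: it simply enumerates every
possible window, and it assumes a uniform background where MAST uses the
database letter frequencies described above. The toy motif and the
function name position_p_value are hypothetical.

```python
from itertools import product


def position_p_value(motif, background, threshold):
    """Probability that a random window scores at least `threshold`.

    `motif` is a list of dicts mapping letter -> score (one per position);
    `background` maps letter -> probability. Every possible window is
    enumerated, so this is only practical for short motifs.
    """
    letters = sorted(background)
    total = 0.0
    for window in product(letters, repeat=len(motif)):
        score = sum(row[letter] for row, letter in zip(motif, window))
        if score >= threshold:
            prob = 1.0
            for letter in window:
                prob *= background[letter]
            total += prob
    return total


# Hypothetical two-position motif and uniform background, for illustration.
toy_motif = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Only the window "AC" reaches a score of 3.0, so the result is 1/16.
print(position_p_value(toy_motif, uniform, 3.0))  # 0.0625
```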
Sequence p-value
The p-value of a match of a sequence
to a motif is defined as the probability of a randomly generated
sequence of the same length having a match score at least as large
as the largest match score of any position in the sequence.
Combined p-value
The p-value of a match of a sequence to
a group of motifs is defined as the probability of
a randomly generated sequence of the same length having
sequence p-values whose product
is at least as small as the product of the sequence p-values
of the matches of the motifs to the given sequence.
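One way to compute the p-value of a product is sketched below. It relies
on a standard result that this document does not state: if n p-values are
independent and uniformly distributed under the random model, the
probability that their product is at most x equals
x * sum((-ln x)^i / i!) for i = 0 .. n-1. Whether the sequence p-values
satisfy that assumption exactly is not claimed here, and the function name
is hypothetical.

```python
import math


def p_value_of_product(p_values):
    """P(product of n independent uniform p-values <= observed product)."""
    product = math.prod(p_values)
    if product <= 0.0:
        return 0.0
    n = len(p_values)
    neg_log = -math.log(product)
    term = 1.0   # (-ln x)^i / i! for i = 0
    total = 1.0
    for i in range(1, n):
        term *= neg_log / i
        total += term
    return product * total


# With a single motif the combined p-value equals the sequence p-value.
print(p_value_of_product([0.3]))
# With two motifs the result is larger than the raw product 0.01.
print(p_value_of_product([0.1, 0.1]))
```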
E-value
The e-value of the match of a sequence in a database to
a group of motifs is defined as the
expected number of sequences in a random database of the same size
that would match the motifs as well as the sequence does and is equal
to the combined p-value of the sequence times the number of sequences
in the database.
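As a quick check of this relationship, the sketch below recomputes two
e-values using figures taken from the sample results file at the end of
this document (52724 sequences in the database). The function name
e_value is illustrative.

```python
def e_value(combined_p_value, num_sequences):
    """Expected number of random sequences matching at least this well."""
    return combined_p_value * num_sequences


NUM_SEQUENCES = 52724  # "Database contains 52724 sequences"

# sp|P14802|YOXD_BACSU: COMBINED P-VALUE = 6.56e-23
print(f"{e_value(6.56e-23, NUM_SEQUENCES):.1e}")  # 3.5e-18
# sp|P14061|DHB1_HUMAN: COMBINED P-VALUE = 1.43e-22
print(f"{e_value(1.43e-22, NUM_SEQUENCES):.1e}")  # 7.5e-18
```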
Database and Motifs
This section shows information on the database that was searched
and the motifs in the search query. The database section gives
the date the database was last updated as well as the number
of sequences and total sequence characters in it.
The motifs are listed by motif number. For each motif, the width and
the subsequence that would be given the best possible score
are shown.
If there is more than one motif in the query, all pairwise
correlations between the motifs are shown. The correlations
can range from -1 to +1, with +1 meaning that the shorter motif
is exactly identical to part or all of the longer motif. High
correlations can cause some combined p-values and e-values
to be inaccurate (too low). It may be advisable to remove enough
motifs from the query to ensure that no pairs of motifs have high
correlations. Any high correlations are indicated along
with the suggestion that one of the motifs be removed from the query.
High-scoring Sequences
MAST lists the names and part of the descriptive text
of all sequences whose
e-value is less than E.
Sequences shorter than one or more of the motifs are skipped.
The sequences are sorted by increasing e-value.
The value of E is set to 10 for the web server but is
user-selectable in the downloadable version of MAST.
Motif Diagrams
Motif diagrams show the order and spacing of non-overlapping
matches to the motifs in each high-scoring sequence.
Motif occurrences are determined based on
the position p-value of matches to the motif.
Strong matches (p-value < M) are shown in square brackets and weak
matches in angle brackets. For example, the motif diagram
27-[3]-44-<4>-99-[1]-7
shows an initial spacer of length 27, followed by a strong
match to motif 3, a spacer of length 44, a weak match
to motif 4, a spacer of length 99, a strong match to motif 1 and a final
non-motif sequence of length 7. The value of M
is 0.0001 for the web server but is user-selectable in the
downloadable version of MAST.
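A motif diagram can be turned back into sequence coordinates once the
motif widths are known. The following is a minimal sketch, not part of
MAST; the function name parse_motif_diagram and the token pattern are
assumptions based only on the diagram format described above.

```python
import re

# A token is a strong match "[n]", a weak match "<n>", or a spacer "d".
TOKEN = re.compile(r"\[(\d+)\]|<(\d+)>|(\d+)")


def parse_motif_diagram(diagram, motif_widths):
    """Return (occurrences, total_length) for a motif diagram.

    Each occurrence is (motif_number, strength, start), with `start`
    1-based. `motif_widths` maps motif number -> width.
    """
    occurrences = []
    position = 1
    for token in diagram.split("-"):
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        strong, weak, spacer = match.groups()
        if spacer is not None:
            position += int(spacer)
        else:
            motif = int(strong or weak)
            strength = "strong" if strong else "weak"
            occurrences.append((motif, strength, position))
            position += motif_widths[motif]
    return occurrences, position - 1


# Widths and diagram taken from the sample results file in this document.
widths = {1: 9, 2: 10, 3: 13}
occurrences, length = parse_motif_diagram("9-[2]-64-[1]-62-[3]-71", widths)
print(occurrences)
print(length)  # 238
```

For sp|P14802|YOXD_BACSU in the sample results, the spacers and motif
widths add up to the reported sequence length of 238.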
Annotated Sequences
MAST annotates each high-scoring sequence by printing
the sequence along with the position and strength of all
the non-overlapping motif occurrences.
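The four lines above each motif occurrence contain,
respectively, the motif number of the occurrence, the position p-value
of the occurrence, the best possible match to the motif, and a plus
sign (+) above each column whose match to the motif has a positive score.
The best possible match to a motif is the sequence of letters
which would achieve the highest match score.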
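Both the best possible match and the line of plus signs follow directly
from the scoring matrix. The sketch below uses the nucleotide matrix from
the Match Scores section; the function names are hypothetical and the
code is an illustration rather than MAST's own implementation.

```python
# Scoring matrix from the Match Scores example; columns are A, C, G, T.
ALPHABET = "ACGT"
MOTIF = [
    [ 1.447,  0.188, -4.025, -4.095],
    [ 0.739,  1.339, -3.945, -2.325],
    [ 1.764, -3.562, -4.197, -3.895],
    [ 1.574, -3.784, -1.594, -1.994],
    [ 1.602, -3.935, -4.054, -1.370],
    [ 0.797, -3.647, -0.814,  0.215],
    [-1.280,  1.873, -0.607, -1.933],
    [-3.076,  1.035,  1.414, -3.913],
]


def best_possible_match(motif, alphabet):
    """Pick the highest-scoring letter in every row of the matrix."""
    return "".join(alphabet[row.index(max(row))] for row in motif)


def best_possible_score(motif):
    """Match score of the best possible match."""
    return sum(max(row) for row in motif)


def plus_line(motif, alphabet, window):
    """Mark with '+' each column of `window` that scores above zero."""
    return "".join(
        "+" if row[alphabet.index(letter)] > 0 else " "
        for row, letter in zip(motif, window)
    )


print(best_possible_match(MOTIF, ALPHABET))  # ACAAAACG
print(repr(plus_line(MOTIF, ALPHABET, "TGTTGGTG")))
```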
Sample MAST Search Results
Here is an actual MAST search results file. It has been edited
slightly to reduce its size by removing most of the 233 sequences
that matched the motifs.
********************************************************************************
MAST - Motif Alignment and Search Tool
********************************************************************************
MAST version 2.0 (Release date: 1996/10/08 22:59:45)
For further information on how to interpret these results or to get
a copy of the MAST software please access http://www.sdsc.edu/MEME.
********************************************************************************
********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:
Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************
********************************************************************************
DATABASE AND MOTIFS
********************************************************************************
DATABASE /users/app/tbailey/mast_databases/swissprot
Last updated on Wed Oct 16 01:36:08 1996
Database contains 52724 sequences, 18538780 residues
MOTIFS website/meme-output.html
MOTIF WIDTH BEST POSSIBLE MATCH
----- ----- -------------------
1 9 VDVLVNNAG
2 10 LITGCSSGIG
3 13 YSASKFAVRMLTR
PAIRWISE MOTIF CORRELATIONS:
MOTIF 1 2
----- ----- -----
2 0.39
3 0.26 0.23
No overly similar pairs (correlation > 0.60) found.
********************************************************************************
********************************************************************************
EXPLANATION OF RESULTS
********************************************************************************
SECTION I: HIGH-SCORING SEQUENCES
- the names of sequences containing occurrences of the
motif(s)
SECTION II: MOTIF DIAGRAMS
- the order and spacing of non-overlapping occurrences
of the motif(s) in each of the high-scoring sequences
SECTION III: ANNOTATED SEQUENCES
- the high-scoring sequences annotated with the
positions and strengths of the motif occurrences
********************************************************************************
********************************************************************************
SECTION I: HIGH-SCORING SEQUENCES
********************************************************************************
- Each of the following 233 sequences has e-value of less than 10.
- The e-value of a sequence is the expected number of sequences
in a random database of the same size that would match the motifs as
well as the sequence does and is equal to the combined p-value of the
sequence times the number of sequences in the database.
- The combined p-value of a sequence measures the strength of the
match of the sequence to all the motifs and is calculated by
o finding the score of the single best match of each motif
to the sequence (best matches may overlap),
o calculating the sequence p-value of each score,
o forming the product of the p-values,
o taking the p-value of the product.
- The sequence p-value of a score is defined as the
probability of a random sequence of the same length containing
some match with as good or better a score.
- The score for the match of a position in a sequence to a motif
is computed by by summing the appropriate entry from each column of
the position-dependent scoring matrix that represents the motif.
- Sequences shorter than one or more of the motifs are skipped.
- The table is sorted by increasing e-value.
********************************************************************************
SEQUENCE NAME DESCRIPTION E-VALUE LENGTH
------------- ----------- -------- ------
sp|P14802|YOXD_BACSU HYPOTHETICAL 25.3 KD PROTEIN IN RTP... 3.5e-18 238
sp|P14061|DHB1_HUMAN ESTRADIOL 17 BETA-DEHYDROGENASE 1 (... 7.5e-18 328
sp|P16544|ACT3_STRCO PUTATIVE KETOACYL REDUCTASE. 1.3e-16 261
sp|P16542|DHK1_STRVN GRANATICIN POLYKETIDE SYNTHASE PUTA... 1.9e-16 272
sp|P41177|DHKR_STRCM MONENSIN POLYKETIDE SYNTHASE PUTATI... 5.1e-16 261
...
223 sequences omitted
...
sp|P38786|YHM2_YEAST HYPOTHETICAL 32.2 KD PROTEIN IN VMA... 8.5 293
sp|P29003|BLP2_BOMOR BOMBININ-LIKE PEPTIDE 2 (BLP-2). 8.6 27
sp|P40397|YHXC_BACSU HYPOTHETICAL PROTEIN IN COMK 5'REGI... 9 114
sp|P16122|URE1_PROVU UREASE ALPHA SUBUNIT (UREA AMIDOHYD... 9.4 567
sp|P22716|RFBJ_SALTY CDP-ABEQUOSE SYNTHASE (O4 ANTIGEN). 9.6 299
********************************************************************************
SECTION II: MOTIF DIAGRAMS
********************************************************************************
- The ordering and spacing of all non-overlapping motif occurrences
are shown for each high-scoring sequence listed in Section I.
- A motif occurrence is defined as a position in the sequence whose
match to the motif has position p-value less than the value
given below in the legend.
- The position p-value of a match is the probability of a single
random subsequence of the length of the motif scoring at least as well
as the observed match.
- For each sequence, all motif occurrences are shown unless there
are overlaps. In that case, a motif occurrence is shown only if its
p-value is less than the product of the p-values of the other
(lower-numbered) motif occurrences that it overlaps.
- The table also shows the e-value of each sequence.
LEGEND
----------------------------------------------------------------------
-d- `d' residues separate the end of the preceding motif occurrence
and the start of the following motif occurrence
[n] occurrence of motif `n' with p-value less than 0.0001
********************************************************************************
SEQUENCE NAME E-VALUE MOTIF DIAGRAM
------------- -------- -------------
sp|P14802|YOXD_BACSU 3.5e-18 9-[2]-64-[1]-62-[3]-71
sp|P14061|DHB1_HUMAN 7.5e-18 6-[2]-68-[1]-45-[2]-7-[3]-160
sp|P16544|ACT3_STRCO 1.3e-16 9-[2]-64-[1]-64-[3]-92
sp|P16542|DHK1_STRVN 1.9e-16 20-[2]-64-[1]-64-[3]-92
sp|P41177|DHKR_STRCM 5.1e-16 9-[2]-64-[1]-64-[3]-92
...
223 sequences omitted
...
sp|P38786|YHM2_YEAST 8.5 246-[2]-37
sp|P29003|BLP2_BOMOR 8.6 6-[3]-8
sp|P40397|YHXC_BACSU 9 18-[3]-83
sp|P16122|URE1_PROVU 9.4 65-[1]-72-[2]-411
sp|P22716|RFBJ_SALTY 9.6 8-[2]-281
********************************************************************************
SECTION III: ANNOTATED SEQUENCES
********************************************************************************
- The positions and p-values of the non-overlapping motif occurrences
are shown above the actual sequence for each of the high-scoring
sequences from Section I.
- A motif occurrence is defined as a position in the sequence whose
match to the motif has position p-value less than 0.0001 as
defined in Section II.
- For each sequence, the first line specifies the name of the sequence.
- The second (and possibly more) lines give a description of the
sequence.
- Following the description line(s) is a line giving the length,
combined p-value, and e-value of the sequence as defined in Section I.
- The next line reproduces the motif diagram from Section II.
- The entire sequence is printed on the following lines.
- Motif occurrences are indicated directly above their positions in the
sequence on four lines showing
o the motif number of the occurrence,
o the position p-value of the occurrence,
o the best possible match to the motif, and
o columns whose match to the motif has a positive score (indicated by
a plus sign).
********************************************************************************
sp|P14802|YOXD_BACSU
HYPOTHETICAL 25.3 KD PROTEIN IN RTP 5'REGION (ORF238).
LENGTH = 238 COMBINED P-VALUE = 6.56e-23 E-VALUE = 3.5e-18
DIAGRAM: 9-[2]-64-[1]-62-[3]-71
[2]
6.1e-11
LITGCSSGIG
++++++++++
1 MQSLQHKTALITGGGRGIGRATALALAKEGVNIGLIGRTSANVEKVAEEVKALGVKAAFAAADVKDADQVNQAVA
[1]
3.7e-10
VDVLVNNAG
+++++++++
76 QVKEQLGDIDILINNAGISKFGGFLDLSADEWENIIQVNLMGVYHVTRAVLPEMIERKAGDIINISSTAGQRGAA
[3]
1.4e-13
YSASKFAVRMLTR
++++++++ ++++
151 VTSAYSASKFAVLGLTESLMQEVRKHNIRVSALTPSTVASDMSIELNLTDGNPEKVMQPEDLAEYMVAQLKLDPR
226 IFIKTAGLWSTNP
sp|P14061|DHB1_HUMAN
ESTRADIOL 17 BETA-DEHYDROGENASE 1 (17-BETA-HSD 1) (PLACENTAL
17-BETA-HYDROXYSTEROID DEHYDROGENASE) (20 ALPHA-HYDROXYSTEROID DEHYDROGENASE)
(20-ALPHA-HSD) (E2DH).
LENGTH = 328 COMBINED P-VALUE = 1.43e-22 E-VALUE = 7.5e-18
DIAGRAM: 6-[2]-68-[1]-45-[2]-7-[3]-160
[2]
5.6e-13
LITGCSSGIG
++++++++++
1 MARTVVLITGCSSGIGLHLAVRLASDPSQSFKVYATLRDLKTQGRLWEAARALACPPGSLETLQLDVRDSKSVAA
[1] [2]
1.8e-09 3.7e-05
VDVLVNNAG LITGCSSGIG
+++++ +++ +++++ +++
76 ARERVTEGRVDVLVCNAGLGLLGPLEALGEDAVASVLDVNVVGTVRMLQAFLPDMKRRGSGRVLVTGSVGGLMGL
[3]
2.5e-12
YSASKFAVRMLTR
+++++++++++++
151 PFNDVYCASKFALEGLCESLAVLLLPFGVHLSLIECGPVHTAFMEKVLGSPEEVLDRTDIHTFHRFYQYLAHSKQ
226 VFREAAQNPEEVAEVFLTALRAPKPTLRYFTTERFLPLLRMRLDDPSGSNYVTAMHREVFGDVPAKAEAGAEAGG
301 GAGPGAEDEAGRSAVGDPELGDPPAAPQ
sp|P16544|ACT3_STRCO
PUTATIVE KETOACYL REDUCTASE.
LENGTH = 261 COMBINED P-VALUE = 2.46e-21 E-VALUE = 1.3e-16
DIAGRAM: 9-[2]-64-[1]-64-[3]-92
[2]
5.2e-10
LITGCSSGIG
+++++ ++++
1 MATQDSEVALVTGATSGIGLEIARRLGKEGLRVFVCARGEEGLRTTLKELREAGVEADGRTCDVRSVPEIEALVA
[1]
1.4e-11
VDVLVNNAG
+++++++++
76 AVVERYGPVDVLVNNAGRPGGGATAELADELWLDVVETNLTGVFRVTKQVLKAGGMLERGTGRIVNIASTGGKQG
[3]
1.4e-11
YSASKFAVRMLTR
+++++++++++++
151 VVHAAPYSASKHGVVGFTKALGLELARTGITVNAVCPGFVETPMAASVREHYSDIWEVSTEEAFDRITARVPIGR
226 YVQPSEVAEMVAYLIGPGAAAVTAQALNVCGGLGNY
sp|P16542|DHK1_STRVN
GRANATICIN POLYKETIDE SYNTHASE PUTATIVE KETOACYL REDUCTASE 1 (ORF5).
LENGTH = 272 COMBINED P-VALUE = 3.69e-21 E-VALUE = 1.9e-16
DIAGRAM: 20-[2]-64-[1]-64-[3]-92
[2]
5.2e-10
LITGCSSGIG
+++++ ++++
1 MTTATATATATPGTAAKPVALVTGATSGIGLAIARRLAALGARTFLCARDEERLAQTVKELRGEGFDVDGTVCDV
[1]
4.5e-11
VDVLVNNAG
+++++++++
76 ADPAQIRAYVAAAVQRYGTVDILVNNAGRSGGGATAEIADELWLDVITTNLTSVFLMTKEVLNAGGMLAKKRGRI
[3]
5.8e-12
YSASKFAVRMLTR
+++++++++++++
151 INIASTGGKQGVVHAVPYSASKHGVVGLTKALGLELARTGITVNAVCPGFVETPMAERVREHYAGIWQVSEEETF
226 DRITNRVPLGRYVETREVAAMVEYLVADDAAAVTAQALNVCGGLGNY
sp|P41177|DHKR_STRCM
MONENSIN POLYKETIDE SYNTHASE PUTATIVE KETOACYL REDUCTASE (ORF5).
LENGTH = 261 COMBINED P-VALUE = 9.75e-21 E-VALUE = 5.1e-16
DIAGRAM: 9-[2]-64-[1]-64-[3]-92
[2]
5.2e-10
LITGCSSGIG
+++++ ++++
1 MTQSTSRVALVTGATSGIGLATARLLAAQGHLVFLGARTESDVIATVKALRNDGLEAEGQVLDVRDGASVTAFVQ
[1]
5.6e-11
VDVLVNNAG
+++++++++
76 AAVDRYGRIDVLVNNAGRSGGGVTADLTDELWDDVIDTNLNSVFRMTRAVLTTGGMRTRERGRIINVASTAGKQG
[3]
1.4e-11
YSASKFAVRMLTR
+++++++++++++
151 VVLGAPYSASKHGVVGFTKALGNELAPTGITVNAVCPGYVETPMAQRVRQGYAAAYDTTEEAILTKFQAKIPLGR
226 YSTPEEVAGLIGYLASDTAASITSQALNVCGGLGNF
...
223 sequences omitted
...
sp|P38786|YHM2_YEAST
HYPOTHETICAL 32.2 KD PROTEIN IN VMA22-SSF1 INTERGENIC REGION.
LENGTH = 293 COMBINED P-VALUE = 1.60e-04 E-VALUE = 8.5
DIAGRAM: 246-[2]-37
1 MLVDLNVPWPQNSYADKVTSQAVNNLIKTLSTLHMLGYTHIAINFTVNHSEKFPNDVKLLNPIDIKRRFGELMDR
76 TGLKLYSRITLIIDDPSKGQSLSKISQAFDIVAALPISEKGLTLSTTNLDIDLLTFQYGSRLPTFLKHKSICSCV
151 NRGVKLEIVYGYALRDVQARRQFVSNVRSVIRSSRSRGIVIGSGAMSPLECRNILGVTSLIKNLGLPSDRCSKAM
[2]
4.5e-08
LITGCSSGIG
++++++++ +
226 GDLASLVLLNGRLRNKSHKQTIVTGGGSGNGDDVVNDVQGIDDVQTIKVVKRSMDAEQLGHASKRHKP
sp|P29003|BLP2_BOMOR
BOMBININ-LIKE PEPTIDE 2 (BLP-2).
LENGTH = 27 COMBINED P-VALUE = 1.62e-04 E-VALUE = 8.6
DIAGRAM: 6-[3]-8
[3]
1.9e-05
YSASKFAVRMLTR
++ + +++++ +
1 GIGSAILSAGKSALKGLAKGLAEHFAN
sp|P40397|YHXC_BACSU
HYPOTHETICAL PROTEIN IN COMK 5'REGION (ORFX).
LENGTH = 114 COMBINED P-VALUE = 1.70e-04 E-VALUE = 9
DIAGRAM: 18-[3]-83
[3]
4.3e-08
YSASKFAVRMLTR
++++++ ++ +++
1 IINTASITAYKGNKTLIDYSATKGRIVTFTRSLSQSLVQQGIRVNAVAPGPIWTPLIPASFAAKDVEVFGSDVPM
76 ERPGQPVEVAPSYLYLASDDSTYVTGQTIHVNGGTIVNG
sp|P16122|URE1_PROVU
UREASE ALPHA SUBUNIT (UREA AMIDOHYDROLASE).
LENGTH = 567 COMBINED P-VALUE = 1.79e-04 E-VALUE = 9.4
DIAGRAM: 65-[1]-72-[2]-411
[1]
2.0e-06
VDVLVNNAG
+++++ ++
1 MKTISRQAYADMFGPTTGDRLRLADTELFLEIEQDFTTYGEEVKFGGGKVIRDGMGQSQVVSAECVDVLITNAII
[2]
4.4e
LITG
++ +
76 IDHWGIVKADIGIKDGRITGIGKAGNPDVQPNVDIVIGPGTEVVAGEGKIITAGGVDTHIHFICPQQAEEGLISG
-06
CSSGIG
++
151 VTTFIGGGTGPVAGTNATTVTPGIWNMHRMLEAVDELPINVGLFGKGCVSQPEAIREQIEAGAIGLKIHEDWGAT
226 PMAIHNCLNVADEMDVQVAIHSDTLNEGGFYEETVKAIAGRVIHVFHTEGAGGGHAPDVIKSVGEPNILPASTNP
301 TMPYTINTVDEHLDMLMVCHHLDPSIPEDVAFAESRIRRETIAAEDILHDMGAISVMSSDSQAMGRVGEVVMRTW
376 QCAHKMKLQRGSLAGDTAENDNNRIKRYIAKYTINPALAHGIAHEVGSIEKGKLADIVLWDPAFFGVKPALIMKG
451 GMVAYAPMGDINAAIPTPQPVHYRPMYACLGKAKYQTSMIFMSKAGIDAGVPEKLGLQSLIGRVEGCRKVTKASM
526 IHNSYVPHIELEPQTYIVKADGVPLVCEPATELPMAQRYFLF
sp|P22716|RFBJ_SALTY
CDP-ABEQUOSE SYNTHASE (O4 ANTIGEN).
LENGTH = 299 COMBINED P-VALUE = 1.83e-04 E-VALUE = 9.6
DIAGRAM: 8-[2]-281
[2]
1.8e-07
LITGCSSGIG
++ ++++ ++
1 MTFLKEYVIVSGASGFIGKHLLEALKKSGISVVAITRDVIKNNSNALANVRWCSWDNIELLVEELSIDSALIGII
76 HLATEYGHKTSSLINIEDANVIKPLKLLDLAIKYRADIFLNTDSFFAKKDFNYQHMRPYIITKRHFDEIGHYYAN
151 MHDISFVNMRLEHVYGPGDGENKFIPYIIDCLNKKQSCVKCTTGEQIRDFIFVDDVVNAYLTILENRKEVPSYTE
226 YQVGTGAGVSLKDFLVYLQNTMMPGSSSIFEFGAIEQRDNEIMFSVANNKNLKAMGWKPNFDYKKGIEELLKRL
CPU: tomten
Time 53.747613 secs.
/misc/www/projects/MEME/dev/meme.2.0.d/bin/sunsparcsolaris/mast mast.logodds.13734.tmp /users/app/tbailey/mast_databases/swissprot ACDEFGHIKLMNPQRSTVWY -mf website/meme-output.html