    Subject: MAST confirmation: alcohol dehydrogenase motifs

    Your MAST search request 14019 is being processed:
      Motif file: adh
      Database to search: SwissProt

If you fail to receive the confirmation message, check your e-mail address and try resubmitting your MAST request.
Each section of the results file includes an explanation of how to interpret it.
    TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC
       ========

and the motif is represented by the position-dependent scoring matrix (where each row of the matrix corresponds to a position in the motif)

    =========|=================================
    POSITION |    A       C       G       T
    =========|=================================
        1    |   1.447   0.188  -4.025  -4.095
        2    |   0.739   1.339  -3.945  -2.325
        3    |   1.764  -3.562  -4.197  -3.895
        4    |   1.574  -3.784  -1.594  -1.994
        5    |   1.602  -3.935  -4.054  -1.370
        6    |   0.797  -3.647  -0.814   0.215
        7    |  -1.280   1.873  -0.607  -1.933
        8    |  -3.076   1.035   1.414  -3.913
    =========|=================================

then the match score of the fourth position in the sequence (underlined) would be found by summing the score for T in position 1, G in position 2 and so on until G in position 8. So the match score would be

    score = -4.095 + -3.945 + -3.895 + -1.994 + -4.054 + -0.814 + -1.933 + 1.414 = -19.316

The match scores for other positions in the sequence are calculated in the same way. Match scores are only calculated if the match completely fits within the sequence. Match scores are not calculated if the motif would overhang either end of the sequence.
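The calculation above can be reproduced with a short Python sketch. It hard-codes the example sequence and scoring matrix, scores one placement of the motif, and then scores every placement that fits completely within the sequence. The function names match_score and all_match_scores are illustrative only and are not part of MAST.

```python
# Scoring matrix from the example above: one row per motif position,
# columns in the order A, C, G, T.
ALPHABET = "ACGT"
MATRIX = [
    [1.447, 0.188, -4.025, -4.095],
    [0.739, 1.339, -3.945, -2.325],
    [1.764, -3.562, -4.197, -3.895],
    [1.574, -3.784, -1.594, -1.994],
    [1.602, -3.935, -4.054, -1.370],
    [0.797, -3.647, -0.814, 0.215],
    [-1.280, 1.873, -0.607, -1.933],
    [-3.076, 1.035, 1.414, -3.913],
]
SEQUENCE = "TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC"


def match_score(sequence, matrix, start):
    """Score of the motif placed at 1-based position `start` (hypothetical helper)."""
    width = len(matrix)
    if start < 1 or start - 1 + width > len(sequence):
        # The motif would overhang an end of the sequence: no score is defined.
        raise ValueError("motif would overhang the sequence")
    window = sequence[start - 1:start - 1 + width]
    # Add the entry for the observed letter from each row (motif position).
    return sum(row[ALPHABET.index(ch)] for row, ch in zip(matrix, window))


def all_match_scores(sequence, matrix):
    """Scores for every placement that fits completely within the sequence."""
    last_start = len(sequence) - len(matrix) + 1
    return {s: match_score(sequence, matrix, s) for s in range(1, last_start + 1)}


print(round(match_score(SEQUENCE, MATRIX, 4), 3))  # -19.316
```

Positions are 1-based here to match the text ("the fourth position"), so the last valid start is the sequence length minus the motif width plus one.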
    27-[3]-44-<4>-99-[1]-7

shows an initial spacer of length 27, followed by a strong match to motif 3, a spacer of length 44, a weak match to motif 4, a spacer of length 99, a strong match to motif 1 and a final non-motif sequence of length 7. The value of M is 0.0001 for the WEB server but is user-selectable in the down-loadable version of MAST.
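As an illustration of the diagram notation, the following Python sketch splits a motif diagram into spacers, strong matches ([n]) and weak matches (<n>). It also adds up spacer lengths and motif widths, which should give the length of the sequence. The helper names are hypothetical. The motif widths (9, 10 and 13) and the diagram used in the length check are taken from the sample output on this page.

```python
import re

# One diagram token: [n] strong match, <n> weak match, or a bare spacer length.
TOKEN = re.compile(r"\[(\d+)\]|<(\d+)>|(\d+)")


def parse_diagram(diagram):
    """Return a list of ('spacer', length), ('strong', motif) or ('weak', motif)."""
    parts = []
    for token in diagram.split("-"):
        m = TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognised token: {token!r}")
        strong, weak, spacer = m.groups()
        if strong is not None:
            parts.append(("strong", int(strong)))
        elif weak is not None:
            parts.append(("weak", int(weak)))
        else:
            parts.append(("spacer", int(spacer)))
    return parts


def diagram_length(diagram, widths):
    """Total residues covered: spacer lengths plus the width of each motif."""
    total = 0
    for kind, value in parse_diagram(diagram):
        total += value if kind == "spacer" else widths[value]
    return total


# Motif widths as listed in the sample output (motifs 1, 2 and 3).
WIDTHS = {1: 9, 2: 10, 3: 13}

print(parse_diagram("27-[3]-44-<4>-99-[1]-7"))
print(diagram_length("9-[2]-64-[1]-62-[3]-71", WIDTHS))  # 238
```

The second call reproduces the LENGTH = 238 reported for sp|P14802|YOXD_BACSU in the sample output, which is a quick consistency check on a diagram.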
********************************************************************************
MAST - Motif Alignment and Search Tool
********************************************************************************
        MAST version 2.0 (Release date: 1996/10/08 22:59:45)

        For further information on how to interpret these results or to get
        a copy of the MAST software please access http://www.sdsc.edu/MEME.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
        If you use this program in your research, please cite:

        Timothy L. Bailey and Charles Elkan,
        "Fitting a mixture model by expectation maximization to discover
        motifs in biopolymers", Proceedings of the Second International
        Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
        AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
DATABASE AND MOTIFS
********************************************************************************
        DATABASE /users/app/tbailey/mast_databases/swissprot
        Last updated on Wed Oct 16 01:36:08 1996
        Database contains 52724 sequences, 18538780 residues

        MOTIFS website/meme-output.html
        MOTIF WIDTH BEST POSSIBLE MATCH
        ----- ----- -------------------
          1     9   VDVLVNNAG
          2    10   LITGCSSGIG
          3    13   YSASKFAVRMLTR

        PAIRWISE MOTIF CORRELATIONS:
        MOTIF     1     2
        ----- ----- -----
          2    0.39
          3    0.26  0.23
        No overly similar pairs (correlation > 0.60) found.
********************************************************************************


********************************************************************************
EXPLANATION OF RESULTS
********************************************************************************
        SECTION I: HIGH-SCORING SEQUENCES
        - the names of sequences containing occurrences of the motif(s)

        SECTION II: MOTIF DIAGRAMS
        - the order and spacing of non-overlapping occurrences of the
          motif(s) in each of the high-scoring sequences

        SECTION III: ANNOTATED SEQUENCES
        - the high-scoring sequences annotated with the positions and
          strengths of the motif occurrences
********************************************************************************


********************************************************************************
SECTION I: HIGH-SCORING SEQUENCES
********************************************************************************
        - Each of the following 233 sequences has e-value of less than 10.
        - The e-value of a sequence is the expected number of sequences
          in a random database of the same size that would match the motifs
          as well as the sequence does and is equal to the combined p-value
          of the sequence times the number of sequences in the database.
        - The combined p-value of a sequence measures the strength of the
          match of the sequence to all the motifs and is calculated by
            o finding the score of the single best match of each motif
              to the sequence (best matches may overlap),
            o calculating the sequence p-value of each score,
            o forming the product of the p-values,
            o taking the p-value of the product.
        - The sequence p-value of a score is defined as the probability
          of a random sequence of the same length containing some match
          with as good or better a score.
        - The score for the match of a position in a sequence to a motif
          is computed by by summing the appropriate entry from each column
          of the position-dependent scoring matrix that represents the motif.
        - Sequences shorter than one or more of the motifs are skipped.
        - The table is sorted by increasing e-value.
********************************************************************************

SEQUENCE NAME         DESCRIPTION                             E-VALUE   LENGTH
-------------         -----------                             --------  ------
sp|P14802|YOXD_BACSU  HYPOTHETICAL 25.3 KD PROTEIN IN RTP...   3.5e-18     238
sp|P14061|DHB1_HUMAN  ESTRADIOL 17 BETA-DEHYDROGENASE 1 (...   7.5e-18     328
sp|P16544|ACT3_STRCO  PUTATIVE KETOACYL REDUCTASE.             1.3e-16     261
sp|P16542|DHK1_STRVN  GRANATICIN POLYKETIDE SYNTHASE PUTA...   1.9e-16     272
sp|P41177|DHKR_STRCM  MONENSIN POLYKETIDE SYNTHASE PUTATI...   5.1e-16     261

... 223 sequences omitted ...

sp|P38786|YHM2_YEAST  HYPOTHETICAL 32.2 KD PROTEIN IN VMA...       8.5     293
sp|P29003|BLP2_BOMOR  BOMBININ-LIKE PEPTIDE 2 (BLP-2).             8.6      27
sp|P40397|YHXC_BACSU  HYPOTHETICAL PROTEIN IN COMK 5'REGI...         9     114
sp|P16122|URE1_PROVU  UREASE ALPHA SUBUNIT (UREA AMIDOHYD...       9.4     567
sp|P22716|RFBJ_SALTY  CDP-ABEQUOSE SYNTHASE (O4 ANTIGEN).          9.6     299


********************************************************************************
SECTION II: MOTIF DIAGRAMS
********************************************************************************
        - The ordering and spacing of all non-overlapping motif occurrences
          are shown for each high-scoring sequence listed in Section I.
        - A motif occurrence is defined as a position in the sequence whose
          match to the motif has position p-value less than the value given
          below in the legend.
        - The position p-value of a match is the probability of
          a single random subsequence of the length of the motif
          scoring at least as well as the observed match.
        - For each sequence, all motif occurrences are shown unless there
          are overlaps.  In that case, a motif occurrence is shown only if
          its p-value is less than the product of the p-values of the other
          (lower-numbered) motif occurrences that it overlaps.
        - The table also shows the e-value of each sequence.

        LEGEND
        ----------------------------------------------------------------------
        -d-     `d' residues separate the end of the preceding motif
                occurrence and the start of the following motif occurrence
        [n]     occurrence of motif `n' with p-value less than 0.0001
********************************************************************************

SEQUENCE NAME          E-VALUE  MOTIF DIAGRAM
-------------         --------  -------------
sp|P14802|YOXD_BACSU   3.5e-18  9-[2]-64-[1]-62-[3]-71
sp|P14061|DHB1_HUMAN   7.5e-18  6-[2]-68-[1]-45-[2]-7-[3]-160
sp|P16544|ACT3_STRCO   1.3e-16  9-[2]-64-[1]-64-[3]-92
sp|P16542|DHK1_STRVN   1.9e-16  20-[2]-64-[1]-64-[3]-92
sp|P41177|DHKR_STRCM   5.1e-16  9-[2]-64-[1]-64-[3]-92

... 223 sequences omitted ...

sp|P38786|YHM2_YEAST       8.5  246-[2]-37
sp|P29003|BLP2_BOMOR       8.6  6-[3]-8
sp|P40397|YHXC_BACSU         9  18-[3]-83
sp|P16122|URE1_PROVU       9.4  65-[1]-72-[2]-411
sp|P22716|RFBJ_SALTY       9.6  8-[2]-281


********************************************************************************
SECTION III: ANNOTATED SEQUENCES
********************************************************************************
        - The positions and p-values of the non-overlapping motif occurrences
          are shown above the actual sequence for each of the high-scoring
          sequences from Section I.
        - A motif occurrence is defined as a position in the sequence whose
          match to the motif has position p-value less than 0.0001 as
          defined in Section II.
        - For each sequence, the first line specifies the name of the sequence.
        - The second (and possibly more) lines give a description of the
          sequence.
        - Following the description line(s) is a line giving the length,
          combined p-value, and e-value of the sequence as defined in Section I.
        - The next line reproduces the motif diagram from Section II.
        - The entire sequence is printed on the following lines.
        - Motif occurrences are indicated directly above their positions in the
          sequence on four lines showing
            o the motif number of the occurrence,
            o the position p-value of the occurrence,
            o the best possible match to the motif, and
            o columns whose match to the motif has a positive score (indicated
              by a plus sign).
********************************************************************************


sp|P14802|YOXD_BACSU
  HYPOTHETICAL 25.3 KD PROTEIN IN RTP 5'REGION (ORF238).
  LENGTH = 238  COMBINED P-VALUE = 6.56e-23  E-VALUE = 3.5e-18
  DIAGRAM: 9-[2]-64-[1]-62-[3]-71

              [2]
              6.1e-11
              LITGCSSGIG
              ++++++++++
1    MQSLQHKTALITGGGRGIGRATALALAKEGVNIGLIGRTSANVEKVAEEVKALGVKAAFAAADVKDADQVNQAVA

             [1]
             3.7e-10
             VDVLVNNAG
             +++++++++
76   QVKEQLGDIDILINNAGISKFGGFLDLSADEWENIIQVNLMGVYHVTRAVLPEMIERKAGDIINISSTAGQRGAA

         [3]
         1.4e-13
         YSASKFAVRMLTR
         ++++++++ ++++
151  VTSAYSASKFAVLGLTESLMQEVRKHNIRVSALTPSTVASDMSIELNLTDGNPEKVMQPEDLAEYMVAQLKLDPR

226  IFIKTAGLWSTNP


sp|P14061|DHB1_HUMAN
  ESTRADIOL 17 BETA-DEHYDROGENASE 1 (17-BETA-HSD 1) (PLACENTAL
  17-BETA-HYDROXYSTEROID DEHYDROGENASE) (20 ALPHA-HYDROXYSTEROID
  DEHYDROGENASE) (20-ALPHA-HSD) (E2DH).
  LENGTH = 328  COMBINED P-VALUE = 1.43e-22  E-VALUE = 7.5e-18
  DIAGRAM: 6-[2]-68-[1]-45-[2]-7-[3]-160

           [2]
           5.6e-13
           LITGCSSGIG
           ++++++++++
1    MARTVVLITGCSSGIGLHLAVRLASDPSQSFKVYATLRDLKTQGRLWEAARALACPPGSLETLQLDVRDSKSVAA

              [1]                                                   [2]
              1.8e-09                                               3.7e-05
              VDVLVNNAG                                             LITGCSSGIG
              +++++ +++                                             +++++ +++
76   ARERVTEGRVDVLVCNAGLGLLGPLEALGEDAVASVLDVNVVGTVRMLQAFLPDMKRRGSGRVLVTGSVGGLMGL

          [3]
          2.5e-12
          YSASKFAVRMLTR
          +++++++++++++
151  PFNDVYCASKFALEGLCESLAVLLLPFGVHLSLIECGPVHTAFMEKVLGSPEEVLDRTDIHTFHRFYQYLAHSKQ

226  VFREAAQNPEEVAEVFLTALRAPKPTLRYFTTERFLPLLRMRLDDPSGSNYVTAMHREVFGDVPAKAEAGAEAGG

301  GAGPGAEDEAGRSAVGDPELGDPPAAPQ


sp|P16544|ACT3_STRCO
  PUTATIVE KETOACYL REDUCTASE.
  LENGTH = 261  COMBINED P-VALUE = 2.46e-21  E-VALUE = 1.3e-16
  DIAGRAM: 9-[2]-64-[1]-64-[3]-92

              [2]
              5.2e-10
              LITGCSSGIG
              +++++ ++++
1    MATQDSEVALVTGATSGIGLEIARRLGKEGLRVFVCARGEEGLRTTLKELREAGVEADGRTCDVRSVPEIEALVA

             [1]
             1.4e-11
             VDVLVNNAG
             +++++++++
76   AVVERYGPVDVLVNNAGRPGGGATAELADELWLDVVETNLTGVFRVTKQVLKAGGMLERGTGRIVNIASTGGKQG

           [3]
           1.4e-11
           YSASKFAVRMLTR
           +++++++++++++
151  VVHAAPYSASKHGVVGFTKALGLELARTGITVNAVCPGFVETPMAASVREHYSDIWEVSTEEAFDRITARVPIGR

226  YVQPSEVAEMVAYLIGPGAAAVTAQALNVCGGLGNY


sp|P16542|DHK1_STRVN
  GRANATICIN POLYKETIDE SYNTHASE PUTATIVE KETOACYL REDUCTASE 1 (ORF5).
  LENGTH = 272  COMBINED P-VALUE = 3.69e-21  E-VALUE = 1.9e-16
  DIAGRAM: 20-[2]-64-[1]-64-[3]-92

                         [2]
                         5.2e-10
                         LITGCSSGIG
                         +++++ ++++
1    MTTATATATATPGTAAKPVALVTGATSGIGLAIARRLAALGARTFLCARDEERLAQTVKELRGEGFDVDGTVCDV

                        [1]
                        4.5e-11
                        VDVLVNNAG
                        +++++++++
76   ADPAQIRAYVAAAVQRYGTVDILVNNAGRSGGGATAEIADELWLDVITTNLTSVFLMTKEVLNAGGMLAKKRGRI

                      [3]
                      5.8e-12
                      YSASKFAVRMLTR
                      +++++++++++++
151  INIASTGGKQGVVHAVPYSASKHGVVGLTKALGLELARTGITVNAVCPGFVETPMAERVREHYAGIWQVSEEETF

226  DRITNRVPLGRYVETREVAAMVEYLVADDAAAVTAQALNVCGGLGNY


sp|P41177|DHKR_STRCM
  MONENSIN POLYKETIDE SYNTHASE PUTATIVE KETOACYL REDUCTASE (ORF5).
  LENGTH = 261  COMBINED P-VALUE = 9.75e-21  E-VALUE = 5.1e-16
  DIAGRAM: 9-[2]-64-[1]-64-[3]-92

              [2]
              5.2e-10
              LITGCSSGIG
              +++++ ++++
1    MTQSTSRVALVTGATSGIGLATARLLAAQGHLVFLGARTESDVIATVKALRNDGLEAEGQVLDVRDGASVTAFVQ

             [1]
             5.6e-11
             VDVLVNNAG
             +++++++++
76   AAVDRYGRIDVLVNNAGRSGGGVTADLTDELWDDVIDTNLNSVFRMTRAVLTTGGMRTRERGRIINVASTAGKQG

           [3]
           1.4e-11
           YSASKFAVRMLTR
           +++++++++++++
151  VVLGAPYSASKHGVVGFTKALGNELAPTGITVNAVCPGYVETPMAQRVRQGYAAAYDTTEEAILTKFQAKIPLGR

226  YSTPEEVAGLIGYLASDTAASITSQALNVCGGLGNF


... 223 sequences omitted ...


sp|P38786|YHM2_YEAST
  HYPOTHETICAL 32.2 KD PROTEIN IN VMA22-SSF1 INTERGENIC REGION.
  LENGTH = 293  COMBINED P-VALUE = 1.60e-04  E-VALUE = 8.5
  DIAGRAM: 246-[2]-37

1    MLVDLNVPWPQNSYADKVTSQAVNNLIKTLSTLHMLGYTHIAINFTVNHSEKFPNDVKLLNPIDIKRRFGELMDR

76   TGLKLYSRITLIIDDPSKGQSLSKISQAFDIVAALPISEKGLTLSTTNLDIDLLTFQYGSRLPTFLKHKSICSCV

151  NRGVKLEIVYGYALRDVQARRQFVSNVRSVIRSSRSRGIVIGSGAMSPLECRNILGVTSLIKNLGLPSDRCSKAM

                          [2]
                          4.5e-08
                          LITGCSSGIG
                          ++++++++ +
226  GDLASLVLLNGRLRNKSHKQTIVTGGGSGNGDDVVNDVQGIDDVQTIKVVKRSMDAEQLGHASKRHKP


sp|P29003|BLP2_BOMOR
  BOMBININ-LIKE PEPTIDE 2 (BLP-2).
  LENGTH = 27  COMBINED P-VALUE = 1.62e-04  E-VALUE = 8.6
  DIAGRAM: 6-[3]-8

           [3]
           1.9e-05
           YSASKFAVRMLTR
            ++ + +++++ +
1    GIGSAILSAGKSALKGLAKGLAEHFAN


sp|P40397|YHXC_BACSU
  HYPOTHETICAL PROTEIN IN COMK 5'REGION (ORFX).
  LENGTH = 114  COMBINED P-VALUE = 1.70e-04  E-VALUE = 9
  DIAGRAM: 18-[3]-83

                       [3]
                       4.3e-08
                       YSASKFAVRMLTR
                       ++++++ ++ +++
1    IINTASITAYKGNKTLIDYSATKGRIVTFTRSLSQSLVQQGIRVNAVAPGPIWTPLIPASFAAKDVEVFGSDVPM

76   ERPGQPVEVAPSYLYLASDDSTYVTGQTIHVNGGTIVNG


sp|P16122|URE1_PROVU
  UREASE ALPHA SUBUNIT (UREA AMIDOHYDROLASE).
  LENGTH = 567  COMBINED P-VALUE = 1.79e-04  E-VALUE = 9.4
  DIAGRAM: 65-[1]-72-[2]-411

                                                                      [1]
                                                                      2.0e-06
                                                                      VDVLVNNAG
                                                                      +++++ ++
1    MKTISRQAYADMFGPTTGDRLRLADTELFLEIEQDFTTYGEEVKFGGGKVIRDGMGQSQVVSAECVDVLITNAII

                                                                            [2]
                                                                            4.4e
                                                                            LITG
                                                                            ++ +
76   IDHWGIVKADIGIKDGRITGIGKAGNPDVQPNVDIVIGPGTEVVAGEGKIITAGGVDTHIHFICPQQAEEGLISG


     -06
     CSSGIG
         ++
151  VTTFIGGGTGPVAGTNATTVTPGIWNMHRMLEAVDELPINVGLFGKGCVSQPEAIREQIEAGAIGLKIHEDWGAT

226  PMAIHNCLNVADEMDVQVAIHSDTLNEGGFYEETVKAIAGRVIHVFHTEGAGGGHAPDVIKSVGEPNILPASTNP

301  TMPYTINTVDEHLDMLMVCHHLDPSIPEDVAFAESRIRRETIAAEDILHDMGAISVMSSDSQAMGRVGEVVMRTW

376  QCAHKMKLQRGSLAGDTAENDNNRIKRYIAKYTINPALAHGIAHEVGSIEKGKLADIVLWDPAFFGVKPALIMKG

451  GMVAYAPMGDINAAIPTPQPVHYRPMYACLGKAKYQTSMIFMSKAGIDAGVPEKLGLQSLIGRVEGCRKVTKASM

526  IHNSYVPHIELEPQTYIVKADGVPLVCEPATELPMAQRYFLF


sp|P22716|RFBJ_SALTY
  CDP-ABEQUOSE SYNTHASE (O4 ANTIGEN).
  LENGTH = 299  COMBINED P-VALUE = 1.83e-04  E-VALUE = 9.6
  DIAGRAM: 8-[2]-281

             [2]
             1.8e-07
             LITGCSSGIG
             ++ ++++ ++
1    MTFLKEYVIVSGASGFIGKHLLEALKKSGISVVAITRDVIKNNSNALANVRWCSWDNIELLVEELSIDSALIGII

76   HLATEYGHKTSSLINIEDANVIKPLKLLDLAIKYRADIFLNTDSFFAKKDFNYQHMRPYIITKRHFDEIGHYYAN

151  MHDISFVNMRLEHVYGPGDGENKFIPYIIDCLNKKQSCVKCTTGEQIRDFIFVDDVVNAYLTILENRKEVPSYTE

226  YQVGTGAGVSLKDFLVYLQNTMMPGSSSIFEFGAIEQRDNEIMFSVANNKNLKAMGWKPNFDYKKGIEELLKRL


CPU: tomten
Time 53.747613 secs.

/misc/www/projects/MEME/dev/meme.2.0.d/bin/sunsparcsolaris/mast mast.logodds.13734.tmp /users/app/tbailey/mast_databases/swissprot ACDEFGHIKLMNPQRSTVWY -mf website/meme-output.html
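
The e-value definition given in Section I of the output (combined p-value times the number of sequences in the database) can be checked against the reported numbers with a small Python sketch. The function names are hypothetical. The product_p_value helper uses the standard closed form for the p-value of a product of n independent, uniformly distributed p-values; that MAST takes "the p-value of the product" in exactly this way is an assumption, not something stated in the output.

```python
import math


def product_p_value(p_values):
    """P-value of the product of independent uniform p-values (assumed method).

    For a product x of n such p-values:
        P(product <= x) = x * sum_{i=0}^{n-1} (-ln x)^i / i!
    """
    n = len(p_values)
    x = math.prod(p_values)
    if x <= 0.0:
        return 0.0
    log_term = -math.log(x)
    return x * sum(log_term ** i / math.factorial(i) for i in range(n))


def e_value(combined_p_value, n_sequences):
    """Expected number of random sequences matching at least this well."""
    return combined_p_value * n_sequences


# Values from the sample output: the database holds 52724 sequences and
# sp|P14802|YOXD_BACSU has combined p-value 6.56e-23.
print(f"{e_value(6.56e-23, 52724):.1e}")  # 3.5e-18
```

Because the output prints rounded p-values, recomputed e-values for some of the other sequences may differ from the reported ones in the last digit.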