The preferred sequence format for MEME is Pearson/Fasta format.
MEME uses the first word in the title line of each sequence, truncated
to 24 characters if necessary, as the name of the sequence. (The first
word in the title line is everything up to the first blank following
the ">" at the beginning of the line.) Each name must be unique:
sequences with duplicate names are flagged and ignored.
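
The naming rule above can be illustrated with a minimal Python sketch. It is
not MEME's own code: the function names sequence_name and split_unique_names
are hypothetical, "blank" is assumed to mean a space or tab, and "ignored" is
assumed to mean that the first sequence with a given name is kept and later
ones are dropped.

```python
import re

MAX_NAME_LEN = 24  # truncation length stated in the text above


def sequence_name(title_line, max_len=MAX_NAME_LEN):
    """Return the sequence name taken from a Pearson/Fasta title line.

    The name is everything between the leading '>' and the first blank
    (assumed here to be a space or tab), cut to max_len characters.
    """
    if not title_line.startswith(">"):
        raise ValueError("a title line must begin with '>'")
    body = title_line[1:].rstrip("\r\n")
    first_word = re.split(r"[ \t]", body, maxsplit=1)[0]
    return first_word[:max_len]


def split_unique_names(title_lines):
    """Separate names seen for the first time from repeated names.

    Assumption: the first occurrence of a name is kept and every later
    occurrence is reported as a duplicate.
    """
    seen = set()
    kept, duplicates = [], []
    for line in title_lines:
        name = sequence_name(line)
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
            kept.append(name)
    return kept, duplicates
```

Because names are compared after truncation, two title lines whose first
words share the same first 24 characters would collide under this sketch, so
it can be useful to run such a check on a file before submitting it.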
MEME accepts protein and DNA sequences in any of the following formats by
converting them to Pearson/Fasta format:
Sequence formats that allow one or more sequences:
    IG/Stanford, used by Intelligenetics and others
    GenBank/GB, GenBank flatfile format
    NBRF format
    EMBL, EMBL flatfile format
    DNAStrider, for the common Mac program
    Fitch format, limited use
    Pearson/Fasta, a common format used by Fasta programs and others
    Zuker format, limited use
    Olsen, format printed by Olsen VMS sequence editor
    Phylip3.2, sequential format for Phylip programs
    Phylip, interleaved format for Phylip programs (v3.3, v3.4)
    MSF, multiple sequence format used by GCG software
    PAUP, multiple sequence (NEXUS) format
    PIR/CODATA format used by PIR
    ASN.1 format used by NCBI
Sequence formats that allow only one sequence (these formats cannot
be used to input multiple sequences):
    GCG, single sequence format of GCG software (use MSF format instead)
    Plain/Raw, sequence data only (no name, document, numbering)
MEME uses the ReadSeq program to read in sequences. ReadSeq is
copyright 1990 by D. G. Gilbert, Biology Dept., Indiana University.