Protein motifs with the secondary characteristics of hydrophobicity and polarity
- Related paper: Xie, J. and Kim, N.K. (2005). Bayesian Models and Markov Chain Monte Carlo Methods for Protein Motifs with the Secondary Characteristics. Journal of Computational Biology, Vol. 12, No. 7.
- Dataset: the 1glqA2 protein family used in the paper; sequences in FASTA format
- Multiple structural alignment of the 1glqA2 family, originally reported in CATH
- Programs for identifying protein motifs
Our main development is a mixture model of protein motifs that uses both
amino acid sequence and side chain polarity features. To search for a motif
in multiple protein sequences, we suggest first running the standard Gibbs
sampling approach several times. If those runs do not identify a conserved
motif, then run our mixture model.
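
To illustrate what a side chain polarity feature might look like next to the amino acid sequence, the following minimal sketch reads FASTA records and recodes each residue as hydrophobic (H) or polar (P). The residue grouping, the function names, and the 'x' code for unrecognized residues are assumptions made for illustration only; the paper defines the classification that the mixture model actually uses.

```python
# Illustrative only: this H/P grouping is one common convention and is an
# assumption, not necessarily the classification used in the paper.
HYDROPHOBIC = set("AVLIMFWC")
POLAR = set("GPSTYNQDEKRH")


def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def polarity_string(sequence):
    """Recode residues as 'H' (hydrophobic), 'P' (polar), or 'x' (other)."""
    out = []
    for residue in sequence:
        if residue in HYDROPHOBIC:
            out.append("H")
        elif residue in POLAR:
            out.append("P")
        else:
            out.append("x")
    return "".join(out)


# Hypothetical two-sequence input, not taken from the 1glqA2 data set.
example = """>seq1 hypothetical example
MKVLA
GDE
>seq2
WCXT
""".splitlines()

for name, seq in read_fasta(example):
    print(name, seq, polarity_string(seq))
```

Each sequence is printed together with its H/P recoding, so the two feature tracks can be compared position by position.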
Download the programs
- Standard Gibbs sampling approach: Unix (IBM AIX 4.3.3) executable
You need to provide three arguments: the multiple sequence data set (FASTA
format), the motif width, and the number of repeated runs (each starting from
different random initial values). You may try number_of_runs = 3.
Use the command: aa.out data_file motif_width number_of_runs
- Our mixture model for both amino acid sequences and side chain polarity:
Unix (IBM AIX 4.3.3) executable
You need to provide three arguments: the multiple sequence data set (FASTA
format), the motif width, and the number of iterations. Some choices for the
number of iterations are 1000, 5000, and 10000.
Use the command: mix.out data_file motif_width number_of_iterations
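
The two commands above can be assembled from a script. The sketch below only builds the argument lists in the order given above and leaves execution optional. The helper names, the "./" prefix, the file name 1glqA2.fasta, and the motif width of 12 are hypothetical, and the executables themselves run only on a compatible Unix (IBM AIX) system.

```python
import subprocess


def gibbs_command(data_file, motif_width, number_of_runs=3, exe="./aa.out"):
    """Build the argument list for the standard Gibbs sampler (aa.out)."""
    if motif_width < 1 or number_of_runs < 1:
        raise ValueError("motif_width and number_of_runs must be positive")
    return [exe, data_file, str(motif_width), str(number_of_runs)]


def mixture_command(data_file, motif_width, number_of_iterations=1000,
                    exe="./mix.out"):
    """Build the argument list for the mixture model (mix.out)."""
    if motif_width < 1 or number_of_iterations < 1:
        raise ValueError("motif_width and number_of_iterations must be positive")
    return [exe, data_file, str(motif_width), str(number_of_iterations)]


def run(command):
    """Run a command and return its standard output as text.

    Raises subprocess.CalledProcessError if the program exits with an error.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical inputs: the file name and motif width are placeholders.
print(" ".join(gibbs_command("1glqA2.fasta", 12)))
print(" ".join(mixture_command("1glqA2.fasta", 12, 5000)))
```

Deciding whether the Gibbs runs found a conserved motif remains a judgment made by inspecting their output; the mixture command would be run only after that inspection.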