Protein Multiple Alignment Incorporating Primary and
Secondary Structure Information
This website supplements the paper "Protein Multiple Alignment
Incorporating Primary and Secondary Structure Information"
Journal of Computational Biology 2006, Vol. 13, No. 10,
1735-1748. (Preprint)
This work is partially supported by the NSF grant 0604776.
Please send questions to the corresponding author, Jun Xie
(junxie@stat.purdue.edu).
- Abstract
Identifying common local segments, also called motifs, in multiple
protein sequences plays an important role in establishing homology
between proteins. Homology is easy to establish when sequences are
similar (sharing more than 25% sequence identity). However, for distant
proteins, it is much more difficult to align motifs that are not similar
in sequence but still share common structures or functions.
This work is a first attempt to align multiple protein sequences using
both primary and secondary structure information. We propose a new
sequence model that assigns high probabilities not only to motifs that
contain conserved amino acids but also to motifs that share common
secondary structures.
- Secondary structure prediction program PSI-PRED
- Example of an input file with secondary structure predictions
- Download the programs
- Unix (IBM AIX 4.3.3) executable sov.out
The running command is
sov.out input_file motif_width lambda iterations
- sov.out: the program executable
- input_file: output of PSI-PRED (see the example input file above)
- motif_width: width of the motif of interest, e.g. 20
- lambda: combination ratio of the primary and secondary structure
information, e.g. 0.5, 1, or 2
- iterations: number of iterations the program runs, e.g. 1000
- Example output of the alignment program for the command
sov.out 1idy_2nd.txt 15 1 100
- Guideline on using the program
We suggest running the program multiple times (e.g. 10 runs) with
different lambda values, e.g. lambda = 0.5, 1, 1.5, or 2. An alignment
that is obtained repeatedly across these runs can be considered reliable.
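
The guideline above can be scripted. The following is a minimal sketch in Python, not part of the distributed program. It builds one sov.out command per lambda value and run, using the argument order given earlier, and tallies how often the same output recurs. The helper names (build_commands, run_commands, most_repeated) are hypothetical, and the sketch assumes that sov.out is reachable under that name and writes its alignment to standard output, which this page does not confirm.

```python
import subprocess
from collections import Counter


def build_commands(input_file, motif_width, lambdas, iterations, runs, exe="sov.out"):
    """Return one argument list per (lambda, run) pair.

    Argument order follows the documented usage:
    sov.out input_file motif_width lambda iterations
    The default executable name is an assumption; pass a full path if needed.
    """
    return [
        [exe, input_file, str(motif_width), str(lam), str(iterations)]
        for lam in lambdas
        for _ in range(runs)
    ]


def run_commands(commands):
    """Run each command and collect its standard output as text.

    Assumes the alignment is printed to standard output (not confirmed).
    """
    return [
        subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for cmd in commands
    ]


def most_repeated(outputs):
    """Return (output, count) for the output that occurs most often."""
    return Counter(outputs).most_common(1)[0]
```

For example, build_commands("1idy_2nd.txt", 15, [0.5, 1, 1.5, 2], 100, 10) yields 40 commands, and passing the collected outputs to most_repeated shows which result recurs most often. Comparing raw output text is a crude criterion: if the output contains run-specific details, the motif positions would need to be extracted before counting.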