Final Examination of STAT 598K

December 3, 2001

Instructions: The exam is open book and is due on December 14, 2001. There are 4 questions of equal credit. Data sets for some questions are available through the hyperlinks. Please work INDEPENDENTLY. You are welcome to discuss your questions with the instructor, but NO group work is allowed. Please write your answers in a clear format.

  1. The first data set includes 6 sequences of human leucine zipper transcription factors. Use the Feng-Doolittle progressive alignment algorithm to align these sequences.
    You do not have to write a program to perform the alignment. To make the problem easier, use the BLAST pairwise alignment tool instead of dynamic programming, and simply use the BLAST scores to obtain the distances between pairs of sequences. If you get tired of performing pairwise alignments on the web, Standalone BLAST is a better choice, but it takes some work to learn how to use it.
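
    As a rough illustration of how pairwise scores can drive the progressive alignment order, the sketch below converts scores into Feng-Doolittle style distances, D = -ln S_eff with S_eff = (S_obs - S_rand) / (S_max - S_rand), and picks the closest pair to align first. It assumes S_max is the average of the two self-alignment scores and sets S_rand to 0 by default, which is a simplification when BLAST scores replace dynamic programming scores. The function names and the toy scores in the usage note are hypothetical and are not taken from the exam data.

```python
import math


def feng_doolittle_distance(s_ab, s_aa, s_bb, s_rand=0.0):
    """Convert a pairwise score into a Feng-Doolittle style distance.

    s_ab is the score of A against B; s_aa and s_bb are the self scores.
    s_rand is the expected score of unrelated sequences (assumed 0 here).
    """
    s_max = (s_aa + s_bb) / 2.0
    s_eff = (s_ab - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        raise ValueError("effective score must be positive to take a log")
    return -math.log(s_eff)


def closest_pair(names, scores, self_scores):
    """Return (distance, a, b) for the pair with the smallest distance.

    scores maps (a, b) tuples, in the order given by names, to pair scores.
    """
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            d = feng_doolittle_distance(
                scores[(a, b)], self_scores[a], self_scores[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
    return best
```

    With made-up scores such as `{("A", "B"): 80, ("A", "C"): 40, ("B", "C"): 50}` and self scores of 100, `closest_pair` selects A and B as the first pair to align. The remaining sequences would then be added in order of increasing distance along the guide tree.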

  2. The second data set has 29 helix-turn-helix protein sequences. An alignment of these sequences reveals a motif common to the set. Build a profile of the motif from the alignment result, and use the profile to search for the motif in a new sequence.
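
    One possible way to turn an aligned motif block into a profile is sketched below. It assumes an ungapped block of equal-length segments, a uniform background over the 20 standard amino acids, and a pseudocount of 1 per residue. These choices, along with the function names, are illustrative assumptions rather than a prescribed method.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Build a log-odds profile from equal-length, ungapped motif segments."""
    width = len(aligned[0])
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(width):
        counts = Counter(seq[col] for seq in aligned)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log(freq / background)
        profile.append(column)
    return profile


def best_match(profile, sequence):
    """Slide the profile along the sequence; return (start, score) of the best window."""
    width = len(profile)
    best_start, best_score = -1, float("-inf")
    for start in range(len(sequence) - width + 1):
        score = sum(profile[k][sequence[start + k]] for k in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score
```

    For example, a profile built from the toy segments `["ACD", "ACD", "ACE"]` places its best match in `"GGGACDGGG"` at 0-based position 3. The scan assumes the new sequence contains only the 20 standard residue letters.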

  3. Verify that E. coli gyrase A (875 amino acids) and E. coli gyrase B (804 amino acids) are functionally linked.
    Using computational methods, you should apply at least both the protein phylogenetic profile method and the Rosetta stone method to draw your conclusion. Please show the detailed procedures and results from your use of the tools. A one-sentence statement of the final result will not be accepted. Also be aware that the Expect values (E-values) you choose when performing BLAST searches will determine the phylogenetic profiles, so you may have to try different E-values.
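
    To make the dependence on the E-value cutoff concrete, the sketch below builds binary phylogenetic profiles from per-genome best E-values and compares them by Hamming distance. It also includes a simple check, in the spirit of the Rosetta stone method, that two query proteins hit non-overlapping regions of the same subject protein. The genome labels, E-values, hit coordinates, and function names are all hypothetical placeholders, not results for gyrase A or gyrase B.

```python
def phylogenetic_profile(best_evalues, genomes, threshold):
    """Return a 0/1 vector: 1 if the best hit in a genome passes the E-value cutoff."""
    return [
        1 if best_evalues.get(g, float("inf")) <= threshold else 0
        for g in genomes
    ]


def hamming(p, q):
    """Count the genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def non_overlapping(hit_a, hit_b, max_overlap=0):
    """Check that two (start, end) hit regions on one subject barely overlap.

    Coordinates are 1-based and inclusive, as in BLAST output.
    """
    overlap = min(hit_a[1], hit_b[1]) - max(hit_a[0], hit_b[0]) + 1
    return overlap <= max_overlap


# Hypothetical best E-values per genome for two query proteins.
genomes = ["g1", "g2", "g3"]
prot_a = {"g1": 1e-50, "g2": 1e-4, "g3": 2.0}
prot_b = {"g1": 1e-40, "g2": 1e-12}

for cutoff in (1e-10, 1e-3):
    pa = phylogenetic_profile(prot_a, genomes, cutoff)
    pb = phylogenetic_profile(prot_b, genomes, cutoff)
    print(cutoff, pa, pb, hamming(pa, pb))
```

    In this toy data the two profiles differ in one genome at a cutoff of 1e-10 but agree completely at 1e-3, which is why more than one E-value is worth trying. For the Rosetta stone check, hits at (1, 400) and (450, 1200) on the same subject protein would count as non-overlapping.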

  4. A gene expression data set is obtained from Spellman's yeast cell cycle experiments. The data include the gene names, their cell cycle stages, and the log-ratios of red and green fluorescence intensities from the alpha factor and CDC15 synchronization experiments.