Title: "De novo assembly of complex genomes"
Speaker: Michael Schatz, Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY

Place: LILY G126
Date: September 18, 2012; Tuesday
Time: 4:30pm


Emerging third-generation single-molecule sequencing instruments can generate much longer sequences than prior methods, with the potential to dramatically improve genome and transcriptome assembly for complex genomes. However, the high error rate of these reads makes de novo assembly challenging and has limited their use to specialized applications. To address these limitations, we introduce a novel sequence correction algorithm and assembly strategy that uses shorter, high-identity sequences to correct the inherent error in long, single-molecule sequences. We demonstrate the utility of this approach on PacBio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the de novo assembly of yeast (Saccharomyces cerevisiae) and the novel genome of the parrot Melopsittacus undulatus, as well as on RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than any other sequencing strategy currently available, in many cases doubling the median contig size relative to high-coverage, second-generation assemblies.

Associated Reading:
S. Koren, M.C. Schatz, et al. 2012. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nature Biotechnology. doi:10.1038/nbt.2280.

A full schedule of Bioinformatics Seminars, past and present, is available at www.stat.purdue.edu/~doerge/BIOINFORM.D/FALL11/sem.html.