Title: "Part 2: Cloning genes using next generation sequencing"
Speaker: Elizabeth Buescher; Department of Horticulture and Landscape Architecture, Purdue University
Place: PHYS 223; February 14, 2012, Tuesday, 4:30pm

Abstract
With the advent of next generation sequencing, data costs are a fraction of what they were ten years ago. Whole genomes are sequenced in days; all of the genes expressed in a species can be analyzed to answer biological questions. We can now ask questions in a different way. We should, therefore, perform experiments in a different way. EMS (Ethyl Methane Sulfonate) treatment causes G to A and C to T mutations in DNA. Populations of plants treated with EMS can be screened for individuals disrupted in valuable or otherwise interesting biological processes. The genomes of these individuals can then be sequenced to identify the SNPs (Single Nucleotide Polymorphisms) that arise from EMS mutagenesis and are responsible for the change in phenotype. Noise in the sequence data, the quality of the reference genome, and errors in the alignment of sequencing data to the reference genome can produce artifactual results and obscure the true causative mutations.

By changing the way we use sequence data to maximize the signal-to-noise ratio, we have found that we can easily identify causative mutations. We screened SNPs for only those due to the EMS treatment and incorporated knowledge about the possible biological processes affected by the mutations. These "filtering" steps eliminated many of the errors in the data and increased the likelihood of identifying the gene(s) controlling the phenotype. To demonstrate that this procedure will work in the complex genomes of crops, and not just in the favorite model organisms of geneticists, we have carried out gene cloning by next generation sequencing in Sorghum bicolor. The first example was a sorghum EMS-treated line that showed altered accumulation of the cyanogenic glucoside dhurrin. Genes suspected to act in the dhurrin biosynthetic pathway were scanned for mutations, and a single disruption was identified and later confirmed to be responsible for this phenotype. The second project utilized a sorghum TILLING (Targeting Induced Local Lesions IN Genomes) line. Dwarf and non-dwarf pools of plants were sequenced using Illumina technology, and a pipeline is currently under construction to align reads to the reference genome for SNP variant calling and mutant identification. Progress on the identification of the dwarf gene, improvements to our pipeline, and prospects for gene cloning in species with sequenced and unsequenced genomes will be discussed.
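The two "filtering" steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the speaker's pipeline: the Snp record, the gene name, the coordinates, and the function names are all hypothetical, and it assumes SNP calls have already been reduced to a chromosome, a position, a reference base, and an alternate base.

```python
from typing import Dict, List, NamedTuple, Tuple


class Snp(NamedTuple):
    """A single SNP call (hypothetical record layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Base changes expected from EMS, as stated in the abstract: G to A and C to T.
EMS_CHANGES = {("G", "A"), ("C", "T")}

# Candidate gene intervals: name -> (chromosome, start, end), inclusive.
GeneMap = Dict[str, Tuple[str, int, int]]


def is_ems_type(snp: Snp) -> bool:
    """True if the base change is one that EMS is expected to cause."""
    return (snp.ref.upper(), snp.alt.upper()) in EMS_CHANGES


def in_candidate_gene(snp: Snp, genes: GeneMap) -> bool:
    """True if the SNP lies inside any candidate gene interval."""
    return any(
        chrom == snp.chrom and start <= snp.pos <= end
        for chrom, start, end in genes.values()
    )


def filter_snps(snps: List[Snp], genes: GeneMap) -> List[Snp]:
    """Keep only EMS-type SNPs that fall inside a candidate gene."""
    return [s for s in snps if is_ems_type(s) and in_candidate_gene(s, genes)]


# Made-up example data: one hypothetical candidate gene and three SNP calls.
candidate_genes = {"geneA": ("chr1", 1000, 2000)}
calls = [
    Snp("chr1", 1500, "G", "A"),  # EMS-type, inside geneA
    Snp("chr1", 1600, "A", "T"),  # inside geneA, but not an EMS-type change
    Snp("chr2", 1500, "C", "T"),  # EMS-type, but outside every candidate gene
]
kept = filter_snps(calls, candidate_genes)
print(kept)
```

Only the first call passes both filters. In practice the candidate intervals would come from a genome annotation, and the calls from a variant caller run on reads aligned to the reference.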

This talk is Part 2 of a two-part series. Brian Dilkes delivered Part 1 ("Part 1: Bioinformatics for Next Generation Genetics") on February 7, 2012.

Associated Reading:
SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nature Methods 6, 550-551 (2009).

See www.stat.purdue.edu/~doerge/BIOINFORM.D/FALL11/sem.html for a full schedule of BIOINFORMATICS SEMINARS, past and present.