Statistical Bioinformatics

Biotechnological advances are providing genomic data at levels and magnitudes that were unimaginable even five years ago. As a result, every component of what we do as scientists is being stretched, changed, and projected forward in anticipation of what is to come, both in research and in educating the next generation of scientists. The largest shift has been in the way we do science: it is no longer the work of a single laboratory but a multidisciplinary effort that bridges many disciplines and many species.

Bioinformatics is an evolving science that is most recently defined as the generation, organization, and analysis of biological data (initially genomic data). Because its definition is broad, bioinformatics is viewed as a subject of scientific investigation that encompasses all biological phenomena.

Statistical Bioinformatics acknowledges the inherent variation found in data generated as part of a bioinformatics investigation and uses experimental structure and design to partition that variation into biological and technical components. The ultimate goal of Statistical Bioinformatics is to identify statistically significant changes in biological processes (e.g., changes in DNA sequence, quantitative trait locus identification, differential expression of genes, or changes in protein abundance) for the purpose of answering biological questions.
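As a rough illustration of what partitioning variation can look like in practice, the following Python sketch estimates biological and technical variance components from a balanced design in which each biological sample is measured several times. The method (one-way random-effects ANOVA with method-of-moments estimates) is one common choice and is an assumption here, not a method named in the text; the function name `partition_variance`, the sample labels, and the expression values are all hypothetical.

```python
from statistics import mean


def partition_variance(measurements):
    """Estimate (biological, technical) variance components.

    measurements: dict mapping a biological sample id to a list of
    technical replicate values. Assumes a balanced design, i.e. the
    same number of technical replicates for every biological sample.
    """
    groups = list(measurements.values())
    n_bio = len(groups)
    n_tech = len(groups[0]) if groups else 0
    if n_bio < 2 or n_tech < 2 or any(len(g) != n_tech for g in groups):
        raise ValueError("need a balanced design with >= 2 samples and >= 2 replicates")

    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)

    # Mean square between biological samples.
    ms_between = n_tech * sum((m - grand_mean) ** 2 for m in group_means) / (n_bio - 1)
    # Mean square within samples (spread among technical replicates).
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (n_bio * (n_tech - 1))

    technical = ms_within
    # Method-of-moments estimate; truncated at zero because a variance
    # cannot be negative.
    biological = max(0.0, (ms_between - ms_within) / n_tech)
    return biological, technical


# Hypothetical log2 expression values for one gene:
# three biological samples, two technical replicates each.
example = {
    "sample_1": [8.0, 8.2],
    "sample_2": [9.0, 9.2],
    "sample_3": [10.0, 10.2],
}
bio_var, tech_var = partition_variance(example)
print(f"biological={bio_var:.2f} technical={tech_var:.2f}")
```

For these made-up values the biological component (about 0.99) dwarfs the technical component (about 0.02), which is the kind of situation in which differences between samples are more plausibly biological than measurement noise. Unbalanced designs or additional factors would usually call for a mixed-model fit rather than this closed-form estimate.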

The cycle of theory, experiment, and information is nowhere more important than in the life sciences, where we are learning how to piece together various levels of expertise into a global or systems-level understanding of biology. Statistical Bioinformatics is involved at each level: accumulation, organization, and analysis of biological data. Hypotheses that are initiated and tested can be refined, and new experiments formulated for the purpose of supplying more information.

First Column: The Central Dogma lies at the heart of all biological investigations. Genes are transcribed and then translated into proteins, which in turn have a direct impact on the organism under investigation. Understanding how DNA, gene, and protein connect to produce function is one of the greatest biological mysteries remaining.

Second Column: Attached to each level of the Central Dogma is a new technology that allows an in-depth measurement of that piece of the process. DNA sequencing has allowed the complete sequencing of many genomes, including the human genome, which in turn allows the identification of every gene in an organism. However, knowing and understanding the function of every gene remains out of reach. Newer technologies referred to as transcript profiling, or microarray technology, allow the simultaneous assessment of transcript abundance for every gene in a genome. While detecting changes in transcript abundance across conditions will not by itself lead us to the function of genes, it does provide strong clues to the mechanisms that are involved. An even newer technology with even wider appeal, although still in its infancy, is known as protein microarrays, or proteomics. Protein arrays have great potential to provide strong links between the genes that encode proteins and the end result, the phenotype.

Third Column: New technologies enable genomic hypotheses to be formulated at each stage of the Central Dogma. Addressing individual questions in isolation supplies some limited information, but is restricted by the lack of connection between the stages of the biological process. Using Statistical Bioinformatics, the data supplied by each new technology at each stage of the Central Dogma can potentially be gathered from their individual sources into a single analysis of a biological process, pointing the way to gene networks and thus taking us from DNA sequence and gene to phenotypic result.