

An excellent starting point for basic information about genetics, particularly those aspects commonly reported in the news, this is recommended for all libraries.

Library Journal  

Recommended.

References for Students  
Gale Group  


The Salem set provides clear explanations and is recommended for college and high-school libraries as well as any public library that has a large science collection.

Booklist  

Recommended. General readers; undergraduates.

Choice  

...covering an immense range of subjects in sufficient detail to engage serious students, this set will not only make a significant addition to deeper collections, but contains enough new material to justify replacing its predecessor.

School Library Journal  


Encyclopedia of Genetics, Rev. Ed.

Editor: Bryan D. Ness, Pacific Union College
ISBN: 978-1-58765-149-6
List Price: $235

February 2004 · 2 volumes · 896 pages · 8"x10"

ALA/RUSA Outstanding Reference Source

Bioinformatics

Field of Study: Bioinformatics; Molecular genetics; Techniques and methodologies

Significance: Bioinformatics is the application of information technology to the management of biological information to organize data and extract meaning. It is a hybrid discipline that combines elements of computer science, information technology, mathematics, statistics, and molecular genetics.

Key Terms
ALGORITHM: a mathematical rule or procedure for solving a specific problem. In bioinformatics, a computer program is built to implement an algorithm, but different algorithms may be used to achieve the same result, such as the alignment of two sequences

DATABASE: an organized collection of information within a computer system that can be used for storage and retrieval as well as for complex searches and analyses

GENBANK: a comprehensive, annotated collection of publicly available DNA sequences maintained by the National Center for Biotechnology Information and available through its Web site

GENOMICS: the use of high-throughput technology to analyze molecular events within cells at the whole genome scale (for example, all of the genes, all of the mRNA, or all of the proteins)

HUMAN GENOME PROJECT: a publicly funded international project to determine the complete DNA sequence of human genomic (chromosomal) DNA and to map all of the genes, which produced a "final" sequence in April, 2003

MICROARRAY: a technology to measure gene expression using nucleic acid hybridization of mRNA to a miniature array of DNA probes for many genes

PROTEOMICS: a collection of technologies that examine proteins within a cell in a holistic fashion, identifying or quantitating a large number of proteins within a single sample, identifying many protein-protein or protein-DNA interactions, and so on

The Need for Bioinformatics
The sequencing of cloned DNA molecules has become a routine, automated task in the modern molecular genetics laboratory, and large publicly funded genome projects have determined the complete genomic sequences for humans, mice, fruit flies, dozens of bacteria, and many other species of interest to geneticists. All of this information is now freely available in online databases. Computational molecular biology tools allow for the design of PCR (polymerase chain reaction) primers, restriction enzyme cloning strategies, and even entire in silico experiments. This greatly accelerates the work of researchers but also changes the daily work of many biologists, who now spend more time working with computers and less time working with test tubes and pipettors. The rapid accumulation of enormous amounts of molecular sequence data, with their cryptic and subtle patterns, has created a need for computerized databases and analysis tools.
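One of the simplest of these tools searches a sequence for restriction enzyme recognition sites. The following is a minimal sketch in Python, assuming a two-enzyme table that uses the standard recognition sequences of EcoRI (GAATTC) and BamHI (GGATCC); the table, the function names, and the test sequence are illustrative assumptions, and a practical tool would also report cut positions, handle ambiguous bases, and cover many more enzymes.

```python
# Sketch: find restriction enzyme recognition sites in a DNA sequence.
# The recognition sequences are the standard ones for EcoRI and BamHI;
# the table and function names are illustrative assumptions.

RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
}


def find_sites(sequence: str, site: str) -> list[int]:
    """Return the 0-based start position of every match, overlaps included."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping matches are not skipped.
        start = sequence.find(site, start + 1)
    return positions


def map_enzymes(sequence: str) -> dict[str, list[int]]:
    """Map each enzyme name to the positions of its recognition site."""
    return {name: find_sites(sequence, site)
            for name, site in RECOGNITION_SITES.items()}


if __name__ == "__main__":
    # A made-up sequence containing two EcoRI sites and one BamHI site.
    print(map_enzymes("ttgaattcagcGGATCCatgaattc"))
```

Both of these recognition sequences are palindromic (each is identical to its own reverse complement), so scanning one strand also finds the sites on the other; a non-palindromic site would need a second scan of the reverse complement.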

Bioinformatics provides essential support services to modern molecular genetics for organizing, analyzing, and distributing data. As DNA sequencing and other molecular genetic technologies become more automated, data are generated ever more rapidly, and computing systems must be designed to store the data and make them available to scientists in a useful fashion. The use of these vast quantities of data for the discovery of new genes and genetic principles relies on the development of sophisticated new data-mining tools. The challenge of bioinformatics lies in finding new approaches to deal with the volume and complexity of the data and in giving researchers access both to the raw data and to sophisticated, flexible analysis tools that advance their understanding of genetics and its role in health and disease.

Database Design
The DNA sequence data collected by automated sequencing equipment can be represented as a simple sequence of letters: G, A, T, and C, which stand for the four nucleotide bases on one strand of the DNA molecule (guanine, adenine, thymine, and cytosine). These letters can easily be stored as plain text files on a computer. Protein sequences can likewise be stored as text files using the twenty single-letter abbreviations for the amino acids.
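Because a sequence is just a string of letters, ordinary string operations are enough for simple tasks. The minimal Python sketch below, with function names that are assumptions for illustration, checks a sequence against the DNA or protein alphabet, counts bases, computes GC content, and derives the complementary strand (G pairs with C, A with T), assuming the input is written 5' to 3'.

```python
from collections import Counter

# One-letter codes: the four DNA bases and the twenty standard amino acids.
DNA_ALPHABET = frozenset("GATC")
PROTEIN_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Translation table pairing each base with its complement (G-C, A-T).
COMPLEMENT = str.maketrans("GATC", "CTAG")


def is_valid(sequence: str, alphabet: frozenset) -> bool:
    """True if every letter of the sequence belongs to the given alphabet."""
    return set(sequence.upper()) <= alphabet


def base_composition(dna: str) -> dict[str, int]:
    """Count G, A, T, and C; any other characters are ignored."""
    counts = Counter(dna.upper())
    return {base: counts[base] for base in "GATC"}


def gc_content(dna: str) -> float:
    """Fraction of counted bases that are G or C (0.0 for an empty sequence)."""
    comp = base_composition(dna)
    total = sum(comp.values())
    return (comp["G"] + comp["C"]) / total if total else 0.0


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    seq = "GATTACA"
    print(is_valid(seq, DNA_ALPHABET), base_composition(seq),
          gc_content(seq), reverse_complement(seq))
```

Note that A, C, G, and T are also valid amino acid codes, so `is_valid("GATTACA", PROTEIN_ALPHABET)` returns True: the letters alone cannot say whether a file holds DNA or protein.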

There is a significant advantage to storing DNA and protein sequences as plain text files, also known as "flat files." Text files take up minimal amounts of hard-drive space, can be used on any type of computer and operating system, and can easily be moved across the Internet. However, a text file containing only a string of letters representing a DNA or protein sequence is essentially meaningless without some basic descriptive information, such as the organism from which it comes, its location on the genome, the person or organization that produced the sequence, and a unique identification number (accession number) so that it can be referenced in the scientific literature. This additional annotation can also be stored as text, even in the same file as the sequence itself, but it must follow a consistent, standardized format.
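A widely used example of such a standard is the FASTA format, in which each record begins with a header line starting with ">" (an identifier, optionally followed by a description) and is followed by one or more lines of sequence. The Python sketch below parses that layout; the record identifiers are made up rather than real accession numbers, and richer formats such as the GenBank flat-file format carry far more annotation than this parser handles.

```python
def _make_record(header: str, chunks: list[str]) -> tuple[str, str, str]:
    """Split a header into identifier and description; join sequence lines."""
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks).upper()


def parse_fasta(text: str) -> list[tuple[str, str, str]]:
    """Parse FASTA text into (identifier, description, sequence) tuples."""
    records = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


if __name__ == "__main__":
    sample = (
        ">SEQ0001 hypothetical gene fragment\n"
        "ATGGCC\n"
        "gtaa\n"
        ">SEQ0002\n"
        "MKTAYIAK\n"
    )
    for record in parse_fasta(sample):
        print(record)
```

Keeping the identifier and description on a single header line is what lets one plain text file hold many annotated sequences while remaining readable by any program.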

In addition to maintaining basic flat-file structures for text data, it is also useful to maintain sequence data in relational databases, which allow much faster searching across multiple query terms and the linkage of sequence data files with other relevant information. The most sophisticated and widely used relational database system for bioinformatics is the Entrez system at the National Center for Biotechnology Information (NCBI). Entrez is a relational database that includes cross-links between all of the DNA sequences in GenBank. GenBank exchanges data with the DNA Data Bank of Japan and the European Molecular Biology Laboratory on a daily basis to ensure that all three centers maintain the same set of data. Peer-reviewed journals generally require the submission of sequence data to GenBank or one of its partner databases before research articles are published, and publicly funded sequencing projects, such as the Human Genome Project, submit new sequence data to GenBank as they are collected so that the scientific community has immediate access to them. Entrez also includes all of the derived protein sequences (translations of cDNAs and of predicted coding sequences in genomic DNA), the scientific literature in MedLine/PubMed, three-dimensional protein structures from the Protein Data Bank (PDB), and human genetic information from the Online Mendelian Inheritance in Man (OMIM) database. Relational databases are even more important for more complex types of genomic data, such as gene expression microarray data and genetic variation and genotyping data sets.
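The hypothetical schema below, built with Python's standard `sqlite3` module, is not how Entrez is implemented; it only illustrates why relational tables make cross-links cheap. Nucleotide entries, derived protein entries, and literature citations sit in separate tables tied together by accession numbers, and one join retrieves them together. All table names, column names, and identifiers (NT-001, PR-001, ART-17, and so on) are invented for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with three linked, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nucleotide (
            accession TEXT PRIMARY KEY,
            organism  TEXT NOT NULL,
            sequence  TEXT NOT NULL
        );
        CREATE TABLE protein (
            accession TEXT PRIMARY KEY,
            source_nt TEXT NOT NULL REFERENCES nucleotide(accession),
            sequence  TEXT NOT NULL
        );
        CREATE TABLE citation (
            nt_accession TEXT NOT NULL REFERENCES nucleotide(accession),
            article_id   TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO nucleotide VALUES (?, ?, ?)", [
        ("NT-001", "Homo sapiens", "ATGGCCAAGTAA"),
        ("NT-002", "Mus musculus", "ATGAAACCCTGA"),
    ])
    # Each protein row points back to the coding sequence it was translated from.
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", [
        ("PR-001", "NT-001", "MAK"),
        ("PR-002", "NT-002", "MKP"),
    ])
    conn.executemany("INSERT INTO citation VALUES (?, ?)", [
        ("NT-001", "ART-17"),
        ("NT-001", "ART-42"),
    ])
    return conn


def linked_records(conn: sqlite3.Connection, organism: str) -> list[tuple]:
    """Return (nucleotide, protein, article) rows for one organism.

    The LEFT JOIN keeps sequences that have no citation (article is None).
    """
    return conn.execute("""
        SELECT n.accession, p.accession, c.article_id
        FROM nucleotide AS n
        JOIN protein AS p ON p.source_nt = n.accession
        LEFT JOIN citation AS c ON c.nt_accession = n.accession
        WHERE n.organism = ?
        ORDER BY n.accession, c.article_id
    """, (organism,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(linked_records(db, "Homo sapiens"))
    print(linked_records(db, "Mus musculus"))
```

A flat file could hold the same facts, but answering the same question would require reading and matching every file by hand; here the shared accession columns do that work.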

Key Algorithms
Some of the key algorithms used in bioinformatics include sequence alignment (dynamic programming), sequence similarity (word matching from hash tables), assembly of overlapping fragments, clustering (hierarchical, self-organizing maps, principal components, and the like), pattern recognition, and protein three-dimensional structure prediction. Bioinformatics is both eclectic and pragmatic: Algorithms are adopted from many different disciplines, including linguistics, statistics, artificial intelligence and machine learning, remote sensing, and information theory. There is no consistent set of theoretical rules at the core of bioinformatics; it is simply a collection of whatever algorithms and data structures have been found to work for the current data-management problems faced by biologists. As new types of data become important in the work of molecular geneticists, new algorithms for bioinformatics will be invented or adopted.
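Sequence alignment by dynamic programming can be illustrated with the classic Needleman-Wunsch global alignment method, sketched below in Python. The scoring scheme (match +1, mismatch -1, gap -2) is an arbitrary assumption chosen for readability; practical tools typically use substitution matrices such as BLOSUM or PAM for proteins and more elaborate gap penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the optimal score and one optimal alignment as two
    equal-length strings, with '-' marking gaps.
    """

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # pair two letters
                score[i - 1][j] + gap,                            # gap in b
                score[i][j - 1] + gap,                            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top: list[str] = []
    bottom: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    best, row1, row2 = global_align("GATTACA", "GATACA")
    print(best)
    print(row1)
    print(row2)
```

The table has one cell per pair of positions, so time and memory grow with the product of the two sequence lengths; this cost is one reason database similarity searches rely on faster word-matching heuristics before any full alignment is computed.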

New Types of Data
In addition to DNA and protein sequences, bioinformatics is being called upon to organize many other types of biological information that are being collected in ever greater amounts. Gene expression microarrays collect information on the amounts of mRNA produced from tens of thousands of different genes in a single tissue sample. Proteomics technologies are automating the process of mass spectrometry, which allows investigators to identify and measure thousands of proteins in a single cell extract sample. Genes and proteins can also be organized into gene families based on sequence similarity, homology across organisms (comparative genomics), and function in metabolic or regulatory pathways. Many new technologies are being developed to measure genetic variation: genetic tests either for alleles of well-studied genes or for anonymous single nucleotide polymorphisms (SNPs) identified from genome sequence data. As these genotyping technologies improve, it is becoming possible to collect data in an automated fashion for many genetic loci from a single DNA sample, or to test a single genetic locus on many thousands of DNA samples in parallel. These new data types require new database designs and the inclusion of new types of algorithms (from statistics, population genetics, and other disciplines) in bioinformatics data-management solutions.
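A small example of a population-genetics calculation applied to genotyping data: from the counts of the three genotypes at one SNP with alleles A and B, estimate allele frequencies, compute the genotype counts expected under Hardy-Weinberg proportions (p², 2pq, q²), and measure the departure with a Pearson chi-square statistic. The Python sketch below assumes a biallelic locus; the function names and the genotype counts in the demo are hypothetical.

```python
def allele_frequencies(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Frequencies (p, q) of alleles A and B estimated from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes counted")
    # Each AA individual carries two A alleles; each AB individual carries one.
    p = (2 * n_aa + n_ab) / (2 * n)
    return p, 1.0 - p


def hardy_weinberg_expected(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float, float]:
    """Expected AA, AB, BB counts under Hardy-Weinberg proportions p^2 : 2pq : q^2."""
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    return p * p * n, 2 * p * q * n, q * q * n


def chi_square(observed, expected) -> float:
    """Pearson goodness-of-fit statistic, skipping classes with zero expectation."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    observed = (50, 0, 50)  # hypothetical counts with no heterozygotes at all
    expected = hardy_weinberg_expected(*observed)
    print(expected, chi_square(observed, expected))
```

For a biallelic locus the statistic is conventionally compared with a chi-square distribution with one degree of freedom; with thousands of SNPs tested at once, the same simple calculation is simply repeated for every locus.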

Integration
The biggest challenge facing bioinformatics is the integration of various types of data in a form that allows scientists to extract meaningful insights about biology from the masses of information in molecular genetic databases. Genome browsers are one example of this challenge. It is extremely difficult to provide a display that allows someone to view all of the relevant information about a gene or a chromosomal region, including the identity of encoded proteins, protein structure and functional information, involvement in metabolic and regulatory pathways, developmental and tissue-specific gene expression, evolutionary relationships to proteins in other organisms, DNA motifs bound by regulatory proteins, genetic synteny with other species, phenotypes of mutations, and known alleles and SNPs and their frequency in various populations.
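At the core of any genome browser view is a region query: given a chromosome and a coordinate range, return every annotated feature, of whatever type, that overlaps it. The Python sketch below assumes half-open, 0-based coordinates and a hypothetical `Feature` record; it uses a simple linear scan, whereas real browsers rely on indexed structures such as interval trees to answer such queries quickly over millions of features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A hypothetical annotation record on a chromosome."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # e.g. "gene", "SNP", "regulatory motif"
    name: str


def features_in_region(features: list[Feature], chrom: str,
                       start: int, end: int) -> list[Feature]:
    """Return features on chrom overlapping [start, end), sorted by position."""
    hits = [f for f in features
            # Two half-open intervals overlap when each starts before the other ends.
            if f.chrom == chrom and f.start < end and start < f.end]
    return sorted(hits, key=lambda f: (f.start, f.end, f.name))


if __name__ == "__main__":
    annotations = [
        Feature("chr1", 100, 500, "gene", "geneA"),
        Feature("chr1", 450, 451, "SNP", "snp1"),
        Feature("chr1", 600, 900, "gene", "geneB"),
        Feature("chr2", 100, 500, "gene", "geneC"),
    ]
    for f in features_in_region(annotations, "chr1", 400, 620):
        print(f.kind, f.name, f.start, f.end)
```

Because genes, SNPs, and regulatory motifs share one coordinate system, a single query can pull together data that were produced by different experiments; the harder integration problems described above begin when the information has no shared coordinates.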

Another, much more modest, goal would be simply to alert a person viewing a DNA or protein sequence in one database to the existence of additional information about that entity in other databases. At present, such cross-database links are inconsistent and unreliable. The NCBI cross-references its own databases, from DNA to proteins to three-dimensional structures to PubMed articles to genomes. Most special-subject databases, such as those that focus on a particular species or on a particular type of molecule, link DNA and protein sequences back to the corresponding "reference" entries in GenBank; however, these links are not reciprocal. Someone looking at a GenBank cDNA sequence in the Entrez browser would have no way of knowing that a corresponding protein entry is present in a database dedicated to Drosophila genetics or to G-protein-coupled receptor mutants. Scientists can never be certain that they have collected all of the relevant information about a molecule of interest from all online databases.
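The reciprocity problem can be stated concretely. In the Python sketch below, each database's outgoing cross-references are modeled as a dictionary from an entry identifier to a set of identifiers in the other database; the database names and identifiers (FLY-1, REF-10, and so on) are hypothetical. One function lists links that have no counterpart in the reverse direction, and another builds the missing reverse index from one-way links, which is possible only for databases whose links one has actually collected.

```python
def find_missing_backlinks(links_a_to_b: dict[str, set[str]],
                           links_b_to_a: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (a_id, b_id) pairs where A links to B but B does not link back."""
    missing = []
    for a_id, b_ids in links_a_to_b.items():
        for b_id in b_ids:
            if a_id not in links_b_to_a.get(b_id, set()):
                missing.append((a_id, b_id))
    return sorted(missing)


def invert_links(links: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build the reverse index: for each target, the set of entries pointing to it."""
    inverted: dict[str, set[str]] = {}
    for source, targets in links.items():
        for target in targets:
            inverted.setdefault(target, set()).add(source)
    return inverted


if __name__ == "__main__":
    # Hypothetical specialist entries pointing to hypothetical reference entries.
    specialist_to_reference = {"FLY-1": {"REF-10"}, "FLY-2": {"REF-11"}}
    reference_to_specialist = {"REF-10": {"FLY-1"}}
    print(find_missing_backlinks(specialist_to_reference, reference_to_specialist))
    print(invert_links(specialist_to_reference))
```

The check is only as complete as the set of databases fed into it, which is exactly why no one can be sure every relevant entry has been found.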

Stuart M. Brown

See Also
cDNA Libraries; DNA Fingerprinting; DNA Sequencing Technology; Forensic Genetics; Genetic Testing: Ethical and Economic Issues; Genetics, Historical Development of; Genomic Libraries; Genomics; Human Genome Project; Icelandic Genetic Database; Linkage Maps; Proteomics.

Further Reading
Baxevanis, Andreas D., and B. F. Francis Ouellette. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. 2d ed. Hoboken, N.J.: John Wiley & Sons, 2003. This book provides a sound foundation of basic concepts of bioinformatics, with practical discussions and comparisons of both computational tools and databases relevant to biological research. The standard text for most graduate-level bioinformatics courses.

Claverie, Jean-Michel, and Cedric Notredame. Bioinformatics for Dummies. Hoboken, N.J.: John Wiley & Sons, 2003. A practical introduction to bioinformatics: computer technologies that biochemical and pharmaceutical researchers use to analyze genetic and biological data. This reference addresses common biological questions, problems, and projects while providing a UNIX/Linux overview and tips on tweaking bioinformatic applications using Perl.

Krawetz, Stephen A., and David D. Womble. Introduction to Bioinformatics: A Theoretical and Practical Approach. Totowa, N.J.: Humana Press, 2003. Aimed at undergraduates, graduate students, and researchers. Four sections: "Biochemistry: Cell and Molecular Biology," "Molecular Genetics," "Unix Operating System," and "Computer Applications."

Mount, David W. Bioinformatics: Sequence and Genome Analysis. Woodbury, N.Y.: Cold Spring Harbor Laboratory Press, 2001. A textbook written for the biologist who wants to get a thorough understanding of popular bioinformatics programs and molecular databases. It does not teach programming but does explain the theory behind each of the algorithms.

Nucleic Acids Research 31, no. 1 (2003). This widely respected journal produces a special issue in January of each year devoted entirely to online bioinformatics databases. The articles represent the definitive statement by the directors of each of the major public databases of molecular biology data regarding the types of information and analysis tools in their databases and plans for development in the immediate future.

Web Sites of Interest
Bioinformatics Organization. http://www.bioinformatics.org. Provides a helpful tutorial on bioinformatics.

European Bioinformatics Institute. http://www.ebi.ac.uk. Maintains databases concerning nucleic acids, protein sequences, and macromolecular structures, as well as postings of news and events and descriptions of ongoing scientific projects.


SALEM PRESS, INC. · 131 North El Molino Avenue · Pasadena · CA 91101
© Salem Press, Inc. All Rights Reserved.