The structure of a genome is a linear sequence of nucleotides that encodes genes and regulatory elements. Genes are homologous if they are related by divergence from a common ancestor (Attwood 2000). Homologous genes perform the same or similar functions. The sequences of homologous genes in related organisms are usually similar. For example, the sequences of homologous genes in humans and mice are 85 percent similar on average (Makalowski et al. 1996). If a new genomic DNA sequence is very similar to the sequence of a gene whose function is known, the genomic DNA sequence very likely contains a gene whose function is similar to that of the known gene. If a new genomic DNA sequence is highly similar to a cDNA sequence, then the genomic DNA sequence contains a gene, and the structure of that gene can be found by aligning the two sequences. Thus methods for comparing sequences are very useful for understanding the structures and functions of genes in a genome. This chapter focuses on methods for comparing two sequences, which often serve as a basis for multiple sequence comparison methods, the topic of the next chapter.
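To make "comparing two sequences" concrete, here is a minimal sketch of one classic pairwise method, global alignment by dynamic programming (Needleman-Wunsch), together with a simple percent-identity measure of the kind behind figures like "85 percent similar." This is an illustration under stated assumptions, not necessarily the method developed in this chapter: the scoring values (`MATCH`, `MISMATCH`, `GAP`) and the function names `global_align` and `percent_identity` are hypothetical choices, and the linear gap penalty does not model the long intron gaps that arise when a cDNA is aligned to genomic DNA.

```python
# Minimal global alignment (Needleman-Wunsch) with a linear gap penalty.
# ASSUMPTION: these scoring values are illustrative, not taken from the chapter.
MATCH, MISMATCH, GAP = 1, -1, -2


def _sub(x: str, y: str) -> int:
    """Substitution score for aligning character x with character y."""
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> tuple[str, str, int]:
    """Return one optimal global alignment of a and b (hypothetical helper) and its score."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * GAP  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _sub(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(row_a: str, row_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns.

    This is one common definition among several (hypothetical helper).
    """
    if not row_a:
        return 0.0
    same = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return 100.0 * same / len(row_a)
```

For example, under these parameters `global_align("GATTACA", "GATACA")` finds an alignment with six matching columns and one gap, for a score of 4. The quadratic table is the simplest correct formulation; practical tools for long genomic sequences use affine gap penalties, local alignment, or space-saving variants.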
Available at: http://works.bepress.com/xiaoqiu-huang/25/
This chapter is published as Bio-sequence comparison and applications, X Huang. Reprinted from Current Topics in Computational Molecular Biology, edited by Jiang, Tao, Ying Xu, and Michael Q. Zhang, published by The MIT Press, 2002, pp. 45-69.