Long human–mouse sequence alignments reveal novel regulatory elements: A reason to sequence the mouse genome

RC Hardison, J Oeltjen, W Miller - Genome research, 1997 - genome.cshlp.org
The utility of sequencing entire genomes of bacteria and fungi is amply demonstrated. For instance, as the complete set of genes for each species is catalogued, one can ascertain the full complement of encoded proteins, obtain insights into the function of new proteins by sequence matches to known proteins, and measure the transcriptional levels of all genes in a genome under various environmental conditions or at different stages of the cell cycle (Boguski et al. 1996; Velculescu et al. 1997). The currently sequenced genomes consist primarily of coding regions with little sequence between the genes, and the amount of genetic information in each segment is usually quite high. Larger genomes from more complex organisms have a considerable amount of DNA between the genes and in introns that interrupt the coding regions, and one could question whether it is useful to determine the sequences of all of these noncoding regions. Indeed, the concerted efforts to determine partial sequences of normalized cDNA libraries have generated rich and very useful databases, such as the TIGR database (TDB) and dbEST (Adams et al. 1991; Boguski 1995). The UniGene project, an effort by Schuler and his colleagues to unite the several sequences from each set of cDNA clones representing a unique gene, will organize this large amount of sequence data. As of late 1996, the UniGene database contained sample sequences from almost 50,000 genes, which could represent a majority of human genes (Schuler et al. 1996). Of these UniGene clusters, 16,000 have been placed on the human genome map, which will greatly aid positional cloning of interesting genes.