Computer analysis of nucleic acid regulatory sequences

Laurence Jay Korn; Cary Queen; Mark N. Wegman

doi:10.1073/pnas.74.10.4401

PNAS

Paper

01 Jan 1977

Abstract

A computer program designed to facilitate the analysis of nucleic acid sequences is described. The program can search several nucleotide sequences for oligonucleotides common to all of them. It can examine a DNA or RNA sequence for two kinds of homologous regions: repetitions and dyad symmetries. The homologies need not be perfect: mismatches and 'looping out' of nucleotides are allowed. The program also finds (A+T)- and (G+C)-rich regions, locates restriction enzyme recognition sites, determines the distribution of di- and trinucleotides, and performs various other functions. Two representative applications of the program are included. All prokaryotic transcription termination sequences published as of June 1977 were found to share the following features: (i) a string of at least five T residues, (ii) the sequence CGGGC or a close analog immediately preceding the T cluster, and (iii) a region of strong dyad symmetry preceding the Ts and including the CGGGC sequence. A sequence of 221 nucleotides consisting of the Escherichia coli trp promoter, operator, and leader was found to contain two strong dyad symmetries. These homologies both occur at known regulatory sites; no comparable homologies occur in regions without regulatory significance.
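To make the searches named in the abstract concrete, the following is a minimal Python sketch of three of them: oligonucleotides common to several sequences, dyad symmetries with a bounded number of mismatches, and di- or trinucleotide counts. It is not the authors' program. The function names, parameters, and the brute-force scanning strategy are assumptions chosen for illustration, and 'looping out' of nucleotides within an arm is not modeled; only a spacer between the two arms is allowed.

```python
from collections import Counter

# Watson-Crick complement table for DNA (illustrative; RNA would need U).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def common_oligos(seqs, k):
    """Return the set of k-mers that occur in every sequence in seqs."""
    kmer_sets = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in seqs]
    return set.intersection(*kmer_sets)


def dyad_symmetries(seq, arm, max_loop=10, max_mismatch=0):
    """Find pairs of arms of length `arm` where the right arm matches the
    reverse complement of the left arm with at most `max_mismatch`
    mismatches, separated by a spacer of 0..max_loop bases.

    Returns a list of (left_start, right_start, mismatches) tuples,
    using 0-based positions.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * arm + 1):
        target = reverse_complement(seq[i:i + arm])
        for loop in range(max_loop + 1):
            j = i + arm + loop
            if j + arm > n:
                break
            mismatches = sum(a != b for a, b in zip(target, seq[j:j + arm]))
            if mismatches <= max_mismatch:
                hits.append((i, j, mismatches))
    return hits


def kmer_counts(seq, k):
    """Count every overlapping k-mer (k=2 for di-, k=3 for trinucleotides)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical toy sequence, invented for this sketch (not from the paper):
# a 6-base arm, a 4-base spacer, the arm's reverse complement, then a T run.
toy = "AA" + "GCCCGC" + "TTTT" + "GCGGGC" + "TTTTTT"
hits = dyad_symmetries(toy, arm=6, max_loop=4)
shared = common_oligos(["ACGTT", "TACGA", "GGACG"], 3)
dinucleotides = kmer_counts("AATAA", 2)
```

The exhaustive scan costs on the order of sequence length times spacer range times arm length, which is acceptable for sequences of a few hundred nucleotides such as the 221-nucleotide trp region mentioned above; a longer sequence would call for an indexed approach.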