How to Find Open Reading Frames in a DNA Sequence

An open reading frame (ORF) is a stretch of DNA that begins with a start codon and ends with a stop codon, with no interrupting stop codons in between. ORFs are the primary candidates for protein-coding genes, making ORF analysis one of the first steps in annotating a new sequence.

What Are Reading Frames?

DNA is read in triplets called codons. Because the reading can start at position 1, 2, or 3 of a sequence, there are three possible reading frames on each strand. Since DNA is double-stranded, there are six reading frames in total: three on the forward strand (+1, +2, +3) and three on the reverse complement strand (-1, -2, -3).
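
As a rough illustration of the three forward frames, the sketch below splits a sequence into complete codons for a chosen frame. The function name codons_in_frame and the example sequence are made up for this guide, and trailing bases that do not fill a whole codon are simply dropped.

```python
def codons_in_frame(seq, frame):
    """Split seq into complete codons for forward frame 1, 2, or 3."""
    start = frame - 1  # frame +1 starts at index 0, +2 at index 1, +3 at index 2
    # Trim any leftover bases at the end that do not form a full codon.
    usable_end = len(seq) - (len(seq) - start) % 3
    return [seq[i:i + 3] for i in range(start, usable_end, 3)]


example = "ATGGGCAAACT"  # hypothetical 11-base sequence
for frame in (1, 2, 3):
    print(f"+{frame}:", " ".join(codons_in_frame(example, frame)))
```

The same function can be applied to the reverse complement to obtain frames -1, -2, and -3.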

A true gene will typically have a long ORF in one of these frames. Three of the 64 possible codons are stop codons, so random sequence with equal base frequencies produces a stop roughly once every 21 codons on average. A long ORF is therefore statistically unlikely to occur by chance.
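
The arithmetic behind that claim can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes all four bases are equally frequent and independent, which real genomes only approximate, and the helper name chance_of_open_run is invented for this example.

```python
# Assumption: every one of the 64 codons is equally likely at each position.
STOP_CODON_COUNT = 3
TOTAL_CODON_COUNT = 64

p_stop = STOP_CODON_COUNT / TOTAL_CODON_COUNT
mean_codons_per_stop = 1 / p_stop  # expected spacing between stop codons


def chance_of_open_run(n_codons):
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1 - p_stop) ** n_codons


print(f"Mean spacing between stops: {mean_codons_per_stop:.1f} codons")
for n in (30, 100, 300):
    print(f"P(no stop in {n} codons) = {chance_of_open_run(n):.2e}")
```

Under these assumptions, a stop-free run of 100 codons turns up by chance in under 1 percent of random starting positions, which is why length is such a useful first filter.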

Start and Stop Codons

The most common start codon is ATG, which codes for methionine. It is the start signal for the great majority of genes across organisms. Some bacteria also use GTG or TTG as alternative start codons in certain contexts; when used as a start, these are still translated as methionine.

There are three stop codons:

TAA (ochre)
TAG (amber)
TGA (opal or umber)

An ORF is defined as the sequence from an ATG to the next in-frame stop codon. The length of the ORF is measured in codons or amino acids.
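
The definition above can be sketched for a single forward frame as follows. This is a minimal illustration, not the implementation of any particular tool: the function name find_orfs_in_frame, its offset parameter, and the 0-based, end-exclusive coordinates are choices made for this example. It reports the first ATG after the previous stop, so each stop codon yields at most one ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs_in_frame(seq, offset=0, start_codon="ATG"):
    """Return (start, end) index pairs for ORFs in one forward frame.

    start is the index of the first base of the start codon; end is the index
    just past the stop codon, so seq[start:end] includes both.
    """
    seq = seq.upper()
    orfs = []
    orf_start = None  # position of the first start codon since the last stop
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if orf_start is None and codon == start_codon:
            orf_start = i
        elif orf_start is not None and codon in STOP_CODONS:
            orfs.append((orf_start, i + 3))
            orf_start = None  # an ATG with no later stop is never reported
    return orfs


seq = "CCATGGGCAAACTGTTCGAATGACC"  # hypothetical sequence with one ORF in frame +3
for start, end in find_orfs_in_frame(seq, offset=2):
    print(start, end, seq[start:end], (end - start) // 3, "codons including the stop")
```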

How to Identify Meaningful ORFs

Not every ATG-to-stop sequence is a real gene. To filter out short random ORFs, a minimum length threshold is applied. Common thresholds are:

100 codons (300 bp) for prokaryotic gene prediction
30 codons (90 bp) for initial screening in eukaryotes
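
Applying a threshold like these is a simple filter. The sketch below assumes ORFs are stored as (start, end) index pairs with the stop codon included in the span, and it counts length in codons excluding the stop; both conventions, and the name filter_by_length, are assumptions of this example rather than a standard.

```python
def filter_by_length(orfs, min_codons=100):
    """Keep ORFs with at least min_codons coding codons (stop codon not counted)."""
    kept = []
    for start, end in orfs:
        coding_codons = (end - start) // 3 - 1  # subtract the stop codon
        if coding_codons >= min_codons:
            kept.append((start, end))
    return kept


# Hypothetical coordinates: spans of 303 bp, 90 bp, and 297 bp.
candidates = [(0, 303), (10, 100), (5, 302)]
print(filter_by_length(candidates, min_codons=100))
print(filter_by_length(candidates, min_codons=29))
```

If your tool counts the stop codon as part of the length, adjust the comparison accordingly so that thresholds stay comparable between tools.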

Longer ORFs are more likely to represent real genes. In bacteria, most protein-coding genes are longer than 100 codons. In eukaryotes, genes are interrupted by introns, so ORFs in genomic DNA are often fragmented. For eukaryotic sequences, ORF analysis therefore works best on cDNA or mRNA, where the introns have already been spliced out.

Reading the Six Frames

To analyze all six reading frames, you need to:

Take the original sequence and read it in frames +1, +2, and +3
Generate the reverse complement of the sequence
Read the reverse complement in frames +1, +2, and +3 (which correspond to -1, -2, -3 on the original)

In each frame, scan for ATG codons and then find the next in-frame stop codon. The region between them is an ORF candidate.
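
The steps above can be sketched as a small six-frame scanner. This is an illustrative outline under stated assumptions: the names reverse_complement, scan_frame, and six_frame_orfs are invented here, only the unambiguous bases A, C, G, and T are handled, and reverse-strand hits are reported in 0-based, end-exclusive coordinates on the original sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan_frame(seq, offset):
    """Yield (start, end) for each ATG-to-stop ORF in one frame of seq."""
    orf_start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if orf_start is None and codon == "ATG":
            orf_start = i
        elif orf_start is not None and codon in STOP_CODONS:
            yield orf_start, i + 3
            orf_start = None


def six_frame_orfs(seq):
    """Return (frame, start, end) tuples with coordinates on the original sequence."""
    seq = seq.upper()
    n = len(seq)
    rc = reverse_complement(seq)
    results = []
    for offset in range(3):
        for start, end in scan_frame(seq, offset):
            results.append((offset + 1, start, end))
        for start, end in scan_frame(rc, offset):
            # The slice rc[start:end] covers original positions n - end to n - start.
            results.append((-(offset + 1), n - end, n - start))
    return results


seq = "CCATGGGCAAACTGTTCGAATGACC"  # hypothetical test sequence
for frame, start, end in six_frame_orfs(seq):
    print(f"frame {frame:+d}: {start}-{end}")
```

Reporting reverse-strand ORFs in forward coordinates makes it easier to compare hits with existing annotations, at the cost of having to reverse-complement the slice again before translating it.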

Translating an ORF

Once you have identified an ORF, you can translate it to a protein sequence using the standard genetic code. Each triplet codon maps to one of the 20 amino acids or a stop signal. For example:

DNA:     ATG GGC AAA CTG TTC GAA TGA
Protein:  M   G   K   L   F   E   *

The asterisk (*) represents the stop codon. The protein sequence starts at M and ends just before the stop.
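
A translation step along these lines could look like the sketch below. It builds the standard genetic code from the conventional TCAG ordering; the function name translate, the to_stop flag, and the use of X for codons containing ambiguous bases are choices made for this example.

```python
from itertools import product

# Standard genetic code, listed with each codon position cycling through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate a DNA string codon by codon, starting at its first base."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks codons with non-ACGT bases
        if aa == "*" and to_stop:
            break  # stop before the stop codon, so no asterisk is emitted
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGGCAAACTGTTCGAATGA"))                 # MGKLFE
print(translate("ATGGGCAAACTGTTCGAATGA", to_stop=False))  # MGKLFE*
```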

What to Do with ORF Results

After identifying ORFs, the next steps typically include:

BLAST searching the translated protein sequence against known databases
Checking for conserved domains using tools like Pfam or InterPro
Comparing ORF positions with known gene annotations if available
Checking for a Kozak consensus sequence around the ATG in eukaryotes
Looking for a Shine-Dalgarno sequence upstream of the ATG in prokaryotes

Common Pitfalls

A few things to watch out for when interpreting ORF results:

Nested ORFs: A long ORF may contain shorter ORFs within it. The longest ORF in a frame is usually the most relevant.
Overlapping ORFs: In some viruses and bacteria, two genes overlap in different reading frames. This is rare in eukaryotes.
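Pseudogenes: A sequence may resemble a known gene but contain premature stop codons or frameshifts caused by mutations, so its ORF is truncated or broken. Such sequences are often non-functional pseudogenes.
Non-ATG starts: Some genes use alternative start codons. If an expected gene is missing from your results, try relaxing the start codon requirement, for example by also allowing GTG and TTG in bacterial sequences.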
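
To make nested ORFs and alternative starts easier to inspect, one option is to list every in-frame start codon that precedes each stop. The sketch below does this for one forward frame; the function name candidate_starts and its start_codons parameter are hypothetical, and whether GTG or TTG is a plausible start depends on the organism.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq, offset=0, start_codons=("ATG",)):
    """Map each stop codon's end index to all in-frame start positions before it.

    The first position in each list gives the longest ORF for that stop;
    later positions are ORFs nested inside it.
    """
    seq = seq.upper()
    starts = []
    result = {}
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            if starts:
                result[i + 3] = starts
            starts = []  # a stop closes every ORF opened before it
        elif codon in start_codons:
            starts.append(i)
    return result


seq = "GTGAAAATGCCCTAA"  # hypothetical: GTG AAA ATG CCC TAA
print(candidate_starts(seq))
print(candidate_starts(seq, start_codons=("ATG", "GTG", "TTG")))
```

With only ATG allowed, the example reports a single start; allowing GTG and TTG reveals a longer candidate that begins at the first codon.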

ORF analysis is a fast and effective first pass at gene prediction. Combined with homology searching and functional annotation, it gives you a solid starting point for understanding any new DNA sequence.

Find ORFs in Your Sequence

Use our free ORF finder to scan all six reading frames, translate each ORF to protein, and view the results with codon-by-codon highlighting.
