An open reading frame (ORF) is a stretch of DNA that begins with a start codon and ends with a stop codon, with no interrupting stop codons in between. ORFs are the primary candidates for protein-coding genes, making ORF analysis one of the first steps in annotating a new sequence.

What Are Reading Frames?

DNA is read in triplets called codons. Because the reading can start at position 1, 2, or 3 of a sequence, there are three possible reading frames on each strand. Since DNA is double-stranded, there are six reading frames in total: three on the forward strand (+1, +2, +3) and three on the reverse complement strand (-1, -2, -3).
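As a minimal sketch of the three forward frames, the snippet below splits a sequence into complete codons at offsets 0, 1, and 2. The function name `codons_in_frame` and the example sequence are illustrative assumptions, not part of any particular tool.

```python
def codons_in_frame(seq: str, frame: int) -> list[str]:
    """Split seq into complete codons, starting at offset 0, 1, or 2."""
    seq = seq.upper()
    # Stop at len(seq) - 2 so a trailing partial codon is dropped.
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


seq = "ATGGGCAAACTG"  # hypothetical example sequence
for offset in range(3):
    print(f"+{offset + 1}:", " ".join(codons_in_frame(seq, offset)))
```

The same function applied to the reverse complement of the sequence would give the three frames on the other strand.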

A true gene will typically have a long ORF in one of these frames. Because 3 of the 64 possible codons are stops, random sequence with equal base frequencies contains a stop codon roughly every 20 codons on average, so a long ORF is statistically unlikely to occur by chance.
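The arithmetic behind this can be sketched in a few lines. The calculation assumes that all four bases are equally frequent and independent, which real genomes only approximate (GC-rich genomes, for example, have fewer chance stop codons). The function names are illustrative.

```python
# Assumption: uniform, independent bases, so each of the 64 codons
# has probability 1/64 and 3 of them (TAA, TAG, TGA) are stops.
STOP_FRACTION = 3 / 64


def mean_codons_between_stops() -> float:
    """Expected spacing of stop codons in random sequence."""
    return 1 / STOP_FRACTION


def prob_no_stop(n_codons: int) -> float:
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1 - STOP_FRACTION) ** n_codons


print(f"mean spacing: {mean_codons_between_stops():.1f} codons")
for n in (20, 50, 100, 300):
    print(f"P(no stop in {n} codons) = {prob_no_stop(n):.2e}")
```

Under these assumptions, a stop-free run of 100 codons arises by chance in fewer than 1% of cases, and the probability falls off exponentially with length.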

Start and Stop Codons

The most common start codon is ATG, which codes for methionine. It is the standard start signal in nearly all organisms. Some bacteria also use GTG or TTG as alternative start codons in certain contexts.

There are three stop codons: TAA, TAG, and TGA.

An ORF is defined as the sequence from an ATG to the next in-frame stop codon. The length of the ORF is measured in codons or amino acids.
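A minimal sketch of this definition follows: find an ATG, then step forward three bases at a time until a stop codon appears. The helper name `first_orf` is hypothetical, and the sketch only considers ATG starts, matching the definition above rather than the alternative bacterial start codons.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq: str) -> Optional[str]:
    """Return the first ATG-to-stop ORF (stop codon included), or None."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the frame set by this ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        # No in-frame stop after this ATG; try the next one.
        start = seq.find("ATG", start + 1)
    return None


orf = first_orf("CCATGGGCAAATGACC")  # hypothetical example sequence
if orf is not None:
    coding_codons = len(orf) // 3 - 1  # exclude the stop codon
    print(orf, f"({coding_codons} codons)")
```

Here the length is reported in codons excluding the stop, which equals the number of amino acids in the translated product.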

How to Identify Meaningful ORFs

Not every ATG-to-stop sequence is a real gene. To filter out short random ORFs, a minimum length threshold is applied. Common thresholds are:

Longer ORFs are more likely to represent real genes. In bacteria, most protein-coding genes are over 100 codons.
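Applying a length threshold is a simple filter. The sketch below assumes each ORF is a DNA string that includes its stop codon; the default of 100 codons echoes the bacterial figure mentioned above and is an illustrative choice, not a recommended setting for every genome.

```python
def filter_orfs(orfs: list[str], min_codons: int = 100) -> list[str]:
    """Keep ORFs whose coding length (stop codon excluded) meets the threshold."""
    return [orf for orf in orfs if len(orf) // 3 - 1 >= min_codons]


# Hypothetical candidates: 11 and 121 coding codons respectively.
short_orf = "ATG" + "GGC" * 10 + "TAA"
long_orf = "ATG" + "GGC" * 120 + "TGA"

kept = filter_orfs([short_orf, long_orf])
print(len(kept), "ORF(s) pass the 100-codon threshold")
```

Lowering `min_codons` recovers short genes at the cost of more chance ORFs, so the threshold is a trade-off between sensitivity and false positives.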

In eukaryotes, genes are interrupted by introns, so ORFs in genomic DNA are often fragmented. ORF analysis works best on cDNA or mRNA sequences, where introns have already been spliced out.

Reading the Six Frames

To analyze all six reading frames, you need to:

  1. Take the original sequence and read it in frames +1, +2, and +3
  2. Generate the reverse complement of the sequence
  3. Read the reverse complement in frames +1, +2, and +3 (which correspond to -1, -2, -3 on the original)

In each frame, scan for ATG codons and then find the next in-frame stop codon. The region between them is an ORF candidate.
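The steps above can be sketched as follows. All function names are illustrative assumptions. One design choice to note: within a frame, the sketch starts each ORF at the first ATG after the previous stop, so it reports the longest ORF for each stop codon and ignores internal ATGs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, offset: int):
    """Yield (start, end) slices of ATG-to-stop ORFs in one frame.

    end is exclusive and includes the stop codon.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None


def six_frame_orfs(seq: str) -> list[tuple[str, str]]:
    """Return (frame label, ORF sequence) pairs for all six frames."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(s, offset):
                results.append((f"{strand}{offset + 1}", s[start:end]))
    return results


for frame, orf in six_frame_orfs("CCATGGGCAAATGACC"):
    print(frame, orf)
```

The sketch returns ORF sequences rather than coordinates, because positions found on the reverse complement would need converting back to the original strand before they could be reported.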

Translating an ORF

Once you have identified an ORF, you can translate it to a protein sequence using the standard genetic code. Each triplet codon maps to one of the 20 amino acids or a stop signal. For example:

DNA:     ATG GGC AAA CTG TTC GAA TGA
Protein: M   G   K   L   F   E   *

The asterisk (*) represents the stop codon. The protein sequence starts at M and ends just before the stop.
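A minimal translation sketch is shown below. It builds the standard genetic code from the conventional TCAG ordering of the codon table; the `translate` function and its `to_stop` flag are illustrative names, and codons containing ambiguous bases are rendered as X by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str, to_stop: bool = False) -> str:
    """Translate a DNA ORF codon by codon; '*' marks a stop codon.

    With to_stop=True, translation ends just before the first stop.
    """
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # X for unrecognized codons
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGGCAAACTGTTCGAATGA"))                # MGKLFE*
print(translate("ATGGGCAAACTGTTCGAATGA", to_stop=True))  # MGKLFE
```

For real work, an established library is usually a safer choice than a hand-rolled table; Biopython's `Seq.translate`, for instance, supports alternative genetic codes.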

What to Do with ORF Results

After identifying ORFs, the next steps typically include homology searching and functional annotation.

Common Pitfalls

A few things to watch out for when interpreting ORF results:

ORF analysis is a fast and effective first pass at gene prediction. Combined with homology searching and functional annotation, it gives you a solid starting point for understanding any new DNA sequence.
