CpG islands are regions of DNA with a much higher frequency of CpG dinucleotides than the rest of the genome. They are found at or near the promoters of roughly 60 to 70% of human genes and play a central role in controlling whether those genes are switched on or off. Understanding CpG islands is fundamental to epigenetics and gene regulation research.

What is a CpG Dinucleotide?

A CpG dinucleotide is simply a cytosine (C) followed by a guanine (G) on the same strand of DNA, connected by a phosphodiester bond (hence the lowercase "p"). This is different from a C-G base pair, which refers to complementary bases on opposite strands.
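To make the strand distinction concrete, here is a minimal Python sketch (the function names are illustrative, not from any library) that counts CpG dinucleotides by reading one strand 5' to 3'. Because CG reads the same on both strands, every CpG site on one strand has a partner CpG on the other.

```python
def count_cpg(strand: str) -> int:
    """Count CpG dinucleotides: a C immediately followed by a G on the same strand."""
    # "CG" cannot overlap itself, so str.count returns every CpG step exactly once.
    return strand.upper().count("CG")


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, also written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}
    return "".join(pairs[base] for base in reversed(strand.upper()))


seq = "ACGTTCGA"
print(count_cpg(seq))                      # 2
print(count_cpg(reverse_complement(seq)))  # 2: CpG is palindromic, so each site exists on both strands
# "GC" is made entirely of C-G base pairs (each base pairs with its complement),
# but on this strand the C comes after the G, so it contains no CpG dinucleotide.
print(count_cpg("GC"))                     # 0
```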

In the human genome, CpG dinucleotides are surprisingly rare. They occur at only about 20% of the frequency you would expect based on the overall GC content of the genome. This depletion happened over evolutionary time because methylated cytosines in CpG contexts are prone to spontaneous deamination, converting them to thymine. Over generations, this process eroded CpG sites from most of the genome.
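The size of this depletion can be illustrated with a little arithmetic. The sketch below assumes a genome-wide GC content of about 41% (an approximate figure for the human genome, used here only for illustration), equal amounts of C and G, and bases that occur independently of their neighbours. Under those assumptions about 4.2% of dinucleotide steps would be CpG, and 20% of that is roughly 0.84%.

```python
def expected_cpg_frequency(gc_fraction: float) -> float:
    """Fraction of dinucleotide steps expected to be CpG if C and G are equally
    common and each base is independent of its neighbour (a simplifying assumption)."""
    c_fraction = g_fraction = gc_fraction / 2
    return c_fraction * g_fraction


GC_CONTENT = 0.41  # assumed approximate human GC content, for illustration only
DEPLETION = 0.20   # "about 20% of the expected frequency", from the text

expected = expected_cpg_frequency(GC_CONTENT)
observed = DEPLETION * expected
print(f"expected CpG frequency: {expected:.2%}")      # about 4.20%
print(f"observed (20% of expected): {observed:.2%}")  # about 0.84%
```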

What Makes a CpG Island?

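A CpG island is a region where CpG dinucleotides are present at close to the expected frequency, rather than being depleted. The standard definition, proposed by Gardiner-Garden and Frommer in 1987, requires:

  1. Length: at least 200 bp.
  2. GC content: greater than 50%.
  3. Observed/expected CpG ratio: greater than 0.6.
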
The observed/expected ratio is calculated as:

Obs/Exp = (Number of CpG × Total length) / (Number of C × Number of G)

A ratio close to 1.0 means CpGs are present at the frequency you would predict from the GC content alone. A ratio below 0.6 indicates CpG depletion, typical of most genomic regions.
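The ratio and the Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, Obs/Exp above 0.6) translate directly into code. The sketch below is one straightforward reading of those criteria; the function names and the choice of strict versus inclusive comparisons are assumptions, and a sequence with no C or no G is given a ratio of 0 to avoid division by zero.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are C or G."""
    seq = seq.upper()
    return (seq.count("C") + seq.count("G")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Obs/Exp = (number of CpG x total length) / (number of C x number of G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # no C or no G means no CpG is possible; avoid dividing by zero
    return seq.count("CG") * len(seq) / (c * g)


def meets_island_criteria(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Check length, GC content and Obs/Exp against the thresholds."""
    return (len(seq) >= min_length
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_obs_exp)


print(cpg_obs_exp("CG" * 100))             # 2.0: CpGs far above expectation
print(meets_island_criteria("CG" * 100))   # True
print(meets_island_criteria("ATGC" * 50))  # False: only 50% GC and no CpG at all
```

Note that the ratio is not capped at 1.0: a sequence made entirely of CG repeats scores 2.0.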

Where Are CpG Islands Found?

Roughly 60 to 70% of human gene promoters are associated with CpG islands. A smaller share of islands lie away from promoters, within gene bodies or in intergenic regions.

CpG islands at promoters are typically unmethylated in normal, actively transcribed genes. This unmethylated state allows transcription factors to bind and gene expression to proceed.

The presence of a CpG island near a gene's start site suggests that the gene can be regulated by DNA methylation. Housekeeping genes, which are expressed in nearly all cell types, almost always have CpG island promoters.
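Deciding whether an island is "near" a start site requires a distance cutoff. The sketch below is a hypothetical helper: it treats islands as half-open (start, end) coordinates and uses a flank of 1,000 bp on either side of the transcription start site (TSS), which is an arbitrary illustrative choice rather than a standard value.

```python
from typing import Iterable, Tuple

Island = Tuple[int, int]  # half-open interval: covers positions start .. end - 1


def has_island_near_tss(islands: Iterable[Island], tss: int, flank: int = 1000) -> bool:
    """True if any island overlaps the window [tss - flank, tss + flank].

    The default flank is an illustrative assumption, not a published cutoff.
    """
    lo, hi = tss - flank, tss + flank
    # A half-open island overlaps the closed window if it starts at or before hi
    # and its last covered position (end - 1) is at or after lo.
    return any(start <= hi and end > lo for start, end in islands)


islands = [(5000, 5600)]
print(has_island_near_tss(islands, tss=6200))  # True: the window 5200..7200 overlaps the island
print(has_island_near_tss(islands, tss=8000))  # False: the window 7000..9000 starts after the island ends
```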

DNA Methylation and Gene Silencing

DNA methylation in mammals occurs almost exclusively at CpG dinucleotides. A methyl group is added to the cytosine by DNA methyltransferase enzymes (DNMT1, DNMT3A, DNMT3B), producing 5-methylcytosine (5mC).
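Because methylation targets CpG sites, listing the CpG positions in a sequence gives the candidate methylation sites. This minimal sketch (the function name is illustrative) reports the 0-based position of the C in each CpG on one strand; the partner CpG on the opposite strand has its own C directly across from this G, so both strands of a CpG site can carry a methyl group.

```python
from typing import List


def cpg_sites(seq: str) -> List[int]:
    """0-based positions of the C in every CpG dinucleotide on the given strand.

    These are the cytosines that DNA methyltransferases can methylate to 5mC.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


seq = "ACGTTCGACCA"
print(cpg_sites(seq))  # [1, 5]: the Cs at positions 8 and 9 are not followed by G
```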

When a CpG island promoter becomes methylated, gene expression is typically silenced. Methylation represses transcription through two main mechanisms:

  1. Direct blocking: Methylated CpGs can prevent methylation-sensitive transcription factors from binding the promoter.
  2. Chromatin compaction: Methyl-CpG binding domain (MBD) proteins recruit histone deacetylases, which help compact chromatin into a transcriptionally inactive state called heterochromatin.

CpG Islands in Development and Disease

CpG island methylation is a normal part of development. During embryogenesis, selected genes are silenced in a tissue-specific manner through promoter methylation. This is one of the mechanisms that allow the same genome to give rise to hundreds of different cell types.

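In cancer, DNA methylation goes wrong in two ways:

  1. Hypermethylation: CpG islands at the promoters of tumor suppressor genes become methylated, silencing genes that normally restrain cell growth.
  2. Global hypomethylation: the genome as a whole loses methylation, which can contribute to genomic instability.
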
Aberrant CpG island methylation is now used as a biomarker for cancer diagnosis and prognosis. Methylation-specific PCR and bisulfite sequencing are standard methods for detecting these changes.
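Methylation-specific PCR relies on bisulfite treatment making methylated and unmethylated copies of the same region differ in sequence, so that separate primer pairs can be designed against each. The hypothetical helper below produces those two in-silico templates for one strand, assuming every non-CpG cytosine is unmethylated and therefore converted, and writing converted bases as T (the uracil is read as thymine after PCR).

```python
def bisulfite_template(seq: str, cpg_methylated: bool) -> str:
    """Expected sequence of one strand after bisulfite treatment and PCR.

    Assumptions: all non-CpG cytosines are unmethylated (converted to T), and
    every CpG cytosine is either methylated (kept as C) or unmethylated (T).
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base != "C":
            converted.append(base)
            continue
        in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
        converted.append("C" if in_cpg and cpg_methylated else "T")
    return "".join(converted)


region = "ACGTCCGA"
print(bisulfite_template(region, cpg_methylated=True))   # ACGTTCGA: CpG Cs kept, other C converted
print(bisulfite_template(region, cpg_methylated=False))  # ATGTTTGA: every C converted
```

Primers matching the first template would be expected to amplify only methylated DNA, and primers matching the second only unmethylated DNA.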

X-Chromosome Inactivation and Imprinting

Two classic examples of CpG island methylation in normal biology are X-chromosome inactivation and genomic imprinting.

In female placental mammals, one X chromosome is randomly inactivated in each cell during early development. CpG islands on the inactive X are heavily methylated, silencing most of its genes. This equalizes the dosage of X-linked genes between males (XY) and females (XX).

Genomic imprinting is a process where certain genes are expressed from only one parental allele. The silenced allele is marked by CpG island methylation established in the germline. About 100 imprinted genes are known in humans, including IGF2 and H19.

How to Detect CpG Islands Computationally

CpG island detection in a DNA sequence involves scanning with a sliding window and applying the three criteria: length, GC content, and observed/expected CpG ratio. A typical window size is 200 bp, moved in steps of 1 bp across the sequence.
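A direct, unoptimized version of this scan can be sketched in Python. The function name and parameters are illustrative; the defaults assume the Gardiner-Garden and Frommer thresholds (GC content above 50%, Obs/Exp above 0.6) with a 200 bp window and 1 bp step. The length criterion is met automatically because every window is 200 bp long, and each window is re-counted from scratch, which is simple but slow on long sequences.

```python
from typing import List, Tuple


def scan_windows(seq: str, window: int = 200, step: int = 1,
                 min_gc: float = 0.5, min_obs_exp: float = 0.6) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of every window that passes
    the GC-content and observed/expected CpG thresholds."""
    seq = seq.upper()
    passing = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        gc = (c + g) / window
        # Obs/Exp = CpG * length / (C * G); a window lacking C or G cannot qualify.
        obs_exp = w.count("CG") * window / (c * g) if c and g else 0.0
        if gc > min_gc and obs_exp > min_obs_exp:
            passing.append((start, start + window))
    return passing


# Synthetic test sequence: 300 bp of AT, 300 bp of CG repeats, 300 bp of AT.
seq = "AT" * 150 + "CG" * 150 + "AT" * 150
hits = scan_windows(seq)
print(len(hits), hits[0], hits[-1])  # 299 (201, 401) (499, 699)
```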

Adjacent windows that meet the criteria are merged into a single island. The result is a list of CpG island coordinates with their GC content and obs/exp ratios.
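Merging can be done with a standard interval-merge pass. In this hypothetical sketch, windows are half-open (start, end) pairs and any windows that overlap or touch are joined; each merged island is then reported with its GC content and Obs/Exp ratio. The merged region is not re-checked against the thresholds here, which a fuller pipeline might choose to do.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open: covers start .. end - 1


def merge_windows(windows: List[Interval]) -> List[Interval]:
    """Join overlapping or touching windows into single islands."""
    merged: List[Interval] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the current island: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def summarize(seq: str, islands: List[Interval]) -> List[Dict[str, float]]:
    """Report coordinates, GC content and Obs/Exp for each (non-empty) island."""
    report = []
    for start, end in islands:
        region = seq[start:end].upper()
        c, g = region.count("C"), region.count("G")
        obs_exp = region.count("CG") * len(region) / (c * g) if c and g else 0.0
        report.append({"start": start, "end": end,
                       "gc": (c + g) / len(region), "obs_exp": obs_exp})
    return report


print(merge_windows([(500, 700), (0, 200), (1, 201)]))  # [(0, 201), (500, 700)]
print(summarize("CG" * 100, [(0, 200)]))  # [{'start': 0, 'end': 200, 'gc': 1.0, 'obs_exp': 2.0}]
```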

For experimental validation, bisulfite sequencing converts unmethylated cytosines to uracil (read as thymine after PCR) while leaving methylated cytosines unchanged. Comparing bisulfite-treated and untreated sequences reveals the methylation status of each CpG site.
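Reading the result site by site can be sketched as follows. This hypothetical helper assumes the bisulfite read has already been aligned to the untreated reference on the same strand without gaps; at each reference CpG, a C in the read indicates a methylated (protected) cytosine and a T an unmethylated (converted) one.

```python
from typing import Dict


def call_cpg_methylation(reference: str, read: str) -> Dict[int, str]:
    """Call methylation at each CpG of `reference` from an aligned bisulfite read.

    Assumes an ungapped alignment of equal length on the same strand.
    """
    reference, read = reference.upper(), read.upper()
    if len(reference) != len(read):
        raise ValueError("this sketch expects an ungapped, equal-length alignment")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            # C survived conversion -> methylated; C became T -> unmethylated.
            calls[i] = {"C": "methylated", "T": "unmethylated"}.get(read[i], "unknown")
    return calls


print(call_cpg_methylation("ACGTCCGA", "ATGTTCGA"))
# {1: 'unmethylated', 5: 'methylated'}; the non-CpG C at position 4 was converted as expected
```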

Practical Uses of CpG Island Analysis

CpG islands sit at the intersection of genetics and epigenetics. They are a window into how gene expression is controlled without changing the DNA sequence itself. As epigenomics continues to grow as a field, CpG island analysis will remain a core part of the toolkit.

Detect CpG Islands in Your Sequence

Use our free DNA Sequence Analyzer to scan for CpG islands using the standard Gardiner-Garden criteria, with a color-coded visual map of results.