CpG islands are regions of DNA with a higher-than-expected frequency of CpG dinucleotides. They are found at or near the promoters of roughly 60% of human genes and play a central role in controlling whether those genes are switched on or off. Understanding CpG islands is fundamental to epigenetics and gene regulation research.
What is a CpG Dinucleotide?
A CpG dinucleotide is a cytosine (C) followed by a guanine (G) on the same DNA strand, read in the 5' to 3' direction. The two nucleotides are linked by a phosphodiester bond, hence the lowercase "p". This is different from a C-G base pair, which refers to complementary bases on opposite strands.
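To make the distinction concrete, the minimal Python sketch below counts CpG sites by looking for a C immediately followed by a G on one strand. The function names are illustrative and not taken from any particular library. It also shows a handy property: "CG" is its own reverse complement, so every CpG on one strand sits opposite a CpG on the other strand, while GpC sites (a G followed by a C) are not counted.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides: a C immediately followed by a G on the same strand (5' to 3')."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, also read 5' to 3'."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


seq = "ACGTTGCACGGC"  # contains two CpG sites and two GpC sites
print(count_cpg(seq))                      # 2
print(count_cpg(reverse_complement(seq)))  # 2, the same CpG sites seen from the other strand
print(reverse_complement("CG"))            # CG
```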
In the human genome, CpG dinucleotides are surprisingly rare. They occur at only about 20% of the frequency you would expect based on the overall GC content of the genome. This depletion happened over evolutionary time because methylated cytosines in CpG contexts are prone to spontaneous deamination, converting them to thymine. Over generations, this process eroded CpG sites from most of the genome.
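The "expected" frequency comes from simple arithmetic. Assuming bases occur independently and the GC fraction is split evenly between C and G, the chance that a given dinucleotide is CpG is (GC/2) × (GC/2). The sketch below computes that value and the observed-to-expected ratio. The 40% GC content and the observed frequency in the example are hypothetical inputs chosen to illustrate a roughly fivefold depletion, not measured genome values.

```python
def expected_cpg_frequency(gc_fraction: float) -> float:
    """Expected CpG frequency per dinucleotide position, assuming independent bases
    and equal amounts of C and G (each gc_fraction / 2)."""
    c = g = gc_fraction / 2
    return c * g


def depletion_ratio(observed_frequency: float, gc_fraction: float) -> float:
    """Observed CpG frequency as a fraction of the expected frequency."""
    return observed_frequency / expected_cpg_frequency(gc_fraction)


# Hypothetical example: a sequence with 40% GC content
print(round(expected_cpg_frequency(0.40), 4))  # 0.04, i.e. 4% of dinucleotides
print(round(depletion_ratio(0.008, 0.40), 2))  # 0.2, CpGs at about 20% of the expected frequency
```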
What Makes a CpG Island?
A CpG island is a region where CpG dinucleotides are present at close to the expected frequency, rather than being depleted. The standard definition, proposed by Gardiner-Garden and Frommer in 1987, requires:
- Length of at least 200 bp
- GC content of at least 50%
- Observed/expected CpG ratio of at least 0.6
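The observed/expected ratio is calculated as:
$$\text{Obs/Exp CpG} = \frac{N_{\text{CpG}} \times L}{N_{\text{C}} \times N_{\text{G}}}$$
where $N_{\text{CpG}}$ is the number of CpG dinucleotides, $N_{\text{C}}$ and $N_{\text{G}}$ are the numbers of cytosines and guanines, and $L$ is the length of the sequence in bases.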
A ratio close to 1.0 means CpGs are present at the frequency you would predict from the GC content alone. A ratio below 0.6 indicates CpG depletion, typical of most genomic regions.
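The three criteria and the ratio fit in a few lines of Python. This is a minimal sketch using the Gardiner-Garden and Frommer form of the ratio, (CpG count × length) / (C count × G count); the function names and the two toy sequences are illustrative only. The second sequence is GC-rich but contains no CpG sites, which shows why GC content alone does not make an island.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """(number of CpG x sequence length) / (number of C x number of G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # without any C or G no CpG is possible; treat the ratio as 0
    # "CG" cannot overlap itself, so str.count gives the exact number of CpG sites
    return seq.count("CG") * len(seq) / (c * g)


def meets_criteria(seq: str, min_len: int = 200, min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Length, GC content, and Obs/Exp CpG thresholds from Gardiner-Garden and Frommer."""
    return len(seq) >= min_len and gc_content(seq) >= min_gc and cpg_obs_exp(seq) >= min_ratio


cpg_rich = "ACGT" * 50  # toy sequence: 200 bp, 50% GC, one CpG every 4 bases
cpg_poor = "CAGG" * 50  # toy sequence: 200 bp, 75% GC, but no CpG at all
print(gc_content(cpg_rich), cpg_obs_exp(cpg_rich), meets_criteria(cpg_rich))  # 0.5 4.0 True
print(gc_content(cpg_poor), cpg_obs_exp(cpg_poor), meets_criteria(cpg_poor))  # 0.75 0.0 False
```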
Where Are CpG Islands Found?
Roughly 60 to 70% of human gene promoters are associated with CpG islands; the exact figure depends on the island definition and gene annotation used. CpG islands are also found in:
- The first exon of many genes
- Intragenic regions of some highly expressed genes
- Repetitive elements, though these are usually heavily methylated
CpG islands at promoters are typically unmethylated in normal, actively transcribed genes. This unmethylated state allows transcription factors to bind and gene expression to proceed.
DNA Methylation and Gene Silencing
DNA methylation in mammals occurs almost exclusively at CpG dinucleotides. A methyl group is added to the cytosine by DNA methyltransferase enzymes (DNMT1, DNMT3A, DNMT3B), producing 5-methylcytosine (5mC).
When a CpG island promoter becomes methylated, gene expression is typically silenced. Methylation represses transcription through two mechanisms:
- Direct blocking: Methylated CpGs prevent transcription factor binding at the promoter.
- Chromatin compaction: Methyl-CpG-binding domain (MBD) proteins recruit histone deacetylases, which compact chromatin into a transcriptionally inactive state called heterochromatin.
CpG Islands in Development and Disease
CpG island methylation is a normal part of development. During embryogenesis, specific genes are silenced in a tissue-specific manner through promoter methylation. This is one of the mechanisms that allows the same genome to give rise to hundreds of different cell types.
In cancer, DNA methylation goes wrong in two ways:
- Hypermethylation of tumor suppressor genes: Promoter CpG islands of genes like BRCA1, MLH1, and CDKN2A become abnormally methylated, silencing these protective genes and contributing to cancer progression.
- Global hypomethylation: The overall level of DNA methylation decreases in cancer cells, largely in repetitive elements and other regions outside CpG islands, which can activate oncogenes and destabilize the genome.
Aberrant CpG island methylation is now used as a biomarker for cancer diagnosis and prognosis. Methylation-specific PCR and bisulfite sequencing are standard methods for detecting these changes.
X-Chromosome Inactivation and Imprinting
Two classic examples of CpG island methylation in normal biology are X-chromosome inactivation and genomic imprinting.
In female placental mammals, one X chromosome is randomly inactivated in each cell during early development. The inactive X chromosome is heavily methylated at CpG islands, silencing most of its genes. This ensures equal gene dosage between males (XY) and females (XX).
Genomic imprinting is a process where certain genes are expressed from only one parental allele. The silenced allele is marked by CpG island methylation established in the germline. About 100 imprinted genes are known in humans, including IGF2 and H19.
How to Detect CpG Islands Computationally
CpG island detection in a DNA sequence involves scanning with a sliding window and applying the three criteria: length, GC content, and observed/expected CpG ratio. A typical window size is 200 bp, moved in steps of 1 bp across the sequence.
Overlapping or adjacent windows that meet the criteria are merged into a single island. The result is a list of CpG island coordinates with their GC content and obs/exp ratios.
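A minimal sliding-window scanner might look like the sketch below. The 200 bp window, 1 bp step, and thresholds follow the text; the merging rule (join any qualifying window that overlaps or touches the previous island) and all function names are assumptions for illustration. Recomputing every window from scratch costs O(n × window) time, which is fine for short sequences; a practical tool would update the counts incrementally as the window slides.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """(CpG count x length) / (C count x G count); 0 if there are no C or no G."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def find_cpg_islands(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Return (start, end, gc, obs_exp) for each merged island; end is exclusive."""
    seq = seq.upper()
    merged = []  # [start, end) intervals built from qualifying windows
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        if gc_content(win) >= min_gc and cpg_obs_exp(win) >= min_ratio:
            end = start + window
            if merged and start <= merged[-1][1]:
                # Overlaps or touches the previous island: extend it (ends only increase)
                merged[-1][1] = end
            else:
                merged.append([start, end])
    # Report statistics for each merged region as a whole
    return [(s, e, gc_content(seq[s:e]), cpg_obs_exp(seq[s:e])) for s, e in merged]


# Hypothetical test sequence: a CpG-rich core (positions 600-999) flanked by AT-only DNA
demo = "AT" * 300 + "ACGT" * 100 + "AT" * 300
for start, end, gc, ratio in find_cpg_islands(demo):
    print(start, end, round(gc, 3), round(ratio, 2))
```

Because the window slides one base at a time, the reported boundaries can extend slightly past the CpG-rich core, and the merged region's overall GC content or ratio can fall just below the thresholds even though every individual window passed. Published tools differ in how they trim or re-check merged regions.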
To measure methylation experimentally, bisulfite sequencing relies on bisulfite treatment, which converts unmethylated cytosines to uracil (read as thymine after PCR) while leaving methylated cytosines unchanged. Comparing the sequence of bisulfite-treated DNA with the untreated reference reveals the methylation status of each CpG site.
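The logic of reading methylation from bisulfite data can be sketched as a toy model in Python. It assumes complete conversion, a single strand, no sequencing errors, and a hand-supplied set of methylated positions; the function names are hypothetical. Non-CpG cytosines are treated as unmethylated, so they always read as T.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate bisulfite treatment plus PCR on one strand:
    unmethylated C reads as T, methylated C (positions in `methylated`) stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def call_cpg_methylation(reference: str, converted: str) -> dict[int, bool]:
    """For each CpG in the reference, report True if its C survived conversion (methylated)."""
    reference = reference.upper()
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            calls[i] = converted[i] == "C"
    return calls


ref = "TTCGACCGTACG"  # CpG sites at positions 2, 6, and 10
read = bisulfite_convert(ref, methylated={2, 10})
print(read)                             # TTCGATTGTACG
print(call_cpg_methylation(ref, read))  # {2: True, 6: False, 10: True}
```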
Practical Uses of CpG Island Analysis
- Predicting gene promoter locations in unannotated sequences
- Identifying candidate genes regulated by methylation in a disease context
- Designing bisulfite sequencing primers for methylation analysis
- Comparing methylation patterns between normal and tumor tissue
- Studying epigenetic changes during differentiation or in response to environmental factors
CpG islands sit at the intersection of genetics and epigenetics. They are a window into how gene expression is controlled without changing the DNA sequence itself. As epigenomics continues to grow as a field, CpG island analysis will remain a core part of the toolkit.
Detect CpG Islands in Your Sequence
Use our free DNA Sequence Analyzer to scan for CpG islands using the standard Gardiner-Garden criteria, with a color-coded visual map of results.