Project Website on Finding CpG Islands in Genomes

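  • A CpG site is a cytosine followed directly by a guanine along a single strand. The "CpG" notation is used to distinguish this linear sequence from the CG base pair formed between cytosine and guanine on opposite strands. Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosine.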

  • In mammalian genomes, CpG islands are typically 300-3,000 base pairs long. They occur in or near approximately 40% of the promoters of mammalian genes. In the human genome, the observed CpG frequency is about five times lower than expected from the GC content, most likely because methylated CpGs mutate at a high rate: methylated cytosines tend to deaminate to thymine, and between 70 and 80% of all CpGs in the human genome are methylated.

  • The "p" in CpG refers to the phosphodiester bond between the cytosine and the guanine. It indicates that the C and the G are adjacent in sequence, whether the molecule is single- or double-stranded. More explicitly, the C and the G lie on the same strand of DNA or RNA and are covalently linked by a phosphodiester bond, which is a strong chemical bond. Given the GC frequency of the genome, the number of such CpG dinucleotides is much lower than expected.

  • More than half of human gene promoters colocalize with CpG islands, and the methylation status of these islands has been shown to correlate with the expression level of the associated genes. As a consequence, CpG islands are assumed to be hotspots of epigenetic regulation.
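
The comparison between observed and expected CpG frequency mentioned above can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from this project: the function names gc_content and cpg_obs_exp are hypothetical, and the expected count is assumed to follow the common convention (number of C x number of G) / sequence length, which the text above does not spell out.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are C or G."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("C") + seq.count("G")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Return the observed/expected CpG ratio of seq.

    Assumption: expected CpG count = (#C * #G) / N, so the ratio is
    (#CpG * N) / (#C * #G). Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count finds every CpG site.
    cpg = seq.count("CG")
    return cpg * n / (c * g)


if __name__ == "__main__":
    example = "ATCGCGTTACGGCATCG"  # made-up sequence, for illustration only
    print(f"GC content:    {gc_content(example):.2f}")
    print(f"CpG obs/exp:   {cpg_obs_exp(example):.2f}")
```

Under this convention, a ratio near 1 means CpG dinucleotides occur about as often as the base composition predicts, while a genome-wide value near 0.2 would correspond to the roughly five-fold depletion described above. Regions where the ratio stays high over hundreds of base pairs are candidates for CpG islands.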