CpG Islands in Genome

Background Info
  • The "CpG" notation is used to distinguish the linear sequence from the CG base-pairing of cytosine and guanine.
    Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosine.
  • In mammalian genomes, CpG islands are typically 300-3,000 base pairs in length. They are found in or near
    approximately 40% of the promoters of mammalian genes.
  • Given the overall GC frequency, however, the number of CpG dinucleotides in the genome is much lower than would
    be expected by chance. The "p" in CpG refers to the phosphodiester bond between the cytosine and the guanine,
    which indicates that the C and the G are adjacent in the sequence, regardless of whether the molecule is single-
    or double-stranded. More explicitly, both the C and the G are on the same strand of DNA/RNA, covalently bonded
    (chemically connected) by a phosphodiester bond (a strong bond).
  • More than half of human gene promoters colocalize with CpG islands, and the methylation status of these islands
    has been shown to correlate with the expression level of the associated genes. As a consequence, CpG islands are
    assumed to be hotspots of epigenetic regulation.
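
The claim that CpG dinucleotides are rarer than the GC frequency predicts can be checked on any sequence. The sketch below is illustrative only: the function name cpg_stats is made up for this page, and it assumes the commonly used observed/expected definition (CpG count divided by C count times G count over sequence length), which may not be exactly what any particular tool computes.

```python
def cpg_stats(seq):
    """Return (gc_content, observed_expected_ratio) for a DNA string.

    Assumption: expected CpG count = (number of C * number of G) / length,
    i.e. what one would see if C and G were placed independently.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # A CpG is a C immediately followed by a G on the same strand.
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0
    oe_ratio = cpg / expected if expected else 0.0
    return gc_content, oe_ratio


gc, oe = cpg_stats("CCGG")
print(gc, oe)
```

A ratio well below 1 means fewer CpGs than the base composition alone would predict; regions where the ratio stays high over a few hundred bases are the candidates for CpG islands.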

Testing CgiHunter
Tested Result Output Files
EXAMPLE: CgiHunter Output *.bed-like File Format for Both Cgi Map and Shadow Map

#chrom chromStart chromEnd ID score length C G CpG GC_content O_E_ratio rel_repeat_content
chr1 449 2753 1 2304.0 2304 640 732 122 0.595486111111 0.6 0.501302083333
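
A line in this format can be read by splitting on whitespace. The sketch below is a reading aid, not part of CgiHunter: the helper names parse_row and check_row are hypothetical, the column meanings are inferred from the header names, and the formulas (length = chromEnd - chromStart, GC_content = (C + G) / length, O_E_ratio = CpG * length / (C * G)) are the usual definitions, assumed here because the example row above is consistent with them.

```python
FIELDS = ["chrom", "chromStart", "chromEnd", "ID", "score", "length",
          "C", "G", "CpG", "GC_content", "O_E_ratio", "rel_repeat_content"]
INT_FIELDS = {"chromStart", "chromEnd", "ID", "length", "C", "G", "CpG"}
FLOAT_FIELDS = {"score", "GC_content", "O_E_ratio", "rel_repeat_content"}


def parse_row(line):
    """Split one whitespace-delimited data line into a typed dict."""
    row = dict(zip(FIELDS, line.split()))
    for key in INT_FIELDS:
        row[key] = int(row[key])
    for key in FLOAT_FIELDS:
        row[key] = float(row[key])
    return row


def check_row(row, tol=1e-6):
    """Recompute the derived columns from the counts (assumed formulas).

    Returns three booleans: length, GC content, and O/E ratio agree.
    """
    length_ok = row["chromEnd"] - row["chromStart"] == row["length"]
    gc = (row["C"] + row["G"]) / row["length"]
    oe = row["CpG"] * row["length"] / (row["C"] * row["G"])
    return (length_ok,
            abs(gc - row["GC_content"]) < tol,
            abs(oe - row["O_E_ratio"]) < tol)


sample = ("chr1 449 2753 1 2304.0 2304 640 732 122 "
          "0.595486111111 0.6 0.501302083333")
print(check_row(parse_row(sample)))
```

For the example row, all three checks come back True: 2753 - 449 = 2304, (640 + 732) / 2304 is about 0.5955, and 122 * 2304 / (640 * 732) = 0.6.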

Useful Links
  • Available Software
  • Genome Browser

First setup on June 10, 2012, last updated by Dr. Jeff Chen on October 1, 2012.