Activity-by-Contact Model to Predict Enhancer-Gene Connections

Team: 20

School: Los Alamos High

Area of Science: Computational Biology

Proposal: Genes are expressed at different levels in every cell, but there is currently a limited understanding of gene activation. A leading theory is that genes are activated by enhancers located in open chromatin regions near the gene. This theory is difficult to test, as multiple enhancers may control one gene, a single enhancer may control many genes, and connections can span large genomic distances.
When a mutation occurs in the genome, it may change the folding of the DNA or relocate genes, which can alter which enhancers contact which genes. The resulting over- or under-expression of a gene can lead to uncontrolled cell growth. It is vital to identify enhancer-gene connections to form a better understanding of the activation of oncogenes, identify transcription factor binding sites, and identify kinases as drug targets.
The goal is to create a model of gene activation and predict enhancer-gene connections based on the 3D structure of the genome. This Activity-by-Contact model is defined as:

\[ \mathrm{ABC\ score}_{E\text{-}G} = \frac{A_E \times C_{E\text{-}G}}{\sum_{e} A_e \times C_{e\text{-}G}} \]

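Here A_e is the activity of enhancer e (the geometric mean of its ATAC-seq and H3K27ac signals), C_e-G is the contact frequency between enhancer e and gene G (measured by Hi-C), and the sum in the denominator runs over all candidate elements e in the neighborhood of gene G (Fulco et al. 2019).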
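As an illustration of the formula, here is a minimal Python sketch that computes ABC scores for a handful of made-up enhancers. The function names (`activity`, `abc_scores`), the enhancer and gene labels, and all signal and contact values are hypothetical and chosen only to make the arithmetic easy to follow; this is not the project's actual pipeline.

```python
import math
from collections import defaultdict


def activity(atac_signal, h3k27ac_signal):
    """Enhancer activity as the geometric mean of the two signals."""
    return math.sqrt(atac_signal * h3k27ac_signal)


def abc_scores(elements, contacts):
    """Compute ABC scores for every (enhancer, gene) pair in `contacts`.

    elements: dict mapping enhancer id -> (ATAC-seq signal, H3K27ac signal)
    contacts: dict mapping (enhancer id, gene) -> contact frequency
    Returns a dict mapping (enhancer id, gene) -> ABC score.
    """
    numerators = {}
    totals = defaultdict(float)  # per-gene sum of activity * contact
    for (enhancer, gene), contact in contacts.items():
        value = activity(*elements[enhancer]) * contact
        numerators[(enhancer, gene)] = value
        totals[gene] += value
    # Normalize each pair by the total over all elements for the same gene.
    return {
        (enhancer, gene): (value / totals[gene] if totals[gene] > 0 else 0.0)
        for (enhancer, gene), value in numerators.items()
    }


# Toy inputs (hypothetical values, not real data).
elements = {"e1": (4.0, 9.0), "e2": (1.0, 4.0), "e3": (16.0, 1.0)}
contacts = {
    ("e1", "GENE_A"): 0.5,
    ("e2", "GENE_A"): 1.0,
    ("e3", "GENE_A"): 0.25,
    ("e1", "GENE_B"): 0.1,
    ("e3", "GENE_B"): 0.3,
}

scores = abc_scores(elements, contacts)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Because each score is normalized by the total for its gene, the scores of all elements assigned to one gene sum to 1, so a score can be read as the estimated fraction of the gene's regulatory input contributed by that enhancer.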
The model will first be validated using sequencing data from K562, a human myelogenous leukemia cell line, by comparing its predictions against the enhancer-gene connection data from Gasperini et al. (2019). After validation, the ABC model will be applied to genetic data from 16 B-ALL patients to analyze how mutated enhancers affect the expression of oncogenes. All analysis and data processing will be done in Python, R, and shell scripts.
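One simple way to frame the validation step is to call every pair above a score cutoff a predicted connection and compare those calls with experimentally supported pairs. The sketch below assumes that framing; the function name `precision_recall`, the cutoff of 0.3, and the toy scores and validated pairs are all hypothetical and do not come from the Gasperini et al. data.

```python
def precision_recall(scores, validated_pairs, threshold):
    """Precision and recall of thresholded scores against validated pairs.

    scores: dict mapping (enhancer id, gene) -> ABC score
    validated_pairs: set of (enhancer id, gene) pairs with experimental support
    threshold: pairs scoring at or above this value count as predictions
    """
    predicted = {pair for pair, score in scores.items() if score >= threshold}
    true_positives = len(predicted & validated_pairs)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(validated_pairs) if validated_pairs else 0.0
    return precision, recall


# Toy inputs (hypothetical values, not real data).
toy_scores = {
    ("e1", "GENE_A"): 0.50,
    ("e2", "GENE_A"): 0.33,
    ("e3", "GENE_A"): 0.17,
    ("e1", "GENE_B"): 0.05,
}
toy_validated = {("e1", "GENE_A"), ("e3", "GENE_A")}

precision, recall = precision_recall(toy_scores, toy_validated, threshold=0.3)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Sweeping the threshold over a range of values would trace out a precision-recall curve, which is a common way to summarize how well a score separates validated from non-validated pairs.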

Graham McVicker and Zhichao Xu, at the Salk Institute for Biological Studies

Team Members:

  Lillian Petersen

Sponsoring Teacher: NA
