Activity-by-Contact Model to Predict Enhancer-Gene Connections

Team: 20

School: Los Alamos High

Area of Science: Computational Biology


Proposal: Genes are expressed at different levels in every cell, but there is currently a limited understanding of gene activation. A leading theory is that genes are activated by enhancers located in open chromatin regions near the gene. This theory is difficult to test, as multiple enhancers may control one gene, a single enhancer may control many genes, and connections can span large genomic distances.
When a mutation occurs in the genome, it may change the folding of the DNA or relocate genes. Over- or under-expression of a gene can lead to uncontrolled cell growth. It is vital to identify enhancer-gene connections in order to better understand the activation of oncogenes, identify transcription factor binding sites, and identify kinases as drug targets.
The goal is to create a model of gene activation and predict enhancer-gene connections based on the 3D structure of the genome. This Activity-by-Contact model is defined as:

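$$\text{ABC score}_{E\text{-}G} = \frac{A_E \times C_{E\text{-}G}}{\sum_{e} A_e \times C_{e\text{-}G}}$$

where A_E is the activity of enhancer E (the geometric mean of the ATAC-seq and H3K27ac signals), C_E-G is the contact frequency between enhancer E and gene G (measured by Hi-C), and the sum in the denominator runs over all candidate elements e for gene G (Fulco et al., 2019).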
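
As an illustration of the formula above, the following is a minimal Python sketch, not the project's actual pipeline. It assumes activity is the geometric mean of the ATAC-seq and H3K27ac signals (as in Fulco et al.), and that each candidate element for one gene is given as a tuple of already-normalized values. The function names, the tuple layout, and the toy numbers are all hypothetical.

```python
import math


def activity(atac_signal, h3k27ac_signal):
    """Enhancer activity as the geometric mean of two signal values (assumed)."""
    return math.sqrt(atac_signal * h3k27ac_signal)


def abc_scores(elements):
    """Compute ABC scores for every candidate element of a single gene.

    elements: list of (name, atac_signal, h3k27ac_signal, contact) tuples,
              a hypothetical layout chosen for this sketch.
    Returns a dict mapping element name to its ABC score.
    """
    # Numerator for each element: activity times contact with the gene.
    products = {name: activity(a, h) * c for name, a, h, c in elements}
    # Denominator: the same product summed over all candidate elements.
    total = sum(products.values())
    if total == 0:
        return {name: 0.0 for name in products}
    return {name: p / total for name, p in products.items()}


# Toy values, made up for illustration only.
candidates = [
    ("e1", 4.0, 9.0, 0.5),
    ("e2", 1.0, 1.0, 1.0),
    ("e3", 16.0, 1.0, 0.25),
]
scores = abc_scores(candidates)
```

Because every score is divided by the same sum, the scores for one gene add up to 1, so each score can be read as that element's share of the gene's total activity-weighted contact.
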
The model will first be validated on sequencing data from K562, a human myelogenous leukemia cell line, against the enhancer-gene connection data of Gasperini et al. (2019). After validation, the ABC model will be applied to genetic data from 16 patients with B-ALL (B-cell acute lymphoblastic leukemia) to analyze how mutated enhancers affect the expression of oncogenes. All analysis and data processing will be done in Python, R, and shell scripts.
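
One way the validation step could be framed is as a comparison between the enhancer-gene pairs predicted above a score cutoff and a set of experimentally supported pairs. The sketch below is only an assumption about how such a comparison might look. The function name, the pair representation, the cutoff value, and the toy data are hypothetical and are not taken from Gasperini et al.

```python
def precision_recall(predicted_scores, validated_pairs, threshold):
    """Compare thresholded predictions with a set of validated pairs.

    predicted_scores: dict mapping (enhancer, gene) to an ABC score.
    validated_pairs:  set of (enhancer, gene) pairs treated as true connections.
    threshold:        score cutoff for calling a connection (hypothetical value).
    """
    predicted = {pair for pair, s in predicted_scores.items() if s >= threshold}
    true_positives = len(predicted & validated_pairs)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(validated_pairs) if validated_pairs else 0.0
    return precision, recall


# Toy data, made up for illustration only.
toy_scores = {
    ("e1", "GENE_A"): 0.6,
    ("e2", "GENE_A"): 0.2,
    ("e3", "GENE_A"): 0.01,
    ("e4", "GENE_B"): 0.3,
}
toy_validated = {("e1", "GENE_A"), ("e3", "GENE_A")}
precision, recall = precision_recall(toy_scores, toy_validated, threshold=0.1)
```

Sweeping the cutoff over a range of values and recording precision and recall at each point would give a precision-recall curve, which is a common way to summarize this kind of comparison.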

Mentors:
Graham McVicker and Zhichao Xu, at the Salk Institute for Biological Studies


Team Members:

  Lillian Petersen

Sponsoring Teacher: NA
