School: Los Alamos High
Area of Science: Computational Biology
Understanding gene function is important for many reasons; in particular, one could examine the genes of a pathogen to find a weakness, or find a human gene related to cancer. Annotating every gene whose function has not yet been discovered remains a grand challenge in biology. Genes code for proteins, and protein-protein interactions can be measured. However, working backward from patterns of protein-protein interactions to function is an unsolved problem.
Humans have shown that inferring function from interaction data by hand is possible, though difficult, which suggests the process could be automated. I hypothesize that neural networks can learn patterns in these interactions and classify them by gene function.
I have created an autoencoder that compresses and then decompresses protein-protein interaction data. My model reconstructs both real and fake proteins with low loss; the fake proteins are made by sampling data from actual proteins, which gives new proteins with a similar distribution. Using Principal Component Analysis (PCA) visualizations, I can already see clusters forming that separate according to the selected functions I am testing.
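The pipeline described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the model used in this project: the data are randomly generated stand-ins for interaction profiles, the two "function" groups are made up, the network is a single-hidden-layer autoencoder, and the fake proteins are built by drawing each feature independently from the real proteins, which is one possible reading of the sampling step. All names (`make_fake`, `train_autoencoder`, `pca_2d`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 proteins x 50 possible interaction partners.
# Two made-up function groups with different interaction probabilities.
n_per_group, n_features, n_latent = 100, 50, 8
probs_a = rng.uniform(0.0, 0.3, n_features)
probs_b = rng.uniform(0.0, 0.3, n_features)
probs_a[:10] = 0.9   # group A mostly interacts with the first 10 partners
probs_b[-10:] = 0.9  # group B mostly interacts with the last 10 partners
real = np.vstack([
    rng.random((n_per_group, n_features)) < probs_a,
    rng.random((n_per_group, n_features)) < probs_b,
]).astype(float)
labels = np.repeat([0, 1], n_per_group)

def make_fake(data, n, rng):
    """Assumed scheme: each feature of a fake protein is copied from a
    randomly chosen real protein, independently per feature."""
    rows = rng.integers(0, data.shape[0], size=(n, data.shape[1]))
    return data[rows, np.arange(data.shape[1])]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(x, n_latent, rng, epochs=300, lr=10.0):
    """Full-batch gradient descent on mean squared reconstruction error."""
    d = x.shape[1]
    w_enc = rng.normal(0, 0.1, (d, n_latent)); b_enc = np.zeros(n_latent)
    w_dec = rng.normal(0, 0.1, (n_latent, d)); b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w_enc + b_enc)       # compress
        y = sigmoid(h @ w_dec + b_dec)       # decompress
        err = y - x
        losses.append(np.mean(err ** 2))
        # Backpropagate the mean squared error through both layers.
        dz2 = 2 * err * y * (1 - y) / err.size
        dz1 = (dz2 @ w_dec.T) * (1 - h ** 2)
        w_dec -= lr * (h.T @ dz2); b_dec -= lr * dz2.sum(axis=0)
        w_enc -= lr * (x.T @ dz1); b_enc -= lr * dz1.sum(axis=0)
    return (w_enc, b_enc, w_dec, b_dec), losses

def pca_2d(z):
    """Project rows of z onto their first two principal components."""
    centered = z - z.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

params, losses = train_autoencoder(real, n_latent, rng)
w_enc, b_enc, w_dec, b_dec = params

def reconstruction_loss(x):
    h = np.tanh(x @ w_enc + b_enc)
    return np.mean((sigmoid(h @ w_dec + b_dec) - x) ** 2)

fake = make_fake(real, 200, rng)
coords = pca_2d(np.tanh(real @ w_enc + b_enc))  # 2-D view of the latent codes

print(f"training loss: start={losses[0]:.3f} end={losses[-1]:.3f}")
print(f"real loss={reconstruction_loss(real):.3f} "
      f"fake loss={reconstruction_loss(fake):.3f}")
```

Plotting `coords` colored by `labels` would give the kind of PCA view mentioned above. Note that sampling each feature independently preserves per-feature frequencies but discards correlations between features, so a different sampling scheme could give different fake-protein losses.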
I expect to predict protein function and possibly reveal errors in the annotations of the gene data sets I will be using.
Mentors: Rich A. Bonneau, Vladimir Gligorijević
Sources:
Sponsoring Teacher: NA