Protein Function Inference via Neural Network

Team: 98

School: Los Alamos High

Area of Science: Computational Biology

Interim:

Problem Definition:

Understanding gene function matters for many reasons: for example, one could examine a pathogen's genes to find a weakness, or identify a human gene related to cancer. Annotating every gene whose function is still unknown remains a grand challenge in biology. Genes code for proteins, and protein-protein interactions can be measured. However, working backward from patterns of protein-protein interactions to function is an unsolved problem.

Problem Solution:

Humans have shown that inferring function from interaction data by hand is possible, though difficult, which suggests the process could be automated. I hypothesize that neural networks can learn patterns in these interactions and classify them by gene function.

Progress to Date:

I have created an autoencoder that compresses and then decompresses protein-protein interaction data. My model reconstructs both real and synthetic proteins with low loss; the synthetic proteins are made by sampling data from actual proteins, so they follow a similar distribution but are new proteins. Using Principal Component Analysis (PCA) visualizations, I can already see clusters forming that separate according to the functions I have selected for testing.
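The pipeline described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the project's actual model: the toy interaction matrix, the layer sizes, the learning rate, and the helper names (`make_synthetic`, `train_autoencoder`, `pca_2d`) are all hypothetical, and the column-wise resampling is only one possible reading of "sampling data from actual proteins".

```python
import numpy as np

def make_synthetic(real, n_new, rng):
    """Assumed scheme: each value of a synthetic protein is copied from the
    same column of a randomly chosen real protein."""
    n_real, n_feat = real.shape
    donors = rng.integers(0, n_real, size=(n_new, n_feat))
    return real[donors, np.arange(n_feat)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, params):
    w1, b1, _, _ = params
    return np.tanh(x @ w1 + b1)        # compress to the hidden code

def decode(h, params):
    _, _, w2, b2 = params
    return sigmoid(h @ w2 + b2)        # decompress back to interactions

def mse(x, params):
    return float(np.mean((decode(encode(x, params), params) - x) ** 2))

def train_autoencoder(x, n_hidden, lr, epochs, rng):
    """One-hidden-layer autoencoder trained by full-batch gradient descent."""
    n_feat = x.shape[1]
    params = [rng.normal(0.0, 0.1, (n_feat, n_hidden)), np.zeros(n_hidden),
              rng.normal(0.0, 0.1, (n_hidden, n_feat)), np.zeros(n_feat)]
    losses = []
    for _ in range(epochs):
        h = encode(x, params)
        y = decode(h, params)
        err = y - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both layers.
        dz2 = (2.0 / err.size) * err * y * (1.0 - y)
        dz1 = (dz2 @ params[2].T) * (1.0 - h ** 2)
        grads = [x.T @ dz1, dz1.sum(axis=0), h.T @ dz2, dz2.sum(axis=0)]
        for p, g in zip(params, grads):
            p -= lr * g                # in-place parameter update
    return params, losses

def pca_2d(codes):
    """Project rows onto their first two principal components via SVD."""
    centered = codes - codes.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Toy stand-in for real data: 40 proteins x 12 interaction partners,
# drawn from two made-up interaction profiles.
rng = np.random.default_rng(0)
probs = np.vstack([np.tile([0.8] * 6 + [0.1] * 6, (20, 1)),
                   np.tile([0.1] * 6 + [0.8] * 6, (20, 1))])
real = (rng.random((40, 12)) < probs).astype(float)

synthetic = make_synthetic(real, 10, rng)
params, losses = train_autoencoder(real, n_hidden=4, lr=2.0, epochs=1000, rng=rng)
coords = pca_2d(encode(real, params))  # 2-D coordinates for plotting

print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}")
print(f"loss on synthetic proteins {mse(synthetic, params):.3f}")
```

Column-wise resampling preserves each partner's marginal interaction frequency but not the correlations between partners, so a different sampling scheme may behave differently. The `coords` array can be passed to any plotting library and colored by annotated function to look for clusters.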

Expected Outcomes:

I expect to predict protein function and possibly to reveal errors in the annotations of the gene data sets I will be using.

Mentors:

Rich A. Bonneau, Vladimir Gligorijević

Sources:

Vladimir Gligorijević, Meet Barot, Richard Bonneau; deepNF: deep network fusion for protein function prediction, Bioinformatics, Volume 34, Issue 22, 15 November 2018, Pages 3873–3881, https://doi.org/10.1093/bioinformatics/bty440
Rohit Singh, Jinbo Xu, Bonnie Berger; Global alignment of multiple protein interaction networks with application to functional orthology detection, Proceedings of the National Academy of Sciences Sep 2008, 105 (35) 12763-12768; DOI: 10.1073/pnas.0806627105
Maxat Kulmanov, Mohammed Asif Khan, Robert Hoehndorf; DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier, Bioinformatics, Volume 34, Issue 4, 15 February 2018, Pages 660–668, https://doi.org/10.1093/bioinformatics/btx624
Gene Ontology Consortium; The Gene Ontology (GO) database and informatics resource, Nucleic Acids Research, Volume 32, Issue suppl_1, 1 January 2004, Pages D258–D261, https://doi.org/10.1093/nar/gkh036
Abadi, Martín, et al. "Tensorflow: a system for large-scale machine learning." OSDI. Vol. 16. 2016.

Team Members:

Charles Strauss

Sponsoring Teacher: NA
