Protein Function Inference via Neural Network

Team: 98

School: Los Alamos High

Area of Science: Computational Biology


Interim:


Problem Definition:

        Understanding gene function matters for many reasons: for example, one could examine the genes of a pathogen to find a weakness, or identify a human gene related to cancer. Annotating every gene whose function is still unknown remains a grand challenge in biology. Genes code for proteins, and protein-protein interactions can be measured. However, working backward from patterns of protein-protein interactions to function is an unsolved problem.

Problem Solution:

        Humans have shown that inferring function from interaction data by hand is possible, though difficult, which suggests the process could be automated. I hypothesize that neural networks can learn patterns in these interactions and use them to classify proteins by gene function.

Progress to Date:

        I have created an autoencoder that compresses and then decompresses protein-protein interaction data. My model reconstructs both real and fake proteins with low loss; the fake proteins are made by sampling data from actual proteins, which yields new proteins with a similar distribution. Using Principal Component Analysis (PCA) visualizations, I can already see clusters forming that separate according to the selected functions I am testing.
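
        The report does not spell out how the fake proteins are sampled, so the following is only a minimal sketch of one possible reading. It assumes each protein is a fixed-length list of interaction scores (one per potential partner) and fills every position of a fake protein with the value from a randomly chosen real protein. The function name sample_fake_protein and the toy data are hypothetical and are not taken from the project code.

```python
import random


def sample_fake_protein(real_profiles, rng):
    """Build one synthetic interaction profile.

    Assumption (not from the report): for every partner position we copy
    the value found at that position in a randomly chosen real protein.
    Each position keeps its real value distribution, while the combined
    profile usually does not match any single real protein.
    """
    n_partners = len(real_profiles[0])
    return [rng.choice(real_profiles)[j] for j in range(n_partners)]


# Hypothetical toy data: 4 real proteins, 5 potential partners each.
# A 1 means an interaction was observed, a 0 means it was not.
real = [
    [1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]

rng = random.Random(0)  # fixed seed so repeated runs give the same fakes
fakes = [sample_fake_protein(real, rng) for _ in range(3)]
for profile in fakes:
    print(profile)
```

        Sampling each position independently preserves per-partner statistics but ignores correlations between partners, so an autoencoder that reconstructs such fakes well is being tested on profiles it could not simply have memorized.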

Expected Outcomes:

        I expect to predict protein function and possibly reveal errors in the annotations of the gene data sets I will be using.

Mentors:

        Rich A. Bonneau, Vladimir Gligorijević

Sources:
  1. Vladimir Gligorijević, Meet Barot, Richard Bonneau; deepNF: deep network fusion for protein function prediction, Bioinformatics, Volume 34, Issue 22, 15 November 2018, Pages 3873–3881, https://doi.org/10.1093/bioinformatics/bty440
  2. Rohit Singh, Jinbo Xu, Bonnie Berger; Global alignment of multiple protein interaction networks with application to functional orthology detection, Proceedings of the National Academy of Sciences Sep 2008, 105 (35) 12763-12768; DOI: 10.1073/pnas.0806627105
  3. Maxat Kulmanov, Mohammed Asif Khan, Robert Hoehndorf; DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier, Bioinformatics, Volume 34, Issue 4, 15 February 2018, Pages 660–668, https://doi.org/10.1093/bioinformatics/btx624
  4. Gene Ontology Consortium; The Gene Ontology (GO) database and informatics resource, Nucleic Acids Research, Volume 32, Issue suppl_1, 1 January 2004, Pages D258–D261, https://doi.org/10.1093/nar/gkh036
  5. Abadi, Martín, et al. "TensorFlow: a system for large-scale machine learning." OSDI. Vol. 16. 2016.


Team Members:

  Charles Strauss

Sponsoring Teacher: NA
