Understanding gene function is important for many reasons: for example, one could examine a pathogen's genes to find a weakness, or identify human genes involved in cancer. Annotating every gene whose function is still unknown remains a grand challenge in biology. Genes code for proteins, and protein-protein interactions can be measured. However, working backward from patterns of protein-protein interactions to function remains an unsolved problem.
Problem Solution:
Human experts have shown that inferring function from interaction data by hand, though difficult, is possible, which suggests the process could be automated. I hypothesize that neural networks can learn patterns in these interactions and use them to classify genes by function.
Progress to Date:
I have built an autoencoder that compresses and then reconstructs protein-protein interaction data. My model reconstructs both real proteins and fake ones (created by sampling data from real proteins, so that they follow a similar distribution but are new proteins) with low loss. In Principal Component Analysis (PCA) visualizations, I can already see clusters forming that separate according to the functions I am testing.
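The steps above (fake-protein generation, autoencoder reconstruction, PCA projection) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the model used in this project: it assumes each protein is a row vector of interaction values in [0, 1], assumes fake proteins are made by column-wise resampling (each feature copied from the same column of a randomly chosen real protein), and swaps in a one-hidden-layer sigmoid autoencoder trained by plain gradient descent. The helper names (`make_fake_proteins`, `train_autoencoder`, `encode`, `pca_2d`) and the random placeholder data are hypothetical.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the project's actual code.

def make_fake_proteins(real, n_fake, rng):
    """Assumed column-wise resampling: feature j of each fake protein is copied
    from feature j of a randomly chosen real protein, so every column keeps its
    marginal distribution while each row is a new combination."""
    n_real, n_feat = real.shape
    rows = rng.integers(0, n_real, size=(n_fake, n_feat))
    return real[rows, np.arange(n_feat)]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def reconstruction_loss(x, params):
    """Mean over proteins of the summed squared reconstruction error."""
    w1, b1, w2, b2 = params
    out = sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)
    return float(((out - x) ** 2).sum(axis=1).mean())


def train_autoencoder(x, n_hidden, epochs, lr, rng):
    """One-hidden-layer sigmoid autoencoder trained by full-batch gradient descent."""
    n, n_feat = x.shape
    w1 = rng.normal(0.0, 0.1, (n_feat, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.1, (n_hidden, n_feat))
    b2 = np.zeros(n_feat)
    for _ in range(epochs):
        h = sigmoid(x @ w1 + b1)    # encoder: compress to n_hidden values
        out = sigmoid(h @ w2 + b2)  # decoder: reconstruct the input
        # Backpropagate the loss defined in reconstruction_loss.
        d_out = 2.0 * (out - x) / n * out * (1.0 - out)
        d_h = (d_out @ w2.T) * h * (1.0 - h)
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * (x.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return w1, b1, w2, b2


def encode(x, params):
    """Return the compressed (hidden-layer) representation of each protein."""
    w1, b1, _, _ = params
    return sigmoid(x @ w1 + b1)


def pca_2d(z):
    """Project rows of z onto their top two principal components via SVD."""
    centered = z - z.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
# Placeholder data: 200 "proteins" x 50 binary interaction features (random, not real).
real = (rng.random((200, 50)) < 0.2).astype(float)
fake = make_fake_proteins(real, n_fake=100, rng=rng)

params = train_autoencoder(real, n_hidden=8, epochs=500, lr=0.5, rng=rng)
coords = pca_2d(encode(np.vstack([real, fake]), params))  # 2-D points to plot
print(reconstruction_loss(real, params), reconstruction_loss(fake, params))
```

Comparing the two printed losses mirrors the real-versus-fake reconstruction check, and `coords` holds the points one would scatter-plot and color by annotated function. Column-wise resampling is only one way to match a distribution; it preserves each feature's frequencies but deliberately breaks correlations between features.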
Expected Outcomes:
I expect to predict protein function and possibly to reveal errors in the annotations of the gene data sets I will be using.
Mentors:
Rich A. Bonneau
Vladimir Gligorijević