
The Fatgraph Models of Proteins and Their Applications in The Protein Folding Problem

by Yuki Koyanagi
PhD Dissertations March 2021

This thesis presents the results from my PhD studies at Aarhus University, supervised by Professor Jørgen Ellegaard Andersen. The main focus of the PhD project was to investigate applications of the fatgraph model of proteins, which was first proposed by Andersen and others. The studies are exploratory in nature, but are designed with the intention of contributing to the protein folding problem using the fatgraph model.

The first part of the thesis reviews the mathematical objects and theories related to the project. It then reviews earlier work that uses fatgraphs to study another biological macromolecule, RNA. We review the recursion relations obtained by the so-called cut-and-join method, matrix models and topological recursion.

In the second part of the thesis, we present new results on the study of protein structures. First, the basic fatgraph model of proteins is introduced, and recursion relations for the protein diagrams, obtained by the cut-and-join method, are presented. We construct matrix models that encode generating functions for protein diagrams, and derive partial differential equations that express the cut-and-join equations.

We then discuss three experimental studies applying fatgraph models. In the first project, we introduce a novel model of proteins, which we call protein metastructures, and an associated topological model, which is a modification of the basic fatgraph model. These are used to study the beta-sheet topology of proteins, that is, the configuration of beta-strands in beta-sheets. By comparing data from actual proteins with simulated data, we show that proteins favour less complex beta-sheet topologies. Some applications of the models are presented, including an example of combining the method with an existing program for predicting beta-sheet topology. In that example, prediction accuracy improved by 8 percentage points in precision and 3 percentage points in recall.

The second project takes inspiration from the CASP assessment of model quality, and attempts to select the best structure from a set of candidate structures, each of which aims to reproduce the target protein structure from its primary sequence. We show that the topological information contained in our model is enough to predict, if not the best candidate structure, then one close to it.

The third project aims to predict the local geometry of proteins from their topology. The local geometry is expressed as the rotation, an element of the rotation group SO(3), between two peptide units connected by a hydrogen bond. The topological information is expressed as the pattern of other hydrogen bonds around the bond in question. We show that the rotation can be predicted with high accuracy: close to 90% of the predictions lie within a ball centred at the true rotation and occupying 1% of the SO(3) space.

We conclude the thesis with a brief discussion of the potential future challenges and benefits of using fatgraph models in protein structure research.
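To give a sense of scale for the "1% of the SO(3) space" figure, the following minimal sketch computes the angular radius of such a ball. It rests on assumptions not stated in the abstract: that volume means the normalised Haar measure on SO(3), and that the ball is taken with respect to the rotation angle between two rotations. Under those assumptions the fraction of SO(3) within angle theta of a given rotation is (theta - sin(theta)) / pi, and a 1% ball has a radius of roughly 33 degrees. The function names are illustrative only and do not come from the thesis.

```python
import math

def ball_fraction(theta):
    """Fraction of SO(3) (normalised Haar measure, an assumption here)
    lying within rotation angle theta (radians, 0..pi) of a fixed rotation."""
    return (theta - math.sin(theta)) / math.pi

def ball_radius(fraction, tol=1e-12):
    """Invert ball_fraction by bisection; the function is increasing on [0, pi]."""
    lo, hi = 0.0, math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ball_fraction(mid) < fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rotation_angle(r1, r2):
    """Angle (radians) of the relative rotation r1^T r2 for 3x3 rotation
    matrices given as nested lists, using trace(r1^T r2) = sum of r1[i][j]*r2[i][j]."""
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.acos(cos_angle)

def within_ball(predicted, true, fraction=0.01):
    """True if the predicted rotation lies in the ball around the true rotation
    that occupies the given fraction of SO(3)."""
    return rotation_angle(predicted, true) <= ball_radius(fraction)

if __name__ == "__main__":
    radius = ball_radius(0.01)
    print(f"radius of a 1% ball: {radius:.4f} rad = {math.degrees(radius):.1f} deg")
```

With this reading, a prediction counts as a hit when `within_ball(predicted, true)` is true. The thesis itself may define the ball or the measure differently.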

Format available: PDF (5 MB)
Dissertation supervisors: Gergely Bérczi and Jørgen Ellegaard Andersen