Collagen is the most abundant protein in the human body, and its instability is involved in many important diseases, such as osteogenesis imperfecta, Ehlers-Danlos syndrome, and other collagenopathies. The stability of the collagen triple helix is closely related to its amino acid sequence, especially the characteristic Gly-X-Y repeat motif. Many groups have used computational methods to investigate collagen structure and the relationship between its structure and stability. In this study, we first reviewed the most important computational methods that have been applied in this field. We then assembled data on a large number of collagen-like peptides to build the first Markov chain model for predicting collagen stability at different temperatures from the amino acid sequence alone. From the literature, we assembled a set of 102 peptides whose melting temperatures had been determined experimentally and whose sequences vary widely around the main collagen motif. This dataset was split into two classes, stable and unstable, according to the peptides' melting temperatures, and was then used to build artificial neural network (ANN) models to predict collagen stability. We built models to predict stability at 38°C, 35°C, 30°C, and 25°C, and all models had an accuracy between 82% and 92%. Several cross-validation procedures were performed to validate the models. This method enables fast and accurate predictions of collagen stability at different temperatures.
Keywords: Markov chain model, linear discriminant analysis, artificial neural network, amino acid sequence, collagen stability, Osteogenesis imperfecta, Ehlers-Danlos syndrome, cross-validation, collagenopathy, support vector machines
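The workflow summarized above turns each peptide sequence into numerical descriptors and labels each peptide as stable or unstable at a given temperature. The following is a minimal sketch of that idea, assuming a simple first-order Markov chain (residue-to-residue transition probabilities) as the descriptor and assuming that "stable at T" means a melting temperature above T. It is not the authors' actual descriptor set or labeling rule. The names `transition_probabilities`, `feature_vector`, `stability_label`, and the alphabet (20 standard residues plus `O` for hydroxyproline) are hypothetical, and the Tm value used in the example is made up for illustration.

```python
from collections import Counter

# Assumption: 20 standard one-letter residue codes plus "O" for
# hydroxyproline (Hyp), which is common in collagen-like peptides.
ALPHABET = "ACDEFGHIKLMNOPQRSTVWY"


def transition_probabilities(seq: str) -> dict[tuple[str, str], float]:
    """First-order Markov chain estimate of P(next residue = b | current = a)."""
    pair_counts = Counter(zip(seq, seq[1:]))
    # How often each residue acts as the "from" state of a transition.
    from_counts = Counter(seq[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}


def feature_vector(seq: str) -> list[float]:
    """Flatten the transition matrix into a fixed-length (21 x 21) vector."""
    probs = transition_probabilities(seq)
    return [probs.get((a, b), 0.0) for a in ALPHABET for b in ALPHABET]


def stability_label(tm_celsius: float, temperature_celsius: float) -> int:
    """Hypothetical rule: 1 (stable) if Tm exceeds the test temperature, else 0."""
    return int(tm_celsius > temperature_celsius)


if __name__ == "__main__":
    peptide = "POGPOGPOG"  # hypothetical (Pro-Hyp-Gly)3 fragment
    print(transition_probabilities(peptide))
    # Hypothetical Tm of 33 °C, labeled at each temperature used in the study.
    print([stability_label(33.0, t) for t in (25, 30, 35, 38)])
```

For the `POGPOGPOG` example, every P is followed by O, every O by G, and every G by P, so each of those three transitions has probability 1.0. In a full pipeline, vectors like these (one per peptide) and the per-temperature labels would be the inputs to a classifier such as an ANN, evaluated by cross-validation.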