Background: Using systems biology data to investigate diseases is a growing trend. Because proteins
are the functional units of the human body at the molecular level, viewing the relationships among
diseases from the perspective of human proteins is a direct approach. However, the lack of disease
annotations for human proteins limits this approach.
Objective: Our objective is to present a framework that first extracts associations between diseases and
proteins and then constructs a human disease network (HDN) based on disease-related proteins.
Method: The protein-disease associations were extracted from UniProt, which includes disease descriptions
of human proteins. Each description contains either an Online Mendelian Inheritance in Man
(OMIM) identifier or free text. OMIM identifiers were mapped to the ‘merged disease vocabulary’ (MEDIC)
of the Comparative Toxicogenomics Database (CTD), and disease terms in the free text were annotated
to MEDIC using MGREP. Relativity scores of disease pairs were calculated based on the Jaccard index
to establish the HDN, in which a node represents a disease and an edge connects two diseases whose
relativity score is greater than zero.
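The scoring and edge rule described in the Method can be illustrated with a minimal sketch. It assumes that the Jaccard index is taken over the sets of proteins associated with each disease, which the Objective suggests but the Method does not state explicitly. The function names and the toy data are hypothetical and are not taken from UniProt or MEDIC.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_hdn(disease_to_proteins):
    """Return {(disease_1, disease_2): score} for every pair with score > 0.

    Assumption: the relativity score of a disease pair is the Jaccard index
    of the two diseases' associated protein sets.
    """
    edges = {}
    for d1, d2 in combinations(sorted(disease_to_proteins), 2):
        score = jaccard(disease_to_proteins[d1], disease_to_proteins[d2])
        if score > 0:  # an edge exists only when the score exceeds zero
            edges[(d1, d2)] = score
    return edges


# Hypothetical toy associations, for illustration only.
toy = {
    "disease_A": {"P1", "P2"},
    "disease_B": {"P2", "P3"},
    "disease_C": {"P4"},
}
hdn = build_hdn(toy)
```

In this toy case only disease_A and disease_B share a protein, so the network contains a single edge, while disease_C remains an isolated node.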
Results: We obtained 4,466 associations between 2,933 diseases and 2,625 proteins. The degree distribution
of the diseases in the HDN followed a power law (R² = 0.9762), which indicates that the network has
scale-free characteristics, like many other biological networks.
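A power-law check of this kind can be sketched as follows. The abstract does not state how the fit was performed; the sketch assumes an ordinary least-squares line fitted to the degree distribution on log-log axes, with R² taken from that regression. The function name and the synthetic degree list are hypothetical.

```python
import math
from collections import Counter


def power_law_fit(degrees):
    """Fit P(k) ~ k^(-gamma) by least squares on log10-log10 axes.

    Returns (gamma, r_squared). Assumption: this regression-based fit is
    only one common way to test for a power law; it is not necessarily the
    procedure used for the reported R² value.
    """
    counts = Counter(d for d in degrees if d > 0)  # ignore isolated nodes
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    total = sum(counts.values())
    ks = sorted(counts)
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(counts[k] / total) for k in ks]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        raise ValueError("degree frequencies are constant; R² is undefined")
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


# Synthetic degrees whose frequencies fall off exactly as k^-2.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
gamma, r2 = power_law_fit(degrees)
```

For the synthetic input the frequencies lie exactly on a straight line in log-log space, so the fitted exponent is close to 2 and R² is close to 1; real degree distributions scatter around the line and give lower values.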
Conclusion: We constructed an HDN from our protein-disease annotations. As expected, the hub
nodes of the network are always disease classes or complex diseases. In contrast, the most similar
diseases are always specific diseases.