Background: The mathematical foundation of information theory in communication
engineering was developed by Claude Shannon in 1948. Since then, information theory has been
used to investigate various information-carrying systems, including biomolecules such as DNA and
proteins.
Objective: In this study, a measure for estimating the structural information content of proteomes is
proposed. The primary structure feature considered for the information content investigation is the
sequence length organization of proteomic proteins, as opposed to the amino acid order in individual
proteins.
Method: We analyzed and compared the information content estimates of a representative proteome set
of ten proteomes for measured, model-predicted (linguistic distribution model) and simulated (random
sequence length) cases.
Results: Excellent agreement was observed between the measured and model-predicted information contents
of the proteomes. The overall average information per proteomic protein was 8 bits for the
measured/model-predicted data and 7 bits for the simulated proteomic collection data.
Conclusion: The study suggests that biological interaction mechanisms may rely more on the
number of amino acids than on the amino acid order of an interaction-initiating protein sequence. The
approach presented here may serve as a practical tool for studying and comparing biological processes
taking place in an organism or in a collection of organisms, and is expected to support the
exploration of proteomic information characteristics present in different structural
hierarchies such as the secondary and tertiary structures.
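
To make the idea of "information per proteomic protein" concrete, the following is a minimal sketch of one way such a quantity could be estimated from sequence lengths alone. It assumes the measure is the Shannon entropy, in bits, of the empirical sequence length distribution of a proteome; the abstract does not state the exact formula, so this is an illustrative assumption and not necessarily the authors' method. The function name length_entropy_bits and the toy length list are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def length_entropy_bits(lengths):
    """Shannon entropy (in bits) of the empirical sequence length distribution.

    Assumption: each protein contributes one observation, namely its length
    in amino acids; the amino acid order is ignored entirely.
    """
    counts = Counter(lengths)          # how many proteins have each length
    total = sum(counts.values())       # number of proteins in the collection
    # H = -sum(p * log2(p)) over the observed lengths
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy "proteome": four distinct lengths, each occurring twice.
toy_lengths = [120, 120, 250, 250, 310, 310, 480, 480]
print(length_entropy_bits(toy_lengths))  # four equally likely lengths -> 2.0 bits
```

Under this assumption, a collection in which every protein has the same length carries 0 bits, and the estimate grows as the lengths become more varied and more evenly spread.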