Several methods have been proposed for transforming protein sequences into feature vectors, such as
auto covariance (AC), conjoint triad (CT), local descriptor (LD), Moran autocorrelation (MA) and normalized Moreau-Broto
autocorrelation (NMB). In this paper, we adopt these transformation methods to encode the proteins,
respectively, with AC, CT, LD, MA and NMB all denoted by ‘+’ in a unified manner. A new method, the
combination of least squares regression with ‘+’ (abbreviated as LSR+), is introduced for encoding a protein-protein
correlation-based feature representation and an interacting protein pair. This gives five combinations in total
for LSR+, namely LSRAC, LSRCT, LSRLD, LSRMA and LSRNMB. We then combine a support vector machine
(SVM) approach with LSR+ to predict protein-protein interactions (PPI) and PPI networks. The proposed method has been
applied to four datasets, i.e. Saccharomyces cerevisiae, Escherichia coli, Homo sapiens and Caenorhabditis elegans.
The experimental results demonstrate that all LSR+ methods outperform many existing representative algorithms. Therefore,
LSR+ is a powerful tool for characterizing protein-protein correlations and inferring PPI, while maintaining high performance
in the prediction of PPI networks.
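
To make the sequence-to-vector step concrete, the following is a minimal sketch of an auto covariance (AC) style encoding, not the LSR+ method itself. It assumes a single physicochemical property (the Kyte-Doolittle hydropathy scale) and a small maximum lag; the property choice, the lag, and the function names `auto_covariance` and `encode_pair` are all illustrative assumptions and are not taken from the paper.

```python
# Kyte-Doolittle hydropathy values. Using this one property is an assumption
# made for illustration; AC encodings typically combine several properties.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence, max_lag=3):
    """Return one auto covariance value per lag in 1..max_lag.

    For lag d, the value is the mean over positions i of
    (x_i - mean) * (x_{i+d} - mean), where x_i is the property
    value of the residue at position i.
    """
    values = [HYDROPATHY[aa] for aa in sequence]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


def encode_pair(seq_a, seq_b, max_lag=3):
    """Hypothetical pair encoding: concatenate the two AC vectors."""
    return auto_covariance(seq_a, max_lag) + auto_covariance(seq_b, max_lag)
```

A vector produced by `encode_pair` has a fixed length regardless of the lengths of the two sequences, which is what allows a classifier such as an SVM to be trained on protein pairs. Simple concatenation is used here only as a placeholder for the pair representation.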