Background: Principal Component Analysis (PCA) and Orthogonal Projections
to Latent Structures Discriminant Analysis (OPLS-DA) are powerful statistical
modeling tools that provide insights into separations between experimental groups
based on high-dimensional spectral measurements from NMR, MS or other analytical
instrumentation. However, when used without validation, these tools may lead investigators
to statistically unreliable conclusions. This danger is especially real for Partial
Least Squares (PLS) and OPLS, which aggressively force separations between experimental
groups. As a result, OPLS-DA is often used as an alternative method when
PCA fails to expose group separation, but this practice is highly dangerous. Without
rigorous validation, OPLS-DA can easily yield statistically unreliable group separation.
Methods: A Monte Carlo analysis of PCA group separations and OPLS-DA cross-validation metrics was
performed on NMR datasets with statistically significant separations in scores-space. A linearly increasing
amount of Gaussian noise was added to each data matrix, followed by the construction and validation
of PCA and OPLS-DA models at each noise level.
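
The PCA half of this noise-addition procedure can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic two-group data matrix, the function names (`pca_scores`, `group_separation`, `noise_sweep`), the noise levels, and the iteration count are all assumptions made for illustration, and the separation measure used here (centroid distance divided by pooled within-group spread) is a simplified stand-in for whatever scores-space distance the study employed. OPLS-DA modeling and cross-validation are omitted.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return PCA scores and loadings of mean-centered X via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings

def group_separation(T, y):
    """Centroid distance between two groups, scaled by pooled within-group
    spread (a simplified, assumed stand-in for a scores-space distance)."""
    groups = [T[y == g] for g in (0, 1)]
    dist = np.linalg.norm(groups[0].mean(axis=0) - groups[1].mean(axis=0))
    ss = sum(((g - g.mean(axis=0)) ** 2).sum() for g in groups)
    return dist / np.sqrt(ss / (len(T) - 2))

def noise_sweep(X, y, sigmas, n_iter=20, seed=0):
    """For each noise level, average the group separation and the absolute
    correlation between noisy and original first-component loadings."""
    rng = np.random.default_rng(seed)
    _, P0 = pca_scores(X)  # loadings of the original (noise-free) data
    results = []
    for sigma in sigmas:
        seps, corrs = [], []
        for _ in range(n_iter):
            Xn = X + rng.normal(0.0, sigma, size=X.shape)
            T, P = pca_scores(Xn)
            seps.append(group_separation(T, y))
            # abs() because the sign of a principal component is arbitrary
            corrs.append(abs(np.corrcoef(P[:, 0], P0[:, 0])[0, 1]))
        results.append((sigma, np.mean(seps), np.mean(corrs)))
    return results

# Synthetic stand-in data (assumption): 2 groups x 10 samples, 50 variables,
# with the groups differing in the first 5 variables.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
X = rng.normal(0.0, 0.1, size=(20, 50))
X[y == 1, :5] += 5.0

# Linearly increasing noise standard deviations (assumed range).
results = noise_sweep(X, y, sigmas=np.linspace(0.0, 50.0, 6))
for sigma, sep, corr in results:
    print(f"sigma={sigma:5.1f}  separation={sep:8.3f}  loading |r|={corr:.3f}")
```

Each tuple in `results` holds the noise level, the mean scaled group separation, and the mean absolute loading correlation, so the two quantities can be tracked side by side as the noise level rises.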
Results: With increasing added noise, the PCA scores-space distance between groups rapidly decreased
and the OPLS-DA cross-validation statistics simultaneously deteriorated. A decrease in correlation between
the loadings estimated from the noise-added data and the true loadings of the original data was also
observed. While the validity of the OPLS-DA model diminished with increasing added noise, the group
separation in OPLS-DA scores-space remained basically unaffected.
Conclusion: Supported by the results of Monte Carlo analyses of PCA group separations and OPLS-DA
cross-validation metrics, we provide practical guidelines and cross-validatory recommendations for reliable
inference from PCA and OPLS-DA models.