Principal Component Analysis
Overview
Principal component analysis (PCA) is a multivariate statistical technique that reduces a high-dimensional matrix of correlated, time-varying signals to a low-dimensional set of statistically uncorrelated principal components (PCs). These PCs explain the variance found in the original signals and represent the most important features of the data, e.g., the overall magnitude or the shape of the time series at a particular point in the stride cycle. Each subject's score for an individual PC represents how strongly that feature was present in that subject's data. In other words, PCA is a powerful dimensionality reduction technique commonly used in biomechanics research to identify patterns and sources of variation across large, high-dimensional datasets, typically time-series joint angle or force data collected across many subjects or trials.
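As a rough illustration of the idea, the sketch below runs PCA on a small synthetic set of waveforms using NumPy. It is not Sift's implementation: the function name pca_waveforms, the synthetic data, and the choice of a singular value decomposition are all assumptions made for this example.

```python
import numpy as np

def pca_waveforms(X, n_components):
    """PCA of X, an (n_subjects, n_timepoints) matrix of time-normalized waveforms.

    Hypothetical helper for illustration only.
    """
    mean_waveform = X.mean(axis=0)
    centered = X - mean_waveform
    # SVD of the centered data: the rows of Vt are the PC loading vectors.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    loadings = Vt[:n_components]
    # Each subject's score is the projection of its waveform onto a loading vector.
    scores = centered @ loadings.T
    # Fraction of the total variance explained by each PC.
    explained = s**2 / np.sum(s**2)
    return mean_waveform, loadings, scores, explained[:n_components]

# Synthetic data (assumption): 40 subjects, 101 points per cycle,
# a shared shape whose magnitude varies between subjects, plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
base = np.sin(2 * np.pi * t)
magnitude = rng.normal(0.0, 1.0, size=(40, 1))
X = 10.0 * base + magnitude * 3.0 * base + rng.normal(0.0, 0.1, size=(40, 101))

mean_waveform, loadings, scores, explained = pca_waveforms(X, n_components=3)
print("variance explained by the first three PCs:", np.round(explained, 3))
```

Because the only systematic difference between the synthetic subjects is magnitude, the first PC should capture most of the variance, and each subject's PC1 score reflects how far that subject's magnitude lies from the group mean.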
More detail on the mathematics behind PCA can be found on the page The Math of Principal Component Analysis. A look at the Dialog can be found here.
The Utility of PCA
When we analyse biomechanical signals, we could identify many isolated quantities (e.g. maximum and minimum values) and compare the signals based on these metrics. Given that there is a common underlying shape to all of these signals, however, it can be more informative to use a multivariate statistical technique that can capture this basic shape and compare the shape of each signal to this underlying shape.
Visualizing PCA Results
Sift provides a number of ways to visualize and interact with the results of PCA. An overview of all PCA visualizations is available on the Sift - Analyse Page.
On the Analyse Page
- Visualizing the variance explained by each PC individually;
- Visualizing the variance explained by each PC at each point in the signal's cycle;
- Scatter-plotting workspace scores in PC-space;
- Showing the distribution of scores by group for each PC;
- Visualizing the mean and extreme values that result from reconstructing the underlying data with each PC; and
- Visualizing how the signals can be reconstructed from the computed PCs.
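The quantities behind several of these views can be sketched in a few lines of NumPy. This is an illustrative sketch on synthetic data, not a description of how Sift computes its plots. In particular, using the 5th and 95th percentile scores as the "extreme" values is an assumption made here, and the helper name reconstruct is hypothetical.

```python
import numpy as np

# Synthetic waveforms (assumption): a shared shape with subject-specific
# amplitude and a small phase shift.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
amp = rng.normal(1.0, 0.2, size=(30, 1))
shift = rng.normal(0.0, 0.02, size=(30, 1))
X = amp * np.sin(2 * np.pi * (t - shift))

mean_waveform = X.mean(axis=0)
centered = X - mean_waveform
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ Vt.T

# Variance explained by each PC, individually and cumulatively.
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

# Mean and "extreme" waveforms for the first PC: the mean waveform plus the
# loading vector scaled by a low and a high score (5th/95th percentile here).
pc = 0
low, high = np.percentile(scores[:, pc], [5, 95])
low_curve = mean_waveform + low * Vt[pc]
high_curve = mean_waveform + high * Vt[pc]

def reconstruct(k):
    """Rebuild every waveform from the mean and the first k PCs."""
    return mean_waveform + scores[:, :k] @ Vt[:k]

# Root-mean-square reconstruction error as more PCs are retained.
errors = [np.sqrt(np.mean((X - reconstruct(k)) ** 2)) for k in (1, 2, 3)]
print("cumulative variance explained:", np.round(cumulative[:3], 3))
print("RMS reconstruction error for k = 1, 2, 3:", errors)
```

Plotting low_curve and high_curve against mean_waveform shows which feature of the signal a PC describes, while the reconstruction error indicates how many PCs are needed to represent the original signals adequately.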
Further Analysis
Once PCA has been run in Sift, several built-in modules support deeper investigation of results:
- Outlier Detection for PCA: An overview of the PCA outlier detection methods built into Sift.
- Mahalanobis Distances: Finding outliers through their Mahalanobis Distances.
- Squared Prediction Error: Finding outliers through their SPE.
- Local Outlier Factors: Finding outliers through the Local Outlier Factor.
- K-Means Analysis: Clustering PCA results through K-Means clustering.
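To give a feel for two of the outlier measures listed above, the following sketch computes Mahalanobis distances and squared prediction errors (SPE) from PC scores with NumPy. It is a minimal example under stated assumptions, not Sift's implementation: the data are synthetic, two outliers are planted deliberately, and the number of retained PCs (k = 2) is chosen by hand.

```python
import numpy as np

# Synthetic waveforms (assumption): 50 subjects with varying amplitude and offset.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 101)
amp = rng.normal(1.0, 0.1, size=(50, 1))
offset = rng.normal(0.0, 0.1, size=(50, 1))
X = amp * np.sin(2 * np.pi * t) + offset + rng.normal(0.0, 0.01, size=(50, 101))

# Planted outliers: one with an unusual shape, one with an extreme amplitude.
X[-2] = np.sin(2 * np.pi * t) + 0.5 * np.sin(6 * np.pi * t)
X[-1] = 3.0 * np.sin(2 * np.pi * t)

k = 2  # number of retained PCs (chosen by hand for this example)
mean_waveform = X.mean(axis=0)
centered = X - mean_waveform
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ Vt[:k].T

# Mahalanobis distance in PC space. PC scores are uncorrelated, so the
# covariance matrix is diagonal and each score is scaled by its own variance.
score_var = scores.var(axis=0, ddof=1)
mahalanobis = np.sqrt(np.sum(scores**2 / score_var, axis=1))

# SPE: squared norm of what the k retained PCs fail to reconstruct.
residual = centered - scores @ Vt[:k]
spe = np.sum(residual**2, axis=1)

print("largest Mahalanobis distance at index", int(np.argmax(mahalanobis)))
print("largest SPE at index", int(np.argmax(spe)))
```

The two measures are complementary: the Mahalanobis distance flags waveforms that are extreme along the retained PCs, whereas the SPE flags waveforms whose shape the retained PCs do not describe well.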
Tutorials
Step-by-step tutorials are available for common PCA workflows in Sift:
- PCA Tutorial: Introduction to running PCA on biomechanical data
- Run K-Means: Clustering PCA results using K-Means
- PCA Outlier Analysis: Detecting and reviewing outliers in PCA results
- Treadmill Walking in Healthy Individuals: A full example processing a large dataset and using PCA to distinguish between groups
- Analysis of Baseball Hitters: Applying PCA to compare athletes across different levels of competition
References
Our implementation of Principal Component Analysis is based on the article:
Deluzio KJ and Astephen JL (2007) Biomechanical features of gait waveform data associated with knee osteoarthritis: An application of principal component analysis. Gait & Posture 25, 86-93.
Check out a more recent article using PCA on whole-body kinematics:
Eveleigh KJ, Deluzio KJ, Scott SH and Laende EK (2023) Principal component analysis of whole-body kinematics using markerless motion capture during static balance tasks. Journal of Biomechanics.
The fundamentals of Principal Component Analysis used in our implementation and in the presented article are derived from the Research Methods in Biomechanics textbook.
