The Math of Principal Component Analysis (PCA)

Revision as of 12:09, 10 October 2023


note: this page is under construction

New technology allows researchers to generate large quantities of data, whether time-series data or any other waveform data commonly seen in biomechanics research. What all these forms of data have in common is that they are high-dimensional [1]. With new technologies like Theia Markerless (https://www.theiamarkerless.ca/), so much more data is being collected that meaningful comparisons become difficult both to interpret and to compute.

The Research Methods in Biomechanics text emphasizes that, whatever research question a biomechanist is trying to answer, transforming waveform data into a smaller form makes it more manageable and useful for analysis. Before making meaningful comparisons, we want to reduce the amount of data without losing any discriminatory information that may be crucial to those comparisons [1].

Traditional approaches that extract discrete features from waveform data (max, min, range, etc.) reduce the data so drastically that most of the temporal information in the waveform is lost. Additionally, varying definitions of these discrete parameters across the field lead to inconsistent conclusions, and these measures may not always be well defined in certain populations [1]. One example mentioned in the text found that knee adduction angles in patients with osteoarthritis (OA) do not have definitive peaks, so finding maximum and minimum angles is meaningless [2].

PCA

PCA is an orthogonal decomposition technique that computes and extracts a unique set of basis functions from the waveforms based on the variation that is present in the waveform data [1].

Waveform Data

Waveform data can be represented as an n × p matrix, X, where each row is the time-series waveform for one subject and each column is a particular point within the waveform (n = subjects, p = points) [1].
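As a rough illustration of this layout, the sketch below stacks one waveform per subject into an n × p NumPy array. The helper name build_waveform_matrix, the synthetic sine-wave trials, and the choice to resample every waveform to p = 101 points by linear interpolation are assumptions made for this example only; they are not taken from the text.

```python
import numpy as np

def build_waveform_matrix(waveforms, p=101):
    """Stack waveforms into an n x p matrix (rows = subjects, columns = points).

    Hypothetical helper: each waveform is resampled to p points by linear
    interpolation so that all rows have the same length.
    """
    rows = []
    new_t = np.linspace(0.0, 1.0, num=p)
    for w in waveforms:
        w = np.asarray(w, dtype=float)
        old_t = np.linspace(0.0, 1.0, num=w.size)
        rows.append(np.interp(new_t, old_t, w))
    return np.vstack(rows)

# Synthetic trials of different lengths, standing in for real subject data.
rng = np.random.default_rng(0)
trials = [
    np.sin(np.linspace(0.0, 2.0 * np.pi, m)) + 0.1 * rng.standard_normal(m)
    for m in (90, 120, 105)
]

X = build_waveform_matrix(trials)
print(X.shape)  # n rows (subjects) by p columns (points)
```

Each row of X is then one subject's waveform, and each column collects the values of all subjects at the same point in the waveform.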

Covariance Matrix

To quantify the variation in the waveform matrix, X, both over time and between the n subjects at each of the p points in time, we look at the p × p covariance matrix, S. The diagonal elements of S are the variances at each point in the waveform. The off-diagonal elements are the covariances between each pair of points. A non-zero covariance means there is a linear relationship between the two variables, so non-zero off-diagonal values of S mean that the columns of the original data X are correlated. Principal components, however, are orthogonal, uncorrelated features of the original data, and thus we look for principal components whose covariance matrix has off-diagonal elements equal to zero [1].
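The covariance matrix described above can be sketched in a few lines of NumPy. The data here are random numbers used purely for illustration, with two columns deliberately made to co-vary; the n − 1 divisor (the sample covariance) is an assumption, since the text does not state which normalization is used.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 5                      # n subjects, p points (illustrative sizes)
X = rng.standard_normal((n, p))
# Make column 1 nearly a copy of column 0 so the pair is clearly correlated.
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)

# Centre each column (point in the waveform) on its mean across subjects.
Xc = X - X.mean(axis=0)

# p x p sample covariance matrix (n - 1 divisor assumed).
S = Xc.T @ Xc / (n - 1)

variances = np.diag(S)            # diagonal: variance at each point
cov_01 = S[0, 1]                  # off-diagonal: covariance between points 0 and 1
corr_01 = cov_01 / np.sqrt(S[0, 0] * S[1, 1])
print(S.shape, corr_01)
```

The same matrix is returned by np.cov(X, rowvar=False), which treats columns as variables.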

To transform the covariance matrix of the original data, S, into the covariance matrix of the principal components, D, we diagonalize S using an orthogonal transformation matrix, U (U' denotes the transpose of U). U transforms the data into a new coordinate system whose axes are the principal components [1].

U'SU = D
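A minimal numerical check of this diagonalization is sketched below, assuming U is taken as the matrix of eigenvectors of S, which is the standard way to diagonalize a symmetric matrix. The data are random and illustrative, and sorting the components by decreasing variance is a common convention rather than something the text specifies.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4                                   # illustrative sizes
# Mix random columns so the resulting columns of X are correlated.
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
Xc = X - X.mean(axis=0)
S = np.cov(X, rowvar=False)                    # p x p covariance matrix

# eigh is intended for symmetric matrices; it returns orthonormal eigenvectors.
eigvals, U = np.linalg.eigh(S)

# Reorder so the first component explains the most variance (assumed convention).
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

D = U.T @ S @ U                                # U'SU = D
Z = Xc @ U                                     # data expressed in the new coordinates

print(np.round(D, 6))
```

Up to floating-point error, D is diagonal with the eigenvalues of S on its diagonal, and the covariance matrix of Z equals D, so the transformed variables are uncorrelated.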

Loading Vectors

Resources

[1] Robertson G., Caldwell G., Hamill J., Kamen G., Whittlesey S., Research Methods in Biomechanics, 2nd edition (https://us.humankinetics.com/products/research-methods-in-biomechanics-2nd-edition)

[2] Deluzio KJ and Astephen JL (2007) Biomechanical features of gait waveform data associated with knee osteoarthritis. An application of principal component analysis. Gait & Posture 23. 86-93 (http://m.me.queensu.ca/People/Deluzio/files/PublishedArticle.pdf)
