Inspect3D Tutorial: Perform Principal Component Analysis

From Software Product Documentation

Revision as of 18:28, 14 November 2022

This tutorial shows you how to use Inspect3D to perform Principal Component Analysis (PCA) on data from a CMO library. For a full treatment of waveform-based PCA for finding differences in waveform data, see the explanation presented in Research Methods in Biomechanics.

For this tutorial, we will compare the knee flexion angles of participants with osteoarthritis against those of the normal control group. Our problem is to explain the differences in knee flexion angles between osteoarthritic walking and normal walking. We can accomplish this by defining two groups (one per condition) for the knee angle signal, performing PCA, and interpreting the results.

Data

This tutorial uses overground walking data from roughly 100 subjects divided into two conditions: normal control and osteoarthritis (moderate to severe). This data set is included in the Demo folder of your Inspect3D installation.

Load and query the CMO library

As with previous tutorials, we begin by loading the CMO library and defining the queries relevant to our question. In this case we will manually create two queries based on tags.

We begin by defining a query for subjects with the OA tag (indicating osteoarthritis).

1. Select Query Dialog

2. Add Query

3. Select Modify Name

4. Enter OA as the name of the Query

5. Select Save and Perform Query (a single button performs both actions). There is only one Condition in this tutorial, so the default names are fine.

6. Select Condition Zero

7. Select the Modify Button

8. Select the Signals Tab (There are no events in this data set)

9. Select the RKnee_Angle Signal (The only signal in this data set)

10. Select the Refinement Tab

11. Select OA (This is a TAG for subjects with Osteoarthritis)

12. Select SAVE

Next we will define a query for subjects with the NC tag (indicating Normal Control). In this case we can easily modify our previous query rather than starting from scratch.

1. Add A New Query

2. Select Modify Name

3. Rename to NC

4. Change the TAG from OA to NC

5. Select Save

6. Select Perform Query

7. Close the Query Dialog

There will now be two Query Groups in the main window's Groups widget. Selecting either of these will display the associated workspaces in the Workspace widget below.

Exploring the data

We can verify the queries that we produced in the first section by visualizing our traces.

1. Select All Groups

2. Select All Signals

3. Select Plot Subject Mean

4. Select Refresh Plot

Inspecting this plot shows that, although participants in the two groups walk very differently when observed in person, their knee flexion angles are quite similar. Because the traces of the two groups overlap substantially throughout the entire gait cycle, conventional statistics will likely not be useful for describing the differences between them.

This is one of the motivations behind PCA: by transforming our original data into a coordinate system based on principal components we will end up with a few dimensions that explain most of the variance in the data set. This, in turn, will help us to explain and detect the differences between the groups.

Running Principal Component Analysis

1. Select the desired groups and subjects in the Groups and Workspaces widgets

2. Open the PCA Options dropdown menu on the toolbar

3. Set Number PCs to 4

4. Click Run PCA

4.1 The main results from these calculations are the eigenvectors (loading vectors) for each principal component, which describe the shape of the variability, and the subject scores for each principal component, which describe the magnitude of the variability.

4.2 After clicking Run PCA on Selected Groups, the results should appear in the PCA Graphs on the far right side of the screen.

4.3 If these are not visible then open the PCA Options dropdown menu and then select Show PCA Graphs.
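
To make the two main outputs concrete, here is a minimal NumPy sketch of waveform PCA that produces loading vectors and subject scores. It is not Inspect3D's implementation: the function `waveform_pca`, the synthetic waveforms, and the choice of a singular value decomposition are all assumptions made for illustration.

```python
import numpy as np

def waveform_pca(waveforms, n_components):
    """PCA on an (n_subjects, n_points) array of time-normalized waveforms."""
    mean_waveform = waveforms.mean(axis=0)
    centered = waveforms - mean_waveform
    # Rows of vt are the eigenvectors of the covariance matrix,
    # ordered from largest to smallest variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]        # shape of the variability
    scores = centered @ loadings.T      # magnitude per subject and PC
    variances = singular_values**2 / (waveforms.shape[0] - 1)
    return mean_waveform, loadings, scores, variances

# Synthetic stand-in data: 20 subjects, 101 points per gait cycle.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
base = 30.0 * np.sin(2.0 * np.pi * t) ** 2
offsets = rng.normal(0.0, 5.0, size=(20, 1))
amplitudes = rng.normal(1.0, 0.1, size=(20, 1))
noise = rng.normal(0.0, 0.5, size=(20, 101))
waveforms = offsets + amplitudes * base + noise

mean_waveform, loadings, scores, variances = waveform_pca(waveforms, 4)
print(loadings.shape, scores.shape)  # (4, 101) (20, 4)
```

Each row of `loadings` is one loading vector over the gait cycle, and each column of `scores` holds one score per subject for the corresponding principal component.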

Interpreting PCA results

The results from the PCA are described in 6 windows: Variance Explained, Loading Vector, Subject Scores, Group Scores, Extreme Plot, and PC Reconstruction. The views provided by each of these windows are described in full here.

Variance Explained

The Variance Explained window displays the variance explained by each principal component as well as the cumulative variance explained across the principal components. Hovering over each bar will display the exact variance explained by that principal component.

It's important to verify that the principal components that have been calculated explain most of the data set's variability. A good heuristic is to use enough principal components to explain 95% of the data set's variability; otherwise there will be at least a moderate amount of variation that your analysis has not captured. In this example, our 4 principal components explain 96% of the data set's variability, which is sufficient, so we can continue the exploration. If less than 95% of the data set's variance were explained, we should re-run the analysis with more principal components.
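
The 95% heuristic above can be sketched in Python as a short calculation. This is an illustration only: the function name `components_needed` and the eigenvalues are hypothetical, chosen so that four components reach 96%, and they are not the values Inspect3D computes for the demo data.

```python
import numpy as np

def components_needed(eigenvalues, threshold=0.95):
    """Return per-PC and cumulative variance explained, plus the smallest
    number of PCs whose cumulative variance reaches the threshold."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    explained = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(explained)
    # Index of the first cumulative value >= threshold, converted to a count.
    n_needed = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against floating-point shortfall when the threshold is close to 1.
    n_needed = min(n_needed, len(eigenvalues))
    return explained, cumulative, n_needed

# Hypothetical eigenvalues (variance along each PC), largest first.
example_eigenvalues = [60, 25, 8, 3, 2, 1, 1]
explained, cumulative, n_needed = components_needed(example_eigenvalues)
print(n_needed, np.round(cumulative, 2))
```

With these made-up eigenvalues, three components reach only 93%, so a fourth is needed to pass the threshold.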

Group Scores

Next, the Group Scores window helps us determine whether there are significant differences between the groups for any principal component. This is established by looking at the mean score and standard error for each principal component. If the mean score +/- the standard error does not cross the x-axis, then there are significant differences in these scores. For this example, this is the case for PC1 and PC2, which will therefore be the focus of the remaining investigation.
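
The visual check described above can be approximated in Python as follows. This sketch assumes the standard error is the sample standard deviation divided by the square root of the group size; the function `group_score_summary` and the scores are hypothetical, and Inspect3D may compute its error bars differently.

```python
import numpy as np

def group_score_summary(scores, labels):
    """Per group: mean score, standard error, and whether mean +/- SE
    stays on one side of zero, for every principal component."""
    summary = {}
    for group in np.unique(labels):
        group_scores = scores[labels == group]
        mean = group_scores.mean(axis=0)
        sem = group_scores.std(axis=0, ddof=1) / np.sqrt(len(group_scores))
        excludes_zero = np.abs(mean) > sem
        summary[str(group)] = (mean, sem, excludes_zero)
    return summary

# Hypothetical scores: columns are PC1 and PC2, rows are subjects.
scores = np.array([[2.0, -1.0], [3.0, 0.0], [4.0, 1.0],
                   [-2.0, 1.0], [-3.0, 0.0], [-4.0, -1.0]])
labels = np.array(["OA", "OA", "OA", "NC", "NC", "NC"])

summary = group_score_summary(scores, labels)
for group, (mean, sem, excludes_zero) in summary.items():
    print(group, mean, np.round(sem, 3), excludes_zero)
```

In this made-up example the first component separates the groups while the second does not. Note that this is a screening heuristic that mirrors the plot, not a formal hypothesis test.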

Subject Scores

Once we have identified our principal components of interest, switch to the Subject Scores window and plot the chosen principal components against one another (i.e., for this tutorial plot PC1 versus PC2). We want to be able to distinguish our groups, and in this case there is a clear separation between the groups in this 2D space. Although PC1 may describe more of the data set's variability than PC2, PC2 can discriminate between the groups better (this isn't shown mathematically here, but can be seen if the visual inspection is followed by a cluster analysis). Of course, PC1 and PC2 can discriminate between the groups even better if they are used together instead of separately.
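
One simple way to put a number on how well a single principal component separates two groups is a standardized mean difference (Cohen's d) per component. This is a swapped-in illustration, not the cluster analysis mentioned above and not an Inspect3D feature; the function name and the scores are hypothetical.

```python
import numpy as np

def standardized_difference(scores_a, scores_b):
    """Cohen's d per principal component, using the pooled sample variance."""
    n_a, n_b = len(scores_a), len(scores_b)
    pooled_var = ((n_a - 1) * scores_a.var(axis=0, ddof=1)
                  + (n_b - 1) * scores_b.var(axis=0, ddof=1)) / (n_a + n_b - 2)
    return (scores_a.mean(axis=0) - scores_b.mean(axis=0)) / np.sqrt(pooled_var)

# Hypothetical scores: PC1 has a large spread but a small group difference,
# PC2 has a small spread but a clear group difference.
oa_scores = np.array([[10.0, 2.0], [-10.0, 3.0], [0.0, 4.0]])
nc_scores = np.array([[12.0, -2.0], [-8.0, -3.0], [2.0, -4.0]])

d = standardized_difference(oa_scores, nc_scores)
print(d)
```

A component can therefore carry most of the variance (PC1 here) and still discriminate less well than a lower-variance component (PC2 here).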

If you want to investigate the original data for a particular subject, click on the subject's scatter point and the corresponding data will be highlighted in the original Queried Data plot. Similarly, if you click on a trace in the original Queried Data plot then the corresponding subject is highlighted in the Subject Score widget. This can be useful to identify outliers.

Extreme Plot

The Extreme Plot widget allows us to animate the principal components to get some further insight into our problem.

Select the Mean and Slider option, then select PC1 and click play. This shows that PC1 captures variability as an offset in the signal. Vertical offsets in gait data can occur for a variety of reasons, including differences in standing calibration posture. Therefore, PC1 may explain a large amount of variability that is not very helpful for answering our question.

Repeat the same steps but change the PC to 2 (top right drop down menu). This animation tells a different story. The differences here are largest during late stance and mid swing. The OA group shows less range of motion than the normal control group. This observation makes sense considering the differences between the two conditions.
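
The idea behind animating a principal component can be sketched as adding and subtracting a scaled loading vector from the mean waveform. This is a common way to visualize a component and is assumed here for illustration; the function `reconstruct_extremes`, the scaling by standard deviations of the scores, and the data are hypothetical and may differ from what the Extreme Plot widget does.

```python
import numpy as np

def reconstruct_extremes(mean_waveform, loading, pc_scores, n_sd=1.0):
    """Return the waveforms at mean - n_sd*SD and mean + n_sd*SD of one PC."""
    spread = n_sd * np.std(pc_scores, ddof=1)
    low = mean_waveform - spread * loading
    high = mean_waveform + spread * loading
    return low, high

# Hypothetical mean waveform and a flat, unit-length loading vector,
# which behaves like the vertical offset described for PC1.
t = np.linspace(0.0, 1.0, 101)
mean_waveform = 30.0 * np.sin(np.pi * t) ** 2
offset_loading = np.ones(101) / np.sqrt(101)
pc_scores = np.array([-10.0, 0.0, 10.0])

low, high = reconstruct_extremes(mean_waveform, offset_loading, pc_scores)
print(np.round(high[0] - low[0], 3), np.round(high[50] - low[50], 3))
```

With a flat loading vector the two extremes differ by the same amount at every point of the gait cycle, which is what a pure offset looks like. A loading vector concentrated in late stance and mid swing would instead change the curve only in those regions.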
