Inspect3D Tutorial: Perform Principal Component Analysis: Difference between revisions
m (Added Tutorials category.) |
(Updated tutorial for current version of Inspect3D.) |
||
Line 5: | Line 5: | ||
|} | |} | ||
This tutorial will show you how to use Inspect3D in order to perform Principal Component Analysis (PCA) using data from a CMO library. For a full treatment of waveform-based PCA to find differences in waveform data, see the explanation presented in [http://www2.c-motion.com/support/overview/textbook Research Methods in Biomechanics]. | This tutorial will show you how to use Inspect3D in order to perform [[Principal Component Analysis]] (PCA) using data from a [[CMO Library|CMO library]]. For a full treatment of waveform-based PCA to find differences in waveform data, see the explanation presented in [http://www2.c-motion.com/support/overview/textbook Research Methods in Biomechanics]. | ||
For this tutorial, we will be comparing the knee flexion angles between participants with osteoarthritis and the normal control group. Our problem is to provide an explanation for differences in knee flexion angles between osteoarthritic walking and normal walking. We can accomplish this by defining two groups that meet these signal definitions, performing PCA, and interpreting the results. | For this tutorial, we will be comparing the knee flexion angles between participants with osteoarthritis and the normal control group. Our problem is to provide an explanation for differences in knee flexion angles between osteoarthritic walking and normal walking. We can accomplish this by defining two groups that meet these signal definitions, performing PCA, and interpreting the results. | ||
==Data== | ==Data== | ||
This tutorial uses overground walking data from roughly 100 subjects divided into two conditions, normal control and osteoarthritis (moderate to severe). This data set is included in Demo folder of your Inspect3D installation. | This tutorial uses overground walking data from roughly 100 subjects divided into two conditions, normal control and osteoarthritis (moderate to severe). This data set is included in the Demo folder of your Inspect3D installation (e.g., C:\Program Files\C-Motion\Inspect3D\Demo). | ||
== | ==Set the library path to the data directory== | ||
As with previous tutorials, we begin by loading the CMO library and defining the queries relevant to our question | As with previous tutorials, we begin by loading the CMO library and defining the queries relevant to our question. | ||
1. Click [[Image:I3DLoadLibrary.png|20px]] <b>Load Library</b> in the [[Inspect3D_Documentation_ToolBar|toolbar]] to open the Load Library dialog. | |||
2. Click <b>Browse</b> and select the folder where the data is stored. (Choose .cmo file types when browsing to see the sample data files.) | |||
3. Click the <b>Load</b> button to import the data. |||
==Define queries and group signals== | |||
For this tutorial we will manually create two groups based on tags, one for subjects with osteoarthritis and one for normal control subjects. We begin by defining a query for subjects with the OA tag (indicating osteoarthritis). | |||
1. Click on the [[Image:I3DGroups.png|20px]] <b>Group Definitions</b> icon on the [[Inspect3D_Documentation_ToolBar|toolbar]] and then click on [[Image:I3DGroups.png|20px]] <b>Group Definitions</b> to open the Group Definitions Dialog. | |||
2. [[Image:ActionAdd48x48.png|20px]] Add a group, name it OA, and click <b>Save</b>. | |||
3. [[Image:ActionAdd48x48.png|20px]] Add a sub-group and name it OA. | |||
3.1. <b>Signals</b>: Set TYPE - DERIVED, FOLDER - PCA, NAME - RKNEE_ANGLE, COMPONENT - X. This is the only signal in the data set. | |||
3.2. <b>Events</b>: There are no events in this data set, so this tab can be skipped. | |||
3.3. <b>Refinements</b>: Check the <b>Refine using tag</b> checkbox and select the OA tag. | |||
3.4. Click <b>Save</b>. |||
Next we will define a query for subjects with the NC tag (indicating Normal Control). In this case we can easily modify our previous query rather than starting from scratch. | Next we will define a query for subjects with the NC tag (indicating Normal Control). In this case we can easily modify our previous query rather than starting from scratch. | ||
1. Add | 1. [[Image:ActionAdd48x48.png|20px]] Add a group, name it NC, and click <b>Save</b>. | ||
2. | 2. [[Image:ActionAdd48x48.png|20px]] Add a sub-group and name it NC. | ||
3. | 3. In the <b>Refinements</b> tab, change the selected tag from OA to NC. | ||
4. | 4. Click <b>Save</b>. | ||
You can verify here that the new NC group has the same signal and event selections as the OA group. Click <b>Calculate All Groups</b> and then close the Group Definitions dialog. | |||
==Exploring the data== | |||
There will now be two | There will now be two Groups in the main window's <b>Groups</b> list. Selecting either of these will display the associated workspaces in the <b>Workspace</b> list below. We can verify the queries that we produced in the first section by visualizing our traces. | ||
1. Select all groups and all workspaces. | |||
2. Check only the <b>Plot Workspace Mean</b> checkbox. | |||
3. Click <b>Refresh Plot</b>. | |||
The plot that is produced will not be very informative if the traces are not coloured by group, which is the comparison we are interested in. If this is the case, open the [[Image:I3DShowOptions|20px]] <b>Show Options</b> dialog from the application toolbar and, under <b>Plotting options</b>, set the <b>Plot style</b> to "Group". |||
[[Image:OA NC ByWorkspace.PNG|800px]] | |||
Once this is done, inspecting the plot shows that although participants from the two groups walk very differently, which is obvious when watching them in person, their knee flexion angles are quite similar. Because the traces overlap significantly between the groups throughout the entire gait cycle, conventional statistics will likely not be useful for describing the differences between these two groups. |||
[[Image: | [[Image:OA NC ByGroup.PNG|800px]] | ||
This is one of the motivations behind PCA: by transforming our original data into a coordinate system based on principal components we will end up with a few dimensions that explain most of the variance in the data set. This, in turn, will help us to explain and detect the differences between the groups. | This is one of the motivations behind PCA: by transforming our original data into a coordinate system based on principal components we will end up with a few dimensions that explain most of the variance in the data set. This, in turn, will help us to explain and detect the differences between the groups. | ||
Line 80: | Line 73: | ||
==Running Principal Component Analysis== | ==Running Principal Component Analysis== | ||
<div style="overflow: hidden"> | <div style="overflow: hidden"> | ||
1. | 1. Ensure that all groups and workspaces are selected in the Groups and Workspaces lists. | ||
2. Open the [[Image:I3D_PCAOptions2.png|30px]] <b>PCA Options</b> dropdown menu on the [[Inspect3D_Documentation_ToolBar|toolbar]]. | |||
3. Set <b>Number PCs</b> to 4. | |||
4. | 4. Ensure that <b>Use Workspace Mean</b> is checked. | ||
5. Click [[Image:I3D_RunPCA.png|30px]] <b>Run PCA</b>. | |||
[[Image: | 6. The results of these calculations will automatically populate the PCA graphs. If these aren't already displayed, click [[Image:I3D_PCAShowGraphs.png|20px]] <b>Show PCA Graphs</b> in the [[Image:I3D_PCAOptions2.png|30px]] <b>PCA Options</b> dropdown menu. | ||
[[Image:OA NC PCAResults.png]] | |||
</div> | </div> | ||
Line 104: | Line 95: | ||
===Variance Explained=== | ===Variance Explained=== | ||
<div style="overflow: hidden"> | <div style="overflow: hidden"> | ||
The Variance Explained window, which displays the variance explained by each principal component as well as the cumulative variance for each principal components. Hovering over each bar will display the exact variance explained by that principal component. | The Variance Explained window displays the variance explained by each principal component as well as the cumulative variance across the principal components. It is important to verify that the calculated principal components explain a sufficient amount of the data set's variability. A good heuristic is to retain enough principal components to explain 95% of the data set's variability; otherwise there will be at least a moderate amount of variation that your analysis has not captured. In this example, our 4 principal components explain 96% of the data set's variability, which is sufficient, so we can continue the exploration. If less than 95% of the data set's variance were explained, we should re-run the analysis with more principal components. | ||
[[Image:OA NC VarianceExplained.png]] | |||
Hovering over each bar will display the exact variance explained by that principal component. In this example, the second principal component explains 17.6% of the variation in the original data set. | |||
</div> | </div> | ||
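The 95% heuristic above can be sketched in a few lines of Python. This is only an illustration: `components_needed` is a hypothetical helper, and the variance ratios in the example are made-up values, not the results from the demo data set.

```python
import numpy as np

def components_needed(explained_ratio, threshold=0.95):
    """Return the smallest number of PCs whose cumulative explained
    variance reaches `threshold`, or None if it is never reached."""
    cumulative = np.cumsum(explained_ratio)
    # First index whose cumulative value is >= threshold
    idx = int(np.searchsorted(cumulative, threshold, side="left"))
    if idx >= len(cumulative):
        return None
    return idx + 1

# Hypothetical per-component variance ratios (illustrative values only)
example_ratios = [0.60, 0.20, 0.10, 0.06, 0.03, 0.01]
n_needed = components_needed(example_ratios)
```

If the helper returns `None`, the supplied components never reach the threshold, which corresponds to re-running the analysis with a larger <b>Number PCs</b>.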
===Group Scores=== | ===Group Scores=== | ||
<div style="overflow: hidden"> | <div style="overflow: hidden"> | ||
Next, the Group Scores window helps us to determine if there is a significant differences between the groups for any principal component. This is established by looking at each principal component's mean score and their standard errors | Next, the Group Scores window helps us to determine whether there are significant differences between the groups for any individual principal component. This is established by looking at each principal component's mean scores and their standard errors: if the mean scores for a principal component +/- their standard errors do not cross the x-axis, then there are significant differences in the scores for this principal component. In this tutorial, principal components 1 and 2 show significant differences between the groups and will therefore be the focus of the remaining investigation. | ||
[[Image:OA NC GroupScores.png]] | |||
As with the Variance Explained window, hovering over a point will display a tooltip with precise values for that group score. | |||
</div> | </div> | ||
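The visual check described for the Group Scores window can be approximated numerically. The sketch below assumes the scores for one principal component are available as an array with one group label per subject; `group_score_summary` is a hypothetical helper and the scores are toy values. It mirrors the "mean +/- standard error does not cross the x-axis" heuristic and is not a formal statistical test.

```python
import numpy as np

def group_score_summary(scores, labels):
    """For one PC, return {group: (mean, standard_error, clear_of_zero)}.

    clear_of_zero is True when the interval mean +/- SE does not
    contain zero, i.e. the error bar does not cross the x-axis.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for group in np.unique(labels):
        values = scores[labels == group]
        mean = values.mean()
        se = values.std(ddof=1) / np.sqrt(values.size)
        clear = not (mean - se <= 0.0 <= mean + se)
        summary[str(group)] = (float(mean), float(se), bool(clear))
    return summary

# Toy scores for illustration only (not from the demo data set)
toy_scores = [2.0, 3.0, 4.0, -2.0, -3.0, -4.0]
toy_labels = ["OA", "OA", "OA", "NC", "NC", "NC"]
summary = group_score_summary(toy_scores, toy_labels)
```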
===Subject Scores=== | ===Subject Scores=== | ||
<div style="overflow: hidden"> | <div style="overflow: hidden"> | ||
[[Image: | [[Image:OA NC SubjectScores.png|right|450px]] Having identified our principal components of interest, switch to the Subject Scores window and plot the chosen principal components against one another (i.e., for this tutorial, plot PC1 versus PC2). We are looking to distinguish our groups, and in this case there is a fairly clear separation between the groups in this 2D space. Although PC1 may describe more of the data set's variability than PC2, PC2 can discriminate between the groups better (this isn't mathematically shown here, but can be seen if the visual inspection is followed by a cluster analysis). Of course, PC1 and PC2 can discriminate between the groups even better if they are used together instead of separately. | ||
</div> | </div> | ||
Line 125: | Line 120: | ||
If you want to investigate the original data for a particular subject, click on the subject's scatter point and the corresponding data will be highlighted in the original Queried Data plot. Similarly, if you click on a trace in the original Queried Data plot then the corresponding subject is highlighted in the Subject Score widget. This can be useful to identify outliers. | If you want to investigate the original data for a particular subject, click on the subject's scatter point and the corresponding data will be highlighted in the original Queried Data plot. Similarly, if you click on a trace in the original Queried Data plot then the corresponding subject is highlighted in the Subject Score widget. This can be useful to identify outliers. | ||
[[Image: | [[Image:OA NC SubjectScores Highlighted.png]] | ||
</div> | </div> | ||
Line 133: | Line 128: | ||
The Extreme Plot widget allows us to animate the principal components to get some further insight into our problem. | The Extreme Plot widget allows us to animate the principal components to get some further insight into our problem. | ||
For the <b>Plot Type</b> drop-down menu, select the Mean and Slider option. Leave the <b>PC Number</b> set to 1 and then click <b>Play</b>. The mean reconstruction (gray in the figure below) will remain static while the other reconstruction (green in the figure below) will vary in the range Mean +/- 2 Standard Deviations. Watching this animation demonstrates that PC1 is capturing variability as an offset in the signal. Vertical offsets in gait data can occur for a variety of reasons, including differences in standing calibration posture. Therefore, PC1 may explain a significant amount of variability that is not very helpful for answering our question. | |||
[[Image:OA NC ExtremeValues.png]] | |||
Now select <b>PC Number</b> 2 and click <b>Play</b> again. This PC shows a different story, with the largest differences in this animation occurring during late stance and mid-swing. | |||
</div> | </div> | ||
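The two ends of the Mean +/- 2 Standard Deviations range can be sketched as follows. This assumes the common waveform-PCA reconstruction, mean waveform plus a scaled loading vector; whether Inspect3D computes its Extreme Plot exactly this way is an assumption, and `extreme_waveforms` is a hypothetical helper using toy inputs.

```python
import numpy as np

def extreme_waveforms(mean_waveform, loading, pc_scores, n_sd=2.0):
    """Reconstruct the low and high extremes for one PC as
    mean -/+ n_sd * SD(scores) * loading vector."""
    mean_waveform = np.asarray(mean_waveform, dtype=float)
    loading = np.asarray(loading, dtype=float)
    sd = np.std(pc_scores, ddof=1)
    low = mean_waveform - n_sd * sd * loading
    high = mean_waveform + n_sd * sd * loading
    return low, high

# Toy example: a constant loading vector shifts the whole waveform,
# the kind of vertical offset described for PC1 above.
toy_mean = np.zeros(5)
toy_loading = np.ones(5) / np.sqrt(5.0)   # unit-length, constant loading
toy_pc_scores = np.array([-1.0, 0.0, 1.0])  # sample SD is 1
low, high = extreme_waveforms(toy_mean, toy_loading, toy_pc_scores)
```

A loading vector that is large only in part of the gait cycle would instead make the two extremes diverge only in that region.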
[[Category:Inspect3D]] | [[Category:Inspect3D]] | ||
[[Category:Tutorials]] | [[Category:Tutorials]] |
Revision as of 20:48, 16 November 2022
This tutorial will show you how to use Inspect3D in order to perform Principal Component Analysis (PCA) using data from a CMO library. For a full treatment of waveform-based PCA to find differences in waveform data, see the explanation presented in Research Methods in Biomechanics.
For this tutorial, we will be comparing the knee flexion angles between participants with osteoarthritis and the normal control group. Our problem is to provide an explanation for differences in knee flexion angles between osteoarthritic walking and normal walking. We can accomplish this by defining two groups that meet these signal definitions, performing PCA, and interpreting the results.
Data
This tutorial uses overground walking data from roughly 100 subjects divided into two conditions, normal control and osteoarthritis (moderate to severe). This data set is included in the Demo folder of your Inspect3D installation (e.g., C:\Program Files\C-Motion\Inspect3D\Demo).
Set the library path to the data directory
As with previous tutorials, we begin by loading the CMO library and defining the queries relevant to our question.
1. Click Load Library in the toolbar to open the Load Library dialog.
2. Click Browse and select the folder where the data is stored. (Choose .cmo file types when browsing to see the sample data files.)
3. Click the Load button to import the data.
Define queries and group signals
For this tutorial we will manually create two groups based on tags, one for subjects with osteoarthritis and one for normal control subjects. We begin by defining a query for subjects with the OA tag (indicating osteoarthritis).
1. Click on the Group Definitions icon on the toolbar and then click on Group Definitions to open the Group Definitions Dialog.
2. Add a group, name it OA, and click Save.
3. Add a sub-group and name it OA.
3.1. Signals: Set TYPE - DERIVED, FOLDER - PCA, NAME - RKNEE_ANGLE, COMPONENT - X. This is the only signal in the data set.
3.2. Events: There are no events in this data set, so this tab can be skipped.
3.3. Refinements: Check the Refine using tag checkbox and select the OA tag.
3.4. Click Save.
Next we will define a query for subjects with the NC tag (indicating Normal Control). In this case we can easily modify our previous query rather than starting from scratch.
1. Add a group, name it NC, and click Save.
2. Add a sub-group and name it NC.
3. In the Refinements tab, change the selected tag from OA to NC.
4. Click Save.
You can verify here that the new NC group has the same signal and event selections as the OA group. Click Calculate All Groups and then close the Group Definitions dialog.
Exploring the data
There will now be two Groups in the main window's Groups list. Selecting either of these will display the associated workspaces in the Workspace list below. We can verify the queries that we produced in the first section by visualizing our traces.
1. Select all groups and all workspaces.
2. Check only the Plot Workspace Mean checkbox.
3. Click Refresh Plot.
The plot that is produced will not be very informative if the traces are not coloured by group, which is the comparison we are interested in. If this is the case, open the Show Options dialog from the application toolbar and, under Plotting options, set the Plot style to "Group".
Once this is done, inspecting the plot shows that although participants from the two groups walk very differently, which is obvious when watching them in person, their knee flexion angles are quite similar. Because the traces overlap significantly between the groups throughout the entire gait cycle, conventional statistics will likely not be useful for describing the differences between these two groups.
This is one of the motivations behind PCA: by transforming our original data into a coordinate system based on principal components we will end up with a few dimensions that explain most of the variance in the data set. This, in turn, will help us to explain and detect the differences between the groups.
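To make the idea of "a coordinate system based on principal components" concrete, here is a minimal sketch of waveform PCA using NumPy. It assumes each subject's waveform has been time-normalized to the same number of points. The data are synthetic, and `waveform_pca` is a hypothetical helper written for illustration; it is not Inspect3D's implementation.

```python
import numpy as np

def waveform_pca(waveforms, n_components):
    """PCA on an (n_subjects, n_points) array of waveforms.

    Returns the mean waveform, the loading vectors (one row per PC),
    the subject scores, and the fraction of variance explained per PC.
    """
    mean = waveforms.mean(axis=0)
    centered = waveforms - mean
    # Rows of vt are orthonormal loading vectors, ordered by variance
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]
    scores = centered @ loadings.T
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, loadings, scores, explained[:n_components]

# Synthetic waveforms: a shared shape with a random vertical offset,
# a random amplitude, and a little noise for each of 40 "subjects".
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
base = 30.0 * np.sin(2.0 * np.pi * t) ** 2
offsets = rng.normal(0.0, 5.0, size=(40, 1))
amplitudes = rng.normal(1.0, 0.1, size=(40, 1))
noise = rng.normal(0.0, 0.5, size=(40, 101))
waveforms = amplitudes * base + offsets + noise

mean, loadings, scores, explained = waveform_pca(waveforms, n_components=4)
```

Each row of `scores` places one subject in the new coordinate system, so a 101-point waveform is summarized by four numbers.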
Running Principal Component Analysis
Interpreting PCA results
The results from the PCA are described in 6 windows: Variance Explained, Loading Vector, Subject Scores, Group Scores, Extreme Plot, and PC Reconstruction. The views provided by each of these windows are described in full here.