Inspect3D Tutorial 3: Difference between revisions

From Software Product Documentation
(Edited for formatting and to remove duplicated content from elsewhere in the documentation.)

Revision as of 18:19, 28 October 2022


This tutorial shows you how to use Inspect3D to perform Principal Component Analysis (PCA) on data from a CMO library. For a full treatment of waveform-based PCA as a way to find differences in waveform data, see the explanation presented in Research Methods in Biomechanics (http://www2.c-motion.com/support/overview/textbook).

For this tutorial, we will compare the knee flexion angles of participants with osteoarthritis against those of a normal control group. Our problem is to explain the differences in knee flexion angles between osteoarthritic walking and normal walking. We can accomplish this by defining two query groups (one per condition), performing PCA, and interpreting the results.

Data

This tutorial uses overground walking data from roughly 100 subjects. The data is divided into two conditions, normal control and osteoarthritis (moderate to severe). This data set is included in the download for Inspect3D in the demo folder.

Load and query the CMO library

As with previous tutorials, we begin by loading the CMO library and defining the queries relevant to our question. In this case we will manually create two queries based on tags.

We begin by defining a query for subjects with the OA tag (indicating osteoarthritis).

1. Select Query Dialog

2. Add Query

3. Select Modify Name

4. Enter OA as the name of the Query

5. Select Save and Perform Query (a single button does both). There is only one Condition in this tutorial, so the default names are fine.

6. Select Condition Zero

7. Select the Modify Button

8. Select the Signals Tab (There are no events in this data set)

9. Select the RKnee_Angle Signal (The only signal in this data set)

10. Select the Refinement Tab

11. Select OA (This is a TAG for subjects with Osteoarthritis)

12. Select SAVE

Next we will define a query for subjects with the NC tag (indicating Normal Control). In this case we can easily modify our previous query rather than starting from scratch.

1. Add A New Query

2. Select Modify Name

3. Rename to NC

4. Change the TAG from OA to NC

5. Select Save

6. Select Perform Query

7. Close the Query Dialog

There will now be two Query Groups in the main window's Groups widget. Selecting either of these will display the associated workspaces in the Workspace widget below.

Exploring the data

We can verify the queries that we produced in the first section by visualizing our traces.

1. Select All Groups

2. Select All Signals

3. Select Plot Subject Mean

4. Select Refresh Plot

Inspecting this plot shows that although the participants in the two groups walk very differently (when watching them in person), their knee flexion angles are quite similar. Because the traces of the two groups overlap significantly throughout the entire gait cycle, conventional statistics will likely not be useful for describing the differences between these two groups.

This is one of the motivations behind PCA: by transforming our original data into a coordinate system based on principal components we will end up with a few dimensions that explain most of the variance in the data set. This, in turn, will help us to explain and detect the differences between the groups.

Running Principal Component Analysis

1. Select the desired groups and subjects in the Groups and Workspaces widgets

2. Open the PCA Options dropdown menu on the toolbar

3. Set Number PCs to 4

4. Click Run PCA

4.1 The main results from these calculations are the eigenvectors (loading vectors) for each principal component, which describe the shape of the variability, and the subject scores for each principal component, which describe the magnitude of the variability.

4.2 After clicking Run PCA on Selected Groups, the results should appear in the PCA Graphs on the far right side of the screen.

4.3 If these are not visible then open the PCA Options dropdown menu and then select Show PCA Graphs.
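The quantities named in step 4.1 can be illustrated outside Inspect3D. The following is a minimal sketch, assuming each subject contributes one time-normalized waveform sampled at the same points; it is not Inspect3D's implementation. The function name waveform_pca and the synthetic curves are hypothetical stand-ins for the tutorial's knee angle data.

```python
import numpy as np

def waveform_pca(waveforms, n_components):
    """PCA on an (n_subjects, n_points) array of time-normalized waveforms."""
    X = np.asarray(waveforms, dtype=float)
    mean_curve = X.mean(axis=0)
    centered = X - mean_curve
    # SVD of the centered data: the rows of vt are the eigenvectors
    # (loading vectors) of the sample covariance matrix.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]        # shape (n_components, n_points)
    scores = centered @ loadings.T      # shape (n_subjects, n_components)
    variance = s ** 2 / (X.shape[0] - 1)
    explained = variance[:n_components] / variance.sum()
    return mean_curve, loadings, scores, explained

# Synthetic stand-in data (an assumption, not the demo library):
# 40 subjects, 101 points, varying in vertical offset and amplitude.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
offset = rng.normal(0.0, 5.0, size=(40, 1))
amplitude = rng.normal(60.0, 8.0, size=(40, 1))
noise = rng.normal(0.0, 1.0, size=(40, 101))
curves = offset + amplitude * np.sin(np.pi * t) ** 2 + noise

mean_curve, loadings, scores, explained = waveform_pca(curves, n_components=4)
print(loadings.shape, scores.shape)
print(explained.round(3))
```

Each row of loadings is the shape of one principal component across the cycle, and each column of scores holds how strongly every subject expresses that shape.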

Interpreting PCA results

The results from the PCA are presented in six windows: Variance Explained, Loading Vector, Subject Scores, Group Scores, Extreme Plot, and PC Reconstruction. The views provided by each of these windows are described in full in the Inspect3D Graphs Section documentation (Inspect3D_Graphs_Section).

Variance Explained

The Variance Explained window displays the variance explained by each principal component as well as the cumulative variance across successive principal components. Hovering over a bar displays the exact variance explained by that principal component.

It's important to verify that the calculated principal components explain a large share of the data set's variability. A good heuristic is to use enough principal components to explain 95% of the data set's variability; otherwise there will be at least a moderate amount of variation that your analysis has not captured. In this example, our 4 principal components explain 96% of the data set's variability, which is sufficient, so we can continue the exploration. If less than 95% of the data set's variance were explained, we should re-run the analysis with more principal components.
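The 95% heuristic can be expressed as a short check. This is a sketch only: the function name components_needed is hypothetical, and the per-component fractions are invented for illustration. Only the 4-component, 96% cumulative total is taken from this tutorial.

```python
def components_needed(explained_fractions, threshold=0.95):
    """Smallest number of leading PCs whose cumulative explained variance
    reaches the threshold, or None if the listed PCs never reach it."""
    cumulative = 0.0
    for count, fraction in enumerate(explained_fractions, start=1):
        cumulative += fraction
        if cumulative >= threshold:
            return count
    return None

# Hypothetical per-PC fractions; the first four sum to 0.96.
example = [0.60, 0.22, 0.09, 0.05, 0.02]
print(components_needed(example))
```

A return value of None corresponds to the case described above, where the analysis should be re-run with more principal components.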

Group Scores

Next, the Group Scores window helps us determine whether there are significant differences between the groups for any principal component. This is established by looking at each group's mean score for a principal component together with its standard error. If the mean score +/- the standard error does not cross the x-axis, then there are significant differences in these scores. For this example, this is the case for PC1 and PC2, which will therefore be the focus of the remaining investigation.
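The visual check on the error bars can be sketched numerically. The example below assumes the standard error is the sample standard deviation divided by the square root of the group size; the function names and the score values are hypothetical. It mirrors the visual screening described above and is not a formal hypothesis test.

```python
from math import sqrt
from statistics import mean, stdev

def group_score_summary(scores):
    """Mean and standard error of one group's scores on one PC."""
    m = mean(scores)
    se = stdev(scores) / sqrt(len(scores))
    return m, se

def excludes_zero(m, se):
    """True when the interval mean +/- SE lies entirely on one side of zero."""
    return (m - se) > 0 or (m + se) < 0

# Hypothetical scores on a single PC for two small groups.
oa_scores = [-3.1, -2.4, -4.0, -1.8, -2.9]
nc_scores = [0.4, -0.6, 0.9, -0.2, -0.3]

for name, values in (("OA", oa_scores), ("NC", nc_scores)):
    m, se = group_score_summary(values)
    print(name, round(m, 2), round(se, 2), excludes_zero(m, se))
```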

Subject Scores

Once we have identified our principal components of interest, switch to the Subject Scores window and plot the chosen principal components against one another (for this tutorial, plot PC1 versus PC2). We want to be able to distinguish our groups, and in this case there is a fairly clear separation between the groups in this 2D space. Although PC1 may describe more of the data set's variability than PC2, PC2 discriminates between the groups better (this isn't shown mathematically here, but can be seen if the visual inspection is followed by a cluster analysis). Of course, PC1 and PC2 discriminate between the groups even better when they are used together rather than separately.

If you want to investigate the original data for a particular subject, click on the subject's scatter point and the corresponding data will be highlighted in the original Queried Data plot. Similarly, if you click on a trace in the original Queried Data plot then the corresponding subject is highlighted in the Subject Score widget. This can be useful to identify outliers.

Extreme Plot

The Extreme Plot widget allows us to animate the principal components to get some further insight into our problem.

Select the Mean and Slider option, then select PC1 and click play. This shows that PC1 captures variability as an offset in the signal. Vertical offsets in gait data can occur for a variety of reasons, including differences in standing calibration posture. Therefore, PC1 may explain a significant amount of variability without being very helpful for answering our question.

Take the same steps again but change the PC to 2 (top right drop down menu). This animation shows a different story. The differences here are the largest during late stance and mid swing. The OA group shows less range of motion compared to the normal control group. This observation makes a lot of sense considering differences in the two conditions.
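What the animation shows can be approximated by adding a scaled loading vector to the mean curve. The sketch below assumes the extremes are the mean plus or minus k standard deviations of the subject scores times the loading vector; the function name extreme_curves and all values are hypothetical, and Inspect3D's exact computation may differ.

```python
from statistics import stdev

def extreme_curves(mean_curve, loading, scores, k=1.0):
    """Return (low, high) curves: mean -/+ k * SD(scores) * loading."""
    spread = k * stdev(scores)
    low = [m - spread * w for m, w in zip(mean_curve, loading)]
    high = [m + spread * w for m, w in zip(mean_curve, loading)]
    return low, high

# Hypothetical four-point example with a constant loading vector.
mean_curve = [10.0, 20.0, 40.0, 20.0]
offset_loading = [0.5, 0.5, 0.5, 0.5]
scores = [-4.0, -2.0, 0.0, 2.0, 4.0]

low, high = extreme_curves(mean_curve, offset_loading, scores, k=2.0)
print([round(h - m, 3) for h, m in zip(high, mean_curve)])
```

Because the loading vector is constant, the high and low curves sit a fixed distance above and below the mean at every point, which is the vertical offset pattern described for PC1. A loading vector that is large only in late stance and mid swing would instead change the curve mainly in those phases, as described for PC2.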
