
Using K-means to cluster kinetic features in above the knee amputees

Abstract

TODO

Data

Public Data Set

The publicly available dataset used in this project is credited to Hunt et al., authors of the paper "Open dataset of kinetics, kinematics, and electromyography of above-knee amputees during stand-up and sit-down". The dataset includes 3D kinematic and kinetic data for 9 above-knee amputees performing stand-up and sit-down with their passive, microprocessor-controlled prostheses. The data were captured using a 12-camera motion capture system, two force plates, and four EMG sensors on the intact lower limb.

In this tutorial, we will use the whole dataset, which can be downloaded from their original website as V3D_STS.zip. Within this folder, each participant has one workspace, for a total of 9 workspaces, and these workspaces can be loaded directly into Visual3D and Sift.

In Visual3D you can see the full-body model, which was built using a modified Plug-in Gait model.

Kinetic Data

We will focus on the Ground Reaction Force (GRF), Knee Joint Moment, and Hip Joint Moment, all normalized to body weight. These link-model-based items were created using the pipelines provided with the dataset, and you can find the pipeline files under the path “V3D_STS/V3D_Pipeline_Files”. Below is an example of the pipeline command for generating the Knee Joint Moment (Torque).

Compute_Model_Based_Data
/RESULT_NAME=L_knee_torque
/SUBJECT_TAG=ALL_SUBJECTS
/FUNCTION=JOINT_MOMENT
/SEGMENT=LSK
/REFERENCE_SEGMENT=
/RESOLUTION_COORDINATE_SYSTEM=LTH
! /USE_CARDAN_SEQUENCE=FALSE
/NORMALIZATION=TRUE
/NORMALIZATION_METHOD=DEFAULT_NORMALIZATION
! /NORMALIZATION_METRIC=
! /NEGATEX=FALSE
/NEGATEY=TRUE
/NEGATEZ=TRUE
! /AXIS1=X
! /AXIS2=Y
! /AXIS3=Z
! /TREADMILL_DATA=FALSE
! /TREADMILL_DIRECTION=UNIT_VECTOR(0,1,0)
! /TREADMILL_SPEED=0.0
;

Methods

Visual3D Processing

Although the dataset provides completed .cmz workspaces, some processing is required to prepare them for analysis.

1. Download V3D_STS.zip

2. Add tags to each workspace: Each workspace corresponds to a single participant. Since there are only 9 participants in this dataset, we will add the tags manually. The table shown below is from the original paper and includes the demographics and details of all participants. In this tutorial, we will add tags based on two of its columns: “Prosthesis Side” and “Knee Prosthesis”.

  • Load workspace: In Visual3D, select File → Open/Add…, go to the V3D_STS folder downloaded previously, and open a workspace. As an example, open the workspace for participant TF01: 20210916_TF01_STS.cmz. You should see the workspace loaded as follows:

  • Add tags: Click Add New File Tag, and a pop-up window titled Enter the new file tag will appear. According to the table, TF01 has “L” as the “Prosthesis Side”, so we enter “Left” in the pop-up window and click Continue. Similarly, since TF01 uses the “C-Leg” as the “Knee Prosthesis”, we add another tag named “CLeg”. Make sure both tags are checked.
    • Note that tags must not contain hyphens, which is why the tag is “CLeg” rather than “C-Leg”.

Now that we have added the tags for TF01, we can repeat the same steps for the other workspaces.

Loading Data into Sift

Within Sift we can visualize and analyze the data.

Select Load Library and click Browse to go to the V3D_STS folder containing all 9 workspaces with updated tags. Then select Load and close the window. Make sure the tags appear on the screen. You can also click the + button to the left of each workspace to confirm that the corresponding tags are checked.

Build Queries in Sift

In our analysis we have two main objectives:

  1. To compare the kinetic signals of the intact side vs. the prosthetic side.
  2. To compare the kinetic signals of the three knee prosthesis brands (i.e. C-Leg, Rheo, Plie).

To do so we will focus on Ground Reaction Force, Knee Joint Moment in Flexion/Extension, and Hip Joint Moment in Flexion/Extension.

Queries for Objective 1

We will build queries for Ground Reaction Force as an example; the steps and logic for Knee Joint Moment and Hip Joint Moment are the same.

Click the Query Builder icon on the Explore page or on the toolbar to open the Query Builder dialog.

Create a new query with Query Name: GRF_intact. This will be the query for the intact side of all participants. Within this query, create the following conditions:

1. Condition Name: left prosthesis CLeg.

  • This condition includes all participants whose left leg is the prosthetic leg AND who use the C-Leg knee prosthesis.
  • In the Signals tab, select the following settings:
    • Since these participants' left leg is the prosthetic leg, the GRF of the intact leg is that of the RIGHT leg. Thus, R_GRF is selected as the Signal Name.
    • Since the Z-axis is the vertical axis of the coordinate system in this study, we select Z for the Component to use the vertical component of the ground reaction force (acting upward from the floor on the foot).

  • In the Events tab, the All Events block should list two events: Foot Off and Foot Strike. Select Foot Off and click the > button to add it to the Event Sequence block, then select Foot Strike and click the > button to add it as well. If you selected the wrong event, click the < button to move it from the Event Sequence block back to the All Events block. You can also use the Up and Down buttons to reorder the events in the sequence.

  • In the Refinement tab, check Refine using tag and Use AND Logic, and then select the tags CLeg and Left. Then click Save.

2. Condition Name: right prosthesis CLeg.

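  • This name indicates that these participants' right leg is the prosthesis, so the left leg is the intact leg.
  • Signals tab: same settings as the previous condition, except that L_GRF is selected as the Signal Name, since the left leg is the intact leg.

  • Events tab: Same as previous.
  • Refinement tab: check Refine using tag and Use AND Logic, and then select the tags CLeg and Right.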

  • Click Save.

At this point, we have created the conditions for C-Leg. The same steps are repeated for Plie and Rheo; the only difference is selecting the corresponding brand tags.


After creating the GRF_intact query, we can use the same logic to create the GRF_prosthesis query. Note that in this case, when creating, for example, the left prosthesis CLeg condition, we select L_GRF as the Signal Name, because the left leg is the prosthetic leg.


The queries for Hip Moment and Knee Moment have the same logic as above.
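The condition logic above (refine by tags with AND logic, then pick the opposite side's signal for the intact query and the same side's signal for the prosthesis query) can be sketched in Python. This only illustrates the selection rules; it is not how Sift implements queries. The `Workspace` type, the helper functions, the `Right` tag name, and the hypothetical workspace `TFxx` are assumptions for illustration.

```python
from dataclasses import dataclass

BRANDS = ("CLeg", "Plie", "Rheo")


@dataclass(frozen=True)
class Workspace:
    """Hypothetical stand-in for a Sift workspace and its file tags."""
    name: str
    tags: frozenset


def matches_all(ws: Workspace, required: set) -> bool:
    """'Use AND Logic': the workspace must carry every required tag."""
    return required <= ws.tags


def signal_for(ws: Workspace, base: str, side_wanted: str) -> str:
    """Return e.g. 'R_GRF' for the intact or prosthetic side of a workspace."""
    if "Left" in ws.tags:
        prosthetic, intact = "L", "R"
    elif "Right" in ws.tags:  # assumed tag name for right-side prostheses
        prosthetic, intact = "R", "L"
    else:
        raise ValueError(f"{ws.name} has no Left/Right tag")
    prefix = intact if side_wanted == "intact" else prosthetic
    return f"{prefix}_{base}"


def build_query(workspaces, base: str, side_wanted: str) -> dict:
    """One condition per (side, brand) tag pair, like GRF_intact / GRF_prosthesis."""
    selected = {}
    for side in ("Left", "Right"):
        for brand in BRANDS:
            for ws in workspaces:
                if matches_all(ws, {side, brand}):
                    selected[ws.name] = signal_for(ws, base, side_wanted)
    return selected


workspaces = [
    Workspace("TF01", frozenset({"Left", "CLeg"})),   # tags from the tutorial
    Workspace("TFxx", frozenset({"Right", "Rheo"})),  # hypothetical participant
]
print(build_query(workspaces, "GRF", "intact"))      # {'TF01': 'R_GRF', 'TFxx': 'L_GRF'}
print(build_query(workspaces, "GRF", "prosthesis"))  # {'TF01': 'L_GRF', 'TFxx': 'R_GRF'}
```

Swapping `"GRF"` for a moment signal base follows the same side logic, which is why the Hip and Knee Moment queries mirror the GRF ones.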

Queries for Objective 2

For this objective, we compare the kinetic data of the prosthetic leg across brands. Thus, we will create a query for each brand: C-Leg, Plie, and Rheo. For illustration purposes, we will take C-Leg as an example; the idea is the same for the other two brands.

  • On the Explore page, click the Query Builder icon.
  • Create a new query and set Query Name to “CLeg GRF_prosthesis” to indicate that this query is for participants who use the C-Leg as their knee prosthesis, focusing on the Ground Reaction Force of the prosthetic leg.
  • Within this query, create two conditions:

1. A condition named left prosthesis, for participants whose left leg is the prosthetic leg.

  • Signals :
    • Type: LINK_MODEL_BASED
    • Folder: ORIGINAL
    • Signal Name: L_GRF
    • Component: Z
  • Event Sequence: Foot Off followed by Foot Strike
  • Refinement:
    • Refine using tag: checked
    • Use AND Logic: checked
    • Tags: CLeg, Left
    • Click Save

2. A condition named right prosthesis, for participants whose right leg is the prosthetic leg.

  • Signals :
    • Type: LINK_MODEL_BASED
    • Folder: ORIGINAL
    • Signal Name: R_GRF
    • Component: Z
  • Event Sequence: Foot Off followed by Foot Strike
  • Refinement:
    • Refine using tag: checked
    • Use AND Logic: checked
    • Tags: CLeg, Right
    • Click Save

Now that we have built the C-Leg query for GRF, we can do the same for Knee Joint Moment and Hip Joint Moment. Note that the Component should be set to X for Knee Joint Moment and Hip Joint Moment, because the flexion/extension moment is the component about the X-axis of the resolution coordinate system, whereas for GRF we used the vertical Z component.

After all queries are created, click on Calculate All Queries at the bottom of the Query Builder Dialog.

Visualizing Data in Sift

Before moving on to the analysis, we can visualize the data to see whether we can identify any patterns between groups.

As an example, we can quickly visualize the GRF difference between the intact and prosthetic sides:

  • Go to the Explore page
  • Select both GRF_intact and GRF_prosthesis in the Groups block by holding Ctrl, and then check Select All Workspaces.
  • Check Plot Group Mean and Plot Group Dispersion.
  • A plot like the following is generated:

The X-axis shows the normalized time points, and the movement phases fall in the following ranges:

  • Standing Up: Point 0 ~ 40
  • Standing: Point 40 ~ 60
  • Sitting Down: Point 60 ~ 100
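Plot Group Mean and Plot Group Dispersion summarize time-normalized traces point by point. The Python sketch below illustrates that idea, assuming linear interpolation to 101 points (0 to 100, matching the axis above) and the sample standard deviation as the dispersion measure. Sift's exact resampling and dispersion settings may differ, and the function names are hypothetical.

```python
import math


def time_normalize(signal, n_points=101):
    """Linearly resample a trace to n_points samples (0..100 % of the trial)."""
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    last = len(signal) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into the original trace
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def group_mean_sd(traces):
    """Point-by-point mean and sample SD across equally long traces."""
    n = len(traces)
    means, sds = [], []
    for values in zip(*traces):
        m = sum(values) / n
        var = sum((v - m) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
        means.append(m)
        sds.append(math.sqrt(var))
    return means, sds


trace_a = time_normalize([0.0, 10.0])  # traces of different raw lengths...
trace_b = time_normalize([2.0, 6.0, 12.0])
mean, sd = group_mean_sd([trace_a, trace_b])  # ...can be averaged after normalization
print(len(mean), mean[0], mean[-1])  # 101 1.0 11.0
```

Resampling first is what makes point 40 or point 60 comparable across trials of different durations.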

Once we are satisfied with the visualized data, we can proceed with the analysis.

PCA Analysis

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space to reveal patterns and visualize variations within the data. In this project, we will show how to perform PCA to visualize the variations of kinetic features between intact leg and prosthetic leg, as well as across the three prosthetic brands. The PCA scores will then be used to perform K-means clustering.
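For waveform data, PCA treats each time-normalized trace as one row of a matrix (with Use Workspace Mean unchecked, presumably one row per individual trace rather than per participant mean). The stdlib-only Python sketch below centers the traces, builds the covariance matrix of the time points, and extracts loading vectors by power iteration with deflation. This solver is swapped in only to avoid dependencies; it is not Sift's implementation, and a library SVD would normally be used instead.

```python
import math
import random


def pca(traces, n_pcs, iters=500, seed=0):
    """Waveform PCA. Each row of `traces` is one time-normalized trace.

    Returns (mean curve, loading vectors, eigenvalues, PC scores per trace).
    """
    n, p = len(traces), len(traces[0])
    mean = [sum(col) / n for col in zip(*traces)]
    centered = [[x - m for x, m in zip(row, mean)] for row in traces]
    # p x p sample covariance matrix of the time points
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    loadings, eigvals = [], []
    for _pc in range(n_pcs):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _it in range(iters):  # power iteration: v <- C v / |C v|
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        loadings.append(v)
        eigvals.append(lam)
        # deflation: remove this component so the next pass finds the next PC
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    # PC score = projection of each centered trace onto each loading vector
    scores = [[sum(r[k] * v[k] for k in range(p)) for v in loadings] for r in centered]
    return mean, loadings, eigvals, scores


# Toy "traces" with 2 time points each, all lying along one direction
traces = [[t, 2 * t] for t in (-2, -1, 0, 1, 2)]
mean, loadings, eigvals, scores = pca(traces, 2)
print(round(eigvals[0], 6), [round(x, 4) for x in loadings[0]])  # 12.5 [0.4472, 0.8944]
```

The PC scores returned here are the coordinates that the K-means step later clusters.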

The following PCA results are generated for further analysis.

For Objective 1

  • Groups: GRF_intact, GRF_prosthesis
  • PCA Name: GRF_all_together
  • Number of PCs: 4
  • Use Workspace Mean: Unchecked

  • Groups: Hip_Moment_prosthesis, Hip_Moment_intact
  • PCA Name: Hip_Moment_all_together
  • Number of PCs: 4
  • Use Workspace Mean: Unchecked

  • Groups: Knee_Moment_intact, Knee_Moment_prosthesis
  • PCA Name: Knee_Moment_all_together
  • Number of PCs: 4
  • Use Workspace Mean: Unchecked

For Objective 2

  • Groups: CLeg GRF_prosthesis, Plie GRF_prosthesis, Rheo GRF_prosthesis
  • PCA Name: GRF Prosthesis
  • Number of PCs: 7
  • Use Workspace Mean: Unchecked

  • Groups: CLeg Knee Moment prosthesis, Plie Knee Moment prosthesis, Rheo Knee Moment prosthesis
  • PCA Name: Knee Moment Prosthesis
  • Number of PCs: 6
  • Use Workspace Mean: Unchecked

  • Groups: CLeg Hip Moment prosthesis, Plie Hip Moment prosthesis, Rheo Hip Moment prosthesis
  • PCA Name: Hip Moment Prosthesis
  • Number of PCs: 7
  • Use Workspace Mean: Unchecked

K-means Analysis

K-means is an unsupervised clustering technique that groups similar data points into “clusters”: each point is assigned to the cluster whose centroid is nearest in Euclidean distance. In our project, we will use K-means clustering to address our two objectives:

  1. Investigating whether K-means can find the clusters for “Intact” and “Prosthetic” based on the PC scores of kinetic data.
  2. Investigating whether K-means would be able to cluster the three prosthetic brands based on GRF.
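K-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A stdlib-only Python sketch of this standard algorithm (Lloyd's algorithm) on PC-score vectors is shown below. The random initialization from the data points is an assumption; Sift's initialization and stopping rules may differ.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means on PC-score vectors; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # assumed init
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy (PC1, PC2) scores forming two well-separated groups
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(pts, 2)
print(labels)  # first three points share one label, last three the other
```

Because the cluster numbers are arbitrary, the labels must later be matched to the real groups before judging accuracy.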

The following shows the settings for performing K-means in Sift.

Settings for Objective 1

  • Go to the Analyse page, select the PCA Results which you would like to perform K-means analysis on.
  • From the toolbar, click the Outlier Detection Using PCA icon and select K-means from the drop-down menu.
  • Use the following settings to perform K-means:

Settings for Objective 2

  • Go to the Analyse page, select the PCA Results which you would like to perform K-means analysis on.
  • From the toolbar, click the Outlier Detection Using PCA icon and select K-means from the drop-down menu.
  • Use the following settings to perform K-means. The K-means settings are the same for all cases; we only need to change the “Number of Clusters” parameter accordingly.

Results / Discussion

Visualizing Data in Sift

Comparison of Ground Reaction Force of Intact side vs. Prosthesis side

We can see that the GRF of the intact side is much greater than that of the prosthetic side, indicating that participants put more weight on the intact leg when standing up and sitting down.

Comparison of Knee Moment of Intact side vs. Prosthesis side

Comparison of Hip Moment of Intact side vs. Prosthesis side

All three figures above show the same pattern: there is a noticeable difference in Ground Reaction Force, Knee Joint Moment and Hip Joint Moment between the intact and prosthetic legs during the standing-up and sitting-down motions. This is especially obvious for Ground Reaction Force, where the standard deviation bands of the two curves barely overlap.

Knee Moment of Prosthesis side across three brands

Hip Moment of Prosthesis side across three brands

When comparing the joint moments across the three prosthetic brands, the differences are much less obvious: the standard deviation bands of the three brands overlap considerably.

PCA Analysis

For interpretation, we only take Ground Reaction Force as an example.

Variance Explained

Our 4 principal components explain 81.3% of the dataset's variability, and Principal Component 1 (PC1) alone explains 59.3%. In practice, it is recommended to increase the number of principal components until 90% to 95% of the variability is explained; since this analysis is for demonstration purposes only, we continue the exploration with 4 PCs.
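The percentage of variance explained by each PC is its eigenvalue divided by the sum of all eigenvalues (the total variance), and the cumulative percentage is the running sum. The Python sketch below computes both and finds the smallest number of PCs that reaches a target such as 90%. The eigenvalues in the usage line are hypothetical, not taken from this dataset, and the function names are illustrative.

```python
def variance_explained(eigenvalues):
    """Per-PC and cumulative fraction of total variance (eigenvalues of ALL PCs)."""
    total = sum(eigenvalues)
    ratios = [ev / total for ev in eigenvalues]
    cumulative, running = [], 0.0
    for r in ratios:
        running += r
        cumulative.append(running)
    return ratios, cumulative


def pcs_needed(eigenvalues, target=0.90):
    """Smallest number of PCs whose cumulative explained variance reaches target."""
    _, cumulative = variance_explained(eigenvalues)
    for k, c in enumerate(cumulative, start=1):
        if c >= target:
            return k
    return len(eigenvalues)


ev = [5.0, 3.0, 1.5, 0.3, 0.2]  # hypothetical eigenvalues
print(pcs_needed(ev, 0.90))  # 3
```

Note that the denominator must include every eigenvalue, not just the retained PCs; otherwise the retained PCs would always appear to explain 100%.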

Group Scores

Since we want to use PCA to distinguish between groups, we look at the group scores to find the PC that best separates the groups. In our case, this is PC1, which shows a large separation between the two groups.
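Reading group scores by eye can be complemented with a single number per PC. As an assumed heuristic (not something Sift reports), the Python sketch below computes, for two groups, the difference of mean PC scores divided by the pooled standard deviation (an effect size similar to Cohen's d) and picks the PC with the largest value. The scores and group names are illustrative only.

```python
import statistics


def separation_per_pc(scores, groups):
    """|mean_a - mean_b| / pooled SD for each PC (column of `scores`)."""
    names = sorted(set(groups))
    if len(names) != 2:
        raise ValueError("this sketch compares exactly two groups")
    result = []
    for pc in range(len(scores[0])):
        a = [s[pc] for s, g in zip(scores, groups) if g == names[0]]
        b = [s[pc] for s, g in zip(scores, groups) if g == names[1]]
        pooled_var = ((len(a) - 1) * statistics.variance(a)
                      + (len(b) - 1) * statistics.variance(b)) / (len(a) + len(b) - 2)
        result.append(abs(statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5)
    return result


# Illustrative scores (rows = traces, columns = PC1, PC2)
scores = [[-3.0, 0.5], [-2.0, -0.5], [2.0, 0.4], [3.0, -0.4]]
groups = ["group_A", "group_A", "group_B", "group_B"]
d = separation_per_pc(scores, groups)
best = max(range(len(d)), key=d.__getitem__)
print(f"PC{best + 1} separates the groups best")  # PC1 separates the groups best
```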

Loading Vector

The loading vector shows which parts of the curve a PC describes. Regions where the loading vector is far from zero (either above or below it) are where that PC captures much of the variance. In this figure, we plot the loading vector for PC1 and see that the whole curve is above zero, which indicates that PC1 captures variation across the entire curve: a higher PC1 score raises the whole waveform, most strongly where the loading vector is largest.

Extreme Plot

The Extreme Plot lets us visualize each principal component as the mean curve +/- 2 standard deviations along that PC, which gives further insight into our problem. We see that the amplitude varies mostly around 20% and 80% of the normalized time, which correspond to the “standing up” and “sitting down” motions.
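One common way to build such curves is to add and subtract a multiple of the PC-score standard deviation, scaled by the loading vector, from the mean curve. This is assumed here rather than taken from Sift's documentation. With this construction, the gap between the two curves is proportional to the loading magnitude, so the extremes differ most where the loading vector is largest. The function name and toy inputs are hypothetical.

```python
import statistics


def extreme_curves(mean_curve, loading, pc_scores, n_sd=2.0):
    """Return (upper, lower) = mean +/- n_sd * SD(scores) * loading."""
    sd = statistics.stdev(pc_scores)  # sample SD of this PC's scores
    upper = [m + n_sd * sd * w for m, w in zip(mean_curve, loading)]
    lower = [m - n_sd * sd * w for m, w in zip(mean_curve, loading)]
    return upper, lower


# Toy 3-point example: the curves spread most where |loading| is largest
upper, lower = extreme_curves([1.0, 1.0, 1.0], [0.0, 1.0, 0.5], [-1.0, 1.0])
gaps = [u - lo for u, lo in zip(upper, lower)]
print(max(range(len(gaps)), key=gaps.__getitem__))  # 1
```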

Workspace Scores

Workspace scores help us visualize the variation of the dataset along each principal component and see whether there is a clear separation between groups. In this plot, the points are colored by workspace (i.e. each color indicates a single participant, and each dot represents a single trace of that participant). Each participant has data points on both sides of the PC1 = 0 vertical line, which reflects the difference between that participant's intact and prosthetic sides, and the data points form clusters on both sides.

K-means Analysis

To evaluate how well K-means differentiates the groups, we compare the cluster labels with the real labels from the original data. To do this, click the Show Data Options icon on the toolbar, and in the pop-up window, select Cluster in the Display Styles From… block to color the data points by K-means cluster label, or select Group to color them by real group label.
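Because K-means numbers its clusters arbitrarily, comparing the two colorings first requires matching each cluster to a real group. One simple way to quantify the comparison, assumed here and not computed by the tutorial in Sift, is to map each cluster to its majority real label, then count the points that disagree (the misclustered points) and the resulting purity. The labels below are toy values.

```python
from collections import Counter


def cluster_purity(cluster_labels, true_labels):
    """Map each cluster to its majority true label.

    Returns (mapping, purity, indices of misclustered points).
    """
    by_cluster = {}
    for c, t in zip(cluster_labels, true_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    mapping = {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}
    wrong = [i for i, (c, t) in enumerate(zip(cluster_labels, true_labels))
             if mapping[c] != t]
    purity = 1 - len(wrong) / len(true_labels)
    return mapping, purity, wrong


clusters = [0, 0, 0, 1, 1, 1]
truth = ["intact", "intact", "prosthesis", "prosthesis", "prosthesis", "prosthesis"]
mapping, purity, wrong = cluster_purity(clusters, truth)
print(mapping, round(purity, 3), wrong)  # {0: 'intact', 1: 'prosthesis'} 0.833 [2]
```

Majority mapping also works when the number of clusters differs from the number of groups, as in the K = 3 and K = 4 runs.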

Objective 1: Intact vs. Prosthetic

GRF

Real Labels

K-means

Below, the incorrectly clustered data points are circled, and a boundary line is drawn to separate the two clusters for better visualization.

We can see that K-means performs very well for differentiating the intact side and the prosthetic side given GRF.

Knee Joint Moment

Real Labels

K-means

Data points that are incorrectly clustered are circled:

We observe that although there are more misclustered data points, K-means still performs well overall at clustering the two groups.

Hip Joint Moment

Real Labels

K-means

Data points that are incorrectly clustered are circled, and a boundary line is drawn:

The result shows that K-means does a good job of clustering the two groups given Hip Joint Moments.

Objective 2: Across Three Prosthetic Brands

GRF

Real Labels

K-means (K = 2)

With only 2 clusters, K-means mostly captures the C-Leg and Plie clusters correctly.

Then, we also performed K-means with three and four clusters.

K = 3

K = 4

In these cases, we observed that K-means was not able to correctly identify the Rheo cluster, which makes sense because Rheo has few data points. However, C-Leg and Plie still form clusters to some extent.


Next, we look at Hip Joint Moment across the three brands to examine a situation where most waveforms overlap in the original data, and to see how K-means performs in such a case. We skip the three-brand comparison for Knee Joint Moment because it leads to conclusions similar to those for GRF.


Hip Joint Moment

Real Labels

We can see that the groups are all mixed together, indicating little clear difference among the three brands. This aligns with what we observed for the hip moment in the Visualizing Data in Sift section, where the standard deviation bands of the three brands overlapped considerably.

K-means (K = 2)

K = 3

K = 4

As expected, K-means doesn't correctly identify the clusters, as there is no clear separation between groups in the original data.

Conclusion

TODO

sift/tutorials/using_kmeans_to_cluster_kinetic_features_in_above_the_knee_amputees.txt · Last modified: 2025/05/13 20:27 by wikisysop