Visual3D Tutorial: Looking at Large Public Datasets

Revision as of 19:52, 12 January 2024


Biomechanics is entering a new age in which large datasets are the norm, and processing files individually is tiresome and time-consuming. Because they are captured over long periods, datasets can be inconsistent, which makes it difficult to automate work with them.

In this tutorial, we go through common issues encountered when working with large public biomechanics datasets, using a large published dataset that tracks the full-body gait of both stroke survivors and healthy participants.

Sections highlighted like this show when actions should be performed, if you are following along with the tutorial.

Large Datasets

The dataset used for this tutorial is the raw motion data collected by Van Criekinge et al. ([reference]), which covers the full-body motion capture gait of 138 able-bodied adult participants and 50 stroke survivors. It includes full-body kinematics (PiG model), kinetics, and EMG data across a wide variety of ages, heights and weights. The dataset is openly available, which allows for free and open science, supports the growing biomedical research community, and makes it easier for scientists to reproduce results.

This dataset is unusual in its size and quality for an open dataset. Large datasets are essential for drawing accurate conclusions from scientific observations, and they are becoming the norm within the biomechanics community. Because they are large collections of data, they bring their own problems, which are explored here. Such datasets can be generated over a long period of time, during which people, standards or equipment may change. They can also be the culmination of several different projects, which leads to further inconsistencies. The most frequent issues we found were inconsistencies within the measured dataset. It is difficult to keep a rigid structure in any large dataset, but inconsistencies can wreck the ability to automate tasks, so we show how to process the data in large datasets like this one effectively.


Dataset

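The dataset used for able-bodied participants is available at https://springernature.figshare.com/articles/dataset/c3d_files_of_3D_full-body_gait_kinematics_kinetics_and_EMG_of_138_able-bodied_adults/24192480?backTo=/collections/A_full-body_motion_capture_gait_dataset_of_138_able-bodied_adults_across_the_life_span_and_50_stroke_survivors/6503791, while the dataset used for stroke survivors is available at https://springernature.figshare.com/articles/dataset/c3d_files_of_3D_full-body_gait_kinematics_kinetics_and_EMG_of_50_adults_with_stroke/24192483?backTo=/collections/A_full-body_motion_capture_gait_dataset_of_138_able-bodied_adults_across_the_life_span_and_50_stroke_survivors/6503791.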

The authors have made an already-processed dataset available for use, but for the purposes of this tutorial, we shall be using the original, raw data.

File Naming Conventions

A common inconsistency in large datasets is the way files are named. Filenames should convey information about a file in a manner that can be easily referenced. For example, calibration files should be named differently from motion files so that they can be separated, and motion files should have a trial number in their file name (if applicable).

In this dataset, we found inconsistencies within filenames, as well as one outright naming error. Among the stroke survivors in particular, there was no consistent naming convention across the motion files, other than that all of them include the letters “BWA”. The trial number was written in several different ways: separated by a space, separated by an underscore, or not separated at all, and sometimes with a “0” prepended. Ultimately this forces the user to use wildcards to collect all the motion files at once, and limits the ability to select individual motion files.


We also observed that stroke survivor 47 (TVC47) has a mislabeled calibration file. It must be renamed from “BWA (0)” to something that follows the calibration naming convention, such as “TVC-47 Cal 0”.
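
The naming variations described above can be handled outside Visual3D before any files are loaded. The following is a minimal Python sketch, not part of the original tutorial, that extracts a trial number from a motion file name regardless of the separator or leading zero. The example file names are hypothetical illustrations of the patterns described, not actual names from the dataset.

```python
import re

# Matches "BWA" followed by optional spaces/underscores and then the trial digits.
# Assumption: motion files carry the trial number directly after "BWA".
TRIAL_PATTERN = re.compile(r"BWA[ _]*(\d+)", re.IGNORECASE)


def trial_number(filename):
    """Return the trial number encoded in a motion file name, or None."""
    match = TRIAL_PATTERN.search(filename)
    if match is None:
        return None
    # int() discards any leading zero, so "02" and "2" compare equal.
    return int(match.group(1))


if __name__ == "__main__":
    # Hypothetical file names covering the separators described in the text.
    for name in ["BWA 1.c3d", "BWA_02.c3d", "BWA3.c3d", "BWA (0).c3d"]:
        print(name, "->", trial_number(name))
```

A name such as “BWA (0)” does not match the pattern and yields None, which is one way to flag the mislabeled calibration file for manual renaming.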

Inconsistent Subject Prefixes (Stroke Survivors)

All the markers in the static files provided are prefixed with the subject ID, but the markers in the dynamic files are not. This prevents the built model from linking with the dynamic markers. A pipeline can be made to prepend the subject ID to all the marker points in the dynamic files. The subject ID is stored in the PARAMETERS::SUBJECTS::NAMES signal within the dynamic files, but it uses a dash instead of an underscore (“BWA-2” vs “BWA_2”), so the names still won't match. String expressions can be used to convert the subject ID strings to match the static files.

Set a pipeline parameter to a list of all dynamic files contained in the "50_StrokePiG" folder:
Set_Pipeline_Parameter_To_List_Of_Files
/PARAMETER_NAME=DYNAMIC_FILES
/FOLDER=C:\Users\shane\Documents\Tutorial Study\50_StrokePiG
/SEARCH_SUBFOLDERS=TRUE
/FILE_MASK=*BWA*.c3d
/ALPHABETIZE=FALSE
! /RETURN_FOLDER_NAMES=FALSE
;


Loop through all dynamic files:
For_Each
/ITERATION_PARAMETER_NAME=DYNAMIC_FILE
! /ITERATION_PARAMETER_COUNT_NAME=
/ITEMS=::DYNAMIC_FILES
;


Open the current file:
File_Open
/FILE_NAME=::DYNAMIC_FILE
! /FILE_PATH=
! /SUFFIX=
! /SET_PROMPT=File_Open
! /ON_FILE_NOT_FOUND=PROMPT
! /FILE_TYPES_ON_PROMPT=
;


Get the participant ID from PARAMETERS::SUBJECTS::NAMES:
Set_Pipeline_Parameter_To_Data_Value
/SIGNAL_TYPES=PARAMETERS
/SIGNAL_FOLDER=SUBJECTS
/SIGNAL_NAMES=NAMES
! /SIGNAL_COMPONENTS=ALL_COMPONENTS
/PARAMETER_NAME=SUBJECT_ID
;


Get the part of the participant ID that precedes the dash using the STRING_LEFT expression. Since all IDs begin with "BWA", the character count can be hard-coded as 3:
Set_Pipeline_Parameter_From_Expression
/PARAMETER_NAME=LEFT_SUBJECT_ID
/EXPRESSION=STRING_LEFT("&::SUBJECT_ID&",3)
/AS_INTEGER=FALSE
;


Get the length of the participant id using the STRING_LENGTH expression:
Set_Pipeline_Parameter_From_Expression
/PARAMETER_NAME=SUBJECT_ID_LENGTH
/EXPRESSION=STRING_LENGTH("&::SUBJECT_ID&")
! /AS_INTEGER=TRUE
;


Get the index of the dash using the STRING_FIND expression:
Set_Pipeline_Parameter_From_Expression
/PARAMETER_NAME=DASH_INDEX
/EXPRESSION=STRING_FIND("&::SUBJECT_ID&","-",0)
! /AS_INTEGER=TRUE
;


Subtract DASH_INDEX from SUBJECT_ID_LENGTH, then subtract one more, to get the number of characters that follow the dash. This is the value STRING_RIGHT needs (it assumes STRING_FIND returns a zero-based index, so "BWA-2" gives 5 - 3 - 1 = 1):
Set_Pipeline_Parameter_From_Expression
/PARAMETER_NAME=DASH_NEGATIVE_INDEX
/EXPRESSION=&::SUBJECT_ID_LENGTH&-&::DASH_INDEX&-1
/AS_INTEGER=TRUE
;


Get the characters in the subject id following the dash using STRING_RIGHT:
Set_Pipeline_Parameter_From_Expression
/PARAMETER_NAME=RIGHT_SUBJECT_ID
/EXPRESSION=STRING_RIGHT("&::SUBJECT_ID&",&::DASH_NEGATIVE_INDEX&)
!/AS_INTEGER=FALSE
;


Concatenate LEFT_SUBJECT_ID and RIGHT_SUBJECT_ID with an underscore between them:
Set_Pipeline_Parameter
/PARAMETER_NAME=SUBJECT_ID_CONVERTED
/PARAMETER_VALUE=_
! /PARAMETER_VALUE_SEARCH_FOR=
! /PARAMETER_VALUE_REPLACE_WITH=
/PARAMETER_VALUE_PREFIX=::LEFT_SUBJECT_ID
/PARAMETER_VALUE_APPEND=::RIGHT_SUBJECT_ID
! /MULTI_PASS=FALSE
;


Get a list of all target names:
Set_Pipeline_Parameter_To_List_Of_Signal_Names
/PARAMETER_NAME=TARGETS
/SIGNAL_TYPES=TARGET
/SIGNAL_FOLDER=ORIGINAL
;


Using the updated subject ID and the list of target names, use the Modify_C3D_Subjects_Parameters pipeline function to add the needed subject prefixes, ensuring OVERWRITE_C3D_FILE is set to true so the changes are preserved:
Modify_C3D_Subjects_Parameters
! /INCLUDE_CAL_FILE=TRUE
/IS_STATIC=0
/SUBJECTS_USES_PREFIXES=1
! /REMOVE_EXISTING_LABEL_PREFIXES=TRUE
! /REMOVE_EXISTING_PREFIX_PARAMETERS=FALSE
/SUBJECTS_LABEL_PREFIXES=::SUBJECT_ID_CONVERTED
/PREFIX_MARKERS=::TARGETS
! /PREFIX_ROTATIONS=
! /PREFIX_C3D_PARAMETERS=
! /C3D_PARAMETER_GROUPS=SUBJECTS
! /MARKER_SETS=
/SUBJECTS_NAMES=::SUBJECT_ID_CONVERTED
! /SUBJECTS_DISPLAY_SETS=
! /SUBJECTS_MODELS=
! /SUBJECTS_MODEL_PARAMS=
/OVERWRITE_C3D_FILE=1
;


Clear the workspace before the next iteration to free memory:
File_New
;

End the loop:
End_For_Each
/ITERATION_PARAMETER_NAME=DYNAMIC_FILE
;
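
The string arithmetic used in the loop above can be checked outside Visual3D before it is run against the whole dataset. The following is a minimal Python sketch, not part of the original pipeline, that mirrors the STRING_LEFT, STRING_LENGTH, STRING_FIND and STRING_RIGHT steps one by one. It assumes that STRING_FIND returns a zero-based index and that STRING_RIGHT takes a character count; the function name is hypothetical.

```python
def convert_subject_id(subject_id):
    """Convert a dashed subject ID such as 'BWA-2' to the underscore form 'BWA_2'."""
    left = subject_id[:3]               # STRING_LEFT(id, 3): all IDs begin with "BWA"
    length = len(subject_id)            # STRING_LENGTH(id)
    dash_index = subject_id.find("-")   # STRING_FIND(id, "-", 0), assumed zero-based
    if dash_index == -1:
        # No dash present: leave the ID unchanged rather than guess.
        return subject_id
    chars_after_dash = length - dash_index - 1        # value passed to STRING_RIGHT
    right = subject_id[length - chars_after_dash:]    # STRING_RIGHT(id, chars_after_dash)
    return left + "_" + right           # prefix + "_" + append


if __name__ == "__main__":
    for sid in ["BWA-2", "BWA-47"]:
        print(sid, "->", convert_subject_id(sid))
```

Working through "BWA-2" by hand: the length is 5 and the dash is at index 3, so one character follows the dash and the result is "BWA_2".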

Inconsistent Subject Prefixes (Able Bodied)

The markers in the static files of the able-bodied participants do not have subject prefixes, but the markers in a seemingly arbitrary subset of the dynamic files do. In this case it makes sense to strip the subject prefixes from the files that have them, instead of adding them to every other file.


Set a pipeline parameter to a list of all C3D files in the 138_HealthyPiG_10.05 folder, ensuring that SEARCH_SUBFOLDERS is set to true:
Set_Pipeline_Parameter_To_List_Of_Files
/PARAMETER_NAME=FILES
/FOLDER=C:\Users\shane\Documents\Tutorial Study\138_HealthyPiG_10.05
/SEARCH_SUBFOLDERS=TRUE
/FILE_MASK=*.c3d
! /ALPHABETIZE=TRUE
! /RETURN_FOLDER_NAMES=FALSE
;


Loop through the list of files:
For_Each
/ITERATION_PARAMETER_NAME=FILE
! /ITERATION_PARAMETER_COUNT_NAME=
/ITEMS=::FILES
;


Open the current file:
File_Open
/FILE_NAME=::FILE
! /FILE_PATH=
! /SUFFIX=
! /SET_PROMPT=File_Open
! /ON_FILE_NOT_FOUND=PROMPT
! /FILE_TYPES_ON_PROMPT=
;

Use the Modify_C3D_Subjects_Parameters pipeline function to clear the participant prefixes from the files. Set OVERWRITE_C3D_FILE to true so the changes are preserved, and keep all other parameters at their default values:
Modify_C3D_Subjects_Parameters
! /INCLUDE_CAL_FILE=TRUE
! /IS_STATIC=
! /SUBJECTS_USES_PREFIXES=
! /REMOVE_EXISTING_LABEL_PREFIXES=TRUE
! /REMOVE_EXISTING_PREFIX_PARAMETERS=FALSE
! /SUBJECTS_LABEL_PREFIXES=
! /PREFIX_MARKERS=
! /PREFIX_ROTATIONS=
! /PREFIX_C3D_PARAMETERS=
! /C3D_PARAMETER_GROUPS=SUBJECTS
! /MARKER_SETS=
! /SUBJECTS_NAMES=
! /SUBJECTS_DISPLAY_SETS=
! /SUBJECTS_MODELS=
! /SUBJECTS_MODEL_PARAMS=
/OVERWRITE_C3D_FILE=TRUE
;


Close the currently open file to free memory, and then end the loop:
File_Close
/FILE_NAME=::FILE
! /QUERY=
! /CLOSE_ASSOCIATED_MODELS=FALSE
;

End_For_Each
/ITERATION_PARAMETER_NAME=FILE
;

Building Models

The dataset used the Plug-in Gait marker set. Unfortunately, not all of the standard Plug-in Gait markers were used; for example, the right and left upper arm markers are missing. As a result, only a lower-body model can be built. See the tutorial here[1] for a walkthrough of building a lower-body Plug-in Gait model.

The subset of healthy subjects used in this tutorial has three markers for the pelvis, as opposed to the four used for the stroke survivors, so two separate models need to be made. The default values provided in the tutorial can be used to build the models, as the subject data will be entered via the processing pipeline.

Processing C3Ds into CMZs (Stroke)

Instead of manually processing each C3D file (assigning the models, running the computations and saving the result as a CMZ), a pipeline can be developed to automate the process. Since the parameters needed to accurately calculate the models are stored within the dynamic files, those files must be opened first and the data retrieved so that it can then be applied to the model.


Processing C3Ds into CMZs (Healthy)

Missing Parameters

Upon inspection, some of the dynamic files were missing parameters needed to calculate the model (height, mass, ankle widths, and knee widths), or had them under incorrect names. For the stroke subjects, this data was provided in a spreadsheet, so the values can be entered manually and the models recalculated. Unfortunately, no such data exists for the healthy subjects, so files missing the parameters (SUBJ37 and SUBJ42) must be excluded.
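
A check of this kind can be scripted once the parameters have been read out of each file. The following is a minimal Python sketch, not part of the original tutorial, that reports which subjects lack a required value. It assumes the parameters have already been extracted into a dictionary per subject by some other means; the parameter names and the example values are hypothetical placeholders, not names or data taken from the dataset.

```python
# Hypothetical parameter names standing in for height, mass, knee and ankle widths.
REQUIRED = (
    "Height", "Bodymass",
    "LKneeWidth", "RKneeWidth",
    "LAnkleWidth", "RAnkleWidth",
)


def missing_parameters(subject_params, required=REQUIRED):
    """Return {subject: [missing names]} for subjects lacking any required value."""
    report = {}
    for subject, params in subject_params.items():
        # A parameter counts as missing if it is absent or explicitly None.
        missing = [name for name in required if params.get(name) is None]
        if missing:
            report[subject] = missing
    return report


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example = {
        "SUBJ_A": {name: 1.0 for name in REQUIRED},
        "SUBJ_B": {"Height": 1700.0, "Bodymass": None},
    }
    print(missing_parameters(example))
```

Subjects that appear in the report can then be either completed from a spreadsheet, as for the stroke subjects, or excluded, as for the healthy subjects.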

Building Gait Events

Due to inconsistent force plate data, gait events cannot be reliably determined using kinetics, so kinematics must be used instead. A tutorial on the process can be found here.


Conclusion

In this tutorial we identified common pitfalls that occur when processing a large biomechanics dataset, worked through an example in which we encountered them first hand, and presented practical solutions. We have made all of our files open, and you are encouraged to attempt the same problem and to build upon it by analyzing the post-processed data.

References

Van Criekinge et al. A full-body motion capture gait dataset of 138 able-bodied adults across the life span and 50 stroke survivors: [2]
