Visual3D Tutorial: Looking at Large Public Datasets: Difference between revisions

Revision as of 19:07, 12 January 2024


Biomechanics is entering a new age in which large datasets are the norm, and processing files individually can be tiresome and time-consuming. Datasets captured over long periods of time can also be inconsistent, which makes it difficult to automate working with them.

In this tutorial, we will go through common issues seen while working with large public biomechanics datasets, using a large public dataset that records the full-body gait of both stroke survivors and healthy participants.

Sections highlighted like this show when actions should be performed, if you are following along with the tutorial.

Large Datasets

The dataset used for this tutorial is the raw motion data collected by Van Criekinge et al. [reference], which includes the full-body motion capture gait of 138 able-bodied adult participants and 50 stroke survivors. It includes full-body kinematics (PiG model), kinetics, and EMG data across a wide variety of ages, heights and weights. This is an open-source dataset, which allows for free and open science, supports the growing biomedical research community and makes it easier for scientists to reproduce results.

This dataset is unique in its size and quality for an open-source dataset. Large datasets are essential for drawing accurate conclusions from scientific observations, and they are becoming the norm within the biomechanics community. Because they are large collections of data, working with them often raises particular problems, which are explored here. Datasets such as these can be generated over a long period of time, during which people, standards or equipment may change. Large datasets can also be the culmination of several different projects, which leads to further inconsistencies. The most frequent issues we found were due to inconsistencies within the measured dataset. It can be difficult to keep a rigid structure in any large dataset, but inconsistencies can wreck the ability to automate tasks, so we will show how you can effectively process the data in large datasets like this.

Dataset

File Naming Conventions

A common inconsistency in large datasets is the way files are named. Filenames should convey information about a file in a manner that can be easily referenced. For example, calibration files should be named differently from motion files so that they can be separated, and motion files should have a trial number in their file name (if applicable).

In this dataset, we found inconsistencies within filenames, as well as one mislabeled file. Particularly within the stroke subjects, there was no consistent naming convention across the motion files, other than that they all include the letters “BWA”. The trial number was written in a variety of ways: separated by a space, by an underscore or by nothing at all, and sometimes with a “0” prepended. Ultimately this forces the user to use wildcards to collect all the motion files at once, and limits their ability to select motion files individually.
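As a rough illustration of the naming problem, the Python sketch below pulls the trial number out of several naming variants with a single regular expression. The example file names and the pattern are assumptions based only on the variants described above (space, underscore or no separator, optional leading zero); they are not taken from the dataset itself.

```python
import re

# Hypothetical pattern: "BWA", an optional space or underscore,
# optional leading zeros, then the trial number and the .c3d extension.
TRIAL_PATTERN = re.compile(r"BWA[ _]?0*(\d+)\.c3d$", re.IGNORECASE)


def trial_number(filename):
    """Return the trial number encoded in a motion-file name, or None."""
    match = TRIAL_PATTERN.search(filename)
    return int(match.group(1)) if match else None


def canonical_name(filename):
    """Return a normalized name such as 'BWA_01.c3d', or None if no trial number is found."""
    number = trial_number(filename)
    return None if number is None else f"BWA_{number:02d}.c3d"
```

With this sketch, hypothetical names such as "BWA 1.c3d", "BWA_01.c3d" and "BWA1.c3d" would all map to the same trial, while a name like "BWA (0).c3d" is left unmatched so that it can be reviewed by hand.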


It was also observed that Stroke Victim 47 (TVC47) had a mislabeled calibration file. This must be renamed from “BWA (0)” to something following their calibration conventions, such as “TVC-47 Cal 0”.

Inconsistent Subject Prefixes (Stroke)

All the markers in the static files provided are prefixed with the subject ID, but the markers in the dynamic files are not, which prevents the built model from linking with the dynamic markers. A pipeline can be made to prepend the subject ID to all the marker points in the dynamic files. The subject ID is stored in the PARAMETERS::SUBJECTS::NAMES signal within the dynamic files, but it uses a dash instead of an underscore (“BWA-2” vs “BWA_2”), so the names still won't match. String expressions can be used to convert the subject ID strings to match the static files.

Set a pipeline parameter to a list of all dynamic files contained in the "50_Stroke_PiG" folder:
Set_Pipeline_Parameter_To_List_Of_Files
/PARAMETER_NAME=DYNAMIC_FILES
/FOLDER=C:\Users\shane\Documents\Tutorial Study\50_StrokePiG
/SEARCH_SUBFOLDERS=TRUE
/FILE_MASK=*BWA*.c3d
/ALPHABETIZE=FALSE
! /RETURN_FOLDER_NAMES=FALSE
;

Loop through all dynamic files:
For_Each
/ITERATION_PARAMETER_NAME=DYNAMIC_FILE
! /ITERATION_PARAMETER_COUNT_NAME=
/ITEMS=::DYNAMIC_FILES
;

Open the current file:
File_Open
/FILE_NAME=::DYNAMIC_FILE
! /FILE_PATH=
! /SUFFIX=
! /SET_PROMPT=File_Open
! /ON_FILE_NOT_FOUND=PROMPT
! /FILE_TYPES_ON_PROMPT=
;

Get the participant ID from PARAMETERS::SUBJECTS::NAMES:
Set_Pipeline_Parameter_To_Data_Value
/SIGNAL_TYPES=PARAMETERS
/SIGNAL_FOLDER=SUBJECTS
/SIGNAL_NAMES=NAMES
! /SIGNAL_COMPONENTS=ALL_COMPONENTS
/PARAMETER_NAME=SUBJECT_ID
;

Get the part of the participant ID that precedes the dash using the STRING_LEFT expression. Since all IDs begin with BWA, the character count can be hard-coded:
Set_Pipeline_Parameter_From_Expression
/PARAMETER_NAME=LEFT_SUBJECT_ID
/EXPRESSION=STRING_LEFT("&::SUBJECT_ID&",3)
/AS_INTEGER=FALSE
;

Get the length of the participant ID using the STRING_LENGTH expression:
Set_Pipeline_Parameter_From_Expression
/PARAMETER_NAME=SUBJECT_ID_LENGTH
/EXPRESSION=STRING_LENGTH("&::SUBJECT_ID&")
! /AS_INTEGER=TRUE
;

Get the index of the dash using the STRING_FIND expression:
Set_Pipeline_Parameter_From_Expression
/PARAMETER_NAME=DASH_INDEX
/EXPRESSION=STRING_FIND("&::SUBJECT_ID&","-",0)
! /AS_INTEGER=TRUE
;

Subtract DASH_INDEX from SUBJECT_ID_LENGTH to get the negative index of the dash, then subtract one more to get the value needed for STRING_RIGHT (the negative index of the character immediately following the dash):
Set_Pipeline_Parameter_From_Expression
/PARAMETER_NAME=DASH_NEGATIVE_INDEX
/EXPRESSION=&::SUBJECT_ID_LENGTH&-&::DASH_INDEX&-1
! /AS_INTEGER=TRUE
;

Get the characters in the subject ID following the dash using STRING_RIGHT:
Set_Pipeline_Parameter_From_Expression
/PARAMETER_NAME=RIGHT_SUBJECT_ID
/EXPRESSION=STRING_RIGHT("&::SUBJECT_ID&",&::DASH_NEGATIVE_INDEX&)
! /AS_INTEGER=FALSE
;

Concatenate LEFT_SUBJECT_ID and RIGHT_SUBJECT_ID with an underscore between them:
Set_Pipeline_Parameter
/PARAMETER_NAME=SUBJECT_ID_CONVERTED
/PARAMETER_VALUE=_
! /PARAMETER_VALUE_SEARCH_FOR=
! /PARAMETER_VALUE_REPLACE_WITH=
/PARAMETER_VALUE_PREFIX=::LEFT_SUBJECT_ID
/PARAMETER_VALUE_APPEND=::RIGHT_SUBJECT_ID
! /MULTI_PASS=FALSE
;

Get a list of all target names:
Set_Pipeline_Parameter_To_List_Of_Signal_Names
/PARAMETER_NAME=TARGETS
/SIGNAL_TYPES=TARGET
/SIGNAL_FOLDER=ORIGINAL
;

Using the updated subject ID and the list of target names, use the Modify_C3D_Subjects_Parameters pipeline command to add the needed subject prefixes, ensuring OVERWRITE_C3D_FILE is set to true so that the changes are preserved:
Modify_C3D_Subjects_Parameters
! /INCLUDE_CAL_FILE=TRUE
/IS_STATIC=0
/SUBJECTS_USES_PREFIXES=1
! /REMOVE_EXISTING_LABEL_PREFIXES=TRUE
! /REMOVE_EXISTING_PREFIX_PARAMETERS=FALSE
/SUBJECTS_LABEL_PREFIXES=::SUBJECT_ID_CONVERTED
/PREFIX_MARKERS=::TARGETS
! /PREFIX_ROTATIONS=
! /PREFIX_C3D_PARAMETERS=
! /C3D_PARAMETER_GROUPS=SUBJECTS
! /MARKER_SETS=
/SUBJECTS_NAMES=::SUBJECT_ID_CONVERTED
! /SUBJECTS_DISPLAY_SETS=
! /SUBJECTS_MODELS=
! /SUBJECTS_MODEL_PARAMS=
/OVERWRITE_C3D_FILE=1
;

Clear the workspace before continuing the loop to conserve memory, and close the loop:
File_New
;
! End the loop
End_For_Each
/ITERATION_PARAMETER_NAME=DYNAMIC_FILE
;
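To make the index arithmetic in the loop easier to follow, the same dash-to-underscore conversion can be sketched in Python. This is only an illustration of the logic, not part of the Visual3D pipeline. The function name is hypothetical, and unlike the pipeline it takes the left part from the dash position rather than hard-coding three characters.

```python
def convert_subject_id(subject_id):
    """Replace the first dash in a subject ID with an underscore, e.g. 'BWA-2' -> 'BWA_2'."""
    dash_index = subject_id.find("-")  # zero-based position of the dash, -1 if absent
    if dash_index < 0:
        return subject_id  # nothing to convert

    # Equivalent of STRING_LEFT: everything before the dash.
    left = subject_id[:dash_index]

    # Length minus dash index minus one gives the number of characters after the dash.
    chars_after_dash = len(subject_id) - dash_index - 1

    # Equivalent of STRING_RIGHT: the last chars_after_dash characters.
    right = subject_id[len(subject_id) - chars_after_dash:]

    return f"{left}_{right}"
```

For "BWA-2" the length is 5 and find returns 3, so one character follows the dash and the result is "BWA_2".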

Inconsistent Subject Prefixes (Healthy)

The static files provided for the healthy subjects do not have subject prefixes, but a seemingly random selection of the dynamic files do. In this case it makes more sense to strip the subject prefixes from the files that have them than to add them to every other file.
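The idea of stripping a prefix only where one is present can be sketched in Python as follows. The colon separator and the example labels are assumptions made for illustration; check the separator actually used in the files before relying on this.

```python
def strip_subject_prefix(label, separator=":"):
    """Return a marker label without its leading subject prefix, if it has one.

    The separator between prefix and marker name is an assumption (":" by default).
    Labels without the separator are returned unchanged.
    """
    _prefix, found, name = label.partition(separator)
    return name if found else label


def strip_all(labels, separator=":"):
    """Apply strip_subject_prefix to every label in a list."""
    return [strip_subject_prefix(label, separator) for label in labels]
```

Because labels without a prefix pass through untouched, the same function can be run over every dynamic file without first checking which ones were prefixed.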

Building Models

The dataset uses the Plug-in Gait model set. Unfortunately, not enough markers were attached to the upper body to model it, so only a lower-body model can be made; see the tutorial here to walk through the process of making a lower-body Plug-in Gait model. The subset of healthy subjects used in this tutorial has three pelvis markers, as opposed to the four used for the stroke subjects, so two separate models need to be made. The default values provided in the tutorial can be used to build the model, as the subject data will be entered in the processing pipeline.

Processing C3Ds into CMZs (Stroke)

Instead of manually processing each C3D file (assigning the models, running the computations and saving the result as a CMZ), a pipeline can be developed to automate the process. Since the parameters needed to accurately calculate the models are stored within the dynamic files, these files must be opened first and the data retrieved so that it can then be applied to the model.

Processing C3Ds into CMZs (Healthy)

Missing Parameters

Upon inspection, some of the dynamic files were missing the parameters needed to calculate the model (height, mass, ankle widths and knee widths), or had them incorrectly named. For the stroke subjects this data was provided in a spreadsheet, so the values can be entered manually and the models recalculated. Unfortunately, no such data exists for the healthy subjects, so the files missing the parameters (SUBJ37 and SUBJ42) must be excluded.
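A check of this kind can be sketched in Python, assuming the parameters of each subject have already been read into a dictionary. The parameter names and the subject IDs used when trying it out are hypothetical placeholders, not the labels stored in the C3D files.

```python
# Hypothetical parameter names; the labels used in the actual files may differ.
REQUIRED_PARAMETERS = (
    "HEIGHT",
    "MASS",
    "LEFT_ANKLE_WIDTH",
    "RIGHT_ANKLE_WIDTH",
    "LEFT_KNEE_WIDTH",
    "RIGHT_KNEE_WIDTH",
)


def missing_parameters(parameters):
    """Return the required parameter names that are absent or have no value."""
    return [name for name in REQUIRED_PARAMETERS if parameters.get(name) is None]


def split_subjects(subjects):
    """Split {subject_id: parameters} into usable subjects and excluded ones.

    The excluded mapping records which parameters each subject is missing,
    so the gaps can be filled from a spreadsheet where one exists.
    """
    usable, excluded = {}, {}
    for subject_id, parameters in subjects.items():
        missing = missing_parameters(parameters)
        if missing:
            excluded[subject_id] = missing
        else:
            usable[subject_id] = parameters
    return usable, excluded
```

Keeping the list of missing names, rather than a simple yes or no, makes it easier to tell a renamed parameter from one that is truly absent.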

Building Gait Events

Due to inconsistent force plate data, gait events cannot be reliably determined using kinetics, so kinematics must be used instead. A tutorial on the process can be found here.
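As one illustration of what a kinematic approach can look like, the Python sketch below marks heel strikes at the frames where the heel marker is furthest in front of the pelvis. This is a common coordinate-based method (often attributed to Zeni et al., 2008). It is not necessarily the method used in the linked tutorial, and the function and argument names are hypothetical.

```python
def local_maxima(values):
    """Return the indices of interior samples that are local maxima."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    ]


def heel_strike_frames(heel_ap, pelvis_ap):
    """Estimate heel-strike frames from anterior-posterior marker positions.

    heel_ap and pelvis_ap are equal-length sequences of the heel and pelvis
    positions along the walking direction, one value per frame. A heel strike
    is taken as a frame where the heel is furthest ahead of the pelvis.
    """
    relative = [heel - pelvis for heel, pelvis in zip(heel_ap, pelvis_ap)]
    return local_maxima(relative)
```

Real marker data would normally be filtered first, since noise can produce spurious local maxima.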

Conclusion

References

@@ Line 37: / Line 37: @@
 == Inconsistent Subject Prefixes (Stroke)==
-<p style="float: left;">
 All the markers in the static files provided are prefixed with the subject ID, however the dynamic file markers are not, this will prevent the built model from linking with the dynamic markers. A pipeline can be made to prepend the subject IDs to all the marker points in the dynamic files. The subject ID is stored in the PARAMETERS::SUBJECTS::NAMES signal within the dynamic files, however they use a dash instead of an underscore, “BWA-2” vs “BWA_2” these still wont match. String expressions can be used to convert the subject ID strings to match the static files.
-</p>
-<code>Set_Pipeline_Parameter_To_List_Of_Files
+<strong>Set a pipeline parameter to a list of all dynamic files contained in the "50_Stroke_PiG":</strong>
+<code>
+<br>Set_Pipeline_Parameter_To_List_Of_Files
 <br>/PARAMETER_NAME=DYNAMIC_FILES
 <br>/FOLDER=C:\Users\shane\Documents\Tutorial Study\50_StrokePiG
@@ Line 50: / Line 50: @@
 <br>;
 </code>
+<br>
+<br><strong>Loop through all dynamic files:</strong>
+<code>
+<br>For_Each
+<br>/ITERATION_PARAMETER_NAME=DYNAMIC_FILE
+<br>! /ITERATION_PARAMETER_COUNT_NAME=
+<br>/ITEMS=::DYNAMIC_FILES
+<br>;
+</code>
+<br>
+<br><strong>Open the current file:</strong>
+<code>
+<br>File_Open
+<br>/FILE_NAME=::DYNAMIC_FILE
+<br>! /FILE_PATH=
+<br>! /SUFFIX=
+<br>! /SET_PROMPT=File_Open
+<br>! /ON_FILE_NOT_FOUND=PROMPT
+<br>! /FILE_TYPES_ON_PROMPT=
+<br>;
+</code>
+<br>
+<br><strong>Get the Participant ID from PARAMETERS:SUBJECT:NAMES</strong>
+<code>
+<br>Set_Pipeline_Parameter_To_Data_Value
+<br>/SIGNAL_TYPES=PARAMETERS
+<br>/SIGNAL_FOLDER=SUBJECTS
+<br>/SIGNAL_NAMES=NAMES
+<br>! /SIGNAL_COMPONENTS=ALL_COMPONENTS
+<br>/PARAMETER_NAME=SUBJECT_ID
+<br>;
+</code>
+<br>
+<br><strong>Get the preceding half of the Participant ID using the STRING_LEFT expression, since all IDs begin with BWA we can hard code the index value:</strong>
+<code>
+<br>Set_Pipeline_Parameter_From_Expression
+<br>/PARAMETER_NAME=LEFT_SUBJECT_ID
+<br>/EXPRESSION=STRING_LEFT("&::SUBJECT_ID&",3)
+<br>/AS_INTEGER=FALSE
+<br>;
+</code>
+<br>
+<br><strong>Get the length of the participant id using the STRING_LENGTH expression:</strong>
+<code>
+<br>Set_Pipeline_Parameter_From_Expression
+<br>/PARAMETER_NAME=SUBJECT_ID_LENGTH
+<br>/EXPRESSION=STRING_LENGTH("&::SUBJECT_ID&")
+<br>! /AS_INTEGER=TRUE
+<br>;
+</code>
+<br>
+<br><strong>Get the index of the dash using the STRING_FIND expression</strong>
+<code>
+<br>Set_Pipeline_Parameter_From_Expression
+<br>/PARAMETER_NAME=DASH_INDEX
+<br>/EXPRESSION=STRING_FIND("&::SUBJECT_ID&","-",0)
+<br>! /AS_INTEGER=TRUE
+<br>;
+</code>
+<br><strong>Subtract the DASH_INDEX from SUBJECT_ID_LENGTH to get the negative index of the dash, subtract that value by one to get the value needed for STRING_RIGHT (the negative index of the character immediately following the dash):</strong>
+<code>
+<br>Set_Pipeline_Parameter_From_Expression
+<br>/PARAMETER_NAME=RIGHT_SUBJECT_ID
+<br>/EXPRESSION=STRING_RIGHT("&::SUBJECT_ID&",&::DASH_NEGATIVE_INDEX&)
+<br>!/AS_INTEGER=FALSE
+<br>;
+</code>
+<br>
+<br><strong>Get the characters in the subject id following the dash using STRING_RIGHT:</strong>
+<code>
+<br>Set_Pipeline_Parameter_From_Expression
+<br>/PARAMETER_NAME=RIGHT_SUBJECT_ID
+<br>/EXPRESSION=STRING_RIGHT("&::SUBJECT_ID&",&::DASH_NEGATIVE_INDEX&)
+<br>!/AS_INTEGER=FALSE
+<br>;
+</code>
+<br>
+<br><strong>Concatenate LEFT_SUBJECT_ID and RIGHT_SUBJECT_ID with an underscore between them:</strong>
 <code>
+<br>Set_Pipeline_Parameter
+<br>/PARAMETER_NAME=SUBJECT_ID_CONVERTED
+<br>/PARAMETER_VALUE=_
+<br>! /PARAMETER_VALUE_SEARCH_FOR=
+<br>! /PARAMETER_VALUE_REPLACE_WITH=
+<br>/PARAMETER_VALUE_PREFIX=::LEFT_SUBJECT_ID
+<br>/PARAMETER_VALUE_APPEND=::RIGHT_SUBJECT_ID
+<br>! /MULTI_PASS=FALSE
+<br>;
+</code>
+<br>
+<br><strong>Get a list of all target names</strong>
+<code>
+<br>Set_Pipeline_Parameter_To_List_Of_Signal_Names
+<br>/PARAMETER_NAME=TARGETS
+<br>/SIGNAL_TYPES=TARGET
+<br>SIGNAL_FOLDER=ORIGINAL
+<br>;
+</code>
+<br>
+<br><strong>Using the updated subject ID and the list of target names, use the modify participants parameters pipeling function the add the neeeded subject prefixes, ensuring OVERWRITE_C3D_FILE is set to true so the changes are preserved:</strong>
+<code>
+<br>Modify_C3D_Subjects_Parameters
+<br>! /INCLUDE_CAL_FILE=TRUE
+<br>/IS_STATIC=0
+<br>/SUBJECTS_USES_PREFIXES=1
+<br>! /REMOVE_EXISTING_LABEL_PREFIXES=TRUE
+<br>! /REMOVE_EXISTING_PREFIX_PARAMETERS=FALSE
+<br>/SUBJECTS_LABEL_PREFIXES=::SUBJECT_ID_CONVERTED
+<br>/PREFIX_MARKERS=::TARGETS
+<br>! /PREFIX_ROTATIONS=
+<br>! /PREFIX_C3D_PARAMETERS=
+<br>! /C3D_PARAMETER_GROUPS=SUBJECTS
+<br>! /MARKER_SETS=
+<br>/SUBJECTS_NAMES=::SUBJECT_ID_CONVERTED
+<br>! /SUBJECTS_DISPLAY_SETS=
+<br>! /SUBJECTS_MODELS=
+<br>! /SUBJECTS_MODEL_PARAMS=
+<br>/OVERWRITE_C3D_FILE=1
+<br>;
+</code>
+<br>
+<br><strong>Clear the workspace before continuing the loop to preserve memory, and close the loop:</strong>
+<code>
+<br>File_New
+<br>;
+<br>
+<br>End the loop
+<br>End_For_Each
+<br>/ITERATION_PARAMETER_NAME=DYNAMIC_FILE
+<br>;
 </code>
