Sift - Mahalanobis Distance Dialog: Difference between revisions

From Software Product Documentation
Line 7: Line 7:
Mahalanobis distance is a common measure for identifying outliers in a data sample. It can be thought of as the distance from a point to the centroid of a data set, taking into account the correlations in the data set. The method can also be applied to PCA results by measuring the distance of each point to the centroid in the transformed PCA space.


The Mahalanobis Distance is found on the toolbar and under 'Outlier Detecting Using PCA' in the Analysis menu.
SPE (Squared Prediction Error) measures the distance between the model prediction and the true measurement, i.e. the distance between the true point and the k-dimensional transformed point. If this distance exceeds a certain threshold, the point is flagged as an outlier because it does not fit the predictive model.
 
The Mahalanobis Distance and SPE are found on the toolbar and under 'Outlier Detecting Using PCA' in the Analysis menu.


[[image:MD_button.png]]
Line 13: Line 15:
==Dialog==
[[image:MD_dlg.png|right]]
The Mahalanobis Distance using PCA allows the user to automatically search the data and identify traces that are outside the norm. The search for outliers can be done by Combined Groups, Group, and Workspaces. Users can decide if they want to auto-exclude any outliers that are detected in the groups or workspaces, and specify any thresholds and P-values of high, low, and combined variability.
<ul>
 
<li><strong>Grouping to Search:</strong> The grouping used to determine the centroid: Combined Groups, Groups, or Workspaces.</li>
<li><strong>Auto-exclude results:</strong> If checked, any outliers found are automatically removed.</li>
<li><strong>Number of Passes:</strong> How many times the test is run. Removing an outlier may shift the centroid and expose further outliers.</li>
<li><strong>Find All Outliers:</strong> If checked, Number of Passes is ignored and the test is repeated until no outliers are found.</li>
<li><strong>Determine Number of PCs Using Variance Explained:</strong> If checked, PCs Variance Explained is displayed instead of Number of PCs.</li>
<li><strong>Number of PCs:</strong> How many principal components are considered in the test.</li>
<li><strong>PCs Variance Explained:</strong> The amount of variance to be explained, used instead of selecting the number of PCs directly.</li>
<li><strong>Outlier alpha value:</strong> The threshold used to determine an outlier.</li>


* '''Combined group passes:''' The number of passes to make on the combined groups.
</ul>
* '''Group passes:''' The number of passes to make on each group.
* '''Workspace passes:''' The number of passes to make on each workspace.
* '''Separate conditions in test:''' If conditions should be treated as separate groups.
* '''Auto-exclude results:''' If outliers should be automatically excluded from the results.
* '''High variability PC threshold:''' .
* '''High variability P-Value:''' .
* '''Low variability P-Value:''' .
* '''Combined variability P-Value :''' .


==Results==

Revision as of 17:15, 29 May 2024


Mahalanobis distance is a common measure for identifying outliers in a data sample. It can be thought of as the distance from a point to the centroid of a data set, taking into account the correlations in the data set. The method can also be applied to PCA results by measuring the distance of each point to the centroid in the transformed PCA space.
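
The distance described above can be illustrated with a short NumPy sketch. This is not Sift's implementation: the function name pca_mahalanobis and its n_pcs parameter are hypothetical, and the sketch assumes the sample covariance is used. Because PC scores are uncorrelated, the distance to the centroid reduces to a norm in which each score is scaled by the variance of its component.

```python
import numpy as np

def pca_mahalanobis(X, n_pcs):
    """Mahalanobis distance of each row of X to the centroid, in PCA space.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)             # move the centroid to the origin
    cov = np.cov(centered, rowvar=False)      # sample covariance of the variables
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_pcs] # indices of the leading components
    scores = centered @ eigvecs[:, order]     # coordinates in PCA space
    # Scale each squared score by the variance of its component, then sum.
    return np.sqrt(np.sum(scores**2 / eigvals[order], axis=1))
```

When all components are kept, this gives the same values as the classical Mahalanobis distance computed with the inverse covariance matrix; keeping fewer components measures the distance only within the retained PCA subspace.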

SPE (Squared Prediction Error) measures the distance between the model prediction and the true measurement, i.e. the distance between the true point and the k-dimensional transformed point. If this distance exceeds a certain threshold, the point is flagged as an outlier because it does not fit the predictive model.
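
A minimal sketch of the SPE idea follows, assuming the model prediction is the point rebuilt from its first k principal components. The function name squared_prediction_error and the n_pcs parameter are hypothetical and do not come from Sift.

```python
import numpy as np

def squared_prediction_error(X, n_pcs):
    """Squared distance between each row of X and its n_pcs-component reconstruction.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_pcs]
    # Project onto the retained components and map back to the original space.
    reconstructed = centered @ components.T @ components
    residual = centered - reconstructed
    return np.sum(residual**2, axis=1)
```

A point that lies in the subspace spanned by the retained components has an SPE of zero; the further a point sits from that subspace, the larger its SPE.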

The Mahalanobis Distance and SPE are found on the toolbar and under 'Outlier Detecting Using PCA' in the Analysis menu.

Dialog

  • Grouping to Search: The grouping used to determine the centroid: Combined Groups, Groups, or Workspaces.
  • Auto-exclude results: If checked, any outliers found are automatically removed.
  • Number of Passes: How many times the test is run. Removing an outlier may shift the centroid and expose further outliers.
  • Find All Outliers: If checked, Number of Passes is ignored and the test is repeated until no outliers are found.
  • Determine Number of PCs Using Variance Explained: If checked, PCs Variance Explained is displayed instead of Number of PCs.
  • Number of PCs: How many principal components are considered in the test.
  • PCs Variance Explained: The amount of variance to be explained, used instead of selecting the number of PCs directly.
  • Outlier alpha value: The threshold used to determine an outlier.
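
How the pass and variance options above might interact can be sketched as follows. This is an illustration under stated assumptions, not Sift's algorithm: the names n_pcs_for_variance and find_outliers are hypothetical, and the cutoff parameter is applied directly to the distance, since the documentation does not state how the alpha value is converted into a distance threshold. Passing max_passes=None mimics Find All Outliers.

```python
import numpy as np

def n_pcs_for_variance(eigvals, target):
    """Smallest number of PCs whose cumulative explained variance reaches target (0 to 1).

    eigvals must be sorted in descending order. Hypothetical helper.
    """
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return min(int(np.searchsorted(ratios, target)) + 1, len(eigvals))

def find_outliers(X, cutoff, variance_explained=0.95, max_passes=None):
    """Return indices of rows flagged as outliers over repeated passes.

    Hypothetical helper; cutoff is a distance threshold (an assumption).
    """
    X = np.asarray(X, dtype=float)
    keep = np.ones(len(X), dtype=bool)
    n_pass = 0
    while max_passes is None or n_pass < max_passes:
        n_pass += 1
        if keep.sum() < 3:                    # too few points left to model
            break
        # Recompute the centroid and the PCs from the points still included.
        centered = X[keep] - X[keep].mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
        k = n_pcs_for_variance(eigvals, variance_explained)
        scores = centered @ eigvecs[:, :k]
        dist = np.sqrt(np.sum(scores**2 / eigvals[:k], axis=1))
        flagged = np.flatnonzero(keep)[dist > cutoff]
        if flagged.size == 0:                 # nothing new found, stop early
            break
        keep[flagged] = False                 # exclude, then test again
    return np.flatnonzero(~keep)
```

Each pass recomputes the centroid without the points already excluded, which is why a later pass can expose outliers that were masked in an earlier one.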

Results

The Mahalanobis Distance results appear once the test has finished running.
