Sift - Local Outlier Factor Dialog

From Software Product Documentation

The Local Outlier Factor (LOF) is a machine learning algorithm that detects outliers using k-nearest-neighbour distances. It finds points that are outliers relative to their local clusters: the LOF score compares the density of the data around each point with the density around that point's nearest neighbours, so a point in a sparse region next to a dense cluster receives a high score.
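The scoring idea can be sketched in a few lines of Python. This is a minimal brute-force illustration of the standard LOF definition, not Sift's implementation: the function name `lof_scores` is hypothetical, ties in the k-distance are ignored (exactly k neighbours are used), and the points are assumed to contain no exact duplicates.

```python
import math


def lof_scores(points, k):
    """Return the Local Outlier Factor of each point (brute force, O(n^2)).

    Illustrative sketch only: uses exactly k neighbours per point and
    assumes no duplicate points (which would give a zero distance sum).
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and the
    # distance to the k-th neighbour (the "k-distance").
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k-distance of j, distance from i to j).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]
```

Scores close to 1 indicate a point whose local density matches that of its neighbours; scores well above 1 indicate a point that is much more isolated than its neighbours.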

The Local Outlier Factor is found on the toolbar and under 'Outlier Detection Using PCA' in the Analysis menu.

Dialog

The Local Outlier Factor using PCA dialog allows the user to automatically search the data and identify traces that are outside the norm. The search for outliers can be done by Combined Groups, Groups, or Workspaces. Users can choose whether to auto-exclude any outliers that are detected in the groups or workspaces. Users can also specify the conditions of the LOF technique, including the number of nearest neighbours, the outlier threshold, the Grubbs' test alpha value, and the approximate outlier percentage.

  • Grouping to Search: How PC scores are grouped together:
    • Combined Groups: Groups all scores together regardless of actual group.
    • Groups: Groups scores from the same group together.
    • Workspaces: Groups scores from the same group and workspace together (Only available when PCA is run on traces and NOT Workspace Means).
  • Auto-exclude results: Whether outliers should be automatically excluded from the results.
  • Number of passes: The number of passes to make on each grouping. Outliers are deselected after each pass, and the search stops early if a pass finds no outliers.
  • Find all Outliers: Whether to keep making passes until a pass finds no more outliers.
  • Number of Neighbors: The number of neighbors (k) to compare against in the LOF algorithm.
  • Determine Number of PCs Using Variance Explained: Whether the number of PCs (i.e. dimensions) is calculated from a threshold for the variance explained, or chosen manually.
  • Number of PCs / PCs Variance Explained: If the option above is unchecked, the number of PCs to use. If it is checked, the percentage of variance that the PCs must explain.
  • Outlier Criteria: How outliers should be determined:
    • Manual: Any PC score whose LOF is above a specified threshold is flagged as an outlier.
    • Sequential Grubbs' Test: Uses the Grubbs' test to identify a threshold for an approximate percentage of outliers at a specified alpha value.
  • Manual Outlier Threshold: If Outlier Criteria is set to Manual, the LOF threshold above which a PC score is considered an outlier.
  • Grubbs' Test Alpha: The alpha value corresponding to the Grubbs' test.
  • Outlier Percentage (approx.): The approximate % of outliers that are in the dataset.
  • Scale Data To % Variability: Scale the distance measures to be proportional to the variance along each PC axis.
  • Show LOF on workspace scores: Whether the LOF should be shown on the Workspace Scores graph.
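
Two of the options above can be illustrated with a short Python sketch: choosing the number of PCs from a variance-explained percentage, and making repeated passes with a manual threshold. How Sift implements these internally is not documented here, so this is only one plausible reading. The names `num_pcs_for_variance`, `find_outliers` and `score_fn` are hypothetical; `score_fn` stands for any function that returns one outlier score per row (for example an LOF routine with k fixed), and the eigenvalues are assumed to be sorted from largest to smallest.

```python
def num_pcs_for_variance(eigenvalues, percent):
    """Smallest number of leading PCs whose cumulative variance reaches `percent`.

    Assumes `eigenvalues` holds the PC variances in descending order.
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    for count, value in enumerate(eigenvalues, start=1):
        cumulative += value
        if 100.0 * cumulative / total >= percent:
            return count
    return len(eigenvalues)


def find_outliers(rows, score_fn, threshold, passes=1, find_all=False):
    """Score the remaining rows, flag those above `threshold`, and repeat.

    Returns the indices (into `rows`) flagged as outliers, in the order found.
    With find_all=True the `passes` limit is ignored and passes continue
    until one of them finds no outliers.
    """
    remaining = list(range(len(rows)))
    flagged = []
    done = 0
    while remaining and (find_all or done < passes):
        values = score_fn([rows[i] for i in remaining])
        hits = [i for i, v in zip(remaining, values) if v > threshold]
        if not hits:
            break  # stop early: this pass found no outliers
        flagged.extend(hits)
        # Deselect the flagged rows before the next pass.
        remaining = [i for i in remaining if i not in hits]
        done += 1
    return flagged
```

Re-scoring after each pass matters because removing one extreme point changes the neighbourhood of the points that remain, so a second pass can expose outliers that the first pass masked.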

Results

The LOF results appear once the test has finished running.
