other:inspect3d:documentation:knowledge_discovery:k-means_clustering

Differences

This shows you the differences between two versions of the page.

other:inspect3d:documentation:knowledge_discovery:k-means_clustering [2024/07/16 19:22] – created sgranger
other:inspect3d:documentation:knowledge_discovery:k-means_clustering [2024/12/20 16:08] (current) wikisysop
Line 1: Line 1:
-====== K-Means_Clustering ======
+====== K-Means Clustering ======
  
-The k-means clustering algorithm is a commonly used method for grouping //n// individual data points into //k// clusters. Principal component analysis (PCA) is a multi-variate statistical analysis that reduces the high-dimensional matrix of correlated, time-varying signals into a low-dimensional and statistically uncorrelated set of principal components (PCs). These PCs explain the variance found in the original signals and represent the most important features of the data, e.g., the overall magnitude or the shape of the time series at a particular point in the stride cycle. The value of each particular subject’s score for the individual PCs represents how strongly that feature was present in the data.
+The k-means clustering algorithm is a commonly used method for grouping //n// individual data points into //k// clusters. Principal component analysis (PCA) is a multi-variate statistical analysis that reduces the high-dimensional matrix of correlated, time-varying signals into a low-dimensional and statistically uncorrelated set of [[sift:principal_component_analysis:principal_component_analysis|principal components]] (PCs). These PCs explain the variance found in the original signals and represent the most important features of the data, e.g., the overall magnitude or the shape of the time series at a particular point in the stride cycle. The value of each particular subject’s score for the individual PCs represents how strongly that feature was present in the data.
  
 ==== The utility of clustering ====
Line 11: Line 11:
 Inspect3D allows users to apply the k-means clustering algorithm to the results of PCA. The dimensionality of the data space is the number of principal components and the user specifies the number of clusters to be found - this is the parameter //k//.
  
-  * In the {{I3D_PCAOptions2.png}}**PCA** menu there is the option to perform quality assurance (QA) on the results;
-  * Selecting {{I3D_RunPCA.png}}**Run QA with PCA** brings up a window with multiple options for QA;
+  * In the {{:I3D_PCAOptions2.png?30}} [[sift:principal_component_analysis:principal_component_analysis|PCA]] menu there is the option to perform quality assurance (QA) on the results;
+  * Selecting {{:I3D_RunPCA.png?30}}**Run QA with PCA** brings up a window with multiple options for QA;
   * The **K-Means** tab allows the user to specify parameter values for the algorithm and then run it on PCA results.
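
To make the idea of clustering PCA results concrete, here is a minimal Python sketch of standard k-means (Lloyd's) iterations applied to a table of PC scores. It is an illustration only, not Inspect3D's implementation: the function names, the `pc_scores` values, and the choice of the first //k// rows as starting centres are all assumptions made for the example.

```python
import math

def assign_labels(points, centres):
    """Return, for each point, the index of its nearest centre (Euclidean)."""
    return [min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points]

def update_centres(points, labels, centres):
    """Move each centre to the mean of its members; leave empty clusters as-is."""
    new_centres = []
    for j, old in enumerate(centres):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centres.append(old)
            continue
        new_centres.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
    return new_centres

def kmeans(points, centres, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = assign_labels(points, centres)
    for _ in range(max_iter):
        centres = update_centres(points, labels, centres)
        new_labels = assign_labels(points, centres)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centres

# Hypothetical PC scores: one row per subject, one column per retained PC,
# so the data space here has dimensionality 2.
pc_scores = [(-2.1, 0.3), (-1.9, -0.2), (-2.0, 0.1),
             (1.8, 0.4), (2.2, -0.1), (2.0, 0.0)]
k = 2  # number of clusters requested by the user
labels, centres = kmeans(pc_scores, centres=pc_scores[:k])
print(labels)
```

Each subject ends up with one cluster label, and each cluster centre is a point in PC space with one coordinate per retained principal component.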
  
Line 18: Line 18:
  
 The k-means clustering algorithm is more than 50 years old and is described in almost every textbook on data analysis and machine learning. Inspect3D specifically implements the k-means++ algorithm, which optimizes how the initial cluster centres are chosen.
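
The seeding step that distinguishes k-means++ can be sketched in a few lines of Python. This follows the general scheme described by Arthur and Vassilvitskii (first centre chosen uniformly at random, each later centre sampled with probability proportional to its squared distance from the nearest centre already chosen). It is not taken from Inspect3D; the function names and sample data are hypothetical, and the sketch assumes the data contain at least //k// distinct points.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_seed(points, k, rng):
    """Pick k initial centres by squared-distance-weighted sampling.

    Assumes `points` holds at least k distinct points; otherwise all
    weights can become zero and rng.choices raises ValueError.
    """
    centres = [rng.choice(points)]  # first centre: uniform at random
    while len(centres) < k:
        # Squared distance from every point to its nearest chosen centre.
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        # Points far from all current centres are more likely to be picked;
        # already-chosen points have weight zero.
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres

# Hypothetical PC scores forming two well-separated groups.
pc_scores = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
seeds = kmeans_pp_seed(pc_scores, 3, random.Random(0))
print(seeds)
```

The returned seeds would then serve as the starting centres for the usual k-means iterations; passing a seeded `random.Random` makes a run repeatable.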
+
  
 Arthur D. and Vassilvitskii S. (2007). k-means++: the advantages of careful seeding. Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 1027–1035.
+
 **Abstract**
+
 The k-means method is a widely used clustering technique that seeks to minimize the average squared distance between points in the same cluster. Although it offers no accuracy guarantees, its simplicity and speed are very appealing in practice. By augmenting k-means with a simple, randomized seeding technique, we obtain an algorithm that is O(log k)-competitive with the optimal clustering. Experiments show our augmentation improves both the speed and the accuracy of k-means, often quite dramatically.
  
  
other/inspect3d/documentation/knowledge_discovery/k-means_clustering.1721157733.txt.gz · Last modified: 2024/07/16 19:22 by sgranger