Grupo de Investigación Cantor: junio 2023

By: Esteban Alfaro

Date: may 2023

Clustering k-means:

As a method of machine learning, K-means grouping also known as K-means clustering is the most widely used partitional clustering algorithm. A partitional clustering aim to create groups (or clusters) present in the data by optimizing a specific objective function and iteratively improving the quality of the partitions.

K-means clustering is based on the Lloyd algorithm that was proposed by Stuart P. Lloyd of Bell Labs in 1957 as a technique for pulse-code modulation. Lloyd's work became widely circulated but remained unpublished until 1982.

Given a dataset D = {x 1 , x 2 , …, x N } consists of N points, let us denote the clustering obtained after applying K -means clustering by C = {C 1 , C 2 , …, C k …, C K }. The SSE for this clustering is defined by (1) where c k is the centroid of cluster C k . The objective is to find a clustering that minimizes the SSE score. The iterative assignment and update steps of the K -means algorithm aim to minimize the SSE score for the given set of centroids.

K-means clustering algorithm:

K -Means algorithm:

Select K points as initial centroids

repeat

Form K clusters by assigning each point to its closest centroid

Recompute the centroid of each cluster as the mean of each
cluster

until convergence criterion is met
--------------------------------------
Grupo de Investigación Cantor © 2022 by Esteban Alfaro Sabogal is licensed under CC BY-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/

Grupo de Investigación Cantor

lunes, 12 de junio de 2023

Unsupervised machine learning: clustering

Experimenting with different metrics

Denunciar abuso