Grupo de Investigación Cantor: Unsupervised machine learning

By: Esteban Alfaro

Date: May 2023

K-means clustering:

As a machine learning method, K-means clustering (also known as K-means grouping) is one of the most widely used partitional clustering algorithms. A partitional clustering algorithm aims to divide the data into groups (or clusters) by optimizing a specific objective function and iteratively improving the quality of the partitions.

K-means clustering is based on Lloyd's algorithm, which was proposed by Stuart P. Lloyd of Bell Labs in 1957 as a technique for pulse-code modulation. Lloyd's work became widely circulated but remained unpublished until 1982.

Given a dataset $D = \{x_1, x_2, \ldots, x_N\}$ consisting of $N$ points, let us denote the clustering obtained after applying K-means clustering by $C = \{C_1, C_2, \ldots, C_k, \ldots, C_K\}$. The sum of squared errors (SSE) for this clustering is defined by

$$\mathrm{SSE}(C) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - c_k \rVert^2 \tag{1}$$

where $c_k$ is the centroid of cluster $C_k$. The objective is to find a clustering that minimizes the SSE score. The assignment step of the K-means algorithm minimizes the SSE for the given set of centroids, and the update step minimizes it for the given assignment, so the SSE never increases from one iteration to the next.
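To make the SSE score concrete, here is a minimal sketch in Python with NumPy that sums the squared Euclidean distance from each point to the centroid of its cluster. The function name `sse`, its arguments, and the toy data are assumptions made for illustration; they do not come from any particular library.

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster's centroid."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(labels)
    # centroids[labels] picks, for every point, the centroid of its own cluster.
    diffs = points - centroids[labels]
    return float(np.sum(diffs ** 2))

# Toy data (hypothetical): four 2-D points split into two clusters.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

print(sse(X, labels, centroids))  # 4.0
```

Each point lies at distance 1 from its centroid, so the four squared distances add up to 4.0. Assigning the points to the wrong centroids would give a much larger score, which is what the algorithm tries to avoid.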

K-means clustering algorithm:

    Select K points as initial centroids
    repeat
        Form K clusters by assigning each point to its closest centroid
        Recompute the centroid of each cluster as the mean of the points in that cluster
    until convergence criterion is met
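The steps above can be sketched in Python with NumPy as follows. This is one possible reading of the pseudocode, not a reference implementation: the function name `kmeans`, the random choice of initial centroids, the iteration cap `max_iter`, and the convergence criterion (assignments stop changing) are all assumptions, since the pseudocode leaves them open.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means sketch; returns (centroids, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Select K distinct data points as initial centroids (random choice is an assumption).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: squared distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Convergence criterion (assumed): the assignments stopped changing.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of the points in its cluster.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy data (hypothetical): two well-separated pairs of points.
X = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0]])
centroids, labels = kmeans(X, k=2)
print(centroids)
print(labels)
```

On this toy data the two pairs end up in separate clusters, with centroids at (0.5, 0.5) and (10.5, 10.5). Which cluster gets label 0 depends on the random initialization. On less clearly separated data, different initial centroids can lead to different final clusterings, because the algorithm only finds a local minimum of the SSE.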
