lunes, 12 de junio de 2023

Unsupervised machine learning: clustering

By: Esteban Alfaro

Date: may 2023

Clustering k-means:

As a method of machine learning, K-means grouping also known as K-means clustering is the most widely used partitional clustering algorithm. A partitional clustering aim to create groups (or clusters) present in the data by optimizing a specific objective function and iteratively improving the quality of the partitions.

K-means clustering is based on the Lloyd algorithm that was proposed by Stuart P. Lloyd of Bell Labs in 1957 as a technique for pulse-code modulation. Lloyd's work became widely circulated but remained unpublished until 1982.

Given a dataset D = {x 1 , x 2 , …, x N } consists of N points, let us denote the clustering obtained after applying K -means clustering by C = {C 1 , C 2 , …, C k …, C K }. The SSE for this clustering is defined by (1) where c k is the centroid of cluster C k . The objective is to find a clustering that minimizes the SSE score. The iterative assignment and update steps of the K -means algorithm aim to minimize the SSE score for the given set of centroids.


K-means clustering algorithm:

K -Means algorithm:

  1. Select K points as initial centroids
  2. repeat
  3. Form K clusters by assigning each point to its closest centroid
  4. Recompute the centroid of each cluster as the mean of each
    cluster
  5. until convergence criterion is met
  6.  --------------------------------------

  7.  
        Grupo de Investigación Cantor   © 2022 by Esteban Alfaro Sabogal is licensed under CC BY-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/
     



Experimenting with different metrics

Non-convex balls By: Esteban Alfaro Sabogal Date: 06 august 2023 We are going to draw computationally the points that are a distance < r ...