dislib.cluster.KMeans

class dislib.cluster.kmeans.base.KMeans(n_clusters=8, max_iter=10, tol=0.0001, arity=50, random_state=None, verbose=False)

Bases: object

Perform K-means clustering.

Parameters:
  • n_clusters (int, optional (default=8)) – The number of clusters to form as well as the number of centroids to generate.
  • max_iter (int, optional (default=10)) – Maximum number of iterations of the k-means algorithm for a single run.
  • tol (float, optional (default=1e-4)) – Tolerance for accepting convergence.
  • arity (int, optional (default=50)) – Arity of the reduction carried out during the computation of the new centroids.
  • random_state (int or RandomState, optional (default=None)) – Seed or numpy.random.RandomState instance to generate random numbers for centroid initialization.
  • verbose (boolean, optional (default=False)) – Whether to print progress information.
Variables:
  • centers (ndarray) – Computed centroids.
  • n_iter (int) – Number of iterations performed.
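
The arity parameter above controls how many partial results are merged per reduction step when the new centroids are computed. The following is a minimal NumPy sketch of that idea, not dislib's implementation: the function name reduce_with_arity and the (sums, counts) layout of the partial results are assumptions made for illustration, and dislib may schedule the merges differently.

```python
import numpy as np


def reduce_with_arity(partials, arity=50):
    """Merge (sums, counts) pairs in groups of at most `arity` items.

    Hypothetical helper: each partial holds the per-cluster coordinate
    sums and sample counts computed from one subset of the data.
    """
    if arity < 2:
        raise ValueError("arity must be at least 2")
    partials = list(partials)
    if not partials:
        raise ValueError("need at least one partial result")

    # Repeat until a single merged result remains.
    while len(partials) > 1:
        merged = []
        for start in range(0, len(partials), arity):
            group = partials[start:start + arity]
            sums = sum(p[0] for p in group)
            counts = sum(p[1] for p in group)
            merged.append((sums, counts))
        partials = merged
    return partials[0]


# Five subsets, two clusters, two features (made-up values).
parts = [(np.ones((2, 2)), np.array([1, 1])) for _ in range(5)]
sums, counts = reduce_with_arity(parts, arity=2)
centers = sums / counts[:, None]  # mean of the samples in each cluster
```

A smaller arity produces more, cheaper merge steps; a larger arity produces fewer steps that each combine more partial results.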

Examples

>>> import numpy as np
>>> from dislib.cluster import KMeans
>>> from dislib.data import load_data
>>> x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
>>> train_data = load_data(x=x, subset_size=2)
>>> kmeans = KMeans(n_clusters=2, random_state=0)
>>> kmeans.fit_predict(train_data)
>>> print(train_data.labels)
>>> test_data = load_data(x=np.array([[0, 0], [4, 4]]), subset_size=2)
>>> kmeans.predict(test_data)
>>> print(test_data.labels)
>>> print(kmeans.centers)

fit(dataset)

Compute K-means clustering.

Parameters:
  • dataset (Dataset) – Samples to cluster.
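
As a rough illustration of what a fit computes, here is a minimal single-machine sketch of Lloyd's algorithm in NumPy. It is not dislib's implementation: the function name lloyd_kmeans, the initialization from randomly chosen samples, and the convergence test on the centroid shift are all assumptions made for the example.

```python
import numpy as np


def lloyd_kmeans(x, n_clusters=8, max_iter=10, tol=1e-4, random_state=None):
    """Return (centers, n_iter) for a plain, non-distributed K-means run."""
    x = np.asarray(x, dtype=float)
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)

    # Assumed initialization: pick n_clusters distinct samples as centroids.
    centers = x[rng.choice(len(x), size=n_clusters, replace=False)]

    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every sample.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Update step: mean of the samples in each non-empty cluster.
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = x[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)

        # Assumed stopping rule: total centroid movement below tol.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, n_iter


x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
centers, n_iter = lloyd_kmeans(x, n_clusters=2, random_state=0)
```

The result depends on the initial centroids, so different seeds can give different centers on the same data.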

fit_predict(dataset)

Perform clustering on dataset and set the cluster labels of the input Dataset.

Parameters:
  • dataset (Dataset) – Samples to cluster.

predict(dataset)

Predict the closest cluster for each sample in dataset. The cluster labels are stored in dataset.

Parameters:
  • dataset (Dataset) – New data to predict.
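
Assigning each sample to its closest cluster can be sketched in plain NumPy as below. The helper name nearest_center_labels and the centroid values are assumptions chosen for illustration; they are not taken from dislib or from the output of the example above.

```python
import numpy as np


def nearest_center_labels(samples, centers):
    """Return, for each sample, the index of the closest centroid."""
    samples = np.asarray(samples, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise Euclidean distances, shape (n_samples, n_clusters).
    dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Illustrative centroids (assumed, not computed by dislib).
centers = np.array([[1.0, 2.0], [4.0, 2.0]])
labels = nearest_center_labels([[0, 0], [4, 4]], centers)
print(labels)  # [0 1]
```

Unlike this sketch, which returns the labels, predict stores them in the Dataset passed to it.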