dislib.cluster.K-Means
- class dislib.cluster.kmeans.base.KMeans(n_clusters=8, init='random', max_iter=10, tol=0.0001, arity=50, random_state=None, verbose=False)
Bases: BaseEstimator
Perform K-means clustering.
- Parameters
n_clusters (int, optional (default=8)) – The number of clusters to form as well as the number of centroids to generate.
init ({‘random’, nd-array or sparse matrix}, optional (default=‘random’)) – Initialization method. ‘random’ generates random initial centers.
If an nd-array or sparse matrix is passed, it must have shape (n_clusters, n_features) and is used as the initial centers.
max_iter (int, optional (default=10)) – Maximum number of iterations of the k-means algorithm for a single run.
tol (float, optional (default=1e-4)) – Tolerance for accepting convergence.
arity (int, optional (default=50)) – Arity of the reduction carried out during the computation of the new centroids.
random_state (int or RandomState, optional (default=None)) – Seed or numpy.random.RandomState instance to generate random numbers for centroid initialization.
verbose (boolean, optional (default=False)) – Whether to print progress information.
- Variables
centers (ndarray) – Computed centroids.
n_iter (int) – Number of iterations performed.
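The arity parameter sets how many partial results are merged in each reduction step. The following is a minimal NumPy-only sketch of such a grouped reduction, assuming each partial result is an array of per-cluster sums. The helper `reduce_with_arity` is hypothetical and not part of dislib, whose task-based implementation may differ.

```python
import numpy as np


def reduce_with_arity(partials, arity):
    """Merge partial results in groups of at most `arity` until one remains.

    Hypothetical helper for illustration only; not a dislib function.
    """
    if arity < 2:
        raise ValueError("arity must be at least 2")
    level = list(partials)
    if not level:
        raise ValueError("at least one partial result is required")
    while len(level) > 1:
        # Each pass sums consecutive groups of `arity` arrays element-wise.
        level = [np.sum(level[i:i + arity], axis=0)
                 for i in range(0, len(level), arity)]
    return level[0]


# Seven partial sums, merged three at a time: 7 -> 3 -> 1 arrays.
partials = [np.full((2, 2), float(i)) for i in range(1, 8)]
total = reduce_with_arity(partials, arity=3)
```

A larger arity gives fewer reduction levels with more inputs per merge; a smaller arity gives more levels with smaller merges.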
Examples
>>> import dislib as ds
>>> from dislib.cluster import KMeans
>>> import numpy as np
>>>
>>>
>>> if __name__ == '__main__':
...     x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
...     x_train = ds.array(x, (2, 2))
...     kmeans = KMeans(n_clusters=2, random_state=0)
...     labels = kmeans.fit_predict(x_train)
...     print(labels)
...     x_test = ds.array(np.array([[0, 0], [4, 4]]), (2, 2))
...     labels = kmeans.predict(x_test)
...     print(labels)
...     print(kmeans.centers)
- fit(x, y=None)
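To illustrate how max_iter and tol bound the iteration, here is a single-process NumPy sketch of a Lloyd-style loop. It is not dislib's distributed implementation: `lloyd_kmeans` is a hypothetical function, and the convergence test (Frobenius norm of the centroid movement below tol) is an assumption, since the exact criterion is not stated here.

```python
import numpy as np


def lloyd_kmeans(x, init_centers, max_iter=10, tol=1e-4):
    """Lloyd-style k-means on an in-memory array (illustrative sketch)."""
    centers = np.asarray(init_centers, dtype=float).copy()
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Squared distance from every sample to every center,
        # shape (n_samples, n_clusters).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Recompute each center as the mean of its members;
        # a center with no members keeps its previous position.
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = x[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)

        # Assumed stopping rule: stop once the centers barely move.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, n_iter


x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)
centers, n_iter = lloyd_kmeans(x, init_centers=[[0, 2], [5, 2]])
```

With these explicit starting centers the loop settles after two iterations, well before max_iter is reached.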
Compute K-means clustering.
- Parameters
x (ds-array) – Samples to cluster.
y (ignored) – Not used, present here for API consistency by convention.
- Returns
self – The fitted estimator.
- Return type
KMeans
- fit_predict(x, y=None)
Compute cluster centers and predict cluster index for each sample.
- Parameters
x (ds-array) – Samples to cluster.
y (ignored) – Not used, present here for API consistency by convention.
- Returns
labels – Index of the cluster each sample belongs to.
- Return type
ds-array, shape=(n_samples, 1)
- load_model(filepath, load_format='json')
Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.
- Parameters
filepath (str) – Path of the saved model.
load_format (str, optional (default=‘json’)) – Format used to load the model.
Examples
>>> from dislib.cluster import KMeans
>>> import numpy as np
>>> import dislib as ds
>>> x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
>>> x_train = ds.array(x, (2, 2))
>>> model = KMeans(n_clusters=2, random_state=0)
>>> model.fit(x_train)
>>> model.save_model('/tmp/model')
>>> loaded_model = KMeans()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[0, 0], [4, 4]]), (2, 2))
>>> model_pred = model.predict(x_test)
>>> loaded_model_pred = loaded_model.predict(x_test)
>>> assert np.allclose(model_pred.collect(),
...                    loaded_model_pred.collect())
- predict(x)
Predict the closest cluster each sample in the data belongs to.
- Parameters
x (ds-array) – New data to predict.
- Returns
labels – Index of the cluster each sample belongs to.
- Return type
ds-array, shape=(n_samples, 1)
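The assignment step behind predict can be sketched with plain NumPy: each sample receives the index of its nearest center, returned as a column of shape (n_samples, 1). The function `nearest_center` and the center values below are assumptions for illustration. In dislib the input and output are ds-arrays, and the label numbering depends on the fitted centers.

```python
import numpy as np


def nearest_center(x, centers):
    """Return, for each row of x, the index of the closest center."""
    # Squared Euclidean distances, shape (n_samples, n_clusters).
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Column vector of labels, matching the documented (n_samples, 1) shape.
    return dists.argmin(axis=1).reshape(-1, 1)


# Assumed centers, chosen by hand for this illustration.
centers = np.array([[1.0, 2.0], [4.0, 2.0]])
x_test = np.array([[0.0, 0.0], [4.0, 4.0]])
labels = nearest_center(x_test, centers)
```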
- save_model(filepath, overwrite=True, save_format='json')
Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.
- Parameters
filepath (str) – Path where the model is saved.
overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.
save_format (str, optional (default=‘json’)) – Format used to save the model.
Examples
>>> from dislib.cluster import KMeans
>>> import numpy as np
>>> import dislib as ds
>>> x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
>>> x_train = ds.array(x, (2, 2))
>>> model = KMeans(n_clusters=2, random_state=0)
>>> model.fit(x_train)
>>> model.save_model('/tmp/model')
>>> loaded_model = KMeans()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[0, 0], [4, 4]]), (2, 2))
>>> model_pred = model.predict(x_test)
>>> loaded_model_pred = loaded_model.predict(x_test)
>>> assert np.allclose(model_pred.collect(),
...                    loaded_model_pred.collect())
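The idea of restoring a model "in the exact same state" can be illustrated with a JSON round trip of the fitted attributes. This sketch does not reproduce dislib's file format, which is not described here: the helpers `save_state` and `load_state`, the key names, and the error raised when overwrite is False are all hypothetical.

```python
import json
import os
import tempfile

import numpy as np


def save_state(filepath, centers, n_iter, overwrite=True):
    """Write fitted attributes to JSON (hypothetical layout, not dislib's)."""
    if not overwrite and os.path.exists(filepath):
        # Assumed behavior for this sketch only.
        raise FileExistsError(filepath)
    state = {"centers": np.asarray(centers).tolist(), "n_iter": int(n_iter)}
    with open(filepath, "w") as f:
        json.dump(state, f)


def load_state(filepath):
    """Read the attributes back as (centers, n_iter)."""
    with open(filepath) as f:
        state = json.load(f)
    return np.array(state["centers"]), state["n_iter"]


centers = np.array([[1.0, 2.0], [4.0, 2.0]])
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.json")
    save_state(path, centers, n_iter=2)
    loaded_centers, loaded_n_iter = load_state(path)
```

Because the centers survive the round trip unchanged, predictions computed from the loaded values match those of the original, which is the property the assertion in the example above checks.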