dislib.cluster.K-Means
class dislib.cluster.kmeans.base.KMeans(n_clusters=8, init='random', max_iter=10, tol=0.0001, arity=50, random_state=None, verbose=False)
Bases: sklearn.base.BaseEstimator
Perform K-means clustering.
Parameters:
- n_clusters (int, optional (default=8)) – The number of clusters to form as well as the number of centroids to generate.
- init ({'random', nd-array or sparse matrix}, optional (default='random')) – Method of initialization, defaults to 'random', which generates random centers at the beginning. If an nd-array or sparse matrix is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
- max_iter (int, optional (default=10)) – Maximum number of iterations of the k-means algorithm for a single run.
- tol (float, optional (default=1e-4)) – Tolerance for accepting convergence.
- arity (int, optional (default=50)) – Arity of the reduction carried out during the computation of the new centroids.
- random_state (int or RandomState, optional (default=None)) – Seed or numpy.random.RandomState instance to generate random numbers for centroid initialization.
- verbose (boolean, optional (default=False)) – Whether to print progress information.
Variables:
- centers (ndarray) – Computed centroids.
- n_iter (int) – Number of iterations performed.
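The arity parameter can be pictured as the fan-in of a tree reduction: per-block partial results are merged in groups of at most arity until a single result remains. The following pure-Python sketch illustrates that idea only; it is not dislib's implementation, and `merge`, `reduce_with_arity`, and the `(sums, counts)` layout of a partial result are hypothetical names chosen for illustration.

```python
def merge(group):
    """Element-wise sum of a group of (sums, counts) partial results."""
    n_clusters = len(group[0][1])
    n_features = len(group[0][0][0])
    sums = [[0.0] * n_features for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for part_sums, part_counts in group:
        for k in range(n_clusters):
            counts[k] += part_counts[k]
            for j in range(n_features):
                sums[k][j] += part_sums[k][j]
    return sums, counts


def reduce_with_arity(partials, arity=50):
    """Merge partial results in groups of at most `arity` until one remains."""
    if arity < 2:
        raise ValueError("arity must be at least 2")
    levels = 0
    while len(partials) > 1:
        partials = [merge(partials[i:i + arity])
                    for i in range(0, len(partials), arity)]
        levels += 1
    return partials[0], levels


# Hypothetical input: seven blocks, each contributing one sample to
# cluster 0 and two samples to cluster 1 (per-cluster sums and counts).
blocks = [([[1.0, 2.0], [8.0, 0.0]], [1, 2]) for _ in range(7)]
(total_sums, total_counts), levels = reduce_with_arity(blocks, arity=3)

# New centroids are the per-cluster sums divided by the per-cluster counts.
centers = [[s / c for s in row] for row, c in zip(total_sums, total_counts)]
```

With seven partial results and an arity of 3, the sketch needs two reduction levels (7 to 3 to 1). A larger arity gives a shallower tree with more work per merge step; a smaller one gives more, lighter merge steps.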
Examples
>>> import dislib as ds
>>> from dislib.cluster import KMeans
>>> import numpy as np
>>> if __name__ == '__main__':
...     x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
...     x_train = ds.array(x, (2, 2))
...     kmeans = KMeans(n_clusters=2, random_state=0)
...     labels = kmeans.fit_predict(x_train)
...     print(labels)
...     x_test = ds.array(np.array([[0, 0], [4, 4]]), (2, 2))
...     labels = kmeans.predict(x_test)
...     print(labels)
...     print(kmeans.centers)
fit(x, y=None)
Compute K-means clustering.
Parameters:
- x (ds-array) – Samples to cluster.
- y (ignored) – Not used, present here for API consistency by convention.
Returns: self
Return type: KMeans
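How max_iter, tol, and n_iter interact during fitting can be sketched with a plain Lloyd iteration. This is a minimal single-process illustration, not dislib's distributed code: the function names are hypothetical, and the stopping rule used here (total centroid movement below tol) is an assumption, since the exact convergence criterion is not stated above.

```python
import math


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def kmeans_sketch(points, centers, max_iter=10, tol=1e-4):
    """Lloyd iterations; stop at max_iter or once the centers barely move."""
    n_iter = 0
    while n_iter < max_iter:
        # Assignment step: group every sample under its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Update step: each center becomes the mean of its group.
        # A center with no samples is left where it was.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else list(c)
            for g, c in zip(groups, centers)
        ]
        shift = sum(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        n_iter += 1
        if shift < tol:  # assumed stopping rule, see the note above
            break
    return centers, n_iter


points = [[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]]
initial = [[0.0, 2.0], [5.0, 2.0]]  # explicit initial centers, shape (2, 2)
centers, n_iter = kmeans_sketch(points, initial)
capped_centers, capped_iter = kmeans_sketch(points, initial, max_iter=1)
```

On this data the centers settle at (1, 2) and (4, 2) after the first update, and the second iteration only confirms that nothing moved, so n_iter is 2. With max_iter=1 the loop stops after a single update regardless of tol.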
fit_predict(x, y=None)
Compute cluster centers and predict cluster index for each sample.
Parameters:
- x (ds-array) – Samples to cluster.
- y (ignored) – Not used, present here for API consistency by convention.
Returns: labels – Index of the cluster each sample belongs to.
Return type: ds-array, shape=(n_samples, 1)
load_model(filepath, load_format='json')
Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.
Parameters:
- filepath (str) – Path of the saved model.
- load_format (str, optional (default='json')) – Format used to load the model.
Examples
>>> from dislib.cluster import KMeans
>>> import numpy as np
>>> import dislib as ds
>>> x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
>>> x_train = ds.array(x, (2, 2))
>>> model = KMeans(n_clusters=2, random_state=0)
>>> model.fit(x_train)
>>> model.save_model('/tmp/model')
>>> loaded_model = KMeans()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[0, 0], [4, 4]]), (2, 2))
>>> model_pred = model.predict(x_test)
>>> loaded_model_pred = loaded_model.predict(x_test)
>>> assert np.allclose(model_pred.collect(),
...                    loaded_model_pred.collect())
predict(x)
Predict the closest cluster each sample in the data belongs to.
Parameters:
- x (ds-array) – New data to predict.
Returns: labels – Index of the cluster each sample belongs to.
Return type: ds-array, shape=(n_samples, 1)
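Prediction amounts to a nearest-centroid lookup. The sketch below shows that rule on plain Python lists and mirrors the documented (n_samples, 1) label shape; `predict_sketch` is a hypothetical helper, Euclidean distance is assumed, and the centers are made-up values rather than output from a fitted dislib model.

```python
import math


def predict_sketch(samples, centers):
    """Return one [label] row per sample: the index of its nearest center."""
    labels = []
    for s in samples:
        distances = [math.dist(s, c) for c in centers]
        labels.append([distances.index(min(distances))])
    return labels


centers = [[1.0, 2.0], [4.0, 2.0]]  # illustrative centroids
labels = predict_sketch([[0, 0], [4, 4]], centers)
```

Here (0, 0) is closer to the first center and (4, 4) to the second, so the labels are 0 and 1, one per row.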
save_model(filepath, overwrite=True, save_format='json')
Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.
Parameters:
- filepath (str) – Path where the model is saved.
- overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.
- save_format (str, optional (default='json')) – Format used to save the model.
Examples
>>> from dislib.cluster import KMeans
>>> import numpy as np
>>> import dislib as ds
>>> x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
>>> x_train = ds.array(x, (2, 2))
>>> model = KMeans(n_clusters=2, random_state=0)
>>> model.fit(x_train)
>>> model.save_model('/tmp/model')
>>> loaded_model = KMeans()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[0, 0], [4, 4]]), (2, 2))
>>> model_pred = model.predict(x_test)
>>> loaded_model_pred = loaded_model.predict(x_test)
>>> assert np.allclose(model_pred.collect(),
...                    loaded_model_pred.collect())
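The save and load round trip can be pictured, in very reduced form, as writing the fitted state to JSON and reading it back. The sketch below uses only the standard library. The file layout, the `save_sketch` and `load_sketch` helpers, and the choice to raise an error when overwrite is False are all assumptions made for illustration; the on-disk format dislib actually uses, and its behavior when the file already exists, are not described above.

```python
import json
import os
import tempfile


def save_sketch(state, filepath, overwrite=True):
    """Write a fitted-state dict as JSON; refuse to clobber unless allowed."""
    if os.path.exists(filepath) and not overwrite:
        # Assumed behavior for this sketch only.
        raise FileExistsError(filepath)
    with open(filepath, "w") as f:
        json.dump(state, f)


def load_sketch(filepath):
    """Read the state dict back from a JSON file."""
    with open(filepath) as f:
        return json.load(f)


# Hypothetical fitted state: hyperparameters plus the learned attributes.
state = {"n_clusters": 2, "centers": [[1.0, 2.0], [4.0, 2.0]], "n_iter": 2}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save_sketch(state, path)
    restored = load_sketch(path)
    try:
        save_sketch(state, path, overwrite=False)
        refused = False
    except FileExistsError:
        refused = True
```

Because the centroids and iteration count are plain numbers, they survive the JSON round trip unchanged, which is what allows a reloaded model to produce the same predictions as the original.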