dislib.cluster.K-Means
class dislib.cluster.kmeans.base.KMeans(n_clusters=8, init='random', max_iter=10, tol=0.0001, arity=50, random_state=None, verbose=False)
Bases: sklearn.base.BaseEstimator
Perform K-means clustering.
Parameters: - n_clusters (int, optional (default=8)) – The number of clusters to form, which is also the number of centroids to generate.
- init ({'random', nd-array or sparse matrix}, optional (default='random')) – Method of initialization. The default, 'random', generates random centers at the beginning. If an nd-array or sparse matrix is passed, it must have shape (n_clusters, n_features) and gives the initial centers.
- max_iter (int, optional (default=10)) – Maximum number of iterations of the k-means algorithm for a single run.
- tol (float, optional (default=1e-4)) – Tolerance for accepting convergence.
- arity (int, optional (default=50)) – Arity of the reduction carried out during the computation of the new centroids.
- random_state (int or RandomState, optional (default=None)) – Seed or numpy.random.RandomState instance used to generate random numbers for centroid initialization.
- verbose (boolean, optional (default=False)) – Whether to print progress information.
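To make the roles of init, max_iter, tol, arity and random_state more concrete, the following is a minimal NumPy-only sketch of a blocked K-means loop. It is not dislib's implementation: the helper names (partial_sums, reduce_with_arity, kmeans_sketch) are hypothetical, the convergence test (norm of the centroid shift below tol), the data-range sampling for 'random' initialization and the handling of empty clusters are all assumptions made for illustration, and random_state is accepted only as an int or None here.

```python
import numpy as np


def partial_sums(block, centers):
    """Return per-center coordinate sums and sample counts for one block."""
    # Distance from every sample in the block to every center.
    dists = np.linalg.norm(block[:, None, :] - centers[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    sums = np.zeros_like(centers, dtype=float)
    counts = np.zeros(len(centers))
    np.add.at(sums, nearest, block)  # unbuffered add handles repeated indices
    np.add.at(counts, nearest, 1)
    return sums, counts


def reduce_with_arity(partials, arity):
    """Merge partial results in groups of at most `arity` until one is left."""
    if arity < 2:
        raise ValueError("arity must be at least 2")
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials), arity):
            group = partials[i:i + arity]
            merged.append((sum(s for s, _ in group), sum(c for _, c in group)))
        partials = merged
    return partials[0]


def kmeans_sketch(blocks, n_clusters=2, init="random", max_iter=10,
                  tol=1e-4, arity=50, random_state=None):
    """Illustrative K-means over a list of row blocks (assumed 2-D arrays)."""
    blocks = [np.asarray(b, dtype=float) for b in blocks]
    if isinstance(init, str):
        # Assumption: 'random' draws centers uniformly within the data range.
        rng = np.random.RandomState(random_state)
        low = np.min([b.min(axis=0) for b in blocks], axis=0)
        high = np.max([b.max(axis=0) for b in blocks], axis=0)
        centers = rng.uniform(low, high, size=(n_clusters, low.shape[0]))
    else:
        centers = np.asarray(init, dtype=float).copy()

    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        partials = [partial_sums(b, centers) for b in blocks]
        sums, counts = reduce_with_arity(partials, arity)
        new_centers = centers.copy()
        filled = counts > 0  # assumption: empty clusters keep their old center
        new_centers[filled] = sums[filled] / counts[filled, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:  # assumed convergence criterion
            break
    return centers, n_iter
```

In this sketch a smaller arity produces a deeper reduction tree with smaller merge steps, while a larger arity merges more partial results per step. Passing an explicit array as init, of shape (n_clusters, n_features), makes the run deterministic regardless of random_state.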
Variables: - centers (ndarray) – Computed centroids.
- n_iter (int) – Number of iterations performed.
Examples
>>> import dislib as ds
>>> from dislib.cluster import KMeans
>>> import numpy as np
>>>
>>>
>>> if __name__ == '__main__':
>>>     x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
>>>     x_train = ds.array(x, (2, 2))
>>>     kmeans = KMeans(n_clusters=2, random_state=0)
>>>     labels = kmeans.fit_predict(x_train)
>>>     print(labels)
>>>     x_test = ds.array(np.array([[0, 0], [4, 4]]), (2, 2))
>>>     labels = kmeans.predict(x_test)
>>>     print(labels)
>>>     print(kmeans.centers)
fit(x, y=None)
Compute K-means clustering.
Parameters: - x (ds-array) – Samples to cluster.
- y (ignored) – Not used, present here for API consistency by convention.
Returns: self
Return type: KMeans
fit_predict(x, y=None)
Compute cluster centers and predict the cluster index for each sample.
Parameters: - x (ds-array) – Samples to cluster.
- y (ignored) – Not used, present here for API consistency by convention.
Returns: labels – Index of the cluster each sample belongs to.
Return type: ds-array, shape=(n_samples, 1)
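The label layout described above, one cluster index per sample in a column of shape (n_samples, 1), can be illustrated with plain NumPy. This is only a sketch of nearest-centroid assignment under the assumption of Euclidean distance. The function name assign_labels and the centroid values are hypothetical, and the result is a NumPy array rather than a ds-array.

```python
import numpy as np


def assign_labels(samples, centers):
    """Return the index of the nearest center for each sample as a column."""
    # Pairwise Euclidean distances, shape (n_samples, n_clusters).
    dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    # Reshape to (n_samples, 1) to mirror the documented label layout.
    return dists.argmin(axis=1).reshape(-1, 1)


# Hypothetical centroids and test samples, chosen for illustration only.
centers = np.array([[1.0, 2.0], [4.0, 2.0]])
x_test = np.array([[0.0, 0.0], [4.0, 4.0]])

labels = assign_labels(x_test, centers)
print(labels.shape)  # (2, 1)
```

Each row of the result holds the cluster index for the sample in the same row of the input.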