dislib.classification.KNeighborsClassifier
- class dislib.classification.knn.base.KNeighborsClassifier(n_neighbors: int = 5, weights: str = 'uniform', random_state=None)
Bases: BaseEstimator
Classifier implementing the k-nearest neighbors vote.
- Parameters
n_neighbors (int, default=5) – Number of neighbors to use by default for kneighbors() queries.
weights ({‘uniform’, ‘distance’} or callable, default=‘uniform’) – Weight function used in prediction. Possible values:
- ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
- ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point have a greater influence than neighbors that are further away.
- [callable] : a user-defined function that accepts an array of distances and returns an array of the same shape containing the weights.
random_state (int, RandomState instance or None, optional (default=None)) – The seed of the pseudo random number generator used when shuffling the data for probability estimates. If int, random_state is the seed used by the random number generator. If RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by np.random.
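The three weights options can be illustrated with a small, dependency-free sketch. The helper name weighted_vote and the sample distances are hypothetical and are not part of dislib; the code only mirrors the behaviour described above for a single query point and makes no claim about how dislib implements the vote internally.

```python
from collections import defaultdict


def weighted_vote(distances, labels, weights="uniform"):
    """Return the winning label among one query point's k nearest neighbors.

    `distances` and `labels` are equal-length sequences describing the
    neighbors. `weights` mirrors the documented options: 'uniform',
    'distance', or a callable mapping a list of distances to a list of
    weights. Hypothetical helper, not a dislib function.
    """
    if weights == "uniform":
        w = [1.0] * len(distances)
    elif weights == "distance":
        # Inverse distance. An exact match (distance 0) takes the whole vote
        # so that no division by zero occurs.
        if any(d == 0 for d in distances):
            w = [1.0 if d == 0 else 0.0 for d in distances]
        else:
            w = [1.0 / d for d in distances]
    elif callable(weights):
        w = list(weights(distances))
    else:
        raise ValueError(f"unsupported weights: {weights!r}")

    # Accumulate the weight received by each label.
    totals = defaultdict(float)
    for label, weight in zip(labels, w):
        totals[label] += weight
    # max() returns the first label that reaches the highest total.
    return max(totals, key=totals.get)


# One close neighbor of class 1 and two distant neighbors of class 2.
distances = [0.1, 2.0, 2.5]
labels = [1, 2, 2]

uniform_label = weighted_vote(distances, labels, "uniform")
distance_label = weighted_vote(distances, labels, "distance")
# A user-defined weighting: inverse squared distance.
custom_label = weighted_vote(
    distances, labels, lambda d: [1.0 / (x * x) for x in d]
)
```

With uniform weights the two distant neighbors outvote the close one, while both distance-based weightings let the close neighbor decide.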
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
Warning: Regarding the Nearest Neighbors algorithms, if two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
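The ordering dependence mentioned in the warning can be reproduced with a brute-force search in plain Python. This is a minimal sketch under the assumption that ties are broken by position in the training data; the function k_nearest_labels and the sample points are hypothetical, and dislib's own tie handling may differ.

```python
import math


def k_nearest_labels(train_x, train_y, query, k):
    """Labels of the k training points closest to `query` (brute force)."""
    dists = [math.dist(p, query) for p in train_x]
    # sorted() is stable, so equidistant points keep their training order.
    order = sorted(range(len(train_x)), key=lambda i: dists[i])
    return [train_y[i] for i in order[:k]]


query = (0.0, 0.0)
# Two points at distance 1 from the query, with different labels.
points = [(1.0, 0.0), (-1.0, 0.0)]
classes = ["a", "b"]

# Same training set, presented in two different orders.
first_order = k_nearest_labels(points, classes, query, k=1)
second_order = k_nearest_labels(points[::-1], classes[::-1], query, k=1)
```

The training set is identical in both calls, yet the single nearest neighbor, and therefore the predicted label, changes with the order in which the points are supplied.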
Examples
>>> import dislib as ds
>>> from dislib.classification import KNeighborsClassifier
>>> import numpy as np
>>>
>>>
>>> if __name__ == '__main__':
>>>     x = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>>     # Labels as a column, matching the (n_samples, 1) shape fit() documents.
>>>     y = np.array([1, 1, 2, 2]).reshape(-1, 1)
>>>     train_data = ds.array(x, block_size=(4, 2))
>>>     train_labels = ds.array(y, block_size=(4, 1))
>>>     knn = KNeighborsClassifier(n_neighbors=3)
>>>     knn.fit(train_data, train_labels)
>>>     test_data = ds.array(np.array([[-0.8, -1]]), block_size=(1, 2))
>>>     y_pred = knn.predict(test_data)
>>>     # collect() synchronizes the ds-array and returns its local contents.
>>>     print(y_pred.collect())
- fit(x: Array, y: Array)
Fit the model using training data.
- Parameters
x (ds-array, shape=(n_samples, n_features)) – Training data.
y (ds-array, shape=(n_samples, 1)) – Class labels of x.
- Returns
self
- Return type
KNeighborsClassifier
- load_model(filepath, load_format='json')
Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.
- Parameters
filepath (str) – Path of the saved model.
load_format (str, optional (default=‘json’)) – Format used to load the model.
Examples
>>> from dislib.classification import KNeighborsClassifier
>>> import numpy as np
>>> import dislib as ds
>>> x_data = np.array([[1, 2], [2, 0], [3, 1], [4, 4], [5, 3]])
>>> # Labels as a column, matching the (n_samples, 1) shape fit() documents.
>>> y_data = np.array([[2], [1], [1], [2], [0]])
>>> x_test_m = np.array([[3, 2], [4, 4], [1, 3]])
>>> bn, bm = 2, 2
>>> x = ds.array(x=x_data, block_size=(bn, bm))
>>> y = ds.array(x=y_data, block_size=(bn, 1))
>>> test_data_m = ds.array(x=x_test_m, block_size=(bn, bm))
>>> knn = KNeighborsClassifier()
>>> knn.fit(x, y)
>>> knn.save_model("./model_KNN")
>>> # Load the saved state into a fresh, unfitted estimator.
>>> knn_loaded = KNeighborsClassifier()
>>> knn_loaded.load_model("./model_KNN")
>>> pred = knn_loaded.predict(test_data_m).collect()
- predict(q: Array)
Perform classification on samples.
- Parameters
q (ds-array, shape=(n_samples, n_features)) – Input samples.
- Returns
y – Class labels of q.
- Return type
ds-array, shape=(n_samples, 1)
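The mapping from input samples to a column of labels can be sketched without dislib. The code below is a minimal brute-force illustration that assumes Euclidean distance and a uniform majority vote; knn_predict is a hypothetical name, and the sketch operates on plain lists rather than ds-arrays, so it says nothing about how the computation is distributed.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, queries, n_neighbors=5):
    """Brute-force k-NN with a uniform majority vote.

    Returns one single-element row per query, that is, an
    (n_samples, 1) column of labels. Hypothetical helper, not dislib code.
    """
    predictions = []
    for q in queries:
        # Rank training points by Euclidean distance to the query.
        order = sorted(
            range(len(train_x)), key=lambda i: math.dist(train_x[i], q)
        )
        nearest = [train_y[i] for i in order[:n_neighbors]]
        # most_common(1) yields the label with the highest count.
        predictions.append([Counter(nearest).most_common(1)[0][0]])
    return predictions


train_x = [[-1, -1], [-2, -1], [1, 1], [2, 1]]
train_y = [1, 1, 2, 2]
queries = [[-0.8, -1], [1.5, 1]]

labels = knn_predict(train_x, train_y, queries, n_neighbors=3)
```

Each query yields exactly one row in the result, so two input samples produce a 2 x 1 column of labels.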
- save_model(filepath, overwrite=True, save_format='json')
Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.
- Parameters
filepath (str) – Path where to save the model.
overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.
save_format (str, optional (default=‘json’)) – Format used to save the model.
Examples
>>> from dislib.classification import KNeighborsClassifier
>>> import numpy as np
>>> import dislib as ds
>>> x_data = np.array([[1, 2], [2, 0], [3, 1], [4, 4], [5, 3]])
>>> # One label per training sample, stored as a column.
>>> y_data = np.array([[2], [1], [1], [2], [0]])
>>> x = ds.array(x=x_data, block_size=(2, 2))
>>> y = ds.array(x=y_data, block_size=(2, 1))
>>> knn = KNeighborsClassifier()
>>> knn.fit(x, y)
>>> knn.save_model("./model_KNN")
- score(q: Array, y: Array, collect=False)
Returns the mean accuracy on the given test data and labels.
- Parameters
q (ds-array, shape=(n_samples, n_features)) – Test samples.
y (ds-array, shape=(n_samples, 1)) – True labels for q.
collect (bool, optional (default=False)) – When True, a synchronized result is returned.
- Returns
score – Mean accuracy of self.predict(q) with respect to y.
- Return type
float (as future object)
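Mean accuracy is the fraction of samples whose predicted label equals the true label. The sketch below computes it on plain lists of single-label rows; mean_accuracy is a hypothetical helper, and it returns an ordinary float immediately rather than a future object.

```python
def mean_accuracy(predicted, expected):
    """Fraction of rows in `predicted` that equal the matching row in `expected`."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one sample is required")
    # True counts as 1 and False as 0 when summed.
    hits = sum(p == t for p, t in zip(predicted, expected))
    return hits / len(expected)


# Columns of labels shaped (n_samples, 1), as in the documented return type.
predicted = [[1], [2], [2], [1]]
expected = [[1], [2], [1], [1]]

accuracy = mean_accuracy(predicted, expected)
```

Three of the four predictions match the true labels, so the accuracy is 0.75.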