dislib.classification.KNeighborsClassifier

class dislib.classification.knn.base.KNeighborsClassifier(n_neighbors: int = 5, weights: str = 'uniform', random_state=None)[source]

Bases: sklearn.base.BaseEstimator

Classifier implementing the k-nearest neighbors vote.

Parameters:
  • n_neighbors (int, default=5) – Number of neighbors to use by default for kneighbors() queries.

  • weights ({‘uniform’, ‘distance’} or callable, default=’uniform’) – Weight function used in prediction. Possible values:

    • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
    • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
    • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
  • random_state (int, RandomState instance or None, optional (default=None)) – The seed of the pseudo-random number generator used when shuffling the data for probability estimates. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
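
As an illustration of the three weights options, the following plain-Python sketch votes among the k nearest neighbors given their distances and labels. It is not dislib's implementation: the helper name weighted_vote, the handling of zero distances, and the tie-breaking rule are assumptions made for this example only.

```python
import math
from collections import defaultdict


def weighted_vote(distances, labels, weights="uniform"):
    """Return the winning label among k neighbors (hypothetical helper)."""
    if weights == "uniform":
        # Every neighbor counts equally.
        w = [1.0] * len(distances)
    elif weights == "distance":
        # Inverse-distance weights. Assumption: neighbors at distance 0
        # take all the weight, which avoids dividing by zero.
        if any(d == 0 for d in distances):
            w = [1.0 if d == 0 else 0.0 for d in distances]
        else:
            w = [1.0 / d for d in distances]
    elif callable(weights):
        # User-defined function: distances in, same-length weights out.
        w = list(weights(distances))
    else:
        raise ValueError("weights must be 'uniform', 'distance' or a callable")

    totals = defaultdict(float)
    for label, weight in zip(labels, w):
        totals[label] += weight
    # Assumption: ties go to the smallest label.
    return max(sorted(totals), key=lambda label: totals[label])


# One close neighbor of class 1 and two farther neighbors of class 2.
distances = [0.5, 2.0, 2.5]
labels = [1, 2, 2]

uniform_label = weighted_vote(distances, labels, "uniform")
distance_label = weighted_vote(distances, labels, "distance")
custom_label = weighted_vote(
    distances, labels, lambda ds: [math.exp(-d) for d in ds]
)
```

With uniform weights the two class-2 neighbors outvote the single class-1 neighbor; with inverse-distance weights the close class-1 neighbor dominates (weight 2.0 against 0.5 + 0.4).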

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

Warning: Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
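
The ordering effect described in this warning can be reproduced with a small brute-force sketch in plain Python. This is only an illustration under stated assumptions, not dislib's search code: knn_predict is a hypothetical helper, and ties are kept in training order because Python's sorted() is stable.

```python
from collections import Counter


def knn_predict(train, labels, query, k):
    """Brute-force k-NN vote (hypothetical helper, not the dislib API)."""
    # Squared Euclidean distance from the query to every training point.
    dists = [sum((a - b) ** 2 for a, b in zip(p, query)) for p in train]
    # sorted() is stable, so equidistant points keep their training order.
    order = sorted(range(len(train)), key=lambda i: dists[i])
    nearest = [labels[i] for i in order[:k]]
    return Counter(nearest).most_common(1)[0][0]


query = (0.0, 0.0)

# Two training points at the same distance from the query, different labels.
train_a = [(1.0, 0.0), (-1.0, 0.0)]
labels_a = ["red", "blue"]

# Same data, reversed order.
train_b = train_a[::-1]
labels_b = labels_a[::-1]

pred_a = knn_predict(train_a, labels_a, query, k=1)
pred_b = knn_predict(train_b, labels_b, query, k=1)
```

With k=1, neighbors k and k+1 are tied, so the prediction flips from "red" to "blue" when the training data is reordered.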

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Examples

>>> import dislib as ds
>>> from dislib.classification import KNeighborsClassifier
>>> import numpy as np
>>>
>>>
>>> if __name__ == '__main__':
>>>     x = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>>     y = np.array([[1], [1], [2], [2]])
>>>     train_data = ds.array(x, block_size=(4, 2))
>>>     train_labels = ds.array(y, block_size=(4, 1))
>>>     knn = KNeighborsClassifier(n_neighbors=3)
>>>     knn.fit(train_data, train_labels)
>>>     test_data = ds.array(np.array([[-0.8, -1]]), block_size=(1, 2))
>>>     y_pred = knn.predict(test_data)
>>>     print(y_pred.collect())

fit(x: dislib.data.array.Array, y: dislib.data.array.Array)[source]

Fit the model using training data.

Parameters:
  • x (ds-array, shape=(n_samples, n_features)) – Training data.
  • y (ds-array, shape=(n_samples, 1)) – Class labels of x.
Returns:

self

Return type:

KNeighborsClassifier

load_model(filepath, load_format='json')[source]

Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.

Parameters:
  • filepath (str) – Path of the saved model.
  • load_format (str, optional (default=’json’)) – Format used to load the model.

Examples

>>> from dislib.classification import KNeighborsClassifier
>>> import numpy as np
>>> import dislib as ds
>>> x_data = np.array([[1, 2], [2, 0], [3, 1], [4, 4], [5, 3]])
>>> y_data = np.array([[2], [1], [1], [2], [0]])
>>> x_test_m = np.array([[3, 2], [4, 4], [1, 3]])
>>> bn, bm = 2, 2
>>> x = ds.array(x=x_data, block_size=(bn, bm))
>>> y = ds.array(x=y_data, block_size=(bn, 1))
>>> test_data_m = ds.array(x=x_test_m, block_size=(bn, bm))
>>> knn = KNeighborsClassifier()
>>> knn.fit(x, y)
>>> knn.save_model("./model_KNN")
>>> knn_loaded = KNeighborsClassifier()
>>> knn_loaded.load_model("./model_KNN")
>>> pred = knn_loaded.predict(test_data_m).collect()

predict(q: dislib.data.array.Array)[source]

Perform classification on samples.

Parameters:
  • q (ds-array, shape=(n_samples, n_features)) – Input samples.
Returns:

y – Class labels of q.

Return type:

ds-array, shape=(n_samples, 1)

save_model(filepath, overwrite=True, save_format='json')[source]

Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.

Parameters:
  • filepath (str) – Path where the model is saved.
  • overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.
  • save_format (str, optional (default=’json’)) – Format used to save the model.

Examples

>>> from dislib.classification import KNeighborsClassifier
>>> import numpy as np
>>> import dislib as ds
>>> x_data = np.array([[1, 2], [2, 0], [3, 1], [4, 4], [5, 3]])
>>> y_data = np.array([[2], [1], [1], [2], [0]])
>>> x = ds.array(x=x_data, block_size=(2, 2))
>>> y = ds.array(x=y_data, block_size=(2, 1))
>>> knn = KNeighborsClassifier()
>>> knn.fit(x, y)
>>> knn.save_model("./model_KNN")

score(q: dislib.data.array.Array, y: dislib.data.array.Array, collect=False)[source]

Returns the mean accuracy on the given test data and labels.

Parameters:
  • q (ds-array, shape=(n_samples, n_features)) – Test samples.
  • y (ds-array, shape=(n_samples, 1)) – True labels for q.
  • collect (bool, optional (default=False)) – When True, a synchronized result is returned.
Returns:

score – Mean accuracy of self.predict(q) with respect to y.

Return type:

float (as future object)
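
For reference, mean accuracy is the fraction of samples whose predicted label equals the true label. The sketch below computes it on plain Python lists; mean_accuracy is a hypothetical helper used only to show the formula, and it does not reproduce how dislib distributes the computation or returns a future object.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_true = [2, 1, 1, 2, 0]
y_pred = [2, 1, 2, 2, 0]  # the third prediction is wrong

accuracy = mean_accuracy(y_true, y_pred)  # 4 correct out of 5
```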