dislib.classification.CascadeSVM

class dislib.classification.csvm.base.CascadeSVM(cascade_arity=2, max_iter=5, tol=0.001, kernel='rbf', c=1, gamma='auto', check_convergence=True, random_state=None, verbose=False)[source]

Bases: sklearn.base.BaseEstimator

Cascade Support Vector classification.

Implements distributed support vector classification based on Graf et al. [1]. The optimization process is carried out using scikit-learn’s SVC. This method solves binary classification problems.
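
For intuition, the cascade scheme of [1] can be sketched with plain scikit-learn. The snippet below is only an illustrative, single-machine sketch of one cascade pass with arity 2 (fit an SVC on each data partition, merge the support vectors of the partial models and refit); it is not dislib’s distributed implementation, which runs these fits as parallel tasks over ds-arrays and iterates until convergence.

>>> import numpy as np
>>> from sklearn.svm import SVC
>>>
>>> rng = np.random.RandomState(0)
>>> x = rng.randn(200, 2)
>>> y = (x[:, 0] + x[:, 1] > 0).astype(int)
>>>
>>> # Level 0: fit an SVC on each of two partitions and keep only the
>>> # support vectors of each partial model.
>>> parts = [(x[:100], y[:100]), (x[100:], y[100:])]
>>> svs = []
>>> for xp, yp in parts:
...     clf = SVC(kernel='rbf', C=1, gamma='auto').fit(xp, yp)
...     svs.append((xp[clf.support_], yp[clf.support_]))
>>>
>>> # Level 1 (arity 2): merge the support vectors of the partial models
>>> # and refit; this approximates an SVC trained on the full data.
>>> x_merged = np.vstack([s[0] for s in svs])
>>> y_merged = np.concatenate([s[1] for s in svs])
>>> final_svc = SVC(kernel='rbf', C=1, gamma='auto').fit(x_merged, y_merged)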

Parameters:
  • cascade_arity (int, optional (default=2)) – Arity of the reduction process.

  • max_iter (int, optional (default=5)) – Maximum number of iterations to perform.

  • tol (float, optional (default=1e-3)) – Tolerance for the stopping criterion.

  • kernel (string, optional (default=’rbf’)) – Specifies the kernel type to be used in the algorithm. Supported kernels are ‘linear’ and ‘rbf’.

  • c (float, optional (default=1.0)) – Penalty parameter C of the error term.

  • gamma (float, optional (default=’auto’)) – Kernel coefficient for ‘rbf’.

    Default is ‘auto’, which uses 1 / (n_features).

  • check_convergence (boolean, optional (default=True)) – Whether to test for convergence. If False, the algorithm will run for max_iter iterations. Checking for convergence adds a synchronization point after each iteration.

    If check_convergence=False, synchronization does not happen until a call to predict or decision_function. This can be useful to fit multiple models in parallel (a short sketch of this pattern follows the parameter list).

  • random_state (int, RandomState instance or None, optional (default=None)) – The seed of the pseudo random number generator used when shuffling the data for probability estimates. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • verbose (boolean, optional (default=False)) – Whether to print progress information.
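
With check_convergence=False, several models can be trained without intermediate synchronization and only synchronized when their results are needed. The sketch below illustrates this pattern; the toy data and model settings are made up for the example.

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.classification import CascadeSVM
>>>
>>> x = ds.array(np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]]), (2, 2))
>>> y = ds.array(np.array([1, 1, 2, 2]).reshape(-1, 1), (2, 1))
>>>
>>> # Neither fit() synchronizes, so both models can be trained
>>> # concurrently by the runtime.
>>> models = [CascadeSVM(kernel='linear', check_convergence=False),
...           CascadeSVM(kernel='rbf', check_convergence=False)]
>>> for m in models:
...     m.fit(x, y)
>>> # Synchronization happens here, when the predictions are collected.
>>> preds = [m.predict(x).collect() for m in models]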

Variables:
  • iterations (int) – Number of iterations performed.
  • converged (boolean) – Whether the model has converged.

References

[1]Graf, H. P., Cosatto, E., Bottou, L., Dourdanovic, I., & Vapnik, V. (2005). Parallel support vector machines: The cascade svm. In Advances in neural information processing systems (pp. 521-528).

Examples

>>> import dislib as ds
>>> from dislib.classification import CascadeSVM
>>> import numpy as np
>>>
>>>
>>> if __name__ == '__main__':
...     x = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
...     y = np.array([1, 1, 2, 2])
...     train_data = ds.array(x, block_size=(4, 2))
...     train_labels = ds.array(y.reshape(-1, 1), block_size=(4, 1))
...     svm = CascadeSVM()
...     svm.fit(train_data, train_labels)
...     test_data = ds.array(np.array([[-0.8, -1]]), block_size=(1, 2))
...     y_pred = svm.predict(test_data)
...     print(y_pred)
decision_function(x)[source]

Evaluates the decision function for the samples in x.

Parameters:x (ds-array, shape=(n_samples, n_features)) – Input samples.
Returns:df – The decision function of the samples for each class in the model.
Return type:ds-array, shape=(n_samples, 2)
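
A minimal usage sketch, continuing the fitted svm and test_data from the class example above (the exact values depend on the fitted model):

>>> df = svm.decision_function(test_data)
>>> print(df.collect())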
fit(x, y)[source]

Fits a model using training data.

Parameters:
  • x (ds-array, shape=(n_samples, n_features)) – Training samples.
  • y (ds-array, shape=(n_samples, 1)) – Class labels of x.
Returns:self
Return type:CascadeSVM
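
Since fit returns self, construction, fitting and prediction can be chained. A small sketch reusing train_data, train_labels and test_data from the class example above:

>>> y_pred = CascadeSVM().fit(train_data, train_labels).predict(test_data)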

load_model(filepath, load_format='json')[source]

Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.

Parameters:
  • filepath (str) – Path of the saved model.

  • load_format (str, optional (default=’json’)) – Format used to load the model.

Examples

>>> from dislib.classification import CascadeSVM
>>> import numpy as np
>>> import dislib as ds
>>> x = ds.array(np.array([[1, 2], [2, 1], [-1, -2],
...               [-2, -1]]), (2, 2))
>>> y = ds.array(np.array([0, 1, 1, 0]).reshape(-1, 1), (2, 1))
>>> seed = 0
>>> model = CascadeSVM(cascade_arity=3, max_iter=10,
...                    tol=1e-4, kernel='linear', c=2, gamma=0.1,
...                    check_convergence=False,
...                    random_state=seed, verbose=False)
>>> model.fit(x, y)
>>> model.save_model('/tmp/model')
>>> loaded_model = CascadeSVM()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[1, 2], [2, 1], [-1, -2], [-2, -1],
...                   [1, 1], [-1, -1]]), (2, 2))
>>> y_pred = model.predict(x_test)
>>> y_loaded_pred = loaded_model.predict(x_test)
>>> assert np.allclose(y_pred.collect(), y_loaded_pred.collect())
predict(x)[source]

Perform classification on samples.

Parameters:x (ds-array, shape=(n_samples, n_features)) – Input samples.
Returns:y – Class labels of x.
Return type:ds-array, shape=(n_samples, 1)
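
The returned labels form a ds-array; as in the save_model and load_model examples, collect() brings them into a local NumPy array. A short sketch reusing svm and test_data from the class example above:

>>> y_pred = svm.predict(test_data)
>>> print(y_pred.collect())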
save_model(filepath, overwrite=True, save_format='json')[source]

Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.

Parameters:
  • filepath (str) – Path where to save the model.
  • overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.
  • save_format (str, optional (default=’json’)) – Format used to save the model.

Examples

>>> from dislib.classification import CascadeSVM
>>> import numpy as np
>>> import dislib as ds
>>> x = ds.array(np.array([[1, 2], [2, 1], [-1, -2],
...               [-2, -1]]), (2, 2))
>>> y = ds.array(np.array([0, 1, 1, 0]).reshape(-1, 1), (2, 1))
>>> seed = 0
>>> model = CascadeSVM(cascade_arity=3, max_iter=10,
...                    tol=1e-4, kernel='linear', c=2, gamma=0.1,
...                    check_convergence=False,
...                    random_state=seed, verbose=False)
>>> model.fit(x, y)
>>> model.save_model('/tmp/model')
>>> loaded_model = CascadeSVM()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[1, 2], [2, 1], [-1, -2], [-2, -1],
...                   [1, 1], [-1, -1]]), (2, 2))
>>> y_pred = model.predict(x_test)
>>> y_loaded_pred = loaded_model.predict(x_test)
>>> assert np.allclose(y_pred.collect(),
...                    y_loaded_pred.collect())
score(x, y, collect=False)[source]

Returns the mean accuracy on the given test data and labels.

Parameters:
  • x (ds-array, shape=(n_samples, n_features)) – Test samples.
  • y (ds-array, shape=(n_samples, 1)) – True labels for x.
  • collect (bool, optional (default=False)) – When True, a synchronized result is returned.
Returns:score – Mean accuracy of self.predict(x) wrt. y.
Return type:float (as future object)
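
A short usage sketch reusing the fitted svm and test_data from the class example; the expected label for the single test sample is assumed to be 1 here. With collect=True the accuracy is returned as a synchronized float instead of a future object.

>>> test_labels = ds.array(np.array([[1]]), block_size=(1, 1))
>>> acc = svm.score(test_data, test_labels, collect=True)
>>> print(acc)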