dislib.classification.CascadeSVM

class dislib.classification.csvm.base.CascadeSVM(cascade_arity=2, max_iter=5, tol=0.001, kernel='rbf', c=1, gamma='auto', check_convergence=True, random_state=None, verbose=False)

Bases: object

Cascade Support Vector classification.

Implements distributed support vector classification based on Graf et al. [1]. The optimization process is carried out using scikit-learn’s SVC.
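As a rough illustration of the cascade idea in Graf et al. [1], the sketch below shows how partitions of the training data could be grouped level by level until a single set remains. In the algorithm itself, an SVM is trained on each group and only its support vectors are passed to the next level; here only the grouping is shown. The function name `cascade_levels` and the grouping order are assumptions made for illustration and are not dislib's internal scheduling.

```python
def cascade_levels(n_partitions, arity=2):
    """Return, level by level, which partition indices are merged together.

    Hypothetical helper: it models only the shape of the reduction tree,
    not the SVM training done at each node.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    if arity < 2:
        raise ValueError("arity must be at least 2")

    level = [[i] for i in range(n_partitions)]
    levels = []
    while len(level) > 1:
        # Merge consecutive groups, `arity` at a time.
        merged = [
            sum(level[i:i + arity], [])
            for i in range(0, len(level), arity)
        ]
        levels.append(merged)
        level = merged
    return levels


print(cascade_levels(4, arity=2))
# [[[0, 1], [2, 3]], [[0, 1, 2, 3]]]
```

A larger arity produces a shallower tree with bigger merges at each node, which is the trade-off that cascade_arity controls.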

Parameters:


cascade_arity (int, optional (default=2)) – Arity of the reduction process.
max_iter (int, optional (default=5)) – Maximum number of iterations to perform.
tol (float, optional (default=1e-3)) – Tolerance for the stopping criterion.
kernel (string, optional (default='rbf')) – Specifies the kernel type to be used in the algorithm. Supported kernels are 'linear' and 'rbf'.
c (float, optional (default=1.0)) – Penalty parameter C of the error term.
gamma (float, optional (default='auto')) – Kernel coefficient for 'rbf'. The default, 'auto', uses 1 / n_features.
check_convergence (boolean, optional (default=True)) – Whether to test for convergence. If False, the algorithm runs for max_iter iterations. Checking for convergence adds a synchronization point after each iteration. If check_convergence=False, synchronization does not happen until a call to predict, decision_function or score. This can be useful to fit multiple models in parallel.
random_state (int, RandomState instance or None, optional (default=None)) – The seed of the pseudo random number generator used when shuffling the data for probability estimates. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
verbose (boolean, optional (default=False)) – Whether to print progress information.
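To make the gamma parameter concrete, here is a minimal sketch of how 'auto' resolves to 1 / n_features and how the resulting coefficient enters the usual RBF kernel, exp(-gamma * ||a - b||^2). The helper names `resolve_gamma` and `rbf_kernel` are hypothetical and work on plain Python lists; they are not part of dislib.

```python
import math


def resolve_gamma(gamma, n_features):
    """Turn the gamma argument into a number ('auto' means 1 / n_features)."""
    if gamma == "auto":
        return 1.0 / n_features
    return float(gamma)


def rbf_kernel(a, b, gamma):
    """RBF kernel value between two samples given as sequences of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


gamma = resolve_gamma("auto", n_features=2)   # 0.5 for two features
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma))
```

Identical samples always give a kernel value of 1.0, and the value decays toward 0 as samples move apart; a larger gamma makes that decay faster.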

Variables:

iterations (int) – Number of iterations performed.
converged (boolean) – Whether the model has converged.
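The following sketch suggests how max_iter, tol and check_convergence could interact to produce the iterations and converged values. The stopping rule used here (relative change of an objective value below tol) is an assumption for illustration, since this page does not state the exact criterion, and `run_cascade` is a hypothetical function that takes precomputed objective values instead of training anything.

```python
def run_cascade(objectives, max_iter=5, tol=1e-3, check_convergence=True):
    """Return (iterations, converged) for a sequence of objective values.

    `objectives` holds one hypothetical objective value per iteration.
    """
    iterations = 0
    converged = False
    last = None
    for value in objectives[:max_iter]:
        iterations += 1
        if check_convergence and last is not None and last != 0:
            # Assumed criterion: small relative change between iterations.
            if abs((value - last) / last) < tol:
                converged = True
                break
        last = value
    return iterations, converged


values = [10.0, 9.0, 8.9995, 8.9994]
print(run_cascade(values))                           # (3, True)
print(run_cascade(values, check_convergence=False))  # (4, False)
```

With check_convergence=False the loop never inspects intermediate values, which mirrors the absence of a synchronization point after each iteration.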

References

[1]	Graf, H. P., Cosatto, E., Bottou, L., Durdanovic, I., & Vapnik, V. (2005). Parallel support vector machines: The Cascade SVM. In Advances in Neural Information Processing Systems (pp. 521-528).

Examples

>>> import numpy as np
>>> x = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from dislib.data import load_data
>>> train_data = load_data(x=x, y=y, subset_size=4)
>>> from dislib.classification import CascadeSVM
>>> svm = CascadeSVM()
>>> svm.fit(train_data)
>>> test_data = load_data(x=np.array([[-0.8, -1]]), subset_size=1)
>>> svm.predict(test_data)
>>> print(test_data.labels)

decision_function(dataset)

Computes distances of the samples in dataset to the separating hyperplane. Distances are stored in dataset.labels.

Parameters:	dataset (Dataset) – Samples whose distances to the hyperplane are computed.
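For intuition, with a linear kernel the decision value of a sample x is w · x + b, and dividing by the norm of w gives the signed geometric distance to the hyperplane. The sketch below uses hypothetical weights `w` and bias `b` (not taken from a fitted CascadeSVM) and plain Python lists rather than a Dataset.

```python
import math


def decision_value(w, b, x):
    """Linear decision value w . x + b; its sign selects the class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def signed_distance(w, b, x):
    """Signed geometric distance from x to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must not be the zero vector")
    return decision_value(w, b, x) / norm


w, b = [1.0, 1.0], 0.0                     # hypothetical model parameters
print(decision_value(w, b, [-0.8, -1.0]))  # negative: sample lies on the negative side
```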

fit(dataset)

Fits a model using training data.

Parameters:	dataset (Dataset) – Training data.

predict(dataset)

Performs classification on the samples in dataset. This method stores the predicted labels in dataset.labels.

Parameters:	dataset (Dataset) – Samples to classify.

score(dataset)

Returns the mean accuracy on the given test dataset. This method assumes dataset.labels are true labels.

Parameters:	dataset (Dataset) – Dataset where dataset.labels are true labels for dataset.samples.
Returns:	score – Mean accuracy of self.predict(dataset) with respect to dataset.labels.
Return type:	float
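Mean accuracy is the fraction of samples whose predicted label equals the true label. A minimal sketch of that computation on plain Python lists follows; `mean_accuracy` is a hypothetical helper and does not operate on a Dataset.

```python
def mean_accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label matches the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


print(mean_accuracy([1, 1, 2, 2], [1, 2, 2, 2]))  # 0.75
```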