API Reference

dislib.array: Distributed array

Classes

data.Array - 2-dimensional array divided into blocks that can be operated on in a distributed way.
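
To make the idea of a blocked 2-dimensional array concrete, the following pure-Python sketch splits a small matrix into rectangular blocks. It illustrates the concept only and is not dislib's implementation; `split_into_blocks` and its `block_size` parameter are hypothetical names chosen for this example.

```python
def split_into_blocks(matrix, block_size):
    """Split a list-of-lists matrix into a grid of rectangular blocks.

    block_size is (rows_per_block, cols_per_block); blocks on the
    right and bottom edges may be smaller than block_size.
    """
    block_rows, block_cols = block_size
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    grid = []
    for r in range(0, n_rows, block_rows):
        grid_row = []
        for c in range(0, n_cols, block_cols):
            # Slice the rows first, then the columns of each selected row.
            block = [row[c:c + block_cols] for row in matrix[r:r + block_rows]]
            grid_row.append(block)
        grid.append(grid_row)
    return grid


matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

# A 3x3 matrix with 2x2 blocks yields a 2x2 grid of blocks.
blocks = split_into_blocks(matrix, block_size=(2, 2))
```

In a distributed setting, each block in such a grid could be handed to a separate task, which is the general motivation for block-partitioned arrays.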

Array creation routines

dislib.array - Build a distributed array (ds-array) from an array-like structure, such as a NumPy array, a list, or a SciPy sparse matrix.

dislib.random_array - Build a ds-array with random contents.

dislib.zeros - Build a ds-array filled with zeros.

dislib.full - Build a ds-array filled with a value.

dislib.identity - Build an identity ds-array.

dislib.load_svmlight_file - Build a ds-array from a file in SVMlight format.

dislib.load_txt_file - Build a ds-array from a text file.

dislib.load_npy_file - Build a ds-array from a binary NumPy file.

Other functions

dislib.apply_along_axis - Apply a function to a ds-array along a given axis.

dislib.utils: Utility functions

utils.shuffle - Randomly shuffle the rows of a ds-array.

dislib.math: Mathematical functions

dislib.kron - Compute the Kronecker product of two ds-arrays.

dislib.svd - Singular value decomposition of a ds-array.
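
As a reminder of what a Kronecker product computes, here is a minimal pure-Python sketch on plain nested lists. It is not dislib's distributed routine; the function name `kron_lists` is hypothetical and chosen only for this illustration.

```python
def kron_lists(a, b):
    """Kronecker product of two matrices given as lists of lists.

    Every element a[i][j] is replaced by the block a[i][j] * b, so the
    result has len(a) * len(b) rows and len(a[0]) * len(b[0]) columns.
    """
    result = []
    for row_a in a:
        for row_b in b:
            # One output row: each element of row_a scales all of row_b.
            result.append([x * y for x in row_a for y in row_b])
    return result


a = [[1, 2],
     [3, 4]]
b = [[0, 1],
     [1, 0]]

product = kron_lists(a, b)
```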

dislib.preprocessing: Data pre-processing

preprocessing.StandardScaler - Scale a ds-array to zero mean and unit variance.
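
The scaling described above can be sketched in plain Python as follows. This is an assumption-laden illustration of column-wise standardization, not the StandardScaler implementation: `standardize_columns` is a hypothetical helper, it uses the population variance (dividing by the number of rows), and it leaves constant columns unscaled to avoid dividing by zero.

```python
import math


def standardize_columns(rows):
    """Return a copy of rows with each column shifted to zero mean
    and scaled to unit (population) variance."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_cols)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in rows) / n
        for j in range(n_cols)
    ]
    # Assumption: a constant column (zero variance) is divided by 1.0.
    stds = [math.sqrt(v) if v > 0 else 1.0 for v in variances]
    return [
        [(row[j] - means[j]) / stds[j] for j in range(n_cols)]
        for row in rows
    ]


data = [[1.0, 10.0],
        [2.0, 20.0],
        [3.0, 30.0]]

scaled = standardize_columns(data)
```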

dislib.decomposition: Matrix decomposition

decomposition.PCA - Principal component analysis (PCA).

dislib.cluster: Clustering

cluster.DBSCAN - Perform DBSCAN clustering.

cluster.KMeans - Perform K-Means clustering.

cluster.GaussianMixture - Fit a Gaussian mixture model.

cluster.Daura - Perform Daura clustering.

dislib.classification: Classification

classification.CascadeSVM - Distributed support vector classification using a cascade of classifiers.

classification.RandomForestClassifier - Build a random forest for classification.

dislib.recommendation: Recommendation

recommendation.ALS - Distributed alternating least squares for collaborative filtering.

dislib.regression: Regression

regression.LinearRegression - Multivariate linear regression using ordinary least squares.

regression.Lasso - Linear model trained with an L1 prior as regularizer.

dislib.optimization: Optimization

optimization.ADMM - Alternating Direction Method of Multipliers (ADMM) solver.

dislib.neighbors: Neighbor queries

neighbors.NearestNeighbors - Perform k-nearest neighbors queries.

dislib.model_selection: Model selection

model_selection.GridSearchCV - Exhaustive search over specified parameter values for an estimator.

model_selection.RandomizedSearchCV - Randomized search over estimator parameters sampled from given distributions.

model_selection.KFold - K-fold splitter for cross-validation.
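
For readers unfamiliar with K-fold splitting, the sketch below generates train and test index lists for contiguous folds in plain Python. It only illustrates the general technique and is not the KFold class listed above; `k_fold_indices` and its parameters are hypothetical names, and no shuffling is performed.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs for contiguous folds.

    When n_samples is not divisible by n_splits, the first folds
    receive one extra sample each.
    """
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        # Everything outside the current test fold is used for training.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


folds = list(k_fold_indices(n_samples=5, n_splits=2))
```

Each sample appears in exactly one test fold, so every sample is used for validation once across the splits.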