API Reference

dislib.array: Distributed array

Classes

data.Array - 2-dimensional array divided into blocks that can be operated on in a distributed way.
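To make the idea of a blocked 2-dimensional array concrete, the following is a minimal sketch in plain Python, not dislib code. The helper name `split_into_blocks` and the list-of-lists representation are assumptions made for illustration. In dislib the blocks are distributed objects rather than local lists.

```python
def split_into_blocks(matrix, block_size):
    """Split a list-of-lists matrix into a 2-D grid of blocks.

    Hypothetical helper for illustration only. Edge blocks are smaller
    when the matrix shape is not a multiple of block_size.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    block_rows, block_cols = block_size
    grid = []
    for r in range(0, n_rows, block_rows):
        grid_row = []
        for c in range(0, n_cols, block_cols):
            # Take the rows of this block, then slice out its columns.
            block = [row[c:c + block_cols] for row in matrix[r:r + block_rows]]
            grid_row.append(block)
        grid.append(grid_row)
    return grid


# A 5x4 matrix holding the values 0..19, split into blocks of at most 2x3.
matrix = [[4 * i + j for j in range(4)] for i in range(5)]
grid = split_into_blocks(matrix, (2, 3))
```

Here the grid has 3 rows and 2 columns of blocks, and the blocks on the last row and last column are smaller than the requested block size.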

Array creation routines

dislib.array - Build a distributed array (ds-array) from an array-like structure, such as a NumPy array, a list, or a SciPy sparse matrix.

dislib.random_array - Build a ds-array with random contents.

dislib.zeros - Build a ds-array filled with zeros.

dislib.full - Build a ds-array filled with a value.

dislib.eye - Build a ds-array with ones on the diagonal and zeros elsewhere.

dislib.identity - Build an identity ds-array.

dislib.load_svmlight_file - Build a ds-array from a file in SVMlight format.

dislib.load_txt_file - Build a ds-array from a text file.

dislib.load_npy_file - Build a ds-array from a binary NumPy file.

dislib.load_mdcrd_file - Build a ds-array from an mdcrd trajectory file.

dislib.data.load_hstack_npy_files - Build a ds-array from .npy files, concatenating them side-by-side.

dislib.save_txt - Save a ds-array by blocks to a directory in txt format.

Utility functions

data.util.compute_bottom_right_shape - Computes the shape of the bottom-right block.

data.util.pad - Pad array blocks with the desired value.

data.util.pad_last_blocks_with_zeros - Pad array blocks with zeros.

data.util.remove_last_columns - Removes last columns from the right-most blocks of the ds-array.

data.util.remove_last_rows - Removes last rows from the bottom blocks of the ds-array.
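The arithmetic behind the bottom-right block shape and the padding needed to make every block regular can be sketched as follows. This is an illustration under assumptions, not the dislib implementation: the helper names and the `(shape, block_size)` tuple arguments are hypothetical, whereas the dislib utilities operate on ds-arrays.

```python
def bottom_right_shape(shape, block_size):
    """Shape of the last (bottom-right) block of a blocked array.

    Hypothetical helper: when a dimension divides evenly, the remainder
    is 0 and the last block has the full block size.
    """
    rows = shape[0] % block_size[0] or block_size[0]
    cols = shape[1] % block_size[1] or block_size[1]
    return (rows, cols)


def padding_needed(shape, block_size):
    """Rows and columns to add so the last block reaches the full block size."""
    last_rows, last_cols = bottom_right_shape(shape, block_size)
    return (block_size[0] - last_rows, block_size[1] - last_cols)
```

For example, a 5x4 array with 2x3 blocks has a 1x1 bottom-right block and needs 1 extra row and 2 extra columns of padding. Removing the same number of rows and columns afterwards recovers the original shape.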

Other functions

dislib.apply_along_axis - Applies a function to a ds-array along a given axis.

dislib.classification: Classification

classification.CascadeSVM - Distributed support vector classification using a cascade of classifiers.

classification.KNeighborsClassifier - Distributed K neighbors classification using partial classifiers.

dislib.cluster: Clustering

cluster.DBSCAN - Perform DBSCAN clustering.

cluster.KMeans - Perform K-Means clustering.

cluster.GaussianMixture - Fit a Gaussian mixture model.

cluster.Daura - Perform Daura clustering.

dislib.decomposition: Matrix Decomposition

decomposition.qr - QR decomposition.

decomposition.tsqr - Tall-Skinny QR decomposition.

decomposition.PCA - Principal Component Analysis (PCA).

dislib.math: Mathematical functions

dislib.kron - Computes the Kronecker product of two ds-arrays.

dislib.svd - Singular value decomposition of a ds-array.
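As a reminder of what the Kronecker product computes, here is a small sketch in plain Python on nested lists. It illustrates the mathematical definition only. The local function `kron` below is an assumption for the example and is unrelated to how dislib distributes the computation across blocks.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists.

    Each element a[i][j] is replaced by the block a[i][j] * b, so an
    (m x n) and a (p x q) matrix give an (m*p x n*q) result.
    """
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
product = kron(a, b)
```

The result is a 4x4 matrix whose 2x2 sub-blocks are the entries of `a` multiplied by `b`.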

dislib.model_selection: Model selection

model_selection.GridSearchCV - Exhaustive search over specified parameter values for an estimator.

model_selection.RandomizedSearchCV - Randomized search over estimator parameters sampled from given distributions.

model_selection.KFold - K-fold splitter for cross-validation.
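The idea behind K-fold splitting can be sketched with plain index lists. This is a generic illustration, assuming contiguous folds without shuffling. The function name `kfold_indices` is hypothetical, and dislib's KFold works on ds-arrays and may distribute samples differently.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for K-fold cross-validation.

    Assumption: folds are contiguous, and the first n_samples % n_splits
    folds receive one extra sample.
    """
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1

    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


folds = list(kfold_indices(10, 3))
```

Each sample appears in exactly one test fold, and every train/test pair together covers all samples.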

dislib.neighbors: Neighbor queries

neighbors.NearestNeighbors - Perform k-nearest neighbors queries.

dislib.preprocessing: Data pre-processing

preprocessing.MinMaxScaler - Scale a ds-array to the given range.

preprocessing.StandardScaler - Scale a ds-array to zero mean and unit variance.
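The two scalers apply different transformations: min-max scaling maps each feature linearly onto a target range, while standardization shifts each feature to zero mean and rescales it to unit variance. The sketch below shows both on a single feature column in plain Python. The function names are hypothetical, the use of the population standard deviation is an assumption, and a non-constant column is assumed to avoid division by zero.

```python
import math


def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map values so that min -> low and max -> high."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min  # assumed non-zero
    return [low + (v - v_min) / span * (high - low) for v in values]


def standard_scale(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)  # assumed non-zero
    return [(v - mean) / std for v in values]


column = [1.0, 2.0, 3.0, 4.0, 5.0]
ranged = min_max_scale(column)
standardized = standard_scale(column)
```

After min-max scaling the column spans exactly the requested range. After standardization its mean is 0 and its variance is 1, but its minimum and maximum are not fixed.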

dislib.recommendation: Recommendation

recommendation.ALS - Distributed alternating least squares for collaborative filtering.

dislib.regression: Regression

regression.LinearRegression - Multivariate linear regression using ordinary least squares.

regression.Lasso - Linear model trained with an L1 prior as regularizer.

dislib.trees: Trees

trees.DecisionTreeClassifier - Build a decision tree.

trees.DecisionTreeRegressor - Build a regression tree.

trees.RandomForestClassifier - Build a random forest for classification.

trees.RandomForestRegressor - Build a random forest for regression.

dislib.utils: Utility functions

utils.shuffle - Randomly shuffles the rows of a ds-array.