API Reference#

dislib.array: Distributed array#

Classes#

data.Array - 2-dimensional array divided in blocks that can be operated in a distributed way.

data.Tensor - n-dimensional tensor divided in blocks that can be operated in a distributed way.
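To make "divided in blocks" concrete, the following plain-Python sketch splits a small 2-dimensional list into a grid of blocks. It is an illustration of the idea only and does not use dislib; the helper name `split_into_blocks` and its `block_size` argument are hypothetical.

```python
def split_into_blocks(rows, block_size):
    """Split a 2-D list of lists into a grid of blocks.

    block_size is (rows_per_block, cols_per_block). Blocks on the bottom
    and right edges are smaller when the shape is not evenly divisible.
    """
    block_rows, block_cols = block_size
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    grid = []
    for r in range(0, n_rows, block_rows):
        grid_row = []
        for c in range(0, n_cols, block_cols):
            # Take the row slice first, then the column slice of each row.
            block = [row[c:c + block_cols] for row in rows[r:r + block_rows]]
            grid_row.append(block)
        grid.append(grid_row)
    return grid


# A 3x5 matrix split into 2x2 blocks gives a 2x3 grid of blocks.
matrix = [[10 * i + j for j in range(5)] for i in range(3)]
blocks = split_into_blocks(matrix, (2, 2))
```

In a distributed setting each block in such a grid can be stored and processed independently, which is what allows operations on the whole array to run in parallel.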

Array creation routines#

dislib.array - Build a distributed array (ds-array) from an array-like structure, such as a NumPy array, a list, or a SciPy sparse matrix.

dislib.random_array - Build a ds-array with random contents.

dislib.zeros - Build a ds-array filled with zeros.

dislib.full - Build a ds-array filled with a value.

dislib.eye - Build a ds-array with ones on the diagonal and zeros elsewhere.

dislib.identity - Build an identity ds-array.

dislib.load_svmlight_file - Build a ds-array from a file in SVMlight format.

dislib.load_txt_file - Build a ds-array from a text file.

dislib.load_npy_file - Build a ds-array from a binary NumPy file.

dislib.load_mdcrd_file - Build a ds-array from an mdcrd trajectory file.

dislib.data.load_hstack_npy_files - Build a ds-array from .npy files, concatenating them side-by-side.

dislib.save_txt - Save a ds-array by blocks to a directory in txt format.

Utility functions#

data.util.compute_bottom_right_shape - Computes the shape of the bottom-right block.

data.util.pad - Pad array blocks with the desired value.

data.util.pad_last_blocks_with_zeros - Pad array blocks with zeros.

data.util.remove_last_columns - Removes last columns from the right-most blocks of the ds-array.

data.util.remove_last_rows - Removes last rows from the bottom blocks of the ds-array.
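The arithmetic behind these utilities can be sketched in plain Python. The functions below are hypothetical stand-ins that take a shape and a block size directly; they are not the dislib signatures, which operate on ds-arrays.

```python
def bottom_right_shape(shape, block_size):
    """Return the shape of the bottom-right block of a blocked 2-D array."""
    rows, cols = shape
    block_rows, block_cols = block_size
    # A remainder of 0 means the last block is a full block.
    last_rows = rows % block_rows or block_rows
    last_cols = cols % block_cols or block_cols
    return last_rows, last_cols


def pad_block(block, block_size, value=0):
    """Pad a block (list of lists) with `value` up to the full block size."""
    block_rows, block_cols = block_size
    # Extend each existing row to the right, then append missing rows.
    padded = [list(row) + [value] * (block_cols - len(row)) for row in block]
    while len(padded) < block_rows:
        padded.append([value] * block_cols)
    return padded


edge_shape = bottom_right_shape((3, 5), (2, 2))
full_block = pad_block([[24]], (2, 2))
```

Padding the edge blocks to the regular block size makes every block the same shape; removing the last rows or columns afterwards undoes that padding.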

Tensor creation routines#

dislib.from_array - Build a ds-tensor from a NumPy array.

dislib.from_pt_tensor - Build a ds-tensor from a PyTorch tensor.

dislib.from_ds_array - Build a ds-tensor from a ds-array.

dislib.create_ds_tensor - Build a ds-tensor from a list of tensors.

dislib.random_tensors - Build a ds-tensor with random contents.

dislib.data.tensor.load_dataset - Build a ds-tensor from a dataset of files.

Tensor utility functions#

dislib.data.tensor.cat - Concatenate ds-tensors along a dimension.

dislib.data.tensor.change_shape - Change the shape of a ds-tensor.

dislib.data.tensor.rechunk_tensor - Rechunk a ds-tensor along a dimension.

dislib.data.tensor.shuffle - Randomly shuffle a ds-tensor.

Other functions#

dislib.apply_along_axis - Applies a function to a ds-array along a given axis.

dislib.classification: Classification#

classification.CascadeSVM - Distributed support vector classification using a cascade of classifiers.

classification.KNeighborsClassifier - Distributed K neighbors classification using partial classifiers.

dislib.cluster: Clustering#

cluster.DBSCAN - Perform DBSCAN clustering.

cluster.KMeans - Perform K-Means clustering.

cluster.GaussianMixture - Fit a Gaussian mixture model.

cluster.Daura - Perform Daura clustering.

dislib.decomposition: Matrix Decomposition#

decomposition.qr - QR decomposition.

decomposition.tsqr - Tall-Skinny QR decomposition.

decomposition.PCA - Principal Component Analysis (PCA).

decomposition.lanczos_svd - Lanczos SVD decomposition.

decomposition.random_svd - Random SVD decomposition.

dislib.math: Mathematical functions#

dislib.kron - Computes the Kronecker product of two ds-arrays.

dislib.svd - Singular value decomposition of a ds-array.
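As a reminder of what the Kronecker product computes, here is a small plain-Python sketch on nested lists. It illustrates the mathematical definition only and is not how dislib implements `kron` on ds-arrays; the function name below is a local helper.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists.

    Every entry a[p][q] is replaced by the block a[p][q] * b, so the
    result has len(a) * len(b) rows and len(a[0]) * len(b[0]) columns.
    """
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a_val * b[i][j] for a_val in a_row for j in range(cols_b)]
        for a_row in a
        for i in range(rows_b)
    ]


product = kron([[1, 2], [3, 4]], [[0, 1], [1, 0]])
```

Because the result is itself a grid of scaled copies of the second matrix, the operation maps naturally onto a block-partitioned array.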

dislib.pytorch and dislib.eddl: Distributed neural network training#

pytorch.EncapsulatedFunctionsDistributedPytorch - Distributed training of neural networks using PyTorch backend.

eddl.EncapsulatedFunctionsDistributedEddl - Distributed training of neural networks using EDDL backend.

dislib.model_selection: Model selection#

model_selection.GridSearchCV - Exhaustive search over specified parameter values for an estimator.

model_selection.RandomizedSearchCV - Randomized search over estimator parameters sampled from given distributions.

model_selection.SimulationGridSearch - Exhaustive search over estimator parameters sampled from given distributions.

model_selection.KFold - K-fold splitter for cross-validation.

model_selection.train_test_split - Split arrays or matrices into random train and test subsets.
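The splitting performed by a K-fold splitter can be sketched on plain indices. The generator below is a hypothetical, unshuffled illustration in standard Python; it does not reproduce the dislib `KFold` interface, which works on ds-arrays.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs for K-fold cross-validation.

    Samples are split into n_splits consecutive folds; the first
    n_samples % n_splits folds receive one extra sample.
    """
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        stop = start + size
        # The current fold is the test set; everything else is training data.
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop


folds = list(k_fold_indices(5, 2))
```

Each sample appears in exactly one test fold, so a search such as GridSearchCV can score every parameter combination on data it was not trained on.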

dislib.neighbors: Neighbor queries#

neighbors.NearestNeighbors - Perform k-nearest neighbors queries.

dislib.preprocessing: Data pre-processing#

preprocessing.MinMaxScaler - Scale a ds-array to the given range.

preprocessing.StandardScaler - Scale a ds-array to zero mean and unit variance.
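The two transformations differ as follows: min-max scaling maps values linearly into a target range, while standardization shifts them to zero mean and scales them to unit variance. The sketch below shows both on a single feature using only the Python standard library; the function names are hypothetical and the code is not the dislib implementation.

```python
from statistics import fmean, pstdev


def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map values so that min -> low and max -> high."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant feature carries no spread; map it to the lower bound.
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / span for v in values]


def standardize(values):
    """Shift values to zero mean and scale to unit (population) variance."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


scaled = min_max_scale([1, 2, 3])
z_scores = standardize([1, 2, 3])
```

Min-max scaling suits inputs that must stay within fixed bounds, whereas standardization is the usual choice for estimators that assume centered features.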

dislib.recommendation: Recommendation#

recommendation.ALS - Distributed alternating least squares for collaborative filtering.

dislib.regression: Regression#

regression.LinearRegression - Multivariate linear regression using ordinary least squares.

regression.Lasso - Linear model trained with an L1 prior as regularizer.

dislib.sorting: Sorting#

sorting.TeraSort - Sorts the ds-array using the TeraSort algorithm.

dislib.trees: Trees#

trees.DecisionTreeClassifier - Build a decision tree.

trees.DecisionTreeRegressor - Build a regression tree.

trees.RandomForestClassifier - Build a random forest for classification.

trees.RandomForestRegressor - Build a random forest for regression.

trees.mmap.DecisionTreeClassifier - Build a decision tree using memory mapping.

trees.mmap.DecisionTreeRegressor - Build a regression tree using memory mapping.

trees.mmap.RandomForestClassifier - Build a random forest for classification using memory mapping.

trees.mmap.RandomForestRegressor - Build a random forest for regression using memory mapping.

trees.distributed.DecisionTreeClassifier - Build a decision tree using the distributed approach.

trees.distributed.DecisionTreeRegressor - Build a regression tree using the distributed approach.

trees.distributed.RandomForestClassifier - Build a random forest for classification using the distributed approach.

trees.distributed.RandomForestRegressor - Build a random forest for regression using the distributed approach.

trees.nested.DecisionTreeClassifier - Build a decision tree using the nested approach.

trees.nested.DecisionTreeRegressor - Build a regression tree using the nested approach.

trees.nested.RandomForestClassifier - Build a random forest for classification using the nested approach.

trees.nested.RandomForestRegressor - Build a random forest for regression using the nested approach.

dislib.utils: Utility functions#

utils.shuffle - Randomly shuffles the rows of a ds-array.