API Reference

dislib.data: Data handling utilities

Classes

data.Dataset - Main data structure for handling distributed datasets. A Dataset works as a list of Subset objects.

data.Subset - Collection of samples and (optionally) labels.
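The relationship between the two classes can be illustrated with a small, single-process NumPy sketch. `LocalSubset`, `LocalDataset`, `split_into_subsets` and the `subset_size` parameter are hypothetical names chosen for this illustration and are not dislib's API. In dislib itself the subsets are distributed objects rather than in-memory arrays.

```python
import numpy as np


class LocalSubset:
    """Hypothetical stand-in for a Subset: a block of samples plus optional labels."""

    def __init__(self, samples, labels=None):
        self.samples = np.asarray(samples)
        self.labels = None if labels is None else np.asarray(labels)


class LocalDataset(list):
    """Hypothetical stand-in for a Dataset: a plain list of subsets."""

    @property
    def n_samples(self):
        # Total number of samples across all subsets.
        return sum(s.samples.shape[0] for s in self)


def split_into_subsets(x, subset_size, y=None):
    """Partition the rows of x (and y, if given) into consecutive subsets."""
    dataset = LocalDataset()
    for start in range(0, x.shape[0], subset_size):
        stop = start + subset_size
        labels = None if y is None else y[start:stop]
        dataset.append(LocalSubset(x[start:stop], labels))
    return dataset


x = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10)
ds = split_into_subsets(x, subset_size=4, y=y)
print(len(ds), ds.n_samples)  # 3 10
print([s.samples.shape[0] for s in ds])  # [4, 4, 2]
```

In this sketch only the last subset may be smaller than `subset_size`. That is a common partitioning choice but is an assumption here.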

Functions

data.load_data - Build a Dataset from an ndarray.

data.load_libsvm_file - Build a Dataset from a file in LibSVM format (sparse).

data.load_libsvm_files - Build a Dataset from multiple files in LibSVM format (sparse).

data.load_txt_file - Build a Dataset from a text file.

data.load_txt_files - Build a Dataset from multiple text files.
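In the LibSVM format, each line holds a label followed by `index:value` pairs with 1-based feature indices. Omitted features are zero, which is why the LibSVM loaders produce sparse data. The sketch below parses such lines into CSR-style arrays (`data`, `indices`, `indptr`) using plain NumPy. `parse_libsvm_lines`, `csr_to_dense` and the requirement that the caller pass `n_features` are assumptions for this illustration, not the signature of `data.load_libsvm_file`.

```python
import numpy as np


def parse_libsvm_lines(lines, n_features):
    """Parse LibSVM lines ('<label> <index>:<value> ...') into CSR arrays and labels."""
    labels, data, indices, indptr = [], [], [], [0]
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        labels.append(float(tokens[0]))
        for token in tokens[1:]:
            idx, value = token.split(":")
            col = int(idx) - 1  # LibSVM indices are 1-based; columns are 0-based
            if not 0 <= col < n_features:
                raise ValueError(f"feature index {idx} out of range")
            indices.append(col)
            data.append(float(value))
        indptr.append(len(data))  # row boundary in the data/indices arrays
    return (np.array(data, dtype=float), np.array(indices, dtype=int),
            np.array(indptr, dtype=int), np.array(labels))


def csr_to_dense(data, indices, indptr, n_features):
    """Expand CSR arrays into a dense ndarray (for inspection only)."""
    dense = np.zeros((len(indptr) - 1, n_features))
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            dense[row, indices[k]] = data[k]
    return dense


lines = ["1 1:0.5 3:2.0", "-1 2:1.5", "1 3:4.0"]
data, indices, indptr, y = parse_libsvm_lines(lines, n_features=3)
print(csr_to_dense(data, indices, indptr, 3))
print(y)
```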

dislib.utils: Other utility functions

utils.as_grid - Reorganizes the samples in a Dataset into a hyper-dimensional grid, where each Subset represents a region of this space.

utils.shuffle - Randomly shuffles the samples in a Dataset.

utils.resample - Resamples a Dataset.
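The idea behind `utils.shuffle` can be sketched locally in three steps. First, draw one global permutation. Second, apply it jointly to samples and labels so each pair stays together. Third, refill subsets of the original sizes. The function `shuffle_subsets` and its list of (samples, labels) tuples are assumptions for this illustration. The sketch gathers all rows into one process for clarity; a distributed version would presumably exchange rows between subsets instead.

```python
import numpy as np


def shuffle_subsets(subsets, seed=None):
    """Jointly shuffle samples and labels across subsets, keeping subset sizes.

    `subsets` is a list of (samples, labels) tuples.
    """
    rng = np.random.default_rng(seed)
    x = np.vstack([samples for samples, _ in subsets])
    y = np.concatenate([labels for _, labels in subsets])
    perm = rng.permutation(x.shape[0])  # one permutation for samples and labels
    x, y = x[perm], y[perm]
    shuffled, start = [], 0
    for samples, _ in subsets:
        stop = start + samples.shape[0]  # refill a subset of the original size
        shuffled.append((x[start:stop], y[start:stop]))
        start = stop
    return shuffled


subsets = [
    (np.arange(6).reshape(3, 2), np.array([0, 1, 2])),
    (np.arange(6, 10).reshape(2, 2), np.array([3, 4])),
]
shuffled = shuffle_subsets(subsets, seed=0)
for samples, labels in shuffled:
    print(samples.tolist(), labels.tolist())
```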

dislib.preprocessing: Data pre-processing

Classes

preprocessing.StandardScaler - Scale data to zero mean and unit variance.
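The fit step can be distributed because the mean and variance can be assembled from per-subset partial sums. The sketch below shows that decomposition in NumPy. `partial_stats`, `fit_scaler`, `transform` and the list-of-arrays input are hypothetical, and this is not a description of dislib's internals. The one-pass variance formula used here can lose precision on data with a large mean, so a production version would likely use a more stable update.

```python
import numpy as np


def partial_stats(block):
    """Per-subset statistics: sample count, column sums, column sums of squares."""
    return block.shape[0], block.sum(axis=0), (block ** 2).sum(axis=0)


def fit_scaler(blocks):
    """Combine per-subset statistics into a global mean and standard deviation."""
    stats = [partial_stats(b) for b in blocks]  # independent per subset
    n = sum(s[0] for s in stats)
    mean = sum(s[1] for s in stats) / n
    var = sum(s[2] for s in stats) / n - mean ** 2  # population variance (ddof=0)
    std = np.sqrt(np.maximum(var, 0.0))  # clip tiny negative rounding errors
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by 0
    return mean, std


def transform(blocks, mean, std):
    """Scale every subset with the same global parameters."""
    return [(b - mean) / std for b in blocks]


blocks = [
    np.array([[1.0, 10.0], [2.0, 20.0]]),
    np.array([[3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]),
]
mean, std = fit_scaler(blocks)
scaled = transform(blocks, mean, std)
print(mean, std)
```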

dislib.decomposition: Matrix Decomposition

Classes

decomposition.PCA - Principal component analysis (PCA).
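One common way to compute PCA over partitioned data proceeds in two stages. First, sum the per-block Gram matrices (X_b^T X_b) and column sums, and form the covariance matrix from them. Second, eigendecompose that covariance. This document does not say whether dislib's `PCA` follows exactly this route. `pca_from_blocks` is a hypothetical helper written for illustration.

```python
import numpy as np


def pca_from_blocks(blocks, n_components):
    """PCA via a covariance matrix assembled from per-block partial sums."""
    n = sum(b.shape[0] for b in blocks)
    col_sum = sum(b.sum(axis=0) for b in blocks)
    gram = sum(b.T @ b for b in blocks)  # per-block X^T X, summed
    mean = col_sum / n
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)  # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    return mean, eigvecs[:, order].T, eigvals[order]


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.2])  # unequal spread per axis
blocks = [data[:20], data[20:35], data[35:]]
mean, components, variances = pca_from_blocks(blocks, n_components=2)
print(components.shape)  # (2, 3)
print(variances)
```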

dislib.cluster: Clustering

Classes

cluster.DBSCAN - Perform DBSCAN clustering.

cluster.KMeans - Perform K-Means clustering.

cluster.GaussianMixture - Fit a Gaussian mixture model.
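Among these estimators, K-Means has a particularly simple map/reduce structure. Each subset assigns its samples to the nearest center and reports per-center sums and counts. A reduce step then merges those partial results to recompute the centers. The NumPy sketch below shows one Lloyd iteration in that style. `partial_kmeans` and `kmeans_step` are hypothetical names, and center initialization and convergence checks are left out.

```python
import numpy as np


def partial_kmeans(block, centers):
    """Assign a block's samples to their nearest centers; return per-center sums and counts."""
    dists = ((block[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sums = np.zeros_like(centers)
    counts = np.zeros(centers.shape[0], dtype=int)
    np.add.at(sums, labels, block)  # accumulate each sample into its center's row
    np.add.at(counts, labels, 1)
    return sums, counts


def kmeans_step(blocks, centers):
    """One Lloyd iteration: reduce the partial results, then recompute centers."""
    partials = [partial_kmeans(b, centers) for b in blocks]
    sums = sum(p[0] for p in partials)
    counts = sum(p[1] for p in partials)
    new_centers = centers.copy()  # an empty cluster keeps its previous center
    nonempty = counts > 0
    new_centers[nonempty] = sums[nonempty] / counts[nonempty][:, None]
    return new_centers


blocks = [
    np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]),
    np.array([[10.0, 11.0], [1.0, 0.0]]),
]
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
for _ in range(5):
    centers = kmeans_step(blocks, centers)
print(centers)
```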

dislib.classification: Classification

Classes

classification.CascadeSVM - Distributed support vector classification using a cascade of classifiers.

classification.RandomForestClassifier - Build a random forest for classification.
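The cascade idea behind a distributed SVM follows a fixed reduction pattern. Each partition is first filtered down to its candidate support vectors, then pairs of results are merged and filtered again until a single set remains. The sketch below shows one pass of that pattern, with a pluggable `select_candidates` callable standing in for training a local SVM. `cascade`, `margin_points_1d` and the single-pass structure are assumptions for this illustration; cascade implementations may also iterate by feeding the final result back to the first layer. The toy selector works only for linearly separable 1-D data with negatives below positives. In that case a hard-margin SVM's support vectors are the closest points of each class.

```python
import numpy as np


def _filter(x, y, select_candidates):
    """Keep only the rows chosen by the local solver."""
    idx = select_candidates(x, y)
    return x[idx], y[idx]


def cascade(partitions, select_candidates):
    """Filter every partition, then merge results pairwise until one set remains."""
    layer = [_filter(x, y, select_candidates) for x, y in partitions]
    while len(layer) > 1:
        merged = []
        for i in range(0, len(layer), 2):
            pair = layer[i:i + 2]  # the last group may hold a single result
            x = np.vstack([p[0] for p in pair])
            y = np.concatenate([p[1] for p in pair])
            merged.append(_filter(x, y, select_candidates))
        layer = merged
    return layer[0]


def margin_points_1d(x, y):
    """Hard-margin SVM support vectors for separable 1-D data (negatives below positives)."""
    neg, pos = np.flatnonzero(y < 0), np.flatnonzero(y > 0)
    keep = []
    if neg.size:
        keep.append(neg[np.argmax(x[neg, 0])])  # largest negative sample
    if pos.size:
        keep.append(pos[np.argmin(x[pos, 0])])  # smallest positive sample
    return np.array(keep, dtype=int)


xs = np.array([[-5.0], [-1.0], [2.0], [6.0], [-3.0], [1.5], [4.0], [-0.5]])
ys = np.array([-1, -1, 1, 1, -1, 1, 1, -1])
partitions = [(xs[i:i + 2], ys[i:i + 2]) for i in range(0, 8, 2)]
sv_x, sv_y = cascade(partitions, margin_points_1d)
print(sv_x.ravel(), sv_y)
```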

dislib.recommendation: Recommendation

Classes

recommendation.ALS - Distributed alternating least squares for collaborative filtering.
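Alternating least squares works by holding one factor matrix fixed and solving a small ridge regression for each row of the other, then swapping roles. Every per-user and per-item solve is independent, so the work parallelizes naturally. The sketch below uses a dense ratings array with a boolean mask of observed entries. The `als` and `regularized_loss` functions and their parameters (`n_factors`, `reg`, `n_iters`, `seed`) are hypothetical and do not reflect the signature of dislib's `ALS`.

```python
import numpy as np


def als(ratings, mask, n_factors=2, reg=0.1, n_iters=20, seed=0):
    """Alternating least squares over the observed entries (mask == True)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    users = rng.normal(scale=0.1, size=(n_users, n_factors))
    items = rng.normal(scale=0.1, size=(n_items, n_factors))
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Items fixed: each user's factors solve an independent ridge regression.
        for u in range(n_users):
            v = items[mask[u]]
            users[u] = np.linalg.solve(v.T @ v + ridge, v.T @ ratings[u, mask[u]])
        # Users fixed: solve for each item's factors the same way.
        for i in range(n_items):
            w = users[mask[:, i]]
            items[i] = np.linalg.solve(w.T @ w + ridge, w.T @ ratings[mask[:, i], i])
    return users, items


def regularized_loss(ratings, mask, users, items, reg):
    """Squared error on observed entries plus the L2 penalty that ALS minimizes."""
    resid = (ratings - users @ items.T)[mask]
    return (resid ** 2).sum() + reg * ((users ** 2).sum() + (items ** 2).sum())


rng = np.random.default_rng(1)
ratings = rng.normal(size=(6, 2)) @ rng.normal(size=(5, 2)).T  # rank-2 "ratings"
mask = np.ones(ratings.shape, dtype=bool)
mask[0, 0] = mask[3, 2] = False  # two unobserved entries
users, items = als(ratings, mask, n_factors=2, reg=1e-3, n_iters=30)
print(regularized_loss(ratings, mask, users, items, reg=1e-3))
```

Each inner solve exactly minimizes the regularized loss over one block of variables. As a result, the loss never increases from one iteration to the next.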

dislib.regression: Regression

Classes

regression.LinearRegression - Simple linear regression using ordinary least squares.
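Simple linear regression with one explanatory variable has a closed-form OLS solution. It depends only on five sums: n, Sx = Σx, Sy = Σy, Sxy = Σxy and Sxx = Σx². The slope is b = (n·Sxy - Sx·Sy) / (n·Sxx - Sx²) and the intercept is a = (Sy - b·Sx) / n. Each subset can therefore compute its sums independently before a single combine step. `partial_sums` and `fit_simple_ols` are hypothetical helpers, not dislib's implementation.

```python
import numpy as np


def partial_sums(x, y):
    """Sufficient statistics for simple linear regression on one subset."""
    return np.array([x.size, x.sum(), y.sum(), (x * y).sum(), (x * x).sum()])


def fit_simple_ols(blocks):
    """Combine per-subset sums into the OLS slope and intercept of y = a + b*x."""
    n, sx, sy, sxy, sxx = sum(partial_sums(x, y) for x, y in blocks)
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = (sy - slope * sx) / n
    return slope, intercept


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=40)
y = 1.5 - 0.7 * x + rng.normal(scale=0.3, size=40)
blocks = [(x[:15], y[:15]), (x[15:], y[15:])]
slope, intercept = fit_simple_ols(blocks)
print(slope, intercept)
```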

dislib.neighbors: Neighbor queries

Classes

neighbors.NearestNeighbors - Perform k-nearest neighbors queries.
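A k-nearest neighbors query over partitioned data can be answered in two phases. Each subset first returns its own k best candidates, and a merge step then keeps the overall k closest. The helpers `local_knn` and `knn` below are a hypothetical single-process sketch using Euclidean distance, not dislib's implementation.

```python
import numpy as np


def local_knn(block, offset, queries, k):
    """k nearest rows of one subset per query, as (distances, global row indices)."""
    d = np.sqrt(((queries[:, None, :] - block[None, :, :]) ** 2).sum(axis=2))
    k_local = min(k, block.shape[0])  # a small subset may hold fewer than k rows
    idx = np.argsort(d, axis=1)[:, :k_local]
    return np.take_along_axis(d, idx, axis=1), idx + offset


def knn(blocks, queries, k):
    """Merge per-subset candidates and keep the global k nearest for each query."""
    dists, inds, offset = [], [], 0
    for block in blocks:
        d, i = local_knn(block, offset, queries, k)
        dists.append(d)
        inds.append(i)
        offset += block.shape[0]  # convert local row numbers to global ones
    d, i = np.hstack(dists), np.hstack(inds)
    order = np.argsort(d, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(d, order, axis=1), np.take_along_axis(i, order, axis=1)


rng = np.random.default_rng(0)
data = rng.normal(size=(30, 3))
queries = rng.normal(size=(4, 3))
blocks = [data[:10], data[10:22], data[22:]]
dist, ind = knn(blocks, queries, k=3)
print(ind)
```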

Other functions

fft - Distributed fast Fourier transform computation.
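This reference does not say how the distributed FFT splits its work. One classic decomposition is the radix-2 Cooley-Tukey step. The even- and odd-indexed halves E and O are transformed independently and then combined with twiddle factors: X[k] = E[k] + W^k·O[k] and X[k + N/2] = E[k] - W^k·O[k], where W = exp(-2πi/N). The helper `fft_split` is a hypothetical illustration of that step. It uses `numpy.fft.fft` for the two half-size transforms.

```python
import numpy as np


def fft_split(x):
    """One radix-2 decimation-in-time step: two independent half-size FFTs, then combine."""
    n = x.shape[0]
    if n % 2:
        raise ValueError("length must be even")
    even = np.fft.fft(x[0::2])  # the two sub-FFTs do not depend on each other,
    odd = np.fft.fft(x[1::2])   # so they could run on different workers
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])


signal = np.random.default_rng(0).normal(size=16)
print(np.allclose(fft_split(signal), np.fft.fft(signal)))
```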