API Reference
dislib.data: Data handling utilities
Classes
data.Dataset
- Main data structure for handling distributed datasets. A Dataset works as a list of Subset objects.
data.Subset
- Collection of samples and (optionally) labels.
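To illustrate how the two classes relate, here is a minimal in-memory stand-in written with NumPy. The class names `SimpleSubset` and `SimpleDataset`, their attributes, and the feature-count check are hypothetical; this is not dislib's implementation, which distributes its subsets, but it shows the idea of a Dataset behaving like a list of Subsets.

```python
import numpy as np


class SimpleSubset:
    """Hypothetical stand-in for a Subset: a block of samples plus optional labels."""

    def __init__(self, samples, labels=None):
        self.samples = np.asarray(samples)
        self.labels = None if labels is None else np.asarray(labels)

    def __len__(self):
        return self.samples.shape[0]


class SimpleDataset:
    """Hypothetical stand-in for a Dataset: an ordered list of subsets."""

    def __init__(self, n_features):
        self.n_features = n_features
        self._subsets = []

    def append(self, subset):
        # Assumption for this sketch: every subset has the same number of features.
        if subset.samples.shape[1] != self.n_features:
            raise ValueError("subset has the wrong number of features")
        self._subsets.append(subset)

    def __getitem__(self, index):
        return self._subsets[index]

    def __len__(self):
        # Number of subsets, mirroring list semantics.
        return len(self._subsets)

    def n_samples(self):
        # Total number of samples across all subsets.
        return sum(len(s) for s in self._subsets)


ds = SimpleDataset(n_features=2)
ds.append(SimpleSubset([[0.0, 1.0], [2.0, 3.0]], labels=[0, 1]))
ds.append(SimpleSubset([[4.0, 5.0]]))  # a subset without labels
print(len(ds), ds.n_samples())  # prints: 2 3
```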
Functions
data.load_data
- Build a Dataset from an ndarray.
data.load_libsvm_file
- Build a Dataset from a file in LibSVM format (sparse).
data.load_libsvm_files
- Build a Dataset from multiple files in LibSVM format (sparse).
data.load_txt_file
- Build a Dataset from a text file.
data.load_txt_files
- Build a Dataset from multiple text files.
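Each loader above produces a Dataset, that is, a list of Subsets. The sketch below shows two pieces of that work in plain NumPy: cutting an ndarray into consecutive blocks of at most `subset_size` rows, and parsing one line of LibSVM text (`<label> <index>:<value> ...`, with 1-based feature indices). The function names and the `subset_size` and `n_features` parameters are illustrative assumptions rather than dislib's signatures, and the parser builds dense rows for simplicity even though LibSVM data is sparse.

```python
import numpy as np


def split_into_subsets(x, subset_size, y=None):
    """Split the rows of x (and y) into consecutive chunks of at most subset_size rows.

    Returns a list of (samples, labels) tuples; labels is None when y is None.
    """
    subsets = []
    for start in range(0, x.shape[0], subset_size):
        stop = start + subset_size
        labels = None if y is None else y[start:stop]
        subsets.append((x[start:stop], labels))
    return subsets


def parse_libsvm_line(line, n_features):
    """Parse one LibSVM line into (dense feature row, label).

    Feature indices in LibSVM files are 1-based; omitted features are zero.
    """
    tokens = line.split()
    label = float(tokens[0])
    row = np.zeros(n_features)
    for token in tokens[1:]:
        index, value = token.split(":")
        row[int(index) - 1] = float(value)
    return row, label


x = np.arange(10.0).reshape(5, 2)
print([len(samples) for samples, _ in split_into_subsets(x, subset_size=2)])  # [2, 2, 1]
print(parse_libsvm_line("1 1:0.5 3:2.0", n_features=3))
```

The last subset is allowed to be smaller than `subset_size`; that is a choice made for this sketch.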
dislib.utils: Other utility functions
utils.as_grid
- Reorganizes the samples in a Dataset into a hyper-dimensional grid, where each Subset represents a region of this space.
utils.shuffle
- Randomly shuffles the samples in a Dataset.
utils.resample
- Resamples a Dataset.
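As a rough picture of what `as_grid` and `shuffle` do, the sketch below works on in-memory NumPy arrays. It assumes that the grid splits each dimension of the data's bounding box into `n_regions` equal-width intervals and that shuffling keeps every subset's size unchanged; both are assumptions made for illustration, and `grid_cells` and `shuffle_subsets` are hypothetical helpers, not dislib functions.

```python
import numpy as np


def grid_cells(x, n_regions):
    """Assign each row of x to a cell of an n_regions**d grid over x's bounding box.

    Returns a dict mapping a cell index tuple to the list of row indices in that cell.
    """
    mins = x.min(axis=0)
    maxs = x.max(axis=0)
    widths = (maxs - mins) / n_regions
    widths[widths == 0] = 1.0  # avoid dividing by zero on constant dimensions
    # Clip so that points on the upper boundary fall into the last region.
    bins = np.minimum(((x - mins) / widths).astype(int), n_regions - 1)
    cells = {}
    for i, cell in enumerate(map(tuple, bins)):
        cells.setdefault(cell, []).append(i)
    return cells


def shuffle_subsets(subsets, seed=None):
    """Shuffle samples across a list of sample blocks, keeping each block's size."""
    rng = np.random.default_rng(seed)
    sizes = [len(s) for s in subsets]
    pooled = np.concatenate(subsets)
    pooled = pooled[rng.permutation(len(pooled))]
    # Cut the permuted samples back into blocks of the original sizes.
    return np.split(pooled, np.cumsum(sizes)[:-1])
```

Pooling every sample in one array is only reasonable for small data; a distributed shuffle would move samples between subsets without gathering them in one place.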
dislib.preprocessing: Data pre-processing
Classes
preprocessing.StandardScaler
- Scale data to zero mean and unit variance.
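Scaling to zero mean and unit variance means computing `z = (x - mean) / std` for each feature. When samples are split across Subsets, one way to obtain `mean` and `std` is to compute a count, a sum, and a sum of squares per block and then merge them. The NumPy sketch below does exactly that; it is an illustration rather than dislib's implementation, the helper names are hypothetical, and it assumes the population (not sample) standard deviation.

```python
import numpy as np


def partial_stats(block):
    """Per-block statistics that each subset can compute independently."""
    return block.shape[0], block.sum(axis=0), (block ** 2).sum(axis=0)


def combine_stats(stats):
    """Merge per-block statistics into a global per-feature mean and standard deviation."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    var = total_sq / n - mean ** 2  # population variance
    return mean, np.sqrt(var)


blocks = [np.array([[1.0, 10.0], [2.0, 20.0]]), np.array([[3.0, 30.0]])]
mean, std = combine_stats([partial_stats(b) for b in blocks])
# Each block is scaled with the same global statistics.
scaled = [(b - mean) / std for b in blocks]
```

The sum-of-squares formula is simple but can lose precision on large or poorly centered data, and a feature with zero variance would need special handling to avoid division by zero.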
dislib.decomposition: Matrix Decomposition
Classes
decomposition.PCA
- Principal component analysis (PCA).
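One standard way to compute PCA is to center the data, form the covariance matrix, and keep the eigenvectors with the largest eigenvalues. The NumPy sketch below shows that computation on a single in-memory array; `pca` is a hypothetical helper, and how dislib organizes the computation across Subsets is not reproduced here.

```python
import numpy as np


def pca(x, n_components):
    """Project x onto its top principal components via the covariance eigendecomposition.

    Returns (projected data, explained variances, components as rows).
    """
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / (x.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    components = eigvecs[:, order].T
    return centered @ components.T, eigvals[order], components
```

Because `eigh` assumes a symmetric matrix, it suits a covariance matrix; the sign of each component is arbitrary, so two correct implementations may return components that differ by a factor of -1.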
dislib.cluster: Clustering
Classes
cluster.DBSCAN
- Perform DBSCAN clustering.
cluster.KMeans
- Perform K-Means clustering.
cluster.GaussianMixture
- Fit a Gaussian mixture model.
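K-Means clustering is commonly computed with Lloyd's algorithm: assign every sample to its nearest center, move each center to the mean of its samples, and repeat until the centers stop moving. The sketch below runs that loop on a single NumPy array. The function names, the random choice of initial centers, and the rule of keeping a center in place when its cluster empties are assumptions of this sketch, not dislib's documented behavior.

```python
import numpy as np


def _assign(x, centers):
    """Index of the nearest center (squared Euclidean distance) for every sample."""
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(x, n_clusters, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assigning samples and recomputing centers."""
    rng = np.random.default_rng(seed)
    # Initialize with n_clusters distinct samples chosen at random.
    centers = x[rng.choice(len(x), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        labels = _assign(x, centers)
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = x[labels == k]
            if len(members) > 0:  # keep the old center if a cluster becomes empty
                new_centers[k] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final labels are consistent with the returned centers.
    return centers, _assign(x, centers)
```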
dislib.classification: Classification
Classes
classification.CascadeSVM
- Distributed support vector classification using a cascade of classifiers.
classification.RandomForestClassifier
- Build a random forest for classification.
dislib.recommendation: Recommendation
Classes
recommendation.ALS
- Distributed alternating least squares for collaborative filtering.
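Alternating least squares factors a ratings matrix as `R ≈ U @ V.T` by alternating between two regularized least-squares problems: with the item factors `V` fixed, each user's row of `U` is a small ridge regression over the items that user rated, and vice versa. The single-machine NumPy sketch below assumes a squared-error objective over observed entries with L2 regularization; the function `als` and its parameters (`n_factors`, `reg`, `n_iter`) are hypothetical and not dislib's interface.

```python
import numpy as np


def als(ratings, mask, n_factors=2, reg=0.1, n_iter=20, seed=0):
    """Factor ratings ~= users @ items.T using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    users = rng.normal(scale=0.1, size=(n_users, n_factors))
    items = rng.normal(scale=0.1, size=(n_items, n_factors))
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iter):
        # Fix item factors; solve (V_u^T V_u + reg*I) u = V_u^T r_u for each user.
        for u in range(n_users):
            observed = mask[u]
            v = items[observed]
            users[u] = np.linalg.solve(v.T @ v + ridge, v.T @ ratings[u, observed])
        # Fix user factors; solve the analogous system for each item.
        for i in range(n_items):
            observed = mask[:, i]
            w = users[observed]
            items[i] = np.linalg.solve(w.T @ w + ridge, w.T @ ratings[observed, i])
    return users, items
```

With `reg > 0` the linear systems stay solvable even for a user or item with no observed ratings; a missing rating can then be predicted as `users[u] @ items[i]`.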
dislib.regression: Regression
Classes
regression.LinearRegression
- Simple linear regression using ordinary least squares.
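For a single feature, ordinary least squares has a closed form: the slope is `sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)` and the intercept is `y_mean - slope * x_mean`. The NumPy sketch below computes exactly that; `fit_simple_ols` is a hypothetical helper and not dislib's `LinearRegression` interface.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals of y ~ slope * x + intercept."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    intercept = y_mean - slope * x_mean
    return slope, intercept


x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
slope, intercept = fit_simple_ols(x, y)
```

Because these formulas can be rewritten in terms of the sums n, Σx, Σy, Σxy and Σx², they could in principle be accumulated per Subset and combined; this is an observation about the formula, not a description of dislib's code.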
dislib.neighbors: Neighbor queries
Classes
neighbors.NearestNeighbors
- Perform k-nearest neighbors queries.
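A k-nearest neighbors query returns, for each query point, the k closest samples in the fitted data. The brute-force NumPy sketch below computes every query-to-sample distance and keeps the k smallest. Euclidean distance and the `(distances, indices)` return order (the convention scikit-learn uses) are assumptions of this sketch, and `kneighbors` here is a hypothetical helper, not dislib's method.

```python
import numpy as np


def kneighbors(fit_x, query_x, n_neighbors):
    """Brute-force k-nearest-neighbor search with Euclidean distance.

    Returns (distances, indices), each of shape (n_queries, n_neighbors), nearest first.
    """
    diffs = query_x[:, None, :] - fit_x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    indices = np.argsort(dists, axis=1)[:, :n_neighbors]
    return np.take_along_axis(dists, indices, axis=1), indices
```

Brute force costs O(n_queries × n_samples) distance evaluations and memory for the full distance matrix, which is why larger-scale implementations typically split the work into blocks.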