API Reference
dislib.array: Distributed array
Classes
data.Array
- 2-dimensional array divided into blocks that can be operated on in a distributed way.
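To illustrate what "divided into blocks" means, the following is a minimal conceptual sketch in plain Python. It does not use dislib; the helper name `split_into_blocks` and the `(rows, columns)` form of `block_size` are assumptions made for this illustration only.

```python
def split_into_blocks(matrix, block_size):
    """Split a list-of-lists matrix into a 2-D grid of rectangular blocks.

    Hypothetical helper for illustration; not part of dislib.
    """
    block_rows, block_cols = block_size
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    grid = []
    for r in range(0, n_rows, block_rows):
        grid_row = []
        for c in range(0, n_cols, block_cols):
            # Each block holds at most block_rows x block_cols elements;
            # blocks on the bottom and right edges may be smaller.
            block = [row[c:c + block_cols] for row in matrix[r:r + block_rows]]
            grid_row.append(block)
        grid.append(grid_row)
    return grid


matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]
blocks = split_into_blocks(matrix, block_size=(2, 2))
```

Each block is an independent piece of data, which is what makes it possible to process different blocks in separate tasks.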
Array creation routines
dislib.array
- Build a distributed array (ds-array) from an array-like structure, such as a NumPy array, a list, or a SciPy sparse matrix.
dislib.load_svmlight_file
- Build a ds-array from a file in SVMlight format.
dislib.load_txt_file
- Build a ds-array from a text file.
dislib.random_array
- Build a random ds-array.
Other functions
dislib.apply_along_axis
- Applies a function to a ds-array along a given axis.
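As a rough illustration of applying a function along an axis, here is a plain-Python sketch on a list of rows. It does not call dislib; the name `apply_along_axis_sketch` is hypothetical, and the axis convention (0 for columns, 1 for rows) is assumed to follow NumPy.

```python
def apply_along_axis_sketch(func, axis, rows):
    """Apply func to every 1-D slice of a 2-D list along the given axis.

    Hypothetical helper for illustration; not part of dislib.
    axis=0 passes each column to func, axis=1 passes each row.
    """
    if axis == 0:
        return [func(list(col)) for col in zip(*rows)]
    if axis == 1:
        return [func(list(row)) for row in rows]
    raise ValueError("axis must be 0 or 1")


data = [[1, 2, 3],
        [4, 5, 6]]
column_sums = apply_along_axis_sketch(sum, 0, data)
row_sums = apply_along_axis_sketch(sum, 1, data)
```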
dislib.utils: Utility functions
utils.shuffle
- Randomly shuffles the rows of a ds-array.
dislib.preprocessing: Data pre-processing
Classes
preprocessing.StandardScaler
- Scale a ds-array to zero mean and unit variance.
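Scaling to zero mean and unit variance can be sketched as follows in plain Python. This is not dislib's implementation; the function name `standardize_columns` is hypothetical, and two choices are assumptions of this sketch: scaling is done per column, and the population variance (dividing by n) is used.

```python
import math


def standardize_columns(rows):
    """Return a copy of rows where each column has mean 0 and variance 1.

    Hypothetical helper for illustration; not part of dislib.
    """
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_cols)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / n)
        for j in range(n_cols)
    ]
    # Leave constant columns centered but unscaled to avoid dividing by zero.
    stds = [s if s > 0 else 1.0 for s in stds]
    return [
        [(row[j] - means[j]) / stds[j] for j in range(n_cols)]
        for row in rows
    ]


scaled = standardize_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```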
dislib.decomposition: Matrix Decomposition
Classes
decomposition.PCA
- Principal component analysis (PCA).
dislib.cluster: Clustering
Classes
cluster.DBSCAN
- Perform DBSCAN clustering.
cluster.KMeans
- Perform K-Means clustering.
cluster.GaussianMixture
- Fit a Gaussian mixture model.
dislib.classification: Classification
Classes
classification.CascadeSVM
- Distributed support vector classification using a cascade of classifiers.
classification.RandomForestClassifier
- Build a random forest for classification.
dislib.recommendation: Recommendation
Classes
recommendation.ALS
- Distributed alternating least squares for collaborative filtering.
dislib.regression: Regression
Classes
regression.LinearRegression
- Simple linear regression using ordinary least squares.
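For reference, the closed-form ordinary least squares fit of a line to one feature can be sketched in plain Python as below. This is a textbook formulation, not dislib's distributed implementation, and the name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration; not part of dislib.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```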
dislib.neighbors: Neighbor queries
Classes
neighbors.NearestNeighbors
- Perform k-nearest neighbors queries.