dislib.decomposition.qr.base.qr(a: dislib.data.array.Array, mode='full', overwrite_a=False)[source]

QR Decomposition (blocked).

Parameters:

  • a (ds-array) – Input ds-array.
  • mode (string) – Mode of the algorithm:
    ‘full’ – computes a full Q matrix of size m x m and R of size m x n.
    ‘economic’ – computes Q of size m x n and R of size n x n.
    ‘r’ – computes only R, of size m x n.
  • overwrite_a (bool) – If True, the input ds-array is overwritten with R.

Returns:

  • q (ds-array) – Only for modes ‘full’ and ‘economic’.
  • r (ds-array) – For all modes.


Raises:

ValueError – If m < n for the provided m x n matrix, if the blocks are not square, if the shape of the top-left block differs from the regular block shape, or if the shape of the bottom-right block differs from the regular block shape.
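The three modes correspond to the standard full, economic, and R-only QR factorizations. As an illustration of the output shapes, the sketch below uses NumPy's equivalent modes on a plain array (an analogy for the math only, not dislib's blocked implementation; note that dislib's ‘r’ mode returns the full m x n R, whereas NumPy's mode='r' returns a reduced R):

```python
import numpy as np

# Analogy: dislib's 'full' matches numpy mode 'complete',
# and 'economic' matches numpy mode 'reduced'.
rng = np.random.default_rng(0)
a = rng.random((6, 4))  # an m x n matrix with m >= n, as qr() requires

q_full, r_full = np.linalg.qr(a, mode='complete')  # Q: 6 x 6, R: 6 x 4
q_eco, r_eco = np.linalg.qr(a, mode='reduced')     # Q: 6 x 4, R: 4 x 4
```

In both modes, Q @ R reconstructs the original matrix; the economic mode simply drops the columns of Q (and rows of R) that multiply to zero.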

class dislib.decomposition.pca.base.PCA(n_components=None, arity=50, method='eig', eps=1e-09)[source]

Bases: sklearn.base.BaseEstimator

Principal component analysis (PCA).

Parameters:

  • n_components (int or None, optional (default=None)) – Number of components to keep. If None, all components are kept.
  • arity (int, optional (default=50)) – Arity of the reductions. Only used if method=’eig’.
  • method (str, optional (default=’eig’)) – Method to use in the decomposition. Can be ‘svd’ for singular value decomposition or ‘eig’ for eigendecomposition of the covariance matrix. ‘svd’ is recommended when the number of features is large. Falls back to ‘eig’ if the method is not recognized.
  • eps (float, optional (default=1e-9)) – Tolerance for the convergence criterion when method=’svd’.
Attributes:

  • components_ (ds-array, shape (n_components, n_features)) –

    Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by explained_variance_.

    Equal to the n_components eigenvectors of the covariance matrix with greater eigenvalues.

  • explained_variance_ (ds-array, shape (1, n_components)) –

    The amount of variance explained by each of the selected components.

    Equal to the first n_components largest eigenvalues of the covariance matrix.

  • mean_ (ds-array, shape (1, n_features)) – Per-feature empirical mean, estimated from the training set.


>>> import dislib as ds
>>> from dislib.decomposition import PCA
>>> import numpy as np
>>> if __name__ == '__main__':
...     x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
...     bn, bm = 2, 2
...     data = ds.array(x=x, block_size=(bn, bm))
...     pca = PCA()
...     transformed_data = pca.fit_transform(data)
...     print(transformed_data)
...     print(pca.components_.collect())
...     print(pca.explained_variance_.collect())
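For intuition about what the estimator computes, the sketch below reproduces the eigendecomposition path (method=’eig’) in plain NumPy on the same data as the example above. This is an assumption about the underlying math based on standard PCA, not dislib's blocked implementation:

```python
import numpy as np

x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)

mean = x.mean(axis=0)                  # corresponds to pca.mean_
centered = x - mean
cov = np.cov(centered, rowvar=False)   # n_features x n_features covariance

eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]      # sort by decreasing variance
explained_variance = eigvals[order]    # corresponds to pca.explained_variance_
components = eigvecs[:, order].T       # corresponds to pca.components_

transformed = centered @ components.T  # what fit_transform returns
```

The components are the eigenvectors of the covariance matrix, sorted by decreasing eigenvalue, and the transformed data is the mean-centered input projected onto those axes (component signs may differ between implementations).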
fit(x, y=None)[source]

Fit the model with the dataset.

Parameters:

  • x (ds-array, shape (n_samples, n_features)) – Training data.
  • y (ignored) – Not used, present here for API consistency by convention.

Return type:
PCA (self)

fit_transform(x)[source]

Fit the model with the dataset and apply the dimensionality reduction to it.

Parameters:
x (ds-array, shape (n_samples, n_features)) – Training data.

Return type:
ds-array, shape (n_samples, n_components)

transform(x)[source]

Apply dimensionality reduction to a ds-array.

The given dataset is projected onto the first principal components previously extracted from the training ds-array.

Parameters:
x (ds-array, shape (n_samples, n_features)) – New ds-array, with the same n_features as the training dataset.

Return type:
ds-array, shape (n_samples, n_components)
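The projection itself is simple linear algebra. The sketch below shows the standard PCA transform in NumPy, using the mean and components from the earlier example (a hedged illustration of the math, not dislib's blocked implementation):

```python
import numpy as np

# Fitted quantities from the earlier example (pca.mean_, pca.components_).
mean = np.array([2.5, 2.0])
components = np.array([[0.0, 1.0],   # first principal axis
                       [1.0, 0.0]])  # second principal axis

# New samples are centered with the training mean and projected
# onto the stored components.
x_new = np.array([[2.0, 3.0],
                  [5.0, 1.0]])

projected = (x_new - mean) @ components.T  # (n_samples, n_components)
```

Because the projection uses the training mean, transform() on unseen data is consistent with the coordinates produced by fit_transform() on the training set.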