dislib.decomposition.PCA

class dislib.decomposition.pca.base.PCA(n_components=None, arity=50, method='eig', eps=1e-09)

Bases: sklearn.base.BaseEstimator

Principal component analysis (PCA).

Parameters:
  • n_components (int or None, optional (default=None)) – Number of components to keep. If None, all components are kept.
  • arity (int, optional (default=50)) – Arity of the reductions. Only used when method='eig'.
  • method (str, optional (default='eig')) – Method to use in the decomposition. Can be 'svd' for singular value decomposition or 'eig' for eigendecomposition of the covariance matrix. 'svd' is recommended when the data has a large number of features. Falls back to 'eig' if the method is not recognized.
  • eps (float, optional (default=1e-9)) – Tolerance for the convergence criterion. Only used when method='svd'.
Variables:
  • components_ (ds-array, shape (n_components, n_features)) –

    Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by decreasing explained_variance_.

    Equal to the n_components eigenvectors of the covariance matrix with the largest eigenvalues.

  • explained_variance_ (ds-array, shape (1, n_components)) –

    The amount of variance explained by each of the selected components.

    Equal to the n_components largest eigenvalues of the covariance matrix.

  • mean_ (ds-array, shape (1, n_features)) – Per-feature empirical mean, estimated from the training set.
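
The relationship between these attributes and the covariance matrix can be sketched in plain NumPy. This is an illustration of the underlying math only, not dislib's distributed implementation; the variable names and the n_samples - 1 normalization of the covariance are assumptions.

```python
import numpy as np

# Same toy data as in the Examples section.
x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)

# Per-feature empirical mean, shape (1, n_features).
mean = x.mean(axis=0, keepdims=True)

# Covariance matrix of the centered data
# (assumption: normalized by n_samples - 1).
centered = x - mean
cov = centered.T @ centered / (x.shape[0] - 1)

# Eigendecomposition of the symmetric covariance matrix.
# eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder so that the largest eigenvalue comes first.
order = np.argsort(eigvals)[::-1]
explained_variance = eigvals[order].reshape(1, -1)  # shape (1, n_components)
components = eigvecs[:, order].T  # shape (n_components, n_features)

print(mean)
print(explained_variance)
print(components)
```

The sign of each row of components is arbitrary, so two correct implementations may return axes that differ by a factor of -1.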

Examples

>>> from dislib.decomposition import PCA
>>> import numpy as np
>>> import dislib as ds
>>> x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
>>> bn, bm = 2, 2
>>> data = ds.array(x=x, block_size=(bn, bm))
>>> pca = PCA()
>>> transformed_data = pca.fit_transform(data)
>>> print(transformed_data.collect())
>>> print(pca.components_.collect())
>>> print(pca.explained_variance_.collect())
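
The two values of method describe two routes to the same quantities. As a rough NumPy sketch (np.linalg.eigvalsh and np.linalg.svd stand in here for whatever routines dislib uses internally, and the n_samples - 1 normalization is an assumption), the eigenvalues of the covariance matrix match the squared singular values of the centered data divided by n_samples - 1:

```python
import numpy as np

# Same toy data as in the example above.
x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)
n_samples = x.shape[0]
centered = x - x.mean(axis=0)

# 'eig'-style route: eigenvalues of the covariance matrix, largest first.
cov = centered.T @ centered / (n_samples - 1)
eig_variance = np.sort(np.linalg.eigvalsh(cov))[::-1]

# 'svd'-style route: singular values of the centered data,
# which np.linalg.svd already returns in descending order.
_, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
svd_variance = singular_values ** 2 / (n_samples - 1)

# Both routes give approximately [3.2, 2.7] for this data.
print(eig_variance)
print(svd_variance)
```

The SVD route never forms the (n_features, n_features) covariance matrix, which is one common reason to prefer it when the number of features is large.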
fit(x, y=None)

Fit the model with the dataset.

Parameters:
  • x (ds-array, shape (n_samples, n_features)) – Training data.
  • y (ignored) – Not used, present here for API consistency by convention.
Returns:

self

Return type:

PCA

fit_transform(x)

Fit the model with the dataset and apply the dimensionality reduction to it.

Parameters:
  • x (ds-array, shape (n_samples, n_features)) – Training data.
Returns:

transformed_darray

Return type:

ds-array, shape (n_samples, n_components)
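
How n_components determines the output shape can be illustrated with a small NumPy stand-in. The helper pca_fit_transform below is hypothetical and not part of dislib; it assumes the eigendecomposition route with an n_samples - 1 covariance normalization.

```python
import numpy as np

def pca_fit_transform(x, n_components=None):
    """Hypothetical NumPy helper: fit a PCA on x and project x onto it."""
    centered = x - x.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (x.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the indices of the n_components largest eigenvalues
    # (slicing with None keeps all of them).
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T
    return centered @ components.T

x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)

full = pca_fit_transform(x)                     # all components kept
reduced = pca_fit_transform(x, n_components=1)  # only the first component

print(full.shape)     # (6, 2)
print(reduced.shape)  # (6, 1)
```

The sample variance of the single retained column equals the largest eigenvalue of the covariance matrix, which is the variance that component explains.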
transform(x)

Apply dimensionality reduction to a ds-array.

The given dataset is projected on the first principal components previously extracted from a training ds-array.

Parameters:
  • x (ds-array, shape (n_samples, n_features)) – New ds-array, with the same n_features as the training dataset.
Returns:

transformed_darray

Return type:

ds-array, shape (n_samples, n_components)
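
The projection applied to new data can be sketched in NumPy as follows. This assumes the usual PCA convention of subtracting the training mean before multiplying by the transposed components; the variable names are illustrative and the code is not dislib's implementation.

```python
import numpy as np

# Training data: used only to obtain the mean and the principal axes.
train = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)
mean = train.mean(axis=0, keepdims=True)
centered = train - mean
cov = centered.T @ centered / (train.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
components = eigvecs[:, np.argsort(eigvals)[::-1]].T

# New samples with the same number of features as the training data.
new = np.array([[2.5, 2.0], [2.5, 5.0]])

# Center with the training mean, then project onto the principal axes.
projected = (new - mean) @ components.T

print(projected)
```

The first new sample coincides with the training mean, so it maps to the origin. The second differs from the mean only along the second feature, which is the direction of largest variance in this data, so all of its offset lands in the first projected coordinate (up to sign).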