dislib.decomposition.PCA

class dislib.decomposition.pca.base.PCA(n_components=None, arity=50)

Bases: object

Principal component analysis (PCA) using the covariance method.

Performs a full eigendecomposition of the covariance matrix.

Parameters:

n_components (int or None, optional (default=None)) – Number of components to keep. If None, all components are kept.
arity (int, optional (default=50)) – Arity of the reductions.
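The source does not define arity beyond "arity of the reductions". A minimal sketch, assuming it means the maximum number of partial results merged by a single reduction step, as in a tree reduction. The helper tree_reduce is hypothetical and not part of dislib.

```python
import functools
import operator


def tree_reduce(partials, combine, arity=50):
    """Merge partial results in groups of at most `arity` until one remains.

    Hypothetical helper for illustration only; not a dislib function.
    """
    if arity < 2:
        raise ValueError("arity must be at least 2")
    level = list(partials)
    if not level:
        raise ValueError("nothing to reduce")
    while len(level) > 1:
        # Each group of up to `arity` items is merged into a single item,
        # so every pass shrinks the list by roughly a factor of `arity`.
        level = [
            functools.reduce(combine, level[i:i + arity])
            for i in range(0, len(level), arity)
        ]
    return level[0]


# Ten partial sums merged three at a time: 10 -> 4 -> 2 -> 1 items.
total = tree_reduce(range(10), operator.add, arity=3)
```

Under this assumption, a larger arity gives fewer reduction levels but more inputs per step, while a smaller arity gives more levels with smaller steps.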

Variables:

components_ (array, shape (n_components, n_features)) –
Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by explained_variance_ in decreasing order.

Equal to the n_components eigenvectors of the covariance matrix with the largest eigenvalues.
explained_variance_ (array, shape (n_components,)) –
The amount of variance explained by each of the selected components.

Equal to the n_components largest eigenvalues of the covariance matrix.
mean_ (array, shape (n_features,)) – Per-feature empirical mean, estimated from the training set.

Examples

>>> import numpy as np
>>> from dislib.data import load_data
>>> from dislib.decomposition import PCA
>>> x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
>>> # Load the array as a dislib dataset before fitting
>>> data = load_data(x=x, subset_size=2)
>>> # n_components=None (the default) keeps all components
>>> pca = PCA()
>>> transformed_data = pca.fit_transform(data)
>>> print(transformed_data)
>>> print(pca.components_)
>>> print(pca.explained_variance_)
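
To make the covariance method concrete, the following NumPy-only sketch repeats the computation on the same six samples on a single machine. It is an illustration, not dislib's implementation: the variable names are hypothetical, the unbiased (n - 1) covariance normalization is an assumption, and the sign of each component is arbitrary.

```python
import numpy as np

x = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)

# Per-feature empirical mean (the role played by mean_).
mean = x.mean(axis=0)

# Covariance matrix of the centered data; (n - 1) normalization is assumed.
centered = x - mean
cov = centered.T @ centered / (x.shape[0] - 1)

# Full eigendecomposition. eigh is meant for symmetric matrices and returns
# eigenvalues in ascending order, with eigenvectors stored as columns.
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder by decreasing eigenvalue so the first row is the direction of
# maximum variance.
order = np.argsort(eigvals)[::-1]
explained_variance = eigvals[order]
components = eigvecs[:, order].T  # shape (n_components, n_features)

# Project the centered samples onto the principal axes.
transformed = centered @ components.T
```

With n_components=k, only the first k rows of components and the first k entries of explained_variance would be kept.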

fit(dataset)

Fit the model with the dataset.

Parameters:	dataset (Dataset, shape (n_samples, n_features)) – Training dataset.
Returns:	self – Returns the instance itself.
Return type:	PCA

fit_transform(dataset)

Fit the model with the dataset and apply the dimensionality reduction to it.

Parameters:	dataset (Dataset, shape (n_samples, n_features)) – Training dataset.
Returns:	transformed_dataset
Return type:	Dataset, shape (n_samples, n_components)

transform(dataset)

Apply dimensionality reduction to dataset.

The given dataset is projected onto the first n_components principal components previously extracted from a training dataset.

Parameters:	dataset (Dataset, shape (n_samples, n_features)) – New dataset, with the same n_features as the training dataset.
Returns:	transformed_dataset
Return type:	Dataset, shape (n_samples, n_components)
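
The projection can be sketched in NumPy as centering with the training mean followed by a product with the transposed component matrix. This is the standard PCA projection, assumed rather than confirmed to match dislib's internals. The function project and the fitted values below are hypothetical.

```python
import numpy as np


def project(x, mean, components):
    """Center x with the training mean and project it onto the components."""
    return (np.asarray(x, dtype=float) - mean) @ components.T


# Hypothetical fitted state: mean has shape (n_features,) and components
# has shape (n_components, n_features).
mean = np.array([2.5, 2.0])
components = np.array([[0.0, 1.0], [1.0, 0.0]])

# New samples must have the same n_features as the training data.
new_samples = np.array([[2.5, 3.0], [4.0, 2.0]])

projected = project(new_samples, mean, components)         # shape (2, 2)
projected_1d = project(new_samples, mean, components[:1])  # shape (2, 1)
```

Keeping only the first row of components corresponds to n_components=1, so each sample is reduced to a single coordinate.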