dislib.utils

Functions

dislib.utils.base.as_grid(dataset, n_regions, dimensions=None, return_indices=False)

Arranges samples in an n-dimensional grid where each Subset contains the samples lying in one region of the feature space. The feature space is divided into n_regions equal-sized regions along each dimension, based on the minimum and maximum values of each feature in the dataset.

Parameters:

dataset (Dataset) – Input data.
n_regions (int) – Number of regions per dimension in which to split the feature space.
dimensions (iterable, optional (default=None)) – Integer indices of the dimensions to split. If None, all dimensions are split.
return_indices (boolean, optional (default=False)) – Whether to return sorting indices.

Returns:

grid_data (Dataset) – A new Dataset with one Subset per region in the feature space.
index_array (array, shape = [n_samples, ]) – Array of indices that sort the samples in grid_data back to the order they have in the input Dataset. Returned only if return_indices is True.
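
The region assignment and the role of index_array can be illustrated with plain NumPy arrays. This is a minimal sketch, not dislib's implementation: the helper grid_region_ids and the sample values are hypothetical, and it works on an in-memory array rather than on a Dataset.

```python
import numpy as np


def grid_region_ids(samples, n_regions, dimensions=None):
    """Assign each sample to a grid region (hypothetical helper, not dislib code)."""
    samples = np.asarray(samples, dtype=float)
    if dimensions is None:
        dims = list(range(samples.shape[1]))
    else:
        dims = list(dimensions)

    sub = samples[:, dims]
    mins = sub.min(axis=0)
    maxs = sub.max(axis=0)
    # Region width per dimension; fall back to 1.0 for constant features
    # to avoid dividing by zero.
    widths = np.where(maxs > mins, (maxs - mins) / n_regions, 1.0)

    # Per-dimension bin index in [0, n_regions - 1]. The maximum value
    # would land in bin n_regions, so it is clipped into the last bin.
    bins = np.floor((sub - mins) / widths).astype(int)
    bins = np.clip(bins, 0, n_regions - 1)

    # Combine the per-dimension bins into a single region id.
    return np.ravel_multi_index(tuple(bins.T), (n_regions,) * len(dims))


samples = np.array([[9.0, 1.0], [1.0, 2.0], [8.0, 9.0], [2.0, 7.0]])
region_ids = grid_region_ids(samples, n_regions=2)

# Order that groups samples by region, and the inverse permutation that
# restores the input order (the role index_array plays above).
sorting = np.argsort(region_ids, kind="stable")
index_array = np.argsort(sorting)

grid_sorted = samples[sorting]
restored = grid_sorted[index_array]
```

Indexing the region-sorted samples with index_array gives back the samples in their input order, which is what makes the returned indices useful for mapping per-region results to the original rows.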

dislib.utils.base.resample(dataset, n_samples, random_state=None)

Resamples a dataset without replacement.

Parameters:

dataset (Dataset) – Input data.
n_samples (int) – Number of samples to generate.
random_state (int or RandomState, optional (default=None)) – Seed or numpy.random.RandomState instance to use in the generation of random numbers.

Returns:

resampled_data (Dataset) – Resampled dataset. The number of subsets in the returned dataset is less than or equal to the number of subsets in the input dataset.
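
Sampling without replacement means that no input sample appears more than once in the result. The sketch below shows the idea on a plain NumPy array. It is an assumption-laden illustration rather than dislib's code: resample_indices is a hypothetical helper, and it ignores how samples are distributed across Subsets.

```python
import numpy as np


def resample_indices(n_total, n_samples, random_state=None):
    """Pick n_samples distinct row indices out of n_total (hypothetical helper)."""
    # Accept either a seed or an existing RandomState instance.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    # replace=False guarantees that no index is drawn twice.
    return rng.choice(n_total, size=n_samples, replace=False)


data = np.arange(20).reshape(10, 2)
idx = resample_indices(len(data), n_samples=4, random_state=0)
resampled = data[idx]
```

Passing the same integer seed twice yields the same selection, which is the usual reason for exposing a random_state parameter.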

dislib.utils.base.shuffle(dataset_in, n_subsets_out=None, random_state=None)

Randomly shuffles a Dataset.

Parameters:

dataset_in (Dataset) – Input Dataset.
n_subsets_out (int, optional (default=None)) – Number of Subsets in the shuffled dataset. If None, it is the same as in the input Dataset.
random_state (int or RandomState, optional (default=None)) – Seed or numpy.random.RandomState instance to use in the generation of random numbers.

Returns:

shuffled_dataset (Dataset) – A new randomly shuffled Dataset with n_subsets_out balanced Subsets. If the samples cannot be split evenly, some Subsets contain one extra instance. These extra instances are distributed evenly so that k-fold splits (with k a divisor of the number of Subsets) are as balanced as possible.
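
One way to obtain balanced Subsets whose extra instances are spread evenly is to cut the shuffled samples at integer-rounded, equally spaced boundaries. The sketch below shows that approach on a NumPy array. Both helpers are hypothetical, and the rounding scheme is an assumption, not necessarily the one dislib uses.

```python
import numpy as np


def balanced_sizes(n_samples, n_subsets):
    """Subset sizes that differ by at most one, with extras spread evenly."""
    # Equally spaced boundaries, rounded down to integers.
    bounds = [i * n_samples // n_subsets for i in range(n_subsets + 1)]
    return [bounds[i + 1] - bounds[i] for i in range(n_subsets)]


def shuffle_into_subsets(samples, n_subsets, random_state=None):
    """Shuffle the samples and split them into balanced chunks (illustrative)."""
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    samples = np.asarray(samples)
    permuted = samples[rng.permutation(len(samples))]
    # Cumulative sizes give the split points between consecutive subsets.
    split_points = np.cumsum(balanced_sizes(len(samples), n_subsets))[:-1]
    return np.split(permuted, split_points)


sizes = balanced_sizes(10, 4)
subsets = shuffle_into_subsets(np.arange(10), n_subsets=4, random_state=0)
subset_lengths = [len(s) for s in subsets]
```

With 10 samples and 4 subsets the sizes come out as 2, 3, 2, 3, so grouping consecutive subsets into 2 folds gives 5 samples per fold. Placing both extras in the first two subsets (3, 3, 2, 2) would instead give folds of 6 and 4.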
