dislib.preprocessing

class dislib.preprocessing.MinMaxScaler(feature_range=(0, 1))[source]

Bases: object

Scale features by rescaling them to the provided range

Scaling happens independently on each feature by computing the relevant statistics on the samples in the training set. The minimum and maximum values are then stored and applied to later data via the transform method.

Variables

feature_range (tuple) – The desired range of values in the ds-array.
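
For illustration, a minimal sketch of fitting and transforming a small ds-array with a custom feature_range (the data and block size below are made up for the example):

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import MinMaxScaler
>>> # illustrative data; the (2, 2) block size is arbitrary
>>> x = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> scaler = MinMaxScaler(feature_range=(0, 10))
>>> scaler.fit(x)
>>> x_scaled = scaler.transform(x)
>>> # each column is rescaled from its [min, max] to [0, 10]
>>> assert np.allclose(x_scaled.collect(),
...                    [[0., 0.], [5., 5.], [10., 10.]])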

fit(x)[source]

Compute the min and max values for later scaling.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

self

Return type

MinMaxScaler

fit_transform(x)[source]

Fit to data, then transform it.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Scaled data.

Return type

ds-array, shape=(n_samples, n_features)
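
As a short sketch (illustrative data), fit_transform should give the same result as calling fit and then transform on the same array:

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import MinMaxScaler
>>> x = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> x_a = MinMaxScaler().fit_transform(x)
>>> x_b = MinMaxScaler().fit(x).transform(x)
>>> assert np.allclose(x_a.collect(), x_b.collect())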

inverse_transform(x)[source]

Returns data to its original values. The scaler must be fitted before calling this method.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Original valued data.

Return type

ds-array, shape=(n_samples, n_features)
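
A brief round-trip sketch on illustrative data, using only the fit, transform and inverse_transform calls documented here:

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import MinMaxScaler
>>> x = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> scaler = MinMaxScaler().fit(x)
>>> x_back = scaler.inverse_transform(scaler.transform(x))
>>> # inverse_transform undoes the scaling, up to floating-point error
>>> assert np.allclose(x_back.collect(), x.collect())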

load_model(filepath, load_format='json')[source]

Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.

Parameters

  • filepath (str) – Path of the saved model.

  • load_format (str, optional (default='json')) – Format used to load the model.

Examples

>>> from dislib.preprocessing import MinMaxScaler
>>> import numpy as np
>>> import dislib as ds
>>> x = ds.array(np.array([[1, 2], [2, 1], [-1, -2],
...                        [-2, -1]]), (2, 2))
>>> y = ds.array(np.array([0, 1, 1, 0]).reshape(-1, 1), (2, 1))
>>> model = MinMaxScaler()
>>> model.fit(x)
>>> model.save_model('/tmp/model')
>>> loaded_model = MinMaxScaler()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[1, 2], [2, 1], [-1, -2], [-2, -1],
...                             [1, 1], [-1, -1]]), (2, 2))
>>> x_transformed = model.transform(x_test)
>>> x_loaded_pred = loaded_model.transform(x_test)
>>> assert np.allclose(x_transformed.collect(),
...                    x_loaded_pred.collect())

save_model(filepath, overwrite=True, save_format='json')[source]

Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.

Parameters

  • filepath (str) – Path where the model will be saved.

  • overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.

  • save_format (str, optional (default='json')) – Format used to save the model.

Examples

>>> from dislib.preprocessing import MinMaxScaler
>>> import numpy as np
>>> import dislib as ds
>>> x = ds.array(np.array([[1, 2], [2, 1], [-1, -2],
...                        [-2, -1]]), (2, 2))
>>> y = ds.array(np.array([0, 1, 1, 0]).reshape(-1, 1), (2, 1))
>>> model = MinMaxScaler()
>>> model.fit(x)
>>> model.save_model('/tmp/model')
>>> loaded_model = MinMaxScaler()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[1, 2], [2, 1], [-1, -2], [-2, -1],
...                             [1, 1], [-1, -1]]), (2, 2))
>>> x_transformed = model.transform(x_test)
>>> x_loaded_pred = loaded_model.transform(x_test)
>>> assert np.allclose(x_transformed.collect(),
...                    x_loaded_pred.collect())

transform(x)[source]

Scale data.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Scaled data.

Return type

ds-array, shape=(n_samples, n_features)
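
Note that transform reuses the minimum and maximum learned during fit, so unseen values outside the training range map outside feature_range. A sketch with made-up data:

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import MinMaxScaler
>>> x_train = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> scaler = MinMaxScaler(feature_range=(0, 1)).fit(x_train)
>>> x_test = ds.array(np.array([[7., 8.]]), (1, 2))
>>> # 7 and 8 exceed the training maxima (5 and 6), so both
>>> # scaled values are 1.5, outside the (0, 1) range
>>> assert np.allclose(scaler.transform(x_test).collect(), [[1.5, 1.5]])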

class dislib.preprocessing.StandardScaler[source]

Bases: object

Standardize features by removing the mean and scaling to unit variance

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. The mean and standard deviation are then stored and applied to later data via the transform method.

Variables

  • mean (ds-array, shape=(1, n_features)) – The mean value for each feature in the training set.

  • var (ds-array, shape=(1, n_features)) – The variance for each feature in the training set.
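
For illustration, a minimal sketch on made-up data; mean and var are the fitted variables listed above:

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import StandardScaler
>>> x = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> scaler = StandardScaler()
>>> x_scaled = scaler.fit_transform(x)
>>> x_np = x_scaled.collect()
>>> assert np.allclose(x_np.mean(axis=0), 0)   # each column is centered
>>> print(x_np.std(axis=0))                    # close to 1 for each column
>>> # the training statistics are available as ds-arrays
>>> print(scaler.mean.collect(), scaler.var.collect())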

fit(x)[source]

Compute the mean and std to be used for later scaling.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

self

Return type

StandardScaler

fit_transform(x)[source]

Fit to data, then transform it.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Scaled data.

Return type

ds-array, shape=(n_samples, n_features)

inverse_transform(x)[source]

Returns data to its original values. The scaler must be fitted before calling this method.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Original valued data.

Return type

ds-array, shape=(n_samples, n_features)

load_model(filepath, load_format='json')[source]

Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.

Parameters

  • filepath (str) – Path of the saved model.

  • load_format (str, optional (default='json')) – Format used to load the model.

Examples

>>> from dislib.preprocessing import StandardScaler
>>> import numpy as np
>>> import dislib as ds
>>> x = ds.array(np.array([[1, 2], [2, 1], [-1, -2],
...                        [-2, -1]]), (2, 2))
>>> y = ds.array(np.array([0, 1, 1, 0]).reshape(-1, 1), (2, 1))
>>> model = StandardScaler()
>>> model.fit(x)
>>> model.save_model('/tmp/model')
>>> loaded_model = StandardScaler()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[1, 2], [2, 1], [-1, -2], [-2, -1],
...                             [1, 1], [-1, -1]]), (2, 2))
>>> x_transformed = model.transform(x_test)
>>> x_loaded_pred = loaded_model.transform(x_test)
>>> assert np.allclose(x_transformed.collect(),
...                    x_loaded_pred.collect())

save_model(filepath, overwrite=True, save_format='json')[source]

Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.

Parameters

  • filepath (str) – Path where the model will be saved.

  • overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.

  • save_format (str, optional (default='json')) – Format used to save the model.

Examples

>>> from dislib.preprocessing import StandardScaler
>>> import numpy as np
>>> import dislib as ds
>>> x = ds.array(np.array([[1, 2], [2, 1], [-1, -2],
...                        [-2, -1]]), (2, 2))
>>> y = ds.array(np.array([0, 1, 1, 0]).reshape(-1, 1), (2, 1))
>>> model = StandardScaler()
>>> model.fit(x)
>>> model.save_model('/tmp/model')
>>> loaded_model = StandardScaler()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[1, 2], [2, 1], [-1, -2], [-2, -1],
...                             [1, 1], [-1, -1]]), (2, 2))
>>> x_transformed = model.transform(x_test)
>>> x_loaded_pred = loaded_model.transform(x_test)
>>> assert np.allclose(x_transformed.collect(),
...                    x_loaded_pred.collect())

transform(x)[source]

Standardize data.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Scaled data.

Return type

ds-array, shape=(n_samples, n_features)
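
As a hedged sketch of the scaling, the transformed values should correspond to (x - mean) / sqrt(var), using the statistics stored at fit time (made-up data; mean and var are the variables listed above):

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import StandardScaler
>>> x = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> scaler = StandardScaler().fit(x)
>>> x_scaled = scaler.transform(x)
>>> # recompute the standardization locally from the fitted statistics
>>> expected = (x.collect() - scaler.mean.collect()) / np.sqrt(scaler.var.collect())
>>> assert np.allclose(x_scaled.collect(), expected)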