dislib.preprocessing

class dislib.preprocessing.MinMaxScaler(feature_range=(0, 1))[source]

Bases: object

Scale features by rescaling them to the provided range

Scaling happens independently on each feature by computing the relevant statistics on the samples in the training set. The minimum and maximum values are then stored and applied to later data via the transform method.

Variables

feature_range (tuple) – The desired range of values in the ds-array.
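
For illustration, a minimal sketch of fitting and transforming a small ds-array with a custom feature_range (the data and block size below are made up for the example):

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import MinMaxScaler
>>> # illustrative data; the (2, 2) block size is arbitrary
>>> x = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> scaler = MinMaxScaler(feature_range=(0, 10))
>>> scaler.fit(x)
>>> x_scaled = scaler.transform(x)
>>> # each column is rescaled from its [min, max] to [0, 10]
>>> assert np.allclose(x_scaled.collect(),
...                    [[0., 0.], [5., 5.], [10., 10.]])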

fit(x)[source]

Compute the min and max values for later scaling.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

self

Return type

MinMaxScaler

fit_transform(x)[source]

Fit to data, then transform it.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Scaled data.

Return type

ds-array, shape=(n_samples, n_features)
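
As a short sketch (illustrative data), fit_transform should give the same result as calling fit and then transform on the same array:

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import MinMaxScaler
>>> x = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> x_a = MinMaxScaler().fit_transform(x)
>>> x_b = MinMaxScaler().fit(x).transform(x)
>>> assert np.allclose(x_a.collect(), x_b.collect())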

inverse_transform(x)[source]

Returns data to its original values. The scaler must be fitted before calling this method.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Original valued data.

Return type

ds-array, shape=(n_samples, n_features)
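
A brief round-trip sketch on illustrative data, using only the fit, transform and inverse_transform calls documented here:

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import MinMaxScaler
>>> x = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> scaler = MinMaxScaler().fit(x)
>>> x_back = scaler.inverse_transform(scaler.transform(x))
>>> # inverse_transform undoes the scaling, up to floating-point error
>>> assert np.allclose(x_back.collect(), x.collect())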

load_model(filepath, load_format='json')[source]

Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.

Parameters

  • filepath (str) – Path of the saved model.

  • load_format (str, optional (default='json')) – Format used to load the model.

Examples

>>> from dislib.preprocessing import MinMaxScaler
>>> import numpy as np
>>> import dislib as ds
>>> x = ds.array(np.array([[1, 2], [2, 1], [-1, -2],
...                        [-2, -1]]), (2, 2))
>>> y = ds.array(np.array([0, 1, 1, 0]).reshape(-1, 1), (2, 1))
>>> model = MinMaxScaler()
>>> model.fit(x)
>>> model.save_model('/tmp/model')
>>> loaded_model = MinMaxScaler()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[1, 2], [2, 1], [-1, -2], [-2, -1],
...                             [1, 1], [-1, -1]]), (2, 2))
>>> x_transformed = model.transform(x_test)
>>> x_loaded_pred = loaded_model.transform(x_test)
>>> assert np.allclose(x_transformed.collect(),
...                    x_loaded_pred.collect())

save_model(filepath, overwrite=True, save_format='json')[source]

Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.

Parameters

  • filepath (str) – Path where the model will be saved.

  • overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.

  • save_format (str, optional (default='json')) – Format used to save the model.

Examples

>>> from dislib.preprocessing import MinMaxScaler
>>> import numpy as np
>>> import dislib as ds
>>> x = ds.array(np.array([[1, 2], [2, 1], [-1, -2],
...                        [-2, -1]]), (2, 2))
>>> y = ds.array(np.array([0, 1, 1, 0]).reshape(-1, 1), (2, 1))
>>> model = MinMaxScaler()
>>> model.fit(x)
>>> model.save_model('/tmp/model')
>>> loaded_model = MinMaxScaler()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[1, 2], [2, 1], [-1, -2], [-2, -1],
...                             [1, 1], [-1, -1]]), (2, 2))
>>> x_transformed = model.transform(x_test)
>>> x_loaded_pred = loaded_model.transform(x_test)
>>> assert np.allclose(x_transformed.collect(),
...                    x_loaded_pred.collect())

transform(x)[source]

Scale data.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Scaled data.

Return type

ds-array, shape=(n_samples, n_features)
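
Note that transform reuses the minimum and maximum learned during fit, so unseen values outside the training range map outside feature_range. A sketch with made-up data:

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import MinMaxScaler
>>> x_train = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> scaler = MinMaxScaler(feature_range=(0, 1)).fit(x_train)
>>> x_test = ds.array(np.array([[7., 8.]]), (1, 2))
>>> # 7 and 8 exceed the training maxima (5 and 6), so both
>>> # scaled values are 1.5, outside the (0, 1) range
>>> assert np.allclose(scaler.transform(x_test).collect(), [[1.5, 1.5]])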

class dislib.preprocessing.StandardScaler[source]

Bases: object

Standardize features by removing the mean and scaling to unit variance

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. The mean and standard deviation are then stored and applied to later data via the transform method.

Variables

  • mean (ds-array, shape=(1, n_features)) – The mean value for each feature in the training set.

  • var (ds-array, shape=(1, n_features)) – The variance for each feature in the training set.
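
For illustration, a minimal sketch on made-up data; mean and var are the fitted variables listed above:

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import StandardScaler
>>> x = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> scaler = StandardScaler()
>>> x_scaled = scaler.fit_transform(x)
>>> x_np = x_scaled.collect()
>>> assert np.allclose(x_np.mean(axis=0), 0)   # each column is centered
>>> print(x_np.std(axis=0))                    # close to 1 for each column
>>> # the training statistics are available as ds-arrays
>>> print(scaler.mean.collect(), scaler.var.collect())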

fit(x)[source]

Compute the mean and std to be used for later scaling.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

self

Return type

StandardScaler

fit_transform(x)[source]

Fit to data, then transform it.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Scaled data.

Return type

ds-array, shape=(n_samples, n_features)

inverse_transform(x)[source]

Returns data to its original values. The scaler must be fitted before calling this method.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Original valued data.

Return type

ds-array, shape=(n_samples, n_features)

load_model(filepath, load_format='json')[source]

Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.

Parameters

  • filepath (str) – Path of the saved model.

  • load_format (str, optional (default='json')) – Format used to load the model.

Examples

>>> from dislib.preprocessing import StandardScaler
>>> import numpy as np
>>> import dislib as ds
>>> x = ds.array(np.array([[1, 2], [2, 1], [-1, -2],
...                        [-2, -1]]), (2, 2))
>>> y = ds.array(np.array([0, 1, 1, 0]).reshape(-1, 1), (2, 1))
>>> model = StandardScaler()
>>> model.fit(x)
>>> model.save_model('/tmp/model')
>>> loaded_model = StandardScaler()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[1, 2], [2, 1], [-1, -2], [-2, -1],
...                             [1, 1], [-1, -1]]), (2, 2))
>>> x_transformed = model.transform(x_test)
>>> x_loaded_pred = loaded_model.transform(x_test)
>>> assert np.allclose(x_transformed.collect(),
...                    x_loaded_pred.collect())

save_model(filepath, overwrite=True, save_format='json')[source]

Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.

Parameters

  • filepath (str) – Path where the model will be saved.

  • overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.

  • save_format (str, optional (default='json')) – Format used to save the model.

Examples

>>> from dislib.preprocessing import StandardScaler
>>> import numpy as np
>>> import dislib as ds
>>> x = ds.array(np.array([[1, 2], [2, 1], [-1, -2],
...                        [-2, -1]]), (2, 2))
>>> y = ds.array(np.array([0, 1, 1, 0]).reshape(-1, 1), (2, 1))
>>> model = StandardScaler()
>>> model.fit(x)
>>> model.save_model('/tmp/model')
>>> loaded_model = StandardScaler()
>>> loaded_model.load_model('/tmp/model')
>>> x_test = ds.array(np.array([[1, 2], [2, 1], [-1, -2], [-2, -1],
...                             [1, 1], [-1, -1]]), (2, 2))
>>> x_transformed = model.transform(x_test)
>>> x_loaded_pred = loaded_model.transform(x_test)
>>> assert np.allclose(x_transformed.collect(),
...                    x_loaded_pred.collect())

transform(x)[source]

Standardize data.

Parameters

x (ds-array, shape=(n_samples, n_features))

Returns

x_new – Scaled data.

Return type

ds-array, shape=(n_samples, n_features)
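
As a hedged sketch of the scaling, the transformed values should correspond to (x - mean) / sqrt(var), using the statistics stored at fit time (made-up data; mean and var are the variables listed above):

>>> import numpy as np
>>> import dislib as ds
>>> from dislib.preprocessing import StandardScaler
>>> x = ds.array(np.array([[1., 2.], [3., 4.], [5., 6.]]), (2, 2))
>>> scaler = StandardScaler().fit(x)
>>> x_scaled = scaler.transform(x)
>>> # recompute the standardization locally from the fitted statistics
>>> expected = (x.collect() - scaler.mean.collect()) / np.sqrt(scaler.var.collect())
>>> assert np.allclose(x_scaled.collect(), expected)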