dislib.regression.LinearRegression

class dislib.regression.linear.base.LinearRegression(fit_intercept=True, arity=50)

Bases: sklearn.base.BaseEstimator

Multivariate linear regression using ordinary least squares.

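The model is: y = alpha + X*beta + err, where alpha is the intercept, beta is a vector of coefficients of shape (n_features,), and err is the residual error. With several response variables, beta has one column per target, giving shape (n_features, n_targets).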

The goal is to choose alpha and beta that minimize the sum of the squared errors. These optimal parameters are computed using linear algebra.
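As a rough single-machine illustration of the linear algebra involved, the sketch below solves the same least-squares problem with NumPy by appending a column of ones to X and calling `np.linalg.lstsq`. It is not how dislib distributes the computation, and the helper name `ols_fit` is hypothetical.

```python
import numpy as np


def ols_fit(x, y, fit_intercept=True):
    """Single-machine least-squares fit (illustrative, not dislib code)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if fit_intercept:
        # Append a column of ones so the last solved coefficient is alpha.
        design = np.hstack([x, np.ones((x.shape[0], 1))])
    else:
        design = x
    # lstsq returns the parameters that minimize the sum of squared errors.
    solution, *_ = np.linalg.lstsq(design, y, rcond=None)
    if fit_intercept:
        return solution[:-1], solution[-1]
    return solution, 0.0


x_data = np.array([[1, 2], [2, 0], [3, 1], [4, 4], [5, 3]])
y_data = np.array([2, 1, 1, 2, 4.5])
beta, alpha = ols_fit(x_data, y_data)
print(beta, alpha)
```

On this data the sketch should give beta close to (0.421875, 0.296875) and alpha close to 0.240625.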

Parameters:	fit_intercept (bool, optional (default=True)) – Whether to calculate the intercept parameter for this model. If set to False, no intercept will be used in calculations (self.intercept_ will be 0).
	arity (int, optional (default=50)) – Arity of the reductions.
Variables:	coef_ (ds-array, shape (n_features, n_targets)) – Estimated coefficients (beta) in the linear model.
	intercept_ (ds-array, shape (1, n_targets)) – Estimated independent term (alpha) in the linear model.
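The arity parameter is described only briefly. One plausible reading, which is an assumption here and not taken from the dislib source, is that it caps how many partial results a single reduction step merges. Under that assumption, a tree reduction could be sketched in plain Python as follows (the function name `reduce_with_arity` is hypothetical):

```python
def reduce_with_arity(partials, combine, arity=50):
    """Merge partial results in groups of at most `arity` until one remains.

    Assumption: this mirrors what "arity of the reductions" means; it is
    not dislib's actual implementation.
    """
    if arity < 2:
        raise ValueError("arity must be at least 2")
    if not partials:
        raise ValueError("nothing to reduce")
    level = list(partials)
    while len(level) > 1:
        # Each group of up to `arity` values is combined into one value.
        level = [
            combine(level[i:i + arity])
            for i in range(0, len(level), arity)
        ]
    return level[0]


# Seven per-block partial sums with arity 3 need two levels: 7 -> 3 -> 1.
total = reduce_with_arity([1, 2, 3, 4, 5, 6, 7], sum, arity=3)
print(total)
```

A smaller arity yields a deeper tree with more, smaller merge steps; a larger arity yields fewer steps that each handle more inputs.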

Examples

>>> import dislib as ds
>>> from dislib.regression import LinearRegression
>>> import numpy as np
>>> from pycompss.api.api import compss_wait_on
>>>
>>>
>>> if __name__ == '__main__':
>>>     x_data = np.array([[1, 2], [2, 0], [3, 1], [4, 4], [5, 3]])
>>>     y_data = np.array([2, 1, 1, 2, 4.5])
>>>     bn, bm = 2, 2
>>>     x = ds.array(x=x_data, block_size=(bn, bm))
>>>     y = ds.array(x=y_data, block_size=(bn, 1))
>>>     reg = LinearRegression()
>>>     reg.fit(x, y)
>>>     reg.coef_.collect()
    array([0.421875, 0.296875])
>>>     reg.intercept_.collect()
    0.240625
>>>     x_test = np.array([[3, 2], [4, 4]])
>>>     test_data = ds.array(x=x_test, block_size=(bn, bm))
>>>     pred = reg.predict(test_data).collect()
>>>     pred
    array([2.1, 3.115625])

coef_

fit(x, y)

Fit the linear model.

Parameters:	x (ds-array, shape (n_samples, n_features)) – Explanatory variables. y (ds-array, shape (n_samples, n_targets)) – Response variables.
Raises:	`NotImplementedError` – If x or y are sparse arrays.
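To make the documented shapes concrete, here is a NumPy-only sketch of a fit with two response variables. It centres the data, solves for the coefficients, and then recovers the intercept. The second target column is made-up illustration data, and the approach is a stand-in for, not a copy of, the distributed computation.

```python
import numpy as np

x = np.array([[1, 2], [2, 0], [3, 1], [4, 4], [5, 3]], dtype=float)
# Two response variables per sample: shape (n_samples, n_targets) = (5, 2).
# The second column is invented purely for illustration.
y = np.column_stack([[2, 1, 1, 2, 4.5], [1, 0, 1, 3, 2]]).astype(float)

# Centre both arrays so the intercept can be recovered after solving for beta.
x_mean = x.mean(axis=0, keepdims=True)   # shape (1, n_features)
y_mean = y.mean(axis=0, keepdims=True)   # shape (1, n_targets)
coef, *_ = np.linalg.lstsq(x - x_mean, y - y_mean, rcond=None)
intercept = y_mean - x_mean @ coef

print(coef.shape)       # (n_features, n_targets)
print(intercept.shape)  # (1, n_targets)
```

Each column of `coef` holds the coefficients for one target, which matches the (n_features, n_targets) and (1, n_targets) shapes listed for the fitted attributes.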

intercept_

load_model(filepath, load_format='json')

Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.

Parameters:	filepath (str) – Path of the saved model.
	load_format (str, optional (default='json')) – Format used to load the model.

Examples

>>> from dislib.regression import LinearRegression
>>> import numpy as np
>>> import dislib as ds
>>> x_data = np.array([[1, 2], [2, 0], [3, 1], [4, 4], [5, 3]])
>>> y_data = np.array([2, 1, 1, 2, 4.5])
>>> x_test_m = np.array([[3, 2], [4, 4], [1, 3]])
>>> bn, bm = 2, 2
>>> x = ds.array(x=x_data, block_size=(bn, bm))
>>> y = ds.array(x=y_data, block_size=(bn, 1))
>>> test_data_m = ds.array(x=x_test_m, block_size=(bn, bm))
>>> reg = LinearRegression()
>>> reg.fit(x, y)
>>> reg.save_model("./model_LR")
>>> reg_loaded = LinearRegression()
>>> reg_loaded.load_model("./model_LR")
>>> pred = reg_loaded.predict(test_data_m).collect()

predict(x)

Predict using the linear model.

Parameters:	x (ds-array, shape (n_samples_predict, n_features)) – Samples to be predicted.
Returns:	y – Predicted values.
Return type:	ds-array, shape (n_samples_predict, n_targets)
Raises:	`NotImplementedError` – If x is a sparse array.
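Prediction in a linear model amounts to a matrix product plus the intercept. The NumPy sketch below reuses the coefficient and intercept values reported in the class-level example to show how the shapes combine. It runs on in-memory arrays and does not use ds-arrays.

```python
import numpy as np

# Values taken from the class-level example's fitted model.
coef = np.array([[0.421875], [0.296875]])   # shape (n_features, n_targets)
intercept = np.array([[0.240625]])          # shape (1, n_targets)

x_new = np.array([[3, 2], [4, 4]])          # shape (n_samples_predict, n_features)
# The intercept row broadcasts across all predicted samples.
y_pred = x_new @ coef + intercept           # shape (n_samples_predict, n_targets)
print(y_pred.ravel())
```

The result should be approximately 2.1 and 3.115625, in line with the predictions shown in the class-level example.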

save_model(filepath, overwrite=True, save_format='json')

Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.

Parameters:	filepath (str) – Path where the model is saved.
	overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.
	save_format (str, optional (default='json')) – Format used to save the model.

Examples

>>> from dislib.regression import LinearRegression
>>> import numpy as np
>>> import dislib as ds
>>> x_data = np.array([[1, 2], [2, 0], [3, 1], [4, 4], [5, 3]])
>>> y_data = np.array([2, 1, 1, 2, 4.5])
>>> bn, bm = 2, 2
>>> x = ds.array(x=x_data, block_size=(bn, bm))
>>> y = ds.array(x=y_data, block_size=(bn, 1))
>>> reg = LinearRegression()
>>> reg.fit(x, y)
>>> reg.save_model("./model_LR")
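For readers curious what a JSON round trip of fitted parameters might involve, the following standalone sketch writes a coefficient array and an intercept to a file and reads them back. The file layout, the key names, and the helpers `save_params` and `load_params` are all hypothetical and are not dislib's actual on-disk format. Raising an error when `overwrite` is False is a choice made for this sketch only.

```python
import json
import os
import tempfile

import numpy as np


def save_params(filepath, coef, intercept, overwrite=True):
    """Write fitted parameters to JSON (hypothetical layout, not dislib's)."""
    if os.path.exists(filepath) and not overwrite:
        raise FileExistsError(f"{filepath} exists and overwrite is False")
    payload = {
        "coef": np.asarray(coef).tolist(),
        "intercept": np.asarray(intercept).tolist(),
    }
    with open(filepath, "w") as f:
        json.dump(payload, f)


def load_params(filepath):
    """Read the parameters back as NumPy arrays."""
    with open(filepath) as f:
        payload = json.load(f)
    return np.array(payload["coef"]), np.array(payload["intercept"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_LR.json")
    save_params(path, [[0.421875], [0.296875]], [[0.240625]])
    coef, intercept = load_params(path)

print(coef.shape, intercept.shape)
```

Because the arrays are stored as nested lists, their shapes survive the round trip, so a loaded model can predict without being refitted.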