dislib.recommendation.ALS

class dislib.recommendation.als.base.ALS(random_state=None, n_f=100, lambda_=0.065, tol=0.0001, max_iter=100, arity=5, check_convergence=True, verbose=False)

Bases: object

Alternating Least Squares recommendation.

Implements distributed alternating least squares recommendation based on Zhou et al. [1].

Parameters:
  • max_iter (int, optional (default=100)) – Maximum number of iterations to perform.
  • tol (float, optional (default=1e-4)) – Tolerance for the stopping criterion.
  • n_f (int, optional (default=100)) – Number of latent factors (i.e. dimensions) for the matrices U and I.
  • lambda_ (float, optional (default=0.065)) – Value of the regularization parameter.
  • check_convergence (boolean, optional (default=True)) – Whether to test for convergence at the end of each iteration.
  • random_state (int or None, optional (default=None)) – Seed of the pseudorandom number generator used to initialize the items matrix I.
  • arity (int, optional (default=5)) – The arity of the tasks during the merge of each matrix chunk.
  • verbose (boolean, optional (default=False)) – Whether to print progress information.
Variables:
  • users (np.array) – User matrix.
  • items (np.array) – Items matrix.
  • converged (boolean) – Whether the model has converged.
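
As an illustration of how n_f, lambda_, tol and max_iter interact, the following is a minimal single-machine sketch of alternating least squares with the weighted-lambda regularization described by Zhou et al. [1]. It is not dislib's distributed implementation: the function name als_sketch, the dense input with 0 meaning "not rated", the purely random initialization of the items matrix, and the use of the RMSE change as the stopping test are all assumptions made for this example.

```python
import numpy as np


def als_sketch(ratings, n_f=2, lambda_=0.065, max_iter=100, tol=1e-4,
               random_state=None):
    """Hypothetical single-machine ALS; NOT the dislib implementation.

    ratings: dense (n_users, n_items) array where 0 means "not rated".
    Returns (users, items, converged).
    """
    rng = np.random.RandomState(random_state)
    n_users, n_items = ratings.shape
    mask = ratings != 0                      # observed ratings only
    users = np.zeros((n_users, n_f))
    items = rng.rand(n_items, n_f)           # assumption: random init of I
    prev_rmse = np.inf
    converged = False

    for _ in range(max_iter):
        # Fix the items matrix and solve a small ridge problem per user.
        for u in range(n_users):
            rated = mask[u]
            n_u = rated.sum()
            if n_u == 0:
                continue                     # user without ratings: skip
            m = items[rated]
            a = m.T @ m + lambda_ * n_u * np.eye(n_f)
            b = m.T @ ratings[u, rated]
            users[u] = np.linalg.solve(a, b)

        # Fix the users matrix and solve a small ridge problem per item.
        for i in range(n_items):
            rated = mask[:, i]
            n_i = rated.sum()
            if n_i == 0:
                continue                     # item without ratings: skip
            m = users[rated]
            a = m.T @ m + lambda_ * n_i * np.eye(n_f)
            b = m.T @ ratings[rated, i]
            items[i] = np.linalg.solve(a, b)

        # Assumed stopping test: change in RMSE over observed ratings.
        pred = users @ items.T
        rmse = np.sqrt(np.mean((pred[mask] - ratings[mask]) ** 2))
        if abs(prev_rmse - rmse) < tol:
            converged = True
            break
        prev_rmse = rmse

    return users, items, converged
```

Each inner solve involves only an n_f × n_f system, which is why the per-user and per-item updates are independent of one another and lend themselves to being distributed.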

References

[1] Zhou Y., Wilkinson D., Schreiber R., Pan R. (2008) Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In: Fleischer R., Xu J. (eds) Algorithmic Aspects in Information and Management. AAIM 2008. Lecture Notes in Computer Science, vol 5034. Springer, Berlin, Heidelberg.

Examples

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> from dislib.data import load_data
>>> data = np.array([[0, 0, 5], [3, 0, 5], [3, 1, 2]])
>>> ratings = csr_matrix(data).transpose().tocsr()
>>> train = load_data(x=ratings, subset_size=1)
>>> from dislib.recommendation import ALS
>>> als = ALS()
>>> als.fit(train)
>>> print('Ratings for user 0: %s' % als.predict_user(user_id=0))

fit(dataset, test=None)

Fits a model using training data. Training data is also used to check for convergence unless test data is provided.

Parameters:
  • dataset (Dataset) – Dataset where each sample represents the ratings of a given item.
  • test (csr_matrix) – Sparse matrix used to check convergence with users as rows and items as columns. If not passed, uses training data to check convergence.
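
A convergence check against a sparse test matrix with users as rows and items as columns could look like the sketch below. It assumes that the criterion is the root-mean-square error over the stored (observed) ratings and that training stops once that error changes by less than tol between iterations; this page does not state the exact criterion, and the helper names rmse_on_observed and has_converged are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix


def rmse_on_observed(test, users, items):
    """RMSE over the stored entries of a sparse (users x items) matrix.

    Hypothetical helper; assumes `test` has at least one stored entry.
    """
    coo = test.tocoo()
    # Predicted rating for each stored (user, item) pair.
    preds = np.sum(users[coo.row] * items[coo.col], axis=1)
    return np.sqrt(np.mean((coo.data - preds) ** 2))


def has_converged(prev_rmse, rmse, tol=1e-4):
    """Assumed stopping rule: RMSE changed by less than tol."""
    return abs(prev_rmse - rmse) < tol


# Tiny worked example with hand-picked factor matrices.
users = np.array([[1.0, 0.0], [0.0, 1.0]])
items = np.array([[2.0, 0.0], [0.0, 3.0]])
test = csr_matrix(np.array([[2.0, 0.0], [0.0, 4.0]]))
rmse = rmse_on_observed(test, users, items)
```

Only the stored entries of the sparse matrix contribute to the error, so unrated user and item pairs do not penalize the model.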

predict_user(user_id)

Returns the expected ratings for user_id. The i-th element of the returned array is the predicted rating for the i-th item. If the user was not present in the training set, a vector filled with np.nan is returned.

Parameters: user_id (int)
Returns: ratings – estimated ratings of all items for user_id.
Return type: np.array
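
The behaviour described for predict_user can be sketched with plain NumPy as the product of one row of the users matrix with the transposed items matrix. This is an illustration only: predict_user_sketch is a hypothetical helper, and treating an out-of-range user_id as "not present in the training set" is an assumption rather than dislib's documented check.

```python
import numpy as np


def predict_user_sketch(users, items, user_id):
    """Hypothetical stand-in for predict_user; NOT the dislib code.

    users: (n_users, n_f) array, items: (n_items, n_f) array.
    """
    # Assumption: an index outside the users matrix means "unknown user".
    if user_id < 0 or user_id >= users.shape[0]:
        return np.full(items.shape[0], np.nan)
    # One estimated rating per item.
    return users[user_id] @ items.T


users = np.array([[1.0, 2.0]])
items = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
known = predict_user_sketch(users, items, user_id=0)    # array([1., 2., 3.])
unknown = predict_user_sketch(users, items, user_id=5)  # three NaN values
```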