dislib.recommendation.ALS
class dislib.recommendation.als.base.ALS(random_state=None, n_f=100, lambda_=0.065, tol=0.0001, max_iter=100, arity=5, check_convergence=True, verbose=False)
Bases: sklearn.base.BaseEstimator
Alternating Least Squares recommendation.
Implements distributed alternating least squares recommendation based on Zhou et al. [1].
Parameters: - max_iter (int, optional (default=100)) – Maximum number of iterations to perform.
- tol (float, optional (default=1e-4)) – Tolerance for the stopping criterion.
- n_f (int, optional (default=100)) – Number of latent factors (i.e. dimensions) for the matrices U and I.
- lambda_ (float, optional (default=0.065)) – Regularization parameter value.
- check_convergence (boolean, optional (default=True)) – Whether to test for convergence at the end of each iteration.
- random_state (int, or None, optional (default=None)) – The seed of the pseudo-random number generator used to initialize the items matrix I.
- arity (int, optional (default=5)) – The arity of the tasks during the merge of each matrix chunk.
- verbose (boolean, optional (default=False)) – Whether to print progress information.
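The `arity` parameter bounds how many partial results each merge task combines. A minimal sketch of such an arity-bounded tree reduction in plain Python follows; `tree_merge` and its `merge` callback are hypothetical names, and we assume, without reproducing dislib's internals, that chunk merging proceeds level by level in this way. With n > 1 chunks the reduction needs ceil(log_arity(n)) levels, so a larger arity means fewer but wider merge tasks.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_merge(chunks: List[T], merge: Callable[[List[T]], T], arity: int = 5) -> T:
    """Combine chunks in groups of at most `arity` until one result remains.

    Hypothetical helper: each call to `merge` stands in for one merge task.
    """
    if arity < 2:
        raise ValueError("arity must be at least 2")
    if not chunks:
        raise ValueError("need at least one chunk")
    level = list(chunks)
    while len(level) > 1:
        # Each group of up to `arity` partial results becomes one merge task.
        level = [merge(level[i:i + arity]) for i in range(0, len(level), arity)]
    return level[0]


# Record the size of every merge task so the tree shape is visible.
calls = []


def concat(parts):
    calls.append(len(parts))
    return [x for part in parts for x in part]


# 12 chunks with arity=5: first level 12 -> 3 results, second level 3 -> 1.
merged = tree_merge([[i] for i in range(12)], concat, arity=5)
print(merged == list(range(12)))  # True
print(calls)                      # [5, 5, 2, 3]
```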
Variables: - users (np.array) – User matrix U, holding the latent factors of each user.
- items (np.array) – Items matrix I, holding the latent factors of each item.
- converged (boolean) – Whether the model has converged.
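To make the roles of `n_f`, `lambda_`, `tol`, `max_iter` and `random_state` concrete, here is a dense, single-process NumPy sketch of the alternating updates with weighted-λ regularization as described by Zhou et al. [1]: with I fixed, each user's factor vector solves a ridge regression over the items that user rated, with penalty λ·n_u where n_u is that user's number of ratings, and symmetrically for items with U fixed. This is not dislib's distributed implementation. `als_sketch` is a hypothetical name; zeros marking missing ratings, users as rows, a uniform random initialization of I, and the RMSE-change stopping rule are all assumptions of the sketch.

```python
import numpy as np


def als_sketch(R, n_f=100, lambda_=0.065, tol=1e-4, max_iter=100, random_state=None):
    """Dense NumPy sketch of ALS with weighted-lambda regularization.

    R is a dense (n_users, n_items) array in which 0 marks a missing rating
    (an assumption of this sketch, not dislib's input format).
    Returns (users, items, converged).
    """
    rng = np.random.default_rng(random_state)
    n_users, n_items = R.shape
    observed = R != 0
    users = np.zeros((n_users, n_f))
    # Only the items matrix is randomly initialized, as in the random_state docs.
    items = rng.random((n_items, n_f))
    eye = np.eye(n_f)

    def solve_rows(ratings, mask, fixed):
        # One regularized least-squares solve per row; rows without ratings stay zero.
        out = np.zeros((ratings.shape[0], n_f))
        for r in range(ratings.shape[0]):
            idx = mask[r]
            n_r = idx.sum()
            if n_r == 0:
                continue
            F = fixed[idx]                     # factors of the rated entries
            A = F.T @ F + lambda_ * n_r * eye  # penalty weighted by rating count
            b = F.T @ ratings[r, idx]
            out[r] = np.linalg.solve(A, b)
        return out

    last_rmse = np.inf
    converged = False
    for _ in range(max_iter):
        users = solve_rows(R, observed, items)      # fix I, solve for U
        items = solve_rows(R.T, observed.T, users)  # fix U, solve for I
        err = (R - users @ items.T)[observed]
        rmse = np.sqrt(np.mean(err ** 2))
        # Assumed stopping rule: stop once the RMSE changes by less than tol.
        if abs(last_rmse - rmse) < tol:
            converged = True
            break
        last_rmse = rmse
    return users, items, converged


R = np.array([[0, 0, 5], [3, 0, 5], [3, 1, 2]], dtype=float)
users, items, converged = als_sketch(R, n_f=2, random_state=0)
print(users.shape, items.shape)  # (3, 2) (3, 2)
```

Solving each row independently is what makes the method parallel: within one half-step, no user's solve depends on another user's result.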
References
[1] Zhou Y., Wilkinson D., Schreiber R., Pan R. (2008) Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In: Fleischer R., Xu J. (eds) Algorithmic Aspects in Information and Management. AAIM 2008. Lecture Notes in Computer Science, vol 5034. Springer, Berlin, Heidelberg.
Examples
>>> import dislib as ds
>>> from dislib.recommendation import ALS
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> if __name__ == '__main__':
...     data = np.array([[0, 0, 5], [3, 0, 5], [3, 1, 2]])
...     ratings = csr_matrix(data).transpose().tocsr()
...     train = ds.array(ratings, block_size=(1, 3))
...     als = ALS()
...     als.fit(train)
...     print('Ratings for user 0: %s' % als.predict_user(user_id=0))
fit(x, test=None)
Fits a model using training data. Training data is also used to check for convergence unless test data is provided.
Parameters: - x (ds-array, shape=(n_ratings, n_users)) – ds-array where each row is the collection of ratings given by a user
- test (csr_matrix) – Sparse matrix used to check convergence with users as rows and items as columns. If not passed, uses training data to check convergence.
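When `check_convergence` is enabled, an error measure over known ratings has to be compared between iterations. The sketch below assumes that measure is the RMSE over the non-zero entries of `test` (or of the training data) and that fitting stops once it changes by less than `tol`. Both are assumptions, as are the helper names `rmse_on_observed` and `has_converged` and the (n_users, n_f) / (n_items, n_f) factor layouts. For a `csr_matrix`, the coordinates and values can be taken from `coo = test.tocoo()` as `coo.row`, `coo.col` and `coo.data`.

```python
import numpy as np


def rmse_on_observed(rows, cols, vals, users, items):
    """RMSE of the predictions users @ items.T over the observed (row, col) entries.

    users: (n_users, n_f) and items: (n_items, n_f) factor matrices (assumed layout).
    rows/cols/vals: coordinates and values of the known test ratings.
    """
    # Row-wise dot product of each observed user's and item's factor vectors.
    preds = np.einsum("ij,ij->i", users[rows], items[cols])
    return float(np.sqrt(np.mean((np.asarray(vals) - preds) ** 2)))


def has_converged(last_rmse, rmse, tol=1e-4):
    """Hypothetical stopping rule: stop once the RMSE changes by less than tol."""
    return abs(last_rmse - rmse) < tol


users = np.array([[1.0, 0.0], [0.0, 2.0]])
items = np.array([[3.0, 1.0], [1.0, 1.0]])
# Known test ratings: (user 0, item 0) = 3 and (user 1, item 1) = 4.
rows, cols, vals = np.array([0, 1]), np.array([0, 1]), np.array([3.0, 4.0])
rmse = rmse_on_observed(rows, cols, vals, users, items)
print(round(rmse, 4))            # 1.4142
print(has_converged(1.5, rmse))  # False
```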
load_model(filepath, load_format='json')
Loads a model from a file. The model is reinstantiated in the exact same state in which it was saved, without any of the code used for model definition or fitting.
Parameters: - filepath (str) – Path of the saved model.
- load_format (str, optional (default='json')) – Format used to load the model.
Examples
>>> from dislib.recommendation import ALS
>>> import numpy as np
>>> import dislib as ds
>>> als2 = ALS()
>>> als2.load_model("model_als")
>>> predictions2 = als2.predict_user(user_id=0)
predict_user(user_id)
Returns the expected ratings for user_id. The i-th entry of the returned vector is the estimated rating for the i-th item. If the user was not present in the training set, a np.NaN vector is returned.
Parameters: user_id (int)
Returns: ratings
Return type: np.array containing the estimated ratings of all items for user_id.
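Under the usual ALS model, an estimated rating is the dot product of a user's and an item's latent factor vectors, so a full row of predictions is a single matrix-vector product. The following sketch assumes `users` has shape (n_users, n_f) and `items` has shape (n_items, n_f), and tracks "present in the training set" with a hypothetical boolean mask; `predict_user_sketch` is an illustrative helper, not part of dislib.

```python
import numpy as np


def predict_user_sketch(users, items, user_id, rated_in_training):
    """Estimated ratings of every item for one user (hypothetical helper).

    users: (n_users, n_f); items: (n_items, n_f); both layouts are assumed.
    rated_in_training: boolean array, True where a user had training ratings.
    """
    if not rated_in_training[user_id]:
        # Unknown user: mirror the documented behaviour and return a NaN vector.
        return np.full(items.shape[0], np.nan)
    # Entry i is the dot product of the user's factors with item i's factors.
    return items @ users[user_id]


users = np.array([[1.0, 2.0], [0.0, 0.0]])
items = np.array([[1.0, 0.0], [0.5, 1.0], [2.0, 1.0]])
seen = np.array([True, False])
print(predict_user_sketch(users, items, 0, seen).tolist())               # [1.0, 2.5, 4.0]
print(bool(np.isnan(predict_user_sketch(users, items, 1, seen)).all()))  # True
```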
save_model(filepath, overwrite=True, save_format='json')
Saves a model to a file. The model is synchronized before saving and can be reinstantiated in the exact same state, without any of the code used for model definition or fitting.
Parameters: - filepath (str) – Path where the model is saved.
- overwrite (bool, optional (default=True)) – Whether any existing model at the target location should be overwritten.
- save_format (str, optional (default='json')) – Format used to save the model.
Examples
>>> from dislib.recommendation import ALS
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> import dislib as ds
>>> data = np.array([[0, 0, 5], [3, 0, 5], [3, 1, 2]])
>>> ratings = csr_matrix(data)
>>> train = ds.array(x=ratings, block_size=(1, 1))
>>> als = ALS(tol=0.01, random_state=666, n_f=5, verbose=False)
>>> als.fit(train)
>>> als.save_model("model_als")
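The save/load contract (the fitted state can be restored without re-running `fit`) can be illustrated with a small JSON round trip of the factor matrices and the `converged` flag. This is a hypothetical sketch: `save_factors`, `load_factors` and the JSON layout are assumptions and do not describe dislib's actual file format, and raising `FileExistsError` when `overwrite=False` is a choice made here, not documented dislib behaviour.

```python
import json
import os
import tempfile

import numpy as np


def save_factors(filepath, users, items, converged, overwrite=True):
    """Write the fitted state to JSON (hypothetical layout, not dislib's format)."""
    if os.path.exists(filepath) and not overwrite:
        # This sketch refuses to replace an existing file when overwrite=False.
        raise FileExistsError(filepath)
    state = {"users": users.tolist(), "items": items.tolist(), "converged": converged}
    with open(filepath, "w") as f:
        json.dump(state, f)


def load_factors(filepath):
    """Rebuild the arrays and flag written by save_factors."""
    with open(filepath) as f:
        state = json.load(f)
    return np.array(state["users"]), np.array(state["items"]), state["converged"]


users = np.array([[0.5, 1.25], [2.0, -1.0]])
items = np.array([[1.0, 0.0], [0.25, 3.0]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_als.json")
    save_factors(path, users, items, converged=True)
    u2, i2, conv2 = load_factors(path)
    # The reloaded model produces exactly the same predictions.
    print(np.array_equal(u2 @ i2.T, users @ items.T), conv2)  # True True
```

Because Python's `json` module writes floats using their shortest round-tripping representation, the reloaded arrays are bit-for-bit identical to the saved ones.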