dislib package

Subpackages

dislib.array(x, block_size)

Loads data into a Distributed Array.

Parameters:	x (spmatrix or array-like, shape=(n_samples, n_features)) – Array of samples. block_size ((int, int)) – Block size as (number of samples, number of features).
Returns:	dsarray – A distributed representation of the data divided in blocks.
Return type:	ds-array
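
To make "divided in blocks" concrete, the following pure-Python sketch computes the block shapes that a given shape and block_size would produce. It assumes that blocks on the bottom and right edges are simply smaller when the shape is not a multiple of the block size. The helper name block_grid is hypothetical and is not part of dislib.

```python
import math


def block_grid(shape, block_size):
    """Return the shape of every block as a list of rows of (rows, cols) tuples.

    Hypothetical helper for illustration only; it is not part of dislib.
    Assumes edge blocks shrink when shape is not a multiple of block_size.
    """
    n_rows, n_cols = shape
    b_rows, b_cols = block_size
    grid = []
    for i in range(math.ceil(n_rows / b_rows)):
        height = min(b_rows, n_rows - i * b_rows)
        row = []
        for j in range(math.ceil(n_cols / b_cols)):
            width = min(b_cols, n_cols - j * b_cols)
            row.append((height, width))
        grid.append(row)
    return grid


if __name__ == "__main__":
    # A 5 x 8 array with 2 x 2 blocks gives a 3 x 4 grid of blocks;
    # the blocks in the last row have height 1.
    for row in block_grid((5, 8), (2, 2)):
        print(row)
```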

dislib.random_array(shape, block_size, random_state=None)

Returns a distributed array of random floats in the half-open interval [0.0, 1.0). Values are drawn from the continuous uniform distribution over that interval.

Parameters:	shape (tuple of two ints) – Shape of the output ds-array. block_size (tuple of two ints) – Size of the ds-array blocks. random_state (int or RandomState, optional (default=None)) – Seed or numpy.random.RandomState instance to generate the random numbers.
Returns:	x – Distributed array of random floats.
Return type:	ds-array

dislib.zeros(shape, block_size, dtype=None)

Returns a ds-array of given shape and block size, filled with zeros.

Parameters:	shape (tuple of two ints) – Shape of the output ds-array. block_size (tuple of two ints) – Size of the ds-array blocks. dtype (data type, optional (default=None)) – The desired type of the array. Defaults to float.
Returns:	x – Distributed array filled with zeros.
Return type:	ds-array

dislib.full(shape, block_size, fill_value, dtype=None)

Returns a ds-array of ‘shape’ filled with ‘fill_value’.

Parameters:	shape (tuple of two ints) – Shape of the output ds-array. block_size (tuple of two ints) – Size of the ds-array blocks. fill_value (scalar) – Fill value. dtype (data type, optional (default=None)) – The desired type of the array. Defaults to float.
Returns:	x – Distributed array filled with the fill value.
Return type:	ds-array

dislib.identity(n, block_size, dtype=None)

Returns the identity matrix.

Parameters:	n (int) – Size of the matrix. block_size (tuple of two ints) – Block size. dtype (data type, optional (default=None)) – The desired type of the ds-array. Defaults to float.
Returns:	x – Identity matrix of shape n x n.
Return type:	ds-array
Raises:	`ValueError` – If block_size is greater than n.

dislib.eye(n, m, block_size, dtype=None)

Returns a matrix filled with ones on the diagonal and zeros elsewhere.

Parameters:	n (int) – Number of rows. m (int) – Number of columns. block_size (tuple of two ints) – Block size. dtype (data type, optional (default=None)) – The desired type of the ds-array. Defaults to float.
Returns:	x – Matrix of shape n x m with ones on the diagonal and zeros elsewhere.
Return type:	ds-array
Raises:	`ValueError` – If block_size is greater than n.

dislib.load_txt_file(path, block_size, delimiter=',')

Loads a text file into a distributed array.

Parameters:	path (string) – File path. block_size (tuple (int, int)) – Size of the blocks of the array. delimiter (string, optional (default=",")) – String that separates columns in the file.
Returns:	x – A distributed representation of the data divided in blocks.
Return type:	ds-array
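
As an illustration of the input that delimiter describes, the sketch below writes a small comma-separated file and parses it with the standard library only. It assumes one sample per line and purely numeric values. The helper names read_delimited and demo_rows and the file name samples.csv are hypothetical and are not part of dislib.

```python
import os
import tempfile


def read_delimited(path, delimiter=","):
    """Parse a delimited text file into a list of rows of floats.

    Illustration only: assumes one sample per line and numeric values.
    """
    with open(path) as f:
        return [
            [float(value) for value in line.split(delimiter)]
            for line in f
            if line.strip()  # skip empty lines
        ]


def demo_rows():
    """Write a two-sample file to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "samples.csv")  # hypothetical file name
        with open(path, "w") as f:
            f.write("1.0,2.0,3.0\n4.0,5.0,6.0\n")
        return read_delimited(path, delimiter=",")


if __name__ == "__main__":
    print(demo_rows())
```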

dislib.load_svmlight_file(path, block_size, n_features, store_sparse)

Loads a SVMLight file into a distributed array.

Parameters:	path (string) – File path. block_size (tuple (int, int)) – Size of the blocks for the output ds-array. n_features (int) – Number of features. store_sparse (boolean) – Whether to use scipy.sparse data structures to store data. If False, numpy.array is used instead.
Returns:	x, y – A distributed representation (ds-array) of the X and y.
Return type:	(ds-array, ds-array)
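
The role of n_features is easier to see with a concrete line of SVMLight data. The sketch below parses one line of the form "label index:value ..." into a label and a dense row of length n_features. It assumes 1-based feature indices (the usual SVMLight convention) and ignores trailing comments. The function parse_svmlight_line is hypothetical and only mirrors the dense case (store_sparse=False) conceptually.

```python
def parse_svmlight_line(line, n_features):
    """Parse one 'label index:value ...' line into (label, dense_row).

    Illustration only; assumes 1-based feature indices and drops
    anything after a '#' comment marker.
    """
    parts = line.split("#", 1)[0].split()
    label = float(parts[0])
    row = [0.0] * n_features  # features not listed on the line stay at zero
    for item in parts[1:]:
        index, value = item.split(":")
        row[int(index) - 1] = float(value)
    return label, row


if __name__ == "__main__":
    print(parse_svmlight_line("1 1:0.5 4:2.0", n_features=5))
```

Because the file lists only non-zero features, the number of columns cannot always be inferred from a single line, which is why it is passed explicitly.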

dislib.load_npy_file(path, block_size)

Loads a file in npy format (must be 2-dimensional).

Parameters:	path (str) – Path to the npy file. block_size (tuple (int, int)) – Block size of the resulting ds-array.
Returns:	x
Return type:	ds-array

dislib.load_mdcrd_file(path, block_size, n_atoms, copy=False)

Loads a mdcrd trajectory file into a distributed array.

Parameters:	path (string) – File path. block_size (tuple (int, int)) – Size of the blocks of the array. n_atoms (int) – Number of atoms in the trajectory. Each frame in the mdcrd file has 3 * n_atoms float values (the 3-dimensional position of each atom). copy (boolean, default=False) – Send the file to every task, as opposed to reading it once in the master program.
Returns:	x – A distributed representation of the data divided in blocks.
Return type:	ds-array
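
The frame arithmetic can be sketched as follows. The example groups a flat sequence of coordinates into frames of 3 * n_atoms values each. It ignores any title or box information that a real mdcrd file may contain, and the helper split_frames is hypothetical.

```python
def split_frames(values, n_atoms):
    """Group a flat sequence of coordinates into frames of 3 * n_atoms floats.

    Illustration only; not how dislib reads the file.
    """
    frame_len = 3 * n_atoms  # x, y and z for every atom
    if len(values) % frame_len != 0:
        raise ValueError("number of values is not a multiple of 3 * n_atoms")
    return [values[i:i + frame_len] for i in range(0, len(values), frame_len)]


if __name__ == "__main__":
    coords = [float(i) for i in range(12)]
    # With 2 atoms, each frame holds 6 values, so 12 values make 2 frames.
    for frame in split_frames(coords, n_atoms=2):
        print(frame)
```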

dislib.matmul(a: dislib.data.array.Array, b: dislib.data.array.Array, transpose_a=False, transpose_b=False)

Matrix multiplication with a possible transpose of the input.

Parameters:	a (ds-array) – First matrix. b (ds-array) – Second matrix. transpose_a (bool) – Whether to transpose the first matrix before multiplication. transpose_b (bool) – Whether to transpose the second matrix before multiplication.
Returns:	out – The output array.
Return type:	ds-array
Raises:	`NotImplementedError` – If _top_left shape does not match _reg_shape. This case will be implemented in the future. `ValueError` – If any of the block sizes does not match.

Examples

>>> import dislib as ds
>>>
>>>
>>> if __name__ == "__main__":
>>>     x = ds.random_array((8, 4), block_size=(2, 2))
>>>     y = ds.random_array((5, 8), block_size=(2, 2))
>>>     result = ds.matmul(x, y, transpose_a=True, transpose_b=True)
>>>     print(result.collect())
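
The effect of the transpose flags on the result shape can be checked with a small pure-Python sketch. It follows the ordinary matrix multiplication rule and does not model the block-size checks that dislib performs. The helper matmul_shape is hypothetical.

```python
def matmul_shape(a_shape, b_shape, transpose_a=False, transpose_b=False):
    """Return the shape of the product after the optional transposes.

    Illustration only; block sizes are not considered here.
    """
    if transpose_a:
        a_shape = a_shape[::-1]
    if transpose_b:
        b_shape = b_shape[::-1]
    if a_shape[1] != b_shape[0]:
        raise ValueError("inner dimensions do not match")
    return (a_shape[0], b_shape[1])


if __name__ == "__main__":
    # Same shapes as the example above: (8, 4) and (5, 8), both transposed.
    print(matmul_shape((8, 4), (5, 8), transpose_a=True, transpose_b=True))
```

With both flags set, the operands become 4 x 8 and 8 x 5, so the product is 4 x 5.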

dislib.save_txt(arr, dir, merge_rows=False)

Save a ds-array by blocks to a directory in txt format.

Parameters:	arr (ds-array) – Array data to be saved. dir (str) – Directory into which the data is saved. merge_rows (boolean, default=False) – Merge blocks along rows before saving.

dislib.apply_along_axis(func, axis, x, *args, **kwargs)

Apply a function to slices along the given axis.

Executes func(a, *args, **kwargs), where func operates on nd-arrays and a is a slice of x along axis. The size of the slices is determined by the block shape of x.

func must meet the following conditions:

Take an nd-array as argument

Accept axis as a keyword argument

Return an array-like structure

Parameters:	func (function) – This function should accept nd-arrays and an axis argument. It is applied to slices of x along the specified axis. axis (integer) – Axis along which x is sliced. Can be 0 or 1. x (ds-array) – Input distributed array. args (any) – Additional arguments to func. kwargs (any) – Additional named arguments to func.
Returns:	out – The output array. The shape of out is identical to the shape of x, except along the axis dimension. The output ds-array is dense regardless of the type of the input array.
Return type:	ds-array

Examples

>>> import dislib as ds
>>> import numpy as np
>>>
>>>
>>> if __name__ == '__main__':
>>>     x = ds.random_array((100, 100), block_size=(25, 25))
>>>     mean = ds.apply_along_axis(np.mean, 0, x)
>>>     print(mean.collect())

dislib.kron(a, b, block_size=None)

Kronecker product of two ds-arrays.

Parameters:	a, b (ds-arrays) – Input ds-arrays. block_size (tuple of two ints, optional) – Block size of the resulting array. Defaults to the block size of b.
Returns:	out
Return type:	ds-array
Raises:	`NotImplementedError` – If a or b are sparse.
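
For reference, the Kronecker product itself can be written in a few lines of pure Python. This sketch works on small dense nested lists and says nothing about how dislib distributes the computation. The name kron_dense is hypothetical.

```python
def kron_dense(a, b):
    """Kronecker product of two dense matrices given as nested lists.

    For a of shape (m, n) and b of shape (p, q) the result has shape
    (m * p, n * q), and entry (i, j) equals a[i // p][j // q] * b[i % p][j % q].
    """
    m, n = len(a), len(a[0])
    p, q = len(b), len(b[0])
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(n * q)]
        for i in range(m * p)
    ]


if __name__ == "__main__":
    for row in kron_dense([[1, 2], [3, 4]], [[0, 1], [1, 0]]):
        print(row)
```

Each entry of a scales a full copy of b, so the result is made of m x n tiles of shape p x q.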

dislib.svd(a, compute_uv=True, sort=True, copy=True, eps=1e-09)

Performs singular value decomposition of a ds-array via the one-sided block Jacobi algorithm described in Arbenz and Slapnicar [1] and Dongarra et al. [2].

Singular value decomposition is a factorization of the form A = USV’, where U and V are unitary matrices and S is a rectangular diagonal matrix.

Parameters:	a (ds-array, shape=(m, n)) – Input matrix (m >= n). Must be partitioned into at least two column blocks, as required by the block Jacobi algorithm. compute_uv (boolean, optional (default=True)) – Whether to compute u and v in addition to s. sort (boolean, optional (default=True)) – Whether to return sorted u, s and v. Sorting requires a significant amount of additional computation. copy (boolean, optional (default=True)) – Whether to create a copy of a or to apply the transformations to a directly. Only valid if a is regular (i.e., its top-left block has the regular shape). eps (float, optional (default=1e-9)) – Tolerance for the convergence criterion.
Returns:	u (ds-array, shape=(m, n)) – U matrix. Only returned if compute_uv is True. s (ds-array, shape=(1, n)) – Diagonal entries of S. v (ds-array, shape=(n, n)) – V matrix. Only returned if compute_uv is True.
Raises:	`ValueError` – If a has fewer than 2 column blocks or m < n.

References

[1]	Arbenz, P. and Slapnicar, A. (1995). An Analysis of Parallel Implementations of the Block-Jacobi Algorithm for Computing the SVD. In Proceedings of the 17th International Conference on Information Technology Interfaces ITI (pp. 13-16).

[2]	Dongarra, J., Gates, M., Haidar, A. et al. (2018). The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale. SIAM Review, 60(4), pp. 808-865.

Examples

>>> import dislib as ds
>>> import numpy as np
>>>
>>>
>>> if __name__ == '__main__':
>>>     x = ds.random_array((10, 6), (2, 2), random_state=7)
>>>     u, s, v = ds.svd(x)
>>>     u = u.collect()
>>>     s = np.diag(s.collect())
>>>     v = v.collect()
>>>     print(np.allclose(x.collect(), u @ s @ v.T))
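
To give a feel for the one-sided Jacobi idea, here is a minimal scalar sketch in pure Python. It is the classical Hestenes variant, which rotates single columns, not the block algorithm that dislib uses, and it is not taken from the dislib source. The function name jacobi_svd and the max_sweeps parameter are assumptions made for this illustration. Singular values are returned unsorted.

```python
import math


def jacobi_svd(a, eps=1e-9, max_sweeps=60):
    """One-sided (Hestenes) Jacobi SVD of a nested-list matrix with m >= n.

    Scalar illustration only: column pairs are rotated until all columns
    are mutually orthogonal. Returns (u, s, v) with a = u * diag(s) * v^T.
    """
    m, n = len(a), len(a[0])
    w = [[float(x) for x in row] for row in a]  # working copy, ends as U * diag(s)
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                alpha = sum(w[k][i] ** 2 for k in range(m))
                beta = sum(w[k][j] ** 2 for k in range(m))
                gamma = sum(w[k][i] * w[k][j] for k in range(m))
                if abs(gamma) <= eps * math.sqrt(alpha * beta):
                    continue  # columns i and j are already orthogonal enough
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for mat in (w, v):  # apply the same plane rotation to W and V
                    for row in mat:
                        ri, rj = row[i], row[j]
                        row[i] = c * ri - s * rj
                        row[j] = s * ri + c * rj
        if not rotated:
            break  # a full sweep without rotations means convergence
    sing = [math.sqrt(sum(w[k][j] ** 2 for k in range(m))) for j in range(n)]
    u = [[w[k][j] / sing[j] if sing[j] > 0 else 0.0 for j in range(n)] for k in range(m)]
    return u, sing, v


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 10], [2, 0, 1]]
    u, s, v = jacobi_svd(a)
    print(s)
```

Here eps plays the same kind of role as the eps parameter above: a column pair is left alone once its inner product is small relative to the column norms.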
