dislib.sorting.TeraSort

class dislib.sorting.terasort.base.TeraSort(range_min=0, range_max=100, column_indexes=None, num_buckets=None)

Bases: object

TeraSort algorithm for sorting ds-arrays, either column by column or over all of their values.

Parameters:
  • range_min (int or list, optional (default=0)) – Minimum value of the data (if int), or the minimum value of each column (if list).

  • range_max (int or list, optional (default=100)) – Maximum value of the data (if int), or the maximum value of each column (if list).

  • column_indexes (np.ndarray, list or None, optional (default=None)) – Indexes of the columns to sort, each one independently of the remaining columns.

  • num_buckets (int, optional) – Number of buckets into which the data of the ds-array is divided for sorting. A large number of buckets may slow down execution.
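The interplay of range_min, range_max and num_buckets can be illustrated with a minimal, single-process sketch of TeraSort-style range partitioning in plain Python. It assumes equal-width buckets between range_min and range_max and a single scalar range (the int case); bucket_index and terasort_values are hypothetical helpers, not part of dislib, whose actual bucket boundaries and task layout may differ.

```python
def bucket_index(value, range_min, range_max, num_buckets):
    """Hypothetical helper: map value to one of num_buckets equal-width
    ranges covering [range_min, range_max] (an assumption, not dislib's code)."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    width = (range_max - range_min) / num_buckets
    if width <= 0:
        return 0
    idx = int((value - range_min) // width)
    # Clamp so values at or beyond the bounds land in the first/last bucket.
    return min(max(idx, 0), num_buckets - 1)


def terasort_values(values, range_min, range_max, num_buckets):
    """Hypothetical helper: range-partition values into buckets, sort each
    bucket on its own, then concatenate the buckets in range order."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[bucket_index(v, range_min, range_max, num_buckets)].append(v)
    result = []
    for bucket in buckets:  # buckets cover increasing, non-overlapping ranges
        result.extend(sorted(bucket))
    return result


print(terasort_values([42, 7, 99, 0, 63, 7, 18], range_min=0, range_max=100, num_buckets=4))
# prints [0, 7, 7, 18, 42, 63, 99]
```

Because each bucket holds a contiguous, non-overlapping value range, sorting the buckets separately and concatenating them yields a globally sorted result, and in a distributed setting the buckets can be sorted in parallel. More buckets also mean more, smaller units of work to schedule and gather, which is one plausible reason a large num_buckets can slow execution.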

fit(x, y=None)

Fits range_min and range_max of the TeraSort from the data in x.

Parameters:
  • x (ds-array, shape=(n_rows, n_columns)) – The data to sort.

  • y (None) – Ignored; present only for consistency with the standard API.
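What fit computes can be sketched as follows, assuming it simply records the observed minimum and maximum of x: a single pair for the whole array, or one value per selected column when column_indexes is set (matching the list form of range_min and range_max). fit_range is a hypothetical helper that works on a plain list of rows rather than a ds-array.

```python
def fit_range(rows, column_indexes=None):
    """Hypothetical helper: return (range_min, range_max) observed in rows.

    rows is a list of equal-length rows. With column_indexes=None a single
    global min/max pair is returned; otherwise one min and one max per
    selected column, as two lists (an assumption about fit's behavior).
    """
    if not rows:
        raise ValueError("cannot fit on empty data")
    if column_indexes is None:
        flat = [v for row in rows for v in row]
        return min(flat), max(flat)
    mins = [min(row[c] for row in rows) for c in column_indexes]
    maxs = [max(row[c] for row in rows) for c in column_indexes]
    return mins, maxs


data = [[5, 40, 3],
        [1, 90, 8],
        [7, 10, 2]]
print(fit_range(data))                         # prints (1, 90)
print(fit_range(data, column_indexes=[0, 2]))  # prints ([1, 2], [7, 8])
```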

sort(x)

Sorts the data in x.

Parameters:

x (ds-array, shape=(n_rows, n_columns)) – The data to sort.

Returns:

x – A sorted copy of x.

Return type:

ds-array