dislib.sorting.TeraSort

class dislib.sorting.terasort.base.TeraSort(range_min=0, range_max=100, column_indexes=None, num_buckets=None)

Bases: object

TeraSort algorithm for sorting ds-arrays, either by columns or over all values.

Parameters

range_min : int or list, optional (default=0)

Minimum value, given either as a single value for all the data (int) or as one value per column (list).

range_max : int or list, optional (default=100)

Maximum value, given either as a single value for all the data (int) or as one value per column (list).

column_indexes : np.ndarray, list or None, optional (default=None)

Indexes of the columns to sort independently of the remaining columns.

num_buckets : int, optional

Number of buckets into which the data in the ds-array is divided for sorting. A large number of buckets may slow down execution.
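To illustrate how range_min, range_max and num_buckets might interact, here is a minimal pure-Python sketch of range-partitioned bucket sorting, the general idea behind TeraSort. It is not dislib's implementation: the helper name bucket_sort_by_range is hypothetical, it works on a plain list rather than a ds-array, and equal-width buckets are an assumption.

```python
def bucket_sort_by_range(values, range_min=0, range_max=100, num_buckets=4):
    """Sort values by splitting [range_min, range_max] into equal-width buckets.

    Hypothetical helper for illustration only, not part of dislib.
    """
    width = (range_max - range_min) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # Map the value to a bucket index; clamp so that v == range_max
        # (or a value outside the range) still lands in a valid bucket.
        idx = int((v - range_min) / width)
        idx = min(max(idx, 0), num_buckets - 1)
        buckets[idx].append(v)
    # Buckets cover disjoint, increasing value intervals, so sorting each
    # bucket on its own and concatenating them in order gives a sorted list.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result


data = [42, 7, 99, 0, 63, 100, 18, 55]
print(bucket_sort_by_range(data))  # [0, 7, 18, 42, 55, 63, 99, 100]
```

In a distributed setting each bucket could be sorted by a separate task. Under that assumption, a very large number of buckets produces many small tasks, which is consistent with the note that a large num_buckets may slow down execution.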

fit(x, y=None)

Fits the range_min and range_max of the TeraSort to the data in x.

Parameters
  • x (ds-array, shape=(n_rows, n_columns)) – The data to sort.

  • y (None) – Not used; kept to maintain the standard API.
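As a rough illustration of what fitting the range could mean, the sketch below derives per-column minimum and maximum values from a small in-memory table. It assumes that fit computes the bounds from x; the helper fit_ranges is hypothetical and uses plain Python lists rather than ds-arrays.

```python
def fit_ranges(rows):
    """Return (range_min, range_max) as per-column lists for a list of rows.

    Hypothetical helper for illustration only, not part of dislib.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    range_min = [min(col) for col in columns]
    range_max = [max(col) for col in columns]
    return range_min, range_max


rows = [
    [5, 40],
    [1, 90],
    [8, 10],
]
range_min, range_max = fit_ranges(rows)
print(range_min, range_max)  # [1, 10] [8, 90]
```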

sort(x)

Sorts the data in x.

Parameters

x (ds-array, shape=(n_rows, n_columns)) – The data to sort.

Returns

x – A sorted copy of x.

Return type

ds-array, shape=(n_rows, n_columns)
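A possible end-to-end usage, sketched under several assumptions: that TeraSort can be imported from dislib.sorting, that sort can be called directly when the bounds are passed to the constructor, and that the block size and num_buckets chosen here are reasonable. The function name sort_with_terasort is hypothetical. Calling it requires dislib and a PyCOMPSs runtime.

```python
def sort_with_terasort():
    """Sort a small random matrix with TeraSort and return it as a NumPy array.

    Imports are local so that the function can be defined without dislib
    installed; calling it requires dislib and a PyCOMPSs runtime.
    """
    import numpy as np
    import dislib as ds
    from dislib.sorting import TeraSort  # assumed import path

    rng = np.random.default_rng(0)
    # Integer-valued floats in [0, 100), matching the default range.
    data = rng.integers(0, 100, size=(8, 2)).astype(float)

    # The block size is an arbitrary choice for this example.
    x = ds.array(data, block_size=(4, 2))

    # Explicit bounds matching the generated data; num_buckets is arbitrary.
    ts = TeraSort(range_min=0, range_max=100, num_buckets=2)
    x_sorted = ts.sort(x)

    # collect() gathers the distributed blocks into local memory.
    return x_sorted.collect()
```

If the bounds of the data are not known in advance, calling fit(x) before sort(x) is the likely alternative to passing range_min and range_max explicitly.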