
Sorting is a fundamental problem, so this is important stuff, but I can't come up with a practical problem involving large sorting operations right now. Can anyone come up with a practical application for this? I have no doubt they exist.

I would think the breakeven point for using the GPU (assuming inputs and results are on the CPU) is several megabytes or millions of elements at least.

Writing this paper must have been a lot of effort. There are something like 50 different methods reviewed here. Good thing that papers like this exist.



It has potential in the field of data linking: Graph and database partitioning, merging, indexing, sorted-neighbourhood algorithms. And cases where you can't use hash-joins or comparison techniques because you're using the available ram and resources to hold something else.

Optimisation for cache-locality and branch prediction.

Any kind of repeatable scientific/datascience analysis operating across groups or cross-tabulation. Anything involving lag-like functions and relationships, time-series, relative positions (in households, neighbourhoods, countries, spatial relationships).

Perhaps I'm biased; I think I've grown up working with operations where having the data in an implicit order makes things a fair bit easier/more efficient.

That said, you're also right: it probably has to get up into the millions before you fundamentally care. But I do like to think I work on practical applications :P


I think the most common practical reason to sort large amounts of data is to enable efficient search operations later, for example binary search over the sorted data.

I also agree that needing to sort large amounts of data fast is actually not that common a requirement.

EDIT: I would add that the speed of sorting on CPU is often underestimated. See my blog post about fast multithreaded radix sort on CPU here: http://www.forwardscattering.org/post/34
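A minimal sketch of that pattern, using Python's stdlib bisect (names are illustrative):

```python
import bisect

def contains_sorted(data, target):
    # Binary search: O(log n) per lookup, once the one-time
    # O(n log n) cost of sorting has been paid up front.
    i = bisect.bisect_left(data, target)
    return i < len(data) and data[i] == target

data = sorted([42, 7, 19, 3, 88, 51])  # sort once...
found = contains_sorted(data, 19)      # ...then query cheaply
```

The sort is an investment: it only pays off when the data is queried many times afterwards.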


Two practical applications from top of my head:

* Do you want to build an index (for big data?) so that you can access your data fast (think about a database)? Well, the index construction is likely to do some sorting internally.

* MapReduce has several phases, one of which is sorting. So yes, when you are processing big data with MapReduce, the data is being sorted at some point.
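A toy sketch of the MapReduce shuffle/sort phase (illustrative only, not how any real framework is implemented): mapped key-value pairs are sorted by key so that each key's values arrive at the reducer contiguously.

```python
from itertools import groupby
from operator import itemgetter

def map_reduce(records, mapper, reducer):
    # Map: emit (key, value) pairs for every record.
    pairs = [kv for r in records for kv in mapper(r)]
    # Shuffle: this sort is the step in question -- groupby
    # below only works on sorted input.
    pairs.sort(key=itemgetter(0))
    # Reduce: fold each key's values into one result.
    return {k: reducer(k, [v for _, v in grp])
            for k, grp in groupby(pairs, key=itemgetter(0))}

# Classic word count as a usage example:
counts = map_reduce(["a b a", "b c"],
                    lambda line: [(w, 1) for w in line.split()],
                    lambda k, vs: sum(vs))
```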


Indexes don't really require data to be sorted; sorting is a consequence of adding data to many different types of index.

https://en.wikipedia.org/wiki/B-tree


Rendering transparent models in real-time (ie. for games) requires a sorting step when they overlap. I can imagine a scene containing a couple hundred thousand transparent models that need sorting.


That's obviously a good use case, and it doesn't even need that large a number of elements, because the data could be consumed directly by the GPU, so no back-and-forth transfer is necessary.

This paper was focused on comparison-based sorting. Depth sorting can be done with GPU radix sort (which is super fast), because with minor modifications, floating-point and integer comparison agree for finite, non-NaN values (and games don't care about those edge cases).
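The bit trick being referred to can be sketched like this (a Python illustration of the key transform; a real GPU radix sort would do this per key inside a kernel):

```python
import struct

def float_sort_key(x: float) -> int:
    """Map a finite float32 to an unsigned int whose integer
    order matches the float's numeric order."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if bits & 0x80000000:
        # Negative: flip all bits, so more-negative sorts lower.
        return bits ^ 0xFFFFFFFF
    # Positive: set the sign bit, so positives sort above negatives.
    return bits | 0x80000000
```

After this transform the keys can be radix-sorted as plain unsigned integers; NaNs and denormal edge cases are exactly where the "minor modifications" caveat lives.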


> Rendering transparent models in real-time (ie. for games) requires a sorting step when they overlap.

My game heavily uses a "find nearest n" operation on an R-Tree, then sorts by distance for AI operations.


Are there any game engines that actually do the sorting? I thought they just render them in random order and hope for the best.


I was very surprised that depth sorting for real time rendering was not listed among the potential applications in the abstract.


Isn't that usually done with the depth buffer?


Depth buffers only work well for opaque objects, which cover each other completely. With partial coverage or blending, multiple objects could end up contributing to a pixel in an order-dependent way (e.g., if you're viewing an anti-aliased leaf edge under the surface of water through a window). The depth buffer, which only stores a single depth, doesn't solve this problem directly -- you can use it to render each transparent layer one-by-one ("depth peeling"), but this can be slow.


The depth buffer has an issue of scale: one thing is 1 meter away, another is 1 million meters away, and yet another is several thousand billion meters beyond that. Limited depth precision makes it practically unusable for global positioning (especially for things that are both far away, but one is just a little bit further than the other).

On the other hand sorting doesn't need to have a global axis - in the perfect case it just needs to compare two elements against each other.


For non-transparent models, to a degree it gets done by the depth buffer. For transparent models, the depth buffer is not helpful.

But even when using the depth buffer, drawing in front-to-back order will drastically improve performance as fragment shaders don't have to run for occluded fragments that get depth culled.
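A minimal sketch of both orderings being discussed (the object and camera representations are made up for illustration):

```python
def depth_order(objects, cam, front_to_back=True):
    """Sort draws by squared distance to the camera (no sqrt needed).

    front_to_back=True  -> opaque pass: maximizes early depth-culling
    front_to_back=False -> transparent pass: back-to-front blending
    """
    def dist2(obj):
        return sum((p - c) ** 2 for p, c in zip(obj["pos"], cam))
    return sorted(objects, key=dist2, reverse=not front_to_back)
```

Squared distance is the usual key choice here: it preserves the ordering and avoids a square root per object.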


Seismic exploration data needs to be sorted into various domains; for example, by offset, by receiver, or by shot. It is one of the faster operations but you can still end up CPU bound rather than I/O bound, especially if you're doing work on the fly in a GUI.

http://wiki.seg.org/wiki/CMP_sorting


I work for a startup that specializes in data matching (usually called reconciliation in banking / finance industry). Some core matching algorithms rely on comparing sorted input using sliding windows; others rely on bucketing of some kind; anything to avoid O(m*n) when comparing two lists.

The lists involved can easily run into the millions, depending on the domain.
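One common shape of those sorted-input algorithms is a merge-style walk over both lists; a minimal sketch (exact key matching only — real reconciliation logic is fuzzier than this):

```python
def match_sorted(a, b):
    """Pair up equal keys from two sorted lists in O(m + n),
    instead of the O(m * n) of comparing every pair."""
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            matches.append((i, j))
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches
```

The sliding-window variant mentioned above relaxes the equality test to "within a window", but the same sorted-merge skeleton applies.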


I have yet to see a non-trivial simulation model that doesn't need sorting at some point in its operations. But I do think you're right that sorting isn't something that would be bottleneck in today's mainstream applications of GPU's - games, compression/decompression (video, mostly) and graphics operations.


Sorting has long been known as something that can be sped up by throwing more comparison units at it. Batcher's sort, as an example, is far from a new algorithm. So, to that end, being able to sort a collection in lgN time is quite appealing.

What I take this paper as doing, is seeing how close we are to being able to take it for granted that you can sort large collections in lgN time, instead of NlgN time. My guess is we are a ways from there, but probably not as far as I think.
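For reference, a sketch of Batcher-style bitonic sort (written sequentially in Python, but every compare-exchange at a given recursion level is independent, which is what maps onto parallel hardware; the input length must be a power of two):

```python
def bitonic_sort(a, ascending=True):
    """Batcher's bitonic sort: O(lg^2 n) parallel depth with
    enough comparators. len(a) must be a power of two."""
    n = len(a)
    if n <= 1:
        return list(a)
    half = n // 2
    # Sort halves in opposite directions to form a bitonic sequence.
    first = bitonic_sort(a[:half], True)
    second = bitonic_sort(a[half:], False)
    return bitonic_merge(first + second, ascending)

def bitonic_merge(a, ascending):
    n = len(a)
    if n <= 1:
        return list(a)
    half = n // 2
    a = list(a)
    # All of these compare-exchanges are independent of each other.
    for i in range(half):
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    return bitonic_merge(a[:half], ascending) + bitonic_merge(a[half:], ascending)
```

Note the depth is O(lg² N) rather than O(lg N); the O(lg N) networks (AKS) have constants too large to be practical.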


XGBoost recently added GPU support for training. As far as I'm aware one of the reasons the speedup is relatively small (vs say deep learning on CPU vs GPU) is that it spends a significant amount of time sorting which is not particularly fast on the GPU.


Sorting also forms the bottleneck in several memory hard Proof-of-Work schemes. These differ from the general problem in two important ways:

1) the input data is generated from cryptographic hash functions, and can thus be considered random, leading to e.g. very even distribution of elements over buckets.

2) each round of sorting serves to identify groups of items that match on a subrange of their bits, and somehow transform these items.

There is a lot of cryptocurrency money to be made by developing the most optimized mining software and charging 1% or so "developer-fee" on the mining proceeds.


> I would think the breakeven point for using the GPU (assuming inputs and results are on the CPU) is several megabytes or millions of elements at least.

Well, the idea of GPU computing is to keep the data on the GPU as much as possible.

The CPU <---> GPU link is at best 16x PCIe. Which is fast, but not anywhere close to the speed of HBM, GDDR5 or DDR4 RAM. (Exception: AMD's A10 APUs, where the GPU and CPU share on-chip cache memory, as well as the "Crystalwell" Intel chips IIRC. But these are rare chips that I doubt most people use.)

So if you happen to be running a major algorithm on the GPU, you'll probably want to sort it as well, before bringing it back to the CPU. Under no circumstances should you be introducing a PCIe delay for a simple operation like sorting.



