<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/lib/sort.c, branch rust-fixes-6.10</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=rust-fixes-6.10</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=rust-fixes-6.10'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2024-02-22T23:38:52Z</updated>
<entry>
<title>lib/sort: optimize heapsort with double-pop variation</title>
<updated>2024-02-22T23:38:52Z</updated>
<author>
<name>Kuan-Wei Chiu</name>
<email>visitorckw@gmail.com</email>
</author>
<published>2024-01-13T03:13:52Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=0e02ca29a5634dfb90fe1bad9467ee0965a1e170'/>
<id>urn:sha1:0e02ca29a5634dfb90fe1bad9467ee0965a1e170</id>
<content type='text'>
Instead of popping only the maximum element from the heap during each
iteration, we now pop the two largest elements at once.  Although this
introduces an additional comparison to determine the second largest
element, it enables a reduction in the height of the tree by one during
the heapify operations starting from root's left/right child.  This
reduction in tree height by one leads to a decrease of one comparison and
one swap.

This optimization results in saving approximately 0.5 * n swaps without
increasing the number of comparisons.  Additionally, the heap size during
heapify is now one less than the original size, offering a chance for
further reduction in comparisons and swaps.

The following experimental data is based on the array generated using
get_random_u32().

| N     | swaps (old) | swaps (new) | comparisons (old) | comparisons (new) |
|-------|-------------|-------------|-------------------|-------------------|
| 1000  | 9054        | 8569        | 10328             | 10320             |
| 2000  | 20137       | 19182       | 22634             | 22587             |
| 3000  | 32062       | 30623       | 35833             | 35752             |
| 4000  | 44274       | 42282       | 49332             | 49306             |
| 5000  | 57195       | 54676       | 63300             | 63294             |
| 6000  | 70205       | 67202       | 77599             | 77557             |
| 7000  | 83276       | 79831       | 92113             | 92032             |
| 8000  | 96630       | 92678       | 106635            | 106617            |
| 9000  | 110349      | 105883      | 121505            | 121404            |
| 10000 | 124165      | 119202      | 136628            | 136617            |


Link: https://lkml.kernel.org/r/20240113031352.2395118-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Cc: Ching-Chun (Jim) Huang &lt;jserv@ccns.ncku.edu.tw&gt;
Cc: George Spelvin &lt;lkml@sdf.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib/sort: optimize heapsort for equal elements in sift-down path</title>
<updated>2024-02-22T23:38:52Z</updated>
<author>
<name>Kuan-Wei Chiu</name>
<email>visitorckw@gmail.com</email>
</author>
<published>2024-01-13T03:13:51Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=db946a42226052a13fbce8757ff77b3107dd2030'/>
<id>urn:sha1:db946a42226052a13fbce8757ff77b3107dd2030</id>
<content type='text'>
Patch series "lib/sort: Optimize the number of swaps and comparisons".

This patch series aims to optimize the heapsort algorithm, specifically
targeting a reduction in the number of swaps and comparisons required.


This patch (of 2):

Currently, when searching for the sift-down path and encountering equal
elements, the algorithm chooses the left child.  However, considering that
the height of the right subtree may be one less than that of the left
subtree, selecting the right child in such cases can potentially reduce
the number of comparisons and swaps.

For instance, when sorting an array of 10,000 identical elements, the
current implementation requires 247,209 comparisons.  With this patch, the
number of comparisons can be reduced to 227,241.

Link: https://lkml.kernel.org/r/20240113031352.2395118-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/r/20240113031352.2395118-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib/sort: Add priv pointer to swap function</title>
<updated>2022-03-18T03:17:18Z</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@kernel.org</email>
</author>
<published>2022-03-16T12:24:07Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=a0019cd7d41a191253859349535d76ee28ab7d96'/>
<id>urn:sha1:a0019cd7d41a191253859349535d76ee28ab7d96</id>
<content type='text'>
Adding support to have priv pointer in swap callback function.

Following the initial change on cmp callback functions [1]
and adding SWAP_WRAPPER macro to identify sort call of sort_r.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Reviewed-by: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20220316122419.933957-2-jolsa@kernel.org

[1] 4333fb96ca10 ("media: lib/sort.c: implement sort() variant taking context argument")
</content>
</entry>
<entry>
<title>lib: fix spelling mistakes</title>
<updated>2021-07-08T18:48:20Z</updated>
<author>
<name>Zhen Lei</name>
<email>thunder.leizhen@huawei.com</email>
</author>
<published>2021-07-08T01:07:31Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=9dbbc3b9d09d6deba9f3b9e1d5b355032ed46a75'/>
<id>urn:sha1:9dbbc3b9d09d6deba9f3b9e1d5b355032ed46a75</id>
<content type='text'>
Fix some spelling mistakes in comments:
permanentely ==&gt; permanently
wont ==&gt; won't
remaning ==&gt; remaining
succed ==&gt; succeed
shouldnt ==&gt; shouldn't
alpha-numeric ==&gt; alphanumeric
storeing ==&gt; storing
funtion ==&gt; function
documenation ==&gt; documentation
Determin ==&gt; Determine
intepreted ==&gt; interpreted
ammount ==&gt; amount
obious ==&gt; obvious
interupts ==&gt; interrupts
occured ==&gt; occurred
asssociated ==&gt; associated
taking into acount ==&gt; taking into account
squence ==&gt; sequence
stil ==&gt; still
contiguos ==&gt; contiguous
matchs ==&gt; matches

Link: https://lkml.kernel.org/r/20210607072555.12416-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Reviewed-by: Jacob Keller &lt;jacob.e.keller@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib/sort: Move swap, cmp and cmp_r function types for wider use</title>
<updated>2019-11-14T18:15:11Z</updated>
<author>
<name>Andy Shevchenko</name>
<email>andriy.shevchenko@linux.intel.com</email>
</author>
<published>2019-10-07T13:56:54Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=52ae533b8a18e7ca868e7ac5953ad7258210f320'/>
<id>urn:sha1:52ae533b8a18e7ca868e7ac5953ad7258210f320</id>
<content type='text'>
The function types for swap, cmp and cmp_r functions are already
being in use by modules.

Move them to types.h that everybody in kernel will be able to use
generic types instead of custom ones.

This adds more sense to the comment in bsearch() later on.

Link: http://lkml.kernel.org/r/20191007135656.37734-1-andriy.shevchenko@linux.intel.com

Signed-off-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>media: lib/sort.c: implement sort() variant taking context argument</title>
<updated>2019-08-19T16:14:53Z</updated>
<author>
<name>Rasmus Villemoes</name>
<email>linux@rasmusvillemoes.dk</email>
</author>
<published>2019-08-16T16:01:22Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=4333fb96ca1086d1cec0f93f78c453aa2dee8a5c'/>
<id>urn:sha1:4333fb96ca1086d1cec0f93f78c453aa2dee8a5c</id>
<content type='text'>
Our list_sort() utility has always supported a context argument that
is passed through to the comparison routine. Now there's a use case
for the similar thing for sort().

This implements sort_r by simply extending the existing sort function
in the obvious way. To avoid code duplication, we want to implement
sort() in terms of sort_r(). The naive way to do that is

static int cmp_wrapper(const void *a, const void *b, const void *ctx)
{
  int (*real_cmp)(const void*, const void*) = ctx;
  return real_cmp(a, b);
}

sort(..., cmp) { sort_r(..., cmp_wrapper, cmp) }

but this would do two indirect calls for each comparison. Instead, do
as is done for the default swap functions - that only adds a cost of a
single easily predicted branch to each comparison call.

Aside from introducing support for the context argument, this also
serves as preparation for patches that will eliminate the indirect
comparison calls in common cases.

Requested-by: Boris Brezillon &lt;boris.brezillon@collabora.com&gt;

Signed-off-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Signed-off-by: Boris Brezillon &lt;boris.brezillon@collabora.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Tested-by: Philipp Zabel &lt;p.zabel@pengutronix.de&gt;
Signed-off-by: Hans Verkuil &lt;hverkuil-cisco@xs4all.nl&gt;
Signed-off-by: Mauro Carvalho Chehab &lt;mchehab+samsung@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/sort.c: fix kernel-doc notation warnings</title>
<updated>2019-06-01T22:51:31Z</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2019-06-01T05:30:00Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=aa52619ccbe056999d7c7231c8a1a11cedfccc6a'/>
<id>urn:sha1:aa52619ccbe056999d7c7231c8a1a11cedfccc6a</id>
<content type='text'>
Fix kernel-doc notation in lib/sort.c by using correct function parameter
names.

  lib/sort.c:59: warning: Excess function parameter 'size' description in 'swap_words_32'
  lib/sort.c:83: warning: Excess function parameter 'size' description in 'swap_words_64'
  lib/sort.c:110: warning: Excess function parameter 'size' description in 'swap_bytes'

Link: http://lkml.kernel.org/r/60e25d3d-68d1-bde2-3b39-e4baa0b14907@infradead.org
Fixes: 37d0ec34d111a ("lib/sort: make swap functions more generic")
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: George Spelvin &lt;lkml@sdf.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib/sort: avoid indirect calls to built-in swap</title>
<updated>2019-05-15T02:52:49Z</updated>
<author>
<name>George Spelvin</name>
<email>lkml@sdf.org</email>
</author>
<published>2019-05-14T22:42:55Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=8fb583c4258d08f0aff105aa2ae5157b7d414ea2'/>
<id>urn:sha1:8fb583c4258d08f0aff105aa2ae5157b7d414ea2</id>
<content type='text'>
Similar to what's being done in the net code, this takes advantage of
the fact that most invocations use only a few common swap functions, and
replaces indirect calls to them with (highly predictable) conditional
branches.  (The downside, of course, is that if you *do* use a custom
swap function, there are a few extra predicted branches on the code
path.)

This actually *shrinks* the x86-64 code, because it inlines the various
swap functions inside do_swap, eliding function prologues &amp; epilogues.

x86-64 code size 767 -&gt; 703 bytes (-64)

Link: http://lkml.kernel.org/r/d10c5d4b393a1847f32f5b26f4bbaa2857140e1e.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin &lt;lkml@sdf.org&gt;
Acked-by: Andrey Abramov &lt;st5pub@yandex.ru&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Reviewed-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Daniel Wagner &lt;daniel.wagner@siemens.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Don Mullis &lt;don.mullis@gmail.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib/sort: use more efficient bottom-up heapsort variant</title>
<updated>2019-05-15T02:52:49Z</updated>
<author>
<name>George Spelvin</name>
<email>lkml@sdf.org</email>
</author>
<published>2019-05-14T22:42:52Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=22a241ccb2c19962a0fb02c98154aa93d3fc1862'/>
<id>urn:sha1:22a241ccb2c19962a0fb02c98154aa93d3fc1862</id>
<content type='text'>
This uses fewer comparisons than the previous code (approaching half as
many for large random inputs), but produces identical results; it
actually performs the exact same series of swap operations.

Specifically, it reduces the average number of compares from
  2*n*log2(n) - 3*n + o(n)
to
    n*log2(n) + 0.37*n + o(n).

This is still 1.63*n worse than glibc qsort() which manages n*log2(n) -
1.26*n, but at least the leading coefficient is correct.

Standard heapsort, when sifting down, performs two comparisons per
level: one to find the greater child, and a second to see if the current
node should be exchanged with that child.

Bottom-up heapsort observes that it's better to postpone the second
comparison and search for the leaf where -infinity would be sent to,
then search back *up* for the current node's destination.

Since sifting down usually proceeds to the leaf level (that's where half
the nodes are), this does O(1) second comparisons rather than log2(n).
That saves a lot of (expensive since Spectre) indirect function calls.

The one time it's worse than the previous code is if there are large
numbers of duplicate keys, when the top-down algorithm is O(n) and
bottom-up is O(n log n).  For distinct keys, it's provably always
better, doing 1.5*n*log2(n) + O(n) in the worst case.

(The code is not significantly more complex.  This patch also merges the
heap-building and -extracting sift-down loops, resulting in a net code
size savings.)

x86-64 code size 885 -&gt; 767 bytes (-118)

(I see the checkpatch complaint about "else if (n -= size)".  The
alternative is significantly uglier.)

Link: http://lkml.kernel.org/r/2de8348635a1a421a72620677898c7fd5bd4b19d.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin &lt;lkml@sdf.org&gt;
Acked-by: Andrey Abramov &lt;st5pub@yandex.ru&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Reviewed-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Daniel Wagner &lt;daniel.wagner@siemens.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Don Mullis &lt;don.mullis@gmail.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib/sort: make swap functions more generic</title>
<updated>2019-05-15T02:52:49Z</updated>
<author>
<name>George Spelvin</name>
<email>lkml@sdf.org</email>
</author>
<published>2019-05-14T22:42:49Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=37d0ec34d111acfdb82b24e3de00d926c0aece4d'/>
<id>urn:sha1:37d0ec34d111acfdb82b24e3de00d926c0aece4d</id>
<content type='text'>
Patch series "lib/sort &amp; lib/list_sort: faster and smaller", v2.

Because CONFIG_RETPOLINE has made indirect calls much more expensive, I
thought I'd try to reduce the number made by the library sort functions.

The first three patches apply to lib/sort.c.

Patch #1 is a simple optimization.  The built-in swap has special cases
for aligned 4- and 8-byte objects.  But those are almost never used;
most calls to sort() work on larger structures, which fall back to the
byte-at-a-time loop.  This generalizes them to aligned *multiples* of 4
and 8 bytes.  (If nothing else, it saves an awful lot of energy by not
thrashing the store buffers as much.)

Patch #2 grabs a juicy piece of low-hanging fruit.  I agree that nice
simple solid heapsort is preferable to more complex algorithms (sorry,
Andrey), but it's possible to implement heapsort with far fewer
comparisons (50% asymptotically, 25-40% reduction for realistic sizes)
than the way it's been done up to now.  And with some care, the code
ends up smaller, as well.  This is the "big win" patch.

Patch #3 adds the same sort of indirect call bypass that has been added
to the net code of late.  The great majority of the callers use the
builtin swap functions, so replace the indirect call to sort_func with a
(highly preditable) series of if() statements.  Rather surprisingly,
this decreased code size, as the swap functions were inlined and their
prologue &amp; epilogue code eliminated.

lib/list_sort.c is a bit trickier, as merge sort is already close to
optimal, and we don't want to introduce triumphs of theory over
practicality like the Ford-Johnson merge-insertion sort.

Patch #4, without changing the algorithm, chops 32% off the code size
and removes the part[MAX_LIST_LENGTH+1] pointer array (and the
corresponding upper limit on efficiently sortable input size).

Patch #5 improves the algorithm.  The previous code is already optimal
for power-of-two (or slightly smaller) size inputs, but when the input
size is just over a power of 2, there's a very unbalanced final merge.

There are, in the literature, several algorithms which solve this, but
they all depend on the "breadth-first" merge order which was replaced by
commit 835cc0c8477f with a more cache-friendly "depth-first" order.
Some hard thinking came up with a depth-first algorithm which defers
merges as little as possible while avoiding bad merges.  This saves
0.2*n compares, averaged over all sizes.

The code size increase is minimal (64 bytes on x86-64, reducing the net
savings to 26%), but the comments expanded significantly to document the
clever algorithm.

TESTING NOTES: I have some ugly user-space benchmarking code which I
used for testing before moving this code into the kernel.  Shout if you
want a copy.

I'm running this code right now, with CONFIG_TEST_SORT and
CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since the last
round of minor edits to quell checkpatch.  I figure there will be at
least one round of comments and final testing.

This patch (of 5):

Rather than having special-case swap functions for 4- and 8-byte
objects, special-case aligned multiples of 4 or 8 bytes.  This speeds up
most users of sort() by avoiding fallback to the byte copy loop.

Despite what ca96ab859ab4 ("lib/sort: Add 64 bit swap function") claims,
very few users of sort() sort pointers (or pointer-sized objects); most
sort structures containing at least two words.  (E.g.
drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct
acpi_fan_fps.)

The functions also got renamed to reflect the fact that they support
multiple words.  In the great tradition of bikeshedding, the names were
by far the most contentious issue during review of this patch series.

x86-64 code size 872 -&gt; 886 bytes (+14)

With feedback from Andy Shevchenko, Rasmus Villemoes and Geert
Uytterhoeven.

Link: http://lkml.kernel.org/r/f24f932df3a7fa1973c1084154f1cea596bcf341.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin &lt;lkml@sdf.org&gt;
Acked-by: Andrey Abramov &lt;st5pub@yandex.ru&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Reviewed-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Daniel Wagner &lt;daniel.wagner@siemens.com&gt;
Cc: Don Mullis &lt;don.mullis@gmail.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
