Commit Graph

10 Commits

Author SHA1 Message Date
Denys Vlasenko
c193cbd6df libbb/sha1: shrink and speed up unrolled x86-64 code
function                                             old     new   delta
sha1_process_block64                                3514    3482     -32

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-02-07 02:34:04 +01:00
Denys Vlasenko
6472ac9428 libbb/sha256: optional x86 hardware accelerated hashing
64 bit:
function                                             old     new   delta
sha256_process_block64_shaNI                           -     730    +730
.rodata                                           108314  108586    +272
sha256_begin                                          31      83     +52
------------------------------------------------------------------------------
(add/remove: 5/1 grow/shrink: 2/0 up/down: 1055/-1)          Total: 1054 bytes

32 bit:
function                                             old     new   delta
sha256_process_block64_shaNI                           -     747    +747
.rodata                                           104318  104590    +272
sha256_begin                                          29      84     +55
------------------------------------------------------------------------------
(add/remove: 5/1 grow/shrink: 2/0 up/down: 1075/-1)          Total: 1074 bytes

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-02-03 14:58:02 +01:00
Denys Vlasenko
205042c07a libbb/sha1: in unrolled x86-64 code, pass initial W[] in registers, not on stack
This can be faster on some CPUs.
On Skylake, evidently load latency from L1 (or store-to-load
forwarding in LSU) is fast enough to completely hide
memory reference latencies here.

function                                             old     new   delta
sha1_process_block64                                3495    3514     +19

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-25 17:21:45 +01:00
Denys Vlasenko
39369ff460 libbb/sha1: use SSE2 in unrolled x86-64 code. ~10% faster
function                                             old     new   delta
.rodata                                           108241  108305     +64
sha1_process_block64                                3502    3495      -7
------------------------------------------------------------------------------
(add/remove: 5/0 grow/shrink: 1/1 up/down: 64/-7)              Total: 57 bytes

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-23 12:57:27 +01:00
Denys Vlasenko
805ececa61 whitespace fix
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-08 00:41:09 +01:00
Denys Vlasenko
c3cfcc9242 libbb/sha1: x86_64 version: reorder prologue/epilogue insns
Not clear exactly why, but this increases hashing speed
on Skylake from 454 MB/s to 464 MB/s.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-04 01:45:52 +01:00
Denys Vlasenko
7abb2bb96e libbb/sha1: x86_64 version: tidying up, no code changes
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-03 17:02:48 +01:00
Denys Vlasenko
4387077f8e typo fix
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-03 13:14:09 +01:00
Denys Vlasenko
947bef0dea libbb/sha1: x86_64 version: generate from a script, optimize a bit
function                                             old     new   delta
sha1_process_block64                                3569    3502     -67

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-03 13:10:30 +01:00
Denys Vlasenko
05fd13ebec libbb/sha1: x86_64 version: move to a separate .S file, no code changes
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
2022-01-03 12:57:36 +01:00