Kovid Goyal
eb07307370
Ignore pedantic warnings from simde headers
2024-04-30 09:54:14 +05:30
Kovid Goyal
393169f79d
Fix #7225
2024-03-14 20:55:05 +05:30
Kovid Goyal
daeaf65d7e
fix compiler warning
2024-02-25 11:17:26 +05:30
Kovid Goyal
f4f06222d4
...
2024-02-25 09:57:44 +05:30
Kovid Goyal
ad3ab877f8
Use a fast SIMD implementation to XOR data going into the disk cache
2024-02-25 09:57:43 +05:30
Kovid Goyal
1db7ac5f6b
Use our new shift-by-n functions to improve the function that zeroes the last N bytes
...
Benchmark neutral, but cleaner code using one fewer vector register and an equal
number of operations.
2024-02-25 09:57:43 +05:30
Kovid Goyal
e77a970ca1
Also implement arbitrary byte shift for 128 bit registers
2024-02-25 09:57:43 +05:30
Kovid Goyal
a7c06b38e6
We don't actually need vzeroupper at the start of the function
...
GCC emits vzeroupper automatically when compiling with native
optimizations, but we still need it otherwise
2024-02-25 09:57:43 +05:30
Kovid Goyal
0a1eb038a5
Implement functions for arbitrary byte shifts in vector registers
2024-02-25 09:57:42 +05:30
Kovid Goyal
eb1e3b33b4
Fix test failure on some systems
...
Broken ass compilers strike again
2024-02-25 09:57:42 +05:30
Kovid Goyal
b021e9b648
Do the default func test last so we can see more explicitly what the failure is on
2024-02-25 09:57:42 +05:30
Kovid Goyal
1acd223f45
...
2024-02-25 09:57:42 +05:30
Kovid Goyal
f48e4ffd5e
Port aligned load based find algorithm to C
2024-02-25 09:57:42 +05:30
Kovid Goyal
36773c09d3
Functions to get bytes to first match ignoring leading bytes
2024-02-25 09:57:42 +05:30
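Note on the commit above: when a chunk is read with an aligned load, the load can start before the real beginning of the data, so matches in those leading bytes have to be discarded before locating the first match. A minimal sketch of that idea on a movemask-style bitmask follows. The helper name, the 32-byte chunk width, and the -1 return convention are assumptions made for illustration and are not taken from the kitty source.

```c
#include <stdint.h>

/* Hypothetical helper, not the actual kitty function.
 * mask has bit i set when byte i of a 32-byte chunk matched. The first
 * num_ignored bytes of the chunk lie before the real start of the data
 * and must not count as matches. Returns the number of bytes from the
 * real start of the data to the first match, or -1 if there is none. */
static inline int
bytes_to_first_match_ignoring_leading(uint32_t mask, unsigned num_ignored) {
    if (num_ignored >= 32) return -1;      /* whole chunk is ignored */
    mask &= ~UINT32_C(0) << num_ignored;   /* clear bits of the leading bytes */
    if (!mask) return -1;                  /* ctz of zero is undefined */
    return (int)__builtin_ctz(mask) - (int)num_ignored;
}
```

For example, with matches at bytes 0, 2 and 5 and one leading byte ignored, the first counted match is byte 2 of the chunk, which is 1 byte past the real start.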
Kovid Goyal
687340003d
...
2024-02-25 09:57:42 +05:30
Kovid Goyal
493fc900e9
Fix build on ARM
2024-02-25 09:57:41 +05:30
Kovid Goyal
f1fe0bf40a
Code to easily compare SIMD and scalar decode in a live instance
...
Also remove -mtune=intel as it fails with clang
2024-02-25 09:57:41 +05:30
Kovid Goyal
d5f34c401d
Better vector registers to pre-calculate before the loop
2024-02-25 09:57:41 +05:30
Kovid Goyal
920b8a2496
Use VZEROUPPER in avx functions
...
See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf
2024-02-25 09:57:40 +05:30
Kovid Goyal
d4c4805f96
const away to glory
2024-02-25 09:57:40 +05:30
Kovid Goyal
6cdc7ac91d
A further 5% speedup for UTF-8 decoding
...
Achieved by decoding in larger chunks thereby amortizing the cost
of creating various constant vectors over larger chunks.
2024-02-25 09:57:40 +05:30
Kovid Goyal
0bccada9d1
No longer need to abort after dealing with trailing bytes
2024-02-25 09:57:40 +05:30
Kovid Goyal
9cb9373274
Allow unbounded output in UTF8Decoder
...
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal
d987ffe49a
Use unaligned stores
...
Makes no measurable difference in the benchmark, and will eventually
allow us to process larger chunks of data without needing to reset a bunch
of vector registers to constant values each time.
2024-02-25 09:57:39 +05:30
Kovid Goyal
131716da00
Ignore another warning on some compiler versions in simde
2024-02-25 09:57:39 +05:30
Kovid Goyal
4d35fc2928
Use a custom movmask for ARM rather than the one from simde
...
Supposedly faster, not that I can measure it, but...
Also gives neater code, so keep it.
2024-02-25 09:57:39 +05:30
Kovid Goyal
9bca415af2
Use aligned loads when finding either of two bytes
...
No measurable performance improvement, but neater algorithm anyway.
2024-02-25 09:57:39 +05:30
Kovid Goyal
60bc8e6c25
...
2024-02-25 09:57:39 +05:30
Kovid Goyal
8aa1b112b8
Turns out the simde implementation of movemask is not slow enough to offset the speedup from 256-bit vectors
2024-02-25 09:57:39 +05:30
Kovid Goyal
0bd47d8457
Cleanup KITTY_NO_SIMD compilation
2024-02-25 09:57:39 +05:30
Kovid Goyal
fcbda63023
Move finding byte code into separate functions
...
movemask() is inefficient on ARM64; this will allow us to use a dedicated
implementation for finding bytes on that platform
2024-02-25 09:57:38 +05:30
Kovid Goyal
73342411bc
Don't build any SIMD code when the target is neither ARM64 nor x86/amd64
2024-02-25 09:57:38 +05:30
Kovid Goyal
8dd6f9b07c
Get universal builds working again
...
Now we use lipo and build individually so we can pass the correct
compiler flags per arch
2024-02-25 09:57:38 +05:30
Kovid Goyal
7e77a196e6
Build only the SIMD code with SIMD compiler flags
2024-02-25 09:57:38 +05:30
Kovid Goyal
0e4c49a0d6
Fix building on macOS ARM
2024-02-25 09:57:35 +05:30
Kovid Goyal
e783eccc97
fix handling of bits from high byte of 4 byte sequences
2024-02-25 09:57:35 +05:30
Kovid Goyal
7e6459a5e4
DRYer
2024-02-25 09:57:35 +05:30
Kovid Goyal
67d22b0ec6
Avoid multiple branches for checking for trailing sequence
2024-02-25 09:57:34 +05:30
Kovid Goyal
79f99bb3ad
Make print_register usable without full debug
2024-02-25 09:57:34 +05:30
Kovid Goyal
fa3579656b
More invalid utf-8 tests
2024-02-25 09:57:34 +05:30
Kovid Goyal
8a10fcaf5a
More tests
2024-02-25 09:57:34 +05:30
Kovid Goyal
4c8b8caead
Handle trailing incomplete sequences
2024-02-25 09:57:34 +05:30
Kovid Goyal
4238fedee7
More tests
2024-02-25 09:57:34 +05:30
Kovid Goyal
b0dcdf74bd
More tests and micro-optimize switch to ASCII fast path
2024-02-25 09:57:34 +05:30
Kovid Goyal
a63d62fb4e
...
2024-02-25 09:57:34 +05:30
Kovid Goyal
8dbb0cff6f
Don't call __builtin_ctz with zero
2024-02-25 09:57:34 +05:30
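Note on the commit above: in GCC and Clang the result of `__builtin_ctz(0)` is undefined, so a match mask has to be tested for zero before counting trailing zeros. A minimal sketch of such a guard follows. The helper name `first_set_bit_or` and its fallback parameter are hypothetical and do not come from the kitty source.

```c
#include <stdint.h>

/* Hypothetical helper: index of the lowest set bit in mask, or
 * fallback when mask is zero. __builtin_ctz has an undefined result
 * for a zero argument, so zero is handled before calling it. */
static inline unsigned
first_set_bit_or(uint32_t mask, unsigned fallback) {
    return mask ? (unsigned)__builtin_ctz(mask) : fallback;
}
```

A caller scanning 32-byte chunks could pass 32 as the fallback so that "no match" reads as one past the last byte of the chunk.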
Kovid Goyal
07bba337f5
fix various bugs in AVX2 utility functions
2024-02-25 09:57:34 +05:30
Kovid Goyal
b28fbf6817
fix zeroing of last n bytes
2024-02-25 09:57:34 +05:30
Kovid Goyal
daa169b8ed
More work on utf8 SIMD decode
2024-02-25 09:57:34 +05:30
Kovid Goyal
a5251bedc9
More work on SIMD utf8 decode
2024-02-25 09:57:34 +05:30