Kovid Goyal
0bccada9d1
No longer need to abort after dealing with trailing bytes
2024-02-25 09:57:40 +05:30
Kovid Goyal
9cb9373274
Allow unbounded output in UTF8Decoder
...
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal
d987ffe49a
Use unaligned stores
...
Makes no measurable difference in the benchmark, but will eventually
allow us to process larger chunks of data without needing to reset a bunch
of vector registers to constant values each time.
2024-02-25 09:57:39 +05:30
Kovid Goyal
77cfd44f24
More efficient clearing of register to all zeros or all ones
2024-02-25 09:57:39 +05:30
Kovid Goyal
59be7213cf
Make set1_epi8 more general
2024-02-25 09:57:39 +05:30
Kovid Goyal
d60dacbd09
Implement > and < intrinsics for vector registers
2024-02-25 09:57:39 +05:30
Kovid Goyal
82b7b4fcce
Make a reusable template for generating ASM index functions with different tests
2024-02-25 09:57:39 +05:30
Kovid Goyal
fa9a2b1e2e
Switch file input to use new SIMD parser to search for \n and \r in parallel
2024-02-25 09:57:39 +05:30
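The idea behind commit fa9a2b1e2e, searching for two bytes such as \n and \r in one pass, can be sketched roughly as below. This is only an illustration and not the code from the commit: the function name find_either_of_two_bytes is hypothetical, the vector path assumes x86 SSE2 and the GCC/Clang builtin __builtin_ctz, and it uses unaligned loads for simplicity. On other targets it falls back to a plain byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: return a pointer to the first byte equal to a or b
// in [haystack, haystack + len), or NULL if neither occurs.
static const uint8_t*
find_either_of_two_bytes(const uint8_t *haystack, size_t len, uint8_t a, uint8_t b) {
    assert(haystack != NULL || len == 0);
    const uint8_t *p = haystack, *end = haystack + len;
#if defined(__SSE2__)
    // Broadcast each needle byte to all 16 lanes of a vector register.
    const __m128i va = _mm_set1_epi8((char)a), vb = _mm_set1_epi8((char)b);
    for (; (size_t)(end - p) >= 16; p += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i*)p);
        // A lane becomes 0xFF if it matches either needle.
        __m128i hits = _mm_or_si128(_mm_cmpeq_epi8(chunk, va), _mm_cmpeq_epi8(chunk, vb));
        // Collapse the 16 lanes into a 16-bit mask, one bit per byte.
        int mask = _mm_movemask_epi8(hits);
        if (mask) return p + __builtin_ctz((unsigned)mask);
    }
#endif
    // Scalar loop for the tail (and for targets without SSE2).
    for (; p < end; p++) if (*p == a || *p == b) return p;
    return NULL;
}
```

Both comparisons run on the same loaded chunk, so testing for a second byte costs one extra compare and one OR per 16 bytes rather than a second pass over the data.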
Kovid Goyal
4e6138d785
Generate SIMD code during build
2024-02-25 09:57:39 +05:30
Kovid Goyal
86a55e2c0a
Use an aligned slice for file reads
2024-02-25 09:57:39 +05:30
Kovid Goyal
de8c1e0206
Work on porting SIMD VT parser to Go for the kittens
2024-02-25 09:57:39 +05:30
Kovid Goyal
131716da00
Ignore another warning on some compiler versions in simde
2024-02-25 09:57:39 +05:30
Kovid Goyal
4d35fc2928
Use a custom movmask for ARM rather than the one from simde
...
Supposedly faster, not that I can measure it, but...
Also gives neater code, so keep it.
2024-02-25 09:57:39 +05:30
Kovid Goyal
3b65c1a58a
Remove declaration without implementation
2024-02-25 09:57:39 +05:30
Kovid Goyal
9bca415af2
Use aligned loads when finding either of two bytes
...
No measurable performance improvement, but neater algorithm anyway.
2024-02-25 09:57:39 +05:30
Kovid Goyal
60bc8e6c25
...
2024-02-25 09:57:39 +05:30
Kovid Goyal
8aa1b112b8
Turns out the simde implementation of movemask is not slow enough to compensate for the speed bump from 256 bit
2024-02-25 09:57:39 +05:30
Kovid Goyal
0bd47d8457
Cleanup KITTY_NO_SIMD compilation
2024-02-25 09:57:39 +05:30
Kovid Goyal
fcbda63023
Move finding byte code into separate functions
...
movemask() is inefficient on ARM64; this will allow us to use a dedicated
implementation for finding bytes on that platform
2024-02-25 09:57:38 +05:30
Kovid Goyal
1d59bfade3
...
2024-02-25 09:57:38 +05:30
Kovid Goyal
fd7d0f8787
Fix event loop continuously ticking every input_delay seconds even when no input is available
2024-02-25 09:57:38 +05:30
Kovid Goyal
fa11858a72
Make bash integration tests more robust on macOS
2024-02-25 09:57:38 +05:30
Kovid Goyal
1293ee60e0
...
2024-02-25 09:57:38 +05:30
Kovid Goyal
66341aa28e
Make the env var controlling which SIMD level to use more capable
2024-02-25 09:57:38 +05:30
Kovid Goyal
73342411bc
Don't build any SIMD code when the target is neither ARM64 nor x86/amd64
2024-02-25 09:57:38 +05:30
Kovid Goyal
8dd6f9b07c
Get universal builds working again
...
Now we use lipo and build individually so we can pass the correct
compiler flags per arch
2024-02-25 09:57:38 +05:30
Kovid Goyal
7e77a196e6
Build only the SIMD code with SIMD compiler flags
2024-02-25 09:57:38 +05:30
Kovid Goyal
465616223c
Drop using the v2 microarch
...
No significant performance impact and small risk of breakage
2024-02-25 09:57:38 +05:30
Kovid Goyal
9d4193f4ea
Fix texture ref not usable on repurposed image object
2024-02-25 09:57:38 +05:30
Kovid Goyal
dafb876d75
Skip simd parser tests on machines without SIMD instructions
2024-02-25 09:57:38 +05:30
Kovid Goyal
4b846e0106
Turns out that using 256 bit code on ARM is slightly faster even though it is emulated with 128 bit registers
2024-02-25 09:57:38 +05:30
Kovid Goyal
76c6630084
Don't use 256 bit code paths on ARM
...
ARM only has 128 bit registers. simde simulates 256 bit operations using
them, which is fairly pointless for us.
2024-02-25 09:57:38 +05:30
Kovid Goyal
23a4012aeb
Add an env var to turn off use of SIMD instructions
2024-02-25 09:57:38 +05:30
Kovid Goyal
eee14ae148
Workaround for machines on GitHub Actions that incorrectly report CPU vector instruction availability
2024-02-25 09:57:37 +05:30
Kovid Goyal
b0ccaa09be
Clean up test env reporting
2024-02-25 09:57:37 +05:30
Kovid Goyal
bbaccfdaae
DRYer
2024-02-25 09:57:37 +05:30
Kovid Goyal
cb5a2cce53
...
2024-02-25 09:57:37 +05:30
Kovid Goyal
4fec11af05
Run dsymutil in post link phase
2024-02-25 09:57:37 +05:30
Kovid Goyal
5a9304e1b8
DRYer
2024-02-25 09:57:37 +05:30
Kovid Goyal
2b9c646c5b
Build dSYM bundles on CI
2024-02-25 09:57:37 +05:30
Kovid Goyal
6b6f3e0ece
...
2024-02-25 09:57:37 +05:30
Kovid Goyal
b560fe34c9
Give the functions for creating various objects unique names so they are easily recognized in macOS's non-fully-symbolicated crash reports
2024-02-25 09:57:37 +05:30
Kovid Goyal
e5b27d066c
Output macOS crash reports on CI with nicer formatting
2024-02-25 09:57:37 +05:30
Kovid Goyal
8762a939c0
Don't specify arch/tune when building universal binary
2024-02-25 09:57:37 +05:30
Kovid Goyal
06da31019c
Micro-optimize clearing of lines
...
Use a doubling strategy to fill arrays with a fixed value. This reduces
the number of calls to memcpy from O(N) to O(log(N)).
2024-02-25 09:57:37 +05:30
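The doubling strategy named in commit 06da31019c can be sketched as below. This is a minimal illustration under assumptions, not the code from the commit: the function name fill_by_doubling and its signature are hypothetical, and pattern is assumed not to overlap dest. One element is written first, then the already-filled prefix is copied onto the region after it, so the filled length doubles on each memcpy call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: fill `count` elements of `elem_size` bytes each,
// starting at `dest`, with copies of the single element at `pattern`.
static void
fill_by_doubling(void *dest, const void *pattern, size_t elem_size, size_t count) {
    assert(elem_size > 0);
    if (count == 0) return;
    unsigned char *d = dest;
    const size_t total = elem_size * count;
    memcpy(d, pattern, elem_size);  // seed the first element
    size_t filled = elem_size;
    while (filled < total) {
        // Copy at most the filled prefix, so source and destination never overlap.
        size_t n = filled < total - filled ? filled : total - filled;
        memcpy(d + filled, d, n);
        filled += n;
    }
}
```

The number of bytes written is still proportional to the array size. The saving is in call count: roughly log2(count) memcpy calls on large blocks instead of one small call per element.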
Kovid Goyal
d0621cb82a
Better ips crash report printing
2024-02-25 09:57:37 +05:30
Kovid Goyal
9935b5ddb2
...
2024-02-25 09:57:37 +05:30
Kovid Goyal
49d664bb0d
Fix incorrect line mapping when clearing screen using optimized code
2024-02-25 09:57:37 +05:30
Kovid Goyal
c6c0d0ed60
Sleep for a minute in the hope that macOS crash log will become available
2024-02-25 09:57:37 +05:30
Kovid Goyal
6f74d1b0c1
...
2024-02-25 09:57:36 +05:30