Kovid Goyal
3abdc54e4b
...
2024-02-25 09:57:41 +05:30
Kovid Goyal
618aeec709
Finally got gnome-terminal to run on my system
...
Apparently it needed some kind of GTK desktop portal or other
🙄
Interesting that its numbers are basically the same as alacritty's. A lot
better than I remember; I guess the recent libvte performance work was
good.
2024-02-25 09:57:41 +05:30
Kovid Goyal
4585361161
Micro optimization
2024-02-25 09:57:41 +05:30
Kovid Goyal
f64739c29b
Fix regression that broke handling of single byte control chars when cursor is on second cell of wide character
2024-02-25 09:57:41 +05:30
Kovid Goyal
f3830aa854
Avoid unnecessary if
2024-02-25 09:57:41 +05:30
Kovid Goyal
f1fe0bf40a
Code to easily compare SIMD and scalar decode in a live instance
...
Also remove -mtune=intel as it fails with clang
2024-02-25 09:57:41 +05:30
Kovid Goyal
561712090d
Fix cmplt implementation
2024-02-25 09:57:41 +05:30
Kovid Goyal
d5f34c401d
Better vector registers to pre-calculate before the loop
2024-02-25 09:57:41 +05:30
Kovid Goyal
d9190ea675
DRYer
2024-02-25 09:57:41 +05:30
Kovid Goyal
57f4ea4d4a
Add some tests for broadcast from constant intrinsic
2024-02-25 09:57:41 +05:30
Kovid Goyal
9b0ae8d403
Don't use VEX-encoded instructions for the 128-bit ISA
2024-02-25 09:57:41 +05:30
Kovid Goyal
aed0611fb8
Avoid double trailing RET
2024-02-25 09:57:40 +05:30
Kovid Goyal
920b8a2496
Use VZEROUPPER in avx functions
...
See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf
2024-02-25 09:57:40 +05:30
Kovid Goyal
5a5e31c38b
Also zero upper at start of function
2024-02-25 09:57:40 +05:30
Kovid Goyal
db2e0e816d
Fix mixing of register types in the same function
2024-02-25 09:57:40 +05:30
Kovid Goyal
a298781b85
DRYer
2024-02-25 09:57:40 +05:30
Kovid Goyal
d5cd9ef2ca
...
2024-02-25 09:57:40 +05:30
Kovid Goyal
55c909c656
Use -mtune=intel for SIMD files when building without native optimizations
2024-02-25 09:57:40 +05:30
Kovid Goyal
da31db3212
...
2024-02-25 09:57:40 +05:30
Kovid Goyal
601c4ad4df
Fix some typos
2024-02-25 09:57:40 +05:30
Kovid Goyal
2549b4328f
Update throughput comparison table in light of latest improvements
2024-02-25 09:57:40 +05:30
Kovid Goyal
68d800d4fa
make clean should clean generated asm as well
2024-02-25 09:57:40 +05:30
Kovid Goyal
9fc3db1dd1
Work on C0 index func
2024-02-25 09:57:40 +05:30
Kovid Goyal
d4c4805f96
const away to glory
2024-02-25 09:57:40 +05:30
Kovid Goyal
161eae78b6
Make generated asm_* files world readable
2024-02-25 09:57:40 +05:30
Kovid Goyal
6cdc7ac91d
A further 5% speedup for UTF-8 decoding
...
Achieved by decoding in larger chunks, thereby amortizing the cost
of creating the various constant vectors over more data.
2024-02-25 09:57:40 +05:30
Kovid Goyal
0bccada9d1
No longer need to abort after dealing with trailing bytes
2024-02-25 09:57:40 +05:30
Kovid Goyal
9cb9373274
Allow unbounded output in UTF8Decoder
...
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal
d987ffe49a
Use unaligned stores
...
Makes no measurable difference in the benchmark, and will eventually
allow us to process larger chunks of data without needing to reset a bunch
of vector registers to constant values each time.
2024-02-25 09:57:39 +05:30
Kovid Goyal
77cfd44f24
More efficient clearing of register to all zeros or all ones
2024-02-25 09:57:39 +05:30
Kovid Goyal
59be7213cf
Make set1_epi8 more general
2024-02-25 09:57:39 +05:30
Kovid Goyal
d60dacbd09
Implement > and < intrinsics for vector registers
2024-02-25 09:57:39 +05:30
Kovid Goyal
82b7b4fcce
Make a reusable template for generating ASM index functions with different tests
2024-02-25 09:57:39 +05:30
Kovid Goyal
fa9a2b1e2e
Switch file input to use new SIMD parser to search for \n and \r in parallel
2024-02-25 09:57:39 +05:30
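For readers unfamiliar with the idea of searching for two bytes in parallel, here is a minimal sketch in plain Go. It is not the generated SIMD assembly from this commit: it swaps in a portable SWAR (SIMD within a register) technique that tests eight bytes per 64-bit word, and the names `matchMask` and `indexOfEither` are hypothetical, chosen only for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

const (
	ones = 0x0101010101010101 // 0x01 in every byte lane
	lo7  = 0x7f7f7f7f7f7f7f7f // low 7 bits of every byte lane
)

// matchMask returns a word with the high bit set in every byte lane
// of w that equals b, and all other bits cleared.
func matchMask(w uint64, b byte) uint64 {
	x := w ^ (ones * uint64(b)) // lanes equal to b become zero
	y := (x & lo7) + lo7        // high bit set if the low 7 bits are non-zero
	return ^(y | x | lo7)       // high bit set only where the lane is zero
}

// indexOfEither returns the index of the first occurrence of a or b
// in data, or -1 if neither byte is present.
func indexOfEither(data []byte, a, b byte) int {
	i := 0
	// Test eight bytes at a time for both needles.
	for ; i+8 <= len(data); i += 8 {
		w := binary.LittleEndian.Uint64(data[i:])
		if m := matchMask(w, a) | matchMask(w, b); m != 0 {
			// The lowest set bit belongs to the first matching byte.
			return i + bits.TrailingZeros64(m)/8
		}
	}
	// Scalar loop for the remaining tail of fewer than eight bytes.
	for ; i < len(data); i++ {
		if data[i] == a || data[i] == b {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(indexOfEither([]byte("hello world\r\n"), '\n', '\r'))
}
```

A real vector implementation would compare 16 or 32 bytes per instruction, but the structure is the same: build one match mask per needle, OR them together, and locate the lowest set bit.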
Kovid Goyal
4e6138d785
Generate SIMD code during build
2024-02-25 09:57:39 +05:30
Kovid Goyal
86a55e2c0a
Use an aligned slice for file reads
2024-02-25 09:57:39 +05:30
Kovid Goyal
de8c1e0206
Work on porting SIMD VT parser to Go for the kittens
2024-02-25 09:57:39 +05:30
Kovid Goyal
131716da00
Ignore another warning on some compiler versions in simde
2024-02-25 09:57:39 +05:30
Kovid Goyal
4d35fc2928
Use a custom movmask for ARM rather than the one from simde
...
Supposedly faster, not that I can measure it, but...
Also gives neater code, so keep it.
2024-02-25 09:57:39 +05:30
Kovid Goyal
3b65c1a58a
Remove declaration without implementation
2024-02-25 09:57:39 +05:30
Kovid Goyal
9bca415af2
Use aligned loads when finding either of two bytes
...
No measurable performance improvement, but neater algorithm anyway.
2024-02-25 09:57:39 +05:30
Kovid Goyal
60bc8e6c25
...
2024-02-25 09:57:39 +05:30
Kovid Goyal
8aa1b112b8
Turns out the simde implementation of movemask is not slow enough to cancel out the speedup from 256-bit vectors
2024-02-25 09:57:39 +05:30
Kovid Goyal
0bd47d8457
Cleanup KITTY_NO_SIMD compilation
2024-02-25 09:57:39 +05:30
Kovid Goyal
fcbda63023
Move finding byte code into separate functions
...
movemask() is inefficient on ARM64; this will allow us to use a dedicated
implementation for finding bytes on that platform
2024-02-25 09:57:38 +05:30
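As background for the movemask remarks in this and the neighbouring commits, the following scalar Go sketch emulates what a 128-bit byte movemask computes and how a byte search is typically built on top of it. It is an illustration only, not kitty's code: `cmpEq`, `moveMask` and `firstMatch` are hypothetical names, and the loops stand in for what would each be a single vector instruction on x86.

```go
package main

import (
	"fmt"
	"math/bits"
)

// cmpEq emulates a byte-wise vector compare: each output lane is 0xFF
// where the input lane equals b, and 0x00 otherwise.
func cmpEq(chunk [16]byte, b byte) [16]byte {
	var out [16]byte
	for i, v := range chunk {
		if v == b {
			out[i] = 0xFF
		}
	}
	return out
}

// moveMask emulates a movemask over 16 byte lanes: bit i of the result
// is the high bit of lane i.
func moveMask(lanes [16]byte) uint16 {
	var mask uint16
	for i, v := range lanes {
		mask |= uint16(v>>7) << i
	}
	return mask
}

// firstMatch returns the index of the first lane equal to a or b,
// or -1 if there is none.
func firstMatch(chunk [16]byte, a, b byte) int {
	m := moveMask(cmpEq(chunk, a)) | moveMask(cmpEq(chunk, b))
	if m == 0 {
		return -1
	}
	return bits.TrailingZeros16(m)
}

func main() {
	var chunk [16]byte
	copy(chunk[:], "abc\rdef\nghijklmn")
	fmt.Println(firstMatch(chunk, '\n', '\r'))
}
```

ARM64 NEON has no single instruction equivalent to this movemask step, which is why a dedicated byte-finding routine for that platform can avoid the emulation cost.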
Kovid Goyal
1d59bfade3
...
2024-02-25 09:57:38 +05:30
Kovid Goyal
fd7d0f8787
Fix event loop continuously ticking every input_delay seconds even when no input is available
2024-02-25 09:57:38 +05:30
Kovid Goyal
fa11858a72
Make bash integration tests more robust on macOS
2024-02-25 09:57:38 +05:30
Kovid Goyal
1293ee60e0
...
2024-02-25 09:57:38 +05:30
Kovid Goyal
66341aa28e
Make the env var controlling which SIMD level to use more capable
2024-02-25 09:57:38 +05:30