Commit Graph

13456 Commits

Author SHA1 Message Date
Kovid Goyal
f48e4ffd5e Port aligned load based find algorithm to C 2024-02-25 09:57:42 +05:30
Kovid Goyal
c01b959723 Fix Go unaligned index implementation 2024-02-25 09:57:42 +05:30
Kovid Goyal
36773c09d3 Functions to get bytes to first match ignoring leading bytes 2024-02-25 09:57:42 +05:30
Kovid Goyal
687340003d ... 2024-02-25 09:57:42 +05:30
Kovid Goyal
7467307200 Add some alignment tests 2024-02-25 09:57:42 +05:30
Kovid Goyal
bbdb0b15f3 DRYer 2024-02-25 09:57:42 +05:30
Kovid Goyal
b5edd9ad57 Don't precalculate mask in loop body
No need since we don't shift. Avoids the extra mask instructions for the
not-found case.
2024-02-25 09:57:42 +05:30
Kovid Goyal
a32e1aafa6 ... 2024-02-25 09:57:41 +05:30
Kovid Goyal
f9fd6ffd46 Use only aligned loads for index funcs
Also obviates the necessity for safe slice wrappers
2024-02-25 09:57:41 +05:30
Kovid Goyal
31a5fcf297 DRYer 2024-02-25 09:57:41 +05:30
Kovid Goyal
493fc900e9 Fix build on ARM 2024-02-25 09:57:41 +05:30
Kovid Goyal
3abdc54e4b ... 2024-02-25 09:57:41 +05:30
Kovid Goyal
618aeec709 Finally got gnome-terminal to run on my system
Apparently it needed some kind of GTK desktop portal or the other
🙄

Interesting that its numbers are basically the same as alacritty's. Lot
better than I remember, I guess the recent libvte performance work was
good.
2024-02-25 09:57:41 +05:30
Kovid Goyal
4585361161 Micro optimization 2024-02-25 09:57:41 +05:30
Kovid Goyal
f64739c29b Fix regression that broke handling of single byte control chars when cursor is on second cell of wide character 2024-02-25 09:57:41 +05:30
Kovid Goyal
f3830aa854 Avoid unnecessary if 2024-02-25 09:57:41 +05:30
Kovid Goyal
f1fe0bf40a Code to easily compare SIMD and scalar decode in a live instance
Also remove -mtune=intel as it fails with clang
2024-02-25 09:57:41 +05:30
Kovid Goyal
561712090d Fix cmplt implementation 2024-02-25 09:57:41 +05:30
Kovid Goyal
d5f34c401d Better vector registers to pre-calculate before the loop 2024-02-25 09:57:41 +05:30
Kovid Goyal
d9190ea675 DRYer 2024-02-25 09:57:41 +05:30
Kovid Goyal
57f4ea4d4a Add some tests for broadcast from constant intrinsic 2024-02-25 09:57:41 +05:30
Kovid Goyal
9b0ae8d403 Don't use VEX-encoded instructions for 128-bit ISA 2024-02-25 09:57:41 +05:30
Kovid Goyal
aed0611fb8 Avoid double trailing RET 2024-02-25 09:57:40 +05:30
Kovid Goyal
920b8a2496 Use VZEROUPPER in avx functions
See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf
2024-02-25 09:57:40 +05:30
Kovid Goyal
5a5e31c38b Also zero upper at start of function 2024-02-25 09:57:40 +05:30
Kovid Goyal
db2e0e816d Fix mixing of register types in the same function 2024-02-25 09:57:40 +05:30
Kovid Goyal
a298781b85 DRYer 2024-02-25 09:57:40 +05:30
Kovid Goyal
d5cd9ef2ca ... 2024-02-25 09:57:40 +05:30
Kovid Goyal
55c909c656 Use -mtune=intel for SIMD files when building without native optimizations 2024-02-25 09:57:40 +05:30
Kovid Goyal
da31db3212 ... 2024-02-25 09:57:40 +05:30
Kovid Goyal
601c4ad4df Fix some typos 2024-02-25 09:57:40 +05:30
Kovid Goyal
2549b4328f Update throughput comparison table in light of latest improvements 2024-02-25 09:57:40 +05:30
Kovid Goyal
68d800d4fa make clean should clean generated asm as well 2024-02-25 09:57:40 +05:30
Kovid Goyal
9fc3db1dd1 Work on C0 index func 2024-02-25 09:57:40 +05:30
Kovid Goyal
d4c4805f96 const away to glory 2024-02-25 09:57:40 +05:30
Kovid Goyal
161eae78b6 Make generated asm_* files world readable 2024-02-25 09:57:40 +05:30
Kovid Goyal
6cdc7ac91d A further 5% speedup for UTF-8 decoding
Achieved by decoding in larger chunks thereby amortizing the cost
of creating various constant vectors over larger chunks.
2024-02-25 09:57:40 +05:30
Kovid Goyal
0bccada9d1 No longer need to abort after dealing with trailing bytes 2024-02-25 09:57:40 +05:30
Kovid Goyal
9cb9373274 Allow unbounded output in UTF8Decoder
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal
d987ffe49a Use unaligned stores
Makes no measurable difference in the benchmark. And will eventually
allow us to process larger chunks of data without need to reset a bunch
of vector registers to constant values each time.
2024-02-25 09:57:39 +05:30
Kovid Goyal
77cfd44f24 More efficient clearing of register to all zeros or all ones 2024-02-25 09:57:39 +05:30
Kovid Goyal
59be7213cf Make set1_epi8 more general 2024-02-25 09:57:39 +05:30
Kovid Goyal
d60dacbd09 Implement > and < intrinsics for vector registers 2024-02-25 09:57:39 +05:30
Kovid Goyal
82b7b4fcce Make a reusable template for generating ASM index functions with different tests 2024-02-25 09:57:39 +05:30
Kovid Goyal
fa9a2b1e2e Switch file input to use new SIMD parser to search for \n and \r in parallel 2024-02-25 09:57:39 +05:30
Kovid Goyal
4e6138d785 Generate SIMD code during build 2024-02-25 09:57:39 +05:30
Kovid Goyal
86a55e2c0a Use an aligned slice for file reads 2024-02-25 09:57:39 +05:30
Kovid Goyal
de8c1e0206 Work on porting SIMD VT parser to Go for the kittens 2024-02-25 09:57:39 +05:30
Kovid Goyal
131716da00 Ignore another warning on some compiler versions in simde 2024-02-25 09:57:39 +05:30
Kovid Goyal
4d35fc2928 Use a custom movmask for ARM rather than the one from simde
Supposedly faster, not that I can measure it, but...
Also gives neater code, so keep it.
2024-02-25 09:57:39 +05:30