kitty

mirror of https://github.com/kovidgoyal/kitty synced 2026-06-12 19:49:32 +02:00

Author	SHA1	Message	Date
Kovid Goyal	eb07307370	Ignore pedantic warnings from simde headers	2024-04-30 09:54:14 +05:30
Kovid Goyal	393169f79d	Fix #7225	2024-03-14 20:55:05 +05:30
Kovid Goyal	daeaf65d7e	fix compiler warning	2024-02-25 11:17:26 +05:30
Kovid Goyal	f4f06222d4	...	2024-02-25 09:57:44 +05:30
Kovid Goyal	ad3ab877f8	Use a fast SIMD implementation to XOR data going into the disk cache	2024-02-25 09:57:43 +05:30
Kovid Goyal	1db7ac5f6b	Use our new shift by n functions to improve function to zero last N bytes. Benchmark neutral but cleaner code using one less vector register and equal number of operations.	2024-02-25 09:57:43 +05:30
Kovid Goyal	e77a970ca1	Also implement arbitrary byte shift for 128 bit registers	2024-02-25 09:57:43 +05:30
Kovid Goyal	a7c06b38e6	We dont actually need vzeroupper at start of function. GCC emits vzeroupper automatically when compiling with native optimizations but we still need it otherwise	2024-02-25 09:57:43 +05:30
Kovid Goyal	0a1eb038a5	Implement functions for arbitrary byte shifts in vector registers	2024-02-25 09:57:42 +05:30
Kovid Goyal	eb1e3b33b4	Fix test failure on some systems. Broken ass compilers strike again	2024-02-25 09:57:42 +05:30
Kovid Goyal	b021e9b648	Do the default func test last so we can see what the failure is on more explicitly	2024-02-25 09:57:42 +05:30
Kovid Goyal	1acd223f45	...	2024-02-25 09:57:42 +05:30
Kovid Goyal	f48e4ffd5e	Port aligned load based find algorithm to C	2024-02-25 09:57:42 +05:30
Kovid Goyal	36773c09d3	Functions to get bytes to first match ignoring leading bytes	2024-02-25 09:57:42 +05:30
Kovid Goyal	687340003d	...	2024-02-25 09:57:42 +05:30
Kovid Goyal	493fc900e9	Fix build on ARM	2024-02-25 09:57:41 +05:30
Kovid Goyal	f1fe0bf40a	Code to easily compare SIMD and scalar decode in a live instance. Also remove -mtune=intel as it fails with clang	2024-02-25 09:57:41 +05:30
Kovid Goyal	d5f34c401d	Better vector registers to pre-calculate before the loop	2024-02-25 09:57:41 +05:30
Kovid Goyal	920b8a2496	Use VZEROUPPER in avx functions. See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf	2024-02-25 09:57:40 +05:30
Kovid Goyal	d4c4805f96	const away to glory	2024-02-25 09:57:40 +05:30
Kovid Goyal	6cdc7ac91d	A further 5% speedup for UTF-8 decoding. Achieved by decoding in larger chunks thereby amortizing the cost of creating various constant vectors over larger chunks.	2024-02-25 09:57:40 +05:30
Kovid Goyal	0bccada9d1	No longer need to abort after dealing with trailing bytes	2024-02-25 09:57:40 +05:30
Kovid Goyal	9cb9373274	Allow unbounded output in UTF8Decoder. This will allow us to eventually decode more than a single vector's worth in a fast inner loop	2024-02-25 09:57:39 +05:30
Kovid Goyal	d987ffe49a	Use unaligned stores. Makes no measurable difference in the benchmark. And will eventually allow us to process larger chunks of data without need to reset a bunch of vector registers to constant values each time.	2024-02-25 09:57:39 +05:30
Kovid Goyal	131716da00	Ignore another warning on some compiler versions in simde	2024-02-25 09:57:39 +05:30
Kovid Goyal	4d35fc2928	Use a custom movmask for ARM rather than the one from simde. Supposedly faster, not that I can measure it, but... Also gives neater code, so keep it.	2024-02-25 09:57:39 +05:30
Kovid Goyal	9bca415af2	Use aligned loads when finding either of two bytes. No measurable performance improvement, but neater algorithm anyway.	2024-02-25 09:57:39 +05:30
Kovid Goyal	60bc8e6c25	...	2024-02-25 09:57:39 +05:30
Kovid Goyal	8aa1b112b8	Turns out the simde implementation of movemask is not slow enough to compensate for the speed bump from 256 bit	2024-02-25 09:57:39 +05:30
Kovid Goyal	0bd47d8457	Cleanup KITTY_NO_SIMD compilation	2024-02-25 09:57:39 +05:30
Kovid Goyal	fcbda63023	Move finding byte code into separate functions. movemask() is inefficient on ARM64 this will allow us to use a dedicated implementation for finding bytes on that platform	2024-02-25 09:57:38 +05:30
Kovid Goyal	73342411bc	Dont build any SIMD code when the target is neither ARM64 nor x86/amd64	2024-02-25 09:57:38 +05:30
Kovid Goyal	8dd6f9b07c	Get universal builds working again. Now we use lipo and build individually so we can pass the correct compiler flags per arch	2024-02-25 09:57:38 +05:30
Kovid Goyal	7e77a196e6	Build only the SIMD code with SIMD compiler flags	2024-02-25 09:57:38 +05:30
Kovid Goyal	0e4c49a0d6	Fix building on macOS ARM	2024-02-25 09:57:35 +05:30
Kovid Goyal	e783eccc97	fix handling of bits from high byte of 4 byte sequences	2024-02-25 09:57:35 +05:30
Kovid Goyal	7e6459a5e4	DRYer	2024-02-25 09:57:35 +05:30
Kovid Goyal	67d22b0ec6	Avoid multiple branches for checking for trailing sequence	2024-02-25 09:57:34 +05:30
Kovid Goyal	79f99bb3ad	Make print_register useable without full debug	2024-02-25 09:57:34 +05:30
Kovid Goyal	fa3579656b	More invalid utf-8 tests	2024-02-25 09:57:34 +05:30
Kovid Goyal	8a10fcaf5a	More tests	2024-02-25 09:57:34 +05:30
Kovid Goyal	4c8b8caead	Handle trailing incomplete sequences	2024-02-25 09:57:34 +05:30
Kovid Goyal	4238fedee7	More tests	2024-02-25 09:57:34 +05:30
Kovid Goyal	b0dcdf74bd	More tests and micro-optimize switch to ASCII fast path	2024-02-25 09:57:34 +05:30
Kovid Goyal	a63d62fb4e	...	2024-02-25 09:57:34 +05:30
Kovid Goyal	8dbb0cff6f	Dont call __builtin_ctz with zero	2024-02-25 09:57:34 +05:30
Kovid Goyal	07bba337f5	fix various bugs in AVX2 utility functions	2024-02-25 09:57:34 +05:30
Kovid Goyal	b28fbf6817	fix zero-ing of last n bytes	2024-02-25 09:57:34 +05:30
Kovid Goyal	daa169b8ed	More work on utf8 SIMD decode	2024-02-25 09:57:34 +05:30
Kovid Goyal	a5251bedc9	More work on SIMD utf8 decode	2024-02-25 09:57:34 +05:30

67 Commits