kitty

mirror of https://github.com/kovidgoyal/kitty synced 2026-07-28 03:01:57 +02:00

Author	SHA1	Message	Date
Kovid Goyal	920b8a2496	Use VZEROUPPER in avx functions. See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf	2024-02-25 09:57:40 +05:30
Kovid Goyal	5a5e31c38b	Also zero upper at start of function	2024-02-25 09:57:40 +05:30
Kovid Goyal	db2e0e816d	Fix mixing of register types in the same function	2024-02-25 09:57:40 +05:30
Kovid Goyal	a298781b85	DRYer	2024-02-25 09:57:40 +05:30
Kovid Goyal	d5cd9ef2ca	...	2024-02-25 09:57:40 +05:30
Kovid Goyal	55c909c656	Use -mtune=intel for SIMD files when building without native optimizations	2024-02-25 09:57:40 +05:30
Kovid Goyal	da31db3212	...	2024-02-25 09:57:40 +05:30
Kovid Goyal	601c4ad4df	Fix some typos	2024-02-25 09:57:40 +05:30
Kovid Goyal	2549b4328f	Update throughput comparison table in light of latest improvements	2024-02-25 09:57:40 +05:30
Kovid Goyal	68d800d4fa	make clean should clean generated asm as well	2024-02-25 09:57:40 +05:30
Kovid Goyal	9fc3db1dd1	Work on C0 index func	2024-02-25 09:57:40 +05:30
Kovid Goyal	d4c4805f96	const away to glory	2024-02-25 09:57:40 +05:30
Kovid Goyal	161eae78b6	Make generated asm_* files world readable	2024-02-25 09:57:40 +05:30
Kovid Goyal	6cdc7ac91d	A further 5% speedup for UTF-8 decoding. Achieved by decoding in larger chunks thereby amortizing the cost of creating various constant vectors over larger chunks.	2024-02-25 09:57:40 +05:30
Kovid Goyal	0bccada9d1	No longer need to abort after dealing with trailing bytes	2024-02-25 09:57:40 +05:30
Kovid Goyal	9cb9373274	Allow unbounded output in UTF8Decoder. This will allow us to eventually decode more than a single vector's worth in a fast inner loop	2024-02-25 09:57:39 +05:30
Kovid Goyal	d987ffe49a	Use unaligned stores. Makes no measurable difference in the benchmark. And will eventually allow us to process larger chunks of data without need to reset a bunch of vector registers to constant values each time.	2024-02-25 09:57:39 +05:30
Kovid Goyal	77cfd44f24	More efficient clearing of register to all zeros or all ones	2024-02-25 09:57:39 +05:30
Kovid Goyal	59be7213cf	Make set1_epi8 more general	2024-02-25 09:57:39 +05:30
Kovid Goyal	d60dacbd09	Implement > and < intrinsics for vector registers	2024-02-25 09:57:39 +05:30
Kovid Goyal	82b7b4fcce	Make a re-useable template for generating ASM index functions with different tests	2024-02-25 09:57:39 +05:30
Kovid Goyal	fa9a2b1e2e	Switch file input to use new SIMD parser to search for \n and \r in parallel	2024-02-25 09:57:39 +05:30
Kovid Goyal	4e6138d785	Generate SIMD code during build	2024-02-25 09:57:39 +05:30
Kovid Goyal	86a55e2c0a	Use an aligned slice for file reads	2024-02-25 09:57:39 +05:30
Kovid Goyal	de8c1e0206	Work on porting SIMD vt parser to Go for the kittens	2024-02-25 09:57:39 +05:30
Kovid Goyal	131716da00	Ignore another warning on some compiler versions in simde	2024-02-25 09:57:39 +05:30
Kovid Goyal	4d35fc2928	Use a custom movmask for ARM rather than the one from simde. Supposedly faster, not that I can measure it, but... Also gives neater code, so keep it.	2024-02-25 09:57:39 +05:30
Kovid Goyal	3b65c1a58a	remove declaration without implementation	2024-02-25 09:57:39 +05:30
Kovid Goyal	9bca415af2	Use aligned loads when finding either of two bytes. No measurable performance improvement, but neater algorithm anyway.	2024-02-25 09:57:39 +05:30
Kovid Goyal	60bc8e6c25	...	2024-02-25 09:57:39 +05:30
Kovid Goyal	8aa1b112b8	Turns out the simde implementation of movemask is not slow enough to compensate for the speed bump from 256 bit	2024-02-25 09:57:39 +05:30
Kovid Goyal	0bd47d8457	Cleanup KITTY_NO_SIMD compilation	2024-02-25 09:57:39 +05:30
Kovid Goyal	fcbda63023	Move finding byte code into separate functions. movemask() is inefficient on ARM64; this will allow us to use a dedicated implementation for finding bytes on that platform	2024-02-25 09:57:38 +05:30
Kovid Goyal	1d59bfade3	...	2024-02-25 09:57:38 +05:30
Kovid Goyal	fd7d0f8787	Fix event loop continuously ticking every input_delay seconds even when no input is available	2024-02-25 09:57:38 +05:30
Kovid Goyal	fa11858a72	Make bash integration tests more robust on macOS	2024-02-25 09:57:38 +05:30
Kovid Goyal	1293ee60e0	...	2024-02-25 09:57:38 +05:30
Kovid Goyal	66341aa28e	Make the env var controlling which SIMD level to use more capable	2024-02-25 09:57:38 +05:30
Kovid Goyal	73342411bc	Don't build any SIMD code when the target is neither ARM64 nor x86/amd64	2024-02-25 09:57:38 +05:30
Kovid Goyal	8dd6f9b07c	Get universal builds working again. Now we use lipo and build individually so we can pass the correct compiler flags per arch	2024-02-25 09:57:38 +05:30
Kovid Goyal	7e77a196e6	Build only the SIMD code with SIMD compiler flags	2024-02-25 09:57:38 +05:30
Kovid Goyal	465616223c	Drop using the v2 microarch. No significant performance impact and small risk of breakage	2024-02-25 09:57:38 +05:30
Kovid Goyal	9d4193f4ea	Fix texture ref not useable on repurposed image object	2024-02-25 09:57:38 +05:30
Kovid Goyal	dafb876d75	Skip simd parser tests on machines without SIMD instructions	2024-02-25 09:57:38 +05:30
Kovid Goyal	4b846e0106	Turns out that using 256 bit code on ARM is slightly faster even though it is emulated with 128 bit registers	2024-02-25 09:57:38 +05:30
Kovid Goyal	76c6630084	Don't use 256 bit code paths on ARM. ARM only has 128 bit registers. simde simulates 256 bit operations using them, which is fairly pointless for us.	2024-02-25 09:57:38 +05:30
Kovid Goyal	23a4012aeb	Add an env var to turn off use of SIMD instructions	2024-02-25 09:57:38 +05:30
Kovid Goyal	eee14ae148	Workaround for machines on GitHub Actions that incorrectly report CPU vector instruction availability	2024-02-25 09:57:37 +05:30
Kovid Goyal	b0ccaa09be	Clean up test env reporting	2024-02-25 09:57:37 +05:30
Kovid Goyal	bbaccfdaae	DRYer	2024-02-25 09:57:37 +05:30

13433 Commits