kitty

mirror of https://github.com/kovidgoyal/kitty synced 2026-07-25 09:48:09 +02:00

Author	SHA1	Message	Date
Kovid Goyal	f48e4ffd5e	Port aligned load based find algorithm to C	2024-02-25 09:57:42 +05:30
Kovid Goyal	c01b959723	Fix Go unaligned index implementation	2024-02-25 09:57:42 +05:30
Kovid Goyal	36773c09d3	Functions to get bytes to first match ignoring leading bytes	2024-02-25 09:57:42 +05:30
Kovid Goyal	687340003d	...	2024-02-25 09:57:42 +05:30
Kovid Goyal	7467307200	Add some alignment tests	2024-02-25 09:57:42 +05:30
Kovid Goyal	bbdb0b15f3	DRYer	2024-02-25 09:57:42 +05:30
Kovid Goyal	b5edd9ad57	Don't precalculate mask in loop body. No need since we don't shift. Avoids the extra mask instructions for the not-found case.	2024-02-25 09:57:42 +05:30
Kovid Goyal	a32e1aafa6	...	2024-02-25 09:57:41 +05:30
Kovid Goyal	f9fd6ffd46	Use only aligned loads for index funcs. Also obviates the necessity for safe slice wrappers	2024-02-25 09:57:41 +05:30
Kovid Goyal	31a5fcf297	DRYer	2024-02-25 09:57:41 +05:30
Kovid Goyal	493fc900e9	Fix build on ARM	2024-02-25 09:57:41 +05:30
Kovid Goyal	3abdc54e4b	...	2024-02-25 09:57:41 +05:30
Kovid Goyal	618aeec709	Finally got gnome-terminal to run on my system. Apparently it needed some kind of GTK desktop portal or the other 🙄 Interesting that its numbers are basically the same as alacritty's. Lot better than I remember, I guess the recent libvte performance work was good.	2024-02-25 09:57:41 +05:30
Kovid Goyal	4585361161	Micro optimization	2024-02-25 09:57:41 +05:30
Kovid Goyal	f64739c29b	Fix regression that broke handling of single byte control chars when cursor is on second cell of wide character	2024-02-25 09:57:41 +05:30
Kovid Goyal	f3830aa854	Avoid unnecessary if	2024-02-25 09:57:41 +05:30
Kovid Goyal	f1fe0bf40a	Code to easily compare SIMD and scalar decode in a live instance. Also remove -mtune=intel as it fails with clang	2024-02-25 09:57:41 +05:30
Kovid Goyal	561712090d	Fix cmplt implementation	2024-02-25 09:57:41 +05:30
Kovid Goyal	d5f34c401d	Better vector registers to pre-calculate before the loop	2024-02-25 09:57:41 +05:30
Kovid Goyal	d9190ea675	DRYer	2024-02-25 09:57:41 +05:30
Kovid Goyal	57f4ea4d4a	Add some tests for broadcast from constant intrinsic	2024-02-25 09:57:41 +05:30
Kovid Goyal	9b0ae8d403	Dont use VEX encoded instructions for 128 bit ISA	2024-02-25 09:57:41 +05:30
Kovid Goyal	aed0611fb8	Avoid double trailing RET	2024-02-25 09:57:40 +05:30
Kovid Goyal	920b8a2496	Use VZEROUPPER in avx functions. See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf	2024-02-25 09:57:40 +05:30
Kovid Goyal	5a5e31c38b	Also zero upper at start of function	2024-02-25 09:57:40 +05:30
Kovid Goyal	db2e0e816d	Fix mixing of register types in the same function	2024-02-25 09:57:40 +05:30
Kovid Goyal	a298781b85	DRYer	2024-02-25 09:57:40 +05:30
Kovid Goyal	d5cd9ef2ca	...	2024-02-25 09:57:40 +05:30
Kovid Goyal	55c909c656	Use -mtune=intel for SIMD files when building without native optimizations	2024-02-25 09:57:40 +05:30
Kovid Goyal	da31db3212	...	2024-02-25 09:57:40 +05:30
Kovid Goyal	601c4ad4df	Fix some typos	2024-02-25 09:57:40 +05:30
Kovid Goyal	2549b4328f	Update throughput comparison table in light of latest improvements	2024-02-25 09:57:40 +05:30
Kovid Goyal	68d800d4fa	make clean should clean generated asm as well	2024-02-25 09:57:40 +05:30
Kovid Goyal	9fc3db1dd1	Work on C0 index func	2024-02-25 09:57:40 +05:30
Kovid Goyal	d4c4805f96	const away to glory	2024-02-25 09:57:40 +05:30
Kovid Goyal	161eae78b6	Make generated asm_* files world readable	2024-02-25 09:57:40 +05:30
Kovid Goyal	6cdc7ac91d	A further 5% speedup for UTF-8 decoding. Achieved by decoding in larger chunks thereby amortizing the cost of creating various constant vectors over larger chunks.	2024-02-25 09:57:40 +05:30
Kovid Goyal	0bccada9d1	No longer need to abort after dealing with trailing bytes	2024-02-25 09:57:40 +05:30
Kovid Goyal	9cb9373274	Allow unbounded output in UTF8Decoder. This will allow us to eventually decode more than a single vector's worth in a fast inner loop	2024-02-25 09:57:39 +05:30
Kovid Goyal	d987ffe49a	Use unaligned stores. Makes no measurable difference in the benchmark. And will eventually allow us to process larger chunks of data without needing to reset a bunch of vector registers to constant values each time.	2024-02-25 09:57:39 +05:30
Kovid Goyal	77cfd44f24	More efficient clearing of register to all zeros or all ones	2024-02-25 09:57:39 +05:30
Kovid Goyal	59be7213cf	Make set1_epi8 more general	2024-02-25 09:57:39 +05:30
Kovid Goyal	d60dacbd09	Implement > and < intrinsics for vector registers	2024-02-25 09:57:39 +05:30
Kovid Goyal	82b7b4fcce	Make a reusable template for generating ASM index functions with different tests	2024-02-25 09:57:39 +05:30
Kovid Goyal	fa9a2b1e2e	Switch file input to use new SIMD parser to search for \n and \r in parallel	2024-02-25 09:57:39 +05:30
Kovid Goyal	4e6138d785	Generate SIMD code during build	2024-02-25 09:57:39 +05:30
Kovid Goyal	86a55e2c0a	Use an aligned slice for file reads	2024-02-25 09:57:39 +05:30
Kovid Goyal	de8c1e0206	Work on porting SIMD VT parser to Go for the kittens	2024-02-25 09:57:39 +05:30
Kovid Goyal	131716da00	Ignore another warning on some compiler versions in simde	2024-02-25 09:57:39 +05:30
Kovid Goyal	4d35fc2928	Use a custom movmask for ARM rather than the one from simde. Supposedly faster, not that I can measure it, but... Also gives neater code, so keep it.	2024-02-25 09:57:39 +05:30

13456 Commits