Compare commits

755 Commits

Author SHA1 Message Date
Kovid Goyal
2b671100d9 version 0.34.0 2024-04-15 06:54:21 +05:30
Kovid Goyal
ed6ab27a67 Fix #7342 2024-04-14 07:35:05 +05:30
Kovid Goyal
e7fb4376c0 ... 2024-04-12 15:26:29 +05:30
Kovid Goyal
684d28d328 Fix flickering of prompt during window resize
Works by keeping the old prompt unreflowed rather than clearing it.
There may still be some flicker for people using long or right side
prompts, but that can't be avoided, since we cannot know how the shell
will redraw after the resize. But in the common case of a left side
smallish prompt that fits in the resized window, the flicker is
eliminated.

It means we have to do some more copying work on resize, but the nicer
visuals are worth it, IMO.
2024-04-12 15:16:34 +05:30
Kovid Goyal
a5b0db3219 Add a note about the remote_control scroll-window action to the docs for the default scroll actions 2024-04-12 15:09:25 +05:30
Kovid Goyal
cf0a5fb607 Expose pause_rendering to Python 2024-04-12 11:39:24 +05:30
Kovid Goyal
e0374ee623 Avoid pointlessly querying window pos on Wayland 2024-04-12 11:31:04 +05:30
Kovid Goyal
655494f37b Exclude tests from pylsp type checking 2024-04-12 08:52:23 +05:30
Kovid Goyal
353a56dbbf Wayland: Fix initial font size wrong when using fractional scales
We have to check for window scale after window is shown as Wayland has
this crazy design where the compositor only sets fractional scale after
the window is shown. Which means kitty has to do useless work
calculating font metrics twice. Sigh.
2024-04-12 08:29:53 +05:30
Kovid Goyal
1c8fd0ccc4 When asking for quit confirmation because of a running program, mention the program name
Fixes #7331
2024-04-11 14:55:16 +05:30
Kovid Goyal
1c3d3ad9be Fix report_device_attributes 2024-04-10 08:24:39 +05:30
Kovid Goyal
e66b8a47d4 Add support for screen_erase_characters ECH 2024-04-10 08:04:56 +05:30
Kovid Goyal
e57692e4f5 Possibly fix #7327 2024-04-09 21:19:25 +05:30
Kovid Goyal
437fc0d8c2 Revert renaming of kitty.desktop to kitty-terminal.desktop
Changing the default value for application id/WM_CLASS is a no go, since
existing scripts can depend on it and I try to avoid breaking people's
workflows wherever possible. Guess xdg-mime will just have to live with
the horror of an unhyphenated file name.

Fixes #7326
2024-04-09 12:47:14 +05:30
Kovid Goyal
3d98b33076 ... 2024-04-09 08:39:42 +05:30
Kovid Goyal
a444b5eccb Only use raw monotonic time on Linux and macOS 2024-04-09 08:21:20 +05:30
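As a rough illustration of the platform restriction in the commit above, the sketch below reads the raw monotonic clock where the platform exposes it and otherwise falls back to the ordinary monotonic clock. It is written in Python for brevity and is not kitty's C or Go code; the function name `monotonic_raw_or_fallback` is hypothetical.

```python
import time


def monotonic_raw_or_fallback() -> float:
    """Return seconds from CLOCK_MONOTONIC_RAW if available, else time.monotonic().

    Hypothetical helper for illustration only; kitty's real implementation
    lives in its C and Go sources.
    """
    clock_id = getattr(time, "CLOCK_MONOTONIC_RAW", None)
    if clock_id is not None and hasattr(time, "clock_gettime"):
        try:
            return time.clock_gettime(clock_id)
        except OSError:
            # The constant exists but the kernel rejected it; use the fallback.
            pass
    return time.monotonic()
```

Note that the two clocks do not share an origin, so a program should pick one at startup and only ever subtract readings taken from the same clock.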
Kovid Goyal
6c64428be9 CLOCK_MONOTONIC_RAW support for Go 2024-04-09 08:04:15 +05:30
Kovid Goyal
d034bcb1ac ... 2024-04-09 07:11:48 +05:30
Kovid Goyal
325f8df709 text formatting 2024-04-09 07:09:15 +05:30
Kovid Goyal
996a821bf8 Update changelog 2024-04-09 07:05:09 +05:30
Kovid Goyal
ac4eef7eb3 Another try at pointer frame support on Wayland
This time I think I got the wheel handling correct. At least works with
my wheel mouse.
2024-04-08 19:07:52 +05:30
Kovid Goyal
b48b53fce9 Next version will be 0.34.0 2024-04-08 13:35:16 +05:30
Kovid Goyal
acf3fef03d Note when the panel kitten got support for Wayland 2024-04-08 13:34:36 +05:30
Kovid Goyal
48ed574b4f ... 2024-04-08 13:04:11 +05:30
Kovid Goyal
8fc96c5bd7 Make the debug logging functions consistent
They now all output the same format of:
[time since program start] msg
2024-04-08 12:53:55 +05:30
Kovid Goyal
208490f4e1 ... 2024-04-08 11:18:32 +05:30
Kovid Goyal
d392aba64d Wayland CSD: Don't render window shadows for docked windows 2024-04-08 10:59:03 +05:30
Kovid Goyal
597710dd53 Add StartupNotify to kitty.desktop
See https://gitlab.gnome.org/GNOME/mutter/-/issues/2739

Also rename kitty.desktop to kitty-terminal.desktop as otherwise
xdg-menu-install complains about no vendor prefix.
2024-04-08 10:00:06 +05:30
Kovid Goyal
d630b3d8a7 Change the maximize icon to restore when window is maximized 2024-04-08 09:36:38 +05:30
Kovid Goyal
aebfab3777 Merge branch 'dependabot/go_modules/all-go-deps-f04424bbb8' of https://github.com/kovidgoyal/kitty 2024-04-08 09:21:26 +05:30
dependabot[bot]
91f699b571 Bump golang.org/x/sys from 0.18.0 to 0.19.0 in the all-go-deps group
Bumps the all-go-deps group with 1 update: [golang.org/x/sys](https://github.com/golang/sys).


Updates `golang.org/x/sys` from 0.18.0 to 0.19.0
- [Commits](https://github.com/golang/sys/compare/v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: all-go-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-08 03:47:53 +00:00
Kovid Goyal
b46d1d8d21 Fix #7321 2024-04-08 07:39:50 +05:30
Kovid Goyal
bf60321466 Use individual surfaces for corner and bar shadows
Simplifies a bunch of code and also gives us the option at a later date
to turn off some shadows selectively when the window is in a tiled
state.
2024-04-07 22:28:41 +05:30
Kovid Goyal
38daac868a Only run bind --function-names once 2024-04-07 15:15:38 +05:30
Kovid Goyal
0b27f2cbe0 Merge branch 'fish-osc-133' of https://github.com/krobelus/kitty 2024-04-07 15:15:04 +05:30
Kovid Goyal
eb96830aa0 Make CSD API functions naming consistent 2024-04-07 10:18:13 +05:30
Kovid Goyal
334bb36745 Don't enable CSD for non XDG top-level windows such as layer shell surfaces 2024-04-07 10:08:02 +05:30
Kovid Goyal
60f9bcf51c Document the extra fields in the prompt marking escape code that kitty supports 2024-04-07 09:50:14 +05:30
Kovid Goyal
65fadf4ed3 Update changelog 2024-04-07 09:04:23 +05:30
Johannes Altmanninger
8951581815 fish integration: drop redundant OSC 133 markers in upcoming fish 3.8
The upcoming fish 3.8 release will output OSC 133 sequences
unconditionally [1].

I tested ctrl-shift-{g,x,z} bindings both without and with kitty's
shell integration on top; everything seems to work.

Let's simplify kitty integration by removing the markers for the
upcoming fish >= 3.8.

I have hopes that the native OSC 133 implementation addresses #7200
though I'm not sure if I could reproduce this bug (I only saw a
similar bug when `fish_handle_reflow` was not enabled, which fish
also does now (same commit)).
cc @iacore let me know if you can reproduce #7200 with latest fish master.

[1]: 3b9e3e251b
2024-04-06 22:47:13 +02:00
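For readers unfamiliar with the markers discussed in the commit above, here is a minimal sketch of what an OSC 133 prompt-marking sequence looks like on the wire. It assumes the common convention where `A` marks the start of a prompt and `C` marks the start of command output, terminated with ST (`ESC \`). The helper name `osc133` is hypothetical, and this is neither fish's nor kitty's integration code.

```python
ESC = "\x1b"
ST = ESC + "\\"  # String Terminator


def osc133(mark: str) -> str:
    """Build an OSC 133 sequence for a single-letter mark (hypothetical helper)."""
    if len(mark) != 1 or not mark.isalpha():
        raise ValueError("mark must be a single letter")
    return f"{ESC}]133;{mark}{ST}"


PROMPT_START = osc133("A")   # emitted just before the shell draws its prompt
OUTPUT_START = osc133("C")   # emitted just before a command's output begins
```

A shell that emits these itself makes a second, integration-provided copy redundant, which is the simplification the commit describes.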
Johannes Altmanninger
4dc1e733a7 doc keyboard protocol: mention upcoming support in fish 2024-04-06 22:46:58 +02:00
Kovid Goyal
283eba9667 Wayland: Respect top level bounds sent by compositor 2024-04-06 17:29:17 +05:30
Kovid Goyal
c651312a88 Adjust button colors 2024-04-06 17:24:55 +05:30
Kovid Goyal
5b1fdc34eb Wire up the buttons 2024-04-06 14:58:31 +05:30
Kovid Goyal
a158fa108b CSD pointer enter is the same as move 2024-04-06 12:51:09 +05:30
Kovid Goyal
60cb0fa650 Move CSD pointer handling code into CSD file 2024-04-06 12:49:07 +05:30
Kovid Goyal
67314bf2fb Add settings that are optimal for latency 2024-04-06 11:48:55 +05:30
Kovid Goyal
d4cc5aa698 Report errors when attempts are made to perform actions the compositor doesn't support 2024-04-06 11:09:22 +05:30
Kovid Goyal
235b8dc2e4 Assume all capabilities on compositors that don't support reporting of capabilities 2024-04-06 10:59:28 +05:30
Kovid Goyal
4f6faddbab Implement rendering of window control buttons in CSD
They still need to be wired up
2024-04-06 08:32:07 +05:30
Kovid Goyal
416d52bdac ... 2024-04-05 21:33:41 +05:30
Kovid Goyal
2d18e0be81 Fix #7311 2024-04-05 20:00:08 +05:30
Kovid Goyal
9225bd772d Reduce CSD API surface 2024-04-05 19:57:43 +05:30
Kovid Goyal
9e55951d5a ... 2024-04-05 19:37:41 +05:30
Kovid Goyal
d9663aa135 Make code a little clearer 2024-04-05 19:29:51 +05:30
Kovid Goyal
9d86448585 Wayland: Allow hiding window decorations on compositors with SSD as well 2024-04-05 19:23:42 +05:30
Kovid Goyal
2c4ffba0f3 Wayland: A new option to turn off IME 2024-04-05 14:56:11 +05:30
Kovid Goyal
f9e38d3311 Nicer debug output for IME text commit event 2024-04-05 13:53:17 +05:30
Kovid Goyal
1317a7c4ac show-key kitten: Show plain text received not associated with a key event 2024-04-05 13:40:41 +05:30
Kovid Goyal
8f19d7aa8b ... 2024-04-05 13:29:56 +05:30
Kovid Goyal
18b595a7e7 Map keymap fd using MAP_PRIVATE as required by the spec
Also report failures
2024-04-05 13:10:04 +05:30
Kovid Goyal
676c426e87 Use 1 rather than 0 as the keycode for the special mouse click key in fish
zero is used for key events from the wayland text input system that have
only text and no associated key
2024-04-05 13:01:58 +05:30
Kovid Goyal
0198b7fa5a ... 2024-04-05 12:57:58 +05:30
Kovid Goyal
8413d298df remove unused code 2024-04-05 12:09:08 +05:30
Kovid Goyal
77d637cc47 Wayland GNOME: Less jarring title introducer in CSD 2024-04-05 11:44:43 +05:30
Kovid Goyal
9faeb3e2ce Add the XM and xm terminfo capabilities
These are apparently used by vim for mouse protocol detection. Sigh.
2024-04-05 08:08:00 +05:30
Kovid Goyal
1bffe89b5d Wayland GNOME: titlebar color now follows system theme
When GNOME system theme is default, the color matches the background
color. When it is dark it is dark.
2024-04-04 21:52:56 +05:30
Kovid Goyal
bdfa57039c Get rid of frame dependent size storage in kitty layer
This unifies behaviour with CSD and SSD. Now, in both cases the
remembered size is the size of the content area.
2024-04-04 19:39:02 +05:30
Kovid Goyal
f51c2f08a5 DRYer 2024-04-04 19:11:21 +05:30
Kovid Goyal
ecee7086a8 Report compositor missing capabilities in debug output 2024-04-04 16:56:13 +05:30
Kovid Goyal
76999d1a67 Fix creation of single pixel buffer to use 32 bits per color channel 2024-04-04 16:13:28 +05:30
Kovid Goyal
20375ee77a Fix #7310 2024-04-04 15:08:58 +05:30
Kovid Goyal
cd67184432 Output some info about compositor capabilities for --debug-rendering 2024-04-04 11:46:41 +05:30
Kovid Goyal
b3197e4498 Wayland: Add fractional scale support to CSD 2024-04-04 11:25:17 +05:30
Kovid Goyal
90d2b8330a ... 2024-04-04 10:59:51 +05:30
Kovid Goyal
02c6f024d1 Merge branch 'master' of https://github.com/kindhuge/kitty 2024-04-04 10:56:22 +05:30
kindhuge
32905bbf5d chore: remove repetitive words
Signed-off-by: kindhuge <huangpengfei@outlook.com>
2024-04-04 13:19:49 +08:00
Kovid Goyal
bddc552433 ... 2024-04-04 10:40:09 +05:30
Kovid Goyal
1f1f1f60ac Only initialize edge_spacing_func if glfw init succeeds 2024-04-04 10:31:43 +05:30
Kovid Goyal
6adf4f5171 O_CLOEXEC for linux joystick open 2024-04-04 10:31:33 +05:30
Kovid Goyal
b76e94059d Propagate failures to get video mode 2024-04-04 10:31:30 +05:30
Kovid Goyal
ad039c703c Note that file transfer won't work through tmux in the FAQ 2024-04-04 10:31:24 +05:30
Kovid Goyal
efc1509d87 Report errors parsing symbol_map more prominently to user 2024-04-04 09:13:03 +05:30
Kovid Goyal
7071452e6e Fix #7308 2024-04-04 09:08:23 +05:30
Kovid Goyal
b4bba99678 Use single pixel buffer protocol for more efficient temp buffer creation 2024-04-04 09:00:33 +05:30
Kovid Goyal
d3b5e86f30 Account for layouts like the stack layout that hide windows when deciding the value of inactive_text_alpha 2024-04-04 06:01:26 +05:30
Kovid Goyal
16e3b8e0fd Merge branch 'fix-text-alpha-inactive-os-window' of https://github.com/happenslol/kitty 2024-04-04 05:59:05 +05:30
Hilmar Wiegand
20e9549afe Update rules for inactive_text_alpha
Previously, all windows would be drawn with inactive_text_alpha if the
os window was unfocused. This changes the rules to the following:

- If there is a single window and the os window is unfocused, use
inactive_text_alpha.
- If there are multiple windows, use inactive_text_alpha for all
non-active windows, regardless of os window focus.
2024-04-03 20:40:22 +02:00
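The two rules in the commit above can be sketched as a small decision function. This is a Python illustration under stated assumptions, not kitty's rendering code: the function and parameter names are hypothetical, and returning full opacity (1.0) for the active window in a multi-window layout is inferred from the rules rather than stated in them.

```python
def effective_text_alpha(
    window_is_active: bool,
    os_window_focused: bool,
    num_windows: int,
    inactive_text_alpha: float,
) -> float:
    """Pick the text alpha for one window (hypothetical helper).

    num_windows is the number of windows in the OS window; the commit
    message does not say how hidden windows are counted.
    """
    if num_windows <= 1:
        # Single window: dim only when the OS window itself is unfocused.
        return 1.0 if os_window_focused else inactive_text_alpha
    # Multiple windows: dim every non-active window, regardless of OS focus.
    return 1.0 if window_is_active else inactive_text_alpha
```

For example, with `inactive_text_alpha=0.6`, a lone window in an unfocused OS window gets 0.6, while the active window among several keeps 1.0 even when the OS window loses focus.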
Kovid Goyal
aae1c81840 Update changelog 2024-04-03 18:54:51 +05:30
Kovid Goyal
23779da2dc Provide access to the current keyboard mode in the tab_title_template 2024-04-01 22:12:49 +05:30
Kovid Goyal
d51c342cbd Merge branch 'dependabot/go_modules/all-go-deps-ae103a535c' of https://github.com/kovidgoyal/kitty 2024-04-01 09:12:59 +05:30
dependabot[bot]
e0c7eefc84 Bump the all-go-deps group with 1 update
Bumps the all-go-deps group with 1 update: [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil).


Updates `github.com/shirou/gopsutil/v3` from 3.24.2 to 3.24.3
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v3.24.2...v3.24.3)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: all-go-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-01 03:37:17 +00:00
Kovid Goyal
5a62bbdd33 Revert "Wayland: Add support for pointer frame events. Code taken with thanks from SDL"
This reverts commit 506be129e1.
Seems to have broken wheel scrolling and I lack the time/interest to
debug why. Since this commit didn't actually solve any real issue revert
it for now. Revisit in the future when I have more bandwidth.

Fix #7287
2024-03-31 20:33:58 +05:30
Kovid Goyal
94a612c4df fish shell integration: Cleanup detection and binding of passive cursor movement functions
https://github.com/fish-shell/fish-shell/issues/10397#issuecomment-2028585536
Thanks @krobelus
2024-03-31 13:22:31 +05:30
Kovid Goyal
b0d29e7348 Add a note that sway bg covers kitten bg 2024-03-31 12:20:45 +05:30
Kovid Goyal
0965155935 Make the scrollback indicator visible by default 2024-03-31 12:15:05 +05:30
Kovid Goyal
5548b1aa21 ... 2024-03-31 12:02:55 +05:30
Kovid Goyal
775b7c4758 fish shell integration: Fix clicking at the prompt causing autosuggestions to be accepted, needs fish >= 3.8.0
Fixes #7168
2024-03-31 11:57:53 +05:30
Kovid Goyal
f4fe015261 Support a special key mode for moving cursor at marked prompts
Needed for fish integration because the arrow keys cause auto-complete
to trigger.
2024-03-31 11:19:35 +05:30
Kovid Goyal
0c6fa47789 Wayland IME: Fix a bug with handling synthetic keypresses generated by ZMK keyboard + fcitx5
Fixes #7283
2024-03-31 09:42:28 +05:30
Kovid Goyal
274a9d7759 Suppress an unused header warning 2024-03-30 11:41:23 +05:30
Kovid Goyal
7796c15248 Suppress spurious warning from clangd 2024-03-30 11:37:01 +05:30
Kovid Goyal
ce035361e8 ruff deprecations 2024-03-29 13:44:36 +05:30
Kovid Goyal
d260c0a679 ... 2024-03-29 13:31:31 +05:30
Kovid Goyal
c72963dfc5 Use requires-python in pyproject.toml to specify python requirement 2024-03-29 13:23:09 +05:30
Kovid Goyal
4fe65f75bc Move to using ruff for formatting 2024-03-29 12:09:24 +05:30
Kovid Goyal
a61a48d876 Add an example of using a different separator to combine docs 2024-03-29 10:43:38 +05:30
Kovid Goyal
9ac4e6b64e add a link to select_tab in goto_tab docs 2024-03-29 10:14:06 +05:30
Kovid Goyal
a695b4ebe1 Link to tgutui in the integrations doc 2024-03-28 21:31:05 +05:30
Kovid Goyal
700b57bc18 Implement a simple scroll progress indicator
Shows a simple bar on the right edge of the window that moves up as you
scroll further back. There are apparently a lot of people that don't use
a pager for browsing large scrollbacks. I will never understand this,
but, what the hell I was in that code area anyway for other reasons.

TODO: Maybe make it a rounded rectangle
2024-03-28 20:33:35 +05:30
Kovid Goyal
c9f8596357 Switch to LSP for mypy 2024-03-28 19:36:07 +05:30
Kovid Goyal
4b282211de Fix #7276 2024-03-28 18:15:10 +05:30
Kovid Goyal
506be129e1 Wayland: Add support for pointer frame events. Code taken with thanks from SDL 2024-03-28 15:32:59 +05:30
Kovid Goyal
d38c986c82 ... 2024-03-28 10:43:53 +05:30
Kovid Goyal
399a9d65d2 Improve docs on how to use icat without access to the TTY device 2024-03-28 09:23:23 +05:30
Kovid Goyal
8335a5212e macOS: Fix an abort due to an assertion when a program tries to set an invalid window title
Fixes #7271
2024-03-27 19:02:43 +05:30
Kovid Goyal
e5a7554c30 Forgot to handle suspend/resume in example code for setting uservar in nvim 2024-03-26 22:04:08 +05:30
Kovid Goyal
5d5f3ff1b5 Add --app-id as alias for --class
On Wayland we have application ids instead of WM_CLASS. Bloody dumb.
2024-03-26 19:59:59 +05:30
Kovid Goyal
aca13a619a Retry all the ssh kitten tests on failure once 2024-03-26 19:45:20 +05:30
Kovid Goyal
dd879c413a Initialize temp wayland buffer with background color 2024-03-26 19:31:27 +05:30
Kovid Goyal
f8dda12024 Lower the limit of number of fallback fonts again
No need to raise as the issue was caused by a bug in fontconfig
2024-03-26 18:33:55 +05:30
Kovid Goyal
d2c21ee297 Workaround for fontconfig returning junk in all but the lowest eight bits for FC_INDEX
Fixes #7263
2024-03-26 18:32:44 +05:30
Kovid Goyal
351e96ca75 Ensure temp buffer is destroyed once normal swapping is in place 2024-03-26 17:13:31 +05:30
Kovid Goyal
610390ed69 abort ready loop if window receives a close event 2024-03-26 15:44:02 +05:30
Kovid Goyal
a40a36d191 Wayland: Remove the 120ms penalty from waiting for window creation
When showing the window we loop in the wayland backend using a
temporary buffer of blank pixels to force the compositor to finish
setting up the top level surface pronto.

TODO: Set the color of the temporary buffer to the background color
2024-03-26 15:40:13 +05:30
Kovid Goyal
073f78badb ... 2024-03-26 14:09:19 +05:30
Kovid Goyal
0c5e8be49a ... 2024-03-26 14:06:39 +05:30
Kovid Goyal
3363de8549 ... 2024-03-26 14:03:23 +05:30
Kovid Goyal
2009a20561 ... 2024-03-26 13:42:19 +05:30
Kovid Goyal
bb45062ef6 Use monotonic() instead of time of day for logging
Time of day is verbose and I have never found it to be of any use
2024-03-26 13:32:07 +05:30
Kovid Goyal
ede332fecf Use our monotonic everywhere
Gives nicer times relative to process start time than the python stdlib
monotonic
2024-03-26 13:26:18 +05:30
Kovid Goyal
304c68ba6f Merge branch 'symbol-fix' of https://github.com/stribor14/kitty 2024-03-26 13:17:37 +05:30
Kovid Goyal
f7a7765ba2 Clean up debug rendering output 2024-03-26 13:06:08 +05:30
Kovid Goyal
adf5917325 Wayland: Only launch child after OS Window achieves its final size
Avoids a bunch of SIGWINCH during child startup as not all programs
handle these correctly. Sadly adds about 0.1 seconds of latency to
startup. Will have to look into reducing that. The Wayland protocol is
*so badly* designed.
2024-03-26 12:48:45 +05:30
stribor14
08378de48c Fix Smooth mosaic terminal graphic characters from quarters to thirds 2024-03-26 08:00:34 +01:00
Kovid Goyal
f5314cb862 ... 2024-03-26 11:48:57 +05:30
Kovid Goyal
cbd7aa565b Increase max number of fallback fonts 2024-03-26 11:18:55 +05:30
Kovid Goyal
6398dd5b75 Move xdg configure response into its own function 2024-03-26 11:17:27 +05:30
Kovid Goyal
2edd332759 Flag to indicate that we expect scale from compositor 2024-03-26 10:41:40 +05:30
Kovid Goyal
8bd9dbcee8 In --debug-rendering output when SIGWINCH is sent to child 2024-03-26 10:37:51 +05:30
Kovid Goyal
9149f6e34c Get version of hyprland as well 2024-03-26 10:04:05 +05:30
Kovid Goyal
c5fc65b56a ... 2024-03-26 09:55:52 +05:30
Kovid Goyal
83fcd472bb Debug output: Show name and version of Wayland compositor 2024-03-26 09:54:38 +05:30
Kovid Goyal
006a047276 Also output pointer shape changes when debugging 2024-03-26 09:18:06 +05:30
Kovid Goyal
db3a49fc4b Wayland KDE: Fix mouse cursor hiding not working in Plasma 6
kwin in Plasma 6 now requires usage of pointer_enter_serial instead of
last received serial for wl_set_cursor_image(). Hopefully, this won't
break any other compositors.

Fixes #7265
2024-03-26 09:02:59 +05:30
Kovid Goyal
cd5099d6f7 Splits layout: Fix move_window_forward not working
Fixes #7264
2024-03-26 08:21:03 +05:30
Kovid Goyal
1ae607f924 Merge branch 'optimize-images' of https://github.com/C0rn3j/kitty 2024-03-26 08:06:51 +05:30
Kovid Goyal
4d93801d5f Retry flaky test 2024-03-26 08:03:20 +05:30
Martin Rys
efcacd0885 Oxipng/svgo images to save some 150KB~ 2024-03-25 23:42:56 +01:00
Kovid Goyal
7ade6f97e9 Cleanup DPI change handling 2024-03-25 18:55:29 +05:30
Kovid Goyal
a58187943d ... 2024-03-25 18:31:02 +05:30
Kovid Goyal
06316eee26 DRYer: Maintain font and DPI per OSWindow information in one place 2024-03-25 18:26:47 +05:30
Kovid Goyal
7cebb37c93 Use up-to-date scale in layer shell callback 2024-03-25 17:55:40 +05:30
Kovid Goyal
396def91e5 kwin requires layer properties to be set at creation time 2024-03-25 17:40:14 +05:30
Kovid Goyal
ebee3f1c02 ... 2024-03-25 16:15:18 +05:30
Kovid Goyal
c9701a9b05 Update changelog 2024-03-25 16:04:23 +05:30
Kovid Goyal
cc76732058 ... 2024-03-25 14:01:45 +05:30
Kovid Goyal
3adf05244d Allow using --debug-rendering with panel kitten 2024-03-25 13:56:35 +05:30
Kovid Goyal
0dd2c3ea27 Edge panels now work
Tested under sway
2024-03-25 13:52:20 +05:30
Kovid Goyal
d56fbb88e5 More work on getting layer to actually render 2024-03-25 12:46:31 +05:30
Kovid Goyal
46db1f7b76 Get the layer sizing function working 2024-03-25 12:15:38 +05:30
Kovid Goyal
85a980ea3e Slightly nicer initial scale guess on wayland 2024-03-25 12:02:33 +05:30
Kovid Goyal
e9689ea50d Fix wayland backend windowfocused() implementation 2024-03-25 10:33:27 +05:30
Kovid Goyal
3fa5125f24 Ensure layer shell hint is set just before actual layer shell window creation 2024-03-25 09:42:56 +05:30
Kovid Goyal
4f049302c8 swapped scales 2024-03-25 09:16:50 +05:30
Kovid Goyal
5f7e53bfde ... 2024-03-25 08:54:10 +05:30
Kovid Goyal
de1dee6c3b Debug setting of exclusive zone 2024-03-24 21:57:52 +05:30
Kovid Goyal
411ae71ca9 ... 2024-03-24 20:48:21 +05:30
Kovid Goyal
0b6943fb5a ... 2024-03-24 20:48:20 +05:30
Kovid Goyal
333ea519ed Infrastructure to go from panel CLI opts all the way to wayland layer shell implementation 2024-03-24 20:48:20 +05:30
Kovid Goyal
56978189e0 Infrastructure for passing layer shell config from python to glfw 2024-03-24 20:48:20 +05:30
Kovid Goyal
fe5ccc144b Finish glfw side support for layer shell 2024-03-24 20:48:20 +05:30
Kovid Goyal
0641ec2d89 GLFW API for configuring a window as a layer shell 2024-03-24 20:48:20 +05:30
Kovid Goyal
707e69a794 Start work on wayland layer shell support 2024-03-24 20:48:20 +05:30
Kovid Goyal
5b4ea0052c ... 2024-03-24 20:44:58 +05:30
Kovid Goyal
a0aba4da4a Fix handling of tab character when cursor is at end of line and wrapping is enabled
Fixes #7250
2024-03-23 08:43:06 +05:30
Kovid Goyal
af82938427 Don't bother applying zero style to fallback
Micro-optimization
2024-03-22 15:37:58 +05:30
Kovid Goyal
42994bac37 DRYer 2024-03-22 15:19:39 +05:30
Kovid Goyal
4878b7cfd3 Proper fix for Zapf Dingbats vs Apple
See #7249
2024-03-22 15:09:44 +05:30
Kovid Goyal
98d32e50e0 macOS: Reject styled fallback from CoreText if its family name is not the same as the original
On some systems, for the good Lord alone knows what reason, CoreText is
giving us Zapf Dingbats as a font for some symbols, which doesn't
actually work.

Fixes #7249 (I hope)
2024-03-22 14:38:08 +05:30
Kovid Goyal
12e9db4ccc DRYer 2024-03-22 14:19:21 +05:30
Kovid Goyal
ebb1063e33 ... 2024-03-22 14:10:30 +05:30
Kovid Goyal
716bcb6d12 ... 2024-03-22 14:08:03 +05:30
Kovid Goyal
2f151e773c ... 2024-03-22 13:52:17 +05:30
Kovid Goyal
1c9f9a74e8 Wayland KDE: Add support for background_blur under kwin using a kwin private Wayland protocol 2024-03-22 13:41:44 +05:30
Kovid Goyal
9df7460fe1 Add a note about how to use edit-in-kitty with sudo to edit root files 2024-03-22 11:56:44 +05:30
Kovid Goyal
55feef8663 Linter fixes 2024-03-22 11:16:02 +05:30
Kovid Goyal
22c59fe19f bump version of imaging 2024-03-22 11:06:52 +05:30
Kovid Goyal
3b74fcb88c switch to a maintained fork of imaging 2024-03-22 10:38:22 +05:30
Kovid Goyal
e818f01ff2 Ensure palette is large enough to avoid panics with invalid images that have pixels referring to colors not in the palette 2024-03-22 10:01:20 +05:30
Kovid Goyal
e11081ac09 Use exiffix rather than imaging to handle EXIF rotation
exiffix works for more formats than just JPEG
2024-03-22 09:35:39 +05:30
Kovid Goyal
0eae7ba21d Remove unused parameter 2024-03-22 09:25:13 +05:30
Kovid Goyal
0be9b888fa string changes 2024-03-22 08:15:10 +05:30
Kovid Goyal
adda3249f5 Let's see if sanitize works 2024-03-21 21:00:58 +05:30
Kovid Goyal
cc11ed5c2c Update changelog 2024-03-21 20:53:36 +05:30
Kovid Goyal
83468535dd Implement support for preferred buffer scale 2024-03-21 20:53:36 +05:30
Kovid Goyal
55115058d2 Scale pointer axis events by effective scale 2024-03-21 20:53:36 +05:30
Kovid Goyal
776bfa3d7e Basic fractional scale protocol works 2024-03-21 20:53:36 +05:30
Kovid Goyal
6979f1c5eb Add listener for fractional scale events 2024-03-21 20:53:36 +05:30
Kovid Goyal
5d0c25f5ea Register the viewporter 2024-03-21 20:53:36 +05:30
Kovid Goyal
eb42ad3a2b Rename scale to integer_scale
We will presumably have a fractional_scale soon
2024-03-21 20:53:36 +05:30
Kovid Goyal
2b6edbccbc Start work on fractional scale support for Wayland
Register the interface on startup
2024-03-21 20:53:36 +05:30
Kovid Goyal
7e12cc57c6 Fix #7245 2024-03-21 20:50:05 +05:30
Kovid Goyal
1f149861f9 Mouse reporting: Fix drag release event outside the window not being reported in legacy mouse reporting modes
Fixes #7244
2024-03-21 20:32:58 +05:30
Kovid Goyal
11882aef2d Fix #7243 2024-03-21 17:16:09 +05:30
Kovid Goyal
3c4db20d2d DRYer 2024-03-21 11:27:41 +05:30
Kovid Goyal
924b87a16a Convenience function to get terminfo data in kittens 2024-03-21 10:58:46 +05:30
Kovid Goyal
198b69e275 An option to set TERMINFO to the database directly instead of a path 2024-03-21 10:48:53 +05:30
Kovid Goyal
ad64472950 Make the terminfo database available in the compiled module 2024-03-21 10:16:50 +05:30
Kovid Goyal
d9f33b19bc ... 2024-03-21 09:46:02 +05:30
Kovid Goyal
a5fea33757 version 0.33.1 2024-03-21 08:34:07 +05:30
Kovid Goyal
8cc360e344 Make preferential usage of NERD font for manual fallback more efficient 2024-03-20 20:33:50 +05:30
Kovid Goyal
a285d09459 ... 2024-03-20 20:07:03 +05:30
Kovid Goyal
e646596c5b macOS: When CoreText fails to find a fallback font for a character in the first Private Use Unicode Area, preferentially use the NERD font, if available, for it
Fixes #6043
2024-03-20 20:01:17 +05:30
Kovid Goyal
f3d9ad3244 ... 2024-03-20 18:59:52 +05:30
Kovid Goyal
752fcb6424 macOS: Fix text rendered with fallback fonts not respecting bold/italic styling 2024-03-20 18:23:09 +05:30
Kovid Goyal
0042288e92 remove unused headers 2024-03-20 17:28:14 +05:30
Kovid Goyal
7ea0af6f6d Fix debug-font-fallback to report re-used faces correctly 2024-03-20 17:26:18 +05:30
Kovid Goyal
69c0eaaf74 Don't request sRGB surfaces on Wayland
Apparently mesa just completely broke it. Besides it being already
broken on NVIDIA. Sigh, more of my life wasted on Wayland.

See https://github.com/kovidgoyal/kitty/issues/7174#issuecomment-2000033873
2024-03-19 08:57:48 +05:30
Kovid Goyal
a0424bf1bd Don't omit frame pointer in debug and profile builds 2024-03-18 10:56:50 +05:30
Kovid Goyal
481eccfe3c Merge branch 'dependabot/go_modules/all-go-deps-75b28919f3' of https://github.com/kovidgoyal/kitty 2024-03-18 09:28:52 +05:30
dependabot[bot]
c5c3f98595 Bump the all-go-deps group with 1 update
Bumps the all-go-deps group with 1 update: [github.com/alecthomas/chroma/v2](https://github.com/alecthomas/chroma).


Updates `github.com/alecthomas/chroma/v2` from 2.12.0 to 2.13.0
- [Release notes](https://github.com/alecthomas/chroma/releases)
- [Changelog](https://github.com/alecthomas/chroma/blob/master/.goreleaser.yml)
- [Commits](https://github.com/alecthomas/chroma/compare/v2.12.0...v2.13.0)

---
updated-dependencies:
- dependency-name: github.com/alecthomas/chroma/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: all-go-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-18 03:35:28 +00:00
Kovid Goyal
3c19b6f734 Merge branch 'docs/fine-tuning' of https://github.com/Uaitt/kitty 2024-03-17 09:05:18 +05:30
Lorenzo Zabot
2d8deb86bb docs: minor adjustments 2024-03-16 14:43:23 +01:00
Kovid Goyal
5e0185d3eb Ensure KITTY_NO_SIMD is defined for all files on arches without SIMD support 2024-03-15 12:19:47 +05:30
Kovid Goyal
aee84c32e5 Merge branch 'fix-typo' of https://github.com/KaranveerB/kitty 2024-03-15 11:53:19 +05:30
KaranveerB
19a9594143 Fix typo in mapping.rst 2024-03-14 23:01:27 -07:00
Kovid Goyal
32f0da2e77 Ensure no frame is created for assembly functions 2024-03-15 07:58:09 +05:30
Kovid Goyal
d329cb3fff Update FAQ 2024-03-14 21:40:16 +05:30
Kovid Goyal
3bb9e36fc8 ... 2024-03-14 21:00:57 +05:30
Kovid Goyal
76ae5f5b9b DRYer: Use the SIMD detection in setup.py to avoid calling __builtin_cpu_supports 2024-03-14 20:57:09 +05:30
Kovid Goyal
393169f79d Fix #7225 2024-03-14 20:55:05 +05:30
Kovid Goyal
f5570c38dd Turn off sanitizers in CI as they are segfaulting
Trying to debug this in CI is too much work. Hopefully whatever
update in the CI env that is causing these will eventually be fixed.
2024-03-14 18:37:19 +05:30
Kovid Goyal
0153c9bb85 Use -g3 for profiling rather than -g 2024-03-14 17:07:38 +05:30
Kovid Goyal
1712a317d5 ... 2024-03-14 16:38:19 +05:30
Kovid Goyal
10cff577d6 Also get a backtrace when generating go code segfaults on CI 2024-03-14 16:25:52 +05:30
Kovid Goyal
86c59016c6 Try outputting core dump when multiprocessing spawn segfaults 2024-03-14 16:19:33 +05:30
Kovid Goyal
7821ae39ab Also need gdb to get coredumps in CI 2024-03-14 16:09:51 +05:30
Kovid Goyal
039d144c84 Splits layout: Allow resizing until one of the halves in a split is minimally sized
Fixes #7220
2024-03-14 15:59:23 +05:30
Kovid Goyal
af0d570725 Install systemd-coredump on CI so we can see coredumps 2024-03-14 15:18:33 +05:30
Kovid Goyal
288fa0128b Fix test suite running under sanitizers 2024-03-14 15:01:55 +05:30
Kovid Goyal
77125798a4 Redirect to NULL instead of pipe since we don't use the output 2024-03-14 12:15:15 +05:30
Kovid Goyal
c26954c69e ... 2024-03-14 12:05:57 +05:30
Kovid Goyal
9667da307c Print detected compiler type in verbose mode 2024-03-14 12:04:33 +05:30
Kovid Goyal
baa3ec0a62 Explicitly detect compiler types gcc vs clang 2024-03-14 12:02:01 +05:30
Kovid Goyal
478fc766b6 ... 2024-03-14 11:53:44 +05:30
Kovid Goyal
6c49066cde Fix undefined function pointer usage found by clang sanitizer 2024-03-14 11:47:56 +05:30
Kovid Goyal
a839af04dc Fix #7219 2024-03-14 11:13:54 +05:30
Kovid Goyal
3950632517 Switch to detecting clang rather than gcc
gcc makes it impossible to detect that it is gcc via --version
so instead detect clang and assume gcc if not clang.

Fixes #7218
2024-03-14 10:48:27 +05:30
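The detection strategy in the commit above (look for clang in the `--version` output and assume gcc otherwise) can be sketched like this. It is an illustrative Python approximation, not the code in kitty's `setup.py`; the function names and the default compiler command `cc` are assumptions.

```python
import subprocess


def classify_compiler(version_output: str) -> str:
    """Return 'clang' if the --version text mentions clang, else assume 'gcc'."""
    return "clang" if "clang" in version_output.lower() else "gcc"


def detect_compiler(cc: str = "cc") -> str:
    """Run `cc --version` and classify the result (hypothetical helper).

    Raises FileNotFoundError if the compiler executable does not exist.
    """
    proc = subprocess.run(
        [cc, "--version"], capture_output=True, text=True, check=False
    )
    # Some toolchains print version info on stderr, so inspect both streams.
    return classify_compiler(proc.stdout + proc.stderr)
```

Splitting the string classification from the subprocess call keeps the heuristic testable without a compiler installed. The design leans on the observation in the commit message: gcc's banner can be renamed by distributions, whereas clang reliably names itself.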
Kovid Goyal
297ac9c3fe ... 2024-03-14 08:54:39 +05:30
Kovid Goyal
60d4ed3a1c ... 2024-03-13 14:12:49 +05:30
Kovid Goyal
a1b40cc8f5 Handle exception raised by Cocoa runtime when trying to get user notification center from a non-bundled executable 2024-03-13 11:43:38 +05:30
Kovid Goyal
5a9cf82564 Fix requesting data from clipboard via OSC 52 getting it from primary selection instead
Fixes #7213
2024-03-13 09:43:28 +05:30
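To make the clipboard/primary distinction in the commit above concrete, here is a minimal sketch of how OSC 52 requests are formed. It assumes the xterm convention in which the selection parameter `c` names the clipboard and `p` the primary selection, a `?` payload requests the current contents, and set requests carry base64 data. The helper names are hypothetical and ST (`ESC \`) is used as the terminator.

```python
import base64

ESC = "\x1b"
ST = ESC + "\\"  # String Terminator
_SELECTIONS = ("c", "p")  # c: clipboard, p: primary selection


def osc52_query(selection: str = "c") -> str:
    """Build an OSC 52 sequence asking the terminal for a selection's contents."""
    if selection not in _SELECTIONS:
        raise ValueError("selection must be 'c' or 'p'")
    return f"{ESC}]52;{selection};?{ST}"


def osc52_set(text: str, selection: str = "c") -> str:
    """Build an OSC 52 sequence that sets a selection to text (base64-encoded)."""
    if selection not in _SELECTIONS:
        raise ValueError("selection must be 'c' or 'p'")
    payload = base64.standard_b64encode(text.encode("utf-8")).decode("ascii")
    return f"{ESC}]52;{selection};{payload}{ST}"
```

A program sending `osc52_query("c")` expects the clipboard back; the bug fixed above was that such a request was answered from the primary selection instead.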
Kovid Goyal
04f8cb6d30 version 0.33.0 2024-03-12 20:49:31 +05:30
Kovid Goyal
5264f3b5aa Merge branch 'dependabot/go_modules/all-go-deps-8f3a99df9e' of https://github.com/kovidgoyal/kitty 2024-03-11 11:11:56 +05:30
dependabot[bot]
5105c2f3ea Bump the all-go-deps group with 1 update
Bumps the all-go-deps group with 1 update: [golang.org/x/sys](https://github.com/golang/sys).


Updates `golang.org/x/sys` from 0.17.0 to 0.18.0
- [Commits](https://github.com/golang/sys/compare/v0.17.0...v0.18.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: all-go-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-11 04:00:55 +00:00
Kovid Goyal
eb46107575 Merge branch 'master' of https://github.com/pbevin/kitty 2024-03-10 21:41:04 +05:30
Pete Bevin
d75b8c63ef Fix build instructions after ./dev.sh deps 2024-03-10 12:02:51 -04:00
Kovid Goyal
7d787e6c22 Implement box drawing for Fira Code spinner glyphs 2024-03-10 21:08:23 +05:30
Kovid Goyal
36c79f3d50 Implement box drawing for Fira Code progress bar glyphs in PUA 2024-03-10 19:51:07 +05:30
Kovid Goyal
88b7595929 Ignore startup_session when kitty is invoked with command line options specifying a command to run 2024-03-10 09:41:08 +05:30
Kovid Goyal
fccf8db099 GitHub returns errors on delete but actually deletes 2024-03-10 08:41:14 +05:30
Kovid Goyal
d33afd4e96 Move toggle_tab into Boss
actions arent supported on TabManager
2024-03-07 11:57:07 +05:30
Kovid Goyal
76a4840a0f toggle_tab to easily switch to and back from a tab
Fixes #7203
2024-03-07 11:38:28 +05:30
Kovid Goyal
65923b1aba Add some benchmarking 2024-03-07 11:09:24 +05:30
Kovid Goyal
47fea26b62 Add an IndexByte implementation useful for benchmarking against stdlib SIMD implementation 2024-03-07 09:36:40 +05:30
Kovid Goyal
9ea0dde469 Add a note that startup_session prevents processing of cli args 2024-03-07 08:55:44 +05:30
Kovid Goyal
6c31256aa1 Keyboard protocol: Do not deliver fake key release events on OS window focus out for engaged modifiers
Fixes #7196
2024-03-07 08:29:10 +05:30
Kovid Goyal
210c417d96 ... 2024-03-06 10:41:39 +05:30
Kovid Goyal
c1af14c22a Fix @ send-key not working to send keys to self over TTY 2024-03-05 13:09:07 +05:30
Kovid Goyal
a3d8be5e2f icat: Nicer error when user specifies invalid screen geometry 2024-03-05 10:49:47 +05:30
Kovid Goyal
d08a44ea4c Add a note as to why errors are not reported for send-text 2024-03-05 08:37:56 +05:30
Kovid Goyal
63d974135b Clean up linter warnings 2024-03-05 08:27:13 +05:30
Kovid Goyal
7b34c0603f Fix --match not working for some remote control commands 2024-03-05 08:00:57 +05:30
Kovid Goyal
c3c99113c7 hints kitten: Use default editor rather than hardcoding vim to open file at specific line
Fixes #7186
2024-03-04 21:49:06 +05:30
Kovid Goyal
9352576d1e Merge branch 'nushell-shell-integration' of https://github.com/gutenye/kitty 2024-03-04 19:44:18 +05:30
Guten Ye
dc82a06e9e doc: add detailed step to enable shell integration for Nushell 2024-03-04 21:41:50 +08:00
Kovid Goyal
bb98b81f82 Note that kitty keyboard protocol is supported in yazi
Fixes #7189
2024-03-04 19:03:23 +05:30
Kovid Goyal
1687c74913 Note that nushell supports shell integration 2024-03-04 19:01:31 +05:30
Kovid Goyal
bfe4256bf2 Merge branch 'dependabot/go_modules/all-go-deps-9e5965924d' of https://github.com/kovidgoyal/kitty 2024-03-04 10:43:51 +05:30
dependabot[bot]
66c3e1c1ff Bump the all-go-deps group with 1 update
Bumps the all-go-deps group with 1 update: [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil).


Updates `github.com/shirou/gopsutil/v3` from 3.24.1 to 3.24.2
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v3.24.1...v3.24.2)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: all-go-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-04 03:31:42 +00:00
Kovid Goyal
c6acfa2cd4 Add a note to clarify lock key handling in disambiguate mode 2024-03-03 09:12:54 +05:30
Kovid Goyal
8fa592d849 Parse and ignore SOS codes
Fixes #7184
2024-03-03 08:58:47 +05:30
Kovid Goyal
89108e856f Clarify exactly when modifiers bits are set in the keyboard protocol
Fixes #7183
2024-03-02 13:14:41 +05:30
Kovid Goyal
0a0420bfd0 kitten @ ls: Return the timestamp at which the window was created
Fixes #7178
2024-03-02 09:30:12 +05:30
Kovid Goyal
2cc08cc5fd Cleanup previous PR 2024-03-02 09:24:19 +05:30
Kovid Goyal
f3344e1da0 Merge branch 'escapes' of https://github.com/derekschrock/kitty 2024-03-02 09:24:01 +05:30
Derek Schrock
9eb82c600d Fix invalid escape sequence warning with reset_terminal long_text 2024-03-01 14:33:05 -05:00
Kovid Goyal
17fe6b3373 Simplify sanitize args 2024-03-01 11:08:51 +05:30
Kovid Goyal
c3869dc479 ... 2024-03-01 11:07:51 +05:30
Kovid Goyal
473bff1aae Cheetah speed
😸
2024-02-29 10:16:08 +05:30
Kovid Goyal
99ceb2476e Fix #7169 2024-02-29 09:56:41 +05:30
Kovid Goyal
05881db492 Remove unused code 2024-02-28 12:15:20 +05:30
Kovid Goyal
1086757dc0 More docs 2024-02-28 11:39:16 +05:30
Kovid Goyal
d4c302bea3 Cleanup clear to prompt implementation and allow moving cleared lines into scrollback 2024-02-28 11:27:41 +05:30
Kovid Goyal
b8774327b6 icat kitten: Add a command line argument to override terminal window size detection
Fixes #7165

I had five minutes, so why not.
2024-02-27 23:06:10 +05:30
Kovid Goyal
e9c4e73103 Merge branch 'dependabot/go_modules/all-go-deps-55e9d2be01' of https://github.com/kovidgoyal/kitty 2024-02-26 10:12:12 +05:30
dependabot[bot]
d19a2fb606 Bump the all-go-deps group with 1 update
Bumps the all-go-deps group with 1 update: [github.com/dlclark/regexp2](https://github.com/dlclark/regexp2).


Updates `github.com/dlclark/regexp2` from 1.10.0 to 1.11.0
- [Commits](https://github.com/dlclark/regexp2/compare/v1.10.0...v1.11.0)

---
updated-dependencies:
- dependency-name: github.com/dlclark/regexp2
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: all-go-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-26 04:00:28 +00:00
Kovid Goyal
99af9739b2 ... 2024-02-25 19:46:18 +05:30
Kovid Goyal
9ce366fa7b Fix #7154 2024-02-25 15:22:43 +05:30
Kovid Goyal
daeaf65d7e fix compiler warning 2024-02-25 11:17:26 +05:30
Kovid Goyal
c5a10d19a4 Update the changelog 2024-02-25 10:49:22 +05:30
Kovid Goyal
a5f3142514 hints kitten: The option to set the text color for hints now allows arbitrary colors
Fixes #7150
2024-02-25 10:02:38 +05:30
Kovid Goyal
cf49fcc4e9 Make --dump-bytes robust against parser code modifying contents of buffer during parsing 2024-02-25 09:57:45 +05:30
Kovid Goyal
06dd84d6da Ensure event loop ticks ASAP when there is pending input 2024-02-25 09:57:45 +05:30
Kovid Goyal
dd3d4f8451 ... 2024-02-25 09:57:44 +05:30
Kovid Goyal
c08523a1ad Fix --dump-bytes duplicating bytes because of input_delay 2024-02-25 09:57:44 +05:30
Kovid Goyal
559be309ea Document previous PR 2024-02-25 09:57:44 +05:30
Carlos Esparza
299c39a214 add new-tab-left and new-tab-right 2024-02-25 09:57:44 +05:30
Carlos Esparza
4575608a97 add new-tab-neighbor option to detach_window 2024-02-25 09:57:44 +05:30
Kovid Goyal
1a9a7a59ac Make XOR64 test also test alignment issues 2024-02-25 09:57:44 +05:30
Kovid Goyal
e7d6101bd4 DRYer 2024-02-25 09:57:44 +05:30
Kovid Goyal
76381f5cdd Another tdir rmtree failure during tear down ignored 2024-02-25 09:57:44 +05:30
Kovid Goyal
40a4429e58 Ignore failure to remove tempdir during test tear down 2024-02-25 09:57:44 +05:30
Kovid Goyal
bd1aed43ec Check for leftovers when tokenizing 2024-02-25 09:57:44 +05:30
Kovid Goyal
7727cd45cf Delay load replacements as well 2024-02-25 09:57:44 +05:30
Kovid Goyal
4ee5b94584 Improve typing info for lex_scanner 2024-02-25 09:57:44 +05:30
Kovid Goyal
1d5c7d662e log error when failing to parse URL 2024-02-25 09:57:44 +05:30
Kovid Goyal
f4f06222d4 ... 2024-02-25 09:57:44 +05:30
Kovid Goyal
4caf8a6b14 Restore support for alternate character sets
Needed by the execrable ncurses. Adds an extra branch in the hot path,
sigh. Thanks to branch prediction it doesnt have any measurable impact
on the benchmark, thankfully.
2024-02-25 09:57:44 +05:30
Kovid Goyal
c19488f3be Graphics protocol: Add a new delete mode for deleting images whose ids fall within a range
Useful for bulk deletion. See #7080
2024-02-25 09:57:44 +05:30
Kovid Goyal
ad3ab877f8 Use a fast SIMD implementation to XOR data going into the disk cache 2024-02-25 09:57:43 +05:30
Kovid Goyal
88f3c8c5ee Reduce max key size in disk cache
We used only 12 byte keys no need to have a max key size more than 16
2024-02-25 09:57:43 +05:30
Kovid Goyal
91013a4e05 Faster image cache key generation 2024-02-25 09:57:43 +05:30
Kovid Goyal
de92470f0d Improve performance of disk cache when there are thousands of small images
Fixes #7080
2024-02-25 09:57:43 +05:30
Kovid Goyal
1c62b0f1ec ... 2024-02-25 09:57:43 +05:30
Kovid Goyal
b52af64ffe Hide cursor during benchmark run 2024-02-25 09:57:43 +05:30
Kovid Goyal
1db7ac5f6b Use our new shift by n functions to improve function to zero last N bytes
Benchmark neutral but cleaner code using one less vector register and equal
number of operations.
2024-02-25 09:57:43 +05:30
Kovid Goyal
e77a970ca1 Also implement arbitrary byte shift for 128 bit registers 2024-02-25 09:57:43 +05:30
Kovid Goyal
ac984d05f2 Fix gcc detection 2024-02-25 09:57:43 +05:30
Kovid Goyal
f16c2a0d67 Move checking for compiler brand into Env 2024-02-25 09:57:43 +05:30
Kovid Goyal
29a574a4bc Prevent duplicate VZEROUPPER instructions 2024-02-25 09:57:43 +05:30
Kovid Goyal
a7c06b38e6 We dont actually need vzeroupper at start of function
GCC emits vzeroupper automatically when compiling with native
optimizations but we still need it otherwise
2024-02-25 09:57:43 +05:30
Kovid Goyal
16d36c46fe Update to using math/rand/v2 2024-02-25 09:57:43 +05:30
Kovid Goyal
720618bc37 Use go 1.22 for building
It supports PCALIGN on non ARM arches as well
2024-02-25 09:57:43 +05:30
Kovid Goyal
2f727e6561 ... 2024-02-25 09:57:43 +05:30
Kovid Goyal
b65a5f78fd Fix regression causing shells in darwin to not run in login mode 2024-02-25 09:57:43 +05:30
Kovid Goyal
0a1eb038a5 Implement functions for arbitrary byte shifts in vector registers 2024-02-25 09:57:42 +05:30
Kovid Goyal
e541c0534d ... 2024-02-25 09:57:42 +05:30
Kovid Goyal
eb1e3b33b4 Fix test failure on some systems
Broken ass compilers strike again
2024-02-25 09:57:42 +05:30
Kovid Goyal
41ec46d5bb ... 2024-02-25 09:57:42 +05:30
Kovid Goyal
b021e9b648 Do the default func test last so we can see what the failure is on more explicitly 2024-02-25 09:57:42 +05:30
Kovid Goyal
ede4d7fbca ... 2024-02-25 09:57:42 +05:30
Kovid Goyal
d0797a025b Add dedicated tests for find_either_of_two 2024-02-25 09:57:42 +05:30
Kovid Goyal
1acd223f45 ... 2024-02-25 09:57:42 +05:30
Kovid Goyal
8639a2ac40 Update perf figures based on latest code 2024-02-25 09:57:42 +05:30
Kovid Goyal
f48e4ffd5e Port aligned load based find algorithm to C 2024-02-25 09:57:42 +05:30
Kovid Goyal
c01b959723 Fix Go unaligned index implementation 2024-02-25 09:57:42 +05:30
Kovid Goyal
36773c09d3 Functions to get bytes to first match ignoring leading bytes 2024-02-25 09:57:42 +05:30
Kovid Goyal
687340003d ... 2024-02-25 09:57:42 +05:30
Kovid Goyal
7467307200 Add some alignment tests 2024-02-25 09:57:42 +05:30
Kovid Goyal
bbdb0b15f3 DRYer 2024-02-25 09:57:42 +05:30
Kovid Goyal
b5edd9ad57 Dont precalculate mask in loop body
No need since we dont shift. Avoids the extra mask instructions for the
not found case.
2024-02-25 09:57:42 +05:30
Kovid Goyal
a32e1aafa6 ... 2024-02-25 09:57:41 +05:30
Kovid Goyal
f9fd6ffd46 Use only aligned loads for index funcs
Also obviates the necessity for safe slice wrappers
2024-02-25 09:57:41 +05:30
Kovid Goyal
31a5fcf297 DRYer 2024-02-25 09:57:41 +05:30
Kovid Goyal
493fc900e9 Fix build on ARM 2024-02-25 09:57:41 +05:30
Kovid Goyal
3abdc54e4b ... 2024-02-25 09:57:41 +05:30
Kovid Goyal
618aeec709 Finally got gnome-terminal to run on my system
Apparently it needed some kind of GTK desktop portal or the other
🙄

Interesting that its numbers are basically the same as alacritty's. Lot
better than I remember, I guess the recent libvte performance work was
good.
2024-02-25 09:57:41 +05:30
Kovid Goyal
4585361161 Micro optimization 2024-02-25 09:57:41 +05:30
Kovid Goyal
f64739c29b Fix regression that broke handling of single byte control chars when cursor is on second cell of wide character 2024-02-25 09:57:41 +05:30
Kovid Goyal
f3830aa854 Avoid unnecessary if 2024-02-25 09:57:41 +05:30
Kovid Goyal
f1fe0bf40a Code to easily compare SIMD and scalar decode in a live instance
Also remove -mtune=intel as it fails with clang
2024-02-25 09:57:41 +05:30
Kovid Goyal
561712090d Fix cmplt implementation 2024-02-25 09:57:41 +05:30
Kovid Goyal
d5f34c401d Better vector registers to pre-calculate before the loop 2024-02-25 09:57:41 +05:30
Kovid Goyal
d9190ea675 DRYer 2024-02-25 09:57:41 +05:30
Kovid Goyal
57f4ea4d4a Add some tests for broadcast from constant intrinsic 2024-02-25 09:57:41 +05:30
Kovid Goyal
9b0ae8d403 Dont use VEX encoded instructions for 128 bit ISA 2024-02-25 09:57:41 +05:30
Kovid Goyal
aed0611fb8 Avoid double trailing RET 2024-02-25 09:57:40 +05:30
Kovid Goyal
920b8a2496 Use VZEROUPPER in avx functions
See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf
2024-02-25 09:57:40 +05:30
Kovid Goyal
5a5e31c38b Also zero upper at start of function 2024-02-25 09:57:40 +05:30
Kovid Goyal
db2e0e816d Fix mixing of register types in the same function 2024-02-25 09:57:40 +05:30
Kovid Goyal
a298781b85 DRYer 2024-02-25 09:57:40 +05:30
Kovid Goyal
d5cd9ef2ca ... 2024-02-25 09:57:40 +05:30
Kovid Goyal
55c909c656 Use -mtune=intel for SIMD files when building without native optimizations 2024-02-25 09:57:40 +05:30
Kovid Goyal
da31db3212 ... 2024-02-25 09:57:40 +05:30
Kovid Goyal
601c4ad4df Fix some typos 2024-02-25 09:57:40 +05:30
Kovid Goyal
2549b4328f Update throughput comparison table in light of latest improvements 2024-02-25 09:57:40 +05:30
Kovid Goyal
68d800d4fa make clean should clean generated asm as well 2024-02-25 09:57:40 +05:30
Kovid Goyal
9fc3db1dd1 Work on C0 index func 2024-02-25 09:57:40 +05:30
Kovid Goyal
d4c4805f96 const away to glory 2024-02-25 09:57:40 +05:30
Kovid Goyal
161eae78b6 Make generated asm_* files world readable 2024-02-25 09:57:40 +05:30
Kovid Goyal
6cdc7ac91d A further 5% speedup for UTF-8 decoding
Achieved by decoding in larger chunks thereby amortizing the cost
of creating various constant vectors over larger chunks.
2024-02-25 09:57:40 +05:30
Kovid Goyal
0bccada9d1 No longer need to abort after dealing with trailing bytes 2024-02-25 09:57:40 +05:30
Kovid Goyal
9cb9373274 Allow unbounded output in UTF8Decoder
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal
d987ffe49a Use unaligned stores
Makes no measurable difference in the benchmark. And will eventually
allow us to process larger chunks of data without need to reset a bunch
of vector registers to constant values each time.
2024-02-25 09:57:39 +05:30
Kovid Goyal
77cfd44f24 More efficient clearing of register to all zeros or all ones 2024-02-25 09:57:39 +05:30
Kovid Goyal
59be7213cf Make set1_epi8 more general 2024-02-25 09:57:39 +05:30
Kovid Goyal
d60dacbd09 Implement > and < intrinsics for vector registers 2024-02-25 09:57:39 +05:30
Kovid Goyal
82b7b4fcce Make a re-useable template for generating ASM index functions with different tests 2024-02-25 09:57:39 +05:30
Kovid Goyal
fa9a2b1e2e Switch file input to use new SIMD parser to search for \n and \r in parallel 2024-02-25 09:57:39 +05:30
Kovid Goyal
4e6138d785 Generate SIMD code during build 2024-02-25 09:57:39 +05:30
Kovid Goyal
86a55e2c0a Use an aligned slice for file reads 2024-02-25 09:57:39 +05:30
Kovid Goyal
de8c1e0206 Work on porting SIMD VT parser to Go for the kittens 2024-02-25 09:57:39 +05:30
Kovid Goyal
131716da00 Ignore another warning on some compiler versions in simde 2024-02-25 09:57:39 +05:30
Kovid Goyal
4d35fc2928 Use a custom movmask for ARM rather than the one from simde
Supposedly faster, not that I can measure it, but...
Also gives neater code, so keep it.
2024-02-25 09:57:39 +05:30
Kovid Goyal
3b65c1a58a remove declaration without implementation 2024-02-25 09:57:39 +05:30
Kovid Goyal
9bca415af2 Use aligned loads when finding either of two bytes
No measurable performance improvement, but neater algorithm anyway.
2024-02-25 09:57:39 +05:30
Kovid Goyal
60bc8e6c25 ... 2024-02-25 09:57:39 +05:30
Kovid Goyal
8aa1b112b8 Turns out the simde implementation of movemask is not slow enough to compensate for the speed bump from 256 bit 2024-02-25 09:57:39 +05:30
Kovid Goyal
0bd47d8457 Cleanup KITTY_NO_SIMD compilation 2024-02-25 09:57:39 +05:30
Kovid Goyal
fcbda63023 Move finding byte code into separate functions
movemask() is inefficient on ARM64 this will allow us to use a dedicated
implementation for finding bytes on that platform
2024-02-25 09:57:38 +05:30
Kovid Goyal
1d59bfade3 ... 2024-02-25 09:57:38 +05:30
Kovid Goyal
fd7d0f8787 Fix event loop continuously ticking every input_delay seconds even when no input is available 2024-02-25 09:57:38 +05:30
Kovid Goyal
fa11858a72 Make bash integration tests more robust on macOS 2024-02-25 09:57:38 +05:30
Kovid Goyal
1293ee60e0 ... 2024-02-25 09:57:38 +05:30
Kovid Goyal
66341aa28e Make the env var controlling which SIMD level to use more capable 2024-02-25 09:57:38 +05:30
Kovid Goyal
73342411bc Dont build any SIMD code when the target is neither ARM64 nor x86/amd64 2024-02-25 09:57:38 +05:30
Kovid Goyal
8dd6f9b07c Get universal builds working again
Now we use lipo and build individually so we can pass the correct
compiler flags per arch
2024-02-25 09:57:38 +05:30
Kovid Goyal
7e77a196e6 Build only the SIMD code with SIMD compiler flags 2024-02-25 09:57:38 +05:30
Kovid Goyal
465616223c Drop using the v2 microarch
No significant performance impact and small risk of breakage
2024-02-25 09:57:38 +05:30
Kovid Goyal
9d4193f4ea Fix texture ref not useable on repurposed image object 2024-02-25 09:57:38 +05:30
Kovid Goyal
dafb876d75 Skip simd parser tests on machines without SIMD instructions 2024-02-25 09:57:38 +05:30
Kovid Goyal
4b846e0106 Turns out that using 256 bit code on ARM is slightly faster even though it is emulated with 128 bit registers 2024-02-25 09:57:38 +05:30
Kovid Goyal
76c6630084 Dont use 256 bit code paths on ARM
ARM only has 128 bit registers. simde simulates 256 bit operations using
them, which is fairly pointless for us.
2024-02-25 09:57:38 +05:30
Kovid Goyal
23a4012aeb Add an env var to turn off use of SIMD instructions 2024-02-25 09:57:38 +05:30
Kovid Goyal
eee14ae148 Workaround for machines on GitHub Actions that incorrectly report CPU vector instruction availability 2024-02-25 09:57:37 +05:30
Kovid Goyal
b0ccaa09be Clean up test env reporting 2024-02-25 09:57:37 +05:30
Kovid Goyal
bbaccfdaae DRYer 2024-02-25 09:57:37 +05:30
Kovid Goyal
cb5a2cce53 ... 2024-02-25 09:57:37 +05:30
Kovid Goyal
4fec11af05 Run dsymutil in post link phase 2024-02-25 09:57:37 +05:30
Kovid Goyal
5a9304e1b8 DRYer 2024-02-25 09:57:37 +05:30
Kovid Goyal
2b9c646c5b Build dSYM bundles on CI 2024-02-25 09:57:37 +05:30
Kovid Goyal
6b6f3e0ece ... 2024-02-25 09:57:37 +05:30
Kovid Goyal
b560fe34c9 Give the functions for creating various objects unique names so they are easily recognized in macOS's non-fully-symbolicated crash reports 2024-02-25 09:57:37 +05:30
Kovid Goyal
e5b27d066c Output macOS crash reports on CI with nicer formatting 2024-02-25 09:57:37 +05:30
Kovid Goyal
8762a939c0 Dont specify arch/tune when building universal binary 2024-02-25 09:57:37 +05:30
Kovid Goyal
06da31019c Micro-optimize clearing of lines
Use a doubling strategy to memset arrays to a fixed value. Makes the
memset O(log(N)) from O(N) in number of calls to memcpy.
2024-02-25 09:57:37 +05:30
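The doubling strategy mentioned in the commit above can be sketched as follows. This is a minimal illustration in Python rather than the C code kitty uses; the function name `fill_by_doubling` and the returned copy count are assumptions added for demonstration.

```python
def fill_by_doubling(buf: bytearray, pattern: bytes) -> int:
    """Fill buf with repetitions of pattern; return the number of bulk copies.

    Hypothetical sketch: seed the start of the buffer with one copy of the
    pattern, then repeatedly copy the already-filled prefix onto the
    unfilled region, doubling the filled length on each pass.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    total = len(buf)
    filled = min(len(pattern), total)
    buf[:filled] = pattern[:filled]
    copies = 0
    while filled < total:
        # Copy at most as much as is already filled, without overrunning.
        chunk = min(filled, total - filled)
        buf[filled:filled + chunk] = buf[:chunk]
        filled += chunk
        copies += 1
    return copies
```

Because the filled prefix doubles on every pass, the number of bulk copies grows logarithmically with the buffer size, while the total number of bytes written stays linear.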
Kovid Goyal
d0621cb82a Better ipd crash report printing 2024-02-25 09:57:37 +05:30
Kovid Goyal
9935b5ddb2 ... 2024-02-25 09:57:37 +05:30
Kovid Goyal
49d664bb0d Fix incorrect line mapping when clearing screen using optimized code 2024-02-25 09:57:37 +05:30
Kovid Goyal
c6c0d0ed60 Sleep for a minute in the hope that macOS crash log will become available 2024-02-25 09:57:37 +05:30
Kovid Goyal
6f74d1b0c1 ... 2024-02-25 09:57:36 +05:30
Kovid Goyal
5eb852532f Use coredumpctl on Linux CI 2024-02-25 09:57:36 +05:30
Kovid Goyal
43e0281ab5 No ulimit on Linux CI 2024-02-25 09:57:36 +05:30
Kovid Goyal
99d1eec021 ... 2024-02-25 09:57:36 +05:30
Kovid Goyal
0a158f3577 More attempts at finding a core dump on macOS 2024-02-25 09:57:36 +05:30
Kovid Goyal
89c431a624 Optimize implementation of clear screen escape code 2024-02-25 09:57:36 +05:30
Kovid Goyal
b48b70aedf Speed up CSI benchmark by another 10% 2024-02-25 09:57:36 +05:30
Kovid Goyal
f105bc5f4e ... 2024-02-25 09:57:36 +05:30
Kovid Goyal
d5fae07ab7 More help text for the benchmark kitten 2024-02-25 09:57:36 +05:30
Kovid Goyal
58dbcf0840 ... 2024-02-25 09:57:36 +05:30
Kovid Goyal
0340c3c8f7 Ensure CSI state reset at end of test 2024-02-25 09:57:36 +05:30
Kovid Goyal
d8a53fbafd Retry on temp errors when reading from terminal 2024-02-25 09:57:36 +05:30
Kovid Goyal
b237e1b99f ... 2024-02-25 09:57:36 +05:30
Kovid Goyal
43f64f71e4 DRYer 2024-02-25 09:57:36 +05:30
Kovid Goyal
ceac074dad Try to print the Apple crash report on test run failure 2024-02-25 09:57:36 +05:30
Kovid Goyal
903dd26a08 Sadly -march=x86-64-v2 is not the culprit for the intermittent SIGILL in macOS CI 2024-02-25 09:57:36 +05:30
Kovid Goyal
f0efb1cb19 Also clear screen at end of each loop when rendering 2024-02-25 09:57:35 +05:30
Kovid Goyal
4eb49b3320 Simplify benchmark kitten
On macOS reading from the same tty device file as we are writing to in
another thread gives continuous EAGAIN errors. We dont actually need
simultaneous read/write, so move the reads to the end.
2024-02-25 09:57:35 +05:30
Kovid Goyal
0fcb055246 tty: retry on temporary read errors 2024-02-25 09:57:35 +05:30
Kovid Goyal
61a89a14b6 Ignore temporary write failures in benchmark kitten 2024-02-25 09:57:35 +05:30
Kovid Goyal
4fbb70d89e Explain the purpose of the CSI column 2024-02-25 09:57:35 +05:30
Kovid Goyal
5c3e54dede Note that konsole and xterm dont support synchronized update 2024-02-25 09:57:35 +05:30
Kovid Goyal
5721b1315e ... 2024-02-25 09:57:35 +05:30
Kovid Goyal
85fcac2a61 Add throughput performance numbers 2024-02-25 09:57:35 +05:30
Kovid Goyal
8d01a42db1 Make the default number of repetitions for benchmark 100 2024-02-25 09:57:35 +05:30
Kovid Goyal
0e4c49a0d6 Fix building on macOS ARM 2024-02-25 09:57:35 +05:30
Kovid Goyal
a9111f9a40 Try disabling x86-64-v2 on macOS 2024-02-25 09:57:35 +05:30
Kovid Goyal
616fcfd201 More tests 2024-02-25 09:57:35 +05:30
Kovid Goyal
b3ca5d51fb Use the new SIMD utf-8 decoder 2024-02-25 09:57:35 +05:30
Kovid Goyal
e783eccc97 fix handling of bits from high byte of 4 byte sequences 2024-02-25 09:57:35 +05:30
Kovid Goyal
c98b9403ac Dynamically allocated parser state should be 64 byte aligned as well 2024-02-25 09:57:35 +05:30
Kovid Goyal
7e6459a5e4 DRYer 2024-02-25 09:57:35 +05:30
Kovid Goyal
67d22b0ec6 Avoid multiple branches for checking for trailing sequence 2024-02-25 09:57:34 +05:30
Kovid Goyal
79f99bb3ad Make print_register useable without full debug 2024-02-25 09:57:34 +05:30
Kovid Goyal
fa3579656b More invalid utf-8 tests 2024-02-25 09:57:34 +05:30
Kovid Goyal
8a10fcaf5a More tests 2024-02-25 09:57:34 +05:30
Kovid Goyal
4c8b8caead Handle trailing incomplete sequences 2024-02-25 09:57:34 +05:30
Kovid Goyal
4238fedee7 More tests 2024-02-25 09:57:34 +05:30
Kovid Goyal
b0dcdf74bd More tests and micro-optimize switch to ASCII fast path 2024-02-25 09:57:34 +05:30
Kovid Goyal
a63d62fb4e ... 2024-02-25 09:57:34 +05:30
Kovid Goyal
8dbb0cff6f Dont call __builtin_ctz with zero 2024-02-25 09:57:34 +05:30
Kovid Goyal
07bba337f5 fix various bugs in AVX2 utility functions 2024-02-25 09:57:34 +05:30
Kovid Goyal
b28fbf6817 fix zero-ing of last n bytes 2024-02-25 09:57:34 +05:30
Kovid Goyal
daa169b8ed More work on utf8 SIMD decode 2024-02-25 09:57:34 +05:30
Kovid Goyal
a5251bedc9 More work on SIMD utf8 decode 2024-02-25 09:57:34 +05:30
Kovid Goyal
e9820eb207 zero out bytes beyond src_sz after loading src into the register 2024-02-25 09:57:34 +05:30
Kovid Goyal
645b2811e2 more work on the SIMD utf8 decode 2024-02-25 09:57:34 +05:30
Kovid Goyal
9804021de4 More work on SIMD utf8 decode 2024-02-25 09:57:34 +05:30
Kovid Goyal
ea5858e10b avoid repeated construction of one, two, three values vectors 2024-02-25 09:57:33 +05:30
Kovid Goyal
4589a62738 ... 2024-02-25 09:57:33 +05:30
Kovid Goyal
1275c9275b Output the third and final utf8 decoded byte 2024-02-25 09:57:33 +05:30
Kovid Goyal
d95f7ac159 Fix compilation on clang 2024-02-25 09:57:33 +05:30
Kovid Goyal
8e2d448c5c More work on UTF-8 SIMD decode 2024-02-25 09:57:33 +05:30
Kovid Goyal
37c05e3212 ... 2024-02-25 09:57:33 +05:30
Kovid Goyal
99e67f0859 ... 2024-02-25 09:57:33 +05:30
Kovid Goyal
2cb87861c0 Ensure cpu is inited before calling cpu_supports() 2024-02-25 09:57:33 +05:30
Kovid Goyal
c1793d8781 Pause rendering per repetition
Needed when number of repetitions is large enough to cause
paused rendering to be aborted
2024-02-25 09:57:33 +05:30
Kovid Goyal
fce896c480 Do not render when benchmarking parser to better isolate parser performance 2024-02-25 09:57:33 +05:30
Kovid Goyal
f45cd87488 Implement paused rendering for graphics 2024-02-25 09:57:33 +05:30
Kovid Goyal
7b963a2372 Allow texture references to outlive parent images
This is needed for paused rendering of images. Use a simple ref counting
scheme.
2024-02-25 09:57:33 +05:30
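A simple reference counting scheme of the kind the commit above mentions might look like the sketch below. It is illustrative only: the class name `TextureRef`, its methods, and the `free_texture` callback are assumptions, not kitty's actual data structures.

```python
class TextureRef:
    """Hypothetical shared handle to a GPU texture.

    The parent image and any paused-rendering snapshot each hold one
    reference; the texture is freed only when the last holder lets go.
    """

    def __init__(self, texture_id, free_texture):
        self.texture_id = texture_id
        self._free_texture = free_texture  # callback that releases the texture
        self.refcnt = 1  # the creator holds the first reference

    def incref(self):
        self.refcnt += 1
        return self

    def decref(self):
        if self.refcnt <= 0:
            raise RuntimeError("decref on an already released texture")
        self.refcnt -= 1
        if self.refcnt == 0:
            self._free_texture(self.texture_id)
```

With this arrangement an image can drop its reference while a paused frame still holds one, and the texture survives until that frame is also released.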
Kovid Goyal
d863cbd7c0 Ensure selection data is updated on GPU after paused rendering 2024-02-25 09:57:33 +05:30
Kovid Goyal
e50447c840 Fix cursor rendering during rendering pause 2024-02-25 09:57:33 +05:30
Kovid Goyal
ab919f6fa1 fix copy onto incorrect buffer 2024-02-25 09:57:33 +05:30
Kovid Goyal
f596351bc1 Pause selection rendering 2024-02-25 09:57:33 +05:30
Kovid Goyal
7c5e011fe6 No need to pass Screen to iteration_data() 2024-02-25 09:57:32 +05:30
Kovid Goyal
b444b2ee36 Implement paused rendering for cell data 2024-02-25 09:57:32 +05:30
Kovid Goyal
aeb60edf55 Freeze inverted status during paused rendering 2024-02-25 09:57:32 +05:30
Kovid Goyal
6c2ef90033 Add some const for functions taking ColorProfile 2024-02-25 09:57:32 +05:30
Kovid Goyal
182b0aac98 Freeze the color profile during paused rendering 2024-02-25 09:57:32 +05:30
Kovid Goyal
d9d6bd7ffb ... 2024-02-25 09:57:32 +05:30
Kovid Goyal
21bba05805 Turn off paused rendering on reset, resize and scrollback scroll 2024-02-25 09:57:32 +05:30
Kovid Goyal
b146e9c457 Add basic parser tests for pending mode activation/de-activation 2024-02-25 09:57:32 +05:30
Kovid Goyal
1f835b27c4 start work on implementing pending mode as paused rendering 2024-02-25 09:57:32 +05:30
Kovid Goyal
89debca4af Ensure leftover bytes are a copy 2024-02-25 09:57:32 +05:30
Kovid Goyal
a33b747de5 Fix find_in_memoryview() 2024-02-25 09:57:32 +05:30
Kovid Goyal
532cc44e66 Ensure screen is always set when calling parse_sgr 2024-02-25 09:57:32 +05:30
Kovid Goyal
391a43d967 Store last cursor render pos in the rendered info struct 2024-02-25 09:57:32 +05:30
Kovid Goyal
be37a283d5 Move unfocused render bool into cursor render info 2024-02-25 09:57:32 +05:30
Kovid Goyal
7e424e1848 Refactor ascii decode into its own function 2024-02-25 09:57:32 +05:30
Kovid Goyal
96bcb1d33b Fix handling on new_input_at 2024-02-25 09:57:32 +05:30
Kovid Goyal
7f60c649f4 ... 2024-02-25 09:57:31 +05:30
Kovid Goyal
e52bcb5b93 more work on simd utf-8 decode 2024-02-25 09:57:31 +05:30
Kovid Goyal
d93283c547 annotate utf-8 encoder 2024-02-25 09:57:31 +05:30
Kovid Goyal
aef0b9f50f ... 2024-02-25 09:57:31 +05:30
Kovid Goyal
74391d7c50 More work on SIMD utf-8 decode 2024-02-25 09:57:31 +05:30
Kovid Goyal
8975d1a9f4 no need to parametrize sentinel 2024-02-25 09:57:31 +05:30
Kovid Goyal
48bf8c6105 Report out of single byte control code embedded in CSI 2024-02-25 09:57:31 +05:30
Kovid Goyal
0ed1c6f840 Simplify utf8 parser func
Also show a replacement char for incomplete utf-8 sequences interrupted by an esc char
2024-02-25 09:57:31 +05:30
Kovid Goyal
72e73f2f81 Fix alignment of output array in UTF8Decoder 2024-02-25 09:57:31 +05:30
Kovid Goyal
95eac2e510 ... 2024-02-25 09:57:31 +05:30
Kovid Goyal
bc499000a5 Infrastructure for developing and testing UTF-8 SIMD decode 2024-02-25 09:57:31 +05:30
Kovid Goyal
e2be8c2d37 Use unaligned loads for SIMD
makes no difference to the benchmarks and simplifies the code
2024-02-25 09:57:31 +05:30
Kovid Goyal
fd4c8e1e2d Get rid of ByteLoader
Doesnt move the benchmarks
2024-02-25 09:57:31 +05:30
Kovid Goyal
ba18c5a669 Move ByteLoader back to simd-string.c in preparation for getting rid of it 2024-02-25 09:57:31 +05:30
Kovid Goyal
293ad34535 Get rid of utoi() 2024-02-25 09:57:31 +05:30
Kovid Goyal
45e10394a0 Get rid of ByteLoader from csi_parse_loop
It benchmarks 4% slower on my machine
2024-02-25 09:57:31 +05:30
Kovid Goyal
0531b4bc79 Move too long CSI check out of parse loop 2024-02-25 09:57:30 +05:30
Kovid Goyal
0f6d11351b Fix benchmark rate calculation 2024-02-25 09:57:30 +05:30
Kovid Goyal
1b11c3e923 Double timeouts on flaky test 2024-02-25 09:57:30 +05:30
Kovid Goyal
c79baa56e4 Remove unused SIMD code 2024-02-25 09:57:30 +05:30
Kovid Goyal
c6f4c93d0a Nicer exit code diagnostic 2024-02-25 09:57:30 +05:30
Kovid Goyal
8742fb8cce Detect availability of intrinsics on intel macs just in case 2024-02-25 09:57:30 +05:30
Kovid Goyal
0bd67620c6 ... 2024-02-25 09:57:30 +05:30
Kovid Goyal
0f60ac2dd7 sprintf -> snprintf 2024-02-25 09:57:30 +05:30
Kovid Goyal
0cd761808a draw_codepoint is never called with from_inputstream=true 2024-02-25 09:57:30 +05:30
Kovid Goyal
43fb09dc39 Speed up Screen.draw 2024-02-25 09:57:30 +05:30
Kovid Goyal
c2d81d67c2 Nicer macros to ignore diagnostics 2024-02-25 09:57:30 +05:30
Kovid Goyal
272e944a13 DRYer 2024-02-25 09:57:30 +05:30
Kovid Goyal
a9f5519d11 Add tests for writing with cursor on trailer of wide char 2024-02-25 09:57:30 +05:30
Kovid Goyal
a055aaf035 ... 2024-02-25 09:57:30 +05:30
Kovid Goyal
718f4b328f Go back to a single code path for drawing text
Slightly reduces pure ASCII performance and improves Unicode
performance. We should be able to get pure ASCII performance back
via SIMD eventually.
2024-02-25 09:57:30 +05:30
Kovid Goyal
b41cf52ce4 ensure no control chars are drawn 2024-02-25 09:57:29 +05:30
Kovid Goyal
e08e15a676 Ensure parser buffer is aligned to 64 bytes 2024-02-25 09:57:29 +05:30
Kovid Goyal
c5f0b03a62 Remove not needed function 2024-02-25 09:57:29 +05:30
Kovid Goyal
794bd85371 Ignore warning from simde on clang 2024-02-25 09:57:29 +05:30
Kovid Goyal
73d657a21a Dont use intel intrinsics switches on ARM 2024-02-25 09:57:29 +05:30
Kovid Goyal
103f5f3956 Move ringbuf into 3rdparty 2024-02-25 09:57:29 +05:30
Kovid Goyal
ef7d92a117 Update uthash from upstream 2024-02-25 09:57:29 +05:30
Kovid Goyal
33102d8c4e Move uthash into 3rdparty 2024-02-25 09:57:29 +05:30
Kovid Goyal
56dcbca238 Move base64simd into a 3rdparty folder 2024-02-25 09:57:29 +05:30
Kovid Goyal
cc6dc96c90 Allow setting benchmark options 2024-02-25 09:57:29 +05:30
Kovid Goyal
2dffad1d8e Use byteloader for printable char ranges 2024-02-25 09:57:29 +05:30
Kovid Goyal
93430cd5f4 Images benchmark should not measure speed of zlib 2024-02-25 09:57:29 +05:30
Kovid Goyal
3d0a90e63d Switch to SIMD based base64 2024-02-25 09:57:29 +05:30
Kovid Goyal
9eb91984dd Cleanup benchmark warmup code 2024-02-25 09:57:29 +05:30
Kovid Goyal
071c8200a6 ... 2024-02-25 09:57:29 +05:30
Kovid Goyal
ad7175a24d ... 2024-02-25 09:57:29 +05:30
Kovid Goyal
24232ba277 Ensure goroutine has started before sending data 2024-02-25 09:57:28 +05:30
Kovid Goyal
0f6e5fe57e Fix benchmark rate calculation 2024-02-25 09:57:28 +05:30
Kovid Goyal
ef8e8313ab For some reason, memcpy is faster than assignment 2024-02-25 09:57:28 +05:30
Kovid Goyal
17cb65e981 Adjust amount of data in the benchmarks for more consistent timing 2024-02-25 09:57:28 +05:30
Kovid Goyal
f2153f060d add unicode benchmark 2024-02-25 09:57:28 +05:30
Kovid Goyal
48c0b30671 Install simde on CI 2024-02-25 09:57:28 +05:30
Kovid Goyal
e8f67281cf Warmup font rendering before running benchmark 2024-02-25 09:57:28 +05:30
Kovid Goyal
49a54b086f Use simde so SIMD speedups work on ARM as well 2024-02-25 09:57:28 +05:30
Kovid Goyal
4790959938 Use -fno-plt
We dont need the PLT and it frees up some registers
2024-02-25 09:57:28 +05:30
Kovid Goyal
33249c872f Use a better default march for binary builds
x86-64-v2 implies SSE4.2 which should be available everywhere by now. We
will see if we get errors with it.

https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level#architectural_considerations_for_rhel_9
2024-02-25 09:57:28 +05:30
Kovid Goyal
71bf099041 Speed up drawing of printable ascii chars 2024-02-25 09:57:28 +05:30
Kovid Goyal
307acb3f64 Add API to Screen to draw a set of printable ascii chars fast 2024-02-25 09:57:28 +05:30
Kovid Goyal
e5675e9537 Simplify API 2024-02-25 09:57:28 +05:30
Kovid Goyal
9cf425006f ... 2024-02-25 09:57:28 +05:30
Kovid Goyal
c052831291 Dont double parse CSI digits 2024-02-25 09:57:28 +05:30
Kovid Goyal
fe2cd543ba Switch to same algorithm for 128bit SIMD as used for 256 bit SIMD
Avoids needing to write to the haystack and also less chance of a bug in
the never tested simd since all CPUs I have access to have AVX2
2024-02-25 09:57:28 +05:30
Kovid Goyal
1925d5ea65 Prepare for plain sse4 fallback 2024-02-25 09:57:27 +05:30
Kovid Goyal
aacdffd539 DRYer 2024-02-25 09:57:27 +05:30
Kovid Goyal
a0e1eb4985 AVX2 implementation for find either of two 2024-02-25 09:57:27 +05:30
Kovid Goyal
e4c48a5f17 Add AVX2 implementation of find byte not in range
Also fix alignment bug and ensure the simd finders dont return a pointer
beyond the end
2024-02-25 09:57:27 +05:30
Kovid Goyal
021dd168e5 ... 2024-02-25 09:57:27 +05:30
Kovid Goyal
b032313c45 Only use SIMD if CPU supports it at runtime 2024-02-25 09:57:27 +05:30
Kovid Goyal
19a41b4d9a Use sse4.2 instruction for normal mode printable ascii detection 2024-02-25 09:57:27 +05:30
Kovid Goyal
25e7a2882d Work on using SIMD for normal mode dispatch 2024-02-25 09:57:27 +05:30
Kovid Goyal
a75fb6509e ... 2024-02-25 09:57:27 +05:30
Kovid Goyal
23c42cb555 ... 2024-02-25 09:57:27 +05:30
Kovid Goyal
1f8feea454 Parse new data that is written while parsing is in progress in the parse loop
Avoids unnecessary memmove()
2024-02-25 09:57:27 +05:30
Kovid Goyal
f0afdc51af ... 2024-02-25 09:57:27 +05:30
Kovid Goyal
ac6afcb0a8 Release the parser IO lock while parsing 2024-02-25 09:57:27 +05:30
Kovid Goyal
ad7f671a7b Add a long escape code benchmark 2024-02-25 09:57:27 +05:30
Kovid Goyal
4f67b8b433 Need -msse4.2 on non-native builds 2024-02-25 09:57:27 +05:30
Kovid Goyal
e3d6aa2c60 Use simd in a few loops 2024-02-25 09:57:27 +05:30
Kovid Goyal
89d416806b ... 2024-02-25 09:57:26 +05:30
Kovid Goyal
859b0cc585 Include -march=native for debug builds 2024-02-25 09:57:26 +05:30
Kovid Goyal
8b4209cb97 Also use fast find for pending mode 2024-02-25 09:57:26 +05:30
Kovid Goyal
8dca5a6b9a ... 2024-02-25 09:57:26 +05:30
Kovid Goyal
200e5bf6e3 Examine 8 bytes at once for terminator char 2024-02-25 09:57:26 +05:30
Kovid Goyal
f4819175b0 Start work on vectorizing searches 2024-02-25 09:57:26 +05:30
Kovid Goyal
5921ca1139 Add images benchmark 2024-02-25 09:57:26 +05:30
Kovid Goyal
dbc4b98742 Ignore input_delay when the input buffer is close to full 2024-02-25 09:57:26 +05:30
Kovid Goyal
822c9cb1d6 ... 2024-02-25 09:57:26 +05:30
Kovid Goyal
529de9c91d Allow specifying benchmarks to run on the command line 2024-02-25 09:57:26 +05:30
Kovid Goyal
7914523a16 Add a CSI + ascii test 2024-02-25 09:57:26 +05:30
Kovid Goyal
d39c71f927 Round the time to two digit precision 2024-02-25 09:57:26 +05:30
Kovid Goyal
934f2ede0b Start work on simple benchmark tool 2024-02-25 09:57:26 +05:30
Kovid Goyal
8dbea2a046 ... 2024-02-25 09:57:26 +05:30
Kovid Goyal
38c8100a76 ... 2024-02-25 09:57:26 +05:30
Kovid Goyal
a560d86d0f Use aligned loads for the byte loader 2024-02-25 09:57:26 +05:30
Kovid Goyal
47a493c090 Increase chunk size for graphics protocol since the VT parser now supports it 2024-02-25 09:57:25 +05:30
Kovid Goyal
35da87994b Fix input_delay not working 2024-02-25 09:57:25 +05:30
Kovid Goyal
f49f2a1b82 Fix buf full -> not full reporting 2024-02-25 09:57:25 +05:30
Kovid Goyal
91c3492455 Allow logging code to log arbitrary length messages 2024-02-25 09:57:25 +05:30
Kovid Goyal
75872a1097 Dont need an extra variable 2024-02-25 09:57:25 +05:30
Kovid Goyal
4c267bdc24 Use a faster base64 implementation
From the Chromium source code BSD licensed
2024-02-25 09:57:25 +05:30
Kovid Goyal
409ca6bfab Allow larger graphics escape code sizes 2024-02-25 09:57:25 +05:30
Kovid Goyal
56abcbf910 Remove unused base64 32bit functions 2024-02-25 09:57:25 +05:30
Kovid Goyal
f140b74f17 ... 2024-02-25 09:57:25 +05:30
Kovid Goyal
8360a4ec53 Only reset utf8 state when transitioning into normal 2024-02-25 09:57:25 +05:30
Kovid Goyal
ccf124218b ... 2024-02-25 09:57:25 +05:30
Kovid Goyal
737d7bf8f2 Fix parse_sgr buf overread 2024-02-25 09:57:25 +05:30
Kovid Goyal
3f41b22011 Use the byte loader for normal mode 2024-02-25 09:57:25 +05:30
Kovid Goyal
43451b1287 ... 2024-02-25 09:57:25 +05:30
Kovid Goyal
2914c2eb95 Use the byte loader for parsing CSI as well 2024-02-25 09:57:25 +05:30
Kovid Goyal
fc1775753a ... 2024-02-25 09:57:25 +05:30
Kovid Goyal
65aca5b140 Speedup utoi by loading numbers in 8 byte chunks 2024-02-25 09:57:24 +05:30
Kovid Goyal
e7c466797c threading test for full buffer 2024-02-25 09:57:24 +05:30
Kovid Goyal
c66c0b8edc threading tests for pending 2024-02-25 09:57:24 +05:30
Kovid Goyal
a6da0ac6ca Log bad remote commands 2024-02-25 09:57:24 +05:30
Kovid Goyal
5c9aa6a21a json.loads() stupidly does not accept memoryview 2024-02-25 09:57:24 +05:30
Kovid Goyal
50935b6c93 Cleanup kitty dcs parsing 2024-02-25 09:57:24 +05:30
Kovid Goyal
0a6d83901d ... 2024-02-25 09:57:24 +05:30
Kovid Goyal
8bff6f1995 More threading tests 2024-02-25 09:57:24 +05:30
Kovid Goyal
8f1b30a25b No need to ask for 7bit controls anymore 2024-02-25 09:57:24 +05:30
Kovid Goyal
9f337e93fc Add some threading tests 2024-02-25 09:57:24 +05:30
Kovid Goyal
08d99967dc Sanitize contents of remote print 2024-02-25 09:57:24 +05:30
Kovid Goyal
72635c55c5 Convenience methods to test parser threading 2024-02-25 09:57:24 +05:30
Kovid Goyal
2b3b8bae23 Fix osc52 null termination 2024-02-25 09:57:24 +05:30
Kovid Goyal
f96182cc11 Fix utf8 decode 2024-02-25 09:57:24 +05:30
Kovid Goyal
93784903b2 Remove FLUSH_DRAW as it is not needed 2024-02-25 09:57:24 +05:30
Kovid Goyal
afcffc03b1 Separate test of write and read so we can test threading 2024-02-25 09:57:24 +05:30
Kovid Goyal
34164dc341 Read errors from child must commit a zero write 2024-02-25 09:57:23 +05:30
Kovid Goyal
6205fb32fd Refactor VT parser for more speed
No longer copy bytes into a separate buffer, instead parse them in place
in the read buffer
2024-02-25 09:57:23 +05:30
Kovid Goyal
23bb2e1b67 Fast function to replace c0 codes 2024-02-25 09:57:23 +05:30
Kovid Goyal
c81ac668da Use a single code path for tests and live VT parsing 2024-02-25 09:57:23 +05:30
Kovid Goyal
f42b49e597 Avoid a double parse for pending mode 2024-02-25 09:57:23 +05:30
Kovid Goyal
a4ca143fc5 Limit amount of pending data we will store 2024-02-25 09:57:23 +05:30
Kovid Goyal
969bd05fc5 Represent malformed UTF-8 with the replacement character 2024-02-25 09:57:23 +05:30
Kovid Goyal
8a83014f51 Dont construct memoryview when not needed in non dump code path 2024-02-25 09:57:23 +05:30
Kovid Goyal
dcde461c02 ... 2024-02-25 09:57:23 +05:30
Kovid Goyal
6c0e938d5a ... 2024-02-25 09:57:23 +05:30
Kovid Goyal
76158f39ba Pass the window id to the dump callback 2024-02-25 09:57:23 +05:30
Kovid Goyal
a4193a1b02 Fix dumping of bytes/commands 2024-02-25 09:57:23 +05:30
Kovid Goyal
5ab1e647bf Use libc alloc instead of python alloc for vt parser 2024-02-25 09:57:23 +05:30
Kovid Goyal
9ecf79fa84 Fix parse worker 2024-02-25 09:57:23 +05:30
Kovid Goyal
44c96a208e All tests now pass 2024-02-25 09:57:23 +05:30
Kovid Goyal
065866895c Get pending mode working and add a few more tests 2024-02-25 09:57:23 +05:30
Kovid Goyal
52025ff030 misc parser and test fixes 2024-02-25 09:57:22 +05:30
Kovid Goyal
5168e0b576 Port parse_bytes() used in the tests 2024-02-25 09:57:22 +05:30
Kovid Goyal
e4bb00d942 Implement UTF-8 decoding for screen_draw() 2024-02-25 09:57:22 +05:30
Kovid Goyal
5f809bf249 Get kitty building with the new VT parser 2024-02-25 09:57:22 +05:30
Kovid Goyal
b083ad9038 Start work on bytes based VT parser 2024-02-25 09:57:22 +05:30
Kovid Goyal
ce2e1b0813 Ensure we dont pass a NULL pointer to wl_pointer_set_cursor()
Possible fix for #7139
2024-02-20 23:31:18 +05:30
Kovid Goyal
be92cc87a4 macOS: The command line args from macos-launch-services-cmdline are now prefixed to any args from open --args rather than overwriting them
The purpose of the file is to provide default command line args when
launching from GUI. Since macOS nowadays also allows command line args
when launching via open, also respect them.

Fixes #7135
2024-02-18 11:22:15 +05:30
Kovid Goyal
b2391553f9 Keyboard protocol: Fix the Enter Tab and Backspace keys generating spurious release events even when report all keys as escape codes is not set
Fixes #7136
2024-02-18 11:12:24 +05:30
Kovid Goyal
d35f391725 Fix #7131 2024-02-15 13:06:33 +05:30
Kovid Goyal
b9ebb23bb9 Fix #7130 2024-02-14 19:11:02 +05:30
Kovid Goyal
c4ef6b87aa ... 2024-02-12 14:58:13 +05:30
Kovid Goyal
031f9d8c26 ... 2024-02-12 14:43:46 +05:30
Kovid Goyal
8c25c55f01 Merge branch 'fix-build-docs' of https://github.com/sytranvn/kitty 2024-02-12 14:36:21 +05:30
Sy Tran
cdce26e519 fix: typo in build docs 2024-02-12 15:59:01 +07:00
Kovid Goyal
925043d645 ... 2024-02-12 11:17:25 +05:30
Kovid Goyal
f63a4cf90c version 0.32.2 2024-02-12 11:15:49 +05:30
Kovid Goyal
e6d881e89b Ensure we have at least a 1px thick line in cross shade 2024-02-12 09:37:35 +05:30
Kovid Goyal
325e6acd7e Parametrize by number of lines not density 2024-02-12 09:35:15 +05:30
Kovid Goyal
10cfc66737 Merge branch 'dependabot/go_modules/all-go-deps-3cea7aaa0b' of https://github.com/kovidgoyal/kitty 2024-02-12 09:33:11 +05:30
Kovid Goyal
5dbfee9e9c DRYer 2024-02-12 09:32:25 +05:30
Kovid Goyal
edd2bc85ae Adjust cross_shade to have appearance more like in Unicode standard
We try to draw approximately seven diagonal lines per cell
2024-02-12 09:30:11 +05:30
dependabot[bot]
e918b3fb1e Bump the all-go-deps group with 1 update
Bumps the all-go-deps group with 1 update: [golang.org/x/sys](https://github.com/golang/sys).


Updates `golang.org/x/sys` from 0.16.0 to 0.17.0
- [Commits](https://github.com/golang/sys/compare/v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: all-go-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-12 03:57:03 +00:00
Kovid Goyal
c915d1bf58 Fix #7121 2024-02-12 09:24:51 +05:30
Kovid Goyal
cd2c7b3bbd git rev-list --skip invocation changed 2024-02-12 08:47:50 +05:30
Kovid Goyal
63b8893c50 Fix #7117 2024-02-11 06:30:01 +05:30
Kovid Goyal
946d28ae37 Completion for kitty @ load-config --override xxx 2024-02-10 14:52:29 +05:30
Kovid Goyal
97e2d41233 Completion for kitty @ action 2024-02-10 14:01:45 +05:30
Kovid Goyal
54548931b5 Allow running mappable actions via remote control
Saves me having to define a special remote control wrapper for every
mappable action.
2024-02-10 13:23:06 +05:30
Kovid Goyal
ac7b6870a8 close_other_os_windows: to close non active OS windows
Fixes #7113
2024-02-10 12:20:55 +05:30
Kovid Goyal
576a269648 Special case rendering of some more box drawing characters using shades from the block of symbols for legacy computing
Fixes #7110
2024-02-10 10:13:46 +05:30
Kovid Goyal
4bcf69a47e Add more shade box drawing characters
From the legacy computing symbols block
2024-02-10 09:45:25 +05:30
Kovid Goyal
5a418a8cd6 Merge branch 'prompt_command-empty-check' of https://github.com/akinomyoga/kitty 2024-02-09 19:15:56 +05:30
Kovid Goyal
585ac148a6 ... 2024-02-09 19:14:26 +05:30
Koichi Murase
af84161528 Fix Bash integration removing existing elements of PROMPT_COMMAND 2024-02-09 20:50:30 +09:00
Kovid Goyal
7c14e0d666 macOS: Fix an abort when changing OS window chrome for a full screen window via remote control or the themes kitten
Fixes #7106
2024-02-09 15:32:09 +05:30
Kovid Goyal
62347d7e97 remove unneeded headers 2024-02-09 15:16:56 +05:30
Kovid Goyal
777fd5350b Add a test for Go flock implementation 2024-02-09 11:54:51 +05:30
Kovid Goyal
442ca012fd ... 2024-02-07 11:19:42 +05:30
Kovid Goyal
065b17ddbd kitten @ load-config: Allow (re)loading kitty.conf via remote control 2024-02-07 11:08:55 +05:30
Kovid Goyal
bc3c9ce2fa Fix #7100 2024-02-05 20:48:49 +05:30
Kovid Goyal
9bea8bb5bc remove no longer needed code 2024-02-05 13:54:22 +05:30
Kovid Goyal
fef8c536d8 update .gitignore for vt branch as well 2024-02-05 13:33:30 +05:30
Kovid Goyal
8cc2cad4d9 Use list of legal chars in URL from the WHATWG standard
Notably this excludes some ASCII chars: <>{}[]`|
See https://url.spec.whatwg.org/#url-code-points

Fixes #7095
2024-02-05 13:27:22 +05:30
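The character set this commit refers to can be sketched in a few lines of Python. This is an illustration of the "URL code points" definition in the WHATWG URL standard, not kitty's own code; the helper name `is_url_code_point` is hypothetical. Note that `%` is not itself a URL code point, so a real URL detector also has to accept it for percent-encoded bytes.

```python
import string

# ASCII punctuation listed as URL code points by the WHATWG URL standard.
_ASCII_URL_PUNCTUATION = "!$&'()*+,-./:;=?@_~"
_ASCII_URL_CHARS = frozenset(string.ascii_letters + string.digits + _ASCII_URL_PUNCTUATION)


def is_url_code_point(ch: str) -> bool:
    """Return True if the single character ch is a WHATWG URL code point."""
    if ch in _ASCII_URL_CHARS:
        return True
    cp = ord(ch)
    # Non-ASCII code points are allowed only in U+00A0..U+10FFFD.
    if cp < 0xA0 or cp > 0x10FFFD:
        return False
    # Surrogates are excluded.
    if 0xD800 <= cp <= 0xDFFF:
        return False
    # Noncharacters are excluded: U+FDD0..U+FDEF and anything ending in FFFE/FFFF.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return False
    return True
```

With this definition, each of the characters named in the commit message (``<>{}[]`|``) is rejected, while letters, digits and characters such as `~` or `é` are accepted.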
Kovid Goyal
5f8e5b0a29 Merge branch 'dependabot/go_modules/all-go-deps-b84e69789e' of https://github.com/kovidgoyal/kitty 2024-02-05 09:03:02 +05:30
dependabot[bot]
4ede3a8a82 Bump the all-go-deps group with 1 update
Bumps the all-go-deps group with 1 update: [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil).


Updates `github.com/shirou/gopsutil/v3` from 3.23.12 to 3.24.1
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v3.23.12...v3.24.1)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: all-go-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-05 03:26:59 +00:00
Kovid Goyal
934217baf1 Merge branch 'fix/open-url-with-spl' of https://github.com/chuck-sys/kitty 2024-02-03 08:20:42 +05:30
Kovid Goyal
d0f3b34517 Fix typo in docs 2024-02-03 08:20:11 +05:30
Cheuk Yin Ng
cd8f0c1374 fix: open_url_with docs spelling 2024-02-02 12:02:08 -08:00
Kovid Goyal
9b8ee54034 better example of conditional key mapping 2024-01-29 21:58:10 +05:30
Kovid Goyal
d730c189db Merge branch 'dependabot/go_modules/all-go-deps-676548f652' of https://github.com/kovidgoyal/kitty 2024-01-29 08:40:38 +05:30
dependabot[bot]
3fc1e6911a Bump the all-go-deps group with 1 update
Bumps the all-go-deps group with 1 update: [github.com/google/uuid](https://github.com/google/uuid).


Updates `github.com/google/uuid` from 1.5.0 to 1.6.0
- [Release notes](https://github.com/google/uuid/releases)
- [Changelog](https://github.com/google/uuid/blob/master/CHANGELOG.md)
- [Commits](https://github.com/google/uuid/compare/v1.5.0...v1.6.0)

---
updated-dependencies:
- dependency-name: github.com/google/uuid
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: all-go-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-29 03:06:38 +00:00
Kovid Goyal
e8d9ca4465 Graphics protocol: Improve display of images using unicode placeholders or row/column boxes by resizing them using linear instead of nearest neighbor interpolation on the GPU
Fixes #7070
2024-01-28 08:05:02 +05:30
Kovid Goyal
8c12086beb Dont store query images in disk cache and dont send their data to GPU 2024-01-27 13:41:59 +05:30
Kovid Goyal
5a2ee2f9a3 macOS: Fix kitten @ select-window leaving the keyboard in a partially functional state
Fixes #7074
2024-01-27 12:53:58 +05:30
Kovid Goyal
cafd5a7471 @ send-text --bracketed-paste
Allow automatically wrapping sent text in bracketed paste if the
program running in the destination window has turned on bracketed paste
mode.
2024-01-26 20:51:21 +05:30
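As a rough illustration of what wrapping text in bracketed paste means, the sketch below surrounds the text with the standard bracketed paste markers (`ESC [ 200 ~` and `ESC [ 201 ~`) only when the destination program has enabled the mode. The helper `wrap_for_paste` and its `bracketed_paste_enabled` parameter are hypothetical names for this example and do not come from kitty's source.

```python
BRACKETED_PASTE_START = '\x1b[200~'
BRACKETED_PASTE_END = '\x1b[201~'


def wrap_for_paste(text: str, bracketed_paste_enabled: bool) -> str:
    """Wrap text in bracketed paste markers if the receiving program asked for them."""
    if not bracketed_paste_enabled:
        return text
    # Drop any embedded end marker so the text cannot terminate the paste early.
    text = text.replace(BRACKETED_PASTE_END, '')
    return f'{BRACKETED_PASTE_START}{text}{BRACKETED_PASTE_END}'
```

Stripping the end marker from the payload is a defensive choice made in this sketch; whether kitty sanitizes the text in the same way is not stated in the commit message.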
Kovid Goyal
4c46d2bc95 ... 2024-01-26 20:07:55 +05:30
Kovid Goyal
c95fc3689b A single multi-key mapping should not prematurely complete as that confuses people trying out the feature
See #7073
2024-01-26 20:04:33 +05:30
Kovid Goyal
8c50632a10 Fix a single key mapping not overriding a previously defined multi-key mapping 2024-01-26 18:02:25 +05:30
Kovid Goyal
ae1bf69a3d Fix date in changelog 2024-01-26 17:24:45 +05:30
Kovid Goyal
08d88af2fb version 0.32.1 2024-01-26 08:33:51 +05:30
Kovid Goyal
4dfbcb539f Add basic tests for modal mappings 2024-01-25 14:42:27 +05:30
Kovid Goyal
cc0d6621a4 Also document how to set user vars from nvim 2024-01-25 14:27:55 +05:30
Kovid Goyal
d6e55f72c0 Forgot to stub out one method for the test 2024-01-25 14:18:09 +05:30
Kovid Goyal
cd30de3727 Fix #7055 2024-01-25 14:06:52 +05:30
Kovid Goyal
cec427777c Add some tests for mappings 2024-01-25 13:56:42 +05:30
Kovid Goyal
30e3ad83bc Move mapping code into its own class
Better encapsulation. Makes boss.py smaller. Allows writing tests
for mapping logic
2024-01-25 11:51:43 +05:30
Kovid Goyal
9ef6801f4c A single key shortcut should override all previous multi-key shortcuts that have that shortcut as a prefix
Fixes #7058
2024-01-25 11:24:40 +05:30
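The override rule described in this commit can be sketched with a plain dictionary keyed by key sequences. This is a minimal model under assumed names (`KeySequence`, `add_mapping`); it is not the mapping class kitty uses, and it only covers the case named in the commit message, a single-key shortcut defined after multi-key shortcuts that start with the same key.

```python
from typing import Dict, Tuple

# A key sequence is modelled as a tuple of key names, e.g. ('ctrl+a', 'x').
KeySequence = Tuple[str, ...]


def add_mapping(mappings: Dict[KeySequence, str], keys: KeySequence, action: str) -> None:
    """Add a mapping; a single-key mapping removes multi-key mappings it prefixes."""
    if len(keys) == 1:
        shadowed = [k for k in mappings if len(k) > 1 and k[0] == keys[0]]
        for k in shadowed:
            del mappings[k]
    mappings[keys] = action
```

For example, after defining `('ctrl+a', 'x')` and `('ctrl+a', 'y')` and then mapping `('ctrl+a',)` on its own, only the single-key mapping remains, so pressing `ctrl+a` acts immediately instead of waiting for a second key.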
Kovid Goyal
7f1c371b6e DRYer 2024-01-25 09:00:46 +05:30
Kovid Goyal
2f7b0d1d94 Dont show multiple keys bindings in debug output when their focus on conditions are the same 2024-01-25 08:08:52 +05:30
Kovid Goyal
90e1ba7781 Fix #7051 2024-01-24 18:56:04 +05:30
Kovid Goyal
0dfe89a817 ... 2024-01-23 18:42:28 +05:30
Kovid Goyal
c76f75a154 Fix a regression in the previous release that caused overriding of existing multi-key mappings to fail
Fixes #7044
2024-01-23 15:49:30 +05:30
Kovid Goyal
f51520eb79 Clarify behavior of image id==!0 and placement id == 0
See https://github.com/kovidgoyal/kitty/discussions/7043
2024-01-23 08:41:23 +05:30
Kovid Goyal
828f4f312a Wayland+NVIDIA: Do not request an sRGB output buffer as a bug in Wayland causes kitty to not start
Fixes #7021
2024-01-22 13:22:04 +05:30
Kovid Goyal
a9c7a85d9a Clarify the behavior of functional keys with no legacy encoding
See https://github.com/kovidgoyal/kitty/discussions/7037
2024-01-22 08:35:54 +05:30
Kovid Goyal
38393b50c1 Show how to send SIGUSR1 to kitty 2024-01-22 07:36:57 +05:30
Kovid Goyal
7b6c532ac2 ... 2024-01-21 15:34:06 +05:30
Kovid Goyal
b3e74de390 More work on pager kitten 2024-01-21 14:47:56 +05:30
Kovid Goyal
1aa4d7d24b When displaying scrollback fallback to less if the user configures a pager that is not in PATH 2024-01-21 09:22:02 +05:30
Kovid Goyal
a3e324d623 When testing for cf-protection support take env into account 2024-01-21 08:42:55 +05:30
Kovid Goyal
d6116f7426 Fix #7026 2024-01-21 08:33:59 +05:30
Kovid Goyal
ab9631f045 Better fix 2024-01-21 08:27:16 +05:30
Kovid Goyal
ec0a449c63 Fix a regression in the previous release that caused kitten @ send-text with a match parameter to send text twice to the active window
Fixes #7027
2024-01-21 08:24:22 +05:30
Kovid Goyal
01ffbfdb42 Fix a regression in the previous release that caused kitten @ launch --cwd=current to fail over SSH
Fixes #7028
2024-01-21 08:06:44 +05:30
Kovid Goyal
f5621bd56c Merge branch 'dmenu-term' of https://github.com/weakish/kitty 2024-01-20 19:15:50 +05:30
weakish
708750173e Remove dmenu-term in docs
The dmenu-term link returns 404 now.
2024-01-20 08:39:05 +00:00
Kovid Goyal
20e43a3e7d Fix a regression in the previous release that caused multi-key sequences to not abort when pressing an unknown key
Fixes #7022
2024-01-20 08:13:12 +05:30
Kovid Goyal
ff4ee95eba ... 2024-01-20 06:49:49 +05:30
Kovid Goyal
2707c44f0f DRYer 2024-01-19 21:48:40 +05:30
Kovid Goyal
e7e401c8dd More work on pager kitten 2024-01-19 21:16:09 +05:30
Kovid Goyal
b0ab5bd5eb ... 2024-01-19 20:50:11 +05:30
Kovid Goyal
d75395794d ... 2024-01-19 20:46:20 +05:30
Kovid Goyal
c7d894d499 Merge branch 'fix-go-version-check' of https://github.com/Maytha8/kitty 2024-01-19 20:22:42 +05:30
Maytham Alsudany
30905db75f Explicit GO111MODULE=on when getting required Go version 2024-01-19 22:46:48 +08:00
Kovid Goyal
89c3b4f9e2 macOS: Fix a regression in the previous release that broke overriding keyboard shortcuts for actions present in the global menu bar
Fixes #7016
2024-01-19 19:44:04 +05:30
Kovid Goyal
0bd50abd77 Start work on pager kitten 2024-01-19 15:09:20 +05:30
Kovid Goyal
7038292d11 Merge branch 'master' of https://github.com/solopasha/kitty 2024-01-19 14:03:30 +05:30
Kovid Goyal
b33f8416db Fix for spurious github code scanning alert 2024-01-19 14:01:26 +05:30
Pavel Solovev
99b3d0727d Fix build with gcc14 2024-01-19 11:25:53 +03:00
Kovid Goyal
9503725a32 Fix #7013 2024-01-19 13:29:12 +05:30
325 changed files with 20152 additions and 5064 deletions

6
.gitattributes vendored

@@ -16,11 +16,17 @@ kittens/diff/options/types.py linguist-generated=true
kittens/diff/options/parse.py linguist-generated=true
glfw/*.c linguist-vendored=true
glfw/*.h linguist-vendored=true
3rdparty/** linguist-vendored=true
kittens/unicode_input/names.h linguist-generated=true
tools/wcswidth/std.go linguist-generated=true
tools/unicode_names/names.txt linguist-generated=true
terminfo/kitty.term* linguist-generated=true
terminfo/x/* linguist-generated=true
*_generated.h linguist-generated=true
*_generated.go linguist-generated=true
*_generated_test.go linguist-generated=true
*_generated_test.s linguist-generated=true
*_generated.s linguist-generated=true
*.py text diff=python
*.m text diff=objc

View File

@@ -2,6 +2,7 @@
 # vim:fileencoding=utf-8
 # License: GPLv3 Copyright: 2020, Kovid Goyal <kovid at kovidgoyal.net>

+import glob
 import io
 import os
 import shlex
@@ -9,6 +10,7 @@ import shutil
 import subprocess
 import sys
 import tarfile
+import time
 from urllib.request import urlopen

 BUNDLE_URL = 'https://download.calibre-ebook.com/ci/kitty/{}-64.tar.xz'
@@ -17,14 +19,44 @@ is_macos = 'darwin' in sys.platform.lower()
 SW = None


-def run(*a):
+def do_print_crash_reports():
+    print('Printing available crash reports...')
+    if is_macos:
+        end_time = time.monotonic() + 90
+        while time.monotonic() < end_time:
+            time.sleep(1)
+            items = glob.glob(os.path.join(os.path.expanduser('~/Library/Logs/DiagnosticReports'), 'kitty-*.ips'))
+            if items:
+                break
+        if items:
+            time.sleep(1)
+            print(os.path.basename(items[0]))
+            sdir = os.path.dirname(os.path.abspath(__file__))
+            subprocess.check_call([sys.executable, os.path.join(sdir, 'macos_crash_report.py'), items[0]])
+    else:
+        run('sh -c "echo bt | coredumpctl debug"')
+    print(flush=True)
+
+
+def run(*a, print_crash_reports=False):
     if len(a) == 1:
         a = shlex.split(a[0])
-    print(' '.join(map(shlex.quote, a)))
+    cmd = ' '.join(map(shlex.quote, a))
+    print(cmd)
     sys.stdout.flush()
     ret = subprocess.Popen(a).wait()
     if ret != 0:
-        raise SystemExit(ret)
+        if ret < 0:
+            import signal
+            try:
+                sig = signal.Signals(-ret)
+            except ValueError:
+                pass
+            else:
+                if print_crash_reports:
+                    do_print_crash_reports()
+                raise SystemExit(f'The following process was killed by signal: {sig.name}:\n{cmd}')
+        raise SystemExit(f'The following process failed with exit code: {ret}:\n{cmd}')


 def install_deps():
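The new `run()` relies on a POSIX convention of Python's `subprocess` module: a negative return code `-N` means the child was terminated by signal `N`. The standalone sketch below isolates that decoding step; `describe_exit` is a hypothetical helper written for this example and is not part of the CI script.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human readable description."""
    if returncode == 0:
        return 'success'
    if returncode < 0:
        # On POSIX, -N means the process was killed by signal N.
        try:
            return f'killed by signal {signal.Signals(-returncode).name}'
        except ValueError:
            # The number does not correspond to a signal known on this platform.
            return f'killed by unknown signal {-returncode}'
    return f'exit code {returncode}'
```

For instance, a test process that dies from an illegal instruction would typically be reported as `killed by signal SIGILL` rather than as a bare negative number, which is the kind of failure the CI change is trying to make visible.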
@@ -37,13 +69,13 @@ def install_deps():
         import ssl
         if ssl.OPENSSL_VERSION_INFO[0] == 1:
             openssl += '@1.1'
-        run('brew', 'install', 'fish', openssl, *items)
+        run('brew', 'install', 'fish', 'simde', openssl, *items)
     else:
         run('sudo apt-get update')
         run('sudo apt-get install -y libgl1-mesa-dev libxi-dev libxrandr-dev libxinerama-dev ca-certificates'
             ' libxcursor-dev libxcb-xkb-dev libdbus-1-dev libxkbcommon-dev libharfbuzz-dev libx11-xcb-dev zsh'
             ' libpng-dev liblcms2-dev libfontconfig-dev libxkbcommon-x11-dev libcanberra-dev libxxhash-dev uuid-dev'
-            ' zsh bash dash')
+            ' libsimde-dev zsh bash dash systemd-coredump gdb')
         # for some reason these directories are world writable which causes zsh
         # compinit to break
         run('sudo chmod -R og-w /usr/share/zsh')
@@ -59,13 +91,18 @@ def install_deps():
 def build_kitty():
     python = shutil.which('python3') if is_bundle else sys.executable
     cmd = f'{python} setup.py build --verbose'
+    if is_macos:
+        cmd += ' --debug'  # for better crash report to debug SIGILL issue
     if os.environ.get('KITTY_SANITIZE') == '1':
         cmd += ' --debug --sanitize'
     run(cmd)


 def test_kitty():
-    run('./test.py')
+    if is_macos:
+        run('ulimit -c unlimited')
+        run('sudo chmod -R 777 /cores')
+    run('./test.py', print_crash_reports=True)


 def package_kitty():

View File

@@ -2,7 +2,7 @@ name: CI
 on: [push, pull_request]
 env:
     CI: 'true'
-    ASAN_OPTIONS: leak_check_at_exit=0
+    ASAN_OPTIONS: detect_leaks=0
     LC_ALL: en_US.UTF-8
     LANG: en_US.UTF-8

450
.github/workflows/macos_crash_report.py vendored Executable file

@@ -0,0 +1,450 @@
#!/usr/bin/env python
# License: GPLv3 Copyright: 2024, Kovid Goyal <kovid at kovidgoyal.net>

import json
import posixpath
import sys
from collections import namedtuple
from datetime import datetime
from enum import Enum
from functools import cached_property
from typing import IO, List, Mapping, Optional

Frame = namedtuple('Frame', 'image_name image_base image_offset symbol symbol_offset')
Register = namedtuple('Register', 'name value')


def surround(x: str, start: int, end: int) -> str:
    if sys.stdout.isatty():
        x = f'\033[{start}m{x}\033[{end}m'
    return x


def cyan(x: str) -> str:
    return surround(x, 96, 39)


def bold(x: str) -> str:
    return surround(x, 1, 22)


class BugType(Enum):
    WatchdogTimeout = '28'
    BasebandStats = '195'
    GPUEvent = '284'
    Sandbox = '187'
    TerminatingStackshot = '509'
    ServiceWatchdogTimeout = '29'
    Session = '179'
    LegacyStackshot = '188'
    MACorrelation = '197'
    iMessages = '189'
    log_power = '278'
    PowerLog = 'powerlog'
    DuetKnowledgeCollector2 = '58'
    BridgeRestore = '83'
    LegacyJetsam = '198'
    ExcResource_385 = '385'
    Modem = '199'
    Stackshot = '288'
    SystemInformation = 'system_profile'
    Jetsam_298 = '298'
    MemoryResource = '30'
    Bridge = '31'
    DifferentialPrivacy = 'diff_privacy'
    FirmwareIntegrity = '32'
    CoreAnalytics_33 = '33'
    AutoBugCapture = '34'
    EfiFirmwareIntegrity = '35'
    SystemStats = '36'
    AnonSystemStats = '37'
    Crash_9 = '9'
    Jetsam_98 = '98'
    LDCM = '100'
    Panic_10 = '10'
    Spin = '11'
    CLTM = '101'
    Hang = '12'
    Panic_110 = '110'
    ConnectionFailure = '13'
    MessageTracer = '14'
    LowBattery = '120'
    Siri = '201'
    ShutdownStall = '17'
    Panic_210 = '210'
    SymptomsCPUUsage = '202'
    AssumptionViolation = '18'
    CoreHandwriting = 'chw'
    IOMicroStackShot = '44'
    CoreAnalytics_211 = '211'
    SiriAppPrediction = '203'
    spin_45 = '45'
    PowerMicroStackshots = '220'
    BTMetadata = '212'
    SystemMemoryReset = '301'
    ResetCount = '115'
    AutoBugCapture_204 = '204'
    WifiCrashBinary = '221'
    MicroRunloopHang = '310'
    Rosetta = '213'
    glitchyspin = '302'
    System = '116'
    IOPowerSources = '141'
    PanicStats = '205'
    PowerLog_230 = '230'
    LongRunloopHang = '222'
    HomeProductsAnalytics = '311'
    DifferentialPrivacy_150 = '150'
    Rhodes = '214'
    ProactiveEventTrackerTransparency = '303'
    WiFi = '117'
    SymptomsCPUWakes = '142'
    SymptomsCPUUsageFatal = '206'
    Crash_109 = '109'
    ShortRunloopHang = '223'
    CoreHandwriting_231 = '231'
    ForceReset = '151'
    SiriAppSelection = '215'
    PrivateFederatedLearning = '304'
    Bluetooth = '118'
    SCPMotion = '143'
    HangSpin = '207'
    StepCount = '160'
    RTCTransparency = '224'
    DiagnosticRequest = '312'
    MemorySnapshot = '152'
    Rosetta_B = '216'
    AudioAccessory = '305'
    General = '119'
    HotSpotIOMicroSS = '144'
    GeoServicesTransparency = '233'
    MotionState = '161'
    AppStoreTransparency = '225'
    SiriSearchFeedback = '313'
    BearTrapReserved = '153'
    Portrait = '217'
    AWDMetricLog = 'metriclog'
    SymptomsIO = '145'
    SubmissionReserved = '170'
    WifiCrash = '209'
    Natalies = '162'
    SecurityTransparency = '226'
    BiomeMapReduce = '234'
    MemoryGraph = '154'
    MultichannelAudio = '218'
    honeybee_payload = '146'
    MesaReserved = '171'
    WifiSensing = '235'
    SiriMiss = '163'
    ExcResourceThreads_227 = '227'
    TestA = 'T01'
    NetworkUsage = '155'
    WifiReserved = '180'
    SiriActionPrediction = '219'
    honeybee_heartbeat = '147'
    ECCEvent = '172'
    KeyTransparency = '236'
    SubDiagHeartBeat = '164'
    ThirdPartyHang = '228'
    OSFault = '308'
    CoreTime = '156'
    WifiDriverReserved = '181'
    Crash_309 = '309'
    honeybee_issue = '148'
    CellularPerfReserved = '173'
    TestB = 'T02'
    StorageStatus = '165'
    SiriNotificationTransparency = '229'
    TestC = 'T03'
    CPUMicroSS = '157'
    AccessoryUpdate = '182'
    xprotect = '20'
    MultitouchFirmware = '149'
    MicroStackshot = '174'
    AppLaunchDiagnostics = '238'
    KeyboardAccuracy = '166'
    GPURestart = '21'
    FaceTime = '191'
    DuetKnowledgeCollector = '158'
    OTASUpdate = '183'
    ExcResourceThreads_327 = '327'
    ExcResource_22 = '22'
    DuetDB = '175'
    ThirdPartyHangDeveloper = '328'
    PrivacySettings = '167'
    GasGauge = '192'
    MicroStackShots = '23'
    BasebandCrash = '159'
    GPURestart_184 = '184'
    SystemWatchdogCrash = '409'
    FlashStatus = '176'
    SleepWakeFailure = '24'
    CarouselEvent = '168'
    AggregateD = '193'
    WakeupsMonitorViolation = '25'
    DifferentialPrivacy_50 = '50'
    ExcResource_185 = '185'
    UIAutomation = '177'
    ping = '26'
    SiriTransaction = '169'
    SURestore = '194'
    KtraceStackshot = '186'
    WirelessDiagnostics = '27'
    PowerLogLite = '178'
    SKAdNetworkAnalytics = '237'
    HangWorkflowResponsiveness = '239'
    CompositorClientHang = '243'


class CrashReportBase:
    def __init__(self, metadata: Mapping, data: str, filename: str = None):
        self.filename = filename
        self._metadata = metadata
        self._data = data
        self._parse()

    def _parse(self):
        self._is_json = False
        try:
            modified_data = self._data
            if '\n \n' in modified_data:
                modified_data, rest = modified_data.split('\n \n', 1)
                rest = '",' + rest.split('",', 1)[1]
                modified_data += rest
            self._data = json.loads(modified_data)
            self._is_json = True
        except json.decoder.JSONDecodeError:
            pass

    @cached_property
    def bug_type(self) -> BugType:
        return BugType(self.bug_type_str)

    @cached_property
    def bug_type_str(self) -> str:
        return self._metadata['bug_type']

    @cached_property
    def incident_id(self):
        return self._metadata.get('incident_id')

    @cached_property
    def timestamp(self) -> datetime:
        timestamp = self._metadata.get('timestamp')
        timestamp_without_timezone = timestamp.rsplit(' ', 1)[0]
        return datetime.strptime(timestamp_without_timezone, '%Y-%m-%d %H:%M:%S.%f')

    @cached_property
    def name(self) -> str:
        return self._metadata.get('name')

    def __repr__(self) -> str:
        filename = ''
        if self.filename:
            filename = f'FILENAME:{posixpath.basename(self.filename)} '
        return f'<{self.__class__} {filename}TIMESTAMP:{self.timestamp}>'

    def __str__(self) -> str:
        filename = ''
        if self.filename:
            filename = self.filename
        return cyan(f'{self.incident_id} {self.timestamp}\n{filename}\n\n')
class UserModeCrashReport(CrashReportBase):
def _parse_field(self, name: str) -> str:
name += ':'
for line in self._data.split('\n'):
if line.startswith(name):
field = line.split(name, 1)[1]
field = field.strip()
return field
@cached_property
def faulting_thread(self) -> int:
if self._is_json:
return self._data['faultingThread']
else:
return int(self._parse_field('Triggered by Thread'))
@cached_property
def frames(self) -> List[Frame]:
result = []
if self._is_json:
thread_index = self.faulting_thread
images = self._data['usedImages']
for frame in self._data['threads'][thread_index]['frames']:
image = images[frame['imageIndex']]
result.append(
Frame(image_name=image.get('path'), image_base=image.get('base'), symbol=frame.get('symbol'),
image_offset=frame.get('imageOffset'), symbol_offset=frame.get('symbolLocation')))
else:
in_frames = False
for line in self._data.split('\n'):
if in_frames:
splitted = line.split()
if len(splitted) == 0:
break
assert splitted[-2] == '+'
image_base = splitted[-3]
if image_base.startswith('0x'):
result.append(Frame(image_name=splitted[1], image_base=int(image_base, 16), symbol=None,
image_offset=int(splitted[-1]), symbol_offset=None))
else:
# symbolicated
result.append(Frame(image_name=splitted[1], image_base=None, symbol=image_base,
image_offset=None, symbol_offset=int(splitted[-1])))
if line.startswith(f'Thread {self.faulting_thread} Crashed:'):
in_frames = True
return result

    @cached_property
    def registers(self) -> List[Register]:
        result = []
        if self._is_json:
            thread_index = self._data['faultingThread']
            thread_state = self._data['threads'][thread_index]['threadState']
            # Walk the thread state once. The 'x' entry holds the general purpose
            # registers as a list; other entries are single registers stored as dicts.
            for name, value in thread_state.items():
                if name == 'x':
                    for i, reg_x in enumerate(value):
                        result.append(Register(name=f'x{i}', value=reg_x['value']))
                elif isinstance(value, dict):
                    result.append(Register(name=name, value=value['value']))
        else:
            in_frames = False
            for line in self._data.split('\n'):
                if in_frames:
                    splitted = line.split()
                    if len(splitted) == 0:
                        break
                    # Each line is a sequence of "name: value" pairs
                    for i in range(0, len(splitted), 2):
                        register_name = splitted[i]
                        if not register_name.endswith(':'):
                            break
                        register_name = register_name[:-1]
                        register_value = int(splitted[i + 1], 16)
                        result.append(Register(name=register_name, value=register_value))
                if line.startswith(f'Thread {self.faulting_thread} crashed with ARM Thread State'):
                    in_frames = True
        return result

    @cached_property
    def exception_type(self):
        if self._is_json:
            return self._data['exception'].get('type')
        else:
            return self._parse_field('Exception Type')

    @cached_property
    def exception_subtype(self) -> Optional[str]:
        if self._is_json:
            return self._data['exception'].get('subtype')
        else:
            return self._parse_field('Exception Subtype')

    @cached_property
    def application_specific_information(self) -> Optional[str]:
        result = ''
        if self._is_json:
            return self._data.get('asi')
        else:
            in_frames = False
            for line in self._data.split('\n'):
                if in_frames:
                    line = line.strip()
                    if len(line) == 0:
                        break
                    result += line + '\n'
                if line.startswith('Application Specific Information:'):
                    in_frames = True
        result = result.strip()
        if not result:
            return None
        return result

    def __str__(self) -> str:
        result = super().__str__()
        result += bold(f'Exception: {self.exception_type}\n')
        if self.exception_subtype:
            result += bold('Exception Subtype: ')
            result += f'{self.exception_subtype}\n'
        if self.application_specific_information:
            result += bold('Application Specific Information: ')
            result += str(self.application_specific_information)
        result += '\n'
        result += bold('Registers:')
        for i, register in enumerate(self.registers):
            if i % 4 == 0:
                result += '\n'
            result += f'{register.name} = 0x{register.value:016x} '.rjust(30)
        result += '\n\n'
        result += bold('Frames:\n')
        for frame in self.frames:
            image_base = '_HEADER'
            if frame.image_base is not None:
                image_base = f'0x{frame.image_base:x}'
            result += f'\t[{frame.image_name}] {image_base}'
            if frame.image_offset:
                result += f' + 0x{frame.image_offset:x}'
            if frame.symbol is not None:
                result += f' ({frame.symbol} + 0x{frame.symbol_offset:x})'
            result += '\n'
        return result


def get_crash_report_from_file(crash_report_file: IO) -> CrashReportBase:
    # The first line of the report is a JSON metadata header
    metadata = json.loads(crash_report_file.readline())
    try:
        bug_type = BugType(metadata['bug_type'])
    except ValueError:
        return CrashReportBase(metadata, crash_report_file.read(), crash_report_file.name)
    bug_type_parsers = {
        BugType.Crash_109: UserModeCrashReport,
        BugType.Crash_309: UserModeCrashReport,
        BugType.ExcResourceThreads_327: UserModeCrashReport,
        BugType.ExcResource_385: UserModeCrashReport,
    }
    parser = bug_type_parsers.get(bug_type)
    if parser is None:
        return CrashReportBase(metadata, crash_report_file.read(), crash_report_file.name)
    return parser(metadata, crash_report_file.read(), crash_report_file.name)


if __name__ == '__main__':
    with open(sys.argv[-1]) as f:
        print(get_crash_report_from_file(f))

4
.gitignore vendored

@@ -4,6 +4,9 @@
*.bin
*_stub.pyi
*_generated.go
*_generated.s
*_generated_test.go
*_generated_test.s
*_generated.h
/.dmypy.json
/dependencies
@@ -18,6 +21,7 @@ __pycache__/
/glfw/wayland-*-client-protocol.[ch]
/docs/_build/
/docs/generated/
/tools/simdstring/simdstring.test
/.mypy_cache
/.ruff_cache
.DS_Store

28
3rdparty/base64/LICENSE vendored Normal file

@@ -0,0 +1,28 @@
Copyright (c) 2005-2007, Nick Galbreath
Copyright (c) 2015-2018, Wojciech Muła
Copyright (c) 2016-2017, Matthieu Darbois
Copyright (c) 2013-2022, Alfred Klomp
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

491
3rdparty/base64/README.md vendored Normal file

@@ -0,0 +1,491 @@
# Fast Base64 stream encoder/decoder
[![Build Status](https://github.com/aklomp/base64/actions/workflows/test.yml/badge.svg)](https://github.com/aklomp/base64/actions/workflows/test.yml)
This is an implementation of a base64 stream encoding/decoding library in C99
with SIMD (AVX2, AVX512, NEON, AArch64/NEON, SSSE3, SSE4.1, SSE4.2, AVX) and
[OpenMP](http://www.openmp.org) acceleration. It also contains wrapper functions
to encode/decode simple length-delimited strings. This library aims to be:
- FAST;
- easy to use;
- elegant.
On x86, the library does runtime feature detection. The first time it's called,
the library will determine the appropriate encoding/decoding routines for the
machine. It then remembers them for the lifetime of the program. If your
processor supports AVX2, SSSE3, SSE4.1, SSE4.2 or AVX instructions, the library
will pick an optimized codec that lets it encode/decode 12 or 24 bytes at a
time, which gives a speedup of four or more times compared to the "plain"
bytewise codec.
AVX512 support is only for encoding at present, utilizing the AVX512 VL and VBMI
instructions. The decoder reuses the AVX2 implementation. CPUs newer than
Cannonlake (manufactured in 2018) support these instructions.
NEON support is hardcoded to on or off at compile time, because portable
runtime feature detection is unavailable on ARM.
Even if your processor does not support SIMD instructions, this is a very fast
library. The fallback routine can process 32 or 64 bits of input in one round,
depending on your processor's word width, which still makes it significantly
faster than naive bytewise implementations. On some 64-bit machines, the 64-bit
routines even outperform the SSSE3 ones.
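For reference, a naive bytewise encoder of the kind the paragraph above compares against can be sketched as follows. This is not code from this library: `naive_base64_encode` is a hypothetical helper written for illustration, and it assumes the caller supplies an output buffer of sufficient size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libbase64: encodes one 3-byte group
 * at a time and returns the number of characters written to `out`. */
static size_t naive_base64_encode(const uint8_t *src, size_t srclen, char *out)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0;
    size_t o = 0;

    /* Every full group of 3 input bytes becomes 4 output characters. */
    for (; i + 3 <= srclen; i += 3) {
        uint32_t v = ((uint32_t)src[i] << 16)
                   | ((uint32_t)src[i + 1] << 8)
                   | (uint32_t)src[i + 2];
        out[o++] = alphabet[(v >> 18) & 0x3F];
        out[o++] = alphabet[(v >> 12) & 0x3F];
        out[o++] = alphabet[(v >> 6) & 0x3F];
        out[o++] = alphabet[v & 0x3F];
    }

    /* A trailing group of 1 or 2 bytes is padded with '='. */
    if (srclen - i == 1) {
        uint32_t v = (uint32_t)src[i] << 16;
        out[o++] = alphabet[(v >> 18) & 0x3F];
        out[o++] = alphabet[(v >> 12) & 0x3F];
        out[o++] = '=';
        out[o++] = '=';
    } else if (srclen - i == 2) {
        uint32_t v = ((uint32_t)src[i] << 16) | ((uint32_t)src[i + 1] << 8);
        out[o++] = alphabet[(v >> 18) & 0x3F];
        out[o++] = alphabet[(v >> 12) & 0x3F];
        out[o++] = alphabet[(v >> 6) & 0x3F];
        out[o++] = '=';
    }
    return o;
}
```

Each loop iteration here handles only three input bytes with four table lookups, which is the per-byte overhead that the wider word-based and SIMD routines are designed to avoid.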
To the author's knowledge, at the time of original release, this was the only
Base64 library to offer SIMD acceleration. The author wrote
[an article](http://www.alfredklomp.com/programming/sse-base64) explaining one
possible SIMD approach to encoding/decoding Base64. The article can help figure
out what the code is doing, and why.
Notable features:
- Really fast on x86 and ARM systems by using SIMD vector processing;
- Can use [OpenMP](http://www.openmp.org) for even more parallel speedups;
- Really fast on other 32 or 64-bit platforms through optimized routines;
- Reads/writes blocks of streaming data;
- Does not dynamically allocate memory;
- Valid C99 that compiles with pedantic options on;
- Re-entrant and threadsafe;
- Unit tested;
- Uses Duff's Device.
## Acknowledgements
The original AVX2, NEON and AArch64/NEON codecs were generously contributed by
[Inkymail](https://github.com/inkymail/base64), who, in their fork, also
implemented some additional features. Their work is slowly being backported
into this project.
The SSSE3 and AVX2 codecs were substantially improved by using some very clever
optimizations described by Wojciech Muła in a
[series](http://0x80.pl/notesen/2016-01-12-sse-base64-encoding.html) of
[articles](http://0x80.pl/notesen/2016-01-17-sse-base64-decoding.html).
His own code is [here](https://github.com/WojciechMula/toys/tree/master/base64).
The AVX512 encoder is based on code from Wojciech Muła's
[base64simd](https://github.com/WojciechMula/base64simd) library.
The OpenMP implementation was added by Ferry Toth (@htot) from [Exalon Delft](http://www.exalondelft.nl).
## Building
The `lib` directory contains the code for the actual library.
Typing `make` in the toplevel directory will build `lib/libbase64.o` and `bin/base64`.
The first is a single, self-contained object file that you can link into your own project.
The second is a standalone test binary that works similarly to the `base64` system utility.
The matching header file needed to use this library is in `include/libbase64.h`.
To compile just the "plain" library without SIMD codecs, type:
```sh
make lib/libbase64.o
```
Optional SIMD codecs can be included by specifying the `AVX2_CFLAGS`, `AVX512_CFLAGS`,
`NEON32_CFLAGS`, `NEON64_CFLAGS`, `SSSE3_CFLAGS`, `SSE41_CFLAGS`, `SSE42_CFLAGS` and/or `AVX_CFLAGS` environment variables.
A typical build invocation on x86 looks like this:
```sh
AVX2_CFLAGS=-mavx2 SSSE3_CFLAGS=-mssse3 SSE41_CFLAGS=-msse4.1 SSE42_CFLAGS=-msse4.2 AVX_CFLAGS=-mavx make lib/libbase64.o
```
### AVX2
To build and include the AVX2 codec, set the `AVX2_CFLAGS` environment variable to a value that will turn on AVX2 support in your compiler, typically `-mavx2`.
Example:
```sh
AVX2_CFLAGS=-mavx2 make
```
### AVX512
To build and include the AVX512 codec, set the `AVX512_CFLAGS` environment variable to a value that will turn on AVX512 support in your compiler, typically `-mavx512vl -mavx512vbmi`.
Example:
```sh
AVX512_CFLAGS="-mavx512vl -mavx512vbmi" make
```
The codec will only be used if runtime feature detection shows that the target machine supports AVX512 (the VL and VBMI extensions).
### SSSE3
To build and include the SSSE3 codec, set the `SSSE3_CFLAGS` environment variable to a value that will turn on SSSE3 support in your compiler, typically `-mssse3`.
Example:
```sh
SSSE3_CFLAGS=-mssse3 make
```
The codec will only be used if runtime feature detection shows that the target machine supports SSSE3.
### NEON
This library includes two NEON codecs: one for regular 32-bit ARM and one for the 64-bit AArch64 with NEON, which has double the number of SIMD registers and can do full 64-byte table lookups.
These codecs encode in 48-byte chunks and decode in massive 64-byte chunks, so they had to be augmented with a uint32/64 codec to stay fast on smaller inputs!
Use LLVM/Clang for compiling the NEON codecs.
The code generation of at least GCC 4.6 (the version shipped with Raspbian and used for testing) contains a bug when compiling `vst4q_u8()`, and the generated assembly code is of low quality.
NEON intrinsics are a known weak area of GCC.
Clang does a better job.
NEON support can unfortunately not be portably detected at runtime from userland (the `mrc` instruction is privileged), so the default value for using the NEON codec is determined at compile-time.
But you can do your own runtime detection.
You can include the NEON codec and make it the default, then do a runtime check if the CPU has NEON support, and if not, force a downgrade to non-NEON with `BASE64_FORCE_PLAIN`.
These are your options:
1. Don't include NEON support;
2. build NEON support and make it the default, but build all other code without NEON flags so that you can override the default at runtime with `BASE64_FORCE_PLAIN`;
3. build everything with NEON support and make it the default;
4. build everything with NEON support, but don't make it the default (which makes no sense).
For option 1, simply don't specify any NEON-specific compiler flags at all, like so:
```sh
CC=clang CFLAGS="-march=armv6" make
```
For option 2, keep your `CFLAGS` plain, but set the `NEON32_CFLAGS` environment variable to a value that will build NEON support.
The line below, for instance, will build all the code at ARMv6 level, except for the NEON codec, which is built at ARMv7.
It will also make the NEON codec the default.
For ARMv6 platforms, override that default at runtime with the `BASE64_FORCE_PLAIN` flag.
No ARMv7/NEON code will then be touched.
```sh
CC=clang CFLAGS="-march=armv6" NEON32_CFLAGS="-march=armv7 -mfpu=neon" make
```
For option 3, put everything in your `CFLAGS` and use a stub, but non-empty, `NEON32_CFLAGS`.
This example works for the Raspberry Pi 2B V1.1, which has NEON support:
```sh
CC=clang CFLAGS="-march=armv7 -mtune=cortex-a7" NEON32_CFLAGS="-mfpu=neon" make
```
To build and include the NEON64 codec, use `CFLAGS` as usual to define the platform and set `NEON64_CFLAGS` to a nonempty stub.
(The AArch64 target has mandatory NEON64 support.)
Example:
```sh
CC=clang CFLAGS="--target=aarch64-linux-gnu -march=armv8-a" NEON64_CFLAGS=" " make
```
### OpenMP
To enable OpenMP on GCC you need to build with `-fopenmp`. This can be done by setting the `OPENMP` environment variable to `1`.
Example:
```sh
OPENMP=1 make
```
This will let the compiler define `_OPENMP`, which in turn will include the OpenMP optimized `lib_openmp.c` into `lib.c`.
By default the number of parallel threads will be equal to the number of cores of the processor.
On a quad core with hyperthreading, eight cores will be detected, but hyperthreading will not increase the performance.
To get verbose information about OpenMP start the program with `OMP_DISPLAY_ENV=VERBOSE`, for instance
```sh
OMP_DISPLAY_ENV=VERBOSE test/benchmark
```
To put a limit on the number of threads, start the program with `OMP_THREAD_LIMIT=n`, for instance
```sh
OMP_THREAD_LIMIT=2 test/benchmark
```
An example of running a benchmark with OpenMP, SSSE3 and AVX2 enabled:
```sh
make clean && OPENMP=1 SSSE3_CFLAGS=-mssse3 AVX2_CFLAGS=-mavx2 make && OPENMP=1 make -C test
```
## API reference
Strings are represented as a pointer and a length; they are not
zero-terminated. This was a conscious design decision. In the decoding step,
relying on zero-termination would make no sense since the output could contain
legitimate zero bytes. In the encoding step, returning the length saves the
overhead of calling `strlen()` on the output. If you insist on the trailing
zero, you can easily add it yourself at the given offset.
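Adding the trailing zero can be sketched as below. `nul_terminate` is a hypothetical helper, not part of the library API; it assumes `outlen` is the length reported by an encode or decode call and `capacity` is the total size of the caller's buffer, which therefore needs one byte more than the output itself.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libbase64: writes a terminating zero
 * at offset `outlen` if the buffer has room for it.
 * Returns 1 on success, 0 if the buffer is too small. */
static int nul_terminate(char *out, size_t outlen, size_t capacity)
{
    if (outlen >= capacity) {
        return 0;
    }
    out[outlen] = '\0';
    return 1;
}
```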
### Flags
Some API calls take a `flags` argument.
That argument can be used to force the use of a specific codec, even if that codec is a no-op in the current build.
Mainly there for testing purposes, this is also useful on ARM where the only way to do runtime NEON detection is to ask the OS if it's available.
The following constants can be used:
- `BASE64_FORCE_AVX2`
- `BASE64_FORCE_AVX512`
- `BASE64_FORCE_NEON32`
- `BASE64_FORCE_NEON64`
- `BASE64_FORCE_PLAIN`
- `BASE64_FORCE_SSSE3`
- `BASE64_FORCE_SSE41`
- `BASE64_FORCE_SSE42`
- `BASE64_FORCE_AVX`
Set `flags` to `0` for the default behavior, which is runtime feature detection on x86, a compile-time fixed codec on ARM, and the plain codec on other platforms.
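As a rough sketch of the ARM use case mentioned above, the `flags` value could be chosen at runtime like this. `select_base64_flags` and its `cpu_has_neon` parameter are hypothetical: how NEON availability is queried from the OS is platform specific and assumed to happen elsewhere. The macro value is the one shown in `include/libbase64.h` in this diff.

```c
/* Value as defined in include/libbase64.h; guarded so the sketch also
 * compiles when that header has already been included. */
#ifndef BASE64_FORCE_PLAIN
#define BASE64_FORCE_PLAIN (1 << 3)
#endif

/* Hypothetical helper, not part of libbase64: returns the `flags`
 * argument to pass to the API. `cpu_has_neon` is assumed to be the
 * result of an OS-specific query made by the caller. */
static int select_base64_flags(int cpu_has_neon)
{
    /* 0 keeps the compiled-in default codec; otherwise downgrade to
     * the plain codec so that no NEON code is executed. */
    return cpu_has_neon ? 0 : BASE64_FORCE_PLAIN;
}
```

The returned value would then be passed as the last argument of `base64_encode()` or `base64_decode()`, or to the stream `_init` functions.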
### Encoding
#### base64_encode
```c
void base64_encode
( const char *src
, size_t srclen
, char *out
, size_t *outlen
, int flags
) ;
```
Wrapper function to encode a plain string of given length.
Output is written to `out` without trailing zero.
Output length in bytes is written to `outlen`.
The buffer in `out` has been allocated by the caller and is at least 4/3 the size of the input.
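Because the output is padded to whole 4-character groups, "4/3 the size of the input" has to be rounded up. A minimal sketch of the size calculation, using a hypothetical helper name that is not part of the library:

```c
#include <stddef.h>

/* Hypothetical helper, not part of libbase64: number of output bytes
 * produced when encoding `srclen` input bytes with padding, i.e.
 * 4 * ceil(srclen / 3). Does not include room for a trailing zero. */
static size_t base64_encoded_size(size_t srclen)
{
    return 4 * ((srclen + 2) / 3);
}
```

For example, an 11-byte input needs 16 output bytes, and a 12000-byte input needs 16000.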
#### base64_stream_encode_init
```c
void base64_stream_encode_init
( struct base64_state *state
, int flags
) ;
```
Call this before calling `base64_stream_encode()` to init the state.
#### base64_stream_encode
```c
void base64_stream_encode
( struct base64_state *state
, const char *src
, size_t srclen
, char *out
, size_t *outlen
) ;
```
Encodes the block of data of given length at `src`, into the buffer at `out`.
Caller is responsible for allocating a large enough out-buffer; it must be at least 4/3 the size of the in-buffer, but take some margin.
Places the number of new bytes written into `outlen` (which is set to zero when the function starts).
Does not zero-terminate or finalize the output.
#### base64_stream_encode_final
```c
void base64_stream_encode_final
( struct base64_state *state
, char *out
, size_t *outlen
) ;
```
Finalizes the output begun by previous calls to `base64_stream_encode()`.
Adds the required end-of-stream markers if appropriate.
`outlen` is modified and will contain the number of new bytes written at `out` (which will quite often be zero).
### Decoding
#### base64_decode
```c
int base64_decode
( const char *src
, size_t srclen
, char *out
, size_t *outlen
, int flags
) ;
```
Wrapper function to decode a plain string of given length.
Output is written to `out` without trailing zero. Output length in bytes is written to `outlen`.
The buffer in `out` has been allocated by the caller and is at least 3/4 the size of the input.
Returns `1` for success, and `0` when a decode error has occurred due to invalid input.
Returns `-1` if the chosen codec is not included in the current build.
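The "3/4 the size of the input" rule can be turned into a safe upper bound by rounding the input length up to a whole 4-character group. The helper below is a hypothetical sketch, not part of the library; the actual decoded length is whatever the decoder reports in `outlen` and may be smaller when the input ends in padding.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libbase64: upper bound on the number
 * of bytes produced when decoding `srclen` base64 characters, i.e.
 * 3 * ceil(srclen / 4). */
static size_t base64_decoded_max_size(size_t srclen)
{
    return 3 * ((srclen + 3) / 4);
}
```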
#### base64_stream_decode_init
```c
void base64_stream_decode_init
( struct base64_state *state
, int flags
) ;
```
Call this before calling `base64_stream_decode()` to init the state.
#### base64_stream_decode
```c
int base64_stream_decode
( struct base64_state *state
, const char *src
, size_t srclen
, char *out
, size_t *outlen
) ;
```
Decodes the block of data of given length at `src`, into the buffer at `out`.
Caller is responsible for allocating a large enough out-buffer; it must be at least 3/4 the size of the in-buffer, but take some margin.
Places the number of new bytes written into `outlen` (which is set to zero when the function starts).
Does not zero-terminate the output.
Returns 1 if all is well, and 0 if a decoding error was found, such as an invalid character.
Returns -1 if the chosen codec is not included in the current build.
Used by the test harness to check whether a codec is available for testing.
## Examples
A simple example of encoding a static string to base64 and printing the output
to stdout:
```c
#include <stdio.h> /* fwrite */
#include "libbase64.h"

int main ()
{
    char src[] = "hello world";
    char out[20];
    size_t srclen = sizeof(src) - 1;
    size_t outlen;

    base64_encode(src, srclen, out, &outlen, 0);

    fwrite(out, outlen, 1, stdout);

    return 0;
}
```
A simple example (no error checking, etc) of stream encoding standard input to
standard output:
```c
#include <stdio.h>
#include "libbase64.h"

int main ()
{
    size_t nread, nout;
    char buf[12000], out[16000];
    struct base64_state state;

    // Initialize stream encoder:
    base64_stream_encode_init(&state, 0);

    // Read contents of stdin into buffer:
    while ((nread = fread(buf, 1, sizeof(buf), stdin)) > 0) {

        // Encode buffer:
        base64_stream_encode(&state, buf, nread, out, &nout);

        // If there's output, print it to stdout:
        if (nout) {
            fwrite(out, nout, 1, stdout);
        }

        // If end-of-file was reached, exit the loop:
        if (feof(stdin)) {
            break;
        }
    }

    // Finalize encoding:
    base64_stream_encode_final(&state, out, &nout);

    // If the finalizing resulted in extra output bytes, print them:
    if (nout) {
        fwrite(out, nout, 1, stdout);
    }

    return 0;
}
```
Also see `bin/base64.c` for a simple re-implementation of the `base64` utility.
A file or standard input is fed through the encoder/decoder, and the output is
written to standard output.
## Tests
See `test/` for a small test suite. Testing is automated with
[GitHub Actions](https://github.com/aklomp/base64/actions), which builds and
tests the code across various architectures.
## Benchmarks
Benchmarks can be run with the built-in benchmark program as follows:
```sh
make -C test benchmark <buildflags> && test/benchmark
```
It will run an encoding and decoding benchmark for all of the compiled-in codecs.
The tables below contain some results on random machines. All numbers measured with a 10MB buffer in MB/sec, rounded to the nearest integer.
\*: Update needed
x86 processors
| Processor | Plain enc | Plain dec | SSSE3 enc | SSSE3 dec | AVX enc | AVX dec | AVX2 enc | AVX2 dec |
|-------------------------------------------|----------:|----------:|----------:|----------:|--------:|--------:|---------:|---------:|
| i7-4771 @ 3.5 GHz | 833\* | 1111\* | 3333\* | 4444\* | TBD | TBD | 4999\* | 6666\* |
| i7-4770 @ 3.4 GHz DDR1600 | 1790\* | 3038\* | 4899\* | 4043\* | 4796\* | 5709\* | 4681\* | 6386\* |
| i7-4770 @ 3.4 GHz DDR1600 OPENMP 1 thread | 1784\* | 3041\* | 4945\* | 4035\* | 4776\* | 5719\* | 4661\* | 6294\* |
| i7-4770 @ 3.4 GHz DDR1600 OPENMP 2 thread | 3401\* | 5729\* | 5489\* | 7444\* | 5003\* | 8624\* | 5105\* | 8558\* |
| i7-4770 @ 3.4 GHz DDR1600 OPENMP 4 thread | 4884\* | 7099\* | 4917\* | 7057\* | 4799\* | 7143\* | 4902\* | 7219\* |
| i7-4770 @ 3.4 GHz DDR1600 OPENMP 8 thread | 5212\* | 8849\* | 5284\* | 9099\* | 5289\* | 9220\* | 4849\* | 9200\* |
| i7-4870HQ @ 2.5 GHz | 1471\* | 3066\* | 6721\* | 6962\* | 7015\* | 8267\* | 8328\* | 11576\* |
| i5-4590S @ 3.0 GHz | 3356 | 3197 | 4363 | 6104 | 4243\* | 6233 | 4160\* | 6344 |
| Xeon X5570 @ 2.93 GHz | 2161 | 1508 | 3160 | 3915 | - | - | - | - |
| Pentium4 @ 3.4 GHz | 896 | 740 | - | - | - | - | - | - |
| Atom N270 | 243 | 266 | 508 | 387 | - | - | - | - |
| AMD E-450 | 645 | 564 | 625 | 634 | - | - | - | - |
| Intel Edison @ 500 MHz | 79\* | 92\* | 152\* | 172\* | - | - | - | - |
| Intel Edison @ 500 MHz OPENMP 2 thread | 158\* | 184\* | 300\* | 343\* | - | - | - | - |
| Intel Edison @ 500 MHz (x86-64) | 162 | 119 | 209 | 164 | - | - | - | - |
| Intel Edison @ 500 MHz (x86-64) 2 thread | 319 | 237 | 412 | 329 | - | - | - | - |
ARM processors
| Processor | Plain enc | Plain dec | NEON32 enc | NEON32 dec | NEON64 enc | NEON64 dec |
|-------------------------------------------|----------:|----------:|-----------:|-----------:|-----------:|-----------:|
| Raspberry PI B+ V1.2 | 46\* | 40\* | - | - | - | - |
| Raspberry PI 2 B V1.1 | 85 | 141 | 300 | 225 | - | - |
| Apple iPhone SE armv7 | 1056\* | 895\* | 2943\* | 2618\* | - | - |
| Apple iPhone SE arm64 | 1061\* | 1239\* | - | - | 4098\* | 3983\* |
PowerPC processors
| Processor | Plain enc | Plain dec |
|-------------------------------------------|----------:|----------:|
| PowerPC E6500 @ 1.8GHz | 270\* | 265\* |
Benchmarks on i7-4770 @ 3.4 GHz DDR1600 with varying buffer sizes:
![Benchmarks](base64-benchmarks.png)
Note: optimal buffer size to take advantage of the cache is in the range of 100 kB to 1 MB, leading to 12x faster AVX encoding/decoding compared to Plain, or a throughput of 24/27 GB/sec.
Also note the performance degradation when the buffer size is less than 10 kB due to thread creation overhead.
To prevent this from happening `lib_openmp.c` defines `OMP_THRESHOLD 20000`, requiring at least a 20000 byte buffer to enable multithreading.
## License
This repository is licensed under the
[BSD 2-clause License](http://opensource.org/licenses/BSD-2-Clause). See the
LICENSE file.

0
3rdparty/base64/config.h vendored Normal file

146
3rdparty/base64/include/libbase64.h vendored Normal file

@@ -0,0 +1,146 @@
#ifndef LIBBASE64_H
#define LIBBASE64_H
#include <stddef.h> /* size_t */
#if defined(_WIN32) || defined(__CYGWIN__)
#define BASE64_SYMBOL_IMPORT __declspec(dllimport)
#define BASE64_SYMBOL_EXPORT __declspec(dllexport)
#define BASE64_SYMBOL_PRIVATE
#elif __GNUC__ >= 4
#define BASE64_SYMBOL_IMPORT __attribute__ ((visibility ("default")))
#define BASE64_SYMBOL_EXPORT __attribute__ ((visibility ("default")))
#define BASE64_SYMBOL_PRIVATE __attribute__ ((visibility ("hidden")))
#else
#define BASE64_SYMBOL_IMPORT
#define BASE64_SYMBOL_EXPORT
#define BASE64_SYMBOL_PRIVATE
#endif
#if defined(BASE64_STATIC_DEFINE)
#define BASE64_EXPORT
#define BASE64_NO_EXPORT
#else
#if defined(BASE64_EXPORTS) // defined if we are building the shared library
#define BASE64_EXPORT BASE64_SYMBOL_EXPORT
#else
#define BASE64_EXPORT BASE64_SYMBOL_IMPORT
#endif
#define BASE64_NO_EXPORT BASE64_SYMBOL_PRIVATE
#endif
#ifdef __cplusplus
extern "C" {
#endif
/* These are the flags that can be passed in the `flags` argument. The values
* below force the use of a given codec, even if that codec is a no-op in the
* current build. Used in testing. Set to 0 for the default behavior, which is
* runtime feature detection on x86, a compile-time fixed codec on ARM, and
* the plain codec on other platforms: */
#define BASE64_FORCE_AVX2 (1 << 0)
#define BASE64_FORCE_NEON32 (1 << 1)
#define BASE64_FORCE_NEON64 (1 << 2)
#define BASE64_FORCE_PLAIN (1 << 3)
#define BASE64_FORCE_SSSE3 (1 << 4)
#define BASE64_FORCE_SSE41 (1 << 5)
#define BASE64_FORCE_SSE42 (1 << 6)
#define BASE64_FORCE_AVX (1 << 7)
#define BASE64_FORCE_AVX512 (1 << 8)
struct base64_state {
int eof;
int bytes;
int flags;
unsigned char carry;
};
/* Wrapper function to encode a plain string of given length. Output is written
* to *out without trailing zero. Output length in bytes is written to *outlen.
* The buffer in `out` has been allocated by the caller and is at least 4/3 the
* size of the input. See above for `flags`; set to 0 for default operation: */
void BASE64_EXPORT base64_encode
( const char *src
, size_t srclen
, char *out
, size_t *outlen
, int flags
) ;
/* Call this before calling base64_stream_encode() to init the state. See above
* for `flags`; set to 0 for default operation: */
void BASE64_EXPORT base64_stream_encode_init
( struct base64_state *state
, int flags
) ;
/* Encodes the block of data of given length at `src`, into the buffer at
* `out`. Caller is responsible for allocating a large enough out-buffer; it
* must be at least 4/3 the size of the in-buffer, but take some margin. Places
* the number of new bytes written into `outlen` (which is set to zero when the
* function starts). Does not zero-terminate or finalize the output. */
void BASE64_EXPORT base64_stream_encode
( struct base64_state *state
, const char *src
, size_t srclen
, char *out
, size_t *outlen
) ;
/* Finalizes the output begun by previous calls to `base64_stream_encode()`.
* Adds the required end-of-stream markers if appropriate. `outlen` is modified
* and will contain the number of new bytes written at `out` (which will quite
* often be zero). */
void BASE64_EXPORT base64_stream_encode_final
( struct base64_state *state
, char *out
, size_t *outlen
) ;
/* Wrapper function to decode a plain string of given length. Output is written
* to *out without trailing zero. Output length in bytes is written to *outlen.
* The buffer in `out` has been allocated by the caller and is at least 3/4 the
* size of the input. See above for `flags`, set to 0 for default operation: */
int BASE64_EXPORT base64_decode
( const char *src
, size_t srclen
, char *out
, size_t *outlen
, int flags
) ;
/* Call this before calling base64_stream_decode() to init the state. See above
* for `flags`; set to 0 for default operation: */
void BASE64_EXPORT base64_stream_decode_init
( struct base64_state *state
, int flags
) ;
/* Decodes the block of data of given length at `src`, into the buffer at
* `out`. Caller is responsible for allocating a large enough out-buffer; it
* must be at least 3/4 the size of the in-buffer, but take some margin. Places
* the number of new bytes written into `outlen` (which is set to zero when the
* function starts). Does not zero-terminate the output. Returns 1 if all is
* well, and 0 if a decoding error was found, such as an invalid character.
* Returns -1 if the chosen codec is not included in the current build. Used by
* the test harness to check whether a codec is available for testing. */
int BASE64_EXPORT base64_stream_decode
( struct base64_state *state
, const char *src
, size_t srclen
, char *out
, size_t *outlen
) ;
#ifdef __cplusplus
}
#endif
#endif /* LIBBASE64_H */

66
3rdparty/base64/lib/arch/avx/codec.c vendored Normal file

@@ -0,0 +1,66 @@
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include "../../../include/libbase64.h"
#include "../../tables/tables.h"
#include "../../codecs.h"
#include "config.h"
#include "../../env.h"
#if HAVE_AVX
#include <immintrin.h>
// Only enable inline assembly on supported compilers and on 64-bit CPUs.
#ifndef BASE64_AVX_USE_ASM
# if (defined(__GNUC__) || defined(__clang__)) && BASE64_WORDSIZE == 64
# define BASE64_AVX_USE_ASM 1
# else
# define BASE64_AVX_USE_ASM 0
# endif
#endif
#include "../ssse3/dec_reshuffle.c"
#include "../ssse3/dec_loop.c"
#if BASE64_AVX_USE_ASM
# include "enc_loop_asm.c"
#else
# include "../ssse3/enc_translate.c"
# include "../ssse3/enc_reshuffle.c"
# include "../ssse3/enc_loop.c"
#endif
#endif // HAVE_AVX
BASE64_ENC_FUNCTION(avx)
{
#if HAVE_AVX
#include "../generic/enc_head.c"
// For supported compilers, use a hand-optimized inline assembly
// encoder. Otherwise fall back on the SSSE3 encoder, but compiled with
// AVX flags to generate better optimized AVX code.
#if BASE64_AVX_USE_ASM
enc_loop_avx(&s, &slen, &o, &olen);
#else
enc_loop_ssse3(&s, &slen, &o, &olen);
#endif
#include "../generic/enc_tail.c"
#else
BASE64_ENC_STUB
#endif
}
BASE64_DEC_FUNCTION(avx)
{
#if HAVE_AVX
#include "../generic/dec_head.c"
dec_loop_ssse3(&s, &slen, &o, &olen);
#include "../generic/dec_tail.c"
#else
BASE64_DEC_STUB
#endif
}


@@ -0,0 +1,264 @@
// Apologies in advance for combining the preprocessor with inline assembly,
// two notoriously gnarly parts of C, but it was necessary to avoid a lot of
// code repetition. The preprocessor is used to template large sections of
// inline assembly that differ only in the registers used. If the code was
// written out by hand, it would become very large and hard to audit.
// Generate a block of inline assembly that loads register R0 from memory. The
// offset at which the register is loaded is set by the given round.
#define LOAD(R0, ROUND) \
"vlddqu ("#ROUND" * 12)(%[src]), %["R0"] \n\t"
// Generate a block of inline assembly that deinterleaves and shuffles register
// R0 using preloaded constants. Outputs in R0 and R1.
#define SHUF(R0, R1, R2) \
"vpshufb %[lut0], %["R0"], %["R1"] \n\t" \
"vpand %["R1"], %[msk0], %["R2"] \n\t" \
"vpand %["R1"], %[msk2], %["R1"] \n\t" \
"vpmulhuw %["R2"], %[msk1], %["R2"] \n\t" \
"vpmullw %["R1"], %[msk3], %["R1"] \n\t" \
"vpor %["R1"], %["R2"], %["R1"] \n\t"
// Generate a block of inline assembly that takes R0 and R1 and translates
// their contents to the base64 alphabet, using preloaded constants.
#define TRAN(R0, R1, R2) \
"vpsubusb %[n51], %["R1"], %["R0"] \n\t" \
"vpcmpgtb %[n25], %["R1"], %["R2"] \n\t" \
"vpsubb %["R2"], %["R0"], %["R0"] \n\t" \
"vpshufb %["R0"], %[lut1], %["R2"] \n\t" \
"vpaddb %["R1"], %["R2"], %["R0"] \n\t"
// Generate a block of inline assembly that stores the given register R0 at an
// offset set by the given round.
#define STOR(R0, ROUND) \
"vmovdqu %["R0"], ("#ROUND" * 16)(%[dst]) \n\t"
// Generate a block of inline assembly that generates a single self-contained
// encoder round: fetch the data, process it, and store the result. Then update
// the source and destination pointers.
#define ROUND() \
LOAD("a", 0) \
SHUF("a", "b", "c") \
TRAN("a", "b", "c") \
STOR("a", 0) \
"add $12, %[src] \n\t" \
"add $16, %[dst] \n\t"
// Define a macro that initiates a three-way interleaved encoding round by
// preloading registers a, b and c from memory.
// The register graph shows which registers are in use during each step, and
// is a visual aid for choosing registers for that step. Symbol index:
//
// + indicates that a register is loaded by that step.
// | indicates that a register is in use and must not be touched.
// - indicates that a register is decommissioned by that step.
// x indicates that a register is used as a temporary by that step.
// V indicates that a register is an input or output to the macro.
//
#define ROUND_3_INIT() /* a b c d e f */ \
LOAD("a", 0) /* + */ \
SHUF("a", "d", "e") /* | + x */ \
LOAD("b", 1) /* | + | */ \
TRAN("a", "d", "e") /* | | - x */ \
LOAD("c", 2) /* V V V */
// Define a macro that translates, shuffles and stores the input registers A, B
// and C, and preloads registers D, E and F for the next round.
// This macro can be arbitrarily daisy-chained by feeding output registers D, E
// and F back into the next round as input registers A, B and C. The macro
// carefully interleaves memory operations with data operations for optimal
// pipelined performance.
#define ROUND_3(ROUND, A,B,C,D,E,F) /* A B C D E F */ \
LOAD(D, (ROUND + 3)) /* V V V + */ \
SHUF(B, E, F) /* | | | | + x */ \
STOR(A, (ROUND + 0)) /* - | | | | */ \
TRAN(B, E, F) /* | | | - x */ \
LOAD(E, (ROUND + 4)) /* | | | + */ \
SHUF(C, A, F) /* + | | | | x */ \
STOR(B, (ROUND + 1)) /* | - | | | */ \
TRAN(C, A, F) /* - | | | x */ \
LOAD(F, (ROUND + 5)) /* | | | + */ \
SHUF(D, A, B) /* + x | | | | */ \
STOR(C, (ROUND + 2)) /* | - | | | */ \
TRAN(D, A, B) /* - x V V V */
// Define a macro that terminates a ROUND_3 macro by taking pre-loaded
// registers D, E and F, and translating, shuffling and storing them.
#define ROUND_3_END(ROUND, A,B,C,D,E,F) /* A B C D E F */ \
SHUF(E, A, B) /* + x V V V */ \
STOR(D, (ROUND + 3)) /* | - | | */ \
TRAN(E, A, B) /* - x | | */ \
SHUF(F, C, D) /* + x | | */ \
STOR(E, (ROUND + 4)) /* | - | */ \
TRAN(F, C, D) /* - x | */ \
STOR(F, (ROUND + 5)) /* - */
// Define a type A round. Inputs are a, b, and c, outputs are d, e, and f.
#define ROUND_3_A(ROUND) \
ROUND_3(ROUND, "a", "b", "c", "d", "e", "f")
// Define a type B round. Inputs and outputs are swapped with regard to type A.
#define ROUND_3_B(ROUND) \
ROUND_3(ROUND, "d", "e", "f", "a", "b", "c")
// Terminating macro for a type A round.
#define ROUND_3_A_LAST(ROUND) \
ROUND_3_A(ROUND) \
ROUND_3_END(ROUND, "a", "b", "c", "d", "e", "f")
// Terminating macro for a type B round.
#define ROUND_3_B_LAST(ROUND) \
ROUND_3_B(ROUND) \
ROUND_3_END(ROUND, "d", "e", "f", "a", "b", "c")
// Suppress clang's warning that the literal string in the asm statement is
// overlong (longer than the ISO-mandated minimum size of 4095 bytes for C99
// compilers). It may be true, but the goal here is not C99 portability.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Woverlength-strings"
static inline void
enc_loop_avx (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
// For a clearer explanation of the algorithm used by this function,
// please refer to the plain (not inline assembly) implementation. This
// function follows the same basic logic.
if (*slen < 16) {
return;
}
// Process blocks of 12 bytes at a time. Input is read in blocks of 16
// bytes, so "reserve" four bytes from the input buffer to ensure that
// we never read beyond the end of the input buffer.
size_t rounds = (*slen - 4) / 12;
*slen -= rounds * 12; // 12 bytes consumed per round
*olen += rounds * 16; // 16 bytes produced per round
// Number of times to go through the 36x loop.
size_t loops = rounds / 36;
// Number of rounds remaining after the 36x loop.
rounds %= 36;
// Lookup tables.
const __m128i lut0 = _mm_set_epi8(
10, 11, 9, 10, 7, 8, 6, 7, 4, 5, 3, 4, 1, 2, 0, 1);
const __m128i lut1 = _mm_setr_epi8(
65, 71, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -19, -16, 0, 0);
// Temporary registers.
__m128i a, b, c, d, e, f;
__asm__ volatile (
// If there are 36 rounds or more, enter a 36x unrolled loop of
// interleaved encoding rounds. The rounds interleave memory
// operations (load/store) with data operations (table lookups,
// etc) to maximize pipeline throughput.
" test %[loops], %[loops] \n\t"
" jz 18f \n\t"
" jmp 36f \n\t"
" \n\t"
".balign 64 \n\t"
"36: " ROUND_3_INIT()
" " ROUND_3_A( 0)
" " ROUND_3_B( 3)
" " ROUND_3_A( 6)
" " ROUND_3_B( 9)
" " ROUND_3_A(12)
" " ROUND_3_B(15)
" " ROUND_3_A(18)
" " ROUND_3_B(21)
" " ROUND_3_A(24)
" " ROUND_3_B(27)
" " ROUND_3_A_LAST(30)
" add $(12 * 36), %[src] \n\t"
" add $(16 * 36), %[dst] \n\t"
" dec %[loops] \n\t"
" jnz 36b \n\t"
// Enter an 18x unrolled loop for rounds of 18 or more.
"18: cmp $18, %[rounds] \n\t"
" jl 9f \n\t"
" " ROUND_3_INIT()
" " ROUND_3_A(0)
" " ROUND_3_B(3)
" " ROUND_3_A(6)
" " ROUND_3_B(9)
" " ROUND_3_A_LAST(12)
" sub $18, %[rounds] \n\t"
" add $(12 * 18), %[src] \n\t"
" add $(16 * 18), %[dst] \n\t"
// Enter a 9x unrolled loop for rounds of 9 or more.
"9: cmp $9, %[rounds] \n\t"
" jl 6f \n\t"
" " ROUND_3_INIT()
" " ROUND_3_A(0)
" " ROUND_3_B_LAST(3)
" sub $9, %[rounds] \n\t"
" add $(12 * 9), %[src] \n\t"
" add $(16 * 9), %[dst] \n\t"
// Enter a 6x unrolled loop for rounds of 6 or more.
"6: cmp $6, %[rounds] \n\t"
" jl 55f \n\t"
" " ROUND_3_INIT()
" " ROUND_3_A_LAST(0)
" sub $6, %[rounds] \n\t"
" add $(12 * 6), %[src] \n\t"
" add $(16 * 6), %[dst] \n\t"
// Dispatch the remaining rounds 0..5.
"55: cmp $3, %[rounds] \n\t"
" jg 45f \n\t"
" je 3f \n\t"
" cmp $1, %[rounds] \n\t"
" jg 2f \n\t"
" je 1f \n\t"
" jmp 0f \n\t"
"45: cmp $4, %[rounds] \n\t"
" je 4f \n\t"
// Block of non-interleaved encoding rounds, which can each
// individually be jumped to. Rounds fall through to the next.
"5: " ROUND()
"4: " ROUND()
"3: " ROUND()
"2: " ROUND()
"1: " ROUND()
"0: \n\t"
// Outputs (modified).
: [rounds] "+r" (rounds),
[loops] "+r" (loops),
[src] "+r" (*s),
[dst] "+r" (*o),
[a] "=&x" (a),
[b] "=&x" (b),
[c] "=&x" (c),
[d] "=&x" (d),
[e] "=&x" (e),
[f] "=&x" (f)
// Inputs (not modified).
: [lut0] "x" (lut0),
[lut1] "x" (lut1),
[msk0] "x" (_mm_set1_epi32(0x0FC0FC00)),
[msk1] "x" (_mm_set1_epi32(0x04000040)),
[msk2] "x" (_mm_set1_epi32(0x003F03F0)),
[msk3] "x" (_mm_set1_epi32(0x01000010)),
[n51] "x" (_mm_set1_epi8(51)),
[n25] "x" (_mm_set1_epi8(25))
// Clobbers.
: "cc", "memory"
);
}
#pragma GCC diagnostic pop
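The dispatch logic in the assembly above splits the round count into passes through the 36x loop, at most one 18x, 9x and 6x block each, and up to five single rounds. The following is a minimal C sketch of that arithmetic only, assuming the 12-byte input and 16-byte output block sizes used by this AVX encoder. The names `round_plan`, `plan_rounds` and `plan_total` are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical summary of how often each unrolled block would execute.
typedef struct {
    size_t loops36;  // passes through the 36x unrolled loop
    size_t blocks18; // 0 or 1
    size_t blocks9;  // 0 or 1
    size_t blocks6;  // 0 or 1
    size_t singles;  // 0..5 single rounds (labels 5: down to 1:)
} round_plan;

// Mirror the round bookkeeping of the asm encoder for an input of slen bytes.
static round_plan plan_rounds(size_t slen)
{
    round_plan p = { 0, 0, 0, 0, 0 };
    if (slen < 16) {
        return p;
    }
    // Reserve four bytes so that a 16-byte load never overruns the input.
    size_t rounds = (slen - 4) / 12;
    p.loops36 = rounds / 36;
    rounds %= 36;
    if (rounds >= 18) { p.blocks18 = 1; rounds -= 18; }
    if (rounds >= 9)  { p.blocks9  = 1; rounds -= 9;  }
    if (rounds >= 6)  { p.blocks6  = 1; rounds -= 6;  }
    // Whatever remains is handled by the fall-through single rounds.
    assert(rounds <= 5);
    p.singles = rounds;
    return p;
}

// Total number of rounds represented by a plan.
static size_t plan_total(round_plan p)
{
    return 36 * p.loops36 + 18 * p.blocks18 + 9 * p.blocks9
         + 6 * p.blocks6 + p.singles;
}
```

For a 1000-byte input this gives 83 rounds: two passes through the 36x loop, one 9x block and two single rounds.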

3rdparty/base64/lib/arch/avx2/codec.c

@@ -0,0 +1,56 @@
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include "../../../include/libbase64.h"
#include "../../tables/tables.h"
#include "../../codecs.h"
#include "config.h"
#include "../../env.h"
#if HAVE_AVX2
#include <immintrin.h>
// Only enable inline assembly on supported compilers and on 64-bit CPUs.
#ifndef BASE64_AVX2_USE_ASM
# if (defined(__GNUC__) || defined(__clang__)) && BASE64_WORDSIZE == 64
# define BASE64_AVX2_USE_ASM 1
# else
# define BASE64_AVX2_USE_ASM 0
# endif
#endif
#include "dec_reshuffle.c"
#include "dec_loop.c"
#if BASE64_AVX2_USE_ASM
# include "enc_loop_asm.c"
#else
# include "enc_translate.c"
# include "enc_reshuffle.c"
# include "enc_loop.c"
#endif
#endif // HAVE_AVX2
BASE64_ENC_FUNCTION(avx2)
{
#if HAVE_AVX2
#include "../generic/enc_head.c"
enc_loop_avx2(&s, &slen, &o, &olen);
#include "../generic/enc_tail.c"
#else
BASE64_ENC_STUB
#endif
}
BASE64_DEC_FUNCTION(avx2)
{
#if HAVE_AVX2
#include "../generic/dec_head.c"
dec_loop_avx2(&s, &slen, &o, &olen);
#include "../generic/dec_tail.c"
#else
BASE64_DEC_STUB
#endif
}

3rdparty/base64/lib/arch/avx2/dec_loop.c

@@ -0,0 +1,110 @@
static inline int
dec_loop_avx2_inner (const uint8_t **s, uint8_t **o, size_t *rounds)
{
const __m256i lut_lo = _mm256_setr_epi8(
0x15, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
0x11, 0x11, 0x13, 0x1A, 0x1B, 0x1B, 0x1B, 0x1A,
0x15, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
0x11, 0x11, 0x13, 0x1A, 0x1B, 0x1B, 0x1B, 0x1A);
const __m256i lut_hi = _mm256_setr_epi8(
0x10, 0x10, 0x01, 0x02, 0x04, 0x08, 0x04, 0x08,
0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
0x10, 0x10, 0x01, 0x02, 0x04, 0x08, 0x04, 0x08,
0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10);
const __m256i lut_roll = _mm256_setr_epi8(
0, 16, 19, 4, -65, -65, -71, -71,
0, 0, 0, 0, 0, 0, 0, 0,
0, 16, 19, 4, -65, -65, -71, -71,
0, 0, 0, 0, 0, 0, 0, 0);
const __m256i mask_2F = _mm256_set1_epi8(0x2F);
// Load input:
__m256i str = _mm256_loadu_si256((__m256i *) *s);
// See the SSSE3 decoder for an explanation of the algorithm.
const __m256i hi_nibbles = _mm256_and_si256(_mm256_srli_epi32(str, 4), mask_2F);
const __m256i lo_nibbles = _mm256_and_si256(str, mask_2F);
const __m256i hi = _mm256_shuffle_epi8(lut_hi, hi_nibbles);
const __m256i lo = _mm256_shuffle_epi8(lut_lo, lo_nibbles);
if (!_mm256_testz_si256(lo, hi)) {
return 0;
}
const __m256i eq_2F = _mm256_cmpeq_epi8(str, mask_2F);
const __m256i roll = _mm256_shuffle_epi8(lut_roll, _mm256_add_epi8(eq_2F, hi_nibbles));
// Now simply add the delta values to the input:
str = _mm256_add_epi8(str, roll);
// Reshuffle the input to packed 12-byte output format:
str = dec_reshuffle(str);
// Store the output:
_mm256_storeu_si256((__m256i *) *o, str);
*s += 32;
*o += 24;
*rounds -= 1;
return 1;
}
static inline void
dec_loop_avx2 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
if (*slen < 45) {
return;
}
// Process blocks of 32 bytes per round. Because 8 extra zero bytes are
// written after the output, ensure that there will be at least 13
// bytes of input data left to cover the gap. (11 data bytes and up to
// two end-of-string markers.)
size_t rounds = (*slen - 13) / 32;
*slen -= rounds * 32; // 32 bytes consumed per round
*olen += rounds * 24; // 24 bytes produced per round
do {
if (rounds >= 8) {
if (dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds)) {
continue;
}
break;
}
if (rounds >= 4) {
if (dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds)) {
continue;
}
break;
}
if (rounds >= 2) {
if (dec_loop_avx2_inner(s, o, &rounds) &&
dec_loop_avx2_inner(s, o, &rounds)) {
continue;
}
break;
}
dec_loop_avx2_inner(s, o, &rounds);
break;
} while (rounds > 0);
// Adjust for any rounds that were skipped:
*slen += rounds * 32;
*olen -= rounds * 24;
}
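The validity check and the "roll" translation in `dec_loop_avx2_inner` can be followed one byte at a time. The sketch below applies one 128-bit lane of the `lut_lo`, `lut_hi` and `lut_roll` tables from the code above to a single character. It is a scalar illustration, not the library's decoder, and the function names `decode_char`, `decode_value` and `count_valid` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Decode one base64 character using the nibble tables of the AVX2 decoder.
// Returns 1 and stores the 6-bit value on success, 0 for an invalid byte.
static int decode_char(uint8_t c, uint8_t *out)
{
    static const uint8_t lut_lo[16] = {
        0x15, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
        0x11, 0x11, 0x13, 0x1A, 0x1B, 0x1B, 0x1B, 0x1A };
    static const uint8_t lut_hi[16] = {
        0x10, 0x10, 0x01, 0x02, 0x04, 0x08, 0x04, 0x08,
        0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10 };
    static const int8_t lut_roll[16] = {
        0, 16, 19, 4, -65, -65, -71, -71,
        0,  0,  0, 0,   0,   0,   0,   0 };

    assert(out != NULL);
    const uint8_t hi = c >> 4;
    const uint8_t lo = c & 0x0F;

    // A byte is valid only if the two table entries share no set bit.
    if (lut_lo[lo] & lut_hi[hi]) {
        return 0;
    }
    // '/' (0x2F) shares its high nibble with '+', so its index is lowered
    // by one; this mirrors adding the 0xFF compare mask in the SIMD code.
    const uint8_t idx = (uint8_t) (hi - (c == 0x2F));

    // Add the per-range delta to map the character to its 6-bit value.
    *out = (uint8_t) (c + lut_roll[idx]);
    return 1;
}

// Convenience wrapper: the decoded value 0..63, or -1 for an invalid byte.
static int decode_value(uint8_t c)
{
    uint8_t v;
    return decode_char(c, &v) ? (int) v : -1;
}

// Count how many of the 256 possible byte values are accepted.
static int count_valid(void)
{
    int n = 0;
    for (int c = 0; c < 256; c++) {
        n += (decode_value((uint8_t) c) >= 0);
    }
    return n;
}
```

Exactly the 64 alphabet characters pass the check; the padding character `=` is rejected here and left to the tail code.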

@@ -0,0 +1,34 @@
static inline __m256i
dec_reshuffle (const __m256i in)
{
// in, lower lane, bits, upper case are most significant bits, lower
// case are least significant bits:
// 00llllll 00kkkkLL 00jjKKKK 00JJJJJJ
// 00iiiiii 00hhhhII 00ggHHHH 00GGGGGG
// 00ffffff 00eeeeFF 00ddEEEE 00DDDDDD
// 00cccccc 00bbbbCC 00aaBBBB 00AAAAAA
const __m256i merge_ab_and_bc = _mm256_maddubs_epi16(in, _mm256_set1_epi32(0x01400140));
// 0000kkkk LLllllll 0000JJJJ JJjjKKKK
// 0000hhhh IIiiiiii 0000GGGG GGggHHHH
// 0000eeee FFffffff 0000DDDD DDddEEEE
// 0000bbbb CCcccccc 0000AAAA AAaaBBBB
__m256i out = _mm256_madd_epi16(merge_ab_and_bc, _mm256_set1_epi32(0x00011000));
// 00000000 JJJJJJjj KKKKkkkk LLllllll
// 00000000 GGGGGGgg HHHHhhhh IIiiiiii
// 00000000 DDDDDDdd EEEEeeee FFffffff
// 00000000 AAAAAAaa BBBBbbbb CCcccccc
// Pack bytes together in each lane:
out = _mm256_shuffle_epi8(out, _mm256_setr_epi8(
2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12, -1, -1, -1, -1,
2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12, -1, -1, -1, -1));
// 00000000 00000000 00000000 00000000
// LLllllll KKKKkkkk JJJJJJjj IIiiiiii
// HHHHhhhh GGGGGGgg FFffffff EEEEeeee
// DDDDDDdd CCcccccc BBBBbbbb AAAAAAaa
// Pack lanes:
return _mm256_permutevar8x32_epi32(out, _mm256_setr_epi32(0, 1, 2, 4, 5, 6, -1, -1));
}
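The two multiply-add steps in `dec_reshuffle` merge four 6-bit values into one 24-bit group. A scalar sketch of the same arithmetic for a single group is shown below; the function name `pack_sextets` is hypothetical, and the multipliers are the per-element constants from `0x01400140` and `0x00011000` above.

```c
#include <assert.h>
#include <stdint.h>

// Pack four 6-bit values (a is first in memory) into a 24-bit integer,
// using the same multipliers as the maddubs/madd steps of dec_reshuffle.
static uint32_t pack_sextets(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    assert(a < 64 && b < 64 && c < 64 && d < 64);

    // maddubs step: multiply adjacent bytes by 0x40 and 0x01 and add them,
    // giving two 12-bit values.
    const uint32_t ab = a * 0x40u + b * 0x01u;
    const uint32_t cd = c * 0x40u + d * 0x01u;

    // madd step: multiply adjacent 16-bit words by 0x1000 and 0x0001 and
    // add them, giving one 24-bit value.
    return ab * 0x1000u + cd * 0x0001u;
}
```

The result holds the first output byte in bits 16..23, which is why the vector code follows up with a byte shuffle (`2, 1, 0, ...`) that reverses each group before storing. For example, the sextets of "TWFu" (19, 22, 5, 46) pack to 0x4D616E, the bytes of "Man".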

@@ -0,0 +1,89 @@
static inline void
enc_loop_avx2_inner_first (const uint8_t **s, uint8_t **o)
{
// The first load is done at s - 0 (rather than s - 4) so that it does not
// read before the start of the input buffer:
__m256i src = _mm256_loadu_si256((__m256i *) *s);
// Shift by 4 bytes, as required by enc_reshuffle:
src = _mm256_permutevar8x32_epi32(src, _mm256_setr_epi32(0, 0, 1, 2, 3, 4, 5, 6));
// Reshuffle, translate, store:
src = enc_reshuffle(src);
src = enc_translate(src);
_mm256_storeu_si256((__m256i *) *o, src);
// Subsequent loads will be done at s - 4, set pointer for next round:
*s += 20;
*o += 32;
}
static inline void
enc_loop_avx2_inner (const uint8_t **s, uint8_t **o)
{
// Load input:
__m256i src = _mm256_loadu_si256((__m256i *) *s);
// Reshuffle, translate, store:
src = enc_reshuffle(src);
src = enc_translate(src);
_mm256_storeu_si256((__m256i *) *o, src);
*s += 24;
*o += 32;
}
static inline void
enc_loop_avx2 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
if (*slen < 32) {
return;
}
// Process blocks of 24 bytes at a time. Because blocks are loaded 32
// bytes at a time at an offset of -4, ensure that there will be at least
// 4 remaining bytes after the last round, so that the final read will
// not pass beyond the bounds of the input buffer:
size_t rounds = (*slen - 4) / 24;
*slen -= rounds * 24; // 24 bytes consumed per round
*olen += rounds * 32; // 32 bytes produced per round
// The first loop iteration requires special handling to ensure that
// the read, which is done at an offset, does not underflow the buffer:
enc_loop_avx2_inner_first(s, o);
rounds--;
while (rounds > 0) {
if (rounds >= 8) {
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
rounds -= 8;
continue;
}
if (rounds >= 4) {
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
rounds -= 4;
continue;
}
if (rounds >= 2) {
enc_loop_avx2_inner(s, o);
enc_loop_avx2_inner(s, o);
rounds -= 2;
continue;
}
enc_loop_avx2_inner(s, o);
break;
}
// Add the offset back:
*s += 4;
}
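The bound `(*slen - 4) / 24` can be checked by listing the byte ranges that the loads touch: the first round reads bytes 0..31, and round k (for k >= 1) reads 32 bytes starting at offset 24 * k - 4. The sketch below is an assumption-labeled model of that access pattern, not library code; `max_read_end` and `loads_stay_in_bounds` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

// Return one past the highest input offset read by the AVX2 encoder loop
// for an input of slen bytes, or 0 if the loop does not run at all.
static size_t max_read_end(size_t slen)
{
    if (slen < 32) {
        return 0;
    }
    const size_t rounds = (slen - 4) / 24;
    assert(rounds >= 1);

    // First round: 32-byte load at offset 0.
    size_t end = 32;

    // Later rounds: 32-byte load at offset 24 * k - 4.
    for (size_t k = 1; k < rounds; k++) {
        const size_t start = 24 * k - 4;
        if (start + 32 > end) {
            end = start + 32;
        }
    }
    return end;
}

// Check that no load passes the end of the buffer for all lengths up to
// and including max_slen.
static int loads_stay_in_bounds(size_t max_slen)
{
    for (size_t slen = 32; slen <= max_slen; slen++) {
        if (max_read_end(slen) > slen) {
            return 0;
        }
    }
    return 1;
}
```

In closed form the last load ends at 24 * rounds + 4, which is at most slen because 24 * rounds <= slen - 4.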

@@ -0,0 +1,291 @@
// Apologies in advance for combining the preprocessor with inline assembly,
// two notoriously gnarly parts of C, but it was necessary to avoid a lot of
// code repetition. The preprocessor is used to template large sections of
// inline assembly that differ only in the registers used. If the code were
// written out by hand, it would become very large and hard to audit.
// Generate a block of inline assembly that loads register R0 from memory. The
// offset at which the register is loaded is set by the given round and a
// constant offset.
#define LOAD(R0, ROUND, OFFSET) \
"vlddqu ("#ROUND" * 24 + "#OFFSET")(%[src]), %["R0"] \n\t"
// Generate a block of inline assembly that deinterleaves and shuffles register
// R0 using preloaded constants. Outputs in R0 and R1.
#define SHUF(R0, R1, R2) \
"vpshufb %[lut0], %["R0"], %["R1"] \n\t" \
"vpand %["R1"], %[msk0], %["R2"] \n\t" \
"vpand %["R1"], %[msk2], %["R1"] \n\t" \
"vpmulhuw %["R2"], %[msk1], %["R2"] \n\t" \
"vpmullw %["R1"], %[msk3], %["R1"] \n\t" \
"vpor %["R1"], %["R2"], %["R1"] \n\t"
// Generate a block of inline assembly that takes R0 and R1 and translates
// their contents to the base64 alphabet, using preloaded constants.
#define TRAN(R0, R1, R2) \
"vpsubusb %[n51], %["R1"], %["R0"] \n\t" \
"vpcmpgtb %[n25], %["R1"], %["R2"] \n\t" \
"vpsubb %["R2"], %["R0"], %["R0"] \n\t" \
"vpshufb %["R0"], %[lut1], %["R2"] \n\t" \
"vpaddb %["R1"], %["R2"], %["R0"] \n\t"
// Generate a block of inline assembly that stores the given register R0 at an
// offset set by the given round.
#define STOR(R0, ROUND) \
"vmovdqu %["R0"], ("#ROUND" * 32)(%[dst]) \n\t"
// Generate a block of inline assembly that generates a single self-contained
// encoder round: fetch the data, process it, and store the result. Then update
// the source and destination pointers.
#define ROUND() \
LOAD("a", 0, -4) \
SHUF("a", "b", "c") \
TRAN("a", "b", "c") \
STOR("a", 0) \
"add $24, %[src] \n\t" \
"add $32, %[dst] \n\t"
// Define a macro that initiates a three-way interleaved encoding round by
// preloading registers a, b and c from memory.
// The register graph shows which registers are in use during each step, and
// is a visual aid for choosing registers for that step. Symbol index:
//
// + indicates that a register is loaded by that step.
// | indicates that a register is in use and must not be touched.
// - indicates that a register is decommissioned by that step.
// x indicates that a register is used as a temporary by that step.
// V indicates that a register is an input or output to the macro.
//
#define ROUND_3_INIT() /* a b c d e f */ \
LOAD("a", 0, -4) /* + */ \
SHUF("a", "d", "e") /* | + x */ \
LOAD("b", 1, -4) /* | + | */ \
TRAN("a", "d", "e") /* | | - x */ \
LOAD("c", 2, -4) /* V V V */
// Define a macro that translates, shuffles and stores the input registers A, B
// and C, and preloads registers D, E and F for the next round.
// This macro can be arbitrarily daisy-chained by feeding output registers D, E
// and F back into the next round as input registers A, B and C. The macro
// carefully interleaves memory operations with data operations for optimal
// pipelined performance.
#define ROUND_3(ROUND, A,B,C,D,E,F) /* A B C D E F */ \
LOAD(D, (ROUND + 3), -4) /* V V V + */ \
SHUF(B, E, F) /* | | | | + x */ \
STOR(A, (ROUND + 0)) /* - | | | | */ \
TRAN(B, E, F) /* | | | - x */ \
LOAD(E, (ROUND + 4), -4) /* | | | + */ \
SHUF(C, A, F) /* + | | | | x */ \
STOR(B, (ROUND + 1)) /* | - | | | */ \
TRAN(C, A, F) /* - | | | x */ \
LOAD(F, (ROUND + 5), -4) /* | | | + */ \
SHUF(D, A, B) /* + x | | | | */ \
STOR(C, (ROUND + 2)) /* | - | | | */ \
TRAN(D, A, B) /* - x V V V */
// Define a macro that terminates a ROUND_3 macro by taking pre-loaded
// registers D, E and F, and translating, shuffling and storing them.
#define ROUND_3_END(ROUND, A,B,C,D,E,F) /* A B C D E F */ \
SHUF(E, A, B) /* + x V V V */ \
STOR(D, (ROUND + 3)) /* | - | | */ \
TRAN(E, A, B) /* - x | | */ \
SHUF(F, C, D) /* + x | | */ \
STOR(E, (ROUND + 4)) /* | - | */ \
TRAN(F, C, D) /* - x | */ \
STOR(F, (ROUND + 5)) /* - */
// Define a type A round. Inputs are a, b, and c, outputs are d, e, and f.
#define ROUND_3_A(ROUND) \
ROUND_3(ROUND, "a", "b", "c", "d", "e", "f")
// Define a type B round. Inputs and outputs are swapped with regard to type A.
#define ROUND_3_B(ROUND) \
ROUND_3(ROUND, "d", "e", "f", "a", "b", "c")
// Terminating macro for a type A round.
#define ROUND_3_A_LAST(ROUND) \
ROUND_3_A(ROUND) \
ROUND_3_END(ROUND, "a", "b", "c", "d", "e", "f")
// Terminating macro for a type B round.
#define ROUND_3_B_LAST(ROUND) \
ROUND_3_B(ROUND) \
ROUND_3_END(ROUND, "d", "e", "f", "a", "b", "c")
// Suppress clang's warning that the literal string in the asm statement is
// overlong (longer than the ISO-mandated minimum size of 4095 bytes for C99
// compilers). It may be true, but the goal here is not C99 portability.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Woverlength-strings"
static inline void
enc_loop_avx2 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
// For a clearer explanation of the algorithm used by this function,
// please refer to the plain (not inline assembly) implementation. This
// function follows the same basic logic.
if (*slen < 32) {
return;
}
// Process blocks of 24 bytes at a time. Because blocks are loaded 32
// bytes at a time at an offset of -4, ensure that there will be at least
// 4 remaining bytes after the last round, so that the final read will
// not pass beyond the bounds of the input buffer.
size_t rounds = (*slen - 4) / 24;
*slen -= rounds * 24; // 24 bytes consumed per round
*olen += rounds * 32; // 32 bytes produced per round
// Pre-decrement the number of rounds to get the number of rounds
// *after* the first round, which is handled as a special case.
rounds--;
// Number of times to go through the 36x loop.
size_t loops = rounds / 36;
// Number of rounds remaining after the 36x loop.
rounds %= 36;
// Lookup tables.
const __m256i lut0 = _mm256_set_epi8(
10, 11, 9, 10, 7, 8, 6, 7, 4, 5, 3, 4, 1, 2, 0, 1,
14, 15, 13, 14, 11, 12, 10, 11, 8, 9, 7, 8, 5, 6, 4, 5);
const __m256i lut1 = _mm256_setr_epi8(
65, 71, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -19, -16, 0, 0,
65, 71, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -19, -16, 0, 0);
// Temporary registers.
__m256i a, b, c, d, e;
// Temporary register f doubles as the shift mask for the first round.
__m256i f = _mm256_setr_epi32(0, 0, 1, 2, 3, 4, 5, 6);
__asm__ volatile (
// The first loop iteration requires special handling to ensure
// that the read, which is normally done at an offset of -4,
// does not underflow the buffer. Load the buffer at an offset
// of 0 and permute the input to achieve the same effect.
LOAD("a", 0, 0)
"vpermd %[a], %[f], %[a] \n\t"
// Perform the standard shuffling and translation steps.
SHUF("a", "b", "c")
TRAN("a", "b", "c")
// Store the result and increment the source and dest pointers.
"vmovdqu %[a], (%[dst]) \n\t"
"add $24, %[src] \n\t"
"add $32, %[dst] \n\t"
// If there are 36 rounds or more, enter a 36x unrolled loop of
// interleaved encoding rounds. The rounds interleave memory
// operations (load/store) with data operations (table lookups,
// etc) to maximize pipeline throughput.
" test %[loops], %[loops] \n\t"
" jz 18f \n\t"
" jmp 36f \n\t"
" \n\t"
".balign 64 \n\t"
"36: " ROUND_3_INIT()
" " ROUND_3_A( 0)
" " ROUND_3_B( 3)
" " ROUND_3_A( 6)
" " ROUND_3_B( 9)
" " ROUND_3_A(12)
" " ROUND_3_B(15)
" " ROUND_3_A(18)
" " ROUND_3_B(21)
" " ROUND_3_A(24)
" " ROUND_3_B(27)
" " ROUND_3_A_LAST(30)
" add $(24 * 36), %[src] \n\t"
" add $(32 * 36), %[dst] \n\t"
" dec %[loops] \n\t"
" jnz 36b \n\t"
// Enter an 18x unrolled loop for rounds of 18 or more.
"18: cmp $18, %[rounds] \n\t"
" jl 9f \n\t"
" " ROUND_3_INIT()
" " ROUND_3_A(0)
" " ROUND_3_B(3)
" " ROUND_3_A(6)
" " ROUND_3_B(9)
" " ROUND_3_A_LAST(12)
" sub $18, %[rounds] \n\t"
" add $(24 * 18), %[src] \n\t"
" add $(32 * 18), %[dst] \n\t"
// Enter a 9x unrolled loop for rounds of 9 or more.
"9: cmp $9, %[rounds] \n\t"
" jl 6f \n\t"
" " ROUND_3_INIT()
" " ROUND_3_A(0)
" " ROUND_3_B_LAST(3)
" sub $9, %[rounds] \n\t"
" add $(24 * 9), %[src] \n\t"
" add $(32 * 9), %[dst] \n\t"
// Enter a 6x unrolled loop for rounds of 6 or more.
"6: cmp $6, %[rounds] \n\t"
" jl 55f \n\t"
" " ROUND_3_INIT()
" " ROUND_3_A_LAST(0)
" sub $6, %[rounds] \n\t"
" add $(24 * 6), %[src] \n\t"
" add $(32 * 6), %[dst] \n\t"
// Dispatch the remaining rounds 0..5.
"55: cmp $3, %[rounds] \n\t"
" jg 45f \n\t"
" je 3f \n\t"
" cmp $1, %[rounds] \n\t"
" jg 2f \n\t"
" je 1f \n\t"
" jmp 0f \n\t"
"45: cmp $4, %[rounds] \n\t"
" je 4f \n\t"
// Block of non-interleaved encoding rounds, which can each
// individually be jumped to. Rounds fall through to the next.
"5: " ROUND()
"4: " ROUND()
"3: " ROUND()
"2: " ROUND()
"1: " ROUND()
"0: \n\t"
// Outputs (modified).
: [rounds] "+r" (rounds),
[loops] "+r" (loops),
[src] "+r" (*s),
[dst] "+r" (*o),
[a] "=&x" (a),
[b] "=&x" (b),
[c] "=&x" (c),
[d] "=&x" (d),
[e] "=&x" (e),
[f] "+x" (f)
// Inputs (not modified).
: [lut0] "x" (lut0),
[lut1] "x" (lut1),
[msk0] "x" (_mm256_set1_epi32(0x0FC0FC00)),
[msk1] "x" (_mm256_set1_epi32(0x04000040)),
[msk2] "x" (_mm256_set1_epi32(0x003F03F0)),
[msk3] "x" (_mm256_set1_epi32(0x01000010)),
[n51] "x" (_mm256_set1_epi8(51)),
[n25] "x" (_mm256_set1_epi8(25))
// Clobbers.
: "cc", "memory"
);
}
#pragma GCC diagnostic pop

@@ -0,0 +1,83 @@
static inline __m256i
enc_reshuffle (const __m256i input)
{
// Translation of the SSSE3 reshuffling algorithm to AVX2. This one
// works with shifted (4 bytes) input in order to be able to work
// efficiently in the two 128-bit lanes.
// Input, bytes MSB to LSB:
// 0 0 0 0 x w v u t s r q p o n m
// l k j i h g f e d c b a 0 0 0 0
const __m256i in = _mm256_shuffle_epi8(input, _mm256_set_epi8(
10, 11, 9, 10,
7, 8, 6, 7,
4, 5, 3, 4,
1, 2, 0, 1,
14, 15, 13, 14,
11, 12, 10, 11,
8, 9, 7, 8,
5, 6, 4, 5));
// in, bytes MSB to LSB:
// w x v w
// t u s t
// q r p q
// n o m n
// k l j k
// h i g h
// e f d e
// b c a b
const __m256i t0 = _mm256_and_si256(in, _mm256_set1_epi32(0x0FC0FC00));
// bits, upper case are most significant bits, lower case are least
// significant bits.
// 0000wwww XX000000 VVVVVV00 00000000
// 0000tttt UU000000 SSSSSS00 00000000
// 0000qqqq RR000000 PPPPPP00 00000000
// 0000nnnn OO000000 MMMMMM00 00000000
// 0000kkkk LL000000 JJJJJJ00 00000000
// 0000hhhh II000000 GGGGGG00 00000000
// 0000eeee FF000000 DDDDDD00 00000000
// 0000bbbb CC000000 AAAAAA00 00000000
const __m256i t1 = _mm256_mulhi_epu16(t0, _mm256_set1_epi32(0x04000040));
// 00000000 00wwwwXX 00000000 00VVVVVV
// 00000000 00ttttUU 00000000 00SSSSSS
// 00000000 00qqqqRR 00000000 00PPPPPP
// 00000000 00nnnnOO 00000000 00MMMMMM
// 00000000 00kkkkLL 00000000 00JJJJJJ
// 00000000 00hhhhII 00000000 00GGGGGG
// 00000000 00eeeeFF 00000000 00DDDDDD
// 00000000 00bbbbCC 00000000 00AAAAAA
const __m256i t2 = _mm256_and_si256(in, _mm256_set1_epi32(0x003F03F0));
// 00000000 00xxxxxx 000000vv WWWW0000
// 00000000 00uuuuuu 000000ss TTTT0000
// 00000000 00rrrrrr 000000pp QQQQ0000
// 00000000 00oooooo 000000mm NNNN0000
// 00000000 00llllll 000000jj KKKK0000
// 00000000 00iiiiii 000000gg HHHH0000
// 00000000 00ffffff 000000dd EEEE0000
// 00000000 00cccccc 000000aa BBBB0000
const __m256i t3 = _mm256_mullo_epi16(t2, _mm256_set1_epi32(0x01000010));
// 00xxxxxx 00000000 00vvWWWW 00000000
// 00uuuuuu 00000000 00ssTTTT 00000000
// 00rrrrrr 00000000 00ppQQQQ 00000000
// 00oooooo 00000000 00mmNNNN 00000000
// 00llllll 00000000 00jjKKKK 00000000
// 00iiiiii 00000000 00ggHHHH 00000000
// 00ffffff 00000000 00ddEEEE 00000000
// 00cccccc 00000000 00aaBBBB 00000000
return _mm256_or_si256(t1, t3);
// 00xxxxxx 00wwwwXX 00vvWWWW 00VVVVVV
// 00uuuuuu 00ttttUU 00ssTTTT 00SSSSSS
// 00rrrrrr 00qqqqRR 00ppQQQQ 00PPPPPP
// 00oooooo 00nnnnOO 00mmNNNN 00MMMMMM
// 00llllll 00kkkkLL 00jjKKKK 00JJJJJJ
// 00iiiiii 00hhhhII 00ggHHHH 00GGGGGG
// 00ffffff 00eeeeFF 00ddEEEE 00DDDDDD
// 00cccccc 00bbbbCC 00aaBBBB 00AAAAAA
}
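The mask-and-multiply sequence in `enc_reshuffle` can be traced for a single 32-bit lane. The sketch below treats the lane as two 16-bit halves, as `mulhi_epu16` and `mullo_epi16` do, and uses the per-half constants taken from the four 32-bit masks above. The names `split_sextets` and `split_reference` are hypothetical; the second function is a plain shift-based reference for comparison.

```c
#include <assert.h>
#include <stdint.h>

// Split three input bytes into four 6-bit values with the same mask and
// multiply steps as enc_reshuffle. Byte k of the result holds sextet k.
static uint32_t split_sextets(uint8_t a, uint8_t b, uint8_t c)
{
    // After the byte shuffle the lane holds, from MSB to LSB: b c a b.
    const uint32_t lo = (uint32_t) b | ((uint32_t) a << 8); // low 16 bits
    const uint32_t hi = (uint32_t) c | ((uint32_t) b << 8); // high 16 bits

    // t0 = in & 0x0FC0FC00, t1 = mulhi_epu16(t0, 0x04000040):
    const uint32_t t1_lo = ((lo & 0xFC00u) * 0x0040u) >> 16;
    const uint32_t t1_hi = ((hi & 0x0FC0u) * 0x0400u) >> 16;

    // t2 = in & 0x003F03F0, t3 = mullo_epi16(t2, 0x01000010):
    const uint32_t t3_lo = ((lo & 0x03F0u) * 0x0010u) & 0xFFFFu;
    const uint32_t t3_hi = ((hi & 0x003Fu) * 0x0100u) & 0xFFFFu;

    // Combine both halves; every byte is now a value in 0..63.
    const uint32_t out = (t1_lo | t3_lo) | ((t1_hi | t3_hi) << 16);
    assert((out & 0xC0C0C0C0u) == 0);
    return out;
}

// Reference implementation using ordinary shifts.
static uint32_t split_reference(uint8_t a, uint8_t b, uint8_t c)
{
    const uint32_t s0 = a >> 2;
    const uint32_t s1 = ((a & 0x03u) << 4) | (b >> 4);
    const uint32_t s2 = ((b & 0x0Fu) << 2) | (c >> 6);
    const uint32_t s3 = c & 0x3Fu;
    return s0 | (s1 << 8) | (s2 << 16) | (s3 << 24);
}
```

For the input "Man" both functions return 0x2E051613, that is, the sextets 19, 22, 5 and 46 in memory order.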

@@ -0,0 +1,30 @@
static inline __m256i
enc_translate (const __m256i in)
{
// A lookup table containing the absolute offsets for all ranges:
const __m256i lut = _mm256_setr_epi8(
65, 71, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -19, -16, 0, 0,
65, 71, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -19, -16, 0, 0);
// Translate values 0..63 to the Base64 alphabet. There are five sets:
// # From To Abs Index Characters
// 0 [0..25] [65..90] +65 0 ABCDEFGHIJKLMNOPQRSTUVWXYZ
// 1 [26..51] [97..122] +71 1 abcdefghijklmnopqrstuvwxyz
// 2 [52..61] [48..57] -4 [2..11] 0123456789
// 3 [62] [43] -19 12 +
// 4 [63] [47] -16 13 /
// Create LUT indices from the input. The index for range #0 is right,
// others are 1 less than expected:
__m256i indices = _mm256_subs_epu8(in, _mm256_set1_epi8(51));
// mask is 0xFF (-1) for range #[1..4] and 0x00 for range #0:
const __m256i mask = _mm256_cmpgt_epi8(in, _mm256_set1_epi8(25));
// Subtract -1, so add 1 to indices for range #[1..4]. All indices are
// now correct:
indices = _mm256_sub_epi8(indices, mask);
// Add offsets to input values:
return _mm256_add_epi8(in, _mm256_shuffle_epi8(lut, indices));
}
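The range table in `enc_translate` can be exercised with a scalar version of the same steps: a saturating subtract of 51, a correction of +1 for inputs above 25, and a 16-entry offset lookup. This is a sketch for one value at a time; `translate_sextet` and `matches_alphabet` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

// Translate one 6-bit value to its base64 character, following the
// saturating-subtract and offset-table steps of enc_translate.
static uint8_t translate_sextet(uint8_t in)
{
    // Offsets per range: A-Z, a-z, 0-9 (ten entries), '+', '/'.
    static const int8_t lut[16] = {
        65, 71, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -19, -16, 0, 0 };

    assert(in < 64);

    // Saturating subtract: 0 for inputs 0..51, 1..12 for inputs 52..63.
    uint8_t index = (in > 51) ? (uint8_t) (in - 51) : 0;

    // Inputs above 25 belong to ranges #1..#4; their index is one too low.
    if (in > 25) {
        index++;
    }
    return (uint8_t) (in + lut[index]);
}

// Check all 64 values against the standard base64 alphabet.
static int matches_alphabet(void)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    for (int i = 0; i < 64; i++) {
        if (translate_sextet((uint8_t) i) != (uint8_t) alphabet[i]) {
            return 0;
        }
    }
    return 1;
}
```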

3rdparty/base64/lib/arch/avx512/codec.c

@@ -0,0 +1,42 @@
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include "../../../include/libbase64.h"
#include "../../tables/tables.h"
#include "../../codecs.h"
#include "config.h"
#include "../../env.h"
#if HAVE_AVX512
#include <immintrin.h>
#include "../avx2/dec_reshuffle.c"
#include "../avx2/dec_loop.c"
#include "enc_reshuffle_translate.c"
#include "enc_loop.c"
#endif // HAVE_AVX512
BASE64_ENC_FUNCTION(avx512)
{
#if HAVE_AVX512
#include "../generic/enc_head.c"
enc_loop_avx512(&s, &slen, &o, &olen);
#include "../generic/enc_tail.c"
#else
BASE64_ENC_STUB
#endif
}
// Reuse the AVX2 decoder; there is no AVX512-specific decoder at present.
BASE64_DEC_FUNCTION(avx512)
{
#if HAVE_AVX512
#include "../generic/dec_head.c"
dec_loop_avx2(&s, &slen, &o, &olen);
#include "../generic/dec_tail.c"
#else
BASE64_DEC_STUB
#endif
}

@@ -0,0 +1,61 @@
static inline void
enc_loop_avx512_inner (const uint8_t **s, uint8_t **o)
{
// Load input.
__m512i src = _mm512_loadu_si512((__m512i *) *s);
// Reshuffle, translate, store.
src = enc_reshuffle_translate(src);
_mm512_storeu_si512((__m512i *) *o, src);
*s += 48;
*o += 64;
}
static inline void
enc_loop_avx512 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
if (*slen < 64) {
return;
}
// Process blocks of 48 bytes at a time. Because blocks are loaded 64
// bytes at a time, ensure that there will be at least 24 remaining
// bytes after the last round, so that the final read will not pass
// beyond the bounds of the input buffer.
size_t rounds = (*slen - 24) / 48;
*slen -= rounds * 48; // 48 bytes consumed per round
*olen += rounds * 64; // 64 bytes produced per round
while (rounds > 0) {
if (rounds >= 8) {
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
rounds -= 8;
continue;
}
if (rounds >= 4) {
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
rounds -= 4;
continue;
}
if (rounds >= 2) {
enc_loop_avx512_inner(s, o);
enc_loop_avx512_inner(s, o);
rounds -= 2;
continue;
}
enc_loop_avx512_inner(s, o);
break;
}
}

@@ -0,0 +1,50 @@
// The AVX512 algorithm is based on permutexvar and multishift. The code is based on
// https://github.com/WojciechMula/base64simd which is under BSD-2 license.
static inline __m512i
enc_reshuffle_translate (const __m512i input)
{
// 32-bit input
// [ 0 0 0 0 0 0 0 0|c1 c0 d5 d4 d3 d2 d1 d0|
// b3 b2 b1 b0 c5 c4 c3 c2|a5 a4 a3 a2 a1 a0 b5 b4]
// output order [1, 2, 0, 1]
// [b3 b2 b1 b0 c5 c4 c3 c2|c1 c0 d5 d4 d3 d2 d1 d0|
// a5 a4 a3 a2 a1 a0 b5 b4|b3 b2 b1 b0 c3 c2 c1 c0]
const __m512i shuffle_input = _mm512_setr_epi32(0x01020001,
0x04050304,
0x07080607,
0x0a0b090a,
0x0d0e0c0d,
0x10110f10,
0x13141213,
0x16171516,
0x191a1819,
0x1c1d1b1c,
0x1f201e1f,
0x22232122,
0x25262425,
0x28292728,
0x2b2c2a2b,
0x2e2f2d2e);
// Reorder bytes
// [b3 b2 b1 b0 c5 c4 c3 c2|c1 c0 d5 d4 d3 d2 d1 d0|
// a5 a4 a3 a2 a1 a0 b5 b4|b3 b2 b1 b0 c3 c2 c1 c0]
const __m512i in = _mm512_permutexvar_epi8(shuffle_input, input);
// After multishift, a single 32-bit lane has the following layout:
// [c1 c0 d5 d4 d3 d2 d1 d0|b1 b0 c5 c4 c3 c2 c1 c0|
// a1 a0 b5 b4 b3 b2 b1 b0|d1 d0 a5 a4 a3 a2 a1 a0]
// (a = [10:17], b = [4:11], c = [22:27], d = [16:21])
// 48, 54, 36, 42, 16, 22, 4, 10
const __m512i shifts = _mm512_set1_epi64(0x3036242a1016040alu);
__m512i shuffled_in = _mm512_multishift_epi64_epi8(shifts, in);
// Translate immediately after reshuffling.
const __m512i lookup = _mm512_loadu_si512(base64_table_enc_6bit);
// Translate the 6-bit values to ASCII.
return _mm512_permutexvar_epi8(shuffled_in, lookup);
}
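The multishift step is easier to follow in scalar form. The sketch below emulates the per-byte bit-field selection on one 64-bit word, using the shift constant `0x3036242a1016040a` and the byte order `1, 0, 2, 1, 4, 3, 5, 4` from the first two entries of `shuffle_input` above. The helper names `multishift_u64`, `encode6` and `encodes_to` are hypothetical, and the alphabet string is assumed to match the contents of `base64_table_enc_6bit`, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Emulate multishift on one 64-bit word: output byte i is the 8-bit field
// of data starting at the bit offset given by control byte i (modulo 64).
static uint64_t multishift_u64(uint64_t ctrl, uint64_t data)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        const unsigned shift = (unsigned) (ctrl >> (8 * i)) & 63u;
        const uint64_t rot = shift
            ? (data >> shift) | (data << (64 - shift))
            : data;
        out |= (rot & 0xFFu) << (8 * i);
    }
    return out;
}

// Encode six input bytes into eight base64 characters plus a terminator.
static void encode6(const uint8_t in[6], char out[9])
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    // Byte order produced by the permute, per 3-byte group: [1, 0, 2, 1].
    static const uint8_t order[8] = { 1, 0, 2, 1, 4, 3, 5, 4 };

    uint64_t q = 0;
    for (int i = 0; i < 8; i++) {
        q |= (uint64_t) in[order[i]] << (8 * i);
    }
    // Bit offsets 10, 4, 22, 16 (and +32 for the second group).
    const uint64_t s = multishift_u64(UINT64_C(0x3036242a1016040a), q);

    // Only the low six bits of each byte select a table entry, as in the
    // final byte permute of the vector code.
    for (int i = 0; i < 8; i++) {
        out[i] = alphabet[(s >> (8 * i)) & 63u];
    }
    out[8] = '\0';
}

// Helper for checking: encode a 6-character string and compare the result.
static int encodes_to(const char *in6, const char *expected)
{
    char out[9];
    assert(strlen(in6) == 6);
    encode6((const uint8_t *) in6, out);
    return strcmp(out, expected) == 0;
}
```

The upper two bits of each shifted byte contain neighbouring data; they are harmless because the lookup ignores them.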

@@ -0,0 +1,86 @@
static inline int
dec_loop_generic_32_inner (const uint8_t **s, uint8_t **o, size_t *rounds)
{
const uint32_t str
= base64_table_dec_32bit_d0[(*s)[0]]
| base64_table_dec_32bit_d1[(*s)[1]]
| base64_table_dec_32bit_d2[(*s)[2]]
| base64_table_dec_32bit_d3[(*s)[3]];
#if BASE64_LITTLE_ENDIAN
// LUTs for little-endian set MSB in case of invalid character:
if (str & UINT32_C(0x80000000)) {
return 0;
}
#else
// LUTs for big-endian set LSB in case of invalid character:
if (str & UINT32_C(1)) {
return 0;
}
#endif
// Store the output:
memcpy(*o, &str, sizeof (str));
*s += 4;
*o += 3;
*rounds -= 1;
return 1;
}
static inline void
dec_loop_generic_32 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
if (*slen < 8) {
return;
}
// Process blocks of 4 bytes per round. Because one extra zero byte is
// written after the output, ensure that there will be at least 4 bytes
// of input data left to cover the gap. (Two data bytes and up to two
// end-of-string markers.)
size_t rounds = (*slen - 4) / 4;
*slen -= rounds * 4; // 4 bytes consumed per round
*olen += rounds * 3; // 3 bytes produced per round
do {
if (rounds >= 8) {
if (dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds)) {
continue;
}
break;
}
if (rounds >= 4) {
if (dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds)) {
continue;
}
break;
}
if (rounds >= 2) {
if (dec_loop_generic_32_inner(s, o, &rounds) &&
dec_loop_generic_32_inner(s, o, &rounds)) {
continue;
}
break;
}
dec_loop_generic_32_inner(s, o, &rounds);
break;
} while (rounds > 0);
// Adjust for any rounds that were skipped:
*slen += rounds * 4;
*olen -= rounds * 3;
}
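The decoder loop above books all rounds in advance and then gives back the rounds it did not complete when an invalid character stops it early. A small model of that bookkeeping, assuming a hypothetical parameter `good_rounds` that stands in for how many rounds succeed before an invalid character is found:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical result type: remaining input length and produced output length.
typedef struct {
    size_t slen;
    size_t olen;
} dec_counts;

// Model the length accounting of dec_loop_generic_32. good_rounds is the
// number of rounds that would decode successfully before a failure.
static dec_counts account(size_t slen, size_t good_rounds)
{
    dec_counts r = { slen, 0 };
    if (slen < 8) {
        return r;
    }
    size_t rounds = (slen - 4) / 4;

    // Book every round up front.
    r.slen -= rounds * 4; // 4 bytes consumed per round
    r.olen += rounds * 3; // 3 bytes produced per round

    // Each successful round decrements the counter; a failing round leaves
    // it untouched, so it stays counted among the remaining rounds.
    while (rounds > 0 && good_rounds > 0) {
        rounds--;
        good_rounds--;
    }

    // Give back the rounds that were skipped.
    r.slen += rounds * 4;
    r.olen -= rounds * 3;

    // At least four input bytes are always left for the tail code.
    assert(r.slen >= 4);
    return r;
}
```

With 20 input bytes and a failure in the third round, 12 bytes remain unconsumed and 6 output bytes have been produced.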

@@ -0,0 +1,73 @@
static inline void
enc_loop_generic_32_inner (const uint8_t **s, uint8_t **o)
{
uint32_t src;
// Load input:
memcpy(&src, *s, sizeof (src));
// Reorder to 32-bit big-endian, if not already in that format. The
// workset must be in big-endian, otherwise the shifted bits do not
// carry over properly among adjacent bytes:
src = BASE64_HTOBE32(src);
// Two indices for the 12-bit lookup table:
const size_t index0 = (src >> 20) & 0xFFFU;
const size_t index1 = (src >> 8) & 0xFFFU;
// Table lookup and store:
memcpy(*o + 0, base64_table_enc_12bit + index0, 2);
memcpy(*o + 2, base64_table_enc_12bit + index1, 2);
*s += 3;
*o += 4;
}
static inline void
enc_loop_generic_32 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
if (*slen < 4) {
return;
}
// Process blocks of 3 bytes at a time. Because blocks are loaded 4
// bytes at a time, ensure that there will be at least one remaining
// byte after the last round, so that the final read will not pass
// beyond the bounds of the input buffer:
size_t rounds = (*slen - 1) / 3;
*slen -= rounds * 3; // 3 bytes consumed per round
*olen += rounds * 4; // 4 bytes produced per round
do {
if (rounds >= 8) {
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
rounds -= 8;
continue;
}
if (rounds >= 4) {
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
rounds -= 4;
continue;
}
if (rounds >= 2) {
enc_loop_generic_32_inner(s, o);
enc_loop_generic_32_inner(s, o);
rounds -= 2;
continue;
}
enc_loop_generic_32_inner(s, o);
break;
} while (rounds > 0);
}
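
The 12-bit lookup used by `enc_loop_generic_32_inner` can be illustrated with a small, portable sketch. Everything named here (`table12`, `build_table12`, `encode3`) is hypothetical. The layout of the real `base64_table_enc_12bit` is not shown in this diff, so the sketch assumes a table whose entry `i` holds the two output characters for the 12-bit value `i`. It also assembles the big-endian word by hand instead of using `memcpy` plus a byte swap, so it reads only 3 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical stand-in for a 12-bit encode table: entry i holds the two
 * output characters for the 12-bit value i. */
static char table12[4096][2];

void build_table12(void)
{
    for (size_t i = 0; i < 4096; i++) {
        table12[i][0] = alphabet[i >> 6];    /* upper 6 bits */
        table12[i][1] = alphabet[i & 0x3F];  /* lower 6 bits */
    }
}

/* Encode 3 input bytes into 4 output characters with two table lookups. */
void encode3(const uint8_t in[3], char out[4])
{
    assert(in != NULL && out != NULL);
    /* Place the 24 payload bits in the top of a 32-bit big-endian word. */
    const uint32_t src = ((uint32_t) in[0] << 24)
                       | ((uint32_t) in[1] << 16)
                       | ((uint32_t) in[2] << 8);
    const size_t index0 = (src >> 20) & 0xFFFU;  /* first 12 bits  */
    const size_t index1 = (src >>  8) & 0xFFFU;  /* second 12 bits */
    memcpy(out + 0, table12[index0], 2);
    memcpy(out + 2, table12[index1], 2);
}
```

Two lookups of two characters each replace four lookups of one character, at the cost of a larger table.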

@@ -0,0 +1,77 @@
static inline void
enc_loop_generic_64_inner (const uint8_t **s, uint8_t **o)
{
uint64_t src;
// Load input:
memcpy(&src, *s, sizeof (src));
// Reorder to 64-bit big-endian, if not already in that format. The
// workset must be in big-endian, otherwise the shifted bits do not
// carry over properly among adjacent bytes:
src = BASE64_HTOBE64(src);
// Four indices for the 12-bit lookup table:
const size_t index0 = (src >> 52) & 0xFFFU;
const size_t index1 = (src >> 40) & 0xFFFU;
const size_t index2 = (src >> 28) & 0xFFFU;
const size_t index3 = (src >> 16) & 0xFFFU;
// Table lookup and store:
memcpy(*o + 0, base64_table_enc_12bit + index0, 2);
memcpy(*o + 2, base64_table_enc_12bit + index1, 2);
memcpy(*o + 4, base64_table_enc_12bit + index2, 2);
memcpy(*o + 6, base64_table_enc_12bit + index3, 2);
*s += 6;
*o += 8;
}
static inline void
enc_loop_generic_64 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
if (*slen < 8) {
return;
}
// Process blocks of 6 bytes at a time. Because blocks are loaded 8
// bytes at a time, ensure that there will be at least 2 remaining
// bytes after the last round, so that the final read will not pass
// beyond the bounds of the input buffer:
size_t rounds = (*slen - 2) / 6;
*slen -= rounds * 6; // 6 bytes consumed per round
*olen += rounds * 8; // 8 bytes produced per round
do {
if (rounds >= 8) {
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
rounds -= 8;
continue;
}
if (rounds >= 4) {
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
rounds -= 4;
continue;
}
if (rounds >= 2) {
enc_loop_generic_64_inner(s, o);
enc_loop_generic_64_inner(s, o);
rounds -= 2;
continue;
}
enc_loop_generic_64_inner(s, o);
break;
} while (rounds > 0);
}
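
The bounds argument in the comment of `enc_loop_generic_64` (load 8 bytes, consume 6, keep 2 in reserve) can be verified with a short sketch. The helper names `enc64_rounds` and `enc64_last_read_end` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring `(*slen - 2) / 6` in enc_loop_generic_64. */
size_t enc64_rounds(size_t slen)
{
    return (slen < 8) ? 0 : (slen - 2) / 6;
}

/* Offset one past the last byte touched by the final 8-byte load,
 * or 0 if the loop does not run at all. */
size_t enc64_last_read_end(size_t slen)
{
    const size_t rounds = enc64_rounds(slen);
    if (rounds == 0) {
        return 0;
    }
    /* Round r (counting from 0) loads the bytes [6r, 6r + 8). */
    const size_t end = (rounds - 1) * 6 + 8;
    assert(end <= slen);  /* the over-read never leaves the input buffer */
    return end;
}
```

The same reasoning applies to the 32-bit variant with 4-byte loads, 3-byte strides and one byte held in reserve.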

@@ -0,0 +1,39 @@
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include "../../../include/libbase64.h"
#include "../../tables/tables.h"
#include "../../codecs.h"
#include "config.h"
#include "../../env.h"
#if BASE64_WORDSIZE == 32
# include "32/enc_loop.c"
#elif BASE64_WORDSIZE == 64
# include "64/enc_loop.c"
#endif
#if BASE64_WORDSIZE >= 32
# include "32/dec_loop.c"
#endif
BASE64_ENC_FUNCTION(plain)
{
#include "enc_head.c"
#if BASE64_WORDSIZE == 32
enc_loop_generic_32(&s, &slen, &o, &olen);
#elif BASE64_WORDSIZE == 64
enc_loop_generic_64(&s, &slen, &o, &olen);
#endif
#include "enc_tail.c"
}
BASE64_DEC_FUNCTION(plain)
{
#include "dec_head.c"
#if BASE64_WORDSIZE >= 32
dec_loop_generic_32(&s, &slen, &o, &olen);
#endif
#include "dec_tail.c"
}

@@ -0,0 +1,37 @@
int ret = 0;
const uint8_t *s = (const uint8_t *) src;
uint8_t *o = (uint8_t *) out;
uint8_t q;
// Use local temporaries to avoid cache thrashing:
size_t olen = 0;
size_t slen = srclen;
struct base64_state st;
st.eof = state->eof;
st.bytes = state->bytes;
st.carry = state->carry;
// If we previously saw an EOF or an invalid character, bail out:
if (st.eof) {
*outlen = 0;
ret = 0;
// If there was a trailing '=' to check, check it:
if (slen && (st.eof == BASE64_AEOF)) {
state->bytes = 0;
state->eof = BASE64_EOF;
ret = ((base64_table_dec_8bit[*s++] == 254) && (slen == 1)) ? 1 : 0;
}
return ret;
}
// Turn four 6-bit numbers into three bytes:
// out[0] = 11111122
// out[1] = 22223333
// out[2] = 33444444
// Duff's device again:
switch (st.bytes)
{
for (;;)
{
case 0:

@@ -0,0 +1,91 @@
if (slen-- == 0) {
ret = 1;
break;
}
if ((q = base64_table_dec_8bit[*s++]) >= 254) {
st.eof = BASE64_EOF;
// Treat character '=' as invalid for byte 0:
break;
}
st.carry = q << 2;
st.bytes++;
// Deliberate fallthrough:
BASE64_FALLTHROUGH
case 1: if (slen-- == 0) {
ret = 1;
break;
}
if ((q = base64_table_dec_8bit[*s++]) >= 254) {
st.eof = BASE64_EOF;
// Treat character '=' as invalid for byte 1:
break;
}
*o++ = st.carry | (q >> 4);
st.carry = q << 4;
st.bytes++;
olen++;
// Deliberate fallthrough:
BASE64_FALLTHROUGH
case 2: if (slen-- == 0) {
ret = 1;
break;
}
if ((q = base64_table_dec_8bit[*s++]) >= 254) {
st.bytes++;
// When q == 254, the input char is '='.
// Check if next byte is also '=':
if (q == 254) {
if (slen-- != 0) {
st.bytes = 0;
// EOF:
st.eof = BASE64_EOF;
q = base64_table_dec_8bit[*s++];
ret = ((q == 254) && (slen == 0)) ? 1 : 0;
break;
}
else {
// Almost EOF
st.eof = BASE64_AEOF;
ret = 1;
break;
}
}
// If we get here, there was an error:
break;
}
*o++ = st.carry | (q >> 2);
st.carry = q << 6;
st.bytes++;
olen++;
// Deliberate fallthrough:
BASE64_FALLTHROUGH
case 3: if (slen-- == 0) {
ret = 1;
break;
}
if ((q = base64_table_dec_8bit[*s++]) >= 254) {
st.bytes = 0;
st.eof = BASE64_EOF;
// When q == 254, the input char is '='. Return 1 and EOF.
// When q == 255, the input char is invalid. Return 0 and EOF.
ret = ((q == 254) && (slen == 0)) ? 1 : 0;
break;
}
*o++ = st.carry | q;
st.carry = 0;
st.bytes = 0;
olen++;
}
}
state->eof = st.eof;
state->bytes = st.bytes;
state->carry = st.carry;
*outlen = olen;
return ret;
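
The carry handling in the four `case` labels above can be followed more easily without the streaming state. Below is a minimal sketch under stated assumptions: `decode_quad` is a hypothetical function, its inputs are already translated to 6-bit values, and validation, padding and EOF handling are deliberately omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Combine four 6-bit values into three bytes, carrying bits from one
 * step to the next in the same order as cases 0..3 of the decoder. */
void decode_quad(const uint8_t q[4], uint8_t out[3])
{
    uint8_t carry;
    assert(q[0] < 64 && q[1] < 64 && q[2] < 64 && q[3] < 64);
    carry  = (uint8_t) (q[0] << 2);             /* case 0: 111111..   */
    out[0] = (uint8_t) (carry | (q[1] >> 4));   /* case 1: 11111122   */
    carry  = (uint8_t) (q[1] << 4);             /*         2222....   */
    out[1] = (uint8_t) (carry | (q[2] >> 2));   /* case 2: 22223333   */
    carry  = (uint8_t) (q[2] << 6);             /*         33......   */
    out[2] = (uint8_t) (carry | q[3]);          /* case 3: 33444444   */
}
```

In the real decoder the carry lives in `st.carry`, which is what lets a call stop after any input byte and resume in the next call.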

@@ -0,0 +1,24 @@
// Assume that *out is large enough to contain the output.
// Theoretically it should be 4/3 the length of src.
const uint8_t *s = (const uint8_t *) src;
uint8_t *o = (uint8_t *) out;
// Use local temporaries to avoid cache thrashing:
size_t olen = 0;
size_t slen = srclen;
struct base64_state st;
st.bytes = state->bytes;
st.carry = state->carry;
// Turn three bytes into four 6-bit numbers:
// in[0] = 00111111
// in[1] = 00112222
// in[2] = 00222233
// in[3] = 00333333
// Duff's device, a for() loop inside a switch() statement. Legal!
switch (st.bytes)
{
for (;;)
{
case 0:

@@ -0,0 +1,34 @@
if (slen-- == 0) {
break;
}
*o++ = base64_table_enc_6bit[*s >> 2];
st.carry = (*s++ << 4) & 0x30;
st.bytes++;
olen += 1;
// Deliberate fallthrough:
BASE64_FALLTHROUGH
case 1: if (slen-- == 0) {
break;
}
*o++ = base64_table_enc_6bit[st.carry | (*s >> 4)];
st.carry = (*s++ << 2) & 0x3C;
st.bytes++;
olen += 1;
// Deliberate fallthrough:
BASE64_FALLTHROUGH
case 2: if (slen-- == 0) {
break;
}
*o++ = base64_table_enc_6bit[st.carry | (*s >> 6)];
*o++ = base64_table_enc_6bit[*s++ & 0x3F];
st.bytes = 0;
olen += 2;
}
}
state->bytes = st.bytes;
state->carry = st.carry;
*outlen = olen;
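
The `bytes` and `carry` fields make the encoder resumable across calls. The sketch below shows the same idea with a plain `switch` per byte instead of Duff's device. `struct enc_state` and `enc_feed` are hypothetical names and not the library's API, and padding at end of stream is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical streaming state: position within the 3-byte group and the
 * bits carried over from the previous input byte. */
struct enc_state { int bytes; uint8_t carry; };

/* Feed `srclen` bytes and return the number of characters written to `out`.
 * The caller must provide enough room (at most 2 characters per input byte). */
size_t enc_feed(struct enc_state *st, const uint8_t *src, size_t srclen, char *out)
{
    size_t olen = 0;
    assert(st->bytes >= 0 && st->bytes <= 2);
    for (size_t i = 0; i < srclen; i++) {
        const uint8_t c = src[i];
        switch (st->bytes) {
        case 0:
            out[olen++] = alphabet[c >> 2];
            st->carry = (uint8_t) ((c << 4) & 0x30);
            st->bytes = 1;
            break;
        case 1:
            out[olen++] = alphabet[st->carry | (c >> 4)];
            st->carry = (uint8_t) ((c << 2) & 0x3C);
            st->bytes = 2;
            break;
        default:
            out[olen++] = alphabet[st->carry | (c >> 6)];
            out[olen++] = alphabet[c & 0x3F];
            st->carry = 0;
            st->bytes = 0;
            break;
        }
    }
    return olen;
}
```

Feeding `"M"` and then `"an"` in two calls yields the same four characters as feeding `"Man"` at once, because the second call picks up at `bytes == 1` with the saved carry.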

77
3rdparty/base64/lib/arch/neon32/codec.c vendored Normal file

@@ -0,0 +1,77 @@
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include "../../../include/libbase64.h"
#include "../../tables/tables.h"
#include "../../codecs.h"
#include "config.h"
#include "../../env.h"
#ifdef __arm__
# if (defined(__ARM_NEON__) || defined(__ARM_NEON)) && HAVE_NEON32
# define BASE64_USE_NEON32
# endif
#endif
#ifdef BASE64_USE_NEON32
#include <arm_neon.h>
// Only enable inline assembly on supported compilers.
#if defined(__GNUC__) || defined(__clang__)
#define BASE64_NEON32_USE_ASM
#endif
static inline uint8x16_t
vqtbl1q_u8 (const uint8x16_t lut, const uint8x16_t indices)
{
// NEON32 only supports 64-bit wide lookups in 128-bit tables. Emulate
// the NEON64 `vqtbl1q_u8` intrinsic to do 128-bit wide lookups.
uint8x8x2_t lut2;
uint8x8x2_t result;
lut2.val[0] = vget_low_u8(lut);
lut2.val[1] = vget_high_u8(lut);
result.val[0] = vtbl2_u8(lut2, vget_low_u8(indices));
result.val[1] = vtbl2_u8(lut2, vget_high_u8(indices));
return vcombine_u8(result.val[0], result.val[1]);
}
#include "../generic/32/dec_loop.c"
#include "../generic/32/enc_loop.c"
#include "dec_loop.c"
#include "enc_reshuffle.c"
#include "enc_translate.c"
#include "enc_loop.c"
#endif // BASE64_USE_NEON32
// Stride size is so large on these NEON 32-bit functions
// (48 bytes encode, 64 bytes decode) that we inline the
// uint32 codec to stay performant on smaller inputs.
BASE64_ENC_FUNCTION(neon32)
{
#ifdef BASE64_USE_NEON32
#include "../generic/enc_head.c"
enc_loop_neon32(&s, &slen, &o, &olen);
enc_loop_generic_32(&s, &slen, &o, &olen);
#include "../generic/enc_tail.c"
#else
BASE64_ENC_STUB
#endif
}
BASE64_DEC_FUNCTION(neon32)
{
#ifdef BASE64_USE_NEON32
#include "../generic/dec_head.c"
dec_loop_neon32(&s, &slen, &o, &olen);
dec_loop_generic_32(&s, &slen, &o, &olen);
#include "../generic/dec_tail.c"
#else
BASE64_DEC_STUB
#endif
}
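
The effect of chaining the vector loop, the 32-bit word loop and the bytewise tail in `BASE64_ENC_FUNCTION(neon32)` can be sketched as a simple split of the input length. `struct enc_split` and `split_encode_work` are hypothetical names used for illustration only; the numbers come from the loops shown in this diff (48-byte vector rounds, 3-byte word rounds with one byte held back).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of how many input bytes each stage consumes. */
struct enc_split { size_t simd; size_t word; size_t tail; };

struct enc_split split_encode_work(size_t slen)
{
    struct enc_split w;
    /* Vector loop: whole 48-byte rounds. */
    w.simd = (slen / 48) * 48;
    slen -= w.simd;
    /* 32-bit word loop: 3 bytes per round, at least one byte held back. */
    w.word = (slen < 4) ? 0 : ((slen - 1) / 3) * 3;
    slen -= w.word;
    /* Whatever is left goes to the bytewise tail. */
    w.tail = slen;
    assert(w.tail <= 3);
    return w;
}
```

For a 47-byte input the vector loop does nothing, and the word loop handles 45 bytes that would otherwise go through the slower bytewise path.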

@@ -0,0 +1,106 @@
static inline int
is_nonzero (const uint8x16_t v)
{
uint64_t u64;
const uint64x2_t v64 = vreinterpretq_u64_u8(v);
const uint32x2_t v32 = vqmovn_u64(v64);
vst1_u64(&u64, vreinterpret_u64_u32(v32));
return u64 != 0;
}
static inline uint8x16_t
delta_lookup (const uint8x16_t v)
{
const uint8x8_t lut = {
0, 16, 19, 4, (uint8_t) -65, (uint8_t) -65, (uint8_t) -71, (uint8_t) -71,
};
return vcombine_u8(
vtbl1_u8(lut, vget_low_u8(v)),
vtbl1_u8(lut, vget_high_u8(v)));
}
static inline uint8x16_t
dec_loop_neon32_lane (uint8x16_t *lane)
{
// See the SSSE3 decoder for an explanation of the algorithm.
const uint8x16_t lut_lo = {
0x15, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
0x11, 0x11, 0x13, 0x1A, 0x1B, 0x1B, 0x1B, 0x1A
};
const uint8x16_t lut_hi = {
0x10, 0x10, 0x01, 0x02, 0x04, 0x08, 0x04, 0x08,
0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10
};
const uint8x16_t mask_0F = vdupq_n_u8(0x0F);
const uint8x16_t mask_2F = vdupq_n_u8(0x2F);
const uint8x16_t hi_nibbles = vshrq_n_u8(*lane, 4);
const uint8x16_t lo_nibbles = vandq_u8(*lane, mask_0F);
const uint8x16_t eq_2F = vceqq_u8(*lane, mask_2F);
const uint8x16_t hi = vqtbl1q_u8(lut_hi, hi_nibbles);
const uint8x16_t lo = vqtbl1q_u8(lut_lo, lo_nibbles);
// Now simply add the delta values to the input:
*lane = vaddq_u8(*lane, delta_lookup(vaddq_u8(eq_2F, hi_nibbles)));
// Return the validity mask:
return vandq_u8(lo, hi);
}
static inline void
dec_loop_neon32 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
if (*slen < 64) {
return;
}
// Process blocks of 64 bytes per round. Unlike the SSE codecs, no
// extra trailing zero bytes are written, so it is not necessary to
// reserve extra input bytes:
size_t rounds = *slen / 64;
*slen -= rounds * 64; // 64 bytes consumed per round
*olen += rounds * 48; // 48 bytes produced per round
do {
uint8x16x3_t dec;
// Load 64 bytes and deinterleave:
uint8x16x4_t str = vld4q_u8(*s);
// Decode each lane, collect a mask of invalid inputs:
const uint8x16_t classified
= dec_loop_neon32_lane(&str.val[0])
| dec_loop_neon32_lane(&str.val[1])
| dec_loop_neon32_lane(&str.val[2])
| dec_loop_neon32_lane(&str.val[3]);
// Check for invalid input: if any byte of the validity mask is
// nonzero, fall back on bytewise code to do error checking and
// reporting:
if (is_nonzero(classified)) {
break;
}
// Compress four bytes into three:
dec.val[0] = vorrq_u8(vshlq_n_u8(str.val[0], 2), vshrq_n_u8(str.val[1], 4));
dec.val[1] = vorrq_u8(vshlq_n_u8(str.val[1], 4), vshrq_n_u8(str.val[2], 2));
dec.val[2] = vorrq_u8(vshlq_n_u8(str.val[2], 6), str.val[3]);
// Interleave and store decoded result:
vst3q_u8(*o, dec);
*s += 64;
*o += 48;
} while (--rounds > 0);
// Adjust for any rounds that were skipped:
*slen += rounds * 64;
*olen -= rounds * 48;
}
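
The nibble-based classification in `dec_loop_neon32_lane` can be modelled one byte at a time. The following scalar sketch reuses the three tables from the code above; `decode_char` is a hypothetical name, and the per-byte branches stand in for what the vector code does with masks across 16 lanes.

```c
#include <assert.h>
#include <stdint.h>

static const uint8_t lut_lo[16] = {
    0x15, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
    0x11, 0x11, 0x13, 0x1A, 0x1B, 0x1B, 0x1B, 0x1A
};
static const uint8_t lut_hi[16] = {
    0x10, 0x10, 0x01, 0x02, 0x04, 0x08, 0x04, 0x08,
    0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10
};
static const uint8_t delta[8] = {
    0, 16, 19, 4, (uint8_t) -65, (uint8_t) -65, (uint8_t) -71, (uint8_t) -71
};

/* Returns 1 and stores the 6-bit value if c is in the Base64 alphabet,
 * otherwise returns 0 and leaves *value untouched. */
int decode_char(uint8_t c, uint8_t *value)
{
    const uint8_t hi = (uint8_t) (c >> 4);
    const uint8_t lo = (uint8_t) (c & 0x0F);
    /* A shared bit between the two table entries marks an invalid byte. */
    if (lut_lo[lo] & lut_hi[hi]) {
        return 0;
    }
    /* '/' (0x2F) shares its high nibble with '+', so it is moved one slot
     * down. In the vector code this is the addition of eq_2F (0xFF). */
    const uint8_t idx = (uint8_t) (hi - (c == 0x2F));
    assert(idx < 8);
    *value = (uint8_t) (c + delta[idx]);  /* wraps modulo 256 */
    return 1;
}
```

Each valid high nibble maps to a single additive delta, which is why one 8-entry table is enough to translate the whole alphabet.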

@@ -0,0 +1,170 @@
#ifdef BASE64_NEON32_USE_ASM
static inline void
enc_loop_neon32_inner_asm (const uint8_t **s, uint8_t **o)
{
// This function duplicates the functionality of enc_loop_neon32_inner,
// but entirely with inline assembly. This gives a significant speedup
// over using NEON intrinsics, which do not always generate very good
// code. The logic of the assembly is directly lifted from the
// intrinsics version, so it can be used as a guide to this code.
// Temporary registers, used as scratch space.
uint8x16_t tmp0, tmp1, tmp2, tmp3;
uint8x16_t mask0, mask1, mask2, mask3;
// A lookup table containing the absolute offsets for all ranges.
const uint8x16_t lut = {
65U, 71U, 252U, 252U,
252U, 252U, 252U, 252U,
252U, 252U, 252U, 252U,
237U, 240U, 0U, 0U
};
// Numeric constants.
const uint8x16_t n51 = vdupq_n_u8(51);
const uint8x16_t n25 = vdupq_n_u8(25);
const uint8x16_t n63 = vdupq_n_u8(63);
__asm__ (
// Load 48 bytes and deinterleave. The bytes are loaded to
// hard-coded registers q12, q13 and q14, to ensure that they
// are contiguous. Increment the source pointer.
"vld3.8 {d24, d26, d28}, [%[src]]! \n\t"
"vld3.8 {d25, d27, d29}, [%[src]]! \n\t"
// Reshuffle the bytes using temporaries.
"vshr.u8 %q[t0], q12, #2 \n\t"
"vshr.u8 %q[t1], q13, #4 \n\t"
"vshr.u8 %q[t2], q14, #6 \n\t"
"vsli.8 %q[t1], q12, #4 \n\t"
"vsli.8 %q[t2], q13, #2 \n\t"
"vand.u8 %q[t1], %q[t1], %q[n63] \n\t"
"vand.u8 %q[t2], %q[t2], %q[n63] \n\t"
"vand.u8 %q[t3], q14, %q[n63] \n\t"
// t0..t3 are the reshuffled inputs. Create LUT indices.
"vqsub.u8 q12, %q[t0], %q[n51] \n\t"
"vqsub.u8 q13, %q[t1], %q[n51] \n\t"
"vqsub.u8 q14, %q[t2], %q[n51] \n\t"
"vqsub.u8 q15, %q[t3], %q[n51] \n\t"
// Create the mask for range #0.
"vcgt.u8 %q[m0], %q[t0], %q[n25] \n\t"
"vcgt.u8 %q[m1], %q[t1], %q[n25] \n\t"
"vcgt.u8 %q[m2], %q[t2], %q[n25] \n\t"
"vcgt.u8 %q[m3], %q[t3], %q[n25] \n\t"
// Subtract -1 to correct the LUT indices.
"vsub.u8 q12, %q[m0] \n\t"
"vsub.u8 q13, %q[m1] \n\t"
"vsub.u8 q14, %q[m2] \n\t"
"vsub.u8 q15, %q[m3] \n\t"
// Lookup the delta values.
"vtbl.u8 d24, {%q[lut]}, d24 \n\t"
"vtbl.u8 d25, {%q[lut]}, d25 \n\t"
"vtbl.u8 d26, {%q[lut]}, d26 \n\t"
"vtbl.u8 d27, {%q[lut]}, d27 \n\t"
"vtbl.u8 d28, {%q[lut]}, d28 \n\t"
"vtbl.u8 d29, {%q[lut]}, d29 \n\t"
"vtbl.u8 d30, {%q[lut]}, d30 \n\t"
"vtbl.u8 d31, {%q[lut]}, d31 \n\t"
// Add the delta values.
"vadd.u8 q12, %q[t0] \n\t"
"vadd.u8 q13, %q[t1] \n\t"
"vadd.u8 q14, %q[t2] \n\t"
"vadd.u8 q15, %q[t3] \n\t"
// Store 64 bytes and interleave. Increment the dest pointer.
"vst4.8 {d24, d26, d28, d30}, [%[dst]]! \n\t"
"vst4.8 {d25, d27, d29, d31}, [%[dst]]! \n\t"
// Outputs (modified).
: [src] "+r" (*s),
[dst] "+r" (*o),
[t0] "=&w" (tmp0),
[t1] "=&w" (tmp1),
[t2] "=&w" (tmp2),
[t3] "=&w" (tmp3),
[m0] "=&w" (mask0),
[m1] "=&w" (mask1),
[m2] "=&w" (mask2),
[m3] "=&w" (mask3)
// Inputs (not modified).
: [lut] "w" (lut),
[n25] "w" (n25),
[n51] "w" (n51),
[n63] "w" (n63)
// Clobbers.
: "d24", "d25", "d26", "d27", "d28", "d29", "d30", "d31",
"cc", "memory"
);
}
#endif
static inline void
enc_loop_neon32_inner (const uint8_t **s, uint8_t **o)
{
#ifdef BASE64_NEON32_USE_ASM
enc_loop_neon32_inner_asm(s, o);
#else
// Load 48 bytes and deinterleave:
uint8x16x3_t src = vld3q_u8(*s);
// Reshuffle:
uint8x16x4_t out = enc_reshuffle(src);
// Translate reshuffled bytes to the Base64 alphabet:
out = enc_translate(out);
// Interleave and store output:
vst4q_u8(*o, out);
*s += 48;
*o += 64;
#endif
}
static inline void
enc_loop_neon32 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
size_t rounds = *slen / 48;
*slen -= rounds * 48; // 48 bytes consumed per round
*olen += rounds * 64; // 64 bytes produced per round
while (rounds > 0) {
if (rounds >= 8) {
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
rounds -= 8;
continue;
}
if (rounds >= 4) {
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
rounds -= 4;
continue;
}
if (rounds >= 2) {
enc_loop_neon32_inner(s, o);
enc_loop_neon32_inner(s, o);
rounds -= 2;
continue;
}
enc_loop_neon32_inner(s, o);
break;
}
}

@@ -0,0 +1,31 @@
static inline uint8x16x4_t
enc_reshuffle (uint8x16x3_t in)
{
uint8x16x4_t out;
// Input:
// in[0] = a7 a6 a5 a4 a3 a2 a1 a0
// in[1] = b7 b6 b5 b4 b3 b2 b1 b0
// in[2] = c7 c6 c5 c4 c3 c2 c1 c0
// Output:
// out[0] = 00 00 a7 a6 a5 a4 a3 a2
// out[1] = 00 00 a1 a0 b7 b6 b5 b4
// out[2] = 00 00 b3 b2 b1 b0 c7 c6
// out[3] = 00 00 c5 c4 c3 c2 c1 c0
// Move the input bits to where they need to be in the outputs. Except
// for the first output, the high two bits are not cleared.
out.val[0] = vshrq_n_u8(in.val[0], 2);
out.val[1] = vshrq_n_u8(in.val[1], 4);
out.val[2] = vshrq_n_u8(in.val[2], 6);
out.val[1] = vsliq_n_u8(out.val[1], in.val[0], 4);
out.val[2] = vsliq_n_u8(out.val[2], in.val[1], 2);
// Clear the high two bits in the second, third and fourth output.
out.val[1] = vandq_u8(out.val[1], vdupq_n_u8(0x3F));
out.val[2] = vandq_u8(out.val[2], vdupq_n_u8(0x3F));
out.val[3] = vandq_u8(in.val[2], vdupq_n_u8(0x3F));
return out;
}
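
The role of the shift-left-and-insert step in `enc_reshuffle` is easier to see on a single byte lane. This is a scalar sketch under the assumption that `vsliq_n_u8(d, a, n)` keeps the low `n` bits of `d` and fills the rest from `a << n`; `sli8` and `reshuffle3` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of shift-left-and-insert on one byte lane: the low n bits
 * of d are kept, the remaining bits come from a << n. */
uint8_t sli8(uint8_t d, uint8_t a, unsigned n)
{
    assert(n < 8);
    const uint8_t keep = (uint8_t) ((1U << n) - 1U);
    return (uint8_t) ((d & keep) | (uint8_t) (a << n));
}

/* Split three input bytes into four 6-bit values, following the same
 * sequence of shifts, inserts and masks as the vector code. */
void reshuffle3(const uint8_t in[3], uint8_t out[4])
{
    out[0] = (uint8_t) (in[0] >> 2);
    out[1] = (uint8_t) (sli8((uint8_t) (in[1] >> 4), in[0], 4) & 0x3F);
    out[2] = (uint8_t) (sli8((uint8_t) (in[2] >> 6), in[1], 2) & 0x3F);
    out[3] = (uint8_t) (in[2] & 0x3F);
}
```

The insert merges two shifted inputs in one operation, so only the final mask with `0x3F` is needed to clear the two stray high bits.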

@@ -0,0 +1,57 @@
static inline uint8x16x4_t
enc_translate (const uint8x16x4_t in)
{
// A lookup table containing the absolute offsets for all ranges:
const uint8x16_t lut = {
65U, 71U, 252U, 252U,
252U, 252U, 252U, 252U,
252U, 252U, 252U, 252U,
237U, 240U, 0U, 0U
};
const uint8x16_t offset = vdupq_n_u8(51);
uint8x16x4_t indices, mask, delta, out;
// Translate values 0..63 to the Base64 alphabet. There are five sets:
// #  From      To         Abs  Index    Characters
// 0  [0..25]   [65..90]   +65  0        ABCDEFGHIJKLMNOPQRSTUVWXYZ
// 1  [26..51]  [97..122]  +71  1        abcdefghijklmnopqrstuvwxyz
// 2  [52..61]  [48..57]    -4  [2..11]  0123456789
// 3  [62]      [43]       -19  12       +
// 4  [63]      [47]       -16  13       /
// Create LUT indices from input:
// the index for range #0 is right, others are 1 less than expected:
indices.val[0] = vqsubq_u8(in.val[0], offset);
indices.val[1] = vqsubq_u8(in.val[1], offset);
indices.val[2] = vqsubq_u8(in.val[2], offset);
indices.val[3] = vqsubq_u8(in.val[3], offset);
// mask is 0xFF (-1) for range #[1..4] and 0x00 for range #0:
mask.val[0] = vcgtq_u8(in.val[0], vdupq_n_u8(25));
mask.val[1] = vcgtq_u8(in.val[1], vdupq_n_u8(25));
mask.val[2] = vcgtq_u8(in.val[2], vdupq_n_u8(25));
mask.val[3] = vcgtq_u8(in.val[3], vdupq_n_u8(25));
// Subtract -1 (that is, add 1) from the indices for range #[1..4]. All
// indices are now correct:
indices.val[0] = vsubq_u8(indices.val[0], mask.val[0]);
indices.val[1] = vsubq_u8(indices.val[1], mask.val[1]);
indices.val[2] = vsubq_u8(indices.val[2], mask.val[2]);
indices.val[3] = vsubq_u8(indices.val[3], mask.val[3]);
// Lookup delta values:
delta.val[0] = vqtbl1q_u8(lut, indices.val[0]);
delta.val[1] = vqtbl1q_u8(lut, indices.val[1]);
delta.val[2] = vqtbl1q_u8(lut, indices.val[2]);
delta.val[3] = vqtbl1q_u8(lut, indices.val[3]);
// Add delta values:
out.val[0] = vaddq_u8(in.val[0], delta.val[0]);
out.val[1] = vaddq_u8(in.val[1], delta.val[1]);
out.val[2] = vaddq_u8(in.val[2], delta.val[2]);
out.val[3] = vaddq_u8(in.val[3], delta.val[3]);
return out;
}
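
The index computation in `enc_translate` can be checked with a scalar model. The sketch below uses the same 16-entry table of offsets; `translate6` is a hypothetical name, and the explicit branches stand in for the saturating subtract and the compare mask of the vector code.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets per range, taken from the table in enc_translate. The values
 * 252, 237 and 240 are -4, -19 and -16 modulo 256. */
static const uint8_t lut[16] = {
    65, 71, 252, 252, 252, 252, 252, 252,
    252, 252, 252, 252, 237, 240, 0, 0
};

/* Translate a 6-bit value to its Base64 character. */
uint8_t translate6(uint8_t v)
{
    assert(v < 64);
    /* Saturating v - 51: zero for ranges #0 and #1, 1..12 for the rest. */
    uint8_t idx = (v > 51) ? (uint8_t) (v - 51) : 0;
    /* The mask is -1 for v > 25; subtracting it adds 1 to the index. */
    if (v > 25) {
        idx++;
    }
    return (uint8_t) (v + lut[idx]);  /* wraps modulo 256 */
}
```

A quick loop over all 64 inputs against the plain alphabet string is enough to confirm that every range lands on the right offset.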

97
3rdparty/base64/lib/arch/neon64/codec.c vendored Normal file

@@ -0,0 +1,97 @@
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include "../../../include/libbase64.h"
#include "../../tables/tables.h"
#include "../../codecs.h"
#include "config.h"
#include "../../env.h"
#ifdef __aarch64__
# if (defined(__ARM_NEON__) || defined(__ARM_NEON)) && HAVE_NEON64
# define BASE64_USE_NEON64
# endif
#endif
#ifdef BASE64_USE_NEON64
#include <arm_neon.h>
// Only enable inline assembly on supported compilers.
#if defined(__GNUC__) || defined(__clang__)
#define BASE64_NEON64_USE_ASM
#endif
static inline uint8x16x4_t
load_64byte_table (const uint8_t *p)
{
#ifdef BASE64_NEON64_USE_ASM
// Force the table to be loaded into contiguous registers. GCC will not
// normally allocate contiguous registers for a `uint8x16x4_t'. These
// registers are chosen to not conflict with the ones in the enc loop.
register uint8x16_t t0 __asm__ ("v8");
register uint8x16_t t1 __asm__ ("v9");
register uint8x16_t t2 __asm__ ("v10");
register uint8x16_t t3 __asm__ ("v11");
__asm__ (
"ld1 {%[t0].16b, %[t1].16b, %[t2].16b, %[t3].16b}, [%[src]], #64 \n\t"
: [src] "+r" (p),
[t0] "=w" (t0),
[t1] "=w" (t1),
[t2] "=w" (t2),
[t3] "=w" (t3)
);
return (uint8x16x4_t) {
.val[0] = t0,
.val[1] = t1,
.val[2] = t2,
.val[3] = t3,
};
#else
return vld1q_u8_x4(p);
#endif
}
#include "../generic/32/dec_loop.c"
#include "../generic/64/enc_loop.c"
#include "dec_loop.c"
#ifdef BASE64_NEON64_USE_ASM
# include "enc_loop_asm.c"
#else
# include "enc_reshuffle.c"
# include "enc_loop.c"
#endif
#endif // BASE64_USE_NEON64
// Stride size is so large on these NEON 64-bit functions
// (48 bytes encode, 64 bytes decode) that we inline the
// uint64 codec to stay performant on smaller inputs.
BASE64_ENC_FUNCTION(neon64)
{
#ifdef BASE64_USE_NEON64
#include "../generic/enc_head.c"
enc_loop_neon64(&s, &slen, &o, &olen);
enc_loop_generic_64(&s, &slen, &o, &olen);
#include "../generic/enc_tail.c"
#else
BASE64_ENC_STUB
#endif
}
BASE64_DEC_FUNCTION(neon64)
{
#ifdef BASE64_USE_NEON64
#include "../generic/dec_head.c"
dec_loop_neon64(&s, &slen, &o, &olen);
dec_loop_generic_32(&s, &slen, &o, &olen);
#include "../generic/dec_tail.c"
#else
BASE64_DEC_STUB
#endif
}

@@ -0,0 +1,129 @@
// The input consists of five valid character sets in the Base64 alphabet,
// which we need to map back to the 6-bit values they represent.
// There are three ranges, two singles, and then there's the rest.
//
//   #  From        To        LUT  Characters
//   1  [0..42]     [255]     #1   invalid input
//   2  [43]        [62]      #1   +
//   3  [44..46]    [255]     #1   invalid input
//   4  [47]        [63]      #1   /
//   5  [48..57]    [52..61]  #1   0..9
//   6  [58..63]    [255]     #1   invalid input
//   7  [64]        [255]     #2   invalid input
//   8  [65..90]    [0..25]   #2   A..Z
//   9  [91..96]    [255]     #2   invalid input
//  10  [97..122]   [26..51]  #2   a..z
//  11  [123..126]  [255]     #2   invalid input
// (12) Everything else => invalid input
// The first LUT will use the VTBL instruction (out of range indices are set to
// 0 in destination).
static const uint8_t dec_lut1[] = {
255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U,
255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U,
255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 255U, 62U, 255U, 255U, 255U, 63U,
52U, 53U, 54U, 55U, 56U, 57U, 58U, 59U, 60U, 61U, 255U, 255U, 255U, 255U, 255U, 255U,
};
// The second LUT will use the VTBX instruction (out of range indices will be
// unchanged in destination). Input [64..126] will be mapped to index [1..63]
// in this LUT. Index 0 means that value comes from LUT #1.
static const uint8_t dec_lut2[] = {
0U, 255U, 0U, 1U, 2U, 3U, 4U, 5U, 6U, 7U, 8U, 9U, 10U, 11U, 12U, 13U,
14U, 15U, 16U, 17U, 18U, 19U, 20U, 21U, 22U, 23U, 24U, 25U, 255U, 255U, 255U, 255U,
255U, 255U, 26U, 27U, 28U, 29U, 30U, 31U, 32U, 33U, 34U, 35U, 36U, 37U, 38U, 39U,
40U, 41U, 42U, 43U, 44U, 45U, 46U, 47U, 48U, 49U, 50U, 51U, 255U, 255U, 255U, 255U,
};
// All input values in range for the first look-up will be 0U in the second
// look-up result. All input values out of range for the first look-up will be
// 0U in the first look-up result. Thus, the two results can be ORed without
// conflicts.
//
// Invalid characters that are in the valid range for either look-up will be
// set to 255U in the combined result. Other invalid characters will just be
// passed through with the second look-up result (using the VTBX instruction).
// Since the second LUT is 64 bytes, those passed-through values are guaranteed
// to have a value greater than 63U. Therefore, valid characters will be mapped
// to the valid [0..63] range and all invalid characters will be mapped to
// values greater than 63.
static inline void
dec_loop_neon64 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
if (*slen < 64) {
return;
}
// Process blocks of 64 bytes per round. Unlike the SSE codecs, no
// extra trailing zero bytes are written, so it is not necessary to
// reserve extra input bytes:
size_t rounds = *slen / 64;
*slen -= rounds * 64; // 64 bytes consumed per round
*olen += rounds * 48; // 48 bytes produced per round
const uint8x16x4_t tbl_dec1 = load_64byte_table(dec_lut1);
const uint8x16x4_t tbl_dec2 = load_64byte_table(dec_lut2);
do {
const uint8x16_t offset = vdupq_n_u8(63U);
uint8x16x4_t dec1, dec2;
uint8x16x3_t dec;
// Load 64 bytes and deinterleave:
uint8x16x4_t str = vld4q_u8((uint8_t *) *s);
// Get indices for second LUT:
dec2.val[0] = vqsubq_u8(str.val[0], offset);
dec2.val[1] = vqsubq_u8(str.val[1], offset);
dec2.val[2] = vqsubq_u8(str.val[2], offset);
dec2.val[3] = vqsubq_u8(str.val[3], offset);
// Get values from first LUT:
dec1.val[0] = vqtbl4q_u8(tbl_dec1, str.val[0]);
dec1.val[1] = vqtbl4q_u8(tbl_dec1, str.val[1]);
dec1.val[2] = vqtbl4q_u8(tbl_dec1, str.val[2]);
dec1.val[3] = vqtbl4q_u8(tbl_dec1, str.val[3]);
// Get values from second LUT:
dec2.val[0] = vqtbx4q_u8(dec2.val[0], tbl_dec2, dec2.val[0]);
dec2.val[1] = vqtbx4q_u8(dec2.val[1], tbl_dec2, dec2.val[1]);
dec2.val[2] = vqtbx4q_u8(dec2.val[2], tbl_dec2, dec2.val[2]);
dec2.val[3] = vqtbx4q_u8(dec2.val[3], tbl_dec2, dec2.val[3]);
// Get final values:
str.val[0] = vorrq_u8(dec1.val[0], dec2.val[0]);
str.val[1] = vorrq_u8(dec1.val[1], dec2.val[1]);
str.val[2] = vorrq_u8(dec1.val[2], dec2.val[2]);
str.val[3] = vorrq_u8(dec1.val[3], dec2.val[3]);
// Check for invalid input, any value larger than 63:
const uint8x16_t classified
= vcgtq_u8(str.val[0], vdupq_n_u8(63))
| vcgtq_u8(str.val[1], vdupq_n_u8(63))
| vcgtq_u8(str.val[2], vdupq_n_u8(63))
| vcgtq_u8(str.val[3], vdupq_n_u8(63));
// Check that all bits are zero:
if (vmaxvq_u8(classified) != 0U) {
break;
}
// Compress four bytes into three:
dec.val[0] = vshlq_n_u8(str.val[0], 2) | vshrq_n_u8(str.val[1], 4);
dec.val[1] = vshlq_n_u8(str.val[1], 4) | vshrq_n_u8(str.val[2], 2);
dec.val[2] = vshlq_n_u8(str.val[2], 6) | str.val[3];
// Interleave and store decoded result:
vst3q_u8((uint8_t *) *o, dec);
*s += 64;
*o += 48;
} while (--rounds > 0);
// Adjust for any rounds that were skipped:
*slen += rounds * 64;
*olen -= rounds * 48;
}
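
The two-table scheme described in the comments of `dec_loop_neon64` can be modelled per byte. This sketch is an assumption-laden illustration: `build_luts` and `lookup2` are hypothetical names, the tables are generated from the ASCII ranges rather than copied, and the conditionals imitate the out-of-range behaviour of TBL (result 0) and TBX (destination unchanged).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t lut1[64];  /* input bytes 0..63                      */
static uint8_t lut2[64];  /* input bytes 64..126, found at 1..63    */

/* Build both tables, assuming an ASCII execution character set. */
void build_luts(void)
{
    memset(lut1, 255, sizeof lut1);
    memset(lut2, 255, sizeof lut2);
    lut2[0] = 0;          /* index 0: the value comes from LUT #1 */
    lut1['+'] = 62;
    lut1['/'] = 63;
    for (int i = 0; i < 10; i++) {
        lut1['0' + i] = (uint8_t) (52 + i);
    }
    for (int i = 0; i < 26; i++) {
        lut2['A' + i - 63] = (uint8_t) i;
        lut2['a' + i - 63] = (uint8_t) (26 + i);
    }
}

/* Returns the 6-bit value for a valid character, or a value > 63. */
uint8_t lookup2(uint8_t c)
{
    /* TBL: an out-of-range index yields 0. */
    const uint8_t dec1 = (c < 64) ? lut1[c] : 0;
    /* Saturating c - 63 gives the index into the second table. */
    const uint8_t idx2 = (c > 63) ? (uint8_t) (c - 63) : 0;
    /* TBX: an out-of-range index leaves the destination (idx2) as it was. */
    const uint8_t dec2 = (idx2 < 64) ? lut2[idx2] : idx2;
    return (uint8_t) (dec1 | dec2);
}
```

Bytes above 126 pass through the second lookup as an index of 64 or more, so they fail the final "greater than 63" test without needing a table entry.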

@@ -0,0 +1,66 @@
static inline void
enc_loop_neon64_inner (const uint8_t **s, uint8_t **o, const uint8x16x4_t tbl_enc)
{
// Load 48 bytes and deinterleave:
uint8x16x3_t src = vld3q_u8(*s);
// Divide bits of three input bytes over four output bytes:
uint8x16x4_t out = enc_reshuffle(src);
// The bits have now been shifted to the right locations;
// translate their values 0..63 to the Base64 alphabet.
// Use a 64-byte table lookup:
out.val[0] = vqtbl4q_u8(tbl_enc, out.val[0]);
out.val[1] = vqtbl4q_u8(tbl_enc, out.val[1]);
out.val[2] = vqtbl4q_u8(tbl_enc, out.val[2]);
out.val[3] = vqtbl4q_u8(tbl_enc, out.val[3]);
// Interleave and store output:
vst4q_u8(*o, out);
*s += 48;
*o += 64;
}
static inline void
enc_loop_neon64 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
size_t rounds = *slen / 48;
*slen -= rounds * 48; // 48 bytes consumed per round
*olen += rounds * 64; // 64 bytes produced per round
// Load the encoding table:
const uint8x16x4_t tbl_enc = load_64byte_table(base64_table_enc_6bit);
while (rounds > 0) {
if (rounds >= 8) {
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
rounds -= 8;
continue;
}
if (rounds >= 4) {
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
rounds -= 4;
continue;
}
if (rounds >= 2) {
enc_loop_neon64_inner(s, o, tbl_enc);
enc_loop_neon64_inner(s, o, tbl_enc);
rounds -= 2;
continue;
}
enc_loop_neon64_inner(s, o, tbl_enc);
break;
}
}

@@ -0,0 +1,168 @@
// Apologies in advance for combining the preprocessor with inline assembly,
// two notoriously gnarly parts of C, but it was necessary to avoid a lot of
// code repetition. The preprocessor is used to template large sections of
// inline assembly that differ only in the registers used. If the code was
// written out by hand, it would become very large and hard to audit.
// Generate a block of inline assembly that loads three user-defined registers
// A, B, C from memory and deinterleaves them, post-incrementing the src
// pointer. The register set should be sequential.
#define LOAD(A, B, C) \
"ld3 {"A".16b, "B".16b, "C".16b}, [%[src]], #48 \n\t"
// Generate a block of inline assembly that takes three deinterleaved registers
// and shuffles the bytes. The output is in temporary registers t0..t3.
#define SHUF(A, B, C) \
"ushr %[t0].16b, "A".16b, #2 \n\t" \
"ushr %[t1].16b, "B".16b, #4 \n\t" \
"ushr %[t2].16b, "C".16b, #6 \n\t" \
"sli %[t1].16b, "A".16b, #4 \n\t" \
"sli %[t2].16b, "B".16b, #2 \n\t" \
"and %[t1].16b, %[t1].16b, %[n63].16b \n\t" \
"and %[t2].16b, %[t2].16b, %[n63].16b \n\t" \
"and %[t3].16b, "C".16b, %[n63].16b \n\t"
// Generate a block of inline assembly that takes temporary registers t0..t3
// and translates them to the base64 alphabet, using a table loaded into
// v8..v11. The output is in user-defined registers A..D.
#define TRAN(A, B, C, D) \
"tbl "A".16b, {v8.16b-v11.16b}, %[t0].16b \n\t" \
"tbl "B".16b, {v8.16b-v11.16b}, %[t1].16b \n\t" \
"tbl "C".16b, {v8.16b-v11.16b}, %[t2].16b \n\t" \
"tbl "D".16b, {v8.16b-v11.16b}, %[t3].16b \n\t"
// Generate a block of inline assembly that interleaves four registers and
// stores them, post-incrementing the destination pointer.
#define STOR(A, B, C, D) \
"st4 {"A".16b, "B".16b, "C".16b, "D".16b}, [%[dst]], #64 \n\t"
// Generate a block of inline assembly that generates a single self-contained
// encoder round: fetch the data, process it, and store the result.
#define ROUND() \
LOAD("v12", "v13", "v14") \
SHUF("v12", "v13", "v14") \
TRAN("v12", "v13", "v14", "v15") \
STOR("v12", "v13", "v14", "v15")
// Generate a block of assembly that generates a type A interleaved encoder
// round. It uses registers that were loaded by the previous type B round, and
// in turn loads registers for the next type B round.
#define ROUND_A() \
SHUF("v2", "v3", "v4") \
LOAD("v12", "v13", "v14") \
TRAN("v2", "v3", "v4", "v5") \
STOR("v2", "v3", "v4", "v5")
// Type B interleaved encoder round. Same as type A, but register sets swapped.
#define ROUND_B() \
SHUF("v12", "v13", "v14") \
LOAD("v2", "v3", "v4") \
TRAN("v12", "v13", "v14", "v15") \
STOR("v12", "v13", "v14", "v15")
// The first type A round needs to load its own registers.
#define ROUND_A_FIRST() \
LOAD("v2", "v3", "v4") \
ROUND_A()
// The last type B round omits the load for the next step.
#define ROUND_B_LAST() \
SHUF("v12", "v13", "v14") \
TRAN("v12", "v13", "v14", "v15") \
STOR("v12", "v13", "v14", "v15")
// Suppress clang's warning that the literal string in the asm statement is
// overlong (longer than the ISO-mandated minimum size of 4095 bytes for C99
// compilers). It may be true, but the goal here is not C99 portability.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Woverlength-strings"
static inline void
enc_loop_neon64 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
size_t rounds = *slen / 48;
if (rounds == 0) {
return;
}
*slen -= rounds * 48; // 48 bytes consumed per round.
*olen += rounds * 64; // 64 bytes produced per round.
// Number of times to go through the 8x loop.
size_t loops = rounds / 8;
// Number of rounds remaining after the 8x loop.
rounds %= 8;
// Temporary registers, used as scratch space.
uint8x16_t tmp0, tmp1, tmp2, tmp3;
__asm__ volatile (
// Load the encoding table into v8..v11.
" ld1 {v8.16b-v11.16b}, [%[tbl]] \n\t"
// If there are eight rounds or more, enter an 8x unrolled loop
// of interleaved encoding rounds. The rounds interleave memory
// operations (load/store) with data operations to maximize
// pipeline throughput.
" cbz %[loops], 4f \n\t"
// The SIMD instructions do not touch the flags.
"88: subs %[loops], %[loops], #1 \n\t"
" " ROUND_A_FIRST()
" " ROUND_B()
" " ROUND_A()
" " ROUND_B()
" " ROUND_A()
" " ROUND_B()
" " ROUND_A()
" " ROUND_B_LAST()
" b.ne 88b \n\t"
// Enter a 4x unrolled loop for rounds of 4 or more.
"4: cmp %[rounds], #4 \n\t"
" b.lt 30f \n\t"
" " ROUND_A_FIRST()
" " ROUND_B()
" " ROUND_A()
" " ROUND_B_LAST()
" sub %[rounds], %[rounds], #4 \n\t"
// Dispatch the remaining rounds 0..3.
"30: cbz %[rounds], 0f \n\t"
" cmp %[rounds], #2 \n\t"
" b.eq 2f \n\t"
" b.lt 1f \n\t"
// Block of non-interleaved encoding rounds, which can each
// individually be jumped to. Rounds fall through to the next.
"3: " ROUND()
"2: " ROUND()
"1: " ROUND()
"0: \n\t"
// Outputs (modified). Note that rounds is decremented by the 4x block,
// so it must be a read-write operand rather than an input.
: [loops] "+r" (loops),
[rounds] "+r" (rounds),
[src] "+r" (*s),
[dst] "+r" (*o),
[t0] "=&w" (tmp0),
[t1] "=&w" (tmp1),
[t2] "=&w" (tmp2),
[t3] "=&w" (tmp3)
// Inputs (not modified).
: [tbl] "r" (base64_table_enc_6bit),
[n63] "w" (vdupq_n_u8(63))
// Clobbers.
: "v2", "v3", "v4", "v5",
"v8", "v9", "v10", "v11",
"v12", "v13", "v14", "v15",
"cc", "memory"
);
}
#pragma GCC diagnostic pop
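
The way enc_loop_neon64 splits its work between the 8x loop, the 4x block and the fall-through tail can be modelled in plain C. This is only a sketch of the bookkeeping; `struct round_plan` and `plan_rounds` are hypothetical names that do not exist in the library.

```c
#include <stddef.h>

// Hypothetical helper type: how many 48-byte rounds each stage executes.
struct round_plan {
	size_t unrolled8;	// rounds run by the 8x unrolled loop
	size_t unrolled4;	// rounds run by the 4x block (0 or 4)
	size_t single;		// rounds run by the fall-through tail (0..3)
};

// Mirror the arithmetic done before and inside the asm statement.
static struct round_plan
plan_rounds (size_t slen)
{
	struct round_plan p = { 0, 0, 0 };
	size_t rounds = slen / 48;

	p.unrolled8 = (rounds / 8) * 8;
	rounds %= 8;

	if (rounds >= 4) {
		p.unrolled4 = 4;
		rounds -= 4;
	}
	p.single = rounds;
	return p;
}
```

For example, an input of 13 rounds (624 bytes) would go once through the 8x loop, once through the 4x block, and then jump to label 1 for a single plain round.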


@@ -0,0 +1,31 @@
static inline uint8x16x4_t
enc_reshuffle (const uint8x16x3_t in)
{
uint8x16x4_t out;
// Input:
// in[0] = a7 a6 a5 a4 a3 a2 a1 a0
// in[1] = b7 b6 b5 b4 b3 b2 b1 b0
// in[2] = c7 c6 c5 c4 c3 c2 c1 c0
// Output:
// out[0] = 00 00 a7 a6 a5 a4 a3 a2
// out[1] = 00 00 a1 a0 b7 b6 b5 b4
// out[2] = 00 00 b3 b2 b1 b0 c7 c6
// out[3] = 00 00 c5 c4 c3 c2 c1 c0
// Move the input bits to where they need to be in the outputs. Except
// for the first output, the high two bits are not cleared.
out.val[0] = vshrq_n_u8(in.val[0], 2);
out.val[1] = vshrq_n_u8(in.val[1], 4);
out.val[2] = vshrq_n_u8(in.val[2], 6);
out.val[1] = vsliq_n_u8(out.val[1], in.val[0], 4);
out.val[2] = vsliq_n_u8(out.val[2], in.val[1], 2);
// Clear the high two bits in the second, third and fourth output.
out.val[1] = vandq_u8(out.val[1], vdupq_n_u8(0x3F));
out.val[2] = vandq_u8(out.val[2], vdupq_n_u8(0x3F));
out.val[3] = vandq_u8(in.val[2], vdupq_n_u8(0x3F));
return out;
}
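
As a reading aid, the bit movement in enc_reshuffle can be written out for a single byte triple in scalar C. This is a sketch, not library code; `reshuffle_scalar` is a hypothetical name. Each NEON lane performs the same computation on its own (a, b, c) triple.

```c
#include <stdint.h>

// Split three input bytes into four 6-bit values, one per output byte.
static void
reshuffle_scalar (const uint8_t in[3], uint8_t out[4])
{
	// Top six bits of the first byte.
	out[0] = in[0] >> 2;

	// Low two bits of the first byte, top four bits of the second.
	out[1] = (uint8_t) (((in[0] << 4) | (in[1] >> 4)) & 0x3F);

	// Low four bits of the second byte, top two bits of the third.
	out[2] = (uint8_t) (((in[1] << 2) | (in[2] >> 6)) & 0x3F);

	// Low six bits of the third byte.
	out[3] = in[2] & 0x3F;
}
```

Feeding it the bytes of "Man" (0x4D, 0x61, 0x6E) gives the indices 19, 22, 5, 46, which correspond to "TWFu" in the Base64 alphabet.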

3rdparty/base64/lib/arch/sse41/codec.c vendored Normal file

@@ -0,0 +1,56 @@
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include "../../../include/libbase64.h"
#include "../../tables/tables.h"
#include "../../codecs.h"
#include "config.h"
#include "../../env.h"
#if HAVE_SSE41
#include <smmintrin.h>
// Only enable inline assembly on supported compilers and on 64-bit CPUs.
#ifndef BASE64_SSE41_USE_ASM
# if (defined(__GNUC__) || defined(__clang__)) && BASE64_WORDSIZE == 64
# define BASE64_SSE41_USE_ASM 1
# else
# define BASE64_SSE41_USE_ASM 0
# endif
#endif
#include "../ssse3/dec_reshuffle.c"
#include "../ssse3/dec_loop.c"
#if BASE64_SSE41_USE_ASM
# include "../ssse3/enc_loop_asm.c"
#else
# include "../ssse3/enc_translate.c"
# include "../ssse3/enc_reshuffle.c"
# include "../ssse3/enc_loop.c"
#endif
#endif // HAVE_SSE41
BASE64_ENC_FUNCTION(sse41)
{
#if HAVE_SSE41
#include "../generic/enc_head.c"
enc_loop_ssse3(&s, &slen, &o, &olen);
#include "../generic/enc_tail.c"
#else
BASE64_ENC_STUB
#endif
}
BASE64_DEC_FUNCTION(sse41)
{
#if HAVE_SSE41
#include "../generic/dec_head.c"
dec_loop_ssse3(&s, &slen, &o, &olen);
#include "../generic/dec_tail.c"
#else
BASE64_DEC_STUB
#endif
}

3rdparty/base64/lib/arch/sse42/codec.c vendored Normal file

@@ -0,0 +1,56 @@
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include "../../../include/libbase64.h"
#include "../../tables/tables.h"
#include "../../codecs.h"
#include "config.h"
#include "../../env.h"
#if HAVE_SSE42
#include <nmmintrin.h>
// Only enable inline assembly on supported compilers and on 64-bit CPUs.
#ifndef BASE64_SSE42_USE_ASM
# if (defined(__GNUC__) || defined(__clang__)) && BASE64_WORDSIZE == 64
# define BASE64_SSE42_USE_ASM 1
# else
# define BASE64_SSE42_USE_ASM 0
# endif
#endif
#include "../ssse3/dec_reshuffle.c"
#include "../ssse3/dec_loop.c"
#if BASE64_SSE42_USE_ASM
# include "../ssse3/enc_loop_asm.c"
#else
# include "../ssse3/enc_translate.c"
# include "../ssse3/enc_reshuffle.c"
# include "../ssse3/enc_loop.c"
#endif
#endif // HAVE_SSE42
BASE64_ENC_FUNCTION(sse42)
{
#if HAVE_SSE42
#include "../generic/enc_head.c"
enc_loop_ssse3(&s, &slen, &o, &olen);
#include "../generic/enc_tail.c"
#else
BASE64_ENC_STUB
#endif
}
BASE64_DEC_FUNCTION(sse42)
{
#if HAVE_SSE42
#include "../generic/dec_head.c"
dec_loop_ssse3(&s, &slen, &o, &olen);
#include "../generic/dec_tail.c"
#else
BASE64_DEC_STUB
#endif
}

3rdparty/base64/lib/arch/ssse3/codec.c vendored Normal file

@@ -0,0 +1,58 @@
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include "../../../include/libbase64.h"
#include "../../tables/tables.h"
#include "../../codecs.h"
#include "config.h"
#include "../../env.h"
#if HAVE_SSSE3
#include <tmmintrin.h>
// Only enable inline assembly on supported compilers and on 64-bit CPUs.
// 32-bit CPUs with SSSE3 support, such as low-end Atoms, only have eight XMM
// registers, which is not enough to run the inline assembly.
#ifndef BASE64_SSSE3_USE_ASM
# if (defined(__GNUC__) || defined(__clang__)) && BASE64_WORDSIZE == 64
# define BASE64_SSSE3_USE_ASM 1
# else
# define BASE64_SSSE3_USE_ASM 0
# endif
#endif
#include "dec_reshuffle.c"
#include "dec_loop.c"
#if BASE64_SSSE3_USE_ASM
# include "enc_loop_asm.c"
#else
# include "enc_reshuffle.c"
# include "enc_translate.c"
# include "enc_loop.c"
#endif
#endif // HAVE_SSSE3
BASE64_ENC_FUNCTION(ssse3)
{
#if HAVE_SSSE3
#include "../generic/enc_head.c"
enc_loop_ssse3(&s, &slen, &o, &olen);
#include "../generic/enc_tail.c"
#else
BASE64_ENC_STUB
#endif
}
BASE64_DEC_FUNCTION(ssse3)
{
#if HAVE_SSSE3
#include "../generic/dec_head.c"
dec_loop_ssse3(&s, &slen, &o, &olen);
#include "../generic/dec_tail.c"
#else
BASE64_DEC_STUB
#endif
}


@@ -0,0 +1,173 @@
// The input consists of six character sets in the Base64 alphabet, which we
// need to map back to the 6-bit values they represent. There are three ranges,
// two singles, and then there's the rest.
//
// # From To Add Characters
// 1 [43] [62] +19 +
// 2 [47] [63] +16 /
// 3 [48..57] [52..61] +4 0..9
// 4 [65..90] [0..25] -65 A..Z
// 5 [97..122] [26..51] -71 a..z
// (6) Everything else => invalid input
//
// We will use lookup tables for character validation and offset computation.
// Remember that 0x2X and 0x0X select the same index in _mm_shuffle_epi8,
// because only the low four bits and the top bit of each index byte are
// used. This allows masking with 0x2F instead of 0x0F, which saves one
// constant declaration (register and/or memory access).
//
// For offsets:
// Perfect hash for lut = ((src >> 4) & 0x2F) + ((src == 0x2F) ? 0xFF : 0x00)
// 0000 = garbage
// 0001 = /
// 0010 = +
// 0011 = 0-9
// 0100 = A-Z
// 0101 = A-Z
// 0110 = a-z
// 0111 = a-z
// 1000 >= garbage
//
// For validation, here's the table.
// A character is valid if and only if the AND of the 2 lookups equals 0:
//
// hi \ lo 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
// LUT 0x15 0x11 0x11 0x11 0x11 0x11 0x11 0x11 0x11 0x11 0x13 0x1A 0x1B 0x1B 0x1B 0x1A
//
// 0000 0x10 char NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
// andlut 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
//
// 0001 0x10 char DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
// andlut 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
//
// 0010 0x01 char ! " # $ % & ' ( ) * + , - . /
// andlut 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x00 0x01 0x01 0x01 0x00
//
// 0011 0x02 char 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
// andlut 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x02 0x02 0x02 0x02 0x02 0x02
//
// 0100 0x04 char @ A B C D E F G H I J K L M N O
// andlut 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
//
// 0101 0x08 char P Q R S T U V W X Y Z [ \ ] ^ _
// andlut 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x08 0x08 0x08 0x08
//
// 0110 0x04 char ` a b c d e f g h i j k l m n o
// andlut 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
// 0111 0x08 char p q r s t u v w x y z { | } ~
// andlut 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x08 0x08 0x08 0x08
//
// 1000 0x10 andlut 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1001 0x10 andlut 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1010 0x10 andlut 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1011 0x10 andlut 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1100 0x10 andlut 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1101 0x10 andlut 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1110 0x10 andlut 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1111 0x10 andlut 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
static inline int
dec_loop_ssse3_inner (const uint8_t **s, uint8_t **o, size_t *rounds)
{
const __m128i lut_lo = _mm_setr_epi8(
0x15, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
0x11, 0x11, 0x13, 0x1A, 0x1B, 0x1B, 0x1B, 0x1A);
const __m128i lut_hi = _mm_setr_epi8(
0x10, 0x10, 0x01, 0x02, 0x04, 0x08, 0x04, 0x08,
0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10);
const __m128i lut_roll = _mm_setr_epi8(
0, 16, 19, 4, -65, -65, -71, -71,
0, 0, 0, 0, 0, 0, 0, 0);
const __m128i mask_2F = _mm_set1_epi8(0x2F);
// Load input:
__m128i str = _mm_loadu_si128((__m128i *) *s);
// Table lookups:
const __m128i hi_nibbles = _mm_and_si128(_mm_srli_epi32(str, 4), mask_2F);
const __m128i lo_nibbles = _mm_and_si128(str, mask_2F);
const __m128i hi = _mm_shuffle_epi8(lut_hi, hi_nibbles);
const __m128i lo = _mm_shuffle_epi8(lut_lo, lo_nibbles);
// Check for invalid input: if any "and" values from lo and hi are not
// zero, fall back on bytewise code to do error checking and reporting:
if (_mm_movemask_epi8(_mm_cmpgt_epi8(_mm_and_si128(lo, hi), _mm_setzero_si128())) != 0) {
return 0;
}
const __m128i eq_2F = _mm_cmpeq_epi8(str, mask_2F);
const __m128i roll = _mm_shuffle_epi8(lut_roll, _mm_add_epi8(eq_2F, hi_nibbles));
// Now simply add the delta values to the input:
str = _mm_add_epi8(str, roll);
// Reshuffle the input to packed 12-byte output format:
str = dec_reshuffle(str);
// Store the output:
_mm_storeu_si128((__m128i *) *o, str);
*s += 16;
*o += 12;
*rounds -= 1;
return 1;
}
static inline void
dec_loop_ssse3 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
if (*slen < 24) {
return;
}
// Process blocks of 16 bytes per round. Because 4 extra zero bytes are
// written after the output, ensure that there will be at least 8 bytes
// of input data left to cover the gap. (6 data bytes and up to two
// end-of-string markers.)
size_t rounds = (*slen - 8) / 16;
*slen -= rounds * 16; // 16 bytes consumed per round
*olen += rounds * 12; // 12 bytes produced per round
do {
if (rounds >= 8) {
if (dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds)) {
continue;
}
break;
}
if (rounds >= 4) {
if (dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds)) {
continue;
}
break;
}
if (rounds >= 2) {
if (dec_loop_ssse3_inner(s, o, &rounds) &&
dec_loop_ssse3_inner(s, o, &rounds)) {
continue;
}
break;
}
dec_loop_ssse3_inner(s, o, &rounds);
break;
} while (rounds > 0);
// Adjust for any rounds that were skipped:
*slen += rounds * 16;
*olen -= rounds * 12;
}
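
The validation and offset lookups in dec_loop_ssse3_inner can be followed one character at a time in scalar C. The sketch below reuses the three table contents from the function above; `dec_char_scalar` is a hypothetical name, and the SIMD code performs the same lookups for sixteen characters at once.

```c
#include <stdint.h>

// Scalar model of the two-table validity check and the offset lookup.
// Returns the 6-bit value of the character, or -1 for invalid input.
static int
dec_char_scalar (uint8_t c)
{
	static const uint8_t lut_lo[16] = {
		0x15, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
		0x11, 0x11, 0x13, 0x1A, 0x1B, 0x1B, 0x1B, 0x1A };
	static const uint8_t lut_hi[16] = {
		0x10, 0x10, 0x01, 0x02, 0x04, 0x08, 0x04, 0x08,
		0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10 };
	static const int8_t lut_roll[16] = {
		0, 16, 19, 4, -65, -65, -71, -71,
		0, 0, 0, 0, 0, 0, 0, 0 };

	const uint8_t hi = c >> 4;
	const uint8_t lo = c & 0x0F;

	// The character is valid only if the two lookups share no set bit.
	if (lut_lo[lo] & lut_hi[hi]) {
		return -1;
	}

	// '/' (0x2F) shares its high nibble with '+', so it uses the slot
	// one below. The SIMD code gets this by adding the 0xFF compare mask.
	const uint8_t idx = (uint8_t) (hi - (c == 0x2F));

	return (uint8_t) (c + lut_roll[idx]);
}
```

Under this model 'A' maps to 0, 'z' to 51, '+' to 62 and '/' to 63, while '=' and whitespace are rejected, which is the point where the vector loop hands over to the bytewise code.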


@@ -0,0 +1,33 @@
static inline __m128i
dec_reshuffle (const __m128i in)
{
// in, bits, upper case are most significant bits, lower case are least significant bits
// 00llllll 00kkkkLL 00jjKKKK 00JJJJJJ
// 00iiiiii 00hhhhII 00ggHHHH 00GGGGGG
// 00ffffff 00eeeeFF 00ddEEEE 00DDDDDD
// 00cccccc 00bbbbCC 00aaBBBB 00AAAAAA
const __m128i merge_ab_and_bc = _mm_maddubs_epi16(in, _mm_set1_epi32(0x01400140));
// 0000kkkk LLllllll 0000JJJJ JJjjKKKK
// 0000hhhh IIiiiiii 0000GGGG GGggHHHH
// 0000eeee FFffffff 0000DDDD DDddEEEE
// 0000bbbb CCcccccc 0000AAAA AAaaBBBB
const __m128i out = _mm_madd_epi16(merge_ab_and_bc, _mm_set1_epi32(0x00011000));
// 00000000 JJJJJJjj KKKKkkkk LLllllll
// 00000000 GGGGGGgg HHHHhhhh IIiiiiii
// 00000000 DDDDDDdd EEEEeeee FFffffff
// 00000000 AAAAAAaa BBBBbbbb CCcccccc
// Pack bytes together:
return _mm_shuffle_epi8(out, _mm_setr_epi8(
2, 1, 0,
6, 5, 4,
10, 9, 8,
14, 13, 12,
-1, -1, -1, -1));
// 00000000 00000000 00000000 00000000
// LLllllll KKKKkkkk JJJJJJjj IIiiiiii
// HHHHhhhh GGGGGGgg FFffffff EEEEeeee
// DDDDDDdd CCcccccc BBBBbbbb AAAAAAaa
}
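
The two multiply-add steps in dec_reshuffle amount to shifting and merging four 6-bit values into one 24-bit value. A scalar sketch for a single 32-bit lane, using the same multiplier constants, might look as follows; `dec_pack_scalar` is a hypothetical name.

```c
#include <stdint.h>

// Pack four 6-bit values (in input order) into three output bytes.
static void
dec_pack_scalar (const uint8_t in[4], uint8_t out[3])
{
	// Equivalent of _mm_maddubs_epi16 with 0x01400140: within each byte
	// pair, the first byte is multiplied by 0x40 and the second by 0x01.
	const uint32_t w0 = in[0] * 0x40u + in[1] * 0x01u;
	const uint32_t w1 = in[2] * 0x40u + in[3] * 0x01u;

	// Equivalent of _mm_madd_epi16 with 0x00011000: the first 16-bit
	// word is multiplied by 0x1000 and the second by 0x0001.
	const uint32_t v = w0 * 0x1000u + w1 * 0x0001u;

	// Equivalent of the final byte shuffle: emit the 24 bits MSB first.
	out[0] = (uint8_t) (v >> 16);
	out[1] = (uint8_t) (v >> 8);
	out[2] = (uint8_t) v;
}
```

The multiplications are just left shifts by 6 and 12 bits; the SIMD version uses them because a multiply-add can shift and merge two neighbouring elements in one instruction.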


@@ -0,0 +1,67 @@
static inline void
enc_loop_ssse3_inner (const uint8_t **s, uint8_t **o)
{
// Load input:
__m128i str = _mm_loadu_si128((__m128i *) *s);
// Reshuffle:
str = enc_reshuffle(str);
// Translate reshuffled bytes to the Base64 alphabet:
str = enc_translate(str);
// Store:
_mm_storeu_si128((__m128i *) *o, str);
*s += 12;
*o += 16;
}
static inline void
enc_loop_ssse3 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
if (*slen < 16) {
return;
}
// Process blocks of 12 bytes at a time. Because blocks are loaded 16
// bytes at a time, ensure that there will be at least 4 remaining
// bytes after the last round, so that the final read will not pass
// beyond the bounds of the input buffer:
size_t rounds = (*slen - 4) / 12;
*slen -= rounds * 12; // 12 bytes consumed per round
*olen += rounds * 16; // 16 bytes produced per round
do {
if (rounds >= 8) {
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
rounds -= 8;
continue;
}
if (rounds >= 4) {
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
rounds -= 4;
continue;
}
if (rounds >= 2) {
enc_loop_ssse3_inner(s, o);
enc_loop_ssse3_inner(s, o);
rounds -= 2;
continue;
}
enc_loop_ssse3_inner(s, o);
break;
} while (rounds > 0);
}
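
The round count in enc_loop_ssse3 is chosen so that the last 16-byte load stays inside the input buffer. The sketch below restates that arithmetic as two small functions; both names are hypothetical and exist only for illustration.

```c
#include <stddef.h>

// Number of 12-byte rounds the vector loop will run for slen input bytes.
static size_t
enc_rounds_ssse3 (size_t slen)
{
	if (slen < 16) {
		return 0;
	}
	return (slen - 4) / 12;
}

// Offset one past the last byte read by the final 16-byte load.
// Only meaningful when rounds is at least 1.
static size_t
last_load_end (size_t rounds)
{
	return (rounds - 1) * 12 + 16;
}
```

Since rounds * 12 is at most slen - 4, the final load ends at rounds * 12 + 4, which never exceeds slen. With 27 input bytes, for instance, only one round runs and the remaining 15 bytes are left for the generic tail code.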


@@ -0,0 +1,268 @@
// Apologies in advance for combining the preprocessor with inline assembly,
// two notoriously gnarly parts of C, but it was necessary to avoid a lot of
// code repetition. The preprocessor is used to template large sections of
// inline assembly that differ only in the registers used. If the code were
// written out by hand, it would become very large and hard to audit.
// Generate a block of inline assembly that loads register R0 from memory. The
// offset at which the register is loaded is set by the given round.
#define LOAD(R0, ROUND) \
"lddqu ("#ROUND" * 12)(%[src]), %["R0"] \n\t"
// Generate a block of inline assembly that deinterleaves and shuffles register
// R0 using preloaded constants. Outputs in R0 and R1.
#define SHUF(R0, R1) \
"pshufb %[lut0], %["R0"] \n\t" \
"movdqa %["R0"], %["R1"] \n\t" \
"pand %[msk0], %["R0"] \n\t" \
"pand %[msk2], %["R1"] \n\t" \
"pmulhuw %[msk1], %["R0"] \n\t" \
"pmullw %[msk3], %["R1"] \n\t" \
"por %["R1"], %["R0"] \n\t"
// Generate a block of inline assembly that takes R0 and R1 and translates
// their contents to the base64 alphabet, using preloaded constants.
#define TRAN(R0, R1, R2) \
"movdqa %["R0"], %["R1"] \n\t" \
"movdqa %["R0"], %["R2"] \n\t" \
"psubusb %[n51], %["R1"] \n\t" \
"pcmpgtb %[n25], %["R2"] \n\t" \
"psubb %["R2"], %["R1"] \n\t" \
"movdqa %[lut1], %["R2"] \n\t" \
"pshufb %["R1"], %["R2"] \n\t" \
"paddb %["R2"], %["R0"] \n\t"
// Generate a block of inline assembly that stores the given register R0 at an
// offset set by the given round.
#define STOR(R0, ROUND) \
"movdqu %["R0"], ("#ROUND" * 16)(%[dst]) \n\t"
// Generate a block of inline assembly that generates a single self-contained
// encoder round: fetch the data, process it, and store the result. Then update
// the source and destination pointers.
#define ROUND() \
LOAD("a", 0) \
SHUF("a", "b") \
TRAN("a", "b", "c") \
STOR("a", 0) \
"add $12, %[src] \n\t" \
"add $16, %[dst] \n\t"
// Define a macro that initiates a three-way interleaved encoding round by
// preloading registers a, b and c from memory.
// The register graph shows which registers are in use during each step, and
// is a visual aid for choosing registers for that step. Symbol index:
//
// + indicates that a register is loaded by that step.
// | indicates that a register is in use and must not be touched.
// - indicates that a register is decommissioned by that step.
// x indicates that a register is used as a temporary by that step.
// V indicates that a register is an input or output to the macro.
//
#define ROUND_3_INIT() /* a b c d e f */ \
LOAD("a", 0) /* + */ \
SHUF("a", "d") /* | + */ \
LOAD("b", 1) /* | + | */ \
TRAN("a", "d", "e") /* | | - x */ \
LOAD("c", 2) /* V V V */
// Define a macro that translates, shuffles and stores the input registers A, B
// and C, and preloads registers D, E and F for the next round.
// This macro can be arbitrarily daisy-chained by feeding output registers D, E
// and F back into the next round as input registers A, B and C. The macro
// carefully interleaves memory operations with data operations for optimal
// pipelined performance.
#define ROUND_3(ROUND, A,B,C,D,E,F) /* A B C D E F */ \
LOAD(D, (ROUND + 3)) /* V V V + */ \
SHUF(B, E) /* | | | | + */ \
STOR(A, (ROUND + 0)) /* - | | | | */ \
TRAN(B, E, F) /* | | | - x */ \
LOAD(E, (ROUND + 4)) /* | | | + */ \
SHUF(C, A) /* + | | | | */ \
STOR(B, (ROUND + 1)) /* | - | | | */ \
TRAN(C, A, F) /* - | | | x */ \
LOAD(F, (ROUND + 5)) /* | | | + */ \
SHUF(D, A) /* + | | | | */ \
STOR(C, (ROUND + 2)) /* | - | | | */ \
TRAN(D, A, B) /* - x V V V */
// Define a macro that terminates a ROUND_3 macro by taking pre-loaded
// registers D, E and F, and translating, shuffling and storing them.
#define ROUND_3_END(ROUND, A,B,C,D,E,F) /* A B C D E F */ \
SHUF(E, A) /* + V V V */ \
STOR(D, (ROUND + 3)) /* | - | | */ \
TRAN(E, A, B) /* - x | | */ \
SHUF(F, C) /* + | | */ \
STOR(E, (ROUND + 4)) /* | - | */ \
TRAN(F, C, D) /* - x | */ \
STOR(F, (ROUND + 5)) /* - */
// Define a type A round. Inputs are a, b, and c, outputs are d, e, and f.
#define ROUND_3_A(ROUND) \
ROUND_3(ROUND, "a", "b", "c", "d", "e", "f")
// Define a type B round. Inputs and outputs are swapped with regard to type A.
#define ROUND_3_B(ROUND) \
ROUND_3(ROUND, "d", "e", "f", "a", "b", "c")
// Terminating macro for a type A round.
#define ROUND_3_A_LAST(ROUND) \
ROUND_3_A(ROUND) \
ROUND_3_END(ROUND, "a", "b", "c", "d", "e", "f")
// Terminating macro for a type B round.
#define ROUND_3_B_LAST(ROUND) \
ROUND_3_B(ROUND) \
ROUND_3_END(ROUND, "d", "e", "f", "a", "b", "c")
// Suppress clang's warning that the literal string in the asm statement is
// overlong (longer than the ISO-mandated minimum size of 4095 bytes for C99
// compilers). It may be true, but the goal here is not C99 portability.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Woverlength-strings"
static inline void
enc_loop_ssse3 (const uint8_t **s, size_t *slen, uint8_t **o, size_t *olen)
{
// For a clearer explanation of the algorithm used by this function,
// please refer to the plain (not inline assembly) implementation. This
// function follows the same basic logic.
if (*slen < 16) {
return;
}
// Process blocks of 12 bytes at a time. Input is read in blocks of 16
// bytes, so "reserve" four bytes from the input buffer to ensure that
// we never read beyond the end of the input buffer.
size_t rounds = (*slen - 4) / 12;
*slen -= rounds * 12; // 12 bytes consumed per round
*olen += rounds * 16; // 16 bytes produced per round
// Number of times to go through the 36x loop.
size_t loops = rounds / 36;
// Number of rounds remaining after the 36x loop.
rounds %= 36;
// Lookup tables.
const __m128i lut0 = _mm_set_epi8(
10, 11, 9, 10, 7, 8, 6, 7, 4, 5, 3, 4, 1, 2, 0, 1);
const __m128i lut1 = _mm_setr_epi8(
65, 71, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -19, -16, 0, 0);
// Temporary registers.
__m128i a, b, c, d, e, f;
__asm__ volatile (
// If there are 36 rounds or more, enter a 36x unrolled loop of
// interleaved encoding rounds. The rounds interleave memory
// operations (load/store) with data operations (table lookups,
// etc) to maximize pipeline throughput.
" test %[loops], %[loops] \n\t"
" jz 18f \n\t"
" jmp 36f \n\t"
" \n\t"
".balign 64 \n\t"
"36: " ROUND_3_INIT()
" " ROUND_3_A( 0)
" " ROUND_3_B( 3)
" " ROUND_3_A( 6)
" " ROUND_3_B( 9)
" " ROUND_3_A(12)
" " ROUND_3_B(15)
" " ROUND_3_A(18)
" " ROUND_3_B(21)
" " ROUND_3_A(24)
" " ROUND_3_B(27)
" " ROUND_3_A_LAST(30)
" add $(12 * 36), %[src] \n\t"
" add $(16 * 36), %[dst] \n\t"
" dec %[loops] \n\t"
" jnz 36b \n\t"
// Enter an 18x unrolled loop for rounds of 18 or more.
"18: cmp $18, %[rounds] \n\t"
" jl 9f \n\t"
" " ROUND_3_INIT()
" " ROUND_3_A(0)
" " ROUND_3_B(3)
" " ROUND_3_A(6)
" " ROUND_3_B(9)
" " ROUND_3_A_LAST(12)
" sub $18, %[rounds] \n\t"
" add $(12 * 18), %[src] \n\t"
" add $(16 * 18), %[dst] \n\t"
// Enter a 9x unrolled loop for rounds of 9 or more.
"9: cmp $9, %[rounds] \n\t"
" jl 6f \n\t"
" " ROUND_3_INIT()
" " ROUND_3_A(0)
" " ROUND_3_B_LAST(3)
" sub $9, %[rounds] \n\t"
" add $(12 * 9), %[src] \n\t"
" add $(16 * 9), %[dst] \n\t"
// Enter a 6x unrolled loop for rounds of 6 or more.
"6: cmp $6, %[rounds] \n\t"
" jl 55f \n\t"
" " ROUND_3_INIT()
" " ROUND_3_A_LAST(0)
" sub $6, %[rounds] \n\t"
" add $(12 * 6), %[src] \n\t"
" add $(16 * 6), %[dst] \n\t"
// Dispatch the remaining rounds 0..5.
"55: cmp $3, %[rounds] \n\t"
" jg 45f \n\t"
" je 3f \n\t"
" cmp $1, %[rounds] \n\t"
" jg 2f \n\t"
" je 1f \n\t"
" jmp 0f \n\t"
"45: cmp $4, %[rounds] \n\t"
" je 4f \n\t"
// Block of non-interleaved encoding rounds, which can each
// individually be jumped to. Rounds fall through to the next.
"5: " ROUND()
"4: " ROUND()
"3: " ROUND()
"2: " ROUND()
"1: " ROUND()
"0: \n\t"
// Outputs (modified).
: [rounds] "+r" (rounds),
[loops] "+r" (loops),
[src] "+r" (*s),
[dst] "+r" (*o),
[a] "=&x" (a),
[b] "=&x" (b),
[c] "=&x" (c),
[d] "=&x" (d),
[e] "=&x" (e),
[f] "=&x" (f)
// Inputs (not modified).
: [lut0] "x" (lut0),
[lut1] "x" (lut1),
[msk0] "x" (_mm_set1_epi32(0x0FC0FC00)),
[msk1] "x" (_mm_set1_epi32(0x04000040)),
[msk2] "x" (_mm_set1_epi32(0x003F03F0)),
[msk3] "x" (_mm_set1_epi32(0x01000010)),
[n51] "x" (_mm_set1_epi8(51)),
[n25] "x" (_mm_set1_epi8(25))
// Clobbers.
: "cc", "memory"
);
}
#pragma GCC diagnostic pop


@@ -0,0 +1,48 @@
static inline __m128i
enc_reshuffle (__m128i in)
{
// Input, bytes MSB to LSB:
// 0 0 0 0 l k j i h g f e d c b a
in = _mm_shuffle_epi8(in, _mm_set_epi8(
10, 11, 9, 10,
7, 8, 6, 7,
4, 5, 3, 4,
1, 2, 0, 1));
// in, bytes MSB to LSB:
// k l j k
// h i g h
// e f d e
// b c a b
const __m128i t0 = _mm_and_si128(in, _mm_set1_epi32(0x0FC0FC00));
// bits, upper case are most significant bits, lower case are least significant bits
// 0000kkkk LL000000 JJJJJJ00 00000000
// 0000hhhh II000000 GGGGGG00 00000000
// 0000eeee FF000000 DDDDDD00 00000000
// 0000bbbb CC000000 AAAAAA00 00000000
const __m128i t1 = _mm_mulhi_epu16(t0, _mm_set1_epi32(0x04000040));
// 00000000 00kkkkLL 00000000 00JJJJJJ
// 00000000 00hhhhII 00000000 00GGGGGG
// 00000000 00eeeeFF 00000000 00DDDDDD
// 00000000 00bbbbCC 00000000 00AAAAAA
const __m128i t2 = _mm_and_si128(in, _mm_set1_epi32(0x003F03F0));
// 00000000 00llllll 000000jj KKKK0000
// 00000000 00iiiiii 000000gg HHHH0000
// 00000000 00ffffff 000000dd EEEE0000
// 00000000 00cccccc 000000aa BBBB0000
const __m128i t3 = _mm_mullo_epi16(t2, _mm_set1_epi32(0x01000010));
// 00llllll 00000000 00jjKKKK 00000000
// 00iiiiii 00000000 00ggHHHH 00000000
// 00ffffff 00000000 00ddEEEE 00000000
// 00cccccc 00000000 00aaBBBB 00000000
return _mm_or_si128(t1, t3);
// 00llllll 00kkkkLL 00jjKKKK 00JJJJJJ
// 00iiiiii 00hhhhII 00ggHHHH 00GGGGGG
// 00ffffff 00eeeeFF 00ddEEEE 00DDDDDD
// 00cccccc 00bbbbCC 00aaBBBB 00AAAAAA
}
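
The mask-and-multiply sequence in enc_reshuffle uses 16-bit multiplications as variable shifts: taking the high half of a product shifts right, taking the low half shifts left. A scalar sketch for one 32-bit lane, with the same masks and multipliers, is given below; `enc_reshuffle_lane` is a hypothetical name.

```c
#include <stdint.h>

// Model one 32-bit lane of enc_reshuffle for the input bytes a, b, c.
// The result holds the four 6-bit values, first value in the lowest byte.
static uint32_t
enc_reshuffle_lane (uint8_t a, uint8_t b, uint8_t c)
{
	// Byte order after the initial shuffle, MSB to LSB: b c a b.
	const uint16_t lo = (uint16_t) ((a << 8) | b);
	const uint16_t hi = (uint16_t) ((b << 8) | c);

	// t0 and t1: mask, then keep the high 16 bits of the product,
	// which acts as a right shift by 10 and by 6 bits.
	const uint16_t t1_lo = (uint16_t) (((uint32_t) (lo & 0xFC00) * 0x0040) >> 16);
	const uint16_t t1_hi = (uint16_t) (((uint32_t) (hi & 0x0FC0) * 0x0400) >> 16);

	// t2 and t3: mask, then keep the low 16 bits of the product,
	// which acts as a left shift by 4 and by 8 bits.
	const uint16_t t3_lo = (uint16_t) ((lo & 0x03F0) * 0x0010);
	const uint16_t t3_hi = (uint16_t) ((hi & 0x003F) * 0x0100);

	return ((uint32_t) (t1_hi | t3_hi) << 16) | (uint32_t) (t1_lo | t3_lo);
}
```

For the bytes of "Man" this yields 0x2E051613, that is the values 19, 22, 5 and 46 when read from the lowest byte upwards.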


@@ -0,0 +1,33 @@
static inline __m128i
enc_translate (const __m128i in)
{
// A lookup table containing the absolute offsets for all ranges:
const __m128i lut = _mm_setr_epi8(
65, 71, -4, -4,
-4, -4, -4, -4,
-4, -4, -4, -4,
-19, -16, 0, 0
);
// Translate values 0..63 to the Base64 alphabet. There are five sets:
// # From To Abs Index Characters
// 0 [0..25] [65..90] +65 0 ABCDEFGHIJKLMNOPQRSTUVWXYZ
// 1 [26..51] [97..122] +71 1 abcdefghijklmnopqrstuvwxyz
// 2 [52..61] [48..57] -4 [2..11] 0123456789
// 3 [62] [43] -19 12 +
// 4 [63] [47] -16 13 /
// Create LUT indices from the input. The index for range #0 is right,
// others are 1 less than expected:
__m128i indices = _mm_subs_epu8(in, _mm_set1_epi8(51));
// mask is 0xFF (-1) for range #[1..4] and 0x00 for range #0:
__m128i mask = _mm_cmpgt_epi8(in, _mm_set1_epi8(25));
// Subtract -1, so add 1 to indices for range #[1..4]. All indices are
// now correct:
indices = _mm_sub_epi8(indices, mask);
// Add offsets to input values:
return _mm_add_epi8(in, _mm_shuffle_epi8(lut, indices));
}
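
The index computation in enc_translate is easier to check when written for a single value. The following scalar sketch uses the same offset table; `enc_translate_scalar` is a hypothetical name, and the saturating subtraction and the compare mask are spelled out as ordinary conditionals.

```c
#include <stdint.h>

// Translate one 6-bit value (0..63) to its Base64 character.
static uint8_t
enc_translate_scalar (uint8_t v)
{
	static const int8_t lut[16] = {
		65, 71, -4, -4,
		-4, -4, -4, -4,
		-4, -4, -4, -4,
		-19, -16, 0, 0 };

	// Saturating subtract, as done by _mm_subs_epu8(in, 51).
	uint8_t idx = (v > 51) ? (uint8_t) (v - 51) : 0;

	// Subtracting the 0xFF compare mask adds one for every v > 25.
	if (v > 25) {
		idx++;
	}

	return (uint8_t) (v + lut[idx]);
}
```

Values 0..25 keep index 0, values 26..51 end up at index 1, the digits spread over indices 2..11, and 62 and 63 land on indices 12 and 13.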

3rdparty/base64/lib/codec_choose.c vendored Normal file

@@ -0,0 +1,305 @@
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include "../include/libbase64.h"
#include "codecs.h"
#include "config.h"
#include "env.h"
#if (__x86_64__ || __i386__ || _M_X86 || _M_X64)
#define BASE64_X86
#if (HAVE_SSSE3 || HAVE_SSE41 || HAVE_SSE42 || HAVE_AVX || HAVE_AVX2 || HAVE_AVX512)
#define BASE64_X86_SIMD
#endif
#endif
#ifdef BASE64_X86
#ifdef _MSC_VER
#include <intrin.h>
#define __cpuid_count(__level, __count, __eax, __ebx, __ecx, __edx) \
{ \
int info[4]; \
__cpuidex(info, __level, __count); \
__eax = info[0]; \
__ebx = info[1]; \
__ecx = info[2]; \
__edx = info[3]; \
}
#define __cpuid(__level, __eax, __ebx, __ecx, __edx) \
__cpuid_count(__level, 0, __eax, __ebx, __ecx, __edx)
#else
#include <cpuid.h>
#if HAVE_AVX512 || HAVE_AVX2 || HAVE_AVX
#if ((__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__ >= 2) || (__clang_major__ >= 3))
static inline uint64_t _xgetbv (uint32_t index)
{
uint32_t eax, edx;
__asm__ __volatile__("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index));
return ((uint64_t)edx << 32) | eax;
}
#else
#error "Platform not supported"
#endif
#endif
#endif
#ifndef bit_AVX512vl
#define bit_AVX512vl (1 << 31)
#endif
#ifndef bit_AVX512vbmi
#define bit_AVX512vbmi (1 << 1)
#endif
#ifndef bit_AVX2
#define bit_AVX2 (1 << 5)
#endif
#ifndef bit_SSSE3
#define bit_SSSE3 (1 << 9)
#endif
#ifndef bit_SSE41
#define bit_SSE41 (1 << 19)
#endif
#ifndef bit_SSE42
#define bit_SSE42 (1 << 20)
#endif
#ifndef bit_AVX
#define bit_AVX (1 << 28)
#endif
#define bit_XSAVE_XRSTORE (1 << 27)
#ifndef _XCR_XFEATURE_ENABLED_MASK
#define _XCR_XFEATURE_ENABLED_MASK 0
#endif
#define _XCR_XMM_AND_YMM_STATE_ENABLED_BY_OS 0x6
#endif
// Function declarations:
#define BASE64_CODEC_FUNCS(arch) \
BASE64_ENC_FUNCTION(arch); \
BASE64_DEC_FUNCTION(arch);
BASE64_CODEC_FUNCS(avx512)
BASE64_CODEC_FUNCS(avx2)
BASE64_CODEC_FUNCS(neon32)
BASE64_CODEC_FUNCS(neon64)
BASE64_CODEC_FUNCS(plain)
BASE64_CODEC_FUNCS(ssse3)
BASE64_CODEC_FUNCS(sse41)
BASE64_CODEC_FUNCS(sse42)
BASE64_CODEC_FUNCS(avx)
static bool
codec_choose_forced (struct codec *codec, int flags)
{
// If the user wants to use a certain codec,
// always allow it, even if the codec is a no-op.
// For testing purposes.
if (!(flags & 0xFFFF)) {
return false;
}
if (flags & BASE64_FORCE_AVX2) {
codec->enc = base64_stream_encode_avx2;
codec->dec = base64_stream_decode_avx2;
return true;
}
if (flags & BASE64_FORCE_NEON32) {
codec->enc = base64_stream_encode_neon32;
codec->dec = base64_stream_decode_neon32;
return true;
}
if (flags & BASE64_FORCE_NEON64) {
codec->enc = base64_stream_encode_neon64;
codec->dec = base64_stream_decode_neon64;
return true;
}
if (flags & BASE64_FORCE_PLAIN) {
codec->enc = base64_stream_encode_plain;
codec->dec = base64_stream_decode_plain;
return true;
}
if (flags & BASE64_FORCE_SSSE3) {
codec->enc = base64_stream_encode_ssse3;
codec->dec = base64_stream_decode_ssse3;
return true;
}
if (flags & BASE64_FORCE_SSE41) {
codec->enc = base64_stream_encode_sse41;
codec->dec = base64_stream_decode_sse41;
return true;
}
if (flags & BASE64_FORCE_SSE42) {
codec->enc = base64_stream_encode_sse42;
codec->dec = base64_stream_decode_sse42;
return true;
}
if (flags & BASE64_FORCE_AVX) {
codec->enc = base64_stream_encode_avx;
codec->dec = base64_stream_decode_avx;
return true;
}
if (flags & BASE64_FORCE_AVX512) {
codec->enc = base64_stream_encode_avx512;
codec->dec = base64_stream_decode_avx512;
return true;
}
return false;
}
static bool
codec_choose_arm (struct codec *codec)
{
#if (defined(__ARM_NEON__) || defined(__ARM_NEON)) && ((defined(__aarch64__) && HAVE_NEON64) || HAVE_NEON32)
// Unfortunately there is no portable way to check for NEON
// support at runtime from userland in the same way that x86
// has cpuid, so just stick to the compile-time configuration:
#if defined(__aarch64__) && HAVE_NEON64
codec->enc = base64_stream_encode_neon64;
codec->dec = base64_stream_decode_neon64;
#else
codec->enc = base64_stream_encode_neon32;
codec->dec = base64_stream_decode_neon32;
#endif
return true;
#else
(void)codec;
return false;
#endif
}
static bool
codec_choose_x86 (struct codec *codec)
{
#ifdef BASE64_X86_SIMD
unsigned int eax, ebx = 0, ecx = 0, edx;
unsigned int max_level;
#ifdef _MSC_VER
int info[4];
__cpuidex(info, 0, 0);
max_level = info[0];
#else
max_level = __get_cpuid_max(0, NULL);
#endif
#if HAVE_AVX512 || HAVE_AVX2 || HAVE_AVX
// Check for AVX/AVX2/AVX512 support:
// Checking for AVX requires 3 things:
// 1) CPUID indicates that the OS uses XSAVE and XRSTORE instructions
// (allowing saving YMM registers on context switch)
// 2) CPUID indicates support for AVX
// 3) XGETBV indicates the AVX registers will be saved and restored on
// context switch
//
// Note that XGETBV is only available on 686 or later CPUs, so the
// instruction needs to be conditionally run.
if (max_level >= 1) {
__cpuid_count(1, 0, eax, ebx, ecx, edx);
if (ecx & bit_XSAVE_XRSTORE) {
uint64_t xcr_mask;
xcr_mask = _xgetbv(_XCR_XFEATURE_ENABLED_MASK);
if ((xcr_mask & _XCR_XMM_AND_YMM_STATE_ENABLED_BY_OS) == _XCR_XMM_AND_YMM_STATE_ENABLED_BY_OS) { // check multiple bits at once
#if HAVE_AVX512
if (max_level >= 7) {
__cpuid_count(7, 0, eax, ebx, ecx, edx);
if ((ebx & bit_AVX512vl) && (ecx & bit_AVX512vbmi)) {
codec->enc = base64_stream_encode_avx512;
codec->dec = base64_stream_decode_avx512;
return true;
}
}
#endif
#if HAVE_AVX2
if (max_level >= 7) {
__cpuid_count(7, 0, eax, ebx, ecx, edx);
if (ebx & bit_AVX2) {
codec->enc = base64_stream_encode_avx2;
codec->dec = base64_stream_decode_avx2;
return true;
}
}
#endif
#if HAVE_AVX
__cpuid_count(1, 0, eax, ebx, ecx, edx);
if (ecx & bit_AVX) {
codec->enc = base64_stream_encode_avx;
codec->dec = base64_stream_decode_avx;
return true;
}
#endif
}
}
}
#endif
#if HAVE_SSE42
// Check for SSE42 support:
if (max_level >= 1) {
__cpuid(1, eax, ebx, ecx, edx);
if (ecx & bit_SSE42) {
codec->enc = base64_stream_encode_sse42;
codec->dec = base64_stream_decode_sse42;
return true;
}
}
#endif
#if HAVE_SSE41
// Check for SSE41 support:
if (max_level >= 1) {
__cpuid(1, eax, ebx, ecx, edx);
if (ecx & bit_SSE41) {
codec->enc = base64_stream_encode_sse41;
codec->dec = base64_stream_decode_sse41;
return true;
}
}
#endif
#if HAVE_SSSE3
// Check for SSSE3 support:
if (max_level >= 1) {
__cpuid(1, eax, ebx, ecx, edx);
if (ecx & bit_SSSE3) {
codec->enc = base64_stream_encode_ssse3;
codec->dec = base64_stream_decode_ssse3;
return true;
}
}
#endif
#else
(void)codec;
#endif
return false;
}
void
codec_choose (struct codec *codec, int flags)
{
// User forced a codec:
if (codec_choose_forced(codec, flags)) {
return;
}
// Runtime feature detection:
if (codec_choose_arm(codec)) {
return;
}
if (codec_choose_x86(codec)) {
return;
}
codec->enc = base64_stream_encode_plain;
codec->dec = base64_stream_decode_plain;
}
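The selection order in `codec_choose` (forced codec, then runtime detection, then the plain fallback) can be sketched in isolation. The following is a minimal illustration and not the library's code: `demo_codec`, `demo_plain`, `demo_wide`, `demo_choose_x86` and `demo_choose` are hypothetical names, and the sketch assumes a GCC or Clang toolchain on x86 for the `__builtin_cpu_supports` path. On any other compiler or architecture the detection step simply reports "not available".

```c
#include <stddef.h>

/* Hypothetical worker signature standing in for the enc/dec pointers. */
typedef size_t (*demo_fn)(const char *src, size_t len);

/* Two stand-in implementations; both just report the input length. */
static size_t demo_plain(const char *src, size_t len) { (void)src; return len; }
static size_t demo_wide(const char *src, size_t len)  { (void)src; return len; }

struct demo_codec {
    demo_fn fn;        /* selected implementation */
    const char *name;  /* label, for illustration only */
};

/* Returns 1 and fills in the codec if a SIMD variant can be used. */
static int demo_choose_x86(struct demo_codec *c)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        c->fn = demo_wide;
        c->name = "avx2";
        return 1;
    }
#else
    (void)c;
#endif
    return 0;
}

/* Try the detected variant first, then fall back to the plain codec. */
static void demo_choose(struct demo_codec *c)
{
    if (demo_choose_x86(c)) {
        return;
    }
    c->fn = demo_plain;
    c->name = "plain";
}
```

Because the fallback is unconditional, a caller always ends up with a usable function pointer, whichever branch was taken.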

65
3rdparty/base64/lib/codecs.h vendored Normal file

@@ -0,0 +1,65 @@
#include <stdint.h>
#include <stddef.h>
#include "../include/libbase64.h"
#include "config.h"
// Function parameters for encoding functions:
#define BASE64_ENC_PARAMS \
( struct base64_state *state \
, const char *src \
, size_t srclen \
, char *out \
, size_t *outlen \
)
// Function parameters for decoding functions:
#define BASE64_DEC_PARAMS \
( struct base64_state *state \
, const char *src \
, size_t srclen \
, char *out \
, size_t *outlen \
)
// Function signature for encoding functions:
#define BASE64_ENC_FUNCTION(arch) \
void \
base64_stream_encode_ ## arch \
BASE64_ENC_PARAMS
// Function signature for decoding functions:
#define BASE64_DEC_FUNCTION(arch) \
int \
base64_stream_decode_ ## arch \
BASE64_DEC_PARAMS
// Cast away unused variable, silence compiler:
#define UNUSED(x) ((void)(x))
// Stub function when encoder arch unsupported:
#define BASE64_ENC_STUB \
UNUSED(state); \
UNUSED(src); \
UNUSED(srclen); \
UNUSED(out); \
\
*outlen = 0;
// Stub function when decoder arch unsupported:
#define BASE64_DEC_STUB \
UNUSED(state); \
UNUSED(src); \
UNUSED(srclen); \
UNUSED(out); \
UNUSED(outlen); \
\
return -1;
struct codec
{
void (* enc) BASE64_ENC_PARAMS;
int (* dec) BASE64_DEC_PARAMS;
};
extern void codec_choose (struct codec *, int flags);
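The header relies on two preprocessor techniques: a parameter list kept in a macro so that every implementation and the function-pointer struct share one signature, and `##` token pasting to derive per-architecture function names. A small sketch of the same pattern follows. All `DEMO_`/`demo_` names are hypothetical and the parameter list is shortened for illustration.

```c
#include <stddef.h>

/* Shared parameter list, reused by every definition and by the struct. */
#define DEMO_PARAMS (const char *src, size_t srclen, size_t *outlen)

/* Expands to: void demo_encode_<arch> (const char *src, ...) */
#define DEMO_FUNCTION(arch) void demo_encode_ ## arch DEMO_PARAMS

/* Cast away an unused parameter to silence compiler warnings. */
#define DEMO_UNUSED(x) ((void)(x))

/* A "real" variant: reports the padded Base64 length of the input. */
static DEMO_FUNCTION(plain)
{
    DEMO_UNUSED(src);
    *outlen = (srclen + 2) / 3 * 4;
}

/* A stub variant for an unsupported architecture: produces no output. */
static DEMO_FUNCTION(stub)
{
    DEMO_UNUSED(src);
    DEMO_UNUSED(srclen);
    *outlen = 0;
}

/* The function pointer reuses the same parameter-list macro. */
struct demo_table {
    void (*enc) DEMO_PARAMS;
};
```

Keeping the parameter list in one macro means a signature change only has to be made in one place, at the cost of slightly less readable declarations.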

74
3rdparty/base64/lib/env.h vendored Normal file

@@ -0,0 +1,74 @@
#ifndef BASE64_ENV_H
#define BASE64_ENV_H
// This header file contains macro definitions that describe certain aspects of
// the compile-time environment. Compatibility and portability macros go here.
// Define machine endianness. This is for GCC:
#if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
# define BASE64_LITTLE_ENDIAN 1
#else
# define BASE64_LITTLE_ENDIAN 0
#endif
// This is for Clang:
#ifdef __LITTLE_ENDIAN__
# define BASE64_LITTLE_ENDIAN 1
#endif
#ifdef __BIG_ENDIAN__
# define BASE64_LITTLE_ENDIAN 0
#endif
// MSVC++ needs intrin.h for _byteswap_uint64 (issue #68):
#if BASE64_LITTLE_ENDIAN && defined(_MSC_VER)
# include <intrin.h>
#endif
// Endian conversion functions:
#if BASE64_LITTLE_ENDIAN
# ifdef _MSC_VER
// Microsoft Visual C++:
# define BASE64_HTOBE32(x) _byteswap_ulong(x)
# define BASE64_HTOBE64(x) _byteswap_uint64(x)
# else
// GCC and Clang:
# define BASE64_HTOBE32(x) __builtin_bswap32(x)
# define BASE64_HTOBE64(x) __builtin_bswap64(x)
# endif
#else
// No conversion needed:
# define BASE64_HTOBE32(x) (x)
# define BASE64_HTOBE64(x) (x)
#endif
// Detect word size:
#if defined (__x86_64__)
// This also works for the x32 ABI, which has a 64-bit word size.
# define BASE64_WORDSIZE 64
#elif defined (_INTEGRAL_MAX_BITS)
# define BASE64_WORDSIZE _INTEGRAL_MAX_BITS
#elif defined (__WORDSIZE)
# define BASE64_WORDSIZE __WORDSIZE
#elif defined (__SIZE_WIDTH__)
# define BASE64_WORDSIZE __SIZE_WIDTH__
#else
# error BASE64_WORDSIZE_NOT_DEFINED
#endif
// End-of-file definitions.
// Almost end-of-file when waiting for the last '=' character:
#define BASE64_AEOF 1
// End-of-file when stream end has been reached or invalid input provided:
#define BASE64_EOF 2
// GCC 7 defaults to issuing a warning for fallthrough in switch statements,
// unless the fallthrough cases are marked with an attribute. As we use
// fallthrough deliberately, define an alias for the attribute:
#if __GNUC__ >= 7
# define BASE64_FALLTHROUGH __attribute__((fallthrough));
#else
# define BASE64_FALLTHROUGH
#endif
#endif // BASE64_ENV_H
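The `BASE64_HTOBE32`/`BASE64_HTOBE64` macros resolve the byte order at compile time. As a rough illustration of what "host to big-endian" means, the sketch below does the same conversion with a runtime probe and plain shifts instead of compiler builtins. The helper names (`swap32`, `is_little_endian`, `host_to_be32`) are assumptions made for this example and are not part of the library.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the four bytes of a 32-bit value using shifts and masks. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}

/* Runtime probe: on a little-endian host the low byte is stored first. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Convert a host-order value so its bytes sit in memory most significant first. */
static uint32_t host_to_be32(uint32_t x)
{
    return is_little_endian() ? swap32(x) : x;
}
```

After conversion, copying the value to a byte buffer yields the most significant byte first on any host, which is the property the encoder needs when it loads several input bytes at once.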

7
3rdparty/base64/lib/exports.txt vendored Normal file

@@ -0,0 +1,7 @@
base64_encode
base64_stream_encode
base64_stream_encode_init
base64_stream_encode_final
base64_decode
base64_stream_decode
base64_stream_decode_init

164
3rdparty/base64/lib/lib.c vendored Normal file

@@ -0,0 +1,164 @@
#include <stdint.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif
#include "../include/libbase64.h"
#include "tables/tables.h"
#include "codecs.h"
#include "env.h"
// These static function pointers are initialized once when the library is
// first used, and remain in use for the remaining lifetime of the program.
// The idea being that CPU features don't change at runtime.
static struct codec codec = { NULL, NULL };
void
base64_stream_encode_init (struct base64_state *state, int flags)
{
// If any of the codec flags are set, redo choice:
if (codec.enc == NULL || flags & 0xFF) {
codec_choose(&codec, flags);
}
state->eof = 0;
state->bytes = 0;
state->carry = 0;
state->flags = flags;
}
void
base64_stream_encode
( struct base64_state *state
, const char *src
, size_t srclen
, char *out
, size_t *outlen
)
{
codec.enc(state, src, srclen, out, outlen);
}
void
base64_stream_encode_final
( struct base64_state *state
, char *out
, size_t *outlen
)
{
uint8_t *o = (uint8_t *)out;
if (state->bytes == 1) {
*o++ = base64_table_enc_6bit[state->carry];
*o++ = '=';
*o++ = '=';
*outlen = 3;
return;
}
if (state->bytes == 2) {
*o++ = base64_table_enc_6bit[state->carry];
*o++ = '=';
*outlen = 2;
return;
}
*outlen = 0;
}
void
base64_stream_decode_init (struct base64_state *state, int flags)
{
// If any of the codec flags are set, redo choice:
if (codec.dec == NULL || flags & 0xFFFF) {
codec_choose(&codec, flags);
}
state->eof = 0;
state->bytes = 0;
state->carry = 0;
state->flags = flags;
}
int
base64_stream_decode
( struct base64_state *state
, const char *src
, size_t srclen
, char *out
, size_t *outlen
)
{
return codec.dec(state, src, srclen, out, outlen);
}
#ifdef _OPENMP
// Due to the overhead of initializing OpenMP and creating a team of
// threads, we require the data length to be larger than a threshold:
#define OMP_THRESHOLD 20000
// Conditionally include OpenMP-accelerated codec implementations:
#include "lib_openmp.c"
#endif
void
base64_encode
( const char *src
, size_t srclen
, char *out
, size_t *outlen
, int flags
)
{
size_t s;
size_t t;
struct base64_state state;
#ifdef _OPENMP
if (srclen >= OMP_THRESHOLD) {
base64_encode_openmp(src, srclen, out, outlen, flags);
return;
}
#endif
// Init the stream reader:
base64_stream_encode_init(&state, flags);
// Feed the whole string to the stream reader:
base64_stream_encode(&state, src, srclen, out, &s);
// Finalize the stream by writing trailer if any:
base64_stream_encode_final(&state, out + s, &t);
// Final output length is stream length plus tail:
*outlen = s + t;
}
int
base64_decode
( const char *src
, size_t srclen
, char *out
, size_t *outlen
, int flags
)
{
int ret;
struct base64_state state;
#ifdef _OPENMP
if (srclen >= OMP_THRESHOLD) {
return base64_decode_openmp(src, srclen, out, outlen, flags);
}
#endif
// Init the stream reader:
base64_stream_decode_init(&state, flags);
// Feed the whole string to the stream reader:
ret = base64_stream_decode(&state, src, srclen, out, outlen);
// If we are still waiting for input after decoding a whole block, fail:
if (ret && (state.bytes == 0)) {
return ret;
}
return 0;
}
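The init / encode / final split above depends on the stream state carrying leftover bits between calls, with the final step emitting one pending character plus the `=` padding. The following self-contained sketch shows that idea with a byte-at-a-time encoder. It is an assumption-laden illustration, not the library's plain codec: the `sketch_` names and the state layout are hypothetical, chosen to mirror the `bytes` and `carry` fields used above.

```c
#include <stddef.h>

static const char sketch_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical stream state. */
struct sketch_state {
    int bytes;            /* bytes consumed so far in the current 3-byte group */
    unsigned char carry;  /* leftover bits, pre-shifted into table-index position */
};

static void sketch_init(struct sketch_state *st)
{
    st->bytes = 0;
    st->carry = 0;
}

/* Encode one chunk; may be called repeatedly on consecutive chunks.
 * Returns the number of characters written to out. */
static size_t sketch_encode(struct sketch_state *st, const char *src,
                            size_t srclen, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < srclen; i++) {
        unsigned char c = (unsigned char)src[i];
        switch (st->bytes) {
        case 0:  /* first byte: emit top 6 bits, keep the low 2 */
            out[n++] = sketch_table[c >> 2];
            st->carry = (unsigned char)((c << 4) & 0x30);
            st->bytes = 1;
            break;
        case 1:  /* second byte: emit 2 carried + top 4 bits, keep the low 4 */
            out[n++] = sketch_table[st->carry | (c >> 4)];
            st->carry = (unsigned char)((c << 2) & 0x3C);
            st->bytes = 2;
            break;
        default: /* third byte: emit both remaining characters */
            out[n++] = sketch_table[st->carry | (c >> 6)];
            out[n++] = sketch_table[c & 0x3F];
            st->bytes = 0;
            break;
        }
    }
    return n;
}

/* Flush the pending character, if any, and add '=' padding. */
static size_t sketch_final(const struct sketch_state *st, char *out)
{
    size_t n = 0;
    if (st->bytes == 0) {
        return 0;
    }
    out[n++] = sketch_table[st->carry];
    out[n++] = '=';
    if (st->bytes == 1) {
        out[n++] = '=';
    }
    return n;
}
```

Feeding `"fo"` and then `"obar"` through `sketch_encode` and finishing with `sketch_final` produces the same output as encoding `"foobar"` in one call, which is the property that lets the one-shot functions be built on top of the streaming ones.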

149
3rdparty/base64/lib/lib_openmp.c vendored Normal file

@@ -0,0 +1,149 @@
// This code makes some assumptions on the implementation of
// base64_stream_encode_init(), base64_stream_encode() and base64_stream_decode().
// These assumptions boil down to this: when src is broken into parts, the
// corresponding out parts can be written without side effects.
// This holds when:
// 1) base64_stream_encode() and base64_stream_decode() don't use globals;
// 2) the shared variables src and out are not read or written outside of the
// bounds of their parts, i.e. when base64_stream_encode() reads a multiple
// of 3 bytes, it must write no more than the corresponding multiple of 4
// bytes, not even temporarily;
// 3) the state flag can be discarded after base64_stream_encode() and
// base64_stream_decode() on the parts.
static inline void
base64_encode_openmp
( const char *src
, size_t srclen
, char *out
, size_t *outlen
, int flags
)
{
size_t s;
size_t t;
size_t sum = 0, len, last_len;
struct base64_state state, initial_state;
int num_threads, i;
// Request a team of threads; the runtime may provide fewer than requested:
#pragma omp parallel
{
// Get the number of threads used from one thread only,
// as num_threads is a shared var:
#pragma omp single
{
num_threads = omp_get_num_threads();
// Split the input string into num_threads parts, each
// part a multiple of 3 bytes. The remaining bytes will
// be done later:
len = srclen / (num_threads * 3);
len *= 3;
last_len = srclen - num_threads * len;
// Init the stream reader:
base64_stream_encode_init(&state, flags);
initial_state = state;
}
// Single has an implicit barrier for all threads to wait here
// for the above to complete:
#pragma omp for firstprivate(state) private(s) reduction(+:sum) schedule(static,1)
for (i = 0; i < num_threads; i++)
{
// Feed each part of the string to the stream reader:
base64_stream_encode(&state, src + i * len, len, out + i * len * 4 / 3, &s);
sum += s;
}
}
// As encoding should never fail and we encode an exact multiple
// of 3 bytes, we can discard state:
state = initial_state;
// Encode the remaining bytes:
base64_stream_encode(&state, src + num_threads * len, last_len, out + num_threads * len * 4 / 3, &s);
// Finalize the stream by writing trailer if any:
base64_stream_encode_final(&state, out + num_threads * len * 4 / 3 + s, &t);
// Final output length is stream length plus tail:
sum += s + t;
*outlen = sum;
}
static inline int
base64_decode_openmp
( const char *src
, size_t srclen
, char *out
, size_t *outlen
, int flags
)
{
int num_threads, result = 0, i;
size_t sum = 0, len, last_len, s;
struct base64_state state, initial_state;
// Request a team of threads; the runtime may provide fewer than requested:
#pragma omp parallel
{
// Get the number of threads used from one thread only,
// as num_threads is a shared var:
#pragma omp single
{
num_threads = omp_get_num_threads();
// Split the input string into num_threads parts, each
// part a multiple of 4 bytes. The remaining bytes will
// be done later:
len = srclen / (num_threads * 4);
len *= 4;
last_len = srclen - num_threads * len;
// Init the stream reader:
base64_stream_decode_init(&state, flags);
initial_state = state;
}
// Single has an implicit barrier to wait here for the above to
// complete:
#pragma omp for firstprivate(state) private(s) reduction(+:sum, result) schedule(static,1)
for (i = 0; i < num_threads; i++)
{
int this_result;
// Feed each part of the string to the stream reader:
this_result = base64_stream_decode(&state, src + i * len, len, out + i * len * 3 / 4, &s);
sum += s;
result += this_result;
}
}
// If `result' equals `-num_threads', then all threads returned -1,
// indicating that the requested codec is not available:
if (result == -num_threads) {
return -1;
}
// If `result' does not equal `num_threads', then at least one of the
// threads hit a decode error:
if (result != num_threads) {
return 0;
}
// So far so good, now decode whatever remains in the buffer. Reuse the
// initial state, since we are at a 4-byte boundary:
state = initial_state;
result = base64_stream_decode(&state, src + num_threads * len, last_len, out + num_threads * len * 3 / 4, &s);
sum += s;
*outlen = sum;
// If we are still waiting for input after decoding a whole block, fail:
if (result && (state.bytes == 0)) {
return result;
}
return 0;
}
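The chunking arithmetic used by the OpenMP encoder can be checked on its own, without OpenMP. The sketch below reproduces the split (each part a multiple of 3 input bytes, with the remainder handled afterwards) and the matching output offsets. `demo_split`, `demo_split_encode` and `demo_out_offset` are hypothetical helpers introduced only for this illustration.

```c
#include <stddef.h>

/* Result of splitting an input buffer across worker threads. */
struct demo_split {
    size_t len;       /* bytes per thread, always a multiple of 3 */
    size_t last_len;  /* leftover bytes encoded after the parallel part */
};

/* Split srclen bytes into `parts` equal chunks, each a multiple of 3. */
static struct demo_split demo_split_encode(size_t srclen, size_t parts)
{
    struct demo_split s;
    s.len = srclen / (parts * 3) * 3;
    s.last_len = srclen - parts * s.len;
    return s;
}

/* Output offset of chunk i: every 3 input bytes become 4 output characters. */
static size_t demo_out_offset(const struct demo_split *s, size_t i)
{
    return i * s->len / 3 * 4;
}
```

For example, 20000 input bytes over 4 threads gives 4998 bytes per thread and 8 leftover bytes, and the second thread writes at output offset 6664. Because every chunk is a whole number of 3-byte groups, no chunk produces padding and the threads never write into each other's output range.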

1
3rdparty/base64/lib/tables/.gitignore vendored Normal file

@@ -0,0 +1 @@
table_generator

17
3rdparty/base64/lib/tables/Makefile vendored Normal file

@@ -0,0 +1,17 @@
.PHONY: all clean

TARGETS := table_dec_32bit.h table_enc_12bit.h table_generator

all: $(TARGETS)

clean:
	$(RM) $(TARGETS)

table_dec_32bit.h: table_generator
	./$^ > $@

table_enc_12bit.h: table_enc_12bit.py
	./$^ > $@

table_generator: table_generator.c
	$(CC) $(CFLAGS) -o $@ $^


@@ -0,0 +1,393 @@
#include <stdint.h>
#define CHAR62 '+'
#define CHAR63 '/'
#define CHARPAD '='
#if BASE64_LITTLE_ENDIAN
/* SPECIAL DECODE TABLES FOR LITTLE ENDIAN (INTEL) CPUS */
const uint32_t base64_table_dec_32bit_d0[256] = {
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x000000f8, 0xffffffff, 0xffffffff, 0xffffffff, 0x000000fc,
0x000000d0, 0x000000d4, 0x000000d8, 0x000000dc, 0x000000e0, 0x000000e4,
0x000000e8, 0x000000ec, 0x000000f0, 0x000000f4, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000,
0x00000004, 0x00000008, 0x0000000c, 0x00000010, 0x00000014, 0x00000018,
0x0000001c, 0x00000020, 0x00000024, 0x00000028, 0x0000002c, 0x00000030,
0x00000034, 0x00000038, 0x0000003c, 0x00000040, 0x00000044, 0x00000048,
0x0000004c, 0x00000050, 0x00000054, 0x00000058, 0x0000005c, 0x00000060,
0x00000064, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x00000068, 0x0000006c, 0x00000070, 0x00000074, 0x00000078,
0x0000007c, 0x00000080, 0x00000084, 0x00000088, 0x0000008c, 0x00000090,
0x00000094, 0x00000098, 0x0000009c, 0x000000a0, 0x000000a4, 0x000000a8,
0x000000ac, 0x000000b0, 0x000000b4, 0x000000b8, 0x000000bc, 0x000000c0,
0x000000c4, 0x000000c8, 0x000000cc, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
};
const uint32_t base64_table_dec_32bit_d1[256] = {
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x0000e003, 0xffffffff, 0xffffffff, 0xffffffff, 0x0000f003,
0x00004003, 0x00005003, 0x00006003, 0x00007003, 0x00008003, 0x00009003,
0x0000a003, 0x0000b003, 0x0000c003, 0x0000d003, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000,
0x00001000, 0x00002000, 0x00003000, 0x00004000, 0x00005000, 0x00006000,
0x00007000, 0x00008000, 0x00009000, 0x0000a000, 0x0000b000, 0x0000c000,
0x0000d000, 0x0000e000, 0x0000f000, 0x00000001, 0x00001001, 0x00002001,
0x00003001, 0x00004001, 0x00005001, 0x00006001, 0x00007001, 0x00008001,
0x00009001, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x0000a001, 0x0000b001, 0x0000c001, 0x0000d001, 0x0000e001,
0x0000f001, 0x00000002, 0x00001002, 0x00002002, 0x00003002, 0x00004002,
0x00005002, 0x00006002, 0x00007002, 0x00008002, 0x00009002, 0x0000a002,
0x0000b002, 0x0000c002, 0x0000d002, 0x0000e002, 0x0000f002, 0x00000003,
0x00001003, 0x00002003, 0x00003003, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
};
const uint32_t base64_table_dec_32bit_d2[256] = {
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x00800f00, 0xffffffff, 0xffffffff, 0xffffffff, 0x00c00f00,
0x00000d00, 0x00400d00, 0x00800d00, 0x00c00d00, 0x00000e00, 0x00400e00,
0x00800e00, 0x00c00e00, 0x00000f00, 0x00400f00, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000,
0x00400000, 0x00800000, 0x00c00000, 0x00000100, 0x00400100, 0x00800100,
0x00c00100, 0x00000200, 0x00400200, 0x00800200, 0x00c00200, 0x00000300,
0x00400300, 0x00800300, 0x00c00300, 0x00000400, 0x00400400, 0x00800400,
0x00c00400, 0x00000500, 0x00400500, 0x00800500, 0x00c00500, 0x00000600,
0x00400600, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x00800600, 0x00c00600, 0x00000700, 0x00400700, 0x00800700,
0x00c00700, 0x00000800, 0x00400800, 0x00800800, 0x00c00800, 0x00000900,
0x00400900, 0x00800900, 0x00c00900, 0x00000a00, 0x00400a00, 0x00800a00,
0x00c00a00, 0x00000b00, 0x00400b00, 0x00800b00, 0x00c00b00, 0x00000c00,
0x00400c00, 0x00800c00, 0x00c00c00, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
};
const uint32_t base64_table_dec_32bit_d3[256] = {
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x003e0000, 0xffffffff, 0xffffffff, 0xffffffff, 0x003f0000,
0x00340000, 0x00350000, 0x00360000, 0x00370000, 0x00380000, 0x00390000,
0x003a0000, 0x003b0000, 0x003c0000, 0x003d0000, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000,
0x00010000, 0x00020000, 0x00030000, 0x00040000, 0x00050000, 0x00060000,
0x00070000, 0x00080000, 0x00090000, 0x000a0000, 0x000b0000, 0x000c0000,
0x000d0000, 0x000e0000, 0x000f0000, 0x00100000, 0x00110000, 0x00120000,
0x00130000, 0x00140000, 0x00150000, 0x00160000, 0x00170000, 0x00180000,
0x00190000, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x001a0000, 0x001b0000, 0x001c0000, 0x001d0000, 0x001e0000,
0x001f0000, 0x00200000, 0x00210000, 0x00220000, 0x00230000, 0x00240000,
0x00250000, 0x00260000, 0x00270000, 0x00280000, 0x00290000, 0x002a0000,
0x002b0000, 0x002c0000, 0x002d0000, 0x002e0000, 0x002f0000, 0x00300000,
0x00310000, 0x00320000, 0x00330000, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
};
#else
/* SPECIAL DECODE TABLES FOR BIG ENDIAN (IBM/MOTOROLA/SUN) CPUS */
const uint32_t base64_table_dec_32bit_d0[256] = {
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xf8000000, 0xffffffff, 0xffffffff, 0xffffffff, 0xfc000000,
0xd0000000, 0xd4000000, 0xd8000000, 0xdc000000, 0xe0000000, 0xe4000000,
0xe8000000, 0xec000000, 0xf0000000, 0xf4000000, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000,
0x04000000, 0x08000000, 0x0c000000, 0x10000000, 0x14000000, 0x18000000,
0x1c000000, 0x20000000, 0x24000000, 0x28000000, 0x2c000000, 0x30000000,
0x34000000, 0x38000000, 0x3c000000, 0x40000000, 0x44000000, 0x48000000,
0x4c000000, 0x50000000, 0x54000000, 0x58000000, 0x5c000000, 0x60000000,
0x64000000, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x68000000, 0x6c000000, 0x70000000, 0x74000000, 0x78000000,
0x7c000000, 0x80000000, 0x84000000, 0x88000000, 0x8c000000, 0x90000000,
0x94000000, 0x98000000, 0x9c000000, 0xa0000000, 0xa4000000, 0xa8000000,
0xac000000, 0xb0000000, 0xb4000000, 0xb8000000, 0xbc000000, 0xc0000000,
0xc4000000, 0xc8000000, 0xcc000000, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
};
const uint32_t base64_table_dec_32bit_d1[256] = {
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x03e00000, 0xffffffff, 0xffffffff, 0xffffffff, 0x03f00000,
0x03400000, 0x03500000, 0x03600000, 0x03700000, 0x03800000, 0x03900000,
0x03a00000, 0x03b00000, 0x03c00000, 0x03d00000, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000,
0x00100000, 0x00200000, 0x00300000, 0x00400000, 0x00500000, 0x00600000,
0x00700000, 0x00800000, 0x00900000, 0x00a00000, 0x00b00000, 0x00c00000,
0x00d00000, 0x00e00000, 0x00f00000, 0x01000000, 0x01100000, 0x01200000,
0x01300000, 0x01400000, 0x01500000, 0x01600000, 0x01700000, 0x01800000,
0x01900000, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x01a00000, 0x01b00000, 0x01c00000, 0x01d00000, 0x01e00000,
0x01f00000, 0x02000000, 0x02100000, 0x02200000, 0x02300000, 0x02400000,
0x02500000, 0x02600000, 0x02700000, 0x02800000, 0x02900000, 0x02a00000,
0x02b00000, 0x02c00000, 0x02d00000, 0x02e00000, 0x02f00000, 0x03000000,
0x03100000, 0x03200000, 0x03300000, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
};
const uint32_t base64_table_dec_32bit_d2[256] = {
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x000f8000, 0xffffffff, 0xffffffff, 0xffffffff, 0x000fc000,
0x000d0000, 0x000d4000, 0x000d8000, 0x000dc000, 0x000e0000, 0x000e4000,
0x000e8000, 0x000ec000, 0x000f0000, 0x000f4000, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000,
0x00004000, 0x00008000, 0x0000c000, 0x00010000, 0x00014000, 0x00018000,
0x0001c000, 0x00020000, 0x00024000, 0x00028000, 0x0002c000, 0x00030000,
0x00034000, 0x00038000, 0x0003c000, 0x00040000, 0x00044000, 0x00048000,
0x0004c000, 0x00050000, 0x00054000, 0x00058000, 0x0005c000, 0x00060000,
0x00064000, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x00068000, 0x0006c000, 0x00070000, 0x00074000, 0x00078000,
0x0007c000, 0x00080000, 0x00084000, 0x00088000, 0x0008c000, 0x00090000,
0x00094000, 0x00098000, 0x0009c000, 0x000a0000, 0x000a4000, 0x000a8000,
0x000ac000, 0x000b0000, 0x000b4000, 0x000b8000, 0x000bc000, 0x000c0000,
0x000c4000, 0x000c8000, 0x000cc000, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
};
const uint32_t base64_table_dec_32bit_d3[256] = {
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x00003e00, 0xffffffff, 0xffffffff, 0xffffffff, 0x00003f00,
0x00003400, 0x00003500, 0x00003600, 0x00003700, 0x00003800, 0x00003900,
0x00003a00, 0x00003b00, 0x00003c00, 0x00003d00, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000,
0x00000100, 0x00000200, 0x00000300, 0x00000400, 0x00000500, 0x00000600,
0x00000700, 0x00000800, 0x00000900, 0x00000a00, 0x00000b00, 0x00000c00,
0x00000d00, 0x00000e00, 0x00000f00, 0x00001000, 0x00001100, 0x00001200,
0x00001300, 0x00001400, 0x00001500, 0x00001600, 0x00001700, 0x00001800,
0x00001900, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x00001a00, 0x00001b00, 0x00001c00, 0x00001d00, 0x00001e00,
0x00001f00, 0x00002000, 0x00002100, 0x00002200, 0x00002300, 0x00002400,
0x00002500, 0x00002600, 0x00002700, 0x00002800, 0x00002900, 0x00002a00,
0x00002b00, 0x00002c00, 0x00002d00, 0x00002e00, 0x00002f00, 0x00003000,
0x00003100, 0x00003200, 0x00003300, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
};
#endif

File diff suppressed because it is too large

3rdparty/base64/lib/tables/table_enc_12bit.py vendored Executable file

@@ -0,0 +1,45 @@
#!/usr/bin/python3

def tr(x):
    """Translate a 6-bit value to the Base64 alphabet."""
    s = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' \
      + 'abcdefghijklmnopqrstuvwxyz' \
      + '0123456789' \
      + '+/'
    return ord(s[x])

def table(fn):
    """Generate a 12-bit lookup table."""
    ret = []
    for n in range(0, 2**12):
        pre = "\n\t" if n % 8 == 0 else " "
        pre = "\t" if n == 0 else pre
        ret.append("{}0x{:04X}U,".format(pre, fn(n)))
    return "".join(ret)

def table_be():
    """Generate a 12-bit big-endian lookup table."""
    return table(lambda n: (tr(n & 0x3F) << 0) | (tr(n >> 6) << 8))

def table_le():
    """Generate a 12-bit little-endian lookup table."""
    return table(lambda n: (tr(n >> 6) << 0) | (tr(n & 0x3F) << 8))

def main():
    """Entry point."""
    lines = [
        "#include <stdint.h>",
        "",
        "const uint16_t base64_table_enc_12bit[] = {",
        "#if BASE64_LITTLE_ENDIAN",
        table_le(),
        "#else",
        table_be(),
        "#endif",
        "};"
    ]
    for line in lines:
        print(line)

if __name__ == "__main__":
    main()
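
The generator above emits a 4096-entry table in which every 12-bit input maps to two Base64 characters packed into one 16-bit value. As a rough illustration of how such a table could be consumed, the sketch below splits three input bytes into two 12-bit halves and performs one lookup per half. The names `TABLE_LE` and `encode3` are invented for this example and are not part of the vendored library; only the little-endian formula from `table_le()` is taken from the script.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def tr(x: int) -> int:
    """Translate a 6-bit value to its Base64 character code."""
    return ord(ALPHABET[x])


# Same formula as table_le() in the generator: the first output character
# sits in the low byte, the second in the high byte.
TABLE_LE = [tr(n >> 6) | (tr(n & 0x3F) << 8) for n in range(2 ** 12)]


def encode3(chunk: bytes) -> bytes:
    """Encode exactly three bytes with two 12-bit lookups (hypothetical helper)."""
    if len(chunk) != 3:
        raise ValueError("encode3 expects exactly 3 bytes")
    word = int.from_bytes(chunk, "big")   # 24 bits of input
    hi, lo = word >> 12, word & 0xFFF     # two 12-bit halves
    # Each entry holds two characters; writing it out in little-endian
    # byte order restores their original sequence.
    return TABLE_LE[hi].to_bytes(2, "little") + TABLE_LE[lo].to_bytes(2, "little")
```

Compared with a 64-entry table, this trades 8 KiB of table space for half as many lookups per output character, which is presumably why the table is only compiled in for word sizes of 32 bits and up.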


@@ -0,0 +1,184 @@
/**
*
* Copyright 2005, 2006 Nick Galbreath -- nickg [at] modp [dot] com
* Copyright 2017 Matthieu Darbois
* All rights reserved.
*
* http://modp.com/release/base64
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met:
*
* - Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
*
* - Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
* IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
* TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
* PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
* NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
*/
/****************************/
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
static uint8_t b64chars[64] = {
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/'
};
static uint8_t padchar = '=';
static void printStart(void)
{
printf("#include <stdint.h>\n");
printf("#define CHAR62 '%c'\n", b64chars[62]);
printf("#define CHAR63 '%c'\n", b64chars[63]);
printf("#define CHARPAD '%c'\n", padchar);
}
static void clearDecodeTable(uint32_t* ary)
{
int i = 0;
for (i = 0; i < 256; ++i) {
ary[i] = 0xFFFFFFFF;
}
}
/* dump uint32_t as hex digits */
void uint32_array_to_c_hex(const uint32_t* ary, size_t sz, const char* name)
{
size_t i = 0;
printf("const uint32_t %s[%d] = {\n", name, (int)sz);
for (;;) {
printf("0x%08" PRIx32, ary[i]);
++i;
if (i == sz)
break;
if (i % 6 == 0) {
printf(",\n");
} else {
printf(", ");
}
}
printf("\n};\n");
}
int main(int argc, char** argv)
{
uint32_t x;
uint32_t i = 0;
uint32_t ary[256];
/* over-ride standard alphabet */
if (argc == 2) {
uint8_t* replacements = (uint8_t*)argv[1];
if (strlen((char*)replacements) != 3) {
fprintf(stderr, "input must be a string of 3 characters '-', '.' or '_'\n");
exit(1);
}
fprintf(stderr, "using '%s' as replacements in base64 encoding\n", replacements);
b64chars[62] = replacements[0];
b64chars[63] = replacements[1];
padchar = replacements[2];
}
printStart();
printf("\n\n#if BASE64_LITTLE_ENDIAN\n");
printf("\n\n/* SPECIAL DECODE TABLES FOR LITTLE ENDIAN (INTEL) CPUS */\n\n");
clearDecodeTable(ary);
for (i = 0; i < 64; ++i) {
x = b64chars[i];
ary[x] = i << 2;
}
uint32_array_to_c_hex(ary, sizeof(ary) / sizeof(uint32_t), "base64_table_dec_32bit_d0");
printf("\n\n");
clearDecodeTable(ary);
for (i = 0; i < 64; ++i) {
x = b64chars[i];
ary[x] = ((i & 0x30) >> 4) | ((i & 0x0F) << 12);
}
uint32_array_to_c_hex(ary, sizeof(ary) / sizeof(uint32_t), "base64_table_dec_32bit_d1");
printf("\n\n");
clearDecodeTable(ary);
for (i = 0; i < 64; ++i) {
x = b64chars[i];
ary[x] = ((i & 0x03) << 22) | ((i & 0x3c) << 6);
}
uint32_array_to_c_hex(ary, sizeof(ary) / sizeof(uint32_t), "base64_table_dec_32bit_d2");
printf("\n\n");
clearDecodeTable(ary);
for (i = 0; i < 64; ++i) {
x = b64chars[i];
ary[x] = i << 16;
}
uint32_array_to_c_hex(ary, sizeof(ary) / sizeof(uint32_t), "base64_table_dec_32bit_d3");
printf("\n\n");
printf("#else\n");
printf("\n\n/* SPECIAL DECODE TABLES FOR BIG ENDIAN (IBM/MOTOROLA/SUN) CPUS */\n\n");
clearDecodeTable(ary);
for (i = 0; i < 64; ++i) {
x = b64chars[i];
ary[x] = i << 26;
}
uint32_array_to_c_hex(ary, sizeof(ary) / sizeof(uint32_t), "base64_table_dec_32bit_d0");
printf("\n\n");
clearDecodeTable(ary);
for (i = 0; i < 64; ++i) {
x = b64chars[i];
ary[x] = i << 20;
}
uint32_array_to_c_hex(ary, sizeof(ary) / sizeof(uint32_t), "base64_table_dec_32bit_d1");
printf("\n\n");
clearDecodeTable(ary);
for (i = 0; i < 64; ++i) {
x = b64chars[i];
ary[x] = i << 14;
}
uint32_array_to_c_hex(ary, sizeof(ary) / sizeof(uint32_t), "base64_table_dec_32bit_d2");
printf("\n\n");
clearDecodeTable(ary);
for (i = 0; i < 64; ++i) {
x = b64chars[i];
ary[x] = i << 8;
}
uint32_array_to_c_hex(ary, sizeof(ary) / sizeof(uint32_t), "base64_table_dec_32bit_d3");
printf("\n\n");
printf("#endif\n");
return 0;
}
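
The shift formulas in the generator above are easier to follow with a worked example. The sketch below rebuilds the four tables for both byte orders and decodes one group of four characters by OR-ing one lookup per character. It is an illustration written for this page, not the vendored decoder: `build_tables` and `decode_quad` are hypothetical names, and the validity check relies only on the fact that an all-ones entry survives the OR.

```python
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INVALID = 0xFFFFFFFF  # same filler value as clearDecodeTable()


def build_tables(little_endian: bool):
    """Rebuild d0..d3 using the shift formulas from the generator."""
    tables = [[INVALID] * 256 for _ in range(4)]
    for i, ch in enumerate(ALPHABET):
        if little_endian:
            values = (
                i << 2,
                ((i & 0x30) >> 4) | ((i & 0x0F) << 12),
                ((i & 0x03) << 22) | ((i & 0x3C) << 6),
                i << 16,
            )
        else:
            values = (i << 26, i << 20, i << 14, i << 8)
        for table, value in zip(tables, values):
            table[ch] = value
    return tables


TABLES = {True: build_tables(True), False: build_tables(False)}


def decode_quad(quad: bytes, little_endian: bool = True):
    """Decode four Base64 characters to three bytes, or None if any is invalid."""
    if len(quad) != 4:
        raise ValueError("decode_quad expects exactly 4 characters")
    word = 0
    for table, ch in zip(TABLES[little_endian], quad):
        word |= table[ch]
    # One invalid character contributes 0xFFFFFFFF, which forces every bit on.
    # A word built from valid characters always leaves one byte zero.
    if word == INVALID:
        return None
    order = "little" if little_endian else "big"
    # The three decoded bytes are the first three bytes in memory order.
    return word.to_bytes(4, order)[:3]
```

Note that `=` is not present in these tables, so a padded final group is reported as invalid here and would have to be handled by a slower fallback path.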

3rdparty/base64/lib/tables/tables.c vendored Normal file

@@ -0,0 +1,40 @@
#include "tables.h"
const uint8_t
base64_table_enc_6bit[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
"+/";
// In the lookup table below, note that the value for '=' (character 61) is
// 254, not 255. This character is used for in-band signaling of the end of
// the datastream, and we will use that later. The characters A-Z, a-z, 0-9
// and + / are mapped to their "decoded" values. The other bytes all map to
// the value 255, which flags them as "invalid input".
const uint8_t
base64_table_dec_8bit[] =
{
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, // 0..15
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, // 16..31
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 62, 255, 255, 255, 63, // 32..47
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 255, 255, 255, 254, 255, 255, // 48..63
255, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, // 64..79
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 255, 255, 255, 255, 255, // 80..95
255, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, // 96..111
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 255, 255, 255, 255, 255, // 112..127
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, // 128..143
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
};
#if BASE64_WORDSIZE >= 32
# include "table_dec_32bit.h"
# include "table_enc_12bit.h"
#endif
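
The comment in tables.c describes three classes of byte in the 8-bit decode table: alphabet characters (0..63), the padding character `=` (254) and everything else (255). A minimal sketch of a byte-at-a-time decoder driven by such a table follows. The table is rebuilt in Python rather than copied, and `decode` is a hypothetical helper that does not mirror the library's actual state machine.

```python
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INVALID, END = 255, 254

# Rebuild the 256-entry table: 255 everywhere, 0..63 for the alphabet,
# and 254 for '=' (character 61).
DEC_8BIT = [INVALID] * 256
for value, ch in enumerate(ALPHABET):
    DEC_8BIT[ch] = value
DEC_8BIT[ord("=")] = END


def decode(text: bytes) -> bytes:
    """Decode Base64 one character at a time, stopping at the first '='."""
    acc = 0    # bit accumulator
    bits = 0   # number of valid bits currently in acc
    out = bytearray()
    for byte in text:
        code = DEC_8BIT[byte]
        if code == END:        # in-band end-of-stream marker
            break
        if code == INVALID:
            raise ValueError(f"invalid Base64 byte: {byte:#04x}")
        acc = (acc << 6) | code
        bits += 6
        if bits >= 8:          # a full output byte is available
            bits -= 8
            out.append((acc >> bits) & 0xFF)
            acc &= (1 << bits) - 1   # keep only the leftover bits
    return bytes(out)
```

Giving `=` its own value lets the caller tell "end of data" apart from "garbage" with a single table lookup instead of a separate comparison.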

3rdparty/base64/lib/tables/tables.h vendored Normal file

@@ -0,0 +1,23 @@
#ifndef BASE64_TABLES_H
#define BASE64_TABLES_H
#include <stdint.h>
#include "../env.h"
// These tables are used by all codecs for fallback plain encoding/decoding:
extern const uint8_t base64_table_enc_6bit[];
extern const uint8_t base64_table_dec_8bit[];
// These tables are used for the 32-bit and 64-bit generic decoders:
#if BASE64_WORDSIZE >= 32
extern const uint32_t base64_table_dec_32bit_d0[];
extern const uint32_t base64_table_dec_32bit_d1[];
extern const uint32_t base64_table_dec_32bit_d2[];
extern const uint32_t base64_table_dec_32bit_d3[];
// This table is used by the 32 and 64-bit generic encoders:
extern const uint16_t base64_table_enc_12bit[];
#endif
#endif // BASE64_TABLES_H


@@ -1,5 +1,5 @@
/*
Copyright (c) 2003-2021, Troy D. Hanson http://troydhanson.github.io/uthash/
Copyright (c) 2003-2022, Troy D. Hanson https://troydhanson.github.io/uthash/
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -51,6 +51,8 @@ typedef unsigned char uint8_t;
#else /* VS2008 or older (or VS2010 in C mode) */
#define NO_DECLTYPE
#endif
#elif defined(__MCST__) /* Elbrus C Compiler */
#define DECLTYPE(x) (__typeof(x))
#elif defined(__BORLANDC__) || defined(__ICCARM__) || defined(__LCC__) || defined(__WATCOMC__)
#define NO_DECLTYPE
#else /* GNU, Sun and other compilers */
@@ -450,7 +452,7 @@ do {
#define HASH_DELETE_HH(hh,head,delptrhh) \
do { \
struct UT_hash_handle *_hd_hh_del = (delptrhh); \
const struct UT_hash_handle *_hd_hh_del = (delptrhh); \
if ((_hd_hh_del->prev == NULL) && (_hd_hh_del->next == NULL)) { \
HASH_BLOOM_FREE((head)->hh.tbl); \
uthash_free((head)->hh.tbl->buckets, \
@@ -593,7 +595,9 @@ do {
/* SAX/FNV/OAT/JEN hash functions are macro variants of those listed at
* http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx */
* http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
* (archive link: https://archive.is/Ivcan )
*/
#define HASH_SAX(key,keylen,hashv) \
do { \
unsigned _sx_i; \


@@ -27,6 +27,7 @@ def compile_terminfo(base):
os.makedirs(odir, exist_ok=True)
ofile = os.path.join(odir, xterm_kitty)
shutil.move(tfile, ofile)
return ofile
def generate_terminfo():
@@ -46,7 +47,14 @@ def generate_terminfo():
with open('terminfo/kitty.termcap', 'w') as f:
f.write(tcap)
compile_terminfo(os.path.join(base, 'terminfo'))
dbfile = compile_terminfo(os.path.join(base, 'terminfo'))
with open(dbfile, 'rb') as f:
data = f.read()
with open('kitty/terminfo.h', 'w') as f:
print(f'static const uint8_t terminfo_data[{len(data)}] = ''{', file=f)
for b in data:
print(b, end=', ', file=f)
print('};', file=f)
if __name__ == '__main__':
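
The hunk above embeds the compiled terminfo database into a C header as a byte array. The same idea, pulled out into a standalone function, might look like the sketch below. `write_c_array` is a name chosen for this example, and the output is written to any file-like object so that it can be inspected without touching `kitty/terminfo.h`.

```python
import io


def write_c_array(name: str, data: bytes, out) -> None:
    """Write `data` as a `static const uint8_t` C array named `name`."""
    print(f'static const uint8_t {name}[{len(data)}] = {{', file=out)
    for b in data:
        # Decimal values keep the output short; the trailing comma after
        # the last element is legal in a C initializer list.
        print(b, end=', ', file=out)
    print('};', file=out)


def render_c_array(name: str, data: bytes) -> str:
    """Convenience wrapper that returns the header text as a string."""
    buf = io.StringIO()
    write_c_array(name, data, buf)
    return buf.getvalue()
```

Declaring the array length explicitly means the C side can use `sizeof(terminfo_data)` without a separate length constant.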


@@ -242,7 +242,9 @@ func dependencies(args []string) {
chdir_to_base()
nf := flag.NewFlagSet("deps", flag.ExitOnError)
docsptr := nf.Bool("for-docs", false, "download the dependencies needed to build the documentation")
nf.Parse(args)
if err := nf.Parse(args); err != nil {
exit(err)
}
if *docsptr {
dependencies_for_docs()
fmt.Println("Dependencies needed to generate documentation have been installed. Build docs with ./dev.sh docs")
@@ -323,7 +325,7 @@ func dependencies(args []string) {
}); err != nil {
exit(err)
}
fmt.Println(`Dependencies downloaded. Now build kitty with: make develop`)
fmt.Println(`Dependencies downloaded. Now build kitty with: ./dev.sh build`)
}
// }}}
@@ -384,7 +386,9 @@ func docs(args []string) {
nf := flag.NewFlagSet("deps", flag.ExitOnError)
livereload := nf.Bool("live-reload", false, "build the docs and make them available via a local server with live reloading for ease of development")
failwarn := nf.Bool("fail-warn", false, "make warnings fatal when building the docs")
nf.Parse(args)
if err := nf.Parse(args); err != nil {
exit(err)
}
exe := filepath.Join(root_dir(), "bin", "sphinx-build")
aexe := filepath.Join(root_dir(), "bin", "sphinx-autobuild")
target := "docs"


@@ -46,7 +46,7 @@ def run(*args, **extra_env):
return subprocess.call(list(args), env=env, cwd=cwd)
SETUP_CMD = [PYTHON, 'setup.py', '--build-universal-binary']
SETUP_CMD = [PYTHON, 'setup.py']
def build_frozen_launcher(extra_include_dirs):


@@ -250,6 +250,15 @@
}
},
{
"name": "simde",
"unix": {
"filename": "simde-amalgamated-0.7.6.tar.xz",
"hash": "sha256:703eac1f2af7de1f7e4aea2286130b98e1addcc0559426e78304c92e2b4eb5e1",
"urls": ["https://github.com/simd-everywhere/simde/releases/download/v0.7.6/{filename}"]
}
},
{
"name": "wayland",
"os": "linux",
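
Each dependency entry above pairs a `{filename}` URL template with a hash of the form `algorithm:hexdigest`. How the build tooling consumes these fields is not shown in this diff, so the following is only a sketch of one plausible check, with `expand_urls` and `verify_hash` as invented helper names.

```python
import hashlib


def expand_urls(entry: dict) -> list:
    """Substitute the entry's filename into each URL template."""
    return [url.format(filename=entry["filename"]) for url in entry["urls"]]


def verify_hash(data: bytes, spec: str) -> bool:
    """Check `data` against a spec such as 'sha256:<hex digest>'."""
    algorithm, sep, expected = spec.partition(":")
    if not sep:
        raise ValueError(f"malformed hash spec: {spec!r}")
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected.lower()


# The simde entry from the hunk above, reproduced as a Python dict.
SIMDE = {
    "filename": "simde-amalgamated-0.7.6.tar.xz",
    "hash": "sha256:703eac1f2af7de1f7e4aea2286130b98e1addcc0559426e78304c92e2b4eb5e1",
    "urls": ["https://github.com/simd-everywhere/simde/releases/download/v0.7.6/{filename}"],
}
```

A download would then be accepted only if `verify_hash(downloaded_bytes, SIMDE["hash"])` returns true.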


@@ -5,12 +5,13 @@ import subprocess
ls_files = subprocess.check_output([ 'git', 'ls-files']).decode('utf-8')
all_files = set(ls_files.splitlines())
all_files.discard('')
cp = subprocess.run(['git', 'check-attr', 'linguist-generated', '--stdin'],
check=True, stdout=subprocess.PIPE, input='\n'.join(all_files).encode('utf-8'))
for line in cp.stdout.decode().splitlines():
if line.endswith(' true'):
fname = line.split(':', 1)[0]
all_files.discard(fname)
for attr in ('linguist-generated', 'linguist-vendored'):
cp = subprocess.run(['git', 'check-attr', attr, '--stdin'],
check=True, stdout=subprocess.PIPE, input='\n'.join(all_files).encode('utf-8'))
for line in cp.stdout.decode().splitlines():
if line.endswith(' true'):
fname = line.split(':', 1)[0]
all_files.discard(fname)
all_files -= {'gen/nerd-fonts-glyphs.txt', 'gen/rowcolumn-diacritics.txt'}
cp = subprocess.run(['cloc', '--list-file', '-'], input='\n'.join(all_files).encode())
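
The filtering step in the hunk above depends on the line format printed by `git check-attr`, which is `path: attribute: value`. The parsing can be tried in isolation on captured output, as in this sketch. `files_with_attr_true` and `drop_marked_files` are names made up for the example, and no git process is started.

```python
def files_with_attr_true(check_attr_output: str) -> set:
    """Return the paths whose attribute value is reported as 'true'."""
    matched = set()
    for line in check_attr_output.splitlines():
        if line.endswith(' true'):
            # Same split as the script: everything before the first ':'.
            matched.add(line.split(':', 1)[0])
    return matched


def drop_marked_files(all_files: set, outputs: list) -> set:
    """Remove every file flagged in any of the check-attr outputs."""
    remaining = set(all_files)
    for output in outputs:
        remaining -= files_with_attr_true(output)
    return remaining
```

As in the original script, a path that itself contains a colon would be cut short by the split on the first `:`, so this approach assumes colon-free file names.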

Binary file not shown (image; size before: 80 KiB, after: 76 KiB).


@@ -104,6 +104,8 @@ or another OS window::
map ctrl+f3 detach_window tab-prev
# moves the window into the tab at the left of the active tab
map ctrl+f3 detach_window tab-left
# moves the window into a new tab created to the left of the active tab
map ctrl+f3 detach_window new-tab-left
# asks which tab to move the window into
map ctrl+f4 detach_window ask


@@ -54,14 +54,14 @@ particular desktop, but it should work for most major desktop environments.
cp ~/.local/kitty.app/share/applications/kitty.desktop ~/.local/share/applications/
# If you want to open text files and images in kitty via your file manager also add the kitty-open.desktop file
cp ~/.local/kitty.app/share/applications/kitty-open.desktop ~/.local/share/applications/
# Update the paths to the kitty and its icon in the kitty.desktop file(s)
# Update the paths to the kitty and its icon in the kitty desktop file(s)
sed -i "s|Icon=kitty|Icon=/home/$USER/.local/kitty.app/share/icons/hicolor/256x256/apps/kitty.png|g" ~/.local/share/applications/kitty*.desktop
sed -i "s|Exec=kitty|Exec=/home/$USER/.local/kitty.app/bin/kitty|g" ~/.local/share/applications/kitty*.desktop
.. note::
In :file:`kitty-open.desktop`, kitty is registered to handle some supported
MIME types. This will cause kitty to take precedence on some systems where
the default apps are not explicitly set. For example, you expect to use
the default apps are not explicitly set. For example, if you expect to use
other GUI file managers to open dir paths when using commands such as
:program:`xdg-open`, you should configure the default opener for the MIME
type ``inode/directory``::


@@ -30,7 +30,7 @@ to build kitty with your changes.
.. note::
If you plan to run kitty from source long-term, there are a couple of
caveats to be aware of. You should occassionally run ``./dev.sh deps``
caveats to be aware of. You should occasionally run ``./dev.sh deps``
to have the dependencies re-downloaded as they are updated periodically.
Also, the built kitty executable assumes it will find source in whatever
directory you first ran :code:`./dev.sh build` in. If you move/rename the
@@ -96,6 +96,7 @@ Run-time dependencies:
Build-time dependencies:
* ``gcc`` or ``clang``
* ``simde``
* ``go`` >= _build_go_version (see :file:`go.mod` for go packages used during building)
* ``pkg-config``
* For building on Linux in addition to the above dependencies you might also
@@ -115,6 +116,7 @@ Build-time dependencies:
- ``libssl-dev``
- ``libpython3-dev``
- ``libxxhash-dev``
- ``libsimde-dev``
Build and run from source with Nix


@@ -9,40 +9,200 @@ To update |kitty|, :doc:`follow the instructions <binary>`.
Recent major new features
---------------------------
File transfer over the tty device
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Wayland goodies [0.34]
~~~~~~~~~~~~~~~~~~~~~~~
Transfer files to and from remote computers over the ``TTY`` device itself.
This means that file transfer works over nested SSH sessions, serial links,
etc. Anywhere you have a terminal device, you can transfer files.
Wayland users should rejoice as kitty now comes with major Wayland
quality-of-life improvements:
Simply ssh into a remote computer using the :doc:`ssh kitten </kittens/ssh>`
and run the :doc:`transfer kitten </kittens/transfer>` (which the ssh kitten
makes available for you on the remote computer automatically). For example, to
copy a file from a remote computer::
* Draw GPU accelerated :doc:`desktop panels and background </kittens/panel>`
running arbitrary terminal programs. For example, run `btop
<https://github.com/aristocratos/btop/>`__ as your desktop background
<local computer> $ kitten ssh my-remote-computer
<remote computer> $ kitten transfer some-file /path/on/local/computer
* Background blur for transparent windows is now supported under KDE
using a custom KDE specific protocol
The kitten can transfer files to and from the remote computer. It supports
recursive transfer of directories, symlinks and hardlinks. It can even use the
rsync algorithm to speed up repeated transfers of large files.
* The kitty window decorations in GNOME are now fully functional with buttons
and they follow system dark/light mode automatically
Truly convenient SSH
~~~~~~~~~~~~~~~~~~~~~~~~
* kitty now supports fractional scaling in Wayland which means pixel perfect
rendering when you use a fractional scale with no wasted performance on
resizing an overdrawn pixmap in the compositor
The :doc:`ssh kitten <kittens/ssh>` is redesigned with powerful new features:
With this release kitty's Wayland support is now on par with X11, provided
you use a decent Wayland compositor.
* Automatic :ref:`shell_integration` on remote machines
* Easily :ref:`clone local shell/editor config <real_world_ssh_kitten_config>` on remote machines
* Easily :ref:`edit files in your local editor <edit_file>` on remote machines
* Automatic :opt:`re-use of existing connections <kitten-ssh.share_connections>` to avoid connection setup latency
Cheetah speed 🐆 [0.33]
~~~~~~~~~~~~~~~~~~~~~~~~~
kitty has grown up and become a cheetah. It now parses data it receives in
parallel :iss:`using SIMD vector CPU instructions <7005>` for a 2x speedup in
benchmarks and a 10%-50% real world speedup depending on workload. There is a
new benchmarking kitten ``kitten __benchmark__`` that can be used to measure
terminal throughput. There is also :ref:`a table <throughput>` showing kitty is
much faster than other terminal emulators based on the benchmark kitten. While
kitty was already so fast that its performance was never a bottleneck, this
improvement makes it even faster and more importantly reduces the energy
consumption to do the same tasks.
.. }}}
Detailed list of changes
-------------------------------------
0.34.0 [2024-04-15]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Wayland: :doc:`panel kitten <kittens/panel>`: Add support for drawing desktop background and bars
using the panel kitten for all compositors that support the `requisite Wayland
protocol <https://wayland.app/protocols/wlr-layer-shell-unstable-v1>`__ which is, practically speaking, all of them except GNOME (:pull:`2590`)
- Show a small :opt:`scrollback indicator <scrollback_indicator_opacity>` along the right window edge when viewing
the scrollback to keep track of scroll position (:iss:`2502`)
- Wayland: Support fractional scales so that there is no wasted drawing at larger scale followed by resizing in the compositor
- Wayland KDE: Support :opt:`background_blur`
- Wayland GNOME: The window titlebar now has buttons to minimize/maximize/close the window
- Wayland GNOME: The window titlebar color now follows the system light/dark color scheme preference, see :opt:`wayland_titlebar_color`
- Wayland KDE: Fix mouse cursor hiding not working in Plasma 6 (:iss:`7265`)
- Wayland IME: Fix a bug with handling synthetic keypresses generated by ZMK keyboard + fcitx (:pull:`7283`)
- A new option :opt:`terminfo_type` to allow passing the terminfo database embedded into the :envvar:`TERMINFO` env var directly instead of via a file
- Mouse reporting: Fix drag release event outside the window not being reported in legacy mouse reporting modes (:iss:`7244`)
- macOS: Fix a regression in the previous release that broke rendering of some symbols on some systems (:iss:`7249`)
- Fix handling of tab character when cursor is at end of line and wrapping is enabled (:iss:`7250`)
- Splits layout: Fix :ac:`move_window_forward` not working (:iss:`7264`)
- macOS: Fix an abort due to an assertion when a program tries to set an invalid window title (:iss:`7271`)
- fish shell integration: Fix clicking at the prompt causing autosuggestions to be accepted, needs fish >= 3.8.0 (:iss:`7168`)
- Linux: Fix for a regression in 0.32.0 that caused some CJK fonts to not render glyphs (:iss:`7263`)
- Wayland: Support preferred integer scales
- Wayland: A new option :opt:`wayland_enable_ime` to turn off Input Method Extensions which add latency and create bugs
- Wayland: Fix :opt:`hide_window_decorations` not working on non GNOME desktops
- When asking for quit confirmation because of a running program, mention the program name (:iss:`7331`)
- Fix flickering of prompt during window resize (:iss:`7324`)
0.33.1 [2024-03-21]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Fix a regression in the previous release that caused requesting data from the clipboard via OSC 52 to instead return data from the primary selection (:iss:`7213`)
- Splits layout: Allow resizing until one of the halves in a split is minimally sized (:iss:`7220`)
- macOS: Fix text rendered with fallback fonts not respecting bold/italic styling (:disc:`7241`)
- macOS: When CoreText fails to find a fallback font for a character in the first Private Use Unicode Area, preferentially use the NERD font, if available, for it (:iss:`6043`)
0.33.0 [2024-03-12]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- :ref:`Cheetah speed <throughput>` with a redesigned render loop and a 2x faster escape code
parser that uses SIMD CPU vector instructions to parse data in parallel
(:iss:`7005`)
- A new benchmark kitten (``kitten __benchmark__``) to measure terminal
throughput performance
- Graphics protocol: Add a new delete mode for deleting images whose ids fall within a range. Useful for bulk deletion (:iss:`7080`)
- Keyboard protocol: Fix the :kbd:`Enter`, :kbd:`Tab` and :kbd:`Backspace` keys
generating spurious release events even when report all keys as escape codes
is not set (:iss:`7136`)
- macOS: The command line args from :file:`macos-launch-services-cmdline` are now
prefixed to any args from ``open --args`` rather than overwriting them (:iss:`7135`)
- Allow specifying where the new tab is created for :ac:`detach_window` (:pull:`7134`)
- hints kitten: The option to set the text color for hints now allows arbitrary
colors (:pull:`7150`)
- icat kitten: Add a command line argument to override terminal window size detection (:iss:`7165`)
- A new action :ac:`toggle_tab` to easily switch to and back from a tab with a single shortcut (:iss:`7203`)
- When :ac:`clearing terminal <clear_terminal>` add a new type ``to_cursor_scroll`` which can be
used to clear to prompt while moving cleared lines into the scrollback
- Fix a performance bottleneck when dealing with thousands of small images
(:iss:`7080`)
- kitten @ ls: Return the timestamp at which the window was created (:iss:`7178`)
- hints kitten: Use default editor rather than hardcoding vim to open file at specific line (:iss:`7186`)
- Remote control: Fix ``--match`` argument not working for @ls, @send-key,
@set-background-image (:iss:`7192`)
- Keyboard protocol: Do not deliver fake key release events on OS window focus out for engaged modifiers (:iss:`7196`)
- Ignore :opt:`startup_session` when kitty is invoked with command line options specifying a command to run (:pull:`7198`)
- Box drawing: Specialize rendering for the Fira Code progress bar/spinner glyphs
0.32.2 [2024-02-12]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- kitten @ load-config: Allow (re)loading kitty.conf via remote control
- Remote control: Allow running mappable actions via remote control (`kitten @ action`)
- kitten @ send-text: Add a new option to automatically wrap the sent text in
bracketed paste escape codes if the program in the destination window has
turned on bracketed paste.
- Fix a single key mapping not overriding a previously defined multi-key mapping
- macOS: Fix :code:`kitten @ select-window` leaving the keyboard in a partially functional state (:iss:`7074`)
- Graphics protocol: Improve display of images using Unicode placeholders or
row/column boxes by resizing them using linear instead of nearest neighbor
interpolation on the GPU (:iss:`7070`)
- When matching URLs use the definition of legal characters in URLs from the
`WHATWG spec <https://url.spec.whatwg.org/#url-code-points>`__ rather than older standards (:iss:`7095`)
- hints kitten: Respect the kitty :opt:`url_excluded_characters` option
(:iss:`7075`)
- macOS: Fix an abort when changing OS window chrome for a full screen window via remote control or the themes kitten (:iss:`7106`)
- Special case rendering of some more box drawing characters using shades from the block of symbols for legacy computing (:iss:`7110`)
- A new action :ac:`close_other_os_windows` to close non active OS windows (:disc:`7113`)
0.32.1 [2024-01-26]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- macOS: Fix a regression in the previous release that broke overriding keyboard shortcuts for actions present in the global menu bar (:iss:`7016`)
- Fix a regression in the previous release that caused multi-key sequences to not abort when pressing an unknown key (:iss:`7022`)
- Fix a regression in the previous release that caused `kitten @ launch --cwd=current` to fail over SSH (:iss:`7028`)
- Fix a regression in the previous release that caused `kitten @ send-text` with a match tab parameter to send text twice to the active window (:iss:`7027`)
- Fix a regression in the previous release that caused overriding of existing multi-key mappings to fail (:iss:`7044`, :iss:`7058`)
- Wayland+NVIDIA: Do not request an sRGB output buffer as a bug in Wayland causes kitty to not start (:iss:`7021`)
0.32.0 [2024-01-19]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -211,16 +211,16 @@ def commit_role(
' Link to a github commit '
try:
commit_id = subprocess.check_output(
f'git rev-list --max-count=1 --skip=# {text}'.split()).decode('utf-8').strip()
f'git rev-list --max-count=1 {text}'.split()).decode('utf-8').strip()
except Exception:
msg = inliner.reporter.error(
f'GitHub commit id "{text}" not recognized.', line=lineno)
f'git commit id "{text}" not recognized.', line=lineno)
prb = inliner.problematic(rawtext, rawtext, msg)
return [prb], [msg]
url = f'https://github.com/kovidgoyal/kitty/commit/{commit_id}'
set_classes(options)
short_id = subprocess.check_output(
f'git rev-list --max-count=1 --abbrev-commit --skip=# {commit_id}'.split()).decode('utf-8').strip()
f'git rev-list --max-count=1 --abbrev-commit {commit_id}'.split()).decode('utf-8').strip()
node = nodes.reference(rawtext, f'commit: {short_id}', refuri=url, **options)
return [node], []
# }}}

@@ -16,10 +16,10 @@ frames-per-second. See below for an overview of all customization possibilities.
You can open the config file within kitty by pressing :sc:`edit_config_file`
(:kbd:`⌘+,` on macOS). A :file:`kitty.conf` with commented default
configurations and descriptions will be created if the file does not exist.
You can reload the config file within kitty by pressing :sc:`reload_config_file`
(:kbd:`⌃+⌘+,` on macOS) or sending kitty the ``SIGUSR1`` signal.
You can also display the current configuration by pressing :sc:`debug_config`
(:kbd:`⌥+⌘+,` on macOS).
You can reload the config file within kitty by pressing
:sc:`reload_config_file` (:kbd:`⌃+⌘+,` on macOS) or sending kitty the
``SIGUSR1`` signal with ``kill -SIGUSR1 $KITTY_PID``. You can also display the
current configuration by pressing :sc:`debug_config` (:kbd:`⌥+⌘+,` on macOS).
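The ``SIGUSR1`` reload can also be triggered from a script. The following is a minimal sketch, assuming it runs inside a kitty window so that ``KITTY_PID`` is set; the function name and its ``env`` and ``kill`` parameters are hypothetical and exist only to make the sketch testable without a running kitty.

```python
import os
import signal


def reload_kitty_config(env=os.environ, kill=os.kill):
    """Send SIGUSR1 to the kitty process named by KITTY_PID.

    `env` and `kill` are injectable stand-ins (not part of kitty) so the
    function can be exercised without actually signalling a process.
    """
    pid = int(env["KITTY_PID"])  # raises KeyError when not run inside kitty
    kill(pid, signal.SIGUSR1)    # same effect as: kill -SIGUSR1 $KITTY_PID
    return pid
```

Calling ``reload_kitty_config()`` with no arguments uses the real environment and ``os.kill``.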
.. _confloc:

@@ -263,11 +263,12 @@ fonts to be freely resizable, so it does not support bitmapped fonts.
symbols from it automatically, and you can tell it to do so explicitly in
case it doesn't with the :opt:`symbol_map` directive::
# Nerd Fonts v2.3.3
# Nerd Fonts v3.1.0
symbol_map U+23FB-U+23FE,U+2665,U+26A1,U+2B58,U+E000-U+E00A,U+E0A0-U+E0A3,U+E0B0-U+E0D4,U+E200-U+E2A9,U+E300-U+E3E3,U+E5FA-U+E6AA,U+E700-U+E7C5,U+EA60-U+EBEB,U+F000-U+F2E0,U+F300-U+F32F,U+F400-U+F4A9,U+F500-U+F8FF,U+F0001-U+F1AF0 Symbols Nerd Font Mono
symbol_map U+e000-U+e00a,U+ea60-U+ebeb,U+e0a0-U+e0c8,U+e0ca,U+e0cc-U+e0d4,U+e200-U+e2a9,U+e300-U+e3e3,U+e5fa-U+e6b1,U+e700-U+e7c5,U+f000-U+f2e0,U+f300-U+f372,U+f400-U+f532,U+f0001-U+f1af0 Symbols Nerd Font Mono
Those Unicode symbols beyond the ``E000-F8FF`` Unicode private use area are
Those Unicode symbols not in the `Unicode private use areas
<https://en.wikipedia.org/wiki/Private_Use_Areas>`__ are
not included.
If your font is not listed in ``kitty +list-fonts`` it means that it is not
@@ -386,7 +387,7 @@ You can also change the icon manually by following the steps:
How do I map key presses in kitty to different keys in the terminal program?
--------------------------------------------------------------------------------------
This is accomplished by using ``map`` with :sc:`send_text <send_text>` in :file:`kitty.conf`.
This is accomplished by using ``map`` with :ac:`send_key` in :file:`kitty.conf`.
For example::
map alt+s send_key ctrl+s
@@ -396,7 +397,8 @@ you press the :kbd:`alt+s` key. To see this in action, run::
kitten show-key -m kitty
Which will print out what key events it receives.
Which will print out what key events it receives. To send arbitrary text rather
than a key press, see :sc:`send_text <send_text>` instead.
How do I open a new window or tab with the same working directory as the current window?
@@ -447,9 +449,6 @@ do not use them, if at all possible. kitty contains features that do all of what
tmux does, but better, with the exception of remote persistence (:iss:`391`).
If you still want to use tmux, read on.
Image display will not work, see `tmux issue
<https://github.com/tmux/tmux/issues/1391>`__.
Using ancient versions of tmux such as 1.8 will cause gibberish on screen when
pressing keys (:iss:`3541`).
@@ -458,11 +457,17 @@ and then switch to another and these terminals have different :envvar:`TERM`
variables, tmux will break. You will need to restart it as tmux does not support
multiple terminfo definitions.
Displaying images while inside programs such as nvim or ranger may not work
depending on whether those programs have adopted support for the :ref:`unicode
placeholders <graphics_unicode_placeholders>` workaround that kitty created
for tmux refusing to support images.
If you use any of the advanced features that kitty has innovated, such as
:doc:`styled underlines </underlines>`, :doc:`desktop notifications
</desktop-notifications>`, :doc:`extended keyboard support
</keyboard-protocol>`, etc. they may or may not work, depending on the whims of
tmux's maintainer, your version of tmux, etc.
</keyboard-protocol>`, :doc:`file transfer </kittens/transfer>`, etc.
they may or may not work, depending on the whims of tmux's maintainer,
your version of tmux, etc.
I opened and closed a lot of windows/tabs and top shows kitty's memory usage is very high?

@@ -164,7 +164,8 @@ Variables that kitty sets when running child programs
.. envvar:: TERMINFO
Path to a directory containing the kitty terminfo database.
Path to a directory containing the kitty terminfo database. Or the terminfo
database itself encoded in base64. See :opt:`terminfo_type`.
.. envvar:: KITTY_INSTALLATION_DIR
@@ -230,3 +231,11 @@ Variables that kitty sets when running child programs
Set to ``1`` when kitty is running a shell because of the ``--hold`` flag. Can
be used to specialize shell behavior in the shell rc files as desired.
.. envvar:: KITTY_SIMD
Set it to ``128`` to use 128 bit vector registers, ``256`` to use 256 bit
vector registers or any other value to prevent kitty from using SIMD CPU
vector instructions. Warning, this overrides CPU capability detection so
will cause kitty to crash with SIGILL if your CPU does not support the
necessary SIMD extensions.

@@ -460,7 +460,10 @@ When you specify a placement id, it will be added to the acknowledgement code
above. Every placement is uniquely identified by the pair of the ``image id``
and the ``placement id``. If you specify a placement id for an image that does
not have an id (i.e. has id=0), it will be ignored. In particular this means
there can exist multiple images with ``image id=0, placement id=0``.
there can exist multiple images with ``image id=0, placement id=0``. Not
specifying a placement id or using ``p=0`` for multiple put commands (``a=p``)
with the same non-zero image id results in multiple placements of the image.
An example response::
<ESC>_Gi=<image id>,p=<placement id>;OK<ESC>\
@@ -634,7 +637,7 @@ terminal may apply other heuristics (but it doesn't have to).
It is important to distinguish between virtual image placements and real images
displayed on top of Unicode placeholders. Virtual placements are invisible and only play
the role of prototypes for real images. Virtual placements can be deleted by a
deletion command only when the `d` key is equal to ``i``, ``I``, ``n`` or ``N``.
deletion command only when the `d` key is equal to ``i``, ``I``, ``r``, ``R``, ``n`` or ``N``.
The key values ``a``, ``c``, ``p``, ``q``, ``x``, ``y``, ``z`` and their capital
variants never affect virtual placements because they do not have a physical
location on the screen.
@@ -726,13 +729,14 @@ scrollback buffer. The values of the ``x`` and ``y`` keys are the same as cursor
Value of ``d`` Meaning
================= ============
``a`` or ``A`` Delete all placements visible on screen
``i`` or ``I`` Delete all images with the specified id, specified using the ``i`` key. If you specify a ``p`` key for the placement id as well, then only the placement with the specified image id and placement id will be deleted.
``i`` or ``I`` Delete all images with the specified id, specified using the ``i`` key. If you specify a ``p`` key for the placement id as well, then only the placement with the specified image id and placement id will be deleted.
``n`` or ``N`` Delete newest image with the specified number, specified using the ``I`` key. If you specify a ``p`` key for the
placement id as well, then only the placement with the specified number and placement id will be deleted.
``c`` or ``C`` Delete all placements that intersect with the current cursor position.
``f`` or ``F`` Delete animation frames.
``p`` or ``P`` Delete all placements that intersect a specific cell, the cell is specified using the ``x`` and ``y`` keys
``q`` or ``Q`` Delete all placements that intersect a specific cell having a specific z-index. The cell and z-index is specified using the ``x``, ``y`` and ``z`` keys.
``r`` or ``R`` Delete all images whose id is greater than or equal to the value of the ``x`` key and less than or equal to the value of the ``y`` key (added in kitty version 0.33.0).
``x`` or ``X`` Delete all placements that intersect the specified column, specified using the ``x`` key.
``y`` or ``Y`` Delete all placements that intersect the specified row, specified using the ``y`` key.
``z`` or ``Z`` Delete all placements that have the specified z-index, specified using the ``z`` key.
@@ -1080,9 +1084,11 @@ Key Value Default Description
**Keys for deleting images**
-----------------------------------------------------------
``d`` Single character. ``a`` What to delete.
``(a, A, c, C, n, N,
i, I, p, P, q, Q, x,
X, y, Y, z, Z)``.
``(
a, A, c, C, n, N,
i, I, p, P, q, Q, r,
R, x, X, y, Y, z, Z
)``.
======= ==================== ========= =================
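As an illustration of the ``r`` delete value, the sketch below builds a delete command for a range of image ids. It assumes the usual ``<ESC>_G...<ESC>\`` framing shown in the example response earlier and the ``a=d`` delete action from the protocol's deletion section (not shown in this excerpt); the function name is hypothetical.

```python
ESC = "\x1b"


def delete_image_range(first_id, last_id):
    """Build a command that deletes images with first_id <= id <= last_id.

    Uses d=r, where the x key carries the lower bound and the y key the
    upper bound of the image id range.
    """
    if first_id > last_id:
        raise ValueError("first_id must not exceed last_id")
    return f"{ESC}_Ga=d,d=r,x={first_id},y={last_id}{ESC}\\"
```

Writing the returned string to the terminal, for example with ``sys.stdout.write``, would send the command. Terminals older than kitty 0.33.0 are not expected to understand ``d=r``.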

@@ -164,6 +164,12 @@ Add this to bashrc and then to plot a function, simply do:
iplot 'sin(x*3)*exp(x*.2)'
.. tool_tgutui:
`tgutui <https://github.com/tgu-ltd/tgutui>`_
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A Terminal Operating Test hardware equipment
.. tool_onefetch:
`onefetch <https://github.com/o2sh/onefetch>`_
@@ -189,13 +195,6 @@ A tool to display weather information in your terminal with curl
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
View and manage the system clipboard under Wayland in your kitty terminal
.. tool_dmenu_term:
`dmenu-term <https://github.com/maximbaz/dmenu-term>`_
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Run applications on your system with fuzzy find inside a kitty window
Editor integration
-----------------------

@@ -49,8 +49,10 @@ In addition to kitty, this protocol is also implemented in:
* The `dte text editor <https://gitlab.com/craigbarnes/dte/-/issues/138>`__
* The `Helix text editor <https://github.com/helix-editor/helix/pull/4939>`__
* The `far2l file manager <https://github.com/elfmz/far2l/commit/e1f2ee0ef2b8332e5fa3ad7f2e4afefe7c96fc3b>`__
* The `yazi file manager <https://github.com/sxyazi/yazi>`__
* The `awrit web browser <https://github.com/chase/awrit>`__
* The `nushell shell <https://github.com/nushell/nushell/pull/10540>`__
* The `fish shell <https://github.com/fish-shell/fish-shell/commit/8bf8b10f685d964101f491b9cc3da04117a308b4>`__
.. versionadded:: 0.20.0
@@ -83,7 +85,7 @@ text (``CSI`` is the bytes ``0x1b 0x5b``)::
The ``number`` in the first form above will be either the Unicode codepoint for a
key, such as ``97`` for the :kbd:`a` key, or one of the numbers from the
:ref:`functional` table below. The ``modifiers`` optional parameter encodes any
modifiers pressed for the key event. The encoding is described in the
modifiers active for the key event. The encoding is described in the
:ref:`modifiers` section.
The second form is used for a few functional keys, such as the :kbd:`Home`,
@@ -105,9 +107,7 @@ do not. When a key event produces text, the text is sent directly as UTF-8
encoded bytes. This is safe as UTF-8 contains no C0 control codes.
When the key event does not have text, the key event is encoded as an escape code. In
legacy compatibility mode (the default) this uses legacy escape codes, so old terminal
applications continue to work. Key events that could not be represented in
legacy mode are encoded using a ``CSI u`` escape code, that most terminal
programs should just ignore. For more advanced features, such as release/repeat
applications continue to work. For more advanced features, such as release/repeat
reporting etc., applications can tell the terminal they want this information by
sending an escape code to :ref:`progressively enhance <progressive_enhancement>` the data reported for
key events.
@@ -181,10 +181,12 @@ bit field with::
num_lock 0b10000000 (128)
In the escape code, the modifier value is encoded as a decimal number which is
``1 + actual modifiers``. So to represent :kbd:`shift` only, the value would be ``1 +
1 = 2``, to represent :kbd:`ctrl+shift` the value would be ``1 + 0b101 = 6``
and so on. If the modifier field is not present in the escape code, its default
value is ``1`` which means no modifiers.
``1 + actual modifiers``. So to represent :kbd:`shift` only, the value would be
``1 + 1 = 2``, to represent :kbd:`ctrl+shift` the value would be ``1 + 0b101 =
6`` and so on. If the modifier field is not present in the escape code, its
default value is ``1`` which means no modifiers. If a modifier is *active* when
the key event occurs, i.e. if the key is pressed or the lock (for caps lock/num
lock) is enabled, the key event must have the bit for that modifier set.
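The ``1 + actual modifiers`` rule can be sketched as follows. Only the ``num_lock`` row of the bit field is visible in this excerpt; the remaining bit values are assumed from the protocol's modifier table, and the function names are hypothetical.

```python
# Bit assigned to each modifier; values other than num_lock are assumed
# from the protocol's modifier table, which is cut off in this excerpt.
MODIFIER_BITS = {
    "shift": 0b1,
    "alt": 0b10,
    "ctrl": 0b100,
    "super": 0b1000,
    "hyper": 0b10000,
    "meta": 0b100000,
    "caps_lock": 0b1000000,
    "num_lock": 0b10000000,
}


def encode_modifiers(*active):
    """Return the decimal value placed in the escape code: 1 + bit field."""
    bits = 0
    for name in active:
        bits |= MODIFIER_BITS[name]
    return 1 + bits


def decode_modifiers(value):
    """Return the set of modifier names encoded in an escape code value."""
    bits = value - 1
    return {name for name, bit in MODIFIER_BITS.items() if bits & bit}
```

With these definitions ``encode_modifiers("shift")`` gives ``2`` and ``encode_modifiers("ctrl", "shift")`` gives ``6``, matching the values in the text, while calling it with no arguments gives the default ``1``.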
When the key event is related to an actual modifier key, the corresponding
modifier's bit must be set to the modifier state including the effect for the
@@ -229,8 +231,10 @@ enhancement <progressive_enhancement>` mechanism described below. Some examples:
shift+a -> CSI 97 ; 2 ; 65 u # The text 'A' is reported as 65
option+a -> CSI 97 ; ; 229 u # The text 'å' is reported as 229
If multiple code points are present, they must be separated by colons.
If no known key is associated with the text the key number ``0`` must be used.
If multiple code points are present, they must be separated by colons. If no
known key is associated with the text the key number ``0`` must be used. The
associated text must not contain control codes (control codes are code points
below U+0020 and codepoints in the C0 and C1 blocks).
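A simplified sketch of how an encoder might apply these rules is given below. It handles only the key number, the modifier value and the associated text, ignoring alternate keys and event types; the function names are hypothetical, and the control code check follows the definition given in the text above.

```python
CSI = "\x1b["


def is_control_code(code_point):
    """Control codes per the text: below U+0020, or in the C1 block."""
    return code_point < 0x20 or 0x80 <= code_point <= 0x9F


def encode_key_with_text(key, modifiers=1, text=""):
    """Encode a key event as CSI key ; modifiers ; text u.

    `modifiers` is the already encoded value (1 means no modifiers and is
    left empty). Multiple text code points are separated by colons.
    """
    for ch in text:
        if is_control_code(ord(ch)):
            raise ValueError("associated text must not contain control codes")
    mod_field = "" if modifiers == 1 else str(modifiers)
    fields = [str(key)]
    if text:
        fields += [mod_field, ":".join(str(ord(ch)) for ch in text)]
    elif mod_field:
        fields.append(mod_field)
    return CSI + ";".join(fields) + "u"
```

For the two examples above, ``encode_key_with_text(97, 2, "A")`` yields ``CSI 97;2;65u`` and ``encode_key_with_text(97, text="å")`` yields ``CSI 97;;229u``. Text with no known key would be passed with ``key=0``.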
Non-Unicode keys
@@ -336,7 +340,9 @@ much easier to integrate into the application event loop. The only exceptions
are the :kbd:`Enter`, :kbd:`Tab` and :kbd:`Backspace` keys which still generate the same
bytes as in legacy mode. This is to allow the user to type and execute commands
in the shell such as ``reset`` after a program that sets this mode crashes
without clearing it.
without clearing it. Note that the Lock modifiers are not reported for text
producing keys, to keep them useable in legacy programs. To get lock modifiers
for all keys use the :ref:`report_all_keys` enhancement.
.. _report_events:
@@ -348,6 +354,13 @@ and key release events. Normally only key press events are reported and key
repeat events are treated as key press events. See :ref:`event_types` for
details on how these are reported.
.. note::
The :kbd:`Enter`, :kbd:`Tab` and :kbd:`Backspace` keys will not have release
events unless :ref:`report_all_keys` is also set, so that the user can still
type reset at a shell prompt when a program that sets this mode ends without
resetting it.
.. _report_alternates:
Report alternate keys
@@ -482,6 +495,12 @@ must correspond to the :kbd:`Backspace` key.
All keypad keys are reported as their equivalent non-keypad keys. To
distinguish these, use the :ref:`disambiguate <disambiguate>` flag.
Terminals may choose what they want to do about functional keys that have no
legacy encoding. kitty chooses to encode these using ``CSI u`` encoding even in
legacy mode, so that they become usable even in programs that do not
understand the full kitty keyboard protocol. However, terminals may instead choose to
ignore such keys in legacy mode, or have an option to control this behavior.
.. _legacy_text:
Legacy text keys

@@ -26,8 +26,10 @@ and adding them to the command line for the next command.
You can also press :sc:`goto_file_line` to select anything that looks like a
path or filename followed by a colon and a line number and open the file in
:program:`vim` at the specified line number. The patterns and editor to be used
can be modified using options passed to the kitten. For example::
your default editor at the specified line number (opening at line number will
work only if your editor supports the +linenum command line syntax or is a
"known" editor). The patterns and editor to be used can be modified using
options passed to the kitten. For example::
map ctrl+g kitten hints --type=linenum --linenum-action=tab nvim +{line} {path}

@@ -45,15 +45,29 @@ from inside other programs to display images. In particular, :option:`--place`,
:option:`--detect-support` and :option:`--print-window-size`.
If you are trying to integrate icat into a complex program like a file manager
or editor, there are a few things to keep in mind. icat works by communicating
or editor, there are a few things to keep in mind. icat normally works by communicating
over the TTY device, it both writes to and reads from the TTY. So it is
imperative that while it is running the host program does not do any TTY I/O.
Any key presses or other input from the user on the TTY device will be
discarded. At a minimum, you should use the :option:`--transfer-mode`
command line arguments. To be really robust you should
consider writing proper support for the :doc:`kitty graphics protocol
</graphics-protocol>` in the program instead. Nowadays there are many libraries
that have support for it.
discarded. If you would instead like to use it just as a backend to generate
the escape codes for image display, you need to pass it options to tell it the
window dimensions, where to place the image in the window and the transfer mode
to use. If you do that, it will not try to communicate with the TTY device at
all. The requisite options are: :option:`--use-window-size`, :option:`--place`,
:option:`--transfer-mode` and :option:`--stdin=no`.
For example, to demonstrate usage without access to the TTY:
.. code:: sh
zsh -c 'setsid kitten icat --stdin=no --use-window-size $COLUMNS,$LINES,3000,2000 --transfer-mode=file myimage.png'
Here, ``setsid`` ensures icat has no access to the TTY device.
The values 3000 and 2000 are made up. They stand in for the window width and
height in pixels, which can only be obtained with access to the TTY.
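A host program written in Python could launch icat the same way. This is a rough sketch mirroring the shell example above: ``start_new_session=True`` makes the child call ``setsid()``, the helper names are hypothetical, and the caller is assumed to already know the window size in cells and pixels.

```python
import subprocess


def icat_argv(image, columns, lines, width_px, height_px):
    """Build the icat command line used in the shell example above."""
    return [
        "kitten", "icat",
        "--stdin=no",
        "--use-window-size", f"{columns},{lines},{width_px},{height_px}",
        "--transfer-mode=file",
        image,
    ]


def run_icat_detached(image, columns, lines, width_px, height_px):
    """Run icat in a new session so it has no controlling TTY (POSIX only)."""
    argv = icat_argv(image, columns, lines, width_px, height_px)
    return subprocess.run(argv, start_new_session=True, check=True)
```

As in the shell example, :option:`--place` is omitted here; a real integration would normally pass it as well to control where the image appears.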
To be really robust you should consider writing proper support for the
:doc:`kitty graphics protocol </graphics-protocol>` in the program instead.
Nowadays there are many libraries that have support for it.
.. include:: /generated/cli-kitten-icat.rst

@@ -10,12 +10,12 @@ Draw a GPU accelerated dock panel on your desktop
You can use this kitten to draw a GPU accelerated panel on the edge of your
screen, that shows the output from an arbitrary terminal program.
screen or as the desktop wallpaper, that shows the output from an arbitrary
terminal program.
It is useful for showing status information or notifications on your desktop
using terminal programs instead of GUI toolkits.
.. figure:: ../screenshots/panel.png
:alt: Screenshot, showing a sample panel
:align: center
@@ -28,18 +28,32 @@ The screenshot above shows a sample panel that displays the current desktop and
window title as well as miscellaneous system information such as network
activity, CPU load, date/time, etc.
.. versionadded:: 0.34.0
Support for Wayland
.. note::
This kitten currently only works on X11 desktops
This kitten currently only works on X11 desktops and Wayland compositors
that support the `wlr layer shell protocol
<https://wayland.app/protocols/wlr-layer-shell-unstable-v1#compositor-support>`__
(which is almost all of them except the, as usual, crippled GNOME).
Using this kitten is simple, for example::
kitty +kitten panel sh -c 'printf "\n\n\nHello, world."; sleep 5s'
This will show ``Hello, world.`` at the top edge of your screen for five
seconds. Here the terminal program we are running is :program:`sh` with a script
seconds. Here, the terminal program we are running is :program:`sh` with a script
to print out ``Hello, world!``. You can make the terminal program as complex as
you like, as demonstrated in the screenshot above.
If you are on Wayland, you can, for instance, run::
kitty +kitten panel --edge=background htop
to display htop as your desktop background. Remember this works in everything
but GNOME and also, in sway, you have to disable the background wallpaper as
sway renders that over the panel kitten surface.
.. include:: ../generated/cli-kitten-panel.rst

@@ -141,7 +141,7 @@ The Splits Layout
--------------------
This is the most flexible layout. You can create any arrangement of windows
by splitting exiting windows repeatedly. To best use this layout you should
by splitting existing windows repeatedly. To best use this layout you should
define a few extra key bindings in :file:`kitty.conf`::
# Create a new window splitting the space used by the existing one so that

@@ -81,7 +81,7 @@ control scripts. To run a kitten on a key press::
map f1 kitten mykitten.py
Many of kitty;s features are themselves implemented as kittens, for example,
Many of kitty's features are themselves implemented as kittens, for example,
:doc:`/kittens/unicode_input`, :doc:`/kittens/hints` and
:doc:`/kittens/themes`. To learn about writing your own kittens, see
:doc:`/kittens/custom`.
@@ -189,16 +189,46 @@ has :code:`keyboard protocol` in its title. Run the show-key kitten as::
Press :kbd:`ctrl+shift+t` and instead of a new tab opening, you will
see the key press being reported by the kitten. :code:`--when-focus-on` can test
the focused window using very powerful criteria, see :ref:`search_syntax` for
details. A more practical example unmaps the key when the focused window is running vim::
details. A more practical example unmaps the key when the focused window is
running an editor::
map --when-focus-on var:in_editor
map --when-focus-on var:in_editor kitty_mod+c
In order to make this work, you need the following lines in your :file:`.vimrc`::
In order to make this work, you need to configure your editor as shown below:
let &t_ti = &t_ti . "\\033]1337;SetUserVar=in_editor=MQo\\007"
let &t_te = &t_te . "\\033]1337;SetUserVar=in_editor\\007"
.. tab:: vim
These cause vim to set the :code:`in_editor` variable in kitty and unset it when leaving vim.
In :file:`~/.vimrc` add:
.. code-block:: vim
let &t_ti = &t_ti . "\\033]1337;SetUserVar=in_editor=MQo\\007"
let &t_te = &t_te . "\\033]1337;SetUserVar=in_editor\\007"
.. tab:: neovim
In :file:`~/.config/nvim/init.lua` add:
.. code-block:: lua
vim.api.nvim_create_autocmd({ "VimEnter", "VimResume" }, {
    group = vim.api.nvim_create_augroup("KittySetVarVimEnter", { clear = true }),
    callback = function()
        io.stdout:write("\x1b]1337;SetUserVar=in_editor=MQo\007")
    end,
})
vim.api.nvim_create_autocmd({ "VimLeave", "VimSuspend" }, {
    group = vim.api.nvim_create_augroup("KittyUnsetVarVimLeave", { clear = true }),
    callback = function()
        io.stdout:write("\x1b]1337;SetUserVar=in_editor\007")
    end,
})
These cause the editor to set the :code:`in_editor` variable in kitty and unset it when exiting.
As a result, the :kbd:`ctrl+shift+c` key will be passed to the editor instead of
copying to clipboard. In the editor, you can map it to copy to the clipboard,
thereby allowing use of a common shortcut both inside and outside the editor
for copying to clipboard.
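The value ``MQo`` in the escape codes above appears to be the base64 encoding of ``1`` followed by a newline, with the trailing padding dropped. The sketch below builds the same escape codes for an arbitrary variable; the function name is hypothetical and stripping the padding is done only to match the form used above.

```python
import base64


def set_user_var(name, value=None):
    """Build the OSC 1337 SetUserVar escape code used in the editor snippets.

    With value=None the variable is unset, as done when leaving the editor.
    """
    if value is None:
        return f"\x1b]1337;SetUserVar={name}\x07"
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    # Drop '=' padding to match the unpadded form shown in the text.
    return f"\x1b]1337;SetUserVar={name}={encoded.rstrip('=')}\x07"
```

Any program running in the window could write ``set_user_var("in_editor", "1\n")`` to its TTY on startup and ``set_user_var("in_editor")`` on exit to get the same effect as the vim and neovim snippets.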
Sending arbitrary text or keys to the program running in kitty
--------------------------------------------------------------------------------

@@ -251,8 +251,11 @@ The scrollback buffer
-----------------------
|kitty| supports scrolling back to view history, just like most terminals. You
can use either keyboard shortcuts or the mouse scroll wheel to do so. However,
|kitty| has an extra, neat feature. Sometimes you need to explore the scrollback
can use either keyboard shortcuts or the mouse scroll wheel to do so. While
you are browsing the scrollback a :opt:`small indicator <scrollback_indicator_opacity>`
is displayed along the right edge of the window to show how far back you are.
However, |kitty| has an extra, neat feature. Sometimes you need to explore the scrollback
buffer in more detail, maybe search for some text or refer to it side-by-side
while typing in a follow-up command. |kitty| allows you to do this by pressing
the :sc:`show_scrollback` shortcut, which will open the scrollback buffer in

@@ -3,10 +3,14 @@ Performance
The main goals for |kitty| performance are user perceived latency while typing
and "smoothness" while scrolling as well as CPU usage. |kitty| tries hard to
find an optimum balance for these. To that end it keeps a cache of each rendered
glyph in video RAM so that font rendering is not a bottleneck. Interaction with
child programs takes place in a separate thread from rendering, to improve
smoothness.
find an optimum balance for these. To that end it keeps a cache of each
rendered glyph in video RAM so that font rendering is not a bottleneck.
Interaction with child programs takes place in a separate thread from
rendering, to improve smoothness. Parsing of the byte stream is done using
`vector CPU instructions
<https://en.wikipedia.org/wiki/Single_instruction,_multiple_data>`__ for
maximum performance. Updates to the screen typically require sending just a few
bytes to the GPU.
There are two config options you can tune to adjust the performance,
:opt:`repaint_delay` and :opt:`input_delay`. These control the artificial delays
@@ -15,19 +19,110 @@ introduced into the render loop to reduce CPU usage. See
option to further decrease latency at the cost of some `screen tearing
<https://en.wikipedia.org/wiki/Screen_tearing>`__ while scrolling.
You can generate detailed per-function performance data using
`gperftools <https://github.com/gperftools/gperftools>`__. Build |kitty| with
``make profile``. Run kitty and perform the task you want to analyse, for
example, scrolling a large file with :program:`less`. After you quit, function
call statistics will be printed to STDOUT and you can use tools like
*KCachegrind* for more detailed analysis.
Benchmarks
-------------
Here are some CPU usage numbers for the task of scrolling a file continuously in
:program:`less`. The CPU usage is for the terminal process and X together and is
measured using :program:`htop`. The measurements are taken at the same font and
window size for all terminals on a ``Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz``
CPU with a ``Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD
7770/8760 / R7 250X]`` GPU.
Measuring terminal emulator performance is fairly subtle. There are three main
axes on which performance is measured: energy usage for typical tasks,
keyboard to screen latency, and throughput (processing large amounts of data).
Keyboard to screen latency
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is measured either with dedicated hardware, or software such as `Typometer
<https://pavelfatin.com/typometer/>`__. Third party measurements comparing
kitty with other terminal emulators on various systems show kitty has best in
class keyboard to screen latency.
Note that to minimize latency at the expense of more energy usage, use the
following settings in kitty.conf::
input_delay 0
repaint_delay 2
sync_to_monitor no
wayland_enable_ime no
`Hardware based measurements on macOS
<https://thume.ca/2020/05/20/making-a-latency-tester/>`__ show that kitty and
Apple's Terminal.app share the crown for best latency. These
measurements were done with :opt:`input_delay` at its default value of ``3 ms``
which means kitty's actual numbers would be even lower.
`Typometer based measurements on Linux
<https://github.com/kovidgoyal/kitty/issues/2701#issuecomment-911089374>`__
show that kitty has far and away the best latency of the terminals tested.
.. _throughput:
Throughput
^^^^^^^^^^^^^^^^
kitty has a builtin kitten to measure throughput. It works by dumping large
amounts of data of different types into the tty device and measuring how fast
the terminal parses and responds to it. The measurements below were taken with
the same font, font size and window size for all terminals, and default
settings, on the same computer. They clearly show kitty has the fastest
throughput. To run the tests yourself, run ``kitten __benchmark__`` in the
terminal emulator you want to test, where the kitten binary is part of the
kitty install.
The numbers are megabytes per second of data that the terminal
processes. Measurements were taken under Linux/X11 with an ``AMD Ryzen 7 PRO
5850U``. Entries are in order of decreasing average performance. On average,
kitty is roughly twice as fast as the next best.
================ ====== ======= ===== ====== =======
Terminal         ASCII  Unicode CSI   Images Average
================ ====== ======= ===== ====== =======
kitty 0.33       121.8  105.0   59.8  251.6  134.55
gnometerm 3.50.1 33.4   55.0    16.1  142.8  61.83
alacritty 0.13.1 43.1   46.5    32.5  94.1   54.05
wezterm 20230712 16.4   26.0    11.1  140.5  48.5
xterm 389        47.7   18.3    0.6   56.3   30.72
konsole 23.08.04 25.2   37.7    23.6  23.4   27.48
alacritty+tmux   30.3   7.8     14.7  46.1   24.73
================ ====== ======= ===== ====== =======
In this table, each column represents different types of data. The CSI column
is for data consisting of a mix of typical formatting escape codes and some
ASCII only text.
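The Average column appears to be the plain arithmetic mean of the four data columns. The short sketch below recomputes it from the table and derives the ratio between the top two entries; the variable names are illustrative only.

```python
# Throughput in MB/s per terminal: (ASCII, Unicode, CSI, Images),
# copied from the table above.
THROUGHPUT_MBPS = {
    "kitty 0.33": (121.8, 105.0, 59.8, 251.6),
    "gnometerm 3.50.1": (33.4, 55.0, 16.1, 142.8),
    "alacritty 0.13.1": (43.1, 46.5, 32.5, 94.1),
    "wezterm 20230712": (16.4, 26.0, 11.1, 140.5),
    "xterm 389": (47.7, 18.3, 0.6, 56.3),
    "konsole 23.08.04": (25.2, 37.7, 23.6, 23.4),
    "alacritty+tmux": (30.3, 7.8, 14.7, 46.1),
}

# Mean of the four columns for each terminal.
averages = {name: sum(v) / len(v) for name, v in THROUGHPUT_MBPS.items()}

# Terminals ordered by decreasing average, as in the table.
ranked = sorted(averages, key=averages.get, reverse=True)

# How much faster the best entry is than the runner-up, on average.
ratio = averages[ranked[0]] / averages[ranked[1]]
```

Here ``ratio`` comes out slightly above 2, which is the basis for the "twice as fast" statement. Individual columns differ: in the ASCII column, for example, the runner-up is xterm rather than gnome-terminal.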
.. note::
By default, the benchmark kitten suppresses actual rendering, to better
focus on parser speed. You can pass it the ``--render`` flag to not suppress
rendering. However, modern terminals typically render asynchronously,
therefore the numbers are not really useful for comparison, as it is just a
game about how much input to *batch* before rendering the next frame.
However, even with rendering enabled kitty is still faster than all the
rest. For brevity those numbers are not included.
.. note::
foot, iterm2 and Terminal.app are left out as they do not run under X11.
Alacritty+tmux is included just to show the effect of putting a terminal
multiplexer into the mix (halving throughput) and because alacritty isn't
remotely comparable to any of the other terminals feature wise without tmux.
.. note::
konsole, gnome-terminal and xterm do not support the `Synchronized update
<https://gitlab.com/gnachman/iterm2/-/wikis/synchronized-updates-spec>`__
escape code used to suppress rendering. If and when they gain support for it
their numbers are likely to improve by ``20 - 50%``, depending on how well they
implement it.
Energy usage
^^^^^^^^^^^^^^^^^
Sadly, I do not have the infrastructure to measure actual energy usage so CPU
usage will have to stand in for it. Here are some CPU usage numbers for the
task of scrolling a file continuously in :program:`less`. The CPU usage is for
the terminal process and X together and is measured using :program:`htop`. The
measurements are taken at the same font and window size for all terminals on a
``Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz`` CPU with a ``Advanced Micro
Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770/8760 / R7 250X]`` GPU.
============== =========================
Terminal CPU usage (X + terminal)
@@ -40,21 +135,16 @@ gnome-terminal 15 - 17%
konsole 29 - 31%
============== =========================
As you can see, |kitty| uses much less CPU than all the other terminals except
xterm, but its scrolling "smoothness" is much better than that of xterm (at
least to my, admittedly biased, eyes).
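The numbers above were read off :program:`htop`. For readers who would rather
script a comparable measurement, the following is a minimal sketch, not the
method used for the table. It assumes Linux, where ``/proc/<pid>/stat``
reports a process's user and system CPU time in clock ticks. The helper names
``total_cpu_ticks`` and ``cpu_percent`` are hypothetical.

```python
import os


def total_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line.

    The command name (field 2) is parenthesised and may contain spaces,
    so split on the last ')' before counting fields.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                elapsed_seconds: float, ticks_per_second: int) -> float:
    """CPU usage over an interval, as a percentage of one core."""
    cpu_seconds = (ticks_after - ticks_before) / ticks_per_second
    return 100.0 * cpu_seconds / elapsed_seconds


def read_ticks(pid: int) -> int:
    """Read the current tick count for pid (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return total_cpu_ticks(f.read())


# Ticks per second; 100 is a common value, used as a fallback assumption.
TICKS_PER_SECOND = os.sysconf("SC_CLK_TCK") if hasattr(os, "sysconf") else 100
```

To approximate the "X + terminal" column, sample ``read_ticks`` for both the
terminal's process and the X server's process at the start and end of the
scrolling run, then add the two ``cpu_percent`` results.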
Instrumenting kitty
-----------------------
.. _perf-cat:
.. note::
Some people have asked why kitty does not perform better than terminal XXX
in the test of sinking large amounts of data, such as catting a large text
file. The answer is that this is not a goal for kitty. kitty deliberately
throttles input parsing and output rendering to minimize resource usage
while still being able to sink output faster than any real world program can
produce it. Reducing CPU usage, and hence battery drain, while achieving
instant response times and smooth scrolling to the human eye is a far more
important goal.
You can generate detailed per-function performance data using
`gperftools <https://github.com/gperftools/gperftools>`__. Build |kitty| with
``make profile``. Run kitty and perform the task you want to analyse, for
example, scrolling a large file with :program:`less`. After you quit, function
call statistics will be displayed in *KCachegrind*. Hence, profiling is best done
on Linux, which has these tools easily available.
