30 Commits

Author SHA1 Message Date
pagedown
76669ad14d Note the Unicode version in the generated files
Files generated from the same Unicode version will be consistent
regardless of the date they were built.
2022-11-18 13:01:32 +08:00
Kovid Goyal
e5e8cc72c6 Make the Unicode database version used available 2022-11-17 20:11:50 +05:30
Kovid Goyal
6b04c42730 Update wcswidth Go version to match Unicode 15 update in master 2022-11-14 15:42:03 +05:30
pagedown
13a3c6b5b2 Update to Unicode 15.0 2022-09-29 10:13:21 +08:00
Kovid Goyal
e8b19e08fa Fix non-renderable combining chars causing some text to not be rendered on Linux
The test for non-renderable chars was broken and the variation selectors
were not included in the test. Fixes #4444
2022-01-05 22:33:53 +05:30
Kovid Goyal
d875615c03 Fix a regression in the handling of some combining characters such as zero width joiners
Fixes #4439
2022-01-05 08:50:55 +05:30
Kovid Goyal
fbf47f75d5 Fix soft hyphens not being preserved when round tripping text through the terminal
Also roundtrip all characters in the Cf category.

Characters with the DI (Default Ignorable) property are now
preserved but not rendered and treated as zero-width
as per the unicode standard.
See https://www.unicode.org/faq/unsup_char.html
2021-10-07 12:44:22 +05:30
Kovid Goyal
31e623afb3 Add support for Unicode 14
Fixes #3542
2021-10-04 14:00:35 +05:30
Kovid Goyal
3633049ba5 Forgot to include \r in the url regex 2021-07-19 18:09:00 +05:30
Kovid Goyal
ff1585acfe Unicode input: Make diamond a synonym for gem
Fixes #3437
2021-04-02 12:53:58 +05:30
Kovid Goyal
d09666aba9 Unicode input kitten: Add symbols from NERD font
These are mostly Private Use symbols not in any standard,
however they are common enough to be useful.

Fixes #2972
2020-09-22 19:47:39 +05:30
Kovid Goyal
628b92f20b Speed up is_ignored_char in the common case 2020-08-06 18:05:33 +05:30
Kovid Goyal
a835b56a51 Speed up is_combining_char() in the common case 2020-08-06 17:45:40 +05:30
Kovid Goyal
24197dc422 Render known country flags designated by a pair of unicode codepoints in two cells instead of four. 2020-04-06 22:16:59 +05:30
Kovid Goyal
bf4e8c490c Update to Unicode 13.0
Fixes #2513
2020-04-06 18:59:35 +05:30
Kovid Goyal
b709ee6842 Add a function to check if a codepoint is a symbol 2019-10-01 18:57:06 +05:30
Kovid Goyal
8e1ed2f8c3 Update unicode data to 12.1 2019-08-02 14:48:18 +05:30
Kovid Goyal
facd353228 Update to using the Unicode 12 standard 2019-03-06 13:58:16 +05:30
Kovid Goyal
094ddd9333 Round-trip the zwj unicode character
Rendering of sequences containing zwj is still not implemented, since it
can cause the collapse of an unbounded number of characters into a
single cell. However, kitty at least preserves the zwj by storing it as
a combining character.
2018-08-04 18:29:45 +05:30
Kovid Goyal
000c1cf306 Implement support for emoji skin tone modifiers
Fixes #787
2018-08-04 10:06:25 +05:30
Kovid Goyal
61dd52b50f Ignore the non-characters from the unicode standard in addition to ignoring the control characters 2018-06-14 10:20:13 +05:30
Kovid Goyal
0b93b85cf2 Don't use case range in names.h as it prevents compilation with Visual Studio 2018-05-01 11:27:10 +05:30
Kovid Goyal
f7001ea068 Fix character names for control characters not being read from unicode database
Also allow unicode_names.c to be compiled with python 2 so I can re-use
it in calibre.
2018-05-01 10:13:58 +05:30
Kovid Goyal
0b99bb534f Unicode input: When searching by name, search for prefix matches as well as whole word matches
So now hori matches both "hori" and "horizontal". Switched to a
prefix-trie internally.
2018-04-24 07:45:20 +05:30
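
The prefix matching described in 0b99bb534f can be illustrated with a small trie over the words of character names. This is only a sketch of the general technique, not the code from that commit: `TrieNode`, `build_index`, `search` and the sample `names` table are hypothetical names introduced here, and storing a codepoint set on every node is a simplification that trades memory for brevity.

```python
class TrieNode:
    """One node per character; remembers every codepoint reachable through it."""

    def __init__(self):
        self.children = {}
        self.codepoints = set()


def build_index(names):
    """Index each word of each character name under all of its prefixes."""
    root = TrieNode()
    for codepoint, name in names.items():
        for word in name.lower().split():
            node = root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
                node.codepoints.add(codepoint)
    return root


def search(root, query):
    """Return codepoints whose name contains a word starting with query."""
    node = root
    for ch in query.lower():
        node = node.children.get(ch)
        if node is None:
            return set()
    return set(node.codepoints)


# Illustrative data only; a real index would be built from the Unicode database.
names = {
    0x2015: 'HORIZONTAL BAR',
    0x2026: 'HORIZONTAL ELLIPSIS',
    0x2502: 'BOX DRAWINGS LIGHT VERTICAL',
}
index = build_index(names)
matches = search(index, 'hori')
```

With this index, the query "hori" finds both characters whose names contain the word HORIZONTAL, and a whole word such as "bar" is found the same way because a word is a prefix of itself.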
Kovid Goyal
8c18486836 Module with all the data for unicode entry by character name 2018-02-09 19:56:25 +05:30
Kovid Goyal
ff2e5b3966 Avoid unnecessary calls to mark_for_codepoint 2018-02-06 11:23:39 +05:30
Kovid Goyal
fbe4d036d8 Have wcwidth() return 0 for marks instead of -1
Since kitty always treats marks as combining chars, this allows us to
remove a few unnecessary branches
2018-02-05 10:06:05 +05:30
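
The effect of returning 0 for marks in fbe4d036d8 can be sketched as follows. This is an approximation built on Python's `unicodedata` module, not kitty's generated tables: the `wcwidth` and `cells_needed` functions below are hypothetical, and they assume that the general category M* identifies marks and that East Asian Width W or F means two cells.

```python
import unicodedata


def wcwidth(ch):
    """Approximate cell width of one character (assumption-laden sketch)."""
    category = unicodedata.category(ch)
    if category.startswith('M'):
        # Marks combine with the preceding character, so they add no cells.
        return 0
    if category == 'Cc':
        # Control characters have no defined width.
        return -1
    if unicodedata.east_asian_width(ch) in ('W', 'F'):
        return 2
    return 1


def cells_needed(text):
    """Sum the widths; text is assumed to contain no control characters."""
    # Because marks report 0 rather than -1, no special case is needed here.
    return sum(wcwidth(ch) for ch in text)
```

For example, `cells_needed('e\u0301')` counts one cell for the base letter and none for the combining acute accent.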
Kovid Goyal
fc7ec1d3f7 Get rid of the option to use the system wcwidth
The system wcwidth() is often wrong. Not to mention that if you SSH into
a different machine, then you have a potentially different wcwidth. The
only sane way to deal with this is to use the unicode standard.
2018-02-04 21:02:30 +05:30
Kovid Goyal
32632264ee Mapping that can be used to store unicode mark symbols in only two bytes 2018-01-18 16:06:07 +05:30
Kovid Goyal
5faa649452 Drop the dependency on libunistring 2018-01-18 00:09:40 +05:30