Modified `start_classification` in `utf8_decode_to_esc` in `simd-string-impl.h`, which now:
Rejects the lead bytes `0xC0`, `0xC1` and `0xF5..0xFF`, which can never start a valid UTF-8 sequence.
Enforces restricted ranges for the second byte after the lead bytes `0xE0`, `0xED`, `0xF0` and `0xF4` to prevent overlong encodings, surrogates, and code points above U+10FFFF.
Accumulates UTF-8 validation errors in a single vector to avoid many conditional branches.
Worsens Unicode benchmark performance by about 4%.
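
For reviewers, the rules above can be sketched in scalar form. This is only an illustration, not the SIMD code in `simd-string-impl.h`: the names `ByteRange`, `sequence_length`, `second_byte_range` and `is_valid_utf8` are hypothetical, and the single `errors` integer stands in for the error vector. The scalar loop still branches for control flow; only the validity checks are folded into the accumulator.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of the checks; the real change operates on SIMD vectors.
struct ByteRange { std::uint8_t lo, hi; };

// Sequence length implied by a lead byte, or 0 if the byte can never lead:
// 0x80..0xBF (stray continuation), 0xC0, 0xC1, and 0xF5..0xFF.
constexpr int sequence_length(std::uint8_t lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

// Allowed range for the second byte of a multi-byte sequence.
constexpr ByteRange second_byte_range(std::uint8_t lead) {
    if (lead == 0xE0) return {0xA0, 0xBF};  // lower values are overlong 3-byte forms
    if (lead == 0xED) return {0x80, 0x9F};  // higher values encode surrogates
    if (lead == 0xF0) return {0x90, 0xBF};  // lower values are overlong 4-byte forms
    if (lead == 0xF4) return {0x80, 0x8F};  // higher values exceed U+10FFFF
    return {0x80, 0xBF};                    // ordinary continuation byte
}

constexpr bool is_valid_utf8(const char* s, std::size_t n) {
    unsigned errors = 0;  // accumulated error flag, inspected once at the end
    std::size_t i = 0;
    while (i < n) {
        const auto lead = static_cast<std::uint8_t>(s[i]);
        int len = sequence_length(lead);
        errors |= static_cast<unsigned>(len == 0);
        if (len == 0) len = 1;  // step over the bad byte and keep scanning
        if (static_cast<std::size_t>(len) > n - i) {
            errors |= 1u;  // sequence is cut off by the end of input
            break;
        }
        for (int k = 1; k < len; ++k) {
            const auto b = static_cast<std::uint8_t>(s[i + k]);
            const ByteRange r =
                (k == 1) ? second_byte_range(lead) : ByteRange{0x80, 0xBF};
            errors |= static_cast<unsigned>(b < r.lo) | static_cast<unsigned>(b > r.hi);
        }
        i += static_cast<std::size_t>(len);
    }
    return errors == 0;
}
```

Under this sketch, `is_valid_utf8("\xED\xA0\x80", 3)` returns `false` because `0xA0` falls outside the `0x80..0x9F` range allowed after `0xED`, while `is_valid_utf8("\xF4\x8F\xBF\xBF", 4)` (U+10FFFF) returns `true`.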