mirror of
https://github.com/shaka-project/shaka-player.git
synced 2026-06-26 17:46:26 +03:00
f151f55da9
A regex in the TTML parser was looking for "word" characters (\w),
which is short-hand for non-whitespace ASCII characters, and excludes
Unicode characters and symbols from non-English languages.
There is a better option in Regex in some browsers, which is /\p{L}/u.
This is not supported universally yet, and defining an equivalent
regex constant using character codes is over 4kB of extra code.
It seems that the original author of that code meant "word" (\w) in
the sense of "non-whitespace" (\S), so we can just use \S instead.
This change also adds a regression test that doesn't depend on the
specifics of the regular expression used.
Closes #2478
Change-Id: I794258a797ba5d26a7bd8fc0a5244adc70c8a1ef