Are you losst?

Stamets@startrek.website · edit-2 8 months ago

Are you losst?

sarmale@lemmy.zip · 8 months ago

How many unicode characters could you add to the standard until it becomes unreliable?

Kerb@discuss.tchncs.de · edit-2 8 months ago

Apparently Unicode supports about 1.1 million characters, and we ~~currently only use 96,382 as of version 4.0~~

EDIT: I just read that Unicode 4.0 is very outdated; the current version is Unicode 15.1 with 149,878 characters.

dual_sport_dork@lemmy.world · 8 months ago

A Unicode character can be up to 4 bytes, so 2^32 or 4,294,967,296 potential unique characters. And it’d be easy enough to adjust the standard to allow for an extra byte or more if necessary – it’s been done before.

Turun@feddit.de · edit-2 8 months ago

This is incorrect. While a character (actually a code point) takes 4 bytes in UTF-32 and up to 4 bytes in UTF-8, the Unicode standard is limited to 17*2^16 = 1,114,112 code points. (edit: apparently because that is the limit of UTF-16. 4-byte UTF-8 can encode 2^21 code points, and the original UTF-8 design was not limited to four bytes: with sequences of up to six bytes it could encode 2^31 code points.)
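
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The variable names are made up for illustration, and the bit counts assume the usual UTF-8 layout (3 payload bits in the lead byte of a 4-byte sequence, 1 in the lead byte of the old 6-byte form, 6 in every continuation byte).

```python
import sys

# Unicode's code space: 17 planes of 2**16 code points each.
unicode_limit = 17 * 2**16
print(unicode_limit)                      # 1114112
print(unicode_limit == 0x10FFFF + 1)      # True: U+0000 through U+10FFFF

# Python exposes the highest valid code point directly.
print(sys.maxunicode == 0x10FFFF)         # True

# Where the UTF-16 limit comes from: 2**16 values reachable with one
# 16-bit unit, plus 2**10 * 2**10 surrogate-pair combinations.
utf16_limit = 2**16 + 2**10 * 2**10
print(utf16_limit == unicode_limit)       # True

# Payload bits available in UTF-8 sequences (assumed layout, see above).
four_byte_bits = 3 + 3 * 6                # modern 4-byte maximum
six_byte_bits = 1 + 5 * 6                 # original 6-byte design
print(four_byte_bits, six_byte_bits)      # 21 31
```

So the 4-byte form has room for 2^21 values, but only the first 1,114,112 of them are valid code points.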

Unicode is the standard that says “the thing we call capital A is character number 65”, literally defining a mapping from numbers to concepts.
UTF-8 and UTF-32 are ways to encode a list of those numbers in a more (UTF-8) or less (UTF-32) space-efficient way.
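
To make the distinction concrete, a small Python sketch: `ord` gives the Unicode number for a character, and `str.encode` turns the same list of numbers into bytes under different encodings. The sample string is just an arbitrary pick (capital A, the euro sign, and an emoji), written with escapes so it survives copy-paste.

```python
# "A", the euro sign (U+20AC) and an emoji (U+1F600).
text = "A\u20ac\U0001F600"

# Unicode itself: each character maps to a number (code point).
code_points = [ord(ch) for ch in text]
print(code_points)                # [65, 8364, 128512]

# UTF-8: variable width, 1 to 4 bytes per code point.
utf8 = text.encode("utf-8")
print(len(utf8))                  # 8  (1 + 3 + 4 bytes)

# UTF-32: fixed width, always 4 bytes per code point.
# "utf-32-be" is used so no byte order mark is prepended.
utf32 = text.encode("utf-32-be")
print(len(utf32))                 # 12 (3 code points * 4 bytes)

# Both encodings decode back to the same sequence of code points.
print(utf8.decode("utf-8") == utf32.decode("utf-32-be"))  # True
```

Same three numbers in both cases; only the byte representation differs.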