✨ geoff, but witchy ✨ is on witches.town.

Dumb Unicode question:

I know that 'code points' are not characters or glyphs; they include control characters, etc.

Is there any standardised abstraction within Unicode, or within systems that handle Unicode, for something approximating an 'actual glyph'?

E.g. a UTF-8 string, or a series of 32-bit code points, that together unambiguously define one visual glyph?
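
To make the gap concrete, here is a small Python sketch that prints how many code points and UTF-8 bytes sit behind strings that each render as a single glyph in most fonts. The sample strings and the `samples` name are illustrative choices, not anything from the thread.

```python
# Each string below renders as one visible glyph in most fonts.
samples = {
    "precomposed e-acute": "\u00e9",                        # é as a single code point
    "decomposed e-acute": "e\u0301",                        # e + COMBINING ACUTE ACCENT
    "woman technologist": "\U0001F469\u200d\U0001F4BB",     # 👩 + ZERO WIDTH JOINER + 💻
}

for label, s in samples.items():
    cps = " ".join(f"U+{ord(ch):04X}" for ch in s)
    print(f"{label}: {len(s)} code point(s), "
          f"{len(s.encode('utf-8'))} UTF-8 byte(s): {cps}")

# precomposed e-acute: 1 code point(s), 2 UTF-8 byte(s): U+00E9
# decomposed e-acute: 2 code point(s), 3 UTF-8 byte(s): U+0065 U+0301
# woman technologist: 3 code point(s), 11 UTF-8 byte(s): U+1F469 U+200D U+1F4BB
```

The same visible "é" can be one or two code points, so neither the code point count nor the byte count tells you how many glyphs a string contains.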

Are there standard ways of isolating and handling such a unit, roughly the equivalent of a 'character'?

@natecull Elixir does this. It refers to the final rendered glyphs as `graphemes`.
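
Elixir's `graphemes` correspond to what Unicode calls extended grapheme clusters, defined in Unicode Standard Annex #29. Python's standard library has no grapheme segmenter, so the sketch below is a deliberately rough approximation, assuming only the common cases matter: it glues combining marks, variation selectors, and emoji skin-tone modifiers onto the preceding character, and joins across ZERO WIDTH JOINER. `rough_graphemes` and `is_extender` are hypothetical helper names, and this is not a full UAX #29 implementation.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def is_extender(ch: str) -> bool:
    """Rough stand-in for UAX #29 'Extend': marks, variation selectors, skin tones."""
    cp = ord(ch)
    return (
        unicodedata.category(ch) in ("Mn", "Mc", "Me")  # combining marks
        or 0xFE00 <= cp <= 0xFE0F                        # variation selectors
        or 0x1F3FB <= cp <= 0x1F3FF                      # emoji skin-tone modifiers
    )

def rough_graphemes(s: str) -> list[str]:
    """Split s into approximate user-perceived characters.

    Simplified sketch, NOT full UAX #29: it ignores Hangul syllable rules,
    regional-indicator (flag) pairs, CR LF, and it joins anything across a
    ZWJ, whereas UAX #29 only does that for emoji.
    """
    clusters: list[str] = []
    for ch in s:
        if clusters and (is_extender(ch) or ch == ZWJ or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch   # glue onto the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

text = "ne\u0301e \U0001F469\u200d\U0001F4BB!"
print(len(text), len(rough_graphemes(text)))  # 9 6
```

For real use, prefer an implementation of the full UAX #29 rules, such as Elixir's `String.graphemes/1` or the `\X` pattern in Python's third-party `regex` module.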

@cooler_ranch It seems like we need a WHOLE lot of new 'string normalisation' standards for Unicode, to detect canonical forms of visually equivalent characters.

Sorta like 'lowercasing' a string, but for non-Latin chars.
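
For a sense of where the existing pieces stop, here is a Python sketch using the standard `unicodedata` module, which exposes Unicode's normalization forms (NFC, NFD, NFKC, NFKD), and `str.casefold()`, which applies Unicode case folding to every script, not just Latin. It assumes loose, case-insensitive matching is the goal, and `loose_key` is a hypothetical helper name. Ligatures, full-width letters, decomposed accents and ß/SS all collapse to the same key. Cross-script look-alikes such as Latin "a" and Cyrillic "а" stay distinct, because normalization does not cover them; Unicode handles those separately in the confusables data of UTS #39.

```python
import unicodedata

def loose_key(s: str) -> str:
    """Hypothetical helper: a key for loose, case-insensitive comparison.

    NFKC maps compatibility variants (ligatures, full-width letters) to plain
    forms and composes decomposed accents; casefold() is Unicode case folding.
    Normalizing again after folding keeps the result in NFKC form.
    """
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", s).casefold())

# Each pair below looks (or reads) the same and gets the same key.
pairs = [
    ("Straße", "STRASSE"),           # ß case-folds to "ss"
    ("\ufb01le", "file"),            # U+FB01 LATIN SMALL LIGATURE FI
    ("\uff21\uff22\uff23", "abc"),   # full-width ＡＢＣ
    ("e\u0301", "\u00e9"),           # decomposed vs precomposed é
    ("ПРИВЕТ", "привет"),            # case folding is not Latin-only
]
for a, b in pairs:
    print(loose_key(a) == loose_key(b))  # True for every pair

# Not covered: Latin "a" and Cyrillic "а" (U+0430) look alike but stay distinct.
print(loose_key("a") == loose_key("\u0430"))  # False
```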