I'm thinking even bog-standard European umlauts, cedillas, etc go multi-byte in ... | Hacker News


CRConrad on June 6, 2024 | on: How to chop off bytes of an UTF-8 string to fit in...

I'm thinking even bog-standard European umlauts, cedillas, etc. go multi-byte in UTF-8? (Take a string of ÅÄÖåäöÜü and chop it off at various byte limits and see.)
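The experiment suggested here can be sketched in a few lines of Python. This is only an illustration: `SAMPLE` and `truncate_utf8` are names made up for this sketch, and the string is normalized to NFC first on the assumption that the precomposed letters are what is meant (each is then one code point and two bytes in UTF-8).

```python
import unicodedata

# NFC forces the precomposed forms, so each letter is one code point
# and two bytes in UTF-8.
SAMPLE = unicodedata.normalize("NFC", "ÅÄÖåäöÜü")


def truncate_utf8(text: str, limit: int) -> str:
    """Hypothetical helper: keep at most `limit` UTF-8 bytes of `text`,
    dropping a trailing partial sequence instead of emitting garbage."""
    data = text.encode("utf-8")[:max(0, limit)]
    # errors="ignore" discards the incomplete sequence left by the cut.
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    raw = SAMPLE.encode("utf-8")
    print(len(SAMPLE), "code points,", len(raw), "bytes")
    for limit in range(1, 6):
        # A naive byte cut; odd limits split a letter in half and
        # show up as U+FFFD once decoded with errors="replace".
        naive = raw[:limit].decode("utf-8", errors="replace")
        print(limit, repr(naive), repr(truncate_utf8(SAMPLE, limit)))
```

Every odd byte limit lands in the middle of a two-byte sequence, so a naive cut leaves a dangling lead byte, while the helper simply returns one letter fewer.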

gmueckl on June 6, 2024

This is just the general behavior of truncating strings by code point when they contain decomposed characters: the cut can land between a base letter and its combining mark. This can also affect accents etc.
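A rough Python sketch of that effect, for anyone who wants to try it. The word "Büro" and both helper names are assumptions made up for this example. `truncate_keep_marks` only looks at `unicodedata.combining`, so it is not full grapheme cluster segmentation as described in UAX #29.

```python
import unicodedata

WORD_NFC = unicodedata.normalize("NFC", "Büro")  # ü is one code point
WORD_NFD = unicodedata.normalize("NFD", "Büro")  # ü is 'u' + U+0308


def truncate_code_points(text: str, count: int) -> str:
    """Plain code point truncation: a Python str slice."""
    return text[:max(0, count)]


def truncate_keep_marks(text: str, count: int) -> str:
    """Hypothetical helper: back the cut off so it never lands directly
    before a combining mark. Not full grapheme cluster segmentation."""
    end = max(0, min(count, len(text)))
    # If the first dropped code point is a combining mark, the cut
    # would orphan it from its base letter, so step back past the base.
    while 0 < end < len(text) and unicodedata.combining(text[end]):
        end -= 1
    return text[:end]


if __name__ == "__main__":
    print(len(WORD_NFC), len(WORD_NFD))
    print(repr(truncate_code_points(WORD_NFC, 2)))
    print(repr(truncate_code_points(WORD_NFD, 2)))
    print(repr(truncate_keep_marks(WORD_NFD, 2)))
```

Cutting the decomposed form at two code points keeps the bare "u" and silently drops the diaeresis, whereas the precomposed form keeps "ü" intact. The mark-aware variant gives up the whole letter instead of changing it.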

panzi on June 6, 2024

I don't remember the details, only that it was a bigger deal than with umlauts. I'll see if I can find the talk again.
