The Absolute Minimum Every Software Developer Must Know About Unicode in 2023 (Still No Excuses!) (tonsky.me)
Posted by snaggen@programming.dev to Programming@programming.dev · 1 year ago · 41 comments
abhibeckert@lemmy.world · 1 year ago (edited)
I love the comparison of string length of the same UTF-8 string in four programming languages (only the last one is correct, by the way):
Python 3: len("🤦🏼‍♂️") gives 5
JavaScript / Java / C#: "🤦🏼‍♂️".length gives 7
Rust: println!("{}", "🤦🏼‍♂️".len()); prints 17
Swift: print("🤦🏼‍♂️".count) prints 1
Walnut356@programming.dev · 1 year ago (edited)
That depends on your definition of correct lmao. Rust's len() explicitly counts UTF-8 code units, because that's the length of the raw bytes contained in the string. There are many times where that value is more useful than the grapheme count.
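The three numeric answers quoted in this thread can be reproduced from a single language, which makes it easier to see that each one measures a different unit. The following is a minimal Python sketch, not code from the article: the names FACEPALM and length_by_unit are made up for illustration, and the emoji is assumed to be the five-code-point sequence U+1F926 U+1F3FC U+200D U+2642 U+FE0F. The grapheme count (Swift's 1) is left out because the Python standard library has no grapheme segmentation; that would need a third-party package.

```python
# Facepalm + skin tone + zero-width joiner + male sign + variation selector.
# Written with escapes so the invisible joiner cannot be lost in copy/paste.
FACEPALM = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"


def length_by_unit(text: str) -> dict[str, int]:
    """Return the length of `text` measured in three different units."""
    return {
        # What Python 3's len() reports: Unicode code points.
        "code_points": len(text),
        # What JavaScript / Java / C# report: UTF-16 code units (2 bytes each).
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # What Rust's str::len() reports: bytes of the UTF-8 encoding.
        "utf8_bytes": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    for unit, count in length_by_unit(FACEPALM).items():
        print(f"{unit}: {count}")
```

Which of these is "correct" depends on the job: the byte count is what matters for buffer sizes and storage limits, while the grapheme count is closer to what a user perceives as one character.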