
Python 1991

Ruby 1995

But I don't know whether Python handled Japanese text well within four years of its inception.



Python only supports native Unicode strings from Python3.


Unicode strings were added to Python in 2.0 (released in 2000). What changed in Python 3 is that they became the only string type, with what used to be non-Unicode strings effectively becoming immutable byte arrays.
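Here's a quick Python 3 illustration of that split. The sample string and the choice of Shift_JIS are just for illustration. `str` holds code points, `bytes` holds encoded octets, and the two don't mix implicitly:

```python
# Python 3: str holds Unicode code points; bytes holds raw encoded octets.
text = "日本語"                      # str: 3 code points
utf8 = text.encode("utf-8")         # bytes: 3 octets per character here
sjis = text.encode("shift_jis")     # bytes: 2 octets per character here

print(len(text), len(utf8), len(sjis))  # 3 9 6

# Decoding with the matching codec recovers the same str.
assert utf8.decode("utf-8") == sjis.decode("shift_jis") == text

# Mixing the two types is an error rather than a silent guess at the encoding.
try:
    text + utf8
except TypeError as exc:
    print("TypeError:", exc)
```

In Python 2 the analogous `u"..." + "..."` would instead try an implicit ASCII decode of the byte string, which is part of why the Python 3 split was made strict.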

However, Japanese users often don't want Unicode because of Han unification and the controversy around it. Other encodings are therefore often preferred, and languages that can handle them transparently have an advantage. It's largely for this reason that Ruby avoided a Unicode string type for a long time, treating strings mostly as raw byte arrays and supporting encodings on an ad-hoc basis in the few places that needed them.

In Ruby 1.9 (2007), they finally added encoding-aware strings. But unlike Python 3, where a string is always a sequence of Unicode code points, in Ruby the encoding can be anything: a string is bytes plus an encoding. So you can use UTF-8, but you can also use e.g. Shift_JIS (which many Japanese users prefer). The cost is that a+b is no longer always valid for two arbitrary strings: if they are in different encodings that cannot be reconciled, Ruby raises Encoding::CompatibilityError.
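To make the "bytes + encoding" model concrete, here's a rough Python sketch. `EncodedString` and `EncodingCompatibilityError` are hypothetical names, not a real API. The compatibility rule is a simplification of what Ruby actually does: the same encoding concatenates, an ASCII-only side is absorbed by an ASCII-compatible other side, and anything else raises. Ruby's real checks cover more cases.

```python
from dataclasses import dataclass

# Hypothetical model (not a real Ruby or Python API): a string is raw bytes
# plus the name of the encoding those bytes are in, loosely like Ruby 1.9+.
ASCII_COMPATIBLE = {"utf-8", "shift_jis", "euc_jp", "ascii"}


class EncodingCompatibilityError(Exception):
    """Stand-in for Ruby's Encoding::CompatibilityError."""


@dataclass(frozen=True)
class EncodedString:
    data: bytes
    encoding: str

    def ascii_only(self) -> bool:
        # ASCII-only: an ASCII-compatible encoding and no byte >= 0x80.
        return self.encoding in ASCII_COMPATIBLE and all(b < 0x80 for b in self.data)

    def __add__(self, other: "EncodedString") -> "EncodedString":
        if self.encoding == other.encoding:
            return EncodedString(self.data + other.data, self.encoding)
        # Simplified rule: pure-ASCII bytes mean the same thing in any
        # ASCII-compatible encoding, so the other side's encoding can absorb them.
        if other.ascii_only() and self.encoding in ASCII_COMPATIBLE:
            return EncodedString(self.data + other.data, self.encoding)
        if self.ascii_only() and other.encoding in ASCII_COMPATIBLE:
            return EncodedString(self.data + other.data, other.encoding)
        raise EncodingCompatibilityError(
            f"incompatible encodings: {self.encoding} and {other.encoding}"
        )

    def decode(self) -> str:
        return self.data.decode(self.encoding)


hello = EncodedString(b"hello, ", "utf-8")
jp_utf8 = EncodedString("日本".encode("utf-8"), "utf-8")
jp_sjis = EncodedString("日本".encode("shift_jis"), "shift_jis")

print((hello + jp_sjis).decode())  # ASCII prefix absorbed: hello, 日本
try:
    jp_utf8 + jp_sjis  # two non-ASCII strings in different encodings
except EncodingCompatibilityError as exc:
    print(exc)
```

The point of the sketch is the failure mode: once each string carries its own encoding, `+` has to decide at runtime whether the two byte sequences can be joined meaningfully, and sometimes the only honest answer is an error.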


I was quite amused by the design of Oniguruma [1], which is essentially Ruby's regular expression library. It contains tons of tables for each supported character encoding: segmentation, character types, case folding and the like. I found it ironic, though, that non-ASCII case folding doesn't seem to work for most CJK character encodings (in my shallow analysis).

[1] https://github.com/kkos/oniguruma
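For a concrete sense of what "non-ASCII case folding in a CJK encoding" means: Shift_JIS encodes fullwidth Latin letters like Ａ and ａ, which have Unicode case mappings. Here's a small Python sketch comparing byte-level lowercasing with decoding to `str` first and then matching with `re.IGNORECASE`. This is not how Oniguruma works internally; it just shows the two approaches side by side.

```python
import re

# Fullwidth Latin letters are non-ASCII characters that Shift_JIS does encode,
# so they are a natural test of non-ASCII case folding in a CJK encoding.
upper_sjis = "ＡＢＣ".encode("shift_jis")  # 6 bytes: 82 60 82 61 82 62 (hex)
lower_sjis = "ａｂｃ".encode("shift_jis")  # 6 bytes: 82 81 82 82 82 83 (hex)

# bytes.lower() only maps ASCII A-Z, so it never relates these two sequences.
print(upper_sjis.lower() == lower_sjis)  # False

# Decoding to str first lets Unicode case mapping apply.
pattern = re.compile(lower_sjis.decode("shift_jis"), re.IGNORECASE)
print(pattern.fullmatch(upper_sjis.decode("shift_jis")) is not None)  # True
```

A per-encoding engine like Oniguruma has to encode this kind of mapping in its own tables for each encoding instead of decoding first, which is where gaps can creep in.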


> Python only supports native Unicode strings from Python3.

"Unicode" strings were added to Python 2.0 (October 2000), they're the default since Python 3 (December 2008).

To this day, Ruby does not have "unicode" strings, it has byte strings, which since 1.9 (December 2007) can have an associated encoding.


The default string type has been a Unicode string since Python 3; a separate string type for Unicode had been part of the language since somewhere in the early 2.x releases (which happened 5+ years before 3.0, in the early 2000s). Not being the default, it of course wasn't generally supported in libraries and still a lot later than Ruby.


> Not being the default, it of course wasn't generally supported in libraries and still a lot later than Ruby.

Ruby has never had text strings at all. It added an encoding property to bytestrings in 2007.



