Thou shalt have no other gods before the ANSI C standard 1355
If you were to pack the currently assigned Unicode code points and remove the private use areas, then yes, you would be right. But the private use areas (which are themselves 17 bits' worth) are there for a reason, and the Unicode consortium obviously wants to leave room to grow:
For example, see:
Ok, for more serious examples see:
You see, the Unicode consortium already made one big mistake a long time ago (when they were a US-centric committee just interested in expanding commerce into Europe and a few other places) by assuming that they would be able to encode all the "important" characters in just 16 bits. They made a big push for adoption in the early 90s on the strength of that assumption, which of course pissed off the CJK community, who were looking at more than 65K symbols all by themselves and wondered exactly what good this Unicode standard was for them (and how it was better than what they were already using at the time, i.e., Big5).
A parallel ISO group (ISO 10646) was developing a 31-bit standard, but they were way behind Unicode in the number of code points they had assigned -- their trick was to be "backward compatible with Unicode" from the start, without being held back by the dumb 16-bit mistake of the Unicode group. I don't know how it happened (I am just piecing together the history from fossilized records :) ) but the two finally merged in a way that expanded Unicode's UCS-2 format to handle up to a little more than 20 bits (and called this UTF-16), while ISO 10646 simply truncated the valid ranges for its formats UCS-4, UTF-8 and UTF-32 to exactly the same range as UTF-16.
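For anyone wondering where "a little more than 20 bits" comes from, here is a rough sketch of the UTF-16 surrogate arithmetic in Python. The function name `to_surrogates` and the constant `UTF16_CODE_SPACE` are made up for illustration; only the ranges themselves come from the standard.

```python
# Illustrative sketch only; to_surrogates is a hypothetical helper name.
def to_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF)
    into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000             # what is left fits in 20 bits
    high = 0xD800 | (v >> 10)    # top 10 bits go in the high surrogate
    low = 0xDC00 | (v & 0x3FF)   # bottom 10 bits go in the low surrogate
    return high, low

# Two 10-bit halves give 2**20 supplementary code points, on top of the
# 2**16 code points of the original 16-bit range.
UTF16_CODE_SPACE = 2**16 + 2**20   # 0x110000, i.e. U+0000..U+10FFFF
```

So the ceiling of U+10FFFF is just what two 10-bit surrogate halves can reach beyond the first 64K, which is presumably why the other encodings were cut back to match it.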
So, I don't think the Unicode committee is keen on making another mistake by shortening their character encoding space to just 18 bits.
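To put rough numbers on the bit counts being thrown around, here is a small Python sketch. The helper `bits_needed` and the constant names are my own, and the private-use ranges are the ones I believe the standard defines (U+E000..U+F8FF, U+F0000..U+FFFFD, U+100000..U+10FFFD).

```python
# Illustrative arithmetic; bits_needed and the constant names are hypothetical.
def bits_needed(count):
    """Smallest number of bits that can index `count` distinct values."""
    return max(count - 1, 0).bit_length()

# Private-use ranges (assumed from the Unicode standard).
PUA_BMP = 0xF8FF - 0xE000 + 1             # 6,400 code points
PUA_PLANE_15 = 0xFFFFD - 0xF0000 + 1      # 65,534 code points
PUA_PLANE_16 = 0x10FFFD - 0x100000 + 1    # 65,534 code points

SUPPLEMENTARY_PUA = PUA_PLANE_15 + PUA_PLANE_16   # 131,068 code points
FULL_CODE_SPACE = 0x10FFFF + 1                    # 1,114,112 code points

print(bits_needed(SUPPLEMENTARY_PUA))   # bits for the two private-use planes
print(bits_needed(FULL_CODE_SPACE))     # bits for the whole code space
print(2**18)                            # capacity of an 18-bit space
```

The two supplementary private-use planes alone account for the "17 bits' worth", the full code space needs 21 bits, and an 18-bit space would hold only 262,144 code points, which would not leave much room to grow.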
The Unicode standard is very close to complete coverage, except for the one really troubling mistake with the CJK ranges that I posted about earlier. If enough people start raising a stink about it, I assume they'll start eating into their code space to fix it -- but it would represent kind of a big change in the standard.
In the meantime we're stuck with having to have things like:
Or similar things in our web pages whenever Unicode just isn't good enough for us to display what we want.
--- Paul Hsieh
Alt Folklore Computers Newsgroups