V4RB, Jon, project /// More explanation.
Erik Mueller-Harder
valentina-list at vermontsoftworks.com
Wed Sep 15 14:09:48 CDT 2004
Hi, Erne --
On Sep 15, 2004, at 13:44, erne wrote:
> you just stated that Vale will convert those 1 byte chars to 2 bytes
> UTF16 internally and thus consume all 50 bytes for 25 chars only
>
> or I still don't get something?
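For UTF16, we think about, talk about, and define field sizes in terms
of the number of "characters." These characters take up two bytes each
on disk, it's true, but it's a straightforward 1:2 relationship. This is
true of *all* characters stored with UTF16 encoding, even those in the
original 128-character ASCII set. (The one exception is characters
outside Unicode's Basic Multilingual Plane, which need a four-byte
surrogate pair; they are rare in everyday text.)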
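A quick way to see the 1:2 relationship is Python's built-in `utf-16-le` codec. This is only an illustration, not Valentina code: little-endian is chosen here just so that no byte-order mark is added, and the byte order Valentina actually uses on disk isn't stated in this thread. `utf16_byte_length` is a hypothetical helper name.

```python
def utf16_byte_length(text: str) -> int:
    """Bytes needed to store `text` as UTF-16 (no byte-order mark)."""
    return len(text.encode("utf-16-le"))

for word in ["abc", "abcdéfghij", "ça"]:
    # Every character here is in the Basic Multilingual Plane,
    # so each one costs exactly two bytes, plain ASCII included.
    print(word, len(word), utf16_byte_length(word))
```

Each line prints the character count followed by a byte count exactly twice as large.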
For single-byte encodings such as Latin-1 and MacRoman, characters always
equal bytes, and that's how we're used to thinking.
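For comparison, a sketch using Python's `latin-1` and `mac_roman` codecs (again just an illustration, not Valentina code): the byte count equals the character count in both, even though the two encodings assign different byte values to "é".

```python
text = "café"  # 4 characters
for codec in ("latin-1", "mac_roman"):
    encoded = text.encode(codec)
    # One byte per character in both encodings; only the byte *values* differ
    # (é is 0xE9 in Latin-1 and 0x8E in MacRoman).
    print(codec, len(text), len(encoded), hex(encoded[-1]))
```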
UTF8 is the most complicated. We define UTF8 fields in terms of *bytes*
only. As long as our data consists only of the original ASCII values
(0 to 127), characters and bytes are the same and all is still simple.
Any other character that you store, though, takes up at least two bytes
(accented Latin letters take exactly two; many other characters take
three or four), so if you have an "é" or a "ç" in your word, it takes
up more space than you're used to.
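To check the UTF-8 cost of individual characters, a minimal sketch with Python's `utf-8` codec follows. `utf8_byte_length` is a hypothetical helper, and the euro sign is included only as an example of a character that needs more than two bytes.

```python
def utf8_byte_length(text: str) -> int:
    """Bytes needed to store `text` as UTF-8."""
    return len(text.encode("utf-8"))

for ch in ["a", "é", "ç", "€"]:
    # ASCII costs one byte; accented Latin letters two; the euro sign three.
    print(repr(ch), utf8_byte_length(ch))
```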
So, to use the original example: a UTF8 VString(10) field holds 10
*bytes*. If the 10 characters you store are "abcdefghij", all will be
well. If you attempt to store "abcdéfghij", though, only "abcdéfghi"
will fit, since that's 10 bytes long (the "é" takes two).
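The arithmetic of the VString(10) example can be sketched as a function that keeps the longest prefix whose UTF-8 encoding fits a byte limit. `fit_utf8` is hypothetical, and it assumes the field keeps whole characters up to the limit; whether Valentina truncates silently, reports an error, or does something else isn't stated in this message.

```python
def fit_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of `text` whose UTF-8 encoding fits in max_bytes.

    Hypothetical helper: it keeps whole characters only, so a multi-byte
    character that would straddle the limit is dropped entirely.
    """
    used = 0
    kept = []
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > max_bytes:
            break
        kept.append(ch)
        used += size
    return "".join(kept)

print(fit_utf8("abcdefghij", 10))  # abcdefghij
print(fit_utf8("abcdéfghij", 10))  # abcdéfghi
```

Walking character by character avoids cutting "é" in half; slicing the encoded bytes and decoding with `errors="ignore"` would give the same result here but hides that decision.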
The fact that Valentina converts everything to UTF16 for its own
processing is actually irrelevant here. Just keep thinking about the
*byte* length of your UTF8 fields and the *byte* length of the strings
you're trying to save to them.
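To put Erne's numbers into code, taking the 50-byte field from his question: 50 plain-ASCII characters occupy 100 bytes as UTF-16 but only 50 as UTF-8, and the UTF-8 count is what matters for a UTF8 field. `fits_utf8_field` is a hypothetical helper, and the UTF-16 encoding only models Valentina's internal conversion; none of this is Valentina's API.

```python
def fits_utf8_field(text: str, field_bytes: int) -> bool:
    """True if `text` fits a UTF8 field declared as `field_bytes` bytes long."""
    return len(text.encode("utf-8")) <= field_bytes

text = "a" * 50                      # 50 plain-ASCII characters
internal = text.encode("utf-16-le")  # models the internal UTF16 copy: 100 bytes
print(len(internal))
# What counts for a UTF8 field is the UTF-8 byte length, not the UTF16 copy.
print(fits_utf8_field(text, 50))
# Round-tripping through UTF-16 does not change the UTF-8 length.
print(len(internal.decode("utf-16-le").encode("utf-8")))
```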
Does this help?
-- Erik
More information about the Valentina-beta mailing list