V4RB, Jon, project /// More explanation.

Erik Mueller-Harder valentina-list at vermontsoftworks.com
Wed Sep 15 14:09:48 CDT 2004


Hi, Erne --

On Sep 15, 2004, at 13:44, erne wrote:

> you just stated that Vale will convert those 1 byte chars to 2 bytes 
> UTF16
> internally and thus consume all 50 bytes for 25 chars only
>
> or I still don't get something?

For UTF16, we think and talk about and define the number of 
"characters."  These characters take up two bytes each on the disk, 
it's true, but it's a straightforward relationship of 1:2.  This is 
true of every character you're likely to store with UTF16 encoding, 
even the 128 original ASCII characters.  (Only the rare characters 
outside the Basic Multilingual Plane take four bytes.)
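
If you'd like to check that 1:2 relationship yourself, here's a 
minimal sketch in Python (not Valentina code; the helper name 
utf16_bytes is just made up for illustration).  It counts the bytes a 
string occupies when encoded as UTF16 without a byte-order mark:

```python
def utf16_bytes(text: str) -> int:
    """Number of bytes `text` occupies as UTF-16 (little-endian, no BOM)."""
    return len(text.encode("utf-16-le"))

# Plain ASCII and accented Latin letters both cost two bytes each.
for sample in ["a", "\u00e9", "abcdefghij", "abcd\u00e9fghij"]:
    print(len(sample), "characters ->", utf16_bytes(sample), "bytes")
```

Both ten-character strings come out at 20 bytes, accent or no accent.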

For single-byte encodings such as Latin-1 and MacRoman, characters = 
bytes always, and that's how we're used to thinking.

UTF8 is the most complicated.  We define fields in terms of *bytes* 
only.  As long as our data consists only of the old ASCII values 0 - 
127, characters and bytes are the same and all is still simple.  Any 
other character that you store, though, takes up two or more bytes; 
so if you have a "é" or a "ç" in your word, it takes up more space 
than you're used to thinking of.
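
To see the UTF8 sizes for yourself, here's a rough sketch in Python 
(again not Valentina code; utf8_bytes is a hypothetical helper name). 
It simply encodes each character and counts the bytes:

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

# "\u00e9" is e-acute and "\u00e7" is c-cedilla, written as escapes so the
# precomposed form is used regardless of how this file is saved.
for ch in ["a", "\u00e9", "\u00e7"]:
    print(repr(ch), "->", utf8_bytes(ch), "byte(s)")
```

The plain "a" is one byte; the two accented letters are two bytes 
each.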

So, to use the original example, if you store 10 characters in a 
VString(10) field, and if those 10 characters are "abcdefghij", then 
all will be well.  If you attempt to store "abcdéfghij", though, only 
"abcdéfghi" will fit, since that's already 10 bytes long.

The fact that Valentina converts everything to UTF16 for its own 
processing is actually irrelevant -- just keep thinking about the 
*byte*-length of your UTF8 fields and the *byte*-length of the 
strings you're trying to save to them.
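
If it helps to think in *bytes* before saving, here's a small sketch 
in Python of the arithmetic behind the "abcdéfghij" example.  It is 
only an illustration under my own assumptions: fit_utf8 is a made-up 
helper, and I'm not claiming this is how Valentina itself trims a 
string.  It keeps whole characters for as long as their UTF8 bytes fit 
in the field:

```python
def fit_utf8(text: str, max_bytes: int) -> str:
    """Longest prefix of `text` whose UTF-8 encoding fits in `max_bytes`.

    Illustration only: whole characters are kept or dropped, so a
    multi-byte character is never cut in half.
    """
    kept = []
    used = 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > max_bytes:
            break
        kept.append(ch)
        used += size
    return "".join(kept)

# A 10-byte field, as in the VString(10) example.
print(fit_utf8("abcdefghij", 10))        # all ten characters fit
print(fit_utf8("abcd\u00e9fghij", 10))   # the final "j" is dropped
```

You could run the same check on your own strings before saving them, 
to see what would be lost.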

Does this help?

-- Erik

