Unicode and length of field

Ruslan Zasukhin sunshine at public.kherson.ua
Mon May 12 19:10:30 CDT 2003


Hi,

I am just thinking aloud:

-- Assume we have a String[N] / VarChar[N] field.
   Up to now we have considered N as the maximum number of characters.

-- For a UTF8 String[N] / VarChar[N]
    N will play the role of length in bytes.
    The problem is that chars > 127 stored here take 2 or more bytes each,
    while other chars still use one byte.
    We get a mess...
    
    The only way is to consider N as the length in bytes.
    Valentina also adds one byte to keep the END ZERO character.

    It is a bad idea to try to treat N as characters here.
    Because that way, Valentina has to assume that if you say
    String[30], it must allocate at least 60 bytes to be able to store the
    worst case (assuming at most 2 bytes per char).
    But if you then store MacRoman text, you simply lose 30 bytes.
    Of course this is bad; the developer must be able to control each bit
    in the database.
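
    To make the UTF8 point concrete, here is a small Python sketch (not
    Valentina code; the helper names utf8_size and fits_in_field are
    made up for illustration). It shows that the byte count differs from
    the char count as soon as non-ASCII chars appear, so a byte budget N
    cannot be translated into a fixed number of characters.

```python
def utf8_size(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_field(text, n_bytes):
    """Hypothetical check: does the text fit a field declared with N bytes?

    The END ZERO byte is assumed to be added by the engine on top of N,
    so it is not counted here.
    """
    return utf8_size(text) <= n_bytes


samples = ["hello", "caf\u00e9", "привет", "日本語"]
for s in samples:
    # chars vs. bytes: ASCII is 1 byte, Latin-1/Cyrillic 2 bytes, CJK 3 bytes
    print(len(s), utf8_size(s))
```

    With N = 6, for example, "hello" fits, while the 6-char Cyrillic word
    needs 12 bytes and does not.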

-- For UTF16, chars take 2 bytes each (ignoring surrogate pairs).
    So for such a field, if we say String[30],
    Valentina can allocate (30 + 1) * 2 = 62 bytes,
    but then N again means characters...
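
    The arithmetic for the UTF16 case can be sketched like this in Python
    (again only an illustration; utf16_alloc_bytes and utf16_size are
    hypothetical names). Note that chars outside the Basic Multilingual
    Plane are stored as surrogate pairs and take 4 bytes, so "2 bytes per
    char" holds only for BMP text.

```python
def utf16_alloc_bytes(n_chars):
    """Bytes to allocate for String[N] when N counts 2-byte chars,
    plus one 2-byte END ZERO terminator."""
    return (n_chars + 1) * 2


def utf16_size(text):
    """Bytes the text occupies as UTF-16 (little-endian, no BOM)."""
    return len(text.encode("utf-16-le"))


print(utf16_alloc_bytes(30))       # (30 + 1) * 2
print(utf16_size("hello"))         # 5 chars, 2 bytes each
print(utf16_size("\U0001F600"))    # one char outside the BMP, surrogate pair
```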

        
So we have 2 ways:

1) Treat everything as I have described above. In this case UTF8 is the
EXCEPTION to the rule.

2) To make things more consistent, we can require the UTF16 length in bytes
as well. Then if you need 30 chars you say String[60], and Valentina itself
adds 2 bytes for END ZERO.
This way both Unicode encodings, UTF8 and UTF16, play by the same rules,
which differ from the old Strings.
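
To compare the two ways side by side, here is a rough Python sketch of how
many bytes a field would occupy under each one. The function field_bytes
and its encoding labels are assumptions made up for this example, not a
Valentina API.

```python
def field_bytes(encoding, n, option):
    """Total bytes for String[N], including END ZERO.

    encoding: "single" (old one-byte strings), "utf8" or "utf16"
    option:   1 = N means chars for UTF16 (UTF8 is the exception)
              2 = N means bytes for both UTF8 and UTF16
    """
    if encoding not in ("single", "utf8", "utf16"):
        raise ValueError("unknown encoding: " + encoding)
    if option not in (1, 2):
        raise ValueError("option must be 1 or 2")

    if encoding == "utf16":
        if option == 1:
            return (n + 1) * 2   # N chars + 1 terminator, 2 bytes each
        return n + 2             # N bytes + 2-byte terminator
    # Old strings and UTF8: N bytes + 1-byte terminator in both options
    return n + 1


# 30 UTF16 chars: String[30] under way 1, String[60] under way 2
print(field_bytes("utf16", 30, 1), field_bytes("utf16", 60, 2))
```

Both declarations end up with the same storage; the difference is only in
what the number in the brackets means to the developer.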


-- 
Best regards,
Ruslan Zasukhin      [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com

To subscribe to the Valentina mail list go to:
http://lists.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------


