2.0

jda jda at his.com
Thu Mar 11 12:02:36 CST 2004


>>>>
>>>>
>>>I suppose I don't get the point of the entire discussion. It 
>>>appears to me that UTF-16 is a better choice than UTF-8 for 
>>>Valentina.
>>>
>>
>>Why?
>
>First of all, the storage requirements of UTF-8 v. UTF-16 aren't 
>important; it wouldn't surprise me if the differences were dominated 
>by the overhead of Valentina's own filesystem format.  Disk space is 
>cheap.  What does matter is speed, and it appears to me that UTF-16 
>is better suited for the sort of string manipulation required for 
>indexing and other such database operations.  You might take a look 
>at <http://www.unicode.org/notes/tn12/>.
>

If Valentina can offer both, I suggest that the developer should
decide which better suits his users' needs (one could even let the
user choose the storage encoding as a preference). Ruslan has already
said that all internal operations will use UTF-16 (as do Mac OS X
and Windows). The UTF-8 question is really only about storage. As I
said before, if supporting UTF-8 as a storage option raises
significant problems (in implementation, indexing, performance, etc.),
then let it be UTF-16 everywhere. Since Ruslan hasn't yet tackled
text indexing issues, I guess we won't know for a bit.
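To make the storage-only trade-off concrete, here is a rough sketch in
Python, purely illustrative: the function names and the two-value
"storage" setting are hypothetical, not Valentina's API, and a Python
str merely stands in for the engine's in-memory UTF-16 text. Text is
converted to the chosen encoding on write and back on read:

```python
# Hypothetical per-database storage setting: "utf-8" or "utf-16".
# Python str stands in for the engine's in-memory text; only the
# on-disk encoding differs between the two options.
STORAGE_CODECS = {"utf-8": "utf-8", "utf-16": "utf-16-le"}  # LE, no BOM


def encode_for_storage(text: str, storage: str) -> bytes:
    """Convert in-memory text to the bytes written to disk."""
    return text.encode(STORAGE_CODECS[storage])


def decode_from_storage(data: bytes, storage: str) -> str:
    """Convert stored bytes back to in-memory text on read."""
    return data.decode(STORAGE_CODECS[storage])


if __name__ == "__main__":
    for sample in ("Valentina", "日本語"):
        for storage in ("utf-8", "utf-16"):
            data = encode_for_storage(sample, storage)
            # The round trip is lossless whichever storage is chosen.
            assert decode_from_storage(data, storage) == sample
            print(f"{sample!r} as {storage}: {len(data)} bytes")
```

For "Valentina" this prints 9 bytes in UTF-8 versus 18 in UTF-16; for
"日本語" it is 9 versus 6. So which encoding is smaller depends on the
users' data, which is one more reason to leave the choice to the
developer.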

Jon


More information about the Valentina-beta mailing list