How strings are stored

Mon Aug 4 09:25:29 CDT 2003

jda wrote:

> You'll still have to do that, I believe. When you store the
> string in Valentina only the text is stored, not any tags or
> other info used by RB "strings". That means the encoding byte(s)
> will be lost and you'll have to define the string as utf8 when
> you load it back in.

Yes, I'm pretty sure you're right, Jon.

I, too, am storing my UTF-8 strings in Valentina (mis-identified to V4RB as "ASCII") and retrieving them with no problems, so long as I remember to identify the encoding to RB upon retrieval with DefineEncoding.
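
To illustrate the round trip in language-neutral terms, here is a minimal Python sketch (not RB or V4RB code). It assumes the database keeps only the raw bytes and no encoding tag; the store() and load() helpers are hypothetical stand-ins for writing to and reading from a field.

```python
# Sketch only: a "field" that keeps raw bytes and nothing about their encoding.

def store(text: str) -> bytes:
    """Hypothetical stand-in for writing to the database: only bytes survive."""
    return text.encode("utf-8")

def load(raw: bytes) -> str:
    """Hypothetical stand-in for reading back: the caller must re-declare UTF-8."""
    return raw.decode("utf-8")

original = "Grüße, мир"
raw = store(original)

# Declaring the right encoding on retrieval restores the text exactly.
restored = load(raw)

# Reading the same bytes under a different single-byte encoding gives garbage.
misread = raw.decode("latin-1")

print(restored == original)
print(misread == original)
```

The bytes themselves are untouched in storage; only the label is lost, which is why re-declaring the encoding on the way out is enough.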

The only pieces missing for me at this point are proper sorting (important) and proper word breaks (even more important).

I've found a rather laborious work-around for the word-break question. I keep a separate BO that holds real, discrete words; it is indexed, but -not- by words. To decide where to separate the words that go into this index BO, I look at the first character of each character's "general category" property as defined in UnicodeData.txt: if the general category doesn't begin with L (letter) or N (number), I treat the character as a word break for my purposes. It's slow, even with binary look-ups, but effective. I run search terms through the same process, and am able to match successfully on any word.
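
For anyone who wants to try the same rule outside RB, here is a rough Python sketch of it, not my actual code. It assumes Python's standard unicodedata module (which exposes the general category from the Unicode Character Database) in place of my own binary look-ups into UnicodeData.txt; the function names are made up for illustration.

```python
import unicodedata

def is_word_char(ch: str) -> bool:
    """True if the character's general category starts with L (letter) or N (number)."""
    return unicodedata.category(ch)[0] in ("L", "N")

def split_words(text: str) -> list[str]:
    """Split text into words, treating every non-L, non-N character as a break."""
    words = []
    current = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            # A break character ends the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

print(split_words("Hello, world! 42nd"))
print(split_words("naïve café, été"))
```

The same split_words() would be applied to both the stored text and the search terms, so that both sides are cut at the same places. Note that under this rule an apostrophe is a break, so "don't" becomes two words, and a combining mark (category M) would split a word written in decomposed form.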

I don't have a work-around for the sorting question, though.

Ruslan, I've been s-o-o-o patient for s-o-o-o long (2+ years?!).  I can't wait much longer for 1.9.9!  I'm thrilled that it's coming!

Erik