Efficient use of text-containing field
Ed Kleban
Ed at Kleban.com
Mon Dec 5 20:26:45 CST 2005
One option if you're really concerned about performance because:
1) you have a LOT of records, and
2) a lot of the strings in VarChar(1022) field really are sizeable in
length, i.e significantly longer than 16 characters,
would be to create a separate string(16) field in the record and index that
instead of indexing the VarChar field. Just include a copy of the first
16-chars of the VarChar field in that string field. The benefit is that
you'll get optimal sorting speed, and minimal paging overhead since the sort
routine won't have to haul in the entire text of the VarChar(1022) field
when sorting the strings. nor again later when performing a binary search on
them when you do a lookup. In fact you may as well do this since you can't
use "Indexed" for sorting purposes on a VarChar field that's got
IndexedByWords enabled. I'm sure Ruslan will correct me here if I'm wrong.
And there's no way you are likely going to write a sort routine that is
faster than the native indexing Ruslan already has built into Valentina
unless you're going to code it in C.
If you only need the sort for the purpose of doing lookups rather than a
relative ordering of the strings, and if you don't need ranged searches
(find all strings between "dinosaur" and "dog"), then by far the fastest way
to do searches is to store a string hash in a Long field of the record.
This is much more compact because you don't need to allocate the String(16)
field, just a Long or Ulong; it totally eliminates the need for having to do
a sort at all; and it will have minimal paging overhead to accomplish the
search. Unfortunately Ruslan hasn't implemented EVFlags.fHashedIndex yet,
so you have to allocate this field and manage it yourself manually rather
than simply telling V2 that's how you want it to manage indexing of the
VarChar(1022) field.
This is what I do to avoid sorting and achieve phenomenal lookup
performance.
On 12/5/05 6:02 PM, "jda" <jda at his.com> wrote:
> Hi Ruslan,
>
> I have lots of VarChar fields that will hold up to 1022 UTF-16 chars.
> Indexedbyword.
>
> I have a sort method for each. All I need is for the sort to account
> for, say, the first 16 chars. Here is what I'm doing now:
>
> newfield = CreateVarCharField("newfield", 1022, EVFlag.fNullable +
> EVFlag.fIndexed + EVFlag.fIndexByWords)
> newfield.storageEncoding = "UTF16"
> newfieldSortMethod = CreateVarCharField("newfieldSortMethod", 16,
> EVFlag.fNullable, "left(newfield, 16)")
>
>
> Is the sort method the most efficient (space-saving) way to do this?
> Or should I be doing something else? I do not need to be able to sort
> by the entire possible 1022 characters...
>
> Thanks,
>
> Jon
> _______________________________________________
> Valentina mailing list
> Valentina at lists.macserve.net
> http://lists.macserve.net/mailman/listinfo/valentina
More information about the Valentina
mailing list