accented characters, RegEx, temporary table

Ruslan Zasukhin sunshine at public.kherson.ua
Mon Jul 26 20:00:23 CDT 2004


On 7/26/04 7:02 PM, "olivier" <vidal_olivier at yahoo.fr> wrote:

>> Hmm, and how can it be done? Even when we have Unicode?
>> 
>> It looks like, to be able to do this, we need to add one more
>> parameter to the upper() function. Something like "IgnoreAccents". Right?
> 
> Yes, that would be perfect! And very, very useful for languages such as
> French or Spanish.

We will try to add this for 2.0.
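The idea behind such an "IgnoreAccents" option can be sketched outside of Valentina. This is only an illustration of the concept, not Valentina's implementation: decompose each character (Unicode NFD), drop the combining accent marks, then uppercase what remains.

```python
import unicodedata

def upper_ignore_accents(s: str) -> str:
    # Decompose accented characters (NFD), drop the combining marks,
    # then uppercase the base characters that remain.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()

print(upper_ignore_accents("élève"))   # ELEVE
print(upper_ignore_accents("niño") == upper_ignore_accents("NINO"))  # True
```

With this, "élève" and "ELEVE" compare equal, which is exactly the behavior wanted for French or Spanish sorting and searching.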
 
> Other questions, please :
> 
> I have an empty database. Cache memory: 20 MB. Two string fields (38
> characters), indexed.
> 
> I add 100,000 records: 1 minute 20 seconds (.dat file: 4 MB, .ind:
> 13.2 MB)
> with encodings: time +25%! (REALbasic)
> Clear database.
> I add 150,000 records: 2 minutes 20 seconds (.dat: 5.7 MB, .ind:
> 19.5 MB)
> Clear database.
> I add 200,000 records: 9 minutes 2 seconds (.dat: 7.4 MB, .ind:
> 25.7 MB)
> 
> If instead I start over with a new empty database and add 500,000
> records in one go: several hours!
> 
> Why, from 200,000 records on, is the time no longer proportional?
> Is it always necessary to do this type of operation in several passes
> (e.g. 5 x 100,000 record additions)?

I think the main reason here is that the cache becomes full at around
200,000 records.

I think that if you set the cache to 40 MB, you will see fast operation up
to about 400,000 records.

Yes, I also see that such huge Add operations leave room for improvement on
our side.
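The rough proportionality implied above can be written out. This is back-of-envelope arithmetic from the numbers in this thread, not a measured Valentina internal:

```python
# Observed: a 20 MB cache stays fast up to ~200,000 records,
# so roughly 10,000 records fit per MB of cache.
cache_mb = 20
records_per_mb = 200_000 / cache_mb

# Doubling the cache should roughly double the fast range:
print(int(40 * records_per_mb))  # 400000
```

Once the working set no longer fits in the cache, every Add() starts forcing pages to disk, which is why the time stops growing linearly.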

Also you can try:
    set the INDEXED flag to false,
    add the many records,
    set the flag back to true.

This means that you add records without inserting them into the index on
each Add(). It is much, much faster to build the index once than to change
it a lot of times.
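The steps above can be demonstrated with a toy model of an index (a sorted list standing in for a B-tree; this is an illustration of the principle, not Valentina's on-disk structure). Updating the structure on every insert does repeated shifting work, while building it once after all adds is a single sort:

```python
import bisect
import random
import time

records = [random.random() for _ in range(20_000)]

# INDEXED = true during the load: the index is updated on every Add().
t0 = time.perf_counter()
index = []
for r in records:
    bisect.insort(index, r)          # each insert shifts up to len(index) items
per_add_time = time.perf_counter() - t0

# INDEXED = false during the load: add everything, then build the index once.
t0 = time.perf_counter()
index_once = sorted(records)         # one O(n log n) sort over all records
bulk_time = time.perf_counter() - t0

print(f"index per Add: {per_add_time:.3f}s   build once: {bulk_time:.3f}s")
```

Both approaches end with an identical index; only the total work differs, and the gap widens as the record count grows.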


> - Are operations with RegEx relatively fast?

Yes, because Valentina keeps each field in a separate logical file.

> Are they really practical with 300,000 or 500,000 records? (with a
> string field of 40 characters)

Let's count:

    40 bytes * 500,000 records = 20 MB per field.

Today a HDD can read a file at 20-30 MB/s,
so we need about one second to load the field into RAM,
and then some time to run the RegEx itself.

If the file is already in the cache, we save that second.
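The estimate above can be checked, and the in-RAM scan itself sketched, with a small Python example (illustrative only; Valentina's RegEx engine and field storage are its own):

```python
import re

# Back-of-envelope from the thread: fixed-width field, half a million records.
field_width = 40
n_records = 500_000
total_bytes = field_width * n_records
print(total_bytes / 1_000_000, "MB")               # 20.0 MB per field

# At ~20 MB/s sequential read, loading the field takes about a second:
print(total_bytes / (20 * 1_000_000), "seconds")   # 1.0

# Once in RAM, the RegEx search is a single linear pass over the field.
# (Small sample here so the demo runs instantly.)
rows = ["Main Street".ljust(field_width)] * 1_000 \
     + ["Oak Avenue".ljust(field_width)] * 1_000
pattern = re.compile(r"Street")
matches = sum(1 for row in rows if pattern.search(row))
print(matches)  # 1000
```

So the dominant cost on a cold cache is the disk read, not the pattern matching.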


> - Last question:
> I have a database of addresses. This db can grow to more than a million
> addresses.
> Can the search be done on a temporary table? For example:
> 
> - The customer first types the zip code (postcode).
> We obtain a first list much smaller than the whole database (e.g.
> 100,000 addresses).
> 
> - The customer now searches on a street. To make this fast, the search
> should be able to run only on the group of records matching the first
> search on the zip code (100,000 records), and not on the full database
> (1,000,000 records). Is this possible with Valentina?

Guys, for Valentina this makes no sense.

Okay, say you have found a set of records for f1.
Now you want to search on f2. If f2 is indexed, then the set from f1 does
NOT help a search on the index.

The set can help only if f2 is NOT indexed.
Then yes, we reduce the number of records loaded (in the ideal case; there
are exceptions here too).

I believe both of your fields will be indexed,
so there is no sense in using the set.
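The reasoning above can be made concrete with a toy model (a dict standing in for an index; this is an illustration, not Valentina's engine). An indexed search walks its own structure regardless of any earlier result set, so the set only pays off when the second field must be scanned row by row:

```python
# 10,000 toy address rows with a zip field and a street field.
rows = [{"zip": f"{i % 100:05d}", "street": f"Street {i % 500}"}
        for i in range(10_000)]

# An "index" on street: value -> set of row ids.
street_index: dict[str, set[int]] = {}
for rid, row in enumerate(rows):
    street_index.setdefault(row["street"], set()).add(rid)

# Search 1: all rows with a given zip (the customer's first query).
zip_set = {rid for rid, row in enumerate(rows) if row["zip"] == "00042"}

# Search 2a, f2 INDEXED: the index lookup touches the whole index
# structure anyway; the zip set is only intersected afterwards.
indexed_hits = street_index.get("Street 42", set()) & zip_set

# Search 2b, f2 NOT indexed: here the first set genuinely helps --
# we scan only len(zip_set) rows instead of all 10,000.
scan_hits = {rid for rid in zip_set if rows[rid]["street"] == "Street 42"}

print(indexed_hits == scan_hits)  # True
```

Both paths return the same rows; the earlier result set changes the cost only on the non-indexed path, which is exactly the point made above.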

BTW, in Valentina/V4RB 2.0 we have introduced the new Vfield.Find()
functions. They really can help with such tasks.


-- 
Best regards,
Ruslan Zasukhin      [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com

To subscribe to the Valentina mail list go to:
http://lists.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------


