Query Speed with v4RB

Ruslan Zasukhin sunshine at public.kherson.ua
Sun Jan 18 10:22:48 CST 2004


on 1/18/04 5:39 AM, Brad Bennett at bbennett at quicomm.com wrote:

Hi Brad,

> Hi Ruslan:
> 
> Thank you very much for your reply.  I took your recommendations into
> account, and we are very pleased with the results.  Valentina now
> demonstrates 70 - 100 times improvement in lookup speeds as compared to
> using the internal RealBasic Database.  If you are interested, please see:
> 
> http://quicomm.com/valentina_vs_realbasic_db.htm
> 
> The key was to break up the database in several manageable portions,
> increase the Valentina cache, and index the ipfrom and ipto fields.   To
> decrease the DB size, I also changed the strings to VarChar fields.  Thanks
> again for your suggestions and for a truly remarkable database product.

Glad to hear you have excellent results now with Valentina.
I will CC this to the Valentina list and the NUG to share your success with
other developers.

NOTES: 

1) I am sure that even if your db grows to 20 million records, Valentina's
answer time will grow to maybe 0.1 - 0.2 seconds.

2) In Valentina 2.0 we have improved many parts of the engine to work 50-500
times faster. Yes, really. It is hard to state a general speedup for 2.0 now;
I think it will depend very much on the particular task.

3) And 2.0 will contain one more feature which will allow you,
IF NEEDED, to speed up answers even more.
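The indexed range lookup Brad describes can be sketched like this - a minimal
Python example with SQLite standing in for Valentina; the table layout and
query follow the thread, but the sample row and values are invented:

```python
import sqlite3

# SQLite as a stand-in for Valentina: same table layout as in the thread.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE iplat (
    ipfrom REAL, ipto REAL, latitude REAL, longitude REAL,
    country TEXT, region TEXT, city TEXT, isp TEXT)""")

# Indexing both boundary columns is the key speed-up reported above.
con.execute("CREATE INDEX idx_ipfrom ON iplat(ipfrom)")
con.execute("CREATE INDEX idx_ipto ON iplat(ipto)")

# One invented sample range (ipfrom..ipto as 32-bit addresses as numbers).
con.execute("INSERT INTO iplat VALUES (16777216, 16777471, 43.0, 12.0, "
            "'Australia', 'Victoria', 'Melbourne', 'ExampleISP')")

my_ip = 16777300  # an address inside the sample range
row = con.execute(
    "SELECT latitude, longitude, country, region, city, isp "
    "FROM iplat WHERE ipfrom <= ? AND ipto >= ?", (my_ip, my_ip)).fetchone()
print(row[2])  # country of the matching range
```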


> best regards,
> --- Brad Bennett
> 
>> on 1/16/04 10:44 PM, bbennett at quicomm.com wrote:
>> 
>>> Our database table (2.5 million records):
>>> 
>>> ipfrom double
>>> ipto double
>>> latitude double
>>> longitude double
>>> country string(64)
>>> region string(128)
>>> city string(128)
>>> isp string(256)
>>> 
>>> What we need to do is query this database on an IP address, hence the SQL
>>> command is something like:
>>> 
>>> tSQL = "SELECT latitude,longitude,country,region,city,isp FROM iplat
>>> WHERE
>>> ipfrom <= myIP AND ipto >= myIP"
>>> 
>>> In order to use the internal RealDatabase, we actually broke the DB into 20
>>> databases, then conditionally opened the needed DB depending on the IP
>>> address
>>> range.  For Valentina we just tried a single database (hence perhaps we are
>>> comparing Apples to Oranges).
>>> 
>>> My question is:
>>> 
>>> - our queries using the single 2.5 million record Valentina DB on an 800 MHz
>>> G4 are taking about 2.5 seconds.  Does this sound about right?
>> 
>> I think not. We have a Valentina developer who has a db of 28 million
>> records, and he has told me that Valentina answers him in 0.2 seconds.
>> 
>> 
>>> - this is about the same time it takes us to query the internal RealDatabase
>>> system (although using 20 separate DB files, each having about 125,000
>>> records).  Would you believe if we broke up the Valentina DB into 20
>>> similarly
>>> sized DBs, we might experience a significant speed gain?
>> 
>> Reducing the db size by a factor of 20 reduces the number of steps in a
>> search by about log2(20) = 4-5 steps.
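That step count is easy to check, assuming a plain binary search over sorted
keys:

```python
import math

# Binary search cost over the full table vs. one of 20 equal partitions.
records = 2_500_000
steps_full = math.ceil(math.log2(records))        # search over 2.5M rows
steps_part = math.ceil(math.log2(records // 20))  # search over 125K rows
saved = steps_full - steps_part                   # roughly log2(20)
print(steps_full, steps_part, saved)
```

So splitting saves only a handful of comparisons per lookup, not a 20x
speedup.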
>> 
>> I am not sure that splitting the db is a very good idea --
>> only if the data have some special characteristics which you can exploit.
>> In your task it seems you have such info.
>> 
>> Also, rather than creating 20 dbs, you can use one db with 20 tables.
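A minimal sketch of the one-db-with-20-tables idea, assuming an even split of
the 32-bit IP space; the table names (iplat_00 ... iplat_19) are illustrative,
not from the thread:

```python
# Route each lookup to the partition table covering its address range.
PARTITIONS = 20
MAX_IP = 2 ** 32  # size of the IPv4 address space

def table_for_ip(ip: int) -> str:
    # Even split of the 32-bit space; real data may need custom boundaries.
    idx = min(ip * PARTITIONS // MAX_IP, PARTITIONS - 1)
    return f"iplat_{idx:02d}"

print(table_for_ip(0))           # lowest address -> first table
print(table_for_ip(2**32 - 1))   # highest address -> last table
```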
>> 
>> 
>>> - due to the fixed string field sizes, our Valentina DB has ballooned to
>>> over
>>> 1.5 GB.  Ouch.  Even compressed, this is still a 750 MB file.  Too much to
>>> easily distribute online.  IF we go to the "text" (blob) type of
>>> string assigment to these fields, will we see any significant performance
>>> penalty?
>> 
>> As I have said, you need a VarChar field!
>> 
>> 
>>> - any other tips we should consider in using Valentina?  We really do
>>> like the
>>> ability to use it on both Mac and Windows...and the testimonials seem to
>>> indicate that we really should see some significant performance gains.  Any
>>> additional ideas?
>> 
>> Yes.
>> 
>> First of all, check the size of the cache that you have specified.
>> It should be at least 10-15MB.
>> 
>> Let's count. You search on a double: 8 bytes.
>> 
>> 2.5M records * 8 bytes = 20MB field file.
>> 
>> The index size depends on whether you have repeating values, and how many.
>> The index can be:
>>    if the field is unique: (8 + 4) bytes * 2.5M = 30MB index;
>>    actually usually even 50% bigger, so up to 45MB.
>> 
>>    if you have repeating values, then it can be, e.g., 20MB.
>> 
>> And you search on 2 such indexes.
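The same arithmetic, spelled out - assuming an 8-byte double key plus a 4-byte
record pointer per index entry, and the ~50% growth mentioned above:

```python
# Size estimates for one indexed double field over 2.5M records.
records = 2_500_000
field_file_mb = records * 8 / 10**6      # data file: 8 bytes per double
index_mb = records * (8 + 4) / 10**6     # unique index: key + record pointer
index_grown_mb = index_mb * 1.5          # up to 50% bigger in practice
print(field_file_mb, index_mb, index_grown_mb)
```

With two such indexes (ipfrom and ipto), a 10-15MB cache clearly cannot hold
everything, which is why the cache setting matters.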
>> 
>> 
>> -----------
>> As I understand, you have a STABLE database, yes?
>> Then after you prepare the table, close it, trash the .ind file,
>> and open it again - you will get a more compact index.
>> 
>> 
>> ---------
>> How many records are found on average by your query?


-- 
Best regards,
Ruslan Zasukhin      [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com

To subscribe to the Valentina mail list go to:
http://lists.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------


