can I query raw bits using bit logic in a huge RS database?

Ruslan Zasukhin ruslan_zasukhin at valentina-db.com
Sun Jun 10 06:10:09 CDT 2012


On 6/10/12 1:15 PM, "Aaron Andrew Hunt" <aaronandrewhunt at gmail.com> wrote:

Hi Aaron,

> Thanks for your response ...
> 
>> On 6/10/12 1:47 AM, "Kem Tekinay" <ktekinay at mactechnologies.com> wrote:
>> Can you describe in more details your db parameters as you see that now.
>> * 3 billion records ... In single table?  Or this is total for e.g. 100
>> tables?
> 
> 
> We're still trying to determine whether or not we want to use multiple tables
> for our projected 3 billion records.
> 
>>> I need to store variable-lengths of raw bits in a database, and query them
>> * What is expected size of this bits ?
> 
> Expected size is less than 512 bits (but we need queries at the bit level, not
> byte level); however, ideally we want unlimited variable lengths of bits.

I understand that you need bit level :)
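
To make "bit level" concrete, here is a minimal sketch in plain Python (not Valentina code). It assumes a bit-string is kept as its length in bits plus an integer holding the packed bits; the class and method names are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitString:
    """Hypothetical variable-length bit-string: length in bits plus packed value."""
    nbits: int   # number of valid bits (needed so leading zeros are not lost)
    value: int   # the bits packed into an integer, first bit most significant

    @classmethod
    def from_text(cls, text):
        # "0010" -> nbits=4, value=2
        return cls(len(text), int(text, 2) if text else 0)

    def begins_with(self, prefix):
        # Compare only the leading prefix.nbits bits.
        if prefix.nbits > self.nbits:
            return False
        return (self.value >> (self.nbits - prefix.nbits)) == prefix.value

    def matches_mask(self, mask, expected):
        # True when the bits selected by mask are equal to expected.
        return (self.value & mask) == expected

    def xor_distance(self, other):
        # Number of differing bits between two bit-strings of equal length.
        if self.nbits != other.nbits:
            raise ValueError("bit-strings must have the same length")
        return bin(self.value ^ other.value).count("1")


a = BitString.from_text("10110")
b = BitString.from_text("10011")
print(a.begins_with(BitString.from_text("101")))  # True
print(a.xor_distance(b))                          # 2
```

Keeping the length next to the value matters because "0010" and "10" pack to the same integer but are different bit-strings.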

>> * such query will go without index, so will not be very fast ...  do you have
>> any time limits on query ?
> 
> We want it to be as fast as possible of course, but ideally not more than 30
> min. for the most complex queries.

>> * what hardware you going to use for this db?  RAID?  CPU? ...
> 
> latest Mac Pro (currently using latest, will use soon to be released newest as
> well, assuming there will be one) with Thunderbolt TB drives.

Okay. You need the best possible hardware here, of course.

I have seen the WD My Book Thunderbolt drive;
    with RAID1 they give 220-250Mb/s.

I have also seen that WD has a RAID10 unit with 4 disks inside.
This is even better, because as I understand it the speed will be doubled.

And of course as much RAM as possible ...


>> * this will be single db in some office or you need install hundreds copies
>> of it worldwide?
>> * your app can work with this db on the same computer? or you will need
>> client/server solution on some reason?
> 
> Initially this will be local for testing, in the future it should be on a
> server, for internet access.

So are you then going to switch to REALbasic WE, or to PHP, to get internet access?

The point is that with Valentina for RB you have access to the API way
and can work with BitSets ... In Valentina PHP we have not finished this yet.


>> * Table which will contain these bit-strings will contain other fields
>> probably. Which?  How much?
> 
> Fewer than 20 fields per record. Data is all integers and booleans.

So a bit-string (50-70 bytes on average)
   and another 100 bytes of data ...
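
A rough back-of-envelope estimate from these numbers, as a Python sketch. The assumptions are mine: 60 bytes taken as the midpoint of 50-70, the "220-250Mb/s" figure read as megabytes per second, one plain sequential scan, and no index, cache or storage overhead counted.

```python
RECORDS = 3_000_000_000   # projected record count from the thread
BITSTRING_BYTES = 60      # assumed midpoint of the 50-70 byte average
OTHER_BYTES = 100         # the other fields of a record
SCAN_MB_PER_S = 250       # assumed sequential read speed, in megabytes/s


def table_size_bytes(records, bytes_per_record):
    # Raw data size, ignoring any storage overhead.
    return records * bytes_per_record


def full_scan_minutes(total_bytes, mb_per_s):
    # Time to read everything once at a constant sequential speed.
    return total_bytes / (mb_per_s * 1_000_000) / 60


total = table_size_bytes(RECORDS, BITSTRING_BYTES + OTHER_BYTES)
bits_only = table_size_bytes(RECORDS, BITSTRING_BYTES)

print(total / 1e9)                                  # 480.0 (GB)
print(full_scan_minutes(total, SCAN_MB_PER_S))      # 32.0 (minutes)
print(full_scan_minutes(bits_only, SCAN_MB_PER_S))  # 12.0 (minutes)
```

Under these assumptions one scan of whole records is already close to the 30 min. limit, while a scan of the bit-strings alone takes well under half of that.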


>> * let you do query XOR to such table ...
>>  how much records you expect to get into result?
>>     a) near to zero-one
>>     b) few  10-100-1000
>>     c) a lots   -- millions  or even near to billion also
> 
> We want to avoid huge results, but this is one of our unknowns, and one of the
> reasons for having the database is to find out, so the number of results could
> in fact be gigantic.

I asked this because we know that Valentina's advantage
is bigger on big results ...


>> * what you going to do next with this result?
> 
> We want to query again within results.

Aha, this also makes a difference ...

As far as I know, SQL does not let you do this easily and efficiently.

But the Valentina API way, which works on BitSets at a low level, really can do
this. It still raises technical questions to think about, though ...
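
The general idea of querying again within a result can be sketched with a bitset of record numbers. This is plain Python, not the Valentina API: a Python integer stands in for the bitset, and select() and members() are hypothetical helper names.

```python
def select(records, predicate, within=None):
    """Return a bitset (as an int) with bit i set when records[i] matches.

    If 'within' is given, only records whose bit is set there are tested,
    so the new result is always a subset of the previous one.
    """
    result = 0
    for rec_id, value in enumerate(records):
        if within is not None and not (within >> rec_id) & 1:
            continue  # record is not part of the previous result
        if predicate(value):
            result |= 1 << rec_id
    return result


def members(bitset):
    """List the record numbers whose bits are set."""
    ids = []
    rec_id = 0
    while bitset:
        if bitset & 1:
            ids.append(rec_id)
        bitset >>= 1
        rec_id += 1
    return ids


records = [0b1010, 0b0110, 0b1110, 0b0001]

first = select(records, lambda v: (v & 0b0010) != 0)
second = select(records, lambda v: (v & 0b1000) != 0, within=first)

print(members(first))   # [0, 1, 2]
print(members(second))  # [0, 2]
```

Two results that were computed independently can also be combined directly with & (both), | (either) or ^ (exactly one) on the bitsets, without touching the records again.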


=========
So far I have the following idea for your task.

It seems to me that it is possible to split your bit-strings into different
 tables, or even dbs, which could even be located on different servers.

For example, 
  bit strings with length  1-4 bytes go to T1
  bit strings with length  5-8 bytes go to T2
  ...
  unlimited length bit-strings go to Tn

This will be okay, of course, only if it suits your queries,
i.e. if they mainly look for a match ...

If you do something like a find BEGINS WITH ...,
then all such tables have to be queried.

But if they are on N different computers, this can even be good, because the
task then scales out ...
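
A small Python sketch of this routing idea. The function names are hypothetical, and the 4-byte bucket width is taken from the T1/T2 example above. One refinement is assumed here that goes slightly beyond "ask all tables": a BEGINS WITH search skips the tables whose bit-strings are all shorter than the prefix.

```python
def table_for(nbytes, bucket_bytes=4):
    """Table number for a bit-string of nbytes: 1-4 -> 1, 5-8 -> 2, ..."""
    if nbytes < 1:
        raise ValueError("length must be at least 1 byte")
    return (nbytes - 1) // bucket_bytes + 1


def tables_for_exact_match(nbytes, table_count):
    # A full match can only live in the table for that length.
    # The last table, Tn, takes everything longer than the fixed buckets.
    return [min(table_for(nbytes), table_count)]


def tables_for_begins_with(prefix_bytes, table_count):
    # Any bit-string at least as long as the prefix may match, so every
    # table from the prefix's own bucket up to Tn has to be asked.
    first = min(table_for(prefix_bytes), table_count)
    return list(range(first, table_count + 1))


print(tables_for_exact_match(6, table_count=5))   # [2]
print(tables_for_begins_with(6, table_count=5))   # [2, 3, 4, 5]
```

If each table sits on its own computer, the lists returned for BEGINS WITH are exactly the servers that can work on the query in parallel.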


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]



