Using OrderedSets Re: [IDEA] BinaryLink that supports order !!!

Fri Dec 23 18:31:18 CST 2005

On 12/23/05 5:43 PM, "Ruslan Zasukhin" <sunshine at public.kherson.ua> wrote:

> On 12/18/05 9:07 AM, "Ed Kleban" <Ed at Kleban.com> wrote:
>  

[Since Ruslan has posted this public reply to my formerly private message
(which is fine) I'll point out for those who are now joining the thread and
trying to make sense of it that "OBL" refers to the OrderedBinaryLink
concept Ruslan has proposed implementing in a future version of Valentina.]

>> 
>> The benefits of using OBL therefore are that:
>> 
>> 1) The OBL saves you the hassle of having to explicitly allocate the
>> additional column to keep the order, because it effectively moves that
>> column into the OBL automatically and manages it for you no matter whether
>> OBL is implemented with (A), (B), or (C) above.  This is a benefit you get
>> whether using OBL from SQL or from API, and...

[(A), (B), and (C) being three possible different ways I had concluded that
Ruslan might choose to implement OBL]

> 
> Right!
>  
>> 2) Perhaps...  Actually (1) is the only benefit I can come up with.  As you
>> point out it makes the SQL a little more clear perhaps (at a penalty for
>> using non-standard SQL) but that's a minor issue.  Perhaps you can come up
>> with some words for other benefits this offers.  But (1) is a GREAT benefit.
> 
> 2 main befits as always in Valentina
> 
> 1) SIMPLER for developer, because Valentina take care self on stupid
>         error prone, repeating task, that is not so easy for developer.

Which is a restatement of (1) above.

> 
> 2) we get MUCH FASTER solution than based on RDBMS way.

Which is what you explain quite wonderfully in great detail below.  However,
I don't think that any of the benefits you describe below are necessarily
benefits of using OrderedBinaryLinks, but rather they are inherent benefits
of Valentina; namely from the fact that V2 supports Binary Links in the
first place, and more fundamentally because:

a) Each field of a record is stored as a separate logical construct, thus
obviating the need for reading in all of the field values for any record
that is examined.

b) RecIds do not change when new insertions, deletions or updates occur.

c) The field values required for implementing binary links are not stored as
fields in the records of the tables they define relationships between.

> Now you want to know why. Yes? :-)

You know me too well.  ;-)   Yes, of course.  I always want to know why.

> Because. First of all look how data are stored in that INDEX column for
> RDBMS:
> 
>     * let you have table with 10-20-50 columns.
>     * one of them is INDEX column.
> 
>     * when you change order of some group, you get N records to
>         update value of INDEX column.
> 
>     * How to update record: right!
>           DB need load into RAM each such record, with all its fields.
>           in fact DB load the whole page where such record is.
> 
>     * Now RDBMS can update value of that record, and go to next.
> 
>     * Ops. RDBMS need yet jump to index and DeleteOld + AddNew
["Oops"]
> 
> It is easy see that the bigger record of table, the more data from disk you
> need load. I.e. The number of fields in table affect speed of this
> operation. 

and the more that local Valentina and system virtual memory paging works
against you to make things slower instead of working for you to make things
faster as it does for Valentina

> 
> ------------
> Valentina instead will touch very precisely only required group of values to
> be changed. 
> 
> 
> ----------
> Example in numbers:
> 
>     let you have table in million records.
>     let avg size of row is 100 bytes only.
>     let you need correct index in group of 10,000 records.
> 
>     let this 10,000 records located in random order.
>         Hmm, I think this is real life case.
>         so each 100th record is record to be changed.
> 
> 
> * RDBMS *
> 
>     We have (1M * 100 byte) / 4K = 25,000 pages for table.
> 
>     100 records * 10 bytes = 10KB this is 3 pages of 4K size.
>     So we need touch each third page of table.
> 
>     We need load about 8300 pages into RAM. Change value,
>     and write them back.
> 
>       8300 pages * 4 k = 33 MB for read + 33 MB for write.
> 
>     PLUS we have probably index for this field.
> 
> 
> * Valentina * 
> 
>     Valentina need "load - correct - write" in average only 20 Kb.
> 
>     Even better: for operations 'Insert At position'
>     only one page will be touched.
> 
> Now you can count advantage:
>         at least 70+ MB / 20 Kb = 3500 times.
> 
> IMHO, not bad.  

Sorry Ruslan, but you should not be permitted to use the abbreviation
"IMHO", because just like me your opinions are rarely ever humble.

> If take the best case for RDBMS -- all records are located together, we get
> 10K * 100 = 1MB. So we still win 50 times.
>   
> ----------------------------------------
> 3) I can point another advantage, Ed.
>   When db engine *knows* something more it can do things more smart.
> 
> At least you get advantage, that you can develop some Valentina Database in
> REALbasic, but this feature will work correctly and in the same way in other
> languages and IDEs. For example Valentina Studio will correctly you show
> order of records.
>     
> Compare to RDBMS. You application know that INDEX field means, and YOUR
> application have algorithms to work with it. But other apps believe that
> this is just USHORT column.
> 

Yes.  But if you implement BytePtr, MediumPtr, and ShortPtr as we have
discussed, then not only will my app know it but other V2-savvy apps such as
Valentina Studio will know it and can take advantage of it as well.  In the
mean time I can declare these as foreign key links to provide clues to both
SQL statements that view this as a traditional RDBMS and to applications
like Valentina Studio.  While in my own code I can use API calls to treat
these as if the were in fact ObjectPtr fields, and thus gain 100% of the
performance benefits with additional 25% to 75% storage benefit, plus any
additional performance benefit that comes from less paging due to smaller
table sizes. 

> ------------------------------------
> 4) and even 4th advantage can be pointed.
> 
> * in RDBMS way * to update indexed we need lock records of table.
> 
> * in Valentina way * we operate on Binary Link, so records of table is not
> locked. This can be good for multi-user concurrency.
> 

Hmm... I've been ignoring concurrency issues so far, but sooner or later
I'll have to deal with those.  I need to think about these some more before
I will be ready to discuss them.

Thanks for the tutorial Ruslan.  You make a great sales pitch!

I guess I can live with a database that offer performance that is 50 to 3500
times faster than using SQL with a traditional RDMS storage structure.

But keep reading that email you are replying to.  I'd like to get another
very large boost in performance by having API calls that treat VArraySets as
Ordered sets so that I can avoid making calls to do what you describe above
more than once instead of making repeated calls throughout the execution of
my application.

--Ed