Binary links Re: Question Backlog for Valentina mailing list.

Ed Kleban Ed at Kleban.com
Wed Nov 16 09:52:58 CST 2005






On 11/16/05 1:16 AM, "Ed Kleban" <Ed at kleban.com> wrote:
  
>>> But I just now read all the doc in both the Kernel and RB Reference about
>>> Binary Links and decided I don't have a clue about what they do or how to
>>> use 'em.
>> 
>> They do effectively the same as you do in Relational model with help of
>> third Table to establish M:M link.
>> 
>> What do this table? It just remember pairs  (id1, id2)
>> 
>> BinaryLink do the same, but use RecIDs and it is NOT a table.
>> It is more effective structure.
> 
> Hmm, more effective than a table.  Meaning not a linear list of pairs, or
> meaning not a list that's the same length as one of the two related tables.
> The former I guess.  Hmm... Well, you could certainly make it more efficient
> by keeping two separate indexed lists so you could do a binary search in
> either direction.  And you could make it more efficient with an indexed list
> that pointed to a "bucket" with a block of corresponding related entries so
> that when doing a binary search you didn't have the overhead of duplicates,
> and once you did the seach you had a linear list of corresponding records.
> Yeah, that's probably how I'd do it... if I could avoid the temptation of
> hashing the indexes to avoid the log2N cost of the binary search for large
> sets of links. 

Ok, I've had some time to sleep on this, and am getting excited!

And of course I have more questions.  :)

Let's run through an example.  Say I have a table of people that has a
recursive relationship for "MotherOf" as a M:1 relationship.  If I use an
ObjectPtr field for the MotherOf relationship, then finding the mother for
any given record is EXCEEDINGLY fast.  I simply go to the MotherOf field,
and there's the record ID I need for the mother.  And if I intend to record
a mother for every person in the table then the representation is also
exceedingly efficient.

I could make it more space efficient, and possibly less speed efficient by
using a UShort field instead of an ObjectPtr if I knew, say I was only going
to have, say 20,000 records maximum.

    Question 1: Would use of a UShort field here for 16-bit encoding
    of RecID actually be slower than use of an ObjectPtr?

In the opposite direction, if I wanted to find all the children of a given
mother, it would take much more time since the DB would have to scan every
record to look for a matching value in the MotherOf field for a given mother
record RecId.  For N records the order of performance of the linear search
would be o(N).

Now lets consider Binary Links.  And since you've not yet commented on my
supposition above, let's assume that the implementation I postulated is
indeed close to what Valentina implements; Namely for a M:1 relationship it
keeps separate "info" for the "M->1" and the "1->M" directions, and that for
the "1->M" direction in fact the execution requires a single binary search
of an indexed list of, in this case known mother records, and results in
obtaining a list of all the associated children.  In this direction the
performance is much better then because: The DB does a single binary search
on a small list of records known to be mothers and ends up immediately with
the same result as would require scanning every record of the table if an
ObjectPtr field was used.

    Question 2:  Am I pretty close so far?

In the reverse direction, find the mother of a given child, I can't conceive
of anything special the binary link could do that would be any faster than
the ObjectPtr case.  So for the M->1 "info" retained by the link I'm
guessing it simply does that.  I.e. it just keeps a mapping for every record
that a link is defined for, which in the example above whever every record
has a valid MotherOf would essentially end up taking the same amount of
space as, and operating in the same amount of time as the ObjectPtr case.

    Question 3:  Am I still on target with this supposition?






More information about the Valentina mailing list