Binary links Re: Question Backlog for Valentina mailing list.
Ed Kleban
Ed at Kleban.com
Wed Nov 16 15:25:39 CST 2005
>
>> Let's run through an example. Say I have a table of people that has a
>> recursive relationship for "MotherOf" as a M:1 relationship. If I use an
>> ObjectPtr field for the MotherOf relationship, then finding the mother for
>> any given record is EXCEEDINGLY fast. I simply go to the MotherOf field,
>> and there's the record ID I need for the mother. And if I intend to record
>> a mother for every person in the table then the representation is also
>> exceedingly efficient.
>>
>> I could make it more space efficient, and possibly less speed efficient by
>> using a UShort field instead of an ObjectPtr if I knew, say I was only going
>> to have, say 20,000 records maximum.
>
> This will speed up also, because HDD -- main break in db.
>
>> Question 1: Would use of a UShort field here for 16-bit encoding
>> of RecID actually be slower than use of an ObjectPtr?
>
> NO.
>
> And we have think about giving to developer such ability to specify type of
> ObjectPtr. This bring few problems:
> - mistake of developer
> - type of ObjectPtr different, then how right correct code?
>
> And honestly was no time to go so deeply.
Well, I suppose you could have 3 types of ObjectPtr: LongPtr, ShortPtr,
BytePtr. Furthermore, you could put a "MaxRecords" constraint on tables
that you intended index by ShortPtr or BytePtr. You already have constraint
checking to support record additions in the case of a Unique field for
example. A "too many records" check would be very fast. Similarly, if the
user did indeed blow it, worst case is that the indexes would wrap modulo
word or byte boundaries and refer to the wrong record. Which is
unfortunate, but not particularly dangerous in that it could point to a
non-existant record that has not been allocated.
In fact, this would be no worse of an error than would result if a user
claimed that a binary link was 1->M or M->1 or 1->1 when it was in fact
M->M. If they supply the wrong assertions, they'll get the wrong results as
the system endeavors to optimize search on their behalf.
> But yes in Valentina 2 you can use FOREIGN KEY now, it works as link,and it
> have constraints. So in general you loose nothing here.
You say "lose nothing here", and in terms of functionality I agree. What I
have not seen you state clearly yet however is that "Because of the way
links are implemented in Valentina, if you use say a single-byte or
short-word key field and use as a relation link to a recId, then the
performance is just as fast as if you had used and ObjectPtr."
Question: Is that the case? Or is there a performance penalty for
using foreign key fields in this manner?
> ------------------------
>> In the opposite direction, if I wanted to find all the children of a given
>> mother, it would take much more time since the DB would have to scan every
>> record to look for a matching value in the MotherOf field for a given mother
>> record RecId. For N records the order of performance of the linear search
>> would be o(N).
>
> But usually you have INDEX for such fields. And ObjectPtr is always indexed
> and FK fields also.
I don't see how that makes any difference or helps. If I want to find all
the children of a mother record and I'm using a an ObjectPtr or KeyField and
not using a binary Link, then doesn't V4RB have to examine every single
record to find out if it is a child of a given mother?
Ah, No it does not! As I finally figured out and discussed below.
> ------------
>> Now lets consider Binary Links. And since you've not yet commented on my
>> supposition above, let's assume that the implementation I postulated is
>> indeed close to what Valentina implements; Namely for a M:1 relationship it
>> keeps separate "info" for the "M->1" and the "1->M" directions, and that for
>> the "1->M" direction in fact the execution requires a single binary search
>> of an indexed list of, in this case known mother records, and results in
>> obtaining a list of all the associated children. In this direction the
>> performance is much better then because: The DB does a single binary search
>> on a small list of records known to be mothers and ends up immediately with
>> the same result as would require scanning every record of the table if an
>> ObjectPtr field was used.
>>
>> Question 2: Am I pretty close so far?
>
> Yes, only remember that ObjectPtr is indexed. So search will be the same -
> binary by index.
>
> ObjectPtr and Binary Link here are equal by speed.
No... I'm making some assumptions that imply that Binary links should be
faster than ObjectPtr links for this case. I'm doing that so that you can
tell me that yes my assumptions are true, or that no my assumption is false.
My assumptions are that... Ah, never mind. I get it. (I think by writing,
so I had to write a couple paragraphs that I just erased).
In essence, the reason that a BinaryLink is equal to ObjectPtr in
performance is because doing a binary search on an indexed objectPtr field
containing recId of parent records IS an implementation of the exact
approach I proposed for doing a single binary lookup leading to a table of
children. I get it. For the record, however there may be a tiny difference
between the use of an ObjectPtr field and a BinaryLlink in that the number
of records you're having to do a binary search on is potentially smaller in
the case of a BinaryLink, whereas in the ObjectPtr link case even though
you're doing a binary search it's still over a span of length equal to the
total number of records.
So in retrospect, I've been making this more complicated than it really is.
A binary link is simply a table of ordered pairs of recIds where both sides
are indexed. Thus it is equally efficient to find all the children of a
parent or ... what may in fact be multiple parents of a child. That's why
even though I keep using the terms "1->M" and "M->1" you keep typing in
reply "M->M". The binary link is inherently a "M->M" relationship and use
of it as "M->1", "1->M", or "1->" are simply special usage side cases. The
reason you want the user to supply whether a given side is "one" or 'Many"
is so that you know whether you can stop your search or processing of an
index after you find the first occurrence, or whether you need to look for
more.
>
>> In the reverse direction, find the mother of a given child, I can't conceive
>> of anything special the binary link could do that would be any faster than
>> the ObjectPtr case.
>
> Right, and now it is ieven slow, because we have implement for both
> direction only the general M M way. We still can improve things to get jump
> in director ToOne as fast and by ObjectPtr.
Ah! There it is in black and white. Just what I said.
>> So for the M->1 "info" retained by the link I'm
>> guessing it simply does that.
>
> In ideal yes.
>
>> I.e. it just keeps a mapping for every record
>> that a link is defined for, which in the example above whever every record
>> has a valid MotherOf would essentially end up taking the same amount of
>> space as, and operating in the same amount of time as the ObjectPtr case.
>>
>> Question 3: Am I still on target with this supposition?
>
> Absolutely correct Ed!
>
> And now COMPARE:
>
> ObjectPtr and BinaryLink
>
> 1 : M 1 : M
>
> SPEED THE SAME
>
> SPACE on disk THE SAME
>
> But !!!!!
>
> 1) Binary link do not infect Structure of Table.
>
> 2) Binary Link can be easy converted tomorrow into M:M if your customer or
> you change mind.
>
> 3) IT IS possible for binary link add parameter, that DEVELOPER swear
> todo searches only in one direction, then we can even more win
> in space AND speed of modification.
>
You'd save in space because you'd only need to create an index for one half
of the linkTable, not both sides.
You'd save in speed because... hmm. ...because you could eliminate the
check for finding additional records after you've found the first one on a
"->1" link.
> We need add all this docs asap :-)
>
More information about the Valentina
mailing list