[ALL] let's think about Query Language for 2.0
Andreas Grosam
agrosam at computerworks.ch
Fri May 23 20:53:42 CDT 2003
On Friday, May 23, 2003, at 04:24, Ruslan Zasukhin wrote:
> on 5/23/03 1:52 PM, Andreas Grosam at agrosam at computerworks.ch wrote:
>
> Hi Andreas,
>
>> Here are my thoughts, although I did not address each of your ideas.
>> [Sorry, I apologise for this huge mail ;)]
>>
>> If I understood you right, your primary concern is to eliminate the
>> overhead imposed by an SQL parser when processing queries specified
>> in a string. IMO, that overhead will not be very significant
>> compared to a complex query that has to access the disk.
>
> Not a fact, Andreas.
>
> One of the Valentina C++ developers has pointed out to me that when
> he uses BitSets he can get queries up to 100 times faster.
>
> The parser really can take a LOT of time, Andreas.
> Not only because of the text, but also the optimizer and planner.
> With BitSets a developer can __SOMETIMES__ write a VERY optimal plan.

OK, I see. Certainly, a full-fledged SQL parser is a dreadful beast.
Besides this, however, the API should be independent of the underlying
implementation. I guess a bitset is too implementation-specific. What
if you someday change your Storage Manager to use a different
method/layout for storing the records, making bitsets inappropriate?

As mentioned in the previous mail: in a relational DBMS the user / app
developer gets access to the data by means of Relations only. The query
should be declarative, even when using an API. The implementation then
chooses the best method to select the records.
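
To make the idea of a declarative query through an API concrete, here
is a minimal C++ sketch. All names in it (Record, AgeAtLeast,
Relation::select) are hypothetical and are not part of the Valentina
API. The caller states a condition, and the Relation decides how to
evaluate it; a linear scan is used here only because it is the simplest
thing to show.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of this is the Valentina API.
struct Record {
    int id;
    std::string name;
    int age;
};

// A declarative condition: it says WHAT is wanted, not HOW to find it.
struct AgeAtLeast {
    int min_age;
};

class Relation {
public:
    explicit Relation(std::vector<Record> rows) : rows_(std::move(rows)) {}

    // The caller only states the condition. This sketch does a linear
    // scan; an index or a bitset could be used instead without any
    // change to the calling code.
    Relation select(const AgeAtLeast& cond) const {
        std::vector<Record> out;
        for (const Record& r : rows_) {
            if (r.age >= cond.min_age) out.push_back(r);
        }
        return Relation(std::move(out));
    }

    std::size_t size() const { return rows_.size(); }

    const Record& at(std::size_t i) const {
        assert(i < rows_.size());  // precondition: index in range
        return rows_[i];
    }

private:
    std::vector<Record> rows_;  // storage detail, invisible to the caller
};
```

Calling code would only ever write something like
`people.select(AgeAtLeast{18})` and never touches the storage layout.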

You should assume that an app developer or a user is NEVER able to
choose the fastest method to get the records. Only an optimiser has
enough knowledge to choose the best plan. For instance, it (the
optimiser) knows the analytical model, a formula modelling the
hardware and certain components of the DBMS: disk latency, seek time,
sequential read rate, size of the buffer cache, buffer replacement
policy, read-ahead policy, page utilisation, clustering factor,
estimated number of records in the result set, access history, access
method, etc.

How is a user supposed to choose between a full scan and a search using
the B-tree index? How will he best partition the files in order to
minimise page faults in the buffer cache? Will he mmap the file, or is
it better to read pages and extract single records through ordinary
file I/O? Are the columns of a table vertically decomposed?

Well, managing all this and taking it into account is the job of the
query optimiser and the query executor. A user must not be faced with
it! You should attempt to achieve the same performance as the C++
developer mentioned above gets using the bitset. This sounds very
difficult, but you do NOT need to build the best optimiser ever in the
first release; this is a part of a DBMS which can be improved over
time.
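
As a rough illustration of the kind of decision an optimiser makes from
such a model, here is a toy C++ cost comparison between a full scan and
an index lookup. The structures, names and formulas are simplifying
assumptions made for this sketch (worst-case unclustered index, nothing
cached); a real optimiser would take far more of the factors listed
above into account.

```cpp
#include <cassert>

// All names and formulas are illustrative assumptions, not a real optimiser.
struct DiskModel {
    double seek_ms;      // seek plus latency for one random page read
    double seq_page_ms;  // time to read one page sequentially
};

struct TableStats {
    double pages;         // pages in the table file
    double records;       // total number of records
    double index_height;  // B-tree levels to descend
};

enum class AccessMethod { FullScan, IndexLookup };

// Full scan: one seek, then every page is read sequentially.
double full_scan_cost(const DiskModel& d, const TableStats& t) {
    return d.seek_ms + t.pages * d.seq_page_ms;
}

// Index lookup: descend the tree, then assume one random page read
// per matching record.
double index_cost(const DiskModel& d, const TableStats& t, double selectivity) {
    assert(selectivity >= 0.0 && selectivity <= 1.0);
    const double matches = selectivity * t.records;
    return (t.index_height + matches) * (d.seek_ms + d.seq_page_ms);
}

// Pick whichever access method the model estimates to be cheaper.
AccessMethod choose(const DiskModel& d, const TableStats& t, double selectivity) {
    return index_cost(d, t, selectivity) < full_scan_cost(d, t)
               ? AccessMethod::IndexLookup
               : AccessMethod::FullScan;
}
```

With a 10 ms seek and 0.1 ms per sequential page, scanning a
10,000-page table is estimated at about 1,010 ms, so under these
assumptions the index only wins when fewer than about a hundred records
match. A user has no practical way to know where that break-even point
lies; the optimiser can estimate it.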

Consider: a Relation is a bitset, is a cursor, is a vector, is a base
file, or whatever. This is totally opaque to the user: he should use a
Relation for accessing records and nothing else. Then you have the
greatest possible freedom in how to implement it.
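
A minimal C++ sketch of that opacity, under the assumption that a
Relation is exposed as an abstract interface. The class names and the
for_each signature are hypothetical and chosen only for illustration;
one implementation is backed by a vector, the other by a bitset over a
shared base table.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; the names are assumptions for illustration only.
using Visitor = std::function<void(const std::string&)>;

class Relation {
public:
    virtual ~Relation() = default;
    // Visit every record in the relation. How they are found is hidden.
    virtual void for_each(const Visitor& visit) const = 0;
};

// Implementation 1: the relation owns a plain vector of records.
class VectorRelation : public Relation {
public:
    explicit VectorRelation(std::vector<std::string> rows)
        : rows_(std::move(rows)) {}
    void for_each(const Visitor& visit) const override {
        for (const auto& r : rows_) visit(r);
    }
private:
    std::vector<std::string> rows_;
};

// Implementation 2: a bitset marks which rows of a base table belong.
class BitsetRelation : public Relation {
public:
    BitsetRelation(std::shared_ptr<const std::vector<std::string>> base,
                   std::vector<bool> selected)
        : base_(std::move(base)), selected_(std::move(selected)) {
        assert(base_ && base_->size() == selected_.size());
    }
    void for_each(const Visitor& visit) const override {
        for (std::size_t i = 0; i < selected_.size(); ++i) {
            if (selected_[i]) visit((*base_)[i]);
        }
    }
private:
    std::shared_ptr<const std::vector<std::string>> base_;
    std::vector<bool> selected_;
};

// Client code depends on Relation only, never on the representation.
std::size_t count_records(const Relation& r) {
    std::size_t n = 0;
    r.for_each([&n](const std::string&) { ++n; });
    return n;
}
```

Because count_records sees only the interface, the storage behind a
Relation can be swapped later without touching application code.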
Andreas
>
> The bigger the language a parser can understand, the longer the path
> to find tokens.
> Also note that while the Valentina kernel itself is about 0.6-0.8 MB,
> the parser looks set to take 0.5 MB by itself. Not bad, yes?
>
>
> --
> Best regards,
> Ruslan Zasukhin [ I feel the need...the need for speed ]
> -------------------------------------------------------------
> e-mail: ruslan at paradigmasoft.com
> web: http://www.paradigmasoft.com
>
> To subscribe to the Valentina mail list go to:
> http://lists.macserve.net/mailman/listinfo/valentina
> -------------------------------------------------------------