[ALL] let's think about Query Language for 2.0

Fri May 23 20:53:42 CDT 2003

On Friday, May 23, 2003, at 04:24, Ruslan Zasukhin wrote:

> on 5/23/03 1:52 PM, Andreas Grosam at agrosam at computerworks.ch wrote:
>
> Hi Andreas,
>
>> Here are my thoughts - although I did not address each of your ideas.
>> [Sorry - I apologise for this huge mail ;)]
>>
>> If I understood you right, your primary concern is to eliminate the
>> overhead imposed by a SQL parser when processing queries specified
>> in a string. IMO, that overhead will not be very significant
>> compared to a complex query that has to access the disk.
>
> Not a fact, Andreas.
>
> One of the Valentina C++ developers has pointed out to me that when he
> uses BitSets he can get queries up to 100 times faster.
>
> The parser really can take MUCH time, Andreas.
> Not only because of the text. Also the optimizer and the planner.
> With BitSets a developer __SOMETIMES__ can write a VERY optimal plan.

OK, I see. Certainly, a full-fledged SQL parser is a dreadful beast.

Besides this, however, the API should be independent of the underlying
implementation. I think a bitset is too implementation-specific. What
if you someday change your Storage Manager to use a different
method/layout for storing the records, making bitsets inappropriate?

As mentioned in the previous mail - in a relational DBMS the user / app 
developer gets access to the data by means of Relations only. The query 
should be declarative - even when using an API. Then the implementation 
chooses the best method to select the records.
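
To make the point concrete, here is a minimal sketch of a declarative
API call. All names (Record, CityEquals, Table) are hypothetical and
invented for illustration; this is not the Valentina API. The caller
states only the condition, and the table decides whether to use an
index or to scan.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record type, for illustration only.
struct Record {
    int id;
    std::string city;
};

// A declarative condition: it says WHAT to match, not HOW to find it.
struct CityEquals {
    std::string value;
};

// Hypothetical table that hides its access paths from the caller.
class Table {
public:
    void insert(const Record& r) {
        rows_.push_back(r);
        if (indexed_) index_.emplace(r.city, rows_.size() - 1);
    }

    // Adding an index changes the access path, not the caller's code.
    void buildCityIndex() {
        index_.clear();
        for (std::size_t i = 0; i < rows_.size(); ++i)
            index_.emplace(rows_[i].city, i);
        indexed_ = true;
    }

    // The implementation, not the caller, picks the access method.
    std::vector<int> select(const CityEquals& cond) const {
        std::vector<int> ids;
        if (indexed_) {
            // Index lookup: visit only the matching entries.
            auto range = index_.equal_range(cond.value);
            for (auto it = range.first; it != range.second; ++it)
                ids.push_back(rows_[it->second].id);
        } else {
            // Fallback: full scan over all rows.
            for (const Record& r : rows_)
                if (r.city == cond.value) ids.push_back(r.id);
        }
        return ids;
    }

private:
    std::vector<Record> rows_;
    std::multimap<std::string, std::size_t> index_;
    bool indexed_ = false;
};
```

A call such as t.select(CityEquals{"Bern"}) returns the same ids before
and after t.buildCityIndex(); only the work done inside changes.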

You should assume that an app developer or a user is NEVER able to
choose the fastest method to get the records. Only an optimiser has
enough knowledge to choose the best plan. For instance, the optimiser
knows the analytical model - a formula modeling the hardware and
certain components of the DBMS - namely disk latency, seek time,
sequential read rate, size of the buffer cache, buffer replacement
policy, read-ahead policy, page utilisation, clustering factor,
estimated number of records in the result set, access history, access
method, etc.
How would a user decide between a full scan and a search using the
btree index? How will he best partition the files in order to minimise
page faults in the buffer cache? Will he mmap the file, or is it better
to read pages and extract single records with ordinary file IO? Are the
columns of a table vertically decomposed?
Well, managing all this and taking it into account is the job of the
query optimiser and the query executor. A user must not be faced with
it! You should attempt to achieve the same performance as the C++
developer mentioned above gets using the bitset. This sounds very
difficult - but you need NOT make the best optimiser ever in the first
release - this is a part of a DBMS which can be improved over time.
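
As a rough illustration of the kind of decision meant here, the sketch
below compares a full scan with a search through an unclustered btree
index. The structs, the formulas and every number are simplifying
assumptions made up for this mail; a real optimiser would use a far
richer model (buffer cache, clustering factor, read-ahead and so on).

```cpp
#include <string>

// Hypothetical statistics an optimiser might keep about a table.
struct TableStats {
    double pages;        // pages occupied by the table
    double rows;         // total number of records
    double btreeHeight;  // levels in the btree index
};

// Hypothetical, very coarse model of the disk.
struct DiskModel {
    double seqPageMs;     // time to read one page sequentially
    double randomPageMs;  // seek + latency + read for one random page
};

// Full scan: read every page once, sequentially.
double fullScanCost(const TableStats& t, const DiskModel& d) {
    return t.pages * d.seqPageMs;
}

// Index search on an unclustered index: descend the tree, then assume
// one random page read per matching record (a pessimistic assumption).
double indexScanCost(const TableStats& t, const DiskModel& d,
                     double selectivity) {
    double matches = t.rows * selectivity;
    return (t.btreeHeight + matches) * d.randomPageMs;
}

// Pick the cheaper plan for the estimated fraction of matching rows.
std::string choosePlan(const TableStats& t, const DiskModel& d,
                       double selectivity) {
    return indexScanCost(t, d, selectivity) < fullScanCost(t, d)
               ? "index"
               : "full scan";
}
```

With these assumptions the index wins only for very selective
conditions; once many records match, the random reads cost more than
reading the whole file sequentially. That crossover depends on numbers
the user normally does not know.
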
Consider: a Relation may be a bitset, a cursor, a vector, a base file -
or whatever. This is totally opaque to the user - he should use a
Relation for accessing records and nothing else. Then you have the
greatest possible freedom in how to implement it.
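
A small sketch of that idea, with hypothetical names throughout
(Relation, BitsetRelation, IdListRelation and countInRange are my
inventions, not existing Valentina classes): user code is written
against the abstract Relation, and the backing store can be swapped
without touching it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical interface: the only thing user code is allowed to see.
class Relation {
public:
    virtual ~Relation() = default;
    virtual std::size_t count() const = 0;
    virtual bool contains(std::size_t recordId) const = 0;
};

// One possible backing: one bit per record.
class BitsetRelation : public Relation {
public:
    explicit BitsetRelation(std::size_t capacity) : bits_(capacity, false) {}
    void add(std::size_t id) { if (id < bits_.size()) bits_[id] = true; }
    std::size_t count() const override {
        std::size_t n = 0;
        for (bool b : bits_) if (b) ++n;
        return n;
    }
    bool contains(std::size_t id) const override {
        return id < bits_.size() && bits_[id];
    }
private:
    std::vector<bool> bits_;
};

// Another possible backing: a plain list of record ids.
class IdListRelation : public Relation {
public:
    void add(std::size_t id) { if (!contains(id)) ids_.push_back(id); }
    std::size_t count() const override { return ids_.size(); }
    bool contains(std::size_t id) const override {
        return std::find(ids_.begin(), ids_.end(), id) != ids_.end();
    }
private:
    std::vector<std::size_t> ids_;
};

// User code depends on Relation only, never on the backing store.
std::size_t countInRange(const Relation& r, std::size_t first,
                         std::size_t last) {
    std::size_t n = 0;
    for (std::size_t id = first; id < last; ++id)
        if (r.contains(id)) ++n;
    return n;
}
```

countInRange gives the same answer for both backings holding the same
record ids, which is exactly the freedom I mean.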

Andreas

>
> The bigger the language a parser can understand, the longer the path
> to find tokens.
> Also note that while the Valentina kernel itself is about 0.6-0.8MB,
> the parser looks to take 0.5MB by itself. Not bad, yes?
>
>
> -- 
> Best regards,
> Ruslan Zasukhin      [ I feel the need...the need for speed ]
> -------------------------------------------------------------
> e-mail: ruslan at paradigmasoft.com
> web: http://www.paradigmasoft.com