Speed up SQL
Ivan Smahin
IvanSmahin at gmail.com
Tue Jan 17 05:36:25 CST 2012
On Jan 17, 2012, at 1:27 PM, Ivan Smahin wrote:
>
> On Jan 16, 2012, at 2:07 PM, Beatrix Willius wrote:
>
>>
>> SELECT Message.MailboxID, Message.RecID, Message.OriginatorFrom as [From], Message.ReceiverTo as [To], Message.MainSubject as [Subject], Message.MainDateTime as [Date], Message.AttachmentCount as [Att.], MailboxName(Mailbox.Mailboxpath) as [Mailbox], lower(Message.OriginatorFrom) as [Order_From], lower(Message.ReceiverTo) as [Order_To], lower(Message.MainSubject)as [Order_Subject]
>> FROM message, mailbox
>> where Message.MessageBody REGEX '(?i)osmer' or Message.OriginatorFrom REGEX '(?i)osmer' OR Message.ReceiverTo REGEX '(?i)osmer' OR Message.MainSubject REGEX '(?i)osmer' OR message.RecID in (select MailID from Attachment where Attachment.AttachmentName REGEX '(?i)osmer')
>> ORDER By [Date] ASC
>>
>> is still slow with only one embedded SQL. The select clause can't be changed!
>>
>> The data I'm just uploading contains different data. So to make sure that the SQL finds something exchange "osmer" by "archer".
>>
>>
>
> First of all - why such a query is slow.
> You do a join of two tables. Except a join condition, there are few "OR conditons".
> In some cases - if that conditions depends on fields from a single table only we can optimize the query performing such set of conditions before the join. So, after all, we are joining not all records from both tables but only some subset of records - greatly improve performance.
> I mean:
>
> Message.MessageBody REGEX '(?i)archer'
> or Message.OriginatorFrom REGEX '(?i)archer'
> OR Message.ReceiverTo REGEX '(?i)archer'
> OR Message.MainSubject REGEX '(?i)archer'
>
> will be done before the join. And most important thing - it will be done only once.
>
> But
> OR message.RecID in (
> select
> MailID
> from
> Attachment
> where
> Attachment.AttachmentName REGEX '(?i)archer')
>
> can not be optimized in the same way - because it is a sort of correlated subquery (there are a field from message and Attachment tables). We have to perform this part of predicate after the join and we have to check each message.RecID from the join result.
>
> Even worst - because it is "OR" predicate we can not optimize it at all. So we are doing a join on full set of message's records and then filter them out checking "where" condition against each record from join result!
>
> Solution is:
> In theory - there is no such a correlated query which can not be expressed via join.
> So we have to avoid subqueries - especially in "where" clause against a "big" data.
>
>
> SELECT
> Message.MailboxID,
> Message.RecID,
> Message.OriginatorFrom as [From],
> Message.ReceiverTo as [To],
> Message.MainSubject as [Subject],
> Message.MainDateTime as [Date],
> Message.AttachmentCount as [Att.],
> MailboxName(Mailbox.Mailboxpath) as [Mailbox],
> lower(Message.OriginatorFrom) as [Order_From],
> lower(Message.ReceiverTo) as [Order_To],
> lower(Message.MainSubject)as [Order_Subject]
> FROM
> message INNER JOIN mailbox ON Message.MailboxID = Mailbox.RecID
> LEFT OUTER JOIN Attachment ON message.RecID = MailID
> where
> Message.MessageBody REGEX '(?i)archer'
> or Message.OriginatorFrom REGEX '(?i)archer'
> OR Message.ReceiverTo REGEX '(?i)archer'
> OR Message.MainSubject REGEX '(?i)archer'
> OR Attachment.AttachmentName REGEX '(?i)archer'
> ORDER By
> [Date] ASC
>
> One more join cost much more less.
> "Left join" is to avoid reducing of the first join-result. We just join an existed Attachment records.
>
> "OR Attachment.AttachmentName REGEX '(?i)archer'" filters the records.
>
> --
> Best regards,
> Ivan Smahin
> Senior Software Engineer
> Paradigma Software, Inc
> Valentina - The Ultra-Fast Database
> http://www.valentina-db.com
>
--
Best regards,
Ivan Smahin
Senior Software Engineer
Paradigma Software, Inc
Valentina - The Ultra-Fast Database
http://www.valentina-db.com
More information about the Valentina
mailing list