Monday, November 3, 2008

How SQL compiles

All SELECT SQL statements have this format:


SELECT [ALL/DISTINCT] target list

FROM table/view

WHERE criteria

GROUP BY target list

HAVING criteria

ORDER BY target list



You can have a select statement using only the SELECT. The statements below are legal.



SELECT 'Hello';

SELECT 2 + 2;


This is an example of how SQL parses your statement. You write:



SELECT HP.Member_Id, HP.Prim_Mem_Id, HP.Carrier_Id, HP.Seq, CA.Add1, CA.Add2, CA.City, CA.State, CA.Zip

FROM dbo.Eligibility_HealthPlan HP

INNER JOIN Carrier C ON ( HP.Carrier_Id = C.Carrier_Id )

LEFT JOIN Carrier_Address CA ON (CA.Carrier_Id = HP.Carrier_Id AND CA.Address_Code = HP.Address_Code)

WHERE HP.Member_Id = '10076' and (HP.seq = -1 or HP.seq = 9)

GROUP BY HP.Member_Id, HP.Prim_Mem_Id, HP.Carrier_Id, HP.Seq, CA.Add1, CA.Add2, CA.City, CA.State, CA.Zip

HAVING COUNT(HP.seq)> 0

ORDER BY HP.member_ID


The SQL parser begins with the FROM part of the statement.

FROM

A virtual table (vt1) is created by joining the first two tables in a Cartesian join. Every row of one table is joined to every row of the second table. All of the columns are qualified by the table or alias name.

ON
The ON is one of the filters. The other filters are WHERE and HAVING. Each row in the virtual table (vt1) where the join condition is true is saved and added to the second virtual table (vt2).

NOTE:
OUTER JOINS

If you are writing an outer join then you mark one or both of the tables as preserved. A preserved table has all of its rows returned whether or not the table fulfills the ON condition. To preserve a table, you select the type of join. The LEFT OUTER JOIN means the left table is preserved, RIGHT OUTER JOIN means the right table is preserved and FULL means that both tables are preserved. A virtual table, vt3, is created. The non-preserved table uses NULL to match the rows in the preserved table.

NOTE:
Adding expressions to ON statements.
You can add a logical expression to the ON statement. If you wanted all of the information from eligibililty_healthplan and carrier where the carrier_id are equal and the effective date is not today you would write either of the following statements:

FROM dbo.Eligibility_HealthPlan HP INNER JOIN Carrier C ON (HP.carrier_id = c.carrier_Id and HP.EFF_Date <> NOW())

FROM dbo.Eligibility_HealthPlan HP
INNER JOIN Carrier C ON (HP.carrier_id = c.carrier_Id )
WHERE HP.EFF_Date <> NOW()

Both statements are functionally the same. The statement works exactly the same in an INNER JOIN. You may have a problem with an OUTER JOIN statement.

FROM dbo.Eligibility_HealthPlan HP LEFT OUTER JOIN Carrier C ON (HP.carrier_id = c.carrier_Id and HP.EFF_Date <> NOW())

The eligibility_healthplan is the preserved table. In vt2 table, a row with an effective date that does not equal today is eliminated. In the vt3 table, the effective date rows are added back to the virtual table because that table is preserved.

If you use an outer join, write the query as follows to avoid cpu cycles:

FROM dbo.Eligibility_HealthPlan HP LEFT OUTER JOIN Carrier C ON (HP.carrier_id = c.carrier_Id )
WHERE HP.EFF_Date <> NOW()

WHERE

The WHERE filter is applied to the vt2 or vt3 table. A virtual table, vt4, is created and the rows where the criteria is true are added. You cannot use aggregate functions (COUNT (), MAX (), MIN ()) in a WHERE clause because the data has not been grouped. You cannot refer to an aliased column name because the SELECT statement has not been processed yet.


GROUP BY
The virtual table,vt5, is created. The rows from vt4 are arranged into groups. A single value for each group is generated. Each group is a unique combination of values based on the target list. With a GROUP BY, the rest of the steps expect each group to return a single value.

HAVING
The virtual table, vt6, is created. The HAVING clause filters on the groups. You can use any aggregate function in the filter. The HAVING clause is the only filter that applies to grouped data.

SELECT
The virtual table, vt7, is created. This is the step where the virtual table looks like a result set. Since this is one of the last steps, an alias that you have created in the SELECT statement may not be used in earlier steps. If you write the following:

SELECT YEAR(eff_date) as yearEffDate FROM claimapproved_det WHERE yearEffDate = ‘2006’

You see an sql syntax error.

If you write:

SELECT year(eff_date) as yearEffDate FROM claimapproved_det ORDER BY yearEffDate

The statement works because the ORDER BY step happens after the SELECT and can use the alias.

DISTINCT
This step removes duplicate rows from the vt7 to create vt8. A DISTINCT clause is redundant when there is a GROUP BY.

ORDER BY
The step where the vt8 rows are sorted by the specified columns. This step does not return a virtual table but returns a cursor.

By definition, a set of values does not have a predetermined order to its rows. The set is a collection and the order of the values within the set does not matter. Any query that applies a sort order to the rows is defined as a cursor.

The ORDER BY clause cannot be used in a view, table-valued function, sub-query, or derived table, or common tab
Publish Post
le expression (CTE).

1 comment:

Mike Zimmerman said...

Hey, Lefty. Mike Zimmerman from Whiffling Straits here. Thanks for the comment -- thought I'd return the favor! I'd love to pick your brain a little more about the lefty/righty golf thing. I'm actually laying the groundwork for writing a book and could use all the input I can get. If you're up for it, my personal e-mail is mikezim883@yahoo.com. I'd love to hear from you. Thanks!

Mike