Wednesday, December 22, 2010


Comparison of Join Methods and Join Types in Oracle

To build the execution plan, the optimizer must perform the following basic steps:

  1. determine the order in which the tables are evaluated.
  2. determine the join method.
  3. determine the access path (for example full scan, rowid, index range scan, etc.).
  4. determine the order of filtering.

The first three define the tree structure that underlies the execution plan. The fourth defines how the data flows through that tree. This time I'll concentrate on point 2 only, leaving the others for future notes.

Joins are always performed between two sets of data. If the statement has more than two tables, the first two tables are joined, the result is joined with the next table, that result is joined with the following table, and so on.
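The pairwise chaining described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Oracle implements it; the helper names `join_pair` and `join_chain` and the sample rows are hypothetical.

```python
from functools import reduce

def join_pair(left_rows, right_rows, condition):
    """Join exactly two row sets; rows are dicts and matching rows are merged."""
    return [{**l, **r} for l in left_rows for r in right_rows if condition(l, r)]

def join_chain(first, steps):
    """Join the first row set with each (rows, condition) step, in order.

    The result of every step becomes the left input of the next one.
    """
    return reduce(lambda acc, step: join_pair(acc, step[0], step[1]), steps, first)

# Hypothetical sample rows, loosely modelled on the emp/dept/salgrade tables.
emp = [{"ename": "SMITH", "deptno": 20, "sal": 800}]
dept = [{"deptno": 20, "dname": "RESEARCH"}]
salgrade = [{"grade": 1, "losal": 700, "hisal": 1200}]

rows = join_chain(emp, [
    (dept, lambda l, r: l["deptno"] == r["deptno"]),              # emp with dept
    (salgrade, lambda l, r: r["losal"] <= l["sal"] <= r["hisal"]),  # result with salgrade
])
```

Three tables therefore mean two join operations, and each one could in principle use a different join method.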

The most common join methods are:

  • NESTED LOOP JOIN
  • SORT MERGE JOIN
  • HASH JOIN
  • Cartesian join

Description of NESTED LOOP JOIN

The two sets of data processed by a nested loop join (NL) are called the outer loop and the inner loop. The outer loop is executed once, and the inner loop is executed once for each row returned by the outer loop. The main characteristics of NL are:

  • It is the best choice when you need the first rows as soon as possible, because it does not need to process all the data before it starts returning results. This performs very well, for example, in front-end applications that use pagination.
  • It can take advantage of available indexes for both filter and join conditions.
  • It can be used with any type of join.
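A minimal Python sketch of the nested loop idea follows. It is an illustration under assumptions, not Oracle's implementation: a dictionary stands in for the index on the inner table, and the names `build_index` and `nested_loop_join` are hypothetical. Because the join is written as a generator, the first row is available before the rest of the outer loop has been processed.

```python
from collections import defaultdict

def build_index(rows, key):
    """Map each key value to the rows that hold it (a stand-in for an index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index

def nested_loop_join(outer_rows, inner_index, outer_key, outer_filter):
    """Yield joined rows one at a time, probing the index once per outer row."""
    for outer in outer_rows:                      # outer loop: executed once
        if not outer_filter(outer):               # filter applied before probing
            continue
        # inner loop: executed once for each row that survives the filter
        for inner in inner_index.get(outer[outer_key], []):
            yield (outer, inner)

# Hypothetical miniature versions of two tables joined on c2.
t1 = [{"c1": 1, "c2": 10, "c3": "zzzz"}, {"c1": 2, "c2": 20, "c3": "abc"}]
t2 = [{"c1": 1, "c2": 10}, {"c1": 2, "c2": 10}, {"c1": 3, "c2": 20}]
t2_idx = build_index(t2, "c2")

nl_rows = nested_loop_join(t1, t2_idx, "c2", lambda r: r["c3"] > "zzz")
first = next(nl_rows)    # first match, produced without scanning all of t1
rest = list(nl_rows)     # remaining matches
```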

Description of HASH JOIN

The two sets of data processed by a hash join (HJ) are called the build input and the probe input. A hash table is built from the build input in memory (or in the temporary tablespace if there is not enough memory available). Once the hash table is built, each row of the probe input is checked against it to determine whether it satisfies the join condition. The main characteristics of HJ are:

  • The hash table is usually built using the smaller dataset.
  • Not all types of joins are supported: for example, theta joins with non-equality conditions and cross joins cannot use it.
  • Before the first row can be returned, the hash table must be completely built.
  • HJ cannot use indexes to apply the join conditions.
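The build and probe phases can be sketched in Python as follows. This is a simplified, in-memory illustration only (it ignores spilling to temporary space), and the function name `hash_join` and the sample rows are hypothetical.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equality join: hash the build input, then probe it with every probe row."""
    hash_table = defaultdict(list)
    for b in build_rows:                 # build phase: finishes before any row is returned
        hash_table[b[build_key]].append(b)
    for p in probe_rows:                 # probe phase: one hash lookup per probe row
        for b in hash_table.get(p[probe_key], []):
            yield (b, p)

# Hypothetical sample rows; the smaller set is used as the build input.
small = [{"c1": 1, "c2": 10}]
large = [{"c1": 7, "c2": 10}, {"c1": 8, "c2": 99}, {"c1": 9, "c2": 10}]

hj_rows = list(hash_join(small, large, "c2", "c2"))
```

The lookup only works for equality, which is why a condition such as `between` cannot be served by this method.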

Description of SORT MERGE JOIN

The two sets of data processed by a merge join (MJ) are read and sorted according to the columns referenced in the join condition. Once both sets are sorted, they are merged. The sort is done in memory as long as there is enough memory (PGA); when it runs out, temporary space must be used as support, which, as expected, slows the operation down. The main characteristics of MJ are:

  • Both data sets must be sorted before the merge.
  • The first row of the result set is returned only once the merge starts.
  • All types of joins are supported.
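The sort and merge steps can be sketched in Python as below. This is a simplified illustration that handles only an equality condition on a single column, and the name `sort_merge_join` and the sample rows are hypothetical.

```python
def sort_merge_join(left_rows, right_rows, key):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(left_rows, key=lambda r: r[key])     # sort phase
    right = sorted(right_rows, key=lambda r: r[key])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):            # merge phase
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row holding the key with that whole run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result

# Hypothetical unsorted sample rows.
left_rows = [{"c2": 3, "n": "a"}, {"c2": 1, "n": "b"}, {"c2": 3, "n": "c"}]
right_rows = [{"c2": 3, "m": "x"}, {"c2": 2, "m": "y"}, {"c2": 3, "m": "z"}]

mj_rows = sort_merge_join(left_rows, right_rows, "c2")
```

No row can be produced until both `sorted` calls have finished, which matches the second characteristic in the list above.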

Types of Joins

There are two possible syntaxes for writing joins: SQL ANSI-86 and SQL ANSI-92.

The first is the one in general use and is the most common. The second is newer and is the standard in other database engines; it is more common among newer generations of developers and DBAs, or among those who come from SQL Server. It is also clearer because it separates the filters from the joins, which makes it easier to read and interpret. Below is a brief overview of the types of joins, with examples in both notations:


Cross Join

Also called Cartesian product. It generally appears when no join conditions are specified for some of the tables. I have also seen a few particular plans where it is the best option, although this is very rare.
 
select emp.ename, dept.dname
from emp, dept

select emp.ename, dept.dname
from emp CROSS JOIN dept
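
To make the size of a Cartesian product concrete, here is a tiny Python sketch using `itertools.product`; the sample values are hypothetical. Every row of one set is paired with every row of the other, so the result has as many rows as the product of the two row counts.

```python
from itertools import product

# Hypothetical sample values standing in for emp.ename and dept.dname.
enames = ["SMITH", "ALLEN"]
dnames = ["ACCOUNTING", "RESEARCH", "SALES"]

# Every employee is paired with every department: 2 * 3 = 6 rows.
pairs = list(product(enames, dnames))
```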


Theta Join

Also called inner join. It returns only the rows that satisfy a join condition.

select emp.ename, salgrade.grade
from emp, salgrade
where emp.sal between salgrade.losal and salgrade.hisal

select emp.ename, salgrade.grade
from emp INNER JOIN salgrade on emp.sal between salgrade.losal and salgrade.hisal



Equi Join

Also called natural join when the join columns share the same name in both tables. It is a special case of theta join in which the join conditions use only equality operators.

select emp.ename, dept.dname
from emp, dept
where emp.deptno = dept.deptno

select emp.ename, dept.dname
from emp NATURAL JOIN dept


Self Join

A special case of theta join in which a table is joined to itself.

select emp.ename, mgr.ename
from emp, emp mgr
where emp.mgr = mgr.empno

select emp.ename, mgr.ename
from emp JOIN emp mgr on emp.mgr = mgr.empno


Outer Join

The outer join extends the result set of theta joins. With this type of join, all the rows of one of the tables involved are returned even when they have no match on the join columns in the other table; NULL is returned in the columns of the table that has no matching row. Oracle has its own syntax, but it is advisable to use the ANSI-92 syntax because it is portable to other database engines.

For example, to see the number of employees per department, including the departments that have no employees:

select dept.dname, count(emp.ename)
from emp, dept
where emp.deptno (+) = dept.deptno
group by dept.dname

select dept.dname, count(emp.ename)
from dept LEFT OUTER JOIN emp on (dept.deptno = emp.deptno)
group by dept.dname


With the new syntax you can also use RIGHT OUTER JOIN and FULL OUTER JOIN.
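
The behaviour of a left outer join, including the NULL padding, can be sketched in Python as follows. This is only an illustration of the semantics with hypothetical sample rows and a hypothetical helper name, `left_outer_join`; `None` plays the role of NULL.

```python
def left_outer_join(left_rows, right_rows, key, right_columns):
    """Keep every left row; pad the right-hand columns with None when nothing matches."""
    result = []
    for l in left_rows:
        matches = [r for r in right_rows if r[key] == l[key]]
        if matches:
            result.extend({**l, **r} for r in matches)
        else:
            result.append({**l, **{c: None for c in right_columns}})
    return result

# Hypothetical sample rows: department 40 has no employees.
dept = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 20, "dname": "RESEARCH"},
    {"deptno": 40, "dname": "OPERATIONS"},
]
emp = [
    {"ename": "CLARK", "deptno": 10},
    {"ename": "SMITH", "deptno": 20},
    {"ename": "JONES", "deptno": 20},
]

joined = left_outer_join(dept, emp, "deptno", ["ename"])

# Like count(emp.ename), count only the rows where ename is not NULL.
counts = {}
for row in joined:
    counts[row["dname"]] = counts.get(row["dname"], 0) + (1 if row["ename"] is not None else 0)
```

The department without employees still appears in `counts`, with a value of zero, because the padded row exists but its `ename` is `None`.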

From Oracle 10g you can use a new type (or subtype) of join called partitioned outer join. At first glance this type of join would seem to be related to partitioned tables, but it is not; here, partitioning means that the data is divided into subsets during execution.

select dept.dname, count(emp.empno)
from dept LEFT JOIN emp PARTITION BY (emp.job) ON emp.deptno = dept.deptno
group by dept.dname


Semi Join

This type of join between two tables returns only the rows of one table for which a match exists in the other table.

For example, to see which employees have a bonus:

select * from scott.emp emp
where exists (select null from scott.bonus bon
where emp.ename = bon.ename)

select * from scott.emp emp
where ename in (select bon.ename from scott.bonus bon)
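
A minimal Python sketch of the semi join semantics, with hypothetical sample rows and a hypothetical helper name `semi_join`: each employee appears at most once in the result, no matter how many matching bonus rows exist, and no columns from the bonus side are returned.

```python
def semi_join(left_rows, right_rows, left_key, right_key):
    """Return the left rows that have at least one match on the right side."""
    right_keys = {r[right_key] for r in right_rows}
    return [l for l in left_rows if l[left_key] in right_keys]

# Hypothetical sample rows: ALLEN has two bonus rows.
emp = [{"ename": "SMITH"}, {"ename": "ALLEN"}, {"ename": "WARD"}]
bonus = [{"ename": "ALLEN", "comm": 300}, {"ename": "ALLEN", "comm": 100}]

with_bonus = semi_join(emp, bonus, "ename", "ename")
```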



Anti Join

This type of join between two tables returns only the rows of one table for which there is NO match in the other table.

For example, to list the employees who do not have a bonus:

select * from scott.emp emp
where not exists (select null from scott.bonus bon
where emp.ename = bon.ename)

select * from scott.emp emp
where ename not in (select bon.ename from scott.bonus bon)
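
The anti join is the complement of the semi join, which a short Python sketch makes visible. The helper name `anti_join` and the sample rows are hypothetical.

```python
def anti_join(left_rows, right_rows, left_key, right_key):
    """Return the left rows that have no match at all on the right side."""
    right_keys = {r[right_key] for r in right_rows}
    return [l for l in left_rows if l[left_key] not in right_keys]

# Hypothetical sample rows: only ALLEN has a bonus.
emp = [{"ename": "SMITH"}, {"ename": "ALLEN"}, {"ename": "WARD"}]
bonus = [{"ename": "ALLEN", "comm": 300}]

without_bonus = anti_join(emp, bonus, "ename", "ename")
```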


Having reviewed the types of joins, we return to the join methods and look at some examples of how the plans are built for each method:

As always, I will create the test environment so that anyone who wants to can try it in their own environment:
 
-- Create table T1
create table t1 as
select rownum c1,
trunc(dbms_random.value(1,100)) c2,
dbms_random.string('a',100) c3
from dual connect by rownum <= 1000000;

-- Create table T2
create table t2 as
select rownum c1,
trunc(dbms_random.value(1,100000)) c2,
dbms_random.string('a',100) c3
from dual connect by rownum <= 2000000;

-- Create an index on table T2
create index t2_idx on t2(c2);

-- Gather statistics for the segments created
begin
dbms_stats.gather_table_stats(ownname => user, tabname => 'T1', cascade => true);
dbms_stats.gather_table_stats(ownname => user, tabname => 'T2', cascade => true);
end;
/


Now I will show each join method; obviously I am going to use hints to force each one and keep things simple:

Forcing NESTED LOOP JOIN:

select /*+ leading(t1) use_nl(t2) index(t2) */ count(1)
from t1, t2
where t1.c2 = t2.c2
and t1.c3 > 'zzz'

Plan hash value: 3705558160


Forcing MERGE JOIN:



select /*+ ordered use_merge(t2) */ count(1)
from t1, t2
where t1.c2 = t2.c2 and t1.c3 > 'zzz'
Plan hash value: 1164406001

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("T1"."C3">'zzz')
5 - access("T1"."C2"="T2"."C2")
    filter("T1"."C2"="T2"."C2")


Forcing HASH JOIN:



select /*+ leading(t1) use_hash(t2) */ t1.*
from t1, t2
where t1.c2 = t2.c2
and t1.c3 > 'zzz'


Plan hash value: 442409572

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T1"."C2"="T2"."C2")
3 - filter("T1"."C3">'zzz')

Comparing the three plans, only NL accesses the index with a range scan rather than a full scan of the index. According to the plan estimates, NL has the best time. Running each of the three queries shows that this estimate matches reality and that NL is the fastest. This is because neither HJ nor MJ can use the index to look up matches in table T2 based on the values returned by table T1. NL reaches that information more quickly, through the index. The lower the selectivity value (that is, the stronger the filter), the greater the advantage of NL over the other two methods.



