BASEL | BERN | BRUGG | BUCHAREST | COPENHAGEN | DÜSSELDORF | FRANKFURT A.M. | FREIBURG I. BR. | GENEVA | HAMBURG | LAUSANNE | MANNHEIM | MUNICH | STUTTGART | VIENNA | ZURICH
danischnider.wordpress.com  @dani_schnider
The Night is Too Short: 10 Tips to Improve ETL Performance
Dani Schnider, Trivadis AG
Dani Schnider
• Senior Principal Consultant at Trivadis AG in Glattbrugg/Zurich
• Trainer of several Trivadis courses
• Co-author of the books “Data Warehousing mit Oracle” and “Data Warehouse Blueprints”
• Oracle ACE
@dani_schnider danischnider.wordpress.com
Blog Post: 10 Tips to Improve ETL Performance
https://danischnider.wordpress.com/2017/07/23/10-tips-to-improve-etl-performance/
Tip 1: Use Set-based Operations
Row-based:

DECLARE
  CURSOR cur_source IS
    SELECT * FROM source;
BEGIN
  FOR c IN cur_source LOOP
    INSERT INTO target VALUES c;
  END LOOP;
END;

Set-based:

INSERT INTO target
SELECT * FROM source;
Demo
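The difference between the two variants can be tried out without an Oracle database. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in, and the table names `target_row` and `target_set`, the columns `id` and `val`, and the 1000 sample rows are assumptions made for illustration. It shows that the row-based loop issues one INSERT per source row, while the set-based variant needs a single statement for the same result.

```python
import sqlite3


def load_row_based(conn):
    """Copy rows one at a time: one INSERT statement per source row."""
    rows = conn.execute("SELECT id, val FROM source").fetchall()
    for row in rows:
        conn.execute("INSERT INTO target_row VALUES (?, ?)", row)
    return len(rows)  # number of INSERT statements executed


def load_set_based(conn):
    """Copy all rows with a single INSERT ... SELECT statement."""
    conn.execute("INSERT INTO target_set SELECT id, val FROM source")
    return 1  # exactly one statement, regardless of the row count


# Hypothetical sample schema and data (not from the slides).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source     (id INTEGER, val TEXT);
    CREATE TABLE target_row (id INTEGER, val TEXT);
    CREATE TABLE target_set (id INTEGER, val TEXT);
""")
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [(i, f"v{i}") for i in range(1000)],
)

row_statements = load_row_based(conn)
set_statements = load_set_based(conn)

rows_row = conn.execute("SELECT * FROM target_row ORDER BY id").fetchall()
rows_set = conn.execute("SELECT * FROM target_set ORDER BY id").fetchall()

print(row_statements, set_statements, rows_row == rows_set)
```

Both targets end up with identical content. In a real database the per-row statements additionally cost context switches and round trips, which is where the set-based variant gains its advantage.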
Tip 3: Drop Unnecessary Indexes
Full Table Scan or Index Scan?
• Full table scans are good
  • For queries with weak selectivity
  • High percentage of data is read
• Index scans are good
  • For queries with strong selectivity
  • Small percentage of data is read (< 1-2 %)
• Typically for DWH and ETL
  • High percentage (often 100%) in ETL
  • Queries with aggregations on large data sets
Tip 4: Avoid Functions in WHERE Conditions
SELECT * FROM addresses
WHERE ctr_code = 'CH' AND city = 'Basel';

------------------------------------------------------------------
| Id  | Operation         | Name      | Starts | E-Rows | A-Rows |
------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |      1 |        |   1035 |
|*  1 |  TABLE ACCESS FULL| ADDRESSES |      1 |   1043 |   1035 |
------------------------------------------------------------------
   1 - filter(("CITY"='Basel' AND "CTR_CODE"='CH'))
SQL or PL/SQL functions in WHERE conditions make it hard for the optimizer to estimate the number of rows
Tip 4: Avoid Functions in WHERE Conditions
SELECT * FROM addresses
WHERE UPPER(ctr_code) = 'CH' AND UPPER(city) = 'BASEL';

------------------------------------------------------------------
| Id  | Operation         | Name      | Starts | E-Rows | A-Rows |
------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |      1 |        |   1035 |
|*  1 |  TABLE ACCESS FULL| ADDRESSES |      1 |     26 |   1035 |
------------------------------------------------------------------
   1 - filter((UPPER("CITY")='BASEL' AND UPPER("CTR_CODE")='CH'))
With UPPER() applied to both columns, the optimizer estimates 26 rows (E-Rows), although 1035 rows are actually returned (A-Rows). Without the functions, the estimate was 1043 rows.
Demo
Tip 5: Take Care of OR in WHERE Condition
Example: Delta detection between two tables

SELECT CASE
         WHEN t.empno IS NULL THEN 'INS'
         WHEN s.empno IS NULL THEN 'DEL'
         ELSE 'UPD'
       END dml_flag
     , NVL(s.empno, t.empno) empno
     , s.ename, s.job, s.mgr, s.sal, s.comm, s.deptno
  FROM emp_source s
  FULL JOIN emp_target t ON (s.empno = t.empno)
 WHERE (NVL(s.ename, '(null)') != NVL(t.ename, '(null)'))
    OR (NVL(s.job, '(null)') != NVL(t.job, '(null)'))
    OR (NVL(s.mgr, -999999) != NVL(t.mgr, -999999))
    OR (NVL(s.sal, -999999) != NVL(t.sal, -999999))
    OR (NVL(s.comm, -999999) != NVL(t.comm, -999999))
    OR (NVL(s.deptno, -999999) != NVL(t.deptno, -999999))
Tip 5: Take Care of OR in WHERE Condition
Example: Delta detection between two tables

SELECT CASE
         WHEN t.empno IS NULL THEN 'INS'
         WHEN s.empno IS NULL THEN 'DEL'
         ELSE 'UPD'
       END dml_flag
     , NVL(s.empno, t.empno) empno
     , s.ename, s.job, s.mgr, s.sal, s.comm, s.deptno
  FROM emp_source s
  FULL JOIN emp_target t ON (s.empno = t.empno)
 WHERE DECODE(s.ename, t.ename, 0, 1)
     + DECODE(s.job, t.job, 0, 1)
     + DECODE(s.mgr, t.mgr, 0, 1)
     + DECODE(s.sal, t.sal, 0, 1)
     + DECODE(s.comm, t.comm, 0, 1)
     + DECODE(s.deptno, t.deptno, 0, 1) > 0
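To see why the DECODE sum can replace the chain of OR conditions, it helps to model both predicates outside the database. The sketch below is plain Python, not Oracle code: `None` stands in for NULL, and the helper names (`nvl`, `decode_flag`, `changed_with_or`, `changed_with_decode`) as well as the three sample rows are assumptions for illustration. DECODE treats two NULLs as equal, so no sentinel values are needed.

```python
def nvl(value, default):
    """Model of Oracle NVL: replace NULL (None) with a default value."""
    return default if value is None else value


def changed_with_or(src, tgt, sentinels):
    """Model of: NVL(s.c1, d1) != NVL(t.c1, d1) OR NVL(s.c2, d2) != ..."""
    return any(nvl(s, d) != nvl(t, d) for s, t, d in zip(src, tgt, sentinels))


def decode_flag(s, t):
    """Model of DECODE(s, t, 0, 1): two NULLs count as equal."""
    return 0 if s == t else 1


def changed_with_decode(src, tgt):
    """Model of: DECODE(...) + DECODE(...) + ... > 0"""
    return sum(decode_flag(s, t) for s, t in zip(src, tgt)) > 0


# Hypothetical rows with the columns (ename, job, mgr).
SENTINELS = ("(null)", "(null)", -999999)
cases = {
    "same":      (("SMITH", "CLERK", None), ("SMITH", "CLERK", None)),
    "updated":   (("SMITH", "CLERK", 7902), ("SMITH", "CLERK", None)),
    "collision": (("SMITH", None, 7902), ("SMITH", "(null)", 7902)),
}

results = {
    name: (changed_with_or(src, tgt, SENTINELS), changed_with_decode(src, tgt))
    for name, (src, tgt) in cases.items()
}
for name, flags in results.items():
    print(name, flags)
```

Both predicates agree on the unchanged and the updated row. They only disagree in the "collision" case, where a real value happens to equal the NVL sentinel: the NVL variant misses the change, the DECODE variant reports it.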
Tip 7: Use WITH to Split Complex Queries
WITH a AS (SELECT ...
             FROM t3
             JOIN t4 ON ...
            WHERE ...)
   , b AS (SELECT ...
             FROM t5
            WHERE ...)
   , c AS (SELECT ...
             FROM a
             JOIN b ON ...)
   , d AS (SELECT ...
             FROM t1
             JOIN t2 ON ...
             JOIN c ON ...)
SELECT ...
  FROM d
 WHERE ...

SELECT ...
  FROM (SELECT ...
          FROM t1
          JOIN t2 ON ...
          JOIN (SELECT ...
                  FROM (SELECT ...
                          FROM t3
                          JOIN t4 ON ...
                         WHERE ...) a
                  JOIN (SELECT ...
                          FROM t5
                         WHERE ...) b
                    ON ...
                 WHERE ...) c
            ON ...
       ) d
 WHERE ...

WITH a AS (SELECT /*+ materialize */ ...
             FROM t3
             JOIN t4 ON ...
            WHERE ...)
   , b AS (SELECT /*+ materialize */ ...
             FROM t5
            WHERE ...)
   , c AS (SELECT /*+ materialize */ ...
             FROM a
             JOIN b ON ...)
   , d AS (SELECT /*+ materialize */ ...
             FROM t1
             JOIN t2 ON ...
             JOIN c ON ...)
SELECT ...
  FROM d
 WHERE ...

Demo
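That a query written with WITH returns the same rows as its nested form can be checked on a small data set. The following is a hedged sketch using Python's sqlite3 module instead of Oracle. The tables `orders` and `customers`, their columns and the sample rows are invented for illustration, and the Oracle-specific materialize hint is not part of it.

```python
import sqlite3

# Hypothetical sample schema and data (not from the slides).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders    (order_id INTEGER, cust_id INTEGER, amount REAL);
    CREATE TABLE customers (cust_id INTEGER, country TEXT);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 50.0),
                              (3, 20, 70.0),  (4, 30, 20.0);
    INSERT INTO customers VALUES (10, 'CH'), (20, 'DE'), (30, 'CH');
""")

# Nested form: every intermediate step is an inline subquery.
NESTED_SQL = """
    SELECT c.country, c.total
      FROM (SELECT b.country, SUM(a.amount) AS total
              FROM (SELECT cust_id, amount
                      FROM orders
                     WHERE amount >= 50) a
              JOIN (SELECT cust_id, country
                      FROM customers) b
                ON a.cust_id = b.cust_id
             GROUP BY b.country) c
     WHERE c.total > 100
     ORDER BY c.country
"""

# WITH form: the same steps, named and readable from top to bottom.
WITH_SQL = """
    WITH a AS (SELECT cust_id, amount
                 FROM orders
                WHERE amount >= 50)
       , b AS (SELECT cust_id, country
                 FROM customers)
       , c AS (SELECT b.country, SUM(a.amount) AS total
                 FROM a
                 JOIN b ON a.cust_id = b.cust_id
                GROUP BY b.country)
    SELECT country, total
      FROM c
     WHERE total > 100
     ORDER BY country
"""

nested_rows = conn.execute(NESTED_SQL).fetchall()
with_rows = conn.execute(WITH_SQL).fetchall()
print(nested_rows == with_rows, with_rows)
```

Because each step has a name, an intermediate result such as `a` or `c` can be queried on its own while debugging, which is much harder with the nested form.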
Tip 8: Run Statements in Parallel
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ PARALLEL (target, 8) */ INTO target
SELECT /*+ PARALLEL (source, 8) */ * FROM source;

-------------------------------------------------------------------------------------
| Id | Operation                           | Name     | TQ    | IN-OUT | PQ Distrib |
-------------------------------------------------------------------------------------
|  0 | INSERT STATEMENT                    |          |       |        |            |
|  1 |  PX COORDINATOR                     |          |       |        |            |
|  2 |   PX SEND QC (RANDOM)               | :TQ10000 | Q1,00 | P->S   | QC (RAND)  |
|  3 |    LOAD AS SELECT (HYBRID TSM/HWMB) | TARGET   | Q1,00 | PCWP   |            |
|  4 |     OPTIMIZER STATISTICS GATHERING  |          | Q1,00 | PCWP   |            |
|  5 |      PX BLOCK ITERATOR              |          | Q1,00 | PCWC   |            |
|  6 |       TABLE ACCESS FULL             | SOURCE   | Q1,00 | PCWP   |            |
-------------------------------------------------------------------------------------
   - Degree of Parallelism is 8 because of table property
Tip 9: Perform Direct-Path INSERT
Direct-path INSERT is also used for parallel DML statements

INSERT /*+ APPEND */ INTO sales
SELECT * FROM stage_sales;

Appending new table blocks above the high water mark is faster than a conventional INSERT
Tip 9: Perform Direct-Path INSERT
10 ETL Performance Tips - DOAG 2018, 21.11.2018
Restrictions must be considered:
• If FK constraints are defined, PDML / direct-path INSERT is disabled
• A conventional load is used instead
Recommendation:
• Define reliable constraints (RELY DISABLE NOVALIDATE)
----------------------------------------------------
| Id  | Operation                      | Name      |
----------------------------------------------------
|   0 | INSERT STATEMENT               |           |
|   1 |  LOAD TABLE CONVENTIONAL       | SALES     |
|   2 |   PX COORDINATOR               |           |
|   3 |    PX SEND QC (RANDOM)         | :TQ10000  |
|   4 |     PX BLOCK ITERATOR          |           |
|   5 |      TABLE ACCESS STORAGE FULL | STG_SALES |
----------------------------------------------------
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 8
   - PDML disabled because parent referential constraints are present

ALTER TABLE sales
  ADD FOREIGN KEY (cust_id) REFERENCES countries
  RELY DISABLE NOVALIDATE
Tip 10: Gather Statistics after Loading each Table
DBMS_STATS.gather_table_stats
  (ownname => 'DWH',
   tabname => 'T1',
   no_invalidate => FALSE);

DBMS_STATS.gather_table_stats
  (ownname => 'DWH',
   tabname => 'T2',
   no_invalidate => FALSE);

| 1 | INSERT STATEMENT     |    | 1500 |
| 2 |  INSERT              | T3 | 1500 |
| 3 |   HASH JOIN          |    | 1500 |
| 4 |    TABLE ACCESS FULL | T1 | 2000 |
| 5 |    TABLE ACCESS FULL | T2 | 3000 |

[Diagram: ETL job loading T3 (1500) from T1 (2000) and T2 (3000)]
Tip 10: Gather Statistics after Loading each Table
Since Oracle 12c, Online Statistics Gathering is used for the following cases:
• CREATE TABLE AS SELECT
• Direct-Load INSERT into empty table (after TRUNCATE)
• Direct-Load INSERT into non-empty table (ADW only)
------------------------------------------------------
| Id  | Operation                         | Name     |
------------------------------------------------------
|   0 | INSERT STATEMENT                  |          |
|   1 |  LOAD AS SELECT                   | TARGET   |
|   2 |   PX COORDINATOR                  |          |
|   3 |    PX SEND QC (RANDOM)            | :TQ10000 |
|   4 |     OPTIMIZER STATISTICS GATHERING|          |
|   5 |      PX BLOCK ITERATOR            |          |
|   6 |       TABLE ACCESS STORAGE FULL   | SOURCE   |
------------------------------------------------------
Demo
10 Tips to Improve ETL Performance
1. Use Set-based Operations
2. Avoid Nested Loops
3. Drop Unnecessary Indexes
4. Avoid Functions in WHERE Condition
5. Take Care of OR in WHERE Condition
6. Reduce Data as Early as Possible
7. Use WITH to Split Complex Queries
8. Run Statements in Parallel
9. Perform Direct-Path INSERT
10. Gather Statistics after Loading each Table
10 Tips to Improve ETL Performance – Revised for ADW
https://danischnider.wordpress.com/2018/07/20/10-tips-to-improve-etl-performance-revised-for-adwc/
10 Tips to Improve ETL Performance (ADW)
1. Use Set-based Operations
2. Avoid Nested Loops
3. Drop Unnecessary Indexes
4. Avoid Functions in WHERE Condition
5. Take Care of OR in WHERE Condition
6. Reduce Data as Early as Possible
7. Use WITH to Split Complex Queries
8. Run Statements in Parallel
9. Perform Direct-Path INSERT
10. Gather Statistics after Loading each Table