Friday, 22 February 2008

Parallel DDL, Parallel DML, and Parallel Queries

Parallel DDL, Parallel DML, and Parallel Queries

Oracle 8i supports the following parallel operations:
table scan, nested loop join, sort merge join, hash join, “not in”, group by,
select distinct, union and union all, aggregation, PL/SQL functions called from SQL,
order by, create table as select, create index, rebuild index, rebuild index partition,
move partition, split partition, update, delete, insert ... select,
enable constraint (table scan is parallelized), star transformation, cube, and rollup.
Parallel query can be established at both the instance and schema object level. The default is set at the schema object level.
Parallel DML automatically performs multiple DML activities and removes the disadvantages of manual parallel DML such as requiring multiple open sessions, manual coordination of transaction commits and the complexity associated with determining where to “split” the work.
Use Parallel DML to speed up DML operations against large database objects. Very useful in a DSS requiring performance and scalability.
Parallel DML setup overhead takes away any advantage for using it with OLTP transactions, but can improve performance of batch OLTP type jobs.
Use the ALTER SESSION ENABLE PARALLEL DML statement. Parallel DML is disabled by default.
Set parameter PARALLEL_AUTOMATIC_TUNING= TRUE for Oracle to automatically tune and set intelligent defaults for parallel execution.
When parallel query is set at the instance level, PARALLEL_AUTOMATIC_TUNING=TRUE, parallel execution buffers are allocated from the large pool. Otherwise, parallel execution buffers are allocated from the shared pool.
Can set degree of parallelism in the PARALLEL clause which is part of the CREATE TABLE command. Can specify PARALLEL with or without an integer representing the degree. If degree is not specified, Oracle will determine optimal degree value based upon the PARALLEL_THREADS_PER_CPU parameter.
If OPS is being used, the quantity of CPU’s and instances are used to calculate the default number of query servers or threads used.
Example of PDML:
CREATE TABLE table1_name
PARALLEL
AS SELECT * FROM table2_name ;
Can also specify a parallel hint in the form:
/*+ PARALLEL(table, integer) */
PARALLEL_DEGREE_LIMIT is a resource limit that can be specified for a profile and then assigned to a user. Controls the maximum degree of parallelism that a user can deploy.
Some restrictions when using Parallel DML:
- If a table is not partitioned then no parallel update or delete operations are allowed.
- Once a parallel DML statement has modified a table including a serial direct-load insert, no subsequent serial or parallel DML or query statements can access the table in the same transaction.
- Parallel DML is not supported for triggers.
- Parallel DML is not supported in replication.
- Parallel DML is not allowed with self-referential integrity, delete cascade and deferred integrity.
- Parallel DML is not supported on tables with object columns or LOB columns.
- Parallel DML is not supported on index-organized or clustered tables.
- Parallel DML txns can not be distributed txns.
If you do use hints on both parts and the degrees of parallelism are different, the degree of parallelism that will be used will be that which is specified on the INSERT parallel hint.
If you do a JOIN on two tables that have different degrees of parallelism specified, the degree used for the join operation will be the higher of the two.
There are 2 types of partition-wise joins, full and partial. Benefits of creating a partition-wise join include skipping or reducing the “redistribution” when one or both of the tables in the join are partitioned on the join key. This increases overall performance and reduces the required memory and temp storage.
A full partition-wise join takes a large join and turns it into individual joins between the partitions of each join table. Both tables must be equipartitioned on the join key.
Partial partition-wise joins are used when only one of the tables being joined is partitioned on the join key.
Oracle performs parallel transaction recovery (rollback and rollforward) after transaction failure, process failure, instance failure, and system failure.
Set the FAST_START_PARALLEL_ROLLBACK so that parallel transaction recovery will occur. If not set, no parallel transaction recovery. Valid values are HIGH, LOW, or FALSE. The default value is LOW.
SMON determines if a large recovery transaction becomes a candidate for parallel recovery. When performing parallel recovery, SMON is the coordinator and the slaves are the individual recovery processes.
When the coordinator process fails, PMON recovers the coordinator and then all slave processes rollback their own work in parallel.
When a single slave process fails, PMON will take care of rolling back the work for the failed process. All the remaining slaves’ processes will rollback their own work.
When multiple slave processes fail, PMON will rollback all work serially.
Related Data Dictionary Views:
V$PX_SESSION - contains real-time data for sessions running parallel execution. The columns in this view are:
SADDR - session address
SID - session identifier
SERIAL# - session serial number
QCSID - session identifier of the parallel coordinator
QCSERIAL# - session serial number of the parallel coordinator
QCINST_ID - instance number on which the parallel coordinator is running
SERVER_GROUP - logical group of servers to which this parallel server process belongs
SERVER_SET - logical set of servers that this parallel server process belongs to. A single server group will have at most two server sets
SERVER# - logical number of a parallel server process within a server set
DEGREE - degree of parallelism being used by the server set
REQ_DEGREE - degree of parallelism that was requested by the user when the statement was issued and prior to any resource, multi-user, or load balancing reductions
V$PX_SESSTAT - join view of v$px_session and v$sesstat views. Has info about the session running parallel execution: Include the following columns:
SADDR - session address
SID - session identifier
SERIAL# - session serial number
QCSID - session identifier of the parallel coordinator
QCSERIAL# - session serial number of the parallel coordinator
QCINST_ID - instance number on which the parallel coordinator is running
SERVER_GROUP - logical group of servers to which this parallel server process belongs
SERVER_SET - logical set of servers that this parallel server process belongs to. A single server group will have at most two server sets
SERVER# - logical number of a parallel server process within a server set
DEGREE - degree of parallelism being used by the server set
REQ_DEGREE - degree of parallelism that was requested by the user when the statement was issued and prior to any resource, multi-user, or load balancing reductions
STATISTIC# - Statistic number (identifier)
VALUE - Statistic value
V$PX_PROCESS - Has info for session running parallel execution processes. Includes the following columns:
SERVER_NAME - name of the parallel server (P000, P001, etc)
STATUS - state of the parallel server. Either In Use or Available
PID - process identifier
SPID - OS process ID
SID - session ID of slave, if in use
SERIAL# - session serial number of slave, if in use
V$PX_PROCESS_SYSSTAT - Status of query server and buffer allocation stats. Has parallel statistic info about processes. Has the following columns:
STATISTIC - name of the statistic
VALUE - value of the statistic
V$PQ_SESSTAT - Status of all current server groups. Info about how load balancing and multi-user algorithms are affecting hinted and default values. Will be come obsolete in future release.
V$FILESTAT - Use to evaluate I/O and workload.
V$PQ_TQSTAT - Shows message traffic at the table queue level. Data is valid when queried from a session executing parallel SQL statements.
V$SESSTAT and V$SYSSTAT - Shows parallel execution stats for each session, including total number of queries, DML and DDL statements executed in a given session.
V$SESSION - Includes a new column PDML_ENABLED that contains a value of YES or NO to indicate if PDML is enabled or not.
Examples of Serial and Parallel Direct-Load INSERT
You can specify serial direct-load INSERT with the APPEND hint, for example:
INSERT /*+ APPEND */ INTO emp
SELECT * FROM t_emp;
COMMIT;
You can specify parallel direct-load INSERT by setting the PARALLEL attribute of the table into which rows are inserted, for example:
ALTER TABLE emp PARALLEL (10);
ALTER SESSION ENABLE PARALLEL DML;
INSERT INTO emp
SELECT * FROM t_emp;
COMMIT;
You can also specify parallelism for the SELECT operation by setting the PARALLEL attribute of the table from which rows are selected:
ALTER TABLE emp PARALLEL (10);
ALTER TABLE t_emp PARALLEL (10);
ALTER SESSION ENABLE PARALLEL DML;
INSERT INTO emp
SELECT * FROM t_emp;
COMMIT;
The PARALLEL hint for an INSERT or SELECT operation takes precedence over a table’s PARALLEL attribute. For example, the degree of parallelism in the following INSERT ... SELECT statement is 12 regardless of whether the PARALLEL attributes are set for the EMP and T_EMP tables:
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ PARALLEL(emp,12) */ INTO emp
SELECT /*+ PARALLEL(t_emp,12) */ * FROM t_emp;
COMMIT;
For more information on parallel INSERT statements, see "Rules for Parallelizing
INSERT ... SELECT" on page 26-23.

No comments: