Analytic
functions compute an aggregate value based on a group of rows. They differ from
aggregate functions in that they return multiple rows for each group. The
group of rows is called a window and is defined by the analytic_clause. For
each row, a sliding window of rows is defined. The window determines the range
of rows used to perform the calculations for the current row. Window sizes can
be based on either a physical number of rows or a logical interval such as
time.
Analytic
functions are the last set of operations performed in a query except for the
final ORDER BY
clause. All joins and all WHERE, GROUP BY, and HAVING
clauses are completed before the analytic functions are processed. Therefore,
analytic functions can appear only in the select list or ORDER BY
clause.
Analytic
functions are commonly used to compute cumulative, moving, centered, and
reporting aggregates. The general syntax of analytic function is:
Function(arg1,...,
argn) OVER ( [PARTITION BY <...>] [ORDER BY <....>]
[<window_clause>] )
How are analytic functions different
from group or aggregate functions?
SELECT deptno,
COUNT(*) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30)
GROUP BY deptno;
DEPTNO
DEPT_COUNT
---------------------- ----------------------
20
5
30
6
2 rows selected
Query-1
Consider
the Query-1 and its result. Query-1 returns departments and their
employee count. Most importantly it groups the records into departments in
accordance with the GROUP BY clause. As such any non-"group by"
column is not allowed in the select clause. and its result. Query-1 returns departments and their
employee count. Most importantly it groups the records into departments in
accordance with the GROUP BY clause. As such any non-"group by"
column is not allowed in the select clause.
SELECT empno, deptno,
COUNT(*) OVER (PARTITION BY
deptno) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);
EMPNO DEPTNO DEPT_COUNT
---------- ---------- ----------
7369 20 5
7566 20 5
7788 20 5
7902 20 5
7876 20 5
7499 30 6
7900 30 6
7844 30 6
7698 30 6
7654 30 6
7521 30 6
11 rows selected.
Query-2
Now
consider the analytic function query (Query-2)
and its result. Note the repeating values of DEPT_COUNT column.
This
brings out the main difference between aggregate and analytic functions. Though
analytic functions give aggregate result they do not group the result set. They
return the group value multiple times with each record. As such any other
non-"group by" column or expression can be present in the select
clause, for example, the column EMPNO in Query-2.
Analytic
functions are computed after all joins, WHERE clause, GROUP BY and HAVING are
computed on the query. The main ORDER BY clause of the query operates after the
analytic functions. So analytic functions can only appear in the select list
and in the main ORDER BY clause of the query.
In
absence of any PARTITION or <window_clause> inside the OVER( ) portion,
the function acts on entire record set returned by the where clause. Note the
results of Query-3 and compare it with the result of aggregate function query Query-4.
SELECT empno, deptno,
COUNT(*) OVER ( ) CNT
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, 1;
EMPNO DEPTNO CNT
---------- ---------- ----------
7782 10 8
7839 10 8
7934 10 8
7369 20 8
7566 20 8
7788 20 8
7876 20 8
7902 20 8
Query-3
SELECT COUNT(*) FROM emp
WHERE deptno IN (10, 20);
COUNT(*)
----------
8
Query-4
How to break the result set in groups
or partitions?
It
might be obvious from the previous example that the clause PARTITION BY is used
to break the result set into groups. PARTITION BY can take any non-analytic SQL
expression.
Some
functions support the <window_clause> inside the partition to further
limit the records they act on. In the absence of any <window_clause>
analytic functions are computed on all the records of the partition clause.
The functions
SUM, COUNT, AVG, MIN, MAX are the common analytic functions the result of which
does not depend on the order of the records.
Functions
like LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, FIRST, FIRST VALUE, LAST, LAST
VALUE depends on order of records. In the next example we will see how to
specify that.
How to specify the order of the
records in the partition?
The
answer is simple, by the "ORDER BY" clause inside the OVER( ) clause.
This is different from the ORDER BY clause of the main query which comes after
WHERE. In this section we go ahead and introduce each of the very useful
functions LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, FIRST, FIRST VALUE, LAST, LAST
VALUE and show how each depend on the order of the record.
The
general syntax of specifying the ORDER BY clause in analytic function is:
ORDER BY <sql_expr> [ASC or
DESC] NULLS [FIRST or LAST]
The
syntax is self-explanatory.
No comments:
Post a Comment