Friday, 10 February 2017

Analytic Functions in Oracle - I

We will take a quick look at the various analytical functions in Oracle in a series of posts. First, we will take a look at the rollup and cube functions. For all the queries in this post, as before, we will use Oracle 12c and tables that are part of the sample schema.

To begin with, let us check out the simple group by clause as shown below:

select job_id, sum(salary) from hr.employees where department_id < 40 group by (job_id);

The results returned are shown below: 










This looks pretty straightforward. This query returns the sum of salaries by job_id from employees table qualified by a filter clause.

Let us add the ROLLUP keyword and add some spice up query as shown below:

select job_id, sum(salary) from hr.employees where department_id < 40 group by rollup(job_id);

The results returned are shown below:











The results look the same but for a new row that contains the total of all sums of salaries.

So, ROLLUP calculates the total. But, the same can be effected by writing the below query:

select job_id, sum(salary) from hr.employees where department_id < 40 group by (job_id) 
union 
select null, sum(salary) from hr.employees where department_id < 40;

The results returned are shown below: 











So, why use ROLLUP at all?

The reason why ROLLUP is preferred is because long SQL statements can be avoided and it is more efficient.

Let us add department_id also to the earlier ROLLUP query as shown below:

select department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by rollup (department_id, job_id);

The results returned are shown below:  

 












The results show that totals by department_id and a final total.

If you do not want the final total, you can use below query:

select department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by rollup (job_id), department_id;

The department_id has been left out of the ROLLUP scope.

The results returned are shown below:  













Let us now add manager_id also along with department_id and job_id in the ROLLUP query:

select manager_id, department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by rollup (manager_id, department_id, job_id);

The results returned are shown below:  

















Results are on expected lines much like when department_id was added.

A variant of the above query like the one shown below:

select manager_id, department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by rollup ((manager_id),(department_id, job_id));

The results returned are shown below:  












Rows in sub totals over job_id only are knocked off.

Next, we take a look at CUBE. CUBE are similar to ROLLUP, but CUBE offers to cover the subtotals for any combinations left out by ROLLUP. For starters, let us check the results of the below query:

select job_id, sum(salary) from hr.employees where department_id < 40 group by cube(job_id);

The results returned are shown below:  











The results of the CUBE query with just job_id in scope does not appear to be any different from the one with ROLLUP. However, when the below query is run:

select department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by cube (department_id, job_id);

we see different results than when we used ROLLUP.




















We see that compared to the 9 records in the case of ROLLUP, we have 14 records when CUBE is used. The extra five are the one where total have been calculated over job_id.

Next, we take a look at the GROUPING function. This function finds the group that contributes to the total or subtotal returning 1 if the group is used and 0 if the group is not used. A quick example is shown below using ROLLUP:

select job_id, sum(salary), grouping(job_id) group_job from hr.employees where department_id < 40 group by rollup(job_id);

The results returned are shown below:  











We can see that GROUP_JOB column has 1 in the row containing total and 0 otherwise.

In the same manner, we add the GROUPING function to two columns: department_id and job_id as shown below:

select department_id, job_id, sum(salary), grouping(job_id) group_job, grouping(department_id) group_department from hr.employees where department_id < 40 group by rollup (department_id, job_id);

The results returned are shown below: 

 








In the same manner, the query for GROUPING in the CUBE query is shown below:

select department_id, job_id, sum(salary), grouping(job_id) group_job, grouping(department_id) group_department from hr.employees where department_id < 40 group by cube (department_id, job_id);

The results returned are shown below:













GROUPING SETS offers another means to use aggregate functions. In the first example, we see that the results that are returned are the same as the normal group by clause:

select job_id, sum(salary) from hr.employees where department_id < 40 group by grouping sets ((job_id));

The results returned are shown below: 










In the next example, the use of GROUPING SETS becomes more clear.

select department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by grouping sets (department_id,job_id);

The results returned are shown below:
 












Only the totals are calculated across job_id or department_id as these two columns are included in parenthesis. If the complete total is needed, then, we can use () as shown below:

select department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by grouping sets (department_id,job_id,());

The results returned are shown below:














The last row is the complete total. In the last example, we will use a more complex combination that includes manager_id as well in the query as shown below:

select manager_id, department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by grouping sets ((manager_id, department_id), (job_id),());

The results returned are shown below:














The results are in line with the combination of columns in the grouping sets scope.

This concludes the first post on Analytic Functions in Oracle.