Saturday 25 February 2017

Analytic Functions in Oracle - III

In this post, we will cover RANK, DENSE_RANK and Top N Queries. Oracle 12c is used for all queries.

RANK as the name suggests ranks any measure in a query using sequential numbers starting from 1. If in the ranking process, a tie is encountered, then, the same rank is repeated for all values that tie and the next rank in sequence is used after skipping as many rank values as was repeated in the tie . Let us see that by means of an example. For starters, let us use the same hr.employees table used in the previous posts. We are interested only in the results of the below query:

select * from hr.employees where employee_id <= 109 order by salary desc;

The results returned are shown below:  








The query returns ten records. Let us now run the first query using RANK shown below:

select employee_id, last_name, department_id, salary, rank() over (order by salary desc) "SALARY RANK" from   hr.employees where employee_id <= 109; 

The results returned are shown below:  












The results are as written above: a tie is seen at 17000, so there is a repeat of ranks for 17000 and at the next value of 12008, rank value 3 is skipped for one repeat of 17000 and a rank of 4 is returned. Same is true for the salary value 9000 where rank of 6 is skipped.

DENSE_RANK is similar to RANK except that in case a of tie, the rank values are not skipped. Let us replace RANK with DENSE_RANK in above query to see this contrast.

select employee_id, last_name, department_id, salary, dense_rank() over (order by salary desc) "SALARY DENSE RANK" from  hr.employees where employee_id <= 109;

The results returned are shown below: 











Results show that DENSE_RANK does not skip any values at all and ranks are repeated at ties. 

Let us add the partition keyword and see the results. The query using partition is show below:

select department_id, last_name, salary, rank() over (partition by department_id order by salary desc) "SALARY RANK" from hr.employees where employee_id <= 109;

The results returned are shown below:














The results show that the RANK function is applied department wise. The same can be tried using DENSE_RANK as shown below:

select department_id, last_name, salary, dense_rank() over (partition by department_id order by salary desc) "SALARY DENSE RANK" from hr.employees where employee_id <= 109;

The results returned are shown below:












The results are clearly in line with our understanding of DENSE_RANK as is seen in the ranking of Lorentz.

Next, we will pass values to RANK and DENSE_RANK function as shown below. The results returned are the same as the results shown above.

select rank(24000) within group (order by salary desc) "RANK OF 24000" from hr.employees where employee_id <= 109;






select dense_rank(24000) within group (order by salary desc) "DENSE RANK OF 24000" from hr.employees where employee_id <= 109;





select rank(17000) within group (order by salary desc) "RANK OF 17000" from hr.employees where employee_id <= 109;






select dense_rank(17000) within group (order by salary desc) "DENSE RANK OF 17000" from hr.employees where employee_id <= 109;






select rank(9000) within group (order by salary desc) "RANK OF 9000" from hr.employees where employee_id <= 109; 



select dense_rank(9000) within group (order by salary desc) "DENSE RANK OF 9000" from hr.employees where employee_id <= 109;






If the value passed to RANK or DENSE_RANK function does not exist in the table, then it will return a result such that it fits into the ranking of the existing values in the table. Let us see the below two queries, one for RANK and another for DENSE_RANK:

select rank(16000) within group (order by salary desc) "RANK OF 16000" from hr.employees where employee_id <= 109;







select dense_rank(16000) within group (order by salary desc) "DENSE RANK OF 16000" from hr.employees where employee_id <= 109;



16000 does not exist in the table. So, it is neatly ranked as though it is in the table. The important think to note is that, no errors or nulls are returned.

Last part of this post is the Top N Queries. Such type of queries answer questions like what are top 5 salaries for employees with employee_id less than 109. The query for that question is shown below:

select employee_id, last_name, department_id, salary from hr.employees where employee_id <= 109 order by salary desc fetch first 5 rows only;

The results returned are shown below:









In the same manner, we can use percentage to indicate the number of records returned as shown below:

select employee_id, last_name, department_id, salary from hr.employees where employee_id <= 109 order by salary desc fetch first 50 percent rows only;

The results returned are the same as shown below:









We can ignore the top 2 salary values and then query for the next 5 salaries using OFFSET  as shown below:

select employee_id, last_name, department_id, salary from hr.employees where employee_id <= 109 order by salary desc offset 2 rows fetch first 5 rows only; 

The results returned are the same as shown below:


Friday 17 February 2017

Analytic Functions in Oracle - II

The focus of this post with be OVER in Oracle. OVER in Oracle defines the scope of the analytic functions preceding it in the SQL query. As in earlier posts, we are using Oracle 12c for all the queries.

For starters, let us run the following query:

select count(*) from hr.employees where department_id < 40;

The results returned are shown below:  





Let us add the OVER keyword to this as shown below:

select count(*) over() from hr.employees where department_id < 40;

The results returned are shown below:  















The same result is now returned nine times. Since nothing is specified in the parenthesis following OVER, we see that OVER returns the results of COUNT nine times as there are nine rows that satisfy the filter clause without the COUNT and OVER clauses.

The same behaviour can be seen in the below query where we introduce a partition clause as shown below:

select department_id, count(*) over(partition by department_id) from hr.employees where department_id < 40;

The results returned are shown below: 













 A more meaningful query is shown below:

select distinct department_id, count(*) over(partition by department_id) from hr.employees where department_id < 40;

The results returned are shown below: 








But, the results of this query is the same as the one below without the OVER and the ordering of department_id.

select department_id, count(*) from hr.employees where department_id < 40 group by department_id;
The results returned are shown below:








Now that we have seen how OVER() and OVER() with a PARTITION clause works, let us see next variant that can be used to Moving calculations. For all the variants shown hereafter, the calculations are dependent on the order of the rows of the result. The first query calculates the moving average
of the salary in the previous record and salary in current record:

select employee_id, first_name, last_name, salary, avg(salary) over (order by employee_id rows between 1 preceding and current row) as moving_avg_of_sal from hr.employees where employee_id <= 105;

The results returned are shown below:









The moving average salary of the first row is the same as the salary of that row as there is no previous row. Others salaries are calculated according to the logic correctly.

The second query is similar to the first one with the only difference that the average is calculated based on three values: salary in previous row, salary in current row and salary in next row.

select employee_id, first_name, last_name, salary, avg(salary) over (order by employee_id rows between 1 preceding and 1 following) as moving_avg_of_sal from hr.employees where employee_id <= 105; 

The results returned are shown below:  


It is instructive to note that for the first and the last row in the result set, since there only two records, the averages of only two salaries are shown. 

The third query calculates the  moving average of salaries in all previous records.

select employee_id, first_name, last_name, salary, avg(salary) over (order by employee_id rows between unbounded preceding and current row) as moving_avg_of_sal from hr.employees where employee_id <= 105;

 The results returned are shown below: 







For the first record, there is no average and the same salary is shown. For latter records though, the average salaries are calculated based on the salary in current row and all previous rows.

The last variant of OVER shows salary outputs using analytic functions called FIRST_VALUE and LAST_VALUE. The first query calculates the maximum salary in each department using the analytic function FIRST_VALUE as shown below:

select department_id, last_name, salary, first_value(last_name) over (partition by department_id order by salary desc),first_value(salary) over (partition by department_id order by salary desc) as max_sal from (select * from hr.employees where department_id < 40)
  order by department_id, last_name, salary, max_sal;

 

The results returned are shown below:  








The results confirm the explanation above. Let us now try using the LAST_VALUE function to calculate the minimum salary in each department in the same way we use FIRST_VALUE.

select department_id, last_name, salary, last_value(last_name) over (partition by department_id order by salary desc),last_value(salary) over (partition by department_id order by salary desc) as max_sal from (select * from hr.employees where department_id < 40)
  order by department_id, last_name, salary, max_sal;
 


The results returned are shown below:  








The results belie our expectation. To correct this anomaly, let us the below query with a slight difference in the over clause:

select department_id, last_name, salary, last_value(last_name) over (partition by department_id order by salary desc rows between unbounded preceding and unbounded following), last_value(salary) over (partition by department_id order by salary desc rows between unbounded preceding and unbounded following) as lowest_sal from (select * from hr.employees where department_id < 40)
  order by department_id, last_name, salary, lowest_sal;


The results returned are shown below:  








The "between unbounded preceding and unbounded following" clause does the trick.

This concludes the post on OVER.
 

Friday 10 February 2017

Analytic Functions in Oracle - I

We will take a quick look at the various analytical functions in Oracle in a series of posts. First, we will take a look at the rollup and cube functions. For all the queries in this post, as before, we will use Oracle 12c and tables that are part of the sample schema.

To begin with, let us check out the simple group by clause as shown below:

select job_id, sum(salary) from hr.employees where department_id < 40 group by (job_id);

The results returned are shown below: 










This looks pretty straightforward. This query returns the sum of salaries by job_id from employees table qualified by a filter clause.

Let us add the ROLLUP keyword and add some spice up query as shown below:

select job_id, sum(salary) from hr.employees where department_id < 40 group by rollup(job_id);

The results returned are shown below:











The results look the same but for a new row that contains the total of all sums of salaries.

So, ROLLUP calculates the total. But, the same can be effected by writing the below query:

select job_id, sum(salary) from hr.employees where department_id < 40 group by (job_id) 
union 
select null, sum(salary) from hr.employees where department_id < 40;

The results returned are shown below: 











So, why use ROLLUP at all?

The reason why ROLLUP is preferred is because long SQL statements can be avoided and it is more efficient.

Let us add department_id also to the earlier ROLLUP query as shown below:

select department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by rollup (department_id, job_id);

The results returned are shown below:  

 












The results show that totals by department_id and a final total.

If you do not want the final total, you can use below query:

select department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by rollup (job_id), department_id;

The department_id has been left out of the ROLLUP scope.

The results returned are shown below:  













Let us now add manager_id also along with department_id and job_id in the ROLLUP query:

select manager_id, department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by rollup (manager_id, department_id, job_id);

The results returned are shown below:  

















Results are on expected lines much like when department_id was added.

A variant of the above query like the one shown below:

select manager_id, department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by rollup ((manager_id),(department_id, job_id));

The results returned are shown below:  












Rows in sub totals over job_id only are knocked off.

Next, we take a look at CUBE. CUBE are similar to ROLLUP, but CUBE offers to cover the subtotals for any combinations left out by ROLLUP. For starters, let us check the results of the below query:

select job_id, sum(salary) from hr.employees where department_id < 40 group by cube(job_id);

The results returned are shown below:  











The results of the CUBE query with just job_id in scope does not appear to be any different from the one with ROLLUP. However, when the below query is run:

select department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by cube (department_id, job_id);

we see different results than when we used ROLLUP.




















We see that compared to the 9 records in the case of ROLLUP, we have 14 records when CUBE is used. The extra five are the one where total have been calculated over job_id.

Next, we take a look at the GROUPING function. This function finds the group that contributes to the total or subtotal returning 1 if the group is used and 0 if the group is not used. A quick example is shown below using ROLLUP:

select job_id, sum(salary), grouping(job_id) group_job from hr.employees where department_id < 40 group by rollup(job_id);

The results returned are shown below:  











We can see that GROUP_JOB column has 1 in the row containing total and 0 otherwise.

In the same manner, we add the GROUPING function to two columns: department_id and job_id as shown below:

select department_id, job_id, sum(salary), grouping(job_id) group_job, grouping(department_id) group_department from hr.employees where department_id < 40 group by rollup (department_id, job_id);

The results returned are shown below: 

 








In the same manner, the query for GROUPING in the CUBE query is shown below:

select department_id, job_id, sum(salary), grouping(job_id) group_job, grouping(department_id) group_department from hr.employees where department_id < 40 group by cube (department_id, job_id);

The results returned are shown below:













GROUPING SETS offers another means to use aggregate functions. In the first example, we see that the results that are returned are the same as the normal group by clause:

select job_id, sum(salary) from hr.employees where department_id < 40 group by grouping sets ((job_id));

The results returned are shown below: 










In the next example, the use of GROUPING SETS becomes more clear.

select department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by grouping sets (department_id,job_id);

The results returned are shown below:
 












Only the totals are calculated across job_id or department_id as these two columns are included in parenthesis. If the complete total is needed, then, we can use () as shown below:

select department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by grouping sets (department_id,job_id,());

The results returned are shown below:














The last row is the complete total. In the last example, we will use a more complex combination that includes manager_id as well in the query as shown below:

select manager_id, department_id, job_id, sum(salary) from hr.employees where department_id < 40 group by grouping sets ((manager_id, department_id), (job_id),());

The results returned are shown below:














The results are in line with the combination of columns in the grouping sets scope.

This concludes the first post on Analytic Functions in Oracle.