Friday 17 February 2017

Analytic Functions in Oracle - II

The focus of this post with be OVER in Oracle. OVER in Oracle defines the scope of the analytic functions preceding it in the SQL query. As in earlier posts, we are using Oracle 12c for all the queries.

For starters, let us run the following query:

select count(*) from hr.employees where department_id < 40;

The results returned are shown below:  





Let us add the OVER keyword to this as shown below:

select count(*) over() from hr.employees where department_id < 40;

The results returned are shown below:  















The same result is now returned nine times. Since nothing is specified in the parenthesis following OVER, we see that OVER returns the results of COUNT nine times as there are nine rows that satisfy the filter clause without the COUNT and OVER clauses.

The same behaviour can be seen in the below query where we introduce a partition clause as shown below:

select department_id, count(*) over(partition by department_id) from hr.employees where department_id < 40;

The results returned are shown below: 













 A more meaningful query is shown below:

select distinct department_id, count(*) over(partition by department_id) from hr.employees where department_id < 40;

The results returned are shown below: 








But, the results of this query is the same as the one below without the OVER and the ordering of department_id.

select department_id, count(*) from hr.employees where department_id < 40 group by department_id;
The results returned are shown below:








Now that we have seen how OVER() and OVER() with a PARTITION clause works, let us see next variant that can be used to Moving calculations. For all the variants shown hereafter, the calculations are dependent on the order of the rows of the result. The first query calculates the moving average
of the salary in the previous record and salary in current record:

select employee_id, first_name, last_name, salary, avg(salary) over (order by employee_id rows between 1 preceding and current row) as moving_avg_of_sal from hr.employees where employee_id <= 105;

The results returned are shown below:









The moving average salary of the first row is the same as the salary of that row as there is no previous row. Others salaries are calculated according to the logic correctly.

The second query is similar to the first one with the only difference that the average is calculated based on three values: salary in previous row, salary in current row and salary in next row.

select employee_id, first_name, last_name, salary, avg(salary) over (order by employee_id rows between 1 preceding and 1 following) as moving_avg_of_sal from hr.employees where employee_id <= 105; 

The results returned are shown below:  


It is instructive to note that for the first and the last row in the result set, since there only two records, the averages of only two salaries are shown. 

The third query calculates the  moving average of salaries in all previous records.

select employee_id, first_name, last_name, salary, avg(salary) over (order by employee_id rows between unbounded preceding and current row) as moving_avg_of_sal from hr.employees where employee_id <= 105;

 The results returned are shown below: 







For the first record, there is no average and the same salary is shown. For latter records though, the average salaries are calculated based on the salary in current row and all previous rows.

The last variant of OVER shows salary outputs using analytic functions called FIRST_VALUE and LAST_VALUE. The first query calculates the maximum salary in each department using the analytic function FIRST_VALUE as shown below:

select department_id, last_name, salary, first_value(last_name) over (partition by department_id order by salary desc),first_value(salary) over (partition by department_id order by salary desc) as max_sal from (select * from hr.employees where department_id < 40)
  order by department_id, last_name, salary, max_sal;

 

The results returned are shown below:  








The results confirm the explanation above. Let us now try using the LAST_VALUE function to calculate the minimum salary in each department in the same way we use FIRST_VALUE.

select department_id, last_name, salary, last_value(last_name) over (partition by department_id order by salary desc),last_value(salary) over (partition by department_id order by salary desc) as max_sal from (select * from hr.employees where department_id < 40)
  order by department_id, last_name, salary, max_sal;
 


The results returned are shown below:  








The results belie our expectation. To correct this anomaly, let us the below query with a slight difference in the over clause:

select department_id, last_name, salary, last_value(last_name) over (partition by department_id order by salary desc rows between unbounded preceding and unbounded following), last_value(salary) over (partition by department_id order by salary desc rows between unbounded preceding and unbounded following) as lowest_sal from (select * from hr.employees where department_id < 40)
  order by department_id, last_name, salary, lowest_sal;


The results returned are shown below:  








The "between unbounded preceding and unbounded following" clause does the trick.

This concludes the post on OVER.