Wednesday, 19 April 2017

Analytic Functions in Oracle - IX

In this post, we will take a look at MODEL keyword. The MODEL or SPREADSHEET keyword expands the functionality of SQL in that it gives the user the ability to treat the result set of a query as a multidimensional array and also apply calculations or formulas on the multidimensional array. These calculations or formulas that can be applied are quite flexible and can be applied, for example, on a column to be based on values in other columns.

We use Oracle 12c as the basis for all our queries. For this post, we will create a new view called sales_model that we will use for our queries related to this post. Sales_model view is based on tables in sample SH schema. The statement for creation of sales_model view is given below:

create or replace view sales_model 
as
select distinct extract(year from c.time_id) sale_year, a.prod_name product, b.channel_desc channel, sum(c.amount_sold) over (partition by extract (year from c.time_id), a.prod_name, b.channel_desc) amount from sh.products a, sh.channels b, sh.sales c
where a.prod_id = c.prod_id and b.channel_id = c.channel_id and prod_name like '%6%MB Memory Card' and  extract(year from c.time_id) in (1999,2000) order by a.prod_name;


returns

view SALES_MODEL created.

Below query will give an idea of  the data in the result that we will play with in the various queries using MODEL:
 
select * from sales_model;
















There are twelve rows having sale amount about two products over two years via different channels. 

Let us take up the first query on SALES_MODEL. The query is shown below:

select sale_year, product,channel, amount, new_amount
from sales_model
spreadsheet
partition by (sale_year, product)
dimension by (channel,amount)
measures (amount new_amount) ignore nav
rules
(new_amount['Internet',any]  = 100)
order by sale_year, product;


The results returned are shown below: 














To understand the results of the above query, we need to understand the different parts of the query:

a) SPREADSHEET keyword is used instead of MODEL.

b)  The query returns five columns: four from the original view and the fifth is calculated as an alias of amount.

c) The result set is partitioned by sale_year and product. The rules that will apply later will be only on the partitioned result set defined by the partition.

d) The columns mentioned in the dimension clause, channel and amount, are the ones used by the rules.

e) The measures clause contains the columns that will be used for the calculation. Here, it is column amount aliased to new_amount.

f) Rules use the columns in the dimension clause to act on the columns in measures clause for calculations defined in the rule. The rule here means that the new_amount column is an alias of amount column with the rule that if the channel column has value 'Internet' and for any value in amount column, the aliased amount column should have a default value of 100. So, rows 5, 8 and 12 contain 100 in new_amount column.

The second example has two rules as shown below:

select sale_year, product,channel, amount, new_amount
from sales_model
spreadsheet
partition by (sale_year, product)
dimension by (channel,amount)
measures (amount new_amount) ignore nav
rules
(new_amount['Internet',any]  = 100,
new_amount['Partners',any]  = 50
)
order by sale_year, product;


The results returned are shown below:














The above query just extends the rule to include 'Partners' also in channel column but, should have a value of 50 and the result is evident in the result set.

The next example shows how we can use calculations on existing column value by using cv() function. "cv" stands for current value.

select sale_year, product,channel, amount, new_amount
from sales_model where sale_year = 2000
spreadsheet
partition by (sale_year, product)
dimension by (channel,amount)
measures (amount new_amount) ignore nav
rules
(new_amount['Internet',any]  = 2*new_amount['Internet',cv()]
)
order by sale_year, product;


The results returned are shown below:








The records returned are fewer as there is a filter clause. The rules are defined such that in the new_amount column, the current values get doubled only when the value in channel column is 'Internet'.

One can also use values in another row for replacement as shown in below query:

select sale_year, product,channel, new_amount
from sales_model where sale_year = 2000
spreadsheet
partition by (sale_year)
dimension by (product,channel)
measures (amount new_amount) ignore nav
rules
(new_amount['64MB Memory Card','Partners']  = new_amount['64MB Memory Card','Internet'])
order by sale_year, product;


The results returned are shown below:


  






 In the results above, when the product is '64MB Memory Card' and channel is 'Partners', the new_amount is set to the value when the product is '64 MP Memory Card' and channel is 'Internet'. This is as per the rule that is set in the query.

Like the result above, we can also use calculations to arrive at a value for new_amount as shown below:

select sale_year, product,channel, new_amount
from sales_model where sale_year = 2000
spreadsheet
partition by (sale_year)
dimension by (product,channel)
measures (amount new_amount) ignore nav
rules
(new_amount['64MB Memory Card','Partners']  = new_amount['64MB Memory Card','Internet'] + new_amount['256MB Memory Card','Internet'])
order by sale_year, product;


The results returned are shown below:









As per the rule, when the product is '64MB Memory Card' and channel is 'Partners' , new_amount value is set to sum of new_amount values when products are '64MB Memory Card' and '256MB Memory Card' and the channel is 'Internet'. 4348.69 is a sum of 2814.4 and 1534.29.

The next query creates new records based on the rules as shown below:

select sale_year, product,channel, new_amount
from sales_model
model
partition by (sale_year)
dimension by (product,channel)
measures (amount new_amount) ignore nav
rules
(new_amount['1024MB Memory Card','Partners']  = new_amount['64MB Memory Card','Partners'] + new_amount['256MB Memory Card','Partners'])
order by sale_year, product;


The results returned are shown below:
















In the rules, we have created a new product called '1024MB Memory Card' with a channel of  'Partners'. Two records are created corresponding to the two sales_year values and  new_amount value is set to sum of new_amount values of the same channel but corresponding to existing two products.

To return only the new rows, use return updated rows clause as shown below:

select sale_year, product,channel, new_amount
from sales_model
model return updated rows
partition by (sale_year)
dimension by (product,channel)
measures (amount new_amount) ignore nav
rules
(new_amount['1024MB Memory Card','Partners']  = new_amount['64MB Memory Card','Partners'] + new_amount['256MB Memory Card','Partners'])
order by sale_year, product;


The results returned are shown below:

 


The return updated rows clause after the model keyword does the trick.

In the below query, we create a record based on a different channel:

select sale_year, product,channel, new_amount
from sales_model where sale_year = 2000
model
partition by (sale_year)
dimension by (product,channel)
measures (amount new_amount) ignore nav
rules
(new_amount['64MB Memory Card','Road Show']  = new_amount['64MB Memory Card','Partners'] + new_amount['256MB Memory Card','Partners'])
order by sale_year, product;


The results returned are shown below:










The newly created record is seen in row 6 with new_amount value as a sum of the two products but having channel value, 'Partners'.

We can also use aggregate functions across rows to set value on records as shown below:

select sale_year, product,channel, new_amount
from sales_model where sale_year = 2000
model
partition by (sale_year)
dimension by (product,channel)
measures (amount new_amount) ignore nav
rules
(new_amount['64MB Memory Card','Road Show']  = 2*max(new_amount)['256MB Memory Card',channel in ('Internet','Direct Sales')])
order by sale_year, product;


The results returned are shown below:









 
The row 6 contains 208919.82 in new_amount that is twice the maximum value,104459.91, between the new_amount values of product, '256MB Memory Card' in two channels, 'Internet' and 'Direct Sales'. 

The last example shows use of the for clause as shown below:

select sale_year, product,channel, new_amount
from sales_model where sale_year = 2000
spreadsheet
partition by (sale_year)
dimension by (product,channel)
measures (amount new_amount) ignore nav
rules
(new_amount['64MB Memory Card',for channel in ('Internet','Partners','Direct Sales')]  = 3*new_amount['64MB Memory Card',cv()])
order by sale_year, product;


The results returned are shown below:









For clause has been used to set values in new_amount with three times the existing value only for product, '64MB Memory Card'.

This concludes the post on MODEL keyword.