Wednesday, 12 April 2017

Analytic Functions in Oracle - VIII

We embark on the next topic in the series, Pattern Matching using MATCH_RECOGNIZE. This powerful feature was introduced in Oracle 12c. We will not be using the familiar HR schema for the queries to explore this topic. Some preparatory work is needed before we start running the queries related to Pattern Matching. We will create a table based on Dow Jones Industrial Average for the period 20th September 2016 to 27th October 2016. The data can be found at this link. The displayed data can be downloaded as a .csv file. To use this data for our queries, let us create a table using below statement:

create table hr.djia_table (trade_date date, open number, high number, low number, close number, volume number, adj_close number);

returns

table HR.DJIA_TABLE created.

Let us now proceed to add data to the newly created table as follows:

1) First, click on the table in SQL Developer as shown below. This will bring up the table details on the right.












2) Click on Action menu as shown below. This will bring up the Import Data ... menu item.






























3) Click on Import Data ... menu item. This will let you choose the file to be imported. Navigate to the directory where the .csv file is located and click Open button.

















4) Click OK button on the next dialog box that comes up.

















5) Then, click on Next button.

















6) Click Next button on the next dialog box.

















7) Click on Next button the next dialog box.

















8) On the next dialog box, enter YYYY-MM-DD in Format field as shown below.






























9) Click on each Source Data Column on left to make sure that the column on the right is correctly mapped. As an example, see that High column in the file is mapped to HIGH column in the table.

















10) Adj Close column in file is mapped to TRADE_DATE by default. Map Adj Close to ADJ_CLOSE as shown below and click on Next button.






























11) On the last dialog box, click on Verify button to verify the parameters.






























We get SUCCESS status for different tests carried out by SQL Developer.

12) Click on Finish button. We get below message.










13) Click OK button and then click on data tab to see the data loaded successfully.





















Before we explain Pattern Matching, let us look at a plot of trade date versus adjusted close in Excel. 





Pattern Matching will help in recognizing patterns in the graph shown above as a V shaped pattern, say, the y-axis values 18392.46094,18293.69922,18261.44922,18094.83008,18339.24023.

Now that we have the data in DJIA_TABLE, we can start firing queries on pattern matching on this table.

As a first example, let us run below query:

select * from hr.djia_table
    match_recognize
        (order by trade_date
        measures
                    strt.trade_date as start_date,
                    last(down.trade_date) as bottom_date,
                    last(up.trade_date) as end_date
        after match skip to last up
        pattern (strt down+ up+)
        define
                     down as down.adj_close < prev(down.adj_close),
                     up as up.adj_close > prev(up.adj_close) ) mr 

        order by mr.start_date;

The results returned are shown below: 
















There are 9 rows that are returned which means that there are 9 V patterns in the data set. Let us break down the query in parts to understand more details about the MATCH_RECOGNIZE clause.

The SELECT at the start of the query will specify the data set that will be used for pattern matching. In the above example, we use DJIA_TABLE. 

MATCH_RECOGNIZE starts with a  PARTITION BY, an optional clause, not used by us here. It is followed by ORDER BY clause. ORDER BY clause is mandatory and is useful for producing consistent results.

Pattern definition specified by the PATTERN keyword can use regular expression against a sequence of rows. For example, + symbol can be used to specify one or more iterations. In the above example, the pattern begins with a strt and is followed by one or more down and later by one or more up. These are used to define the V pattern like PATTERN (STRT DOWN+ UP+). The STRT is the start point, DOWN can be one or more as we do not know after many points, the bottom is reached and UP can be one or more as we do not know after how many points, the top is reached but at the end we have a V shaped pattern comprising a START, one or more DOWNs, and one or more UPs. 

DEFINE is used to specify the pattern of events and pattern variables identifying the individual events within the pattern. In the above example, DEFINE is used to specify details about the down and up events.

MEASURES will specify the measures that will be passed to the select statement such as source data points, pattern data points and aggregate related to a pattern. Here, we can use pattern navigation operations as FIRST, LAST, NEXT, and PREV. In above example, we are using LAST keyword to specify the bottom and end of the V pattern. Two special measures are available called CLASSIFIER() and MATCH_NUMBER(). CLASSIFIER() classifies each row as variables mentioned in PATTERN clause or DEFINE clause. MATCH_NUMBER() tells to which pattern does that row belong starting from 1. We will use these later.

Lastly, we need to specify the output that will be passed to the select statement like summary level or detail level and also where the next search should start once a pattern match has been found.

For the level of information, there are three levels: ONE ROW PER MATCH, ALL ROWS PER MATCH, and ALL ROWS PER MATCH WITH UNMATCHED ROWS. ONE ROW PER MATCH is the default and provides more like a summary. Here, we are using ONE ROW PER MATCH. Since we have not specified, it is used by default. 

To indicate where the next start should start after a pattern match, there are SKIP PAST LAST ROW, SKIP TO NEXT ROW, SKIP TO <VARIABLE>, SKIP TO FIRST (<VARIABLE>), and SKIP TO LAST (<VARIABLE>). In above example, we are using AFTER MATCH SKIP TO LAST UP.

In the second query, we find the number of peaks in the data set. The portions of the SQL that have been modified is updated in bold font. The changed query is shown below:  

select * from hr.djia_table
    match_recognize
        (order by trade_date
        measures
                    strt.trade_date as start_date,
                    last(up.trade_date) as up_date,
                    last(down.trade_date) as end_date

        after match skip to last up
        pattern (strt up+ down+)
        define
                     down as down.adj_close < prev(down.adj_close),
                     up as up.adj_close > prev(up.adj_close) ) mr
        order by mr.start_date;


The results returned are shown below:


















In this case, the query returns 10 rows as there are indeed 10 peaks.

In the next query, we aim for a more detailed query that will give information about the patterns too.

select * from hr.djia_table
    match_recognize
        (order by trade_date
        measures
                    strt.trade_date as start_date,
                    final last(up.trade_date) as up_date,
                    final last(down.trade_date) as end_date,
                    match_number() as match_num,
                    classifier() AS var_match
        all rows per match
        after match skip to last up
        pattern (strt up+ down+)
        define
                     down as down.adj_close < prev(down.adj_close),
                     up as up.adj_close > prev(up.adj_close) ) mr
        order by trade_date;



The results returned are shown below:
















We see that though the source has only 28 records, 37 records are returned.This is because records are repeated where last row in a pattern is also the start row in the next pattern.

The changes from the last script are: 

1) The FINAL keyword used under MEASURES. This makes sure that the last value in up_date or end_date is constant and the final value is shown. Else, we see running values as null that is replaced as the pattern progresses.

2) ALL ROWS PER MATCH gives detailed results and covers all the records in the source data set corresponding to the pattern.

3) MATCH_NUMBER returns the pattern number. In above example, there are ten matches so the MATCH_NUMBER increments accordingly. CLASSIFIER classifies the row according to the pattern variable mentioned in PATTERN clause.

This concludes the post on PATTERN MATCHING in Oracle 12c.