Monday 13 March 2017

Analytic Functions in Oracle - V

The fifth post in this series of Analytic Functions in Oracle takes a look at Regular Expressions as is dealt by Oracle. Regular Expressions are a powerful feature in all languages and Oracle adding it to its collection of features should not come as a surprise

A Regular Expression is a special text used as a search pattern. In other words, a Regular Expression describes a pattern that can be used for searching text in strings. They are specified as a combination of Metacharacters and Literals. These are described below:

1) Metacharacters: These are operators like *,?,[], etc, that are used to create the algorithm to be used in the search process

2) Literals: These are the actual characters or content that are searched for

Oracle has the following functions to support Regular Expressions: REGEXP_LIKE, REGEXP_REPLACE, REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_COUNT. We will see examples of these in a series of queries

We will use 12c sample schema for demonstrating the usage of these functions. In the first query, we will use REGEXP_LIKE as seen in the example below:

Use of '.':

select first_name, last_name from hr.employees where regexp_like (last_name, 'A....n');

The results returned are shown below: 






The '.' metacharacter matches any character. So, since there are 4 of those, the query matches any word that starts with A and ends with n and has 4 characters in between.

Use of '+':

select first_name, last_name from hr.employees where regexp_like (last_name, '(rr)+');

The results returned are shown below: 








'+' matches one or more of the preceding subexpression. 

Use of '?':

select first_name, last_name from hr.employees where regexp_like (last_name, '(at?a)');

The results returned are shown below: 







'?' matches zero or one of the preceding subexpression. De Haan is selected because there is no t in between in aa. Pataballa is selected because there is one occurrence of t between aa.

Use of '*':

select first_name, last_name from hr.employees where regexp_like (last_name, 'al*a');


The results returned are shown below: 
 





'*' matches zero or more of the preceding subexpression.

Use of '^':

select last_name from hr.employees  where regexp_like (last_name, '^K(*)');

The results returned are shown below:











This query returns all employees starting with letter, K.

select first_name, last_name from hr.employees where regexp_like (last_name, '^Ma');

The results returned are shown below:











This query returns all employees starting with letters, Ma.

Use of '{}':

select first_name, last_name from hr.employees where regexp_like (last_name, 'l{2}');

The results returned are shown below:


















This query returns last_name containing l character two times.

select first_name, last_name from hr.employees where regexp_like (last_name, 'a{2,}');

The results returned are shown below:





This query returns all employees having last_names having a two times or more.

select first_name, last_name from hr.employees where regexp_like (last_name, 'h{1,2}');

The results returned are shown below:



















The last query returns all employees having last_names having at least one h but not more than two h.

Use of '[]':

select first_name, last_name from hr.employees where regexp_like (last_name, '[zx]');

The results returned are shown below:










The last query returns first_name, last_name from all employees having last_name containing x, or z.

Use of '/s':

select first_name, last_name from hr.employees where regexp_like (last_name, '(\s)');

The results returned are shown below: 






The last query returns first_name, last_name from all employees having last_name containing a space.

Use of '\1':

select last_name from hr.employees where regexp_like (last_name, '([aeiou])\1', 'i');

The results returned are shown below: 













The last query returns last_name containing two consecutive characters mentioned in []. i means the matching is case insensitive.

Use of '$':

select last_name from hr.employees where regexp_like (last_name, 'z$', 'i');

The results returned are shown below: 








The last query returns last_name ending with z.

Use of '^$':

select last_name from hr.employees where regexp_like (last_name, '^K[a-z]+r$');

The results returned are shown below:






The last query returns last_name starting with K and ending in r.

Use of '[:blank:]':

select last_name from hr.employees where regexp_like (last_name, '[[:blank:]]');

 The results returned are shown below:





The last query returns last_name from all employees having last_name containing a blank.

Use of '[:digit:]':

select last_name,salary from hr.employees where regexp_like (salary, '[[:digit:]]{5}');

The results returned are shown below: 




















The last query returns last_name, salary of all employees having currency in 5 digits.

Use of '[[=n=]]':

select regexp_substr('Danke schön', '[[=o=]]') from dual;

The results returned are shown below: 







The last query uses REGEX_SUBSTR to extract any o in the input string.

select regexp_substr('El Niño', '[[=n=]]',1,1), regexp_substr('El Niño', '[[=n=]]',1,2) from dual;

The results returned are shown below: 






The last query uses REGEX_SUBSTR to extract any n in the input string. In the first expression, the search starts at first position and locates the first instance of  n and the second expression locates the second instance of n.

Use of '\w':

select regexp_substr('A@6', '\w',1,2) from dual;

The results returned are shown below: 







The last query returns the first word character starting from the first position and is a second instance.

Use of '\W':

select regexp_substr('test@test', '\W') from dual;

The results returned are shown below:







This query returns first non-word character in the input expression. 

Use of REGEXP_REPLACE:

select last_name, phone_number, regexp_replace(phone_number, '([/.])', ' ') as modified_phone_number from  hr.employees where regexp_like (last_name, '^K(*)') ;

The results returned are shown below: 









REGEXP_REPLACE function replaces '.' character in phone_number with a space character.

Use of REGEXP_INSTR:

select regexp_instr('Copyright ©  2005, 2012 Oracle.All Rights Reserved.', '[^[:alpha:]]') from dual;

The results returned are shown below: 






REGEXP_INSTR returns the position of the search criteria, a non alphanumeric character, in the input string.

Use of REGEXP_COUNT:

select last_name, regexp_count(last_name, '[A-z]'), phone_number,  regexp_count(phone_number, '[0-9]{3}')  from hr.employees where regexp_like (last_name, '^K(*)');

The results returned are shown below: 








REGEXP_COUNT gives the count based on inputs. In the first expression, REGEXP_COUNT returns the number of alpha characters in the last name and in the second expression, REGEXP_COUNT returns the number of three continuous numbers.

This concludes the post on Regular Expressions in Oracle.