The fifth post in this series of Analytic Functions in Oracle takes a look at Regular Expressions as is dealt by Oracle. Regular Expressions are a powerful feature in all languages and Oracle adding it to its collection of features should not come as a surprise
A Regular Expression is a special text used as a search pattern. In other words, a Regular Expression describes a pattern that can be used for searching text in strings. They are specified as a combination of Metacharacters and Literals. These are described below:
1) Metacharacters: These are operators like *,?,[], etc, that are used to create the algorithm to be used in the search process
2) Literals: These are the actual characters or content that are searched for
Oracle has the following functions to support Regular Expressions: REGEXP_LIKE, REGEXP_REPLACE, REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_COUNT. We will see examples of these in a series of queries
We will use 12c sample schema for demonstrating the usage of these functions. In the first query, we will use REGEXP_LIKE as seen in the example below:
Use of '.':
select first_name, last_name from hr.employees where regexp_like (last_name, 'A....n');
The results returned are shown below:
The '.' metacharacter matches any character. So, since there are 4 of those, the query matches any word that starts with A and ends with n and has 4 characters in between.
Use of '+':
select first_name, last_name from hr.employees where regexp_like (last_name, '(rr)+');
The results returned are shown below:
'+' matches one or more of the preceding subexpression.
Use of '?':
select first_name, last_name from hr.employees where regexp_like (last_name, '(at?a)');
The results returned are shown below:
'?' matches zero or one of the preceding subexpression. De Haan is selected because there is no t in between in aa. Pataballa is selected because there is one occurrence of t between aa.
Use of '*':
select first_name, last_name from hr.employees where regexp_like (last_name, 'al*a');
The results returned are shown below:
'*' matches zero or more of the preceding subexpression.
Use of '^':
select last_name from hr.employees where regexp_like (last_name, '^K(*)');
The results returned are shown below:
This query returns all employees starting with letter, K.
select first_name, last_name from hr.employees where regexp_like (last_name, '^Ma');
The results returned are shown below:
This query returns all employees starting with letters, Ma.
Use of '{}':
select first_name, last_name from hr.employees where regexp_like (last_name, 'l{2}');
The results returned are shown below:
This query returns last_name containing l character two times.
select first_name, last_name from hr.employees where regexp_like (last_name, 'a{2,}');
The results returned are shown below:
This query returns all employees having last_names having a two times or more.
select first_name, last_name from hr.employees where regexp_like (last_name, 'h{1,2}');
The results returned are shown below:
The last query returns all employees having last_names having at least one h but not more than two h.
Use of '[]':
select first_name, last_name from hr.employees where regexp_like (last_name, '[zx]');
The results returned are shown below:
The last query returns first_name, last_name from all employees having last_name containing x, or z.
Use of '/s':
select first_name, last_name from hr.employees where regexp_like (last_name, '(\s)');
The results returned are shown below:
The last query returns first_name, last_name from all employees having last_name containing a space.
Use of '\1':
select last_name from hr.employees where regexp_like (last_name, '([aeiou])\1', 'i');
The results returned are shown below:
The last query returns last_name containing two consecutive characters mentioned in []. i means the matching is case insensitive.
Use of '$':
select last_name from hr.employees where regexp_like (last_name, 'z$', 'i');
The results returned are shown below:
The last query returns last_name ending with z.
Use of '^$':
select last_name from hr.employees where regexp_like (last_name, '^K[a-z]+r$');
The results returned are shown below:
The last query returns last_name starting with K and ending in r.
Use of '[:blank:]':
select last_name from hr.employees where regexp_like (last_name, '[[:blank:]]');
The results returned are shown below:
The last query returns last_name from all employees having last_name containing a blank.
Use of '[:digit:]':
select last_name,salary from hr.employees where regexp_like (salary, '[[:digit:]]{5}');
The results returned are shown below:
The last query returns last_name, salary of all employees having currency in 5 digits.
Use of '[[=n=]]':
select regexp_substr('Danke schön', '[[=o=]]') from dual;
The results returned are shown below:
The last query uses REGEX_SUBSTR to extract any o in the input string.
select regexp_substr('El Niño', '[[=n=]]',1,1), regexp_substr('El Niño', '[[=n=]]',1,2) from dual;
The results returned are shown below:
The last query uses REGEX_SUBSTR to extract any n in the input string. In the first expression, the search starts at first position and locates the first instance of n and the second expression locates the second instance of n.
Use of '\w':
select regexp_substr('A@6', '\w',1,2) from dual;
The results returned are shown below:
The last query returns the first word character starting from the first position and is a second instance.
Use of '\W':
select regexp_substr('test@test', '\W') from dual;
The results returned are shown below:
This query returns first non-word character in the input expression.
Use of REGEXP_REPLACE:
select last_name, phone_number, regexp_replace(phone_number, '([/.])', ' ') as modified_phone_number from hr.employees where regexp_like (last_name, '^K(*)') ;
The results returned are shown below:
REGEXP_REPLACE function replaces '.' character in phone_number with a space character.
Use of REGEXP_INSTR:
select regexp_instr('Copyright © 2005, 2012 Oracle.All Rights Reserved.', '[^[:alpha:]]') from dual;
The results returned are shown below:
REGEXP_INSTR returns the position of the search criteria, a non alphanumeric character, in the input string.
Use of REGEXP_COUNT:
select last_name, regexp_count(last_name, '[A-z]'), phone_number, regexp_count(phone_number, '[0-9]{3}') from hr.employees where regexp_like (last_name, '^K(*)');
The results returned are shown below:
REGEXP_COUNT gives the count based on inputs. In the first expression, REGEXP_COUNT returns the number of alpha characters in the last name and in the second expression, REGEXP_COUNT returns the number of three continuous numbers.
This concludes the post on Regular Expressions in Oracle.
A Regular Expression is a special text used as a search pattern. In other words, a Regular Expression describes a pattern that can be used for searching text in strings. They are specified as a combination of Metacharacters and Literals. These are described below:
1) Metacharacters: These are operators like *,?,[], etc, that are used to create the algorithm to be used in the search process
2) Literals: These are the actual characters or content that are searched for
Oracle has the following functions to support Regular Expressions: REGEXP_LIKE, REGEXP_REPLACE, REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_COUNT. We will see examples of these in a series of queries
We will use 12c sample schema for demonstrating the usage of these functions. In the first query, we will use REGEXP_LIKE as seen in the example below:
Use of '.':
select first_name, last_name from hr.employees where regexp_like (last_name, 'A....n');
The results returned are shown below:
The '.' metacharacter matches any character. So, since there are 4 of those, the query matches any word that starts with A and ends with n and has 4 characters in between.
Use of '+':
select first_name, last_name from hr.employees where regexp_like (last_name, '(rr)+');
The results returned are shown below:
'+' matches one or more of the preceding subexpression.
Use of '?':
select first_name, last_name from hr.employees where regexp_like (last_name, '(at?a)');
The results returned are shown below:
'?' matches zero or one of the preceding subexpression. De Haan is selected because there is no t in between in aa. Pataballa is selected because there is one occurrence of t between aa.
Use of '*':
select first_name, last_name from hr.employees where regexp_like (last_name, 'al*a');
The results returned are shown below:
'*' matches zero or more of the preceding subexpression.
Use of '^':
select last_name from hr.employees where regexp_like (last_name, '^K(*)');
The results returned are shown below:
This query returns all employees starting with letter, K.
select first_name, last_name from hr.employees where regexp_like (last_name, '^Ma');
The results returned are shown below:
This query returns all employees starting with letters, Ma.
Use of '{}':
select first_name, last_name from hr.employees where regexp_like (last_name, 'l{2}');
The results returned are shown below:
This query returns last_name containing l character two times.
select first_name, last_name from hr.employees where regexp_like (last_name, 'a{2,}');
The results returned are shown below:
This query returns all employees having last_names having a two times or more.
select first_name, last_name from hr.employees where regexp_like (last_name, 'h{1,2}');
The results returned are shown below:
The last query returns all employees having last_names having at least one h but not more than two h.
Use of '[]':
select first_name, last_name from hr.employees where regexp_like (last_name, '[zx]');
The results returned are shown below:
The last query returns first_name, last_name from all employees having last_name containing x, or z.
Use of '/s':
select first_name, last_name from hr.employees where regexp_like (last_name, '(\s)');
The results returned are shown below:
The last query returns first_name, last_name from all employees having last_name containing a space.
Use of '\1':
select last_name from hr.employees where regexp_like (last_name, '([aeiou])\1', 'i');
The results returned are shown below:
The last query returns last_name containing two consecutive characters mentioned in []. i means the matching is case insensitive.
Use of '$':
select last_name from hr.employees where regexp_like (last_name, 'z$', 'i');
The results returned are shown below:
The last query returns last_name ending with z.
Use of '^$':
select last_name from hr.employees where regexp_like (last_name, '^K[a-z]+r$');
The results returned are shown below:
The last query returns last_name starting with K and ending in r.
Use of '[:blank:]':
select last_name from hr.employees where regexp_like (last_name, '[[:blank:]]');
The results returned are shown below:
The last query returns last_name from all employees having last_name containing a blank.
Use of '[:digit:]':
select last_name,salary from hr.employees where regexp_like (salary, '[[:digit:]]{5}');
The results returned are shown below:
The last query returns last_name, salary of all employees having currency in 5 digits.
Use of '[[=n=]]':
select regexp_substr('Danke schön', '[[=o=]]') from dual;
The results returned are shown below:
The last query uses REGEX_SUBSTR to extract any o in the input string.
select regexp_substr('El Niño', '[[=n=]]',1,1), regexp_substr('El Niño', '[[=n=]]',1,2) from dual;
The results returned are shown below:
The last query uses REGEX_SUBSTR to extract any n in the input string. In the first expression, the search starts at first position and locates the first instance of n and the second expression locates the second instance of n.
Use of '\w':
select regexp_substr('A@6', '\w',1,2) from dual;
The results returned are shown below:
The last query returns the first word character starting from the first position and is a second instance.
Use of '\W':
select regexp_substr('test@test', '\W') from dual;
The results returned are shown below:
This query returns first non-word character in the input expression.
Use of REGEXP_REPLACE:
select last_name, phone_number, regexp_replace(phone_number, '([/.])', ' ') as modified_phone_number from hr.employees where regexp_like (last_name, '^K(*)') ;
The results returned are shown below:
REGEXP_REPLACE function replaces '.' character in phone_number with a space character.
Use of REGEXP_INSTR:
select regexp_instr('Copyright © 2005, 2012 Oracle.All Rights Reserved.', '[^[:alpha:]]') from dual;
The results returned are shown below:
REGEXP_INSTR returns the position of the search criteria, a non alphanumeric character, in the input string.
Use of REGEXP_COUNT:
select last_name, regexp_count(last_name, '[A-z]'), phone_number, regexp_count(phone_number, '[0-9]{3}') from hr.employees where regexp_like (last_name, '^K(*)');
The results returned are shown below:
REGEXP_COUNT gives the count based on inputs. In the first expression, REGEXP_COUNT returns the number of alpha characters in the last name and in the second expression, REGEXP_COUNT returns the number of three continuous numbers.
This concludes the post on Regular Expressions in Oracle.