
Oracle 10g Regular Expressions

Oracle Database 10g added support for regular expressions, which can be used in both SQL and PL/SQL statements.

What are Regular Expressions?


Regular expressions specify patterns to search for in string data using standardized syntax conventions. A regular expression can specify complex patterns of character sequences. It is built from two types of characters:

Metacharacters: operators that specify the algorithm for performing the search.
Literals: the actual characters to search for.
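The following minimal sketch shows the split between literals and metacharacters. It uses Python's `re` module only as an illustration. Python regular expressions are not POSIX ERE, but these particular characters behave the same way. The helper name `contains_pattern` is hypothetical. In the pattern `a(b|c)d`, the letters are literals, while the parentheses and `|` are metacharacters:

```python
import re

# Literals: 'a', 'b', 'c' and 'd' must appear exactly as written.
# Metacharacters: '(' and ')' group the alternation, '|' means "either side".
PATTERN = re.compile(r"a(b|c)d")


def contains_pattern(text: str) -> bool:
    """Return True if the pattern occurs anywhere in text (the REGEXP_LIKE idea)."""
    return PATTERN.search(text) is not None


for word in ["abd", "acd", "ad", "aed", "xxacdyy"]:
    print(f"{word!r}: {contains_pattern(word)}")
```

Both `abd` and `acd` match. `ad` and `aed` do not, because exactly one of `b` or `c` must sit between `a` and `d`.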

Oracle Database implements regular expression support compliant with the POSIX Extended Regular Expression (ERE) specification.

There are four regular expression functions: REGEXP_LIKE, REGEXP_SUBSTR, REGEXP_INSTR and REGEXP_REPLACE. REGEXP_SUBSTR, REGEXP_INSTR and REGEXP_REPLACE extend the well-known string functions SUBSTR, INSTR and REPLACE. REGEXP_LIKE is a condition, used for example in WHERE clauses and CHECK constraints, and is similar to the existing LIKE operator. A regular expression combines one or more literals with metacharacters to describe the pattern to match. Oracle regular expression functions support the POSIX (Portable Operating System Interface) standard character classes, such as [:alpha:] and [:digit:]. Oracle 11g added a fifth function, REGEXP_COUNT, which appears in some of the examples below.

Matching Modes
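i   case-insensitive matching
c   case-sensitive matching
n   the dot (.) matches any character, including the newline character
m   the caret (^) and dollar ($) match at the start and end of each line, not only of the whole string
x   ignores whitespace characters in the pattern (Oracle 11g and later)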
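To make the modes concrete, this sketch translates an Oracle-style match_parameter string into Python `re` flags. The mapping is an approximation, and the helper name `oracle_flags` is hypothetical. It covers only 'i', 'c', 'n' and 'm'. It follows the rule that the last of two contradictory values ('i' and 'c') wins. It also assumes case-sensitive matching by default, whereas Oracle takes the default from NLS_SORT.

```python
import re


def oracle_flags(match_parameter: str = "") -> int:
    """Translate an Oracle-style match_parameter string into Python re flags.

    Sketch only: covers 'i', 'c', 'n' and 'm'. When 'i' and 'c' are both
    given, the last one wins. Case-sensitive matching is assumed as the
    default (in Oracle the default actually comes from NLS_SORT).
    """
    ignore_case = False
    flags = 0
    for ch in match_parameter:
        if ch == "i":
            ignore_case = True
        elif ch == "c":
            ignore_case = False
        elif ch == "n":
            flags |= re.DOTALL      # '.' also matches a newline
        elif ch == "m":
            flags |= re.MULTILINE   # '^' and '$' match at every line boundary
        else:
            raise ValueError(f"unsupported match parameter: {ch!r}")
    if ignore_case:
        flags |= re.IGNORECASE
    return flags


text = "first line\nsecond line"
print(re.findall(r"^\w+", text))                      # word at the start of the string only
print(re.findall(r"^\w+", text, oracle_flags("m")))   # word at the start of each line
```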

Symbol      Description
*           Matches zero or more occurrences
|           Alternation operator for specifying alternative matches
^/$         Matches the start and the end of the string (of each line in 'm' mode)
[]          Bracket expression for a matching list; matches any one of the expressions represented in the list
[^exp]      If the caret is inside the bracket, it negates the expression
{m}         Matches exactly m times
{m,n}       Matches at least m times but no more than n times
[:class:]   Specifies a character class and matches any character in that class
\           Can have four different meanings: (1) stand for itself; (2) quote the next character; (3) introduce an operator; (4) do nothing
+           Matches one or more occurrences
?           Matches zero or one occurrence
.           Matches any character in the supported character set (except NULL)
()          Grouping expression (treated as a single subexpression)
\n          Backreference expression (matches the text captured by the nth subexpression)
[==]        Specifies equivalence classes
[..]        Specifies one collation element (such as a multicharacter element)
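The quantifiers, alternation, negated brackets and backreferences in the table can be tried out quickly. This sketch uses Python's `re` module as a stand-in. For these simple inputs the constructs behave like their POSIX ERE counterparts, but Python is not a POSIX engine, so edge cases may differ from Oracle. `re.fullmatch` requires the whole text to match, as if the pattern were wrapped in ^...$. `EXAMPLES` and `whole_match` are hypothetical names.

```python
import re

# (pattern, text) pairs exercising metacharacters from the table.
# re.fullmatch requires the whole text to match, like wrapping the pattern in ^...$.
EXAMPLES = [
    (r"ab*c", "ac"),        # *     zero or more 'b'
    (r"ab+c", "ac"),        # +     one or more 'b', so "ac" fails
    (r"colou?r", "color"),  # ?     zero or one 'u'
    (r"cat|dog", "dog"),    # |     alternation
    (r"[^0-9]+", "abc"),    # [^ ]  any character except a digit
    (r"a{2}", "aaa"),       # {m}   exactly two 'a', so "aaa" fails
    (r"a{2,3}", "aaa"),     # {m,n} two or three 'a'
    (r"(ab)\1", "abab"),    # ( ) and \1: repeat the text captured by group 1
]


def whole_match(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None


for pattern, text in EXAMPLES:
    print(f"{pattern:10} {text:6} -> {whole_match(pattern, text)}")
```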

==== Example (user OE) ====


SELECT UNIQUE REGEXP_REPLACE (catalog_url, 'http://([^/]+).*', '\1') FROM oe.product_information ;

Here is an explanation of how the string is processed:
http://   The expression starts by looking for this string literal; there are no special metacharacters here.
([^/]+)   Next, it looks for one or more characters that are not a slash (/) and captures them as subexpression 1.
.*        The expression finishes by consuming the rest of the string.
\1        The whole matched text is replaced with backreference 1, which is whatever was matched between the first set of parentheses (here, the host name).
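The same replacement can be reproduced with Python's `re.sub`, which, like REGEXP_REPLACE, returns the input unchanged when the pattern does not match. This is a sketch for illustration. The helper `host_of` and the sample URLs are hypothetical and are not rows from OE.PRODUCT_INFORMATION. A Python set stands in for SELECT UNIQUE.

```python
import re


def host_of(catalog_url: str) -> str:
    r"""Mimic REGEXP_REPLACE(catalog_url, 'http://([^/]+).*', '\1')."""
    return re.sub(r"http://([^/]+).*", r"\1", catalog_url)


# Hypothetical sample values.
urls = [
    "http://www.example.com/cat/hw/p1797.html",
    "http://www.example.com/",
    "ftp://files.example.com/x",   # no match, so the value comes back unchanged
]
print(sorted({host_of(u) for u in urls}))   # the set plays the role of UNIQUE
```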

==== Example of REGEXP_SUBSTR ====


1. SELECT regexp_substr(to_char(translated_name), '^[a-z]+') FROM oe.product_descriptions WHERE language_id = 'PT' AND translated_name like 'G%' ;

Note that no data is displayed. The ^ is outside the brackets, so it anchors the match at the beginning of the string: the pattern looks for one or more characters from a to z at the start of each value. Every selected translated_name begins with an uppercase G (the query filters on 'G%'), and the search is case-sensitive by default. As a result nothing matches and REGEXP_SUBSTR returns NULL.
==========
Perform the same query, but this time switch on case-insensitive matching with 'i'. Execute the following script:
2. SELECT regexp_substr(to_char(translated_name), '^[a-z]+', 1, 1, 'i') FROM oe.product_descriptions WHERE language_id = 'PT' AND translated_name like 'G%' ;

================= =======

The results are still incomplete because the returned strings are cut off as soon as a non-English character, such as an accented letter, is encountered. The range [a-z] is interpreted according to the session's linguistic settings (NLS_SORT, which defaults from NLS_LANGUAGE). You therefore need to set the NLS_LANGUAGE parameter appropriately to return the complete results. Execute the following query:
3. ALTER SESSION SET NLS_LANGUAGE=PORTUGUESE;
SELECT regexp_substr(to_char(translated_name), '^[a-z]+', 1, 1, 'i') FROM oe.product_descriptions WHERE language_id = 'PT' AND translated_name like 'G%' ;

==========
The final step is to view the results in both English and Portuguese to ensure that the translation has taken place. Execute the following script:
4. SELECT REGEXP_SUBSTR(i.product_name, '^[a-z]+', 1, 1, 'i') || ' = ' || regexp_substr(to_char(d.translated_name), '^[a-z]+', 1, 1, 'i') FROM oe.product_descriptions d, oe.product_information i WHERE d.language_id = 'PT' AND d.translated_name like 'G%' AND i.product_id = d.product_id ;
ALTER SESSION SET NLS_LANGUAGE=AMERICAN;
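The same truncation can be reproduced outside the database. In this Python sketch, Python's `re` stands in for Oracle's engine, and the product names are hypothetical rather than taken from the OE schema. Here [a-z] is a plain ASCII range. Without case-insensitivity it cannot match the leading G. With it, the match still stops at the first accented letter. A letter class such as Python's `[^\W\d_]`, the counterpart of POSIX [[:alpha:]], keeps the whole word. Using [[:alpha:]] in the Oracle query might avoid depending on NLS_LANGUAGE, but that is an assumption worth testing against your database character set.

```python
import re

# Hypothetical Portuguese product names, not rows from the OE schema.
NAMES = ["Gaveta", "Gr\u00e1fico", "Grampeador"]   # "\u00e1" is the letter a-acute


def leading_run(name: str, pattern: str, flags: int = 0):
    """Return the run matched at the start of name, or None (Oracle would return NULL)."""
    m = re.match(pattern, name, flags)
    return m.group() if m else None


for name in NAMES:
    print(
        name,
        leading_run(name, r"[a-z]+"),                 # case-sensitive: 'G' never matches
        leading_run(name, r"[a-z]+", re.IGNORECASE),  # stops at the first non-ASCII letter
        leading_run(name, r"[^\W\d_]+"),              # any Unicode letter, like [[:alpha:]]
    )
```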

(Example by Oracle.)

==== Details ====

CREATE TABLE test (testcol VARCHAR2(50));
INSERT INTO test VALUES ('abcde');
INSERT INTO test VALUES ('12345');
INSERT INTO test VALUES ('1a4A5');
INSERT INTO test VALUES ('12a45');
INSERT INTO test VALUES ('12aBC');
INSERT INTO test VALUES ('12abc');
INSERT INTO test VALUES ('12ab5');
INSERT INTO test VALUES ('12aa5');
INSERT INTO test VALUES ('12AB5');
INSERT INTO test VALUES ('ABCDE');
INSERT INTO test VALUES ('123-5');
INSERT INTO test VALUES ('12.45');
INSERT INTO test VALUES ('1a4b5');
INSERT INTO test VALUES ('1 3 5');
INSERT INTO test VALUES ('1 45');
INSERT INTO test VALUES ('1 5');
INSERT INTO test VALUES ('a b c d');
INSERT INTO test VALUES ('a b c d e');
INSERT INTO test VALUES ('a e');
INSERT INTO test VALUES ('Steven');
INSERT INTO test VALUES ('Stephen');
INSERT INTO test VALUES ('111.222.3333');
INSERT INTO test VALUES ('222.333.4444');
INSERT INTO test VALUES ('333.444.5555');
INSERT INTO test VALUES ('abcdefabcdefabcxyz');
COMMIT;

==== Count occurrences based on a regular expression (REGEXP_COUNT, Oracle 11g and later) ====

SELECT REGEXP_COUNT(testcol, '2a', 1, 'i') RESULT FROM test;
SELECT REGEXP_COUNT(testcol, 'e', 1, 'i') RESULT FROM test;
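REGEXP_COUNT(source, pattern, position, match_parameter) counts the non-overlapping matches found from position onward. The sketch below models that behaviour in Python. `regexp_count` is a hypothetical helper, not an Oracle API, and it handles only the 'i' match parameter. It is applied to a few of the TEST rows.

```python
import re


def regexp_count(source: str, pattern: str, position: int = 1, match_parameter: str = "") -> int:
    """Sketch of REGEXP_COUNT: count non-overlapping matches from a 1-based position.

    Only the 'i' (case-insensitive) match parameter is handled here.
    """
    if position < 1:
        raise ValueError("position must be a positive integer")
    flags = re.IGNORECASE if "i" in match_parameter else 0
    regex = re.compile(pattern, flags)
    return sum(1 for _ in regex.finditer(source, position - 1))


for value in ["12aa5", "12AB5", "Stephen", "abcdefabcdefabcxyz"]:
    print(value, regexp_count(value, "2a", 1, "i"), regexp_count(value, "e", 1, "i"))
```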

==== REGEXP_INSTR ====
Syntax: REGEXP_INSTR(source_string, pattern [, position [, occurrence [, return_option [, match_parameter [, subexpr ]]]]])
REGEXP_INSTR returns the position of the matched substring. If there is no match, the function returns 0.
source_string: the value to search.
pattern: a valid regular expression.
position: a positive integer n indicating the character at which the search begins (default 1).
occurrence: a positive integer n indicating which occurrence of pattern in source_string to search for (default 1).
return_option: 0 (the default) returns the position of the first character of the occurrence; 1 returns the position of the character following the occurrence.
match_parameter: one or more of the valid matching modes ('i', 'c', 'n', 'm').
subexpr (Oracle 11g and later): for a pattern with subexpressions, which subexpression's position to return; 0 (the default) means the whole match.
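The interplay of position, occurrence, return_option and subexpr is easiest to see in a model. The sketch below is built on Python's `re`. `regexp_instr` is a hypothetical helper that only handles the 'i' match parameter. Python has no [[:alpha:]], so `[^\W\d_]` stands in for it. Returning 0 when a requested group took no part in the match is an assumption.

```python
import re


def regexp_instr(source, pattern, position=1, occurrence=1,
                 return_option=0, match_parameter="", subexpr=0):
    """Sketch of REGEXP_INSTR semantics on top of Python's re module.

    Returns a 1-based character position, or 0 when there is no such match.
    Only the 'i' match parameter is handled.
    """
    flags = re.IGNORECASE if "i" in match_parameter else 0
    regex = re.compile(pattern, flags)
    if subexpr > regex.groups:          # pattern has fewer subexpressions than requested
        return 0
    for count, m in enumerate(regex.finditer(source, position - 1), start=1):
        if count == occurrence:
            if m.start(subexpr) == -1:  # assumption: a group that did not take part yields 0
                return 0
            index = m.end(subexpr) if return_option == 1 else m.start(subexpr)
            return index + 1            # convert 0-based index to 1-based position
    return 0


ADDRESS = "500 Oracle Pkwy, Redwood Shores, CA"
O_THEN_3_ALPHA = r"[o][^\W\d_]{3}"      # Python stand-in for '[o][[:alpha:]]{3}'
for occ in (1, 2):
    for opt in (0, 1):
        print(occ, opt, regexp_instr(ADDRESS, O_THEN_3_ALPHA, 1, occ, opt, "i"))
```

The second occurrence is "ores" in "Shores". The "oo" in "Redwood" is followed by only two letters, so it does not qualify.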

==== Find an 'o' followed by any three alphabetic characters (case-insensitive) ====

SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 0, 'i') RESULT FROM dual;

SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 1, 'i') RESULT FROM dual;
SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 0, 'i') RESULT FROM dual;
SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 1, 'i') RESULT FROM dual;

==== Find the position of try, trying, tried or tries ====
SELECT REGEXP_INSTR('We are trying to make the subject easier.', 'tr(y(ing)?|(ied)|(ies))') RESULTNUM FROM dual;

==== Using the subexpression option (Oracle 11g and later) ====
SELECT testcol, REGEXP_INSTR(testcol, 'ab', 1, 1, 0, 'i', 0) FROM test;
SELECT testcol, REGEXP_INSTR(testcol, 'ab', 1, 1, 0, 'i', 1) FROM test;
SELECT testcol, REGEXP_INSTR(testcol, 'a(b)', 1, 1, 0, 'i', 1) FROM test;

==================== =======

================ REGEXP_LIKE ==============


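Syntax: REGEXP_LIKE(source_string, pattern [, match_parameter])
source_string: any character expression; this is the value to search.
pattern: any valid regular expression.
match_parameter: one or more of the valid matching modes ('i', 'c', 'n', 'm').
REGEXP_LIKE is true when pattern matches somewhere in source_string, so it is used as a condition (for example in a WHERE clause) rather than in the select list.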

==== Alphanumeric Characters ====
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:alnum:]]');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{3}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{5}');

==== Alphabetic Characters ====
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:alpha:]]');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:alpha:]]{3}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:alpha:]]{5}');

==== Control Characters ====
INSERT INTO test VALUES ('zyx' || CHR(13) || 'wvu');
COMMIT;
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:cntrl:]]{1}');

==== Digits ====
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:digit:]]');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:digit:]]{3}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:digit:]]{5}');

==== Lower Case ====
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:lower:]]');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:lower:]]{2}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:lower:]]{3}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:lower:]]{5}');

==== Printable Characters ====
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:print:]]{5}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:print:]]{6}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:print:]]{7}');

==== Punctuation ====
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:punct:]]');

==== Spaces ====
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:space:]]');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:space:]]{2}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:space:]]{3}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:space:]]{5}');

==== Upper Case ====
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:upper:]]');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:upper:]]{2}');
SELECT * FROM test WHERE REGEXP_LIKE(testcol, '[[:upper:]]{3}');
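You can predict which TEST rows these character-class queries return by running them against a local copy of the data. This sketch does that in Python. It translates each POSIX bracket expression into an ASCII-only Python equivalent, which is an approximation: Oracle evaluates classes in the database character set. `POSIX_TO_PYTHON`, `regexp_like` and `rows_matching` are hypothetical names. The rows are copied with the spacing shown in this document.

```python
import re

# The rows inserted into TEST above (spacing as it appears in this document).
TEST_ROWS = [
    "abcde", "12345", "1a4A5", "12a45", "12aBC", "12abc", "12ab5", "12aa5",
    "12AB5", "ABCDE", "123-5", "12.45", "1a4b5", "1 3 5", "1 45", "1 5",
    "a b c d", "a b c d e", "a e", "Steven", "Stephen", "111.222.3333",
    "222.333.4444", "333.444.5555", "abcdefabcdefabcxyz",
]

# ASCII-only approximations of the POSIX bracket expressions used above.
POSIX_TO_PYTHON = {
    "[[:alnum:]]": r"[A-Za-z0-9]",
    "[[:alpha:]]": r"[A-Za-z]",
    "[[:digit:]]": r"[0-9]",
    "[[:lower:]]": r"[a-z]",
    "[[:upper:]]": r"[A-Z]",
    "[[:space:]]": r"\s",
    "[[:punct:]]": r"[!-/:-@\[-`{-~]",
    "[[:cntrl:]]": r"[\x00-\x1f\x7f]",
    "[[:print:]]": r"[ -~]",
}


def to_python(pattern: str) -> str:
    for posix, python in POSIX_TO_PYTHON.items():
        pattern = pattern.replace(posix, python)
    return pattern


def regexp_like(value: str, pattern: str, match_parameter: str = "") -> bool:
    """Sketch of REGEXP_LIKE: True if the pattern matches anywhere in value."""
    flags = re.IGNORECASE if "i" in match_parameter else 0
    return re.search(to_python(pattern), value, flags) is not None


def rows_matching(pattern: str) -> list:
    return [row for row in TEST_ROWS if regexp_like(row, pattern)]


print(rows_matching("[[:digit:]]{5}"))
print(rows_matching("[[:punct:]]"))
```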

==== Values starting with 'a' (followed by zero or more 'b' characters) ====
SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '^ab*');

==== 'a' is the third character (and at least one character follows it) ====
SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '^..a.');

==== Contains two consecutive occurrences of the letter 'a' or 'z' ====
SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '([az])\1', 'i');

==== Begins with 'Ste', ends with 'en' and contains either 'v' or 'ph' in the center ====
SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '^Ste(v|ph)en$');

==== Use a regular expression in a check constraint ====
CREATE TABLE mytest (c1 VARCHAR2(20), CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$')));
==============================
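As a cross-check of the pattern examples and the check constraint above, this Python sketch applies the same patterns to a local copy of the TEST rows. Python's `re` is a stand-in for Oracle's engine, and `[^\W\d_]` approximates [[:alpha:]]. `like` and `passes_check` are hypothetical helpers. `passes_check` lets NULL through because a CHECK constraint only rejects rows for which the condition is FALSE. It treats '' the same way, since Oracle stores an empty string as NULL.

```python
import re

# The rows inserted into TEST above (spacing as it appears in this document).
TEST_ROWS = [
    "abcde", "12345", "1a4A5", "12a45", "12aBC", "12abc", "12ab5", "12aa5",
    "12AB5", "ABCDE", "123-5", "12.45", "1a4b5", "1 3 5", "1 45", "1 5",
    "a b c d", "a b c d e", "a e", "Steven", "Stephen", "111.222.3333",
    "222.333.4444", "333.444.5555", "abcdefabcdefabcxyz",
]


def like(value: str, pattern: str, flags: int = 0) -> bool:
    """REGEXP_LIKE-style test: does the pattern match anywhere in value?"""
    return re.search(pattern, value, flags) is not None


starts_with_a = [v for v in TEST_ROWS if like(v, r"^ab*")]
third_char_a = [v for v in TEST_ROWS if like(v, r"^..a.")]
doubled_a_or_z = [v for v in TEST_ROWS if like(v, r"([az])\1", re.IGNORECASE)]
steven_or_stephen = [v for v in TEST_ROWS if like(v, r"^Ste(v|ph)en$")]

# Stand-in for CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$')).
ALPHA_ONLY = re.compile(r"[^\W\d_]+")


def passes_check(c1):
    """Return True if a value would satisfy the mytest check constraint (sketch)."""
    if c1 is None or c1 == "":   # Oracle stores '' as NULL, and a CHECK only rejects FALSE
        return True
    return ALPHA_ONLY.fullmatch(c1) is not None


print(starts_with_a, third_char_a, doubled_a_or_z, steven_or_stephen, sep="\n")
```

Because b* can match zero characters, '^ab*' accepts every value that starts with 'a', not only values that continue with 'b'.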

===========REGEXP_REPLACE=============
Syntax: REGEXP_REPLACE(source_char, pattern [, replace_string [, position [, occurrence [, match_parameter ]]]])
REGEXP_REPLACE returns source_char with the substrings that match the regular expression pattern replaced by replace_string.
source_char: the value to search.
pattern: a valid regular expression.
replace_string: the replacement text; it can contain backreferences to subexpressions in the form \n, where n is 1 to 9.
position: a positive integer n indicating the character at which the search begins (default 1).
occurrence: a non-negative integer; 0 (the default) replaces all matches, and n replaces only the nth match.
match_parameter: one or more of the valid matching modes ('i', 'c', 'n', 'm', 'x').
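The position and occurrence arguments can be modelled on top of Python's `re.sub` and `Match.expand`, as in the sketch below. `regexp_replace` is a hypothetical helper that handles only the 'i' match parameter. One assumption deserves attention: it slices the string at position, so a '^' in the pattern anchors at that position, which may differ from Oracle. Python replacement templates also use \1 for backreferences, though they interpret a few other backslash escapes that Oracle does not.

```python
import re


def regexp_replace(source, pattern, replace_string="", position=1,
                   occurrence=0, match_parameter=""):
    """Sketch of REGEXP_REPLACE: occurrence 0 replaces every match,
    occurrence n replaces only the nth match found from position onward.
    Only the 'i' match parameter is handled."""
    flags = re.IGNORECASE if "i" in match_parameter else 0
    regex = re.compile(pattern, flags)
    head, tail = source[:position - 1], source[position - 1:]
    if occurrence == 0:
        return head + regex.sub(replace_string, tail)
    for count, m in enumerate(regex.finditer(tail), start=1):
        if count == occurrence:
            # Replace just this match, expanding backreferences such as \1.
            return head + tail[:m.start()] + m.expand(replace_string) + tail[m.end():]
    return source   # fewer than `occurrence` matches: nothing is replaced


print(regexp_replace("a.b.c", r"\.", "-"))         # every period replaced
print(regexp_replace("a.b.c", r"\.", "-", 1, 2))   # only the second period replaced
```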

==== Look for the pattern xxx.xxx.xxxx and reformat it as (xxx) xxx-xxxx ====
col testcol format a15
col result format a15
SELECT testcol, REGEXP_REPLACE(testcol, '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3') RESULT FROM test WHERE LENGTH(testcol) = 12;

==== Put a space after every character ====
SELECT testcol, REGEXP_REPLACE(testcol, '(.)', '\1 ') RESULT FROM test WHERE testcol like 'S%';

==== Replace multiple spaces with a single space ====
SELECT REGEXP_REPLACE('500 Oracle Parkway, Redwood Shores, CA', '( ){2,}', ' ') RESULT FROM dual;

==== Insert a space between a lowercase character followed by an uppercase character ====
SELECT REGEXP_REPLACE('George McGovern', '([[:lower:]])([[:upper:]])', '\1 \2') CITY FROM dual;

==== Replace the period with a string (note the use of '\') ====
SELECT REGEXP_REPLACE('We are trying to make the subject easier.', '\.', ' for you.') REGEXT_SAMPLE FROM dual;

==== Replace trailing non-word characters with a single space (\W is a Perl-influenced extension available from Oracle 10g Release 2) ====
CREATE TABLE t (testcol VARCHAR2(10));
INSERT INTO t VALUES ('1');
INSERT INTO t VALUES ('2 ');
INSERT INTO t VALUES ('3 new ');
col newval format a10
SELECT LENGTH(testcol) len, testcol origval, REGEXP_REPLACE(testcol, '\W+$', ' ') newval, LENGTH(REGEXP_REPLACE(testcol, '\W+$', ' ')) newlen FROM t;
==============================
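Each of the REGEXP_REPLACE examples above has a direct `re.sub` counterpart in Python, which is handy for checking a pattern before putting it in SQL. This is a sketch: the helper names are hypothetical, `[0-9]` and `[a-z]`/`[A-Z]` stand in for [[:digit:]], [[:lower:]] and [[:upper:]], and the extra multi-space input is made up for the test.

```python
import re


def format_phone(value: str) -> str:
    """xxx.xxx.xxxx -> (xxx) xxx-xxxx; other values are returned unchanged."""
    return re.sub(r"([0-9]{3})\.([0-9]{3})\.([0-9]{4})", r"(\1) \2-\3", value)


def squeeze_spaces(value: str) -> str:
    """Replace each run of two or more spaces with a single space."""
    return re.sub(r"( ){2,}", " ", value)


def split_lower_upper(value: str) -> str:
    """Insert a space wherever a lowercase letter is followed by an uppercase letter."""
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", value)


def trailing_non_word_to_space(value: str) -> str:
    """Replace a trailing run of non-word characters with one space."""
    return re.sub(r"\W+$", " ", value)


for phone in ["111.222.3333", "222.333.4444", "333.444.5555"]:
    print(phone, "->", format_phone(phone))
print(split_lower_upper("George McGovern"))
```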

============= REGEXP_SUBSTR===========
Syntax : REGEXP_SUBSTR(source_string, pattern [, position [, occurrence [, match_parameter ]]])

REGEXP_SUBSTR returns the substring that matches a regular expression pattern, or NULL if there is no match.
source_string: the value to search.
pattern: a valid regular expression.
position: a positive integer n indicating the character at which the search begins (default 1).
occurrence: a positive integer n indicating which occurrence of pattern in source_string to return (default 1).
match_parameter: one or more of the valid matching modes ('i', 'c', 'n', 'm').

==== Search for a comma followed by one or more non-comma characters followed by a comma ====

SELECT REGEXP_SUBSTR('500 Oracle Parkway, Redwood Shores, CA', ',[^,]+,') RESULT FROM dual;

==== Look for http:// followed by three or four groups of one or more alphanumeric characters, each optionally followed by a period (.) ====
col result format a50
SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database', 'http://([[:alnum:]]+\.?){3,4}/?') RESULT FROM dual;

==== Extract try, trying, tried or tries ====
SELECT REGEXP_SUBSTR('We are trying to make the subject easier.', 'tr(y(ing)?|(ied)|(ies))') FROM dual;

==== Extract the 3rd field, treating ':' as a delimiter ====
SELECT REGEXP_SUBSTR('system/pwd@orabase:1521:sidval', '[^:]+', 1, 3) RESULT FROM dual;

==== Extract from a string with a vertical bar delimiter ====
CREATE TABLE regexp (testcol VARCHAR2(50));
INSERT INTO regexp (testcol) VALUES ('One|Two|Three|Four|Five');
SELECT * FROM regexp;
SELECT REGEXP_SUBSTR(testcol, '[^|]+', 1, 3) FROM regexp;

==== Equivalence classes ====
SELECT REGEXP_SUBSTR('iSelfSchooling NOT ISelfSchooling', '[[=i=]]SelfSchooling') RESULT FROM dual;

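==== Using REGEXP_SUBSTR in PL/SQL ====
set serveroutput on
DECLARE
  x VARCHAR2(2);
  y VARCHAR2(2);
  c VARCHAR2(40) := '1:3,4:6,8:10,3:4,7:6,11:12';
BEGIN
  -- first run of non-colon characters: '1'
  x := REGEXP_SUBSTR(c, '[^:]+', 1, 1);
  -- first run of non-comma characters, starting the search at position 3: '3'
  y := REGEXP_SUBSTR(c, '[^,]+', 3, 1);
  dbms_output.put_line(x || ' ' || y);   -- prints: 1 3
END;
/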
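To check the REGEXP_SUBSTR examples and the PL/SQL block without a database, here is a Python sketch of the function's position/occurrence behaviour. `regexp_substr` is a hypothetical helper that handles only the 'i' match parameter and returns None where Oracle returns NULL. `[A-Za-z0-9]` stands in for [[:alnum:]]. Equivalence classes ([[=i=]]) have no Python counterpart, so they are not modelled.

```python
import re


def regexp_substr(source, pattern, position=1, occurrence=1, match_parameter=""):
    """Sketch of REGEXP_SUBSTR: the nth match at or after position, else None (NULL)."""
    flags = re.IGNORECASE if "i" in match_parameter else 0
    regex = re.compile(pattern, flags)
    for count, m in enumerate(regex.finditer(source, position - 1), start=1):
        if count == occurrence:
            return m.group()
    return None


c = "1:3,4:6,8:10,3:4,7:6,11:12"
x = regexp_substr(c, "[^:]+", 1, 1)
y = regexp_substr(c, "[^,]+", 3, 1)
print(x, y)   # same values the PL/SQL block prints
```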

Thanks : Md. Asaduzzaman Sikder ( 01712114756) azsikder2000@yahoo.com.
