by simon baird

Thursday, July 20, 2006

Oracle PL/SQL function to split strings into tokens

I was trying to find how to split or tokenise (tokenize) a string in Oracle SQL. Technically you can do it directly with just instr and substr but your statement becomes very long and hard to read, mainly because you can't reuse the result of an instr. I found this which was interesting but not what I needed, so I wrote my own function to do the job. I'll post it here in the hope that it will be useful to someone.
create or replace function get_token(
   the_list  varchar2,
   the_index number,
   delim     varchar2 := ','
)
   return    varchar2
is
   start_pos number;
   end_pos   number;
begin
   if the_index = 1 then
       start_pos := 1;
   else
       start_pos := instr(the_list, delim, 1, the_index - 1);
       if start_pos = 0 then
           return null;
       else
           start_pos := start_pos + length(delim);
       end if;
   end if;

   end_pos := instr(the_list, delim, start_pos, 1);

   if end_pos = 0 then
       return substr(the_list, start_pos);
   else
       return substr(the_list, start_pos, end_pos - start_pos);
   end if;

end get_token;
/

Example usage:
select
   get_token('foo,bar,baz',1), -- 'foo'
   get_token('foo,bar,baz',3), -- 'baz'
   --
   get_token('a,,b',2),        -- '' (null)
   get_token('a,,b',3),        -- 'b'
   --
   get_token('a|b|c',2,'|'),   -- 'b'
   get_token('a|b|c',4,'|')    -- '' (null)
from
   dual
/

Notes
  • Remember that the index starts at one not zero just like strings in PL/SQL.
  • Empty tokens are counted. You get null if you ask for an empty token.
  • You also get null if you ask for an index greater than the number of tokens.
  • The delimiter defaults to a comma but you can specify any delimiter. A delimiter of more than one char should work also.

Update 22/8/12

When I wrote this I didn't know about regexp_substr. (Or maybe the version of Oracle I was using didn't have it). You can do this much easier using the regexp_substr method. Have a look at the regexp_substr docs or see some examples in the comments below. For example:

regexp_substr('abc_def_ghi','[^_]+',1,3);

==> ghi.

101 comments:

Anonymous said...

Hey! Thanks for the SQL token function, it was just what I needed!

Jack said...

This function rocks my socks. Thank you, stranger!

Anonymous said...

Thanks.

Anonymous said...

Great ! ! !

Anonymous said...

Nice and simple - just what I needed. Many thanks!

Anonymous said...

pretty much the best tokenizer ever! thanks!

Hans said...

thanks a lot. Very usefull !

Pedro said...

Very GOOD!!!

5ky123d said...

thank you mr. simon(i hope,... it's your name... )

Anonymous said...

Good work, friend! Helped me a lot :)

Anonymous said...

Yeah Thanks

Anonymous said...

Thank you so much! You are my new hero.

Fabrix said...

Thank's a lot. I saved a lot of time. I've posted your solution on my blog and I've put a link pointing to your blog. Hope it's ok for you.

http://fabrixb.blogspot.com/2007/07/oracle-plsql-function-to-split-strings.html

Anonymous said...

Thank you! It was just what i need!

greg said...

You saved my life :) Thank you very much.

JessiG said...

Awesome! Thanks!

Anonymous said...

Thank you so much, this rocks!

Anonymous said...

Not quite the same, but similar ... Oracle supplies a PL/SQL function to parse a[.b[.c]][@link] into a,b,c,link. It actually uses the Oracle parser to do the parsing.

dbms_utility.name_tokenize()

Phil said...

Excellent function! Thanks for posting it.

Anonymous said...

I do never write comments. But this solution is something. Thank you a lot for saving a working day to me. Maybe more. I wouдd give you a matrioska from Russia for this :-)

Anonymous said...

In Spanish: Excelente función. Gracias por compartirla!!

Anonymous said...

hey thanks a lot for this wonderful function ....really helped me a lot

Frito-Jay said...

Here is a version that uses regular expressions:

create or replace function get_token(
the_list varchar2,
the_index number,
delim varchar2 := ','
)
return varchar2
is
begin

return regexp_substr(the_list, '[[:alnum:]]+', regexp_instr(the_list, delim, 1, the_index, 1));

end get_token;

Anders Mathisen said...

hi! Thanks, exactly what I needed, had to modify it a bit to support multiple whitespaces, I posted a comment on my blog about it

http://blog.thej2eestruggle.com/2008/01/removing-multiple-whitespace-from.html

Anonymous said...

thank you

this function has helped me much

Anonymous said...

Thank you
your function help me
Pon007

Anonymous said...

The world's a better place when someone like you takes the time to write cool functions and then share them. Thanks!!

Anonymous said...

Dear friend. You saved me a couple of days in writing and testing this. world is a beautiful place because of people who SHARE. Thanks friend. best wishes, Gopal.

griffin said...

thx1.0e+6 4 this function

Anonymous said...

Thanks, this script got me started. I found that I sometimes had quoted data with the delimiter inside such as (,"comma,inside",) and modified this to work for me.

Here it is:
function get_token(
the_list varchar2,
the_index number,
delim varchar2 := ',',
l_escape varchar2 := '"'
)
return varchar2
is
reg_exp varchar2(200);
result varchar2(3000);
begin
-- Comment exampls assume , delimiter and " escape
-- Search for ,text until , or text until comma from start of string
-- before variables are inserted (^|,)(".*"|[^,]*)'
reg_exp:= '(^|' || delim ||')(' || l_escape || '.*' || l_escape || '|[^' || delim || ']*)';
result:= regexp_substr(the_list, reg_exp , 1,the_index);

--remove any begining delimiters
reg_exp := '^' || delim;
result:= regexp_replace(result,reg_exp,'');
return result;

end get_token;

Brian said...

Worked great! Thanks for posting.

Anonymous said...

great function

Anonymous said...

Hey !! This function is excellent. Thanks for posting this.

Anonymous said...

Thanks a lot for writing a generic tokenizer - I was looking for exactly this one! -Mukul

Anonymous said...

You simply rule! Thank you so much

couak said...

if you add feature of getting tokens from the end, the function will rock
example : get_token ('a,b,c', -1) will return 'c'

simon said...

yeah that would be cool.

Anonymous said...

I modified the function for the backend feature :
create or replace
function get_token
(
the_list varchar2,
the_index number,
delim varchar2 :=';'
)
return varchar2
is
start_pos number;
end_pos number;
begin
if the_index = 1 then
start_pos := 1;
elsif the_index < 0 then
start_pos := instr(the_list, delim, -1, abs(the_index))+1;
else
start_pos := instr(the_list, delim, 1, the_index - 1);
if start_pos = 0 then
return null;
else
start_pos := start_pos + length(delim);
end if;
end if;

if the_index < 0 then
end_pos := instr(the_list, delim, start_pos+1, 1);
else
end_pos := instr(the_list, delim, start_pos, 1);
end if;

if end_pos = 0 then
return substr(the_list, start_pos);
else
return substr(the_list, start_pos, end_pos - start_pos);
end if;

end get_token;

Anonymous said...

Thanks man, exactly what I needed!

Anonymous said...

you are the best!!!

damupi said...

u r a crack my man!!!

congratulations

Samsa said...

Thank you a lot!

Mil gracias!

Anonymous said...

Here is one that is recursive:

/*

Recursive PL/SQL function that returns the nth element in a delimited String.
The default delimiter is a ',' (comma) and the level indicator defaults to 0.


*/

CREATE OR REPLACE FUNCTION getStringElement(p_string VARCHAR2,
p_element NUMBER,
p_delimiter VARCHAR2 := ',',
p_level NUMBER := 0) RETURN VARCHAR2
IS

v_string VARCHAR2(2000) := NULL;
v_element VARCHAR2(2000) := NULL;

v_level NUMBER(4) := 0;


BEGIN

v_level := p_level + 1;

v_element := substr(p_string||p_delimiter,1,instr(p_string||p_delimiter,p_delimiter)-1);

IF ((v_level >= p_element) OR (v_element IS NULL)) THEN

RETURN v_element;

ELSE

v_string := substr(p_string||p_delimiter,instr(p_string||p_delimiter,p_delimiter)+1,length(p_string));

RETURN getStringElement(v_string,p_element,p_delimiter,v_level);

END IF;


END;
/

Example:

The statement:

SELECT getStringElement('This is an interesting test of recursion in PL/SQL',7,' ') from dual;

will return the value "recursion".

http://radio.weblogs.com/0137094/2008/04/12.html

Rameswara said...

Its a great job. It helped me a lot and saved my lot of time.

Great... Great... Great...

Anonymous said...

Hey thanks man,
You have done a great job. It saved my time.

Anonymous said...

Very nice. Just used it in some code.

Ethan Post

Anonymous said...

OK. Gonna use this for a SQL script for Geographic Information Project for dutch municipalities called BAG.

u§ur said...

Thanks.

I have added your code to my site.

http://www.coders2coders.com/articles.php?article_id=78

Anonymous said...

Thank you very much.
Exactly what I needed.

Anonymous said...

You are great!!

Anonymous said...

Really useful, saved some time. Thanks!!

Anonymous said...

I just searched for the split function and urs was the first to get on Google...thanks buddy... and thanks google too..cheers!

siwi arie said...

thank you very much for your code :D
very helpfull..

Twylla said...

Very helpful - thanks! I am using it in my Data Warehouse ETL code.

Anonymous said...

thank you so much!!!

Anonymous said...

if the_index =0 you will get an Error.

:(

Anyone fixed that ???

Ivan Milosavljević said...

For index = 0 just change first line to
if (the_index = 1 or the_index = 0) then

Anonymous said...

SAVED ME FROM LOOPING THROUGH A CURSOR & SUBSTR ing the text

Thanks a lot!!!!

dhanie said...

you`re so nice.
it`s help & inspire manythings for my final project of undergraduate..
i`ve been searching & trying few days to make this 'tokenizer' on pl-sql but always failed.
thanks for your help.
^_^

darkchip said...

hi, great tip; you save a lot time for me.

thk´s.

Marija said...

This is why I love internet :)
Here is my small contribution:
The get_token from Anonimus that supports index < 0 has a bug for delimiters longer then 1 and when U have a list that ends with delimiter.
So I fixed it and here it is:
create or replace function get_token (
p_red varchar2 := '111;222;333;444;',
p_index number :=1,
p_delimiter varchar2 :=';'
)
return varchar2
is
l_start number;
l_end number;

begin
if p_index = 1 then
l_start := 1;

elsif p_index < 0 then
if substr (p_red, -length(p_delimiter))= p_delimiter then
l_start := instr(p_red, p_delimiter, -1, abs(p_index)+1) + length(p_delimiter);
else
l_start := instr(p_red, p_delimiter, -1, abs(p_index))+ length(p_delimiter);
end if;
else
l_start := instr(p_red, p_delimiter, 1, p_index - 1) ;

if l_start = 0 then
return null;
else
l_start := l_start + length(p_delimiter);
end if;

end if;

if p_index < 0 then
l_end := instr(p_red, p_delimiter, l_start+1, 1);
else
l_end := instr(p_red, p_delimiter, l_start, 1);
end if;

if l_end = 0 then
return substr(p_red, l_start);
else
return substr(p_red, l_start, l_end - l_start);
end if;

end get_token;

Anonymous said...

Thanks for the function. I was looking for the same one. As I am not much familar to functions it really helped me a lot

Anonymous said...

Thank you so much for posting the function.

Anonymous said...

I needed exactly the function that you wrote. Thanks to Google I found it and thanks to you I saved time (and CPU)!

Anonymous said...

This might also be useful.

http://www.oracle.com/technology/oramag/code/tips2007/070907.html

Anonymous said...

This might also be useful.

http://www.oracle.com/technology/oramag/code
/tips2007
/070907.html

My earlier comment had a broken link.

Anonymous said...

In italian: Grazie per aver condiviso questa utile funzione

simon said...

sei il benvenuto

Anonymous said...

Thank you!

Anonymous said...

Dear Sir,
I Can't Express My Feeling Toward Your Great Efforts...
You Have Done A Great Contribution To The Community...
Thank God
And I Thank You Very Much...


One God, The Creator, The Ruler.
All The Prophets From Adam Till Moses, Jesus And Mohamed Peace Be Upon All Of Them,
All Original Scripts
is My Belief....

Anonymous said...

You are the man!!!

Anonymous said...

You saved the day! Thanks for the code. This is what I was looking for. Great job!

Anonymous said...

Is there something we can do to this function that can return array of values from the pipe seperated string? like

function ('a|b|c|d', '|') may return
a
b
c
d

Anonymous said...

ok for the pl/sql solution, the "purist solution" would lead to : RETURN(a combination of simple predefined functions like "instr","substr","NVL")

Anonymous said...

this one made my life easy !!

Anonymous said...

Many thanks, quite possibly saved me a full workday!

Anonymous said...

THX THX THX

Anonymous said...

Thank you very much, just what I needed! Saved me some valuable time. Thanks for posting this useful snippet of code.

Anonymous said...

You should also consider using regular expressions e.g. regexp_substr etc.

Mike B said...

That's awesome! What a travesty oracle does not have a "split" function.

Anonymous said...

you are simply superb. great job done by you. excellent utilization of instr()

Luciano said...

Thank you so much, you're great!

EverGreen said...

First of all thanks for this blog. It helped me a lot.

I tried to use the regex solution as i have to split csv based fields, but it is not working.

It returns null for index >1 and returns entire string as it is for index = 1.

Please help!!!

Anonymous said...

Thanks a lot. Ur my life saver.

Anonymous said...

Hi simon baird,

Great Job. Thanks for your psot. your code really usful, ans I am using in project.

Thanks,
Kiran Nagaraju
Bangalore, India.
nkwm09@gmail.com

Anonymous said...

YOU ARE THE MAN!

Kaiz said...

hi, i have something in this format (ck1=1/ar1=1,ar2=2,ar3=3)


i have to split the string with delimiter "/" and then replace ck1 and ar* respectively.

Can you tell me how to do it using your funtion?

Anonymous said...

Thx so much, 5 years later this code saved me lots of time.
great job

Anonymous said...

This function works good.
But what i need is i have space(delimiter) more than one in a string, i want all the words exactly.

Example String:
'abc def ghi jkl'

i need
abc
def
ghi
jkl

please help me...

Thanks in advance......

Anonymous said...

Hi I have multiple spaces between words in the String. I want to separate the words exactly. please help me......
For Examle:
'abc def ghi jkl mno'

same example with another delimiter:

'abc///def//ghi/jkl/////mno'

I need result as:
abc
def
ghi
jkl
mno

KSW said...

I am new to Oracle PL SQL. This function helped me out very much. Thank you very much.

Ken

Christopher said...

+1 (y)

Anonymous said...

As much as reinventing the wheel can be cool, use this instead

regexp_substr('abc_def_ghi','[^_]+',1,3);

==> ghi.

David.

Anonymous said...

AWESOM MAN.. U REALLY ROCK!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! THXS 4 D CODE

Anonymous said...

Nice work, saved me a TON of time!

Anonymous said...

Really awesome!

Thanks a lot!

Anonymous said...

How to split the string 'ID=9999,DATE=27-AUG-2013,PROCESS_OPTION=100' ?

I need to get the values 999,27-AUG-2013,100 because these 2 values should need to pass as parameters to a procedure . The ID is a sequence number which will keep on increasing in every transaction.
Your help will be much appreciated.

Anonymous said...

superb code.It helped me to complete my task... nicely written.. thanks..

Anonymous said...

superb code. It helped to complete my task.. nicely written.. thanks

Anonymous said...

superb code. It helped to complete my task.. nicely written.. thanks

Martina Christy said...
This comment has been removed by a blog administrator.