by simon baird

Thursday, July 20, 2006

Oracle PL/SQL function to split strings into tokens

I was trying to find how to split or tokenise (tokenize) a string in Oracle SQL. Technically you can do it directly with just instr and substr but your statement becomes very long and hard to read, mainly because you can't reuse the result of an instr. I found this which was interesting but not what I needed, so I wrote my own function to do the job. I'll post it here in the hope that it will be useful to someone.
create or replace function get_token(
   the_list  varchar2,
   the_index number,
   delim     varchar2 := ','
)
   return    varchar2
is
   start_pos number;
   end_pos   number;
begin
   if the_index = 1 then
       start_pos := 1;
   else
       start_pos := instr(the_list, delim, 1, the_index - 1);
       if start_pos = 0 then
           return null;
       else
           start_pos := start_pos + length(delim);
       end if;
   end if;

   end_pos := instr(the_list, delim, start_pos, 1);

   if end_pos = 0 then
       return substr(the_list, start_pos);
   else
       return substr(the_list, start_pos, end_pos - start_pos);
   end if;

end get_token;
/

Example usage:
select
   get_token('foo,bar,baz',1), -- 'foo'
   get_token('foo,bar,baz',3), -- 'baz'
   --
   get_token('a,,b',2),        -- '' (null)
   get_token('a,,b',3),        -- 'b'
   --
   get_token('a|b|c',2,'|'),   -- 'b'
   get_token('a|b|c',4,'|')    -- '' (null)
from
   dual
/

Notes
  • Remember that the index starts at one not zero just like strings in PL/SQL.
  • Empty tokens are counted. You get null if you ask for an empty token.
  • You also get null if you ask for an index greater than the number of tokens.
  • The delimiter defaults to a comma but you can specify any delimiter. A delimiter of more than one char should work also.


Update 17/5/07
I've added the notes above and colourised the code to make it look prettier. To the commenters below, you're welcome!

Update 12/9/08
This version provided by Anonymous below supports negative indexes, eg
get_token('foo,bar,baz',-1), -- 'baz'
get_token('foo,bar,baz',-2), -- 'bar'
Also check the comments below for a regex based version that can handle quoted csv style fields.
create or replace function get_token (
    the_list varchar2,
    the_index number,
    delim varchar2 :=';'
)
    return varchar2
is
    start_pos number;
    end_pos number;

begin
    if the_index = 1 then
        start_pos := 1;

    elsif the_index < 0 then
        start_pos := instr(the_list, delim, -1, abs(the_index)) + 1;

    else
        start_pos := instr(the_list, delim, 1, the_index - 1);

        if start_pos = 0 then
            return null;
        else
            start_pos := start_pos + length(delim);
        end if;

    end if;

    if the_index < 0 then
        end_pos := instr(the_list, delim, start_pos+1, 1);
    else
        end_pos := instr(the_list, delim, start_pos, 1);
    end if;

    if end_pos = 0 then
        return substr(the_list, start_pos);
    else
        return substr(the_list, start_pos, end_pos - start_pos);
    end if;

end get_token;

60 comments:

Anonymous said...

Hey! Thanks for the SQL token function, it was just what I needed!

Jack said...

This function rocks my socks. Thank you, stranger!

Anonymous said...

Thanks.

Anonymous said...

Great ! ! !

Anonymous said...

Nice and simple - just what I needed. Many thanks!

Anonymous said...

pretty much the best tokenizer ever! thanks!

Hans said...

thanks a lot. Very usefull !

Pedro said...

Very GOOD!!!

5ky123d said...

thank you mr. simon(i hope,... it's your name... )

Anonymous said...

Good work, friend! Helped me a lot :)

Anonymous said...

Yeah Thanks

Anonymous said...

Thank you so much! You are my new hero.

Fabrix said...

Thank's a lot. I saved a lot of time. I've posted your solution on my blog and I've put a link pointing to your blog. Hope it's ok for you.

http://fabrixb.blogspot.com/2007/07/oracle-plsql-function-to-split-strings.html

Anonymous said...

Thank you! It was just what i need!

greg said...

You saved my life :) Thank you very much.

JessiG said...

Awesome! Thanks!

Anonymous said...

Thank you so much, this rocks!

Anonymous said...

Not quite the same, but similar ... Oracle supplies a PL/SQL function to parse a[.b[.c]][@link] into a,b,c,link. It actually uses the Oracle parser to do the parsing.

dbms_utility.name_tokenize()

Phil said...

Excellent function! Thanks for posting it.

Anonymous said...

I do never write comments. But this solution is something. Thank you a lot for saving a working day to me. Maybe more. I wouдd give you a matrioska from Russia for this :-)

Anonymous said...

In Spanish: Excelente función. Gracias por compartirla!!

Anonymous said...

hey thanks a lot for this wonderful function ....really helped me a lot

Frito-Jay said...

Here is a version that uses regular expressions:

create or replace function get_token(
the_list varchar2,
the_index number,
delim varchar2 := ','
)
return varchar2
is
begin

return regexp_substr(the_list, '[[:alnum:]]+', regexp_instr(the_list, delim, 1, the_index, 1));

end get_token;

Anders Mathisen said...

hi! Thanks, exactly what I needed, had to modify it a bit to support multiple whitespaces, I posted a comment on my blog about it

http://blog.thej2eestruggle.com/2008/01/removing-multiple-whitespace-from.html

Anonymous said...

thank you

this function has helped me much

Anonymous said...

Thank you
your function help me
Pon007

Anonymous said...

The world's a better place when someone like you takes the time to write cool functions and then share them. Thanks!!

Anonymous said...

Dear friend. You saved me a couple of days in writing and testing this. world is a beautiful place because of people who SHARE. Thanks friend. best wishes, Gopal.

griffin said...

thx1.0e+6 4 this function

Anonymous said...

Thanks, this script got me started. I found that I sometimes had quoted data with the delimiter inside such as (,"comma,inside",) and modified this to work for me.

Here it is:
function get_token(
the_list varchar2,
the_index number,
delim varchar2 := ',',
l_escape varchar2 := '"'
)
return varchar2
is
reg_exp varchar2(200);
result varchar2(3000);
begin
-- Comment exampls assume , delimiter and " escape
-- Search for ,text until , or text until comma from start of string
-- before variables are inserted (^|,)(".*"|[^,]*)'
reg_exp:= '(^|' || delim ||')(' || l_escape || '.*' || l_escape || '|[^' || delim || ']*)';
result:= regexp_substr(the_list, reg_exp , 1,the_index);

--remove any begining delimiters
reg_exp := '^' || delim;
result:= regexp_replace(result,reg_exp,'');
return result;

end get_token;

Brian said...

Worked great! Thanks for posting.

Anonymous said...

great function

Anonymous said...

Hey !! This function is excellent. Thanks for posting this.

Anonymous said...

Thanks a lot for writing a generic tokenizer - I was looking for exactly this one! -Mukul

Anonymous said...

You simply rule! Thank you so much

couak said...

if you add feature of getting tokens from the end, the function will rock
example : get_token ('a,b,c', -1) will return 'c'

simon said...

yeah that would be cool.

Anonymous said...

I modified the function for the backend feature :
create or replace
function get_token
(
the_list varchar2,
the_index number,
delim varchar2 :=';'
)
return varchar2
is
start_pos number;
end_pos number;
begin
if the_index = 1 then
start_pos := 1;
elsif the_index < 0 then
start_pos := instr(the_list, delim, -1, abs(the_index))+1;
else
start_pos := instr(the_list, delim, 1, the_index - 1);
if start_pos = 0 then
return null;
else
start_pos := start_pos + length(delim);
end if;
end if;

if the_index < 0 then
end_pos := instr(the_list, delim, start_pos+1, 1);
else
end_pos := instr(the_list, delim, start_pos, 1);
end if;

if end_pos = 0 then
return substr(the_list, start_pos);
else
return substr(the_list, start_pos, end_pos - start_pos);
end if;

end get_token;

Anonymous said...

Thanks man, exactly what I needed!

Anonymous said...

you are the best!!!

damupi said...

u r a crack my man!!!

congratulations

Samsa said...

Thank you a lot!

Mil gracias!

Anonymous said...

Here is one that is recursive:

/*

Recursive PL/SQL function that returns the nth element in a delimited String.
The default delimiter is a ',' (comma) and the level indicator defaults to 0.


*/

CREATE OR REPLACE FUNCTION getStringElement(p_string VARCHAR2,
p_element NUMBER,
p_delimiter VARCHAR2 := ',',
p_level NUMBER := 0) RETURN VARCHAR2
IS

v_string VARCHAR2(2000) := NULL;
v_element VARCHAR2(2000) := NULL;

v_level NUMBER(4) := 0;


BEGIN

v_level := p_level + 1;

v_element := substr(p_string||p_delimiter,1,instr(p_string||p_delimiter,p_delimiter)-1);

IF ((v_level >= p_element) OR (v_element IS NULL)) THEN

RETURN v_element;

ELSE

v_string := substr(p_string||p_delimiter,instr(p_string||p_delimiter,p_delimiter)+1,length(p_string));

RETURN getStringElement(v_string,p_element,p_delimiter,v_level);

END IF;


END;
/

Example:

The statement:

SELECT getStringElement('This is an interesting test of recursion in PL/SQL',7,' ') from dual;

will return the value "recursion".

http://radio.weblogs.com/0137094/2008/04/12.html

Rameswara said...

Its a great job. It helped me a lot and saved my lot of time.

Great... Great... Great...

Anonymous said...

Hey thanks man,
You have done a great job. It saved my time.

Anonymous said...

Very nice. Just used it in some code.

Ethan Post

Anonymous said...

OK. Gonna use this for a SQL script for Geographic Information Project for dutch municipalities called BAG.

u§ur said...

Thanks.

I have added your code to my site.

http://www.coders2coders.com/articles.php?article_id=78

Anonymous said...

Thank you very much.
Exactly what I needed.

Anonymous said...

You are great!!

Anonymous said...

Really useful, saved some time. Thanks!!

Anonymous said...

I just searched for the split function and urs was the first to get on Google...thanks buddy... and thanks google too..cheers!

siwi arie said...

thank you very much for your code :D
very helpfull..

Twylla said...

Very helpful - thanks! I am using it in my Data Warehouse ETL code.

Anonymous said...

thank you so much!!!

Anonymous said...

if the_index =0 you will get an Error.

:(

Anyone fixed that ???

Ivan Milosavljević said...

For index = 0 just change first line to
if (the_index = 1 or the_index = 0) then

RRave said...

Dear Sir,

I hope you are doing well. I got this email address from one of your contribution web site. I have launched a web site www.codegain.com and it is basically aimed C#,JAVA,VB.NET,ASP.NET,AJAX,Sql Server,Oracle,WPF,WCF and etc resources, programming help, articles, code snippet, video demonstrations and problems solving support. I would like to invite you as an author and a supporter.
Looking forward to hearing from you and hope you will join with us soon.

Thank you
RRaveen
Founder CodeGain.com

Anonymous said...

SAVED ME FROM LOOPING THROUGH A CURSOR & SUBSTR ing the text

Thanks a lot!!!!

dhanie said...

you`re so nice.
it`s help & inspire manythings for my final project of undergraduate..
i`ve been searching & trying few days to make this 'tokenizer' on pl-sql but always failed.
thanks for your help.
^_^