Thursday, July 20, 2006

Oracle PL/SQL function to split strings into tokens

I was trying to find how to split or tokenise (tokenize) a string in Oracle SQL. Technically you can do it directly with just instr and substr but your statement becomes very long and hard to read, mainly because you can't reuse the result of an instr. I found this which was interesting but not what I needed, so I wrote my own function to do the job. I'll post it here in the hope that it will be useful to someone.
create or replace function get_token(
   the_list  varchar2,
   the_index number,
   delim     varchar2 := ','
)
   return    varchar2
is
   start_pos number;
   end_pos   number;
begin
   if the_index = 1 then
       start_pos := 1;
   else
       start_pos := instr(the_list, delim, 1, the_index - 1);
       if start_pos = 0 then
           return null;
       else
           start_pos := start_pos + length(delim);
       end if;
   end if;

   end_pos := instr(the_list, delim, start_pos, 1);

   if end_pos = 0 then
       return substr(the_list, start_pos);
   else
       return substr(the_list, start_pos, end_pos - start_pos);
   end if;

end get_token;
/

Example usage:
select
   get_token('foo,bar,baz',1), -- 'foo'
   get_token('foo,bar,baz',3), -- 'baz'
   --
   get_token('a,,b',2),        -- '' (null)
   get_token('a,,b',3),        -- 'b'
   --
   get_token('a|b|c',2,'|'),   -- 'b'
   get_token('a|b|c',4,'|')    -- '' (null)
from
   dual
/

Notes
  • Remember that the index starts at one not zero just like strings in PL/SQL.
  • Empty tokens are counted. You get null if you ask for an empty token.
  • You also get null if you ask for an index greater than the number of tokens.
  • The delimiter defaults to a comma but you can specify any delimiter. A delimiter of more than one char should work also.

Update 22/8/12

When I wrote this I didn't know about regexp_substr. (Or maybe the version of Oracle I was using didn't have it). You can do this much easier using the regexp_substr method. Have a look at the regexp_substr docs or see some examples in the comments below. For example:

regexp_substr('abc_def_ghi','[^_]+',1,3);

==> ghi.

99 comments:

  1. Hey! Thanks for the SQL token function, it was just what I needed!

    ReplyDelete
  2. This function rocks my socks. Thank you, stranger!

    ReplyDelete
  3. AnonymousMay 17, 2007

    Great ! ! !

    ReplyDelete
  4. AnonymousMay 24, 2007

    Nice and simple - just what I needed. Many thanks!

    ReplyDelete
  5. AnonymousMay 26, 2007

    pretty much the best tokenizer ever! thanks!

    ReplyDelete
  6. thanks a lot. Very usefull !

    ReplyDelete
  7. thank you mr. simon(i hope,... it's your name... )

    ReplyDelete
  8. AnonymousJune 16, 2007

    Good work, friend! Helped me a lot :)

    ReplyDelete
  9. AnonymousJune 20, 2007

    Yeah Thanks

    ReplyDelete
  10. AnonymousJune 30, 2007

    Thank you so much! You are my new hero.

    ReplyDelete
  11. Thank's a lot. I saved a lot of time. I've posted your solution on my blog and I've put a link pointing to your blog. Hope it's ok for you.

    http://fabrixb.blogspot.com/2007/07/oracle-plsql-function-to-split-strings.html

    ReplyDelete
  12. AnonymousJuly 04, 2007

    Thank you! It was just what i need!

    ReplyDelete
  13. You saved my life :) Thank you very much.

    ReplyDelete
  14. Awesome! Thanks!

    ReplyDelete
  15. Thank you so much, this rocks!

    ReplyDelete
  16. Not quite the same, but similar ... Oracle supplies a PL/SQL function to parse a[.b[.c]][@link] into a,b,c,link. It actually uses the Oracle parser to do the parsing.

    dbms_utility.name_tokenize()

    ReplyDelete
  17. Excellent function! Thanks for posting it.

    ReplyDelete
  18. I do never write comments. But this solution is something. Thank you a lot for saving a working day to me. Maybe more. I wouĞ´d give you a matrioska from Russia for this :-)

    ReplyDelete
  19. In Spanish: Excelente función. Gracias por compartirla!!

    ReplyDelete
  20. hey thanks a lot for this wonderful function ....really helped me a lot

    ReplyDelete
  21. Here is a version that uses regular expressions:

    create or replace function get_token(
    the_list varchar2,
    the_index number,
    delim varchar2 := ','
    )
    return varchar2
    is
    begin

    return regexp_substr(the_list, '[[:alnum:]]+', regexp_instr(the_list, delim, 1, the_index, 1));

    end get_token;

    ReplyDelete
  22. hi! Thanks, exactly what I needed, had to modify it a bit to support multiple whitespaces, I posted a comment on my blog about it

    http://blog.thej2eestruggle.com/2008/01/removing-multiple-whitespace-from.html

    ReplyDelete
  23. thank you

    this function has helped me much

    ReplyDelete
  24. Thank you
    your function help me
    Pon007

    ReplyDelete
  25. The world's a better place when someone like you takes the time to write cool functions and then share them. Thanks!!

    ReplyDelete
  26. Dear friend. You saved me a couple of days in writing and testing this. world is a beautiful place because of people who SHARE. Thanks friend. best wishes, Gopal.

    ReplyDelete
  27. thx1.0e+6 4 this function

    ReplyDelete
  28. Thanks, this script got me started. I found that I sometimes had quoted data with the delimiter inside such as (,"comma,inside",) and modified this to work for me.

    Here it is:
    function get_token(
    the_list varchar2,
    the_index number,
    delim varchar2 := ',',
    l_escape varchar2 := '"'
    )
    return varchar2
    is
    reg_exp varchar2(200);
    result varchar2(3000);
    begin
    -- Comment exampls assume , delimiter and " escape
    -- Search for ,text until , or text until comma from start of string
    -- before variables are inserted (^|,)(".*"|[^,]*)'
    reg_exp:= '(^|' || delim ||')(' || l_escape || '.*' || l_escape || '|[^' || delim || ']*)';
    result:= regexp_substr(the_list, reg_exp , 1,the_index);

    --remove any begining delimiters
    reg_exp := '^' || delim;
    result:= regexp_replace(result,reg_exp,'');
    return result;

    end get_token;

    ReplyDelete
  29. Worked great! Thanks for posting.

    ReplyDelete
  30. great function

    ReplyDelete
  31. AnonymousMay 07, 2008

    Hey !! This function is excellent. Thanks for posting this.

    ReplyDelete
  32. AnonymousJune 10, 2008

    Thanks a lot for writing a generic tokenizer - I was looking for exactly this one! -Mukul

    ReplyDelete
  33. You simply rule! Thank you so much

    ReplyDelete
  34. if you add feature of getting tokens from the end, the function will rock
    example : get_token ('a,b,c', -1) will return 'c'

    ReplyDelete
  35. yeah that would be cool.

    ReplyDelete
  36. I modified the function for the backend feature :
    create or replace
    function get_token
    (
    the_list varchar2,
    the_index number,
    delim varchar2 :=';'
    )
    return varchar2
    is
    start_pos number;
    end_pos number;
    begin
    if the_index = 1 then
    start_pos := 1;
    elsif the_index < 0 then
    start_pos := instr(the_list, delim, -1, abs(the_index))+1;
    else
    start_pos := instr(the_list, delim, 1, the_index - 1);
    if start_pos = 0 then
    return null;
    else
    start_pos := start_pos + length(delim);
    end if;
    end if;

    if the_index < 0 then
    end_pos := instr(the_list, delim, start_pos+1, 1);
    else
    end_pos := instr(the_list, delim, start_pos, 1);
    end if;

    if end_pos = 0 then
    return substr(the_list, start_pos);
    else
    return substr(the_list, start_pos, end_pos - start_pos);
    end if;

    end get_token;

    ReplyDelete
  37. Thanks man, exactly what I needed!

    ReplyDelete
  38. you are the best!!!

    ReplyDelete
  39. u r a crack my man!!!

    congratulations

    ReplyDelete
  40. Thank you a lot!

    Mil gracias!

    ReplyDelete
  41. Here is one that is recursive:

    /*

    Recursive PL/SQL function that returns the nth element in a delimited String.
    The default delimiter is a ',' (comma) and the level indicator defaults to 0.


    */

    CREATE OR REPLACE FUNCTION getStringElement(p_string VARCHAR2,
    p_element NUMBER,
    p_delimiter VARCHAR2 := ',',
    p_level NUMBER := 0) RETURN VARCHAR2
    IS

    v_string VARCHAR2(2000) := NULL;
    v_element VARCHAR2(2000) := NULL;

    v_level NUMBER(4) := 0;


    BEGIN

    v_level := p_level + 1;

    v_element := substr(p_string||p_delimiter,1,instr(p_string||p_delimiter,p_delimiter)-1);

    IF ((v_level >= p_element) OR (v_element IS NULL)) THEN

    RETURN v_element;

    ELSE

    v_string := substr(p_string||p_delimiter,instr(p_string||p_delimiter,p_delimiter)+1,length(p_string));

    RETURN getStringElement(v_string,p_element,p_delimiter,v_level);

    END IF;


    END;
    /

    Example:

    The statement:

    SELECT getStringElement('This is an interesting test of recursion in PL/SQL',7,' ') from dual;

    will return the value "recursion".

    http://radio.weblogs.com/0137094/2008/04/12.html

    ReplyDelete
  42. Its a great job. It helped me a lot and saved my lot of time.

    Great... Great... Great...

    ReplyDelete
  43. Hey thanks man,
    You have done a great job. It saved my time.

    ReplyDelete
  44. Very nice. Just used it in some code.

    Ethan Post

    ReplyDelete
  45. OK. Gonna use this for a SQL script for Geographic Information Project for dutch municipalities called BAG.

    ReplyDelete
  46. Thank you very much.
    Exactly what I needed.

    ReplyDelete
  47. You are great!!

    ReplyDelete
  48. Really useful, saved some time. Thanks!!

    ReplyDelete
  49. I just searched for the split function and urs was the first to get on Google...thanks buddy... and thanks google too..cheers!

    ReplyDelete
  50. thank you very much for your code :D
    very helpfull..

    ReplyDelete
  51. Very helpful - thanks! I am using it in my Data Warehouse ETL code.

    ReplyDelete
  52. AnonymousMay 08, 2009

    thank you so much!!!

    ReplyDelete
  53. AnonymousMay 12, 2009

    if the_index =0 you will get an Error.

    :(

    Anyone fixed that ???

    ReplyDelete
  54. For index = 0 just change first line to
    if (the_index = 1 or the_index = 0) then

    ReplyDelete
  55. AnonymousJuly 08, 2009

    SAVED ME FROM LOOPING THROUGH A CURSOR & SUBSTR ing the text

    Thanks a lot!!!!

    ReplyDelete
  56. you`re so nice.
    it`s help & inspire manythings for my final project of undergraduate..
    i`ve been searching & trying few days to make this 'tokenizer' on pl-sql but always failed.
    thanks for your help.
    ^_^

    ReplyDelete
  57. hi, great tip; you save a lot time for me.

    thk´s.

    ReplyDelete
  58. This is why I love internet :)
    Here is my small contribution:
    The get_token from Anonimus that supports index < 0 has a bug for delimiters longer then 1 and when U have a list that ends with delimiter.
    So I fixed it and here it is:
    create or replace function get_token (
    p_red varchar2 := '111;222;333;444;',
    p_index number :=1,
    p_delimiter varchar2 :=';'
    )
    return varchar2
    is
    l_start number;
    l_end number;

    begin
    if p_index = 1 then
    l_start := 1;

    elsif p_index < 0 then
    if substr (p_red, -length(p_delimiter))= p_delimiter then
    l_start := instr(p_red, p_delimiter, -1, abs(p_index)+1) + length(p_delimiter);
    else
    l_start := instr(p_red, p_delimiter, -1, abs(p_index))+ length(p_delimiter);
    end if;
    else
    l_start := instr(p_red, p_delimiter, 1, p_index - 1) ;

    if l_start = 0 then
    return null;
    else
    l_start := l_start + length(p_delimiter);
    end if;

    end if;

    if p_index < 0 then
    l_end := instr(p_red, p_delimiter, l_start+1, 1);
    else
    l_end := instr(p_red, p_delimiter, l_start, 1);
    end if;

    if l_end = 0 then
    return substr(p_red, l_start);
    else
    return substr(p_red, l_start, l_end - l_start);
    end if;

    end get_token;

    ReplyDelete
  59. Thanks for the function. I was looking for the same one. As I am not much familar to functions it really helped me a lot

    ReplyDelete
  60. Thank you so much for posting the function.

    ReplyDelete
  61. I needed exactly the function that you wrote. Thanks to Google I found it and thanks to you I saved time (and CPU)!

    ReplyDelete
  62. This might also be useful.

    http://www.oracle.com/technology/oramag/code/tips2007/070907.html

    ReplyDelete
  63. This might also be useful.

    http://www.oracle.com/technology/oramag/code
    /tips2007
    /070907.html

    My earlier comment had a broken link.

    ReplyDelete
  64. In italian: Grazie per aver condiviso questa utile funzione

    ReplyDelete
  65. Dear Sir,
    I Can't Express My Feeling Toward Your Great Efforts...
    You Have Done A Great Contribution To The Community...
    Thank God
    And I Thank You Very Much...


    One God, The Creator, The Ruler.
    All The Prophets From Adam Till Moses, Jesus And Mohamed Peace Be Upon All Of Them,
    All Original Scripts
    is My Belief....

    ReplyDelete
  66. AnonymousJuly 13, 2010

    You are the man!!!

    ReplyDelete
  67. You saved the day! Thanks for the code. This is what I was looking for. Great job!

    ReplyDelete
  68. Is there something we can do to this function that can return array of values from the pipe seperated string? like

    function ('a|b|c|d', '|') may return
    a
    b
    c
    d

    ReplyDelete
  69. ok for the pl/sql solution, the "purist solution" would lead to : RETURN(a combination of simple predefined functions like "instr","substr","NVL")

    ReplyDelete
  70. this one made my life easy !!

    ReplyDelete
  71. Many thanks, quite possibly saved me a full workday!

    ReplyDelete
  72. THX THX THX

    ReplyDelete
  73. Thank you very much, just what I needed! Saved me some valuable time. Thanks for posting this useful snippet of code.

    ReplyDelete
  74. You should also consider using regular expressions e.g. regexp_substr etc.

    ReplyDelete
  75. That's awesome! What a travesty oracle does not have a "split" function.

    ReplyDelete
  76. you are simply superb. great job done by you. excellent utilization of instr()

    ReplyDelete
  77. Thank you so much, you're great!

    ReplyDelete
  78. First of all thanks for this blog. It helped me a lot.

    I tried to use the regex solution as i have to split csv based fields, but it is not working.

    It returns null for index >1 and returns entire string as it is for index = 1.

    Please help!!!

    ReplyDelete
  79. AnonymousJune 17, 2011

    Thanks a lot. Ur my life saver.

    ReplyDelete
  80. AnonymousJune 21, 2011

    Hi simon baird,

    Great Job. Thanks for your psot. your code really usful, ans I am using in project.

    Thanks,
    Kiran Nagaraju
    Bangalore, India.
    nkwm09@gmail.com

    ReplyDelete
  81. YOU ARE THE MAN!

    ReplyDelete
  82. hi, i have something in this format (ck1=1/ar1=1,ar2=2,ar3=3)


    i have to split the string with delimiter "/" and then replace ck1 and ar* respectively.

    Can you tell me how to do it using your funtion?

    ReplyDelete
  83. Thx so much, 5 years later this code saved me lots of time.
    great job

    ReplyDelete
  84. This function works good.
    But what i need is i have space(delimiter) more than one in a string, i want all the words exactly.

    Example String:
    'abc def ghi jkl'

    i need
    abc
    def
    ghi
    jkl

    please help me...

    Thanks in advance......

    ReplyDelete
  85. Hi I have multiple spaces between words in the String. I want to separate the words exactly. please help me......
    For Examle:
    'abc def ghi jkl mno'

    same example with another delimiter:

    'abc///def//ghi/jkl/////mno'

    I need result as:
    abc
    def
    ghi
    jkl
    mno

    ReplyDelete
  86. I am new to Oracle PL SQL. This function helped me out very much. Thank you very much.

    Ken

    ReplyDelete
  87. AnonymousJune 06, 2012

    As much as reinventing the wheel can be cool, use this instead

    regexp_substr('abc_def_ghi','[^_]+',1,3);

    ==> ghi.

    David.

    ReplyDelete
  88. AnonymousJuly 12, 2012

    AWESOM MAN.. U REALLY ROCK!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! THXS 4 D CODE

    ReplyDelete
  89. Nice work, saved me a TON of time!

    ReplyDelete
  90. Really awesome!

    Thanks a lot!

    ReplyDelete
  91. How to split the string 'ID=9999,DATE=27-AUG-2013,PROCESS_OPTION=100' ?

    I need to get the values 999,27-AUG-2013,100 because these 2 values should need to pass as parameters to a procedure . The ID is a sequence number which will keep on increasing in every transaction.
    Your help will be much appreciated.

    ReplyDelete
  92. superb code.It helped me to complete my task... nicely written.. thanks..

    ReplyDelete
  93. superb code. It helped to complete my task.. nicely written.. thanks

    ReplyDelete
  94. superb code. It helped to complete my task.. nicely written.. thanks

    ReplyDelete
  95. This comment has been removed by a blog administrator.

    ReplyDelete

Note: Only a member of this blog may post a comment.