Aug 25

Typically, String.split() or the StringTokenizer class is used to break up a string into tokens. The problem with these methods is that they do not treat quoted text as a single token, which is often what you need.

For example, consider the following string:

The mans name was "Big Fred"

Splitting this on whitespace with split() or StringTokenizer would give us 6 tokens: (The) (mans) (name) (was) ("Big) (Fred"). Typically this is not what we want.
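The six-token result above can be reproduced with a small sketch. The class name NaiveTokenizeDemo and its helper methods are made up for illustration; only String.split() and StringTokenizer come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class: shows that neither approach keeps quoted text together.
class NaiveTokenizeDemo {

    // Splits on single spaces with String.split().
    static List<String> withSplit(String s) {
        return Arrays.asList(s.split(" "));
    }

    // Uses StringTokenizer with its default delimiters (whitespace).
    static List<String> withStringTokenizer(String s) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "The mans name was \"Big Fred\"";
        // Both calls return six tokens; the quote characters stay attached
        // to "Big and Fred" instead of grouping the two words.
        System.out.println(withSplit(s));
        System.out.println(withStringTokenizer(s));
    }
}
```

Both methods cut the quoted name in half because neither has any notion of quote characters; they only look for delimiters.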

This is where the StreamTokenizer class comes in handy, as it gives finer control over the parsing process, including recognizing quoted text as a single token. The following example shows its usage:

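// Requires: import java.io.StreamTokenizer; import java.io.StringReader;
// nextToken() throws IOException, so the enclosing method must declare or catch it.
String s = "The mans name was \"Big Fred\"";
StreamTokenizer st = new StreamTokenizer(new StringReader(s));
st.quoteChar('"');  // text between double quotes becomes one token, quotes stripped
while (st.nextToken() != StreamTokenizer.TT_EOF) {
    // sval holds the text of word and quoted-string tokens
    System.out.println(st.sval);
}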

Now we get the 5 tokens as required: (The) (mans) (name) (was) (Big Fred)
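One thing to watch for: sval is only set for word and quoted-string tokens. With its default settings StreamTokenizer also parses numbers (the value ends up in nval and sval is null), treats '/' as a comment character, and treats the single quote as a second quote character. The following is a minimal sketch of a helper that checks the token type before reading the value. The class QuotedTokenizer and the method tokenize are hypothetical names, not part of the JDK.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects tokens as strings, keeping quoted text together.
class QuotedTokenizer {

    static List<String> tokenize(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.quoteChar('"');
        List<String> tokens = new ArrayList<>();
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_WORD || type == '"' || type == '\'') {
                    // Words and quoted strings: the text is in sval.
                    tokens.add(st.sval);
                } else if (type == StreamTokenizer.TT_NUMBER) {
                    // Numbers are parsed as doubles into nval; sval is null here.
                    tokens.add(String.valueOf(st.nval));
                } else {
                    // Any other ordinary character is returned as its own token type.
                    tokens.add(String.valueOf((char) type));
                }
            }
        } catch (IOException e) {
            // Not expected when reading from a StringReader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The mans name was \"Big Fred\""));
        System.out.println(tokenize("age 42"));
    }
}
```

Because numbers come back as doubles, a token such as 42 is reported as 42.0 in this sketch. If you need the original text instead, one option is to call resetSyntax() and define the word, whitespace and quote characters yourself.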

written by objects


One Response to “How to tokenize a string and preserve quoted tokens”

  1. fouding Says:

    Thank you so much! This is exactly what I want!
