Dec 03

We often need to parse text, and we have various classes at our disposal to help us. For example, String.split() is often used to break up a string on a given delimiter.

But what if we want to break up a paragraph of text into sentences? We could split on the period (.) character, but this will not work reliably. A period can also appear inside a sentence, for example in an abbreviation or a decimal number such as 3.50, and a sentence can end with other marks such as ? or !. Different languages may also use a different character to mark the end of a sentence.
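To make the problem concrete, here is a minimal sketch of the naive approach. The class name NaiveSentenceSplit and the sample text are illustrative only. Because String.split() takes a regular expression, the period has to be escaped as "\\.".

```java
import java.util.Arrays;

public class NaiveSentenceSplit {
    // Naive approach: treat every period as the end of a sentence.
    // split() takes a regular expression, so the period must be escaped.
    static String[] naiveSplit(String text) {
        return text.split("\\.");
    }

    public static void main(String[] args) {
        // Illustrative sample text: a decimal number plus a question mark
        String text = "The price rose to 3.50 today. Is that too much? Nobody knows.";
        // Prints [The price rose to 3, 50 today,  Is that too much? Nobody knows]
        System.out.println(Arrays.toString(naiveSplit(text)));
    }
}
```

The first sentence is cut in two at the decimal point, while the question mark is ignored, so two separate sentences end up in a single piece.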

The java.text.BreakIterator class solves this problem by providing locale-sensitive implementations for breaking a string into sentences, words, lines or characters. The following example shows its usage for breaking a paragraph into sentences.

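import java.text.BreakIterator;
import java.util.Locale;

// Sample text; note the decimal point in 3.50 and the question mark
String text = "The price rose to 3.50 today. Is that too much? Nobody knows.";
BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
bi.setText(text);
int index = bi.first();
while (bi.next() != BreakIterator.DONE) {
    // [index, current) is one sentence, including any trailing whitespace
    String sentence = text.substring(index, bi.current());
    System.out.println("Sentence: " + sentence);
    index = bi.current();
}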
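The same pattern works for the other boundary types. As a sketch, assuming the goal is to extract the words of a string, the hypothetical helper words() below uses BreakIterator.getWordInstance(). The word iterator also reports the spaces and punctuation between words as separate segments, so segments that contain no letter or digit are filtered out.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreaks {
    // Hypothetical helper: collects the word segments of text, skipping the
    // whitespace and punctuation segments the word iterator also reports.
    static List<String> words(String text, Locale locale) {
        BreakIterator bi = BreakIterator.getWordInstance(locale);
        bi.setText(text);
        List<String> result = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String segment = text.substring(start, end);
            // Keep only segments containing at least one letter or digit
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! How are you?", Locale.US));
    }
}
```

Filtering on letters and digits is one simple choice; depending on the application you may want to keep punctuation segments or apply a different rule.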

written by objects

