Mar 05

We often need to left- or right-pad a string, for example when a set of strings must all be the same length.

Java 1.5 introduced the Formatter class to provide printf-style formatting. We can use this formatting to pad our strings with spaces. The String class provides a static format() method that supports it, as shown in the following example methods.

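// Left-justifies s, padding with spaces on the right up to width characters.
public static String rightPad(String s, int width) {
    return String.format("%1$-" + width + "s", s);
}

// Right-justifies s, padding with spaces on the left up to width characters.
public static String leftPad(String s, int width) {
    return String.format("%1$" + width + "s", s);
}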
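
To see the padding in action, here is a minimal, self-contained sketch. The class name PadDemo is only for illustration, and the sketch assumes width is at least 1 (a zero or negative width makes the format string invalid). Note that a string already longer than width is returned unchanged, not truncated.

```java
public class PadDemo {
    // Left-justifies s, padding with spaces on the right (assumes width >= 1).
    static String rightPad(String s, int width) {
        return String.format("%-" + width + "s", s);
    }

    // Right-justifies s, padding with spaces on the left (assumes width >= 1).
    static String leftPad(String s, int width) {
        return String.format("%" + width + "s", s);
    }

    public static void main(String[] args) {
        System.out.println("[" + rightPad("abc", 6) + "]");     // [abc   ]
        System.out.println("[" + leftPad("abc", 6) + "]");      // [   abc]
        System.out.println("[" + leftPad("abcdefgh", 6) + "]"); // [abcdefgh]
    }
}
```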

written by objects

Feb 22

A Unicode string can contain many characters that we don’t always want to deal with.

If you know specifically which characters you want to get rid of, you can use replaceAll() to remove them. For a more general solution, Java provides the java.text.Normalizer class.

The Normalizer class transforms Unicode text into an equivalent composed or decomposed form. Here is an example of its usage:

// Use Canonical decomposition
String normalized = Normalizer.normalize(unicodeString, 
   Normalizer.Form.NFD);
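
One common use of the decomposed form, sketched here as an illustration, is stripping accents: after NFD normalization an accented letter becomes a base letter followed by combining marks, which replaceAll() can then remove. The class and method names (StripAccents, stripAccents) are hypothetical, and the approach only affects characters that actually decompose into a base letter plus marks.

```java
import java.text.Normalizer;

public class StripAccents {
    // Decomposes accented characters (NFD), then removes the combining marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches any Unicode mark, including combining accents.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "Crème brûlée", written with Unicode escapes to avoid source-encoding issues.
        String unicodeString = "Cr\u00e8me br\u00fbl\u00e9e";
        System.out.println(stripAccents(unicodeString)); // Creme brulee
    }
}
```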

written by objects

Dec 03

We often need to parse text and have various classes at our disposal to help us. For example, String.split() is often used to break up a string on a given delimiter.

But what if we want to break up a paragraph of text into sentences? We could split on the period (.) character, but this will not work reliably because a period may also occur within a sentence, for example in a decimal number. Different languages may also use a different character to mark the end of a sentence.

The BreakIterator class solves this problem for us, providing implementations for breaking a string into sentences, words, lines or characters. The following example shows its usage for breaking a paragraph into sentences.

BreakIterator bi = BreakIterator.getSentenceInstance();
bi.setText(text);
int index = 0;
while (bi.next() != BreakIterator.DONE) {
    String sentence = text.substring(index, bi.current());
    System.out.println("Sentence: " + sentence);
    index = bi.current();
}
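
The same loop can be wrapped in a small helper that takes a Locale and returns the sentences as a list. This is only a sketch: the names SentenceSplitter and sentences are made up for illustration, and trimming the trailing whitespace from each sentence is a design choice, not something BreakIterator does itself.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {
    // Splits text into sentences using the boundary rules for the given locale.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator bi = BreakIterator.getSentenceInstance(locale);
        bi.setText(text);
        List<String> result = new ArrayList<>();
        int start = bi.first();
        // Each call to next() returns the next boundary, or DONE at the end.
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "It costs 3.50 dollars. Is that fair? Yes!";
        for (String sentence : sentences(text, Locale.US)) {
            System.out.println("Sentence: " + sentence);
        }
    }
}
```

Unlike a split on the period character, the period inside 3.50 should not end a sentence here, because it is not followed by whitespace.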

written by objects