Feb 22

Unicode strings can contain lots of characters we don’t always want to deal with.

If you know specifically which characters you want to get rid of, you can use replaceAll() to remove them. For a more general solution, Java provides the java.text.Normalizer class.
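
For the replaceAll() route, a minimal sketch might look like the following. The class name, the helper name and the choice of curly quotes as the unwanted characters are only assumptions for illustration; swap in whatever characters you actually need to drop.

```java
public class RemoveKnownCharacters {

    // Hypothetical helper: removes left/right single and double curly quotes.
    // The character class lists exactly the characters we want to drop.
    static String removeCurlyQuotes(String input) {
        return input.replaceAll("[\u2018\u2019\u201C\u201D]", "");
    }

    public static void main(String[] args) {
        // The input is the word Hello wrapped in curly double quotes
        System.out.println(removeCurlyQuotes("\u201CHello\u201D")); // prints Hello
    }
}
```

This works well when the list of characters is short and known in advance, but it gets unwieldy as the list grows.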

The Normalizer class transforms Unicode text into an equivalent composed or decomposed form. Here is an example of its usage:

// Requires: import java.text.Normalizer;
// Use canonical decomposition (Normalizer.Form.NFD)
String normalized = Normalizer.normalize(unicodeString, Normalizer.Form.NFD);
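
As a fuller sketch of how canonical decomposition can help get rid of unwanted characters: decomposing splits an accented letter into a base letter plus combining marks, and the marks can then be removed with replaceAll(). The class name, the stripDiacritics helper and the \p{M} pattern are assumptions made for this example; they are one common approach, not necessarily the only or best one for your data.

```java
import java.text.Normalizer;

public class NormalizerExample {

    // Hypothetical helper: decomposes the text, then removes combining marks.
    static String stripDiacritics(String unicodeString) {
        // Canonical decomposition: U+00E9 becomes 'e' followed by U+0301
        String normalized = Normalizer.normalize(unicodeString, Normalizer.Form.NFD);
        // \p{M} matches any Unicode mark, including combining accents
        return normalized.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String input = "caf\u00E9"; // "cafe" with an acute accent on the e
        System.out.println(stripDiacritics(input)); // prints cafe
    }
}
```

Note that this only affects characters that have a canonical decomposition. Letters such as ø or ß do not decompose into a base letter plus a mark, so they pass through unchanged.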

written by objects