A Unicode string can contain many characters that we don’t always want to deal with.
If you know exactly which characters you want to remove, you can use replaceAll() to strip them. For a more general solution, Java provides the java.text.Normalizer class.
The Normalizer class transforms Unicode text into an equivalent composed or decomposed form. Here is an example of its usage:
import java.text.Normalizer;

// Use canonical decomposition (NFD); unicodeString is the text to normalize
String normalized = Normalizer.normalize(unicodeString, Normalizer.Form.NFD);
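Once a string is decomposed, accented letters are split into a base letter followed by combining marks, so the marks can be removed with replaceAll(). The following is a minimal sketch of that idea, not a complete ASCII converter. The class name AccentStripper and the method stripAccents are hypothetical names chosen for illustration, and the regular expression \p{M} (any Unicode mark) is one reasonable choice among several.

```java
import java.text.Normalizer;

class AccentStripper {

    // Decompose to NFD, then drop every combining mark (Unicode category M).
    // Characters that have no canonical decomposition are left unchanged.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The literal below is "cafe" spelled with a precomposed e-acute
        System.out.println(stripAccents("caf\u00e9"));
    }
}
```

Letters that are not built from a base letter plus a mark pass through untouched, so a further step would be needed if strictly ASCII output is required.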
written by objects
\\ tags: ascii, decomposition, Normalizer, string, unicode
We’ve all seen text displayed with squares or question marks in place of the expected characters. This happens when the font being used does not include glyphs for one or more characters in the string.
Displaying Chinese text (or text in any other language) requires a font that supports the characters used by that language. The Font class's canDisplayUpTo() method tests whether a Font can display all the characters in a given string; it returns -1 when it can.
import java.awt.Font;
import java.awt.GraphicsEnvironment;

// Get every font installed on the system
Font[] allFonts = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
// The Chinese text we will use to test
String sample = "\u4e00"; // Unicode escape
// Loop over each available font
for (int j = 0; j < allFonts.length; j++) {
    // canDisplayUpTo() returns -1 if the font can display the whole string
    if (allFonts[j].canDisplayUpTo(sample) == -1) {
        System.out.println(allFonts[j].getFontName());
    }
}
written by objects
\\ tags: character, chinese, font, glyph, unicode
A loop can be used to run through all the code points, and the isDefined() method of the Character class can be used to determine whether a given code point is a defined Unicode character.
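// Stop at Character.MAX_CODE_POINT (0x10FFFF); looping up to
// Integer.MAX_VALUE would overflow and never terminate
for (int i = 0; i <= Character.MAX_CODE_POINT; i++)
{
    if (Character.isDefined(i))
    {
        // Print the code point in hex followed by the character itself
        System.out.println(Integer.toHexString(i) + ": " +
            new String(Character.toChars(i)));
    }
}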
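The same scan can be limited to a range and used to count or label the defined code points rather than print every one. The sketch below assumes a hypothetical class CodePointScan with helper methods countDefined and describe; only Character.isDefined(), Character.getName() and Character.MAX_CODE_POINT come from the JDK.

```java
class CodePointScan {

    // Count the defined code points in the inclusive range [start, end].
    // The upper bound is clamped to Character.MAX_CODE_POINT so the loop
    // always terminates.
    static int countDefined(int start, int end) {
        int last = Math.min(end, Character.MAX_CODE_POINT);
        int count = 0;
        for (int cp = Math.max(start, 0); cp <= last; cp++) {
            if (Character.isDefined(cp)) {
                count++;
            }
        }
        return count;
    }

    // Format a code point as "U+XXXX NAME" using the JDK's Unicode name table
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(countDefined(0x0000, 0x007F)); // the ASCII block
        System.out.println(describe(0x41));
    }
}
```

Counting over the whole range gives a number that depends on the Unicode version bundled with the JDK in use, so it will differ between Java releases.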
written by objects
\\ tags: character, hex, unicode