Feb 22

Unicode strings can contain lots of characters we don’t always want to deal with.

If you know specifically which characters you want to get rid of, you can use replaceAll() to remove them. For a more general solution, Java provides the java.text.Normalizer class.

The Normalizer class transforms Unicode text into an equivalent composed or decomposed form. Here is an example of its usage:

import java.text.Normalizer;

// Use canonical decomposition (NFD)
String normalized = Normalizer.normalize(unicodeString,
    Normalizer.Form.NFD);
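
Once the text is decomposed, accented letters are split into a base letter followed by separate combining marks, so the marks can be removed with replaceAll(). Here is a minimal sketch of that idea. The class name StripAccents and the helper stripAccents() are made up for illustration, and the approach assumes that dropping every combining mark (the \p{M} category) is what you want, which may not hold for all languages.

```java
import java.text.Normalizer;

public class StripAccents {

    // Hypothetical helper: decompose the text, then drop the combining marks
    static String stripAccents(String input) {
        // NFD splits e.g. U+00E9 into 'e' followed by U+0301 (combining acute accent)
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);

        // \p{M} matches Unicode combining marks; this is an assumption about
        // which characters are unwanted, adjust the pattern to suit your data
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("caf\u00e9 na\u00efve"));   // prints "cafe naive"
    }
}
```

Characters that have no canonical decomposition are left untouched by this approach.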

written by objects

Dec 02

We’ve all seen text displayed with squares or question marks in place of the expected characters. This happens when the font being used does not include glyphs for the character(s) found in the string.

Displaying Chinese text (or text in any other language) requires a font that supports the characters used by that language. The canDisplayUpTo() method of the Font class tests whether a Font is capable of displaying all the characters in a given string. It returns -1 if the whole string can be displayed, and otherwise the index of the first character that cannot.

// Requires java.awt.Font and java.awt.GraphicsEnvironment
Font[] allFonts = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();

// The Chinese text we will use to test
String sample = "\u4e00";    // Unicode escape for the character "one"

// Loop over each available font
for (int j = 0; j < allFonts.length; j++) {

    // canDisplayUpTo() returns -1 when the font can display the whole string
    if (allFonts[j].canDisplayUpTo(sample) == -1) {
        System.out.println(allFonts[j].getFontName());
    }
}

written by objects

Feb 28

A loop can be used to step through all the characters, and the isDefined() method of the Character class can be used to determine whether a given code point is a defined Unicode character or not.

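// Stop at the highest Unicode code point (0x10FFFF); a loop bounded by
// i <= Integer.MAX_VALUE never terminates because i overflows
for (int i = 0; i <= Character.MAX_CODE_POINT; i++)
{
    if (Character.isDefined(i))
    {
        System.out.println(Integer.toHexString(i) + ": " +
            new String(Character.toChars(i)));
    }
}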
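
Printing every defined character produces a lot of output, so it can be handy to look at one range at a time. Below is a rough sketch that counts the defined code points in a range instead of printing them. The class DefinedCodePoints and the method countDefined() are hypothetical names chosen for this example, and the result for a given range depends on the Unicode version your Java runtime supports.

```java
public class DefinedCodePoints {

    // Hypothetical helper: count the defined code points between from and to (inclusive)
    static int countDefined(int from, int to) {
        // Clamp the range to valid code points so the loop always terminates
        int start = Math.max(from, Character.MIN_CODE_POINT);
        int end = Math.min(to, Character.MAX_CODE_POINT);

        int count = 0;
        for (int cp = start; cp <= end; cp++) {
            if (Character.isDefined(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Basic Latin block: all 128 code points are defined
        System.out.println(countDefined(0x0000, 0x007F));   // prints 128
    }
}
```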

written by objects