Nov 19

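You can enable XML Schema validation when configuring the DocumentBuilderFactory that is used to parse the XML. The property below selects the schema language; the schema itself is then found through the xsi:schemaLocation (or xsi:noNamespaceSchemaLocation) hint in the document, or through the JAXP schemaSource property.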

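import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// JAXP property that selects the schema language, and its value for W3C XML Schema
static final String JAXP_SCHEMA_LANGUAGE =
    "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
static final String W3C_XML_SCHEMA =
    "http://www.w3.org/2001/XMLSchema";

DocumentBuilderFactory factory =
    DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);   // XML Schema validation needs namespace awareness
factory.setValidating(true);
factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);

DocumentBuilder builder = factory.newDocumentBuilder();
// url is the java.net.URL of the document to parse
Document document = builder.parse(url.openStream());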
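One thing the snippet above does not show is how validation problems are reported. Unless you install an ErrorHandler, validation errors are not thrown and are easy to miss. The following is a minimal sketch of one way to collect them, not the only way to do it. The class name, the tiny inline schema, and the validate helper are made up for illustration, and the schema is supplied through the JAXP schemaSource property so the example needs no external files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class SchemaValidationSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema for illustration: a single <count> element holding an int.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='count' type='xs:int'/>"
        + "</xs:schema>";

    // Parses the XML with schema validation on and returns the reported problems.
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setAttribute(JAXP_SCHEMA_SOURCE,
            new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));

        List<String> problems = new ArrayList<>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                // Schema violations arrive here; record them and keep parsing.
                problems.add("error: " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed: give up
            }
        });
        builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<count>3</count>"));
        System.out.println(validate("<count>three</count>"));
    }
}
```

The first call should report nothing, while the second should list at least one schema error because "three" is not an xs:int.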

written by objects

Oct 23

By default a StAX parser may split a (typically large) CHARACTERS event into pieces to avoid creating large strings. You have no control over where the split occurs.

You can use the factory property “javax.xml.stream.isCoalescing” to control this behaviour and force the parser to combine adjacent CHARACTERS events into a single event.

inputFactory.setProperty(
   XMLInputFactory.IS_COALESCING, Boolean.TRUE);
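To see the effect, you can collect the text of every CHARACTERS event with and without the property. This is a small sketch under a few assumptions: the class name, the textChunks helper, and the sample document are invented for illustration, and how many pieces you get without coalescing depends on the StAX implementation in use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CoalescingSketch {

    // Returns the text of each CHARACTERS event, in document order.
    static List<String> textChunks(String xml, boolean coalescing)
            throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        inputFactory.setProperty(
            XMLInputFactory.IS_COALESCING, Boolean.valueOf(coalescing));

        XMLStreamReader reader =
            inputFactory.createXMLStreamReader(new StringReader(xml));
        List<String> chunks = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    chunks.add(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample: the entity reference gives the parser a
        // natural place to split the text when coalescing is off.
        String xml = "<a>hello &amp; goodbye</a>";
        System.out.println("coalescing off: " + textChunks(xml, false));
        System.out.println("coalescing on:  " + textChunks(xml, true));
    }
}
```

With coalescing on, the element's text arrives as one string. With it off, code that reads text must be prepared to append several pieces together.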


Mar 25

When an XML parser encounters a fatal error while parsing, it generally just gives up. Sometimes this is not what we want, and we would instead like it to soldier on.

In these cases Xerces has a useful feature that allows you to instruct it to continue parsing:

parser.setFeature(
   "http://apache.org/xml/features/continue-after-fatal-error", 
   true);

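Once that is set, the parser will attempt to continue parsing even after it has encountered a fatal error. The Xerces documentation warns that the parser's behaviour in this mode is undetermined, so use it with caution.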
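The feature only helps if your ErrorHandler does not rethrow fatal errors itself. Here is a rough sketch of how the pieces might fit together. It assumes the default JAXP SAX parser is Xerces or Xerces-derived (as in the JDK); other parsers may reject the feature with a SAXNotRecognizedException. The class name, the collectFatalErrors helper, and the cap of 100 errors are illustrative choices, the cap being a guard in case the parser keeps reporting errors indefinitely.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ContinueAfterFatalErrorSketch {
    static final String CONTINUE_AFTER_FATAL_ERROR =
        "http://apache.org/xml/features/continue-after-fatal-error";

    // Arbitrary safety limit (an assumption of this sketch, not a Xerces setting).
    static final int MAX_FATAL_ERRORS = 100;

    // Parses the XML and returns the fatal errors reported along the way.
    static List<String> collectFatalErrors(String xml) throws Exception {
        XMLReader parser =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setFeature(CONTINUE_AFTER_FATAL_ERROR, true);

        List<String> fatalErrors = new ArrayList<>();
        parser.setErrorHandler(new DefaultHandler() {
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // Record the error and return normally so the parser carries on.
                fatalErrors.add("line " + e.getLineNumber() + ": " + e.getMessage());
                if (fatalErrors.size() >= MAX_FATAL_ERRORS) {
                    throw e; // too many errors: abort the parse
                }
            }
        });
        parser.parse(new InputSource(new StringReader(xml)));
        return fatalErrors;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical malformed input: </b> closes an element opened as <a>.
        String xml = "<root><a>1</b><c>2</c></root>";
        for (String problem : collectFatalErrors(xml)) {
            System.out.println(problem);
        }
    }
}
```

Any content events delivered after the first fatal error should be treated as best-effort, since the parser's view of the document may no longer match what the author intended.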
