You can specify a schema for validation when configuring the DocumentBuilderFactory that is used to parse the XML:
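static final String JAXP_SCHEMA_LANGUAGE =
    "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
static final String W3C_XML_SCHEMA =
    "http://www.w3.org/2001/XMLSchema";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);   // XML Schema validation requires namespace awareness
factory.setValidating(true);
factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(url.openStream());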
By default a StAX parser may break a (typically large) CHARACTERS event into pieces to avoid creating large strings. You have no control over where these breaks occur.
You can use the factory property “javax.xml.stream.isCoalescing” to control this behaviour and force the parser to combine adjacent CHARACTERS events into a single event.
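As a rough illustration of the coalescing property, the sketch below collects the text of every character-data event for a small document. The class name CoalescingDemo, the helper textEvents and the sample XML are made up for this example, and it assumes the StAX implementation bundled with the JDK. Checked exceptions are wrapped only to keep the helper easy to call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of any library.
class CoalescingDemo {

    // Returns the text carried by each CHARACTERS or CDATA event, in document order.
    static List<String> textEvents(String xml, boolean coalesce) {
        List<String> chunks = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // IS_COALESCING is the constant for "javax.xml.stream.isCoalescing".
            factory.setProperty(XMLInputFactory.IS_COALESCING, coalesce);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) {
                        chunks.add(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read the XML", e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Plain text, a CDATA section and more plain text sit next to each other.
        String xml = "<r>a<![CDATA[b]]>c</r>";
        System.out.println("coalescing on:  " + textEvents(xml, true));
        System.out.println("coalescing off: " + textEvents(xml, false));
    }
}
```

With coalescing switched on, the adjacent text and CDATA content should arrive as one event. With it off, the number of pieces is up to the parser, so code that handles text should not rely on any particular split.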
When an XML parser encounters an error while parsing, it generally just gives up. Sometimes this is not what we want, and we would instead like it to soldier on.
In these cases Xerces has a useful feature, http://apache.org/xml/features/continue-after-fatal-error, that allows you to instruct it to continue parsing.
Once that is set, the parser will happily continue parsing even after it has encountered an error.
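One way this might look in practice is sketched below. The class LenientParseDemo and the helper collectErrors are hypothetical names. The sketch assumes the JAXP parser in use is Xerces, or the Xerces-derived parser shipped with the JDK, since other implementations may not recognise the feature. It also installs an error handler that records fatal errors rather than rethrowing them, because the default handler would abort the parse anyway.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Xerces.
class LenientParseDemo {

    // Xerces-specific feature URI; assumed to be recognised by the parser in use.
    static final String CONTINUE_AFTER_FATAL_ERROR =
        "http://apache.org/xml/features/continue-after-fatal-error";

    // Parses the XML and returns the message of every problem that was reported.
    static List<String> collectErrors(String xml) {
        List<String> problems = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                problems.add(e.getMessage());
            }

            // DefaultHandler rethrows fatal errors; recording them lets the parser carry on.
            @Override
            public void fatalError(SAXParseException e) {
                problems.add(e.getMessage());
            }
        };

        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(CONTINUE_AFTER_FATAL_ERROR, true);
            parser = factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("parser does not support the feature", e);
        }

        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | IOException | RuntimeException e) {
            // Parser state after a fatal error is not guaranteed, so a later failure is still possible.
            problems.add("parsing stopped: " + e);
        }
        return problems;
    }

    public static void main(String[] args) {
        // The end tag </b> does not match the open <a> element.
        for (String problem : collectErrors("<root><a>text</b></root>")) {
            System.out.println(problem);
        }
    }
}
```

Collecting the messages instead of stopping at the first one is handy for reporting every problem in a document in a single pass, though anything the parser delivers after the first fatal error should be treated with caution.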