When an XML parser encounters an error while parsing, it generally just gives up. Sometimes this is not what we want, and we would instead like it to soldier on.
In these cases Xerces has a useful feature that allows you to instruct it to continue parsing:
parser.setFeature(
"http://apache.org/xml/features/continue-after-fatal-error",
true);
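Once that is set, the parser will attempt to continue parsing even after it has encountered a fatal error. The Xerces documentation describes the parser's behaviour in this mode as undetermined, so treat anything reported after the first fatal error with caution.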
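As a rough sketch of how the feature might be used in practice, the class below turns it on and collects every problem the parser reports instead of stopping at the first one. The class name `ContinueAfterFatalErrorSketch` and the helper `collectErrors` are made up for this illustration. It also assumes the parser returned by `SAXParserFactory` is Xerces or a Xerces-derived parser that recognizes the feature; other parsers may reject it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of Xerces.
public class ContinueAfterFatalErrorSketch {

    static final String FEATURE =
        "http://apache.org/xml/features/continue-after-fatal-error";

    /** Parses the XML text and returns a description of every reported problem. */
    public static List<String> collectErrors(String xml) {
        final List<String> problems = new ArrayList<>();

        // The handler records problems and does NOT rethrow them;
        // rethrowing from fatalError() would stop the parse.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                problems.add(describe(e));
            }

            @Override
            public void fatalError(SAXParseException e) {
                problems.add(describe(e));
            }
        };

        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Assumption: the underlying parser recognizes this Xerces feature.
            reader.setFeature(FEATURE, true);
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // The parser gave up anyway, or did not recognize the feature.
            if (problems.isEmpty()) {
                problems.add("parser: " + e.getMessage());
            }
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        // The mismatched end tag makes this document not well-formed.
        for (String problem : collectErrors("<root><a></b></root>")) {
            System.out.println(problem);
        }
    }
}
```

The exact number and wording of the messages depends on the parser version, so the sketch only collects them and makes no promise about what comes after the first one.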
written by objects
\\ tags: feature, parser, xerces
You can try HTML Tidy.
It may not work, depending on the quality of the HTML, but it is worth a try.
written by objects
\\ tags: html, HTML Tidy, xhtml
The DOM Level 3 Load and Save API provides a means of serializing XML data. The following example shows how to serialize an XML DOM document and produce ‘pretty’ indented output.
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// First load your XML into a DOM Document (called doc here).
// That is covered in another answer.
// Then check that DOM Load and Save is supported
DOMImplementationLS DOMiLS = null;
if ((doc.getFeature("Core", "3.0") != null)
    && (doc.getFeature("LS", "3.0") != null))
{
    // It is supported, so grab the available implementation
    DOMiLS = (DOMImplementationLS) (doc.getImplementation())
        .getFeature("LS", "3.0");
}
else
{
    throw new RuntimeException("DOM Load and Save unsupported");
}
// Next create your LS output destination
LSOutput lso = DOMiLS.createLSOutput();
// create a LS serializer
// and tell it to make the output 'pretty'
LSSerializer lss = DOMiLS.createLSSerializer();
lss.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
// create a stream to write the resulting XML to
// we'll use a file in this example (outFile is the destination file);
// try-with-resources closes the stream even if serialization fails
try (OutputStream out = new FileOutputStream(outFile))
{
    lso.setByteStream(out);
    // finally serialize the XML to your output stream
    // write() returns true if the document was serialized successfully
    boolean result = lss.write(doc, lso);
}
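For trying this out without a file or an existing document, here is a self-contained sketch that builds a small DOM in memory and serializes it to a String. The class name `PrettyPrintSketch` and the methods `sampleDocument` and `toPrettyString` are invented for this illustration. It assumes the DOM implementation supplied by `DocumentBuilderFactory` supports Load and Save and the `format-pretty-print` parameter.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class for illustration only.
public class PrettyPrintSketch {

    /** Builds a tiny document: a root element with two child elements. */
    public static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            for (String name : new String[] {"first", "second"}) {
                Element child = doc.createElement("child");
                child.setAttribute("name", name);
                root.appendChild(child);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the document with DOM Load and Save, indented if possible. */
    public static String toPrettyString(Document doc) {
        Object feature = doc.getImplementation().getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new IllegalStateException("DOM Load and Save unsupported");
        }
        DOMImplementationLS ls = (DOMImplementationLS) feature;

        LSSerializer serializer = ls.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        // Only request pretty printing if this serializer supports it.
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }

        // Write to an in-memory character stream instead of a file.
        StringWriter writer = new StringWriter();
        LSOutput output = ls.createLSOutput();
        output.setCharacterStream(writer);
        output.setEncoding("UTF-8");

        if (!serializer.write(doc, output)) {
            throw new IllegalStateException("serialization failed");
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPrettyString(sampleDocument()));
    }
}
```

The exact whitespace produced varies between implementations and JDK versions, so it is safer not to depend on a particular indentation width.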
written by objects
\\ tags: Document, DOM, Indentation, Load and Save, pretty, serialization