Mar 25

When an XML parser encounters an error while parsing, it generally just gives up. Sometimes this is not what we want, and we would instead like it to soldier on.

In these cases Xerces has a useful feature that lets you instruct it to continue parsing:

parser.setFeature(
   "http://apache.org/xml/features/continue-after-fatal-error",
   true);

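Once that is set, the parser will attempt to continue parsing even after it has encountered a fatal error. The error is still reported to your ErrorHandler, so the handler must not rethrow it (DefaultHandler's fatalError() does). Also note that the Xerces documentation describes the parser's behavior with this feature enabled as undetermined, so use it with caution.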
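To see the difference, here is a minimal self-contained sketch. It assumes that the Xerces copy bundled with the JDK (reached through the standard JAXP SAXParserFactory) recognizes the same feature URI; with Apache Xerces on the classpath the setFeature call is the same. The class and method names (ContinueAfterFatalError, RecordingHandler, parse) are made up for this example. The sample document has a mismatched end tag (</c> closing <b>), and the handler records fatal errors instead of rethrowing them, which is what lets the parser carry on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ContinueAfterFatalError {

    static final String CONTINUE_FEATURE =
        "http://apache.org/xml/features/continue-after-fatal-error";

    /** Records start tags and fatal errors instead of throwing on errors. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        final List<SAXParseException> fatalErrors = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements.add(qName);
        }

        @Override
        public void fatalError(SAXParseException e) {
            // DefaultHandler would rethrow here; recording the error lets the parser carry on
            fatalErrors.add(e);
        }
    }

    /** Parses xml with the continue-after-fatal-error feature switched on or off. */
    static RecordingHandler parse(String xml, boolean continueAfterFatal) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(CONTINUE_FEATURE, continueAfterFatal);
        RecordingHandler handler = new RecordingHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // </c> does not match <b>: a well-formedness (fatal) error
        String xml = "<root><a>1</a><b>2</c><d>3</d></root>";

        try {
            parse(xml, false);
        } catch (SAXParseException e) {
            System.out.println("Default: gave up at line " + e.getLineNumber() + ": " + e.getMessage());
        }

        RecordingHandler h = parse(xml, true);
        System.out.println("Continued: " + h.fatalErrors.size()
            + " fatal error(s), elements seen " + h.elements);
    }
}
```

Running main() should show the first parse failing with a SAXParseException while the second returns normally with the error(s) recorded. Exactly which further errors and elements get reported after the first one depends on how the parser recovers, which is the undetermined part.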

written by objects

Nov 27

You can try running it through HTML Tidy first, which can convert HTML into well-formed XHTML.
It may not work, depending on the quality of the HTML, but it is worth a try.

Apr 19

The DOM Level 3 Load and Save API provides a means of serializing XML data. The following example shows how to serialize an XML DOM document and produce ‘pretty’ indented output.


import java.io.FileOutputStream;
import java.io.OutputStream;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// First load your XML into a DOM Document (doc)
// That's covered in another answer

// then check if DOM Load and Save is supported

DOMImplementationLS DOMiLS = null;
if ((doc.getFeature("Core", "3.0") != null)
	&& (doc.getFeature("LS", "3.0") != null))
{
	// It is supported, so grab the available implementation

	DOMiLS = (DOMImplementationLS) doc.getImplementation()
		.getFeature("LS", "3.0");
}
else
{
	throw new RuntimeException("DOM Load and Save unsupported");
}

// Next create your LS output destination

LSOutput lso = DOMiLS.createLSOutput();

// create a LS serializer
// and tell it to make the output 'pretty' (if it supports that)

LSSerializer lss = DOMiLS.createLSSerializer();
if (lss.getDomConfig().canSetParameter("format-pretty-print", Boolean.TRUE))
{
	lss.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
}

// create a stream to write the resulting XML to;
// we'll use a file (outFile) in this example.
// try-with-resources closes the stream even if serialization fails
// (the enclosing method must handle IOException)

try (OutputStream out = new FileOutputStream(outFile))
{
	lso.setByteStream(out);

	// finally serialize the XML to your output stream;
	// write() returns true if the document was serialized successfully

	boolean result = lss.write(doc, lso);
}
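
If you only need the result as a String, LSSerializer also has writeToString(). The sketch below is self-contained: it parses a small XML string with the standard JAXP DocumentBuilderFactory (one way of doing the "load your XML into a DOM" step), then pretty-prints it. The class and method names (PrettyPrintLS, prettyPrint) are invented for illustration, and the exact indentation produced depends on the JDK's serializer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class PrettyPrintLS {

    /** Parses an XML string into a DOM, then re-serializes it with pretty-printing. */
    static String prettyPrint(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // Ask the document's implementation for the DOM Level 3 LS feature
        Object ls = doc.getImplementation().getFeature("LS", "3.0");
        if (!(ls instanceof DOMImplementationLS)) {
            throw new IllegalStateException("DOM Load and Save unsupported");
        }
        DOMImplementationLS impl = (DOMImplementationLS) ls;

        // Turn on indentation only if this serializer supports it
        LSSerializer serializer = impl.createLSSerializer();
        if (serializer.getDomConfig().canSetParameter("format-pretty-print", Boolean.TRUE)) {
            serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        }

        // writeToString serializes the whole document into a String
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint("<root><child>text</child></root>"));
    }
}
```

Using writeToString() avoids managing an LSOutput and a stream at all, which is handy for logging or tests; for large documents the stream-based write() shown above is the better fit.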
