i’ll be darned. while stumbling through jonas’ archives, i discovered that even though i’d read the semantic web: a primer, i must have skimmed over a blurb about a nice little tool called tidy. it can do a lot of things, not the least of which is acting as a preprocessor for xslt:

“Once the richer information has been embedded in a page, a program still needs to transform it into the format it requires. At this point another W3C technology, XSLT, has a lot to offer. Given an XHTML page as input, it is useful for selecting and transforming the contents of that page. It provides an excellent bridge from older HTML technology to the nascent XML-based Semantic Web applications. A tool of singular utility when used in conjunction with an XSLT processor is Dave Raggett’s “Tidy,” which can take HTML and turn it into XHTML. As most web authoring tools still don’t have XHTML support, HTML will be created by web authors for some time to come. Tidy facilitates the processing of normal HTML with XSLT, enabling authors of such documents to participate in the Semantic Web.”
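
to make the preprocessing step concrete, here’s a rough python sketch of running tidy over an html file to get xhtml out the other side. it assumes the `tidy` command-line program is installed and on your PATH; the function names and file paths are made up for illustration, and none of this comes from the primer itself.

```python
import shutil
import subprocess


def tidy_command(src_path, out_path):
    """Build the argument list for an html -> xhtml run of tidy."""
    # -q        suppress non-essential output
    # -asxhtml  convert the input to well-formed xhtml
    # -numeric  write numeric entities instead of named ones
    # -o        write the result to out_path
    return ["tidy", "-q", "-asxhtml", "-numeric", "-o", out_path, src_path]


def html_to_xhtml(src_path, out_path):
    """Run tidy on src_path and return out_path (hypothetical helper)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        tidy_command(src_path, out_path),
        capture_output=True,
        text=True,
    )
    # tidy exits with 0 when clean, 1 when it issued warnings, 2 on errors,
    # so only a status of 2 or more is treated as a failure here
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return out_path
```

something like `html_to_xhtml("page.html", "page.xhtml")` should then leave a file that an xslt processor can read as xml.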

whump also points to an older article in webreview on using tidy to convert an existing HTML page into XHTML:

“Weighing in at under 200 KB, HTML Tidy is the closest you’ll get to a perfect HTML utility.”

