Igor Kromin |   Consultant. Coder. Blogger. Tinkerer. Gamer.

I came across a curious error in one of the services I help maintain: an XSL transformation error with a message like "Illegal HTML character: decimal 150". I hadn't seen it before, so I looked into it and immediately found this StackOverflow question about the exact same issue.

This error comes up when a stylesheet generates an HTML 4.x document and tries to output characters that are illegal in the HTML 4.x specification. HTML 4.x does not define any legal characters for codes 127-159 (inclusive), as documented in this character table.
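If you want to find the offending characters in your input before the transformation fails, a quick scan for that range is enough. This is a minimal Python sketch and not part of the original fix; the function name `find_illegal_html4_chars` and the sample string are hypothetical.

```python
# Code points 127-159 (inclusive) have no legal use in HTML 4.x documents.
ILLEGAL_HTML4 = range(127, 160)


def find_illegal_html4_chars(text: str) -> list[tuple[int, int]]:
    """Return (index, decimal code) for every character in text that falls in 127-159."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) in ILLEGAL_HTML4]


if __name__ == "__main__":
    # U+0096 is the character the error message reports as "decimal 150".
    sample = "price list \u0096 2017"
    for index, code in find_illegal_html4_chars(sample):
        print(f"Illegal HTML character at index {index}: decimal {code}")
```

Anything this reports will need an entry in the character map described below.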

From the SO question linked above, the solution is fairly simple: define a character map for all of the illegal characters and map each of them to, for example, a space character...
 XSLT
<xsl:character-map name="no-control-characters">
<xsl:output-character character="&#127;" string=" "/>
<xsl:output-character character="&#128;" string=" "/>
<!-- ...and so on for every code from &#129; to &#158;... -->
<xsl:output-character character="&#159;" string=" "/>
</xsl:character-map>


Then, to apply the map, add a use-character-maps attribute to the xsl:output element, like so...
 XSLT
<xsl:output ... use-character-maps="no-control-characters"/>
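Typing out all 33 entries by hand is tedious, so here is a minimal Python sketch that generates a complete stylesheet skeleton containing both the map and the xsl:output element that uses it. The function name `build_space_map_stylesheet`, the `method="html"` / `version="4.0"` output settings and the missing templates are assumptions for illustration only. Note that xsl:character-map is an XSLT 2.0 feature, so the stylesheet needs an XSLT 2.0 processor.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def build_space_map_stylesheet(map_name: str = "no-control-characters") -> str:
    """Build an XSLT 2.0 stylesheet whose character map turns codes 127-159 into spaces."""
    entries = "\n".join(
        f'    <xsl:output-character character="&#{code};" string=" "/>'
        for code in range(127, 160)
    )
    # Hypothetical output settings; keep whatever your own xsl:output already uses.
    return f"""<xsl:stylesheet version="2.0" xmlns:xsl="{XSL_NS}">
  <xsl:output method="html" version="4.0" use-character-maps="{map_name}"/>
  <xsl:character-map name="{map_name}">
{entries}
  </xsl:character-map>
  <!-- Templates go here. -->
</xsl:stylesheet>
"""


if __name__ == "__main__":
    xslt = build_space_map_stylesheet()
    # Parsing confirms the generated stylesheet is well-formed XML.
    root = ET.fromstring(xslt)
    chars = root.findall(f"{{{XSL_NS}}}character-map/{{{XSL_NS}}}output-character")
    print(f"{len(chars)} output-character entries")
    print(xslt)
```

The map name only has to match between the name attribute and use-character-maps, so generating both from one parameter keeps them in sync.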


The above works well if you're OK with mapping every illegal character to a space, but what if you want to map each one to something more meaningful, such as a numeric reference to an equivalent Unicode character?

That is also easy to do, but you have to keep in mind that XSLT by default performs output escaping on text nodes. You can turn this off by setting disable-output-escaping to "yes", but I would not advise doing so, as that disables escaping for everything the instruction outputs, not just the characters you're trying to map.



So if we want to map the character &#128; to &#8364; (the euro sign) via the character map, we can't do this...
 Incorrect XSLT
<xsl:character-map name="no-control-characters">
...
<xsl:output-character character="&#128;" string="&#8364;"/>
...
</xsl:character-map>


The reason is that the XML parser resolves &#8364; in the string attribute before the XSLT processor ever sees it, so the map effectively says "replace &#128; with €". The literal € character ends up in the output instead of the &#8364; code written in the stylesheet.

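The workaround is to escape the escape code, i.e. change the & to &amp;. The attribute value is then the literal text &#8364;, and since the output of a character map is not escaped again, that reference is written to the HTML as-is, like so...
 XSLT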
<xsl:character-map name="no-control-characters">
...
<xsl:output-character character="&#128;" string="&amp;#8364;"/>
...
</xsl:character-map>
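To see the difference concretely, this small Python sketch parses both versions of the output-character element and prints what the string attribute actually contains. The standard library's ElementTree stands in here for whatever XML parser sits in front of your XSLT processor; the constant and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

INCORRECT = f'<xsl:output-character xmlns:xsl="{XSL_NS}" character="&#128;" string="&#8364;"/>'
WORKAROUND = f'<xsl:output-character xmlns:xsl="{XSL_NS}" character="&#128;" string="&amp;#8364;"/>'


def mapped_string(snippet: str) -> str:
    """Return the value of the string attribute as the XSLT processor receives it."""
    return ET.fromstring(snippet).get("string")


if __name__ == "__main__":
    # The parser resolves &#8364; to the euro sign itself (U+20AC).
    print(ascii(mapped_string(INCORRECT)))   # '\u20ac'
    # &amp; becomes a literal '&', leaving the seven characters "&#8364;".
    print(ascii(mapped_string(WORKAROUND)))  # '&#8364;'
```

In both cases the character attribute parses to U+0080, so only the replacement string differs between the two versions.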


The same approach can now be applied to the rest of the illegal HTML codes. In my case, we were able to map 21 out of the 32 characters to something meaningful and the rest were mapped to a question mark character. A good outcome overall.
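One possible source of "meaningful" replacements, and this is my assumption rather than necessarily the mapping used above, is Windows-1252: if the stray characters came from Windows-1252 text, bytes 128-159 there are mostly printable characters (byte 150, for instance, is an en dash). The sketch below maps each code to an escaped numeric reference for its Windows-1252 character and falls back to a question mark for code 127 and for the five bytes Windows-1252 leaves undefined. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def cp1252_replacement(code: int, fallback: str = "?") -> str:
    """Output string for one illegal code, ASSUMING the source text was Windows-1252."""
    try:
        ch = bytes([code]).decode("cp1252")
    except UnicodeDecodeError:
        return fallback  # byte is undefined in Windows-1252
    if ord(ch) < 160:
        return fallback  # still a control character (e.g. 127), nothing better to map to
    # Escape the ampersand so the map emits a numeric reference, not the raw character.
    return f"&amp;#{ord(ch)};"


def build_cp1252_character_map(map_name: str = "no-control-characters") -> str:
    """Build the character map; xmlns:xsl is declared only so the fragment parses alone."""
    lines = [f'<xsl:character-map name="{map_name}" xmlns:xsl="{XSL_NS}">']
    for code in range(127, 160):
        replacement = cp1252_replacement(code)
        lines.append(f'  <xsl:output-character character="&#{code};" string="{replacement}"/>')
    lines.append("</xsl:character-map>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_cp1252_character_map())
```

This gives a replacement for 27 of the codes. Whether those replacements are right depends on where your text actually came from, so check them against real data before relying on them.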

-i
