Igor Kromin |   Consultant. Coder. Blogger. Tinkerer. Gamer.

I was surprised to find that StringEscapeUtils in the Apache Commons Lang library doesn't let you specify whether existing XML entities should be double-encoded. After all, even PHP lets you do this. There is a very simple workaround, however, so read on.

In PHP, if you want to avoid double encoding, you simply pass false as the fourth argument (double_encode) to the htmlentities() function, like so:
 PHP
$strOrig = "&amp;";
$strEnc = htmlentities($strOrig, ENT_XML1, "UTF-8", false);


This will output &amp; instead of &amp;amp;, i.e. the string is not double-encoded.

To achieve the same result with Java and Apache Commons Lang StringEscapeUtils all you have to do is:
 Java
String strOrig = "&amp;";
String strTemp = StringEscapeUtils.unescapeXml(strOrig);
String strEnc = StringEscapeUtils.escapeXml(strTemp);


It's simple once you see it! Just unescape the string first, then escape it. Any entities that were already encoded get decoded and then re-encoded exactly once, which avoids double encoding.
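
To see why the round trip works without pulling in the library, here is a minimal sketch in plain Java. The class EscapeOnce and its helper methods are hypothetical stand-ins, not the Commons Lang implementation: they handle only the five predefined XML entities, and the real StringEscapeUtils may behave differently on other input such as numeric character references.

```java
// Hypothetical, simplified stand-ins for escapeXml/unescapeXml.
// Assumption: only the five predefined XML entities are handled.
final class EscapeOnce {

    // Encode the five XML special characters as entities.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Decode the five predefined entities. "&amp;" is replaced last so that
    // "&amp;lt;" decodes one level to "&lt;" rather than all the way to "<".
    static String unescapeXml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    // The workaround from this post: unescape first, then escape.
    static String escapeOnce(String s) {
        return escapeXml(unescapeXml(s));
    }

    public static void main(String[] args) {
        System.out.println(escapeOnce("Tom & Jerry"));     // Tom &amp; Jerry
        System.out.println(escapeOnce("Tom &amp; Jerry")); // Tom &amp; Jerry
        System.out.println(escapeXml("Tom &amp; Jerry"));  // Tom &amp;amp; Jerry
    }
}
```

Raw and already-encoded input end up identical, while escaping alone double-encodes. One thing to keep in mind: the round trip cannot tell an existing entity apart from literal text that merely looks like one, so if you really want the text &amp; to appear verbatim in the output, this approach is not suitable.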



-i

A quick disclaimer...

Although I put a great deal of effort into researching all the topics I cover, mistakes can happen. Use of any information from my blog posts is at your own risk, and I do not accept any liability for misuse of information or for damages caused by following any of my posts.

All content and opinions expressed on this Blog are my own and do not represent the opinions of my employer (Oracle). Use of any information contained in this blog post/article is subject to this disclaimer.
NOTE: (2022) This Blog is no longer maintained and I will not be answering any emails or comments.

I am now focusing on Atari Gamer.