Convert Unicode To Html Entities Hex
Solution 1:
For the missing hex-encoding in the related question:
$output = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
list($utf8) = $match;
$binary = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
$entity = vsprintf('&#x%X;', unpack('N', $binary));
return$entity;
}, $input);
This is similar to @Baba's answer using UTF-32BE and then unpack
and vsprintf
for the formatting needs.
If you prefer iconv
over mb_convert_encoding
, it's similar:
$output = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
list($utf8) = $match;
$binary = iconv('UTF-8', 'UTF-32BE', $utf8);
$entity = vsprintf('&#x%X;', unpack('N', $binary));
return$entity;
}, $input);
I find this string manipulation a bit more clear then in Get hexcode of html entities.
Solution 2:
Your string looks like UCS-4
encoding you can try
$first = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
$char = current($m);
$utf = iconv('UTF-8', 'UCS-4', $char);
returnsprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $string);
Output
string'Français' (length=13)
Solution 3:
Firstly, when I faced this problem recently, I solved it by making sure my code-files, DB connection, and DB tables were all UTF-8 Then, simply echoing the text works. If you must escape the output from the DB use htmlspecialchars()
and not htmlentities()
so that the UTF-8 symbols are left alone and not attempted to be escaped.
Would like to document an alternative solution because it solved a similar problem for me.
I was using PHP's utf8_encode()
to escape 'special' characters.
I wanted to convert them into HTML entities for display, I wrote this code because I wanted to avoid iconv or such functions as far as possible since not all environments necessarily have them (do correct me if it is not so!)
$foo = 'This is my test string \u03b50';
echo unicode2html($foo);
functionunicode2html($string) {
return preg_replace('/\\\\u([0-9a-z]{4})/', '&#x$1;', $string);
}
Hope this helps somebody in need :-)
Solution 4:
See How to get the character from unicode code point in PHP? for some code that allows you to do the following :
Example use :
echo"Get string from numeric DEC value\n";
var_dump(mb_chr(50319, 'UCS-4BE'));
var_dump(mb_chr(271));
echo"\nGet string from numeric HEX value\n";
var_dump(mb_chr(0xC48F, 'UCS-4BE'));
var_dump(mb_chr(0x010F));
echo"\nGet numeric value of character as DEC string\n";
var_dump(mb_ord('ď', 'UCS-4BE'));
var_dump(mb_ord('ď'));
echo"\nGet numeric value of character as HEX string\n";
var_dump(dechex(mb_ord('ď', 'UCS-4BE')));
var_dump(dechex(mb_ord('ď')));
echo"\nEncode / decode to DEC based HTML entities\n";
var_dump(mb_htmlentities('tchüß', false));
var_dump(mb_html_entity_decode('tchüß'));
echo"\nEncode / decode to HEX based HTML entities\n";
var_dump(mb_htmlentities('tchüß'));
var_dump(mb_html_entity_decode('tchüß'));
echo"\nUse JSON encoding / decoding\n";
var_dump(codepoint_encode("tchüß"));
var_dump(codepoint_decode('tch\u00fc\u00df'));
Output :
Get string fromnumericDECvalue
string(4) "ď"
string(2) "ď"
Get string fromnumeric HEX value
string(4) "ď"
string(2) "ď"
GetnumericvalueofcharacterasDECintint(50319)
int(271)
Getnumericvalueofcharacteras HEX string
string(4) "c48f"
string(3) "10f"
Encode / decode toDEC based HTML entities
string(15) "tchüß"
string(7) "tchüß"
Encode / decode to HEX based HTML entities
string(15) "tchüß"
string(7) "tchüß"
Use JSON encoding / decoding
string(15) "tch\u00fc\u00df"
string(7) "tchüß"
Solution 5:
You can also use mb_encode_numericentity
which is supported by PHP 4.0.6+ (link to PHP doc).
functionunicode2html($value) {
return mb_encode_numericentity($value, [
// start codepoint// | end codepoint// | | offset// | | | mask0x0000, 0x001F, 0x0000, 0xFFFF,
0x0021, 0x002C, 0x0000, 0xFFFF,
0x002E, 0x002F, 0x0000, 0xFFFF,
0x003C, 0x003C, 0x0000, 0xFFFF,
0x003E, 0x003E, 0x0000, 0xFFFF,
0x0060, 0x0060, 0x0000, 0xFFFF,
0x0080, 0xFFFF, 0x0000, 0xFFFF
], 'UTF-8', true);
}
In this way it is also possible to indicate which ranges of characters to convert into hexadecimal entities and which ones to preserve as characters.
Usage example:
$input = array(
'"Meno più, PIÙ o meno"',
'\'ÀÌÙÒLÈ PERCHÉ perché è sempre così non si sà\'',
'<script>alert("XSS");</script>',
'"`'
);
$output = array();
foreach ($inputas$str)
$output[] = unicode2html($str)
Result:
$output = array(
'"Meno più, PIÙ o meno"',
''ÀÌÙÒLÈ PERCHÉ perchéè sempre così non si sà'',
'<script>alert("XSS");</script>',
'"`'
);
Post a Comment for "Convert Unicode To Html Entities Hex"