Convert hex value to unicode character

I'm trying to convert the hex value 1f600, which is the smiley emoji, to its character representation with:

String.fromCharCode(parseInt("1f600", 16));

but this just generates a square symbol.

Answers:

Answer

Most emojis, including that one, require two code units. fromCharCode works in code units (JavaScript's "characters" are UTF-16 code units, except that invalid surrogate pairs are tolerated), not code points (actual Unicode characters).

In modern environments, you'd use String.fromCodePoint or just a Unicode code point escape sequence (\u{XXXXX} rather than \uXXXX, which is for code units). There's also no need for parseInt:

console.log(String.fromCodePoint(0x1f600));
console.log("\u{1f600}");

In older environments, you have to supply the surrogate pair, which in this case is 0xD83D 0xDE00:

console.log("\uD83D\uDE00");

...or use a polyfill for fromCodePoint.
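
For reference, here's a minimal sketch of that kind of fallback for a single, assumed-valid code point (MDN's polyfill is more thorough and handles multiple arguments):

if (!String.fromCodePoint) {
  String.fromCodePoint = function (codePoint) {
    // BMP code points fit in a single UTF-16 code unit.
    if (codePoint <= 0xFFFF) {
      return String.fromCharCode(codePoint);
    }
    // Astral code points are split into a surrogate pair.
    var offset = codePoint - 0x10000;
    return String.fromCharCode(
      0xD800 + (offset >> 10),   // high (leading) surrogate
      0xDC00 + (offset & 0x3FF)  // low (trailing) surrogate
    );
  };
}

console.log(String.fromCodePoint(0x1f600)); // 😀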

If for some reason you don't want to use a polyfill in older environments, and your starting point is a code point, you have to figure out the code units yourself. You can see how to do that in MDN's polyfill for fromCodePoint, or here's how the Unicode UTF-16 FAQ says to do it:

Using the following type definitions

typedef unsigned int16 UTF16;
typedef unsigned int32 UTF32;

the first snippet calculates the high (or leading) surrogate from a character code C.

const UTF16 HI_SURROGATE_START = 0xD800
UTF16 X = (UTF16) C;
UTF32 U = (C >> 16) & ((1 << 5) - 1);
UTF16 W = (UTF16) U - 1;
UTF16 HiSurrogate = HI_SURROGATE_START | (W << 6) | X >> 10;

where X, U and W correspond to the labels used in Table 3-5 UTF-16 Bit Distribution. The next snippet does the same for the low surrogate.

const UTF16 LO_SURROGATE_START = 0xDC00
UTF16 X = (UTF16) C;
UTF16 LoSurrogate = (UTF16) (LO_SURROGATE_START | X & ((1 << 10) - 1));
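
Translated into JavaScript (my own sketch of the FAQ's formulas, not something from the FAQ itself), that works out to:

var C = 0x1f600;                 // the code point
var X = C & 0xFFFF;              // low 16 bits of C
var U = (C >> 16) & 0x1F;        // the five "u" bits
var W = U - 1;                   // w = u - 1
var hiSurrogate = 0xD800 | (W << 6) | (X >> 10); // 0xD83D
var loSurrogate = 0xDC00 | (X & 0x3FF);          // 0xDE00
console.log(String.fromCharCode(hiSurrogate, loSurrogate)); // 😀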
Answer

JavaScript uses UTF-16, so instead of the code point U+1F600 you need the surrogate pair 0xD83D 0xDE00 - that is, String.fromCharCode(0xd83d, 0xde00).

Note that you can use 0x#### instead of parseInt("####",16).


To convert a code point above U+FFFF to its UTF-16 surrogate pair, here are the steps:

var input = 0x1f600;                          // the code point (must be > 0xFFFF)
var code = input - 0x10000;                   // offset into the supplementary planes
var high = (code >> 10) + 0xD800;             // top 10 bits -> high (leading) surrogate
var low = (code & 0x3FF) + 0xDC00;            // low 10 bits -> low (trailing) surrogate
var output = String.fromCharCode(high, low);  // "😀"
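
As a quick sanity check (assuming a console that can print emoji), the intermediate values match the surrogate pair from the other answers:

console.log(high.toString(16), low.toString(16)); // d83d de00
console.log(output);                              // 😀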
Answer

Use the fromCodePoint function instead of fromCharCode:

String.fromCodePoint(0x1f600)
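
If your starting point is a hex string, as in the question, you can still combine it with parseInt:

console.log(String.fromCodePoint(parseInt("1f600", 16))); // 😀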
