Beware of wcstombs

04 Dec 2011 | One-minute read

Different character encoding schemes are a headache - a headache that is unfortunately not going away. wcstombs (or the related Microsoft secure functions, wcstombs_s and _wcstombs_s_l) is your staple when converting wide-character strings to multibyte encodings.

However, wcstombs can behave in some very unexpected ways when it comes to substitution characters. At least on Windows, this behaviour depends on the system locale.
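
As a rough sketch of that locale dependence: wcstombs converts according to the LC_CTYPE category of the current C locale, so the same input can give different results after a call to setlocale. The helper name convert_in_locale and its parameters are made up for illustration, and which locale names are available depends on the system.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert `in` with wcstombs while `locale_name`
   is the active LC_CTYPE locale, then restore the previous locale.
   Returns the number of bytes written (excluding the terminator), or
   (size_t)-1 if the locale is unavailable, the conversion fails, or
   the output did not fit together with its terminator. */
size_t convert_in_locale(const char *locale_name, const wchar_t *in,
                         char *out, size_t out_size)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);  /* query only */
    if (current == NULL || strlen(current) >= sizeof saved)
        return (size_t)-1;
    strcpy(saved, current);  /* copy: later setlocale calls may overwrite it */

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;   /* locale not available on this system */

    size_t written = wcstombs(out, in, out_size);

    setlocale(LC_CTYPE, saved);  /* put the previous locale back */

    if (written == (size_t)-1 || written == out_size)
        return (size_t)-1;   /* unconvertible character or unterminated output */
    return written;
}
```

Calling this with the same wide string and two different locale names is a quick way to see whether a character is rejected, converted exactly, or silently replaced on a given machine.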

What’s a substitution character? Consider the wide character string

const wchar_t* str = L"Ê";

When will wcstombs successfully convert the string? As you might expect, that will depend on whether there is a representation for Ê in the code page. Perhaps unexpectedly, for some code pages, Ê is converted to E. The function succeeds, but mbstowcs will not convert back to the original string.
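
One way to detect this kind of silent substitution with the standard functions alone is to convert to multibyte, convert back, and compare with the input. The following is a minimal sketch, not a definitive implementation: the name round_trips is hypothetical, and the buffer sizing assumes a stateless multibyte encoding.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if `in` survives wcstombs followed by
   mbstowcs unchanged in the current locale, and 0 otherwise. */
int round_trips(const wchar_t *in)
{
    size_t len = wcslen(in);
    /* Worst case for a stateless encoding, plus the terminator. */
    size_t mb_size = len * MB_CUR_MAX + 1;
    char *mb = malloc(mb_size);
    wchar_t *back = malloc((len + 1) * sizeof *back);
    int ok = 0;

    if (mb != NULL && back != NULL) {
        size_t n = wcstombs(mb, in, mb_size);
        if (n != (size_t)-1 && n < mb_size) {
            /* Convert back and require an exact match with the input. */
            size_t m = mbstowcs(back, mb, len + 1);
            ok = (m == len) && (wcscmp(back, in) == 0);
        }
    }
    free(mb);
    free(back);
    return ok;
}
```

In the situation described above, where Ê is converted to E, wcstombs itself reports success but a check like round_trips(L"Ê") would be expected to return 0, because the string that comes back differs from the original.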

If this is a problem, use WideCharToMultiByte with the flag WC_NO_BEST_FIT_CHARS and check the value it writes through the lpUsedDefaultChar output parameter, for example,

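#include <windows.h>
#include <stdlib.h> // _countof

const wchar_t* strIn = L"Ê";
char strOut[8]; // room for a multi-byte character plus the terminator
BOOL bUsedDefaultChar = FALSE;
int nBytes = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, strIn, -1, strOut, _countof(strOut), NULL, &bUsedDefaultChar);
if (nBytes == 0) {
   // Conversion failed; call GetLastError() for details
} else if (bUsedDefaultChar) {
   // Character not in code page; the default character was substituted
}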