Introduction to the Unicode Character Issue
Localization projects, as all experts know, require great attention to detail. The correct technical terms, suitable language usage, and adaptive translation are some crucial factors needed to produce a successful localization project.
However, no matter how accurately you have written or translated those texts, if users can’t read what you have written, all your efforts will go down the drain. This is also a nightmare for clients, as they are missing 100% of the shots they have taken. All they want is to promote their services and products to their audiences, while all the other side sees is a bunch of question marks and empty boxes.
You will mostly run into these issues when dealing with languages that don’t use the Roman alphabet. When the first computers and, eventually, coding systems were developed, no one thought about the internet, globalization, and smartphones the way we are using them now. This was all Star Trek to many, and most programmers wanted to find the quickest way possible: “Let us deal with that issue in the future!”
So, basically, the US programmers did it their way, and the Japanese programmers, well, did theirs. Unfortunately, technology evolved so rapidly that they couldn’t keep up with the problem. Remember the Y2K problem in 1999? Pretty much the same.
Diversification of the Encoding Systems
The reason we mentioned Japan here is that it was one of the most technically advanced countries back then, and Russia (using the Cyrillic script) was a close third. The modern-day tech giant, China, was not yet on the horizon in the early days of programming.
But here’s where China and Korea come into play – forming the “CJK languages”. They are also the most important ones from an Asian language iGaming localization point of view. Those three languages all depend on Chinese Han characters for written Chinese (hanzi), Japanese (kanji), and Korean (hanja).
The only problem is that all of them use these characters differently, and the Unicode Consortium has been trying to solve this since the end of the 1980s.
There’s a certain number of symbols that have been copied from one East Asian language to another. They’re technically the same symbol, so Unicode only has one slot for that symbol. Then there’s a second category where the symbol has been copied, but one group draws it a little differently from the other (the Japanese might like to put a little flick at the end of one line, or the Chinese draw the line a little slantier). And a third category where one group has developed a simplified symbol, which means again the traditional and the simplified symbols are the same thing but drawn differently. The two symbols are equivalent, the new one is just a new suggestion for how to draw it.
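To make the first category concrete, here is a minimal Python sketch. The character chosen is only an illustration and is not taken from the text above. It shows that a shared Han character has a single code point no matter which language it appears in; any regional difference in how it is drawn comes from the font, not from the code.

```python
import unicodedata

# U+76F4 is used in Chinese, Japanese, and Korean text alike.
# (Illustrative choice; any unified Han character would do.)
shared = "\u76f4"

code_point = ord(shared)
name = unicodedata.name(shared)

print(f"U+{code_point:04X}")  # the single slot shared by all three languages
print(name)
```

Because the number is the same everywhere, a page usually has to declare its language or pick a suitable font so that readers get the regional shape they expect.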
Are you confused? So is your computer!
Even a huge encyclopedia like Wikipedia runs into these issues from time to time.
That’s why it is absolutely crucial to prepare your online casino or sports betting website for this issue when you’re an iGaming operator. Have your programmers look into this and then get it proofread by a reliable iGaming translation agency like Translation Royale.
So, how do you fix it?
Before we jump into the problem and solutions, we need to explain to you how encoding systems work and how Unicode comes into play in this modern tech-savvy world. We might get a little technical here and there but bear with us.
Dark Days Before Unicode
As you might know, computers only understand numbers (ones and zeros), and the only way they can handle text is by assigning a specific number to each character. This mapping is called an ‘encoding system’, and Unicode is one of the systems that have been developed and used.
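As a rough illustration of that idea, the following Python sketch prints the number assigned to a few characters and the bytes used to store them. The sample characters and the choice of UTF-8 as the storage format are assumptions made for the example.

```python
text = "A\u00e9\u65e5"  # 'A', 'é', and the Han character '日'

# Each character is identified by one number (its code point).
code_points = [ord(ch) for ch in text]

# An encoding turns those numbers into the bytes that are stored or sent.
utf8_bytes = text.encode("utf-8")

for ch, cp in zip(text, code_points):
    print(ch, f"U+{cp:04X}", ch.encode("utf-8").hex(" "))
```

Note that the plain Latin letter fits in one byte, while the accented letter and the Han character need two and three bytes respectively.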
During the early days, traditional encoding systems could not support all the languages in the world. Even for one single language like Simplified Chinese, none of the systems could cover all the letters, punctuation, and other special characters.
To make things even more difficult, different encoding systems also clashed with each other by using the same number for different characters, or different numbers for the same character. This resulted in an increased risk of errors when data was transferred from one computer to another. It was chaotic for anyone dealing with more than one language at that time.
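A small Python sketch can show this clash. The byte value and the three legacy code pages below are picked purely for illustration: the same number comes out as a different character depending on which encoding the receiving computer assumes.

```python
raw = bytes([0xE9])  # one byte, as it might arrive from another computer

# The same number means a different character in each legacy code page.
as_western = raw.decode("latin-1")   # Western European
as_cyrillic = raw.decode("cp1251")   # Windows Cyrillic
as_greek = raw.decode("cp1253")      # Windows Greek

print(as_western, as_cyrillic, as_greek)
```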
Finally, in October 1991, the first version of Unicode was released to eliminate all these issues by serving as a standardized system of character codes for every language, program, and platform in the world.
So, what is Unicode, and why is it important?
According to the Unicode Consortium, Unicode is defined as “the universal character encoding, maintained by the Unicode Consortium. This encoding standard provides the basis for processing, storage, and interchange of text data in any language in all modern software and information technology protocols.”
Thanks to this standardized format, we can now read all the variety of languages in the world on our devices, and programmers can develop content in their own native language without any hassle.
At the time of writing, it is at version 12.1, and for those who are curious, you can check out the complete Unicode character charts, which cover nearly 138,000 characters and counting!
The Consequences of Mumble-Jumble Texts
Even though Unicode was created to wipe out the mismatched character display problems, there are still problems with viewing the correct language format.
Simply speaking, this character display issue occurs when there is a conflict between software and operating systems that use different character sets. By default, modern operating systems use the Unicode character encoding system, whereas programs that use a different character set are considered non-Unicode.
In the translation and localization industry, this can happen when the source files are written in an encoding that is different from the one the recipient’s software expects.
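The sketch below imitates that situation in Python. The sample text, the temporary file, and the choice of Shift JIS as the sender's encoding are assumptions for the example. One recipient guesses the wrong encoding and cannot read the file, while another states the right one and gets the original text back.

```python
import os
import tempfile

original = "\u3053\u3093\u306b\u3061\u306f"  # "konnichiwa" in hiragana

# The sender saves the file in Shift JIS (a legacy Japanese encoding).
with tempfile.NamedTemporaryFile("w", encoding="shift_jis",
                                 suffix=".txt", delete=False) as f:
    f.write(original)
    path = f.name

# Recipient A assumes UTF-8 and cannot decode the file at all.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Recipient B uses the encoding the file was actually written in.
with open(path, encoding="shift_jis") as f:
    recovered = f.read()

os.remove(path)
print(utf8_failed, recovered == original)
```

In practice, the safest habit is to agree on the encoding with the sender (UTF-8 where possible) and to pass it explicitly whenever a file is opened.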
But the language experts are not the only ones who are affected by this. In today’s globalized world, where everyone is communicating in different languages across different devices, anyone can come across this kind of situation. Movie subtitles and social media platforms are the two most common places to spot such character display problems.
Here is a comparison of a Romanian movie subtitle before and after the characters are corrected.
As you can see from the picture, the Latin characters ‘ş,’ ‘ă’ and ‘ţ’ are wrongly substituted by ‘þ’ and ‘ã.’ In this case, you can at least try to figure out what the actual sentence is if you know the language.
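One plausible way such a substitution arises can be reproduced in a few lines of Python. This assumes the subtitle was saved in the Central European code page (Windows-1250) and opened as Western European (Windows-1252), which we cannot confirm for the file in the picture. The sample word is also our own.

```python
# "ţară" (Romanian for "country"), written with escapes so the exact
# characters are unambiguous: U+0163 is ţ and U+0103 is ă.
correct = "\u0163ar\u0103"

# Saved in the Central European code page...
stored = correct.encode("cp1250")

# ...but opened by a program that assumes the Western European one.
garbled = stored.decode("cp1252")

# Reversing the wrong step restores the text.
repaired = garbled.encode("cp1252").decode("cp1250")

print(garbled, repaired)
```

The repair only works because no information was lost: the bytes were intact and merely read with the wrong table.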
Sometimes, you might even see weird symbols like empty rectangles and question marks (⍰, �) instead of correct text. Garbled text of this kind is called ‘mojibake,’ and the placeholder symbols appear when the system cannot recognize the input characters and replaces the invalid ones with question marks.
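Here is a minimal Python sketch of how the question-mark symbol comes about, using a sample word of our own choosing: bytes written in one encoding are decoded as UTF-8, and the decoder is told to substitute a placeholder rather than stop with an error.

```python
latin1_bytes = "caf\u00e9".encode("latin-1")  # b'caf\xe9'

# 0xE9 on its own is not valid UTF-8, so a lenient decoder substitutes
# U+FFFD, the replacement character, instead of raising an error.
shown = latin1_bytes.decode("utf-8", errors="replace")

print(shown)
```

Unlike the Romanian example, this damage cannot be undone from the displayed text alone, because the original byte has been swapped for a generic placeholder.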
And this problem isn’t limited to languages that use the Roman alphabet. It also happens with Hebrew, Arabic, Russian, Greek, and Asian languages such as Chinese, Japanese, and Korean.
This means that regardless of the languages we use and how we use them, we might run into this problem one day. So, how do we fix it?
How to Fix Unicode Display Problems on Windows
As annoying as this problem is for users, the solution is fairly easy (even for non-techies!). If you are using the Windows operating system on your laptop, simply follow the steps below.
1. Open ‘Control Panel’ either from the ‘Start’ menu or by typing it out in the search box.
2. Then select ‘Clock and Region’ from the settings and choose ‘Region’ again.
3. A pop-up box, which has two tabs ‘Formats’ and ‘Administrative,’ will then appear on your screen.
4. Click on the ‘Administrative’ tab and in the ‘Language for non-Unicode programs’ section, click on ‘Change system locale.’ You’ll notice that the current language used for non-Unicode programs/files is selected by default.
5. Click on the drop-down menu to view all the available languages.
6. Select a new language that you want to use and click ‘OK.’
7. The language setting will now be changed, and you will be prompted to restart your computer. Make sure you close all your apps and files and hit ‘Restart now.’
8. Your computer will then restart, and you will now be able to see texts in programs that do not support Unicode.
One important thing to note is that this new language setting will be applied to all non-Unicode programs. So, if you want to run a non-Unicode app that uses a different language/character set, you will have to change the setting again to the respective language.
Step Summary: Start > Control Panel > Clock and Region > Region > Administrative > Language for non-Unicode programs > Current language for non-Unicode programs > Change system locale > Change to the preferred language > Restart > Voilà!
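If you have Python installed, a quick check like the sketch below reports the default text encoding your system hands to programs. This is an optional extra and not part of the Windows steps themselves. On Windows the value normally follows the system locale setting, unless Python's UTF-8 mode is enabled.

```python
import locale

# The name of the encoding Python will assume for text files when none
# is given; on Windows this normally follows the system locale setting.
system_encoding = locale.getpreferredencoding(False)

print(system_encoding)  # the value depends on your machine
```

Running it before and after the restart is one way to see whether the change took effect.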
How to Fix Unicode Display Problems on Mac
If you are a Mac user, this is one of the possible solutions for you:
- Go to ‘Spotlight Search’ and search for ‘Font Book.’ Font Book is a built-in font management utility software for macOS users.
- Go to ‘All Fonts’ and select the font of the preferred language.
- Right-click on that font and select the ‘Enable the font (name)’ option.
If you are trying to view the text on a website, refresh the page. The previously unreadable text should now be readable on your laptop.
Globalization is already a trend, and we know that no one wants to fall behind it. Thanks to Unicode, not only can we communicate with people around the world with ease, but localization experts can also work seamlessly in several languages across any platform and device.
From a business point of view, if this kind of technical problem becomes the bottleneck of an international marketing project, it will unnecessarily hinder the workflow and, of course, the delivery of an optimal result. That is why it is important to know why such problems occur and how to resolve them quickly.
How did you like May Thawdar Oo’s blog post “The Culprit Behind the Unicode Character Display Problem and How to Solve It”? Let us know in the comments if you have anything to add, have another content idea for iGaming blog posts, or just want to say “hello.”