I’ve got a requirement to replace special characters with plain ones. For instance, ö becomes o, á becomes a, č becomes c, etc.
The background is that we have customers from all over Europe who enter their addresses. We have to forward these addresses to a service provider which, unfortunately, cannot handle special characters.
The obvious idea is to have a mapping list that defines all replacements and then iterate over this list to replace every special character in the address.
This will work, of course, but I don’t like it. First, I have to maintain all possible values (probably 100 or even more) in a list; second, I have to iterate over this list for each address part, such as name, street and city.
Is there a smarter approach? For example, could converting the text to an encoding that does not support such special characters, e.g. ASCII, work? Or are there any other suggestions?
Use a hash table. I don’t know of any other approach. See also the discussion and links at http://stackoverflow.com/questions/17215431/unicode-to-ascii-standardized-transcription
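A hash table does not have to mean looping over the mapping list for every address part. As a minimal sketch, assuming the mapping is maintained by hand, Python’s `str.translate` applies a dict-based table in a single pass over each string. `REPLACEMENTS` and `replace_special` are hypothetical names, and the table is a small illustrative subset, not a complete list.

```python
# Hypothetical, hand-maintained mapping (illustrative subset only).
REPLACEMENTS = {
    "ä": "a", "ö": "o", "ü": "u", "Ä": "A", "Ö": "O", "Ü": "U",
    "á": "a", "é": "e", "í": "i", "ý": "y", "č": "c", "ř": "r",
    "ø": "o", "å": "a", "ß": "ss",
}

# Build the translation table once; str.maketrans accepts a dict whose keys
# are single characters and whose values are replacement strings.
_TABLE = str.maketrans(REPLACEMENTS)


def replace_special(text: str) -> str:
    """Replace every mapped character in one pass over the string."""
    return text.translate(_TABLE)


# Apply the same table to every address part.
address = {"name": "Antonín Dvořák", "street": "Hauptstraße 5", "city": "Wien"}
cleaned = {field: replace_special(value) for field, value in address.items()}
print(cleaned)
```

Characters missing from the table pass through unchanged, so the maintenance burden of the list remains; only the per-field iteration becomes cheap.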
Indeed, the approach of encoding to ASCII does not work: the characters with diacritics get lost.
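The loss happens because a plain ASCII conversion must drop or replace the whole character. A common workaround, sketched here with Python’s standard `unicodedata` module (a swapped-in technique, not something proposed in this thread), is to decompose the text first so that only the accent is dropped. Letters without a decomposition, such as ø or ß, are still lost entirely by this trick.

```python
import unicodedata

name = "Émile Zola"

# Encoding straight to ASCII cannot represent É: "ignore" drops it and
# "replace" substitutes "?", so the letter is lost either way.
print(name.encode("ascii", "ignore").decode("ascii"))
print(name.encode("ascii", "replace").decode("ascii"))

# Decomposing first (NFD) splits É into E + U+0301 COMBINING ACUTE ACCENT,
# so only the combining mark is dropped by the ASCII conversion.
decomposed = unicodedata.normalize("NFD", name)
print(decomposed.encode("ascii", "ignore").decode("ascii"))
```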
I’ll keep you updated in case I really have to implement the requirement 😉
ICU can transform Latin to ASCII: http://unicode.org/repos/cldr/trunk/common/transforms/Latin-ASCII.xml
If this ICU transformation is the right solution for you, Andi, and you really have to implement it, then we can enhance the Bridge with a string method that performs the transformation using ICU.
You can find more information about ICU transformations here: http://userguide.icu-project.org/transforms/general
I’m not sure. From the XML you refer to, I understand that it defines rules for transforming Latin to ASCII, but I could not find such a rule for simple diacritics like ä, é, etc.
But if you can implement the functionality shown at http://demo.icu-project.org/icu-bin/translit, that would solve my problem. When I insert the sample ‘Accents’ and select Latin as source1 and ASCII as target1, I get the result I need:
Input:
“La mort d’Olivier Bécaille” — Émile Zola;
“Das Vermächtnis des alten Pilgers” von Rainer M. Schröder (Österreich, ÖVP);
“Smyčcový koncert As dur” — Antonín Dvořák.
objektiv på 32 mm, og 8x forstørrelse.

Output:
“La mort d’Olivier Becaille” — Emile Zola;
“Das Vermachtnis des alten Pilgers” von Rainer M. Schroder (Osterreich, OVP);
“Smyccovy koncert As dur” — Antonin Dvorak.
objektiv pa 32 mm, og 8x forstorrelse.
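For comparison, the sample result can be approximated without ICU. This Python sketch uses only the standard library: it decomposes the text with NFD, removes the combining marks, and maps a few letters that have no decomposition (such as ø). It is an assumption-laden stand-in for ICU’s `Latin-ASCII` transform, not its implementation. `strip_accents` and the fallback table `_NO_DECOMPOSITION` are hypothetical and incomplete, the sketch leaves punctuation such as curly quotes and dashes untouched, and it also strips combining marks from non-Latin scripts.

```python
import unicodedata

# Hypothetical fallback table for Latin letters that NFD does not split
# into base letter + accent (incomplete; extend as needed).
_NO_DECOMPOSITION = str.maketrans({
    "ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE",
    "ß": "ss", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D",
})


def strip_accents(text: str) -> str:
    """Remove nonspacing marks after canonical decomposition."""
    text = text.translate(_NO_DECOMPOSITION)
    # NFD turns e.g. "č" into "c" + U+030C COMBINING CARON.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks (category Mn), i.e. the accents.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose anything NFD split that was not a mark (e.g. Hangul syllables).
    return unicodedata.normalize("NFC", stripped)


sample = [
    "“La mort d’Olivier Bécaille” — Émile Zola;",
    "“Das Vermächtnis des alten Pilgers” von Rainer M. Schröder (Österreich, ÖVP);",
    "“Smyčcový koncert As dur” — Antonín Dvořák.",
    "objektiv på 32 mm, og 8x forstørrelse.",
]
for line in sample:
    print(strip_accents(line))
```

Applied to each address field, e.g. `{k: strip_accents(v) for k, v in address.items()}`, this replaces the long per-character mapping list with a short table of exceptions.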
Can you implement this functionality?
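Yes, I can implement the same functionality. In the document http://unicode.org/repos/cldr/trunk/common/transforms/Latin-ASCII.xml I found this comment: “Here we remove accents from Latin characters.” So simple diacritics like ä and é are handled by this general step rather than by one rule per character.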
I have checked it again with my customer, and yes, this is exactly what would solve the problem. So I’m looking forward to this new feature in the E2E Bridge! 🙂