unidecode
ASCII transliterations of Unicode text (Python 3 module)
Install
- All systems
curl cmd.cat/unidecode.sh
- Debian
apt-get install python3-unidecode
- Ubuntu
apt-get install python3-unidecode
- Arch Linux
pacman -S python-unidecode
- Kali Linux
apt-get install python3-unidecode
- Fedora
dnf install python3-unidecode
- Windows (WSL2)
sudo apt-get update
sudo apt-get install python3-unidecode
- Raspbian
apt-get install python3-unidecode
- Dockerfile
- dockerfile.run/unidecode
python3-unidecode
ASCII transliterations of Unicode text (Python 3 module)
Text data often arrives in Unicode but has to be represented in ASCII for display. One could render non-Roman Unicode characters as "???" or "\15BA\15A0\1610", but neither helps the person reading the text. Unidecode instead represents the text in ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F), and the compromises made when mapping between the two character sets are chosen to be close to what a human with a US keyboard would choose. This module generally produces better results than simply stripping accents from characters (which can be done in Python with built-in functions). It is based on hand-tuned character mappings that also contain, for example, ASCII approximations for symbols and non-Latin alphabets. unidecode is a Python 3 port of the Text::Unidecode Perl module.
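The difference between accent stripping and transliteration can be sketched as follows. The helper names `strip_accents` and `to_ascii` are illustrative and not part of the package. The sketch assumes the module is importable as `unidecode` and exposes a function of the same name, and it falls back to plain accent stripping when the module is not installed.

```python
import unicodedata


def strip_accents(text):
    """Accent stripping with the standard library only (illustrative helper)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, then discard anything that is still non-ASCII.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.encode("ascii", "ignore").decode("ascii")


def to_ascii(text):
    """Use unidecode when available, otherwise fall back to strip_accents."""
    try:
        from unidecode import unidecode  # assumed import path of the packaged module
    except ImportError:
        return strip_accents(text)
    return unidecode(text)


print(strip_accents("kožušček"))      # kozuscek
print(repr(strip_accents("Москва")))  # '' because Cyrillic letters have no ASCII base letter
print(to_ascii("kožušček"))           # kozuscek
```

Accent stripping handles Latin letters with diacritics, but it silently loses text in other scripts, which is the gap the hand-tuned mappings are meant to fill.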
python-unidecode
ASCII transliterations of Unicode text (Python module)
Text data often arrives in Unicode but has to be represented in ASCII for display. One could render non-Roman Unicode characters as "???" or "\15BA\15A0\1610", but neither helps the person reading the text. Unidecode instead represents the text in ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F), and the compromises made when mapping between the two character sets are chosen to be close to what a human with a US keyboard would choose. This module generally produces better results than simply stripping accents from characters (which can be done in Python with built-in functions). It is based on hand-tuned character mappings that also contain, for example, ASCII approximations for symbols and non-Latin alphabets. unidecode is a Python port of the Text::Unidecode Perl module.
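The idea of a table-driven mapping can be illustrated with a toy example. The table `SAMPLE_TABLE` and the function `transliterate` below are hypothetical and deliberately tiny; they are not the module's actual data or API, only a sketch of how a per-character lookup can cover symbols and non-Latin letters that accent stripping cannot.

```python
# Hypothetical, hand-picked entries for illustration only; the real module
# ships far larger hand-tuned tables.
SAMPLE_TABLE = {
    "ß": "ss",   # one character may map to several ASCII characters
    "Æ": "AE",
    "€": "EUR",  # symbols can get ASCII approximations too
    "д": "d",    # non-Latin letters map to a near-equivalent
    "а": "a",    # Cyrillic a, not the Latin one
}


def transliterate(text, table=SAMPLE_TABLE, unknown="?"):
    """Replace each non-ASCII character by its table entry (toy sketch)."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        else:
            out.append(table.get(ch, unknown))  # unmapped characters become a placeholder
    return "".join(out)


print(transliterate("Straße"))  # Strasse
print(transliterate("5 €"))     # 5 EUR
print(transliterate("да"))      # da
```

A lookup table makes every choice explicit, which is what allows the mappings to be tuned by hand toward what a reader with a US keyboard would expect.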