In Python, strings are represented by the str class. A string is a sequence of characters, and a character is simply a symbol. For example, the English alphabet has 26 letters.
Computers do not deal with characters; they deal with numbers (binary). Even though you see characters on your screen, internally they are stored and manipulated as combinations of 0s and 1s.
That’s why, before Unicode was developed, many platform- and language-specific code pages, such as ASCII and EBCDIC, were used to map characters to numbers. The ASCII character set consists of 128 characters, while EBCDIC is an 8-bit encoding with 256 possible code values.
The key difference between ASCII and EBCDIC is that they assign different numeric codes to the same characters. For example, the letter A is 65 (0x41) in ASCII but 193 (0xC1) in common EBCDIC code pages. So if you transmit an ASCII file between two computers that both use ASCII, the recipient will read it without any problem. But if you try to read an EBCDIC file on an ASCII computer without converting it, you’ll see garbage because the two code pages are incompatible.
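The incompatibility described above can be sketched with Python's built-in codecs. This is only an illustration under assumptions: "cp037" is one EBCDIC code page that ships with Python (the text does not say which EBCDIC variant is meant), and "latin-1" stands in for an ASCII-compatible reader, because the strict "ascii" codec would simply raise an error on bytes above 127.

```python
text = "Hello"

# Encode the same text with an ASCII codec and an EBCDIC codec.
ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # assumed EBCDIC variant

print(list(ascii_bytes))   # numeric codes under ASCII
print(list(ebcdic_bytes))  # different numeric codes under EBCDIC

# Reading EBCDIC bytes with an ASCII-compatible table gives garbage...
garbled = ebcdic_bytes.decode("latin-1")
print(repr(garbled))

# ...while decoding with the matching code page recovers the text.
restored = ebcdic_bytes.decode("cp037")
print(restored)
```

The bytes themselves are never wrong; what matters is that the reader uses the same table as the writer.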
ASCII and Unicode
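Two character sets matter most when working with text in Python: ASCII and Unicode. In Python 3, every str is a sequence of Unicode characters, and ASCII is a small subset of Unicode.
ASCII is a standard that assigns numbers to characters. For example, the number 97 corresponds to the letter “a” and the number 98 corresponds to the letter “b”.
Unicode is a similar standard that also assigns numbers, called code points, to characters. However, unlike ASCII, which uses 7 bits per character and defines 128 characters, Unicode is not limited to 16 bits per character: its code points range from U+0000 to U+10FFFF, which gives 1,114,112 possible values. How many bytes a character occupies depends on the encoding used (UTF-8, UTF-16, or UTF-32). The first 128 Unicode code points are identical to ASCII.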
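To make the sizes concrete, here is a small sketch that prints the largest code point a Python str can hold and the UTF-8 length of a few sample characters. The sample characters are chosen here for illustration and are not from the text above.

```python
import sys

# The largest code point Python supports: U+10FFFF.
print(hex(sys.maxunicode))

# Sample characters (illustrative): a, a-umlaut, euro sign, an emoji.
samples = ["a", "\u00e4", "\u20ac", "\U0001F600"]

# Number of bytes each character needs when encoded as UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_lengths[ch]} byte(s) in UTF-8")
```

The last sample has a code point above 0xFFFF, so it cannot fit in a single 16-bit unit.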
The ord() Function
The ord() function is a built-in function in Python that returns the Unicode code point for a given Unicode character.
For example, the code point for the character ‘a’ is 97, the code point for ‘ä’ is 228, and the code point for ‘ö’ is 246.
You can use the ord() function to find out the code point of a character, and then use the chr() function to find out the character for a given code point.
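As a quick check of the values above, the following sketch calls ord() on each character and then passes the result back through chr(). The variable names are chosen here for illustration.

```python
# The three characters from the example: a, a-umlaut, o-umlaut.
chars = ["a", "\u00e4", "\u00f6"]

code_points = [ord(ch) for ch in chars]
print(code_points)  # [97, 228, 246]

# chr() is the inverse of ord(): it maps each code point back to its character.
round_trip = [chr(n) for n in code_points]
print(round_trip == chars)  # True

# ord() accepts exactly one character; a longer string raises TypeError.
try:
    ord("ab")
except TypeError as exc:
    print("TypeError:", exc)
```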
The chr() Function
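The chr() function returns a character (a string of length one) for a specified Unicode code point (an integer from 0 to 0x10FFFF). For values 0 to 127, the code point is the same as the ASCII value.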
For example, chr(70) returns the character F.
You can also use this function to return characters outside ASCII; simply specify the Unicode code point of the desired character (i.e., its number in the Unicode character set). For example, chr(0xAE), or equivalently chr(174), returns the Unicode character U+00AE (the registered trademark symbol, ®).
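The two chr() calls described above might look like this in practice. The variable names are illustrative, and the out-of-range call is added only to show what happens beyond U+10FFFF.

```python
letter = chr(70)        # code point 70 is the letter F
registered = chr(0xAE)  # U+00AE, the same as chr(174)

print(letter)
print(registered)

# Code points above 0x10FFFF are not valid Unicode, so chr() raises ValueError.
try:
    chr(0x110000)
except ValueError as exc:
    print("ValueError:", exc)
```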
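We have seen how to get unique characters from a string in Python. We can use either a set or a dictionary to remove duplicates. A set does not preserve the order of the characters; if we want to preserve it, we can use a dictionary (which keeps insertion order in Python 3.7 and later) or build a list while skipping characters already seen.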
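The three approaches mentioned above can be sketched as follows. The sample string and variable names are chosen here for illustration.

```python
text = "mississippi"

# A set removes duplicates but does not keep the original order.
unique_unordered = set(text)

# dict.fromkeys() keeps the first occurrence of each character, in order
# (dictionaries preserve insertion order in Python 3.7 and later).
unique_ordered = "".join(dict.fromkeys(text))

# The same result with a plain list, checking membership before appending.
seen = []
for ch in text:
    if ch not in seen:
        seen.append(ch)

print(sorted(unique_unordered))
print(unique_ordered)
print(seen)
```

The list version does a linear search for every character, so for long strings the dictionary approach is likely the better choice.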