View|Character Map

Select Character Map in the View menu to show a grid of all the characters available in a particular text encoding. See the help topic for Convert|Text Encoding to learn more about text encodings. By default, the character map shows the characters supported by the encoding of the active file. You can select a different encoding in the drop-down list on the character map’s toolbar. Selecting an encoding in the character map only affects the character map; it does not affect the file you’re editing.

If the selected encoding is an 8-bit encoding, the character map displays all characters in the encoding except non-printable control characters. The Windows, Mac, and DOS encodings, except those for Far East languages, are encodings with 224 characters and 32 control characters. The ISO-8859 and EBCDIC encodings are encodings with 192 characters and 64 control characters. Some character maps have holes in them, shown as crossed-out grid cells. These mark positions where the encoding does not define any character.
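To see where such holes come from, the following Python sketch scans all 256 byte values of an 8-bit code page and lists the positions the codec leaves undefined. It relies on Python’s own codec tables (`cp1252`, `latin-1`), which are assumed to be close to, but not necessarily identical with, the tables EditPad uses. The function name `undefined_positions` is hypothetical.

```python
def undefined_positions(codec: str) -> list:
    """Return the byte values that an 8-bit codec does not map to any character."""
    holes = []
    for value in range(256):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            # No character is defined here: a crossed-out cell in the map.
            holes.append(value)
    return holes


if __name__ == "__main__":
    # Windows-1252 leaves five positions undefined.
    print([f"0x{b:02X}" for b in undefined_positions("cp1252")])
    # ['0x81', '0x8D', '0x8F', '0x90', '0x9D']
    # ISO-8859-1 defines every byte value (some of them as control characters).
    print(undefined_positions("latin-1"))  # []
```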

The various Unicode encodings cover thousands of characters. So do the Windows, Mac, and EUC encodings for Korean, Japanese, and Chinese. EditPad’s character map displays the characters in their Unicode order, even for the Windows, Mac, and EUC encodings. To make it easier to find the character you want, you can filter the map to show only characters of a certain type. You can filter by Unicode category, Unicode block, and/or Unicode script. To enable a filter, select a choice from its drop-down list. To disable it, select (all categories), (all blocks), or (all scripts) at the top of the list. If you enable multiple filters, all of them apply at the same time. For example, if you select the “decimal digits” category and the Thai block, the map shows only Thai digits (or nothing at all if the encoding doesn’t support Thai).
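The way combined filters narrow the map can be sketched in Python. This is an illustration under stated assumptions, not EditPad’s implementation: Python’s `unicodedata` reports categories (“decimal digits” is category `Nd`) but has no block or script data, so the Thai block is hard-coded as the range U+0E00..U+0E7F, the script filter is left out, and the name `filtered_map` is hypothetical. A character passes only if it satisfies every active filter and the encoding can represent it.

```python
import unicodedata

# Hypothetical constant: unicodedata does not expose Unicode blocks,
# so the Thai block (U+0E00..U+0E7F) is hard-coded.
THAI_BLOCK = range(0x0E00, 0x0E80)


def filtered_map(encoding, category=None, code_points=range(0x110000)):
    """List the characters in code_points that pass every active filter.

    category=None disables the category filter, like "(all categories)".
    The code_points range plays the role of the block filter.
    """
    result = []
    for cp in code_points:  # visited in Unicode order, as in the map
        ch = chr(cp)
        if category is not None and unicodedata.category(ch) != category:
            continue
        try:
            ch.encode(encoding)  # skip characters the encoding lacks
        except UnicodeEncodeError:
            continue
        result.append(ch)
    return result


if __name__ == "__main__":
    digits = filtered_map("utf-8", category="Nd", code_points=THAI_BLOCK)
    print(" ".join(f"U+{ord(c):04X}" for c in digits))  # U+0E50 through U+0E59
    # Windows 932 (Japanese) has no Thai characters, so the map is empty.
    print(filtered_map("cp932", category="Nd", code_points=THAI_BLOCK))  # []
```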

If some characters appear as squares, the current font cannot display them. You can select a different font with Options|Font. You may see a lot of squares, particularly when showing all Unicode characters: while Unicode tries to define characters for all human languages and scripts, most fonts only support one script, or one group of closely related scripts. Microsoft supplies a font called “Arial Unicode” with recent versions of Windows. This font can display nearly all Unicode characters.

When you hover the mouse over a character in the map, the status bar shows its decimal and hexadecimal index in the encoding. Double-click a character to insert it into the file.

Because the character map can show any encoding, it may show characters that the encoding of the active file cannot represent. If you try to insert an unsupported character, it shows up as a question mark in the file, and EditPad warns you that the character could not be inserted. This question mark is not a placeholder; it is an actual question mark character in the file. Press Backspace or use Edit|Undo to remove it. To insert the proper character, first change the file’s encoding, and then insert the character again.
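The effect can be reproduced with Python’s codecs. In this minimal sketch the function `insert_char` and its warning text are hypothetical, and EditPad’s own substitution logic is not documented here; Python’s `"replace"` error handler simply happens to write the same literal question mark byte, which then decodes as an ordinary `?`.

```python
def insert_char(data: bytes, ch: str, encoding: str):
    """Append ch to a file's bytes; return (new_bytes, warning or None)."""
    try:
        return data + ch.encode(encoding), None
    except UnicodeEncodeError:
        # "replace" substitutes a literal "?" byte: a real character in
        # the file, not a placeholder that remembers the original.
        lossy = ch.encode(encoding, errors="replace")
        return data + lossy, f"U+{ord(ch):04X} cannot be encoded in {encoding}"


if __name__ == "__main__":
    data, warning = insert_char(b"Price: ", "\u20ac", "latin-1")
    print(data, warning)  # b'Price: ?' U+20AC cannot be encoded in latin-1
    # With the file in Windows-1252 instead, the euro sign fits in one byte.
    print(insert_char(b"Price: ", "\u20ac", "cp1252"))
```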

To change the file’s encoding, click the button next to the drop-down list of encodings. A menu pops up. The Text Encoding item at the bottom of this menu is identical to the Text Encoding item in the Convert menu: it opens a window that lets you change the encoding. If you have already selected the encoding you want in the character map’s drop-down list, you can use the Display File with Encoding item or the Convert File to Encoding item to change the file’s encoding without opening the Convert|Text Encoding window. The Display File with Encoding item corresponds to the “interpret the data as being encoded with another character set” choice in the Text Encoding window. The Convert File to Encoding item corresponds to the “encode the original data with another character set” choice. Both choices are explained in detail in the Convert|Text Encoding help topic.
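The difference between the two menu items can be sketched in Python, assuming Python’s codecs behave like EditPad’s for these code pages. Displaying with another encoding keeps the bytes and changes the text; converting keeps the text and changes the bytes. Both function names are hypothetical.

```python
def display_with_encoding(data: bytes, encoding: str) -> str:
    """Reinterpret: keep the file's bytes, decode them with another encoding."""
    return data.decode(encoding)


def convert_to_encoding(data: bytes, old: str, new: str) -> bytes:
    """Convert: keep the text, re-encode it so the bytes change."""
    return data.decode(old).encode(new)


if __name__ == "__main__":
    original = "caf\u00e9".encode("cp1252")  # b'caf\xe9'
    # Same bytes read as DOS code page 437: 0xE9 is Greek capital theta.
    print(ascii(display_with_encoding(original, "cp437")))  # 'caf\u0398'
    # Same text stored as UTF-8: the e-acute becomes the bytes 0xC3 0xA9.
    print(convert_to_encoding(original, "cp1252", "utf-8"))  # b'caf\xc3\xa9'
```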

The left-hand side of the character map toolbar has several buttons that insert various representations of the character selected in the character map. The leftmost button inserts the character itself. The next two buttons insert the character’s code page index in decimal and hexadecimal notation. These two buttons are only available when the character map shows an 8-bit code page. They insert the character’s position in the code page shown by the character map, which is not necessarily its position in the code page used by the file you’re editing.
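The euro symbol shows why the map’s code page and the file’s code page can disagree. This Python sketch, with the hypothetical function `code_page_index`, looks up a character’s single-byte position using Python’s codec tables. Writing the hexadecimal index without a `0x` prefix is an assumption made only for display.

```python
def code_page_index(ch: str, code_page: str) -> int:
    """Return ch's position (0..255) in an 8-bit code page."""
    encoded = ch.encode(code_page)  # raises UnicodeEncodeError if absent
    if len(encoded) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {code_page}")
    return encoded[0]


if __name__ == "__main__":
    for page in ("cp1252", "iso8859_15"):
        index = code_page_index("\u20ac", page)
        print(f"{page}: {index} / {index:02X}")
    # cp1252: 128 / 80
    # iso8859_15: 164 / A4
```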

The remaining three buttons insert the Unicode code point of the selected character in one of three notations: a Unicode escape, a decimal numeric character reference, or a hexadecimal numeric character reference. They always insert the Unicode code point, regardless of the encoding used by the file or the encoding shown in the character map. For example, if the euro symbol is selected in the character map, these buttons insert \u20AC, &#8364;, and &#x20AC; respectively, because the euro symbol occupies code point U+20AC in the Unicode standard.
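The three notations are plain formatting of one number, as this Python sketch shows. The function name `unicode_notations` is hypothetical, and because the text does not say how the Unicode escape is written for code points above U+FFFF, the sketch returns `None` for the escape in that case rather than guessing.

```python
def unicode_notations(ch: str):
    """Return (Unicode escape, decimal NCR, hexadecimal NCR) for ch."""
    cp = ord(ch)
    # Assumption: \uXXXX is produced only for code points up to U+FFFF.
    escape = f"\\u{cp:04X}" if cp <= 0xFFFF else None
    return escape, f"&#{cp};", f"&#x{cp:X};"


if __name__ == "__main__":
    print(*unicode_notations("\u20ac"))  # \u20AC &#8364; &#x20AC;
```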

The edit box at the right-hand side of the character map toolbar lets you look up characters in the character map. To look up a character, type in the character itself or one of its representations, and press Enter. The “look up” box supports various notations. If you type in a single character, that character is selected in the character map. If you type in a Unicode escape such as \u20AC or a numeric character reference such as &#8364; or &#x20AC;, the character at that Unicode code point is selected in the character map. This works even when the character map is showing a non-Unicode encoding.

You can also type in just a decimal number such as 169 or just a hexadecimal number such as A9. If you want 80 to be taken as a hexadecimal number, type in 0x80 or 80h. How the number is interpreted depends on whether the character map is displaying an 8-bit encoding. If the encoding is 8-bit, the drop-down lists with Unicode categories, blocks, and scripts are hidden, and the number is taken as an index into that 8-bit code page. In Windows-1252 (Western European), for example, 0x80 is the euro symbol.

If the character map encoding is not 8-bit, meaning the three Unicode drop-down lists are visible, the number is taken as a Unicode code point, even if the encoding is not Unicode. If you have the multi-byte Windows 932 code page selected, 20AC is interpreted as the euro symbol, while 0x80 is interpreted as the control character at code point U+0080, even though the euro symbol is represented by the single byte 0x80 in code page 932.
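The look-up rules can be summarized as a small Python parser. This is a sketch, not EditPad’s code: the function `look_up` and its `eight_bit` flag are hypothetical, the order of the checks is an assumption where the text is silent, and Python’s codec tables stand in for EditPad’s. Escapes and character references always yield a Unicode code point; bare numbers are code page indexes only when the map shows an 8-bit encoding.

```python
import re


def look_up(text: str, encoding: str, eight_bit: bool) -> str:
    """Resolve look-up box input to a character.

    eight_bit says whether the map currently shows an 8-bit code page.
    """
    if len(text) == 1:
        return text  # a single character stands for itself
    # Unicode escapes and numeric character references are code points.
    m = re.fullmatch(r"\\u([0-9A-Fa-f]{4})", text)
    if m:
        return chr(int(m.group(1), 16))
    m = re.fullmatch(r"&#[xX]([0-9A-Fa-f]+);", text)
    if m:
        return chr(int(m.group(1), 16))
    m = re.fullmatch(r"&#([0-9]+);", text)
    if m:
        return chr(int(m.group(1)))
    # Bare numbers: 0x80 and 80h are hex, all-digit input such as 169
    # is decimal, other hex-digit input such as A9 is hex.
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", text):
        number = int(text[2:], 16)
    elif re.fullmatch(r"[0-9A-Fa-f]+[hH]", text):
        number = int(text[:-1], 16)
    elif re.fullmatch(r"[0-9]+", text):
        number = int(text)
    elif re.fullmatch(r"[0-9A-Fa-f]+", text):
        number = int(text, 16)
    else:
        raise ValueError(f"unrecognized notation: {text!r}")
    if eight_bit:
        # Index into the 8-bit code page shown by the map.
        return bytes([number]).decode(encoding)
    # Otherwise a Unicode code point, even for a non-Unicode encoding.
    return chr(number)


if __name__ == "__main__":
    cases = [("0x80", "cp1252", True), ("169", "cp1252", True),
             ("20AC", "cp932", False), ("0x80", "cp932", False)]
    for entry, enc, eight in cases:
        print(entry, enc, f"U+{ord(look_up(entry, enc, eight)):04X}")
    # 0x80 cp1252 U+20AC
    # 169 cp1252 U+00A9
    # 20AC cp932 U+20AC
    # 0x80 cp932 U+0080
```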
