Sometimes when writing a program, it is necessary to include a piece of text that the program will use. The program might display the piece of text to the user, or store it in a file, or send it across a network, or some other action. The piece of text can be written directly into the source code by enclosing it in double quotation marks, for example:
"Hello, world!"
The quotation marks are not included in the actual/produced text. Text in double quotation marks is equivalent to a variable of the following type: a read-only reference to read-only constant-size text ("read-only" means capable of being read/used but not modified). "n" represents the size of the text, which depends on the number of characters in the text.
ReadOnly[Ref[ReadOnly[ConstantSize[TText, n]]]]
Because the quoted text acts as a read-only text variable, the quoted text can be used with commands that accept text variables. For example, consider the "MText.Set" command. It sets the contents/value of a text variable to whatever text you want. It has 2 parameters. Both parameters are references to text variables.
The first parameter is the text variable to set, and the second parameter is some other text variable that the first text variable should be set to. In other words, the first parameter is the destination text variable, and the second parameter is the source text variable, and the source is copied into the destination. The type of the second parameter is:
Ref[ReadOnly[TText]]
That type is compatible with the type of quoted text shown above, so the following is possible:
var message, TText;
var greeting, TText;
MText.Set message, "Hello, world!";
MText.Set greeting, message;
Note how the quoted text is behaving like a read-only variable containing text.
In some other programming languages, the name "string" is used to mean text. In KL, the name "string" is NOT used. Using the name "string" for text does not make much sense. Call it what it is! Therefore in KL a variable containing text is called a text variable, not a string variable.
Because quoted text must be enclosed in quotation marks, you might wonder how to include a quotation mark in the text itself. This is possible, but the quotation mark cannot be written directly, because the compiler would treat it as the end of the quoted text sequence. Instead, the quotation mark must be specified by name.
In the following example, the compiler will replace each <quot> with a quotation mark character:
"Delete the file <quot>Bestiality.mpg<quot> permanently?"
Other characters can also be specified by name, such as the ampersand character (a listing of character names is provided further ahead). The following 2 examples produce exactly the same text:
"Smith <amp> Wesson"
"Smith & Wesson"
The character names are case-sensitive. For example, "<AMP>" is invalid (there is a character named "amp" but no character named "AMP"). Also note that some characters exist in 2 versions, an uppercase version, and a lowercase version, and both versions have the same character name except with different capitalization. For example, <agrave> produces the lowercase "a" with grave mark (à), while <Agrave> produces the uppercase "A" with grave mark (À).
If you use a character name that is unknown to the compiler, the compiler will display an error message telling you that it does not recognize the character name. The following examples are all invalid:
"Invalid <nachos> character"
"Invalid <foo bar> character"
"Empty char name <> test"
"Empty char name < > test"
The following characters cannot appear literally/directly in a quoted text sequence, because they would make the source code confusing, ambiguous, unreliable, or disruptive to the compiler. If you want one of these characters to be included in the produced text, you must specify it by name or number, not literally.
| Name | Number | Description |
|---|---|---|
| quot | 0x0022 | Double quotation mark. |
| lt | 0x003C | Less-than symbol. |
| gt | 0x003E | Greater-than symbol. |
| lcbra | 0x007B | Left curly bracket/brace. |
| rcbra | 0x007D | Right curly bracket/brace. |
| br | 0x000A | Line break. |
| crtn | 0x000D | Carriage return. |
| tab | 0x0009 | Horizontal tab. |
| | 0x0060 | Grave accent. (No name; specify by number, for example <#60>.) |
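The naming rules above can be modelled in a few lines of Python. This is a sketch of the documented behaviour, not the KL compiler's implementation: `CHAR_NAMES` is a hypothetical subset of the full name table given further ahead, and `expand_names` is an illustrative name. It shows that lookups are case-sensitive and that an unknown or empty name is rejected.

```python
import re

# Hypothetical subset of the KL character-name table (name -> Unicode number).
CHAR_NAMES = {
    "quot": 0x0022,
    "amp": 0x0026,
    "Agrave": 0x00C0,
    "agrave": 0x00E0,
}

# "<", then any run of characters other than "<" and ">" (possibly empty), then ">".
NAME_SEQUENCE = re.compile(r"<([^<>]*)>")


def expand_names(body):
    """Replace every <name> in a quoted-text body with the named character.

    Character numbers such as <#26> are not handled in this sketch.
    """

    def replace(match):
        name = match.group(1)
        # Dictionary lookup is case-sensitive: "amp" is known, "AMP" is not.
        if name not in CHAR_NAMES:
            raise ValueError("unrecognized character name: <" + name + ">")
        return chr(CHAR_NAMES[name])

    return NAME_SEQUENCE.sub(replace, body)


print(expand_names("Smith <amp> Wesson"))  # Smith & Wesson
```

Each of the invalid examples shown earlier ("<nachos>", "<foo bar>", "<>", "< >") raises `ValueError` in this model, as does "<AMP>".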
The apostrophe character (also known as single quotation mark) can appear literally in a quoted text sequence enclosed in double quotation marks, without having to specify it by name. For example:
"Sorry I'm late, I'll leave early to make up for it."
Quoted text cannot directly include a line break in the source code. For example, the following is invalid:
"First line
Second line"
Instead the line break must be specified by name:
"First line<br>Second line"
Or using the concatenation feature (described ahead):
"First line<br>"
"Second line"
If there are multiple quoted text sequences with nothing between them other than whitespace, they are concatenated together with NO extra characters inserted between (the whitespace is ignored). For example:
"test" "ing"
Is equivalent to:
"testing"
Another example:
"pine" "apple"
Is equivalent to:
"pineapple"
NOT to:
"pine<br>apple"
The main purpose of this concatenation feature is to allow long pieces of text to be written on multiple lines, without including those line breaks in the produced text, and without introducing ambiguity as to whether line breaks are to be included in the produced text. For example, a long paragraph can be written on multiple lines using this concatenation feature. This is useful to avoid a ridiculously long line when the source code is being displayed/edited in a text editor that is not using soft-wrapping.
Whitespace characters can be normal space, line break, or tab. The number of whitespace characters between the quoted text sequences can be zero or more.
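A tokenizer could implement the concatenation rule by checking, after each closing quotation mark, whether only whitespace separates it from another opening quotation mark. The Python sketch below is a model under stated assumptions, not the compiler's code: `read_concatenated` is a hypothetical name, character names are left unexpanded, and it also skips `{...}` comments (assumed not to nest), which are permitted between sequences as explained in the following paragraphs. It rejects literal line breaks, tabs, and curly brackets inside a sequence.

```python
from typing import Tuple

# Whitespace permitted between quoted text sequences.
WHITESPACE = " \n\t"


def read_concatenated(source: str, pos: int) -> Tuple[str, int]:
    """Read adjacent quoted sequences starting at source[pos], which must be '"'.

    Returns the joined text (quotation marks removed, character names left
    unexpanded) and the index just after the last closing quotation mark.
    Assumes comments do not nest.
    """
    if source[pos] != '"':
        raise ValueError("expected an opening quotation mark")
    parts = []
    while True:
        close = source.index('"', pos + 1)  # ValueError if unterminated
        body = source[pos + 1:close]
        if "\n" in body or "\t" in body:
            raise ValueError("line break or tab inside quoted text")
        if "{" in body or "}" in body:
            raise ValueError("curly bracket inside quoted text")
        parts.append(body)
        pos = close + 1

        # Skip whitespace and {comments} to see whether another sequence follows.
        look = pos
        while look < len(source):
            if source[look] in WHITESPACE:
                look += 1
            elif source[look] == "{":
                look = source.index("}", look) + 1
            else:
                break
        if look < len(source) and source[look] == '"':
            pos = look  # another sequence follows: keep concatenating
        else:
            return "".join(parts), pos
```

Nothing is inserted between the parts, so `"pine"` followed by `"apple"` on the next line yields `pineapple`, not a text containing a line break.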
The following is invalid, and is rejected by the compiler, because curly brackets cannot appear literally inside quoted text:
"test{comment}text"
The following is a valid way of placing a comment inside quoted text, using the aforementioned concatenation feature. One or more comments can appear between the quoted text sequences (with or without whitespace) without disrupting the concatenation.
"test"{comment}"text"
That is equivalent to:
"testtext"
Or if you wanted the curly brackets included in the produced text, then specify the curly bracket characters by name:
"test<lcbra>comment<rcbra>text"
In a quoted text sequence in source code, characters can be specified literally, by name, or by number. Only some characters can be specified by name, and only some literally, but all can be specified by number. The following 3 examples produce exactly the same text:
"Smith <amp> Wesson" {by name}
"Smith <#26> Wesson" {by number}
"Smith & Wesson" {literally}
To specify a character by number, it is the same as specifying a character by name except replace the name with a hash (#) followed by the Unicode number of the character in hexadecimal, with or without leading zeros. Only hexadecimal digits (0-9,A-F,a-f) are permitted after the hash (#). The number must have a minimum of 1 hexadecimal digit and a maximum of 8 hexadecimal digits (32 bits) including leading zero digits.
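The by-number syntax is regular enough to check with a single regular expression. In this Python sketch (an illustration with the hypothetical function name `code_point_from_sequence`), the pattern accepts 1 to 8 hexadecimal digits in either case. It returns the number rather than a character because Python's `chr()` only accepts values up to 0x10FFFF, whereas the rule above permits up to 32 bits.

```python
import re

# "<#", then 1 to 8 hexadecimal digits (upper or lower case), then ">".
NUMBER_SEQUENCE = re.compile(r"<#([0-9A-Fa-f]{1,8})>")


def code_point_from_sequence(sequence: str) -> int:
    """Return the Unicode number written in a sequence such as "<#03A9>"."""
    match = NUMBER_SEQUENCE.fullmatch(sequence)
    if match is None:
        raise ValueError("invalid character number: " + sequence)
    # Leading zeros are harmless: int("03A9", 16) == int("3A9", 16).
    return int(match.group(1), 16)
```

For example, `chr(code_point_from_sequence("<#03A9>"))` is the Greek capital letter Omega.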
The following 2 examples produce exactly the same text, containing the Greek capital letter Omega:
"<Omega> watches are low-tech antiques."
"<#03A9> watches are low-tech antiques."
Usually characters are specified by name in preference to number, but if the compiler does not have a name for the desired character, then it must be specified by number instead.
Note that <#A9> (for example) does NOT mean insert a byte (8 bits) containing the value 0xA9. Rather, it means insert Unicode character number 0xA9 (in this case, the copyright sign). The quoted text sequence "<#A9>" actually produces text 2 bytes in size, because in both the UTF-8 and UTF-16 encodings, Unicode character number 0xA9 is encoded using 2 bytes.
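The 2-byte size is easy to confirm with Python's standard `str.encode` method. This checks the Unicode encodings themselves and says nothing about how KL stores text internally. The `utf-16-le` codec is used so that no byte-order mark is added; plain `utf-16` would prepend 2 extra bytes.

```python
# U+00A9 COPYRIGHT SIGN: the character that <#A9> inserts.
copyright_sign = chr(0xA9)

utf8 = copyright_sign.encode("utf-8")
utf16 = copyright_sign.encode("utf-16-le")  # "-le" avoids a byte-order mark

print(utf8.hex(), len(utf8))    # c2a9 2
print(utf16.hex(), len(utf16))  # a900 2
```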
When a character is specified by number, the compiler inserts the character regardless of whether any actual character/symbol is assigned to that number (defined for that number) in the Unicode standard, and regardless of whether any font contains graphics for that character number.
The "null" character (character number zero) can be inserted using <#0>. Text variables are NOT null-terminated, so the presence of a null character in a text variable has no special effect. However, if the text is transmitted/conveyed to a program written in a language that uses null-termination (such as C/C++), the null character might disrupt that program.
Consider the situation of writing a program where you want to embed pieces of text containing many characters that are required to be written by name, such as the "<" and ">" characters. For example, the HTML language uses "<" and ">" characters extensively. However literal/quoted text sequences in KL also use "<" and ">", so to include a snippet of HTML in KL source code requires writing those characters by name. This becomes rather messy if those characters must be used frequently.
For example, imagine you wanted to embed this piece of HTML text into the source code of your KL program:
<A HREF="docs/">Documentation</A>
To write it in KL source code in literal/quoted text in the normal manner, the less-than ("<"), greater-than (">"), and quotation mark characters must be specified by name:
"<lt>A HREF=<quot>docs/<quot><gt>Documentation<lt>/A<gt>"
That quickly becomes confusing/tedious/unreadable for a large amount of HTML. Fortunately KL has a feature known as configured literal text, allowing the above to be written in source code as:
~~""
~"<A HREF="docs/">Documentation</A>"~
There are 2 parts to that. The first part begins with tilde + tilde + quote and ends with quote, and it sets the text configuration. It is empty in the above example. Here is a non-empty example of setting the text configuration:
~~"crtn br,lsbra,rsbra"
That does not produce a text sequence. Rather, it sets the configuration for all following configured literal text sequences. It consists of 3 parameters separated by commas. The first parameter specifies the character(s) to use for line breaks. The second parameter specifies the character(s) that begin a special character sequence. The third parameter specifies the character(s) that end a special character sequence. The characters must be specified by name or by hexadecimal number, with numbers prefixed with "#". Thus the above sets the configuration as follows:
- Replace line breaks with 2 characters, the carriage return and linefeed characters ("crtn br").
- Use left-square-bracket ("lsbra") to begin a special character sequence (for specifying a character by name or number).
- Use right-square-bracket ("rsbra") to end a special character sequence.
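The configuration string can be parsed with a comma split followed by a whitespace split. The Python sketch below models the rules as described; it is not the compiler's code. `TextConfig`, `resolve`, and `parse_config` are illustrative names, `CHAR_NAMES` holds only the 4 names the examples need, and rejecting more than 3 parameters is an assumption because the rules do not say.

```python
import re
from typing import NamedTuple

# Hypothetical subset of the KL character-name table.
CHAR_NAMES = {"br": 0x0A, "crtn": 0x0D, "lsbra": 0x5B, "rsbra": 0x5D}


class TextConfig(NamedTuple):
    line_break: str  # replaces source line breaks in multiple-line sequences
    begin: str       # begins a special character sequence ("" = disabled)
    end: str         # ends a special character sequence


def resolve(item: str) -> str:
    """Resolve one configuration item, such as "crtn" or "#0D", to a character."""
    number = re.fullmatch(r"#([0-9A-Fa-f]{1,8})", item)
    if number:
        return chr(int(number.group(1), 16))
    if item not in CHAR_NAMES:
        raise ValueError("unrecognized character name: " + item)
    return chr(CHAR_NAMES[item])


def parse_config(text: str) -> TextConfig:
    """Parse the text between ~~" and " into its 3 parameters."""
    params = text.split(",")
    if len(params) > 3:
        # Assumption: the rules do not say what extra parameters would mean.
        raise ValueError("a text configuration has at most 3 parameters")
    params += [""] * (3 - len(params))  # omitted parameters mean no characters
    # Items within a parameter are separated by whitespace; blank means none.
    return TextConfig(*("".join(resolve(item) for item in p.split()) for p in params))
```

With this model, `parse_config("crtn br,lsbra,rsbra")` gives a carriage return + line feed line break with square brackets as delimiters, and the name-based and number-based spellings of that configuration parse to the same result.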
After setting the text configuration, you can write one or more text sequences that use the most recent configuration. To specify that a literal text sequence should use the most recent configuration, begin it with tilde + quote and end it with quote + tilde. For example:
~"<A HREF="docs/">DocuCaf[eacute]</A>"~
That source code (assuming use of the aforementioned example configuration) is equivalent to the following source code that does not use text configurations. The following should be written all on 1 line:
"<lt>A HREF=<quot>docs/<quot><gt>DocuCaf<eacute><lt>/A<gt>"
A configured literal text sequence beginning with tilde + quote can contain the following characters literally without having to specify them by name: Quotation mark (0x0022), less-than (0x003C), greater-than (0x003E), left curly bracket (0x007B), right curly bracket (0x007D), grave (0x0060). Except that if the quotation mark is to be immediately followed by a tilde (0x007E), then the quotation mark must be specified by name because a quotation mark + tilde is interpreted as the end of the text sequence as previously explained.
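Reading a configured literal text sequence then comes down to finding the first quote + tilde and expanding special sequences with the configured delimiters. The Python sketch below is a model with hypothetical names (`read_configured` and a small `CHAR_NAMES` subset). It assumes special sequences are disabled if either delimiter is empty; this section only describes the case where both are empty.

```python
import re
from typing import Tuple

# Hypothetical subset of the KL character-name table.
CHAR_NAMES = {"quot": 0x22, "eacute": 0xE9, "bull": 0x2022}


def read_configured(source: str, pos: int, begin: str, end: str) -> Tuple[str, int]:
    """Read a ~"..."~ sequence that starts at source[pos].

    begin/end are the special-sequence delimiters from the current text
    configuration. Returns the produced text and the index just after the
    closing quote + tilde.
    """
    if not source.startswith('~"', pos):
        raise ValueError('expected ~" at position %d' % pos)
    close = source.index('"~', pos + 2)  # the first quote + tilde ends it
    body = source[pos + 2:close]
    if "\n" in body or "\t" in body:
        raise ValueError("line breaks and tabs must be specified by name")

    if begin and end:
        def replace(match):
            item = match.group(1)
            if item.startswith("#"):
                return chr(int(item[1:], 16))  # character specified by number
            if item not in CHAR_NAMES:
                raise ValueError("unrecognized character name: " + item)
            return chr(CHAR_NAMES[item])

        pattern = re.escape(begin) + "(.*?)" + re.escape(end)
        body = re.sub(pattern, replace, body)

    return body, close + 2
```

Because the first quote + tilde always ends the sequence, a quotation mark that must be followed by a tilde is written by name, for example `[quot]~`.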
Like a normal literal text sequence, a configured literal text sequence beginning with tilde + quote can only contain line break and tab characters if they are specified by name. However, a special multiple-line sequence is also supported, and it does allow line breaks and tabs without specifying them by name. A multiple-line sequence begins with tilde + dash + dash + dash + quote (~---") and ends with the reverse: quote + dash + dash + dash + tilde ("---~). For example:
~---"
<HTML>
<TITLE>Main Page</TITLE>
<UL>
<LI><A HREF="dl/">Downloads</A>
<LI><A HREF="docs/">Documentation Caf[eacute]</A>
<LI><A HREF="contact/">Contact</A>
<LI><A HREF="info/">More Info</A>
</UL>
</HTML>
"---~
The line breaks in the source code within the multiple-line text sequence will be replaced with whatever character or characters were specified in the last text configuration (first parameter). Line breaks specified by name or number are not replaced. Whatever type of line break characters are being used in the source code file (for example whether the source code file is a text file in Unix or DOS format) has absolutely no effect on the result or meaning of the program.
If the text within a multiple-line text sequence begins with a line break, that line break is always removed (if the text begins with multiple line breaks, only the first one is removed). Likewise, if the text ends with a line break, that line break is always removed (if it ends with multiple line breaks, only the very last one is removed). Line breaks specified by name or number are NOT removed. For example:
~~"br,lsbra,rsbra"
~---"
a
b
c
"---~
That source code is equivalent to this source code:
"a<br>b<br>c"
If a parameter in the text configuration is empty, it means no characters. The following 3 equivalent configurations disable use of special character sequences (the ability to specify characters by name or number):
~~"br,,"
~~"br, , "
~~"br"
In the following example, all the line breaks in the text are removed except for line breaks specified by name or number:
~~",lsbra,rsbra"
~---"
a
b
c
[bull]
[br]
"---~
That source code is equivalent to this source code:
"abc<bull><br>"
The following 4 configurations are equivalent:
~~"crtn br,lsbra,rsbra"
~~"#0D #0A,lsbra,rsbra"
~~"#0D #0A, lsbra, rsbra"
~~"#0D #0A, #5B, #5D"
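The multiple-line rules can be summarized as: normalize the source file's own line breaks, remove one leading and one trailing line break, and replace every remaining source line break with the configured line-break text. The Python sketch below models those steps (the function name `multiline_body` is hypothetical). Special character sequences such as `[br]` are assumed to be expanded afterwards, which is why the line breaks they produce are neither removed nor replaced.

```python
def multiline_body(raw: str, line_break: str) -> str:
    """Model the produced text of a ~---"..."---~ sequence.

    raw is the source text between ~---" and "---~; line_break is the first
    parameter of the current text configuration.
    """
    # Normalize DOS-format line breaks so the source file format has no effect.
    raw = raw.replace("\r\n", "\n")
    if raw.startswith("\n"):
        raw = raw[1:]   # remove only the first leading line break
    if raw.endswith("\n"):
        raw = raw[:-1]  # remove only the last trailing line break
    # Every remaining source line break becomes the configured line-break text.
    return raw.replace("\n", line_break)
```

Applied to the earlier examples, `multiline_body("\na\nb\nc\n", "\n")` gives `a`, `b`, `c` separated by line breaks, and with an empty line-break parameter the `a b c [bull] [br]` example collapses to `abc[bull][br]` before `[bull]` and `[br]` are expanded.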
A text configuration remains in effect until the end of the file. If a text sequence using text configuration appears with no prior text configuration in the file, then the compiler produces an error message.
The text configuration is not set using a command because this processing occurs during tokenization, before commands are processed.
The concatenation feature described earlier can be used with configured literal text sequences beginning with tilde + quote, but NOT with multiple-line text sequences beginning with tilde + dashes + quote. Also, normal text sequences can be concatenated with configured text sequences and vice versa.
The following is a complete listing of all characters that can appear literally/directly in a normal quoted text sequence in source code stored in a plain text file ("literally/directly" meaning without specifying the character by name or number). All other Unicode characters (including Latin alphabet characters with diacritical marks) must be specified by name or number when the source code is stored in a plain text file.
| Number | Ch. | Description |
|---|---|---|
| 0x0061 | a | Small letter a. |
| 0x0062 | b | Small letter b. |
| 0x0063 | c | Small letter c. |
| 0x0064 | d | Small letter d. |
| 0x0065 | e | Small letter e. |
| 0x0066 | f | Small letter f. |
| 0x0067 | g | Small letter g. |
| 0x0068 | h | Small letter h. |
| 0x0069 | i | Small letter i. |
| 0x006A | j | Small letter j. |
| 0x006B | k | Small letter k. |
| 0x006C | l | Small letter l. |
| 0x006D | m | Small letter m. |
| 0x006E | n | Small letter n. |
| 0x006F | o | Small letter o. |
| 0x0070 | p | Small letter p. |
| 0x0071 | q | Small letter q. |
| 0x0072 | r | Small letter r. |
| 0x0073 | s | Small letter s. |
| 0x0074 | t | Small letter t. |
| 0x0075 | u | Small letter u. |
| 0x0076 | v | Small letter v. |
| 0x0077 | w | Small letter w. |
| 0x0078 | x | Small letter x. |
| 0x0079 | y | Small letter y. |
| 0x007A | z | Small letter z. |
| 0x0041 | A | Capital letter A. |
| 0x0042 | B | Capital letter B. |
| 0x0043 | C | Capital letter C. |
| 0x0044 | D | Capital letter D. |
| 0x0045 | E | Capital letter E. |
| 0x0046 | F | Capital letter F. |
| 0x0047 | G | Capital letter G. |
| 0x0048 | H | Capital letter H. |
| 0x0049 | I | Capital letter I. |
| 0x004A | J | Capital letter J. |
| 0x004B | K | Capital letter K. |
| 0x004C | L | Capital letter L. |
| 0x004D | M | Capital letter M. |
| 0x004E | N | Capital letter N. |
| 0x004F | O | Capital letter O. |
| 0x0050 | P | Capital letter P. |
| 0x0051 | Q | Capital letter Q. |
| 0x0052 | R | Capital letter R. |
| 0x0053 | S | Capital letter S. |
| 0x0054 | T | Capital letter T. |
| 0x0055 | U | Capital letter U. |
| 0x0056 | V | Capital letter V. |
| 0x0057 | W | Capital letter W. |
| 0x0058 | X | Capital letter X. |
| 0x0059 | Y | Capital letter Y. |
| 0x005A | Z | Capital letter Z. |
| 0x0030 | 0 | Digit zero. |
| 0x0031 | 1 | Digit one. |
| 0x0032 | 2 | Digit two. |
| 0x0033 | 3 | Digit three. |
| 0x0034 | 4 | Digit four. |
| 0x0035 | 5 | Digit five. |
| 0x0036 | 6 | Digit six. |
| 0x0037 | 7 | Digit seven. |
| 0x0038 | 8 | Digit eight. |
| 0x0039 | 9 | Digit nine. |
| 0x0020 | (space) | Normal space. |
| 0x002E | . | Dot/period. |
| 0x002C | , | Comma. |
| 0x003A | : | Colon. |
| 0x003B | ; | Semicolon. |
| 0x0021 | ! | Exclamation mark. |
| 0x003F | ? | Question mark. |
| 0x0027 | ' | Apostrophe. |
| 0x0023 | # | Hash. |
| 0x0024 | $ | Dollar sign. |
| 0x0025 | % | Percent sign. |
| 0x0026 | & | Ampersand. |
| 0x0040 | @ | At mark. |
| 0x005E | ^ | Circumflex/caret. |
| 0x007E | ~ | Tilde. |
| 0x002A | * | Asterisk. |
| 0x002B | + | Plus sign. |
| 0x002D | - | Minus sign or dash. |
| 0x003D | = | Equal. |
| 0x0028 | ( | Open round bracket. |
| 0x0029 | ) | Close round bracket. |
| 0x005B | [ | Open square bracket. |
| 0x005D | ] | Close square bracket. |
| 0x002F | / | Forward slash. |
| 0x005C | \ | Backward slash. |
| 0x007C | \| | Vertical line. |
| 0x005F | _ | Underscore. |
The following table lists all valid character names that can be used in literal/quoted text sequences in KL source code. The "Number" column is the Unicode number of the character, in hexadecimal. All the character names are adopted from the HTML4 standard, except for the characters marked with "Not HTML".
All the character numbers are adopted from the Unicode standard. Unicode allows computers to consistently represent and exchange text in any of the world's writing systems and languages. Unicode defines more than 100 000 different characters, providing a standardized digital representation for each. If you want to use a character not included in the following table, then you must specify it by number because there is no name in KL for it.
| Name | Number | Description |
|---|---|---|
| br | 0x000A | Line break. (Not exactly HTML.) |
| crtn | 0x000D | Carriage return. (Not HTML.) |
| tab | 0x0009 | Horizontal tab. (Not HTML.) |
| sp | 0x0020 | Normal space. (Not HTML.) |
| nbsp | 0x00A0 | Non-breaking space. |
| ensp | 0x2002 | En space. |
| emsp | 0x2003 | Em space. |
| thinsp | 0x2009 | Thin space. |
| lcbra | 0x007B | Left curly bracket/brace. (Not HTML. See also laquo, etc.) |
| rcbra | 0x007D | Right curly bracket/brace. (Not HTML.) |
| lsbra | 0x005B | Left square bracket/brace. (Not HTML.) |
| rsbra | 0x005D | Right square bracket/brace. (Not HTML.) |
| lrbra | 0x0028 | Left round bracket/brace / parenthesis. (Not HTML.) |
| rrbra | 0x0029 | Right round bracket/brace / parenthesis. (Not HTML.) |
| amp | 0x0026 | Ampersand symbol. |
| ndash | 0x2013 | En dash. |
| mdash | 0x2014 | Em dash. |
| shy | 0x00AD | Soft hyphen. |
| bull | 0x2022 | Bullet (solid small circle). |
| loz | 0x25CA | Lozenge. |
| dagger | 0x2020 | Dagger. |
| Dagger | 0x2021 | Double dagger. |
| permil | 0x2030 | Per-mille sign. |
| deg | 0x00B0 | Degree sign. |
| micro | 0x00B5 | Micro sign. |
| para | 0x00B6 | Paragraph sign. |
| spades | 0x2660 | Solid spade suit (cards). |
| clubs | 0x2663 | Solid club suit (cards). |
| hearts | 0x2665 | Solid heart suit (cards). |
| diams | 0x2666 | Solid diamond suit (cards). |
| cent | 0x00A2 | Cent sign. |
| pound | 0x00A3 | Pound sign. |
| curren | 0x00A4 | Currency sign. |
| yen | 0x00A5 | Yen sign. |
| euro | 0x20AC | Euro sign. |
| iexcl | 0x00A1 | Inverted exclamation mark. |
| iquest | 0x00BF | Inverted question mark. |
| middot | 0x00B7 | Middle dot. |
| brvbar | 0x00A6 | Broken vertical bar. |
| sect | 0x00A7 | Section sign. |
| uml | 0x00A8 | Umlaut/diaeresis. |
| acute | 0x00B4 | Acute accent. |
| cedil | 0x00B8 | Spacing cedilla. |
| macr | 0x00AF | Spacing macron. |
| circ | 0x02C6 | Modifier letter circumflex accent. |
| tilde | 0x02DC | Small tilde. |
| ordf | 0x00AA | Feminine ordinal indicator. |
| ordm | 0x00BA | Masculine ordinal indicator. |
| copy | 0x00A9 | Copyright sign. |
| trade | 0x2122 | Trademark sign. |
| reg | 0x00AE | Registered trade mark sign. |
| apos | 0x0027 | Apostrophe. (Not HTML4; defined in XML.) |
| quot | 0x0022 | Double quotation mark. |
| laquo | 0x00AB | Left-pointing double angle quotation mark. |
| raquo | 0x00BB | Right-pointing double angle quotation mark. |
| lsquo | 0x2018 | Left single quotation mark. |
| rsquo | 0x2019 | Right single quotation mark. |
| sbquo | 0x201A | Single low-9 quotation mark. |
| bdquo | 0x201E | Double low-9 quotation mark. |
| ldquo | 0x201C | Left double quotation mark. |
| rdquo | 0x201D | Right double quotation mark. |
| lsaquo | 0x2039 | Single left-pointing angle quotation mark. |
| rsaquo | 0x203A | Single right-pointing angle quotation mark. |
| Agrave | 0x00C0 | Latin capital letter A with grave. |
| Aacute | 0x00C1 | Latin capital letter A with acute. |
| Acirc | 0x00C2 | Latin capital letter A with circumflex. |
| Atilde | 0x00C3 | Latin capital letter A with tilde. |
| Auml | 0x00C4 | Latin capital letter A with umlaut/diaeresis. |
| Aring | 0x00C5 | Latin capital letter A with ring above. |
| AElig | 0x00C6 | Latin capital ligature AE. |
| Ccedil | 0x00C7 | Latin capital letter C with cedilla. |
| Egrave | 0x00C8 | Latin capital letter E with grave. |
| Eacute | 0x00C9 | Latin capital letter E with acute. |
| Ecirc | 0x00CA | Latin capital letter E with circumflex. |
| Euml | 0x00CB | Latin capital letter E with diaeresis. |
| Igrave | 0x00CC | Latin capital letter I with grave. |
| Iacute | 0x00CD | Latin capital letter I with acute. |
| Icirc | 0x00CE | Latin capital letter I with circumflex. |
| Iuml | 0x00CF | Latin capital letter I with umlaut/diaeresis. |
| ETH | 0x00D0 | Latin capital letter ETH. |
| Ntilde | 0x00D1 | Latin capital letter N with tilde. |
| OElig | 0x0152 | Latin capital ligature OE. |
| Ograve | 0x00D2 | Latin capital letter O with grave. |
| Oacute | 0x00D3 | Latin capital letter O with acute. |
| Ocirc | 0x00D4 | Latin capital letter O with circumflex. |
| Otilde | 0x00D5 | Latin capital letter O with tilde. |
| Ouml | 0x00D6 | Latin capital letter O with diaeresis. |
| Oslash | 0x00D8 | Latin capital letter O with stroke. |
| Scaron | 0x0160 | Latin capital letter S with caron. |
| Ugrave | 0x00D9 | Latin capital letter U with grave. |
| Uacute | 0x00DA | Latin capital letter U with acute. |
| Ucirc | 0x00DB | Latin capital letter U with circumflex. |
| Uuml | 0x00DC | Latin capital letter U with diaeresis. |
| Yacute | 0x00DD | Latin capital letter Y with acute. |
| Yuml | 0x0178 | Latin capital letter Y with diaeresis. |
| THORN | 0x00DE | Latin capital letter THORN. |
| szlig | 0x00DF | Latin small letter sharp s. |
| agrave | 0x00E0 | Latin small letter a with grave. |
| aacute | 0x00E1 | Latin small letter a with acute. |
| acirc | 0x00E2 | Latin small letter a with circumflex. |
| atilde | 0x00E3 | Latin small letter a with tilde. |
| auml | 0x00E4 | Latin small letter a with diaeresis. |
| aring | 0x00E5 | Latin small letter a with ring above. |
| aelig | 0x00E6 | Latin small ligature ae. |
| ccedil | 0x00E7 | Latin small letter c with cedilla. |
| egrave | 0x00E8 | Latin small letter e with grave. |
| eacute | 0x00E9 | Latin small letter e with acute. |
| ecirc | 0x00EA | Latin small letter e with circumflex. |
| euml | 0x00EB | Latin small letter e with diaeresis. |
| igrave | 0x00EC | Latin small letter i with grave. |
| iacute | 0x00ED | Latin small letter i with acute. |
| icirc | 0x00EE | Latin small letter i with circumflex. |
| iuml | 0x00EF | Latin small letter i with diaeresis. |
| eth | 0x00F0 | Latin small letter eth. |
| ntilde | 0x00F1 | Latin small letter n with tilde. |
| oelig | 0x0153 | Latin small ligature oe. |
| ograve | 0x00F2 | Latin small letter o with grave. |
| oacute | 0x00F3 | Latin small letter o with acute. |
| ocirc | 0x00F4 | Latin small letter o with circumflex. |
| otilde | 0x00F5 | Latin small letter o with tilde. |
| ouml | 0x00F6 | Latin small letter o with diaeresis. |
| oslash | 0x00F8 | Latin small letter o with stroke. |
| scaron | 0x0161 | Latin small letter s with caron. |
| ugrave | 0x00F9 | Latin small letter u with grave. |
| uacute | 0x00FA | Latin small letter u with acute. |
| ucirc | 0x00FB | Latin small letter u with circumflex. |
| uuml | 0x00FC | Latin small letter u with diaeresis. |
| yacute | 0x00FD | Latin small letter y with acute. |
| yuml | 0x00FF | Latin small letter y with diaeresis. |
| thorn | 0x00FE | Latin small letter thorn. |
| fnof | 0x0192 | Latin small f with hook = function. |
| Alpha | 0x0391 | Greek capital letter alpha. |
| Beta | 0x0392 | Greek capital letter beta. |
| Gamma | 0x0393 | Greek capital letter gamma. |
| Delta | 0x0394 | Greek capital letter delta. |
| Epsilon | 0x0395 | Greek capital letter epsilon. |
| Zeta | 0x0396 | Greek capital letter zeta. |
| Eta | 0x0397 | Greek capital letter eta. |
| Theta | 0x0398 | Greek capital letter theta. |
| Iota | 0x0399 | Greek capital letter iota. |
| Kappa | 0x039A | Greek capital letter kappa. |
| Lambda | 0x039B | Greek capital letter lambda. |
| Mu | 0x039C | Greek capital letter mu. |
| Nu | 0x039D | Greek capital letter nu. |
| Xi | 0x039E | Greek capital letter xi. |
| Omicron | 0x039F | Greek capital letter omicron. |
| Pi | 0x03A0 | Greek capital letter pi. |
| Rho | 0x03A1 | Greek capital letter rho. |
| Sigma | 0x03A3 | Greek capital letter sigma. |
| Tau | 0x03A4 | Greek capital letter tau. |
| Upsilon | 0x03A5 | Greek capital letter upsilon. |
| Phi | 0x03A6 | Greek capital letter phi. |
| Chi | 0x03A7 | Greek capital letter chi. |
| Psi | 0x03A8 | Greek capital letter psi. |
| Omega | 0x03A9 | Greek capital letter omega. |
| alpha | 0x03B1 | Greek small letter alpha. |
| beta | 0x03B2 | Greek small letter beta. |
| gamma | 0x03B3 | Greek small letter gamma. |
| delta | 0x03B4 | Greek small letter delta. |
| epsilon | 0x03B5 | Greek small letter epsilon. |
| zeta | 0x03B6 | Greek small letter zeta. |
| eta | 0x03B7 | Greek small letter eta. |
| theta | 0x03B8 | Greek small letter theta. |
| iota | 0x03B9 | Greek small letter iota. |
| kappa | 0x03BA | Greek small letter kappa. |
| lambda | 0x03BB | Greek small letter lambda. |
| mu | 0x03BC | Greek small letter mu. |
| nu | 0x03BD | Greek small letter nu. |
| xi | 0x03BE | Greek small letter xi. |
| omicron | 0x03BF | Greek small letter omicron. |
| pi | 0x03C0 | Greek small letter pi. |
| rho | 0x03C1 | Greek small letter rho. |
| sigmaf | 0x03C2 | Greek small letter final sigma. |
| sigma | 0x03C3 | Greek small letter sigma. |
| tau | 0x03C4 | Greek small letter tau. |
| upsilon | 0x03C5 | Greek small letter upsilon. |
| phi | 0x03C6 | Greek small letter phi. |
| chi | 0x03C7 | Greek small letter chi. |
| psi | 0x03C8 | Greek small letter psi. |
| omega | 0x03C9 | Greek small letter omega. |
| thetasym | 0x03D1 | Greek small letter theta symbol. |
| upsih | 0x03D2 | Greek upsilon with hook symbol. |
| piv | 0x03D6 | Greek pi symbol. |
| hellip | 0x2026 | Horizontal ellipsis = three dot leader. |
| prime | 0x2032 | Prime = minutes = feet. |
| Prime | 0x2033 | Double prime = seconds = inches. |
| oline | 0x203E | Overline = spacing overscore. |
| weierp | 0x2118 | Script capital P = power set. |
| image | 0x2111 | Blackletter capital I = imaginary part. |
| real | 0x211C | Blackletter capital R = real part symbol. |
| alefsym | 0x2135 | Alef symbol = first transfinite cardinal. |
| larr | 0x2190 | Leftwards arrow. |
| uarr | 0x2191 | Upwards arrow. |
| rarr | 0x2192 | Rightwards arrow. |
| darr | 0x2193 | Downwards arrow. |
| harr | 0x2194 | Left right arrow. |
| crarr | 0x21B5 | Downwards arrow with corner leftwards. |
| lArr | 0x21D0 | Leftwards double arrow. |
| uArr | 0x21D1 | Upwards double arrow. |
| rArr | 0x21D2 | Rightwards double arrow. |
| dArr | 0x21D3 | Downwards double arrow. |
| hArr | 0x21D4 | Left right double arrow. |
| lt | 0x003C | Less-than symbol. |
| gt | 0x003E | Greater-than symbol. |
| le | 0x2264 | Less-than-or-equal-to symbol. |
| ge | 0x2265 | Greater-than-or-equal-to symbol. |
| ne | 0x2260 | Not-equal-to symbol. |
| times | 0x00D7 | Multiplication sign. |
| divide | 0x00F7 | Division sign. |
| plusmn | 0x00B1 | Plus-or-minus sign. |
| not | 0x00AC | Not sign. |
| sup1 | 0x00B9 | Superscript digit one. |
| sup2 | 0x00B2 | Superscript digit two. |
| sup3 | 0x00B3 | Superscript digit three. |
| forall | 0x2200 | For all. |
| part | 0x2202 | Partial differential. |
| exist | 0x2203 | There exists. |
| empty | 0x2205 | Empty set = null set = diameter. |
| nabla | 0x2207 | Nabla = backward difference. |
| isin | 0x2208 | Element of. |
| notin | 0x2209 | Not an element of. |
| ni | 0x220B | Contains as member. |
| prod | 0x220F | N-ary product = product sign. |
| sum | 0x2211 | N-ary summation. |
| minus | 0x2212 | Minus sign. |
| lowast | 0x2217 | Asterisk operator. |
| radic | 0x221A | Square root = radical sign. |
| prop | 0x221D | Proportional to. |
| infin | 0x221E | Infinity. |
| ang | 0x2220 | Angle. |
| and | 0x2227 | Logical and = wedge. |
| or | 0x2228 | Logical or = vee. |
| cap | 0x2229 | Intersection = cap. |
| cup | 0x222A | Union = cup. |
| int | 0x222B | Integral. |
| there4 | 0x2234 | Therefore. |
| sim | 0x223C | Tilde operator = varies with = similar to. |
| cong | 0x2245 | Approximately equal to. |
| asymp | 0x2248 | Almost equal to = asymptotic to. |
| equiv | 0x2261 | Identical to. |
| sub | 0x2282 | Subset of. |
| sup | 0x2283 | Superset of. |
| nsub | 0x2284 | Not a subset of. |
| sube | 0x2286 | Subset of or equal to. |
| supe | 0x2287 | Superset of or equal to. |
| oplus | 0x2295 | Circled plus = direct sum. |
| otimes | 0x2297 | Circled times = vector product. |
| perp | 0x22A5 | Up tack = orthogonal to = perpendicular. |
| sdot | 0x22C5 | Dot operator. |
| lceil | 0x2308 | Left ceiling = apl upstile. |
| rceil | 0x2309 | Right ceiling. |
| lfloor | 0x230A | Left floor = apl downstile. |
| rfloor | 0x230B | Right floor. |
| lang | 0x2329 | Left-pointing angle bracket = bra. |
| rang | 0x232A | Right-pointing angle bracket = ket. |
| frac14 | 0x00BC | Fraction one quarter. |
| frac12 | 0x00BD | Fraction one half. |
| frac34 | 0x00BE | Fraction three quarters. |
| frasl | 0x2044 | Fraction slash. |
| zwnj | 0x200C | Zero width non-joiner. |
| zwj | 0x200D | Zero width joiner. |
| lrm | 0x200E | Left-to-right mark. |
| rlm | 0x200F | Right-to-left mark. |
| orepl | 0xFFFC | Object Replacement Character. (Not HTML.) |
| repl | 0xFFFD | Replacement Character. (Not HTML.) |