The Haxial Programming Language
Variables

A "variable" in a program is like a named slot or container for storing something such as a number or a piece of text. The "value" of a variable is whatever is currently stored inside it. It is called a "variable" because its value changes (or can change) while the program runs, as opposed to a "constant", which has the same value during the entire execution of the program.

Variables can be defined within a Command Implementation using the "Variable" (shortcut "var") command. For example:

Variable chippies, TUInt, 123;

Or using shortcuts:

var chippies, uint, 123;

That command defines a variable named "chippies" of type "TUInt" with an initial value of 123. The name cannot contain spaces or dots or symbols. It can only contain the characters "A" to "Z" (in uppercase or lowercase), the numeric digits "0" to "9", and the underscore character ("_").

To help avoid naming conflicts, the variable name cannot begin with an uppercase letter (A-Z) or a numeric digit (0-9). However the name can contain uppercase letters and numeric digits AFTER the first character. The first character can be a lowercase letter (a-z) or underscore. Words that begin with a numeric digit are interpreted as numbers not names.

Variable names are case-sensitive. For example, "foobar" and "fooBar" are considered to be different names, so it is possible to have 2 variables, one named "foobar" and the other named "fooBar". The minimum size of a name is 1 character. The maximum size is not defined exactly but is within the range 63 to 255 characters (inclusive).

Some examples of valid variable names:

foobar
fooBar
foo_bar
fooBAR3
love2eat

Some examples of invalid variable names:

Foobar
FooBar
FOOBAR
foo bar
foo.bar
foo-bar
2Hot4U
porn*
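
The naming rules above are simple enough to check mechanically. The following is a minimal sketch in Python (not Haxial source), using a hypothetical helper named is_valid_variable_name. It does not check the maximum length, because the text only bounds that limit between 63 and 255 characters.

```python
import re

# First character: lowercase letter (a-z) or underscore.
# Remaining characters: letters of either case, digits, or underscore.
_NAME_PATTERN = re.compile(r"[a-z_][A-Za-z0-9_]*")


def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` follows the naming rules described above."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["foobar", "fooBAR3", "love2eat", "Foobar", "foo bar", "2Hot4U"]:
        print(candidate, is_valid_variable_name(candidate))
```

The empty string is rejected because the pattern requires at least the first character, which matches the stated minimum size of 1 character.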

The second parameter of the "Variable" command defines the type of the variable. Most commonly the variable is a type of integer. An integer is a whole number, thus an integer variable is incapable of storing a fractional portion or any digits after the radix point (decimal point or binary point). For example, an integer can be 18 but not 18.4. If you perform division with integers, the fractional part of the answer is simply lost (if the answer is 18.4, it becomes 18).

Other variable types include floating-point (capable of storing a fractional portion), text, array, a reference to some target, etc. The second parameter of the "var" command (the type) is actually a variable/attribute type expression. For more information, see the Variable/Attribute Types section of this documentation.

A defined variable is accessible/usable from the point where it is defined (from the "Variable" command) to the end of the Command Implementation, regardless of whether it is defined inside an "If" section (or inside a loop). It is not permitted to have 2 variables with the same name defined in the same Command Implementation, regardless of whether they are defined inside different "If" sections.

Setting a Variable

To set a variable to a value (meaning to change the value in a variable to a different value), use the "Set" (or "set") command. The first parameter of the "Set" command is the variable to change, and the second parameter is the value to change/set the variable to.

A variable can also be set to the value of another variable. The value/content of the other variable is copied/duplicated to the variable being set. The variables do not become linked to each other.

The "Set" command can also be used to set a variable to the result of an expression (an arithmetic calculation). For more information about expressions, see the Expressions section of the documentation.

Here are some examples of using the "Set" command:

{Set to literal constant value.}
Set chippies, 666;

{Set to value of another variable.}
Set chippies, cheese;

{Set to result of calculation.}
Set chippies, a + (b / c) * 3;

The "Set" command has an optional shortcut syntax using the "=" symbol. The following 2 lines of source code are valid and equivalent:

Set n, 123;
n = 123;

The operand on the right side of the "=" can be an expression (same as the normal "Set" command). The operand on the left side of the "=" cannot be an expression, it can only be a name (a multi-part name containing dots is also accepted). The compiler distinguishes between the normal command syntax and the "=" shortcut syntax simply by checking whether the second syntax element is a "=" symbol operator.

The shortcut symbol "#=" can be used to set the reference or pointer contained in a variable that is of a reference or pointer type (as opposed to setting the target of the reference/pointer). If the operand on the left side is a reference variable, then it is equivalent to a "SetRef" command. If the operand on the left side is a pointer variable, then it is equivalent to a "SetPtr" command. The last 2 lines of the following example are equivalent:

Variable x, RefN[TUInt32];
Variable y, RefN[TUInt32];
SetRef x, y;
x #= y;

Another optional shortcut syntax combines addition and assignment (setting) of the same variable. The following 3 lines of source code are valid and equivalent:

Set n, n + 3;
n = n + 3;
n += 3;

The shortcut also exists for subtraction, multiplication, and some other operations. Following is a complete list of the shortcut symbol operators combining assignment with another operation:

Description               Symbol
Add                       +=
Subtract                  -=
Multiply                  *=
Divide                    /=
Division Remainder        %=
Bit AND                   &=
Bit XOR                   ^=
Bit OR                    |=
Bit Shift Left/Up         <<=
Bit Shift Right/Down      >>=

Shortcuts for incrementing a variable (adding 1) and decrementing a variable (subtracting 1) also exist. The "Increment" (or "incr") command has 1 parameter, the variable to add 1 to. The "Decrement" (or "decr") command also has 1 parameter, the variable to subtract 1 from. An even shorter shortcut uses "++" and "--" symbol operators written after a variable name. The following 6 lines of source code are valid and equivalent:

Set n, n + 1;
Increment n;
incr n;
n = n + 1;
n += 1;
n++;

And likewise for decrementing:

Set n, n - 1;
Decrement n;
decr n;
n = n - 1;
n -= 1;
n--;

No space is required between the "++" and ";". The semicolon (";") is considered to be a non-combining symbol character (like brackets), and thus "n++;" is considered to be 3 syntax elements ("n", "++", ";"). The compiler distinguishes between the normal command syntax and the "++" (or "--") shortcut syntax simply by checking whether the second syntax element is a "++" symbol operator.

The symbol operators "++", "--", "=", "+=", etc represent "Set" commands, and are not considered to be functions, and are thus not valid within expressions (and would be confusing or ambiguous if used within an expression). For example, the following 5 commands are invalid:

Set n, p * q++;   {INVALID}
Set i, i++;       {INVALID}
i = i++ * 2;      {INVALID}
Set k, n *= 3;    {INVALID}
Set j, n = m / 2; {INVALID}

Signed versus Unsigned

The integer variable types are classified as signed or unsigned. An unsigned integer variable is incapable of storing negative numbers (it can only store positive or unsigned numbers). A signed integer variable is capable of storing both negative and positive numbers.

Strictly speaking, unsigned is not called "positive only" because you could use an unsigned variable for negative numbers by interpreting its entire range of values as negative instead of positive. The point is that an unsigned variable can represent only one or the other, whereas a signed variable can represent both.

Because unsigned does not need to remember whether the integer is positive or negative, it can store numbers twice as large as a signed integer using the same amount of storage space.

Use unsigned unless you specifically need the capability of negative numbers, because unsigned is simpler, slightly faster (for example, in detection of overflow), tends to be more secure and reliable, and can store bigger numbers.

Security example: A security check might need to verify that an incoming integer is less than a certain maximum permitted value (such as the size of a buffer). This could be implemented with a simple "if a < b" statement/command. However, if the integer is signed, a negative value passes the check even though it was never expected, and it can then cause problems. If the integer is unsigned, negative numbers are impossible, so the single comparison is sufficient.
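
The difference can be illustrated outside Haxial. The following Python sketch assumes a 32-bit value arriving from an untrusted source, two's-complement representation for the signed interpretation, and a hypothetical buffer size of 16. All names are illustrative only. It interprets the same bit pattern once as signed and once as unsigned, then applies the "a < b" check.

```python
BUFFER_SIZE = 16  # hypothetical maximum permitted value


def as_unsigned32(bits: int) -> int:
    """Interpret the low 32 bits as an unsigned integer."""
    return bits & 0xFFFFFFFF


def as_signed32(bits: int) -> int:
    """Interpret the low 32 bits as a two's-complement signed integer (assumed)."""
    value = bits & 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def passes_check(index: int) -> bool:
    """The naive check from the text: only the upper bound is tested."""
    return index < BUFFER_SIZE


incoming = 0xFFFFFFFF  # bit pattern supplied by an untrusted sender

signed_ok = passes_check(as_signed32(incoming))      # -1 < 16, so the check passes
unsigned_ok = passes_check(as_unsigned32(incoming))  # 4294967295 < 16 is false
```

With the signed interpretation the check would also need a lower-bound test; with the unsigned interpretation the single comparison rejects the value.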

Integer Types

Following is a table of integer variable types. The first column contains the official name of the type, and the second column contains a shortcut name that can also be used. The Size column indicates the size in bits, and in brackets the size in bytes (1 byte = 8 bits).

Name      Shortcut  Size    Minimum Value          Maximum Value
TUInt8    uint8     8 (1)   0                      255
TSInt8    sint8     8 (1)   -128                   +127
TUInt16   uint16    16 (2)  0                      65535
TSInt16   sint16    16 (2)  -32768                 +32767
TUInt32   uint32    32 (4)  0                      4294967295
TSInt32   sint32    32 (4)  -2147483648            +2147483647
TUInt64   uint64    64 (8)  0                      18446744073709551615
TSInt64   sint64    64 (8)  -9223372036854775808   +9223372036854775807
TUInt     uint      32 (4)  0                      4294967295
TSInt     sint      32 (4)  -2147483648            +2147483647
TBool     bool      1       0                      1
TBool8    bool8     8 (1)   0                      255
TChar8    char8     8 (1)   0                      255
TChar16   char16    16 (2)  0                      65535
TChar32   char32    32 (4)  0                      4294967295
TChar     char      32 (4)  0                      4294967295

"TUInt8" is an 8-bit unsigned integer, and is the smallest addressable unit of memory. "TUInt" is the same as "TUInt32", and "TSInt" is the same as "TSInt32".

"TChar" and "TChar32" are the same as "TUInt32", but are used when the integer is a Unicode character number (for text). "TChar16" is the same as "TUInt16", but is used when the integer is a Unicode UTF16 value. "TChar8" is the same as "TUInt8", but is used when the integer is a Unicode UTF8 value (a byte in UTF8 text).

"TUInt" ("TUInt32") is the most commonly used variable type. Do not use 64-bit unless you actually need to be able to store numbers bigger than a 32-bit type can store, because 64-bit requires extra memory and processing compared to 32-bit.

The minimum and maximum values may seem like strange arbitrary numbers, but that is only because the computer is based on binary whereas the above table gives the minimum and maximum in decimal. For example, the maximum value of TUInt16 is 65535 in decimal or 1111111111111111 in binary, and that is equal to (2**16)-1 (meaning 2 to the power of 16, then minus 1).
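
The limits in the table can be reproduced from the bit width alone. The following Python sketch (not Haxial source) uses two hypothetical helper functions. The signed formula assumes two's-complement representation, which is consistent with the asymmetric signed ranges listed in the table.

```python
from typing import Tuple


def unsigned_range(bits: int) -> Tuple[int, int]:
    """Minimum and maximum of an unsigned integer with the given bit width."""
    return 0, 2**bits - 1


def signed_range(bits: int) -> Tuple[int, int]:
    """Minimum and maximum of a two's-complement signed integer (assumed)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(width, unsigned_range(width), signed_range(width))
```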

Silent Upgrading of Size

The compiler is permitted to silently upgrade a local variable or parameter in a Command Implementation to a bigger size (for example, TUInt16 upgraded to TUInt32), if it decides that the program will execute faster with the bigger size. Thus the specified size is actually a minimum required size, not an exact size.

For example, if you define a local variable to be TUInt8, the compiler will not allow the variable to be set to a constant value outside the range of TUInt8 (0-255), but at runtime the variable might actually be a TUInt32 variable with 32-bit arithmetic being performed (and with 32-bit overflow checking), and it might contain a number bigger than 255.

You should not assume that it is impossible for a TUInt8 variable to contain a number greater than 255. Also, you should not rely on an ability of a TUInt8 variable to store numbers greater than 255.

This upgrading only applies to local variables and parameters in a Command Implementation. It does not apply to records. A TUInt8 attribute in a record should NOT be silently upgraded to a bigger size. The amount of extra memory used by upgrading a local variable is insignificant, but in the case of records it may be significant and is thus not done.

The main reason for the possible upgrading of size is that some CPU architectures do not have any 8-bit or 16-bit registers for performing calculations. For example, the PowerPC architecture has only 32-bit or 64-bit registers.

Booleans

A boolean variable is an integer variable intended to represent 1 of 2 possible values: true or false (or yes and no, or on and off).

A single binary bit in a computer is capable of storing 1 of 2 possible values (0 or 1) and is thus ideal for boolean variables. However, in practice, it is not always efficient or practical for a computer to access a single bit at a time, so a boolean variable may actually be implemented as an 8-bit or 32-bit variable. In these cases, zero means false and non-zero (any other number) means true, with 1 being the normal representation of true.

Thus the "TBool" variable type may be capable of storing numbers >1 but only 0 and 1 are guaranteed. A program should interpret a "TBool" variable as having only 2 states even if in practice other numbers are possible.

The size of the "TBool" type is not precisely defined. If there is a situation where you need the size to be precisely defined, then you can use the "TBool8" type that is guaranteed to be 8 bits in size when used as a record attribute or for a numeric array. There is no "TBool1" type because of the excessive difficulty of supporting it and the lack of advantage gained by supporting it.

When testing whether a boolean value is true, do not check whether it is equal to 1. Instead check whether it is not zero. Remember zero means false, and non-zero means true.
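
The reason for this rule can be sketched in Python (not Haxial source), using a plain integer to stand in for a boolean that happens to hold a non-zero value other than 1. Both function names are hypothetical.

```python
def is_true(flag: int) -> bool:
    """Recommended test: any non-zero value counts as true."""
    return flag != 0


def is_true_fragile(flag: int) -> bool:
    """Fragile test: treats every value other than 1 as false."""
    return flag == 1


flag = 2  # non-zero, so it means true, but it is not the usual 1

robust_result = is_true(flag)           # True
fragile_result = is_true_fragile(flag)  # False, which misreads the flag
```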

Initial Value

The third parameter of the "var" command is the initial value of the variable. The variable will be pre-initialized to the specified constant value. Pre-initialization of all variables is required because it improves reliability and performance (by a small amount).

If variables could be defined as uninitialized (no initial value specified), the initial value of the variable would be a semi-random garbage value that may or may not change each time the program is run or the command is invoked. This would allow the programmer to make the mistake of using/reading the value in a variable before it is set to something, resulting in a garbage value being fed into the program and typically causing a malfunction. Having all variables pre-initialized eliminates the possibility of this programming mistake. By comparison, in the C++ language, variables are uninitialized by default, and may optionally be initialized.

Note that when the compiler encounters a "var" command, it does NOT generate an instruction for that command. The "var" command provides information for the compiler; it does not exist as any instruction in the final executable program file.

The variable is NOT set to the initial value at the point in the program where the "var" command appears. Rather all the variables are pre-initialized in a single step when the Command Implementation (containing the variable) is invoked.

Pre-initialization of variables occurs every time the Command Implementation is invoked, not only the first time, and not when the program is loaded for execution. Thus the variables are reset every time the Command Implementation is invoked (or created and destroyed every time).

The initial value must be a constant (a fixed/unchanging value that can be determined at compile-time). It cannot be another variable, and cannot be an expression that can only be calculated at runtime (an expression that can be calculated at compile-time is acceptable).

If the third parameter of the "var" command (the initial value) is omitted, then the initial value will be zero or empty or null (depending on the type of variable). If the possible range of values for the variable includes both zero and null, then null (in preference to zero) is used when the initial value parameter is omitted.

Shortcuts

**** TO DO: Support a shortcut that allows a "var" command to be combined with the invocation of another command that has an output parameter. For example if "TestCommand" has 2 parameters, the first an input integer, and the second an output integer, then this:

TestCommand 123, @pizzaNum;

Would be the same as writing:

var pizzaNum, uint, 0;
TestCommand 123, pizzaNum;

Where the variable type is obtained from the type of the output parameter, and the initial value is always zero or null.

The "@" shortcut can only be used with output parameters, not input parameters and not input/output parameters.

**** TO DO: "set" and "Create" commands have output parameters, and thus the "@" shortcut can be used with them. Give examples.

**** TO DO: Consider using "#" or "+" or "Var[pizzaNum]" instead of "@", because "@" might be better used to mean "address of".

Floating-Point Types

Floating-point variables are capable of storing positive and negative numbers with a fractional portion (for example, "152.9734"). The floating-point variable types are defined by the IEEE-754 standard. There is no unsigned floating-point type.

Following is a table of floating-point variable types. The first column contains the official name of the type, and the second column contains a shortcut name that can also be used. The "Rx" column contains the radix/base of the type (binary or decimal). "Size" is the size in bits (significand + exponent + sign). "Exp Min" is minimum exponent value. "Exp Max" is the maximum exponent value. "Acc." is the accuracy, given as the number of decimal digits (approximate for radix 2 types).

Name          Shortcut   Rx  Size            Exp Min  Exp Max  Acc.
TFloat16      float16    2   10+5+1=16       -14      +15      3
TFloat32      float32    2   23+8+1=32       -126     +127     7
TFloat64      float64    2   52+11+1=64      -1022    +1023    16
TFloat128     float128   2   112+15+1=128    -16382   +16383   34
TDecFloat32   dfloat32   10  20+11+1=32      -95      +96      7
TDecFloat64   dfloat64   10  50+13+1=64      -383     +384     16
TDecFloat128  dfloat128  10  110+17+1=128    -6143    +6144    34

The name "floating-point" refers to the fact that the radix point (or decimal point or binary point) can be placed anywhere within the digits of the number. The position of the radix point is indicated separately in the internal representation of a floating-point number. Floating-point numbers can be thought of as a computer equivalent of "scientific notation" (a × 10**b, meaning a multiplied by 10 to the power of b). The advantage of floating-point over fixed-point (and integer) variables is that it supports a much wider range of values. The disadvantage is that it has higher processing requirements.

"Scientific notation" is based on decimal (a × 10**b). IEEE-754 floating-point for computers is usually based on binary (a × 2**b) but decimal types may also be supported.

Some numbers cannot be represented exactly in binary floating-point regardless of the size/precision of the type. For example, the fractions 1/3 and 1/5, and the decimal numbers 0.1 and 0.01, cannot be represented exactly in binary floating-point. The nearest representable number is used instead. Unfortunately, the result of an arithmetic calculation (such as 0.1 squared) is sometimes a number near the exact answer but not the nearest representable one.
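
Both effects can be observed directly. The following Python sketch (not Haxial source) relies on Python's float being an IEEE-754 64-bit binary type, which is the case on typical platforms and corresponds to TFloat64 in the table above. The variable names are illustrative only.

```python
from fractions import Fraction

# Fraction(0.1) recovers the exact value of the binary number stored for 0.1.
stored = Fraction(0.1)
exact = Fraction(1, 10)

is_exact = stored == exact        # False: 0.1 has no exact binary representation
error = stored - exact            # tiny, but not zero

# 0.1 squared is mathematically 0.01, yet the computed product is not the
# floating-point number nearest to 0.01.
squared_matches = (0.1 * 0.1) == 0.01   # False

# By contrast, 0.5 is a power of 2 and is stored exactly.
half_is_exact = Fraction(0.5) == Fraction(1, 2)   # True
```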

Also note that if the same variable is used for many cumulative arithmetic operations, rounding errors can accumulate, making some algorithms produce increasingly inaccurate results the more iterations they are run.
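
A minimal Python sketch of this accumulation (not Haxial source), again assuming Python's float is an IEEE-754 64-bit binary type. The helper name accumulate is hypothetical. It adds 0.1 repeatedly and compares the drift after 10 additions with the drift after one million additions.

```python
def accumulate(step: float, count: int) -> float:
    """Add `step` to a running total `count` times, as a naive loop would."""
    total = 0.0
    for _ in range(count):
        total += step
    return total


# Ten additions of 0.1 should give exactly 1.0, but do not.
small_drift = abs(accumulate(0.1, 10) - 1.0)

# One million additions should give exactly 100000.0; the drift is larger.
large_drift = abs(accumulate(0.1, 10**6) - 100000.0)

drift_grows = large_drift > small_drift   # True
```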

Thus IEEE-754 floating-point is best used only in situations where an exact answer is not required. If safety and reliability are important, you should think of IEEE-754 floating-point as giving only approximate or nearly-exact answers.

Note that some calculations initially appear to require use of floating-point, but on closer inspection can be done using only integers. For example the formula x*(n/d) with x=660, n=25, d=100 appears to require floating-point because n/d is 0.25. However, the formula can be rearranged to (x*n)/d to produce the same result using only integers (provided that x*n will not cause arithmetic overflow, and the result being rounded to a whole number is acceptable).
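
The rearrangement can be checked with the numbers from the text. This is a minimal Python sketch (not Haxial source), assuming non-negative inputs. Python's "//" operator stands in for integer division that discards the fractional part, and the hypothetical helper scale checks the 32-bit limit by hand because Python integers do not overflow.

```python
UINT32_MAX = 4294967295  # maximum TUInt32 value from the integer table


def scale(x: int, n: int, d: int) -> int:
    """Compute x*(n/d) as (x*n)/d using only integer arithmetic."""
    product = x * n
    if product > UINT32_MAX:
        # The intermediate product must fit in the integer type being used.
        raise OverflowError("x*n does not fit in 32 bits")
    return product // d  # the fractional part is discarded


x, n, d = 660, 25, 100

naive = x * (n // d)         # n // d is 0 with integers, so the result is 0
rearranged = scale(x, n, d)  # 16500 // 100 is 165
```

Dividing first loses the fraction 0.25 entirely, whereas multiplying first keeps the information until the final division.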

Conflicting Variable Names

2 variables in the same Command Implementation cannot have the same name. 2 variables in different Command Implementations can have the same name. Consider the following valid example:

module MyProgram;
    cmd BakeChippies;
        var chippies, uint, 123;
    ecmd;

    cmd EatChippies;
        var chippies, uint, 123;
    ecmd;
emodule;

The full path name of the first "chippies" variable is "MyProgram.BakeChippies.chippies", and the full path name of the second is "MyProgram.EatChippies.chippies". Because the full path names are different, they do not conflict.

It is possible for the name of a variable to be the same as a shortcut name of one of the standard intrinsic commands. For example, "loc" is the shortcut name of the command for defining a location for use with the "Goto" command. It is possible and accepted to define a variable with the name "loc", for example:

module MyProgram;
    cmd BakeChippies;
        var loc, uint, 123;
        set loc, 456;

        goto testLocation;
        loc testLocation;
    ecmd;
emodule;

The variable named "loc" and the command named "loc" do not conflict because the full path names are different. The full path name of the variable "loc" is "MyProgram.BakeChippies.loc" (the parent is "MyProgram.BakeChippies"). Whereas the full path name of the command "loc" is "Location" or shortcut "loc" (the parent is the base/root of the namespace).

2 items can have the same name if they are children of different parents. This is similar to a hierarchical file system, where you can have 2 files with the same name provided they are in different folders.