When a program is running, the temporary data that it creates and uses in the course of its operation is stored in fast Random-Access Memory (RAM). The data may also be stored elsewhere (for example, on a disk), but it is primarily stored in RAM.
A pointer is a value that points to some other piece of data in RAM. A pointer indicates a location where a piece of data is stored. In other words, a pointer is the address of some piece of data. Thus a pointer is like the street address of a house. The data that the pointer points to is known as the target of the pointer.
Pointers are much the same concept as references, except lower-level and unchecked. Pointers are intended for low-level tasks, system implementation, and special optimizations. Normal application programs should use references, not pointers.
Pointers increase the risk of security and reliability defects. To protect security and reliability, a system may deny access to pointers in all programs that run in that system, except for privileged/authorized programs or code modules.
References use automatic memory management, whereas pointers use manual memory management. The automatic memory management system of references operates blithely unaware of pointers. The existence of a pointer to a record does not prevent deletion of that record. The removal of the last pointer to a record does not cause automatic deletion of that record.
It is possible for a pointer to be invalid. A pointer is invalid if it points to a non-existent or deleted record/object/variable, or points to the wrong type of data, or points to a random location. If a pointer is not used correctly and carefully, it can cause a crash, obscure malfunction, or security hole.
Pointers are always nullable. Attempting to access the target of a null pointer can cause a crash (whereas for a reference, it would cause a recoverable error).
An instance of a pointer is actually a 32-bit or 64-bit unsigned integer containing a RAM address (whether it is 32-bit or 64-bit depends on the CPU and OS). A RAM address is the ordinal of a byte in RAM. You can view RAM as a big long sequence/array of bytes numbered from 0 to the total number of bytes of RAM minus 1.
The ordinal of the first byte in RAM is 0. The ordinal of the second byte in RAM is 1. The ordinal of the third byte in RAM is 2. Thus if the value of a pointer is 2, it points to the third byte in RAM.
If a piece of data occupies multiple bytes, and a pointer points to that piece of data, then the pointer actually contains the ordinal of the first byte of that data. For example, a TUInt32 (an unsigned 32-bit integer) is 4 bytes in size. If it is stored in RAM in byte ordinals 12, 13, 14, and 15, then a pointer to this TUInt32 will contain the number 12.
A pointer can be retargeted. This means we make the pointer point to some other piece of data (usually of the same type). For example, if we changed the RAM address in the pointer from 12 to 16, then it would be pointing to a different TUInt32 located immediately after the original one.
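The byte-ordinal model described above can be sketched in Python (used here only for illustration; it is not the language this documentation describes). A `bytearray` stands in for RAM, and an index into it plays the role of a RAM address. The names `ram`, `write_u32`, and `read_u32` are hypothetical, and the Little Endian byte layout is an arbitrary choice for the sketch.

```python
import struct

# A small bytearray stands in for RAM; an index into it acts as a RAM address.
ram = bytearray(32)

def write_u32(address, value):
    """Store an unsigned 32-bit integer in the 4 bytes starting at `address`."""
    struct.pack_into("<I", ram, address, value)

def read_u32(address):
    """Read the unsigned 32-bit integer whose first byte is at `address`."""
    return struct.unpack_from("<I", ram, address)[0]

write_u32(12, 1000)   # occupies byte ordinals 12, 13, 14, 15
write_u32(16, 2000)   # the next 32-bit integer, immediately after

pointer = 12                        # a pointer holds only the ordinal of the first byte
first_target = read_u32(pointer)    # reads the integer stored at 12..15

pointer = 16                        # retargeting: only the address changes
second_target = read_u32(pointer)   # reads the integer stored at 16..19
```

Note that the model pointer is a plain integer: nothing in it records that the target is 4 bytes long or that it is an unsigned integer.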
The actual instance of a pointer does not contain anything other than the RAM address (byte ordinal). It does not contain the size of the target data, and it does not contain anything that identifies the type of the target data. The size and type of the target data are known only by implication, or sometimes by performing certain calculations.
A null pointer is a pointer that points to nothing, meaning a pointer that has no target. Null pointers are frequently convenient for various purposes. A pointer is considered to be null if it contains a RAM address of 0 (zero). Yes, zero is also the ordinal of the first byte in RAM. Thus zero has a dual meaning for pointers.
In practice, the dual meaning of ordinal 0 causes no problems because operating systems typically reserve the first several kilobytes (or more) of RAM and deny access to read or modify it. Thus 0 can safely be interpreted as a null pointer.
With assistance from the CPU, the operating system may use "virtual memory" techniques to cause pointers to be translated to different storage locations whenever they are used. For example, a pointer may contain an address of 12288, and the target data appears to be stored at address 12288, but it is actually stored at address 24576. 12288 is the virtual address, and 24576 is the actual address.
In practice, programmers can usually ignore the use of virtual memory by the system, because it is implemented in a way that makes it mostly invisible to application programs running in the system. The address translations are performed automatically behind the scenes. Thus programmers can continue to view RAM as a big long sequence/array of bytes, even if it is actually more complex than that in reality.
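The address translation in the example above (12288 becoming 24576) can be sketched in Python as a simple table lookup. This is only a model under stated assumptions: the page size of 4096 bytes, the `page_table` dictionary, and the `translate` function are hypothetical names chosen for the illustration, and a real system performs the equivalent work in hardware with help from the operating system.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration

# Hypothetical page table: virtual page number -> actual (physical) page number.
# Virtual page 3 starts at 12288; actual page 6 starts at 24576.
page_table = {3: 6}

def translate(virtual_address):
    """Translate a virtual address to an actual address, one page at a time."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise MemoryError(f"virtual page {page} is not mapped")
    return page_table[page] * PAGE_SIZE + offset

actual = translate(12288)
```

The offset within the page is preserved, so neighbouring bytes in the virtual view remain neighbours in the actual storage as long as they fall within the same page.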
Programmers should not assume that it is safe to read or modify a byte of RAM at any address. Usually the operating system requires that a program first allocate/reserve a range of byte ordinals before reading or modifying any byte in that range. Usually a program requests a certain count/number of bytes be allocated, and the system gives the program a pointer to the allocated range of bytes after deciding where it will be located.
A type expression is used to specify the type of a variable, parameter, etc. In a type expression, the "Ptr" operator is used to make a pointer to a type. Like the "Ref" operator, the "Ptr" operator has 1 operand, which indicates what type of data the pointer will be pointing to (the type of the target). The actual target instance is not specified in the type expression; only the type of the target is specified.
The following example type expression specifies a pointer to an unsigned 32-bit integer:
Ptr[TUInt32]
The following example specifies a pointer to a read-only (unmodifiable) unsigned 32-bit integer. This example is written twice, the second time using shortcuts.
Ptr[ReadOnly[TUInt32]]
Ptr ReadOnly uint32
It is possible (and sometimes useful) to define a pointer that points to another pointer. The following type expression is a pointer to a pointer to a TUInt32.
Ptr[Ptr[TUInt32]]
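As a rough illustration of what a pointer to a pointer means at the level of bytes, the following Python sketch models RAM as a `bytearray`. It assumes 32-bit pointers and a Little Endian layout purely for the sake of the example; the names `ram`, `write_u32`, and `read_u32` and the addresses 20 and 40 are hypothetical.

```python
import struct

ram = bytearray(64)  # stand-in for RAM; indexes act as addresses

def write_u32(address, value):
    """Store a 32-bit unsigned value (an integer or an address) at `address`."""
    struct.pack_into("<I", ram, address, value)

def read_u32(address):
    """Read the 32-bit unsigned value stored at `address`."""
    return struct.unpack_from("<I", ram, address)[0]

write_u32(40, 77)   # a 32-bit integer with value 77 lives at address 40
write_u32(20, 40)   # a pointer lives at address 20; its contents are the address 40
pp = 20             # a pointer to that pointer is simply the address 20

p = read_u32(pp)     # first target: the inner pointer
value = read_u32(p)  # ultimate target: the integer
```

Following the outer pointer once yields another address, and only the second step reaches the integer itself.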
When supplying a pointer as an operand to the intrinsic functions "Add", "Sub", "Mul", "Div", "BitOr", "Equal", "Less", "Minimum", and other related functions, the function uses the target of the pointer. Likewise for references (pointers and references are consistent in this regard).
In the following example, the variable "x" is set to the result of the addition of the target of "inA" and the target of "inB". The pointers themselves are not added, rather the targets of the pointers are added.
Command TestPtrs;
Parameter inA, Ptr[TUInt32];
Parameter inB, Ptr[TUInt32];
Variable x, TUInt32;
Set x, Add[inA,inB];
EndCommand;
Unlike references, pointers are not automatically checked for being null. The aforementioned functions do not check whether the pointer is null. Thus if the pointer is null, the function attempts to access the target regardless, and this typically causes the program to crash. It is the responsibility of the programmer to check if the pointer is null, if necessary.
The following example defines a Command Implementation named "Crash" with 1 parameter. This example invokes the Command Implementation defined in the previous example ("TestPtrs"). For the first parameter of "TestPtrs" ("inA"), "Crash" supplies its parameter named "inFoo". For the second parameter of "TestPtrs" ("inB"), "Crash" supplies a null pointer. The "Add" function in "TestPtrs" will typically cause the program to crash when it attempts to access the target of the null pointer.
Command Crash;
Parameter inFoo, Ptr[TUInt32];
TestPtrs inFoo, Null;
EndCommand;
The following functions can be used with variables that are of a pointer type.
| Function | Description |
|---|---|
| MakePtr[v] | Makes and returns a pointer to the specified variable, parameter, or attribute. The return type is Ptr[GetType[v]]. If the operand is a pointer, then this function returns a pointer to that pointer. |
| TargetOf[p] | The operand must be a pointer (or a reference). This function returns the target of the pointer (directly, not a copy of the target). It indicates that the target of the pointer should be used/read/modified, not the pointer itself. If the operand is a pointer to a pointer, the result is the first target, not the ultimate target. |
| TargetOf2[p] | Shortcut for TargetOf[TargetOf[p]]. The operand must be a pointer to a pointer, or a pointer to a pointer to a pointer, etc. |
| GetTarget[p] | The operand must be a pointer (or a reference). This function returns either a copy of the target of the pointer, or a read-only version of the target. Typically crashes if given a null pointer. |
| GetTargetSize[p] | The operand must be a pointer or reference. Returns the size (in bytes) of the target of the operand. Same as GetSize[TargetOf[p]]. |
| MovePtrFwd[p, i] | Returns a pointer equal to the first operand (a pointer) moved forward by the number of bytes specified by the second operand (an integer type). The return type is the same as the type of the first operand. The RAM address in the returned pointer is equal to the RAM address in the first operand plus the second operand. |
| MovePtrBack[p, i] | Returns a pointer equal to the first operand (a pointer) moved backward by the number of bytes specified by the second operand (an integer type). The return type is the same as the type of the first operand. The RAM address in the returned pointer is equal to the RAM address in the first operand minus the second operand. |
| IncrPtr[p] IncrPtr[p, i] | Returns a pointer equal to the first operand (a pointer) incremented a specified number of times, meaning moved forward by a multiple of the size of the target type. For example, if the type of the first operand is Ptr[TUInt32], and the second operand is 2, then the function returns a pointer equal to the first operand moved forward by 2×4 = 8 bytes (if the RAM address was 492 it becomes 500). The second operand is optional, and defaults to 1 if omitted. IncrPtr[p, i] is the same as MovePtrFwd[p, Mul[GetTargetSize[p], i]]. |
| DecrPtr[p] DecrPtr[p, i] | The opposite of the "IncrPtr" function. "DecrPtr" is the same as "IncrPtr" except that "DecrPtr" moves the pointer backwards whereas "IncrPtr" moves the pointer forwards. |
| Same[a, b, ...] | Any of the operands can be references, pointers, or variables. Returns true if all the operands refer to, point to, or are the same object. Compares the identity of the objects, not whether they contain equal data values. If all the operands are pointers, the behavior is the same as the "EqualPtr" function. |
| NotSame[a, b] | Shortcut for Not[Same[a, b]]. |
| EqualTarget[a, b, ...] | Each operand must be a pointer or a reference. Returns true if all the target values are equal (regardless of whether the pointers/references are equal). If an operand is a pointer to a pointer, the first target is used, not the ultimate target. |
| NotEqualTarget[a, b] | Shortcut for Not[EqualTarget[a,b]]. |
| EqualPtr[a, b, ...] | All the operands must be pointers. Returns true if all the pointers are equal to each other. Compares the pointers themselves (the RAM addresses), NOT the targets of the pointers. The target values and types are completely ignored. If pointers are equal, they point to the exact same target instance. If pointers are not equal, they point to different targets that potentially have equal values. If all the pointers are null, returns true. If any but not all of the pointers are null, returns false. |
| NotEqualPtr[a, b] | Shortcut for Not[EqualPtr[a,b]]. |
| LessPtr[a, b] | Both operands must be pointers. The target types can be different. Returns true if the address in the first operand is lower/less than the address in the second operand. "LessPtr" compares the pointers themselves and ignores the targets, whereas the normal "Less" function compares the targets. |
| GreaterPtr[a, b] | Both operands must be pointers. The target types can be different. Returns true if the address in the first operand is higher/greater than the address in the second operand. "GreaterPtr" compares the pointers themselves and ignores the targets, whereas the normal "Greater" function compares the targets. |
| LessEqualPtr[a, b] | Both operands must be pointers. The target types can be different. Returns true if the address in the first operand is lower/less than or equal to the address in the second operand. The targets of the pointers are ignored. |
| GreaterEqualPtr[a, b] | Both operands must be pointers. The target types can be different. Returns true if the address in the first operand is higher/greater than or equal to the address in the second operand. The targets of the pointers are ignored. |
| IsNullPtr[p] | The operand must be a pointer. The return type is boolean. Returns true if the pointer is null (if the RAM address is zero). Returns false if the pointer is not null. The target of the pointer is ignored. |
| NotNullPtr[p] | Shortcut for Not[IsNullPtr[p]]. |
| Null null | Zero-arity function. Used to indicate a null pointer (or reference). A null pointer has no target. A null pointer contains the address zero. A null pointer can be used when the target data is missing, unsupplied, or non-applicable. |
| PtrToUInt[p] | Converts or typecasts a pointer to an unsigned integer type. The return type of this function is either TUInt32 or TUInt64, depending on whether 32-bit or 64-bit pointers are being used. The return value is the integer RAM address (ordinal of a byte in RAM). The target of the pointer is not accessed. If the pointer is null, the returned integer is zero. |
| UIntToPtr[i, pt] | Converts or typecasts an unsigned integer to a specified pointer type, and returns the pointer. The first operand is the integer. The type of the first operand is either TUInt32 or TUInt64, depending on whether 32-bit or 64-bit pointers are being used. The second operand must be a pointer type expression. The return type is the same as the second operand. If the second operand is omitted, it is assumed to be Ptr[TAny]. |
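The difference between moving a pointer by bytes and incrementing it by whole targets can be sketched in Python. The functions below are hypothetical models, not the intrinsic functions themselves. Because a model pointer here is a plain integer with no type attached, the target size has to be passed in explicitly, whereas the table above describes it as being derived from the pointer type.

```python
def move_ptr_fwd(address, byte_count):
    """Model of MovePtrFwd: add a raw byte count to the address."""
    return address + byte_count

def move_ptr_back(address, byte_count):
    """Model of MovePtrBack: subtract a raw byte count from the address."""
    return address - byte_count

def incr_ptr(address, target_size, count=1):
    """Model of IncrPtr: step forward by `count` targets of `target_size` bytes."""
    return move_ptr_fwd(address, target_size * count)

def decr_ptr(address, target_size, count=1):
    """Model of DecrPtr: step backward by `count` targets of `target_size` bytes."""
    return move_ptr_back(address, target_size * count)

U32_SIZE = 4                       # size in bytes of an unsigned 32-bit integer
p = 492
q = incr_ptr(p, U32_SIZE, 2)       # 492 + 2*4
r = move_ptr_fwd(p, 2)             # 492 + 2
```

With a 4-byte target, incrementing twice moves the address by 8 bytes, while moving forward by 2 changes it by only 2 bytes and leaves it pointing into the middle of a target.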
The following functions have symbols for use with the symbol infix shortcut:
| Name | Symbol |
|---|---|
| Same | ==# |
| NotSame | !=# |
| LessPtr | <# |
| GreaterPtr | ># |
| LessEqualPtr | <=# |
| GreaterEqualPtr | >=# |
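The distinction between comparing pointers and comparing their targets can be illustrated with a small Python model. It is only a sketch: the `bytearray` named `ram`, the helper `read_u32`, and the functions `equal_ptr`, `equal_target`, and `less_ptr` are hypothetical stand-ins for the behaviour described in the tables above.

```python
import struct

ram = bytearray(32)                 # stand-in for RAM
struct.pack_into("<I", ram, 8, 5)   # one 32-bit integer with value 5 at address 8
struct.pack_into("<I", ram, 16, 5)  # a different integer, also 5, at address 16

def read_u32(address):
    """Read the unsigned 32-bit integer stored at `address`."""
    return struct.unpack_from("<I", ram, address)[0]

def equal_ptr(a, b):
    """Compare the addresses themselves; the targets are never read."""
    return a == b

def equal_target(a, b):
    """Compare the values stored at the two addresses."""
    return read_u32(a) == read_u32(b)

def less_ptr(a, b):
    """Compare the addresses numerically; the targets are never read."""
    return a < b

a, b = 8, 16
same_address = equal_ptr(a, b)     # False: different locations
same_value = equal_target(a, b)    # True: both targets hold 5
a_before_b = less_ptr(a, b)        # True: 8 is a lower address than 16
```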
The following commands can be used with variables that are of a pointer type. Some commands have optional shortcut/abbreviated names.
| Command | Description |
|---|---|
| MakePtr p, v; mkp p, v; | The variable/attribute supplied for the first parameter is modified to contain a pointer to the variable/attribute supplied for the second parameter. The type of the first variable must be a pointer to the type of the second variable, or compatible. |
| Set p, x; set p, x; | If a variable/attribute of a pointer type is supplied for the first parameter, then this command becomes the same as the "SetTarget" command. This behavior is consistent with the behavior for references. |
| SetTarget p, v; sett p, v; | Sets the value of the target of a pointer (first parameter) to the specified value (second parameter). Does NOT set the pointer to a different target, rather it sets the target to a different value. In other words, the specified value is copied into the target of the specified pointer. |
| SetPtr dp, sp; setp dp, sp; | Sets a pointer variable (first parameter) to contain the same address as another pointer (second parameter). In other words, the pointer supplied in the second parameter is copied into the pointer variable supplied in the first parameter. The type of the pointers must be the same or compatible. The target values of both pointers are ignored (not read or modified). |
| SetPtrNull p; setpnull p; | Sets the specified pointer variable to a null pointer. Same as: SetPtr p, Null; |
| MovePtrFwd p, i; mvpf p, i; | Moves a pointer (first parameter) forward by a specified number of bytes (second parameter, an integer type). A variable/attribute of a pointer type must be supplied for the first parameter, and it will be modified. The value and type of the target of the pointer are ignored. The second parameter is simply added to the RAM address in the pointer. |
| MovePtrBack p, i; mvpb p, i; | Opposite of the "MovePtrFwd" command. Moves the pointer backwards. The second parameter is simply subtracted from the RAM address in the pointer. |
| IncrPtr p; IncrPtr p, i; incp p; incp p, i; | Increments a pointer (first parameter) a specified number of times, meaning it moves the pointer forward by a multiple of the size of the target type. For example, if the type of the first parameter is Ptr[TUInt32], and the second parameter is 2, then the command moves the pointer forward by 2×4 = 8 bytes (if the RAM address was 492 it becomes 500). The second parameter is optional, and defaults to 1 if omitted. Same as: MovePtrFwd p, Mul[GetTargetSize[p],i]. |
| DecrPtr p; DecrPtr p, i; decp p; decp p, i; | Opposite of the "IncrPtr" command. Decrements the pointer (moves it backwards). |
| Load type, sourcePtr, offset, variable; | Reads a value located at an offset forward from a pointer, and puts the value into the specified variable. Described in more detail further ahead in this documentation. |
| Store type, destinationPtr, offset, value; | Stores the specified value to a location at an offset forward from a pointer. Described in more detail further ahead in this documentation. |
The "Load" command loads/retrieves/reads a value from a specified location in memory (RAM) and puts it into a variable. The "Store" command stores/puts/writes a value in a variable to a specified location in memory. Thus they operate in opposite directions.
The location in memory is specified as a pointer plus an offset (distance) in bytes. The "Load" and "Store" commands are intended for low-level tasks, and access to them may be restricted or denied.
The first parameter of the "Load" and "Store" commands is a type expression indicating the format of the value in memory/RAM. Usually an integer type such as "TUInt8", "TUInt32", "TSInt32", etc is supplied for this parameter.
The second parameter of the "Load" command is a source pointer (type is Ptr[ReadOnly[TAny]]). The second parameter of the "Store" command is a destination pointer (type is Ptr[TAny]). The target of the supplied pointer can be any type.
The third parameter for both "Load" and "Store" is an offset in bytes that will be added to the pointer. For example, if the offset is 1, then the first byte at the pointer location is skipped, and the load or store begins at the second byte. If the offset is zero, that is equivalent to not using the offset parameter.
The fourth and final parameter for "Load" is a variable to receive the loaded value. For the "Store" command, it is the value to store.
The following example stores the value 1234 into an unsigned 16-bit integer located 12 bytes forward from the pointer "dp".
Store TUInt16, dp, 12, 1234;
The following example reads the unsigned 32-bit integer located 8 bytes forward from the pointer "sp", and puts the value into the variable named "v".
Variable v, TUInt32;
Load TUInt32, sp, 8, v;
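For readers who find it easier to reason about bytes directly, the pointer-plus-offset addressing used by "Load" and "Store" can be modelled in Python with the standard `struct` module. This is a sketch only: the `bytearray` named `ram`, the functions `store` and `load`, and the addresses chosen for `dp` and `sp` are hypothetical. The `=` prefix in the format strings selects the host's native byte order without padding.

```python
import struct

ram = bytearray(64)   # stand-in for RAM; indexes play the role of addresses

def store(fmt, pointer, offset, value):
    """Model of Store: write `value` at `pointer + offset` in the given format."""
    struct.pack_into(fmt, ram, pointer + offset, value)

def load(fmt, pointer, offset):
    """Model of Load: read a value of the given format at `pointer + offset`."""
    return struct.unpack_from(fmt, ram, pointer + offset)[0]

dp = 16
store("=H", dp, 12, 1234)   # like: Store TUInt16, dp, 12, 1234;

sp = 32
store("=I", sp, 8, 99)      # place a value there so there is something to read
v = load("=I", sp, 8)       # like: Load TUInt32, sp, 8, v;
```

In this model the 16-bit value ends up in the 2 bytes at addresses 28 and 29, which is the pointer address (16) plus the offset (12).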
This section explains the concept of "Big Endian" versus "Little Endian". If you already know the meaning of these, you can skip this section.
Consider the decimal number 1036. If the digit 6 within 1036 is incremented (to 7), then the difference between the old number (1036) and the new number (1037) is 1. However if the digit 3 within 1036 is incremented (to 4), then the difference between the old number (1036) and the new number (1046) is 10. Likewise if the digit 0 within 1036 is incremented, the difference is 100, and if the digit 1 is incremented, the difference is 1000.
Thus incrementing the right-most digit (6) makes the smallest/least change to the number. Accordingly, the digit 6 within 1036 is the least significant digit, and the digit 1 is the most significant digit.
By convention, when writing decimal numbers within English text, we write the most significant digits first, and the least significant digits last. English text is written left-to-right, therefore the most significant digit is the left-most digit, and the least significant digit is the right-most digit.
Although in English text we write the most significant digit first, it is possible to devise an alternate system of writing numbers where the least significant digit is written first (least to most significant). Thus the number 1036 would be written as "6301" in the alternate system. It is exactly the same number with the exact same numeric meaning, however in the alternate system we have for some reason decided to write the digits in the opposite order.
Some computer CPUs store the digits of their integers with the most significant digit first, while others store the least significant digit first. If they store the most significant digit first, then they are known as "Big Endian" (big end first). If they store the least significant digit first, they are known as "Little Endian" (little end first).
The above example used decimal numbers (radix 10) for easy comprehension, however computers are based on binary numbers (radix 2), so when we say that a CPU stores the most significant digits first, we mean binary digits not decimal digits.
Actually we really mean radix 256 digits, also known as bytes. The memory/RAM in a computer appears to the programmer as a big long sequence of radix 256 digits (bytes). 256 is 2 to the power of 8, and this means that a byte is equivalent to 8 binary digits (8 bits). A byte is the smallest addressable unit in the pointer/address system of the CPU, and this is why the memory appears as a big long sequence of radix 256 digits (bytes) and not radix 2 digits (bits).
For example, consider how the decimal number 446000991 is stored in a computer. First we must convert that decimal number (radix 10) to bytes (radix 256). Alternatively you can convert it to radix 2 and then form groups of 8. The result is 4 bytes. The most significant byte contains the number 26, and the least significant byte contains the number 95. A Big Endian CPU stores bytes in order of most to least significant, like this:
26, 149, 111, 95
| Offset | Decimal | Hexadecimal |
|---|---|---|
| 0 | 26 | 0x1A |
| 1 | 149 | 0x95 |
| 2 | 111 | 0x6F |
| 3 | 95 | 0x5F |
Whereas a Little Endian CPU stores the bytes in order of least to most significant, like this:
95, 111, 149, 26
| Offset | Decimal | Hexadecimal |
|---|---|---|
| 0 | 95 | 0x5F |
| 1 | 111 | 0x6F |
| 2 | 149 | 0x95 |
| 3 | 26 | 0x1A |
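The two byte sequences above can be reproduced with Python's built-in `int.to_bytes`, which is a convenient way to check such conversions by hand. The variable names are arbitrary.

```python
n = 446000991

big = n.to_bytes(4, "big")        # most significant byte first
little = n.to_bytes(4, "little")  # least significant byte first

big_list = list(big)
little_list = list(little)

# Converting back with the matching byte order recovers the original number.
round_trip = int.from_bytes(little, "little")
```

Reading the bytes back with the wrong byte order would produce a different number, which is why both sides of an exchange must agree on the format.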
If 2 computers want to exchange data, and the CPU in the first computer is Little Endian while the CPU in the second computer is Big Endian, then one of them must convert the integer data: either the sender converts it to the Endian format of the receiver before transmission, or the receiver converts the received data from the format of the sender to its own format.
Usually a file format or a network protocol is defined to be always Little Endian or always Big Endian. If the Endian format of the CPU does not match the Endian format of the file format or network protocol, then the CPU must convert between its Endian format and the Endian format mandated/required by the file format or network protocol.
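A minimal Python sketch of this conversion follows. The function names `read_u32`, `swap32`, and `needs_swap` are hypothetical; `sys.byteorder` is the standard way to ask Python which Endian format the host uses.

```python
import sys

def read_u32(data, offset, file_order):
    """Read a 4-byte unsigned integer stored in `file_order` ("big" or "little")."""
    return int.from_bytes(data[offset:offset + 4], file_order)

def swap32(x):
    """Reverse the order of the 4 bytes of a 32-bit unsigned integer."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")

def needs_swap(file_order):
    """True if the host's native format differs from the format of the data."""
    return file_order != sys.byteorder

# A Big Endian field, as a file format or protocol might define it.
field = bytes([26, 149, 111, 95])
value = read_u32(field, 0, "big")
```

Swapping is its own inverse, so the same routine serves for both reading and writing when the formats differ.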
Occasionally a file format or network protocol supports both Endian formats, and it provides a way to determine which Endian format was used when the data was created.
Some file formats or network protocols avoid the Endian issue altogether by using only sequences of single bytes, never bigger integers formed from 2, 4, or 8 bytes that would introduce a question of Endian format.
By default, the "Load" and "Store" commands use whatever Endian format the host system/CPU natively uses; however, they can optionally be made to perform conversions to Little Endian or Big Endian formats.
The first parameter (the type expression) may use the function named "LittleEndian" or the function named "BigEndian", to specify the format of the value in memory.
The following example stores the value 12345 into a Little Endian format unsigned 32-bit integer located 20 bytes forward from the pointer "dp".
Store LittleEndian[TUInt32], dp, 20, 12345;
That can also be written using shortcuts:
Store LE uint32, dp, 20, 12345;
Or if you wanted to use Big Endian format, then it would be:
Store BigEndian[TUInt32], dp, 20, 12345;
Store BE uint32, dp, 20, 12345;
In the following example, a Command Implementation named "Test1" is defined. It stores 4 values at the location given by the pointer "dstData": 2 TUInt8 values and 2 TUInt16 values, for a total of 6 bytes. This example uses some shortcut names.
cmd Test1;
prm inX, uint16;
prm inY, uint16;
prm dstData, Ptr TAny;
prm outSize, Out uint;
Store uint8, dstData, 0, 0x1F;
Store uint8, dstData, 1, 0xAA;
Store BE uint16, dstData, 2, inX;
Store BE uint16, dstData, 4, inY;
Set outSize, 6;
ecmd;
If inX is 5000 (0x1388) and inY is 3001 (0xBB9), then the 6 bytes beginning at "dstData" will be set to the following values:
0x1F, 0xAA, 0x13, 0x88, 0x0B, 0xB9
The above example specifies Big Endian byte order. If it specified Little Endian byte order, then the 6 bytes would be:
0x1F, 0xAA, 0x88, 0x13, 0xB9, 0x0B
The "BigEndian" and "LittleEndian" functions can be used on the TUInt8 (uint8) and TSInt8 (sint8) types, however they have no actual effect because a TUInt8 or a TSInt8 is a single byte and is thus the same in either format.
CPUs usually run faster if the integers are stored in an aligned location, meaning a location where the address is a multiple of the size of the integer. For example, a 4-byte integer (such as TUInt32) is fastest if located at an address that is a multiple of 4. However alignment can waste memory in some situations, so is not always used.
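The alignment rule stated above (the address is a multiple of the size of the integer) can be expressed as a short Python sketch. The function names `is_aligned` and `align_up` are hypothetical.

```python
def is_aligned(address, size):
    """True if `address` is a multiple of `size`."""
    return address % size == 0

def align_up(address, size):
    """Return the lowest aligned address that is not below `address`."""
    return (address + size - 1) // size * size

# A 4-byte integer placed after 13 bytes of other data:
unaligned_address = 13
aligned_address = align_up(unaligned_address, 4)
wasted_bytes = aligned_address - unaligned_address
```

The `wasted_bytes` value illustrates the trade-off mentioned above: aligning the integer costs padding bytes that hold no data.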
The first parameter (the type expression) may use the function named "Aligned" or the function named "Unaligned", to explicitly specify whether or not the value is aligned in memory. The compiler can then optimize the code differently depending on whether the value is expected to be aligned or unaligned.
The following 2 equivalent examples store the value 1234 into an unaligned Big Endian unsigned 16-bit integer located 12 bytes forward from the pointer "dp".
Store Unaligned[BigEndian[TUInt16]], dp, 12, 1234;
Store BigEndian[Unaligned[TUInt16]], dp, 12, 1234;
If neither "Aligned" nor "Unaligned" is used, then the value is assumed to be aligned.