***** TO DO: REF SUBSETS TO BE CHANGED TO ITERATORS OR A SUBSET/SLICE STORAGE METHOD *****
It is possible for there to be multiple dwellings at a single street address if the building at that address is an apartment block. Then an address may be, for example, "Apartment 23, 81 Acland Street". The apartment number specifies a part of the building. Similarly, references in KRE can optionally specify only a part of the object, known as a subset range.
For example, if you have a text object containing a large amount of text, you may wish to pass that text to another part of the program, but you do not want to pass all of the text. You want to pass only a part of it (a subset). You could copy the desired part to a new text object, and then pass that new text object instead, but then that requires additional memory and processing overhead. Thus KRE allows a reference to contain a subset range (start offset and end offset), identifying a range of the text. It is useful for specifying subsets of text and arrays.
It is important to realize that the subset range is stored with the reference, not with the object itself. Thus there can be multiple references to the same object, and each reference can specify a different subset range. And if you modify the subset range, it affects only that reference, not any other references to the same object.
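The idea that the range lives in the reference can be sketched in Python. This is only an illustration: the `TextObject` and `Ref` classes and the `view` method are hypothetical names invented for the sketch, not part of KRE.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TextObject:
    """Hypothetical stand-in for a KRE text object. It holds no subset state."""
    chars: str


@dataclass
class Ref:
    """Hypothetical reference: points at an object and carries its own range."""
    target: TextObject
    subset: Optional[Tuple[int, int]] = None  # (start, end); end is exclusive

    def view(self) -> str:
        """Return the part of the target that this reference can see."""
        if self.subset is None:
            return self.target.chars
        start, end = self.subset
        return self.target.chars[start:end]


text = TextObject("burrito")
ref_a = Ref(text, (0, 3))   # sees "bur"
ref_b = Ref(text, (3, 7))   # sees "rito"
ref_a.subset = (1, 3)       # changes ref_a only; ref_b is unaffected
```

Both references share the one object, and no text is copied when a range is set or changed.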
The subset range is permitted to start and/or end beyond the end of the text or array. This may occur if, for example, the text or array was shortened/truncated using a different reference. Using a reference with an invalid subset range may cause an error but will not crash. The subset end offset is not permitted to ever be less than the start offset.
References existing in places other than command local variables do not support a subset range. If you wish to use a subset range, you must copy the reference to a command local variable first.
The subset range is contiguous. It cannot be discontiguous or multi-part.
In the C++ language, a subset can effectively be specified by obtaining a pointer to the object and then incrementing the pointer so that instead of pointing to the start of the object, it points to somewhere within the object. KRE does not support this because it introduces the possibility of crashes. KRE provides the more reliable subset range feature instead.
Subset ranges are specified in terms of offsets or ordinals. They are basically equivalent, 2 names for the same thing, so you can use them interchangeably in most cases. By convention, if we are talking about a text object, we say offset, but if we are talking about an array object, then we say ordinal. Both offsets and ordinals begin at 0 (the first item is 0).
That convention exists because ordinal tends to imply that you can always find the next item by adding 1 to the ordinal (and that is the case for the MNumArray module), whereas offset tends to imply that adding 1 does not necessarily find the next item (and that is the case for MText, where adding 1 does not always yield the next character because character size varies).
In the following text we say offset, but if you are using subsets in relation to an array, you can read it as saying ordinal -- it works much the same way in either case, except for the above-mentioned difference.
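The difference can be illustrated with a variable-width text encoding. The sketch below assumes UTF-8 purely for illustration; this document does not say which encoding MText uses, and `char_start_offsets` is a hypothetical helper.

```python
text = "añb"
encoded = text.encode("utf-8")   # 'ñ' occupies two bytes in UTF-8


def char_start_offsets(data: bytes) -> list:
    """Return the byte offset at which each character starts."""
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return [i for i, byte in enumerate(data) if (byte & 0xC0) != 0x80]


offsets = char_start_offsets(encoded)   # character starts, as byte offsets

numbers = [10, 20, 30]
ordinals = list(range(len(numbers)))    # array items: one ordinal per item
```

Here the characters start at offsets 0, 1 and 3, so adding 1 to offset 1 lands in the middle of a character, whereas the array ordinals are simply 0, 1 and 2.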
There are 2 methods of specifying a subset range. You can use whichever method is easier in your particular situation. The subset commands with "Range" in the name specify the subset as start-offset and end-offset, whereas the commands with "Extent" in the name specify the subset as offset and count.
Start-offset is inclusive and end-offset is exclusive, meaning the subset does include the item located at the offset of start-offset, but does NOT include the item located at the offset of end-offset. (Remember you can read that paragraph as saying offset or ordinal.)
The following formulas summarize the mathematical relationship between StartOffset and EndOffset (for Ranges) versus Offset and Count (for Extents):
EndOffset = Offset + Count
Count = EndOffset - StartOffset
Offset = StartOffset
Offset = EndOffset - Count
For example, say we have the text "abc" and we want a subset that contains only the middle character ("b"). If using Ranges, then StartOffset is 1 and EndOffset is 2. If using Extents, then Offset is 1 and Count is 1. More examples follow, using "burrito" as the entire text.
| Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Value | b | u | r | r | i | t | o |

| Subset | StartOffset | EndOffset | Offset | Count |
|---|---|---|---|---|
| "bur" | 0 | 3 | 0 | 3 |
| "ur" | 1 | 3 | 1 | 2 |
| "urr" | 1 | 4 | 1 | 3 |
| "rit" | 3 | 6 | 3 | 3 |
| "rito" | 3 | 7 | 3 | 4 |
| "urrito" | 1 | 7 | 1 | 6 |
| "o" | 6 | 7 | 6 | 1 |
| "i" | 4 | 5 | 4 | 1 |
| "" | 0 | 0 | 0 | 0 |
| "" | 3 | 3 | 3 | 0 |
| "" | 7 | 7 | 7 | 0 |
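The formulas and the table above can be checked mechanically. The following Python sketch uses hypothetical helper names (`range_to_extent`, `extent_to_range`, `check_rows`) and relies on the fact that Python slices also use an inclusive start and an exclusive end.

```python
def range_to_extent(start_offset, end_offset):
    """Convert (StartOffset, EndOffset) to (Offset, Count)."""
    return start_offset, end_offset - start_offset


def extent_to_range(offset, count):
    """Convert (Offset, Count) to (StartOffset, EndOffset)."""
    return offset, offset + count


TEXT = "burrito"

# (subset, StartOffset, EndOffset) rows taken from the table above.
ROWS = [
    ("bur", 0, 3), ("ur", 1, 3), ("urr", 1, 4), ("rit", 3, 6),
    ("rito", 3, 7), ("urrito", 1, 7), ("o", 6, 7), ("i", 4, 5),
    ("", 0, 0), ("", 3, 3), ("", 7, 7),
]


def check_rows():
    """Verify every row against the slice and against both conversions."""
    for subset, start, end in ROWS:
        offset, count = range_to_extent(start, end)
        assert TEXT[start:end] == subset
        assert len(subset) == count
        assert extent_to_range(offset, count) == (start, end)
    return True
```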
If EndOffset is less than StartOffset, the subset range is invalid.
If StartOffset or EndOffset is greater than the total number/count of items, the subset range is invalid (it includes non-existent items).
If StartOffset or EndOffset exactly equals the total number/count of items, then the subset starts or ends at the end of the items.
If StartOffset equals EndOffset, then the subset is empty (a subset can be empty at any position).
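The four rules above can be collected into one function. This is a minimal sketch; `classify_range` and its string results are hypothetical and are not KRE commands.

```python
def classify_range(start_offset, end_offset, total):
    """Classify a subset range over an object that holds `total` items."""
    if end_offset < start_offset:
        return "invalid"      # the end precedes the start
    if start_offset > total or end_offset > total:
        return "invalid"      # the range includes non-existent items
    if start_offset == end_offset:
        return "empty"        # allowed at any position, including the end
    return "non-empty"
```

For the 7-item text "burrito", the range 7:7 is classified as empty (it sits exactly at the end), while 3:9 is invalid.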
You might think it would be less confusing to have StartOffset and EndOffset both be inclusive. However that would be awkward mathematically because instead of the Count being simply EndOffset minus StartOffset, it would become EndOffset minus StartOffset then plus 1. Furthermore, how would you represent an empty subset at any position?
Each reference variable in a Command Implementation has a boolean value associated with it that indicates whether or not the reference is a subset. If it is set to false, the reference refers to the entire object. If it is set to true, the reference refers to a subset of the object (where the subset is a single contiguous range).
It may seem as if the enable boolean is an unnecessary complication. It may seem that we could have the subset range enabled at all times, and then in situations where we do not want a subset, the subset range could be effectively disabled by setting it to include everything (all items).
For example:
- We create a new empty text object. Both the subset-start-offset and the subset-end-offset are initially set to zero.
- Then we insert the text "abc" at offset zero. Insert means inserting into the subset. The subset-end-offset is therefore updated to 3.
- Then we again insert the text "abc" at offset zero. The subset-end-offset is updated to 6.
- Then we append the text "xx". Append means appending to the subset. The subset-end-offset is updated to 8.
- Then we remove 3 characters at offset 3. Remove means removing from the subset. The subset-end-offset is updated to 5.
In that example, the subset range is enabled and updated when changes are made, and throughout it continues to be a subset that encompasses the entire object. No problem. Therefore why do we need the boolean to disable subsets?
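The steps of that example can be replayed in a small Python simulation. The class `AlwaysSubsetRef` is hypothetical; it models a reference whose subset range is always enabled and is adjusted by every edit made through it.

```python
class AlwaysSubsetRef:
    """Hypothetical reference whose subset range is always enabled."""

    def __init__(self, chars):
        self.chars = chars          # the underlying text, as a list
        self.start = 0
        self.end = len(chars)

    def insert(self, offset, s):
        """Insert s at an offset relative to the start of the subset."""
        pos = self.start + offset
        self.chars[pos:pos] = list(s)
        self.end += len(s)

    def append(self, s):
        """Append s at the end of the subset."""
        self.insert(self.end - self.start, s)

    def remove(self, offset, count):
        """Remove count characters at an offset relative to the subset."""
        pos = self.start + offset
        del self.chars[pos:pos + count]
        self.end -= count


ref = AlwaysSubsetRef([])
end_offsets = [ref.end]                              # initially 0
ref.insert(0, "abc"); end_offsets.append(ref.end)
ref.insert(0, "abc"); end_offsets.append(ref.end)
ref.append("xx");     end_offsets.append(ref.end)
ref.remove(3, 3);     end_offsets.append(ref.end)
```

The recorded end offsets are 0, 3, 6, 8 and 5, matching the steps listed above, and the range still covers the whole text.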
A problem occurs when there exist multiple references to the same object, with each reference having its own independent subset range (a useful capability). For example:
- In command A, we create a new empty text object. The subset range of the reference is initially set to 0:0.
- We invoke a command B, passing the reference as a parameter to command B. Parameters are copied, thus the reference is copied to B (B receives a duplicate of the reference).
- Command B appends "abc" to the object using its reference. The subset range of the reference in B is updated to 0:3.
- Command B finishes and returns to command A.
- The subset range of the reference in A remains as 0:0, and thus surprisingly command A does not see the text appended by command B.
By contrast, if there is a boolean to enable/disable subsets, the reference for the new text object created by command A is initially set to disable the subset range, and thus both command A and B are operating using non-subset references, and command A sees the text appended by command B, as is the expected behavior.
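The two scenarios can be compared side by side in a Python simulation. All names here (`Text`, `Ref`, `command_a`, `command_b`) are hypothetical, and `copy.copy` stands in for the copying of a reference parameter.

```python
import copy


class Text:
    """Hypothetical text object."""

    def __init__(self):
        self.chars = []


class Ref:
    """Hypothetical reference with an enable flag and a subset range."""

    def __init__(self, target, enabled, start=0, end=0):
        self.target = target
        self.enabled = enabled
        self.start = start
        self.end = end

    def append(self, s):
        if self.enabled:
            # Append to the subset and extend this reference's range only.
            self.target.chars[self.end:self.end] = list(s)
            self.end += len(s)
        else:
            self.target.chars.extend(s)

    def view(self):
        if self.enabled:
            return "".join(self.target.chars[self.start:self.end])
        return "".join(self.target.chars)


def command_b(ref_param):
    ref_param.append("abc")


def command_a(enabled):
    ref = Ref(Text(), enabled)      # new empty text; range is 0:0
    command_b(copy.copy(ref))       # parameters are copied: B gets a duplicate
    return ref.view()               # what command A sees afterwards
```

With the range always enabled, `command_a(True)` returns an empty string, which is the surprising result described above. With the subset disabled, `command_a(False)` returns "abc".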
If you have a reference to an object and you want to make that reference become a subset, or if it already is a subset and you want to change the subset range (change which part is the subset), then you can invoke either of the following commands:
SetSubsetRange inRef, inStartOffset, inEndOffset;
SetSubsetExtent inRef, inOffset, inCount;
- The commands modify the subset range/extent of the specified reference. The previous subset range of the reference is ignored (not used) and is replaced by the new range.
- The reference inRef is enabled as a subset. It makes no difference whether inRef was previously enabled as a subset.
- The subset range/extent is validated (made valid, if it was invalid). If the start-offset or end-offset is beyond the end, it is set to the end.
- If the end-offset is less than the start-offset, it is changed to the start-offset.
- Internally the reference stores the subset range in either "Range" or "Extent" format and if you invoke the command for the other format, it converts the parameters to its internal format.
- After inOffset is validated, if inOffset + inCount causes arithmetic overflow, then inCount is changed to the total number/count of items minus inOffset.
- Fails if inRef is a null reference.
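The validation rules in the list above can be sketched as follows. This is an interpretation, not the actual implementation: the function names are hypothetical, and the sketch assumes the "in" parameters are uint32 values (the document states uint32 only for the "out" parameters of the Get commands). Python integers do not overflow, so the overflow case is detected by comparison with `UINT32_MAX`.

```python
UINT32_MAX = 0xFFFFFFFF   # assumed width of the offset and count parameters


def validate_range(start_offset, end_offset, total):
    """Clamp both offsets to the end, then force end >= start."""
    start_offset = min(start_offset, total)
    end_offset = min(end_offset, total)
    if end_offset < start_offset:
        end_offset = start_offset
    return start_offset, end_offset


def validate_extent(offset, count, total):
    """Validate an (offset, count) pair and return the corrected pair."""
    offset = min(offset, total)
    if offset + count > UINT32_MAX:       # the sum would overflow a uint32
        count = total - offset
    start, end = validate_range(offset, offset + count, total)
    return start, end - start
```

For a 7-item object, `validate_range(2, 99, 7)` gives 2:7, and `validate_extent(3, UINT32_MAX, 7)` gives offset 3 with count 4.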
If you have a reference to an object, and the reference is a subset, and you want to obtain the numbers of the subset range, then you can invoke either of the following commands:
GetSubsetRange inRef, outStartOffset, outEndOffset;
GetSubsetExtent inRef, outOffset, outCount;
- You supply uint32 variables for the "out" parameters and they will be set to contain the numbers of the subset range/extent of the reference inRef.
- The reference inRef is not modified.
- If the reference is not a subset, then outStartOffset (or outOffset) is set to zero, and outEndOffset (or outCount) is set to the total number/count of items (as if the reference were a subset that included everything).
- Fails if inRef is a null reference.
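The behavior of the two Get commands can be sketched in Python. The names are hypothetical; `None` stands in for a null reference, and raising an exception is only an assumption about how a failure would be reported.

```python
class NullReferenceError(Exception):
    """Hypothetical failure raised when the reference is null."""


class Ref:
    """Hypothetical reference; subset is None when the subset is disabled."""

    def __init__(self, items, subset=None):
        self.items = items
        self.subset = subset   # (start, end) with end exclusive, or None


def get_subset_range(ref):
    """Return (start_offset, end_offset) without modifying ref."""
    if ref is None:
        raise NullReferenceError("inRef is a null reference")
    if ref.subset is None:
        return 0, len(ref.items)    # as if the subset included everything
    return ref.subset


def get_subset_extent(ref):
    """Return (offset, count) without modifying ref."""
    start, end = get_subset_range(ref)
    return start, end - start


whole = Ref(list("burrito"))
part = Ref(list("burrito"), (3, 6))
```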
If you have a reference to an object and you want to make a subset of it but you do not want to change the reference you have (meaning you want the subset to be made as a new separate reference to the same object), then you can invoke either of the following commands:
MakeSubsetRange outRef, inRef, inStartOffset, inEndOffset;
MakeSubsetExtent outRef, inRef, inOffset, inCount;
- You supply a reference variable for the outRef parameter, and it will be set to a reference that is a copy of the reference inRef but with the specified subset range applied and enabled.
- The subset range/extent is validated in the same way as the SetSubsetRange/Extent commands.
- The reference inRef is not modified.
- Any existing reference in the outRef variable is removed and replaced by the new reference.
- Fails if inRef is a null reference. The outRef variable usually contains a null reference initially and that is acceptable.
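A sketch of the Make commands, under the same assumptions as before: the `Ref` class and function names are hypothetical, `None` models a null reference, and the returned value plays the role of outRef.

```python
class Ref:
    """Hypothetical reference; subset is None when the subset is disabled."""

    def __init__(self, items, subset=None):
        self.items = items
        self.subset = subset   # (start, end) with end exclusive, or None


def make_subset_range(in_ref, start_offset, end_offset):
    """Return a new reference to the same object with the range applied."""
    if in_ref is None:
        raise ValueError("inRef is a null reference")
    total = len(in_ref.items)
    start_offset = min(start_offset, total)       # clamp to the end
    end_offset = min(end_offset, total)
    end_offset = max(end_offset, start_offset)    # end is never before start
    return Ref(in_ref.items, (start_offset, end_offset))


def make_subset_extent(in_ref, offset, count):
    """Same as make_subset_range, but specified as offset and count."""
    return make_subset_range(in_ref, offset, offset + count)


in_ref = Ref(list("burrito"))
out_ref = None                                    # usually null initially
out_ref = make_subset_range(in_ref, 1, 99)        # replaces whatever was there
```

After the call, `out_ref` refers to the same list as `in_ref` with the range 1:7, and `in_ref` is still a non-subset reference.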
If you have a subset reference and you want to convert it back to a normal non-subset reference, then you can invoke the following command:
DisableSubset inRef;
- Does nothing if the reference is already non-subset.
- Fails if inRef is a null reference.
***** TO BE RESOLVED *****
A potential problem in the current implementation:
For example, suppose you have a subset reference, and you pass it (copy the reference) to another Command Implementation that invokes MNumArray::Append on its copy of the reference. MNumArray::Append extends the subset range to include the newly appended item. However, only the subset range in the second Command Implementation is updated, not the range in the reference in the first Command Implementation. Thus when the second Command Implementation returns to the first, the first does not see the appended item because it is outside of its subset range that was not updated (because there were 2 references).
A solution is to change the reference parameter in the second Command Implementation from input to inout, so that the change to the subset range is copied back to the first Command Implementation. But does this effectively mean that all references will have to be passed as inout, because otherwise they can use subset ranges only in a read-only manner?
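The difference between input and inout passing can be simulated in Python. Everything here is hypothetical: `append_item` stands in for MNumArray::Append on a subset reference, and the copy-back in `call_as_inout` is one possible reading of how an inout parameter would behave.

```python
import copy


class Ref:
    """Hypothetical subset reference into a shared array."""

    def __init__(self, items, start, end):
        self.items = items
        self.start = start
        self.end = end

    def view(self):
        return self.items[self.start:self.end]


def append_item(ref, value):
    """Append to the subset and extend the range of this reference only."""
    ref.items.insert(ref.end, value)
    ref.end += 1


def callee(ref_param):
    append_item(ref_param, 99)


def call_as_input(ref):
    callee(copy.copy(ref))          # the callee's range change is discarded


def call_as_inout(ref):
    local = copy.copy(ref)
    callee(local)
    ref.start, ref.end = local.start, local.end   # copied back on return


ref_in = Ref([1, 2, 3, 4], 0, 2)
call_as_input(ref_in)

ref_inout = Ref([1, 2, 3, 4], 0, 2)
call_as_inout(ref_inout)
```

In both cases the array becomes 1, 2, 99, 3, 4, but only the inout caller's reference sees the new item within its subset.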
Perhaps it would be better for subset ranges to be taken out of the references, and instead put into separate subset objects that contain a subset range and an internal reference to the real object, and then they act like the real object but limited to the subset range. These subset objects could use an optimized mem alloc strategy to help mitigate the overhead of creating another object.
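A rough sketch of the separate subset object idea follows. The class `SubsetObject` is hypothetical. Because the range lives in one shared object rather than in each reference, every holder of a reference to the subset object sees the same, updated range.

```python
class SubsetObject:
    """Hypothetical subset object: a range plus a reference to the real object."""

    def __init__(self, target, start, end):
        self._target = target       # internal reference to the real object
        self.start = start
        self.end = end

    def __len__(self):
        return self.end - self.start

    def __getitem__(self, ordinal):
        """Read an item; the ordinal is relative to the start of the subset."""
        if not 0 <= ordinal < len(self):
            raise IndexError("ordinal outside the subset")
        return self._target[self.start + ordinal]

    def append(self, value):
        """Append to the subset and extend the shared range."""
        self._target.insert(self.end, value)
        self.end += 1

    def to_list(self):
        return self._target[self.start:self.end]


def callee(subset_param):
    subset_param.append(99)


array = [1, 2, 3, 4]
subset = SubsetObject(array, 0, 2)
callee(subset)    # caller and callee refer to the same subset object
```

After the call, the caller's subset contains 1, 2, 99, so the appended item is visible without an inout parameter.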
Alternatively, the subset range could be stored within the object itself. This yields the best performance but is less useful, and if the subset range is used when reading the object, then reading the object sort of modifies the object but not exactly.
***** ITERATORS *****
The concept of subsets should probably be merged with iterator objects. For example, if you can create an iterator object that iterates through a subset of the real object, and you insert new items using the iterator object, then they are inserted into the subset relative to the start of the subset.
One advantage of iterators: For example, a command can iterate through a list of object references without having to know whether the list is implemented as an MRefArray or an MChain (the same program code works for both).
An iterator object encapsulates the internal mechanism of how the iteration occurs. By having different types of iterators (for example forward iterators and backward iterators), the same program code can iterate through a list/array in different directions (or in different orders) depending on the type of iterator supplied.
An iterator object internally contains a reference to the real object it iterates through. Thus if you have an iterator object and you want to use it, then only the iterator object need be specified -- the real object does not need to be specified along with the iterator object because the iterator object already contains an internal reference to the real object.
Random-access iterators should be supported as well as sequential iterators. Random-access means accessing an item by specifying an ordinal relative to the start (or end) of the subset.
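The iterator ideas above can be sketched with Python generators. All names are hypothetical; a Python list stands in for an MRefArray and a `collections.deque` for an MChain, only to show that the consuming code is the same for both.

```python
from collections import deque


def forward_iterator(items, start, end):
    """Yield the items of the subset start:end from first to last."""
    for ordinal in range(start, end):
        yield items[ordinal]


def backward_iterator(items, start, end):
    """Yield the items of the subset start:end from last to first."""
    for ordinal in range(end - 1, start - 1, -1):
        yield items[ordinal]


def item_at(items, start, end, ordinal):
    """Random access: the ordinal is relative to the start of the subset."""
    if not 0 <= ordinal < end - start:
        raise IndexError("ordinal outside the subset")
    return items[start + ordinal]


def join_items(iterator):
    """Consumer code that knows neither the container type nor the direction."""
    return ",".join(iterator)


as_array = list("burrito")    # stand-in for an array-like container
as_chain = deque("burrito")   # stand-in for a chain-like container
```

Only the iterator is handed to `join_items`; the real object is reached through the iterator, as described above.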