Inform version 6.21(G0.35) to 6.33
Maintained by IFTF: <specs@ifarchive.org>
(Last update: Mar 1, 2014)
Copyright 2020 by the Interactive Fiction Technology Foundation. This specification is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License: http://creativecommons.org/licenses/by-nc-sa/3.0
This document and further Glulx information can be found at: https://github.com/iftechfoundation/ifarchive-if-specs
This document describes data conventions used by the Inform 6 compiler when generating Glulx game files. The I6 library needs to read some of this data. This document serves as a way to coordinate compiler and library code. These conventions are not part of the Glulx VM specification.
Most game authors will never have to worry about this document.
In these tables, I'm using the terms "long", "short", and "byte" to describe four-byte, two-byte, and one-byte values respectively.
Most of this information exists in Inform-generated Z-code files as well. Some of the text below assumes that you are familiar with Inform's Z-machine conventions.
This is data that the Glulx Inform compiler places immediately after the header.
long: ASCII 'Info'
long: 0100 (the memory layout described in this document)
long: the Inform version number, rendered as ASCII 'X.YY'
long: the Glulx compiler back-end version number, rendered as ASCII 'X.YY'
short: game release number
byte[6]: game serial number (conventionally the compile date)
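Here is a minimal Python sketch that decodes this block from a game-file image. It assumes big-endian storage, as everywhere in Glulx. It also assumes the block starts at offset 36, immediately after the standard 36-byte Glulx header. The function name and the dictionary keys are hypothetical. The layout long is returned as a raw integer and is not interpreted.

```python
import struct

# Info block: four longs, one short, six bytes. '>' means big-endian with no
# padding, matching Glulx's byte order.
INFO_FORMAT = ">4sI4s4sH6s"
INFO_SIZE = struct.calcsize(INFO_FORMAT)  # 24 bytes

def parse_info_block(image: bytes, offset: int = 36) -> dict:
    """Decode the 'Info' block at `offset` (assumed: just after the 36-byte header)."""
    magic, layout, inform_ver, back_end_ver, release, serial = struct.unpack_from(
        INFO_FORMAT, image, offset)
    if magic != b"Info":
        raise ValueError("no 'Info' block at this offset")
    return {
        "layout": layout,                                  # raw integer, not interpreted
        "inform_version": inform_ver.decode("ascii"),      # 'X.YY'
        "back_end_version": back_end_ver.decode("ascii"),  # 'X.YY'
        "release": release,
        "serial": serial.decode("ascii"),                  # conventionally a date
    }
```

For a game file read into memory, `parse_info_block(image)` should return the version strings and serial number, provided the file follows these conventions.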
The game objects no longer have to be stored sequentially in memory, although the compiler does in fact store them that way. The objects form an overall linked list (separate from the usual Inform linked containment tree). The objectloop statement follows this linked list. Therefore, it is legal to add entirely new objects to the middle or end of this list. (You cannot add objects to the beginning, because objectloops always start with the metaclass object Class.)
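The linked-list rule can be modelled in a few lines. This Python sketch is hypothetical. A dict stands in for each object's "next object" field, and the list is assumed to end at 0. It shows why splicing a new object in after any existing one keeps objectloop working, while the head of the list (Class) never changes.

```python
def objectloop(first: int, next_of: dict) -> list:
    """Visit objects in overall-list order, starting from `first` (Class)."""
    order = []
    obj = first
    while obj != 0:          # assumption: 0 marks the end of the list
        order.append(obj)
        obj = next_of[obj]
    return order

def insert_after(prev: int, new: int, next_of: dict) -> None:
    """Splice `new` in directly after `prev`; the head is never replaced."""
    next_of[new] = next_of[prev]
    next_of[prev] = new
```

With made-up addresses `{100: 200, 200: 300, 300: 0}` (100 playing the role of Class), inserting 900 after 200 and 950 after 300 yields the visiting order 100, 200, 900, 300, 950.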
The object structure:
This structure is generated by the compiler, and used by the veneer and library.
byte: 0x70 (type identifier for objects)
byte[7]: attributes
long: next object in the overall linked list
long: hardware name string
long: property table address
long: parent object
long: sibling object
long: child object
The seven-byte attribute list (56 attributes) is actually variable, and depends on the $NUM_ATTR_BYTES memory setting. The compiler ensures that its length is always of the form 4i+3, so that the long fields that follow fall on four-byte boundaries.
If the $GLULX_OBJECT_EXT_BYTES setting is used, that many bytes are appended to the object structure. (They are initialized to zero, and not used by any I6 language feature.) The compiler constant GOBJ_EXT_START may be used to find the start of this extra space (in bytes, counting from the start of the object structure).
The compiler constant GOBJ_TOTAL_LENGTH contains the total size of the object, which is (1 + NUM_ATTR_BYTES + 6*4 + GLULX_OBJECT_EXT_BYTES). If $GLULX_OBJECT_EXT_BYTES is not set, then GOBJ_EXT_START will be the same as GOBJ_TOTAL_LENGTH.
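The field offsets implied by the structure can be computed from the two memory settings. This is a minimal sketch. The labels `next`, `name`, `proptab` and so on are hypothetical names for the six long fields. It checks the 4i+3 rule and reproduces the GOBJ_EXT_START and GOBJ_TOTAL_LENGTH arithmetic given above.

```python
def object_layout(num_attr_bytes: int = 7, ext_bytes: int = 0) -> dict:
    """Byte offsets of the fields of a Glulx Inform object.

    num_attr_bytes stands for $NUM_ATTR_BYTES and ext_bytes for
    $GLULX_OBJECT_EXT_BYTES; the field labels are illustrative only.
    """
    if num_attr_bytes % 4 != 3:
        raise ValueError("NUM_ATTR_BYTES must have the form 4i+3")
    base = 1 + num_attr_bytes  # type byte + attributes: always a multiple of 4
    fields = ["next", "name", "proptab", "parent", "sibling", "child"]
    layout = {name: base + 4 * i for i, name in enumerate(fields)}
    layout["GOBJ_EXT_START"] = base + 6 * 4
    layout["GOBJ_TOTAL_LENGTH"] = layout["GOBJ_EXT_START"] + ext_bytes
    return layout
```

With the default seven attribute bytes, the six long fields sit at offsets 8 through 28, and both constants come out as 32. An objectloop would follow the long stored at the `next` offset.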
This structure is generated by the compiler, and used by the veneer.
long: number of properties
...each property: {
short: property ID
short: property length (in words)
long: property data address
short: flags
}
long[...]: all data in this table, sequentially.
The list of property entries in this table is sorted by property ID. This allows the library to search tables with a fast binary search.
Note that the property length is a number of words. The Inform obj.#prop operator is computed by multiplying this value by 4. (The program most likely divides it by 4 again immediately, but that's legacy syntax for you.)
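Entries are sorted by property ID and have a fixed 10-byte size (short, short, long, short), so a lookup can binary-search them directly. This Python sketch is an illustration and not the veneer's code. `find_property` is a hypothetical name, and the table is assumed to be big-endian. The second returned value is the obj.#prop result: the length in words times 4.

```python
import struct

# One property entry: ID (short), length in words (short), data address (long),
# flags (short). '>' gives big-endian with no padding, so this is 10 bytes.
ENTRY = struct.Struct(">HHIH")

def find_property(mem: bytes, table: int, prop_id: int):
    """Binary-search the property table at `table`.

    Returns (data address, obj.#prop in bytes) or None if absent.
    """
    (count,) = struct.unpack_from(">I", mem, table)
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        pid, length_words, addr, _flags = ENTRY.unpack_from(
            mem, table + 4 + mid * ENTRY.size)
        if pid == prop_id:
            return addr, length_words * 4
        if pid < prop_id:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```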
The flags field currently only supports one flag:
Properties 1 to INDIV_PROP_START-1 are common properties; individual properties are numbered INDIV_PROP_START and up. They are kept in the same table. The first eight individual properties are the usual Inform metaclass messages: create, recreate, destroy, remaining, copy, call, print, print_to_array.
The value of INDIV_PROP_START is currently hardwired to 256, in the compiler. It will eventually be made flexible. The veneer and library use the constant INDIV_PROP_START, so they can adapt to any value the compiler defines.
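This sketch shows how property numbers split into the two ranges. It hardwires INDIV_PROP_START = 256 only for illustration; real code should use the compiler constant. It also assumes the eight metaclass messages occupy INDIV_PROP_START onward, in the order listed above.

```python
INDIV_PROP_START = 256  # current hardwired value; real code should use the constant

# Assumption: the metaclass messages take the first eight individual
# property numbers, in the order this document lists them.
METACLASS_MESSAGES = ["create", "recreate", "destroy", "remaining",
                      "copy", "call", "print", "print_to_array"]

def describe_property(prop_id: int) -> str:
    """Classify a property ID as common or individual."""
    if prop_id < 1:
        raise ValueError("property IDs start at 1")
    if prop_id < INDIV_PROP_START:
        return "common"
    offset = prop_id - INDIV_PROP_START
    if offset < len(METACLASS_MESSAGES):
        return f"individual ({METACLASS_MESSAGES[offset]})"
    return "individual"
```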
Glulx Inform uses 32-bit values to represent "inherited" properties such as FishClass::color. The upper 16 bits contain the property ID; the lower 16 bits contain a class number. (This is similar to the way Z-code Inform works, but it's simpler, since there's no need to use the high bits for the individual/common property flag.)
This system allows up to 65535 classes and 65535 properties. There is no limit on the number of properties in an object, or the amount of property data.
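Packing an inherited property value is plain bit arithmetic. The helper names in this sketch are hypothetical. The numbers in the check below are made up: property 0x123 and class 5 stand in for something like FishClass::color.

```python
def inherited_prop(prop_id: int, class_num: int) -> int:
    """Pack Class::prop as a 32-bit value: property ID high, class number low."""
    if not (0 <= prop_id <= 0xFFFF and 0 <= class_num <= 0xFFFF):
        raise ValueError("both fields must fit in 16 bits")
    return (prop_id << 16) | class_num

def split_inherited(value: int) -> tuple:
    """Recover (property ID, class number) from a packed value."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF
```

For example, `inherited_prop(0x123, 5)` gives `0x01230005`, and `split_inherited` turns that back into `(0x123, 5)`.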
This structure is generated by the compiler, and used by the library.
long: number of words
...each word: {
byte: 0x60 (type identifier for dict words)
bytes[]: lower-case text, zero-padded (nine bytes by default)
short: flags
short: verb number
short: unused (zero)
}
The words, of course, are sorted in alphabetical order (ISO 8859-1, Latin-1), with the zero padding in short words sorted before any other character.
The flags and verb number are defined as in the Z-machine. These values are currently only 0 to 255, even though there is room for 16-bit values. Future compiler versions may allow more than 255 verbs.
The nine-character word length is actually variable, controlled by the $DICT_WORD_SIZE memory setting. The library uses the compiler constants DICT_WORD_SIZE, #dict_par1, #dict_par2, and #dict_par3, so it can adapt to any value the compiler defines.
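A byte-mode lookup key can be built by lowercasing, truncating to DICT_WORD_SIZE, and zero-padding. Byte-string comparison treats 0 as smaller than any character, so short words then sort before their extensions ("take" before "taken"), matching the ordering described above. This Python sketch makes several assumptions:

- It hardwires a word size of 9.
- Like the current compiler, it lowercases only ASCII 'A' through 'Z'.
- The function names are hypothetical.
- `bisect` stands in for whatever search the library actually uses.

```python
import bisect

DICT_WORD_SIZE = 9  # default; real code should use the compiler constant

def dict_key(word: str, size: int = DICT_WORD_SIZE) -> bytes:
    """Byte-mode dictionary text: lowercased (ASCII only), truncated, zero-padded."""
    lowered = "".join(c.lower() if "A" <= c <= "Z" else c for c in word)
    raw = lowered.encode("latin-1")[:size]
    return raw + bytes(size - len(raw))

def lookup(sorted_keys: list, word: str) -> int:
    """Index of `word` in a sorted list of keys, or -1 if it is absent."""
    key = dict_key(word)
    i = bisect.bisect_left(sorted_keys, key)
    return i if i < len(sorted_keys) and sorted_keys[i] == key else -1
```

Note that `dict_key("Examination")` is truncated to `b"examinati"`, so any word sharing those first nine letters matches the same entry.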
If the compiler is told to generate a Unicode dictionary ($DICT_CHAR_SIZE=4), then the format is instead:
long: number of words
...each word: {
byte: 0x60 (type identifier for dict words)
bytes[3]: unused (zero)
words[]: Unicode text, zero-padded (nine words by default)
short: flags
short: verb number
shorts[2]: unused (zero)
}
Note that, in this form, the dictionary entry size is a multiple of four. The compiler also takes care that a Unicode dictionary will start at a word-aligned address.
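The entry sizes for both dictionary forms follow directly from the two layouts:

- Byte mode: 1 + DICT_WORD_SIZE + 6 bytes.
- Unicode mode: 4 + 4 * DICT_WORD_SIZE + 8 bytes.

With the default word size, that is 16 and 48 bytes. The sketch below computes both sizes and decodes a Unicode entry. The names are hypothetical, and big-endian storage is assumed.

```python
import struct

def dict_entry_size(word_size: int = 9, char_size: int = 1) -> int:
    """Bytes per dictionary entry for $DICT_WORD_SIZE and $DICT_CHAR_SIZE."""
    if char_size == 1:
        return 1 + word_size + 2 + 2 + 2          # type, text, flags, verb, unused
    if char_size == 4:
        return 1 + 3 + 4 * word_size + 2 + 2 + 4  # type, pad, text, flags, verb, 2 unused
    raise ValueError("DICT_CHAR_SIZE must be 1 or 4")

def parse_unicode_entry(mem: bytes, addr: int, word_size: int = 9):
    """Return (text, flags, verb number) for a Unicode dictionary entry."""
    chars = struct.unpack_from(f">{word_size}I", mem, addr + 4)
    text = "".join(chr(c) for c in chars if c != 0)  # drop zero padding
    flags, verb = struct.unpack_from(">HH", mem, addr + 4 + 4 * word_size)
    return text, flags, verb
```

In Unicode mode the size is 12 + 4 * DICT_WORD_SIZE, which is a multiple of four for any word size. This is why consecutive entries stay aligned.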
In the dictionary, the compiler currently lower-cases only the ASCII characters 'A' through 'Z'. All others (including accented Latin-1 characters) are stored in the dictionary unchanged. This should be regarded as a bug; the compiler ought to apply the Unicode lower-case algorithm, followed by Normalization Form C (or perhaps KC).
This table is generated by the compiler, and used by the veneer. It is also used by the string-decoding table generated by the compiler, and therefore by the decompression routines of the Glulx VM.
long: number of dynamic strings
...each string: {
long: address of string or function
}
The compiler initializes all the entries in this table to the address of a three-space string.
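The table is just a count followed by an array of longs, so reading or replacing an entry is simple offset arithmetic. In this hypothetical sketch, `placeholder` stands in for the address of the compiler's three-space string. The table is held in its own buffer starting at offset 0.

```python
import struct

def build_string_table(count: int, placeholder: int) -> bytearray:
    """Build a dynamic-strings table with every slot set to `placeholder`,
    mirroring the compiler's initial state."""
    table = bytearray(struct.pack(">I", count))
    table += struct.pack(f">{count}I", *([placeholder] * count))
    return table

def get_entry(table: bytes, n: int) -> int:
    """Address of dynamic string n (a string or function)."""
    (count,) = struct.unpack_from(">I", table, 0)
    if not 0 <= n < count:
        raise IndexError("no such dynamic string")
    return struct.unpack_from(">I", table, 4 + 4 * n)[0]

def set_entry(table: bytearray, n: int, addr: int) -> None:
    """Point entry n at a new string or function address."""
    get_entry(table, n)  # bounds check only
    struct.pack_into(">I", table, 4 + 4 * n, addr)
```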
These structures are generated by the compiler, and used by the library.
First, there's an array of pointers to grammar tables:
long: number of verbs
...each verb: {
long: address of grammar table for this verb
}
Each grammar table has this form:
byte: number of lines
...each grammar line: {
short: action number
byte: flags
...each token in the line: {
byte: token type
long: token data
}
byte: ENDIT (15)
}
The flags field currently only supports one flag:
This is nearly identical to the grammar version 2 format in Z-machine Inform. The only differences are that the token data is 4 bytes long, and the switch flag is now kept in the flags byte instead of being packed into the action number.
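Walking a grammar table means reading each line's 3-byte header (action short, flags byte), then 5-byte tokens up to the ENDIT byte. The Python sketch below is illustrative only. The names are hypothetical, and storage is assumed big-endian. It relies on no token type byte ever equalling ENDIT.

```python
import struct

ENDIT = 15

def parse_grammar_table(mem: bytes, addr: int) -> list:
    """Decode one verb's grammar table into a list of grammar lines."""
    num_lines = mem[addr]
    pos = addr + 1
    lines = []
    for _ in range(num_lines):
        action, flags = struct.unpack_from(">HB", mem, pos)  # 3 bytes
        pos += 3
        tokens = []
        while mem[pos] != ENDIT:
            ttype, tdata = struct.unpack_from(">BI", mem, pos)  # 5 bytes, no padding
            tokens.append((ttype, tdata))
            pos += 5
        pos += 1  # skip the ENDIT byte
        lines.append({"action": action, "flags": flags, "tokens": tokens})
    return lines
```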
This structure is generated by the compiler, and used by the library.
long: number of actions
...each action: {
long: address of routine for this action
}
This structure is used only by the library. KeyboardPrimitive fills in the input buffer; Tokenise__ reads it and fills in the parse table.
The input buffer:
long: number of characters entered
bytes[INPUT_BUFFER_LEN-4]: characters.
Note that, unlike on the Z-machine, there is no "maximum length" field to be filled in before KeyboardPrimitive is called. Also note that this could in theory handle input lines of any length.
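This sketch fills and reads the input buffer. INPUT_BUFFER_LEN = 260 and Latin-1 encoding are assumptions made for the example. The point is that the count sits in a leading long, rather than in a maximum-length field filled in before the call.

```python
import struct

INPUT_BUFFER_LEN = 260  # hypothetical; use the library's own constant

def fill_input_buffer(buf: bytearray, line: str) -> int:
    """Store `line` as a long character count followed by the characters."""
    data = line.encode("latin-1")[: len(buf) - 4]  # keep only what fits
    struct.pack_into(">I", buf, 0, len(data))
    buf[4 : 4 + len(data)] = data
    return len(data)

def read_input_buffer(buf: bytes) -> str:
    """Recover the typed line from the buffer."""
    (count,) = struct.unpack_from(">I", buf, 0)
    return bytes(buf[4 : 4 + count]).decode("latin-1")
```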
The parse table:
long: number of words entered
...each word: {
long: dict word address (or zero)
long: word length (in characters)
long: word position (in input buffer)
}
The Tokenise__ routine fills this in. It uses period, comma, and double-quote as word separators. (This separator list is hard-coded in Tokenise__; to change it, modify or replace that routine.)
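The splitting rule can be sketched as follows. This is not the library's Tokenise__, and it makes several assumptions:

- Spaces end a word.
- Each separator character becomes a one-character word of its own.
- Positions are byte offsets from the start of the input buffer, so the 4-byte count field is added.

It also skips the dictionary lookup that fills in the first field of each entry.

```python
SEPARATORS = {".", ",", '"'}

def tokenise(text: str) -> list:
    """Split a line into (word, length, position) triples, parse-table style."""
    words = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == " ":          # assumption: spaces only delimit words
            i += 1
            continue
        start = i
        if text[i] in SEPARATORS:   # a separator is a word by itself
            i += 1
        else:
            while i < n and text[i] != " " and text[i] not in SEPARATORS:
                i += 1
        words.append((text[start:i], i - start, start + 4))  # +4: count long
    return words
```

For `take lamp, then go` this yields `take`, `lamp`, `,`, `then`, `go` at positions 4, 9, 13, 15 and 20.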
By a happy coincidence, in both Z-code and Glulx code, parsetable-->1 is the dictionary address of the first word entered. This makes the bi-platform library a little bit simpler.
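The coincidence is simple arithmetic. The `-->` operator indexes in units of the platform's word size, and in both formats exactly one word's worth of header precedes the first dictionary address. The Z-code header is a maximum-words byte plus a word-count byte; the Glulx header is a word-count long. Here is a worked check in Python, using a hypothetical helper name:

```python
# Bytes before the first dictionary address in each parse-table format.
Z_HEADER = 1 + 1   # max-words byte + word-count byte
GLULX_HEADER = 4   # word-count long

def word_offset(index: int, word_size: int) -> int:
    """Byte offset addressed by table-->index."""
    return index * word_size

assert word_offset(1, 2) == Z_HEADER      # Z-code: 2-byte words
assert word_offset(1, 4) == GLULX_HEADER  # Glulx: 4-byte words
```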