Document Date: | January 14, 1985 |
---|---|
From: | Jerry Morrison, Electronic Arts |
Status of Standard: | Released and in use |
As home computer hardware evolves to better and better media machines, the demand increases for higher quality, more detailed data. Data development gets more expensive, requires more expertise and better tools, and has to be shared across projects. Think about several ports of a product on one CD-ROM with 500MB of common data!
Development tools need standard interchange file formats. Imagine scanning in images of “player” shapes, moving them to a paint program for editing, then incorporating them into a game. Or writing a theme song with a Macintosh score editor and incorporating it into an Amiga game. The data must at times be transformed, clipped, filled out, and moved across machine kinds. Media projects will depend on data transfer from graphic, music, sound effect, animation, and script tools.
Customers should be able to move their own data between independently developed software products. And they should be able to buy data libraries usable across many such products. The types of data objects to exchange are open-ended and include plain and formatted text, raster and structured graphics, fonts, music, sound effects, musical instrument descriptions, and animation.
The problem with expedient file formats (typically memory dumps) is that they're too provincial. By designing data for one particular use (e.g. a screen snapshot), they preclude future expansion (would you like a full page picture? a multi-page document?). In neglecting the possibility that other programs might read their data, they fail to save contextual information (how many bit planes? what resolution?). Ignoring that other programs might create such files, they're intolerant of extra data (texture palette for a picture editor), missing data (no color map), or minor variations (smaller image). In practice, a filed representation should rarely mirror an in-memory representation. The former should be designed for longevity; the latter to optimize the manipulations of a particular program. The same filed data will be read into different memory formats by different programs.
The IFF philosophy: “A little behind-the-scenes conversion when programs read and write files is far better than N×M explicit conversion utilities for highly specialized formats.”
So we need some standardization for data interchange among development tools and products. The more developers that adopt a standard, the better for all of us and our customers.
Here is our offering: Electronic Arts' IFF standard for Interchange File Format. The full name is “EA IFF 1985.” Alternatives and justifications are included for certain choices. Public domain subroutine packages and utility programs are available to make it easy to write and use IFF-compatible programs.
Part 1 introduces the standard. Part 2 presents its requirements and background. Parts 3, 4, and 5 define the primitive data types, FORMs, and LISTs, respectively, and how to define new high level types. Part 6 specifies the top level file structure. Appendix A is included for quick reference and Appendix B names the committee responsible for this standard.
“FTXT” IFF Formatted Text, from Electronic Arts. IFF supplement document for a text format.
“ILBM” IFF Interleaved Bitmap, from Electronic Arts. IFF supplement document for a raster image format.
Part 2 is about the background, requirements, and goals for the standard. It's geared for people who want to design new types of IFF objects. People just interested in using the standard may wish to skip this part.
A standard should be long on prescription and short on overhead. It should give lots of rules for designing programs and data files for synergy. But neither the programs nor the files should cost too much more than the expedient variety. While we're looking to a future with CD-ROMs and perpendicular recording, the standard must work well on floppy disks.
For program portability, simplicity, and efficiency, formats should be designed with more than one implementation style in mind. (In practice, pure stream I/O is adequate although random access makes it easier to write files.) It ought to be possible to read one of many objects in a file without scanning all the preceding data. Some programs need to read and play out their data in real time, so we need good compromises between generality and efficiency.
As much as we need standards, they can't hold up product schedules. So we also need a kind of decentralized extensibility where any software developer can define and refine new object types without some “standards authority” in the loop. Developers must be able to extend existing formats in a forward- and backward-compatible way. A central repository for design information and example programs can help us take full advantage of the standard.
For convenience, data formats should heed the restrictions of various processors and environments. E.g. word-alignment greatly helps 68000 access at insignificant cost to 8088 programs.
Other goals include the ability to share common elements over a list of objects and the ability to construct composite objects containing other data objects with structural information like directories.
And finally, “Simple things should be simple and complex things should be possible.” — Alan Kay.
Let's think ahead and build programs that read and write files for each other and for programs yet to be designed. Build data formats to last for future computers so long as the overhead is acceptable. This extends the usefulness and life of today's programs and data.
To maximize interconnectivity, the standard file structure and the specific object formats must all be general and extensible. Think ahead when designing an object. It should serve many purposes and allow many programs to store and read back all the information they need; even squeeze in custom data. Then a programmer can store the available data and is encouraged to include fixed contextual details. Recipient programs can read the needed parts, skip unrecognized stuff, default missing data, and use the stored context to help transform the data as needed.
IFF addresses these needs by defining a standard file structure, some initial data object types, ways to define new types, and rules for accessing these files. We can accomplish a great deal by writing programs according to this standard, but don't expect direct compatibility with existing software. We'll need conversion programs to bridge the gap from the old world.
IFF is geared for computers that readily process information in 8-bit bytes. It assumes a “physical layer” of data storage and transmission that reliably maintains “files” as strings of 8-bit bytes. The standard treats a “file” as a container of data bytes and is independent of how to find a file and whether it has a byte count.
This standard does not by itself implement a clipboard for cutting and pasting data between programs. A clipboard needs software to mediate access, to maintain a “contents version number” so programs can detect updates, and to manage the data in “virtual memory.”
The basic problem is how to represent information in a way that's program-independent, compiler-independent, machine-independent, and device-independent.
The computer science approach is “data abstraction,” also known as “objects,” “actors,” and “abstract data types.” A data abstraction has a “concrete representation” (its storage format), an “abstract representation” (its capabilities and uses), and access procedures that isolate all the calling software from the concrete representation. Only the access procedures touch the data storage. Hiding mutable details behind an interface is called “information hiding.” What data abstraction does is abstract from details of implementing the object, namely the selected storage representation and algorithms for manipulating it.
The power of this approach is modularity. By adjusting the access procedures we can extend and restructure the data without impacting the interface or its callers. Conversely, we can extend and restructure the interface and callers without making existing data obsolete. It's great for interchange!
But we seem to need the opposite: fixed file formats for all programs to access. Actually, we could file data abstractions (“filed objects”) by storing the data and access procedures together. We'd have to encode the access procedures in a standard machine-independent programming language à la PostScript. Even then, the interface can't evolve freely since we can't update all copies of the access procedures. So we'll have to design our abstract representations for limited evolution and occasional revolution (conversion).
In any case, today's microcomputers can't practically store data abstractions. They can do the next best thing: store arbitrary types of data in “data chunks,” each with a type identifier and a length count. The type identifier is a reference by name to the access procedures (any local implementation). The length count enables storage-level object operations like “copy” and “skip to next” independent of object type.
Chunk writing is straightforward. Chunk reading requires a trivial parser to scan each chunk and dispatch to the proper access/conversion procedure. Reading chunks nested inside other chunks requires recursion, but no lookahead or backup.
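To make the idea concrete, here is a minimal sketch of such a parser in C. It assumes the chunks are already in a memory buffer. It reads each chunk header (a 4-byte ID and a 4-byte size stored high byte first, as defined under Chunks), dispatches on the ID, and skips anything it doesn't recognize, including the pad byte after odd-length data. The names `ChunkRef`, `next_chunk`, `scan_chunks`, `ScanStats`, and the two handlers are hypothetical. The sketch handles one level only; a real reader would call itself on a chunk's data to handle nesting. Clamping a truncated chunk to the buffer is this sketch's own choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One chunk located inside a buffer (nothing is copied). */
typedef struct {
    const uint8_t *id;    /* points at the 4 ID bytes */
    uint32_t size;        /* ckSize, clamped to the bytes actually present */
    const uint8_t *data;  /* points at ckData */
} ChunkRef;

/* ckSize is stored high byte first (MC68000 order). */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Fetch the chunk at *pos and advance *pos past its data and pad byte.
   Returns 0 when fewer than 8 header bytes remain. */
static int next_chunk(const uint8_t *buf, size_t len, size_t *pos, ChunkRef *out) {
    if (*pos > len || len - *pos < 8)
        return 0;
    const uint8_t *hdr = buf + *pos;
    size_t avail = len - *pos - 8;
    uint32_t size = read_be32(hdr + 4);
    if (size > avail)                       /* assumption: clamp a truncated chunk */
        size = (uint32_t)avail;
    out->id = hdr;
    out->size = size;
    out->data = hdr + 8;
    *pos += 8 + (size_t)size + (size & 1u); /* odd sizes are followed by a pad byte */
    return 1;
}

/* Hypothetical results gathered by the example handlers below. */
typedef struct { int cmap_chunks; uint32_t text_bytes; int skipped; } ScanStats;

static void handle_cmap(const ChunkRef *c, ScanStats *s) { (void)c; s->cmap_chunks++; }
static void handle_text(const ChunkRef *c, ScanStats *s) { s->text_bytes += c->size; }

/* Dispatch each chunk by its ID; unrecognized chunks are skipped. */
void scan_chunks(const uint8_t *buf, size_t len, ScanStats *s) {
    size_t pos = 0;
    ChunkRef c;
    while (next_chunk(buf, len, &pos, &c)) {
        if (memcmp(c.id, "CMAP", 4) == 0)
            handle_cmap(&c, s);
        else if (memcmp(c.id, "TEXT", 4) == 0)
            handle_text(&c, s);
        else
            s->skipped++;
    }
}
```

No lookahead or backup is needed. The parser only ever moves forward by each chunk's physical size.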
That's the main idea of IFF. There are, of course, a few other details.
Where our needs are similar, we borrow from existing standards.
Our basic need to move data between independently developed programs is similar to that addressed by the Apple Macintosh desk scrap or “clipboard” [Inside Macintosh chapter “Scrap Manager”]. The Scrap Manager works closely with the Resource Manager, a handy filer and swapper for data objects (text strings, dialog window templates, pictures, fonts) including types yet to be designed [Inside Macintosh chapter “Resource Manager”]. The Resource Manager is akin to Smalltalk's object swapper.
We will probably write a Macintosh desk accessory that converts IFF files to and from the Macintosh clipboard for quick and easy interchange with programs like MacPaint and Resource Mover.
Macintosh uses a simple and elegant scheme of 4-character “identifiers” to identify resource types, clipboard format types, file types, and file creator programs. Alternatives are unique ID numbers assigned by a central authority or by hierarchical authorities, unique ID numbers generated by algorithm, other fixed length character strings, and variable length strings. Character string identifiers double as readable signposts in data files and programs. The choice of 4 characters is a good tradeoff between storage space, fetch/compare/store time, and name space size. We'll honor Apple's designers by adopting this scheme.
“PICT” is a good example of a standard structured graphics format (including raster images) and its many uses [Inside Macintosh chapter “QuickDraw”]. Macintosh provides QuickDraw routines in ROM to create, manipulate, and display PICTs. Any application can create a PICT by simply asking QuickDraw to record a sequence of drawing commands. Since it's just as easy to ask QuickDraw to render a PICT to a screen or a printer, it's very effective to pass them between programs, say from an illustrator to a word processor. An important feature is the ability to store “comments” in a PICT which QuickDraw will ignore. (Actually, it passes them to your optional custom “comment handler.”)
PostScript, Adobe's print file standard, is a more general way to represent any print image (which is a specification for putting marks on paper) [PostScript Language Manual]. In fact, PostScript is a full-fledged programming language. To interpret a PostScript program is to render a document on a raster output device. The language is defined in layers: a lexical layer of identifiers, constants, and operators; a layer of reverse Polish semantics including scope rules and a way to define new subroutines; and a printing-specific layer of built-in identifiers and operators for rendering graphic images. It is clearly a powerful (Turing equivalent) image definition language. PICT and a subset of PostScript are candidates for structured graphics standards.
A PostScript document can be printed on any raster output device (including a display) but cannot generally be edited. That's because the original flexibility and constraints have been discarded. Besides, a PostScript program may use arbitrary computation to supply parameters like placement and size to each operator. A QuickDraw PICT, in comparison, is a more restricted format of graphic primitives parameterized by constants. So a PICT can be edited at the level of the primitives, e.g. move or thicken a line. It cannot be edited at the higher level of, say, the bar chart data which generated the picture.
PostScript has another limitation: Not all kinds of data amount to marks on paper. A musical instrument description is one example. PostScript is just not geared for such uses.
“DIF” is another example of data being stored in a general format usable by future programs [DIF Technical Specification]. DIF is a format for spreadsheet data interchange. DIF and PostScript are both expressed in plain ASCII text files. This is very handy for printing, debugging, experimenting, and transmitting across modems. It can have substantial cost in compaction and read/write work, depending on use. We won't store IFF files this way but we could define an ASCII alternate representation with a converter program.
InterScript is Xerox' standard for interchange of editable documents [Introduction to InterScript]. It approaches a harder problem: How to represent editable word processor documents that may contain formatted text, pictures, cross-references like figure numbers, and even highly specialized objects like mathematical equations? InterScript aims to define one standard representation for each kind of information. Each InterScript-compatible editor is supposed to preserve the objects it doesn't understand and even maintain nested cross-references. So a simple word processor would let you edit the text of a fancy document without discarding the equations or disrupting the equation numbers.
Our task is similarly to store high level information and preserve as much content as practical while moving it between programs. But we need to span a larger universe of data types and cannot expect to centrally define them all. Fortunately, we don't need to make programs preserve information that they don't understand. And for better or worse, we don't have to tackle general-purpose cross-references yet.
Atomic components such as integers and characters that are interpretable directly by the CPU are specified in one format for all processors. We chose a format that's most convenient for the Motorola MC68000 processor [M68000 16/32-Bit Microprocessor Programmer's Reference Manual].
N.B.: Part 3 dictates the format for “primitive” data types where and only where used in the overall file structure and standard kinds of chunks (Cf. Chunks). The number of such occurrences will be small enough that the costs of conversion, storage, and management of processor-specific files would far exceed the costs of conversion during I/O by “foreign” programs. A particular data chunk may be specified with a different format for its internal primitive types or with processor- or environment-specific variants if necessary to optimize local usage. Since that hurts data interchange, it's not recommended. (Cf. Designing New Data Sections, in Part 4.)
All data objects larger than a byte are aligned on even byte addresses relative to the start of the file. This may require padding. Pad bytes are to be written as zeros, but don't count on that when reading.
This means that every odd-length “chunk” (see below) must be padded so that the next one will fall on an even boundary. Also, designers of structures to be stored in chunks should include pad fields where needed to align every field larger than a byte. Zeros should be stored in all the pad bytes.
Justification: Even-alignment causes a little extra work for files that are used only on certain processors but allows 68000 programs to construct and scan the data in memory and do block I/O. You just add an occasional pad field to data structures that you're going to block read/write or else stream read/write an extra byte. And the same source code works on all processors. Unspecified alignment, on the other hand, would force 68000 programs to (dis)assemble word and long-word data one byte at a time. Pretty cumbersome in a high level language. And if you don't conditionally compile that out for other processors, you won't gain anything.
Numeric types supported are two's complement binary integers in the format used by the MC68000 processor: high byte first, high word first (the reverse of 8088 and 6502 format). They could potentially include signed and unsigned 8, 16, and 32 bit integers, but the standard only uses the following:
UBYTE | 8 bits unsigned |
---|---|
WORD | 16 bits signed |
UWORD | 16 bits unsigned |
LONG | 32 bits signed |
The actual type definitions depend on the CPU and the compiler. In this document, we'll express data type definitions in the C programming language. [See C, A Reference Manual.] In 68000 Lattice C:
typedef unsigned char  UBYTE;  /*  8 bits unsigned */
typedef short          WORD;   /* 16 bits signed   */
typedef unsigned short UWORD;  /* 16 bits unsigned */
typedef long           LONG;   /* 32 bits signed   */
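A minimal sketch of reading and writing these numbers portably, assuming a byte buffer holds the file data. It uses C99 `<stdint.h>` types rather than the Lattice C typedefs above, because `long` is 64 bits on many current compilers. The function names are hypothetical. Assembling values byte by byte makes the same source correct on both big- and little-endian hosts.

```c
#include <stdint.h>

/* Store and fetch IFF numbers in MC68000 order: high byte first,
   high word first, regardless of the host CPU's own byte order. */
void put_uword(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

uint16_t get_uword(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Reinterpret the 16 bits as two's complement. Converting an out-of-range
   unsigned value to a signed type is implementation-defined before C23,
   but it wraps as expected on mainstream compilers. */
int16_t get_word(const uint8_t *p) {
    return (int16_t)get_uword(p);
}

void put_long(uint8_t *p, int32_t v) {
    uint32_t u = (uint32_t)v;   /* well-defined two's complement bit pattern */
    p[0] = (uint8_t)(u >> 24);
    p[1] = (uint8_t)(u >> 16);
    p[2] = (uint8_t)(u >> 8);
    p[3] = (uint8_t)u;
}

int32_t get_long(const uint8_t *p) {
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return (int32_t)u;
}
```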
The following character set is assumed wherever characters are used, e.g. in text strings, IDs, and TEXT chunks (see below).
Characters are encoded in 8-bit ASCII. Characters in the range NUL (hex 0) through DEL (hex 7F) are well defined by the 7-bit ASCII standard. IFF uses the graphic group “ ” (SP, hex 20) through “~” (hex 7E).
Most of the control characters in the range hex 01 through hex 1F have no standard meaning in IFF. The control character LF (hex 0A) is defined as a “newline” character. It denotes an intentional line break, that is, a paragraph or line terminator. (There is no way to store an automatic line break. That is strictly a function of the margins in the environment where the text is placed.) The control character ESC (hex 1B) is a reserved escape character under the rules of ANSI standard 3.64–1979, American National Standard Additional Control Codes for Use with ASCII, ISO standard 2022, and ISO/DIS standard 6429.2.
Characters in the range hex 7F through hex FF are not globally defined in IFF. They are best left reserved for future standardization. But note that the FORM type FTXT (formatted text) defines the meaning of these characters within FTXT forms. In particular, character values hex 7F through hex 9F are control codes while characters hex A0 through hex FF are extended graphic characters like 'é', as per the ISO and ANSI standards cited above. [See the supplementary document “FTXT” IFF Formatted Text.]
A “creation date” is defined as the date and time a stream of data bytes was created. (Some systems call this a “last modified date.”) Editing some data changes its creation date. Moving the data between volumes or machines does not.
The IFF standard date format will be one of those used in MS-DOS, Macintosh, or Amiga DOS (probably a 32-bit unsigned number of seconds since a reference point).
Issue: Investigate these three.
A “type ID,” “property name,” “FORM type,” or any other IFF identifier is a 32-bit value: the concatenation of four ASCII characters in the range “ ” (SP, hex 20) through “~” (hex 7E). Spaces (hex 20) should not precede printing characters; trailing spaces are ok. Control characters are forbidden.
typedef char CHAR;   /* an 8-bit ASCII character */
typedef CHAR ID[4];  /* four characters, e.g. "ILBM" */
IDs are compared using a simple 32-bit case-dependent equality test.
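Because an ID is four bytes, one convenient (but not mandated) representation packs it into a 32-bit integer with the first character in the high byte. That is exactly the value the four bytes have when read as a big-endian LONG, so the equality test becomes a single integer compare. The `MAKE_ID` macro and the helper names below are hypothetical.

```c
#include <stdint.h>

/* Pack four ID characters, first character in the high byte, so the value
   equals the 4 ID bytes read as one big-endian 32-bit number. */
#define MAKE_ID(a, b, c, d) \
    (((uint32_t)(uint8_t)(a) << 24) | ((uint32_t)(uint8_t)(b) << 16) | \
     ((uint32_t)(uint8_t)(c) << 8)  |  (uint32_t)(uint8_t)(d))

/* Build the packed value from the 4 bytes as they appear in a file. */
uint32_t id_from_bytes(const uint8_t b[4]) {
    return MAKE_ID(b[0], b[1], b[2], b[3]);
}

/* IDs compare with a plain, case-dependent 32-bit equality test. */
int id_equal(uint32_t x, uint32_t y) {
    return x == y;
}
```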
Data section type IDs (aka FORM types) are restricted IDs. (Cf. Data Sections.) Since they may be stored in filename extensions (Cf. Single Purpose Files), lower case letters and punctuation marks are forbidden. Trailing spaces are ok.
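The character rules for IDs, plus the extra restrictions on FORM types, translate into a short check. This sketch assumes ASCII and treats “punctuation” as every printing character other than letters, digits, and space. The function names are hypothetical. Rejecting the reserved universal IDs is left to a separate test.

```c
/* General ID rule: four characters from SP (0x20) through '~' (0x7E),
   with spaces allowed only as trailing padding. */
int is_valid_id(const char id[4]) {
    int seen_space = 0;
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)id[i];
        if (c < 0x20 || c > 0x7E)
            return 0;              /* control characters and non-ASCII */
        if (c == ' ')
            seen_space = 1;
        else if (seen_space)
            return 0;              /* a space precedes a printing character */
    }
    return 1;
}

/* FORM type rule: additionally no lower case letters or punctuation,
   which leaves upper case letters, digits, and trailing spaces.
   (Reserved IDs such as "FORM" or "CAT " must be rejected separately.) */
int is_valid_form_type(const char id[4]) {
    if (!is_valid_id(id))
        return 0;
    for (int i = 0; i < 4; i++) {
        char c = id[i];
        if (!((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == ' '))
            return 0;
    }
    return 1;
}
```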
Carefully choose those four characters when you pick a new ID. Make them mnemonic so programmers can look at an interchange format file and figure out what kind of data it contains. The name space makes it possible for developers scattered around the globe to generate ID values with minimal collisions so long as they choose specific names like “MUS4” instead of general ones like “TYPE” and “FILE.” EA will “register” new FORM type IDs and format descriptions as they're devised, but collisions will be improbable so there will be no pressure on this “clearinghouse” process. Appendix A has a list of currently defined IDs.
Sometimes it's necessary to make data format changes that aren't backward compatible. Since IDs are used to denote data formats in IFF, new IDs are chosen to denote revised formats. Since programs won't read chunks whose IDs they don't recognize (see Chunks, below), the new IDs keep old programs from stumbling over new data. The conventional way to choose a “revision” ID is to increment the last character if it's a digit or else change the last character to a digit. E.g. first and second revisions of the ID “XY” would be “XY1” and “XY2.” Revisions of “CMAP” would be “CMA1” and “CMA2.”
Chunks are the building blocks in the IFF structure. The form expressed as a C typedef is:
typedef struct {
    ID    ckID;
    LONG  ckSize;                /* sizeof(ckData) */
    UBYTE ckData[/* ckSize */];
} Chunk;
We can diagram an example chunk (a “CMAP” chunk containing 12 data bytes) like this:
ckID | “CMAP” |
---|---|
ckSize | 12 |
ckData | 0, 0, 0, 32, 0, 0, 64, 0, 0, 0, 64, 0 (12 bytes) |
The fixed header part means “Here's a type ckID chunk with ckSize bytes of data.”
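In a file, the diagram above corresponds to 20 bytes: the 4 ID characters, ckSize as a big-endian 32-bit number (0, 0, 0, 12), then the 12 data bytes. Here is a small sketch that lays out any chunk this way in memory, including the pad byte required after odd-length data. `build_chunk` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a chunk (header, data, and pad byte if ckSize is odd) into out,
   which must hold at least 8 + size + 1 bytes. Returns bytes written. */
size_t build_chunk(uint8_t *out, const char id[4],
                   const uint8_t *data, uint32_t size) {
    memcpy(out, id, 4);                 /* ckID */
    out[4] = (uint8_t)(size >> 24);     /* ckSize, high byte first */
    out[5] = (uint8_t)(size >> 16);
    out[6] = (uint8_t)(size >> 8);
    out[7] = (uint8_t)size;
    memcpy(out + 8, data, size);        /* ckData */
    size_t n = 8 + (size_t)size;
    if (size & 1u)
        out[n++] = 0;                   /* pad byte, not counted in ckSize */
    return n;
}
```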
The ckID identifies the format and purpose of the chunk. As a rule, a program must recognize ckID to interpret ckData. It should skip over all unrecognized chunks. The ckID also serves as a format version number as long as we pick new IDs to identify new formats of ckData (see above).
The following ckIDs are universally reserved to identify chunks with particular IFF meanings: “LIST,” “FORM,” “PROP,” “CAT ” (with a trailing space), and “    ” (4 spaces). The special ID “    ” is a ckID for “filler” chunks, that is, chunks that fill space but have no meaningful contents. The IDs “LIS1” through “LIS9,” “FOR1” through “FOR9,” and “CAT1” through “CAT9” are reserved for future “version number” variations. All IFF-compatible software must account for these 32 chunk IDs. Appendix A has a list of predefined IDs.
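A reader or writer can test for the reserved IDs with a check like the following sketch. The function name is hypothetical. Note that “CAT ” and the filler ID “    ” contain spaces that must be matched exactly.

```c
#include <string.h>

/* True if id is one of the universally reserved chunk IDs:
   "LIST", "FORM", "PROP", "CAT ", "    ", and the version
   variants "LIS1".."LIS9", "FOR1".."FOR9", "CAT1".."CAT9". */
int is_reserved_id(const char id[4]) {
    static const char *const exact[] = { "LIST", "FORM", "PROP", "CAT ", "    " };
    static const char *const stems[] = { "LIS", "FOR", "CAT" };
    for (size_t i = 0; i < sizeof exact / sizeof exact[0]; i++)
        if (memcmp(id, exact[i], 4) == 0)
            return 1;
    for (size_t i = 0; i < sizeof stems / sizeof stems[0]; i++)
        if (memcmp(id, stems[i], 3) == 0 && id[3] >= '1' && id[3] <= '9')
            return 1;
    return 0;
}
```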
The ckSize is a logical block size: how many data bytes are in ckData. If ckData is an odd number of bytes long, a 0 pad byte follows which is not included in ckSize. (Cf. Alignment.) A chunk's total physical size is ckSize rounded up to an even number plus the size of the header. So the smallest chunk is 8 bytes long with ckSize = 0. For the sake of following chunks, programs must respect every chunk's ckSize as a virtual end-of-file for reading its ckData even if that data is malformed, e.g. if nested contents are truncated.
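The size rules above reduce to a little arithmetic. The virtual end-of-file can be enforced by giving the code that parses ckData a reader that refuses to run past ckSize. This is a minimal sketch with hypothetical names. How a program reports or recovers from a truncated chunk is up to that program.

```c
#include <stddef.h>
#include <stdint.h>

/* Physical size of a chunk: 8-byte header plus ckSize rounded up to even. */
uint32_t chunk_physical_size(uint32_t ck_size) {
    return 8u + ck_size + (ck_size & 1u);
}

/* A reader confined to one chunk's ckData: ckSize acts as end-of-file. */
typedef struct {
    const uint8_t *p;
    uint32_t left;      /* bytes remaining before the virtual EOF */
} BoundedReader;

/* Copy n bytes; fails (returns 0) rather than reading past ckSize. */
int br_read(BoundedReader *r, uint8_t *dst, uint32_t n) {
    if (n > r->left)
        return 0;
    for (uint32_t i = 0; i < n; i++)
        dst[i] = r->p[i];
    r->p += n;
    r->left -= n;
    return 1;
}
```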
We can describe the syntax of a chunk as a regular expression with “#” representing the ckSize, i.e. the length of the following {braced} bytes. The “[0]” represents a sometimes needed pad byte. (The regular expressions in this document are collected in Appendix A along with an explanation of notation.)
Chunk := ID #{ UBYTE* } [0]
One chunk output technique is to stream write a chunk header, stream write the chunk contents, then random access back to the header to fill in the size. Another technique is to make a preliminary pass over the data to compute the size, then write it out all at once.
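Here is a sketch of the first technique using standard C stdio. It assumes a seekable file opened in binary mode; `begin_chunk` and `end_chunk` are hypothetical names. Each `end_chunk` patches only its own size field. That means nested chunks, such as the contents of a FORM, can be written by pairing the calls in nested order, and the outer size then includes any inner pad bytes.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value high byte first. */
static int put_be32(FILE *f, uint32_t v) {
    unsigned char b[4] = { (unsigned char)(v >> 24), (unsigned char)(v >> 16),
                           (unsigned char)(v >> 8), (unsigned char)v };
    return fwrite(b, 1, 4, f) == 4;
}

/* Begin a chunk: write the ID and a placeholder size, and return the
   file position of the size field (or -1 on error) for later patching. */
long begin_chunk(FILE *f, const char id[4]) {
    long size_pos;
    if (fwrite(id, 1, 4, f) != 4)
        return -1;
    size_pos = ftell(f);
    if (size_pos < 0 || !put_be32(f, 0))
        return -1;
    return size_pos;
}

/* End a chunk: compute ckSize from the current position, add the pad
   byte if needed, then seek back and fill in the real size. */
int end_chunk(FILE *f, long size_pos) {
    long end = ftell(f);
    if (end < 0)
        return 0;
    uint32_t size = (uint32_t)(end - size_pos - 4);
    if ((size & 1u) && fputc(0, f) == EOF)
        return 0;                          /* pad byte, excluded from ckSize */
    long after = ftell(f);
    if (after < 0 || fseek(f, size_pos, SEEK_SET) != 0 || !put_be32(f, size))
        return 0;
    return fseek(f, after, SEEK_SET) == 0; /* continue after this chunk */
}
```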
In a string of ASCII text, LF denotes a forced line break (paragraph or line terminator). Other control characters are not used. (Cf. Characters.)
The ckID for a chunk that contains a string of plain, unformatted text is “TEXT.” As a practical matter, a text string should probably not be longer than 32767 bytes. The standard allows up to 2³¹ − 1 bytes.
When used as a data property (see below), a text string chunk may be 0 to 255 characters long. Such a string is readily converted to a C string or a Pascal STRING[255]. The ckID of a property must be the property name, not “TEXT.”
When used as a part of a chunk or data property, restricted C string format is normally used. That means 0 to 255 characters followed by a NUL byte (ASCII value 0).
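Both string conventions are easy to handle when reading. This sketch uses hypothetical names and assumes 256-byte destination buffers. The first function converts a text property's ckData into a C string; that data carries no terminator because ckSize already gives its length. The second parses a restricted C string embedded in a chunk without reading past the available bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a text property's ckData (0..255 characters, no terminator)
   into a C string. dst must hold 256 bytes. Returns 0 if too long. */
int text_prop_to_cstring(const uint8_t *data, uint32_t size, char dst[256]) {
    if (size > 255)
        return 0;
    memcpy(dst, data, size);
    dst[size] = '\0';
    return 1;
}

/* Parse a restricted C string (0..255 characters then a NUL) starting at
   data, reading at most avail bytes. Returns the bytes consumed including
   the NUL, or 0 if no NUL appears within the limit. */
size_t read_restricted_cstring(const uint8_t *data, size_t avail, char dst[256]) {
    size_t limit = avail < 256 ? avail : 256;   /* at most 255 chars + NUL */
    for (size_t i = 0; i < limit; i++) {
        dst[i] = (char)data[i];
        if (data[i] == 0)
            return i + 1;
    }
    return 0;
}
```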
Data properties specify attributes for following (non-property) chunks. A data property essentially says “identifier = value,” for example “XY = (10, 200),” telling something about following chunks. Properties may only appear inside data sections (“FORM” chunks, cf. Data Sections) and property sections (“PROP” chunks, cf. Group PROP).
The form of a data property is a special case of Chunk. The ckID is a property name as well as a property type. The ckSize should be small since data properties are intended to be accumulated in RAM when reading a file. (256 bytes is a reasonable upper bound.) Syntactically:
Property := Chunk
When designing a data object, use properties to describe context information like the size of an image, even if they don't vary in your program. Other programs will need this information.
Think of property settings as assignments to variables in a programming language. Multiple assignments are redundant and local assignments temporarily override global assignments. The order of assignments doesn't matter as long as they precede the affected chunks. (Cf. LISTs, CATs, and Shared Properties.)
Each object type (FORM type) is a local name space for property IDs. Think of a “CMAP” property in a “FORM ILBM” as the qualified ID “ILBM.CMAP.” Property IDs specified when an object type is designed (and therefore known to all clients) are called “standard” while specialized ones added later are “nonstandard.”
Issue: A standard mechanism for “links” or “cross references” is very desirable for things like combining images and sounds into animations. Perhaps we'll define “link” chunks within FORMs that refer to other FORMs or to specific chunks within the same and other FORMs. This needs further work. EA IFF 1985 has no standard link mechanism. For now, it may suffice to read a list of, say, musical instruments, and then just refer to them within a musical score by index number.
Issue: We may need a standard form for references to other files. A “file ref” could name a directory and a file in the same type of operating system as the ref's originator. Following the reference would expect the file to be on some mounted volume. In a network environment, a file ref could name a server, too.
Issue: How can we express operating-system independent file refs?
Issue: What about a means to reference a portion of another file? Would this be a “file ref” plus a reference to a “link” within the target file?
The first thing we need of a file is to check: Does it contain IFF data and, if so, does it contain the kind of data we're looking for? So we come to the notion of a “data section.”
A “data section” or IFF “FORM” is one self-contained “data object” that might be stored in a file by itself. It is one high level data object such as a picture or a sound effect. The IFF structure “FORM” makes it self-identifying. It could be a composite object like a musical score with nested musical instrument descriptions.
FORM
A data section is a chunk with ckID “FORM” and this arrangement:
FORM       := "FORM" #{ FormType (LocalChunk | FORM | LIST | CAT)* }
FormType   := ID
LocalChunk := Property | Chunk
The ID “FORM” is a syntactic keyword like “struct” in C. Think of a “struct ILBM” containing a field “CMAP.” If you see “FORM” you'll know to expect a FORM type ID (the structure name, “ILBM” in this example) and a particular contents arrangement or “syntax” (local chunks, FORMs, LISTs, and CATs).
(LISTs and CATs are discussed in part 5, below.) A “FORM ILBM,” in particular, might contain a local chunk “CMAP,” an “ILBM.CMAP” (to use a qualified name).
So the chunk ID “FORM” indicates a data section. It implies that the chunk contains an ID and some number of nested chunks. In reading a FORM, like any other chunk, programs must respect its ckSize as a virtual end-of-file for reading its contents, even if they're truncated.
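The following sketch reads a FORM header from a memory buffer, returns its FORM type, and walks the nested chunks. It uses the FORM's own ckSize as the end of its contents, so any bytes after the FORM are ignored. It only counts the nested chunks. The function names, and the choice to clamp a FORM that claims more bytes than the buffer holds, are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Check that buf starts with a FORM, copy its FORM type into type_out,
   and count the chunks nested directly inside it. The FORM's ckSize is
   the end of file for its contents, even if the buffer holds more.
   Returns the chunk count, or -1 if buf doesn't start with a FORM. */
int read_form(const uint8_t *buf, size_t len, char type_out[4]) {
    if (len < 12 || memcmp(buf, "FORM", 4) != 0)
        return -1;
    size_t form_size = be32(buf + 4);
    if (form_size > len - 8)
        form_size = len - 8;          /* truncated file: clamp (an assumption) */
    if (form_size < 4)
        return -1;                    /* no room for the FORM type */
    memcpy(type_out, buf + 8, 4);
    const uint8_t *p = buf + 12;
    size_t left = form_size - 4;      /* bytes after the FORM type */
    int count = 0;
    while (left >= 8) {
        size_t ck = be32(p + 4);
        size_t phys = 8 + ck + (ck & 1u);   /* header + data + pad byte */
        count++;
        if (phys >= left)
            break;                    /* last (possibly truncated) chunk */
        p += phys;
        left -= phys;
    }
    return count;
}
```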
The FormType (or FORM type) is a restricted ID that may not contain lower case letters or punctuation characters. (Cf. Type IDs. Cf. Single Purpose Files.)
The type-specific information in a FORM is composed of its “local chunks”: data properties and other chunks. Each FORM type is a local name space for local chunk IDs. So “CMAP” local chunks in other FORM types may be unrelated to “ILBM.CMAP.” More than that, each FORM type defines semantic scope. If you know what a FORM ILBM is, you'll know what an ILBM.CMAP is.
Local chunks defined when the FORM type is designed (and therefore known to all clients of this type) are called “standard” while specialized ones added later are “nonstandard.”
Among the local chunks, property chunks give settings for various details like text font while the other chunks supply the essential information. This distinction is not clear-cut. A property setting cancelled by a later setting of the same property has effect only on data chunks in between. E.g. in the sequence:
prop1 = x (propN = value)* prop1 = y
where the propNs are not prop1, the setting prop1 = x has no effect.
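The assignment analogy can be checked with a tiny model. This sketch is a simplification with hypothetical types. It reduces local chunks to either a property setting with an integer value or a data chunk, then computes which value of a property is in effect at a given position. Because later settings overwrite earlier ones, the sequence prop1 = x, propN = value, prop1 = y leaves prop1 = y for the following data chunk, so x has no effect.

```c
#include <string.h>

/* Hypothetical simplified local chunk: a property setting or a data chunk. */
typedef struct {
    int is_property;
    char name[5];     /* NUL-terminated 4-character property ID */
    int value;
} SeqItem;

#define MAX_PROPS 16

typedef struct {
    char name[MAX_PROPS][5];
    int value[MAX_PROPS];
    int count;
} PropTable;

/* A later setting of the same property replaces the earlier one. */
static void set_prop(PropTable *t, const char *name, int value) {
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->name[i], name) == 0) {
            t->value[i] = value;
            return;
        }
    }
    if (t->count < MAX_PROPS) {
        strcpy(t->name[t->count], name);
        t->value[t->count++] = value;
    }
}

/* Value of `name` in effect for the chunk at index `target`, or -1 if
   unset (so -1 can't be a real value in this model). Only settings that
   precede the target matter. */
int prop_in_effect(const SeqItem *seq, int target, const char *name) {
    PropTable t = { .count = 0 };
    for (int i = 0; i < target; i++)
        if (seq[i].is_property)
            set_prop(&t, seq[i].name, seq[i].value);
    for (int i = 0; i < t.count; i++)
        if (strcmp(t.name[i], name) == 0)
            return t.value[i];
    return -1;
}
```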
The following universal chunk IDs are reserved inside any FORM: “LIST,” “FORM,” “PROP,” “CAT ” (with a trailing space), “    ” (4 spaces), “LIS1” through “LIS9,” “FOR1” through “FOR9,” and “CAT1” through “CAT9.” (Cf. Chunks. Cf. Group LIST. Cf. Group PROP.)
For clarity, these universal chunk names may not be FORM type IDs, either.
Part 5, below, talks about grouping FORMs into LISTs and CATs. They let you group a bunch of FORMs but don't impose any particular meaning or constraints on the grouping. Read on.
FORMs
A FORM chunk inside a FORM is a full-fledged data section. This means you can build a composite object like a multi-frame animation sequence from available picture FORMs and sound effect FORMs. You can insert additional chunks with information like frame rate and frame count.
Using composite FORMs, you leverage existing programs that create and edit the component FORMs. Those editors may even look into your composite object to copy out its type of component, although it'll be the rare program that's fancy enough to do that. Such editors are not allowed to replace their component objects within your composite object. That's because the IFF standard lets you specify consistency requirements for the composite FORM such as maintaining a count or a directory of the components. Only programs that are written to uphold the rules of your FORM type should create or modify such FORMs.
Therefore, in designing a program that creates composite objects, you are strongly requested to provide a facility for your users to import and export the nested FORMs. Import and export could move the data through a clipboard or a file.
Here are several existing FORM types and rules for defining new ones.
FTXT
An FTXT data section contains text with character formatting information like fonts and faces. It has no paragraph or document formatting information like margins and page headers. FORM FTXT is well matched to the text representation in Amiga's Intuition environment. See the supplemental document “FTXT” IFF Formatted Text.
ILBM
“ILBM” is an InterLeaved BitMap image with color map: a machine-independent format for raster images. FORM ILBM is the standard image file format for the Commodore-Amiga computer and is useful in other environments, too. See the supplemental document “ILBM” IFF Interleaved Bitmap.
PICS
The data chunk inside a “PICS” data section has ID “PICT” and holds a QuickDraw picture.
Issue: Allow more than one PICT in a PICS?
[See Inside Macintosh chapter “QuickDraw” for details on PICTs and how to create and display them on the Macintosh computer.]
The only standard property for PICS is “XY”, an optional property that indicates the position of the PICT relative to “the big picture.” An XY chunk contains a QuickDraw Point.
Note: PICT may be limited to Macintosh use, in which case there'll be another format for structured graphics in other environments.
Some other Macintosh resource types could be adopted for use within IFF files, perhaps MWRT, ICN, ICN#, and STR#.
Issue: Consider the candidates and reserve some more IDs.
Supplemental documents will define additional object types. A supplement needs to specify the object's purpose, its FORM type ID, the IDs and formats of standard local chunks, and rules for generating and interpreting the data. It's a good idea to supply typedefs and an example source program that accesses the new object. See “ILBM” IFF Interleaved Bitmap for a good example.
Anyone can pick a new FORM type ID but should reserve it with Electronic Arts at their earliest convenience.
Issue: EA contact person? Hand this off to another organization?
While decentralized format definitions and extensions are possible in IFF, our preference is to get design consensus by committee, implement a program to read and write it, perhaps tune the format, and then publish the format with example code. Some organization should remain in charge of answering questions and coordinating extensions to the format.
If it becomes necessary to revise the design of some data section, its FORM type ID will serve as a version number (cf. Type IDs). E.g. a revised “VDEO” data section could be called “VDE1”. But try to get by with compatible revisions within the existing FORM type.
In a new FORM type, the rules for primitive data types and word-alignment (cf. Primitive Data Types) may be overridden for the contents of its local chunks, but not for the chunk structure itself, provided your documentation spells out the deviations. If machine-specific type variants are needed, e.g. to store vast numbers of integers in reverse bit order, then outline the conversion algorithm and indicate the variant inside each file, perhaps via different FORM types. Needless to say, variations should be minimized.
In designing a FORM type, encapsulate all the data that other programs will need to interpret your files. E.g. a raster graphics image should specify the image size even if your program always uses 320 × 200 pixels × 3 bitplanes. Receiving programs are then empowered to append or clip the image rectangle, to add or drop bitplanes, etc. This enables a lot more compatibility.
Separate the central data (like musical notes) from more specialized information (like note beams) so simpler programs can extract the central parts during read-in. Leave room for expansion so other programs can squeeze in new kinds of information (like lyrics). And remember to keep the property chunks manageably short, let's say at most 256 bytes.
When designing a data object, try to strike a good tradeoff between a super-general format and a highly-specialized one. Fit the details to at least one particular need, for example a raster image might as well store pixels in the current machine's scan order. But add the kind of generality that makes it usable with foreseeable hardware and software. E.g. use a whole byte for each red, green, and blue color value even if this year's computer has only 4-bit video DACs. Think ahead and help other programs so long as the overhead is acceptable. E.g. run-length compress a raster by scan line rather than as a unit so future programs can swap images by scan line to and from secondary storage.
Try to design a general purpose “least common multiple” format that encompasses the needs of many programs without getting too complicated. Let's coalesce our uses around a few such formats widely separated in the vast design space. Two factors make this flexibility and simplicity practical. First, file storage space is getting very plentiful, so compaction is not a priority. Second, nearly any locally-performed data conversion work during file reading and writing will be cheap compared to the I/O time.
It must be OK to copy a LIST, FORM, or CAT intact, e.g. to incorporate it into a composite FORM. So any kind of internal references within a FORM must be relative references. They could be relative to the start of the containing FORM, relative from the referencing chunk, or a sequence number into a collection.
With composite FORMs, you leverage existing programs that create and edit the components. If you write a program that creates composite objects, please provide a facility for your users to import and export the nested FORMs. The import and export functions may move data through a separate file or a clipboard.
Finally, don't forget to specify all implied rules in detail.
LISTs, CATs, and Shared Properties
Data often needs to be grouped together, like a list of icons. Sometimes a trick like arranging little images into a big raster works, but generally they'll need to be structured as a first-class group. The objects “LIST” and “CAT ” are IFF-universal mechanisms for this purpose.
Property settings sometimes need to be shared over a list of similar objects. E.g. a list of icons may share one color map. LIST provides a means called “PROP” to do this. One purpose of a LIST is to define the scope of a PROP. A “CAT ”, on the other hand, is simply a concatenation of objects.
Simpler programs may skip LISTs and PROPs altogether and just handle FORMs and CATs. All “fully-conforming” IFF programs also know about “CAT ”, “LIST”, and “PROP”. Any program that reads a FORM inside a LIST must process shared PROPs to correctly interpret that FORM.
CAT
A CAT is just an untyped group of data objects. Structurally, a CAT is a chunk with chunk ID “CAT ” containing a “contents type” ID followed by the nested objects. The ckSize of each contained chunk is essentially a relative pointer to the next one.
CAT          := "CAT " #{ ContentsType (FORM | LIST | CAT)* }
ContentsType := ID  -- a hint or an “abstract data type” ID
In reading a CAT, like any other chunk, programs must respect its ckSize as a virtual end-of-file for reading the nested objects, even if they're malformed or truncated.
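Here is a minimal sketch of honoring ckSize as a virtual end-of-file. The C function walks the objects nested in a CAT that has already been read into memory. It assumes `body` points just past the CAT's ckSize and that `size` is that ckSize. The function name is hypothetical and is not part of the IFF reader package.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Read a big-endian 32-bit value (IFF stores LONGs high byte first). */
static uint32_t get_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: list the chunks nested in a CAT body, treating
   `size` (the CAT's ckSize) as end-of-file. Returns the number of
   complete chunks; stops early at a header or body that would run
   past `size`, i.e. a truncated or malformed object. */
static int walk_cat_contents(const unsigned char *body, uint32_t size) {
    uint32_t pos = 4;                 /* skip the contents-type ID */
    int count = 0;
    if (size < 4) return 0;
    while (size - pos >= 8) {         /* room for ckID + ckSize */
        uint32_t ckSize = get_be32(body + pos + 4);
        if (ckSize > size - pos - 8) break;    /* truncated: stop */
        printf("%.4s %lu\n", (const char *)(body + pos),
               (unsigned long)ckSize);
        pos += 8 + ckSize;
        if (ckSize & 1) pos++;        /* skip the pad byte */
        count++;
        if (pos > size) break;        /* missing final pad byte */
    }
    return count;
}
```

Stopping at the first chunk that overruns the CAT's ckSize means the reader never reads past the group, even when a nested chunk's ckSize is damaged.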
The “contents type” following the CAT's ckSize indicates what kind of FORMs are inside. So a CAT of ILBMs would store “ILBM” there. It's just a hint. It may be used to store an “abstract data type.” A CAT could just have a blank contents ID (“    ”) if it contains more than one kind of FORM.
CAT defines only the format of the group. The group's meaning is open to interpretation. This is like a list in LISP: the structure of cells is predefined, but the meaning of the contents as, say, an association list depends on use. If you need a group with an enforced meaning (an “abstract data type” or Smalltalk “subclass”), some consistency constraints, or additional data chunks, use a composite FORM instead (cf. Composite FORMs).
Since a CAT just means a concatenation of objects, CATs are rarely nested. Programs should really merge CATs rather than nest them.
LIST
A LIST defines a group very much like CAT, but it also gives a scope for PROPs (see below). And unlike CATs, LISTs should not be merged without understanding their contents.
Structurally, a LIST is a chunk with ckID “LIST” containing a “contents type” ID, optional shared properties, and the nested contents (FORMs, LISTs, and CATs), in that order. The ckSize of each contained chunk is a relative pointer to the next one. A LIST is not an arbitrary linked list; the cells are simply concatenated.
LIST         := "LIST" #{ ContentsType PROP* (FORM | LIST | CAT)* }
ContentsType := ID
PROP
PROP chunks may appear in LISTs (not in FORMs or CATs). They supply shared properties for the FORMs in that LIST. This ability to elevate some property settings to shared status for a list of forms is useful for both indirection and compaction. E.g. a list of images with the same size and colors can share one “size” property and one “color map” property. Individual FORMs can override the shared settings.
The contents of a PROP is like a FORM with no data chunks:
PROP := "PROP" #{ FormType Property* }
It means, “Here are the shared properties for FORM type FormType.”
A LIST may have at most one PROP of a FORM type, and all the PROPs must appear before any of the FORMs or nested LISTs and CATs.
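The two LIST rules above can be checked mechanically. The following is a minimal sketch. It assumes a reader has already summarized each chunk nested in the LIST as a (ckID, FormType) pair packed into 32-bit integers. `Nested`, `MAKE_ID` and `list_props_ok` are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t ID32;   /* hypothetical: 4 ID chars packed big-endian */
#define MAKE_ID(a,b,c,d) (((ID32)(a) << 24) | ((ID32)(b) << 16) | \
                          ((ID32)(c) << 8) | (ID32)(d))
#define ID_PROP MAKE_ID('P','R','O','P')

/* Hypothetical summary of one chunk nested in a LIST: its ckID and,
   for a PROP, the FORM type the PROP applies to. */
typedef struct { ID32 ckID; ID32 formType; } Nested;

/* Return 1 if every PROP precedes all FORMs, LISTs and CATs and no
   FORM type has more than one PROP; otherwise return 0. */
static int list_props_ok(const Nested *items, size_t n) {
    size_t i, j;
    int seen_content = 0;
    for (i = 0; i < n; i++) {
        if (items[i].ckID != ID_PROP) { seen_content = 1; continue; }
        if (seen_content) return 0;             /* PROP after contents */
        for (j = 0; j < i; j++)                 /* earlier items are PROPs */
            if (items[j].formType == items[i].formType) return 0;
    }
    return 1;
}
```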
You can have subsequences of FORMs sharing properties by making each subsequence a LIST.
Scoping: Think of property settings as variable bindings in nested blocks of a programming language. Where in C you could write:
TEXT_FONT text_font = Courier;             /* program's global default */

File(); {
    TEXT_FONT text_font = TimesRoman;      /* shared setting */
    {
        TEXT_FONT text_font = Helvetica;   /* local setting */
        Print("Hello ");                   /* uses font Helvetica */
    }
    {
        Print("there.");                   /* uses font TimesRoman */
    }
}
An IFF file could contain:
LIST {
    PROP TEXT {
        FONT {TimesRoman}      /* shared setting */
    }
    FORM TEXT {
        FONT {Helvetica}       /* local setting */
        CHRS {Hello }          /* uses font Helvetica */
    }
    FORM TEXT {
        CHRS {there.}          /* uses font TimesRoman */
    }
}
The shared property assignments selectively override the reader's global defaults, but only for FORMs within the group. A FORM's own property assignments selectively override the global and group-supplied values. So when reading an IFF file, keep property settings on a stack. They're designed to be small enough to hold in main memory.
Shared properties are semantically equivalent to copying those properties into each of the nested FORMs right after their FORM type IDs.
Properties for LIST
Optional “properties for LIST” store the origin of the list's contents in a PROP chunk for the fake FORM type “LIST”. They are the properties originating program “OPGM”, processor family “OCPU”, computer type “OCMP”, computer serial number or network address “OSN ”, and user name “UNAM”. In our imperfect world, these could be called upon to distinguish between unintended variations of a data format or to work around bugs in particular originating/receiving program pairs.
Issue: Specify the format of these properties.
A creation date could also be stored in a property but let's ask that file creating, editing, and transporting programs maintain the correct date in the local file system. Programs that move files between machine types are expected to copy across the creation dates.
An IFF file is just a single chunk of type FORM, LIST, or CAT. Therefore an IFF file can be recognized by its first 4 bytes: “FORM”, “LIST”, or “CAT ”. Any file contents after the chunk's end are to be ignored.
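Recognizing an IFF file therefore needs only its first 4 bytes. The function below is a minimal sketch of that check in C. It assumes the caller has already read the start of the file into `head`. `iff85_kind` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: classify a file by its first 4 bytes.
   Returns "FORM", "LIST" or "CAT " (note the trailing space),
   or NULL if the data is not an IFF-85 file. */
static const char *iff85_kind(const unsigned char *head, size_t n) {
    static const char *const kinds[] = { "FORM", "LIST", "CAT " };
    size_t i;
    if (n < 4) return NULL;           /* too short to hold an ID */
    for (i = 0; i < 3; i++)
        if (memcmp(head, kinds[i], 4) == 0) return kinds[i];
    return NULL;
}
```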
Since an IFF file can be a group of objects, programs that read/write single objects can communicate to an extent with programs that read/write groups. You're encouraged to write programs that handle all the objects in a LIST or CAT. A graphics editor, for example, could process a list of pictures as a multiple-page document, one page at a time.
Programs should enforce IFF's syntactic rules when reading and writing files. This ensures robust data transfer. The public domain IFF reader/writer subroutine package does this for you. A utility program “IFFCheck” is available that scans an IFF file and checks it for conformance to IFF's syntactic rules. IFFCheck also prints an outline of the chunks in the file, showing the ckID and ckSize of each. This is quite handy when building IFF programs. Example programs are also available to show details of reading and writing IFF files.
A merge program “IFFJoin” will be available that logically appends IFF files into a single CAT group. It “unwraps” each input file that is a CAT so that the combined file doesn't contain nested CATs.
If we need to revise the IFF standard, the three anchoring IDs will be used as “version numbers.” That's why IDs “FOR1” through “FOR9”, “LIS1” through “LIS9”, and “CAT1” through “CAT9” are reserved.
IFF formats are designed for reasonable performance with floppy disks. We achieve considerable simplicity in the formats and programs by relying on the host file system rather than defining universal grouping structures like directories for LIST contents. On huge storage systems, IFF files could be leaf nodes in a file structure like a B-tree. Let's hope the host file system implements that for us!
There are two kinds of IFF files: single purpose files and scrap files. They differ in the interpretation of multiple data objects and in the file's external type.
A single purpose IFF file is for normal “document” and “archive” storage. This is in contrast with “scrap files” (see below) and temporary backing storage (non-interchange files).
The external file type (or filename extension, depending on the host file system) indicates the file's contents. It's generally the FORM type of the data contained, hence the restrictions on FORM type IDs.
Programmers and users may pick an “intended use” type as the filename extension to make it easy to filter for the relevant files in a filename requester. This is actually a “subclass” or “subtype” that conveniently separates files of the same FORM type that have different uses. Programs cannot demand conformity to their expected subtypes without overly restricting data interchange, since they cannot know about the subtypes to be used by future programs that users will want to exchange data with.
Issue: How to generate 3-letter MS-DOS extensions from 4-letter FORM type IDs?
Most single purpose files will be a single FORM (perhaps a composite FORM like a musical score containing nested FORMs like musical instrument descriptions). If it's a LIST or a CAT, programs should skip over unrecognized objects to read the recognized ones or the first recognized one. Then a program that can read a single purpose file can read something out of a “scrap file,” too.
A “scrap file” is for maximum interconnectivity in getting data between programs: the core of a clipboard function. Scrap files may have type “IFF ” or filename extension “.IFF”.
A scrap file is typically a CAT containing alternate representations of the same basic information. Include as many alternatives as you can readily generate. This redundancy improves interconnectivity in situations where we can't make all programs read and write super-general formats. [Inside Macintosh chapter “Scrap Manager.”]
E.g. a graphically-annotated musical score might be supplemented by a stripped-down 4-voice melody and by a text (the lyrics).
The originating program should write the alternate representations in order of “preference”: most preferred (most comprehensive) type to least preferred (least comprehensive) type. A receiving program should either use the first appearing type that it understands or search for its own “preferred” type.
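A receiver that takes the first strategy, using the first appearing type it understands, might look like the hypothetical helper `pick_alternative` below. It assumes the FORM type IDs of the scrap file's alternatives have already been collected in file order. `known` lists the types this program reads.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the index of the first alternative in
   file order (most preferred first) whose FORM type this program
   understands, or -1 if none is readable. */
static int pick_alternative(const char (*types)[4], size_t ntypes,
                            const char (*known)[4], size_t nknown) {
    size_t i, k;
    for (i = 0; i < ntypes; i++)
        for (k = 0; k < nknown; k++)
            if (memcmp(types[i], known[k], 4) == 0) return (int)i;
    return -1;
}
```

A program that prefers its own “preferred” type could instead loop over `known` first and `types` second.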
A scrap file should have at most one alternative of any type. (A LIST of same-type objects is OK as one of the alternatives.) But don't count on this when reading; ignore extra sections of a type. Then a program that reads scrap files can read something out of single purpose files.
Here are some notes on building programs that read IFF files.
If you use the standard IFF reader module “IFFR.C”, many of these rules and details will be automatically handled. (See “Support Software” in Appendix A.) We recommend that you start from the example program “ShowILBM.C”. You should also read up on recursive descent parsers. [See, for example, Compiler Construction, An Advanced Course.]
If the file doesn't start with “FORM”, “LIST”, or “CAT ”, it's not an IFF-85 file.
For any FORM chunk you encounter, you must recognize its FORM type ID to understand the contained “local chunks.” Even if you don't recognize the FORM type, you can still scan it for nested FORMs, LISTs, and CATs of interest.
LIST, FORM, PROP, and CAT are generic groups. They always contain a subtype ID followed by chunks.
Readers should handle a CAT of FORMs in a file. You may treat the FORMs like document pages to sequence through or just use the first FORM.
Simpler readers may skip LISTs. “Fully IFF-conforming” readers are those that handle LISTs, even if just to read the first FORM from a file. If you do look into a LIST, you must process shared properties (in PROP chunks) properly. The idea is to get the correct data or none at all.
Readers may look into unrecognized FORMs for nested FORM types that they do recognize. For example, a musical score may contain nested instrument descriptions and an animation file may contain still pictures.
Note to programmers: Processing PROP chunks is not simple! You'll need some background in interpreters with stack frames. If this is foreign to you, build programs that read/write only one FORM per file. For the more intrepid programmers, the next paragraph summarizes how to process LISTs and PROPs. See the general IFF reader module “IFFR.C” and the example program “ShowILBM.C” for details.
Allocate a stack frame for every LIST and FORM you encounter and initialize it by copying the stack frame of the parent LIST or FORM. At the top level, you'll need a stack frame initialized to your program's global defaults. While reading each LIST or FORM, store all encountered properties into the current stack frame. In the example ShowILBM, each stack frame has a place for a bitmap header property ILBM.BMHD and a color map property ILBM.CMAP. When you finally get to the ILBM's BODY chunk, use the property settings accumulated in the current stack frame.
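The stack-frame scheme can be sketched in a few lines of C. The `Frame` fields are hypothetical stand-ins for ShowILBM's BMHD and CMAP properties; a real reader would store the decoded chunks. The fixed nesting limit is an assumption made to keep the sketch short. Entering a LIST or FORM copies the parent's frame, so a PROP stored in a LIST frame is inherited by each nested FORM. A FORM's own property lasts only until that FORM ends.

```c
/* Hypothetical property slots standing in for ILBM.BMHD / ILBM.CMAP. */
typedef struct {
    int has_bmhd;
    unsigned short width, height;  /* from a BMHD-like property */
    int ncolors;                   /* from a CMAP-like property */
} Frame;

#define MAX_DEPTH 16               /* assumed nesting limit */

typedef struct {
    Frame frames[MAX_DEPTH];
    int top;                       /* index of the current frame */
} FrameStack;

/* Top level: one frame holding the program's global defaults. */
static void stack_init(FrameStack *s, const Frame *defaults) {
    s->frames[0] = *defaults;
    s->top = 0;
}

/* Entering a LIST or FORM: push a copy of the parent's frame. */
static int enter_group(FrameStack *s) {
    if (s->top + 1 >= MAX_DEPTH) return -1;  /* nesting too deep */
    s->frames[s->top + 1] = s->frames[s->top];
    s->top++;
    return 0;
}

/* Leaving a LIST or FORM: its settings go out of scope. */
static void leave_group(FrameStack *s) {
    if (s->top > 0) s->top--;
}

/* Properties read from PROP or local chunks are stored here. */
static Frame *current(FrameStack *s) { return &s->frames[s->top]; }
```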
An alternate implementation would just remember PROPs encountered, forgetting each on reaching the end of its scope (the end of the containing LIST). When a FORM XXXX is encountered, scan the chunks in all remembered PROPs XXXX, in order, as if they appeared before the chunks actually in the FORM XXXX. This gets trickier if you read FORMs inside of FORMs.
Here are some notes on building programs that write IFF files, which is much easier than reading them. If you use the standard IFF writer module “IFFW.C” (see “Support Software” in Appendix A), many of these rules and details will automatically be enforced. See the example program “Raw2ILBM.C”.
An IFF file is a single FORM, LIST, or CAT chunk.
Any IFF-85 file must start with the 4 characters “FORM”, “LIST”, or “CAT ”, followed by a LONG ckSize. There should be no data after the chunk end.
LIST, FORM, PROP, and CAT are generic. They always contain a subtype ID followed by chunks. These four IDs are universally reserved, as are “LIS1” through “LIS9”, “FOR1” through “FOR9”, “CAT1” through “CAT9”, and “    ”.
Programs may copy out nested FORMs of types that they recognize, but don't edit and replace the nested FORMs and don't add or remove them. That could make the containing structure inconsistent. You may write a new file containing items you copied (or copied and modified) from another IFF file, but don't copy structural parts you don't understand.
PROPs may only appear inside LISTs.
The following C typedefs describe standard IFF structures. Declarations to use in practice will vary with the CPU and compiler. For example, 68000 Lattice C produces efficient comparison code if we define ID as a “LONG”. A macro “MakeID” builds these IDs at compile time.
/* Standard IFF types, expressed in 68000 Lattice C. */
typedef unsigned char  UBYTE;   /*  8 bits unsigned */
typedef short          WORD;    /* 16 bits signed   */
typedef unsigned short UWORD;   /* 16 bits unsigned */
typedef long           LONG;    /* 32 bits signed   */

typedef char ID[4];             /* 4 chars in ' ' through '~' */

typedef struct {
    ID    ckID;
    LONG  ckSize;               /* sizeof(ckData) */
    UBYTE ckData[/* ckSize */];
} Chunk;

/* ID typedef and builder for 68000 Lattice C.
   Use this instead of the char[4] ID typedef above, not in addition to it. */
typedef LONG ID;                /* 4 chars in ' ' through '~' */
#define MakeID(a,b,c,d) ( (LONG)(a)<<24 | (LONG)(b)<<16 | (LONG)(c)<<8 | (LONG)(d) )

/* Globally reserved IDs. */
#define ID_FORM   MakeID('F','O','R','M')
#define ID_LIST   MakeID('L','I','S','T')
#define ID_PROP   MakeID('P','R','O','P')
#define ID_CAT    MakeID('C','A','T',' ')
#define ID_FILLER MakeID(' ',' ',' ',' ')
Here's a collection of the syntax definitions in this document.
Chunk        := ID #{ UBYTE* } [0]
Property     := Chunk
FORM         := "FORM" #{ FormType (LocalChunk | FORM | LIST | CAT)* }
FormType     := ID
LocalChunk   := Property | Chunk
CAT          := "CAT " #{ ContentsType (FORM | LIST | CAT)* }
ContentsType := ID  -- a hint or an "abstract data type" ID
LIST         := "LIST" #{ ContentsType PROP* (FORM | LIST | CAT)* }
PROP         := "PROP" #{ FormType Property* }
In this extended regular expression notation, the token “#” represents a ckSize LONG count of the following {braced} data bytes. Literal items are shown in “quotes”, [square bracketed items] are optional, and “*” means 0 or more instances. A sometimes-needed pad byte is shown as “[0]”.
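To make the notation concrete, here is a minimal sketch of emitting `"FORM" #{ FormType Chunk }` into a memory buffer. Each chunk gets its ID, a big-endian LONG ckSize, its data, and the [0] pad byte when the size is odd; the pad byte is not counted in that chunk's own ckSize. `put_chunk` and `write_form1` are hypothetical helpers, not the IFFW.C interface. Filling in the FORM's ckSize after its contents are written is just one way to obtain that count. The sketch assumes the caller supplies a large enough buffer.

```c
#include <stddef.h>
#include <string.h>

/* Store a 32-bit value big-endian, as IFF requires for ckSize. */
static void put_be32(unsigned char *p, unsigned long v) {
    p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
}

/* Write one chunk: ID, ckSize, data, then a pad byte if ckSize is odd.
   Returns the position just past the chunk. */
static size_t put_chunk(unsigned char *out, size_t pos, const char id[4],
                        const void *data, unsigned long size) {
    memcpy(out + pos, id, 4);
    put_be32(out + pos + 4, size);
    memcpy(out + pos + 8, data, size);
    pos += 8 + size;
    if (size & 1) out[pos++] = 0;     /* [0] pad byte */
    return pos;
}

/* Write "FORM" #{ formType ckID #{ data } [0] }. The FORM's ckSize is
   filled in afterwards and includes the nested chunk's pad byte.
   Returns the total number of bytes written. */
static size_t write_form1(unsigned char *out, const char formType[4],
                          const char ckID[4], const void *data,
                          unsigned long size) {
    size_t end;
    memcpy(out, "FORM", 4);
    memcpy(out + 8, formType, 4);
    end = put_chunk(out, 12, ckID, data, size);
    put_be32(out + 4, (unsigned long)(end - 8));  /* FORM ckSize */
    return end;
}
```

For example, a FORM TEXT holding a 3-byte CHRS chunk occupies 24 bytes: 8 for the FORM header, 4 for the type ID, 11 for the chunk and 1 pad byte. Its ckSize is 16.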
This is a table of currently defined chunk IDs. We may also borrow some Macintosh IDs and data formats.
Category | Chunk IDs |
---|---|
Group chunk IDs | FORM, LIST, PROP, CAT |
Reserved for future revisions | FOR1–FOR9, LIS1–LIS9, CAT1–CAT9 |
FORM type IDs (the reserved IDs above may not be used) | 8SVX 8-bit sampled sound voice, ANBM animated bitmap, FNTR raster font, FNTV vector font, FTXT formatted text, GSCR general-use musical score, ILBM interleaved raster bitmap image, PDEF Deluxe Print page definition, PICS Macintosh picture, PLBM (obsolete), USCR Uhuru Sound Software musical score, UVOX Uhuru Sound Software Macintosh voice, SMUS simple musical score, VDEO Deluxe Video Construction Set video |
Data chunk IDs | “    ”, TEXT, PICT |
PROP LIST property IDs | OPGM, OCPU, OCMP, OSN, UNAM |

These public domain C source programs are available for use in building IFF-compatible programs:
IFF.H, IFFR.C, IFFW.C: the standard IFF reader and writer modules.
IFFCheck.C: utility that scans an IFF file, checks it for conformance to IFF's syntactic rules, and prints an outline of its chunks.
PACKER.H, Packer.C, UnPacker.C: compression and decompression routines used for ILBM files.
ILBM.H, ILBMR.C, ILBMW.C: reader and writer support for the raster image FORM ILBM. ILBMR calls IFFR and UnPacker. ILBMW calls IFFW and Packer.
ShowILBM.C: example caller of the IFFR and ILBMR modules. This Commodore-Amiga program reads and displays a FORM ILBM.
Raw2ILBM.C: example ILBM writer program. As a demonstration, it reads a raw raster image file and writes the image as a FORM ILBM file.
ILBM2Raw.C: example ILBM reader program. Reads a FORM ILBM file and writes it into a raw raster image.
REMALLOC.H, Remalloc.c
INTUALL.H
READPICT.H, ReadPict.c: given an ILBM file, read it into a bitmap and a color map.
PUTPICT.H, PutPict.c: writes an ILBM file.
GIO.H, Gio.c
giocall.c
ilbmdump.c: reads an ILBM file and prints out an ASCII representation for inclusion in C files.
bmprintc.c
Here's a box diagram for an example IFF file, a raster image FORM ILBM. This FORM contains a bitmap header property chunk BMHD, a color map property chunk CMAP, and a raster data chunk BODY. This particular raster is 320 × 200 pixels × 3 bit planes uncompressed. The “0” after the CMAP chunk represents a zero pad byte, included since the CMAP chunk has an odd length.
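The diagram itself does not survive here, so the size arithmetic behind such an outline is sketched below in C. This goes beyond the section in two ways, both taken from the ILBM supplement: a BMHD is 20 bytes, and each ILBM row is padded to a whole number of 16-bit words. The 21-byte CMAP (7 RGB triples) is a hypothetical value chosen only to illustrate an odd-length chunk.

```c
/* Bytes one chunk occupies inside its parent FORM: 8-byte header,
   the data, and a pad byte when the data length is odd. */
static unsigned long chunk_span(unsigned long ckSize) {
    return 8 + ckSize + (ckSize & 1);
}

/* Uncompressed ILBM BODY size (assumption from the ILBM supplement:
   each row of each plane is rounded up to whole 16-bit words). */
static unsigned long ilbm_body_size(unsigned w, unsigned h, unsigned planes) {
    unsigned long rowBytes = ((w + 15UL) / 16) * 2;
    return rowBytes * planes * h;
}
```

Under these assumptions, the BODY is 40 × 3 × 200 = 24000 bytes. The FORM's ckSize comes to 4 + 28 + 30 + 24008 = 24070, where the 30 includes the CMAP's pad byte.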
The text to the right of the diagram shows the outline that would be printed by the IFFCheck utility program for this particular file.
This second diagram shows a LIST of two FORMs ILBM sharing a common BMHD property and a common CMAP property. Again, the text on the right is an outline à la IFFCheck.
The following people contributed to the design of this IFF standard: