| 1 | + |
| 2 | + Definition of the data types used in the file format list |
| 3 | + |
| 4 | +In the file format list, several short mnemonics are used to describe |
| 5 | +the structure of the data stored. Here I describe the structure of (and |
| 6 | +possible conversions between) some of these types. As some types have |
| 7 | +different sizes across platforms, the byte order and bit size are |
| 8 | +given for most types. |
| 9 | + |
| 10 | +ASCIIZ A sequence of characters (->Char), terminated |
| 11 | + by the special character with the value 0. |
| 12 | + Note that ASCIIZ strings, like most structures on |
| 13 | + Intel machines, should not be larger than |
| 14 | + 64 KB due to the ancient segmentation scheme used. |
| 15 | +BCD Binary coded decimal. |
| 16 | + Each decimal digit is stored in its own nybble, so the value |
| 17 | + written in hexadecimal shows the same digits as the decimal number. |
| 18 | + (10d becomes 10h, 21d becomes 21h) |
| 19 | +Bitmap If a value is declared as bitmapped, that means that |
| 20 | + every bit in this value might have a different meaning. |
| 21 | + The bits are numbered from right to left, the least |
| 22 | + significant bit has the number 0. After the bit number, |
| 23 | + there are either two statements, separated by a |
| 24 | + slash("/"), which are the two meanings if the bit is |
| 25 | + set / not set, or one single statement, which is the |
| 26 | + meaning of this bit, if it is set. |
| 27 | +Byte 8 bit unsigned number. Smallest unit a record |
| 28 | + consists of. All offsets are given in bytes. |
| 29 | + (0-255) |
| 30 | +Char Synonym for byte, most values are between 32 and |
| 31 | + 255. (#0-#255) |
| 32 | +DWord 32 bit signed number. Some of the formats may |
| 33 | + use a DWord as a 32 bit unsigned number, but as |
| 34 | + files tend not to be larger than 2 GB, this is |
| 35 | + not my concern here. To convert between Intel and |
| 36 | + Motorola format, you have to swap bytes #2 & #3 |
| 37 | + and bytes #1 & #4. (-2147483648 to +2147483647) |
| 38 | +Int Integer. Signed 16-bit number. |
| 39 | + (-32768 to +32767) |
| 40 | +LString A string which is preceded by its length. Also |
| 41 | + named "counted" string. Used by most Pascal |
| 42 | + implementations. Maximum length is 255 bytes, but it can |
| 43 | + contain any char. |
| 44 | +Nybble The upper or lower four bits of a byte. A nybble |
| 45 | + is a single hex digit and can have values from |
| 46 | + 0 to 15. A signed nybble can have values from |
| 47 | + -8 to 7 with bit 3 being the sign bit. |
| 48 | +Paragraph A block of 16 bytes; paragraph addresses are multiples of 16. |
| 49 | + The paragraph is the granularity at which the 64K segments of the Intel chips can be placed. |
| 50 | +Word 16 bit unsigned number. Note that byte order is |
| 51 | + important and depends on whether you have a Motorola machine or |
| 52 | + an Intel one. Conversion between the two formats |
| 53 | + is done by simply swapping byte #1 with byte #2. |
| 54 | + (0-65535) |
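
The conversions described for BCD, Nybble, Word, DWord and signed values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example and do not come from the list or from any tool mentioned in it, and the BCD helpers handle a single byte (two digits) only.

```python
def swap_word(value):
    """Swap byte #1 with byte #2 of a 16 bit Word (Intel <-> Motorola)."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def swap_dword(value):
    """Swap bytes #1 & #4 and bytes #2 & #3 of a 32 bit DWord.

    The value is treated as unsigned here; use to_signed() afterwards
    if a signed DWord is wanted.
    """
    raw = (value & 0xFFFFFFFF).to_bytes(4, "little")
    return int.from_bytes(raw, "big")


def to_signed(value, bits):
    """Reinterpret an unsigned value of the given bit size as signed."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def nybbles(byte):
    """Return the (upper, lower) nybble of a byte."""
    return (byte >> 4) & 0x0F, byte & 0x0F


def bcd_byte_to_int(byte):
    """Decode one BCD byte: 21h becomes 21d."""
    upper, lower = nybbles(byte)
    return upper * 10 + lower


def int_to_bcd_byte(number):
    """Encode a number from 0 to 99 as one BCD byte: 10d becomes 10h."""
    if not 0 <= number <= 99:
        raise ValueError("one BCD byte holds only two decimal digits")
    return ((number // 10) << 4) | (number % 10)
```

For example, swap_word(0x1234) gives 0x3412, and to_signed(0xFFFF, 16) gives -1, which is how a Word read from a file can be reinterpreted as an Int.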
| 55 | + |
| 56 | + How to identify different files |
| 57 | + |
| 58 | +While searching for different file formats, I found the following programs |
| 59 | +helpful for gathering information about different files. They are all DOS programs |
| 60 | +since I'm not familiar with other platforms (except Windows). Most of them |
| 61 | +should be available on SimTel CDs or via FTP at ftp.cdrom.com, except for my |
| 62 | +program TF, which is still in beta. |
| 63 | + |
| 64 | +LIST.COM v9.0a by Vernon Buerg |
| 65 | + List is a file lister which supports both text and hex-view. |
| 66 | + |
| 67 | +HIEW.EXE v4.18 by Sen |
| 68 | + Another file lister with a built-in disassembler. |
| 69 | + |
| 70 | +FILE.EXE v2.0 by Felix von Leitner |
| 71 | + File is a file identification program. |
| 72 | + |
| 73 | +Q.COM v3.01 by SemWare |
| 74 | + QEdit is the editor I'm editing the list with. |
| 75 | + |
| 76 | +TF.EXE v0.38 by me |
| 77 | + The program that started it all. It began as a "simple" file identification |
| 78 | + program, but is no longer simple, since it has grown too big by now. |
| 79 | + Still unreleased, since it is not really extensible yet. |
| 80 | + |
| 81 | + The file formats list meta list ;) |
| 82 | + |
| 83 | +The file format list uses a certain format to make it readable by programs which |
| 84 | +convert it into the WinHelp format or create program structures out of the |
| 85 | +lists. This format is very similar to the format used by Ralf Brown in his PC |
| 86 | +interrupt list, but I extended it to accommodate the specific needs of |
| 87 | +this list : |
| 88 | + |
| 89 | +Each topic in the list is delimited by a line of 45 chars, of which the |
| 90 | +first 8 are the char '-'. These are followed by one character which |
| 91 | +gives the type of topic. The different topic types are described in the list |
| 92 | +itself; the char '!' denotes an information topic, like the list of chars and |
| 93 | +their meaning. The topic identifier is followed by another '-' char and |
| 94 | +then the topic name, which does not contain any '-' chars. After the topic name, there |
| 95 | +may be some other descriptors, for example for Motorola byte ordering, guesswork marking |
| 96 | +or other purposes; see the main list for further information. The line ends |
| 97 | +with at least one '-' char. Take the following prototype : |
| 98 | + |
| 99 | +--------?-TEST------------------------------ |
| 100 | + |
| 101 | +OFFSET Count TYPE Description |
| 102 | +EXTENSION: |
| 103 | +OCCURENCES: |
| 104 | +PROGRAMS: |
| 105 | +REFERENCE: |
| 106 | +SEE ALSO: |
| 107 | +VALIDATION: |
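
A program reading the list could recognise a topic delimiter line roughly as follows. This is a minimal sketch, not the converter actually used for the list: the function name and the returned dictionary keys are invented for this example, the line length of 45 chars is not checked, and since the syntax of the optional descriptors is only described in the main list, they are returned as raw text.

```python
import re

# 8 dashes, one topic type char, one dash, a name without dashes,
# optional descriptors, and at least one closing dash.
TOPIC_LINE = re.compile(r"^-{8}(.)-([^-]+)(.*?)-+\s*$")


def parse_topic_header(line):
    """Return the parts of a topic delimiter line, or None for other lines."""
    match = TOPIC_LINE.match(line)
    if match is None:
        return None
    topic_type, name, rest = match.groups()
    return {
        "type": topic_type,              # e.g. '!' for an information topic
        "name": name.strip(),
        "descriptors": rest.strip("-"),  # raw text, format not interpreted
    }


header = parse_topic_header("--------?-TEST------------------------------")
print(header)
```

Applied to the prototype above, this yields the type '?' and the name 'TEST' with empty descriptors; ordinary text lines such as the OFFSET header give None.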
| 108 | + |
| 109 | +Sub-topics like different records are mostly delimited by three dashes ('-'). |
| 110 | +I suggest folding them up and making them available as a popup window. |
| 111 | + |
| 112 | +Tables have the following format : |
| 113 | +(see table 0000) |
| 114 | +for a table reference and |
| 115 | +(Table 0000) |
| 116 | +for the beginning of a table. The end of a table is undefined (yet). |
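
The two table markers can be told apart with a pair of regular expressions. The sketch below assumes that table numbers are written as plain digits, as in the 0000 placeholder, and that a table start stands on a line of its own; both are assumptions made for this example, and the function name is hypothetical.

```python
import re

TABLE_REFERENCE = re.compile(r"\(see table (\d+)\)")
TABLE_START = re.compile(r"^\s*\(Table (\d+)\)\s*$")


def scan_table_markers(lines):
    """Collect table starts and table references with their line index."""
    starts, references = [], []
    for index, line in enumerate(lines):
        start = TABLE_START.match(line)
        if start:
            starts.append((index, start.group(1)))
            continue
        for number in TABLE_REFERENCE.findall(line):
            references.append((index, number))
    return starts, references
```

Because the end of a table is not marked, a converter built on this can only guess where a table stops, for example at the next topic delimiter line.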
| 117 | + |
| 118 | + |
| 119 | + A primer on file formats |
| 120 | + |
| 121 | + Abbreviations |
| 122 | +Throughout the list, many abbreviations are used, some in the reference |
| 123 | +section. Some of them are explained here : |
| 124 | + |
| 125 | +c't |
| 126 | +The c't is a German computer magazine, which developed the Borland |
| 127 | +Pascal for OS/2 patch. They release source code in files called |
| 128 | +CTmmyy.*. Note that comments in the source code and the language in |
| 129 | +the issues tend to be German :-) |
| 130 | + |
| 131 | +DDJxxyy |
| 132 | +(Dr. Dobb's Journal) |
| 133 | +The DDJ is a monthly publication by M&T/US which is intended for the |
| 134 | +professional programmer. The four digits after the name indicate the |
| 135 | +month/year of the issue referred to. Most of the source code published |
| 136 | +in the issue is available electronically on Compu$erve and other BBSes. |
| 137 | +The files have the name DDJyymm. |
| 138 | + |
| 139 | +PDN |
| 140 | +Programmer's Distribution Net |
| 141 | +A network dedicated to the distribution of source code useful to |
| 142 | +programmers. Often linked with Fido-nodes. |
| 143 | + |
| 144 | +Contributions to this list were made by : |
| 145 | + Ralf Brown (The .EXE file formats from the INTERRUPT List, general layout) |
| 146 | + David Dilworth (david.dilworth@sierraclub.org) |
| 147 | + Daniel Dissett (ddissett@netcom.com) |
| 148 | + Marcus Groeber (marcusg@ph-cip.uni-koeln.de) |
| 149 | + Darrel Hankerson (hankedr@mail.auburn.edu) |
| 150 | + Carl Hauser (chauser.parc@xerox.com) |
| 151 | + Jouni Miettunen (jon@stekt.oulu.fi) |
| 152 | + Jan Nicolai Langfeldt (janl@ifi.uio.no) |
| 153 | + Mark Ouellet (Telix .FON structures) |
| 154 | + Greg Roelofs (roe2@midway.uchicago.edu) |
| 155 | + Robert Rothenburg Walking-Owl (wlkngowl@unix.asb.com) |
| 156 | + Jesus Villena (CONVERT.EXE, a digital sample conversion program) |
| 157 | + Christos Zoulas (christos@deshaw.com) |
| 158 | + JAL / Nostalgia |
| 159 | + David McDuffee, (75530,2626@compuserve.com) |
| 160 | + |
| 161 | +Information gleaned from other programs : |
| 162 | + Formats for Word and WordPerfect (Selke's filetype) |