Commit ba7354a

Information about old archive formats
[ci skip]
1 parent d77567e commit ba7354a

15 files changed

Lines changed: 9759 additions & 0 deletions

Documentation/future/ARCHIVES.TXT

Lines changed: 1346 additions & 0 deletions
Large diffs are not rendered by default.

Documentation/future/FILEFMTS.DOC

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@

Definition of the data types used in the file format list

In the file format list, several short mnemonics are used to describe
the structure of the data stored. Here I describe the structure of (and
possible conversions between) some of these types. As some types have
different sizes across platforms, the byte order and bit size are given
for most types.

ASCIIZ      A sequence of characters (->Char), terminated
            by the special character with the value 0.
            Note that ASCIIZ strings, like most structures on
            Intel machines, should not be larger than
            64 KB due to the ancient segmentation used.
BCD         Binary coded decimal.
            A decimal number is stored as the hexadecimal number
            which has the same digits as the decimal number
            (10d becomes 10h, 21d becomes 21h).
Bitmap      If a value is declared as bitmapped, every bit in
            this value might have a different meaning.
            The bits are numbered from right to left; the least
            significant bit has the number 0. After the bit number,
            there are either two statements, separated by a
            slash ("/"), which are the two meanings if the bit is
            set / not set, or one single statement, which is the
            meaning of this bit if it is set.
Byte        8 bit unsigned number. Smallest unit a record
            consists of. All offsets are given in bytes.
            (0-255)
Char        Synonym for byte; most values are between 32 and
            255. (#0-#255)
DWord       32 bit signed number. Some of the formats may use
            a DWord which is a 32 bit unsigned number, but as
            files tend not to be greater than 2 GB, this won't
            be my concern. To convert between Intel and
            Motorola format, you have to swap bytes #2 & #3 and
            bytes #1 & #4. (-2147483648 to +2147483647)
Int         Integer. Signed 16 bit number.
            (-32768 to +32767)
LString     A string which is preceded by its length. Also
            named "counted" string. Used by most Pascal
            implementations. Maximum length is 255 bytes, but it
            can contain any char.
Nybble      The upper or lower four bits of a byte. A nybble
            is a single hex digit and can have values from
            0 to 15. A signed nybble can have values from
            -8 to 7, with bit 3 being the sign bit.
Paragraph   A multiple of 16 bytes. A paragraph was the granularity
            with which the 64K segments of the Intel chips could
            be placed in memory.
Word        16 bit unsigned number. Note that byte order is
            important and depends on whether you have a Motorola
            machine or an Intel one. Conversion between the two
            formats is done simply by swapping byte #1 with byte #2.
            (0-65535)
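
The string and digit types above can be decoded with a few lines of code. The following is only a sketch in Python, not part of the list itself: all function names are made up for illustration, the BCD helpers assume a single packed byte (two digits), and the LString reader assumes the one-byte length prefix implied by the 255 byte maximum.

```python
def read_asciiz(data, offset=0):
    """Return an ASCIIZ string starting at offset, without the 0 terminator."""
    end = data.index(0, offset)  # raises ValueError if no terminator is found
    return data[offset:end]


def read_lstring(data, offset=0):
    """Return the payload of a counted string (assumed one-byte length prefix)."""
    length = data[offset]
    payload = data[offset + 1:offset + 1 + length]
    if len(payload) != length:
        raise ValueError("LString is truncated")
    return payload


def split_nybbles(value):
    """Split a byte into its (upper, lower) nybbles."""
    return value >> 4, value & 0x0F


def signed_nybble(nybble):
    """Interpret a nybble (0-15) as signed, with bit 3 as the sign bit."""
    return nybble - 16 if nybble & 0x8 else nybble


def bcd_to_int(value):
    """Decode one packed BCD byte, e.g. 21h gives 21."""
    high, low = split_nybbles(value)
    if high > 9 or low > 9:
        raise ValueError("not a valid BCD byte")
    return high * 10 + low


def int_to_bcd(number):
    """Encode a number from 0 to 99 as one packed BCD byte, e.g. 10 gives 10h."""
    if not 0 <= number <= 99:
        raise ValueError("only two decimal digits fit into one byte")
    return (number // 10) << 4 | (number % 10)
```

For example, `read_lstring(b"\x03abcdef")` yields `b"abc"`, and `bcd_to_int(0x21)` yields 21.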
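
The byte order and bit numbering rules above can be sketched as follows. This is a hedged illustration using Python's standard `struct` module; the function names and the `motorola` parameter are my own assumptions and do not appear in the list.

```python
import struct


def read_word(data, offset=0, motorola=False):
    """Read a Word (16 bit unsigned) in Intel or Motorola byte order."""
    fmt = ">H" if motorola else "<H"
    return struct.unpack_from(fmt, data, offset)[0]


def read_dword(data, offset=0, motorola=False):
    """Read a DWord (32 bit signed, as defined above)."""
    fmt = ">i" if motorola else "<i"
    return struct.unpack_from(fmt, data, offset)[0]


def swap_dword_bytes(raw):
    """Swap bytes #1 & #4 and bytes #2 & #3 of a four byte value."""
    if len(raw) != 4:
        raise ValueError("a DWord has exactly four bytes")
    return raw[::-1]


def bit_is_set(value, bit):
    """Test one bit of a bitmapped value; bit 0 is the least significant."""
    return bool(value >> bit & 1)


def paragraphs_to_bytes(paragraphs):
    """Convert a size given in paragraphs into bytes (16 bytes each)."""
    return paragraphs * 16
```

For example, the two bytes 34h 12h read as the Word 1234h in Intel order, while 12h 34h give the same value in Motorola order.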

How to identify different files

While searching for different file formats, I found the following programs
helpful for gathering information about files. They are all DOS programs,
since I'm not familiar with other platforms (except Windows). Most of them
should be available on SimTel CDs or via FTP at ftp.cdrom.com, except for my
program TF, which is still in beta.

LIST.COM v9.0a by Vernon Buerg
List is a file lister which supports both text and hex view.

HIEW.EXE v4.18 by Sen
Another file lister, with a built-in disassembler.

FILE.EXE v2.0 by Felix von Leitner
File is a file identification program.

Q.COM v3.01 by SemWare
QEdit is the editor I'm editing the list with.

TF.EXE v0.38 by me
The program that started it all. Once a "simple" file identification
program, it is no longer simple, since it has grown too big by now.
Still unreleased, since it is not really extensible yet.

The file formats list meta list ;)

The file format list uses a certain format to make it readable by programs which
convert it into the WinHelp format or create program structures out of the
lists. This format is very similar to the format used by Ralf Brown in his PC
interrupt list, but was extended by me to accommodate the specific needs of
this list :

Each topic in the list is delimited by a line of 45 chars, of which the
first 8 are '-' chars. After these, there follows one character which
identifies the type of topic. The different topic types are described in the
list itself; the char '!' denotes an information topic, like the list of chars
and their meaning. After the topic identifier, there follows another '-' char
and then the topic name, which does not contain any '-' chars. After the topic
name, there may be some other descriptors, for example for Motorola byte
ordering or guesswork marking; see the main list for further information. The
line ends with at least one '-' char. Take the following prototype :

--------?-TEST------------------------------

OFFSET Count TYPE Description
EXTENSION:
OCCURENCES:
PROGRAMS:
REFERENCE:
SEE ALSO:
VALIDATION:
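
A program reading the list could recognize the topic delimiter line roughly like this. It is a minimal sketch in Python under several assumptions: the name `parse_topic_line` is hypothetical, the line length is not checked, and since the descriptor syntax is only defined in the main list, whatever follows the topic name is returned unparsed.

```python
import re

# 8 dashes, one topic type char, one dash, a topic name without dashes,
# optional descriptors, and at least one trailing dash.
TOPIC_LINE = re.compile(
    r"^-{8}(?P<type>[^-])-(?P<name>[^-]+)(?P<rest>.*?)-+$"
)


def parse_topic_line(line):
    """Return the parts of a topic delimiter line, or None for other lines."""
    match = TOPIC_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return {
        "type": match.group("type"),
        "name": match.group("name").strip(),
        # Descriptor syntax is not described here, so keep the raw remainder.
        "descriptors": match.group("rest"),
    }
```

Applied to the prototype line above, this yields the topic type `?` and the topic name `TEST`; lines such as `EXTENSION:` are not delimiter lines and give `None`.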

Sub-topics like different records are mostly delimited by three dashes ('-').
I suggest folding them up and making them available as a popup window.

Tables have the following format :
(see table 0000)
for a table reference and
(Table 0000)
for the beginning of a table. The end of a table is undefined (yet).
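
Since table references and table beginnings have a fixed form, a converter can index them and find references which point nowhere. The following Python sketch assumes four-digit table numbers, as in the 0000 example, and exact capitalization; the function names are hypothetical.

```python
import re

TABLE_REF = re.compile(r"\(see table (\d{4})\)")
TABLE_START = re.compile(r"^\s*\(Table (\d{4})\)")


def index_tables(lines):
    """Return ({table number: line number}, [(line number, table number)])."""
    starts = {}
    refs = []
    for lineno, line in enumerate(lines, start=1):
        start = TABLE_START.match(line)
        if start:
            starts[start.group(1)] = lineno
        for ref in TABLE_REF.finditer(line):
            refs.append((lineno, ref.group(1)))
    return starts, refs


def dangling_references(lines):
    """List the references to tables which never begin anywhere in lines."""
    starts, refs = index_tables(lines)
    return [(lineno, number) for lineno, number in refs if number not in starts]
```

As the end of a table is undefined, the sketch only records where a table begins and makes no attempt to collect its contents.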


A primer on file formats

Abbreviations
Throughout the list, many abbreviations are used, some of them in the
reference section. Some of them are explained here :

c't
c't is a German computer magazine, which developed the Borland
Pascal for OS/2 patch. They release source code in files called
CTmmyy.*. Note that comments in the source code and the language of
the issues tend to be German :-)

DDJxxyy
(Dr. Dobb's Journal)
The DDJ is a monthly publication by M&T/US which is intended for the
professional programmer. The four digits after the name indicate the
month/year of the issue referred to. Most of the source code published
in the issue is available electronically on Compu$erve and other BBSes.
The files have the name DDJyymm.

PDN
Programmer's Distribution Net
A network dedicated to the distribution of source code useful to
programmers. Often linked with Fido nodes.

Contributions to this list were made by :
Ralf Brown (The .EXE file formats from the INTERRUPT List, general layout)
David Dilworth (david.dilworth@sierraclub.org)
Daniel Dissett (ddissett@netcom.com)
Marcus Groeber (marcusg@ph-cip.uni-koeln.de)
Darrel Hankerson (hankedr@mail.auburn.edu)
Carl Hauser (chauser.parc@xerox.com)
Jouni Miettunen (jon@stekt.oulu.fi)
Jan Nicolai Langfeldt (janl@ifi.uio.no)
Mark Ouellet (Telix .FON structures)
Greg Roelofs (roe2@midway.uchicago.edu)
Robert Rothenburg Walking-Owl (wlkngowl@unix.asb.com)
Jesus Villena (CONVERT.EXE, a digital sample conversion program)
Christos Zoulas (christos@deshaw.com)
JAL / Nostalgia
David McDuffee, (75530,2626@compuserve.com)

Information gleaned from other programs :
Formats for Word and WordPerfect (Selke's filetype)
