Tips for Creating CD-ROMs to Distribute Web-Oriented Material
Introduction
These guidelines are intended to help people successfully produce CD-ROMs
as a means to distribute primarily web-oriented material. Some issues arise
because of the use of CD-ROMs, others arise because the content is web-oriented,
or from a combination of both factors. These recommendations can be applied
at any stage of the development of the content, but it will become clear
that in many cases, they are most easily applied from the very beginning.
These guidelines are the result of our experience creating the first
edition of the "Overview of Computational Science: HPCC Technology
and Applications" CD-ROM for the CEWES MSRC PET Program. This CD-ROM
contains more than 300 MB of educational material, standards documents,
technical reports, and two online books.
Filesystem and Cross-Platform Issues
There are a number of things to watch out for which stem from distinctive
characteristics of the CD-ROM filesystems available, or the need for the
CD-ROM to work correctly in a multi-platform environment. Though some observations
below may seem redundant (e.g. "Avoid Mixed-Case Filenames" and
"Use Only Lowercase Filenames in URLs"), they are presented separately
because they arise from different sources which may not be relevant to
all situations. For more information on the various CD-ROM filesystems, please
see the appendix "A Primer on CD-ROM Filesystems".
- Use Only ISO 9660 Level 1 or ISO 9660 Level 2 Filesystems For Portability
- These filesystems should work on (nearly) all platforms. ISO 9660 Level
2, which CD burner software may refer to simply as "long filenames",
will not work with MS-DOS or Windows 3.11. For
more information, see the appendix "A Primer on CD-ROM Filesystems".
- Limit Length and Character Set of File Names
- For ISO 9660 Level 1 filesystems, limit filenames to 8.3 format,
using the character set [A-Z0-9_].
- For ISO 9660 Level 2 filesystems, use filenames of no more than 30
characters, drawn from the Level 2 character set (see the appendix). When long filenames are selected in
Corel (now Adaptec) CD Creator, it warns that filenames must still be unique
at the 8.3 level or discs/files might not be readable on some Windows 95
systems. We have checked this on a number of Windows 95 machines of varying
ages and found that, in practice, it does not appear to be a problem.
- In general, use only one period (".") in a filename.
The CD burner software we have worked with seems to truncate longer filenames,
so distinct long filenames can end up mapping onto the same name on the disc.
- Avoid Mixed-Case Filenames
- Use of mixed-case filenames in source material increases the chances
that multiple filenames will map on top of each other on the CD-ROM. The
easiest way to avoid this problem is to use only a single case for filenames
from the beginning.
- Use Only Lowercase Filenames in URLs
- This requirement is driven by the various operating systems' interpretations
of CD-ROM filenames. Since unix is case sensitive and generally maps CD-ROM filenames
to lowercase, lowercase is the way to go for portability. Macintosh and Microsoft
systems treat CD-ROM filenames in a case-insensitive fashion, so lowercase URLs work
across the board. Of course, if you are developing the CD-ROM material on
a unix system, this means you should also make the filenames themselves
entirely lowercase.
- Use Only Relative URLs
- Each operating system has a different way of referring to the root
of the CD-ROM filesystem. Microsoft systems may refer to D:\ or
some other drive letter, while Macintosh uses the CD-ROM's volume name (set
when the disc is burned) at the head of the path (or Untitled if
the disc has no volume name). On unix systems, the system administrator
typically sets the mount point, but /CDROM is one common example.
Consequently, it is impossible to reproduce the server-oriented approach
of referring to the server's document root, as in /icons/pic.gif.
Instead, all local URLs must be relative to the current location, such
as ../../icons/pic.gif.
- Avoid "//" in Path Part of URLs
- Two slashes ("//") instead of one ("/") in the
resource path part of a URL is a common error in web content -- especially
when URLs are constructed or processed mechanically. In general, Windows
and unix platforms treat a double slash as a single slash, but the Macintosh
does not. Errors of this type can be detected by careful use of the grep command,
or by running a link checker on a Macintosh (however, see the general cautions
below on the use of link checkers).
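The filename rules above lend themselves to a mechanical check before anything is sent to the CD burner. The following is a minimal sketch in Python, assuming the source tree lives on a local disk; the function names (`filename_problems`, `case_collisions`, `check_tree`) are our own, and the 8.3 pattern simply encodes the Level 1 limits listed above in lowercase form, as recommended for material developed on unix.

```python
import os
import re
from collections import defaultdict

# ISO 9660 Level 1 limits from the list above, written in lowercase because
# the source material should use lowercase names: up to 8 characters, an
# optional period, and an extension of up to 3 characters, all from [a-z0-9_].
LEVEL1_NAME = re.compile(r"[a-z0-9_]{1,8}(\.[a-z0-9_]{0,3})?")


def filename_problems(name, max_len=None):
    """Return a list of problems with a single file or directory name.

    With max_len=None the Level 1 (8.3) rule is applied; otherwise only the
    length limit (e.g. 30 for Level 2), the period count, and case are checked.
    """
    problems = []
    if name != name.lower():
        problems.append("contains uppercase letters")
    if name.count(".") > 1:
        problems.append("contains more than one period")
    if max_len is None:
        if not LEVEL1_NAME.fullmatch(name.lower()):
            problems.append("not an 8.3 name from [a-z0-9_]")
    elif len(name) > max_len:
        problems.append(f"longer than {max_len} characters")
    return problems


def case_collisions(names):
    """Group names that differ only in case; each group collides on the CD-ROM."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def check_tree(root, max_len=None):
    """Walk a source tree and yield (path, problem) pairs."""
    for dirpath, dirnames, filenames in os.walk(root):
        names = dirnames + filenames
        for group in case_collisions(names):
            yield dirpath, "names differ only in case: " + ", ".join(group)
        for name in names:
            for problem in filename_problems(name, max_len):
                yield os.path.join(dirpath, name), problem
```

For example, `for path, problem in check_tree("cdrom_src", max_len=30): print(path, problem)` would apply the looser Level 2 length limit to a (hypothetical) `cdrom_src` directory; omitting `max_len` applies the stricter Level 1 rule.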
Web Browser-Related Issues
Using a web browser to access material from a filesystem is not always
the same as accessing it through an HTTP server.
- Make index.html Explicit
- HTTP servers typically serve index.html for URLs which end in
a directory rather than a file (e.g. http://host/directory or http://host/directory/).
When the material is read from a filesystem, browsers instead produce a directory listing,
exposing the user to all of the files in the directory instead of
the intended page. Consequently, all URLs which refer to the CD-ROM should
end with an explicit file name, index.html being the usual
default. Note that this problem is hard to detect with a link checker or
with basic tools like grep.
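Several of the URL rules above (relative URLs only, no "//" in the path, lowercase names, and an explicit index.html) can be checked together by parsing the links out of each HTML file and consulting the filesystem, which also catches the directory-link case that grep misses. This is a rough sketch using Python's standard `html.parser`; it looks only at `href` and `src` attributes, skips links that carry a scheme or host (such as `http:` or `mailto:`), and reads pages as Latin-1 on the assumption that they are mostly ASCII. The function names are our own.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for attr, value in attrs:
            if attr in ("href", "src") and value:
                self.links.append(value)


def is_local(url):
    """True for links that point into the CD-ROM rather than off it."""
    parts = urlsplit(url)
    return not parts.scheme and not parts.netloc and bool(parts.path)


def url_problems(url):
    """Check one URL string against the rules above (no filesystem access)."""
    if url.startswith("//"):
        return ["starts with '//' (read as a host name, not a path)"]
    if not is_local(url):
        return []  # http:, mailto:, fragment-only (#top) links, etc.
    path = urlsplit(url).path
    problems = []
    if path.startswith("/"):
        problems.append("not relative (starts with '/')")
    if "//" in path:
        problems.append("contains '//' in the path")
    if path != path.lower():
        problems.append("contains uppercase letters")
    if path.endswith("/"):
        problems.append("ends in a directory; name index.html explicitly")
    return problems


def check_html_file(html_path):
    """Yield (url, problem) pairs for the links in one HTML file on disk."""
    parser = LinkCollector()
    with open(html_path, encoding="latin-1") as f:
        parser.feed(f.read())
    parser.close()
    base = os.path.dirname(html_path)
    for url in parser.links:
        problems = url_problems(url)
        path = unquote(urlsplit(url).path)
        # Resolve relative links that do not already end in '/' on disk.
        if is_local(url) and not path.startswith("/") and not path.endswith("/"):
            target = os.path.normpath(os.path.join(base, path))
            if os.path.isdir(target):
                problems.append("points at a directory; name index.html explicitly")
            elif not os.path.exists(target):
                problems.append("target does not exist")
        for problem in problems:
            yield url, problem
```

Because the directory test resolves each link against the file that contains it, a link such as `../docs` (no trailing slash) is flagged when `docs` turns out to be a directory, even though nothing about the URL text itself looks wrong.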
Content Issues
Some of these observations apply to network distribution as well as
CD-ROM distribution, but others are unique to the fact that you're using
a filesystem.
- JavaScript
- Most link checkers and other tools do not deal with JavaScript, which
may contain URLs, so it is easy to miss problems which may crop up in placing
JavaScript-containing material on a CD-ROM.
- Java
- We have not dealt with Java so far, but it seems the convention
of mixed-case naming of classes is likely to be problematic. If using a
single case is not a workable solution, another approach to consider is to
design the CD-ROM with archives of the Java code (e.g. zip or tar; one for each
target platform) arranged so that they can be installed onto the user's
hard disk and run from there. Because the archive is unpacked onto the same kind
of filesystem the Java was designed to run on, the case of class names, etc.
is preserved where it is necessary.
- Graphics File Formats
- GIF and JPEG formats seem to be widely implemented. XBM files were
also read by both Netscape Communicator 4 and MS Internet Explorer 4.
- Use PDF Rather Than PostScript
- Relatively few PC users have access to PostScript printers or on-line
viewers. PDF files can be viewed online and printed using Adobe's freely
available Acrobat Reader software, which is available for Mac, PC, and
many unix platforms. PDF files are also generally a good deal smaller than
their PostScript counterparts. Existing PostScript files can be "distilled"
into PDF format by the Distiller component of the complete Adobe Acrobat
package, or generated directly from most applications through special "printer" drivers
(on Mac and Windows platforms). The complete Acrobat package is not
free, but is (at this writing) well under $200 street price, and much lower
than that with the academic discount. The complete Acrobat package is available
for Mac, Windows, and several unix platforms.
- Use RealAudio/RealVideo/RealPlayer for Audio and Video
- There are a variety of ways to present audio and video content on the
web. Portability of some formats (e.g. WAV audio players and QuickTime video
players appear to be available only for Mac and Windows platforms) is
a concern, but a substantial portion of this particular issue is simply
the need to decide on one format for all content-contributors on a project
to use.
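Because link checkers skip JavaScript, it can help to pull URL-like string literals out of scripts and event-handler attributes so they can be reviewed by hand or run through the same checks as ordinary links. The sketch below is deliberately crude: the class name is our own, and its regular expression only recognizes quoted strings ending in a handful of file extensions chosen as an assumption, so URLs assembled at run time from pieces will still be missed.

```python
import re
from html.parser import HTMLParser

# Quoted strings ending in an extension found on the CD-ROM. The extension
# list is an assumption; adjust it to match the actual material.
URL_LITERAL = re.compile(
    r"""["']([^"'\s]+\.(?:html?|gif|jpe?g|xbm|pdf))["']""", re.IGNORECASE
)


class ScriptURLFinder(HTMLParser):
    """Collect URL-like literals from <script> blocks, on* event handlers,
    and javascript: links."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        for attr, value in attrs:
            if not value:
                continue
            # onclick, onmouseover, ... and href="javascript:..." may hide URLs.
            if attr.startswith("on") or (
                attr == "href" and value.lower().startswith("javascript:")
            ):
                self.urls.extend(URL_LITERAL.findall(value))

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.urls.extend(URL_LITERAL.findall(data))
```

After `finder.feed(page_text)`, `finder.urls` lists the candidates in document order; mixed-case results such as `over.GIF` are exactly the kind of reference that breaks on unix systems.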
Tools Issues
These are things we found useful, or "gotchas" we discovered.
- Most Link Checkers Aren't Designed for Filesystems
- Even if a link checker operates on a filesystem rather than actually
accessing the HTTP server (as most seem to, for speed), they generally
will not check for the above problems. On the other hand, they will catch
a lot of basic problems and should definitely be used. Just be aware of
their limitations.
- Beware of Transferring tar Files
- Our development work took place under unix, and the CD burner we used
was on a Windows 95 system. We tried to transfer the entire tree by tarring
it up, FTPing it to the PC (using binary mode), and using WinZip to untar
it. Unfortunately, this seems to have corrupted some PDF files and images.
Our guess is that WinZip was trying to be helpful by converting end-of-line
characters from unix to PC norms, but this is not necessarily the right
thing to do for all files.
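Whatever the cause of the corruption (our WinZip explanation is only a guess), a cheap safeguard is to record a checksum for every file before the transfer and compare after it. The sketch below does this in Python with MD5 from the standard `hashlib` module; the function names and the manifest format (digest, two spaces, '/'-separated relative path, loosely modeled on the output of the unix `md5sum` command) are our own choices.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in binary mode."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root ('/'-separated) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel = os.path.relpath(full_path, root).replace(os.sep, "/")
            manifest[rel] = file_digest(full_path)
    return manifest


def write_manifest(manifest, out_path):
    """Write 'digest  path' lines with unix line endings."""
    with open(out_path, "w", encoding="utf-8", newline="\n") as f:
        for rel in sorted(manifest):
            f.write(f"{manifest[rel]}  {rel}\n")


def read_manifest(in_path):
    """Read a manifest written by write_manifest, tolerating CR-LF line endings."""
    manifest = {}
    with open(in_path, encoding="utf-8") as f:
        for line in f:
            digest, rel = line.rstrip("\r\n").split("  ", 1)
            manifest[rel] = digest
    return manifest


def compare_manifests(before, after):
    """Return (missing, changed): paths absent after the transfer, and paths
    whose contents differ."""
    missing = sorted(set(before) - set(after))
    changed = sorted(
        rel for rel in before if rel in after and before[rel] != after[rel]
    )
    return missing, changed
```

In use, `write_manifest(build_manifest(...), ...)` runs on the unix side, the manifest travels with the archive, and on the PC `compare_manifests(read_manifest(...), build_manifest(...))` over the unpacked tree lists any file that went missing or changed on the way. MD5 is ample for catching accidental corruption of this kind.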
Appendix: A Primer on CD-ROM Filesystems
- ISO 9660 Level 1
- In practice, nearly all CD-ROMs produced use the ISO 9660 standard
filesystem (derived from the earlier High Sierra format, and often loosely called by that name). CD-ROMs produced to this standard
are readable on nearly all modern computers. In order to achieve this portability,
the ISO 9660 filesystem is designed for the lowest common denominator system,
which (at the time of the standard) was MS-DOS. As a result, the basic
ISO 9660 filesystem allows names in "8.3" format (8-character
filename, ".", 3-character extension) with the characters [A-Z0-9_].
Note that only uppercase letters are allowed, and only one period.
Also, only 8 levels of directories are allowed.
CD-ROMs written in this format are reported to be readable by all
"interesting" platforms: MS-DOS, Windows3.11, Windows95,
WindowsNT, Macintosh, and unix.
Not surprisingly, given the restrictive nature of the basic ISO 9660
filesystem (ISO 9660 Level 1, though it seems rarely to be referred to
in that way), a number of extensions have been developed. Also not surprisingly,
the extensions do not seem to offer the same breadth of implementation
as ISO 9660 Level 1.
- ISO 9660 Level 2
- ISO 9660 Level 2 offers longer filenames (32 characters total, two of which
are taken up by the file version number; see "File Versioning" below), and more
freedom in the character set (possibly [A-Z0-9_-]; we have not confirmed the exact set). Only one
period may appear in a filename. We have not confirmed the directory depth limit. Empirical
evidence indicates ISO 9660 Level 2 filesystems can be read by Windows95,
WindowsNT, Macintosh, and unix platforms. On MS-DOS and Windows3.11 systems,
we expect that an ISO 9660 Level 2 disc should either appear with 8.3-style
names, or be unreadable.
- ISO 9660 Level 3
- There is an ISO 9660 Level 3, but as far as I can tell, it is
primarily about how the CD-R is written ("packetizing"), and
the filename and directory depth limitations are the same as Level 2. It
is not clear, however, if Level 3 discs can be read on "all"
platforms.
- Rock Ridge Extensions
- The Rock Ridge Extensions to ISO 9660 were created to allow unix platforms
to capture file permissions and longer POSIX-style filenames. We have not
determined their directory depth limit. The Rock Ridge extensions are widely implemented
on unix platforms, but not elsewhere.
- Joliet Filesystem
- The Joliet filesystem is a Microsoft extension which allows
long filenames (up to 64 Unicode characters), and of course is only implemented on recent Windows
platforms, though there is a Linux kernel patch that apparently supports
the Joliet filesystem. Windows3.11 and MS-DOS systems supposedly see a
truncated 8.3 name of the same form they would see when reading a Windows95
filesystem (the "short name" or 8.3 alias).
- Hierarchical File System
- HFS is the Macintosh filesystem, and can be written on CD-ROM as
well. It allows 31-character filenames, mixed case, and a larger character
set.
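To see how distinct source names can "map on top of each other" under the Level 1 rules described above, it helps to apply a naive 8.3 mapping to a few names. The functions below are a hypothetical mapping of our own, not the algorithm used by any particular CD burner program (real software may, for example, add numeric suffixes to avoid collisions): the name is uppercased, characters outside [A-Z0-9_] become "_", the result is truncated to 8.3, and the ";1" version number described under File Versioning is appended.

```python
import re
from collections import defaultdict


def _d_chars(text):
    """Replace anything outside [A-Z0-9_] with '_' (text is already uppercase)."""
    return re.sub(r"[^A-Z0-9_]", "_", text)


def to_level1(name):
    """Map a filename to a Level 1-style identifier such as 'INDEX.HTM;1'.

    Naive, hypothetical mapping: the text after the last period is the
    extension (cut to 3 characters), everything before it is the base name
    (cut to 8), and a ';1' version number is appended. This sketch keeps the
    period even when there is no extension, giving e.g. 'README.;1'.
    """
    base, dot, ext = name.upper().rpartition(".")
    if not dot:  # no period at all: the whole name is the base name
        base, ext = ext, ""
    return f"{_d_chars(base)[:8]}.{_d_chars(ext)[:3]};1"


def level1_collisions(names):
    """Return {level1_name: [source names]} for names that would collide."""
    groups = defaultdict(list)
    for name in names:
        groups[to_level1(name)].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

With this mapping, `overview1.html` and `overview2.html` both become `OVERVIEW.HTM;1`, and `Pic.gif` and `pic.gif` both become `PIC.GIF;1`, which is why the single-case and short-name advice in the main guidelines matters.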
Mixing Multiple Standards/Extensions
From my reading, HFS can be combined with ISO 9660-based
filesystems by writing a "hybrid" disc, with separate partitions
(possibly separate tracks) for each. If the same material must be stored in both,
this cuts the space available roughly in half.
It also appears that Rock Ridge and Joliet extensions can
be combined on the same disc, in this case without a price in storage capacity.
From my reading, this will not be useful on WindowsNT 3.51, nor on
Windows 3.11 or MS-DOS (of course). It is not clear whether Macintosh systems widely
support either of these two extensions.
Operating System Treatment of CD-ROM Filesystems
One must also consider the fact that different operating systems may
treat CD-ROM filesystems differently. All current Microsoft operating
systems (MS-DOS, Windows3.11, Windows95, WindowsNT) are case insensitive.
In other words, the case of the filename or any reference to it (e.g. in
an operating system command or HTML document) is irrelevant. On the Macintosh,
filenames are case preserving but not case sensitive, and CD-ROMs (ISO
9660, at least) also appear to be treated in a case-insensitive fashion. By contrast,
unix filesystems are case sensitive, and most unix implementations
of the ISO 9660 filesystem either map all filenames to lowercase,
or offer some control over the mapping at the system level (root access
typically required to change).
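The practical consequence for URLs can be modeled with two lookups: a unix view of the disc in which names have been mapped to lowercase and are then compared case-sensitively, and a case-insensitive view standing in for Microsoft and Macintosh systems. This toy model (our own; it ignores version numbers and directories) shows why a lowercase reference is the only spelling that resolves in both.

```python
def resolves(ref, names_on_disc):
    """Return (found_on_unix, found_case_insensitively) for one file reference.

    names_on_disc are ISO 9660 names as burned (uppercase, without ';1').
    The unix view assumes the common behavior of mapping them to lowercase
    and then comparing case-sensitively; the other view ignores case.
    """
    unix_view = {name.lower() for name in names_on_disc}
    folded_view = {name.casefold() for name in names_on_disc}
    return ref in unix_view, ref.casefold() in folded_view


for ref in ("index.htm", "Index.htm", "INDEX.HTM"):
    print(ref, resolves(ref, ["INDEX.HTM"]))
# index.htm (True, True)
# Index.htm (False, True)
# INDEX.HTM (False, True)
```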
File Versioning
The ISO 9660 filesystem provides for (in fact, requires) a version number
on every filename. These are typically represented as
";1" (or another integer) appended to the filename (in the style
of VMS). Most ISO 9660 implementations on interesting platforms appear
to more or less ignore the file version information. Microsoft systems
don't display it at all; Macintoshes display it in directory listings,
but otherwise seem to ignore it; unix systems either ignore it or offer
some control over the behavior. It appears that from the user (or CD-ROM
producer) point of view, ISO 9660 file versions are irrelevant and can
be safely ignored. Note, however, that in ISO 9660 Level 2, two of the
32 characters allowed for the filename are reserved for the version identifier,
but the user still need not worry about the version number.
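As an illustration of how a unix system can hide version numbers, the function below approximates the name translation that the Linux iso9660 `map=normal` mount option is documented to perform on discs without Rock Ridge extensions: ASCII letters are lowercased, a trailing ";1" is dropped, and any other ";" becomes ".". This is a sketch based on our reading of that documentation, not the kernel's code; dropping the bare period from names with no extension (so that README.;1 becomes readme) is an additional assumption on our part.

```python
def linux_map_normal(iso_name):
    """Approximate the Linux iso9660 'map=normal' translation of one name.

    Assumption: a trailing '.;1' (a name with no extension) loses both the
    period and the version number; otherwise a trailing ';1' is dropped.
    Any remaining ';' (another version number) is turned into '.'.
    """
    if iso_name.endswith(".;1"):
        name = iso_name[:-3]
    elif iso_name.endswith(";1"):
        name = iso_name[:-2]
    else:
        name = iso_name
    name = name.replace(";", ".")
    # Lowercase ASCII A-Z only; other characters are left untouched.
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in name)
```

Under this translation, `INDEX.HTM;1` on the disc is seen by unix users (and by browsers running on unix) as `index.htm`.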