Tips for Creating CD-ROMs to Distribute Web-Oriented Material

Introduction

These guidelines are intended to help people successfully produce CD-ROMs as a means to distribute primarily web-oriented material. Some issues arise because of the use of CD-ROMs, others arise because the content is web-oriented, or from a combination of both factors. These recommendations can be applied at any stage of the development of the content, but it will become clear that in many cases, they are most easily applied from the very beginning.

These guidelines are the result of our experience creating the first edition of the "Overview of Computational Science: HPCC Technology and Applications" CD-ROM for the CEWES MSRC PET Program. This CD-ROM contains more than 300 MB of educational material, standards documents, technical reports, and two online books.

Filesystem and Cross-Platform Issues

There are a number of things to watch out for which stem from distinctive characteristics of the CD-ROM filesystems available, or from the need for the CD-ROM to work correctly in a multi-platform environment. Though some observations below may seem redundant (e.g. "Avoid Mixed-Case Filenames" and "Use Only Lowercase Filenames in URLs"), they are presented separately because they arise from different sources which may not be relevant to all situations. For more information on the various CD-ROM filesystems, please see the appendix "A Primer on CD-ROM Filesystems".

Use Only ISO 9660 Level 1 or ISO 9660 Level 2 Filesystems For Portability
These filesystems should work on (nearly) all platforms. ISO 9660 Level 2, which may be referred to simply as "long filenames" by the CD burner software, will not work with MS-DOS or Windows 3.11. For more information, see the appendix "A Primer on CD-ROM Filesystems".
Limit Length and Character Set of File Names
Avoid Mixed-Case Filenames
Use of mixed-case filenames in source material increases the chances that multiple filenames will map on top of each other on the CD-ROM. The easiest way to avoid this problem is to use only a single case for filenames from the beginning.
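One way to look for such collisions before burning is sketched below. It is only an illustration: the function names case_collisions and scan_tree are hypothetical, and it assumes the source tree lives on a case-sensitive filesystem (such as unix), where two names differing only in case can coexist.

```python
import os
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that become identical once case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


def scan_tree(root):
    """Map each directory under root to the colliding name groups it contains."""
    report = {}
    for dirpath, dirnames, filenames in os.walk(root):
        clashes = case_collisions(dirnames + filenames)
        if clashes:
            report[dirpath] = clashes
    return report


# Example use (the path is an assumption): scan_tree("/home/me/cdrom-tree")
```

Any directory that appears in the report holds names which would land on top of each other once the disc is read in a case insensitive fashion.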
Use Only Lowercase Filenames in URLs
This requirement is driven by the various operating systems' interpretations of CD-ROM file names. Since unix is case sensitive and generally maps CD-ROM filenames to lowercase, lowercase URLs are the way to go for portability. Macintosh and Microsoft systems treat CD-ROM filenames in a case insensitive fashion, so this works across the board. Of course, if you are developing the CD-ROM material on a unix system, this means you should also make the filenames themselves entirely lowercase.
Use Only Relative URLs
Each operating system has a different way of referring to the root of the CD-ROM filesystem. Microsoft systems may refer to D:\ or some other drive letter, while Macintosh uses the CD-ROM's volume name (set when the disc is burned) at the head of the path (or Untitled if the disc has no volume name). On unix systems, the system administrator typically sets the mount point, but /CDROM is one common example. Consequently, it is impossible to reproduce the server-oriented approach of referring to the server's document root, as in /icons/pic.gif. Instead, all local URLs must be relative to the current location, such as ../../icons/pic.gif.
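The conversion from a document-root URL to a relative one can be computed mechanically. The following is a minimal sketch, assuming that the server's document root corresponds to the top of the CD-ROM tree and that both paths are given relative to that top with "/" separators; the function name relative_url is hypothetical.

```python
import posixpath


def relative_url(from_page, target):
    """Return the URL of target as seen from the page at from_page.

    Both arguments are paths below the top of the CD-ROM tree. A leading
    "/" on target (server document-root style) is dropped, on the
    assumption that the document root maps to the top of the disc.
    """
    start = posixpath.dirname(from_page) or "."
    return posixpath.relpath(target.lstrip("/"), start)


print(relative_url("docs/ch1/page.html", "/icons/pic.gif"))  # ../../icons/pic.gif
```

Because the result depends on the depth of the referring page, the same target yields a different relative URL from each directory level.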
Avoid "//" in Path Part of URLs
Two slashes ("//") instead of one ("/") in the resource path part of a URL is a common error in web content -- especially when URLs are constructed or processed mechanically. In general, Windows and unix platforms will treat a double slash as a single slash; however, this is not the case with the Macintosh. Errors of this type should be detected by careful use of the grep command, or by running a link check on a Macintosh (however, see general cautions below on the use of link checkers).
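As an alternative to grep, a short script can separate the path part of each URL from the "//" that legitimately follows the scheme. This is a rough sketch: the regular expression only handles quoted href and src attributes, and the function name urls_with_double_slash is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches quoted href="..." and src="..." attribute values (an assumption:
# unquoted attributes and URLs inside scripts are not covered).
ATTR_RE = re.compile(r'''(?:href|src)\s*=\s*["']([^"']+)["']''', re.IGNORECASE)


def urls_with_double_slash(html):
    """Return the URLs in html whose path part contains "//"."""
    return [url for url in ATTR_RE.findall(html) if "//" in urlsplit(url).path]


sample = '<a href="../icons//pic.gif">x</a> <img src="http://host/a/b.gif">'
print(urls_with_double_slash(sample))  # ['../icons//pic.gif']
```

Using urlsplit keeps the "//" in "http://host/..." from being reported, since only the path component is inspected.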

Web Browser-Related Issues

Using a web browser to access material from a filesystem is not always the same as accessing it through an HTTP server.

Make index.html Explicit
HTTP servers typically append index.html to URLs which end in a directory rather than a file (e.g. http://host/directory or http://host/directory/). When used on a filesystem, browsers will instead produce a directory listing, thus exposing the user to all of the files in the directory, and losing the desired link. Consequently, all URLs which refer to the CD-ROM should end with an explicit file name, with index.html being the usual default. Note that this problem is hard to detect with a link checker or with basic tools like grep.
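Since text-based tools cannot tell a directory from a file, one possible check is to resolve each local link against the source tree and ask the filesystem. The sketch below assumes the tree is available on local disk; the function name points_at_directory is hypothetical.

```python
import os
from urllib.parse import unquote, urlsplit


def points_at_directory(page_path, url):
    """Return True if a local link in the page at page_path names a directory."""
    parts = urlsplit(url)
    if parts.scheme or parts.netloc or not parts.path:
        return False  # remote link, or a fragment within the same page
    target = os.path.join(os.path.dirname(page_path), unquote(parts.path))
    return os.path.isdir(target)


# Example use (paths are assumptions):
# points_at_directory("/home/me/cdrom-tree/index.html", "docs/")
```

Every link for which this returns True would need index.html (or another explicit file name) appended.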

Content Issues

Some of these observations apply to network distribution as well as CD-ROM distribution, but others are unique to the fact that you're using a filesystem.

JavaScript
Most link checkers and other tools do not deal with JavaScript, which may contain URLs, so it is easy to miss problems which may crop up in placing JavaScript-containing material on a CD-ROM.
Java
We have not dealt with Java so far, but it seems that the convention of mixed-case naming of classes is likely to be problematic. If using a single case is not a workable solution, another approach to consider is to design the CD-ROM with archives of the Java code (e.g. zip, tar; one for each target platform) arranged so that they can be installed onto the user's hard disk and run from there. Because the Java is wrapped up in an archive which is unpacked onto the same kind of filesystem as it was designed to run on, the case of class names, etc. can be preserved where it is necessary.
Graphics File Formats
GIF and JPEG formats seem to be widely implemented. XBM files were also read by both Netscape Communicator 4 and MS Internet Explorer 4.
Use PDF Rather Than PostScript
Relatively few PC users have access to PostScript printers or on-line viewers. PDF files can be viewed online and printed using Adobe's freely available Acrobat Reader software, which is available for Mac, PC, and many unix platforms. PDF files are also generally a good deal smaller than their PostScript counterparts. Existing PostScript files can be "distilled" into PDF format by the Distiller component of the complete Adobe Acrobat package, or generated directly from most applications through special "printer" drivers (on Mac and Windows platforms). The complete Acrobat package is not free, but is (at this writing) well under $200 street price, and much lower than that with the academic discount. The complete Acrobat package is available for Mac, Windows, and several unix platforms.
Use RealAudio/RealVideo/RealPlayer for Audio and Video
There are a variety of ways to present audio and video content on the web. Portability of some formats is a concern (e.g. WAV audio players and QuickTime video players appear to be available only for Mac and Windows platforms), but a substantial portion of this particular issue is simply the need to decide on one format for all content contributors on a project to use.

Tools Issues

These are things we found useful, or "gotchas" we discovered.

Most Link Checkers Aren't Designed for Filesystems
Even if a link checker operates on a filesystem rather than actually accessing the HTTP server (as most seem to, for speed), it generally will not check for the above problems. On the other hand, link checkers will catch a lot of basic problems and should definitely be used. Just be aware of their limitations.
Beware of Transferring tar Files
Our development work took place under unix, and the CD burner we used was on a Windows 95 system. We tried to transfer the entire tree by tarring it up, FTPing it to the PC (using binary mode), and using WinZip to untar it. Unfortunately, this seems to have corrupted some PDF files and images. Our guess is that WinZip was trying to be helpful by converting end-of-line characters from unix to PC norms, but this is not necessarily the right thing to do for all files.
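Corruption of this kind can be caught by comparing checksums of the tree before and after the transfer. The following sketch is not something we did at the time; it assumes Python is available on both systems, and the function names checksum_manifest and changed_files are hypothetical.

```python
import hashlib
import os


def checksum_manifest(root):
    """Map each file's path (relative to root, "/" separated) to its SHA-256."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digest = hashlib.sha256()
            with open(full, "rb") as handle:  # binary mode: no newline translation
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            manifest[rel] = digest.hexdigest()
    return manifest


def changed_files(before, after):
    """Return paths from before that are missing or different in after."""
    return sorted(path for path in before if after.get(path) != before[path])
```

Running checksum_manifest on the unix tree and again on the unpacked copy, then passing both results to changed_files, lists exactly the files that were altered or lost along the way.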

Appendix: A Primer on CD-ROM Filesystems

ISO 9660 Level 1
In practice, nearly all CD-ROMs produced use the ISO 9660 standard filesystem (which grew out of the earlier High Sierra format). CD-ROMs produced to this standard are readable on nearly all modern computers. In order to achieve this portability, the ISO 9660 filesystem is designed for the lowest common denominator system, which (at the time of the standard) was MS-DOS. As a result, the basic ISO 9660 filesystem allows names in "8.3" format (8 character filename, ".", 3 character extension) with the characters [A-Z0-9_]. Note that only uppercase letters are allowed, and only one period. Also, only 8 levels of directories are allowed. [explain how measured] CD-ROMs written in this format are reported to be readable by all "interesting" platforms: MS-DOS, Windows3.11, Windows95, WindowsNT, Macintosh, and unix.
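The naming rule just described is simple enough to test mechanically. The sketch below checks a single file name against the 8.3 pattern and character set given above; it assumes that lowercase source names will be mapped to uppercase when the disc is written, and the function name is_level1_name is hypothetical.

```python
import re

# Up to 8 characters, optionally followed by one period and up to 3 characters,
# all drawn from [A-Z0-9_] as described in the text.
LEVEL1_RE = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")


def is_level1_name(name):
    """Return True if name fits the ISO 9660 Level 1 pattern once uppercased."""
    return LEVEL1_RE.fullmatch(name.upper()) is not None


for candidate in ("index.htm", "index.html", "my-file.gif", "a.b.c"):
    print(candidate, is_level1_name(candidate))
```

Only the first of the four candidates passes: the others fail on extension length, on the hyphen, and on the second period respectively.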

Not surprisingly, given the restrictive nature of the basic ISO 9660 filesystem (ISO 9660 Level 1, though it seems rarely to be referred to in that way), a number of extensions have been developed. Also not surprisingly, the extensions do not seem to offer the same breadth of implementation as ISO 9660 Level 1.

ISO 9660 Level 2
ISO 9660 Level 2 offers longer filenames (32 characters total, but two are taken up by the file version number (see below)), and more freedom in the character set ([A-Z][0-9]_- ???). Only one period may appear in a filename. directory depth??? Empirical evidence indicates ISO 9660 Level 2 filesystems can be read by Windows95, WindowsNT, Macintosh, and unix platforms. On MS-DOS and Windows3.11 systems, we expect that an ISO 9660 Level 2 disc should either appear with 8.3-style names, or be unreadable.
ISO 9660 Level 3
There is an ISO 9660 Level 3, but as far as I can tell, it is primarily about how the CD-R is written ("packetizing"), and the filename and directory depth limitations are the same as Level 2. It is not clear, however, if Level 3 discs can be read on "all" platforms.
Rock Ridge Extensions
The Rock Ridge Extensions to ISO 9660 were created to allow unix platforms to capture file permissions and longer POSIX-style filenames. Directory depth is ???. The Rock Ridge extensions are widely implemented on unix platforms, but not elsewhere.
Joliet Filesystem
The Joliet filesystem is a Microsoft extension which allows ???, and of course is only implemented on recent Windows platforms, though there is a Linux kernel patch that apparently supports the Joliet filesystem. Windows3.11 and MS-DOS systems supposedly see the truncated 8.3 name of the same form as if they were reading a Windows95 filesystem (what is proper name?)
Hierarchical File System
HFS is the Macintosh filesystem, and can be written on CD-ROM as well. It allows 31 character filenames, mixed case, and a larger character set.

Mixing Multiple Standards/Extensions

From my reading, HFS can be combined with ISO 9660-based filesystems by writing a "hybrid" disc, with separate partitions (tracks???) for each. Obviously this cuts the space available roughly in half.

It also appears that Rock Ridge and Joliet extensions can be combined on the same disc, in this case without a price in storage capacity. From my reading, this will not be useful on WindowsNT 3.51, nor on Windows 3.11 or MS-DOS (of course). It is not clear whether Macintosh systems widely support either of these two extensions.

Operating System Treatment of CD-ROM Filesystems

One must also consider the fact that different operating systems may treat CD-ROM filesystems differently. All current Microsoft operating systems (MS-DOS, Windows3.11, Windows95, WindowsNT) are case insensitive. In other words, the case of the filename or any reference to it (e.g. in an operating system command or HTML document) is irrelevant. On the Macintosh, HFS preserves the case of filenames but matches them in a case insensitive fashion, and it appears that CD-ROMs (ISO 9660, at least) are treated the same way. By contrast, unix filesystems are case sensitive, and most unix implementations of the ISO 9660 filesystem either map all filenames to lowercase, or offer some control over the mapping at the system level (root access typically required to change).

File Versioning

The ISO 9660 filesystem provides (really requires) that every filename also have a version number. These are typically represented as ";1" (or another integer) appended to the filename (in the style of VMS). Most ISO 9660 implementations on interesting platforms appear to more or less ignore the file version information. Microsoft systems don't display it at all; Macintoshes display it in directory listings, but otherwise seem to ignore it; unix systems either ignore it or offer some control over the behavior. It appears that from the user (or CD-ROM producer) point of view, ISO 9660 file versions are irrelevant and can be safely ignored. Note, however, that in ISO 9660 Level 2, two of the 32 characters allowed for the filename are reserved for the version identifier, but the user still need not worry about the version number.
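To make the relationship between the name recorded on the disc and the name a user typically sees more concrete, the sketch below strips the version suffix and maps the result to lowercase, mirroring the behavior described above for many unix implementations. It is an illustration only; the function name normalize_iso_name is hypothetical and is not part of any operating system interface.

```python
import re


def normalize_iso_name(raw_name):
    """Drop a trailing ";<integer>" version and map the name to lowercase."""
    without_version = re.sub(r";\d+$", "", raw_name)
    return without_version.lower()


print(normalize_iso_name("INDEX.HTM;1"))  # index.htm
```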