Hermit Crab Variable Length Record Files
Last updated 1998 July 24 by Roedy Green ©
1991-1998 Canadian Mind Products.
Here are some thoughts on how an efficient variable length record manager
might work. This is a virtual product. There is no such utility. I first
proposed this back in 1991 for DOS. I have dusted it off a little for
Java. It would be useful for that intermediate territory where flat files
are not quite smart enough and an SQL database is overkill.
Purpose
The Hermit Crab variable length record manager is a way to efficiently
handle variable length records. The strange name comes from the algorithm
used. As records (crabs) grow, they cast off their shells and find
larger ones to live in. Smaller records growing can then move into one of
the cast-off shells.
The Interface
In DOS, Hermit would be a TSR you load before your application. In Java it
would be an ordinary class. It can handle files with records up to 65000
bytes long.
You need to design your application to use the Hermit API. Though you can
Copy, Move and Delete Hermit files with ordinary operating system
commands, you cannot edit them with ordinary word processors.
Hermit allows you to number your variable length records 0, 1, 2, ..
4,294,967,295. You can leave some numbers out, but you should attempt to
keep the highest record number used as low as possible. Read the later
Under the Hood to find out why.
Interface Details
File Create
- You must specify the file name and the maximum possible record
length.
- Optionally you may specify the maximum number of records that the file
will hold, and the average size in bytes of each record. These numbers are
used to preallocate space. They are not absolute limits on how big the
file can grow.
- You may also specify the number of bytes of breathing room to use for
each record. This is padding space at the end of the record to allow it to
grow without having to move to a new shell.
File Open
- To open an existing file, simply provide the name.
Random Write
- You provide a reference to the record (array of bytes) you want to write and
its length. You also must provide the record number.
- Whether the record exists already or not is immaterial. It may grow or
shrink from its previous size.
- Writing a zero-length record DELETEs it. Its unused space will be
reclaimed.
Random Read
- You provide provide the record number you
want to read.
- Hermit allocates and returns a buffer.
Read Next
- Read next is similar to adding one to the record number, except
that it automatically jumps to the next record actually used.
- To read the whole file, start with a READ RANDOM on record 0, (or the
low-water mark derived from stats) then do READ-NEXTs.
- Note that there is so such thing as WRITE NEXT. It is safer for the
programmer to maintain the counter.
read Prev
- This is just like READ NEXT, except it works backwards. It is slower
than READ NEXT under Win95 because of the of the dippy way Microsoft handles the FAT chains.
Get Stats
- Hermit returns the fully qualified name of the file, the count of how
many records exist in the file, the breathing room, the low water mark
record number used, the low water mark smallest record, the high water
mark record number used, the high water mark largest record, the total
number of bytes in all data records (not counting overhead), and average
record size.
- Note that High Water marks include records that may now be deleted. To
get accurate stats, you must do a REORG.
Reorg
- REORG reorganizes a file for maximum efficiency, collecting small
discarded "shells" -- free space from deleted records, into one big pool
of free space.
- optionally you may specify the maximum number of records that the
reorganized file will hold, and the average size in bytes of each record.
These numbers are used to preallocate space. They are not absolute limits
on how big the file can grow. Typically you would do a STATS and then add
some growing room.
- You may also specify a new breathing room parameter to use for each
record.
- REORG is a relatively slow process since each record must be copied to
a new file. When the copy is done, the old version is deleted, and the new
one renamed to the old. You need sufficient free disk space for two copies
of the file. Hermit will not even attempt the REORG unless there is
sufficient room.
Under The Hood
- The key to the system is a large array of 32 byte pointers. You index
this array by record number, and get the offset on disk where
that record currently lives. The low 4 bits are always 0 since records are
always aligned on even 16 byte paragraph boundaries within the file. This
gives every record an average of 8 bytes breathing room to grow and shrink
without having to find a new shell.
- The array is the only thing Hermit keeps in RAM or RAM cache. If it
has to grow, it can build a new larger one on disk as needed. This is time
consuming, so you want to get good estimates of file size at CREATE or
REORG time.
- The precise length of each record is also stored in the array of
pointers to the records. It takes 16 bits. We keep this information handy
in the array rather than with the record so that we can set up the physical
read directly into the user region in one single efficient i/o.
- The breathing room is not really what it appears. We arrange that
shells come in standard sizes. Let us say the breathing room were 8 bytes.
If we aligned the records always on a paragraph boundary -- every 16
bytes, then on average every record would have 8 bytes breathing room.
Using this trick, we can save the overhead of having to track both the
shell size and the record size. Whenever we see a record (crab-size) of
say 100 bytes, we know the shell size must be one size bigger 7*16 = 112
bytes.
- When records are created, we just tack them on the end and set the
pointer to them.
- When records are deleted, we add them to the free space chains. Since
shells come in only a limited number of sizes, we can build a separate
chain through the empty shells, one for each size. By building a LIFO
queue of shells, all you need to do is write a record into the deleted
slot with the index of the previous cast off shell of that size on it,
-- a single efficient disk i/o.
- To recycle the discarded shells, when we go to write a new record or
expand a new one, we first check if any free shells exist already of the
desired size, before creating one at the end of the file.
What is Wrong With This Scheme
- If a record shrinks, it must move to a smaller shell. Otherwise
the relationship between crab size and shell size would be broken.
- If no shells are available at the correct size, but they are
available a little too big, they cannot be used. Again the relationship
between crab and shell must be preserved.
- Eventually if say all records gradually grow, you will have all kinds
of cast off small shells that nobody wants. The only way to reclaim this
space is to do a Reorg. In theory it might be possible to do a mini Reorg
to sort the array by offset, then collapse pairs small shells into one big
shell.
Indexing
You could build external hashing or Btree indexes to this file. They would
convert a key to a record number. The design of these indexes would be
simplified because they would not need to be updated when a record was
moved, only when a key value changed.
You might use a dual index -- the base and the changes. On every reorg
you merge the changes back into the base. The base index can be a
read-only structure which makes it simpler to generate and more efficient
to scan. The changes index is small, so you can get away with less
sophisticated indexing techniques, perhaps even something as crude as a
linear table or hashtable kept in RAM.
With 100 MB RAMs becoming commonplace, you might simply implement all
your indexes as RAM-resident hashtables that cough up a record number to
hand to Hermit. A purely RAM-resident BTree is a much simpler animal to
code that a disk based one. You can use serialisation to load/store your
RAM-based hashtables or Btrees.
Compression
Records could be compressed and decompressed on write/read. Even with
fixed length data, compression creates varying length compressed results.