Hermit Crab Variable Length Record Files

Last updated 1998 July 24 by Roedy Green © 1991-1998 Canadian Mind Products.

Here are some thoughts on how an efficient variable length record manager might work. This is a virtual product. There is no such utility. I first proposed this back in 1991 for DOS. I have dusted it off a little for Java. It would be useful for that intermediate territory where flat files are not quite smart enough and an SQL database is overkill.

Purpose

The Hermit Crab variable length record manager is a way to efficiently handle variable length records. The strange name comes from the algorithm used. As records (crabs) grow, they cast off their shells and find larger ones to live in. Smaller records growing can then move into one of the cast-off shells.

The Interface

In DOS, Hermit would be a TSR you load before your application. In Java it would be an ordinary class. It can handle files with records up to 65000 bytes long.

You need to design your application to use the Hermit API. Though you can Copy, Move and Delete Hermit files with ordinary operating system commands, you cannot edit them with ordinary word processors.

Hermit allows you to number your variable length records 0, 1, 2, .. 4,294,967,295. You can leave some numbers out, but you should attempt to keep the highest record number used as low as possible. Read the later Under the Hood to find out why.

Interface Details

File Create

File Open

Random Write

Random Read

Read Next

read Prev

Get Stats

Reorg

Under The Hood

What is Wrong With This Scheme

Indexing

You could build external hashing or Btree indexes to this file. They would convert a key to a record number. The design of these indexes would be simplified because they would not need to be updated when a record was moved, only when a key value changed.

You might use a dual index -- the base and the changes. On every reorg you merge the changes back into the base. The base index can be a read-only structure which makes it simpler to generate and more efficient to scan. The changes index is small, so you can get away with less sophisticated indexing techniques, perhaps even something as crude as a linear table or hashtable kept in RAM.

With 100 MB RAMs becoming commonplace, you might simply implement all your indexes as RAM-resident hashtables that cough up a record number to hand to Hermit. A purely RAM-resident BTree is a much simpler animal to code that a disk based one. You can use serialisation to load/store your RAM-based hashtables or Btrees.

Compression

Records could be compressed and decompressed on write/read. Even with fixed length data, compression creates varying length compressed results.


HTML Checked!
Canadian Mind Products You can get an updated copy of this page from http://mindprod.com/hermit.html