Big/Little Endian in Java

Last updated 1998 June 16 by Roedy Green © 1996-1998 Canadian Mind Products.

Solving The Endian Problem: a Summary

Everything in Java binary format files is stored big-endian, MSB (Most Significant Byte) first. This is sometimes called network order.
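As a quick check of the claim above, here is a minimal sketch that writes one int with DataOutputStream and returns the bytes it produced. The class name BigEndianDemo and the method bytesOfInt are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of any library.
class BigEndianDemo {
    /** Returns the bytes DataOutputStream emits for a single int. */
    static byte[] bytesOfInt(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        try {
            out.writeInt(value);
            out.flush();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Calling bytesOfInt(0x01020304) should give the four bytes 01 02 03 04, most significant byte first.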

There are no separators between fields. The files are in binary, not readable ASCII.

What do you do if you want to read data not in this standard format, usually prepared by some non-Java program?

You have four options:

  1. Rewrite the export program that is providing the imported file. It might export directly in either big-endian binary DataOutputStream or character DataOutput format.
  2. Write a separate translator program that reads and rearranges bytes. You could write this in any language.
  3. Read the data as bytes, and rearrange them on the fly.
  4. Easiest of all, use my LEDataInputStream, LEDataOutputStream and LERandomAccessFile analogs of DataInputStream, DataOutputStream and RandomAccessFile that work with little-endian binary data. You can download the code and source free.
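Option 3 can be sketched in a few lines if you are on a Java version that has Integer.reverseBytes (added in Java 5, so later than this page): let DataInputStream read the value as if it were big-endian, then swap the bytes. The class name ReverseOnTheFly and the demo method are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, not the LEDataInputStream mentioned above.
class ReverseOnTheFly {
    /** Reads 4 bytes as a big-endian int, then reverses them to get the little-endian value. */
    static int readIntLittleEndian(DataInputStream in) {
        try {
            return Integer.reverseBytes(in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Feeds the little-endian encoding of 0x01020304 through the reader. */
    static int demo() {
        byte[] littleEndian = {0x04, 0x03, 0x02, 0x01};
        return readIntLittleEndian(
                new DataInputStream(new ByteArrayInputStream(littleEndian)));
    }
}
```

Short.reverseBytes and Long.reverseBytes do the same job for the other integer widths.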

Reading Little-Endian Binary Files

The most common problem is dealing with files stored in little-endian format.

I have implemented routines parallel to those in java.io.DataInputStream, which read raw binary, in my LEDataInputStream and LEDataOutputStream classes. Don't confuse this with the io.DataInput human-readable character-based file-interchange format.

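If you want to do it yourself, without the overhead of the full LEDataInputStream and LEDataOutputStream classes, here is the basic technique. Each fragment assumes a readByte() method is in scope, for example inside a class that wraps a DataInputStream.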

Presuming your integers are in 2's complement little-endian format, shorts are pretty easy to handle:


short readShortLittleEndian() {
    // 2 bytes, least significant byte first
    int low = readByte() & 0xff;
    int high = readByte() & 0xff;
    return (short)(high << 8 | low);
    }

Or if you want to get clever and puzzle your readers, you can avoid one mask, since the high bits will later be shaved off by the conversion back to short:

short readShortLittleEndian() {
    // 2 bytes, least significant byte first
    int low = readByte() & 0xff;
    int high = readByte(); // avoid masking here
    return (short)(high << 8 | low);
    }
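To see why only the high-byte mask is optional, here is a small self-contained sketch that takes the two bytes as parameters instead of calling readByte(). The class ShortMaskDemo and its method names are invented for this comparison.

```java
// Hypothetical demo class comparing the masking choices.
class ShortMaskDemo {
    /** Both bytes masked: the straightforward version. */
    static short bothMasked(byte lowByte, byte highByte) {
        int low = lowByte & 0xff;
        int high = highByte & 0xff;
        return (short) (high << 8 | low);
    }

    /** High byte left sign-extended; the cast to short discards the extra bits. */
    static short highUnmasked(byte lowByte, byte highByte) {
        int low = lowByte & 0xff;
        int high = highByte;
        return (short) (high << 8 | low);
    }

    /** Deliberately wrong: an unmasked low byte smears its sign bits over the high byte. */
    static short lowUnmasked(byte lowByte, byte highByte) {
        int low = lowByte;
        int high = highByte & 0xff;
        return (short) (high << 8 | low);
    }
}
```

With a low byte of 0x80 and a high byte of 0x01, the first two methods return 384 (0x0180) while lowUnmasked returns -128, so the mask on the low byte must stay.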


Longs are a little more complicated:

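long readLongLittleEndian() {
    // 8 bytes, least significant byte first
    long accum = 0;
    for (int shiftBy = 0; shiftBy < 64; shiftBy+=8)
       // widen to long before shifting; an int shift would wrap at 32 bits
       accum |= ((long)(readByte() & 0xff)) << shiftBy;
    return accum;
    }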
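The complication with longs is the shift: in Java, shifting an int uses only the low five bits of the shift count, so each byte has to be widened to long before it is shifted. The sketch below works on a byte array rather than a stream so it can run on its own; LongShiftDemo and both method names are hypothetical.

```java
// Hypothetical demo class showing the effect of the (long) cast.
class LongShiftDemo {
    /** Correct: each byte is widened to long before shifting. */
    static long readLongLittleEndian(byte[] bytes) {
        long accum = 0;
        for (int i = 0; i < 8; i++) {
            accum |= ((long) (bytes[i] & 0xff)) << (8 * i);
        }
        return accum;
    }

    /** Deliberately wrong: the shift is done in int arithmetic, so counts of 32 and up wrap around. */
    static long readLongWithoutCast(byte[] bytes) {
        long accum = 0;
        for (int i = 0; i < 8; i++) {
            accum |= (bytes[i] & 0xff) << (8 * i);
        }
        return accum;
    }
}
```

For the bytes 00 00 00 00 01 00 00 00 the correct method returns 0x100000000L, while the version without the cast returns 1.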


In a similar way we handle char and int.

char readCharLittleEndian() {
    // 2 bytes, least significant byte first
    int low = readByte() & 0xff;
    int high = readByte() & 0xff;
    return (char)(high << 8 | low);
    }


int readIntLittleEndian() {
    // 4 bytes
    int accum = 0;
    for (int shiftBy = 0; shiftBy < 32; shiftBy+=8)
       accum |= (readByte() & 0xff) << shiftBy;
    return accum;
    }


Floating point is a little trickier. Presuming your data is in IEEE little-endian format, you need something like this:

double readDoubleLittleEndian() {
    long accum = 0;
    for (int shiftBy = 0; shiftBy < 64; shiftBy+=8)
       accum |= ((long)(readByte() & 0xff)) << shiftBy;
    return Double.longBitsToDouble(accum);
    }


float readFloatLittleEndian() {
    int accum = 0;
    for (int shiftBy = 0; shiftBy < 32; shiftBy+=8)
       accum |= (readByte() & 0xff) << shiftBy;
    return Float.intBitsToFloat(accum);
    }
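One way to convince yourself the floating-point routines are right is a round trip: take the IEEE bit pattern with Double.doubleToLongBits, lay it out least significant byte first, and read it back. This is only a sketch; the class DoubleRoundTrip and its two methods are names invented here.

```java
// Hypothetical demo class for a little-endian double round trip.
class DoubleRoundTrip {
    /** Encodes a double as 8 bytes, least significant byte first. */
    static byte[] toLittleEndian(double value) {
        long bits = Double.doubleToLongBits(value);
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (bits >>> (8 * i));
        }
        return out;
    }

    /** Decodes 8 little-endian bytes back into a double. */
    static double fromLittleEndian(byte[] bytes) {
        long accum = 0;
        for (int i = 0; i < 8; i++) {
            accum |= ((long) (bytes[i] & 0xff)) << (8 * i);
        }
        return Double.longBitsToDouble(accum);
    }
}
```

The value 1.0 has the bit pattern 0x3FF0000000000000, so its little-endian encoding ends with the bytes F0 3F.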


You don't need a readByteLittleEndian since the code would be identical to readByte, though you might create one just for consistency:

byte readByteLittleEndian() {
    // 1 byte
    return readByte();
    }


History

In Gulliver's Travels, the Lilliputians broke their eggs at the small end, while the Big-Endians, who took refuge in neighbouring Blefuscu, broke theirs at the big end. They fought wars over this. There is a computer analogy: should numbers be stored most or least significant byte first? This is sometimes referred to as byte sex.

Those in the big-endian camp (most significant byte stored first) include the Java VM virtual computer, the Java binary file format, the IBM 360 and follow-on mainframes, and the Motorola 68K.

Big-endians assert this is the way God intended integers to be stored, most important part first. At an assembler level, fields of mixed positive integers and text can be sorted as if they were one big text field key. Real programmers read hex dumps, and big-endian is a lot easier to comprehend.

In the little-endian camp (least significant byte first) are the Intel 8080, 8086, 80286, Pentium and follow-ons, and the MOS Technology 6502 popularised by the Apple ][.

Lilliputians (little-endians) assert that putting the low-order part first is more natural because when you do arithmetic manually, you start at the least significant part and work toward the most significant part. This ordering makes writing multi-precision arithmetic easier since you work up, not down. It made implementing 8-bit microprocessors easier. At the assembler level (not in Java) it also lets you cheat and pass the address of a 32-bit positive int to a routine expecting only a 16-bit parameter and still have it work. Real programmers read hex dumps, and little-endian is more of a stimulating challenge.

PowerPCs have a foot in both camps. They are bisexual, but the OS usually imposes one convention or the other; for example, Mac PowerPCs are big-endian.


Canadian Mind Products You can get an updated copy of this page from http://mindprod.com/endian.html