*** LHA, LZH, LZS (LHArc compressed files)
*** Document revision: 1.3
*** Last updated: March 11, 2004
*** Compiler/Editor: Peter Schepers
*** Contributors/samples: Joe Forster/STA, net documents

  These files are created with LHA on the C64 (or C128), and can present
special problems to the typical PC user. The compression used is LH1, an
old method used by LZH 1.xx (pre-version 2), so any version of LHA on the
PC can uncompress them. However, LHA allows filenames up to 18 characters
long, which DOS cannot handle (Windows 95 unLHA utilities will extract the
full filename). Because several long names can look identical to DOS, files
already extracted are often overwritten by files extracted later. To LHA,
however, the filenames are quite different.

  LHA archives always have a string two bytes into the file ("-L??-")
which describes the type of compression used. Over the development life of
LHA, several different compression algorithms have been used. The "??" in
"-L??-" can be one of several possibilities, but on the C64 it is likely
limited to "H0" (no compression) and "H1". Newer versions of LHA/LZH use
other combinations like "H2", "H3", "H4", "H5", "ZS", "Z5", and "Z4". The
letters used in the compression string come from the initials of the
creators of the LZ algorithm (Lempel and Ziv) and of the author of the LHA
program, Haruyasu Yoshizaki.

  The following is a sample of an LHA header. Note the string to search
for at byte $0002:

      00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F        ASCII
      -----------------------------------------------   ----------------
0000: 24 93 2D 6C 68 31 2D 39 02 00 00 16 04 00 00 00   ..-lh1-.........
0010: 08 4C 14 00 00 0E 73 79 73 2E 48 6F 75 73 65 20   ................
0020: 4D 34 00 53 DE 06 11 1C 12 C4 C8 FA 3A 5B DC CE   ................
0030: B2 FA 38 1E 46 B0 B6 9E 9B 75 7A 49 71 72 B3 53   ................
0040: 6E 4E B4 A0 BF 5E 95 B3 05 8A 75 D5 6C E3 03 4A   ................
0050: 2C 54 F4 AF 05 18 59 E2 F4 34 4A 0A 28 D4 33 E2   ................
0060: C4 9D 04 D7 C7 8B 91 66 0E E5 DE 98 3C 92 CC B5   ................

  The header layout is fairly basic. The header for each file starts *two*
bytes before the "-lh?-" string. The above example has already been trimmed
down to start at these two bytes. Every header has the same layout; only
its length varies, because of the length of the filename. Here is a
breakdown of the above example.

 Bytes: $0000: 24 - Length of header (known as "LEN", not including this
                    and the next byte). If it is zero, we are at the end
                    of the file.
         0001: 93 - Header checksum
         0002: 2D 6C 68 31 2D - LHA compression type "-LH1-"
         0007: 39 02 00 00 - Compressed file size ($00000239)
         000B: 16 04 00 00 - Uncompressed file size ($00000416)
         000F: 00 08 4C 14 - Time/date stamp
         0013: 00 - File attribute
         0014: 00 - Header level
                     00 = non-extended header
                     01, 02 = extended header
         0015: 0E - Length of the following filename
         0016: 73 79 73 2E 48 6F 75 - Filename, with a zero and filetype
               73 65 20 4D 34 00 53   appended ("SYS.HOUSE M4", $00, "S").
                                      The name can be up to 18 characters
                                      in length. Note the length *includes*
                                      the zero and filetype, making the
                                      actual filename length 2 bytes
                                      shorter.
         0024: DE 06 - File data checksum (starts at LEN)
         0026: 11 1C 12 C4 C8 FA... - File data (starts at LEN+2)

  The header checksum at byte $0001 is calculated by adding the bytes in
the header from $0002 (LHA compression type) to LEN+1 (the end of the file
data checksum), without carry, so only the low 8 bits of the sum are kept.
For the sample above the sum is $0793, giving the stored value $93.

  The time/date stamp (bytes $000F-$0012) is broken down as follows:

 Bytes:$000F-0010: Time of last modification:
                    BITS  0- 4: Seconds divided by 2 (0-58, only even
                                numbers)
                    BITS  5-10: Minutes (0-59)
                    BITS 11-15: Hours (0-23, no AM or PM)
 Bytes:$0011-0012: Date of last modification:
                    BITS  0- 4: Day (1-31)
                    BITS  5- 8: Month (1-12)
                    BITS  9-15: Year minus 1980

  The format of the compressed data is much too complex to get into here.
Understanding the layout would require knowledge of Huffman coding and
sliding dictionaries, and is nowhere near as simple as ZipCode! The
descriptions given in the LHA source code for the different compression
modes are as follows:

  -lh0-  no compression, file stored
  -lh1-  4k sliding dictionary (max 60 bytes) + dynamic Huffman + fixed
         encoding of position
  -lh2-  8k sliding dictionary (max 256 bytes) + dynamic Huffman
  -lh3-  8k sliding dictionary (max 256 bytes) + static Huffman
  -lh4-  4k sliding dictionary (max 256 bytes) + static Huffman + improved
         encoding of position and trees
  -lh5-  8k sliding dictionary (max 256 bytes) + static Huffman + improved
         encoding of position and trees
  -lzs-  2k sliding dictionary (max 17 bytes)
  -lz4-  no compression, file stored
  -lz5-  4k sliding dictionary (max 17 bytes)

  There are several utilities that you can use to decompress these files,
like the already-mentioned LHA on the PC, or Star LHA, one of the many
excellent utilities contained in the Star Commander distribution package.
If you use Star LHA, keep in mind that it needs the program LHA v2.14 (or
newer) to extract. If an older version of LHA is used (such as the common
version 2.13), the extracted files will be corrupt. Star LHA extracts the
files directly into a D64 image, so the long C64 filenames are not lost.

  To an emulator user these files are of little use, as their only real
purpose on a C64 was to save storage space and transmission time. The
standard compression program on the PC is PKZIP (or ZIP compatibles), so
unless you need to send *compressed* files back to the C64, there is no
point in using LHA.
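
  As an illustration of the header layout, checksum and time/date stamp
described earlier, here is a minimal Python sketch that reads one
non-extended (level 0) header. It is not taken from LHA or Star LHA. The
names SAMPLE, decode_dos_timestamp and parse_level0_header are made up for
this example, and SAMPLE holds only the first $26 bytes of the sample dump.
The sketch assumes multi-byte fields are stored low byte first and that the
date word follows the usual MS-DOS packed layout (day in bits 0-4, month in
bits 5-8, year minus 1980 in bits 9-15). The name is decoded as Latin-1
only for display; no PETSCII conversion is attempted.

```python
import struct
from datetime import datetime

# Header bytes $0000-$0025 of the sample dump (no file data).
SAMPLE = bytes.fromhex(
    "24 93 2D 6C 68 31 2D 39 02 00 00 16 04 00 00 00 "
    "08 4C 14 00 00 0E 73 79 73 2E 48 6F 75 73 65 20 "
    "4D 34 00 53 DE 06"
)


def decode_dos_timestamp(dos_time, dos_date):
    """Unpack the time and date words (assumed MS-DOS packed layout)."""
    return datetime(
        1980 + (dos_date >> 9),     # bits 9-15: year minus 1980
        (dos_date >> 5) & 0x0F,     # bits 5-8:  month
        dos_date & 0x1F,            # bits 0-4:  day
        dos_time >> 11,             # bits 11-15: hours
        (dos_time >> 5) & 0x3F,     # bits 5-10: minutes
        (dos_time & 0x1F) * 2,      # bits 0-4:  seconds divided by 2
    )


def parse_level0_header(buf, pos=0):
    """Parse one level 0 header at buf[pos]; return None at end of file."""
    length = buf[pos]                       # LEN
    if length == 0:
        return None
    stored_sum = buf[pos + 1]
    hdr = buf[pos + 2 : pos + 2 + length]   # bytes $0002 .. LEN+1
    if len(hdr) != length:
        raise ValueError("truncated header")

    # Fixed fields after the 5-byte "-l??-" string: two 32-bit sizes, the
    # time and date words, attribute, header level, and filename length.
    packed, unpacked, dos_time, dos_date, attr, level, name_len = (
        struct.unpack_from("<IIHHBBB", hdr, 5)
    )
    if level != 0:
        raise ValueError("this sketch only handles header level 0")

    raw_name = hdr[20 : 20 + name_len]
    (data_sum,) = struct.unpack_from("<H", hdr, 20 + name_len)

    return {
        "method": hdr[0:5].decode("ascii"),
        "packed_size": packed,
        "unpacked_size": unpacked,
        "modified": decode_dos_timestamp(dos_time, dos_date),
        "attribute": attr,
        # C64 archives: the name is followed by a zero and a filetype byte.
        "name": raw_name[:-2].decode("latin-1"),
        "filetype": raw_name[-1:].decode("latin-1"),
        "data_checksum": data_sum,
        # Sum of all header bytes, carry discarded, against byte $0001.
        "checksum_ok": (sum(hdr) & 0xFF) == stored_sum,
        "data_offset": pos + 2 + length,    # LEN+2
    }
```

  Calling parse_level0_header(SAMPLE) returns a dictionary describing the
sample file. In a full archive, the next header would be expected at
data_offset plus packed_size, and a return value of None marks the end of
the file. A stamp with out-of-range fields makes datetime raise ValueError,
which a real tool would want to catch.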