
SvarDOS community forum

a place to talk about SvarDOS and other DOS-related things


Usage of checksums in SvarDOS

I just noticed, we have:
1) md5: http://www.svardos.org/?p=repo and http://www.svardos.org/?p=files
2) crc32: pkg
3) bsum: pkgnet
What comes next? :-D
Ha, I am still looking for an excuse to add MurmurHash somewhere :)

Nah, but seriously: I understand where the question comes from, as all these different checksums may appear chaotic. But there is a reason for each of these choices.

pkgnet provides BSUMs of packages because BSUM is *very* fast, even on an 8088, which is why it is perfect for checksumming things in a restricted DOS environment.

The website provides md5 sums for some stuff. MD5 is a de facto standard for this kind of thing, and there are many web tools (downloaders, browsers...) that know how to compute and validate an md5 sum out of the box. But md5 is cpu-intensive (compared to BSUM and CRC32), so it is only good in modern-ish environments.

PKG stores CRC32 sums of the files that are installed on disk. Wouldn't BSUM be a better/faster choice? Possibly. But the CRC32 is essentially free here: it is part of the ZIP format, and PKG has to compute it anyway to validate that the zip package is not corrupted, so it makes sense to reuse it for the LSM metadata instead of burning cpu power on yet another hash.

Now, given that CRC32 is mandatory because of the zip format, maybe it would make sense to use it in pkgnet instead of bsum. CRC32 is much slower (CRC32-ing a 1M file takes about 25s on my 8086, while a BSUM needs only 3s), but maybe that is an acceptable cost. The nice thing about CRC32 is that the user already has a tool to compute it (PKG CRC32 file.dat), while BSUM requires an extra program... Or maybe even better - pkgnet could provide both a BSUM *and* a CRC, so users can verify whatever they prefer.

Mateusz
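For reference, the checksum in question is the standard ZIP CRC-32 (reflected polynomial 0xEDB88320); a minimal table-driven sketch in C (illustrative only, not PKG's actual implementation) looks like this:

  #include <stddef.h>
  #include <stdint.h>

  /* Standard ZIP/IEEE CRC-32 (reflected polynomial 0xEDB88320),
     table-driven. Illustrative sketch only, not PKG's actual code. */
  static uint32_t crc_table[256];

  static void crc32_init(void)
  {
    uint32_t n, c;
    int k;
    for (n = 0; n < 256; n++) {
      c = n;
      for (k = 0; k < 8; k++)
        c = (c & 1) ? (0xEDB88320UL ^ (c >> 1)) : (c >> 1);
      crc_table[n] = c;
    }
  }

  /* start with crc = 0; calls can be chained over successive buffers:
     crc32_init(); crc = crc32_update(0, data, size); */
  static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
  {
    crc ^= 0xFFFFFFFFUL;
    while (len--)
      crc = crc_table[(crc ^ *buf++) & 0xFF] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFUL;
  }

Even in this table-driven form, that is a table lookup, an xor and a shift for every single byte, which adds up quickly on an 8086.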
Another possible evolution could be that pkg stops validating the CRC32 completely, and instead relies on some in-package BSUM listing for individual files... That would significantly speed up the installation process of packages on sluggish (8086-style) machines, but at the cost of adding a complication to the package structure (necessity of computing a listing of BSUMs for each file stored in the package). I think it is not worth the effort and extra complication - whether it takes 5s or 20s to install a package probably does not make a difference to anybody, and it is nice to have something standard (zip/crc32) to rely on. Mateusz
Thanks for providing some background information on your decisions.
> Or maybe even better - pkgnet could provide both a BSUM *and* a CRC, so users can verify whatever they prefer.
+1
> but at the cost of adding a complication to the package structure (necessity of computing a listing of BSUMs for each file stored in the package)
... the repo server could check for new packages, unpack each, compute the BSUM for the individual files, and place them into a .zip comment.
> whether it takes 5s or 20s to install a package probably does not make a difference to anybody
For 32 packages during SvarDOS installation it's 10 minutes just for validation.
> and it is nice to have something standard (zip/crc32) to rely on.
Indeed.
Putting metadata in a ZIP comment is close to abusing the format. I'm not sure where exactly fair use would morph into Mordor's darkness, but this led to another thought:

The ZIP archive format is nice because it's an industry standard. But it's also a very messy format with lots of historical layers and possible extensions. Processing it takes memory and CPU, and when parsing a ZIP file pkg has to take lots of precautions not to step on a landmine.

What about using a specialized, lightweight format for packages? Something that would do only one thing, and do it in a way that is as simple as possible. The packages could still be created as ZIP files by the packager, but then they would have to be converted by some zip2svp / svp2zip tool so pkg could understand them. Such a zip2svp tool could also verify the package structure, so we would be sure that a svp package conforms 100% to what pkg expects.

The checksum of files in such a specialized archive could very well be BSUM then. Not sure about the compression algorithm. Deflate is not bad, but maybe there are some modern algorithms out there that are more lightweight.

The benefits of such a change would be:
- much better control over what is inside svp packages
- a smaller and much faster pkg.exe

Mateusz
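To make that a bit more concrete, here is a purely hypothetical sketch in C of the per-file metadata such a minimal format would need; nothing like this exists yet, and the field names and widths are illustrative only, not an agreed-upon design:

  #include <stdint.h>

  /* Hypothetical per-file record of a minimal "svp" archive. A real
     tool would read and write the fields one by one rather than rely
     on in-memory struct layout. */
  struct svp_file_entry {
    uint32_t checksum;     /* e.g. a BSUM of the uncompressed file */
    uint32_t packed_size;  /* bytes of compressed data following the path */
    uint32_t orig_size;    /* size after decompression */
    uint8_t  path_len;     /* length of the path stored right after this record */
    /* then: path_len bytes of path, then packed_size bytes of data */
  };

A zip2svp converter could walk the original ZIP, re-checksum each file, and emit one such record per file, which would also give it a natural place to reject anything pkg does not expect.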
Is this "worth the effort and extra complication"? Remember, spending time on SvarDOS is limited. But if you're keen on trying just for fun, just go ahead! :-)
> Is this "worth the effort and extra complication"?
That's a good question! Making such a change just for the sake of a different approach is definitely a waste of time, but if it could *significantly* speed up the process of installing packages on old PCs, then it might be quite a win.

Today, the installation of SvarDOS on an 8088 @ 4 MHz PC takes some 40 minutes:
- about 5 minutes to boot the floppy and partition+format the disk
- 7 minutes to copy the packages to disk
- and then about 25 minutes to inflate and CRC them

That's A LOT of time. Not sure how much could be saved, but it's at least worth doing some preliminary benchmarks. A simple first test will be to disable CRC32 in PKG and see what the speed gain is.

As for the compression algorithms, I have not found any strong candidate to replace deflate so far. There are options that are much faster, but apparently all of them have a poorer compression ratio, and I really don't want to make packages bigger than they already are.

https://github.com/facebook/zstd
https://github.com/atomicobject/heatshrink
https://github.com/lz4/lz4

Mateusz
Robert pointed me to this thread, so here I am.

I would suggest evaluating the lzsa2 format, which in my experience is both faster to depack and compresses to a smaller image than LZ4. However, my LZ4 depacker is likely not as fast as it could be: https://github.com/foone/VGAPride/issues/4

LZSA2 is from https://github.com/emmanuel-marty/lzsa and I use it in https://hg.pushbx.org/ecm/ldebug/file/156a4890666e/source/mak.sh#l571 as in:

  lzsa -c -f2 --prefer-ratio -v infile.big outfile.sa2

Here are the decompression speed test results on the current lDebug tip revision. Both are created by running the command:

  INICOMP_SPEED_TEST=128 INICOMP_METHOD="lzsa2 lz4" use_build_compress_only=1 use_build_decomp_test=1 ./mak.sh

On the server, running dosemu2 on an amd64 Debian without access to KVM:

  ldebug/source$ LC_ALL=C sort ../tmp/debug.siz
   86528 bytes ( 65.75%), method lzsa2
   97792 bytes ( 74.31%), method lz4
  131584 bytes (100.00%), method none
  ldebug/source$ LC_ALL=C sort ../tmp/debug.spd
   5.84s for 128 runs ( 45ms / run), method lzsa2
  11.67s for 128 runs ( 91ms / run), method lz4
  ldebug/source$

On the desktop, running dosemu2 on an amd64 Debian with access to KVM:

  ldebug/source$ LC_ALL=C sort ../tmp/debug.siz
   86528 bytes ( 65.75%), method lzsa2
   97792 bytes ( 74.31%), method lz4
  131584 bytes (100.00%), method none
  ldebug/source$ LC_ALL=C sort ../tmp/debug.spd
  1.32s for 128 runs ( 10ms / run), method lzsa2
  1.94s for 128 runs ( 15ms / run), method lz4
  ldebug/source$

All involved files are copied to https://pushbx.org/ecm/test/20240203/ - The t*.com files are the test executables that are run to test the decompression size needed (run as "tdebug.com") and speed (run as "tdebug.com b 128"). The ldebugu.com file is the final uncompressed executable. The other *.com files are the final compressed executables. The debug.big file is the debugger image that is compressed or used as an uncompressed lDOS payload stage image. The debug.siz and debug.spd files are the results files as quoted above.

As for the others: zstd is too complex for me to implement a depacker for, and I suspect it would result in both ratios and depack times close to LZMA-302eos (formerly called LZMA-lzip). For reference, the LZMA-302eos depacker took "Nearly 3 minutes per run" on an HP 95LX with a NEC V20 in March 2023: https://pushbx.org/ecm/dokuwiki/blog/pushbx/2023/0321_cpu_performance_comparison (LZMA-302eos does offer the highest compression, though.)

I already use heatshrink as well. In fact, unlike LZ4 and LZSA2, I have written three different heatshrink depackers. I described them in some detail in https://github.com/atomicobject/heatshrink/issues/82 As an overview, the three depackers are:

* lDOS inicomp (like my LZ4 and LZSA2 depackers): full support, 8086 segmented memory, input and output sizes >= 64 KiB allowed, needs a full buffer for the entire depacked image + some grace area
* lDebug help: sizes must be < 64 KiB, needs full buffers
* lDebug extpak: streaming, needs a circular buffer of the window size (4 KiB "-w 12" chosen for lDebug), sizes may be >= 64 KiB, can stream one file to another without full buffers

Heatshrink as yet has a bug with the 32 KiB window size "-w 15", so the highest usable window size is 16 KiB "-w 14". Refer to https://github.com/atomicobject/heatshrink/issues/55

In any case, you can benefit from your own pack format if you compress solidly, ie compress the archive in one single block rather than compressing each file individually (as the standard .zip format does). When compressing solidly, it may help to order the files in the archive (eg .tar format) so that similar files are placed next to each other in the image to compress.
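For illustration, grouping similar files before solid compression can be as simple as sorting the name list by extension first. A small C sketch, not taken from any existing SvarDOS or lDebug tool:

  #include <stdlib.h>
  #include <string.h>

  /* Sort file names by (extension, name) so similar files end up next
     to each other in the solid stream. Illustrative sketch only. */
  static const char *ext_of(const char *name)
  {
    const char *dot = strrchr(name, '.');
    return dot ? dot + 1 : "";
  }

  static int by_ext_then_name(const void *a, const void *b)
  {
    const char *na = *(const char * const *)a;
    const char *nb = *(const char * const *)b;
    int c = strcmp(ext_of(na), ext_of(nb));
    return c ? c : strcmp(na, nb);
  }

  /* usage: qsort(names, count, sizeof(char *), by_ext_then_name); */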
Hi ECM, thanks for stopping by! Your visits are always very much appreciated.
> I would suggest evaluating the lzsa2 format
I did not know about this algorithm, thank you for pointing at it. The graph that is shown on the lzsa2 github page suggests that while it is significantly faster than deflate, it is still less size-efficient. Is that consistent with your experience? https://github.com/emmanuel-marty/lzsa/raw/master/pareto_graph.png
> you can benefit from your own pack format if you compress solidly
Totally, yes. This is also something on my TODO list. But I do not think the gain will be tremendous, because solid archives tend to be very good when associated with a huge compression window (which easily spans multiple files). In our case we have to work with a constrained environment (8088/256K RAM) where even a 32K window is challenging, hence the advantage of "solid" vs "zip-like" might not be so great and I expect it will come mostly from the fact that a common dictionary is used for multiple files. In any case, I will definitely have to do some real-world benchmarks to have hard facts/numbers at hand.

Mateusz
> I did not know about this algorithm, thank you for pointing at it. The graph that is shown on the lzsa2 github page suggests that while it is significantly faster than deflate, it is still less size-efficient. Is that consistent with your experience?
I have never implemented a deflate depacker so I do not know. You will have to run tests yourself.
> even a 32K window is challenging,
As I mentioned, heatshrink's window size caps out at 16 KiB currently ;P
> and I expect it will come mostly from the fact that a common dictionary is used for multiple files.
Heatshrink, LZ4, and LZSA2 all, I believe, compress only using backreferences into the window, so there is no separate dictionary for any of them.
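In other words, the "dictionary" is just the bytes already produced; the heart of every such depacker is a copy loop along these lines (a generic LZ77-style sketch in C, not the actual heatshrink, LZ4 or LZSA2 code):

  #include <stddef.h>

  /* Copy a match of 'length' bytes from 'distance' bytes back in the
     output already written. Overlapping copies (distance < length) are
     legal and give cheap run-length behaviour. Illustrative only. */
  static void copy_match(unsigned char *out, size_t *pos,
                         size_t distance, size_t length)
  {
    while (length--) {
      out[*pos] = out[*pos - distance];
      (*pos)++;
    }
  }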
If you do use heatshrink, you could try my ldebug/source/eld/depack.asm for partial depacking: https://hg.pushbx.org/ecm/ldebug/file/156a4890666e/source/eld/depack.asm

You have to allocate and keep around the circular window buffer, and some stack to set aside for the depacker, as well as a bunch of variables. I'm using 256 Bytes for the stack: https://hg.pushbx.org/ecm/ldebug/file/156a4890666e/source/eld/extlib.asm#l1022

Then you pass to read_and_depack: es:dx -> the buffer, cx = length of the buffer, dword [depackskip] = where to start reading, in terms of an offset into the data of the depacked image. Return is CY if there was an error reading or depacking, else NC with ax = how many bytes were read.

The trick is that if you've read some data before, and you call read_and_depack again but with dword [depackskip] above the last byte read, then the depacker will not start over but rather continue depacking from its current position. If you lay out your file format so as to allow a "linear" read, you can pack things solidly, depack bit by bit (no large buffers needed), and still only depack the entire image at most once.

It isn't impossible to create the same kind of partial depacker for another format, of course.
