ETHFLOP v0.6.1

author: ecm
address: 185.128.71.183
date: 03.09.2024, 18:01 UTC

I looked at it a bit.

In FINDPKTDRVR https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l710 you are comparing words from the interrupt vectors. If you're unlucky this can cause a GPF if your a16 address is exactly 0FFFFh so that the word access's high byte exceeds the 64 KiB segment limit that is usually effective for Real or Virtual 86 Mode.

The filler part at the end https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l1136 can be made conditional using an %if that grabs the current size of the code assembled so far. (In other words, check that the times amount is not negative.) So if this was ever needed it would automatically be enabled.

In the placeholder handler at https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l587 you put the size of the or instruction on the immediate rather than the memory operand. It is more idiomatic to put it on the memory operand. Besides, as the immediate fits in a byte in this case you can optimise this to just use a byte operand.

You aren't freeing the process handles in your PSP upon TSR terminate (21.31). That means if a user runs this with "> NUL" your program will leak an SFT entry.

You do free the environment before TSR terminate https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l697 but you don't zero the word in the PSP. I think it is better to do so.

Your find TSR routine is very barebones: https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l739 It appears that your program must be the topmost int 13h handler. Besides, you change si and di without noting that you do in the comment. And the repeated cld is not needed. And the clc is not needed either, if rep* cmpsb returns ZR then it also has set NC already. Only the stc upon NZ is needed.

Besides, I have some more ideas for what you could add as options:

* Install a DOS block device instead of taking over an existing drive on int 13h. This would most readily be a new drive of its own with a previously unused drive letter. The easiest way to do that would be to load as a DOS device driver (in f/d/config.sys or using devload). You could make a dual-mode executable that can run either as an application or as a device, to support both modes.
* Modify the UPB / UDSC / DDT entry of the drive to take over, which would allow to use drive B: using DOS's internal block device. This may need some knowledge of DOS/BIO internals, which can differ between kernels.
* Load as a pre-DOS driver (or rootkit, basically). You can relocate the EBDA (if needed) and install your driver at the top of the Low Memory Area, then keep it resident by modifying the "amount KiB of low memory" in word [0:413h] (also returned by int 12h). At that point you can hook int 13h and direct requests for a specific unit to your control flow. Then you can run a DOS boot sector loader (or bring your own boot). Problem: The packet driver most likely needs a running DOS to initialise and install itself, so you would have to activate the resident program later from the DOS command line by running the transient program (as device or application).
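
The memory reservation step of the pre-DOS option could look roughly like the following. This is only a minimal sketch and not code from ethflop: RESIDENT_KIB and the label name are made up for illustration, and it assumes that no EBDA sits at the top of the Low Memory Area (otherwise the EBDA has to be relocated first).

```nasm
        cpu 8086
        bits 16

RESIDENT_KIB equ 4                      ; hypothetical size of the resident part, in KiB

; Out:  es:0 -> reserved block at the top of the Low Memory Area
; Chg:  ax, cl, ds (ds is left at 0)
reserve_top_of_lma:
        xor ax, ax
        mov ds, ax                      ; ds = 0 to reach the BIOS data area
        sub word [413h], RESIDENT_KIB   ; shrink the "amount KiB of low memory"
        mov ax, word [413h]             ; ax = new top of low memory, in KiB
        mov cl, 6
        shl ax, cl                      ; KiB * 64 = segment (64 paragraphs per KiB)
        mov es, ax                      ; es:0 -> block that DOS will never allocate
        retn
```

Anything that later calls int 12h, including the DOS kernel being booted, then sees the reduced size and stays below the reserved block.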

author: Mateusz V.
address: 176.157.255.77
date: 03.09.2024, 20:48 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> I looked at it a bit.

Quite an understatement. :-) Thank you for having looked at this! It's a very impressive overview. I do not understand it all, hence a few follow up questions, if you don't mind.

> In FINDPKTDRVR https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l710 you are comparing words from the interrupt vectors. If you're unlucky this can cause a GPF if your a16 address is exactly 0FFFFh so that the word access's high byte exceeds the 64 KiB segment limit that is usually effective for Real or Virtual 86 Mode.

I never run protected mode so indeed I did not think of such "wild" access to be a risk. To avoid this, I see two options: either normalizing the far pointer (check if offset > 16, if yes then do "offset -= 16" and "seg -= 1"), or simply check that offset is less than 0xFFF5 before doing the actual check. I doubt that a packet driver would register such a high offset, so the second option is probably good enough. What is the "best practice" solution in such situations?

> The filler part at the end https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l1136 can be made conditional using an %if that grabs the current size of the code assembled so far. (In other words, check that the times amount is not negative.) So if this was ever needed it would automatically be enabled.

Good idea. I will have to dive into the nasm documentation.

> In the placeholder handler at https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l587 you put the size of the or instruction on the immediate rather than the memory operand. It is more idiomatic to put it on the memory operand. Besides, as the immediate fits in a byte in this case you can optimise this to just use a byte operand.

Ha, I have been bitten by this recently here: http://svn.svardos.org/diff.php?repname=SvarDOS&path=%2Fsvarcom%2Ftrunk%2Fcommand.c&rev=1906&peg=1906 Defining the size on the immediate seemed more natural to me. This works on NASM, but not on WASM (wasm was silently ignoring my "word" directive). But back to ETHFLOP and NASM: "or [bp+6], word 1" is assembled as 83 4E 06 01, so it's already a byte! If I change it to "or [bp+6], byte 1", the encoding is exactly the same. A bug? No, it appears to be a feature because if I change it to "or [bp+6], word 0x101" then the encoding becomes 81 4E 06 01 01. So NASM is smart enough to figure out that OR-ing with 0x0001 is the same as OR-ing with 0x01 and uses the shorter variant. How cool is that? The funny thing is that if I do "or [bp+6], 1" then NASM complains that I need to provide a size (only to disregard it). :-D Apparently NASM does the same optimization on ANDs: "and [bp+6], word 0xFFFE" is encoded as 83 66 06 FE. But in any case you are right that it is better to write code that follows conventions. So I changed it to "or byte [bp+6], 1". It does not change the emitted code, but at least it's clearer to other humans.

> You aren't freeing the process handles in your PSP upon TSR terminate (21.31). That means if a user runs this with "> NUL" your program will leak an SFT entry.

I thought about this when you mentioned this "leaking SFT entry" thing at the occasion of your improved EIDL build, but I am unsure what to do. Am I supposed to call INT 0x21,AH=0x3E (close file handle) on all standard handles before TSR-ing? (stdin, stdout, stderr, stdaux, prn). And why would it be a problem for ">NUL" only?

> You do free the environment before TSR terminate https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l697 but you don't zero the word in the PSP. I think it is better to do so.

The TSR never tries to access its environment, but maybe there are some diagnostic tools that could be tempted to explore the environment of loaded programs. It does not harm to zero the env - done now.

> Your find TSR routine is very barebones: https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l739 It appears that your program must be the topmost int 13h handler.

That's true, yes. If another player hooks up 13h then all hope is lost. I find this reasonable enough for most situations. At least for me. :) I was also pondering about adding some non-standard int 13h call with a special signature so the ethflop TSR could catch it and advertise itself, but I was afraid that cluttering the int 13h API would eventually lead to some disaster should another TSR catch my signature or if the BIOS reacts to it in a weird way.

> Besides, you change si and di without noting that you do in the comment.

Comment extended.

> And the repeated cld is not needed.

Where is it repeated?

> And the clc is not needed either, if rep* cmpsb returns ZR then it also has set NC already.

Could be, but I am not fluent enough to see this right away, so I prefer to have it explicit.

> Besides, I have some more ideas for what you could add as options:

> * Install a DOS block device instead of taking over an existing drive on int 13h. This would most readily be a new drive of its own with a previously unused drive letter. The easiest way to do that would be to load as a DOS device driver (in f/d/config.sys or using devload). You could make a dual-mode executable that can run either as an application or as a device, to support both modes.

I thought about it in 2019... But I never created a device driver, while taking over an INT is easy - so I went the easy way. :-P

> * Modify the UPB / UDSC / DDT entry of the drive to take over, which would allow to use drive B: using DOS's internal block device. This may need some knowledge of DOS/BIO internals, which can differ between kernels.

Also something I (loosely) researched. And dropped the idea when I noticed that it implies meddling with internal DOS structures. It's something that I do in EtherDFS, but it became such a beast that I am afraid to look at it now. So I try to keep ethflop as simple, generic and "universal" as possible - accepting some limitations here and there.

> * Load as a pre-DOS driver (or rootkit, basically). You can relocate the EBDA (if needed) and install your driver at the top of the Low Memory Area, then keep it resident by modifying the "amount KiB of low memory" in word [0:413h] (also returned by int 12h). At that point you can hook int 13h and direct requests for a specific unit to your control flow. Then you can run a DOS boot sector loader (or bring your own boot). Problem: The packet driver most likely needs a running DOS to initialise and install itself, so you would have to activate the resident program later from the DOS command line by running the transient program (as device or application).

At some point I was thinking of putting ethflop into the programmable EPROM of a network card so it could extend the BIOS at boot. If I understand correctly that's more or less what you describe. But then I realized that I would have to drive the eth card myself since I would not have access to a packet driver, and that's not an adventure I am planning to embark on. Having ethflop be a part of the BIOS but being able to activate it only once DOS (and a packet driver) is loaded does not seem to provide any added value.

author: ecm
address: 176.1.216.244
date: 04.09.2024, 08:08 UTC

> > I looked at it a bit.

> Quite an understatement. :-) Thank you for having looked at this! It's a very impressive overview. I do not understand it all, hence a few follow up questions, if you don't mind.

Sure! Glad to help.

> > In FINDPKTDRVR https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l710 you are comparing words from the interrupt vectors. If you're unlucky this can cause a GPF if your a16 address is exactly 0FFFFh so that the word access's high byte exceeds the 64 KiB segment limit that is usually effective for Real or Virtual 86 Mode.

> I never run protected mode so indeed I did not think of such "wild" access to be a risk.

It is a little known fact but on most 386+ machines a word access to 0FFFFh can tear like this. On 8086, 186, and 286 it may fault, or access the byte at offset 0FFFFh and the byte at offset 0000h, or the second byte may be accessed from one higher than can be addressed using the segment.

> To avoid this, I see two options: either normalizing the far pointer (check if offset > 16, if yes then do "offset -= 16" and "seg -= 1"),

Think you meant seg += 1. Yes, this would work too.

> or simply check that offset is less than 0xFFF5 before doing the actual check. I doubt that a packet driver would register such a high offset, so the second option is probably good enough. What is the "best practice" solution in such situations?

I usually go with checking for a high offset like you suggested, eg in my most recent EDR-DOS changeset to check for an IISP "KB" signature: https://hg.pushbx.org/ecm/edrdos/rev/1e453d972df2#l2.18 Another approach is to use repe cmpsb or another way to check byte for byte rather than word-wise, which also eliminates the possibility of tearing, eg https://hg.pushbx.org/ecm/ldosboot/file/439448ca4188/boot.asm#l1591
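
The high-offset check could be sketched as follows. This is an illustration and not the actual ethflop or EDR-DOS code: the register assignment and label names are assumptions, and it relies on the packet driver convention that the "PKT DRVR" signature sits 3 bytes after the handler's entry point.

```nasm
        cpu 8086
        bits 16

; In:   es:bx -> entry point of a candidate interrupt handler
; Out:  ZR if "PKT DRVR" is found at es:bx+3, else NZ
check_pktdrvr_sig:
        cmp bx, 0FFF5h                  ; bx+3 .. bx+10 must all stay below 0FFFFh
        jae .reject                     ; too high --> do not touch this vector
        cmp word [es:bx + 3], "PK"      ; NASM stores 'P' in the low byte
        jne .done
        cmp word [es:bx + 5], "T "
        jne .done
        cmp word [es:bx + 7], "DR"
        jne .done
        cmp word [es:bx + 9], "VR"
.done:
        retn
.reject:
        or bx, bx                       ; bx >= 0FFF5h is nonzero --> NZ
        retn
```

With bx at most 0FFF4h the last word compared starts at offset 0FFFDh, so no word access can begin at 0FFFFh.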

> > In the placeholder handler at https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l587 you put the size of the or instruction on the immediate rather than the memory operand. It is more idiomatic to put it on the memory operand. Besides, as the immediate fits in a byte in this case you can optimise this to just use a byte operand.

> Ha, I have been bitten by this recently here: http://svn.svardos.org/diff.php?repname=SvarDOS&path=%2Fsvarcom%2Ftrunk%2Fcommand.c&rev=1906&peg=1906

> Defining the size on the immediate seemed more natural to me. This works on NASM, but not on WASM (wasm was silently ignoring my "word" directive). But back to ETHFLOP and NASM:

> "or [bp+6], word 1" is assembled as 83 4E 06 01, so it's already a byte! If I change it to "or [bp+6], byte 1", the encoding is exactly the same. A bug? No, it appears to be a feature because if I change it to "or [bp+6], word 0x101" then the encoding becomes 81 4E 06 01 01. So NASM is smart enough to figure out that OR-ing with 0x0001 is the same as OR-ing with 0x01 and uses the shorter variant. How cool is that? The funny thing is that if I do "or [bp+6], 1" then NASM complains that I need to provide a size (only to disregard it). :-D

> Apparently NASM does the same optimization on ANDs: "and [bp+6], word 0xFFFE" is encoded as 83 66 06 FE.

> But in any case you are right that it is better to write code that follows conventions. So I changed it to "or byte [bp+6], 1". It does not change the emitted code, but at least it's clearer to other humans.

Look again. A word vs byte destination operand size for "or" is NOT changed by nasm automatically, I knew that immediately because the result (Zero Flag) could differ between the two so the assembler mustn't substitute byte size for word size. However, there is an encoding of word destination with an imms8 (sign-extended 8-bit immediate) source operand. Observe:

    test$ cat test.asm
    or word [bp + 6], 1
    or byte [bp + 6], 1
    or [bp + 6], word 1
    or [bp + 6], byte 1
    test$ nasm test.asm -l /dev/stderr
         1 00000000 834E0601                or word [bp + 6], 1
         2 00000004 804E0601                or byte [bp + 6], 1
         3 00000008 834E0601                or [bp + 6], word 1
         4 0000000C 804E0601                or [bp + 6], byte 1
    test$

The 83... opcode is r/m16, imms8 whereas the 80... opcode is r/m8, imm8. It is true that neither is shorter in this case so my optimisation suggestion here was wrong. The same is true of "and".

> > You aren't freeing the process handles in your PSP upon TSR terminate (21.31). That means if a user runs this with "> NUL" your program will leak an SFT entry.

> I thought about this when you mentioned this "leaking SFT entry" thing at the occasion of your improved EIDL build, but I am unsure what to do. Am I supposed to call INT 0x21,AH=0x3E (close file handle) on all standard handles before TSR-ing? (stdin, stdout, stderr, stdaux, prn).

Yes, though I hardened this to simply closing all PHT entries, see https://hg.pushbx.org/ecm/fdapm/file/350b11660733/source/fdapm/fdapm.asm#l146

> And why would it be a problem for ">NUL" only?

The CON, AUX, and PRN handles for the five std handles are typically DUPlicated from the shell's, so the leak in that case is just an increased use count of these "forever" handles. If you redirect to a file or NUL however, you get a new handle that is only used by the program being invoked. You can test this if you look at the SFTs.

> > You do free the environment before TSR terminate https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l697 but you don't zero the word in the PSP. I think it is better to do so.

> The TSR never tries to access its environment, but maybe there are some diagnostic tools that could be tempted to explore the environment of loaded programs. It does not harm to zero the env - done now.

Yes, exactly.

> > Your find TSR routine is very barebones: https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l739 It appears that your program must be the topmost int 13h handler.

> That's true, yes. If another player hooks up 13h then all hope is lost. I find this reasonable enough for most situations. At least for me. :) I was also pondering about adding some non-standard int 13h call with a special signature so the ethflop TSR could catch it and advertise itself, but I was afraid that cluttering the int 13h API would eventually lead to some disaster should another TSR catch my signature or if the BIOS reacts to it in a weird way.

I agree that random signatures are not a good look. However, you could implement an AMIS multiplexer to find your resident instance.

> > And the repeated cld is not needed.

> Where is it repeated?

Early in your transient program and immediately before the repe cmpsb. You don't even need any of these actually because DOS will always set up the UP state initially. But if you do wish to keep it, keeping a single cld at the start of the program will do nicely.
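
Combined with the clc remark from the first post, the comparison could shrink to something like the sketch below. The label and the register interface are assumptions for illustration, not the actual ethflop routine, and it depends on cx being nonzero: with cx = 0 the repe cmpsb compares nothing and leaves the flags as they were.

```nasm
        cpu 8086
        bits 16

; In:   ds:si -> expected signature
;       es:di -> candidate
;       cx = length in bytes, must be nonzero
; Out:  NC if all bytes are equal, CY otherwise
; Chg:  si, di, cx
; Assumes the Direction Flag is UP, as set up by DOS at program start.
compare_sig:
        repe cmpsb              ; ZR here means the last compared bytes were equal,
                                ;  and comparing equal bytes also gives NC
        je .ret                 ; equal --> return with the NC left by cmpsb
        stc                     ; different --> CY
.ret:
        retn
```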

> > And the clc is not needed either, if rep* cmpsb returns ZR then it also has set NC already.

> Could be, but I am not fluent enough to see this right away, so I prefer to have it explicit.

That's why I pointed it out. I do usually put a comment to this effect if I depend on such "incidental" flag status because it isn't always obvious to me either.

> > Besides, I have some more ideas for what you could add as options:

> > * Install a DOS block device instead of taking over an existing drive on int 13h. This would most readily be a new drive of its own with a previously unused drive letter. The easiest way to do that would be to load as a DOS device driver (in f/d/config.sys or using devload). You could make a dual-mode executable that can run either as an application or as a device, to support both modes.

> I thought about it in 2019... But I never created a device driver, while taking over an INT is easy - so I went the easy way. :-P

Fair.

> > * Modify the UPB / UDSC / DDT entry of the drive to take over, which would allow to use drive B: using DOS's internal block device. This may need some knowledge of DOS/BIO internals, which can differ between kernels.

> Also something I (loosely) researched. And dropped the idea when I noticed that it implies meddling with internal DOS structures. It's something that I do in EtherDFS, but it became such a beast that I am afraid to look at it now. So I try to keep ethflop as simple, generic and "universal" as possible - accepting some limitations here and there.

Ok.

> > * Load as a pre-DOS driver (or rootkit, basically). You can relocate the EBDA (if needed) and install your driver at the top of the Low Memory Area, then keep it resident by modifying the "amount KiB of low memory" in word [0:413h] (also returned by int 12h). At that point you can hook int 13h and direct requests for a specific unit to your control flow. Then you can run a DOS boot sector loader (or bring your own boot). Problem: The packet driver most likely needs a running DOS to initialise and install itself, so you would have to activate the resident program later from the DOS command line by running the transient program (as device or application).

> At some point I was thinking of putting ethflop into the programmable EPROM of a network card so it could extend the BIOS at boot. If I understand correctly that's more or less what you describe.

Yesno. I suggest to load it like a kernel then chainload another (actual) DOS kernel. This is much simpler than programming a ROM.

> But then I realized that I would have to drive the eth card myself since I would not have access to a packet driver, and that's not an adventure I am planning to embark on. Having ethflop be a part of the BIOS but being able to activate it only once DOS (and a packet driver) is loaded does not seem to provide any added value.

Actually you can use drive B: then if you tell the DOS that there are two diskette drives in the machine, which as you noted doesn't work yet when DOS sets up its UPBs for a single-drive system.

author: Mateusz V.
address: 176.157.255.77
date: 04.09.2024, 09:28 UTC

> Look again. A word vs byte destination operand size for "or" is NOT changed by nasm automatically, I knew that immediately because the result (Zero Flag) could differ between the two so the assembler mustn't substitute byte size for word size.

It is obvious now that you said it. I re-tested and got the same results as you. Yesterday I was thrown off by the fact that both word- and byte- encodings were the same size so I did not notice the prefix is slightly different for both cases. So apparently there is a special x86 encoding for "OR this with a word but I give this word in 8 bits because it is small enough".

> Yes, though I hardened this to simply closing all PHT entries

Very nice! I will do the same, then. Thanks for the tip!

> I suggest to load it like a kernel then chainload another (actual) DOS kernel.

This idea has an extremely high coolness factor, but a debatable added value level. :) It would indeed provide a solution to the "B: is unusable because hooked by DOS" problem, but at the cost of putting ethflop in either the MBR or VBR/boot sector. Not very user friendly, plus it might be dangerous and/or frowned upon by some antivirus software. I think it is much easier and safer for users to configure their CMOS BIOS so it thinks it has a real B: drive (of course BIOS should be instructed not to try booting from it and not to try any "seek test" at power up time). Alternatively, it should be possible to use some CONFIG.SYS driver that "reserves" B: without doing anything else, just to avoid DOS stealing it (surely such thing exists, maybe I could even ship it with ethflop).

author: ecm
address: 176.1.216.244
date: 04.09.2024, 10:28 UTC

> So apparently there is a special x86 encoding for "OR this with a word but I give this word in 8 bits because it is small enough".

Indeed.

> This idea has an extremely high coolness factor, but a debatable added value level. :) It would indeed provide a solution to the "B: is unusable because hooked by DOS" problem, but at the cost of putting ethflop in either the MBR or VBR/boot sector. Not very user friendly, plus it might be dangerous and/or frowned upon by some antivirus software.

You can wrap the program in an lDOS iniload stage. Then you'd just copy your original kernel file (eg kernel.sys) to a different name and overwrite the DOS kernel with the ethflop "kernel". Or install a new boot sector loader with a different name, eg ethflop.com. If you can set up a different kernel to boot then you would be able to set up booting into the bootloadable ethflop program.

> I think it is much easier and safer for users to configure their CMOS BIOS so it thinks it has a real B: drive (of course BIOS should be instructed not to try booting from it and not to try any "seek test" at power up time).

May not be possible for everyone. And where's the fun in that? =P

> Alternatively, it should be possible to use some CONFIG.SYS driver that "reserves" B: without doing anything else, just to avoid DOS stealing it (surely such thing exists, maybe I could even ship it with ethflop).

No, at the point that f/d/config.sys is processed the DOS already has set up its UPBs from the number of diskette drives it had detected, so the DJ mechanism is already initialised by the time your device driver loads.

author: Mateusz V.
address: 176.157.255.77
date: 04.09.2024, 19:16 UTC

I am not sure I understand your "closing all handles" idea. Here is the code you linked to:

            xor bx, bx              ; = 0
            mov cx, word [32h]      ; get amount of handles
    .loop:
            mov ah, 3Eh
            int 21h                 ; close it
            inc bx                  ; next handle
            loop .loop              ; loop for all process handles

So you basically read the amount of handles from the PSP and then you ask DOS to close handles from 0 to (amount), without looking at the JFT (PSP+18h) nor the JFT far pointer (PSP+34h). Is this really safe? I mean - are the handles guaranteed to be sequential? If it's the default handles stdin/stdout/stderr/aux/prn, then they are indeed sequential, but what if the code was executed with some non-standard (redirected) handles? Your code also assumes there is at least one handle defined - is this sure?

Wouldn't something like this be safer / more universal? (untested code)

            mov cx, word [0x32]     ; get amount of handles
            les dx, [0x34]          ; ES:DX points at the JFT now
            xor bh, bh
    NEXT:
            jzcx DONE               ; no more handles to process
            mov bl, [es:dx]         ; "close handle" expects the handle in BX but the JFT has 8bit entries
            mov ah, 0x3E
            int 0x21                ; close it
            inc dx                  ; point at the next handle in JFT
            dec cx
            jmp short NEXT          ; loop for all process handles
    DONE:

author: ecm
address: 185.128.71.183
date: 04.09.2024, 19:59 UTC

> I am not sure I understand your "closing all handles" idea. Here is the code you linked to:

> So you basically read the amount of handles from the PSP and then you ask DOS to close handles from 0 to (amount),

0 to amount minus 1 actually. The amount number itself is not a valid Process Handle so needs no closing.

> without looking at the JFT (PSP+18h) nor the JFT far pointer (PSP+34h).

Yes, DOS doesn't complain much if you try to close a Process Handle that is already closed (or was never opened) in the PHT. It does return an error from the int 21h call but we ignore that.

> Is this really safe? I mean - are the handles guaranteed to be sequential? If it's the default handles stdin/stdout/stderr/aux/prn, then they are indeed sequential, but what if the code was executed with some non-standard (redirected) handles?

Doesn't matter, even if any handles were left opened with gaps (ie not contiguous) then the gaps (not open handles) error out as before but the loop continues and eventually closes all still open handles.

> Your code also assumes there is at least one handle defined - is this sure?

DOS breaks usually if you set the PHT size to zero. In this case it would only break our code a little because it would simply loop for 64 Ki iterations in the loop.
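
If the zero-size case is a concern, a single jcxz in front of the loop covers it. The following is a sketch along the lines of the linked loop, not the fdapm source itself. The label name is made up, and it assumes ds still points at the PSP, as it does at the entry of a .COM program.

```nasm
        cpu 8086
        bits 16

; In:   ds = PSP segment
; Chg:  ax, bx, cx
close_all_handles:
        xor bx, bx              ; first process handle = 0
        mov cx, word [32h]      ; PHT size from the PSP
        jcxz .done              ; size zero --> nothing to close, skip the loop
.loop:
        mov ah, 3Eh
        int 21h                 ; close process handle bx, errors are ignored
        inc bx                  ; next process handle
        loop .loop              ; handles 0 to size minus 1
.done:
        retn
```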

> Wouldn't something like this be safer / more universal? (untested code)

> ; "close handle" expects the handle in BX but the JFT has 8bit entries

This is very wrong. The Process Handle Table is *indexed* by Process Handles. The DOS API expects a process handle, ie an index into the PHT, *not* the contents of that PHT entry. The contents are SFT indices. It is the DOS's job to index into the PHT and use its content as an SFT index, unless when it is closed and thus holds the byte value 255. Your code is very broken and wouldn't reliably work.

author: ecm
address: 185.128.71.183
date: 04.09.2024, 20:18 UTC

You noted in a revision message: https://sourceforge.net/p/ethflop/code/157/

> potentially triggerring a GPF if we are running under EMM386 or some other protected mode thing

This is inaccurate. On a 386 even in Real 86 Mode you may run into the problem of the exacting segment limit being 0FFFFh so that a word read at this offset will fault (or dword read at 0FFFDh+). Also, to be even more nitpicking, the problem isn't you would "read over 0xffff" generally - if the address overflows the 64 KiB boundary (eg you read at [bx + 4] where bx = 0FFFFh) in a16 addressing it will simply wrap around to the beginning of the segment. In 16-bit code, only the *word access* at precisely an effective address of 0FFFFh will cause a tear. So if bx + 4 = 0FFFFh you have a problem, if bx + 4 = (1)0000h or higher then the 17th offset bit is discarded and you read from the start of the segment.

author: ecm
address: 185.128.71.183
date: 04.09.2024, 20:25 UTC

In https://sourceforge.net/p/ethflop/code/156/ I noticed that you're putting the packet driver signature letters in the comments. You can just cmp [...], "PK" for example, no need to write the hexadecimal number. NASM knows to interpret the multi-byte string literals in memory order for two-byte literals used as word values.

author: Mateusz V.
address: 176.157.255.77
date: 04.09.2024, 20:29 UTC

> The DOS API expects a process handle, ie an index into the PHT, *not* the contents of that PHT entry.

Okay, that's the information I was missing. Thank you for clarifying. So all handle-related DOS calls are actually taking an index to the (calling process) JFT, and not really a "handle". Didn't know that. Your code makes perfect sense now.

About the ghost B: drive: it is not something I am going to solve, but I have at least added a check for this to forbid the user trying a setup that won't work: https://sourceforge.net/p/ethflop/code/161/

author: ecm
address: 185.128.71.183
date: 04.09.2024, 20:35 UTC

https://sourceforge.net/p/ethflop/code/161/tree/ethflop-client/trunk/ethflop.asm#l294 Here you mention that you created a thread on a mailing list in 2019 to modify the flags on stack. Your solution is good. However, I know a few alternatives. There is of course the bad old "retf 2" which doesn't restore DF, TF, IF. There is your way. You can push ax and lahf and then store ah to byte [bp + 6]. You can also do this: https://hg.pushbx.org/ecm/seekext/file/b8a84909ec0c/resident.asm#l281

    .iret_CF:
            push bp
            mov bp, sp
            rcr byte [bp + 6], 1    ; flip
            rol byte [bp + 6], 1    ; flop
            pop bp
            iret

The lahf method has the advantage that it will allow passing through your interrupt handler's Carry Flag *and* the Zero Flag. The flip flop method only passes the Carry Flag. Your method requires separate code for setting or clearing any flag.
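
For comparison, the lahf method described above might be written as below. This is a sketch with a made-up label, not code taken from ethflop or seekext. It copies the handler's whole low flags byte (SF, ZF, AF, PF, CF) into the caller's flags image and leaves the high byte with DF, IF and TF untouched.

```nasm
        cpu 8086
        bits 16

; Stack at handler entry, as left by the int instruction: ip, cs, flags.
iret_flags:
        push bp
        mov bp, sp              ; [bp + 2] = ip, [bp + 4] = cs, [bp + 6] = flags
        push ax                 ; push does not change the flags
        lahf                    ; ah = low flags byte of the handler
        mov byte [bp + 6], ah   ; overwrite the low byte of the caller's flags
        pop ax
        pop bp
        iret                    ; caller gets our CF and ZF, its own DF, IF, TF
```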

author: Mateusz V.
address: 176.157.255.77
date: 04.09.2024, 20:38 UTC

> > potentially triggerring a GPF if we are running under EMM386 or some other protected mode thing

> This is inaccurate.

Yes, yes - I know. But I understood that only after the commit (and after I read your second very kind explanation). I initially thought it would be a "segfault-like" reaction of EMM386 when process would try to read outside of its segment. Only later I understood the issue is about a single read operation being potentially spread ("teared") over a segment boundary.

> I noticed that you're putting the packet driver signature letters in the comments. You can just cmp [...], "PK" for example, no need to write the hexadecimal number.

Indeed, this seems to work - but I am afraid that I myself will be confused if I see such construct one year from now. :-P

author: ecm
address: 185.128.71.183
date: 04.09.2024, 20:41 UTC

> Okay, that's the information I was missing. Thank you for clarifying. So all handle-related DOS calls are actually taking an index to the (calling process) JFT, and not really a "handle". Didn't know that. Your code makes perfect sense now.

That's just what "handle" means to DOS though. This is also why I call them process handles. The process handle points into the PHT and that contains an SFT index. Otherwise in fact you could not use redirection for different processes, as there is only one global SFT index "1" but we want different processes to possibly have different process handles "1" for their stdout for example.
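
The indirection can be made visible by looking up a process handle by hand, roughly as in the sketch below. This is purely illustrative: the label is made up, ds is assumed to hold the PSP segment, and a program would normally leave this lookup to DOS.

```nasm
        cpu 8086
        bits 16

; In:   bx = process handle, ds = PSP segment
; Out:  NC, al = SFT index stored in the PHT entry (0FFh = closed)
;       CY if bx is not a valid index into the PHT
; Chg:  es, di
get_sft_index:
        cmp bx, word [32h]              ; PHT size from the PSP
        jae .invalid                    ; handle is not below the size
        les di, [34h]                   ; es:di -> PHT (far pointer in the PSP)
        mov al, byte [es:di + bx]       ; the entry holds an SFT index
        clc                             ; the cmp above left CY, so clear it
        retn
.invalid:
        stc
        retn
```

Two processes can both pass bx = 1 here and get different SFT indices back, which is what makes per-process redirection of stdout work.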

> About the ghost B: drive: it is not something I am going to solve, but I have at least added a check for this to forbid the user trying a setup that won't work: https://sourceforge.net/p/ethflop/code/161/

Good idea. Can you walk me through how to set up ethflop on a Debian server with dosemu2? I want the server and client part of ethflop to both run on the same Debian server. And I don't have root access to that machine. And it's not in a LAN. Is this supported? If you do help me achieve that I may look into adding the boot option myself.

author: ecm
address: 185.128.71.183
date: 04.09.2024, 20:46 UTC

I can run qemu on the server as well, if that makes it any easier.

author: Mateusz V.
address: 176.157.255.77
date: 04.09.2024, 20:53 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> Can you walk me through how to set up ethflop on a Debian server with dosemu2? I want the server and client part of ethflop to both run on the same Debian server.

With DOSEMU2 the only way I think would be to set DOSEMU2's network interface to one tap, set up a second instance of DOSEMU2 to use another tap, and then create an Ethernet bridge (brctl addbr) and add both taps to it. Slightly complex. And you need to be root to create the tap interfaces (with tunctl) and to operate the bridge (with brctl).

Can you use QEMU instead? With QEMU it's super simple, as QEMU supports a "socket" interface, no root needed and no third party pieces required. One QEMU instance listens on a TCP port, and the second QEMU instance connects to it. Then they send their Ethernet frames to each other inside this TCP connection. This is how to run it:

1st QEMU instance ("server", must be launched first):

qemu-system-i386 -m 1M -fda server.img -hda servhd.img -boot a -device pcnet,netdev=net0,mac=52:54:00:00:00:01 -netdev socket,id=net0,listen=127.0.0.1:1985

2nd QEMU instance ("client"):

qemu-system-i386 -m 1M -fda client.img -boot a -device pcnet,netdev=net0,mac=52:54:00:00:00:02 -netdev socket,id=net0,connect=127.0.0.1:1985

For this you do not need to be root, but you need QEMU to be installed, obviously. You can listen on something other than 127.0.0.1 if you like, which can be handy to connect two QEMU instances running on different machines. And you can use whatever TCP port you wish (assuming it's available, and 1024 or above for non-root users).

On the DOS VMs you need the PCNTPK.COM packet driver to use the PCNET (virtual) NIC.

author: ecm
address: 185.128.71.183
date: 04.09.2024, 20:57 UTC

Yes, I can use qemu! Where do I get a packet driver for it though?

author: ecm
address: 185.128.71.183
date: 04.09.2024, 21:00 UTC

Noticed your edit,

> On the DOS VMs you need the PCNTPK.COM packet driver to use the PCNET (virtual) NIC.

This? https://www.lazybrowndog.net/freedos/virtualbox/?page_id=321

author: Mateusz V.
address: 176.157.255.77
date: 04.09.2024, 21:09 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

The packet driver for PC-NET is in the SvarDOS repo: http://svardos.org/?p=repo AND it's by default included on the floppy of all 1.44M and 2.88M SvarDOS builds (not installed, though).

BTW this QEMU socket interface is obviously limited to two machines (because a TCP connection has two ends), but it would be fairly easy to extend this to more machines by creating a "software switch" that would listen on a tcp port, accept connections from QEMU instances and route frames to proper sockets. I briefly pondered creating such a switch, but since I do not actually need it I left it "for some future".

author: Mateusz V.
address: 176.157.255.77
date: 04.09.2024, 21:15 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

This forum has a limit of "daily posts per IP". It was set to 10. I have now increased it to 20.

author: Mateusz V.
address: 176.157.255.77
date: 04.09.2024, 21:29 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> you mention that you created a thread on a mailing list in 2019 to modify the flags on stack.

> Your solution is good. However, I know a few alternatives.

Thanks for the extra methods, that's very interesting. About the thread: it was not on a mailing list, but on a usenet group, alt.lang.asm. If you don't know it, I recommend you take a look. I'm pretty sure you will find it very interesting, it's all about x86 assembly on DOS. Many brilliant people, you'd fit perfectly. In a similar register there is also comp.lang.asm.x86. To access the usenet you'd need a newsgroup client (I use ClawsMail on Linux, but there is a lot of choice). You can take a peek through a web archive if you're curious: https://alt.lang.asm.narkive.com/ https://comp.lang.asm.x86.narkive.com/

author: Mateusz V.
address: 176.157.255.77
date: 05.09.2024, 08:35 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> BTW this QEMU socket interface is obviously limited to two machines (because a TCP connection has two ends), but it would be fairly easy to extend this to more machines by creating a "software switch" that would listen on a tcp port, accept connections from QEMU instances and route frames to proper sockets.

Couldn't resist and created it this morning: https://sourceforge.net/p/qemusockhub/code/HEAD/tree/

Now with this I can network together an ethflop-server with up to 16 ethflop-client VMs. All running with QEMU, with no network configuration needed.

=====================================
usage: qemusockhub tcp_port

Example:

# run qemusockhub
qemusockhub 1985

# run 1st VM
qemu-system-i386 -m 1M -fda client1.img -boot a -device pcnet,netdev=net0,mac=52:54:00:00:00:01 -netdev socket,id=net0,connect=127.0.0.1:1985

# run 2nd VM
qemu-system-i386 -m 1M -fda client2.img -boot a -device pcnet,netdev=net0,mac=52:54:00:00:00:02 -netdev socket,id=net0,connect=127.0.0.1:1985

# run 3rd VM
qemu-system-i386 -m 1M -fda client3.img -boot a -device pcnet,netdev=net0,mac=52:54:00:00:00:03 -netdev socket,id=net0,connect=127.0.0.1:1985

(etc)
=====================================

qemusockhub is very crude, it's more a quick hack than a real program, but it works very well for the usage it is made for. QEMU also offers other networking backends, so maybe there is a solution that would not need such a "hub" program, don't know, did not look further.

author: ecm
address: 89.246.111.84
date: 05.09.2024, 18:02 UTC

The DOS OpenWatcom makefile references pktdrv.c but this file doesn't seem to exist?

author: ecm
address: 89.246.111.84
date: 05.09.2024, 18:31 UTC

> The DOS OpenWatcom makefile references pktdrv.c but this file doesn't seem to exist?

When running wmake -f Makefile.dos there is an error message about not finding the file but compilation succeeds regardless. Running ethflopd (without parameters) in dosemu2 causes a crash. ethflopd /? works as expected.

author: ecm
address: 89.246.111.84
date: 05.09.2024, 18:34 UTC

How is mdrs2025.lib built? Are the sources not included in the ethflop repo?

author: ecm
address: 89.246.111.84
date: 05.09.2024, 18:36 UTC

Are there open source packet drivers for any NIC?

author: Mateusz V.
address: 176.157.255.77
date: 05.09.2024, 18:48 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> The DOS OpenWatcom makefile references pktdrv.c but this file doesn't seem to exist?

Indeed - I didn't even notice it. It's weird that WCL agreed to proceed nonetheless. Makefile fixed now.

> Running ethflopd (without parameters) in dosemu2 causes a crash. ethflopd /? works as expected.

I tested it now - ethflopd.exe loads as expected on my DOSEMU2 setup. Does it display anything on your screen before crashing?

> How is mdrs2025.lib built? Are the sources not included in the ethflop repo?

It is an external, stand-alone (and open-source) library: https://mdrlib.sourceforge.io/ (ver 2025 is the library's trunk)

> Are there open source packet drivers for any NIC?

see here, it's the golden reference in the packet driver world: http://crynwr.com/

author: ecm
address: 89.246.111.84
date: 05.09.2024, 18:51 UTC

When not loading a packet driver in qemu ethflopd crashes too:

A:\>ethflopd.exe
ETHFLOP server for DOS ver 20240904 starting...
Packet driver found at int 0x6F70
Invalid Opcode at 61BD FFFF 0002 0D09 1AF5 0664 0282 0D09 0D09 6F70 6F6C DCEF 00 08
A:\>

author: Mateusz V.
address: 176.157.255.77
date: 05.09.2024, 18:53 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

Ah, so you probably don't have $_pktdriver = (on) in your .dosemurc. I have to admit I never tried running it without a packet driver. Will investigate.

author: ecm
address: 89.246.111.84
date: 05.09.2024, 18:55 UTC

Client correctly determines "no packet driver found"

author: ecm
address: 89.246.111.84
date: 05.09.2024, 18:59 UTC

The pointer vs not pointer use of "pktint" in https://sourceforge.net/p/ethflop/code/HEAD/tree/ethflop-server/trunk/ui_dos.c#l48 is confusing to me.

author: ecm
address: 89.246.111.84
date: 05.09.2024, 19:03 UTC

http://crynwr.com/drivers/amdpd.zip seems to be under GNU GPL v1(-only?).

author: ecm
address: 89.246.111.84
date: 05.09.2024, 19:10 UTC

Running the DOS UI of ethflopd in qemu (with the network card setup and pcntpk int=0x60 loaded) doesn't seem to do anything when I press enter.

author: ecm
address: 89.246.111.84
date: 05.09.2024, 19:14 UTC

I managed to connect the client to the server. However, pressing Enter just toggles the "< NO FLOPPY >" text next to the client from normal to reverse video.

author: ecm
address: 89.246.111.84
date: 05.09.2024, 19:25 UTC

Can't seem to insert a disk created with ethflop n2880. And creating additional disks doesn't seem to fully work: https://pushbx.org/ecm/test/20240905/ethflop1.txt

ethflop has been installed
A:\>dir b:
Error reading from drive B: DOS area: seek error
(A)bort, (I)gnore, (R)etry, (F)ail?
File not found. - 'b:'
A:\>ethflop a
ERROR: ethflop is already installed
A:\>ethflop b
ERROR: ethflop is already installed
A:\>ethflop u
ethflop has been uninstalled
A:\>ethflop b
ethflop server: 52:54:00:00:00:01 virt. floppy in drive B: <NONE>
ethflop has been installed
A:\>ethflop n2880
ERROR: specified disk name is invalid
A:\>ethflop n2880 tesy
Disk TESY created (2880 KiB)
A:\>ethflop n2880 test
ERROR: disk TEST failed to be initialized
A:\>ethflop n2880 test1
ERROR: disk TEST1 failed to be initialized
A:\>

Ready, waiting for clients...
client connected: 52:54:00:00:00:02
ERROR: unknown disk format$
ERROR: no virtual floppy loaded$

I can press Enter on the server UI now but selecting any of the three images returns the "unknown disk format" error. They are shown like so:

Available floppy image files: TEST TEST1 TESY

This test was with an -fda -fdb qemu setup to allow booting off of fda, and executing ethflop.com from fda, but using fdb as the drive to connect to the server.

author: ecm
address: 89.246.111.84
date: 05.09.2024, 19:30 UTC

This is on a FreeDOS kernel by the way. May be related to the "unknown disk format"? By the way this error is displayed by "ethflop i test" on the client side as well. "ethflop s" says:

A:\>ethflop s
ethflop server: 52:54:00:00:00:01 virt. floppy in drive B: <NONE>

author: ecm
address: 89.246.111.84
date: 05.09.2024, 19:32 UTC

"ethflop n1440 test1440" then "ethflop i test1440" also displays the error.

author: ecm
address: 89.246.111.84
date: 05.09.2024, 19:36 UTC

Aha! I hadn't switched the current drive to my C: so ethflopd tried to write the images on A: which is of course too small to hold all these images. The first test (tesy) was written partially, all others were zero-byte files. Works better now, after "ethflop n2880 test" and "ethflop i test" it seems I have successfully created and inserted an image.

author: Mateusz V.
address: 176.157.255.77
date: 05.09.2024, 19:37 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> The pointer vs not pointer use of "pktint"

Yes, this was one of the things that I fixed now, but this one was only cosmetic. The more serious one was this: https://sourceforge.net/p/mdrlib/code/131/tree//trunk/pktdrv/pktdrv.c?diff=6555eb2a6b7142b49cabfdac:130

> I managed to connect the client to the server. However, pressing Enter just toggles the "< NO FLOPPY >" text next to the client from normal to reverse video.

That's probably because you don't have any floppy images available (yet). Otherwise the highlight would jump to the right side and allow you to select an image. I will see about making it clearer.

> ERROR: disk TEST failed to be initialized

This is the culprit I think. For some reason the server fails to inflate the image. The server should display some logs about this. Do you have enough disk space on the server? (at least 2.8M)

author: Cryptus
address: 87.174.183.178
date: 05.09.2024, 20:20 UTC

Hi Mateusz and ecm,

> Aha! I hadn't switched the current drive to my C: so ethflopd tried to write the images on A: which is of course too small to hold all these images.

I made the same mistake (even topped it by starting ethflopd from CDROM drive)... Stupid me! Maybe the DOS server could do a) some error checking that the current directory is writable and/or b) print a warning if the free space is below a certain (at least standard floppy) size? Great piece of software anyway! Frank

author: Mateusz V.
address: 176.157.255.77
date: 05.09.2024, 21:04 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> Maybe the DOS server could do a) some error checking that the current directory is writable and/or b) print a warning if the free space is below a certain (at least standard floppy) size?

Checking for disk space is not that easy in a portable way, but I think it could be enough if the server relayed to the client a clear error message, like "failed to init floppy image: out of disk space". The user would probably quickly understand what's going on then.

author: ecm
address: 176.1.221.253
date: 06.09.2024, 08:14 UTC

Correctness bug: https://sourceforge.net/p/ethflop/code/168/tree/ethflop-client/trunk/ethflop.asm#l210 cx can be zero even if NZ, if the last word didn't match. You should use "je", not "jcxz"

author: Mateusz V.
address: 176.157.255.77
date: 06.09.2024, 08:47 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> cx can be zero even if NZ, if the last word didn't match. You should use "je", not "jcxz"

You are right. I so much wanted to use the cool-sounding "jcxz" at least in one place somewhere... Fixed now. Thanks. https://sourceforge.net/p/ethflop/code/169/tree//ethflop-client/trunk/ethflop.asm?diff=5d8fae32e39601433ecbde88:168

author: ecm
address: 176.1.221.253
date: 06.09.2024, 09:55 UTC

Is it possible to run ethflopd on our Linux server, without root access, and have it connect to the same ethflop client qemu machine? I tried running it but it complains about a lock or not being root?

~/proj/ethflop-code/ethflop-server/trunk$ ./ethflopd -f lo ./images/
Error: failed to acquire a lock. Are you running as root? If so - then perhaps ethflopd is running already? If not, and you're really sure of that, then delete the lock file at '/var/run/ethflopd.run'.
eth_init() failed

The lock file doesn't exist. I'd rather use the Linux server because it builds with gcc whereas the DOS server is probably tied to OpenWatcom (inline assembly particularly).

author: ecm
address: 176.1.221.253
date: 06.09.2024, 10:11 UTC

I have several new problems:

When the client uninstalls with "ethflop u" it still seems to be displayed in the server's (DOS) UI.

When I create a test image using "ethflop b" then "ethflop n2880 test" then "ethflop i test" it works, but when I shut down both qemu machines and then restart the server and the client, then "ethflop b" and "ethflop i test", I get DOS critical errors on trying "dir b:". The server gets multiple "fread() failure: No such file or directory (TEST, sect 37)" messages then.

Afterwards, the FAT32 hdd image that I use seems to be reset?! No trace of the image file remains. This may be a qemu problem, I just haven't figured out who/what resets my hdd image yet.

author: ecm
address: 176.1.221.253
date: 06.09.2024, 10:14 UTC

Disregard the file corruption issue. My client VM scriptlet would reset the hdd image when I started it. This is why the server on restart finds the image file in C:\ but it is gone as soon as I start the client VM.

author: Mateusz V.
address: 176.157.255.77
date: 06.09.2024, 10:34 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> Is it possible to run ethflopd on our Linux server, without root access,

No, ethflopd operates on a raw socket. This requires root (or at least the CAP_NET_RAW capability, but I do not know how to grant that without being root).

> When the client uninstalls with "ethflop u" it still seems to be displayed in the server's (DOS) UI.

That's normal, the server remembers the client forever. Thanks to this, when the client connects back, it gets its floppy back (if a floppy was left in drive).

> when I shut down both qemu machines and then restart the server and the client, then "ethflop b" and "ethflop i test", I get DOS critical errors on trying "dir b:". The server gets multiple "fread() failure: No such file or directory (TEST, sect 37)" messages then.

Sounds like your filesystem is broken. You should not shut down your PC when ethflopd is active, as it keeps file descriptors open all the time. This may be the cause of your issue.

author: Mateusz V.
address: 176.157.255.77
date: 06.09.2024, 11:05 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> No, ethflopd operates on a raw socket. This requires root (...)

What you could do is run some minimalist Linux in QEMU. Compile ethflopd on the real server, and sync it (ssh/scp/rsync) to the QEMU-linux-ethflopd server. I also know Open Watcom runs on Linux - but I do not know if it supports cross-compilation (ie. compile a DOS program using the Linux version of Open Watcom). Might be worth trying.

author: ecm
address: 176.1.221.253
date: 06.09.2024, 11:15 UTC

> I also know Open Watcom runs on Linux - but I do not know if it supports cross-compilation (ie. compile a DOS program using the Linux version of Open Watcom). Might be worth trying.

It does support cross-compilation. This is how the "ow" build of the FreeDOS kernel is built, as opposed to the "owdos" build which runs the toolchain in DOS. I dealt with the latter in https://github.com/FDOS/kernel/discussions/188 This is not the problem with OpenWatcom. I can build with Makefile.dos with DOS OW running in dosemu2. I dislike OpenWatcom because their license is considered open source by OSI but not free software by the FSF or the DFSG. There is an effort to change this but it hasn't been completed yet: https://github.com/open-watcom/open-watcom-v2/discussions/271

author: Mateusz V.
address: 176.157.255.77
date: 06.09.2024, 11:31 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

Well, to each his own, I guess. Myself, I like OW very much, and if there is one thing I dislike it is the FSF's idea of "free software". :-) On another subject: have you tried qemusockhub by any chance? It allows to connect more than 2 QEMU VMs together over the "socket" network and makes things easier also with only 2 VMs because you do not have to worry about which VM to run first (and also networking is not disrupted if one of the VMs restarts while the other do not).

author: ecm
address: 176.1.8.172
date: 06.09.2024, 15:00 UTC

> Well, to each his own, I guess.

Or her own!

> Myself, I like OW very much, and if there is one thing I dislike it is the FSF's idea of "free software". :-)

I don't fully support the FSF but I do value concepts like the four freedoms. The OW relicensing issue links to a Debian issue, listing some of the problems with OW's current license: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=376431

> On another subject: have you tried qemusockhub by any chance? It allows to connect more than 2 QEMU VMs together over the "socket" network and makes things easier also with only 2 VMs because you do not have to worry about which VM to run first (and also networking is not disrupted if one of the VMs restarts while the other do not).

Oh, interesting, I may look into that.

author: ecm
address: 89.246.111.84
date: 06.09.2024, 15:16 UTC

If I run "make" for the qemusockhub it defaults to using cc which is presumably gcc. It complains that it doesn't recognise "-Weverything". Running "make CC=clang" works around this problem.

author: ecm
address: 89.246.111.84
date: 06.09.2024, 15:24 UTC

It mostly works with qemusockhub, but "ethflop b test" to both install as drive B: and insert the image doesn't seem to work. Also, quitting the server application leads to long waits when trying to access the disk. And restarting the server doesn't seem to reconnect that client, it needs to be restarted as well to reconnect.

author: ecm
address: 89.246.111.84
date: 06.09.2024, 15:27 UTC

> It mostly works with qemusockhub, but "ethflop b test" to both install as drive B: and insert the image doesn't seem to work.

This is not actually supported, is it? It just happened to be that when the server receives a registration of a previously connected client it restores the same image.

author: ecm
address: 89.246.111.84
date: 06.09.2024, 15:47 UTC

It seems that the "has been installed" message is lacking a linebreak. In https://sourceforge.net/p/ethflop/code/174/tree/ethflop-client/trunk/ethflop.asm#l1041 See what happens when running the client from a batch file:

FreeCom version 0.85a - GNUC - XMS_Swap [Apr 29 2024 13:55:51]
A:\>autoclnt.bat
A:\>path a:\
A:\>fdapm apmdos
Performing action: APMDOS
If APMDOS slows down any app, use ADV:REG instead.
Going resident.
A:\>pcntpk int=0x60
Packet driver for an PCNTPK, version 02.20
Packet driver skeleton copyright 1988-92, Crynwr Software.
This program is free software; see the file COPYING for details.
NO WARRANTY; see the file COPYING for details.
Packet driver is at segment 068E
Interrupt number 0xB (11)
I/O port 0xC000 (49152)
My Ethernet address is 52:54:00:00:00:02
A:\>ethflop b
ethflop server: 52:54:00:00:00:01 virt. floppy in drive B: TEST (2880K)
ethflop has been installedA:\>ethflop i test
ERROR: you must first eject your current virtual floppy (TEST)
A:\>

author: Mateusz V.
address: 176.157.255.77
date: 06.09.2024, 16:00 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> If I run "make" for the qemusockhub it defaults to using cc which is presumably gcc. It complains that it doesn't recognise "-Weverything". Running "make CC=clang" works around this problem.

Yes, it is in "development" mode, there is a comment about that in the (super short) Makefile. :)

> It mostly works with qemusockhub, but "ethflop b test" to both install as drive B: and insert the image doesn't seem to work.

That is not intended to work. It's just "ethflop b", no further arguments. Perhaps the server could emit an error if it sees extraneous arguments (currently it ignores them).

> Also, quitting the server application leads to long waits when trying to access the disk.

The ethflop TSR waits up to 2s for every int 13h request that is not answered (and performs retransmissions every 440ms in the meantime). If DOS emits many such requests, then it's that many times 2s.

> And restarting the server doesn't seem to reconnect that client, it needs to be restarted as well to reconnect.

I am not sure this is supposed to work, I will check it.

> This is not actually supported, is it? It just happened to be that when the server receives a registration of a previously connected client it restores the same image.

Exactly, yes. The server keeps the client->floppy association in memory and hence if the client reconnects, it is immediately told which floppy it has in its "drive" (if any).

> It seems that the "has been installed" message is lacking a linebreak. (...) See what happens when running the client from a batch file:

The missing line breaks are intended (saves two bytes per string), DOS terminates them nicely. The "running from batch with ECHO enabled" is a scenario I will have to check. If it's not a FreeCOM bug I will add the line breaks after all.

author: Mateusz V.
address: 176.157.255.77
date: 06.09.2024, 16:05 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

BTW I worked a bit on qemusockhub. It's no longer a "1990-like hub", it became a switch (meaning it relays frames to their destinations instead of blasting them to all hosts). The console messages should be a little bit more friendly now, too.

author: ecm
address: 89.246.111.84
date: 06.09.2024, 16:09 UTC

Three more problems:

The server on DOS will simply seek to the desired image size, minus 1, then write a single byte to the file to create an image. After the header (first 16 KiB up to offset 4000h) this will leave the data in the file uninitialised. I was able to get the server to create a "dirty" image with deleted prior data in the file system.

Trying to create a dirty 30 MiB image results in an error after a short delay:

A:\>ethflop n31000 dirty3
ERROR: server unreachable

Trying to run an n1440 command after that also errors out with the same error. However, "ethflop s" seems to reset the client to working order. My purpose in trying to create the 30 MiB image is to see whether the 16 KiB of zeroed data is enough to cover the whole FAT and root directory, or whether it leaves parts uninitialised. (Is the file system always FAT12?)

Finally, the image boot sector seems to contain an infinite loop jump instruction when the image is created. This could be changed to have a no-op boot sector loader instead. For instance, https://hg.pushbx.org/ecm/bootimg/file/e1e9e530b4a2/bootimg.asm#l888

author: ecm
address: 176.1.219.204
date: 06.09.2024, 16:25 UTC

Also some ideas: I'd like ethflop to support LBA access. This requires some reading and writing of user buffers (int 13.42/.43 disk access packet, 13.48 parameter buffer), which the client doesn't allow the server yet. It seems the client handles int 13h functions 0, 1, 2, and 3, where 2 and 3 pass the registers ax, bx, cx, dx to the server (which allows reading the CHS tuple in cx:dx), and a "sectnum" to indicate which sector out of a possible total of 255 is being transferred. The functions 2 and 3 use a 512-byte buffer to read or write sector data with the server. Other functions are passed to the server to handle as is, which seems to happen in https://sourceforge.net/p/ethflop/code/174/tree/ethflop-server/trunk/core.c#l353 My extensions would need to allow the server to send/reply with messages instructing the client to read from a buffer and write to a buffer. During my studying the server for this I noticed that int 13h function 08h seems to not be supported yet. This is desirable as well.

author: ecm
address: 176.1.219.204
date: 06.09.2024, 16:26 UTC

I think I just got the "20 messages within 24h" error message from the forum server. In Polish? I think. Luckily I have multiple IP addresses readily usable to me.

author: ecm
address: 176.1.219.204
date: 06.09.2024, 16:28 UTC

test

author: ecm
address: 176.1.219.204
date: 06.09.2024, 16:29 UTC

Error message reads:

> BŁĄD: Z TWOJEGO ADRESU NAPISANO JUŻ 20 WIADOMOŚCI W PRZECIĄGU OSTATNICH 24H. SPRÓBUJ PONOWNIE ZA JAKIŚ CZAS.

> Wróć do głównej strony

author: ecm
address: 176.1.219.204
date: 06.09.2024, 16:46 UTC

> Error message reads:

>> BŁĄD: Z TWOJEGO ADRESU NAPISANO JUŻ 20 WIADOMOŚCI W PRZECIĄGU OSTATNICH 24H. SPRÓBUJ PONOWNIE ZA JAKIŚ CZAS.

>> Wróć do głównej strony

Google translation:

> ERROR: 20 MESSAGES HAVE BEEN SENT FROM YOUR ADDRESS IN THE LAST 24 HOURS. PLEASE TRY AGAIN IN SOME TIME.

> Return to main page

(Interestingly, the Google translation to German instead uses wording equivalent to "sent to your address" rather than "from".)

author: ecm
address: 176.1.219.204
date: 06.09.2024, 16:55 UTC

It does seem that the file system is always FAT12: https://sourceforge.net/p/ethflop/code/174/tree/ethflop-server/trunk/core.c#l196

> My purpose in trying to create the 30 MiB image is to see whether the 16 KiB of zeroed data is enough to cover the whole FAT and root directory, or whether it leaves parts uninitialised. (Is the file system always FAT12?)

According to my calculations we have 6 KiB per FAT for two FAT copies, plus 7 KiB for the root directory, plus 512 Bytes of boot sector. That's 12 + 7 + 0.5 = 19.5 KiB. So it would seem that the 16 KiB initialisation doesn't suffice to set the entire root directory to all-zeroes.

author: Mateusz V.
address: 176.157.255.77
date: 06.09.2024, 18:33 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> The server on DOS will simply seek to the desired image size, minus 1, then write a single byte to the file to create an image.

Indeed. I was seeking to the full image size before and then writing 0 bytes (as it's the "common" way to grow a file in DOS), but unfortunately such write never returns an error, so I could not detect out of disk situations. Writing a single byte also does not return any error, but at least it returns the amount of written bytes so I can detect mismatches.

> After the header (first 16 KiB up to offset 4000h) this will leave the data in the file uninitialised. I was able to get the server to create a "dirty" image with deleted prior data in the file system.

Yes, I know. Should not be harmful since FAT is properly zeroed. The alternative would be to create the image writing all zeroes to it, or growing it with chsize(), but on my 386 this takes about 30 seconds for a 30M disk, so the ethflop TSR was always timeouting. With the "dirty" method it works, but I had to increase the ethflop timeout to 2s (before v0.6.1 it was only 100ms).

> Trying to create a dirty 30 MiB image results in an error after a short delay:

>

> A:\>ethflop n31000 dirty3

> ERROR: server unreachable

This happens if the server answers too late. It's not a real problem (in the sense that the image gets created anyway, and further communication with the TSR is not impacted) but it does mean the user will miss the feedback about the disk having been created (or not). But as I was saying, with the dirty method it should work. Is your VM exceptionally slow? Another problem is that when the server is busy zeroing a 30M image, not only the requesting client will time out, but all other clients too - including some that may be doing some critical copy operations at this moment.

> Finally, the image boot sector seems to contain an infinite loop jump instruction when the image is created. This could be changed to have a no-op boot sector loader instead. For instance, https://hg.pushbx.org/ecm/bootimg/file/e1e9e530b4a2/bootimg.asm#l888

I will look, thanks.

> Also some ideas: I'd like ethflop to support LBA access.

Is there any practical value in LBA support, in the context of a FAT12 "floppy-like" drive?

> Other functions are passed to the server to handle as is

Indeed, I try to keep the "intelligence" on the server side as much as possible to keep the TSR's memory footprint small.

> My extensions would need to allow the server to send/reply with messages instructing the client to read from a buffer and write to a buffer.

Sounds like a lot of memory needed.

> During my studying the server for this I noticed that int 13h function 08h seems to not be supported yet. This is desirable as well.

Yes, fn 08 is on my todo list.

> According to my calculations we have 6 KiB per FAT for two FAT copies, plus 7 KiB for the root directory, plus 512 Bytes of boot sector. That's 12 + 7 + 0.5 = 19.5 KiB. So it would seem that the 16 KiB initialisation doesn't suffice to set the entire root directory to all-zeroes.

To be honest I thought that everything (incl. root entries) is in the FAT. If that's not the case then indeed, a newly created floppy on DOS might have some weird content. I will look closer into this. This is a possible problem only for the DOS version of the server, since the Linux version uses ftruncate() to grow the image, and this function zeroes everything on its own.

author: ecm
address: 176.1.219.204
date: 06.09.2024, 18:51 UTC

> Indeed. I was seeking to the full image size before and then writing 0 bytes (as it's the "common" way to grow a file in DOS), but unfortunately such write never returns an error, so I could not detect out of disk situations. Writing a single byte also does not return any error, but at least it returns the amount of written bytes so I can detect mismatches.

Yes, this is true, and I think it is the correct way to do this.

> Yes, I know. Should not be harmful since FAT is properly zeroed. The alternative would be to create the image writing all zeroes to it, or growing it with chsize(), but on my 386 this takes about 30 seconds for a 30M disk, so the ethflop TSR was always timing out. With the "dirty" method it works, but I had to increase the ethflop timeout to 2s (before v0.6.1 it was only 100ms).

Perhaps the client could learn that it may need to wait longer in such a case? Also, why does it affect the subsequent n1440 command? Is it just that the server is still busy?

> This happens if the server answers too late. It's not a real problem (in the sense that the image gets created anyway, and further communication with the TSR is not impacted)

Not true to my experience, as mentioned a subsequent n1440 command also failed.

> but it does mean the user will miss the feedback about the disk having been created (or not). But as I was saying with the dirty method it should work. Is your VM exceptionally slow?

It's a qemu running without KVM, as we do not have access to AMD SVM on this server. (I think because the server is already virtualised and they don't offer nested virtualisation.)

> Another problem is that when the server is busy zeroing a 30M image, not only the requesting client will timeout, but all other clients too - including some that may be doing some critical copy operations at this moment.

I think there may be a way to avoid this problem. The expensive file system operation could be split into blocks, and between the blocks the server could check for and service other clients. Similarly to how XMS drivers will enable interrupts during large block moves.
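To illustrate the idea of splitting the expensive operation into blocks, here is a minimal sketch in C. It is not how ethflopd works today; `zero_fill_chunked`, `CHUNK_BYTES` and the `service` callback are hypothetical names, and the callback stands in for "check for and service other clients".

```c
#include <stdio.h>

#define CHUNK_BYTES 4096UL   /* hypothetical block size */

/* Example callback: only counts how often it was invoked. A real server
 * would poll the network and answer pending requests here. */
void count_calls(void *ctx)
{
    ++*(int *)ctx;
}

/* Writes `total` zero bytes to `f` in pieces of at most CHUNK_BYTES and
 * calls `service` (if not NULL) after every piece.
 * Returns 0 on success, -1 on a write error. */
int zero_fill_chunked(FILE *f, unsigned long total,
                      void (*service)(void *ctx), void *ctx)
{
    static unsigned char zeros[CHUNK_BYTES];   /* static storage starts zeroed */

    while (total > 0) {
        unsigned long n = (total < CHUNK_BYTES) ? total : CHUNK_BYTES;

        if (fwrite(zeros, 1, (size_t)n, f) != n) return -1;
        total -= n;

        if (service != NULL) service(ctx);     /* let other clients in */
    }
    return 0;
}
```

The block size is a trade-off: smaller blocks keep the latency for other clients low, larger blocks reduce the per-block overhead.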

> Is there any practical value in LBA support, in the context of a FAT12 "floppy-like" drive?

Yes, I'd like to check LBA support in the kernel or other programs. Beyond that I don't think ethflop necessarily needs to only support diskette drives.

> Indeed, I try to keep the "intelligence" on the server side as much as possible to keep the TSR's memory footprint small.

Understandable.

> Sounds like a lot of memory needed.

That remains to be seen.

>> According to my calculations we have 6 KiB per FAT for two FAT copies, plus 7 KiB for the root directory, plus 512 Bytes of boot sector. That's 12 + 7 + 0.5 = 19.5 KiB. So it would seem that the 16 KiB initialisation doesn't suffice to set the entire root directory to all-zeroes.

> To be honest I thought that everything (incl. root entries) is in the FAT. If that's not the case then indeed,

The File Allocation Table only contains FAT entries, one per data cluster in the FS. This entry can generally contain one of four types of contents: 0 = free, FF7h = bad, FF8h to FFFh = this cluster is allocated and the End Of Chain, else = this cluster is allocated and points to another allocated cluster. It does not contain any directory entries. The FAT12 FS is split into five types of sectors: Reserved sectors (usually just the 1 boot sector), FAT sectors, usually 2 copies of the FAT back to back, root directory sectors of a fixed size, data sectors (including file data and subdirectory data), and possibly trailing unused sectors (at most cluster size minus 1).
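The four kinds of FAT12 entry listed above can be expressed as a small classifier. This is only an illustrative sketch in C that follows the four-way split from this post; the names `fat12_kind` and `fat12_classify` are made up for the example.

```c
/* The four kinds of FAT12 entry as described in the post above. */
enum fat12_kind {
    FAT12_FREE,          /* 000h: cluster is free */
    FAT12_BAD,           /* FF7h: cluster is marked bad */
    FAT12_END_OF_CHAIN,  /* FF8h..FFFh: allocated, last cluster of a chain */
    FAT12_NEXT           /* anything else: allocated, points to next cluster */
};

/* Classifies one 12-bit FAT entry (upper bits of `entry` are ignored). */
enum fat12_kind fat12_classify(unsigned int entry)
{
    entry &= 0xFFFu;

    if (entry == 0x000u) return FAT12_FREE;
    if (entry == 0xFF7u) return FAT12_BAD;
    if (entry >= 0xFF8u) return FAT12_END_OF_CHAIN;
    return FAT12_NEXT;
}
```

Note that none of these values describe a directory entry: directories live in the root directory sectors and in data clusters, not in the FAT.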

> a newly created floppy on DOS might have some weird content. I will look closer into this. This is a possible problem only for the DOS version of the server, since the Linux version uses ftruncate() to grow the image, and this function zeroes everything on its own.

Yes, as I calculated, the largest image type will have trailing trash in the root directory. That means that once enough entries are written to the root, DOS will encounter a bunch of invalid entries at the end.
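The 19.5 KiB figure can be reproduced with a few lines of C. The parameter values used below (1 reserved sector, 2 FAT copies, 12 sectors per FAT, 224 root entries) are assumptions derived from the numbers quoted above (6 KiB per FAT, 7 KiB of root directory), not values read from the ethflopd source; `fat12_metadata_bytes` is a hypothetical helper.

```c
#define SECTOR_BYTES 512UL   /* bytes per sector */
#define DIRENT_BYTES 32UL    /* bytes per FAT directory entry */

/* Number of bytes from the start of the image up to the first data sector:
 * reserved sectors + all FAT copies + the fixed-size root directory. */
unsigned long fat12_metadata_bytes(unsigned long reserved_sectors,
                                   unsigned long fat_copies,
                                   unsigned long sectors_per_fat,
                                   unsigned long root_entries)
{
    return (reserved_sectors + fat_copies * sectors_per_fat) * SECTOR_BYTES
         + root_entries * DIRENT_BYTES;
}
```

With the assumed values this gives 512 + 12288 + 7168 = 19968 bytes, which is 19.5 KiB and therefore more than the 16 KiB (16384 bytes) that get initialised.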

author: Mateusz V.
address: 176.157.255.77
date: 06.09.2024, 20:31 UTC

078ac416e4811d4f ac7134fc6c34222e e676a6b143232a57 f638c349b414e0e6 b25cc1129ce8293c bc9a49b88c1ea9f5 a8d5c26e0797bc6f 2eba886e5c7e366b

> Perhaps the client could learn that it may need to wait longer in such a case?

Waiting longer is not really a solution, because other clients will still be impacted.

> Also, why does it affect the subsequent n1440 command? Is it just that the server is still busy?

Yes, I think the server was still busy. I will add a timer on the server to log the time it takes to create disks. This way there will be no doubt.

> It's a qemu running without KVM, as we do not have access to AMD SVM on this server. (I think because the server is already virtualised and they don't offer nested virtualisation.)

Even a 100% emulated QEMU should not be slower than my 386. I don't think the lack of performance lies in the CPU - maybe the I/O speed? Once I have the timer done in ethflopd we will know more.

> The expensive file system operation could be split into blocks, and between the blocks the server could check for and service other clients. Similarly to how XMS drivers will enable interrupts during large block moves.

It's an option, but it brings its own complications - for example the image cannot be mounted until all blocks are done. Another option could be to use a "copy on write", i.e. zero the "dirty" block the first time a client wants to access it (unless it has been written to by the client before). But this is also not without complication - the server would need to keep some map somewhere to know which blocks are "good". And, ultimately, the "dirty growing" method must stay anyway, because otherwise it is impossible to tell the client whether or not the disk creation operation succeeded. So we are effectively talking here about only writing the 20K of data to cover the boot sector + 2xFAT + root entries. Surely this can be done without over-engineering... I have to think about it and take some measurements.

> It does not contain any directory entries.

Thank you for your patience and all the explanations. I really should have googled this one earlier.

author: ecm
address: 89.246.111.84
date: 07.09.2024, 17:01 UTC

I ported the free software AMD packet driver from crynwr.com http://crynwr.com/drivers/amdpd.zip to NASM: https://hg.pushbx.org/ecm/amdpd/file/457c872d32c4 Build available at https://pushbx.org/ecm/download/amdpd.zip Running ethflop and ethflopd in qemu both work with this build.

author: Mateusz V.
address: 176.157.255.77
date: 07.09.2024, 18:33 UTC

On my side I published ethflopd 20240907 last night. The two major changes are that it displays the time it takes to create image files and it zeroes the first 32KiB of newly created images (to be sure to cover all root entries). I tested it this afternoon on some real hardware. This is the result on my 386SX: https://i.ibb.co/GF77TsF/20240907-180143.jpg

I also tested it on a Pentium II running at 200-something MHz and creating a 31M image takes about 900ms there. These results lead to some preliminary conclusions:

1. The CPU power has a negligible impact (a CPU 10x faster leads to only a 30% speedup).
2. Writing the 32K of data does not take much time (less than 200ms probably).
3. The "dirty file growing" used by DOS is not free, more space to grow = more time spent, but its overhead is not exactly predictable. Growing files to 2.8M and 9.6M took the same time (around 0.5s), while growing a file by 300KiB was only twice as fast (0.22s) and growing a file to 31M took 1.3s (6x more time for 100x more space grown).

I suspect that the main overhead when growing a file is for DOS to find available clusters to link, so the time spent probably depends on disk fragmentation, FAT16 vs FAT32 and possibly other such factors. Also, my tests were all done on machines with CompactFlash cards. I do not have any PC with a magnetic disk. I'm not sure whether it would be faster or not. The raw speed of a HDD is usually higher than a CF card, but its seek times are always much worse. I also made tests running ethflopd from a RAMDISK (on the PII machine). There, creating a 31M disk took consistently 0ms (really: less than one PIT tick, i.e. less than 55ms).

I'm still not sure what the best solution is. One option I thought about is to not grow the newly created diskette images and leave them to be 32K big - they will be extended as soon as some writes are done by clients. The only drawback is that ethflopd would need to change its method of detecting the floppy size (currently it simply looks at the size of the image file). Maybe prefix the image with an extra byte, or encode the size in the file extension... or read it from the image's boot sector. Another option could be to increase the timeout on the client to, say, 5s and call it a day. It would be nice to have some more timing results, though. Ideally on real hardware with magnetic drives.

author: Mateusz V.
address: 176.157.255.77
date: 07.09.2024, 22:49 UTC

Committed to svn: ethflopd is now thin-provisioning the floppy images. This means that when a new image is created, its file size is only 32K (boot sector + FAT1 + FAT2 + root entries + some zero padding for safety) and grows only when it is needed to accommodate client write operations. If a client requests data that is not yet "physically available" (but still within the bounds of the filesystem as indicated by the "total sectors" word at offset 0x13 of the boot sector) then a fake, zeroed sector is returned. If a client writes new data, then the image is expanded accordingly. I tested this in a few scenarios, seems to work nicely. Image creation is now immediate, no matter the requested size. One drawback is that the client is able to change the size of the floppy by updating its boot sector. But maybe it's a feature. Another catch is that users might dislike the fact that they cannot take an ethflopd-created floppy image and use it with other floppy-image-processing programs (or mount it in a VM, etc). Perhaps a tool is needed to inflate such thin-provisioned images to their canonical size.
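For readers who want to picture the read path of such a thin-provisioned image, here is a rough sketch in portable C. It is not the code committed to svn; `thin_read_sector` is a hypothetical name and the only detail taken from the description above is the "total sectors" word at offset 0x13 of the boot sector.

```c
#include <stdio.h>
#include <string.h>

#define SECTOR_BYTES 512

/* Hypothetical sketch of a thin-provisioned read: sectors that lie within
 * the file system but beyond the end of the image file read as zeroes.
 * Returns 0 on success, -1 if `lba` is outside the file system or on error. */
int thin_read_sector(FILE *img, unsigned long lba,
                     unsigned char buf[SECTOR_BYTES])
{
    unsigned char bs[SECTOR_BYTES];
    unsigned long total_sectors;
    size_t got;

    /* The boot sector is always physically present. */
    if (fseek(img, 0L, SEEK_SET) != 0) return -1;
    if (fread(bs, 1, SECTOR_BYTES, img) != SECTOR_BYTES) return -1;

    /* 16-bit little-endian "total sectors" word at offset 0x13. */
    total_sectors = bs[0x13] | ((unsigned long)bs[0x14] << 8);
    if (lba >= total_sectors) return -1;

    if (fseek(img, (long)(lba * SECTOR_BYTES), SEEK_SET) != 0) return -1;
    got = fread(buf, 1, SECTOR_BYTES, img);
    if (got < SECTOR_BYTES) {
        if (ferror(img)) return -1;
        /* Not allocated yet (or only partly): fake the missing bytes. */
        memset(buf + got, 0, SECTOR_BYTES - got);
    }
    return 0;
}
```

Because the bounds check relies on the boot sector, a client that rewrites the boot sector changes the apparent size of the floppy, which is the drawback mentioned above.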

author: Cryptus
address: 87.174.183.178
date: 07.09.2024, 22:55 UTC

The new DOS server (ethflopd 20240907) reports a strange drive letter (with drive B: on the client side) "Disk NEWIMG loaded (2880 KiB) in drive Ö:" https://ibb.co/VJCprh3 It is working with drive B: though.

author: Cryptus
address: 87.174.183.178
date: 07.09.2024, 23:01 UTC

> Another catch is that users might dislike the fact that they cannot take an ethflopd-created floppy image and use it with other floppy-image-processing programs (or mount it in a VM, etc).

Indeed, I would dislike that.

author: Mateusz V.
address: 176.157.255.77
date: 08.09.2024, 07:42 UTC

> I would dislike that.

To be honest I dislike it, too. But on the other hand I do like the idea of having thin provisioning to save disk space and make image generation fast. Ok, new concept: ethflopd will thin-provision floppy images larger than 2.88M, and do full pre-allocation for all smaller sizes (360K, 720K, 1.44M, 1.68M, 1.72M and 2.88M). This way for "standard" sizes we still get a proper image, and for the weird ethflop-specific sizes there is thin provisioning so ethflopd does not risk getting too busy when initializing them (+the images do not grow unnecessarily if they are not truly filled with data).

> The new DOS server (ethflopd 20240907) reports a strange drive letter

I forgot to hide that one. The drive letter is supposed to be reported only for newer ethflop TSRs (not yet released). The older ones (0.6 and 0.6.1) do not provide the local driveid information to ethflopd. It's fixed in svn now: https://sourceforge.net/p/ethflop/code/182/tree//ethflop-server/trunk/core.c?diff=5d8fae32e39601433ecbde88:181

author: Cryptus
address: 87.174.183.178
date: 08.09.2024, 15:26 UTC

Ok, just compiled and tested ethflop 0.6.2 and ethflopd (DOS) from svn - now working as expected. However, your "new concept" sounds better :-)

author: Cryptus
address: 87.174.183.178
date: 10.09.2024, 16:43 UTC

Issues (only) with MS-DOS? Until now I only tested the ethflop client with FreeDOS (with newest svn). Today I found errors with MS-DOS 5.0 and MS-DOS 6.22 as client. The ethflop TSR loads, but when copying a file, I get an error: "Not ready error writing drive B. Abort, Retry, Fail?" Doesn't matter if the virtual floppy is at A: or B: Screenshot: https://ibb.co/QM5VspX Further tests with Caldera DR-DOS 7.03 and Enhanced DR-DOS 7.01.08 (ecm, June 2024) do work without problems (as with FreeDOS). Tests done with both (i.e. Linux and DOS) servers (20240910). Floppy img 1.4 MB. Note: It's (always the same) virtual setup, no real hardware tests yet. Mateusz, can you reproduce this and/or give me hints what more I could check here? Frank

author: Mateusz V.
address: 176.157.255.77
date: 10.09.2024, 19:00 UTC

> Today I found errors with MS-DOS 5.0 and MS-DOS 6.22 as client.

> The ethflop TSR loads, but when copying a file, I get an error: "Not ready error writing drive B. Abort, Retry, Fail?"

> Doesn't matter if the virtual floppy is at A: or B:

Interesting. In theory there should not be anything specific about MS-DOS (and I am sure I have tested it successfully in the past). I will try to reproduce it on my QEMU setup. Do you see any suspect logs on the server side? (on Linux all logs go to syslog, so you should look in your /var/log/messages or similar, depends on your distribution). Mateusz

author: Cryptus
address: 87.174.183.178
date: 10.09.2024, 19:22 UTC

Ok, more testing (MS-DOS 6.22, ethflop drive = B:). 1.: Nothing special in the linux server log. 2.: Strange: Copying from CDROM drive to virtual floppy works (!), copying from drive A: or drive C: (HDD) doesn't: "Error at INT24". This is with virtual environment (86Box on the DOS side, QEMU as server, vde2 hub), now throwing on real hardware...

author: Cryptus
address: 87.174.183.178
date: 10.09.2024, 19:41 UTC

Oh..., when using a real PC as client to connect (MS-DOS 6.22, ethflop drive = B), it works! But the question remains: Why is it ok for FreeDOS/DR-DOS in a virtual environment and for MS-DOS not? Probably there are more than a few people using *DOS in virtualized environments. Is "Error at INT24" helpful? Should more tests in different virtualized environments (PCem, 86Box, Qemu, Bochs etc.) be necessary?

author: ecm
address: 89.246.111.84
date: 10.09.2024, 19:51 UTC

Wildly guessing, this may be related to changeline support.

author: Mateusz V.
address: 176.157.255.77
date: 10.09.2024, 20:32 UTC

> Wildly guessing, this may be related to changeline support.

My first thought as well.

> "Error at INT24"

I think it is a symptom, not the cause. INT 24 is the critical error handler, i.e. the code that displays "Abort, Retry, Fail?". I am not sure why it says "Error" but it should not have been called in the first place anyway.

> Should more tests in different virtualized environments (PCem, 86Box, Qemu, Bochs etc.) be necessary?

I wouldn't want to abuse your time, but maybe one last thing if you don't mind: is the issue also occurring with older releases of the ETHFLOP TSR (0.6.1 / 0.6) ? If it's a regression it will be easier to locate.

author: Cryptus
address: 87.174.183.178
date: 10.09.2024, 21:01 UTC

Testing on MS-DOS 6.22: ETHFLOP TSR 0.6 (2019) (server is 20240910): (setup: 86Box on the DOS side, QEMU as server, vde2 switch/slirpvde) Copying from local CDROM drive does work, but copying from drive A: or from a ramdisk (xmsdsk) to virtual floppy B: gives error: "Not ready error writing drive B. Abort, Retry, Fail?" as seen in (already above mentioned) screenshot: https://ibb.co/QM5VspX Usually I use FreeDOS, so I didn't detect this earlier...

author: Mateusz V.
address: 176.157.255.77
date: 10.09.2024, 21:12 UTC

I tried with ETHFLOP 0.6.2 (trunk) + ethflopd 20240910 (trunk) with MS-DOS 6.0 under QEMU for both client and server: cannot reproduce (COPY A:\COMMAND.COM B: succeeds). Two follow-up questions: 1. Are you able to reproduce the issue with QEMU on the client side, or is it an 86Box-only problem? 2. Do you have the issue with all floppy images, including some that you have newly created? This is to exclude that the floppy image has a broken filesystem.

author: Mateusz V.
address: 176.157.255.77
date: 10.09.2024, 21:20 UTC

Ideally: if you could upload somewhere a bootable client floppy that you have the issue with, it would be awesome... I'd be 100% sure we are using the exact same software and config files. I am off for now, but I will look into this again tomorrow, I really need to reproduce this issue.

author: Cryptus
address: 87.174.183.178
date: 10.09.2024, 21:52 UTC

Well. I just switched my setup, now 86Box = server side and Qemu = client side (with MS-DOS 6.22 as client). Now it works. Copied files (text files) compare ok. The only smaller (new) issue now is: Debug log (DOS server 20240910 trunk) spits out messages like: "CHECKSUM MISMATCH! Computed: 0x6378h Received: 0x00h" when I do commands like "dir" or "type sample.txt" on the client side on the virtual floppy (although the copied files seem to be ok). Floppy A: (client boot floppy) is always an El Torito boot image, as I start DOS from bootable CDROM. But it works "standalone" too, of course. Those images are/work ok. But that's why I added some tests with copying files from C:\ drive, from a ramdrive and from CDROM. Maybe it's an 86Box-related thing? Tests with its "predecessor" PCem might be interesting, but my compiled version 14 can only do slirp, not vde.

author: Cryptus
address: 87.174.183.178
date: 10.09.2024, 22:16 UTC

Did another test with two VirtualBox instances connected via 'intnet' (VirtualBox internal net). No issues here. So it seems to be somehow 86Box-related... (nevertheless I ask myself: Why is it ok for FreeDOS/DR-DOS and not for MS-DOS?) If you're still interested in my MS-DOS 6.22 boot floppy, I'd prefer to mail it instead of uploading it somewhere. Just tell me (I still have your address from our EtherDFS conversation in May)

author: Mateusz V.
address: 176.157.255.77
date: 10.09.2024, 22:58 UTC

Mail is good of course. :) thanks! I will try to set up 86Box networking tomorrow. And will check this cksum mismatch thing.

author: Mateusz V.
address: 176.157.255.77
date: 11.09.2024, 09:56 UTC

@Cryptus -- would you mind providing some hints as to how you configured your 86Box+VDE combo? I've just spent 2 hours trying to figure out how to make 86Box output frames to VDE. Tried a variety of vde_switch syntaxes, also tried some vde_plug and vde_plug2tap exorcism. All failed. Maybe I'm stupid, or maybe my 86Box is somehow broken. VDE itself seems to work because I was able to make two QEMU VMs communicate with each other through VDE. But not 86Box. The NICs I tried are PCnet-ISA, PCnet-ISA+, PCnet-VL and EtherLink II (with a suitable packet driver). The packet driver loads all right, yet ethflop is unable to discover the server that runs on a QEMU VM connected to the same VDE switch.

author: Mateusz V.
address: 176.157.255.77
date: 11.09.2024, 10:15 UTC

I'm also unable to reproduce the "cksum mismatch" issue that you described yesterday (I used two QEMU VMs). Have you observed this ONLY when one of the sides was running 86Box, or is it more general? If it shows up only with 86Box, then maybe the two problems have the same cause.

author: Cryptus
address: 87.174.183.178
date: 11.09.2024, 11:46 UTC

Setup vde2 on your (hopefully linux) PC:
(once all is done, it's only the start script I run in a terminal)

vde2: Don't remember exactly which package (version) I installed, but I have the following files on my Debian PC, all dated 1/2019:

/usr/bin/vde_autolink
/usr/bin/vde_l3
/usr/bin/vde_over_ns
/usr/bin/vde_pcapplug
/usr/bin/vde_plug2tap
/usr/bin/vde_switch
/usr/bin/vdeq
/usr/bin/vdeterm
/usr/bin/unixcmd
/usr/lib/vde2/plugins/dump.so
/usr/lib/vde2/plugins/iplog.so
/usr/lib/vde2/plugins/pdump.so
/usr/lib/vde2/vde_l3/bfifo.so
/usr/lib/vde2/vde_l3/pfifo.so
/usr/lib/vde2/vde_l3/tbf.so
/usr/lib/vde2/libvdetap.so
/usr/lib/vde2/vdetap

And additionally install tmux. With a simple shell script you can start the VDE Hub/Switch (see below).

In 86Box, you go to network setup (Network card #1), select your adapter, and in Mode choose "VDE". You have to enter the path to "VDE Socket" manually, I choose "/tmp/vde.ctl". Must of course correspond to path in your start scripts. VDE Hub must be running (i.e. started before), otherwise 86Box complains.

Qemu you start with "-net vde,sock=/tmp/vde.ctl" (at least the Qemu version I use, but it's in the docs).

Here is my all-in-one VDE shell script:

#!/bin/sh
# VDE Hub
# Memo: close vdeterm with 'shutdown'

# Configuring 86Box for VDE
#
# Go to the emulated machine’s network settings and select VDE
# as the mode for the emulated network card.
# Enter the control socket path, which is /tmp/vde.ctl, in the VDE Socket box.

# vde_switch requires root privileges to create the switch.
# Applications will be able to connect to the switch with unprivileged
# (non-root) permissions.
if test `id -u` -ne 0 ; then
    echo "Oops: Only as root!"
    exit 1
fi

# vde_switch
# - Creates the management socket at /tmp/vde.mgmt
# - Creates the control socket at /tmp/vde.ctl
# - Sets the sockets’ permissions to world read/write to allow unprivileged access
# - Sets the number of switch ports to 6
# - x: Make the switch act as a hub

# check if Socket /tmp/vde.ctl exists
if [ ! -S /tmp/vde.ctl ]; then
    tmux new-session -d -s vde \
        vde_switch -x --mode 666 --numports 6 --mgmt /tmp/vde.mgmt --mgmtmode 666 -s /tmp/vde.ctl
fi

# set up slirp VDE
# (It acts like a networking router connected to a vde_switch and provides
# connectivity from the host where it is running to virtual machines inside
# the virtual network.)
echo "Starting slirpvde..."
slirpvde -s /tmp/vde.ctl &
sleep 2

# VDE switch status:
# The vdeterm command can be used to view the status of the virtual switch.
# It requires the path to the management socket (instead of the control socket)
# created alongside the switch; the command would be vdeterm /tmp/vde.mgmt
# for our example.
# One helpful command is "port/allprint" which displays a list of all virtual
# switch ports and the processes attached to them.

# only a reminder for forgetful users as me....
echo "VDE Hub: 'port/allprint' in vdeterm shows a list."
echo "Check the status bar of each machine"
echo "to make sure the emulated network cable is actually connected."

for i in 0 1 2 3 4; do
    sleep 1
    if [ -S /tmp/vde.mgmt ]; then
        vdeterm /tmp/vde.mgmt
        break
    fi
done

if [ -S /tmp/vde.mgmt ]; then
    unixcmd -s /tmp/vde.mgmt shutdown
fi

pskill slirpvde
rm -rf /tmp/tmux-0

HTH, Frank

author: Mateusz V.
address: 176.157.255.77
date: 11.09.2024, 12:02 UTC

> Maybe I'm stupid, or maybe my 86Box is somehow broken.

It was the former one. Turns out that once a NIC is configured in 86Box, it is "unconnected" by default. I just had to go to the "media" menu, then select the NIC and click on the "connect" checkbox... And now it works.

And I am still unable to reproduce the problem. Here is my setup so far:

CLIENT:
- 86Box 4.2.1 b6130
- boots from a floppy with MS-DOS 6.00
- NIC set to AMD PCnet-PCI II
- ethflop 0.6.1
- packet driver PCNTPK 03.10

SERVER:
- QEMU, SvarDOS (EDR kernel), trunk ethflopd

Client and server communicate through VDE. It all works fine for me. I also successfully launched the server on 86Box and the client in QEMU, and everything still works fine. There are quite a lot of moving parts here, so I suppose you have hit some "lucky" combination. To reproduce the problem I really need to have your exact environment.

Also, on your side you might try changing some things:
- boot from a floppy instead of a CDROM
- change the model of the emulated NIC card (and the packet driver)
- change the model of the emulated PC
- upgrade 86Box

Perhaps you would then be able to isolate the trigger element.

BTW, before you run ethflop, is your B: drive working all right? (are you able to put a diskette image in it and access it?) I'm wondering if ethflop is catching the B: drive at all - maybe the BIOS performs some funny business with B: when it boots from CD and makes it inaccessible...

author: Cryptus
address: 87.174.183.178
date: 11.09.2024, 12:53 UTC

OMG! It was the NIC in 86Box. I switched from NE2000 to AMD PCnet-PCI II and it works now! I still don't understand why the exact same setup (i.e. with NIC=NE2000 and the same packet driver, in the same IRQ-/IO- configuration) does work with FreeDOS and DR-DOS, but not with MS-DOS. Might remain a mystery, but doesn't really matter (for me at least). Anyway: Sorry to bother you with this strange issue. Frank P.S. Yes, the "connect" thing in 86Box is really stupid, I ran into it at first, too. See my comments in the script...

author: Mateusz V.
address: 176.157.255.77
date: 11.09.2024, 13:26 UTC

Cool, so we have the trigger then. :-) It does not have to be the NIC itself, though. I'd rather bet on a different behavior of the packet driver. For example, the first thing that comes to mind is that maybe the NE2000 packet driver requires a lot of stack space. I've seen packet drivers use the stack to put entire ethernet frames on it (1.5K!). Ethflop provides a very modest stack of 690 bytes. Maybe it is not enough, and the NE2000 packet driver sometimes overflows it, overwriting some other part of memory. The first code line of ethflop.asm defines the stack size. Currently it is this:

STACKSIZE equ 690

Could you try changing it to 2048, rebuild the binary (run "build.sh" on Linux) and test again on your setup? If this happens to work, then I think I will have to do some measurements with different packet drivers to assess how much stack they need. And/or ethflop could be able to detect stack overflow situations and print a warning about it. And possibly allow the stack size to be configured on the command line.

author: Mateusz V.
address: 176.157.255.77
date: 11.09.2024, 15:07 UTC

I have added a stack overflow detection mechanism to ethflop, committed to svn right now. This detects mild cases of stack overflow (mild cases meaning that the machine did not immediately crash). It works this way: when the transient part of ethflop is called, it locates the TSR and looks at the bottom of its private stack area for a special signature. If the signature is damaged, then a "STACK OVERFLOW" message is printed and the transient code refuses to perform any further action. This should help dimension the stack size in an optimal way. And ultimately I think it should be configurable anyway. For example on my QEMU system with the PCnet packet driver the stack usage is very low (only around 80 bytes) while one of my 3COM cards in real hw has a packet driver that requires over 600 bytes. So this highly depends on the packet driver (+ some possible overhead from BIOS int 13h handler).
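The general technique (a signature at the bottom of the stack area that gets checked later) can be sketched as follows. The real implementation is part of the NASM source of ethflop; this C version is only an illustration, the signature bytes and function names are made up, and only the 690-byte size comes from the STACKSIZE value quoted earlier in the thread.

```c
#include <string.h>

#define STACK_BYTES 690   /* stack size quoted earlier in the thread */

/* Hypothetical signature; the bytes actually used by ethflop may differ. */
static const unsigned char STACK_SIG[4] = { 'S', 'T', 'K', '!' };

/* The x86 stack grows downwards, so the "bottom" that an overflow reaches
 * first is the lowest address of the area, i.e. stack_area[0]. */
void stack_guard_init(unsigned char *stack_area)
{
    memcpy(stack_area, STACK_SIG, sizeof STACK_SIG);
}

/* Returns 1 if the signature is still in place, 0 if it was overwritten. */
int stack_guard_intact(const unsigned char *stack_area)
{
    return memcmp(stack_area, STACK_SIG, sizeof STACK_SIG) == 0;
}
```

Such a check can only report "mild" overflows after the fact: if the overflow skips the signature or crashes the machine first, nothing is detected.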

author: Cryptus
address: 87.174.183.178
date: 11.09.2024, 17:13 UTC

Further testing (client as usual: MS-DOS 6.22, NE2000 standard setting = IRQ 3, I/O 0x300)

1) I have compiled ethflop.asm with stacksize 2048. Does not help, no change in behaviour.
2) I have compiled newest ethflop.asm from svn. Does not help, no change in behaviour, also no "STACK OVERFLOW" message.
3) Did the same/above two tests with a newer NE2000 packet driver (changing from 10.4.1 to 11.4.3). Does not help, no change in behaviour.

Maybe stacksize is not the problem?

Small remark: To be able to compile ethflop.asm under MS-DOS, build.bat should have DOS line endings.

author: Mateusz V.
address: 176.157.255.77
date: 11.09.2024, 19:08 UTC

> Maybe stacksize is not the problem?

It would seem so, yes. So I am back at trying to reproduce the problem... And still failing at that :-/

I used your German MS-DOS 6.22 floppy image and mounted it in 86Box on an emulated Pentium II machine with the "Novell NE2000" NIC. Proceeded with a few adjustments:
- I disabled the keyb2 tsr so the keyboard layout does not drive me insane
- copied NE2000.COM v 11.4.3 to the floppy
- copied ETHFLOP.COM (trunk) to the floppy
- added a line in autoexec: NE2000 0x60 3 0x300

Then I boot into menu 7 ("plain DOS, only HIMEM.SYS, without CDROM"). And:

ethflop b
ethflop n1440 test
ethflop i test
copy command.com b:\test.dat

It works without issues. I also tried booting in the default menu (1) and kept the keyboard driver, did the same test. Also works. Network connectivity is over VDE to a local TAP interface where a linux ethflopd listens. Then I also tried replacing the linux ethflopd with a QEMU VM running the DOS version of ethflopd --> still works. Also tried changing the VM type to be some emulated 386SX PC --> still works.

So it seems the combo "86Box + MSDOS + NE2000" is not enough. There is yet another ingredient needed to make the issue appear. I suspect either the version of 86Box (mine is 4.2.1 b6130) or a specific VM machine type (its BIOS).

author: Cryptus
address: 87.174.183.178
date: 11.09.2024, 19:40 UTC

Ok, going to update my 86Box version to 4.2.1 b6130 (was only about 100 builds behind), will report.

author: Cryptus
address: 87.174.183.178
date: 11.09.2024, 20:09 UTC

Your suspicion was right: the version of 86Box! After updating to 4.2.1 b6130 it works now!

While downloading I checked the corresponding 86Box Changelog, it says:

- Fixed loss of received packets on DEC and NE2000-based cards.

Maybe this was the error? (Still no clue why it happened only with MS-DOS and not with FreeDOS/DR-DOS, but anyway it's history now.)

Surprisingly I never had issues with the NE2000 implementation in 86Box before, also in older builds, and I always do networking. But well, it's an emulator = piece of software...

Thanks for your patience :-)

author: Mateusz V.
address: 176.157.255.77
date: 11.09.2024, 20:38 UTC

Nice, thanks for the update!

> While downloading I checked the corresponding 86Box Changelog, it says:

> - Fixed loss of received packets on DEC and NE2000-based cards.

> Maybe this was the error?

I don't think so. A simple loss of some received packets should not have such an impact. And, actually, ethflopd has a special mode to test it: there are two #defines in the CORE.C file, "SIMLOSS_INP" and "SIMLOSS_OUT". If set to non-zero, they mean that ethflopd will simulate packet loss at the given percentage, either for inbound frames or outbound frames. So "SIMLOSS_OUT 50" means that ethflopd will randomly "forget" to send about 50% of its answers.

I tested it right now with your floppy: even in such a heavily degraded mode it still works. It's *much* slower of course (because ethflop has to use lots of retransmissions), and sometimes the DOS INT 24 pops up ("Abbrechen, Wiederholen, ..."), but when told to "wiederholen" the copy eventually finishes and the file is good.

So I suppose the 86Box developers must have fixed something else around NE2000. Something much less innocent than just packet loss, that was causing some memory corruption or changed CPU registers. I googled quickly and found this:

"Fixed emulator crash during Windows installation with NE2000-based cards"
https://86box.net/2024/07/26/86box-v4-2.html

This sounds like the kind of thing that could lead to the symptoms you had. It is not the first time apparently, because it seems they had a very nasty thing in NE2000 last year also:

"Fixed NetWare packet corruption on NE2000 cards"
https://86box.net/2023/08/26/86box-v4-0.html

Anyway, this seems to be clearly an 86Box bug, so we can finally rest. :) It wasn't wasted time, though - the stack overflow detection might prove useful in the future. Plus, it was the occasion for me to learn about this "VDE" thing.

In any case - thank you for your reports, and all the tests!
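
For readers curious how a percentage-based loss simulation like the SIMLOSS_OUT setting described above can work in principle, here is a minimal sketch in C. It is not taken from CORE.C: only the name SIMLOSS_OUT comes from the post, while simloss_drop() and send_frames() are hypothetical helpers invented for illustration, and the actual ethflopd code may be structured quite differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed setting, modelled on the SIMLOSS_OUT define mentioned in the post:
 * percentage of outbound frames to drop (0 disables the simulation). */
#define SIMLOSS_OUT 50

/* Hypothetical helper (not from ethflopd): returns 1 if the current frame
 * should be dropped, 0 if it should go out normally. */
static int simloss_drop(unsigned percent) {
  assert(percent <= 100);
  if (percent == 0) return 0;
  /* draw a number in 0..99 and drop the frame if it falls below the threshold */
  return (unsigned)(rand() % 100) < percent;
}

/* Hypothetical send loop: returns how many of 'count' frames would be sent. */
static unsigned send_frames(unsigned count, unsigned percent) {
  unsigned sent = 0, i;
  for (i = 0; i < count; i++) {
    if (simloss_drop(percent)) continue; /* "forget" to send this answer */
    sent++; /* a real server would pass the frame to the network here */
  }
  return sent;
}
```

Calling send_frames(10000, SIMLOSS_OUT) should report roughly half of the frames as sent. The client never learns that a frame was dropped on purpose, it just times out and retransmits, which is why the transfer gets slower but still completes.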

SvarDOS community forum
