a place to talk about SvarDOS and other DOS-related things
> I looked at it a bit.Quite an understatement. :-) Thank you for having looked at this! It's a very impressive overview. I do not understand it all, hence a few follow up questions, if you don't mind.
> In FINDPKTDRVR https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l710 you are comparing words from the interrupt vectors. If you're unlucky this can cause a GPF if your a16 address is exactly 0FFFFh so that the word access's high byte exceeds the 64 KiB segment limit that is usually effective for Real or Virtual 86 Mode.
I never run protected mode, so indeed I did not think of such "wild" access as a risk. To avoid this, I see two options: either normalizing the far pointer (check if offset > 16, if yes then do "offset -= 16" and "seg -= 1"), or simply check that offset is less than 0xFFF5 before doing the actual check. I doubt that a packet driver would register such a high offset, so the second option is probably good enough. What is the "best practice" solution in such situations?
> The filler part at the end https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l1136 can be made conditional using an %if that grabs the current size of the code assembled so far. (In other words, check that the times amount is not negative.) So if this was ever needed it would automatically be enabled.Good idea. I will have to dive into the nasm documentation.
> In the placeholder handler at https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l587 you put the size of the or instruction on the immediate rather than the memory operand. It is more idiomatic to put it on the memory operand. Besides, as the immediate fits in a byte in this case you can optimise this to just use a byte operand.
Ha, I have been bitten by this recently here: http://svn.svardos.org/diff.php?repname=SvarDOS&path=%2Fsvarcom%2Ftrunk%2Fcommand.c&rev=1906&peg=1906 Defining the size on the immediate seemed more natural to me. This works on NASM, but not on WASM (wasm was silently ignoring my "word" directive). But back to ETHFLOP and NASM: "or [bp+6], word 1" is assembled as 83 4E 06 01, so it's already a byte! If I change it to "or [bp+6], byte 1", the encoding is exactly the same. A bug? No, it appears to be a feature because if I change it to "or [bp+6], word 0x101" then the encoding becomes 81 4E 06 01 01. So NASM is smart enough to figure out that OR-ing with 0x0001 is the same as OR-ing with 0x01 and uses the shorter variant. How cool is that? The funny thing is that if I do "or [bp+6], 1" then NASM complains that I need to provide a size (only to disregard it). :-D Apparently NASM does the same optimization on ANDs: "and [bp+6], word 0xFFFE" is encoded as 83 66 06 FE. But in any case you are right that it is better to write code that follows conventions. So I changed it to "or byte [bp+6], 1". It does not change the emitted code, but at least it's clearer to other humans.
> You aren't freeing the process handles in your PSP upon TSR terminate (21.31). That means if a user runs this with "> NUL" your program will leak an SFT entry.I thought about this when you mentioned this "leaking SFT entry" thing at the occasion of your improved EIDL build, but I am unsure what to do. Am I supposed to call INT 0x21,AH=0x3E (close file handle) on all standard handles before TSR-ing? (stdin, stdout, stderr, stdaux, prn). And why would it be a problem for ">NUL" only?
> You do free the environment before TSR terminate https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l697 but you don't zero the word in the PSP. I think it is better to do so.The TSR never tries to access its environment, but maybe there are some diagnostic tools that could be tempted to explore the environment of loaded programs. It does not harm to zero the env - done now.
> Your find TSR routine is very barebones: https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l739 It appears that your program must be the topmost int 13h handler.That's true, yes. If another player hooks up 13h then all hope is lost. I find this reasonable enough for most situations. At least for me. :) I was also pondering about adding some non-standard int 13h call with a special signature so the ethflop TSR could catch it and advertise itself, but I was afraid that cluttering the int 13h API would eventually lead to some disaster should another TSR catch my signature or if the BIOS reacts to it in a weird way.
> Besides, you change si and di without noting that you do in the comment.Comment extended.
> And the repeated cld is not needed.Where is it repeated?
> And the clc is not needed either, if rep* cmpsb returns ZR then it also has set NC already.Could be, but I am not fluent enough to see this right away, so I prefer to have it explicit.
> Besides, I have some more ideas for what you could add as options:
> * Install a DOS block device instead of taking over an existing drive on int 13h. This would most readily be a new drive of its own with a previously unused drive letter. The easiest way to do that would be to load as a DOS device driver (in f/d/config.sys or using devload). You could make a dual-mode executable that can run either as an application or as a device, to support both modes.I thought about it in 2019... But I never created a device driver, while taking over an INT is easy - so I went the easy way. :-P
> * Modify the UPB / UDSC / DDT entry of the drive to take over, which would allow to use drive B: using DOS's internal block device. This may need some knowledge of DOS/BIO internals, which can differ between kernels.
Also something I (loosely) researched. And dropped the idea when I noticed that it implies meddling with internal DOS structures. It's something that I do in EtherDFS, but it became such a beast that I am afraid to look at it now. So I try to keep ethflop as simple, generic and "universal" as possible - accepting some limitations here and there.
> * Load as a pre-DOS driver (or rootkit, basically). You can relocate the EBDA (if needed) and install your driver at the top of the Low Memory Area, then keep it resident by modifying the "amount KiB of low memory" in word [0:413h] (also returned by int 12h). At that point you can hook int 13h and direct requests for a specific unit to your control flow. Then you can run a DOS boot sector loader (or bring your own boot). Problem: The packet driver most likely needs a running DOS to initialise and install itself, so you would have to activate the resident program later from the DOS command line by running the transient program (as device or application).At some point I was thinking of putting ethflop into the programmable EPROM of a network card so it could extend the BIOS at boot. If I understand correctly that's more or less what you describe. But then I realized that I would have to drive the eth card myself since I would not have access to a packet driver, and that's not an adventure I am planning to embark on. Having ethflop be a part of the BIOS but being able to activate it only once DOS (and a packet driver) is loaded does not seem to provide any added value.
> > I looked at it a bit.
> Quite an understatement. :-) Thank you for having looked at this! It's a very impressive overview. I do not understand it all, hence a few follow up questions, if you don't mind.Sure! Glad to help.
> > In FINDPKTDRVR https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l710 you are comparing words from the interrupt vectors. If you're unlucky this can cause a GPF if your a16 address is exactly 0FFFFh so that the word access's high byte exceeds the 64 KiB segment limit that is usually effective for Real or Virtual 86 Mode.
> I never run protected mode so indeed I did not think of such "wild" access to be a risk.
It is a little known fact but on most 386+ machines a word access to 0FFFFh can tear like this. On 8086, 186, and 286 it may fault, or access the byte at offset 0FFFFh and the byte at offset 0000h, or the second byte may be accessed from one higher than can be addressed using the segment.
> To avoid this, I see two options: either normalizing the far pointer (check if offset > 16, if yes then do "offset -= 16" and "seg -= 1"),Think you meant seg += 1. Yes, this would work too.
> or simply check that offset is less than 0xFFF5 before doing the actual check. I doubt that a packet driver would register such a high offset, so the second option is probably good enough.What is the "best practice" solution in such situations? I usually go with checking for a high offset like you suggested, eg in my most recent EDR-DOS changeset to check for an IISP "KB" signature: https://hg.pushbx.org/ecm/edrdos/rev/1e453d972df2#l2.18 Another approach is to use repe cmpsb or another way to check byte for byte rather than word-wise, which also eliminates the possibility of tearing, eg https://hg.pushbx.org/ecm/ldosboot/file/439448ca4188/boot.asm#l1591
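For readers who want to see the pointer normalization from this exchange in numbers, here is a small Python model (not the ethflop code). It assumes plain 8086-style addressing, where the linear address is segment * 16 + offset; the function names are made up for this illustration. After normalizing, the offset is at most 15, so a short signature compare cannot come anywhere near offset 0FFFFh.

```python
def linear(seg: int, off: int) -> int:
    """Linear address of seg:off, wrapping at 1 MiB as on an 8086."""
    return ((seg << 4) + off) & 0xFFFFF


def normalize(seg: int, off: int) -> tuple[int, int]:
    """Move all whole paragraphs of the offset into the segment.

    The result addresses the same byte but has an offset of 0..15.
    The segment is masked to 16 bits, so pointers close to the 1 MiB
    limit wrap around just as they would in a 16-bit register.
    """
    return (seg + (off >> 4)) & 0xFFFF, off & 0x000F


seg, off = normalize(0x1234, 0xFFFF)
print(f"{seg:04X}:{off:04X}")  # same byte, but with a tiny offset
```

Note the direction: paragraphs removed from the offset are added to the segment, which is the "seg += 1" correction made above.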
> > In the placeholder handler at https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l587 you put the size of the or instruction on the immediate rather than the memory operand. It is more idiomatic to put it on the memory operand. Besides, as the immediate fits in a byte in this case you can optimise this to just use a byte operand.
> Ha, I have been bitten by this recently here:http://svn.svardos.org/diff.php?repname=SvarDOS&path=%2Fsvarcom%2Ftrunk%2Fcommand.c&rev=1906&peg=1906
> Defining the size on the immediate seemed more natural to me. This works on NASM, but not on WASM (wasm was silently ignoring my "word" directive). But back to ETHFLOP and NASM:
> "or [bp+6], word 1" is assembled as 83 4E 06 01, so it's already a byte! If I change it to "or [bp+6], byte 1", the encoding is exactly the same. A bug? No, it appears to be a feature because if I change it to "or [bp+6], word 0x101" then the encoding becomes 81 4E 06 01 01. So NASM is smart enough to figure out that OR-ing with 0x0001 is the same as OR-ing with 0x01 and uses the shorter variant. How cool is that? The funny thing is that if I do "or [bp+6], 1" then NASM complains that I need to provide a size (only to disregard it). :-D
> Apparently NASM does the same optimization on ANDs: "and [bp+6], word 0xFFFE" is encoded as 83 66 06 FE.
> But in any case you are right that it is better to write code that follows conventions. So I changed it to "or byte [bp+6], 1". It does not change the emitted code, but at least it's clearer to other humans.
Look again. A word vs byte destination operand size for "or" is NOT changed by nasm automatically, I knew that immediately because the result (Zero Flag) could differ between the two so the assembler mustn't substitute byte size for word size. However, there is an encoding of word destination with an imms8 (sign-extended 8-bit immediate) source operand. Observe:

test$ cat test.asm
or word [bp + 6], 1
or byte [bp + 6], 1
or [bp + 6], word 1
or [bp + 6], byte 1
test$ nasm test.asm -l /dev/stderr
     1 00000000 834E0601        or word [bp + 6], 1
     2 00000004 804E0601        or byte [bp + 6], 1
     3 00000008 834E0601        or [bp + 6], word 1
     4 0000000C 804E0601        or [bp + 6], byte 1
test$

The 83... opcode is r/m16, imms8 whereas the 80... opcode is r/m8, imm8. It is true that neither is shorter in this case so my optimisation suggestion here was wrong. The same is true of "and".
> > You aren't freeing the process handles in your PSP upon TSR terminate (21.31). That means if a user runs this with "> NUL" your program will leak an SFT entry.
> I thought about this when you mentioned this "leaking SFT entry" thing at the occasion of your improved EIDL build, but I am unsure what to do. Am I supposed to call INT 0x21,AH=0x3E (close file handle) on all standard handles before TSR-ing? (stdin, stdout, stderr, stdaux, prn).Yes, though I hardened this to simply closing all PHT entries, see https://hg.pushbx.org/ecm/fdapm/file/350b11660733/source/fdapm/fdapm.asm#l146
> And why would it be a problem for ">NUL" only?The CON, AUX, and PRN handles for the five std handles are typically DUPlicated from the shell's, so the leak in that case is just an increased use count of these "forever" handles. If you redirect to a file or NUL however, you get a new handle that is only used by the program being invoked. You can test this if you look at the SFTs.
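The difference between the two cases can be illustrated with a toy reference-count model in Python. This is only a sketch of the idea described above, not of real DOS data structures: the class and function names are invented, and for brevity all five standard handles point at a single CON entry (real DOS has separate AUX and PRN entries).

```python
class Sft:
    """Toy system file table: index -> [name, reference count]."""

    def __init__(self):
        self.entries = {}
        self.next_index = 0

    def open(self, name):
        index = self.next_index
        self.next_index += 1
        self.entries[index] = [name, 1]
        return index

    def dup(self, index):
        self.entries[index][1] += 1

    def close(self, index):
        self.entries[index][1] -= 1
        if self.entries[index][1] == 0:
            del self.entries[index]  # entry becomes free again


def run_tsr(redirect_stdout: bool, close_handles: bool) -> Sft:
    """Start a child with inherited handles, let it go resident, return the SFT."""
    sft = Sft()
    con = sft.open("CON")          # the shell's long-lived console entry
    child_pht = []
    for _ in range(5):             # five standard handles, duplicated from the shell
        sft.dup(con)
        child_pht.append(con)
    if redirect_stdout:            # "> NUL": handle 1 gets an entry of its own
        sft.close(child_pht[1])
        child_pht[1] = sft.open("NUL")
    if close_handles:              # what the TSR should do before going resident
        for index in child_pht:
            sft.close(index)
    return sft


print(run_tsr(redirect_stdout=True, close_handles=False).entries)
```

Without redirection the only effect of not closing is a higher use count on the CON entry. With "> NUL" a second entry stays allocated for as long as the TSR is resident, unless the handles are closed first.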
> > You do free the environment before TSR terminate https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l697 but you don't zero the word in the PSP. I think it is better to do so.
> The TSR never tries to access its environment, but maybe there are some diagnostic tools that could be tempted to explore the environment of loaded programs. It does not harm to zero the env - done now.
Yes, exactly.
> > Your find TSR routine is very barebones: https://sourceforge.net/p/ethflop/code/152/tree/ethflop-client/trunk/ethflop.asm#l739 It appears that your program must be the topmost int 13h handler.
> That's true, yes. If another player hooks up 13h then all hope is lost. I find this reasonable enough for most situations. At least for me. :) I was also pondering about adding some non-standard int 13h call with a special signature so the ethflop TSR could catch it and advertise itself, but I was afraid that cluttering the int 13h API would eventually lead to some disaster should another TSR catch my signature or if the BIOS reacts to it in a weird way.
I agree that random signatures are not a good look. However, you could implement an AMIS multiplexer to find your resident instance.
> > And the repeated cld is not needed.
> Where is it repeated?Early in your transient program and immediately before the repe cmpsb. You don't even need any of these actually because DOS will always set up the UP state initially. But if you do wish to keep it, keeping a single cld at the start of the program will do nicely.
> > And the clc is not needed either, if rep* cmpsb returns ZR then it also has set NC already.
> Could be, but I am not fluent enough to see this right away, so I prefer to have it explicit.
That's why I pointed it out. I do usually put a comment to this effect if I depend on such "incidental" flag status because it isn't always obvious to me either.
> > Besides, I have some more ideas for what you could add as options:
> > * Install a DOS block device instead of taking over an existing drive on int 13h. This would most readily be a new drive of its own with a previously unused drive letter. The easiest way to do that would be to load as a DOS device driver (in f/d/config.sys or using devload). You could make a dual-mode executable that can run either as an application or as a device, to support both modes.
> I thought about it in 2019... But I never created a device driver, while taking over an INT is easy - so I went the easy way. :-PFair.
> > * Modify the UPB / UDSC / DDT entry of the drive to take over, which would allow to use drive B: using DOS's internal block device. This may need some knowledge of DOS/BIO internals, which can differ between kernels.
> Also something I (loosely) researched. And dropped the idea when I noticed that it implies meddling with internal DOS structures. It's something that I do in EtherDFS, but it became such a beast that I am afraid to look at it now. So I try to keep ethflop as simple, generic and "universal" as possible - accepting some limitations here and there.
Ok.
> > * Load as a pre-DOS driver (or rootkit, basically). You can relocate the EBDA (if needed) and install your driver at the top of the Low Memory Area, then keep it resident by modifying the "amount KiB of low memory" in word [0:413h] (also returned by int 12h). At that point you can hook int 13h and direct requests for a specific unit to your control flow. Then you can run a DOS boot sector loader (or bring your own boot). Problem: The packet driver most likely needs a running DOS to initialise and install itself, so you would have to activate the resident program later from the DOS command line by running the transient program (as device or application).
> At some point I was thinking of putting ethflop into the programmable EPROM of a network card so it could extend the BIOS at boot. If I understand correctly that's more or less what you describe.Yesno. I suggest to load it like a kernel then chainload another (actual) DOS kernel. This is much simpler than programming a ROM.
> But then I realized that I would have to drive the eth card myself since I would not have access to a packet driver, and that's not an adventure I am planning to embark on. Having ethflop be a part of the BIOS but being able to activate it only once DOS (and a packet driver) is loaded does not seem to provide any added value.Actually you can use drive B: then if you tell the DOS that there are two diskette drives in the machine, which as you noted doesn't work yet when DOS sets up its UPBs for a single-drive system.
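The arithmetic behind the pre-DOS loading idea (lowering the "KiB of low memory" word at 0:413h and placing the driver in the freed space) can be sketched like this. It is a Python illustration only; the function name is hypothetical, and it ignores the EBDA, which would have to be relocated first if one sits at the top of low memory.

```python
def reserve_low_memory(kib_at_413h: int, bytes_needed: int) -> tuple[int, int]:
    """Return (new value for word [0:413h], segment of the reserved block).

    The BIOS counts low memory in whole KiB, so the request is rounded up.
    One KiB is 64 paragraphs of 16 bytes, hence segment = KiB * 64.
    """
    kib_needed = (bytes_needed + 1023) // 1024
    new_kib = kib_at_413h - kib_needed
    return new_kib, new_kib * 64


new_kib, segment = reserve_low_memory(640, 2000)
print(new_kib, f"{segment:04X}")
```

With 640 KiB reported and a resident part of about 2000 bytes, the memory size drops by 2 KiB and the driver would live in the paragraphs just below A000h.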
> Look again. A word vs byte destination operand size for "or" is NOT changed by nasm automatically, I knew that immediately because the result (Zero Flag) could differ between the two so the assembler mustn't substitute byte size for word size.It is obvious now that you said it. I re-tested and got the same results as you. Yesterday I was thrown off by the fact that both word- and byte- encodings were the same size so I did not notice the prefix is slightly different for both cases. So apparently there is a special x86 encoding for "OR this with a word but I give this word in 8 bits because it is small enough".
> Yes, though I hardened this to simply closing all PHT entriesVery nice! I will do the same, then. Thanks for the tip!
> I suggest to load it like a kernel then chainload another (actual) DOS kernel.
This idea has an extremely high coolness factor, but a debatable added value level. :) It would indeed provide a solution to the "B: is unusable because hooked by DOS" problem, but at the cost of putting ethflop in either the MBR or VBR/boot sector. Not very user friendly, plus it might be dangerous and/or frowned upon by some antivirus software. I think it is much easier and safer for users to configure their CMOS BIOS so it thinks it has a real B: drive (of course BIOS should be instructed not to try booting from it and not to try any "seek test" at power up time). Alternatively, it should be possible to use some CONFIG.SYS driver that "reserves" B: without doing anything else, just to avoid DOS stealing it (surely such thing exists, maybe I could even ship it with ethflop).
> So apparently there is a special x86 encoding for "OR this with a word but I give this word in 8 bits because it is small enough".Indeed.
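The byte sequences quoted in this thread can be reproduced with a small Python model of the three immediate forms involved (80 = r/m8, imm8; 81 = r/m16, imm16; 83 = r/m16, sign-extended imm8). This is a sketch for the [bp+disp8] addressing form used here only, not an assembler, and the helper names are invented.

```python
GROUP1 = {"or": 1, "and": 4}  # value of the /r field in the ModRM byte


def modrm_bp_disp8(reg_field: int) -> int:
    """ModRM byte for [bp+disp8]: mod=01, r/m=110 in 16-bit addressing."""
    return (0b01 << 6) | (reg_field << 3) | 0b110


def fits_imms8(value16: int) -> bool:
    """True if the 16-bit value is the sign extension of its low byte."""
    low = value16 & 0xFF
    extended = low | 0xFF00 if low & 0x80 else low
    return extended == (value16 & 0xFFFF)


def encode(op: str, size: str, disp8: int, imm: int) -> bytes:
    """Encode 'op size [bp+disp8], imm' the way the listings above show it."""
    modrm = modrm_bp_disp8(GROUP1[op])
    if size == "byte":
        return bytes([0x80, modrm, disp8, imm & 0xFF])
    if fits_imms8(imm):                      # word destination, short immediate
        return bytes([0x83, modrm, disp8, imm & 0xFF])
    return bytes([0x81, modrm, disp8, imm & 0xFF, (imm >> 8) & 0xFF])


print(encode("or", "word", 6, 1).hex(), encode("or", "byte", 6, 1).hex())
```

The destination size selects between the 80 and 83/81 opcodes; only the width of the immediate is chosen automatically. That is why 0xFFFE (the sign extension of FEh) still gets the short 83 form while 0x101 does not.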
> This idea has an extremely high coolness factor, but a debatable added value level. :) It would indeed provide a solution to the "B: is unusable because hooked by DOS" problem, but at the cost of putting ethflop in either the MBR or VBR/boot sector. Not very user friendly, plus it might be dangerous and/or frowned upon by some antivirus software.
You can wrap the program in an lDOS iniload stage. Then you'd just copy your original kernel file (eg kernel.sys) to a different name and overwrite the DOS kernel with the ethflop "kernel". Or install a new boot sector loader with a different name, eg ethflop.com. If you can set up a different kernel to boot then you would be able to set up booting into the bootloadable ethflop program.
> I think it is much easier and safer for users to configure their CMOS BIOS so it thinks it has a real B: drive (of course BIOS should be instructed not to try booting from it and not to try any "seek test" at power up time).May not be possible for everyone. And where's the fun in that? =P
> Alternatively, it should be possible to use some CONFIG.SYS driver that "reserves" B: without doing anything else, just to avoid DOS stealing it (surely such thing exists, maybe I could even ship it with ethflop).No, at the point that f/d/config.sys is processed the DOS already has set up its UPBs from the number of diskette drives it had detected, so the DJ mechanism is already initialised by the time your device driver loads.
> I am not sure I understand your "closing all handles" idea. Here is the code you linked to:
> So you basically read the amount of handles from the PSP and then you ask DOS to close handles from 0 to (amount),0 to amount minus 1 actually. The amount number itself is not a valid Process Handle so needs no closing.
> without looking at the JFT (PSP+18h) nor the JFT far pointer (PSP+34h).Yes, DOS doesn't complain much if you try to close a Process Handle that is already closed (or was never opened) in the PHT. It does return an error from the int 21h call but we ignore that.
> Is this really safe? I mean - are the handles guaranteed to be sequential? If it's the default handles stdin/stdout/stderr/aux/prn, then they are indeed sequential, but what if the code was executed with some non-standard (redirected) handles?Doesn't matter, even if any handles were left opened with gaps (ie not contiguous) then the gaps (not open handles) error out as before but the loop continues and eventually closes all still open handles.
> Your code also assumes there is at least one handle defined - is this sure?DOS breaks usually if you set the PHT size to zero. In this case it would only break our code a little because it would simply loop for 64 Ki iterations in the loop.
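To make the "close everything, ignore the errors" approach concrete, here is a toy Python model of it. It is not the linked fdapm code: dos_close only imitates the success/error result of int 21h AH=3Eh, the table contents are invented, and loop_iterations assumes the original uses a count-down loop on CX (which is where the 64 Ki figure would come from).

```python
CLOSED = 0xFF  # byte value of a Process Handle Table entry that is not open


def dos_close(pht: list, handle: int) -> bool:
    """Imitate 'close handle': True on success, False where DOS returns an error."""
    if handle >= len(pht) or pht[handle] == CLOSED:
        return False
    pht[handle] = CLOSED
    return True


def close_all(pht: list) -> int:
    """Try every process handle from 0 to size-1; errors are simply ignored."""
    closed = 0
    for handle in range(len(pht)):
        if dos_close(pht, handle):
            closed += 1
    return closed


def loop_iterations(cx: int) -> int:
    """Iterations of an x86 'loop' instruction started with this CX value."""
    return cx if cx else 0x10000


# 20-entry table with gaps: handles 2 and 4 are already closed
pht = [0, 1, CLOSED, 3, CLOSED, 7] + [CLOSED] * 14
print(close_all(pht), all(entry == CLOSED for entry in pht))
```

Gaps cost one failed call each and nothing more, so the loop does not need to inspect the table contents at all.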
> Wouldn't something like this be safer / more universal? (untested code)
> ; "close handle" expects the handle in BX but the JFT has 8bit entriesThis is very wrong. The Process Handle Table is *indexed* by Process Handles. The DOS API expects a process handle, ie an index into the PHT, *not* the contents of that PHT entry. The contents are SFT indices. It is the DOS's job to index into the PHT and use its content as an SFT index, unless when it is closed and thus holds the byte value 255. Your code is very broken and wouldn't reliably work.
> potentially triggering a GPF if we are running under EMM386 or some other protected mode thing
This is inaccurate. On a 386 even in Real 86 Mode you may run into the problem of the exacting segment limit being 0FFFFh so that a word read at this offset will fault (or dword read at 0FFFDh+). Also, to be even more nitpicking, the problem isn't that you would "read over 0xffff" generally - if the address overflows the 64 KiB boundary (eg you read at [bx + 4] where bx = 0FFFFh) in a16 addressing it will simply wrap around to the beginning of the segment. In 16-bit code, only the *word access* at precisely an effective address of 0FFFFh will cause a tear. So if bx + 4 = 0FFFFh you have a problem, if bx + 4 = (1)0000h or higher then the 17th offset bit is discarded and you read from the start of the segment.
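The distinction between address wrap-around and a torn access can be written down as two one-line functions. This is a Python sketch of the rule as explained above, with invented names; it models a16 addressing with a segment limit of 0FFFFh.

```python
def effective_address(base: int, disp: int) -> int:
    """a16 effective address: the sum is truncated to 16 bits (bit 16 is dropped)."""
    return (base + disp) & 0xFFFF


def access_crosses_limit(ea: int, size: int) -> bool:
    """True if an access of 'size' bytes starting at ea reaches past offset 0FFFFh."""
    return ea + size - 1 > 0xFFFF


for bx in (0xFFFB, 0xFFFF):
    ea = effective_address(bx, 4)
    print(f"bx={bx:04X} ea={ea:04X} word access torn: {access_crosses_limit(ea, 2)}")
```

With bx = 0FFFBh the word read at [bx + 4] starts at 0FFFFh and is torn. With bx = 0FFFFh the address wraps to 0003h and the read is harmless.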
> The DOS API expects a process handle, ie an index into the PHT, *not* the contents of that PHT entry.Okay, that's the information I was missing. Thank you for clarifying. So all handle-related DOS calls are actually taking an index to the (calling process) JFT, and not really a "handle". Didn't know that. Your code makes perfect sense now. About the ghost B: drive: it is not something I am going to solve, but I have at least added a check for this to forbid the user trying a setup that won't work: https://sourceforge.net/p/ethflop/code/161/
> > potentially triggering a GPF if we are running under EMM386 or some other protected mode thing
> This is inaccurate.Yes, yes - I know. But I understood that only after the commit (and after I read your second very kind explanation). I initially thought it would be a "segfault-like" reaction of EMM386 when process would try to read outside of its segment. Only later I understood the issue is about a single read operation being potentially spread ("teared") over a segment boundary.
> I noticed that you're putting the packet driver signature letters in the comments. You can just cmp [...], "PK" for example, no need to write the hexadecimal number.Indeed, this seems to work - but I am afraid that I myself will be confused if I see such construct one year from now. :-P
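For anyone who wants to double-check which hexadecimal number a two-letter constant corresponds to: NASM stores the first character of a character constant in the lowest byte, so the value can be computed as follows. This is a Python one-liner for verification; "PK" is just the example from the post above, and the function name is made up.

```python
def nasm_char_constant(text: str) -> int:
    """Numeric value NASM gives a character constant: first character = low byte."""
    return int.from_bytes(text.encode("ascii"), "little")


print(hex(nasm_char_constant("PK")))
```

The letters appear "reversed" in the hex number because x86 words are stored low byte first, which is exactly why the string form is easier to read in a cmp against memory.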
> Okay, that's the information I was missing. Thank you for clarifying. So all handle-related DOS calls are actually taking an index to the (calling process) JFT, and not really a "handle". Didn't know that. Your code makes perfect sense now.That's just what "handle" means to DOS though. This is also why I call them process handles. The process handle points into the PHT and that contains an SFT index. Otherwise in fact you could not use redirection for different processes, as there is only one global SFT index "1" but we want different processes to possibly have different process handles "1" for their stdout for example.
> About the ghost B: drive: it is not something I am going to solve, but I have at least added a check for this to forbid the user trying a setup that won't work: https://sourceforge.net/p/ethflop/code/161/
Good idea.

Can you walk me through how to set up ethflop on a Debian server with dosemu2? I want the server and client part of ethflop to both run on the same Debian server. And I don't have root access to that machine. And it's not in a LAN. Is this supported? If you do help me achieve that I may look into adding the boot option myself.
> Can you walk me through how to set up ethflop on a Debian server with dosemu2? I want the server and client part of ethflop to both run on the same Debian server.
With DOSEMU2 the only way I think would be to set DOSEMU2's network interface to one tap, set up a second instance of DOSEMU2 to use another tap, and then initiate an Ethernet bridge (brctl addbr) to add both taps into. Slightly complex. And you need to be root to create the tap interfaces (with tunctl) and to operate the bridge (with brctl).

Can you use QEMU instead? With QEMU it's super simple, as QEMU supports a "socket" interface, no root needed and no third party pieces required. One QEMU instance listens on a TCP port, and the second QEMU instance connects to it. Then they send their Ethernet frames to each other inside this TCP connection. This is how to run it:

1st QEMU instance ("server", must be launched first):
qemu-system-i386 -m 1M -fda server.img -hda servhd.img -boot a -device pcnet,netdev=net0,mac=52:54:00:00:00:01 -netdev socket,id=net0,listen=127.0.0.1:1985

2nd QEMU instance ("client"):
qemu-system-i386 -m 1M -fda client.img -boot a -device pcnet,netdev=net0,mac=52:54:00:00:00:02 -netdev socket,id=net0,connect=127.0.0.1:1985

For this you do not need to be root, but you need QEMU to be installed, obviously. You can listen on something else than 127.0.0.1 if you like, which can be handy to connect two QEMU instances running on different machines. And you can use whatever TCP port you wish (assuming it's available, and 1024 or higher for non-root users). On the DOS VMs you need the PCNTPK.COM packet driver to use the PCNET (virtual) NIC.
> On the DOS VMs you need the PCNTPK.COM packet driver to use the PCNET (virtual) NIC.This? https://www.lazybrowndog.net/freedos/virtualbox/?page_id=321
> you mention that you created a thread on a mailing list in 2019 to modify the flags on stack.
> Your solution is good. However, I know a few alternatives.Thanks for the extra methods, that's very interesting. About the thread: it was not on a mailing list, but on a usenet group, alt.lang.asm. If you don't know it, I recommend you take a look. I'm pretty sure you will find it very interesting, it's all about x86 assembly on DOS. Many brilliant people, you'd fit perfectly. In a similar register there is also comp.lang.asm.x86. To access the usenet you'd need a newsgroup client (I use ClawsMail on Linux, but there is a lot of choice). You can take a peek through a web archive if you're curious: https://alt.lang.asm.narkive.com/ https://comp.lang.asm.x86.narkive.com/
> BTW this QEMU socket interface is obviously limited to two machines (because a TCP connection has two ends), but it would be fairly easy to extend this to more machines by creating a "software switch" that would listen on a tcp port, accept connections from QEMU instances and route frames to proper sockets.
Couldn't resist and created it this morning: https://sourceforge.net/p/qemusockhub/code/HEAD/tree/
Now with this I can network together an ethflop-server with up to 16 ethflop-client VMs. All running with QEMU, with no network configuration needed.

=====================================
usage: qemusockhub tcp_port

Example:

# run qemusockhub
qemusockhub 1985

# run 1st VM
qemu-system-i386 -m 1M -fda client1.img -boot a -device pcnet,netdev=net0,mac=52:54:00:00:00:01 -netdev socket,id=net0,connect=127.0.0.1:1985

# run 2nd VM
qemu-system-i386 -m 1M -fda client2.img -boot a -device pcnet,netdev=net0,mac=52:54:00:00:00:02 -netdev socket,id=net0,connect=127.0.0.1:1985

# run 3rd VM
qemu-system-i386 -m 1M -fda client3.img -boot a -device pcnet,netdev=net0,mac=52:54:00:00:00:03 -netdev socket,id=net0,connect=127.0.0.1:1985

(etc)
=====================================

qemusockhub is very crude, it's more a quick hack than a real program, but it works very well for the usage it is made for. QEMU offers also other networking backends, so maybe there is a solution that would not need such "hub" program, don't know, did not look further.
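For the curious, the core of such a "software switch" is small. The Python sketch below is not qemusockhub's code, which I have not studied. It only shows the two pieces of logic a hub like this needs, under the assumption that QEMU's TCP socket backend prefixes every Ethernet frame with a 32-bit big-endian length. Socket handling is left out, and the port numbers are arbitrary identifiers for the connected VMs.

```python
import struct

BROADCAST = b"\xff" * 6


def split_frames(buffer: bytes):
    """Cut complete length-prefixed frames out of a TCP byte stream.

    Returns (frames, rest), where rest is an incomplete tail that must be
    kept and prepended to the next chunk received on the same connection.
    """
    frames = []
    pos = 0
    while len(buffer) - pos >= 4:
        (length,) = struct.unpack_from(">I", buffer, pos)
        if len(buffer) - pos - 4 < length:
            break  # frame not fully received yet
        frames.append(buffer[pos + 4 : pos + 4 + length])
        pos += 4 + length
    return frames, buffer[pos:]


class Hub:
    """Learns which port each source MAC lives on and picks output ports."""

    def __init__(self):
        self.mac_to_port = {}

    def route(self, in_port, frame: bytes, all_ports):
        dst, src = frame[0:6], frame[6:12]
        self.mac_to_port[src] = in_port
        out = self.mac_to_port.get(dst)
        if dst == BROADCAST or out is None:
            # unknown or broadcast destination: flood to everyone else
            return [port for port in all_ports if port != in_port]
        return [] if out == in_port else [out]


hub = Hub()
frame = BROADCAST + bytes.fromhex("525400000001") + b"\x08\x00"
print(hub.route(1, frame, [1, 2, 3]))
```

A real hub would wrap this in an accept/select loop and re-add the length prefix when writing a frame to the chosen sockets.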
> The DOS OpenWatcom makefile references pktdrv.c but this file doesn't seem to exist?
When running wmake -f Makefile.dos there is an error message about not finding the file but compilation succeeds regardless. Running ethflopd (without parameters) in dosemu2 causes a crash. ethflopd /? works as expected.
> The DOS OpenWatcom makefile references pktdrv.c but this file doesn't seem to exist?Indeed - I didn't even notice it. It's weird that WCL agreed to proceed nonetheless. Makefile fixed now.
> Running ethflopd (without parameters) in dosemu2 causes a crash. ethflopd /? works as expected.I tested it now - ethflopd.exe loads as expected on my DOSEMU2 setup. Does it display anything on your screen before crashing?
> How is mdrs2025.lib built? Are the sources not included in the ethflop repo?It is an external, stand-alone (and open-source) library: https://mdrlib.sourceforge.io/ (ver 2025 is the library's trunk)
> Are there open source packet drivers for any NIC?see here, it's the golden reference in the packet driver world: http://crynwr.com/
> The pointer vs not pointer use of "pktint"Yes, this was one of the things that I fixed now, but this one was only cosmetic. The more serious one was this: https://sourceforge.net/p/mdrlib/code/131/tree//trunk/pktdrv/pktdrv.c?diff=6555eb2a6b7142b49cabfdac:130
> I managed to connect the client to the server. However, pressing Enter just toggles the "< NO FLOPPY >" text next to the client from normal to reverse video.That's probably because you don't have any floppy images available (yet). Otherwise the highlight would jump to the right side and allow you to select an image. I will see to make it clearer.
> ERROR: disk TEST failed to be initializedThis is the culprit I think. For some reason the server fails to inflate the image. The server should display some logs about this. Do you have enough disk space on the server? (at least 2.8M)
> Aha! I hadn't switched the current drive to my C: so ethflopd tried to write the images on A: which is of course too small to hold all these images.I made the same mistake (even topped it by starting ethflopd from CDROM drive)... Stupid me! Maybe the DOS server could do a) some error checking that the current directory is writable and/or b) print a warning if the free space is below a certain (at least standard floppy) size? Great piece of software anyway! Frank
> Maybe the DOS server could do a) some error checking that the current directory is writable and/or b) print a warning if the free space is below a certain (at least standard floppy) size?Checking for diskspace is not that easy in a portable way, but I think it could be enough if the server relayed to the client a clear error message, like "failed to init floppy image: out of disk space". The user would probably quickly understand what's going on then.
> cx can be zero even if NZ, if the last word didn't match. You should use "je", not "jcxz"
You are right. I so much wanted to use the cool-sounding "jcxz" at least in one place somewhere... Fixed now. Thanks. https://sourceforge.net/p/ethflop/code/169/tree//ethflop-client/trunk/ethflop.asm?diff=5d8fae32e39601433ecbde88:168
> Is it possible to run ethflopd on our Linux server, without root access,
No, ethflopd operates on a raw socket. This requires root (or at least the CAP_NET_RAW capability, but I do not know how to grant that without being root).
> When the client uninstalls with "ethflop u" it still seems to be displayed in the server's (DOS) UI.
That's normal, the server remembers the client forever. Thanks to this, when the client connects back, it gets its floppy back (if a floppy was left in drive).
> when I shut down both qemu machines and then restart the server and the client, then "ethflop b" and "ethflop i test", I get DOS critical errors on trying "dir b:". The server gets multiple "fread() failure: No such file or directory (TEST, sect 37)" messages then.
Sounds like your filesystem is broken. You should not shut down your PC when ethflopd is active, as it keeps file descriptors open all the time. This may be the cause of your issue.
> No, ethflopd operates on a raw socket. This requires root (...)
What you could do is run some minimalist Linux in QEMU. Compile ethflopd on the real server, and sync it (ssh/scp/rsync) to the QEMU-linux-ethflopd server. I also know Open Watcom runs on Linux - but I do not know if it supports cross-compilation (i.e. compile a DOS program using the Linux version of Open Watcom). Might be worth trying.
> I also know Open Watcom runs on Linux - but I do not know if it supports cross-compilation (ie. compile a DOS program using the Linux version of Open Watcom). Might be worth trying.
It does support cross-compilation. This is how the "ow" build of the FreeDOS kernel is built, as opposed to the "owdos" build which runs the toolchain in DOS. I dealt with the latter in https://github.com/FDOS/kernel/discussions/188
This is not the problem with OpenWatcom. I can build with Makefile.dos with DOS OW running in dosemu2. I dislike OpenWatcom because their license is considered open source by OSI but not free software by the FSF or the DFSG. There is an effort to change this but it hasn't been completed yet: https://github.com/open-watcom/open-watcom-v2/discussions/271
> Well, to each his own, I guess.
Or her own!
> Myself, I like OW very much, and if there is one thing I dislike it is the FSF's idea of "free software". :-)
I don't fully support the FSF but I do value concepts like the four freedoms. The OW relicensing issue links to a Debian issue, listing some of the problems with OW's current license: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=376431
> On another subject: have you tried qemusockhub by any chance? It allows to connect more than 2 QEMU VMs together over the "socket" network and makes things easier also with only 2 VMs because you do not have to worry about which VM to run first (and also networking is not disrupted if one of the VMs restarts while the other do not).
Oh, interesting, I may look into that.
> It mostly works with qemusockhub, but "ethflop b test" to both install as drive B: and insert the image doesn't seem to work.
This is not actually supported, is it? It just happened to be that when the server receives a registration of a previously connected client it restores the same image.
> If I run "make" for the qemusockhub it defaults to using cc which is presumably gcc. It complains that it doesn't recognise "-Weverything". Running "make CC=clang" works around this problem.
Yes, it is in "development" mode, there is a comment about that in the (super short) Makefile. :)
> It mostly works with qemusockhub, but "ethflop b test" to both install as drive B: and insert the image doesn't seem to work.
That is not intended to work. It's just "ethflop b", no further arguments. Perhaps the server could emit an error if it sees extraneous arguments (currently it ignores them).
> Also, quitting the server application leads to long waits when trying to access the disk.
The ethflop TSR waits up to 2s for every int 13h request that is not answered (and performs retransmissions every 440ms in the meantime). If DOS emits many such requests, then it's that many times 2s.
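A rough back-of-the-envelope model of those numbers, as a sketch only and not the TSR's actual logic. It assumes the first frame goes out at t=0 and a resend follows every 440 ms until the 2 s deadline; the function names are made up for illustration.

```python
TIMEOUT_MS = 2000      # per-request deadline quoted above
RETRANSMIT_MS = 440    # retransmission interval quoted above

def send_times_ms(timeout_ms=TIMEOUT_MS, interval_ms=RETRANSMIT_MS):
    """Times (ms) at which a frame would be (re)sent before giving up.

    Assumption: first send at t=0, then one resend per interval.
    """
    return list(range(0, timeout_ms, interval_ms))

def worst_case_stall_s(unanswered_requests, timeout_ms=TIMEOUT_MS):
    """Total wait when every request runs into the timeout."""
    return unanswered_requests * timeout_ms / 1000.0

print(send_times_ms())         # [0, 440, 880, 1320, 1760]
print(worst_case_stall_s(10))  # 20.0
```

Under these assumptions, ten unanswered int 13h requests in a row would already freeze DOS for about 20 seconds, which matches the "long waits" described.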
> And restarting the server doesn't seem to reconnect that client, it needs to be restarted as well to reconnect.
I am not sure this is supposed to work, I will check it.
> This is not actually supported, is it? It just happened to be that when the server receives a registration of a previously connected client it restores the same image.
Exactly, yes. The server keeps the client->floppy association in memory and hence if the client reconnects, it is immediately told which floppy it has in its "drive" (if any).
> It seems that the "has been installed" message is lacking a linebreak. (...) See what happens when running the client from a batch file:
The missing line breaks are intended (saves two bytes per string), DOS terminates them nicely. The "running from batch with ECHO enabled" is a scenario I will have to check. If it's not a FreeCOM bug I will add the line breaks after all.
> BŁĄD: Z TWOJEGO ADRESU NAPISANO JUŻ 20 WIADOMOŚCI W PRZECIĄGU OSTATNICH 24H. SPRÓBUJ PONOWNIE ZA JAKIŚ CZAS.
> Wróć do głównej strony
> Error message reads:
>> BŁĄD: Z TWOJEGO ADRESU NAPISANO JUŻ 20 WIADOMOŚCI W PRZECIĄGU OSTATNICH 24H. SPRÓBUJ PONOWNIE ZA JAKIŚ CZAS.
>> Wróć do głównej strony
Google translation:
> ERROR: 20 MESSAGES HAVE BEEN SENT FROM YOUR ADDRESS IN THE LAST 24 HOURS. PLEASE TRY AGAIN IN SOME TIME.
> Return to main page
(Interestingly, the Google translation to German instead uses wording equivalent to "sent to your address" rather than "from".)
> My purpose in trying to create the 30 MiB image is to see whether the 16 KiB of zeroed data is enough to cover the whole FAT and root directory, or whether it leaves parts uninitialised. (Is the file system always FAT12?)
According to my calculations we have 6 KiB per FAT for two FAT copies, plus 7 KiB for the root directory, plus 512 Bytes of boot sector. That's 12 + 7 + 0.5 = 19.5 KiB. So it would seem that the 16 KiB initialisation doesn't suffice to set the entire root directory to all-zeroes.
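The arithmetic above can be checked with a few lines. This is only a sketch: the 12 sectors per FAT and the 224 root entries are derived from the 6 KiB and 7 KiB figures in the post (512-byte sectors, 32-byte directory entries), not read from an actual ethflopd image, and the function name is made up.

```python
SECTOR = 512      # bytes per sector
DIR_ENTRY = 32    # bytes per FAT directory entry

def metadata_bytes(reserved_sectors, fat_count, sectors_per_fat, root_entries):
    """Bytes occupied by boot sector(s), FAT copies and root directory."""
    root_bytes = root_entries * DIR_ENTRY
    root_sectors = -(-root_bytes // SECTOR)  # round up to whole sectors
    return (reserved_sectors + fat_count * sectors_per_fat + root_sectors) * SECTOR

# Figures from the post: 1 boot sector, 2 FATs of 6 KiB (12 sectors) each,
# 7 KiB of root directory (assumed to be 224 entries of 32 bytes).
total = metadata_bytes(reserved_sectors=1, fat_count=2,
                       sectors_per_fat=12, root_entries=224)
zeroed = 16 * 1024

print(total, total / 1024)            # 19968 19.5
print(total - zeroed)                 # 3584
print((total - zeroed) // DIR_ENTRY)  # 112
```

With these inputs the last 3584 bytes of the root directory, i.e. 112 directory entries, fall outside the zeroed 16 KiB.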
> The server on DOS will simply seek to the desired image size, minus 1, then write a single byte to the file to create an image.
Indeed. I was seeking to the full image size before and then writing 0 bytes (as it's the "common" way to grow a file in DOS), but unfortunately such a write never returns an error, so I could not detect out-of-disk-space situations. Writing a single byte also does not return any error, but at least it returns the amount of written bytes so I can detect mismatches.
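The "seek to size minus 1, write one byte, compare the count" idea can be sketched as follows. This is an illustration on a modern OS, not the server's code: `grow_image` is a hypothetical name, and on most current filesystems the result is a sparse file that reads back as zeroes, unlike the DOS case discussed here.

```python
import os
import tempfile

def grow_image(path, size_bytes):
    """Create `path` with the given size by writing one byte at size-1.

    Returns True if exactly one byte was written and the file has the
    expected size, False otherwise (e.g. the disk filled up).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.lseek(fd, size_bytes - 1, os.SEEK_SET)
        written = os.write(fd, b"\0")   # returns the number of bytes written
    except OSError:
        return False                    # many systems raise instead (ENOSPC)
    finally:
        os.close(fd)
    return written == 1 and os.path.getsize(path) == size_bytes

with tempfile.TemporaryDirectory() as tmp:
    ok = grow_image(os.path.join(tmp, "test.img"), 1440 * 1024)
    print(ok)
```

The point is the same as in the post: the return value of the one-byte write is the only place where a full disk becomes visible.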
> After the header (first 16 KiB up to offset 4000h) this will leave the data in the file uninitialised. I was able to get the server to create a "dirty" image with deleted prior data in the file system.
Yes, I know. Should not be harmful since FAT is properly zeroed. The alternative would be to create the image writing all zeroes to it, or growing it with chsize(), but on my 386 this takes about 30 seconds for a 30M disk, so the ethflop TSR was always timing out. With the "dirty" method it works, but I had to increase the ethflop timeout to 2s (before v0.6.1 it was only 100ms).
> Trying to create a dirty 30 MiB image results in an error after a short delay:
>
> A:\>ethflop n31000 dirty3
> ERROR: server unreachable
This happens if the server answers too late. It's not a real problem (in the sense that the image gets created anyway, and further communication with the TSR is not impacted) but it does mean the user will miss the feedback about the disk having been created (or not). But as I was saying, with the dirty method it should work. Is your VM exceptionally slow? Another problem is that when the server is busy zeroing a 30M image, not only the requesting client will time out, but all other clients too - including some that may be doing some critical copy operations at this moment.
> Finally, the image boot sector seems to contain an infinite loop jump instruction when the image is created. This could be changed to have a no-op boot sector loader instead. For instance, https://hg.pushbx.org/ecm/bootimg/file/e1e9e530b4a2/bootimg.asm#l888
I will look, thanks.
> Also some ideas: I'd like ethflop to support LBA access.
Is there any practical value in LBA support, in the context of a FAT12 "floppy-like" drive?
> Other functions are passed to the server to handle as is
Indeed, I try to keep the "intelligence" on the server side as much as possible to keep the TSR's memory footprint small.
> My extensions would need to allow the server to send/reply with messages instructing the client to read from a buffer and write to a buffer.
Sounds like a lot of memory needed.
> During my studying the server for this I noticed that int 13h function 08h seems to not be supported yet. This is desirable as well.
Yes, fn 08 is on my todo list.
> According to my calculations we have 6 KiB per FAT for two FAT copies, plus 7 KiB for the root directory, plus 512 Bytes of boot sector. That's 12 + 7 + 0.5 = 19.5 KiB. So it would seem that the 16 KiB initialisation doesn't suffice to set the entire root directory to all-zeroes.
To be honest I thought that everything (incl. root entries) is in the FAT. If that's not the case then indeed, a newly created floppy on DOS might have some weird content. I will look closer into this. This is a possible problem only for the DOS version of the server, since the Linux version uses ftruncate() to grow the image, and this function zeroes everything on its own.
> Indeed. I was seeking to the full image size before and then writing 0 bytes (as it's the "common" way to grow a file in DOS), but unfortunately such a write never returns an error, so I could not detect out-of-disk-space situations. Writing a single byte also does not return any error, but at least it returns the amount of written bytes so I can detect mismatches.
Yes, this is true, and I think it is the correct way to do this.
> Yes, I know. Should not be harmful since FAT is properly zeroed. The alternative would be to create the image writing all zeroes to it, or growing it with chsize(), but on my 386 this takes about 30 seconds for a 30M disk, so the ethflop TSR was always timing out. With the "dirty" method it works, but I had to increase the ethflop timeout to 2s (before v0.6.1 it was only 100ms).
Perhaps the client could learn that it may need to wait longer in such a case? Also, why does it affect the subsequent n1440 command? Is it just that the server is still busy?
> This happens if the server answers too late. It's not a real problem (in the sense that the image gets created anyway, and further communication with the TSR is not impacted)
Not true in my experience; as mentioned, a subsequent n1440 command also failed.
> but it does mean the user will miss the feedback about the disk having been created (or not). But as I was saying with the dirty method it should work. Is your VM exceptionally slow?
It's a qemu running without KVM, as we do not have access to AMD SVM on this server. (I think because the server is already virtualised and they don't offer nested virtualisation.)
> Another problem is that when the server is busy zeroing a 30M image, not only the requesting client will time out, but all other clients too - including some that may be doing some critical copy operations at this moment.
I think there may be a way to avoid this problem. The expensive file system operation could be split into blocks, and between the blocks the server could check for and service other clients. Similarly to how XMS drivers will enable interrupts during large block moves.
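A minimal sketch of that "split into blocks and service clients in between" idea. The `service_clients` callback is a hypothetical stand-in for whatever the server does to poll and answer pending frames; none of this is taken from ethflopd.

```python
import os
import tempfile

def zero_fill_in_blocks(path, size_bytes, block_bytes, service_clients):
    """Write `size_bytes` of zeroes to `path`, one block at a time.

    `service_clients` is called after every block so that pending
    requests from other clients are not starved (hypothetical callback).
    """
    zeros = bytes(block_bytes)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = min(block_bytes, remaining)
            f.write(zeros[:chunk])
            remaining -= chunk
            service_clients()

calls = []
with tempfile.TemporaryDirectory() as tmp:
    img = os.path.join(tmp, "big.img")
    zero_fill_in_blocks(img, 100 * 1024, 16 * 1024, lambda: calls.append(1))
    size = os.path.getsize(img)

print(size, len(calls))   # 102400 7
```

The block size sets the worst-case delay other clients see: one block write instead of the whole image.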
> Is there any practical value in LBA support, in the context of a FAT12 "floppy-like" drive?
Yes, I'd like to check LBA support in the kernel or other programs. Beyond that I don't think ethflop necessarily needs to only support diskette drives.
> Indeed, I try to keep the "intelligence" on the server side as much as possible to keep the TSR's memory footprint small.
Understandable.
> Sounds like a lot of memory needed.
That remains to be seen.
>> According to my calculations we have 6 KiB per FAT for two FAT copies, plus 7 KiB for the root directory, plus 512 Bytes of boot sector. That's 12 + 7 + 0.5 = 19.5 KiB. So it would seem that the 16 KiB initialisation doesn't suffice to set the entire root directory to all-zeroes.
> To be honest I thought that everything (incl. root entries) is in the FAT. If that's not the case then indeed,
The File Allocation Table only contains FAT entries, one per data cluster in the FS. This entry can generally contain one of four types of contents: 0 = free, FF7h = bad, FF8h to FFFh = this cluster is allocated and the End Of Chain, else = this cluster is allocated and points to another allocated cluster. It does not contain any directory entries. The FAT12 FS is split into five types of sectors: reserved sectors (usually just the 1 boot sector), FAT sectors (usually 2 copies of the FAT back to back), root directory sectors of a fixed size, data sectors (including file data and subdirectory data), and possibly trailing unused sectors (at most cluster size minus 1).
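To make the four entry types concrete, here is a small sketch that unpacks 12-bit entries from raw FAT bytes and classifies them as listed above. It deliberately follows the simplified four-way split from the post (reserved values such as 001h or FF0h to FF6h are not treated specially), and the sample FAT bytes are made up for the example.

```python
def fat12_entry(fat, cluster):
    """Return the 12-bit FAT entry for `cluster` from the raw FAT bytes."""
    offset = cluster * 3 // 2                  # 1.5 bytes per entry
    word = fat[offset] | (fat[offset + 1] << 8)
    return word >> 4 if cluster & 1 else word & 0x0FFF

def classify(entry):
    """Map an entry to the four categories listed in the post."""
    if entry == 0x000:
        return "free"
    if entry == 0xFF7:
        return "bad"
    if entry >= 0xFF8:
        return "end of chain"
    return "next cluster"

# Entries 0 and 1 are reserved; cluster 2 -> 3, cluster 3 = end of chain,
# cluster 4 is free, cluster 5 is marked bad.
fat = bytes([0xF0, 0xFF, 0xFF, 0x03, 0xF0, 0xFF, 0x00, 0x70, 0xFF])
for c in range(2, 6):
    e = fat12_entry(fat, c)
    print(c, hex(e), classify(e))
```

Note that nothing in this table describes file names or sizes; those live in the directory sectors, which is why the root directory needs its own zeroing.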
> a newly created floppy on DOS might have some weird content. I will look closer into this. This is a possible problem only for the DOS version of the server, since the Linux version uses ftruncate() to grow the image, and this function zeroes everything on its own.
Yes, as I calculated, the largest image type will have trailing trash in the root directory. That means when enough entries are written to the root, DOS will encounter a bunch of invalid entries at the end.
> Perhaps the client could learn that it may need to wait longer in such a case?
Waiting longer is not really a solution, because other clients will still be impacted.
> Also, why does it affect the subsequent n1440 command? Is it just that the server is still busy?
Yes, I think the server was still busy. I will add a timer on the server to log the time it takes to create disks. This way there will be no doubt.
> It's a qemu running without KVM, as we do not have access to AMD SVM on this server. (I think because the server is already virtualised and they don't offer nested virtualisation.)
Even a 100% emulated QEMU should not be slower than my 386. I don't think the lack of performance lies in the CPU - maybe the I/O speed? Once I have the timer done in ethflopd we will know more.
> The expensive file system operation could be split into blocks, and between the blocks the server could check for and service other clients. Similarly to how XMS drivers will enable interrupts during large block moves.
It's an option, but it brings its own complications - for example the image cannot be mounted until all blocks are done. Another option could be to use a "copy on write", i.e. zero the "dirty" block the first time a client wants to access it (unless it has been written to by the client before). But this is also not without complication - the server would need to keep some map somewhere to know which blocks are "good". And, ultimately, the "dirty growing" method must stay anyway, because otherwise it is impossible to tell the client whether or not the disk creation operation succeeded. So we are effectively talking here about only writing the 20K of data to cover the boot sector + 2xFAT + root entries. Surely this can be done without over-engineering... I have to think about it and take some measurements.
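For illustration only, the "zero a dirty block on first access, and keep a map of good blocks" option could look roughly like this. It is an in-memory toy with a made-up class name; a real server would also have to store the map somewhere persistent, which is exactly the complication mentioned above.

```python
class LazyZeroImage:
    """Backing store with stale data that reads as zeroes until written.

    `clean` is the map the post mentions: one flag per block telling
    whether the block holds data the client may see.
    """
    def __init__(self, backing, block_size=512):
        self.backing = bytearray(backing)          # may contain stale data
        self.block_size = block_size
        self.clean = [False] * (len(backing) // block_size)

    def write_block(self, index, data):
        assert len(data) == self.block_size
        start = index * self.block_size
        self.backing[start:start + self.block_size] = data
        self.clean[index] = True

    def read_block(self, index):
        start = index * self.block_size
        if not self.clean[index]:
            # First access to a dirty block: zero it, then mark it clean.
            self.backing[start:start + self.block_size] = bytes(self.block_size)
            self.clean[index] = True
        return bytes(self.backing[start:start + self.block_size])

img = LazyZeroImage(b"\xAA" * 2048)          # four blocks of leftover junk
img.write_block(1, b"\x55" * 512)
print(img.read_block(0) == bytes(512))       # True: junk is hidden
print(img.read_block(1) == b"\x55" * 512)    # True: written data survives
```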
> It does not contain any directory entries.
Thank you for your patience and all the explanations. I really should have googled this one earlier.
> Another catch is that users might dislike the fact that they cannot take an ethflopd-created floppy image and use it with other floppy-image-processing programs (or mount it in a VM, etc).
Indeed, I would dislike that.
> I would dislike that.
To be honest I dislike it, too. But on the other hand I do like the idea of having thin provisioning to save disk space and make image generation fast. Ok, new concept: ethflopd will thin-provision floppy images larger than 2.88M, and do full pre-allocation for all smaller sizes (360K, 720K, 1.44M, 1.68M, 1.72M and 2.88M). This way for "standard" sizes we still get a proper image, and for the weird ethflop-specific sizes there is thin provisioning so ethflopd does not risk getting too busy when initializing them (+the images do not grow unnecessarily if they are not truly filled with data).
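The proposed rule boils down to a single threshold. A tiny sketch, assuming sizes are given in KiB the way the "n1440" and "n31000" commands suggest, and with a hypothetical function name:

```python
FULL_PREALLOC_LIMIT_KIB = 2880   # 2.88M, the largest "standard" size named above

def provisioning_mode(size_kib):
    """Return 'full' for sizes up to 2.88M, 'thin' for anything larger."""
    return "full" if size_kib <= FULL_PREALLOC_LIMIT_KIB else "thin"

for size in (360, 720, 1440, 1680, 1720, 2880, 31000):
    print(size, provisioning_mode(size))
```

Only the last size in the loop ends up thin-provisioned; every standard floppy size gets a fully allocated image that other tools can read.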
> The new DOS server (ethflopd 20240907) reports a strange drive letter
I forgot to hide that one. The drive letter is supposed to be reported only for newer ethflop TSRs (not yet released). The older ones (0.6 and 0.6.1) do not provide the local driveid information to ethflopd. It's fixed in svn now: https://sourceforge.net/p/ethflop/code/182/tree//ethflop-server/trunk/core.c?diff=5d8fae32e39601433ecbde88:181
> Today I found errors with MS-DOS 5.0 and MS-DOS 6.22 as client.
> The ethflop TSR loads, but when copying a file, I get an error: "Not ready error writing drive B. Abort, Retry, Fail?"
> Doesn't matter if the virtual floppy is at A: or B:
Interesting. In theory there should not be anything specific about MS-DOS (and I am sure I have tested it successfully in the past). I will try to reproduce it on my QEMU setup. Do you see any suspect logs on the server side? (On Linux all logs go to syslog, so you should look in your /var/log/messages or similar, depending on your distribution.)
Mateusz
> Wildly guessing, this may be related to changeline support.
My first thought as well.
> "Error at INT24"
I think it is a symptom, not the cause. INT 24 is the code that displays "Abort, Cancel, Retry...". I am not sure why it says "Error" but it should not have been called in the first place anyway.
> Should more tests in different virtualized environments (PCem, 86Box, Qemu, Bochs etc.) be necessary?
I wouldn't want to abuse your time, but maybe one last thing if you don't mind: is the issue also occurring with older releases of the ETHFLOP TSR (0.6.1 / 0.6)? If it's a regression it will be easier to locate.
> Maybe I'm stupid, or maybe my 86Box is somehow broken.
It was the former one. Turns out that once a NIC is configured in 86Box, it is "unconnected" by default. I just had to go to the "media" menu, then select the NIC and click on the "connect" checkbox... And now it works. And I am still unable to reproduce the problem. Here is my setup so far:

CLIENT:
- 86Box 4.2.1 b6130
- boots from a floppy with MS-DOS 6.00
- NIC set to AMD PCnet-PCI II
- ethflop 0.6.1
- packet driver PCNTPK 03.10

SERVER:
- QEMU, SvarDOS (EDR kernel), trunk ethflopd

Client and server communicate through VDE. It all works fine for me. I also successfully launched the server on 86Box and the client in QEMU, and everything still works fine. There are quite a lot of moving parts here, so I suppose you have hit some "lucky" combination. To reproduce the problem I really need to have your exact environment. Also, on your side you might try changing some things:
- boot from a floppy instead of a CDROM
- change the model of the emulated NIC card (and the packet driver)
- change the model of the emulated PC
- upgrade 86Box

Perhaps you would be able then to isolate the trigger element. BTW, before you run ethflop, is your B: drive working all right? (are you able to put a diskette image in it and access it?) I'm wondering if ethflop is catching the B: drive at all - maybe the BIOS performs some funny business with B: when it boots from CD and makes it inaccessible...
> Maybe stacksize is not the problem?
It would seem so, yes. So I am back at trying to reproduce the problem... And still failing at that :-/ I used your German MS-DOS 6.22 floppy image and mounted it in 86Box on an emulated Pentium II machine with the "Novell NE2000" NIC. Proceeded with a few adjustments:
- I disabled the keyb2 tsr so the keyboard layout does not drive me insane
- copied NE2000.COM v 11.4.3 to the floppy
- copied ETHFLOP.COM (trunk) to the floppy
- added a line in autoexec: NE2000 0x60 3 0x300

Then I boot into menu 7 ("plain DOS, only HIMEM.SYS, without CDROM"). And:

ethflop b
ethflop n1440 test
ethflop i test
copy command.com b:\test.dat

It works without issues. I also tried booting in the default menu (1) and kept the keyboard driver, did the same test. Also works. Network connectivity is over VDE to a local TAP interface where a linux ethflopd listens. Then I also tried replacing the linux ethflopd with a QEMU VM running the DOS version of ethflopd --> still works. Also tried changing the VM type to be some emulated 386SX PC --> still works. So it seems the combo "86Box + MSDOS + NE2000" is not enough. There is yet another ingredient needed to make the issue appear. I suspect either the version of 86Box (mine is 4.2.1 b6130) or a specific VM machine type (its BIOS).
> While downloading I checked the corresponding 86Box Changelog, it says:
> - Fixed loss of received packets on DEC and NE2000-based cards.
> Maybe this was the error?
I don't think so. A simple loss of some received packets should not have such impact. And, actually, ethflopd has a special mode to test it: there are two #defines in the CORE.C file, "SIMLOSS_INP" and "SIMLOSS_OUT". If set to non-zero, they mean that ethflopd will simulate packet loss at the given percentage, either for inbound frames or outbound frames. So "SIMLOSS_OUT 50" means that ethflopd will "forget" to send randomly about 50% of its answers. I tested it right now with your floppy: even in such heavily degraded mode it still works. It's *much* slower of course (because ethflop has to use lots of retransmissions), sometimes the DOS INT 24 pops out ("Abbrechen, Wiederholen, ...") but when told to "wiederholen" the copy eventually finishes and the file is good.
So I suppose the 86Box developers must have fixed something else around NE2000. Something much less innocent than just packet loss, that was causing some memory corruption or changed cpu registers. I googled quickly and found this: "Fixed emulator crash during Windows installation with NE2000-based cards" https://86box.net/2024/07/26/86box-v4-2.html This sounds like the kind of thing that could lead to the symptoms you had. It is not the first time apparently, because it seems they had a very nasty thing in NE2000 last year also: "Fixed NetWare packet corruption on NE2000 cards" https://86box.net/2023/08/26/86box-v4-0.html
Anyway, seems to be clearly an 86Box bug, so we can finally rest. :) It wasn't wasted time, though - the stack overflow detection might prove useful in the future. Plus, it was the occasion for me to learn about this "VDE" thing. In any case - thank you for your reports, and all the tests!
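The percentage-based loss simulation described for SIMLOSS_INP / SIMLOSS_OUT can be illustrated with a short sketch. This is not the code from CORE.C; the function names and the seeded random generator are assumptions made for the example.

```python
import random

def should_drop(loss_percent, rng):
    """True if this frame should be silently dropped (loss_percent in 0..100)."""
    return rng.random() * 100 < loss_percent

def simulate(frames, loss_percent, seed=1):
    """Count how many of `frames` frames survive the simulated loss."""
    rng = random.Random(seed)
    return sum(0 if should_drop(loss_percent, rng) else 1 for _ in range(frames))

print(simulate(1000, 0))     # 1000: nothing is dropped
print(simulate(1000, 100))   # 0: everything is dropped
print(simulate(1000, 50))    # roughly half survive
```

With a 50% setting about every second answer goes missing, so a transfer can still complete as long as the sender keeps retransmitting, only much more slowly.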
> The missing line breaks are intended (saves two bytes by string), DOS terminates them nicely. The "running from batch with ECHO enabled" is a scenario I will have to check. If it's not a FreeCOM bug I will add the line breaks after all.
I confirm that MS-DOS also behaves the same way (does not append an end of line after a program ends its last string mid-line), hence I've added proper CR/LF terminators to messages.