-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ETH_MAC indirection erratum #453
base: master
Are you sure you want to change the base?
Conversation
Pervasive misuse of "ETH_MAC *" (a pointer to an ETH_MAC, aka a 6 element unsigned char array) when a simple "ETH_MAC" is correct. The best example of this was eth_mac_fmt() in sim_ether.c with the following prototype: t_stat eth_mac_fmt (ETH_MAC* const mac, char* strmac) The first parameter is a pointer to an array of 6 unsigned characters, whereas it really just wants to be a pointer to the first element of the array: t_stat eth_mac_scan (const ETH_MAC mac, char* strmac) The "ETH_MAC *" indirection error also results in subtle memcpy() and memcmp() issues, e.g.: void network_func(DEVICE *dev, ETH_MAC *mac) { ETH_MAC other_mac; /* ...code... */ /* memcpy() bug: */ memcpy(other_mac, mac, sizeof(ETH_MAC)); /* or worse: */ memcpy(mac, other_mac, sizeof(ETH_MAC)); } eth_copy_mac() and eth_mac_cmp() replace calls to memcpy() and memcmp() that copy or compare Ethernet MAC addresses. These are type-enforcing functions, i.e., the parameters are ETH_MAC-s, to avoid the subtle memcpy() and memcmp() bugs. This fix solves at least one Heisenbug in _eth_close() while free()-ing write request buffers (and possibly other Heisenbugs.)
|
I wonder about the correct vs. incorrect thing. The problem is that C has a broken type system, and its way of treating an array as essentially synonymous with a pointer is what lies at the bottom of this. |
Quite true. However, passing around a pointer to an array vs. the pointer to the array's first element is the issue here. At least with a type-enforcing inline function, code is more "correct" (since the parameters are pointers to the array's first element.) |
You'd have been surprised how many places code was |
To put it another way, |
So lets see. You need to change 100's of lines in 10 files to change the view of a ETH_MAC which is a 6 byte object into an array of 6 bytes. You claim that there were places that "you'd have been surprised how many places code was memcpy-ing using the first 4 or 8 bytes of ". Given the 100's of lines of code affected in your proposed "fix", you don't specifically identify any of these 4 or 8 byte error cases. Looking through ALL of your changes, I can't find any memcpy (or memcmp) that you "fixed" with your massive changes that actually did what you claim (4 or 8 byte). |
You claim: "This fix solves at least one Heisenbug in _eth_close() while free()-ing write request buffers (and possibly other Heisenbugs.)", but there isn't actually a function named _eth_close(). There is a function named _eth_close_port() which makes no reference to ETH_MAC objects. Similarly there is a function named eth_close() which also makes no reference to any ETH_MAC objects. |
Just to make it really plain, I'll further annotate the example:
In other words, |
The reason for replacing all of the Example program where the extra indirection is an error (i.e., MAC address comes from a
Output:
|
Remove the extra
Output:
You get a consistent result for the case when the |
Oh. Gee. I had a typo. BFD. |
@markpizz There is an Much easier and more reliable to replace the |
I don't dispute this statement. However, there may (or may not) be one or at most a couple of places where this is mixed up. Fixing those few places would be an appropriate fix rather than the 100's of lines of code changes you've proposed. You carefully avoid pointing out any specific case (or cases).
Surely you can have a typo.
Independent of the typo, you claim that there is a problem there which, in fact you don't actually fix since you didn't actually find ANY case where the issue you claim has happened (wrong pointer to memcpy) is actually at fault. Your changes no longer demonstrate the memory corruption problem which occurs in the cleanup being done of a linked list WITHOUT ANY ETH_MAC objects being referenced by eth_close(). Your fix changes SO MANY things that you've merely hidden the potential heisenbug problem. You claim: "Going through all of the ETH_MAC memcpy()-s to track that down would have been very time consuming and error prone." Actually, there are maybe 3 places in essentially any running simulator which write ETH_MAC objects. Those should be somewhat easy to find, but there are other ways to track down a memory corruption problem which would be useful to track down the problem. Since you are already using MS's heap validator, there should be an easy way to force a heap validation check with a specific call in the very small number of places that actually copy ETH_MAC objects. There could be bugs in other code (outside of sim_ether) that corrupt the heap. I recently sent a message to the author of the pdp10-kl simulator about such a bug in kx10_imp.c that could possibly write outside of the contents of a packet. Maybe you want to provide specific details about how you reproduce the failing case. |
You're making a mountain out of a molehill. In terms of the number of lines changed, I'll agree with there a lot of lines changed. But it's not a massive structural change as you vigorously assert. What are those changes?
There are places where a packet buffer is used as an Perhaps I may have moved a Heisenbug's effect. I'm more confident that the Heisenbug is resolved. But that's the nature of a Heisenbug -- one cannot be completely certain. One thing is certain: An extra level of
And if you're unable to reproduce it? What would that prove? All it would prove is that you cannot reproduce the problem while I still get to suffer with it. I've solved a particular issue for which I've submitted a PR and from which the open-simh could benefit. |
Your PR DOES NOT specifically identify an actual problem and a fix for it. If you were solving the whole network problem from scratch, you could code the various pieces however you want, but the existing WORKING code happens to be written with a view of things that you personally wouldn't have chosen from scratch. Your PR does not prove that the problem has been solved, only that it seems to have not been detected in the same way as before. Interesting that you see this problem with only one simulator but your change requires changes to many simulators (dozens of simulators involving 10 files and hundreds of lines of change). Once again, describe the problem's reproduction and I will either demonstrate it and propose a very minimal fix, or prove that it has nothing to do with memcpy of ETH_MAC objects. |
First and foremost, using Secondly, this PR fixes a heap corruption issue encountered at line 2688,
Type safety is the issue here.
So, you're admitting there's a bug while claiming that I haven't actually found one?
Not changing them all to leverage the C language's type system while fixing one simulator would be irresponsible and incomplete. Of the 256 lines changed, many of those changes remove an unnecessary level of indirection so that the code only deals with
Windows:
Linux:
|
I think you're mistaking C for a strongly typed language. It isn't. |
Haskell, it ain't for sure. My contention is that it's a good idea to use the type system, such as it is, to catch subtle errors. This is one case where the type system helps, not hurts. |
There is some sort of bug which corrupts heap. The failure on exit proves that, but you haven't actually identified the specific bug. The failure on exit is NOT the bug. The bug happened some time before that and the failure on exit is the demonstration that the bug had been encountered during this particular run. Meanwhile, You've provided good instructions about how to see the failure. It starts with: "1. Create an ITS image. I've been using the pdp10-kl simulator." Since you've already got an ITS image that fails, will you make that available somewhere (Google Drive, FTP, github, etc.) so that hill doesn't need to be climbed to see the failure? Thanks. |
It's easy enough to build it, if @larsbrinkhoff or @rcornwell haven't uploaded it already somewhere. Standard, stock ITS using the generated SIMH configuration file with aforementioned tweaks to the IMP network. If you build it from the PDP-10/its repo on GH, you also get the SIMH configuration file in the |
Well, technically it is easy enough to build, BUT a build seems to take some 2 hours on my working environment AND the build doesn't attempt to build the ENVIRONMENT needed pieces until the very end. These may fail depending on whether all the dependencies are installed. One which wasn't mentioned in the repo was openssl for cbridge. There were a couple of others so it took some 6 hours to get through everything, since I started from scratch each time. Even if I didn't start from scratch, it still built all the contents again which seemed to be 99% of the elapsed time. I've got the results now. |
@markpizz: While I appreciate your minimalist approach to rectifying places where the Ethernet MAC address is embedded in a packet buffer (an That's one reason why I developed a more comprehensive solution. Ultimately, it makes the code safer because Let me underscore my point with an expanded version of the example code above:
Here's the output:
In the first two I'm not as confident as you that this latent bug only exists in three places. Hence the more comprehensive solution that ultimately makes the code safer, where |
Let's ask Copilot this question: Assume I have a C typedef, "typedef unsigned char ETH_MAC[6]". Why is it a bug to pass the array as a pointer, i.e., "ETH_MAC *", to functions? Answer: Loss of Array Size Information: When you pass an array as a pointer, the size information of the array is lost. The function receiving the pointer has no way of knowing the actual size of the array. This can lead to out-of-bounds access and buffer overflows if the function assumes a different array size. Pointer Arithmetic: When you pass ETH_MAC * to a function, it is treated as a pointer to an array of 6 unsigned char elements. However, pointer arithmetic on this type can be confusing and error-prone, as it may not behave as expected. For example, incrementing the pointer will move it by 6 bytes, not 1 byte. Unexpected Behavior: If the function modifies the contents of the array through the pointer, it may lead to unexpected behavior if the caller is not aware of the modifications. This can result in hard-to-debug issues, especially if the array is used elsewhere in the program. Type Mismatch: Passing ETH_MAC * to functions expecting a different pointer type can lead to type mismatches and potential undefined behavior. This can cause compilation errors or runtime issues. To avoid these issues, it is better to pass the array directly or use a pointer with explicit size information. |
As I said before, your hundreds of lines of code changes don't actually identify a problem anywhere, just something you wouldn't have done the way it was originally written. You provide examples that cover your theory, but don't actually reflect anything that was done in the original code. Meanwhile, I can reliably reproduce the failure you observe when closing the IMP device ethernet connection both with the https://github/simh/simh and https://github.com/bscottm/open-simh code in your PR. I'll provide an appropriate PR to fix this problem after I fully test the minimal change it involves. |
On the contrary, I identified a latent bug source that doesn't result in a catastrophic termination of SIMH. Latent bugs don't have to cause catastrophic termination; they are known to cause corruption and incorrect behavior. Instead of identifying individual instances of Coincidentally, it eliminated a tiresome heap corruption message each time I exited the PDP10-KL simulator. This isn't a coding style issue. It's a correctness issue -- |
Ultimately, @pkoning2 and the open-simh steering group decide whether to accept this PR. |
Pervasive misuse of "ETH_MAC *" (a pointer to an ETH_MAC, aka a 6 element unsigned char array) when a simple "ETH_MAC" is correct. The best example of this was eth_mac_fmt() in sim_ether.c with the following prototype:
The first parameter is a pointer to an array of 6 unsigned characters, whereas it really just wants to be a pointer to the first element of the array:
The "ETH_MAC *" indirection error also results in subtle memcpy() and memcmp() issues, e.g.:
eth_copy_mac() and eth_mac_cmp() replace calls to memcpy() and memcmp() that copy or compare Ethernet MAC addresses. These are type-enforcing functions, i.e., the parameters are ETH_MAC-s, to avoid the subtle memcpy() and memcmp() bugs.
This fix solves at least one Heisenbug in _eth_close() while free()-ing write request buffers (and possibly other Heisenbugs.)