Nondeterministic C source code parsing between platforms #8

Open
PeterMatula opened this issue Aug 23, 2018 · 0 comments

The C sources under test are parsed with clang. The problem is that the resulting AST is not always the same on all supported platforms (Linux, Windows, macOS). Differences can probably occur even between machines running the same platform, and even when the same version of clang is used. System includes appear to play a role. The problem is most prominent when parsing call expressions, but it can probably occur in other situations as well.

Example:

#include <stdlib.h>

#include <arpa/inet.h>
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stropts.h>
#include <sys/prctl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main()
{
        int32_t set;
        sigaddset((struct _TYPEDEF_sigset_t *)&set, SIGINT);
}
  • Linux parses it correctly and recognizes the sigaddset call.
  • macOS does not parse the call at all; it ignores it completely.
  • macOS without the long list of includes parses it correctly and recognizes the sigaddset call.
  • macOS without the type cast (i.e. sigaddset(&set, SIGINT)) parses the call as __sigbits.
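
The __sigbits result suggests that a system header on that platform may define sigaddset as a function-like macro, so the call is expanded before the AST is built. This is an assumption, not something confirmed in this issue. A minimal sketch for checking header text for such definitions follows; the function name find_function_like_macros and the sample header fragment are hypothetical and written for illustration only.

```python
import re


def find_function_like_macros(header_text, names):
    """Return the subset of `names` defined as function-like macros in header_text."""
    found = set()
    for name in names:
        # Matches e.g. "#define sigaddset(set, signo) ...". A function-like
        # macro has '(' immediately after the macro name.
        pattern = r"^[ \t]*#[ \t]*define[ \t]+" + re.escape(name) + r"\("
        if re.search(pattern, header_text, flags=re.MULTILINE):
            found.add(name)
    return found


# Hypothetical header fragment, not copied from any real system header.
sample_header = """
#define sigaddset(set, signo) (*(set) |= __sigbits(signo), 0)
int sigemptyset(sigset_t *);
"""

print(sorted(find_function_like_macros(sample_header, ["sigaddset", "sigemptyset"])))
```

Running such a check over the headers reachable from the include list on each platform would show which expected calls could be rewritten by the preprocessor.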

Another example is the parsing of a strlcpy() call without a proper type signature. Linux parses it correctly, but macOS does not. macOS parses it only if the function (and its calls) use the full signature:

size_t strlcpy(char * restrict dst, const char * restrict src, size_t dstsize);

Possible solutions:

  • Can we force clang to use some custom set of includes that would be the same everywhere?
  • Can we remove the #include statements from the C sources before parsing them? (=> I don't think so; without them, some other function calls may not get parsed.)
  • Are includes the only problem?
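
The first two ideas above can be sketched as follows. This is a rough illustration under stated assumptions, not the framework's actual code: the helper names and the bundled header directory are hypothetical. The clang options used are -nostdinc, -isystem, and --target. Note that with -nostdinc the bundled directory would have to provide every header the tests include.

```python
import re


def deterministic_clang_args(bundled_include_dir, target=None):
    """Build clang arguments that avoid the host's own system headers."""
    args = [
        "-nostdinc",                      # skip the standard system include directories
        "-isystem", bundled_include_dir,  # treat the bundled headers as system headers
    ]
    if target is not None:
        # Pin the target triple so target-dependent header branches match everywhere.
        args.append("--target=" + target)
    return args


def strip_includes(source):
    """Blank out #include lines while keeping line numbers stable."""
    return re.sub(r"^[ \t]*#[ \t]*include\b.*$", "", source, flags=re.MULTILINE)


# "/opt/test-headers" is a hypothetical location for a fixed header set.
print(deterministic_clang_args("/opt/test-headers", target="x86_64-linux-gnu"))
print(repr(strip_includes("#include <signal.h>\nint main() { return 0; }\n")))
```

If the parsing goes through libclang's Python bindings (an assumption here), such a list could be passed as the args argument of Index.parse. Stripping includes keeps line numbers stable but, as noted above, may stop other calls from being parsed.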
PeterMatula added a commit to avast/retdec-regression-tests that referenced this issue Aug 23, 2018
In all the disabled cases functions Y are actually called on macOS, but the regression test framework does not parse the output C correctly and does not know about the calls. See this for more details: avast/retdec-regression-tests-framework#8
@s3rvac s3rvac added the bug label Dec 20, 2018