Getting segfaults, on dmesg #261
Comments
PS: I'm not sure if there is a manual way to trigger fossilize so I can test it better.
I have also run Memtest, and it passed without errors.
This is usually an indicator of a CPU stressed by overclocking or by power limits that are held for too long. But this is the first time I've seen this on AMD. Nevertheless, maybe check your BIOS for overly relaxed overclocking settings enabled by default. Mainboard manufacturers often do this to win benchmarks for their marketing - but those defaults are not always safe. For Intel, the tuning knobs to fix this are usually the power limit and the power limit duration. I cannot recommend anything for AMD here. Other than that, I was also seeing the opcode errors a lot in the past. It's probably not much to worry about. But they seem mostly gone since I cleaned old and stale fossilize caches from my system, something like:
This removes all cache files not touched in the last 180 days. So if you haven't played a game for longer than that, its caches may be removed completely.
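A minimal Python sketch of this kind of age-based cleanup, for illustration only. It is not the command referred to above. The function names are hypothetical, reading "touched" as "the later of access time and modification time" is an assumption, and the directory to scan is whatever you pass in (no Steam path is assumed). It defaults to a dry run so nothing is deleted by accident.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days=180, now=None):
    """Return regular files under `root` not touched for `max_age_days`.

    Assumption: "touched" means the later of atime and mtime. On
    filesystems mounted with noatime/relatime, atime may lag behind.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for path in Path(root).rglob("*"):
        # Skip directories and symlinks; only plain files are candidates.
        if path.is_symlink() or not path.is_file():
            continue
        st = path.stat()
        if max(st.st_atime, st.st_mtime) < cutoff:
            stale.append(path)
    return sorted(stale)


def prune(root, max_age_days=180, dry_run=True):
    """List stale files and delete them only when dry_run is False."""
    stale = find_stale_files(root, max_age_days)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Calling `prune(some_cache_dir)` first and reviewing the returned list before re-running with `dry_run=False` keeps the cleanup reversible up to the last step. Close Steam before deleting anything.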
I supposedly have zero overclock here. By default, the chip "overclocks" itself until it reaches 85 °C, with the max frequency capped at 5 GHz. That's the default. The only thing I've done is a little undervolt and capping the max temperature at 70 °C, so it shouldn't be stressed at all... I will try cleaning the caches, but I would love to have some kind of manual mechanism to trigger shader cache compilation, so I can test after changing different BIOS settings.
Then it should be fine. As said, I'm not familiar with AMD. There is a tool to decode MCEs, tho:
Well, after cleaning, shader compilation should kick in automatically if you start Steam or a game. But since I do irregular cleanups, I'm no longer seeing an overly active fossilize process, nor do I get long compilation times when starting a game. Usually, there's only high activity after updating the graphics drivers and/or libs.
I closed Steam, cleared the whole shader cache directory (including for games I'm playing nowadays, even this morning), restarted the computer into the BIOS, checked that all the ASUS enhancement stuff on the overclock page was disabled (and yes, all of it was disabled), rebooted, and launched Steam, and fossilize is not kicking in. (No processes, no CPU load and no subdirectories inside the shadercache dir.)
Well, if this is different from previous behavior, things seemingly have improved. But of course that's completely unrelated to your original problem. Maybe check whether starting a game, playing for a little bit, and then quitting kicks off fossilize. You should try with a recent game and one that you haven't played for a long time. I don't know of a manual way to start fossilize. In theory, it's possible from the cmdline, but you should avoid trying that because Steam sets up a very tailored environment to run it, and running fossilize from outside this environment is not supported.
Oh, if you completely purged the directory, fossilize will only start pre-compiling caches after it has either downloaded its crowd-sourced caches or collected fresh caches from your gaming sessions. I may have misunderstood what you wrote.
I don't see it kicking in after playing a game. But it looks like it kicked in while playing the game, since the shadercache dirs were created. Now, I see that in the mesa_shader_cache_sf dir there are 2 dirs, and judging by the names, it looks like it's generating shader caches for both the dedicated GPU and the integrated GPU. Could it be something related to the integrated one?
So, I went to the storage settings in Steam, disabled downloading pre-compiled shaders and background processing, restarted Steam and enabled both. Fossilize_Replay started working. Single process. Very low CPU usage, and still getting a segfault? I'm not sure it's related to overclocking or stress on the CPU... (I've been checking the system monitor and dmesg and haven't seen a peak in CPU usage when those two segfaults happened.) Now, after these, there are some CPU usage "spikes" (which I know are normal with Fossilize, since I usually see multiple processes), but the CPU is at most at 15% global usage. System Monitor: And I just got these 2 messages:
I don't know if it means something that the message:
is always the same for all segfaults, referring to what I understand is the same memory address every time?
Well, fossilize works in two stages: While playing the game, it will record the shader pipeline (in its GPU-agnostic raw format). Additionally, it may download crowd-sourced pipeline caches so you get the "source code" of shaders that you, while playing the game, have not encountered yet. This does not yet compile the shaders for your GPU, it only collects what is used and thus creates the directories (well, ofc, your driver still compiles these shaders and caches them, but fossilize does nothing at this point except record the shader pipeline). Then, when the system is idle or while starting a game, fossilize will start compiling those caches using your GPU drivers - once for each GPU driver it encounters (this is the fossilize_replay process). This reduces or eliminates stutters in games because shader compilation on demand would block rendering. Your GPU drivers now create another set of files, similar in size to the pipeline caches previously created. These compiled pipelines are specific to your system (driver versions, GPU, libs) and need to be recompiled if any component changes. If you see two GPUs, I would expect your shader cache size to triple, because you have the raw pipeline from the game, and two compiled pipelines for your GPUs. Even tho shaders are GPU code, they are usually compiled on your CPU. The errors you're seeing may come from your integrated GPU driver if it does not support all types of shaders. Or they may come from old buggy caches. Thus, it's probably harmless. The opcode errors do not indicate a hardware problem. They may just come from incomplete driver support. You could check if the error is gone if you disable your integrated GPU. The memory address is a virtual address, so it probably points to a different physical memory location each time the process starts. As said, I don't think this indicates a hardware problem. And you already memtested, so your memory is fine.
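To compare crashes across process starts, one option is to reduce each kernel segfault line to the module name plus the offset of the instruction pointer inside that module's mapping, which stays stable even when the virtual addresses differ. Below is a minimal Python sketch of that idea. It assumes the common x86 log shape `name[pid]: segfault at ADDR ip IP sp SP error CODE in MODULE[BASE+SIZE]`; the exact format varies between kernel versions, so lines that do not match are simply reported without a module. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape (x86): "comm[pid]: segfault at ADDR ip IP sp SP
# error CODE in MODULE[BASE+SIZE]". The trailing "in ..." part is optional.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Parse one log line; return a dict, or None if it is not a segfault line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "addr": int(m["addr"], 16),    # faulting data address
        "ip": int(m["ip"], 16),        # instruction pointer at the fault
        "error": int(m["error"], 16),  # page-fault error code bits
        "module": m["module"],
        "offset": None,
    }
    if m["base"] is not None:
        # Offset of the faulting instruction inside the mapped module.
        info["offset"] = info["ip"] - int(m["base"], 16)
    return info


def summarize(lines):
    """Count segfaults per (module, offset) pair."""
    counts = Counter()
    for line in lines:
        info = parse_segfault(line)
        if info is not None:
            counts[(info["module"], info["offset"])] += 1
    return counts
```

Feeding it the saved output of `dmesg` (for example `summarize(open("dmesg.txt"))`, where the file name is a placeholder) gives one counter entry per distinct crash site. If nearly all entries share one module and offset, that points to a single code path in a library rather than random corruption.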
I haven't noticed any "real" issue when playing games. I was just trying to figure this out since it has always happened, and also because fossilize_replay is filling the system with coredumps:
There are a couple unrelated to fossilize, but the number of coredumps is.... yikes. I will try to see what happens if I disable the integrated GPU in the BIOS, but I hope that if that's the cause, this can be fixed, since I use it for some stuff...
Just in case, I tried launching steam like:
to see if that would dump some info. It triggered coredumps, but the DUMP_PATH is empty.
Still triggering the issue with the integrated GPU disabled from the BIOS.
Yeah, these are gone for me since some time now. But I also had a lot of them. For me, the remaining coredumps are from Maybe try launching Steam with Note: Be aware that disabling the CEF sandbox (for the embedded Chromium framework) may expose you to some risks if browsing websites inside the Steam client. |
I'm still getting way too many of these messages. I'm not sure when this gets updated in Steam, and at this point "all my CPUs" have segfaulted, but I have no issues when stress testing the machine or doing any other kind of heavy task like compiling Rust or playing some CPU-demanding games...
Hi,
I've been trying to figure out this issue. When fossilize is triggered, I usually get some segfaults in dmesg, for example:
At first I thought it could be related to the little undervolt I've done to the CPU, which hasn't caused any issues or crashed the system. I have been testing with Core Cycler, since it has stress tests for different CPU architectures and focuses on just a single core at a time. It passed 7 iterations in around 12 hours. So I'm not sure it's something related to that.
Also, checking for MCE errors in the journal, all I see is:
I'm not sure if I can get some more help figuring out what could it be here.
System Info:
Arch Linux (Updated)
Ryzen 7800X3D
32 GB of RAM
AMD 7800XT
Shader log in case it helps...
shader_log.txt