[BUG] NVMe device shows as failed because of just 3 media errors #729
Comments
The NVMe thresholds come from the links in https://github.com/AnalogJ/scrutiny/blob/master/webapp/backend/pkg/thresholds/nvme_attribute_metadata.go. If you can share some links detailing how Media Errors work and what an optimal range looks like, I'd be happy to tweak it.
Hi, there are no specific thresholds for media errors. What's important is that they don't increase. So a good feature (though a much more difficult one to implement) might be a button labeled "Accept as new acceptable value". The drive would then not be marked as failed until the number increases. This is in line with what I found online: as long as the number does not go up, it's OK. It would also make the status more useful, because right now I treat the drive as "permanently failed". If it said "Healthy" and only changed to "Failed" when the number increased, that would tell me to pay attention to it. As it is, I can easily overlook an increase in media errors, simply because the drive shows as "failed" all the time anyway. WDYT?
I can say I'd love a feature like this. Maybe make it flag the drive with a warning status instead of giving it a clean bill of health. I've got a drive with a few reallocated sectors, for example, which makes Scrutiny flag it as failed, but I can do multiple full disk writes and reads with no errors and that count doesn't increase. Between that and the redundancy/number of hot spares I have, it's good enough for my purposes.
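To make the two suggestions above concrete, here is a minimal sketch of how an accepted baseline combined with a warning status could behave. This is not Scrutiny's code: the `Status` type, its values, and the `Evaluate` function are hypothetical names made up for illustration, and the sketch assumes the accepted value is stored per device somewhere.

```go
package main

import "fmt"

// Status is a hypothetical health status. These names are illustrative
// and are not taken from Scrutiny's codebase.
type Status string

const (
	StatusPassed Status = "passed"
	StatusWarn   Status = "warn"
	StatusFailed Status = "failed"
)

// Evaluate compares the current media error count against a baseline
// the user has explicitly accepted (0 if nothing was ever accepted).
func Evaluate(current, accepted int64) Status {
	switch {
	case current > accepted:
		return StatusFailed // count grew past the accepted baseline
	case current > 0:
		return StatusWarn // non-zero, but not above the accepted value
	default:
		return StatusPassed
	}
}

func main() {
	accepted := int64(3) // assumption: user accepted the value while it was 3
	for _, current := range []int64{0, 3, 4} {
		fmt.Printf("media_errors=%d -> %s\n", current, Evaluate(current, accepted))
	}
}
```

With no accepted value (a baseline of 0), any non-zero count still fails, which matches the behavior described in this issue. Once a value is accepted, the drive only fails again when the count rises above it.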
I really think that 3 media errors that don't go up should not mark the device as failed. I read many forums online, and it seems that as long as this number is not increasing, it's fine. The threshold value is 0 here, but that makes no sense. Look:
The device is really healthy. Is there something I am missing?
I think it would be cool if I could set the expected value to 3 in the config. Then the disk would show as OK as long as the count did not increase. This would bring my attention to the disk whenever the errors went up, and if that starts happening often, I know it's time to replace it. WDYT?
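As a rough illustration of the config idea, the sketch below looks up a per-device expected value and falls back to 0 when none is set. Everything here is an assumption: Scrutiny has no such setting that I know of, and the `Overrides` type, the `IsHealthy` function, and the device key `nvme0` are hypothetical.

```go
package main

import "fmt"

// Overrides is a hypothetical map from a device identifier to the
// media error count the user expects for that device.
type Overrides map[string]int64

// IsHealthy reports whether the observed media error count stays within
// the configured expected value. A device with no entry gets 0, which
// is the threshold described in this issue.
func IsHealthy(o Overrides, device string, mediaErrors int64) bool {
	expected := o[device] // missing key yields the zero value, 0
	return mediaErrors <= expected
}

func main() {
	// Assumption: this map would be loaded from a config file.
	overrides := Overrides{"nvme0": 3}

	fmt.Println(IsHealthy(overrides, "nvme0", 3)) // within the expected value
	fmt.Println(IsHealthy(overrides, "nvme0", 4)) // count went up
	fmt.Println(IsHealthy(overrides, "nvme1", 3)) // no override configured
}
```

A static config value is simpler to implement than an accept button, but it has to be edited by hand each time a new count is considered acceptable.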