Q: Any way to reset/clear SMART attributes (i.e. 199 UDMA_CRC_Error_Count) #172

Open
stevecs opened this issue Dec 7, 2024 · 15 comments

@stevecs

stevecs commented Dec 7, 2024

More of a question, so if this is the wrong forum let me know. I have been looking for a way to clear/reset some SMART attributes. In particular, I am looking at 199 UDMA_CRC_Error_Count, as I have a good number of drives with non-zero values there due to misbehaving backplanes or bad cables/HBAs in the past.

Yes I can track each variable to see if it increases but that gets harder to see/monitor with hundreds of drives. The ability to reset/clear that to zero would be very useful. Likewise other values like 188 Command Timeout.
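
As a minimal sketch of the "track each variable" approach at fleet scale, the snippet below compares a stored baseline against a fresh reading and reports only the counters that grew. The snapshot shape, the serial numbers, and the function name are hypothetical; how the raw values are collected from each drive is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: {drive_serial: {attribute_name: raw_count}}
Snapshot = Dict[str, Dict[str, int]]


def find_increases(baseline: Snapshot, current: Snapshot) -> List[Tuple[str, str, int]]:
    """Return (serial, attribute, delta) for every counter that grew since the baseline."""
    increases = []
    for serial, attrs in sorted(current.items()):
        if serial not in baseline:
            continue  # a new drive needs a baseline before it can be compared
        old = baseline[serial]
        for name, value in sorted(attrs.items()):
            # An attribute missing from the baseline is treated as unchanged.
            delta = value - old.get(name, value)
            if delta > 0:
                increases.append((serial, name, delta))
    return increases


baseline = {
    "SERIAL_A": {"UDMA_CRC_Error_Count": 14, "Command_Timeout": 2},
    "SERIAL_B": {"UDMA_CRC_Error_Count": 0},
}
current = {
    "SERIAL_A": {"UDMA_CRC_Error_Count": 14, "Command_Timeout": 3},
    "SERIAL_B": {"UDMA_CRC_Error_Count": 5},
}
for serial, name, delta in find_increases(baseline, current):
    print(f"{serial}: {name} increased by {delta}")
```

With this approach a historic non-zero value such as 14 stays silent until it changes, which gives much of the benefit of a reset without needing drive support for one.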

I know that these can be cleared by the OEM on refurbished drives, and I've seen some instances where they can be cleared with certain firmware updates. But I have not found any means so far to clear them for general/advanced users.

The vast majority of our rotating rust drives are Seagate, if that matters (ST4000s through ST20000s), in case it's an OEM-specific type of command.

@vonericsen
Contributor

Hi @stevecs,

Thanks for the question!
There is no way to do this for SMART attributes on ATA drives.
SAS is a bit different with the ability to reset log pages (but not every counter is resettable).

Since SMART attributes are obsolete and being replaced with Device Statistics, I did check and there is a feature that can be used to reset some statistics. The SATA Phy event counters log also has something like this.
openSeaChest does not currently have an option to reset these, but I will look into adding those options.

The phy event counters log does support the CRC counter and device statistics has both CRC counter and command timeouts (It's called "Number of Resets Between Command Acceptance and Command Completion").

I will test a few different products for these features as well and update this issue as I find out more.

@stevecs
Author

stevecs commented Dec 9, 2024

@vonericsen Thanks for taking a look; I will be interested in what you find.

Yes, I've been seeing the 'slow demise' of SMART attributes over the years (not to mention that they were never really standardized or enforced), but they did at least provide a lot of data that was very useful (and I have always wanted similar details in SAS/SCSI/FC devices over the last ~40 years).

I was not aware of "Device Statistics" for SATA drives (to be fair, I only have a couple hundred SATA drives; most are SAS/FC or NVMe). So that's interesting. I would be interested if you could point to any URLs for specs or standards for that, as "bedtime reading".

@vonericsen
Contributor

@stevecs,

> Yes I've been seeing the 'slow demise' of SMART attributes over the years (not to mention that they were never really standardized or enforced) but they did at least provide a lot of data that was very useful (and have always wanted similar details in SAS/SCSI/FC devices over the last ~40 years).

Yeah, there are multiple reasons for this, some dating back to when SMART was released in ATA-3.
There was an attempt to create standardized attributes that made it as far as a draft, but it ran into other issues. One of them was that vendors wanted to report more data than SMART had space to report, which was one of the main driving factors for creating the Device Statistics log in its place. There is also a technical report called SMART Attribute Descriptions (SAD), which was the reference the committee used to determine which attributes should be standardized, based on what they could find searching around the web and what was reported by members of the committee.

Seagate's firmware group has not given any timeline in which SMART attributes will be removed, but the device statistics log has been supported for quite a while now.

> I was not aware of "Device Statistics" for SATA drives (to be fair, I only have a couple hundred SATA drives most are SAS/FC or NVME). So that's interesting. Would be interested if you could point to any URL's for specs or standards to that for "bedtime reading".

ACS-3 was the first spec to define the majority of the device statistics log on SATA and it is very similar to the standardized outputs from SAS/FC log pages. The SAT specs even translate many of these statistics to these log pages today as well.
All the most common attributes now have a statistic, although some may have a slightly different name (like I mentioned about command timeout).
There have been a few additions over time to the log for more statistics, including for Zoned devices and most recently CDL (command duration limits).
I have not had a chance to see how long the option to reinitialize certain statistics has been in the standard but I will be looking that up when I get started on implementing that option.

We support showing that page with --deviceStatistics in openSeaChest_SMART, and I also added support for SAS devices to read the various log pages to get similar output.

One other part of device statistics added to the standard is Device Statistics Notifications. The idea here is that the drive can generate a sense code when one of these notifications triggers, similar to a SMART trip type of event. It can be based on a firmware-monitored event, and the standard also supports programmable notifications.
I do not think there is much support for setting notifications from software yet, which is why it is not yet part of openSeaChest. However, we do have a way to note which statistics allow setting a notification.

vonericsen added a commit to Seagate/opensea-operations that referenced this issue Feb 14, 2025
Refactored the code to simplify it and start reading which statistics support reinitialization.
Will need to identify a product that supports statistic reinitialization for full testing.

[Seagate/openSeaChest#172]

Signed-off-by: Tyler Erickson <[email protected]>
vonericsen added a commit to Seagate/opensea-operations that referenced this issue Feb 18, 2025
Adding function to set the date and time timestamp and updating how it is displayed to convert it to a more human readable timestamp.

Adding a function to reset/reinitialize supported device statistics.

Cleaned up more about how a statistic is looked up from its page and offset to be more readable and maintainable.

[Seagate/openSeaChest#172]

Signed-off-by: Tyler Erickson <[email protected]>
vonericsen added a commit to Seagate/opensea-operations that referenced this issue Feb 18, 2025
Adding support for issuing the read log ext with the feature field set to 1 to trigger a reset of the phy event counters.

[Seagate/openSeaChest#172]

Signed-off-by: Tyler Erickson <[email protected]>
vonericsen added a commit that referenced this issue Feb 18, 2025
… options

Adding options to set the device timestamp as well as new options to issue the command to reset the SATA phy event counters and supported device statistics that support resetting.

[#172]

Signed-off-by: Tyler Erickson <[email protected]>
@vonericsen
Contributor

I have created the branch feature/SATA_Dev_Stats_and_Phy_Counters_Refresh and have added the initial code to support issuing the commands that reset supported device statistics and the SATA phy event counters log.
The options are in the openSeaChest_Info utility at the moment since that also supports outputting these logs.

I am still working on testing, but feel free to pull this and test it out in the meantime.
Resetting SATA phy event counters has been in the standards for a while, but resetting device statistics is newer and I'm not sure which statistics support resetting at this time.
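
For readers wondering what the phy event counter reset looks like at the command level: per the commit message in this thread, it is a READ LOG EXT with the feature field set to 1. The sketch below only models the register fields and does not send anything to a drive. The `ReadLogExt` class and `phy_event_read` helper are hypothetical names, and the field layout (opcode 2Fh, log address 11h in LBA bits 7:0) is my reading of the ATA/SATA specifications rather than openSeaChest code, so verify it against the standards before relying on it.

```python
from dataclasses import dataclass

READ_LOG_EXT = 0x2F        # ATA READ LOG EXT command opcode
SATA_PHY_EVENT_LOG = 0x11  # SATA Phy Event Counters log address


@dataclass(frozen=True)
class ReadLogExt:
    """Hypothetical model of the register fields for one READ LOG EXT command."""
    feature: int  # 1 requests a counter reset after the read (per the commit message)
    count: int    # number of 512-byte log pages to transfer
    lba: int      # LBA field; bits 7:0 carry the log address
    command: int = READ_LOG_EXT


def phy_event_read(reset_after_read: bool) -> ReadLogExt:
    # Page number 0 leaves the upper LBA bits zero, so the LBA is just the log address.
    return ReadLogExt(
        feature=1 if reset_after_read else 0,
        count=1,
        lba=SATA_PHY_EVENT_LOG,
    )


print(phy_event_read(reset_after_read=True))
```

Actually issuing the command requires an OS-specific ATA passthrough, which is what the openSeaChest libraries handle.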

vonericsen added a commit that referenced this issue Feb 20, 2025
…e stats

Pulling in the library bug fix from populating device statistics.
Also added in additional Seagate device erase statistics that were already in openSeaChest_Info

[#172]

Signed-off-by: Tyler Erickson <[email protected]>
vonericsen added a commit to Seagate/opensea-operations that referenced this issue Feb 20, 2025
Adding CDL device statistics on SATA drives.
This code supports concurrent ranges 0-3 per ACS-6 statistics.
This also handles the difference between whole device policies and concurrent range policies depending on what the drive populates when it is read.

All these statistics are defined in the spec as supporting the read then initialize feature as well according to the standards.

[Seagate/openSeaChest#172]

Signed-off-by: Tyler Erickson <[email protected]>
vonericsen added a commit that referenced this issue Feb 20, 2025
@LAN007w

LAN007w commented Apr 1, 2025

> I have created the branch feature/SATA_Dev_Stats_and_Phy_Counters_Refresh and have added the initial code to support issuing the commands that reset supported device statistics and the SATA phy event counters log. The options are in the openSeaChest_Info utility at the moment since that also supports outputting these logs.
>
> I am still working on testing, but feel free to pull this and test it out in the meantime. Resetting SATA phy event counters has been in the standards for a while, but resetting device statistics is newer and I'm not sure which statistics support resetting at this time.

Hello,

I have cloned the /SATA_Dev_Stats_and_Phy_Counters_Refresh branch and ran OpenSeaChest_Info --help, but I did not find any instructions regarding the reset function for SATA Phy event counters. Does this mean that the reset functionality is currently unavailable?

Thank you for your assistance!

@vonericsen
Contributor

Hi @LAN007w,

The option should be there. It would be under the "SATA Only" section of the help.
I see it pushed, so I don't think it's an error with me not sharing 😄

The option is --resetATAPhyEvents so you can try passing that in or grepping the output for it just to verify.

Also, just to verify, you have cloned the feature/SATA_Dev_Stats_and_Phy_Counters_Refresh branch for openSeaChest and opensea-operations? (If you did a recursive clone of openSeaChest on this branch, it would have pulled the operations branch as well)

@LAN007w

LAN007w commented Apr 1, 2025

> Hi @LAN007w,
>
> The option should be there. It would be under the "SATA Only" section of the help. I see it pushed, so I don't think it's an error with me not sharing 😄
>
> The option is --resetATAPhyEvents so you can try passing that in or grepping the output for it just to verify.
>
> Also, just to verify, you have cloned the feature/SATA_Dev_Stats_and_Phy_Counters_Refresh branch for openSeaChest and opensea-operations? (If you did a recursive clone of openSeaChest on this branch, it would have pulled the operations branch as well)

Hi @vonericsen ,

Thank you very much for your suggestions! I followed your advice and re-cloned the /SATA_Dev_Stats_and_Phy_Counters_Refresh branch. After running --resetATAPhyEvents, I received the response: “Successfully reinitialized SATA Phy event counter log.” Additionally, after running the --resetDevStats transport, I received: “Successfully reinitialized Device Statistics. NOTE: Only statistics marked with the read then initialize supported bit were reinitialized.”

However, when I tried to view the results using --deviceStatistics, I encountered an error with code 22.

To further investigate, I switched to openSeaChest-v24.08.1 and ran the check again. Unfortunately, I found that the "Number Of Interface CRC Errors" still shows as 14.
Drive Model: ST4000VX016
OS: Windows 11

Any insights or suggestions you may have would be greatly appreciated!

Thank you again for your help.

@vonericsen
Contributor

Hi @LAN007w,

The device statistics log command to reset statistics can only reset the ones that are marked as reset capable.
The spec specifically calls out ZAC statistics and CDL statistics as needing this. I do not see limitations on the others, so it is up to the firmware to decide.

Resetting the phy event counters log will only reset that page.
After the reset, running --showPhyEvents should show the counters in this log reset. One of these is a CRC counter specific to the SATA interface.
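
To make the structure of that log more concrete, here is a minimal sketch of how one 512-byte phy event counters page could be decoded. The layout used (4 reserved bytes, then identifier/value pairs, identifier bits 14:12 giving the value length in 16-bit words, identifier 0 ending the list, checksum in byte 511) and the two counter names are my understanding of the SATA specification, not something stated in this thread, so treat them as assumptions. The sample page is synthetic.

```python
import struct
from typing import Dict

# Assumed names for two identifiers; check the SATA specification for the full list.
COUNTER_NAMES = {
    0x001: "Command failed due to ICRC error",
    0x00A: "Device-to-host register FISes sent due to a COMRESET",
}


def parse_phy_event_counters(log: bytes) -> Dict[int, int]:
    """Map counter identifier -> value for one 512-byte phy event counters page."""
    if len(log) != 512:
        raise ValueError("expected one 512-byte log page")
    counters = {}
    offset = 4  # bytes 0..3 are reserved
    while offset + 2 <= 511:  # byte 511 is the checksum
        (ident,) = struct.unpack_from("<H", log, offset)
        if ident == 0:
            break  # identifier 0 marks the end of the list
        size = ((ident >> 12) & 0x7) * 2  # bits 14:12: value length in 16-bit words
        offset += 2
        if size == 0 or offset + size > 511:
            break  # malformed entry; stop rather than read past the data area
        counters[ident & 0x8FFF] = int.from_bytes(log[offset:offset + size], "little")
        offset += size
    return counters


# Synthetic page: counter 0x001 as a 32-bit value of 14, counter 0x00A as a 16-bit value of 3.
page = bytearray(512)
struct.pack_into("<HI", page, 4, 0x2001, 14)
struct.pack_into("<HH", page, 10, 0x100A, 3)
for ident, value in parse_phy_event_counters(bytes(page)).items():
    print(f"{COUNTER_NAMES.get(ident, hex(ident))}: {value}")
```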

> However, when I tried to view the results using --deviceStatistics, I encountered an error with code 22.

Can you share the full output you received when this error happened?
We have made a lot of security enhancements, and one of them may have triggered this error (memory bounds checking).
I have not yet merged some fixes for this kind of error from the develop branch into this one, but I can do that since we have fixed a few known cases of it showing up. Before I do that, I want to make sure this error is not unique to your drive and what it is reporting.

@LAN007w

LAN007w commented Apr 1, 2025

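> Hi,
>
> The device statistics log command to reset statistics can only reset the ones that are marked as reset capable. The spec specifically calls out ZAC statistics and CDL statistics as needing this. I do not see limitations on the others, so it is up to the firmware to decide.
>
> Resetting the phy event counters log will only reset that page. After the reset, running --showPhyEvents should show the counters in this log reset. One of these is a CRC counter specific to the SATA interface.
>
> However, when I tried to view the results using --deviceStatistics, I encountered an error with code 22.
>
> Can you share the full output you received when this error happened? We have made a lot of security enhancements, and one of them may have triggered this error (memory bounds checking). I have not yet merged some fixes for this kind of error from the develop branch into this one, but I can do that since we have fixed a few known cases of it showing up. Before I do that, I want to make sure this error is not unique to your drive and what it is reporting.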

D:\002\openSeaChest\Make\VS.2019\x64\Debug>OpenSeaChest_Info -d PD0 --deviceStatistics

openSeaChest_Info - openSeaChest drive utilities - NVMe Enabled
Copyright (c) 2014-2025 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
openSeaChest_Info Version: 2.8.0-9_0_0 X86_64
Build Date: Apr 1 2025
Today: 20250402T004034 User: admin

Attempting to open handle "\\.\PhysicalDrive0"
WIN: opened dev
WIN: Checking for volumes
WIN: getting SCSI address
WIN: det adapter descriptor
Adapter BusType: SATA
WIN: get device descriptor
WIN: get adapter IDs (VID/PID for USB or PCIe)
WIN: Get MiniPort FWDL capabilities
WIN: get Win10 FWDL support
Got Win10 FWDL Info
Supported: 1
Payload Alignment: 512
maxXferSize: 131072
PendingActivate: 255
ActiveSlot: 0
Slot Count: 1
Firmware Shared: 0
Firmware Slot 0:
Read Only: 0
Revision: CV10
Drive BusType: SATA
WIN: get SMART IO support SATA
WIN: filling device information
fill_Drive_Info_Data: -->
fill_In_ATA_Drive_Info -->
Drive type: 1
Interface type: 1
Media type: 0
SN: WW64T4QV
fill_In_ATA_Drive_Info <--
Drive type: 1
Interface type: 1
Media type: 0
fill_Drive_Info_Data: <--
WIN: Additional CSMI check
WIN: Looking for CSMI IO support

\\.\PhysicalDrive0 - ST4000VX016-3CV104 - WW64T4QV - CV10 - ATA
get_ATA_Log_Size: logAddress 4, gpl=true, smart=true
get_ATA_Log: -->
get_ATA_Log: <--
===Device Statistics===
* = condition monitored with threshold (DSN Feature)
! = monitored condition met
- = supports notification (DSN Feature)
^ = supports reinitialization/reset
Statistic Name: Threshold: Value:

---General Statistics---
LifeTime Power-On Resets N/A 29
Power-On Hours N/A 112 hours
Logical Sectors Written N/A 1510784300
Number Of Write Commands N/A 5986883
Logical Sectors Read N/A 25059814096
Number Of Read Commands N/A 129137635
Date And Time Timestamp N/A abort_handler_s: safe_asctime: time_ptr->tm_year out of range
Error code: 22
Additional Error info:
File: device_statistics.c
Line: 8840
Function: print_Date_And_Time_Timestamp_Statistic
Expression: safe_asctime(timestr, TIME_STRING_LENGTH, milliseconds_Since_Unix_Epoch_To_Struct_TM(theStatistic.statisticValue, &time))

@vonericsen
Contributor

@LAN007w,

Thanks for the information!
I think the issue is a change in newer ACS standards for how this statistic can report its value.
I'm looking into the details to update the parsing of this statistic.
I'll push an update to this issue when I figure out the correct change...hopefully not too long 😄

vonericsen added a commit to Seagate/opensea-operations that referenced this issue Apr 1, 2025
The date and time timestamp statistic can report either the most recent value or the power on hours in milliseconds.
We need a little more work beyond this commit, but this will workaround the crash/failure for now.

[Seagate/openSeaChest#172]

Signed-off-by: Tyler Erickson <[email protected]>
@vonericsen
Contributor

@LAN007w,

If you pull the feature branch on the opensea-operations submodule, I have made a workaround for now that should get past this spot in device statistics.
I'm working on a longer change, but I think it will work around the current error that is happening.

You can do this by going into the subproject opensea-operations and running git checkout develop followed by git pull --rebase.
I'll do the larger parser update before I push a change to openSeaChest to pick up this change in the submodule.

@LAN007w

LAN007w commented Apr 1, 2025

> @LAN007w,
>
> If you pull the feature branch on the opensea-operations submodule, I have made a workaround for now that should get past this spot in device statistics. I'm working on a longer change, but I think it will work around the current error that is happening.
>
> You can do this by going into the subproject opensea-operations and running git checkout develop followed by git pull --rebase. I'll do the larger parser update before I push a change to openSeaChest to pick up this change in the submodule.

Hi @vonericsen,

Thank you so much for your detailed explanation and for providing a temporary workaround. I really appreciate your efforts in addressing the issue, and it’s great to hear that further updates are being worked on. I’ll follow your suggestion to check out the develop branch and pull the latest changes.

Thanks again for your help and for keeping the community updated! Looking forward to the upcoming fixes.

Best regards,
LAN007w

@vonericsen
Contributor

@LAN007w,

No Problem!

One more thing while I am working on changes: can you share the verbose output?
openSeaChest_Info -d <handle> --deviceStatistics -v 4
You can redirect this to a file with | tee verboseStats.txt or > verboseStats.txt at the end of the line.

This will print out all the drive's raw data responses from reading the log.
I'm trying to see which bits it is setting for the statistic (valid, normalized, etc) to compare to what I have as well as the value it returns so I can verify it is converting it properly.
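
As an illustration of the per-statistic bits mentioned above (valid, normalized, and so on), the sketch below splits one 64-bit statistic entry into its flag byte and value. The bit positions are my reading of the ACS device statistics definition (bit 63 supported, 62 valid, 61 normalized, 60 supports DSN, 59 monitored condition met, 58 read then initialize supported) and are not taken from this thread; the names `Statistic` and `decode_statistic` are hypothetical. Please check the positions against the standard.

```python
from typing import NamedTuple


class Statistic(NamedTuple):
    supported: bool
    valid: bool
    normalized: bool
    supports_dsn: bool
    condition_met: bool
    read_then_init: bool
    value: int


def decode_statistic(qword: int, value_bytes: int = 7) -> Statistic:
    """Split a little-endian-decoded 64-bit statistic into flags and value.

    value_bytes is how many low-order bytes this statistic uses (1 to 7);
    it varies per statistic, so the caller has to supply it.
    """
    if not 1 <= value_bytes <= 7:
        raise ValueError("value_bytes must be between 1 and 7")
    flags = (qword >> 56) & 0xFF  # the top byte carries the flag bits
    mask = (1 << (8 * value_bytes)) - 1
    return Statistic(
        supported=bool(flags & 0x80),
        valid=bool(flags & 0x40),
        normalized=bool(flags & 0x20),
        supports_dsn=bool(flags & 0x10),
        condition_met=bool(flags & 0x08),
        read_then_init=bool(flags & 0x04),
        value=qword & mask,
    )


# Synthetic entry: supported + valid, value 112 (like the Power-On Hours shown in the log above).
raw = (0xC0 << 56) | 112
print(decode_statistic(raw, value_bytes=4))
```

An entry whose `read_then_init` flag is clear would, under this reading, be left untouched by a reset request, which matches the note in the tool output about only flagged statistics being reinitialized.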

@LAN007w

LAN007w commented Apr 1, 2025

> @LAN007w,
>
> No Problem!
>
> One more thing as I am working on changes, can you share the verbose output? openSeaChest_Info -d <handle> --deviceStatistics -v 4 You can redirect this to a file with | tee verboseStats.txt or > verboseStats.txt at the end of the line.
>
> This will print out all the drive's raw data responses from reading the log. I'm trying to see which bits it is setting for the statistic (valid, normalized, etc) to compare to what I have as well as the value it returns so I can verify it is converting it properly.

@vonericsen

verboseStats.txt

vonericsen added a commit to Seagate/opensea-operations that referenced this issue Apr 1, 2025
…nd time timestamp

Fixing a bug in the set date and time timestamp command for ATA drives.
It was checking the wrong byte when validating if the timestamp is supported.

When printing the date and time timestamp it is possible for it to be representing the power on time in milliseconds, so this also adds support for that.

[Seagate/openSeaChest#172]

Signed-off-by: Tyler Erickson <[email protected]>
vonericsen added a commit to Seagate/opensea-common that referenced this issue Apr 1, 2025
Fixing a conversion error when converting from milliseconds since the Unix Epoch (Jan 1, 1970) to struct tm.
The year was initialized wrong, then when converting to tm_year it was not adjusting based on the unix epoch and leading to an invalid/out of range year during the conversion.

[Seagate/openSeaChest#172]

Signed-off-by: Tyler Erickson <[email protected]>
vonericsen added a commit that referenced this issue Apr 1, 2025
…nch changes

Pulling in fixes that were made in the develop branch of various libraries as well.
This will fix other bugs we've also run into due to bounds checking on the develop branch.

[#172]

Signed-off-by: Tyler Erickson <[email protected]>
@vonericsen
Contributor

@LAN007w,

Thanks!
I found a conversion error that was part of the issue as well.
I tested it on a drive I have as well as with the data from your log, so the parsing of the date and time timestamp has been resolved.
When this statistic is not configured yet (the set date and time has not been sent to the drive yet) it can report the power on time in milliseconds, so I also added support for outputting that value.
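
The two interpretations of this statistic can be sketched as follows. This is an illustration only, not the library's code: the helper names are hypothetical, and how to tell which interpretation applies to a given drive is not covered in this thread, so the caller is assumed to know. The `tm_year` helper shows the detail behind the conversion bug described in the commit message, since C's struct tm counts years from 1900 while the timestamp counts from the Unix epoch of 1970.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_ms_to_utc(ms: int) -> datetime:
    """Interpret the value as milliseconds since the Unix epoch."""
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def tm_year(dt: datetime) -> int:
    # C's struct tm stores the year as an offset from 1900, not from 1970.
    return dt.year - 1900


def format_power_on_ms(ms: int) -> str:
    """Interpret the value as power-on time in milliseconds."""
    total_seconds, millis = divmod(ms, 1000)
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours} h {minutes:02d} m {seconds:02d}.{millis:03d} s"


print(timestamp_ms_to_utc(86_400_000).isoformat())
print(format_power_on_ms(403_200_000))
```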

I've also merged the fixes from the develop branch into this branch, which fixed a few bugs with -i and a few other places in the internals of the libraries that would sometimes trigger a stop.

The idea behind the SATA Phy event counters log and its CRC counter is that it tracks errors as they happen. It can then be reset to zero, tests can be run, and the log can be checked again.
Historically, the SMART attribute and the device statistic are lifetime counters and are not meant to be reset.
I do not know whether that will change in future firmware, but the SATA phy events log is what should be used when trying to determine whether CRC errors are happening.
I have begun writing up this kind of information for a SMART/device statistics Wiki page. It is still in progress, but I will try to capture this kind of information as I continue writing.
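
The reset, test, re-check cycle described above could be scripted roughly as below. This is a sketch under assumptions: the tool name and the --resetATAPhyEvents and --showPhyEvents options are the ones mentioned in this thread (the reset option was only on the feature branch at the time), while the function names and the injectable `runner` are hypothetical conveniences so the flow can be exercised without a drive.

```python
import subprocess
from typing import Callable, List

TOOL = "openSeaChest_Info"  # utility named in this thread


def crc_check_commands(handle: str) -> List[List[str]]:
    """Build the reset and show command lines for one device handle."""
    return [
        [TOOL, "-d", handle, "--resetATAPhyEvents"],
        [TOOL, "-d", handle, "--showPhyEvents"],
    ]


def run_crc_check(handle: str, workload: Callable[[], None], runner=subprocess.run):
    """Reset the phy event counters, run a workload, then show the counters again."""
    reset_cmd, show_cmd = crc_check_commands(handle)
    runner(reset_cmd, check=True)    # start from zeroed counters
    workload()                       # e.g. a read/write test over the suspect cable
    return runner(show_cmd, check=True)
```

A call such as `run_crc_check("PD0", my_test)` would need the tool on the PATH and sufficient privileges; any CRC counts shown afterwards then belong to the test window rather than to the drive's lifetime.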
