SMART (self-monitoring, analysis and reporting technology) is a technology for assessing the condition of a hard disk using built-in self-diagnostic hardware, and a mechanism for predicting when the disk will fail.
The first hard disk with a self-diagnosis system was introduced by IBM in 1992, in IBM 9337 disk arrays for AS/400 servers using IBM 0662 SCSI-2 disks. The technology was called Predictive Failure Analysis (PFA). Several key parameters were measured and evaluated directly by the disk controller. The result was reduced to a single bit: either everything is in order, or the disk may soon fail. Compaq, Seagate, Quantum and Conner later developed another technology called IntelliSafe. It defined a common protocol for reporting the state of the hard disk, but each company chose the measured parameters and their thresholds independently.
In early 1995, Compaq proposed standardizing the technology. IBM, Seagate, Quantum, Conner and Western Digital (which at that time did not yet have a system for tracking hard disk parameters) supported the idea. IntelliSafe was taken as the basis, and the jointly developed standard was named SMART. The SMART I standard provided for monitoring of the main parameters and ran only on command.
Hitachi took part in the development of SMART II, which introduced a method of full drive self-diagnosis (extended self-test) and an error logging function. SMART III introduced detection of surface defects and the ability to repair them “transparently” to the user.
SMART monitors the basic characteristics of the drive, each of which receives an assessment. Characteristics can be divided into two groups:
- 1) parameters reflecting the natural aging process of the hard disk (spindle speed, number of head movements, number of on-off cycles);
- 2) current parameters of the drive (head flying height above the disk surface, number of reassigned sectors, seek time and number of seek errors).
The data is stored in hexadecimal form as the raw value and is then converted to the value, a normalized figure that expresses reliability relative to a reference level. The value typically ranges from 0 to 100 (for some attributes, from 0 to 200 or from 0 to 253).
A high value indicates that the parameter has not changed or is deteriorating slowly. A low value indicates a possible failure soon.
A value below the minimum at which the manufacturer guarantees failure-free operation of the drive means that the component has failed.
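The relationship between the normalized value and the manufacturer's minimum can be sketched in a few lines of Python. This is only an illustration: the `Attribute` class, the `assess` function and the sample numbers are hypothetical, and the comparison follows the wording above (strictly below the minimum means failure).

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical container for one SMART attribute."""
    attr_id: int    # attribute number, e.g. 5 for Reallocated Sectors Count
    value: int      # normalized value (higher is healthier)
    threshold: int  # manufacturer's minimum for failure-free operation
    raw: int        # raw value as reported by the drive


def assess(attr: Attribute) -> str:
    """Classify an attribute by comparing its value with the threshold.

    Assumption: a value strictly below the threshold counts as a failure,
    as stated in the text above.
    """
    return "failed" if attr.value < attr.threshold else "ok"


def margin(attr: Attribute) -> int:
    """Distance between the normalized value and the threshold."""
    return attr.value - attr.threshold


healthy = Attribute(attr_id=5, value=100, threshold=36, raw=0)
worn = Attribute(attr_id=5, value=20, threshold=36, raw=1200)
print(assess(healthy), margin(healthy))
print(assess(worn), margin(worn))
```

Note that the raw value plays no part in the verdict here; only the normalized value is compared with the threshold.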
SMART technology supports:
- 1) monitoring of state parameters;
- 2) surface scanning;
- 3) surface scanning with automatic replacement of doubtful sectors with reliable ones.
SMART can predict device failures caused by mechanical malfunctions, which account for about 60% of hard drive failures. SMART cannot predict the effects of a power surge or mechanical shock.
Drives cannot report their status through SMART on their own; special programs are needed for this. Using SMART therefore requires two components:
- 1) software built into the drive controller;
- 2) external software running on the host.
Programs that display the state of SMART attributes work according to the following algorithm:
- Checking whether the drive supports SMART technology;
- Sending a command requesting the SMART tables;
- Reading the tables into the application buffer;
- Decoding the table structures and extracting each attribute number and its numerical value;
- Matching standardized attribute numbers to their names (sometimes depending on the type, model or manufacturer, as, for example, in the Victoria program);
- Presenting the numerical values in a readable form (for example, converting hexadecimal values to decimal);
- Extracting the attribute flags from the tables (indicators of the attribute's role in this drive, for example, “vital” or “counter”);
- Displaying the general status of the device based on all tables, values, and flags.
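The decoding steps of this algorithm can be sketched in Python. The sketch starts from a buffer that has already been read from the drive, so the support check and the query command are not shown. It assumes the commonly seen layout of the ATA SMART data block: a 2-byte revision number followed by 30 entries of 12 bytes each (ID, flags, value, worst, 6-byte raw, 1 reserved byte). Treating flag bit 0 as “vital” and bit 4 as “counter” is likewise an assumption, and the function names and sample buffer are hypothetical.

```python
import struct

# Assumed entry layout: id (1 byte), flags (2), value (1), worst (1),
# raw (6), reserved (1 pad byte) = 12 bytes, little-endian.
ENTRY = struct.Struct("<BHBB6sx")
TABLE_OFFSET = 2   # assumed: entries follow a 2-byte revision number
ENTRY_COUNT = 30   # assumed: 30 attribute slots per table

# A few names taken from the attribute table in this article.
ATTRIBUTE_NAMES = {
    0x01: "Raw Read Error Rate",
    0x05: "Reallocated Sectors Count",
    0x09: "Power-On Hours",
    0xC2: "HDA temperature",
    0xC5: "Current Pending Sector Count",
}


def parse_attributes(buf: bytes) -> list:
    """Decode the attribute table from an already fetched SMART buffer."""
    attrs = []
    for i in range(ENTRY_COUNT):
        offset = TABLE_OFFSET + i * ENTRY.size
        attr_id, flags, value, worst, raw = ENTRY.unpack_from(buf, offset)
        if attr_id == 0:          # empty slot
            continue
        attrs.append({
            "id": attr_id,
            "name": ATTRIBUTE_NAMES.get(attr_id, f"Unknown attribute {attr_id}"),
            "value": value,
            "worst": worst,
            "raw": int.from_bytes(raw, "little"),  # hex bytes -> decimal
            "vital": bool(flags & 0x0001),         # assumed flag bit
            "counter": bool(flags & 0x0010),       # assumed flag bit
        })
    return attrs


def demo_buffer() -> bytes:
    """Build a synthetic 512-byte buffer with two attributes for testing."""
    buf = bytearray(512)
    ENTRY.pack_into(buf, 2, 0x05, 0x0033, 100, 100, (8).to_bytes(6, "little"))
    ENTRY.pack_into(buf, 14, 0xC2, 0x0022, 64, 50, (36).to_bytes(6, "little"))
    return bytes(buf)


for attr in parse_attributes(demo_buffer()):
    print(attr["id"], attr["name"], attr["value"], attr["raw"], attr["vital"])
```

A real program would obtain the buffer from the operating system's disk interface and would also read the separate threshold table before judging the overall status.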
A table of known SMART attributes is as follows:
|No.||Hex||Attribute name||Description|
|01||01||Raw Read Error Rate||The rate of errors when reading data from the disk that are caused by the drive hardware. For all Seagate, Samsung (F1 family and newer) and Fujitsu 2.5" drives, this is the number of internal data corrections made before the data is sent to the interface, so very large numbers are not a cause for alarm.|
|02||02||Throughput performance||Total disk performance. If the attribute value decreases, then there is a high probability that there are problems with the disk.|
|03||03||Spin up time||The time it takes to spin the platter stack up from rest to operating speed. It grows as the mechanics wear out (increased bearing friction, etc.) and may also indicate a poor power supply (for example, a voltage drop when the disk starts).|
|04||04||Start / Stop Count||The total number of spindle start-stop cycles. On disks from some manufacturers (for example, Seagate) this is a counter of power saving mode activations. The raw value field stores the total number of disk starts/stops.|
|05||05||Reallocated Sectors Count||The number of sector reassignment operations. When the disk detects a read/write error, it marks the sector as “reassigned” and moves the data to a specially reserved spare area. This is why bad blocks cannot be seen on modern hard drives: they are all hidden in reassigned sectors. The process is called remapping, and a reassigned sector is called a remap. The higher the value, the worse the condition of the platter surface. The raw value field contains the total number of reassigned sectors. An increase in this attribute may indicate deterioration of the platter surface.|
|06||06||Read channel margin||Margin of the read channel. The purpose of this attribute is not documented. It is not used in modern drives.|
|07||07||Seek Error Rate||Error rate when positioning the magnetic head assembly. The more errors, the worse the condition of the mechanics and/or the surface of the hard disk. The value can also be affected by overheating and external vibration (for example, from neighboring disks in the drive cage).|
|08||08||Seek Time Performance||The average performance of magnetic head positioning operations. If the attribute value decreases (positioning slows down), there is a high probability of problems with the head actuator mechanics.|
|09||09||Power-on Time Count (Power-On Hours)||The number of hours (or minutes or seconds, depending on the manufacturer) spent powered on. The rated MTBF (mean time between failures) is chosen as its threshold.|
|10||0A||Spin-up retry count||The number of repeated attempts to spin the disks up to operating speed after an unsuccessful first attempt. If the attribute value increases, there is a high probability of problems with the mechanical part.|
|11||0B||Recalibration retries||The number of repeated recalibration requests after an unsuccessful first attempt. If the attribute value increases, there is a high probability of problems with the mechanical part.|
|12||0C||Device Power Cycle Count||The number of complete on / off cycles of the disk.|
|13||0D||Soft Read Error Rate||The number of read errors caused by software that could not be corrected. All such errors are non-mechanical and indicate only incorrect partitioning or incorrect interaction of programs or the operating system with the disk.|
|184||B8||End-to-end error||This attribute, part of HP SMART IV technology, means that after data passed through the cache memory, the parity of the data between the host and the hard disk did not match.|
|187||BB||Reported UNC Errors||Errors that could not be repaired using hardware troubleshooting methods.|
|188||BC||Command timeout||The number of operations aborted because of an HDD timeout. This value should normally be zero; a value far above zero most likely points to serious power problems or an oxidized data cable.|
|190||BE||Airflow Temperature (WDC)||Air temperature inside the hard drive enclosure. For Seagate drives, it is calculated using the formula (100 - HDA temperature). For Western Digital drives, (125 - HDA temperature).|
|191||BF||G-sense error rate||The number of errors caused by shock loads. The attribute stores the readings of the built-in accelerometer, which records all impacts, jolts, drops and even careless installation of the disk in the computer case.|
|192||C0||Power-off retract count||The number of power-off or emergency shutdown cycles (power on/off).|
|193||C1||Load / Unload Cycle||The number of cycles of moving the magnetic head assembly to the parking zone and back to the working position.|
|194||C2||HDA temperature||The reading of the built-in temperature sensor for the mechanical part of the disk, the sealed enclosure (HDA, Head Disk Assembly). The sensor is one of the magnetic heads, usually the bottom one in the stack. The attribute's bit fields contain the current, minimum, and maximum temperatures. Not all programs that work with SMART parse these fields correctly, so their readings should be treated with caution.|
|195||C3||Hardware ECC Recovered||The number of errors corrected by the drive hardware (reading, positioning, transmission over the external interface). On disks with a SATA interface, the value often worsens as the system bus frequency is raised; SATA is very sensitive to overclocking.|
|196||C4||Reallocation Event Count||The number of reassignment operations. The “raw value” field of the attribute stores the total number of attempts to transfer information from reassigned sectors to the backup area. Both successful and unsuccessful attempts are taken into account.|
|197||C5||Current Pending Sector Count||The number of sectors that are candidates for replacement. They have not yet been identified as bad, but reading them differs from reading a stable sector; these are the so-called suspicious or unstable sectors. If a later read of the sector succeeds, it is removed from the list of candidates. If reads fail repeatedly, the drive tries to restore the sector and performs a remapping operation. An increase in this attribute may indicate physical degradation of the hard disk.|
|198||C6||Uncorrectable Sector Count||The number of sectors that the disk cannot correct by its own means. If the number of errors grows, there is a high probability of critical defects in the surface and/or the drive mechanics.|
|199||C7||UltraDMA CRC Error Count||The number of errors that occur when transmitting data over the external interface in UltraDMA mode (packet integrity violations, etc.). Growth of this attribute indicates a bad (crumpled, twisted) cable or bad contacts. Such errors also appear during PCI bus overclocking, power failures and strong electromagnetic interference, and are sometimes the fault of the driver. The cause may be a poor-quality cable. To fix it, try a SATA cable without latches that fits tightly on the disk contacts.|
|200||C8||Write Error Rate / Multi-Zone Error Rate||Shows the total number of errors that occurred while writing sectors to the disk. It can serve as an indicator of surface quality and drive mechanics.|
|201||C9||Soft read error rate||The frequency of "software" errors when reading data from the disk, that is, errors during read operations from the disk surface that are the fault of software rather than the drive hardware.|
|202||CA||Data Address Mark errors||Number of Data Address Mark (DAM) errors, or a vendor-specific value.|
|203||CB||Run out cancel||The number of ECC errors.|
|204||CC||Soft ECC correction||Number of ECC errors corrected in software.|
|205||CD||Thermal asperity rate (TAR)||Number of thermal asperity errors.|
|206||CE||Flying height||The height of the head above the surface of the disk.|
|207||CF||Spin high current||The magnitude of the current during disk spin-up.|
|208||D0||Spin buzz||Number of buzz routines to spin up the drive.|
|209||D1||Offline seek performance||Drive's seek performance during offline operations.|
|220||DC||Disk shift||Displacement distance of the platter stack relative to the spindle, mostly caused by a shock or a fall. The unit of measurement is unknown. As the attribute increases, the drive quickly becomes inoperative.|
|221||DD||G-Sense Error Rate||The number of errors that occurred due to external loads and shocks. The attribute stores the readings of the built-in shock sensor.|
|222||DE||Loaded hours||The time the magnetic head assembly spends between leaving the parking area for the working area of the disk and returning to the parking area.|
|223||DF||Load / Unload Retry Count||The number of repeated attempts to unload/load the magnetic head assembly to/from the parking area after an unsuccessful attempt.|
|224||E0||Load friction||The magnitude of the friction force on the magnetic head assembly when it leaves the parking area.|
|225||E1||Load cycle count||The number of cycles of moving the magnetic head assembly to the parking area.|
|226||E2||Load 'in'-time||The time it takes the drive to move the magnetic heads from the parking area to the working surface of the disk.|
|227||E3||Torque amplification count||The number of attempts to compensate for the torque.|
|228||E4||Power-Off Retract Cycle||The number of times the magnetic head assembly was parked automatically because the power was turned off.|
|230||E6||GMR Head Amplitude||Amplitude of “jitter” (distance of repetitive movement of a block of magnetic heads).|
|231||E7||Temperature||Hard drive temperature.|
|240||F0||Head flying hours||The total time, in hours, that the head assembly has spent in the operating position.|
|250||FA||Read error retry rate||The number of errors while reading from the hard disk.|
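The formulas given in the table for attribute 190 (Airflow Temperature) can be inverted to recover a temperature in degrees Celsius. The sketch below assumes that the figure produced by the formula is the attribute's normalized value; the table does not say which field it is, so this reading, the function name and the vendor keys are all hypothetical.

```python
# Base constants from the table: Seagate uses (100 - temperature),
# Western Digital uses (125 - temperature).
AIRFLOW_BASE = {"seagate": 100, "wdc": 125}


def airflow_temperature(value: int, vendor: str) -> int:
    """Recover the temperature from the attribute 190 value.

    Assumption: `value` is the normalized value produced by the
    vendor's formula value = base - temperature.
    """
    base = AIRFLOW_BASE[vendor.lower()]
    return base - value


print(airflow_temperature(64, "seagate"))
print(airflow_temperature(89, "wdc"))
```

Both calls describe the same physical temperature under this assumption, which shows why a tool must know the manufacturer before interpreting the attribute.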