SMART (Self-Monitoring, Analysis and Reporting Technology) is a technology for assessing the condition of a hard disk using its built-in self-diagnostic hardware, and a mechanism for predicting when the disk will fail.
The first hard disk with a self-diagnostic system was introduced by IBM in 1992, in IBM 9337 disk arrays for AS/400 servers that used IBM 0662 SCSI-2 disks. The technology was called Predictive Failure Analysis (PFA). It measured several key parameters, and the disk controller evaluated them directly. The result was limited to a single bit: either everything was in order, or the disk could soon fail. Later, Compaq, Seagate, Quantum and Conner developed another technology, called IntelliSafe. It defined a common protocol for reporting the state of the hard drive, but each company decided independently which parameters to measure and what their thresholds were.
In early 1995, Compaq proposed standardizing the technology. IBM, Seagate, Quantum, Conner and Western Digital (which at the time did not yet have a system for monitoring hard disk parameters) supported the idea. IntelliSafe was taken as the basis. The jointly developed standard was called SMART. The first version, SMART I, provided monitoring of the main parameters and was started only on command.
Hitachi took part in the development of SMART II and proposed a full self-test of the drive (the extended self-test); an error-logging function was also added. SMART III added the ability to detect surface defects and repair them transparently to the user.
SMART monitors the main characteristics of the drive, each of which is given a score. The characteristics fall into two groups:
- 1) parameters that reflect the natural aging of the hard disk (spindle speed, number of head movements, number of power on/off cycles);
- 2) current parameters of the drive (head flying height above the disk surface, number of reallocated sectors, seek time and number of seek errors).
The data is stored in hexadecimal form as the raw value and is then converted into the value, a normalized number that expresses reliability relative to some reference. The value usually lies in the range 0 to 100 (for some attributes, 0 to 200 or 0 to 253).
A high value means that the parameter has not changed or is deteriorating slowly. A low value points to a possible failure in the near future.
A value below the minimum at which the manufacturer guarantees trouble-free operation of the drive (the threshold) means that the component has failed.
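The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a drive's actual logic: the `SmartAttribute` class, its field names and the sample numbers are assumptions made up for the example. The only rule taken from the text is that a normalized value below the manufacturer's minimum (called the threshold here) signals failure.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """Hypothetical container for one SMART attribute (names are illustrative)."""
    attr_id: int
    name: str
    value: int      # normalized value, higher means healthier
    threshold: int  # manufacturer's minimum for trouble-free operation
    raw: int        # raw counter as stored by the drive


def has_failed(attr: SmartAttribute) -> bool:
    """Return True when the normalized value has dropped below the threshold."""
    return attr.value < attr.threshold


def margin(attr: SmartAttribute) -> int:
    """Distance between the normalized value and the threshold."""
    return attr.value - attr.threshold


# Invented sample readings, for illustration only.
healthy = SmartAttribute(5, "Reallocated Sectors Count", value=100, threshold=36, raw=0x0)
worn = SmartAttribute(5, "Reallocated Sectors Count", value=30, threshold=36, raw=0x7D0)

print(has_failed(healthy), margin(healthy))
print(has_failed(worn), margin(worn), worn.raw)
```

A shrinking margin is the "slow deterioration" case; a negative margin corresponds to a value below the manufacturer's minimum.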
SMART technology provides:
- 1) monitoring of condition parameters;
- 2) surface scanning;
- 3) surface scanning with automatic replacement of doubtful sectors with reliable ones.
SMART can predict device failure caused by mechanical faults, which account for about 60% of hard drive failures. It cannot predict the consequences of a power surge or a mechanical shock.
Drives cannot report their status on their own through SMART; special programs exist for that purpose. SMART therefore cannot be used without the following two components:
- 1) software built into the drive controller;
- 2) external software running on the host.
Programs that display the state of SMART attributes work according to the following algorithm:
- Check whether the drive supports SMART;
- Send the command that requests the SMART tables;
- Read the tables into the application buffer;
- Decode the table structures and extract each attribute's number and numerical value;
- Map the standardized attribute numbers to their names (sometimes depending on the drive type, model or manufacturer, as the program Victoria does, for example);
- Present the numerical values in a readable form (for example, convert hexadecimal values to decimal);
- Extract the attribute flags from the tables (flags describe the purpose of an attribute in a given drive, for example "vital" or "counter");
- Report the overall state of the device based on all tables, values and flags.
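The decoding steps of this algorithm can be sketched in Python. The sketch skips the first steps (checking support and sending the command) and starts from a buffer that has already been read. The 12-byte entry layout, the 2-byte offset, the helper names `decode_table` and `overall_ok`, and all sample values are assumptions for illustration; the layout follows a widely seen convention, but attribute data is vendor-specific and a real drive may differ.

```python
import struct

# Assumed layout of one 12-byte attribute entry (a common convention, not
# guaranteed for every drive): id, flags, value, worst, 6 raw bytes, reserved.
ENTRY = struct.Struct("<BHBB6sB")
TABLE_OFFSET = 2   # entries are assumed to follow a 2-byte revision number
MAX_ENTRIES = 30

# Small excerpt of the attribute names listed in this article.
NAMES = {
    0x01: "Raw Read Error Rate",
    0x05: "Reallocated Sectors Count",
    0x09: "Power-On Hours",
    0xC2: "HDA Temperature",
}


def decode_table(buf: bytes) -> list:
    """Decode attribute entries from a buffer already read from the drive."""
    attrs = []
    for i in range(MAX_ENTRIES):
        start = TABLE_OFFSET + i * ENTRY.size
        chunk = buf[start:start + ENTRY.size]
        if len(chunk) < ENTRY.size:
            break
        attr_id, flags, value, worst, raw, _reserved = ENTRY.unpack(chunk)
        if attr_id == 0:  # unused slot
            continue
        attrs.append({
            "id": attr_id,
            "name": NAMES.get(attr_id, f"Unknown ({attr_id:#04x})"),
            "flags": flags,
            "value": value,
            "worst": worst,
            "raw": int.from_bytes(raw, "little"),  # printed in decimal below
        })
    return attrs


def overall_ok(attrs: list, thresholds: dict) -> bool:
    """Overall verdict: no attribute may fall below its threshold."""
    return all(a["value"] >= thresholds.get(a["id"], 0) for a in attrs)


# Synthetic buffer standing in for the drive's reply (values are invented).
sample = bytearray(512)
sample[2:14] = ENTRY.pack(0x05, 0x0033, 100, 100, (8).to_bytes(6, "little"), 0)
sample[14:26] = ENTRY.pack(0x09, 0x0032, 95, 95, (0x1F40).to_bytes(6, "little"), 0)

table = decode_table(bytes(sample))
for a in table:
    print(f'{a["id"]:3d} {a["name"]:<28} value={a["value"]:3d} raw={a["raw"]}')
print("OK" if overall_ok(table, {0x05: 36, 0x09: 0}) else "FAILING")
```

Working on a synthetic buffer keeps the example free of device access, which is operating-system specific and usually requires elevated privileges.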
A table of known SMART attributes looks like this:
|No.||Hex||Attribute Name||Description|
|01||01||Raw Read Error Rate||Rate of errors when reading data from the disk that are caused by the disk hardware. On all Seagate drives, Samsung drives (F1 and newer) and Fujitsu 2.5" drives, this is the number of internal data corrections made before the data is passed to the interface, so alarmingly large numbers are no cause for concern.|
|02||02||Throughput Performance||Overall disk performance. If the attribute value decreases, there is a high probability of a problem with the disk.|
|03||03||Spin-Up Time||Time needed to spin the platter stack up from rest to operating speed. It grows as the mechanics wear (increased bearing friction and so on) and can also point to poor-quality power (for example, a voltage drop when the disk starts).|
|04||04||Start/Stop Count||Total number of spindle start/stop cycles. On drives from some manufacturers (for example, Seagate) it is a power-saving mode counter. The raw value field stores the total number of disk starts/stops.|
|05||05||Reallocated Sectors Count||Number of sector remapping operations. When the disk detects a read/write error, it marks the sector as "remapped" and moves the data to a dedicated spare area. This is why bad blocks are not visible on modern hard disks: they are all hidden in the reallocated sectors. The process is called remapping, and a remapped sector is called a remap. The higher the value, the worse the condition of the disk surface. The raw value field contains the total number of reallocated sectors. An increase in this attribute may indicate that the surface of the disk platters is deteriorating.|
|06||06||Read Channel Margin||Margin of the read channel. The purpose of this attribute is not documented. It is not used in modern drives.|
|07||07||Seek Error Rate||Rate of errors when positioning the magnetic head assembly. The more errors, the worse the condition of the mechanics and/or the surface of the hard drive. The value can also be affected by overheating and external vibration (for example, from neighboring disks in the drive cage).|
|08||08||Seek Time Performance||Average performance of head positioning operations. If the attribute value decreases (positioning slows down), there is a high probability of problems with the mechanical part of the head actuator.|
|09||09||Power-On Time Count (Power-On Hours)||Number of hours (or minutes or seconds, depending on the manufacturer) spent powered on. The mean time between failures (MTBF) is used as its threshold value.|
|10||0A||Spin-Up Retry Count||Number of repeated attempts to spin the disks up to operating speed after a failed first attempt. If the attribute value increases, a fault in the mechanical part is highly likely.|
|11||0B||Recalibration Retries||Number of repeated recalibration requests after a failed first attempt. If the attribute value increases, problems with the mechanical part are highly likely.|
|12||0C||Device Power Cycle Count||Number of complete power on/off cycles of the disk.|
|13||0D||Soft Read Error Rate||Number of read errors caused by software that could not be corrected. None of these errors are mechanical; they only indicate incorrect partitioning or a problem in how programs or the operating system interact with the disk.|
|184||B8||End-to-End Error||This attribute, part of HP SMART IV technology, means that after data passed through the cache memory, the data parity between the host and the hard disk did not match.|
|187||BB||Reported UNC Errors||Errors that could not be recovered by the hardware error-correction methods.|
|188||BC||Command Timeout||Number of operations aborted because of an HDD timeout. The value should normally be zero; if it is far above zero, the most likely cause is a serious power problem or oxidized data cable contacts.|
|190||BE||Airflow Temperature (WDC)||Air temperature inside the hard disk housing. For Seagate disks it is calculated by the formula (100 - HDA temperature); for Western Digital disks, (125 - HDA temperature).|
|191||BF||G-Sense Error Rate||Number of errors caused by shock loads. The attribute stores the readings of the built-in accelerometer, which registers all impacts, jolts, drops and even careless installation of the disk in the computer case.|
|192||C0||Power-Off Retract Count||Number of shutdown cycles or emergency failures (power on/off of the drive).|
|193||C1||Load/Unload Cycle||Number of cycles of moving the magnetic head assembly to the parking zone and back to the working position.|
|194||C2||HDA Temperature||Stores the readings of the built-in temperature sensor for the mechanical part of the disk, the HDA (Hard Disk Assembly). The reading comes from the built-in temperature sensor, which is one of the magnetic heads, usually the bottom one in the assembly. The bit fields of the attribute record the current, minimum and maximum temperatures. Not all SMART programs parse these fields correctly, so their readings should be treated with caution.|
|195||C3||Hardware ECC Recovered||Number of errors corrected by the disk hardware (read, positioning and external interface transfer errors). On SATA drives the value often gets worse as the system bus frequency is raised: SATA is very sensitive to overclocking.|
|196||C4||Reallocation Event Count||Number of reallocation operations. The raw value field stores the total number of attempts to move data from reallocated sectors to the spare area. Both successful and unsuccessful attempts are counted.|
|197||C5||Current Pending Sector Count||Number of sectors that are candidates for replacement. They have not yet been marked as bad, but reading them differs from reading a stable sector; these are the so-called suspect or unstable sectors. If a later read of the sector succeeds, it is removed from the candidates. If reads keep failing, the drive tries to recover the sector and performs a remapping operation. An increase in this attribute may indicate physical degradation of the hard drive.|
|198||C6||Uncorrectable Sector Count||Number of sectors that the disk cannot correct on its own. If the number grows, critical defects of the surface and/or the drive mechanics are highly likely.|
|199||C7||UltraDMA CRC Error Count||Number of errors that occur when data is transferred over the external interface in UltraDMA mode (packet integrity violations and the like). Growth of this attribute indicates a bad (crumpled or twisted) cable or poor contacts. Such errors also appear when the PCI bus is overclocked, on power failures, under strong electromagnetic interference, and sometimes through the fault of the driver. The cause is probably a low-quality cable. To fix it, try a SATA cable without latches that fits tightly on the disk contacts.|
|200||C8||Write Error Rate / Multi-Zone Error Rate||Total number of errors that occur when writing a sector, that is, the total number of write errors on the disk. It can serve as an indicator of the quality of the surface and the mechanics of the drive.|
|201||C9||Soft Read Error Rate||Rate of "software" errors when reading data from the disk. This parameter shows the rate of errors in read operations from the disk surface that are caused by software rather than by the drive hardware.|
|202||CA||Data Address Mark Errors||Number of Data Address Mark (DAM) errors (or vendor-specific).|
|203||CB||Run Out Cancel||Number of ECC errors.|
|204||CC||Soft ECC Correction||Number of ECC errors corrected by software.|
|205||CD||Thermal Asperity Rate (TAR)||Number of thermal asperity errors.|
|206||CE||Flying Height||Height of the head above the disk surface.|
|207||CF||Spin High Current||Amount of current drawn while the disk is spun up.|
|208||D0||Spin Buzz||Number of buzz routines needed to spin up the drive.|
|209||D1||Offline Seek Performance||Seek performance of the drive during offline operations.|
|220||DC||Disk Shift||Distance the platter stack has shifted relative to the spindle, mostly as a result of an impact or a fall. The unit of measurement is unknown. As the attribute grows, the disk quickly becomes unusable.|
|221||DD||G-Sense Error Rate||Number of errors caused by external loads and shocks. The attribute stores the readings of the built-in shock sensor.|
|222||DE||Loaded Hours||Time the magnetic head assembly has spent between being unloaded from the parking area onto the working area of the disk and being loaded back into the parking area.|
|223||DF||Load/Unload Retry Count||Number of repeated attempts to unload/load the magnetic head assembly to/from the parking area after a failed attempt.|
|224||E0||Load Friction||Friction force on the magnetic head assembly when it is unloaded from the parking area.|
|225||E1||Load Cycle Count||Number of cycles of moving the magnetic head assembly to the parking area.|
|226||E2||Load 'In'-time||Time the drive takes to unload the magnetic heads from the parking area onto the working surface of the disk.|
|227||E3||Torque Amplification Count||Number of attempts to compensate for torque.|
|228||E4||Power-Off Retract Cycle||Number of times the magnetic head assembly was parked automatically because power was cut.|
|230||E6||GMR Head Amplitude||Amplitude of "jitter" (the distance of repetitive movement of the magnetic head assembly).|
|231||E7||Temperature||Temperature of the hard drive.|
|240||F0||Head Flying Hours||Total time the head assembly has spent in the working position, in hours.|
|250||FA||Read Error Retry Rate||Number of errors while reading from the hard drive.|
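The formulas quoted in the Airflow Temperature row (190, hex BE) can be illustrated with a short Python sketch. It assumes that the reported number is the vendor offset minus the HDA temperature and that temperatures are in degrees Celsius; the function names, the `OFFSETS` table and the sample readings are hypothetical.

```python
# Offsets taken from the Airflow Temperature row of the table above.
OFFSETS = {"seagate": 100, "western_digital": 125}


def airflow_value(vendor: str, hda_temp_c: int) -> int:
    """Reported number for a given HDA temperature (assumed: offset - temperature)."""
    return OFFSETS[vendor] - hda_temp_c


def hda_temp_from_value(vendor: str, value: int) -> int:
    """Invert the formula to recover the HDA temperature from the reported number."""
    return OFFSETS[vendor] - value


# The Hex column of the table is simply the attribute number in base 16.
attr_number = int("BE", 16)

print(attr_number)
print(airflow_value("seagate", 38))
print(hda_temp_from_value("western_digital", 90))
```

Because the offset differs between vendors, the same reported number maps to different temperatures, which is one reason a program needs to know the manufacturer before interpreting an attribute.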