SMART (Self-Monitoring, Analysis and Reporting Technology) is a technology for assessing the condition of a hard disk using built-in self-diagnostic hardware, together with a mechanism for predicting when the disk is likely to fail.
The first hard drive with a self-diagnostic system was introduced in 1992 by IBM in IBM 9337 disk arrays for AS/400 servers, which used IBM 0662 SCSI-2 disks. The technology was called Predictive Failure Analysis (PFA). Several key parameters were measured and evaluated directly by the disk controller. The result was limited to a single bit: either everything is in order, or the disk may soon fail. Later, Compaq, Seagate, Quantum and Conner developed another technology called IntelliSafe. It defined a common protocol for reporting the status of the hard disk, but each company decided independently which parameters to measure and what thresholds to apply.
In early 1995, Compaq proposed standardizing the technology. IBM, Seagate, Quantum, Conner and Western Digital (the last of which had no drive-monitoring system at the time) supported the idea. IntelliSafe was taken as the basis. The jointly developed standard was called SMART. The SMART I standard provided for monitoring of the basic parameters, and monitoring started only on command.
Hitachi took part in developing SMART II and proposed a method for full self-diagnosis of the drive (the extended self-test); an error-logging function also appeared. SMART III added detection of surface defects and the ability to repair them transparently to the user.
SMART monitors the main characteristics of the drive, each of which receives a rating. The characteristics can be divided into two groups:
- 1) parameters reflecting the natural aging of the hard disk (number of spindle revolutions, number of head movements, number of power on/off cycles);
- 2) the current parameters of the drive (head flying height above the disk surface, number of reallocated sectors, track seek time, and number of seek errors).
The data is stored in hexadecimal form as the raw value and is then converted into a normalized value ("value") that expresses reliability relative to some reference value. The value typically ranges from 0 to 100 (for some attributes, from 0 to 200 or from 0 to 253).
A high value indicates that the parameter has not changed or is deteriorating slowly. A low value indicates a possible failure in the near future.
A value below the threshold, that is, the minimum at which the manufacturer guarantees failure-free operation of the drive, means that the corresponding component has failed.
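The rule above (a value below the manufacturer's minimum signals a failure) can be sketched in a few lines of Python. This is only an illustration: the `NormalizedAttribute` class, the function names, and the sample numbers are hypothetical and do not come from any real tool or drive. The comparison follows the wording of the text; some utilities may also treat a value equal to the threshold as failed.

```python
from dataclasses import dataclass


@dataclass
class NormalizedAttribute:
    """Hypothetical container for one attribute's normalized reading."""
    name: str
    value: int      # current normalized value (higher is better)
    threshold: int  # manufacturer-defined minimum for this attribute


def is_failed(attr: NormalizedAttribute) -> bool:
    # Rule taken from the text: a value below the minimum means failure.
    return attr.value < attr.threshold


def failed_attributes(attrs):
    """Return the names of all attributes whose value fell below the minimum."""
    return [a.name for a in attrs if is_failed(a)]


# Illustrative readings only; real thresholds differ per drive model.
readings = [
    NormalizedAttribute("Reallocated Sectors Count", value=100, threshold=36),
    NormalizedAttribute("Spin-Up Time", value=20, threshold=21),
]
print(failed_attributes(readings))  # ['Spin-Up Time']
```

Keeping the comparison in a single function makes it easy to switch to a stricter rule if a particular tool or vendor defines failure differently.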
SMART technology makes it possible to:
- 1) monitor state parameters;
- 2) scan the surface;
- 3) scan the surface with automatic replacement of doubtful sectors with reliable ones.
SMART can predict device failures that result from mechanical faults, which account for about 60% of hard drive failures. SMART cannot predict the consequences of a voltage surge or mechanical shock.
Drives cannot report their condition via SMART on their own; special programs exist for this purpose. Using SMART therefore requires the following two components:
- 1) software embedded in the storage controller;
- 2) external software running on the host.
Programs that display the status of SMART attributes work according to the following algorithm:
- Check whether the drive supports SMART;
- Send a command requesting the SMART tables;
- Receive the tables into the application's buffer;
- Decode the table structures, extracting each attribute's number and numerical value;
- Map the standardized attribute numbers to their names (the mapping sometimes depends on the drive type, model or manufacturer, as, for example, in the Victoria program);
- Output the numerical values in a form that is easy to read (for example, converting hexadecimal values to decimal);
- Extract the attribute flags from the tables (flags characterize the purpose of the attribute in the given drive, for example, "vital" or "counter");
- Display the overall status of the device based on all tables, values, and flags.
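The decoding steps of this algorithm can be sketched in Python. Reading the table from a real drive is operating-system specific and is not shown, so the sketch works on a synthetic buffer. The layout is an assumption not stated in the text: a 512-byte table with a 2-byte header followed by 30 entries of 12 bytes each (ID, 16-bit flags, value, worst value, 6 raw bytes, 1 reserved byte), which is the layout commonly used by ATA drives. Treating flag bit 0 as the "vital" (pre-failure) marker is likewise an assumption, and `NAMES`, `ENTRY` and `parse_attribute_table` are hypothetical names.

```python
import struct

# Assumed entry layout: id (1 byte), flags (2), value (1), worst (1),
# raw (6 bytes, little-endian), reserved (1) = 12 bytes in total.
ENTRY = struct.Struct("<BHBB6sB")
HEADER_SIZE = 2     # assumed: table revision number
ENTRY_COUNT = 30    # assumed: number of attribute slots

# Small excerpt of the ID-to-name mapping; names follow the table in this article.
NAMES = {
    0x01: "Raw Read Error Rate",
    0x05: "Reallocated Sectors Count",
    0xC2: "HDA temperature",
}


def parse_attribute_table(table: bytes):
    """Decode every non-empty attribute entry of an assumed-layout table."""
    attrs = []
    for i in range(ENTRY_COUNT):
        offset = HEADER_SIZE + i * ENTRY.size
        attr_id, flags, value, worst, raw, _reserved = ENTRY.unpack_from(table, offset)
        if attr_id == 0:  # unused slot
            continue
        attrs.append({
            "id": attr_id,
            "name": NAMES.get(attr_id, "Unknown"),
            "value": value,
            "worst": worst,
            "raw": int.from_bytes(raw, "little"),  # hexadecimal bytes to an integer
            "vital": bool(flags & 0x0001),         # assumed meaning of bit 0
        })
    return attrs


# Synthetic table with two entries, for demonstration only.
table = bytearray(512)
ENTRY.pack_into(table, HEADER_SIZE, 0x05, 0x0033, 100, 100, (8).to_bytes(6, "little"), 0)
ENTRY.pack_into(table, HEADER_SIZE + ENTRY.size, 0xC2, 0x0022, 64, 50, (36).to_bytes(6, "little"), 0)

for a in parse_attribute_table(bytes(table)):
    print(f"{a['id']:3d} (0x{a['id']:02X}) {a['name']}: "
          f"value={a['value']} worst={a['worst']} raw={a['raw']} vital={a['vital']}")
```

A dictionary lookup with a fallback name keeps the parser usable for vendor-specific attributes that the mapping does not cover.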
The known SMART attributes are listed in the following table:
|No||Hex||Attribute name||Better if ...||Description|
|01||01||Raw Read Error Rate||The rate of errors when reading data from the disk that are caused by the disk's hardware. For all Seagate and Samsung (F1 and newer) drives and Fujitsu 2.5" drives, this is the number of internal data corrections performed before the data is sent to the interface, so frighteningly large numbers are no cause for alarm.|
|02||02||Throughput Performance||Overall disk performance. If the attribute value decreases, the disk probably has problems.|
|03||03||Spin-up time||The time it takes the disk stack to spin up from rest to operating speed. It increases as the mechanics wear (increased bearing friction, etc.) and may also indicate poor-quality power (for example, a voltage drop when the disk starts).|
|04||04||Start / Stop Count||The total number of spindle start-stop cycles. On drives from some manufacturers (for example, Seagate), this is a counter of power-saving mode activations. The raw value field stores the total number of disk starts/stops.|
|05||05||Reallocated Sectors Count||The number of sector reallocation operations. When the disk detects a read/write error, it marks the sector as "reallocated" and transfers the data to a dedicated spare area. This is why bad blocks cannot be seen on modern hard disks: they are all hidden in reallocated sectors. The process is called remapping, and a reallocated sector is called a remap. The larger the value, the worse the condition of the disk surface. The raw value field contains the total number of reallocated sectors. An increase in this attribute may indicate deterioration of the platter surface.|
|06||06||Read Channel Margin||Read channel margin. The purpose of this attribute is not documented. It is not used in modern drives.|
|07||07||Seek Error Rate||The rate of errors when positioning the magnetic head assembly. The more errors, the worse the condition of the mechanics and/or the surface of the hard disk. The value can also be affected by overheating and external vibration (for example, from neighboring disks in the drive cage).|
|08||08||Seek Time Performance||The average performance of magnetic head positioning operations. If the attribute value decreases (positioning slows down), problems with the mechanical part of the actuator are highly likely.|
|09||09||Power-on Time Count (Power-On Hours)||The number of hours (or minutes or seconds, depending on the manufacturer) spent in the powered-on state. The rated mean time between failures (MTBF) is chosen as its threshold value.|
|10||0A||Spin-Up Retry Count||The number of repeated attempts to spin the disk up to operating speed after the first attempt failed. If the attribute value increases, problems with the mechanical part are highly likely.|
|11||0B||Recalibration Retries||The number of repeated recalibration requests after the first attempt failed. If the attribute value increases, problems with the mechanical part are highly likely.|
|12||0C||Device Power Cycle Count||The number of complete power on/off cycles of the disk.|
|13||0D||Soft Read Error Rate||The number of read errors caused by software that could not be corrected. These errors are not mechanical in nature and indicate only incorrect partitioning or incorrect interaction between programs or the operating system and the disk.|
|184||B8||End-to-End error||This attribute, part of HP SMART IV technology, means that after data was transferred through the cache, the parity of the data between the host and the hard disk did not match.|
|187||BB||Reported UNC Errors||Errors that could not be recovered using hardware error-correction methods.|
|188||BC||Command timeout||The number of operations aborted because of an HDD timeout. Normally this value should be zero; a value far above zero most likely points to serious power problems or oxidized data cables.|
|190||BE||Airflow Temperature (WDC)||The temperature of the air inside the hard drive. For Seagate drives it is calculated as (100 - HDA temperature); for Western Digital drives, as (125 - HDA temperature).|
|191||BF||G-sense error rate||The number of errors resulting from shock loads. The attribute stores the readings of the built-in accelerometer, which registers all impacts, jolts, drops, and even careless installation of the disk in the computer case.|
|192||C0||Power-off retract count||The number of shutdown cycles or emergency failures (drive power on/off).|
|193||C1||Load / Unload Cycle||The number of cycles in which the magnetic head assembly moves into the parking zone or into the working position.|
|194||C2||HDA temperature||Stores the readings of the built-in thermal sensor for the mechanical part of the disk, the HDA (Hard Disk Assembly). The reading is taken from the built-in thermal sensor, which is one of the magnetic heads, usually the bottom one in the assembly. The current, minimum and maximum temperatures are recorded in the attribute's bit fields. Not all programs that work with SMART parse these fields correctly, so their readings should be treated critically.|
|195||C3||Hardware ECC Recovered||The number of errors corrected by the disk's hardware (reading, positioning, transfer over the external interface). On disks with a SATA interface, the value often worsens as the system bus frequency is raised: SATA is very sensitive to overclocking.|
|196||C4||Reallocation Event Count||The number of reallocation operations. The attribute's raw value field stores the total number of attempts to transfer data from reallocated sectors to the spare area. Both successful and unsuccessful attempts are counted.|
|197||C5||Current Pending Sector Count||The number of sectors that are candidates for reallocation. They have not yet been identified as bad, but reading them differs from reading a stable sector; these are the so-called suspicious or unstable sectors. If a subsequent read of the sector succeeds, it is removed from the list of candidates. If reads fail repeatedly, the drive tries to recover the sector and performs a remapping operation. An increase in this attribute may indicate physical degradation of the hard disk.|
|198||C6||Uncorrectable Sector Count||The number of sectors that the disk cannot correct by its own means. If the number of errors grows, critical defects of the surface and/or the drive mechanics are highly likely.|
|199||C7||UltraDMA CRC Error Count||The number of errors that occur when data is transferred over the external interface in UltraDMA mode (packet integrity violations, etc.). Growth of this attribute indicates a bad (crumpled, twisted) cable and poor contacts. Such errors also appear when the PCI bus is overclocked, during power failures, under strong electromagnetic interference, and sometimes because of the driver. A poor-quality ribbon cable may be the cause. To fix this, try using a SATA cable without latches that fits tightly on the drive contacts.|
|200||C8||Write Error Rate / Multi-Zone Error Rate||The total number of errors that occur when writing a sector. It can serve as an indicator of the quality of the surface and of the drive mechanics.|
|201||C9||Soft read error rate||The rate of "software" errors when reading data from the disk. This parameter shows how often read operations from the disk surface fail because of software rather than the drive's hardware.|
|202||CA||Data Address Mark errors||The number of Data Address Mark (DAM) errors (vendor-specific).|
|203||CB||Run out cancel||The number of ECC errors.|
|204||CC||Soft ECC correction||The number of ECC errors corrected by software.|
|205||CD||Thermal asperity rate (TAR)||Number of thermal asperity errors.|
|206||CE||Flying height||The distance between the head and the surface of the disk.|
|207||CF||Spin high current||The current drawn while the disk spins up.|
|208||D0||Spin buzz||Number of buzz routines to spin up the drive.|
|209||D1||Offline seek performance||Seek performance during offline operations.|
|220||DC||Disk shift||The distance by which the disk stack is shifted relative to the spindle. It mostly results from an impact or a drop. The unit of measure is unknown. As the attribute grows, the disk quickly becomes inoperable.|
|221||DD||G-Sense Error Rate||The number of errors caused by external loads and shocks. The attribute stores the readings of the built-in shock sensor.|
|222||DE||Loaded hours||The time spent by the magnetic head assembly between leaving the parking area for the working area of the disk and returning to the parking area.|
|223||DF||Load / Unload Retry Count||The number of repeated attempts to unload/load the magnetic head assembly to/from the parking area after an unsuccessful attempt.|
|224||E0||Load friction||The friction force on the magnetic head assembly when it is unloaded from the parking area.|
|225||E1||Load cycle count||The number of cycles in which the magnetic head assembly moves into the parking area.|
|226||E2||Load 'in'-time||The time it takes the drive to move the magnetic heads from the parking area onto the working surface of the disk.|
|227||E3||Torque Amplification Count||The number of attempts to compensate for the torque.|
|228||E4||Power-Off Retract Cycle||The number of times the magnetic head assembly was parked automatically as a result of power-off.|
|230||E6||GMR Head Amplitude||The amplitude of "jitter" (the distance of the repetitive movement of the magnetic head assembly).|
|231||E7||Temperature||The temperature of the hard disk.|
|240||F0||Head flying hours||The total time, in hours, that the head assembly has spent in the working position.|
|250||FA||Read error retry rate||The number of errors while reading from the hard disk.|
- A larger parameter value is better.
- A smaller parameter value is better.
- Critical parameter: red row background.
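The two temperature attributes in the table (190 and 194) lend themselves to a short worked example. The first function inverts the formulas given for attribute 190. The second splits the raw value of attribute 194 into current, minimum and maximum fields, assuming three consecutive 16-bit words in that order; this layout is an assumption for illustration only, since the actual layout is vendor-specific and, as noted in the table, not every program parses it correctly. All function names are hypothetical.

```python
# Offsets taken from the formulas in the table for attribute 190.
AIRFLOW_BASE = {"seagate": 100, "wdc": 125}


def hda_temperature_from_airflow(attribute_value: int, vendor: str) -> int:
    """Invert value = base - temperature to recover the HDA temperature."""
    return AIRFLOW_BASE[vendor] - attribute_value


def split_temperature_raw(raw: int):
    """Split a 48-bit raw value into (current, minimum, maximum).

    Assumed layout (vendor-specific in practice): bits 0-15 hold the current
    temperature, bits 16-31 the minimum, bits 32-47 the maximum.
    """
    current = raw & 0xFFFF
    minimum = (raw >> 16) & 0xFFFF
    maximum = (raw >> 32) & 0xFFFF
    return current, minimum, maximum


print(hda_temperature_from_airflow(64, "seagate"))  # 36
print(hda_temperature_from_airflow(89, "wdc"))      # 36
print(split_temperature_raw(0x002D_0014_0024))      # (36, 20, 45)
```

If the decoded minimum or maximum looks implausible for a given drive, the assumed layout probably does not apply to that model.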