

S.M.A.R.T.

SMART (Self-Monitoring, Analysis and Reporting Technology) is a technology for assessing the state of a hard disk using built-in self-diagnostic hardware, and also a mechanism for predicting when the disk will fail.

History

The first hard disk with a self-diagnosis system was introduced by IBM in 1992 in its IBM 9337 disk arrays for AS/400 servers, which used IBM 0662 SCSI-2 drives. The technology was called Predictive Failure Analysis (PFA). Several key parameters were measured and evaluated directly by the disk controller, but the result was limited to a single bit: either everything is fine, or the disk may fail soon. Compaq, Seagate, Quantum and Conner later developed another technology, called IntelliSafe. It used a common protocol for reporting the state of the hard disk, but each company chose the measured parameters and their thresholds independently.

In early 1995, Compaq proposed standardizing the technology. IBM, Seagate, Quantum, Conner and Western Digital supported the idea (at that time Western Digital did not yet have its own system for monitoring hard disk parameters). IntelliSafe was taken as the basis, and the jointly developed standard was named SMART. The SMART I standard provided for monitoring of the main parameters, which ran only on command.

Hitachi took part in developing SMART II, which added a full self-diagnosis of the drive (the extended self-test) and error logging. SMART III introduced detection of surface defects and the ability to repair them transparently to the user.

Description

SMART monitors the main characteristics of the drive, and each of them is assigned a score. The characteristics fall into two groups:

  • 1) parameters reflecting the natural aging of the hard disk (spindle speed, number of head movements, number of power on/off cycles);
  • 2) current operating parameters of the drive (head flying height above the disk surface, number of reallocated sectors, seek time and number of seek errors).

Each attribute's data is stored as a raw value (usually shown in hexadecimal), which is then converted into a normalized value: a number expressing reliability relative to some reference value. The normalized value typically ranges from 0 to 100 (some attributes use ranges of 0 to 200 or 0 to 253).

A high score means that the parameter has not changed or is deteriorating slowly. A low score indicates a possible imminent failure.

If the value falls below the threshold, the minimum at which the manufacturer guarantees failure-free operation of the drive, the corresponding component is considered to have failed.
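A minimal sketch in Python of the comparison described above, assuming the program already has an attribute's normalized value and the manufacturer's threshold. The function name attribute_status, the warn_margin early-warning band, and treating a threshold of 0 as "no threshold" are assumptions of this sketch, not part of SMART itself.

```python
def attribute_status(value: int, threshold: int, warn_margin: int = 10) -> str:
    """Classify one SMART attribute from its normalized value.

    value       -- normalized value (typically 0-100, sometimes up to 200 or 253);
                   higher means more reliable.
    threshold   -- minimum value at which the manufacturer guarantees
                   failure-free operation.
    warn_margin -- hypothetical early-warning band above the threshold
                   (an assumption of this sketch, not defined by SMART).
    """
    if threshold == 0:
        # Assumption: a zero threshold marks a purely informational attribute.
        return "ok"
    if value < threshold:
        # Below the guaranteed minimum: the component is considered failed.
        return "failed"
    if value - threshold < warn_margin:
        # Still above the threshold, but close enough to watch.
        return "degrading"
    return "ok"


# Example values: an attribute at 100, 40 and 30 with a threshold of 36.
print(attribute_status(100, 36))  # ok
print(attribute_status(40, 36))   # degrading
print(attribute_status(30, 36))   # failed
```

The early-warning band is only a convenience; the text above defines failure purely as the value dropping below the threshold.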

SMART technology makes it possible to:

  • 1) monitor state parameters;
  • 2) scan the disk surface;
  • 3) scan the surface and automatically replace suspect sectors with reliable ones.

SMART can predict device failures caused by mechanical faults, which account for about 60% of hard drive failures. It cannot predict the effects of a power surge or a mechanical shock.

Drives do not report their SMART status on their own; special programs are needed to read it. SMART therefore cannot be used without two components:

  • 1) software built into the drive controller;
  • 2) external software running on the host.

Programs that display the state of SMART attributes work according to the following algorithm:

  • Check whether the drive supports SMART;
  • Send a command to query the SMART tables;
  • Read the tables into the application buffer;
  • Decode the table structures, extracting each attribute's number and numerical value;
  • Map the standardized attribute numbers to their names (sometimes depending on the drive type, model or manufacturer, as, for example, in the Victoria program);
  • Output the numerical values in a readable form (for example, converting hexadecimal values to decimal);
  • Extract the attribute flags from the tables (flags describing the attribute's role in this drive, for example, “vital” or “counter”);
  • Display the overall status of the device based on all tables, values and flags.
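The decoding steps above can be sketched in Python for a data block already read from the drive (sending the query command itself is not shown). The sketch assumes the widely used layout of the 512-byte SMART data block: 30 attribute entries of 12 bytes starting at offset 2. It also assumes that flag bit 0 marks a pre-failure ("vital") attribute and bit 4 an event counter, as tools such as smartctl interpret them. The names parse_smart_data and ATTRIBUTE_NAMES are hypothetical.

```python
import struct

# A few names taken from the attribute table in this article; real programs
# keep larger tables, sometimes per vendor or model.
ATTRIBUTE_NAMES = {
    0x01: "Raw Read Error Rate",
    0x05: "Reallocated Sectors Count",
    0x09: "Power-on Time Count",
    0xC2: "HDA temperature",
    0xC5: "Current Pending Sector Count",
}

TABLE_OFFSET = 2   # bytes 0-1: revision of the data structure
ENTRY_SIZE = 12    # bytes per attribute entry
ENTRY_COUNT = 30   # maximum number of attribute entries

FLAG_PREFAILURE = 0x0001   # assumed bit for a "vital" attribute
FLAG_EVENT_COUNT = 0x0010  # assumed bit for a "counter" attribute


def parse_smart_data(buf: bytes) -> list[dict]:
    """Decode the attribute entries of a 512-byte SMART data block.

    Assumed entry layout (little-endian): id (1 byte), flags (2),
    normalized value (1), worst value (1), raw value (6), reserved (1).
    """
    if len(buf) < 512:
        raise ValueError("expected a 512-byte SMART data block")
    attributes = []
    for i in range(ENTRY_COUNT):
        off = TABLE_OFFSET + i * ENTRY_SIZE
        attr_id, flags, value, worst = struct.unpack_from("<BHBB", buf, off)
        if attr_id == 0:
            continue  # unused slot
        # Convert the 48-bit raw value to an ordinary (decimal) integer.
        raw = int.from_bytes(buf[off + 5:off + 11], "little")
        attributes.append({
            "id": attr_id,
            "name": ATTRIBUTE_NAMES.get(attr_id, "Unknown attribute"),
            "value": value,
            "worst": worst,
            "raw": raw,
            "vital": bool(flags & FLAG_PREFAILURE),
            "counter": bool(flags & FLAG_EVENT_COUNT),
        })
    return attributes


if __name__ == "__main__":
    # Build a fake block with one entry: attribute 5, value 100, raw count 8.
    block = bytearray(512)
    struct.pack_into("<BHBB6sB", block, TABLE_OFFSET,
                     0x05, 0x0033, 100, 100, (8).to_bytes(6, "little"), 0)
    for attr in parse_smart_data(bytes(block)):
        print(f"{attr['id']:3d} {attr['name']}: value={attr['value']} raw={attr['raw']}")
```

Thresholds live in a separate table read from the drive, so a complete program would pair each entry with its threshold before judging the overall status.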

SMART Attributes

A table of known SMART attributes is as follows:

No. Hex Attribute name Better if ... Description
01 01 Raw Read Error Rate
Less
The rate of read errors caused by the disk hardware.
On all Seagate drives, Samsung drives (F1 family and newer) and Fujitsu 2.5-inch drives, this is the number of internal data corrections performed before the data is sent to the interface, so frighteningly large numbers here are no cause for alarm.
02 02 Throughput performance
More
Overall drive performance. A decreasing value strongly suggests a problem with the disk.
03 03 Spin up time
Less
The time it takes to spin the platter stack up from rest to operating speed.
It increases as the mechanics wear (increased bearing friction, etc.) and may also indicate a poor power supply (for example, a voltage drop when the disk starts).
04 04 Start / Stop Count
The total number of spindle start/stop cycles. On drives from some manufacturers (for example, Seagate), this attribute also counts activations of power-saving mode. The raw value field stores the total number of disk starts/stops.
05 05 Reallocated Sectors Count
Less
The number of reallocated sectors. When the disk detects a read/write error, it marks the sector as reallocated and moves its data to a dedicated spare area. This is why bad blocks cannot be seen on modern hard drives: they are all hidden behind reallocated sectors. This process is called remapping, and a reallocated sector is called a remap. The higher the count, the worse the condition of the platter surface. The raw value field contains the total number of reallocated sectors.
An increase in this attribute may indicate deterioration of the platter surface.
06 06 Read channel margin
The margin of the read channel. The purpose of this attribute is not documented, and it is not used in modern drives.
07 07 Seek Error Rate
Less
The error rate when positioning the magnetic head assembly. The more errors, the worse the condition of the mechanics and/or the surface of the hard disk. The value can also be affected by overheating and external vibration (for example, from neighboring disks in the drive cage).
08 08 Seek Time Performance
More
The average performance of head-positioning (seek) operations. A decreasing value (slower positioning) strongly suggests a problem with the head mechanics.
09 09 Power-on Time Count (Power-On Hours)
Less
The number of hours (or minutes or seconds, depending on the manufacturer) spent in the powered-on state. The threshold for this attribute is set to the drive's rated MTBF (mean time between failures).
10 0A Spin-up retry count
Less
The number of repeated attempts to spin the disks up to operating speed after an unsuccessful first attempt. A rising value indicates a high probability of mechanical problems.
11 0B Recalibration retries
Less
The number of recalibration retries after an unsuccessful first attempt. A rising value indicates a high probability of mechanical problems.
12 0C Device Power Cycle Count
The number of complete on/off cycles of the disk.
13 0D Soft Read Error Rate
Less
The number of uncorrectable read errors caused by software. These errors are not mechanical in nature; they indicate only incorrect partitioning or faulty interaction between the disk and a program or the operating system.
184 B8 End-to-end error
Less
Part of HP's SMART IV technology: counts cases where, after the data passes through the cache memory, the parity of the data transferred between the host and the hard disk does not match.
187 BB Reported UNC Errors
Less
Errors that could not be recovered using the drive's hardware error-recovery methods.
188 BC Command timeout
Less
The number of operations aborted because of a drive timeout. Normally this value should be zero; if it is much higher, the likely causes are serious power supply problems or oxidized data cable contacts.
190 BE Airflow Temperature (WDC)
Less
Air temperature inside the hard drive enclosure. On Seagate drives the value is calculated as 100 minus the HDA temperature; on Western Digital drives, as 125 minus the HDA temperature.
191 BF G-sense error rate
Less
The number of errors caused by shock loads. The attribute stores the readings of the built-in accelerometer, which records every impact, jolt and drop, and even careless installation of the disk in the computer case.
192 C0 Power-off retract count
Less
The number of power-off or emergency shutdown cycles.
193 C1 Load / Unload Cycle
Less
The number of cycles moving the magnetic head assembly into the parking zone and back to the working position.
194 C2 HDA temperature
Less
Readings of the built-in temperature sensor of the mechanical part of the disk, the HDA (Hard Disk Assembly). The role of the sensor is played by one of the magnetic heads, usually the lowest one in the HDA. The attribute's bit fields contain the current, minimum and maximum temperatures. Not all SMART programs parse these fields correctly, so their readings should be treated with caution.
195 C3 Hardware ECC Recovered
Less
The number of errors corrected by the disk hardware (on reads, seeks, and transfers over the external interface). On SATA disks this value often worsens when the system bus frequency is raised, since SATA is very sensitive to overclocking.
196 C4 Reallocation Event Count
Less
The number of reallocation operations. The raw value field of the attribute stores the total number of attempts to move data from reallocated sectors to the spare area, counting both successful and unsuccessful attempts.
197 C5 Current Pending Sector Count
Less
The number of sectors that are candidates for reallocation. They have not yet been marked as bad, but reading them behaves differently from reading a stable sector; these are the so-called suspect or unstable sectors. If a later read of such a sector succeeds, it is removed from the candidate list. If reads keep failing, the drive tries to recover the sector and then remaps it.
An increase in this attribute may indicate physical degradation of the hard disk.
198 C6 Uncorrectable Sector Count
Less
The number of sectors the disk could not correct by its own means. If this number grows, there is a high probability of critical defects in the surface and/or mechanics of the drive.
199 C7 UltraDMA CRC Error Count
Less
The number of errors that occur when transferring data over the external interface in UltraDMA mode (packet integrity violations, etc.). Growth of this attribute indicates a damaged (kinked or twisted) cable or poor contacts. Such errors also appear when the PCI bus is overclocked, during power failures and strong electromagnetic interference, and sometimes because of a faulty driver.
A poor-quality cable may be the cause. To fix it, try a SATA cable without latches that fits tightly on the drive's contacts.
200 C8 Write Error Rate / Multi-Zone Error Rate
Less
The total number of errors that occurred while writing sectors to the disk. It can serve as an indicator of surface quality and of the condition of the drive mechanics.
201 C9 Soft read error rate
Less
The rate of "software" errors when reading data from the disk, that is, errors during read operations from the disk surface caused by software rather than by the drive hardware.

202 CA Data Address Mark errors
Less
The number of Data Address Mark (DAM) errors, or a vendor-specific value.
203 CB Run out cancel
Less
The number of ECC errors.
204 CC Soft ECC correction
Less
The number of ECC errors corrected in software.
205 CD Thermal asperity rate (TAR)
Less
Number of thermal asperity errors.
206 CE Flying height
The height of the head above the disk surface.
207 CF Spin high current
Less
The current drawn while spinning up the disk.
208 D0 Spin buzz
The number of buzz routines needed to spin up the drive.
209 D1 Offline seek performance
The drive's seek performance during offline operations.
220 DC Disk shift
Less
Displacement of the platter stack relative to the spindle, mostly caused by a shock or a drop. The unit is unknown. As the attribute grows, the drive quickly becomes inoperable.
221 DD G-Sense Error Rate
Less
The number of errors that occurred due to external loads and shocks. The attribute stores the readings of the built-in shock sensor.
222 DE Loaded hours
The time the magnetic head assembly spends over the working area of the disk, between leaving the parking area and returning to it.
223 DF Load / Unload Retry Count
The number of retries to move the magnetic head assembly out of or into the parking area after an unsuccessful attempt.
224 E0 Load friction
Less
The friction force acting on the magnetic head assembly as it leaves the parking area.
225 E1 Load cycle count
Less
The number of cycles moving the magnetic head assembly into the parking area.
226 E2 Load 'in'-time
The time during which the drive keeps the magnetic heads moved out of the parking area onto the working surface of the disk.
227 E3 Torque amplification count
Less
The number of attempts to compensate for the torque.
228 E4 Power-Off Retract Cycle
Less
The number of automatic parkings of the magnetic head assembly caused by power loss.
230 E6 GMR Head Amplitude
The amplitude of “jitter” (the distance of repetitive movement of the magnetic head assembly).
231 E7 Temperature
Less
Hard drive temperature.
240 F0 Head flying hours
The total time, in hours, that the head assembly has spent in the operating position.
250 FA Read error retry rate
Less
The number of errors when reading from the hard disk.

Where:

  • More - Larger parameter value is better
  • Less - A lower value is better
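For the two temperature attributes in the table, a short Python sketch shows how a program might turn their data into degrees. For HDA temperature (194, C2), it assumes a common but vendor-specific raw value layout with the current temperature in byte 0, the minimum in byte 2 and the maximum in byte 4. Differences between such layouts are one reason programs parse these fields inconsistently. For Airflow Temperature (190, BE), it inverts the formulas given in the table, assuming they describe the normalized value. The function names are hypothetical.

```python
def hda_temperature_fields(raw: int) -> tuple[int, int, int]:
    """Split the 48-bit raw value of attribute 194 (C2) into
    (current, minimum, maximum) temperatures.

    Assumed layout (vendor-specific): current in byte 0, minimum in
    byte 2, maximum in byte 4. Other drives may use other layouts.
    """
    current = raw & 0xFF
    minimum = (raw >> 16) & 0xFF
    maximum = (raw >> 32) & 0xFF
    return current, minimum, maximum


# Offsets from the Airflow Temperature (190, BE) row of the table:
# value = offset - HDA temperature.
AIRFLOW_OFFSETS = {"seagate": 100, "western digital": 125}


def temperature_from_airflow_value(value: int, vendor: str) -> int:
    """Invert the vendor formula for attribute 190 to get the HDA temperature."""
    return AIRFLOW_OFFSETS[vendor.lower()] - value


print(hda_temperature_fields(0x002D00140024))                 # (36, 20, 45)
print(temperature_from_airflow_value(64, "Seagate"))          # 36
print(temperature_from_airflow_value(89, "Western Digital"))  # 36
```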
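Several rows of the table (05, C5, C6 and C7) note that a growing value is a warning sign. A minimal Python sketch compares two snapshots of raw values, for example taken by periodic polling, and reports which of these attributes grew. The snapshot format (a mapping from attribute ID to raw value) and the function name growing_attributes are hypothetical.

```python
# Attributes whose growth the table describes as a warning sign.
WATCHED_ATTRIBUTES = {
    0x05: "Reallocated Sectors Count (surface deterioration)",
    0xC5: "Current Pending Sector Count (physical degradation)",
    0xC6: "Uncorrectable Sector Count (surface or mechanical defects)",
    0xC7: "UltraDMA CRC Error Count (cable or contact problems)",
}


def growing_attributes(before: dict[int, int], after: dict[int, int]) -> list[str]:
    """Compare two {attribute id: raw value} snapshots and describe every
    watched attribute whose raw value has increased."""
    warnings = []
    for attr_id, meaning in WATCHED_ATTRIBUTES.items():
        old = before.get(attr_id, 0)  # missing attribute treated as 0
        new = after.get(attr_id, 0)
        if new > old:
            warnings.append(f"{attr_id:02X} {meaning}: +{new - old}")
    return warnings


yesterday = {0x05: 0, 0xC5: 1, 0xC7: 2}
today = {0x05: 3, 0xC5: 1, 0xC7: 2}
for line in growing_attributes(yesterday, today):
    print(line)  # 05 Reallocated Sectors Count (surface deterioration): +3
```

Tracking the change rather than the absolute count fits the table's wording, which treats an increase in these attributes as the warning sign.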