SMART

S.M.A.R.T.

SMART (Self-Monitoring, Analysis and Reporting Technology) is a technology that evaluates the condition of a hard disk using the drive's built-in self-diagnostic hardware, and a mechanism for predicting when the drive is likely to fail.

History

The first hard drive with a self-diagnostic system was introduced by IBM in 1992, in IBM 9337 disk arrays for AS/400 servers using IBM 0662 SCSI-2 disks. The technology was called Predictive Failure Analysis (PFA). Several key parameters were measured and evaluated directly by the disk controller. The result was limited to a single bit: either everything was in order, or the disk could soon fail. Later, Compaq, Seagate, Quantum and Conner developed another technology, called IntelliSafe. It defined a common protocol for reporting the state of the hard drive, but each company decided independently which parameters to measure and what their thresholds were.

In early 1995, Compaq proposed standardizing the technology. IBM, Seagate, Quantum, Conner and Western Digital (the last of which did not yet have a system for monitoring hard drive parameters) supported the idea. IntelliSafe was taken as the basis. The jointly developed standard was called SMART. The first version, SMART I, provided monitoring of the main parameters and ran only on command.

Hitachi took part in the development of SMART II and proposed a full self-test of the drive (the extended self-test); error logging was also added. SMART III added the ability to detect surface defects and repair them transparently to the user.

Description

SMART monitors the main characteristics of the drive, each of which receives a score. The characteristics fall into two groups:

  • parameters that reflect the natural aging of the hard disk (number of spindle revolutions, number of head movements, number of power on/off cycles);
  • current parameters of the drive (head height above the disk surface, number of remapped sectors, track seek time and number of seek errors).

The data is stored in hexadecimal form as the raw value, which is then converted into the value, a number that expresses reliability relative to some reference. The value usually ranges from 0 to 100 (for some attributes, from 0 to 200 or from 0 to 253).

A high value indicates that the parameter has not changed or is deteriorating slowly. A low value indicates a possible failure in the near future.

A value below the threshold, that is, the minimum at which the manufacturer guarantees trouble-free operation of the drive, means that the corresponding component has failed.
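The comparison described above can be sketched in a few lines of Python. This is only an illustration: the `Attribute` class, its field names and the sample numbers are hypothetical, and the rule applied is the one stated in the text (a value strictly below the manufacturer's threshold counts as a failure).

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical container for one SMART attribute."""
    number: int      # attribute number, e.g. 5
    name: str        # attribute name
    value: int       # normalized value reported by the drive
    threshold: int   # minimum guaranteed by the manufacturer


def has_failed(attr: Attribute) -> bool:
    """Return True when the normalized value is below the threshold."""
    return attr.value < attr.threshold


def failed_attributes(attrs: list[Attribute]) -> list[str]:
    """Names of all attributes whose value has dropped below the threshold."""
    return [a.name for a in attrs if has_failed(a)]
```

For example, an attribute with value 30 and threshold 36 would be reported as failed, while one with value 100 and the same threshold would not.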

SMART technology provides:

  • monitoring of state parameters;
  • surface scanning;
  • surface scanning with automatic replacement of doubtful sectors with reliable ones.

SMART can predict device failures caused by mechanical malfunctions, which account for about 60% of hard drive failures. It cannot predict the consequences of a voltage surge or a mechanical shock.

Drives cannot report their status through SMART on their own; special programs are needed for this. Using SMART therefore requires two components:

  • software built into the drive controller;
  • external software running on the host.

Programs that display the status of SMART attributes work as follows:

  • Check that the drive supports SMART;
  • Send the command that requests the SMART tables;
  • Read the tables into the application buffer;
  • Decode the table structures, extracting each attribute's number and numerical value;
  • Map the standardized attribute numbers to their names (sometimes depending on the drive type, model or manufacturer, as in the Victoria program, for example);
  • Output the numerical values in a readable form (for example, converting hexadecimal values to decimal);
  • Extract the attribute flags from the tables (flags that describe the purpose of the attribute in this drive, for example "vital" or "counter");
  • Output the overall state of the device based on all tables, values and flags.
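The decoding and name-mapping steps above can be sketched as follows. The byte layout is an assumption that the text does not specify: a 512-byte buffer with a 2-byte revision number followed by 30 entries of 12 bytes each (1-byte attribute number, 2-byte flags, 1-byte value, 1-byte worst value, 6-byte raw value, 1 reserved byte). Real drives may deviate from it, and the function and constant names are hypothetical. Sending the query command to the drive is outside the scope of this sketch.

```python
import struct

# Assumed layout (not taken from the text above): 2-byte revision number,
# then 30 entries of 12 bytes each.
TABLE_OFFSET = 2
ENTRY_SIZE = 12
ENTRY_COUNT = 30

# Small excerpt of the number-to-name mapping from the attribute table.
ATTRIBUTE_NAMES = {
    1: "Raw Read Error Rate",
    5: "Reallocated Sectors Count",
    9: "Power-On Hours",
    194: "HDA temperature",
}


def decode_table(buffer: bytes) -> list[dict]:
    """Extract number, flags, value and raw value from each table entry."""
    rows = []
    for index in range(ENTRY_COUNT):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        entry = buffer[start:start + ENTRY_SIZE]
        if len(entry) < ENTRY_SIZE:
            break  # buffer is shorter than the assumed layout
        # 1-byte number, 2-byte flags, 1-byte value, 1-byte worst value
        number, flags, value, worst = struct.unpack_from("<BHBB", entry, 0)
        if number == 0:
            continue  # empty slot
        raw = int.from_bytes(entry[5:11], "little")  # 6-byte raw value
        rows.append({
            "number": number,
            "hex": f"{number:02X}",
            "name": ATTRIBUTE_NAMES.get(number, "Unknown"),
            "flags": flags,
            "value": value,
            "worst": worst,
            "raw": raw,  # an int, so it prints in decimal by default
        })
    return rows
```

A real program would then compare each value with the corresponding threshold and interpret the flags before reporting the overall state of the device.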

SMART Attributes

A table of known SMART attributes looks like this:

No. Hex Attribute name Better if ... Description
01 01 Raw Read Error Rate
Less
The rate of hardware-caused errors when reading data from the disk.
On all Seagate, Samsung (F1 and newer) and Fujitsu 2.5" drives, this is the number of internal data corrections performed before the data is passed to the interface, so alarmingly large numbers are no cause for concern.
02 02 Throughput Performance
More
Overall disk performance. If the attribute value decreases, the disk very likely has problems.
03 03 Spin-Up Time
Less
The time taken to spin the platter stack up from rest to operating speed.
It increases as the mechanics wear (increased bearing friction, etc.) and can also indicate a poor-quality power supply (for example, a voltage drop when the disk starts).
04 04 Start/Stop Count
The total number of spindle start/stop cycles. On drives from some manufacturers (for example, Seagate) it counts activations of the power-saving mode. The raw value field stores the total number of disk starts/stops.
05 05 Reallocated Sectors Count
Less
The number of sector reallocations. When the disk detects a read/write error, it marks the sector as reallocated and moves the data to a dedicated spare area. This is why bad blocks are not visible on modern hard disks: they are all hidden in reallocated sectors. The process is called remapping, and a reallocated sector is called a remap. The higher the value, the worse the condition of the platter surface. The raw value field contains the total number of reallocated sectors.
An increase in this attribute may indicate deterioration of the platter surface.
06 06 Read Channel Margin
The margin of the read channel. The purpose of this attribute is not documented. It is not used in modern drives.
07 07 Seek Error Rate
Less
The rate of errors when positioning the magnetic head unit. The more errors, the worse the condition of the mechanics and/or the surface of the hard drive. The value can also be affected by overheating and external vibration (for example, from neighboring disks in the drive cage).
08 08 Seek Time Performance
More
The average performance of head positioning operations. If the attribute value decreases (positioning slows down), problems with the mechanical part of the head actuator are likely.
09 09 Power-on Time Count (Power-On Hours)
Less
The number of hours (or minutes or seconds, depending on the manufacturer) spent powered on. The mean time between failures (MTBF) is chosen as its threshold.
10 0A Spin-Up Retry Count
Less
The number of repeated attempts to spin the disks up to operating speed after a failed first attempt. If the attribute value increases, a fault in the mechanical part is likely.
11 0B Recalibration Retries
Less
The number of repeated recalibration requests after a failed first attempt. If the attribute value increases, problems with the mechanical part are likely.
12 0C Device Power Cycle Count
The number of complete power on/off cycles of the disk.
13 0D Soft Read Error Rate
Less
The number of read errors caused by software that could not be corrected. None of these errors are mechanical in nature; they indicate only incorrect partitioning or incorrect interaction of programs or the operating system with the disk.
184 B8 End-to-End error
Less
This attribute, part of HP SMART IV technology, indicates that after data passed through the cache memory, the parity of the data between the host and the hard disk did not match.
187 BB Reported UNC Errors
Less
Errors that could not be recovered by hardware error correction.
188 BC Command Timeout
Less
The number of operations aborted because of an HDD timeout. This value should normally be zero; if it is much higher, the most likely cause is a serious power problem or oxidized data cables.
190 BE Airflow Temperature (WDC)
Less
The air temperature inside the hard disk housing. For Seagate disks the value is calculated as (100 - HDA temperature); for Western Digital disks, as (125 - HDA temperature).
191 BF G-sense error rate
Less
The number of errors caused by shock loads. The attribute stores readings from the built-in accelerometer, which registers all impacts, jolts, drops and even careless installation of the disk in the computer case.
192 C0 Power-off retract count
Less
The number of power-off cycles or emergency failures (drive power on/off).
193 C1 Load / Unload Cycle
Less
The number of cycles in which the magnetic head unit moves to the parking zone and back to the working position.
194 C2 HDA temperature
Less
Stores the readings of the built-in thermal sensor for the mechanical part of the disk, the HDA (Hard Disk Assembly). The reading is taken from the built-in thermal sensor, which is one of the magnetic heads, usually the lowest one in the HDA. The bit fields of the attribute record the current, minimum and maximum temperatures. Not all programs that work with SMART parse these fields correctly, so their readings should be treated critically.
195 C3 Hardware ECC Recovered
Less
The number of errors corrected by the disk hardware (reading, positioning, transfer over the external interface). On SATA drives the value often worsens as the system bus frequency rises, because SATA is very sensitive to overclocking.
196 C4 Reallocation Event Count
Less
The number of reallocation operations. The raw value field stores the total number of attempts to transfer data from reallocated sectors to the spare area. Both successful and unsuccessful attempts are counted.
197 C5 Current Pending Sector Count
Less
The number of sectors that are candidates for reallocation. They have not yet been marked as bad, but reading them differs from reading a stable sector; these are the so-called suspect or unstable sectors. If a subsequent read of the sector succeeds, it is removed from the candidates. If reads fail repeatedly, the drive tries to recover the sector and remaps it.
An increase in this attribute may indicate physical degradation of the hard drive.
198 C6 Uncorrectable Sector Count
Less
The number of sectors that cannot be corrected by the drive itself. If the number of errors increases, critical defects of the surface and/or the drive mechanics are likely.
199 C7 UltraDMA CRC Error Count
Less
The number of errors that occur when data is transferred over the external interface in UltraDMA mode (packet integrity violations, etc.). Growth of this attribute indicates a bad (crushed or twisted) cable or bad contacts. Similar errors also appear when the PCI bus is overclocked, after power failures, under strong electromagnetic interference, and sometimes because of the driver.
The likely cause is a poor-quality cable. To fix it, try a SATA cable without latches that fits tightly on the disk's contacts.
200 C8 Write Error Rate / Multi-Zone Error Rate
Less
The total number of errors that occur when writing sectors to the disk. It can serve as an indicator of the surface quality and the condition of the drive mechanics.
201 C9 Soft read error rate
Less
The rate of "software" errors when reading data from the disk, that is, errors in read operations from the disk surface that are caused by software rather than by the drive hardware.
202 CA Data Address Mark errors
Less
The number of Data Address Mark (DAM) errors (or vendor-specific).
203 CB Run Out Cancel
Less
Number of ECC errors.
204 CC Soft ECC correction
Less
The number of ECC errors corrected by software.
205 CD Thermal asperity rate (TAR)
Less
Number of thermal asperity errors.
206 CE Flying height
The height of the head above the disk surface.
207 CF Spin high current
Less
The current drawn while the disk is spinning up.
208 D0 Spin buzz
The number of buzz routines needed to spin up the drive.
209 D1 Offline seek performance
The drive's seek performance during offline operations.
220 DC Disk Shift
Less
The displacement of the platter stack relative to the spindle, usually caused by an impact or a fall. The unit of measurement is unknown. As the attribute grows, the disk quickly becomes inoperable.
221 DD G-Sense Error Rate
Less
The number of errors caused by external loads and shocks. The attribute stores readings from the built-in shock sensor.
222 DE Loaded Hours
The time the magnetic head unit spends between being unloaded from the parking area into the working area of the disk and being loaded back into the parking area.
223 DF Load/Unload Retry Count
The number of repeated attempts to unload/load the magnetic head unit to/from the parking area after a failed attempt.
224 E0 Load Friction
Less
The frictional force on the magnetic head unit when it is unloaded from the parking area.
225 E1 Load Cycle Count
Less
The number of cycles of moving the magnetic head unit to the parking area.
226 E2 Load 'In'-time
The time the drive takes to unload the magnetic heads from the parking area onto the working surface of the disk.
227 E3 Torque Amplification Count
Less
The number of attempts to compensate for the torque.
228 E4 Power-Off Retract Cycle
Less
The number of times the magnetic head unit was parked automatically as a result of power-off.
230 E6 GMR Head Amplitude
The amplitude of "jitter" (the distance of repetitive movement of the magnetic head unit).
231 E7 Temperature
Less
The temperature of the hard drive.
240 F0 Head flying hours
The total time the head unit has spent in the working position, in hours.
250 FA Read error retry rate
Less
The number of errors while reading the hard disk.

Where:

  • More: a higher parameter value is better
  • Less: a lower parameter value is better
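The formulas given for attribute 190 (Airflow Temperature) can be inverted to recover a temperature from the reported value. The sketch below assumes the formulas hold exactly as stated in the table, that is, value = 100 - temperature for Seagate and value = 125 - temperature for Western Digital; the function name and the vendor keys are hypothetical.

```python
# Offsets taken from the description of attribute 190 in the table above.
AIRFLOW_OFFSETS = {
    "Seagate": 100,
    "Western Digital": 125,
}


def airflow_temperature(value: int, vendor: str) -> int:
    """Recover the HDA temperature from the value of attribute 190.

    Inverts value = offset - temperature, so temperature = offset - value.
    Raises KeyError for vendors without a formula in the table.
    """
    return AIRFLOW_OFFSETS[vendor] - value
```

Under these assumptions, a Seagate value of 65 and a Western Digital value of 90 both correspond to 35 degrees.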