SMART

S.M.A.R.T.

SMART (Self-Monitoring, Analysis and Reporting Technology) is a technology for assessing the condition of a hard disk using built-in self-diagnostic hardware, together with a mechanism for predicting when the disk will fail.

History

The first hard drive with a self-diagnostic system was introduced by IBM in 1992, in IBM 9337 disk arrays for AS/400 servers built on IBM 0662 SCSI-2 disks. The technology was called Predictive Failure Analysis (PFA). It measured several key parameters, which were evaluated directly by the disk controller. The result was limited to a single bit: either everything is in order, or the disk may soon fail. Later, Compaq, Seagate, Quantum and Conner developed another technology called IntelliSafe. It was a common protocol for reporting hard disk status, but each company decided for itself which parameters to measure and which thresholds to apply.

In early 1995, Compaq proposed standardizing the technology. IBM, Seagate, Quantum, Conner and Western Digital (the last of which had no hard disk monitoring system at the time) supported the idea. IntelliSafe was taken as the basis. The jointly developed standard was called SMART. The SMART I standard provided for monitoring of the basic parameters, and monitoring started only on command.

Hitachi took part in the development of SMART II and proposed a method for full self-diagnosis of the drive (the extended self-test); an error logging function also appeared. SMART III added a feature for detecting surface defects and repairing them transparently to the user.

Description

SMART monitors the main characteristics of the drive, each of which receives a rating. Characteristics can be divided into two groups:

  • 1) parameters reflecting the process of natural aging of the hard disk (spindle speed, number of head movements, number of on-off cycles);
  • 2) current drive parameters (height of the heads above the disk surface, number of reassigned sectors, seek time and number of seek errors).

The data is stored in hexadecimal form as the so-called raw value and is then converted into a normalized value that expresses reliability relative to some reference value. Typically the value ranges from 0 to 100 (some attributes range from 0 to 200 or from 0 to 253).

A high value indicates that the parameter has not changed or is deteriorating slowly. A low value indicates a possible failure in the near future.

A value below the minimum at which the manufacturer guarantees failure-free operation of the drive means that the component has failed.
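The comparison of a value with its minimum can be sketched in Python. This is only an illustration, not a vendor algorithm: the function name `attribute_status`, the `margin` warning band and the sample numbers are all assumptions made for the example, and the sketch also treats a value exactly equal to the minimum as failed.

```python
def attribute_status(value: int, threshold: int, margin: int = 10) -> str:
    """Classify one attribute from its normalized value and its threshold.

    `margin` is a hypothetical warning band and is not part of SMART itself.
    """
    if value <= threshold:
        return "failed"   # at or below the manufacturer's minimum (assumption: equality fails too)
    if value - threshold <= margin:
        return "low"      # still passing, but close to the minimum
    return "ok"


# Hypothetical sample data: (attribute name, normalized value, threshold)
samples = [
    ("Reallocated Sectors Count", 100, 36),
    ("Spin-Up Time", 30, 21),
    ("Seek Error Rate", 20, 30),
]

for name, value, threshold in samples:
    print(f"{name}: {attribute_status(value, threshold)}")
```

The warning band is a design choice of the sketch: it gives a program a way to flag an attribute that is approaching its minimum before it actually crosses it.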

SMART technology provides:

  • 1) monitoring of state parameters;
  • 2) surface scanning;
  • 3) surface scanning with automatic replacement of doubtful sectors with reliable ones.

Note that SMART can predict device failures that result from mechanical faults, which account for about 60% of hard drive failures. SMART cannot predict the consequences of a voltage surge or a mechanical shock.

Note also that drives cannot report their condition through SMART on their own; special programs are needed for this. SMART therefore cannot be used without the following two components:

  • 1) software embedded in the drive controller;
  • 2) external software running on the host.

Programs that display the status of SMART attributes work according to the following algorithm:

  • Check whether the drive supports SMART;
  • Send the command that requests the SMART tables;
  • Receive the tables into the application buffer;
  • Decode the table structures, extracting each attribute's number and numerical value;
  • Map the standardized attribute numbers to their names (sometimes depending on the drive type, model or manufacturer, as in the Victoria program);
  • Output the numerical values in an easy-to-read form (for example, converting hexadecimal values to decimal);
  • Extract the attribute flags from the tables (indicators that characterize the purpose of the attribute in the given drive, for example "vital" or "counter");
  • Display the overall status of the device based on all tables, values and flags.
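The decoding steps of this algorithm can be sketched in Python. The sketch rests on assumptions the article does not state: a 512-byte SMART data sector with a 2-byte revision number followed by 30 entries of 12 bytes each (ID, 2-byte flags, value, worst value, 6-byte raw value, 1 reserved byte), and bit 0 of the flags marking a "vital" (pre-failure) attribute. Requesting the sector from a real drive depends on the operating system and is left out, so the hypothetical helper `build_sample_sector` fabricates a buffer purely for demonstration.

```python
import struct

# Assumed layout of the SMART data sector (not taken from this article).
ENTRY_COUNT = 30         # attribute slots per sector
ENTRY_SIZE = 12          # bytes per attribute entry
ENTRY_OFFSET = 2         # entries start after the 2-byte revision number
PRE_FAILURE_FLAG = 0x01  # assumed meaning of flag bit 0 ("vital")

# Small subset of the attribute names listed in this article.
NAMES = {
    0x01: "Raw Read Error Rate",
    0x05: "Reallocated Sectors Count",
    0x09: "Power-On Hours",
    0xC2: "HDA Temperature",
}


def parse_attributes(sector: bytes) -> list[dict]:
    """Decode the attribute entries of a raw SMART data sector."""
    attributes = []
    for slot in range(ENTRY_COUNT):
        start = ENTRY_OFFSET + slot * ENTRY_SIZE
        attr_id, flags, value, worst = struct.unpack_from("<BHBB", sector, start)
        if attr_id == 0:  # unused slot
            continue
        raw = int.from_bytes(sector[start + 5:start + 11], "little")
        attributes.append({
            "id": attr_id,
            "name": NAMES.get(attr_id, f"Unknown attribute {attr_id}"),
            "pre_failure": bool(flags & PRE_FAILURE_FLAG),
            "value": value,
            "worst": worst,
            "raw": raw,
        })
    return attributes


def build_sample_sector() -> bytes:
    """Build a synthetic sector holding two made-up attributes (demo only)."""
    sector = bytearray(512)
    entries = [(0x05, 0x0033, 100, 100, 8), (0xC2, 0x0022, 64, 52, 36)]
    for slot, (attr_id, flags, value, worst, raw) in enumerate(entries):
        start = ENTRY_OFFSET + slot * ENTRY_SIZE
        struct.pack_into("<BHBB", sector, start, attr_id, flags, value, worst)
        sector[start + 5:start + 11] = raw.to_bytes(6, "little")
    return bytes(sector)


for attr in parse_attributes(build_sample_sector()):
    kind = "vital" if attr["pre_failure"] else "advisory"
    print(f'{attr["id"]:3d} {attr["name"]:<26} value={attr["value"]} raw={attr["raw"]} ({kind})')
```

The raw value is kept as an integer, so it prints in decimal by default. The name lookup falls back to a placeholder because names can differ by model and manufacturer.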

SMART attributes

The table of known SMART attributes looks like this:

No. Hex Attribute name
Better if ...
Description
01 01 Raw Read Error Rate
Less
The rate of errors when reading data from the disk that originate in the disk hardware.
For all Seagate, Samsung (F1 and newer) and Fujitsu 2.5" drives, this is the number of internal data corrections made before the data is passed to the interface, so frighteningly large numbers here are no cause for alarm.
02 02 Throughput Performance
More
Overall disk performance. If the attribute value decreases, then it is likely that there are problems with the disk.
03 03 Spin-Up Time
Less
Time taken to spin the platter stack up from rest to operating speed.
It grows as the mechanics wear out (increased bearing friction, etc.) and may also indicate poor-quality power (for example, a voltage drop when the disk starts).
04 04 Start/Stop Count
The total number of spindle start-stop cycles. On drives from some manufacturers (for example, Seagate) it is a counter of power-saving mode activations. The raw value field stores the total number of disk starts/stops.
05 05 Reallocated Sectors Count
Less
The number of sector reassignment operations. When a disk detects a read/write error, it marks the sector as "reassigned" and transfers the data to a dedicated spare area. That is why bad blocks cannot be seen on modern hard drives: they are all hidden in reassigned sectors. This process is called remapping, and a reassigned sector is called a remap. The larger the value, the worse the condition of the disk surface. The raw value field contains the total number of reassigned sectors.
Growth of this attribute may indicate deterioration of the platter surface.
06 06 Read Channel Margin
The margin of the read channel. The purpose of this attribute is not documented. It is not used in modern drives.
07 07 Seek Error Rate
Less
The rate of errors when positioning the magnetic head assembly. The more there are, the worse the condition of the mechanics and/or the surface of the hard disk. The value can also be affected by overheating and external vibration (for example, from adjacent drives in the drive cage).
08 08 Seek Time Performance
More
The average performance of head positioning operations. If the attribute value decreases (positioning slows down), problems with the mechanical part of the actuator are likely.
09 09 Power-on Time Count (Power-On Hours)
Less
The number of hours (or minutes or seconds, depending on the manufacturer) spent powered on. The rated mean time between failures (MTBF) is chosen as its threshold value.
10 0A Spin-Up Retry Count
Less
The number of repeated attempts to spin the platters up to operating speed after a failed first attempt. If the attribute value grows, problems with the mechanical part are likely.
11 0B Recalibration Retries
Less
The number of repeated recalibration requests after a failed first attempt. If the attribute value grows, problems with the mechanical part are likely.
12 0C Device Power Cycle Count
The number of complete on-off cycles of the disk.
13 0D Soft Read Error Rate
Less
The number of read errors caused by software that could not be corrected. These errors are not mechanical in nature; they point only to incorrect partitioning or to problems in how programs or the operating system interact with the disk.
184 B8 End-to-End Error
Less
This attribute, part of HP SMART IV technology, means that after data has passed through the cache, the parity of the data between the host and the hard disk does not match.
187 BB Reported UNC Errors
Less
Errors that could not be recovered using hardware error correction methods.
188 BC Command Timeout
Less
The number of operations aborted because of an HDD timeout. Normally this attribute should be zero; a value far above zero most likely points to serious power problems or oxidized data cable contacts.
190 BE Airflow Temperature (WDC)
Less
The temperature of the air inside the hard drive. For Seagate drives it is calculated as (100 - HDA temperature); for Western Digital drives, as (125 - HDA temperature).
191 BF G-Sense Error Rate
Less
The number of errors resulting from shock loads. The attribute stores the readings of the built-in accelerometer, which records all shocks, jolts, falls and even careless installation of the disk in the computer case.
192 C0 Power-Off Retract Count
Less
The number of power-off cycles or emergency failures (drive power switched on/off).
193 C1 Load / Unload Cycle
Less
The number of times the magnetic head assembly has been moved to the parking zone and back to the working position.
194 C2 HDA temperature
Less
Stores the readings of the built-in thermal sensor for the mechanical part of the disk, the sealed enclosure (HDA, Hard Disk Assembly). The reading is taken from the built-in thermal sensor, which is one of the magnetic heads, usually the bottom one in the enclosure. The current, minimum and maximum temperatures are recorded in the attribute's bit fields. Not all programs that work with SMART parse these fields correctly, so their readings should be treated critically.
195 C3 Hardware ECC Recovered
Less
The number of errors corrected by the disk hardware (reading, positioning, transfer over the external interface). On disks with a SATA interface the value often worsens as the system bus frequency is raised: SATA is very sensitive to overclocking.
196 C4 Reallocation Event Count
Less
The number of reassignment operations. The attribute's “raw value” field stores the total number of attempts to transfer information from the reassigned sectors to the spare area. Both successful and unsuccessful attempts are counted.
197 C5 Current Pending Sector Count
Less
The number of sectors that are candidates for replacement. They have not yet been identified as bad, but reading them differs from reading a stable sector; these are the so-called suspicious or unstable sectors. If a later read of the sector succeeds, it is removed from the list of candidates. If reads fail repeatedly, the drive tries to recover the sector and performs a remapping operation.
Growth of this attribute may indicate physical degradation of the hard disk.
198 C6 Uncorrectable Sector Count
Less
The number of sectors that the disk could not correct by its own means. If the number of errors grows, critical defects of the surface and/or the drive mechanics are likely.
199 C7 UltraDMA CRC Error Count
Less
The number of errors that occur when data is transmitted over the external interface in UltraDMA mode (packet integrity violations, etc.). Growth of this attribute points to a bad (crumpled, twisted) cable and poor contacts. Such errors also appear with PCI bus overclocking, power failures, strong electromagnetic interference, and sometimes because of the driver.
The cause may be a poor-quality cable. To fix it, try a different SATA cable that makes a tight connection to the disk contacts.
200 C8 Write Error Rate / Multi-Zone Error Rate
Less
Shows the total number of errors that occur when writing sectors. It can serve as an indicator of the quality of the surface and of the drive mechanics.
201 C9 Soft Read Error Rate
Less
The frequency of "software" errors when reading data from the disk. This parameter shows how often read operations from the disk surface fail because of software rather than the drive hardware.
202 CA Data Address Mark Errors
Less
The number of Data Address Mark (DAM) errors (vendor-specific).
203 CB Run out cancel
Less
The number of ECC errors.
204 CC Soft ECC correction
Less
The number of ECC errors corrected by software.
205 CD Thermal asperity rate (TAR)
Less
Number of thermal asperity errors.
206 CE Flying Height
The height of the head above the surface of the disk.
207 CF Spin High Current
Less
The current drawn while spinning the disk up.
208 D0 Spin Buzz
The number of buzz routines needed to spin up the drive.
209 D1 Offline Seek Performance
The drive's seek performance during offline operations.
220 DC Disk Shift
Less
The distance by which the platter stack has shifted relative to the spindle, usually as the result of a shock or a fall. The unit of measure is unknown. As the attribute grows, the disk quickly becomes inoperable.
221 DD G-Sense Error Rate
Less
The number of errors caused by external loads and shocks. The attribute stores the readings of the built-in shock sensor.
222 DE Loaded Hours
Time the magnetic head assembly has spent loaded, from leaving the parking area for the working area of the disk until it returns to the parking area.
223 DF Load/Unload Retry Count
The number of repeated attempts to unload/load the magnetic head assembly to/from the parking area after an unsuccessful attempt.
224 E0 Load Friction
Less
The friction force on the magnetic head assembly as it is unloaded from the parking area.
225 E1 Load cycle count
Less
The number of cycles in which the magnetic head assembly is moved to the parking area.
226 E2 Load 'In'-Time
The time it takes the drive to unload the magnetic heads from the parking area onto the working surface of the disk.
227 E3 Torque Amplification Count
Less
The number of attempts to compensate for the torque.
228 E4 Power-Off Retract Cycle
Less
The number of times the magnetic head assembly was parked automatically because the power was cut.
230 E6 GMR Head Amplitude
The amplitude of the "jitter" (the distance of the repetitive movement of the magnetic head assembly).
231 E7 Temperature
Less
The temperature of the hard disk.
240 F0 Head Flying Hours
Total time, in hours, that the head assembly has spent in the operating position.
250 FA Read error retry rate
Less
The number of errors while reading a hard disk.

Where:

  • More - A larger parameter value is better.
  • Less - A smaller parameter value is better.
  • Critical parameter - red line background
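
As a small worked example of the arithmetic in the table, the formulas given for attribute 190 (Airflow Temperature) can be inverted to recover the HDA temperature from the stored value. This is a sketch under assumptions: the function name, the vendor keys and the sample values are made up for illustration, and only the constants 100 and 125 come from the table.

```python
# Constants from the attribute 190 description: value = constant - HDA temperature.
AIRFLOW_OFFSETS = {
    "seagate": 100,
    "wdc": 125,
}


def hda_temperature_from_airflow(value: int, vendor: str) -> int:
    """Invert the attribute 190 formula to get the HDA temperature.

    `vendor` is a hypothetical key ("seagate" or "wdc") chosen for this sketch.
    """
    return AIRFLOW_OFFSETS[vendor.lower()] - value


# Made-up sample values for the two vendors.
print(hda_temperature_from_airflow(64, "seagate"))
print(hda_temperature_from_airflow(89, "WDC"))
```

Both sample calls yield the same temperature, which shows why a program has to know the manufacturer before it can interpret this attribute.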