SMART

SMART (Self-Monitoring, Analysis and Reporting Technology) is a technology for assessing the condition of a hard disk using built-in self-diagnostic hardware, as well as a mechanism for predicting when the disk will fail.

History

The first hard drive with a self-diagnostic system was introduced by IBM in 1992, in IBM 9337 disk arrays for AS/400 servers that used IBM 0662 SCSI-2 disks. The technology was called Predictive Failure Analysis (PFA). Several key parameters were measured and evaluated directly by the disk controller. The result was limited to a single bit: either everything was in order, or the disk could fail soon. Later, Compaq, Seagate, Quantum and Conner developed another technology, called IntelliSafe. It used a common protocol for reporting the state of the hard drive, but each company decided independently which parameters to measure and what their thresholds were.

In early 1995, Compaq proposed standardizing the technology. IBM, Seagate, Quantum, Conner and Western Digital (which at that time did not yet have a system for monitoring hard drive parameters) supported the idea. IntelliSafe was taken as the basis. The jointly developed standard was named SMART. SMART I provided monitoring of the main parameters and ran only when commanded.

Hitachi took part in the development of SMART II and proposed a method for a full self-test of the drive (the extended self-test); an error logging function also appeared. SMART III added the ability to detect surface defects and repair them transparently to the user.

Description

SMART monitors the main characteristics of the drive, each of which receives a score. The characteristics fall into two groups:

1) parameters reflecting the natural aging of the hard disk (spindle speed, number of head movements, number of on-off cycles);
2) current parameters of the drive (head height above the disk surface, number of remapped sectors, track seek time and number of seek errors).

Data is stored in hexadecimal form, called the raw value, and then converted to a "value": a number that expresses reliability relative to some reference value. The value usually ranges from 0 to 100 (some attributes range from 0 to 200 or from 0 to 253).

A high value means that the parameter has not changed or is deteriorating slowly. A low value signals a possible malfunction in the near future.

A value below the minimum at which the manufacturer guarantees trouble-free operation of the drive means that the component has failed.
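The relationship between the raw value, the normalized value and the manufacturer's minimum can be sketched as follows. This is only an illustration: the Attribute class, its field names, the assess function and the 10-point warning margin are assumptions made for the example and are not part of SMART itself.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """One SMART attribute as seen by a monitoring tool (illustrative field names)."""
    attr_id: int    # attribute number, e.g. 5 for Reallocated Sectors Count
    value: int      # normalized value; higher means healthier
    threshold: int  # manufacturer-defined minimum for trouble-free operation
    raw: int        # raw counter or measurement; its meaning is vendor-specific


def assess(attr: Attribute) -> str:
    """Classify an attribute by comparing its normalized value to the threshold."""
    if attr.value < attr.threshold:
        return "failed"
    # Hypothetical warning margin; SMART does not define one.
    if attr.value - attr.threshold <= 10:
        return "warning"
    return "ok"


# Raw values are usually shown in hexadecimal; converting is a plain base change.
raw = int("0x00000000002A", 16)
healthy = Attribute(attr_id=5, value=100, threshold=36, raw=raw)
worn = Attribute(attr_id=5, value=30, threshold=36, raw=0x7D0)

print(healthy.raw, assess(healthy))  # 42 ok
print(worn.raw, assess(worn))        # 2000 failed
```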

SMART technology allows:

1) monitoring of state parameters;
2) surface scanning;
3) surface scanning with automatic replacement of doubtful sectors with reliable ones.

Note that SMART can predict device failure caused by mechanical faults, which account for about 60% of hard drive failures. SMART cannot predict the consequences of a voltage surge or a mechanical shock.

Note also that drives cannot report their status via SMART on their own; special programs exist for this. SMART therefore cannot be used without the following two components:

1) software built into the drive controller;
2) external software running on the host.

Programs that display the status of SMART attributes work according to the following algorithm:

Check that the drive supports SMART;
Send the command that queries the SMART tables;
Receive the tables into the application buffer;
Decode the table structures, extracting each attribute's number and numerical value;
Map the standardized attribute numbers to their names (sometimes depending on the drive type, model or manufacturer, as in the Victoria program, for example);
Output the numerical values in a human-readable form (for example, converting hexadecimal values to decimal);
Extract the attribute flags from the tables (the flags characterize the purpose of the attribute in the given drive, for example "vital" or "counter");
Output the overall state of the device based on all tables, values and flags.
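The decoding steps of this algorithm can be sketched in Python. The sketch assumes the commonly used ATA layout: a 512-byte data sector with a 2-byte revision number followed by 30 entries of 12 bytes each (ID, 2-byte flags, value, worst value, 6-byte raw value, one reserved byte), with bit 0 of the flags marking a pre-failure ("vital") attribute. Reading the sector from a real drive needs OS-specific calls and is not shown; the sector here is built by hand, and parse_attributes and NAMES are names invented for the example.

```python
import struct

ENTRY_SIZE = 12    # bytes per attribute entry (assumed layout)
ENTRY_COUNT = 30   # entries per table (assumed layout)
TABLE_OFFSET = 2   # entries follow a 2-byte revision number
ENTRY_FORMAT = "<BHBB6sB"  # id, flags, value, worst, raw (6 bytes), reserved

# Small excerpt of the number-to-name mapping; real tools carry per-vendor tables.
NAMES = {5: "Reallocated Sectors Count", 194: "HDA temperature"}


def parse_attributes(sector: bytes) -> list[dict]:
    """Decode the attribute entries of a 512-byte SMART data sector."""
    if len(sector) != 512:
        raise ValueError("expected a 512-byte sector")
    attrs = []
    for i in range(ENTRY_COUNT):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        attr_id, flags, value, worst, raw, _ = struct.unpack(ENTRY_FORMAT, entry)
        if attr_id == 0:  # unused slot
            continue
        attrs.append({
            "id": attr_id,
            "name": NAMES.get(attr_id, "Unknown attribute"),
            "prefail": bool(flags & 0x01),  # assumed meaning of flag bit 0
            "value": value,
            "worst": worst,
            "raw": int.from_bytes(raw, "little"),  # raw bytes to a decimal number
        })
    return attrs


# Hand-built sample sector with two attributes, standing in for real drive data.
sector = bytearray(512)
sector[0:2] = (16).to_bytes(2, "little")
sector[2:14] = struct.pack(ENTRY_FORMAT, 5, 0x0033, 100, 100, (8).to_bytes(6, "little"), 0)
sector[14:26] = struct.pack(ENTRY_FORMAT, 194, 0x0022, 67, 55, (33).to_bytes(6, "little"), 0)

attrs = parse_attributes(bytes(sector))
for a in attrs:
    print(f"{a['id']:3d} {a['name']:<26} value={a['value']:3d} raw={a['raw']}")
```

Keeping the name lookup in a separate dictionary mirrors the step in which attribute numbers are mapped to names, which may differ between manufacturers.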

SMART Attributes

A table of known SMART attributes looks like this:

No.	Hex	Attribute Name	Description
01	01	Raw Read Error Rate	The rate of hardware-caused errors when reading data from the disk. On all Seagate, Samsung (F1 and newer) and Fujitsu 2.5-inch drives, this is the number of internal data corrections performed before the data is passed to the interface, so alarmingly large numbers here are no cause for concern.
02	02	Throughput Performance	Overall disk performance. A decreasing value most likely indicates a problem with the disk.
03	03	Spin-Up Time	The time needed to spin the platter stack up from rest to operating speed. It grows as the mechanics wear (increased bearing friction, etc.) and can also indicate a poor-quality power supply (for example, a voltage drop when the disk starts).
04	04	Start / Stop Count	The total number of spindle start-stop cycles. On drives from some manufacturers (for example, Seagate) this is a counter of power-saving mode activations. The raw value field stores the total number of disk starts/stops.
05	05	Reallocated Sectors Count	Number of sector reassignments. When the drive detects a read/write error, it marks the sector as "reassigned" and moves the data to a dedicated spare area. This is why bad blocks are not visible on modern hard disks: they are all hidden among the reassigned sectors. The process is called remapping, and a reassigned sector is called a remap. The raw value field contains the total number of reassigned sectors. The higher this count, the worse the condition of the platter surface, and a growing count may indicate that the surface is deteriorating.
06	06	Read Channel Margin	Read channel margin. The purpose of this attribute is not documented, and it is not used in modern drives.
07	07	Seek Error Rate	Rate of errors when positioning the magnetic head assembly. The more errors, the worse the state of the mechanics and/or the platter surface. The value can also be affected by overheating and external vibration (for example, from neighboring disks in the drive cage).
08	08	Seek Time Performance	Average performance of head positioning operations. A decreasing value (slower positioning) most likely indicates problems with the head actuator mechanics.
09	09	Power-on Time Count (Power-On Hours)	The number of hours (or minutes or seconds, depending on the manufacturer) spent in the powered-on state. The mean time between failures (MTBF) is used as its threshold.
10	0A	Spin-Up Retry Count	The number of repeated attempts to spin the platters up to operating speed after a failed first attempt. An increasing value most likely indicates a mechanical fault.
11	0B	Recalibration Retries	The number of repeated recalibration requests after a failed first attempt. An increasing value most likely indicates a mechanical fault.
12	0C	Device Power Cycle Count	The number of complete cycles of turning the disk on and off.
13	0D	Soft Read Error Rate	The number of uncorrectable read errors caused by software. These errors are not mechanical in nature and only point to incorrect partitioning or faulty interaction of programs or the operating system with the disk.
184	B8	End-to-End error	Part of HP's SMART IV technology. It indicates that, after data passed through the cache memory, the data parity between the host and the hard disk did not match.
187	BB	Reported UNC Errors	Errors that could not be recovered by hardware error correction.
188	BC	Command Timeout	The number of operations aborted because of an HDD timeout. The value should normally be zero; a value far above zero most likely points to serious power problems or oxidized data cable contacts.
190	BE	Airflow Temperature (WDC)	Air temperature inside the hard disk housing. On Seagate drives the value is calculated as (100 - HDA temperature); on Western Digital drives, as (125 - HDA temperature).
191	BF	G-sense error rate	The number of errors caused by shock loads. The attribute stores readings from the built-in accelerometer, which registers all impacts, jolts, drops and even careless installation of the disk in the computer case.
192	C0	Power-off retract count	The number of power-off cycles or emergency failures (drive power on/off).
193	C1	Load / Unload Cycle	The number of cycles of moving the magnetic head assembly to the parking zone and back to the working position.
194	C2	HDA temperature	Readings of the built-in thermal sensor for the mechanical part of the disk, the "can" (HDA, Head Disk Assembly). The reading is taken from the built-in thermal sensor, which is one of the magnetic heads, usually the lowest one in the HDA. The bit fields of the attribute record the current, minimum and maximum temperatures. Not all programs that work with SMART parse these fields correctly, so their readings should be treated with caution.
195	C3	Hardware ECC Recovered	The number of errors corrected by the disk hardware (read, positioning and external interface transfer errors). On SATA drives the value often worsens as the system bus frequency rises: SATA is very sensitive to overclocking.
196	C4	Reallocation Event Count	The number of reassignment operations. The raw value field stores the total number of attempts to move data from reassigned sectors to the spare area. Both successful and unsuccessful attempts are counted.
197	C5	Current Pending Sector Count	The number of sectors that are candidates for replacement. They have not yet been marked as bad, but reading them differs from reading a stable sector; these are the so-called suspect or unstable sectors. If a later read of the sector succeeds, it is removed from the candidates. After repeated read errors, the drive tries to recover the sector and remaps it. An increasing value may indicate physical degradation of the hard drive.
198	C6	Uncorrectable Sector Count	The number of sectors that the disk could not correct by its own means. A growing count most likely indicates critical defects of the surface and/or the drive mechanics.
199	C7	UltraDMA CRC Error Count	The number of errors that occur when data is transferred over the external interface in UltraDMA mode (packet integrity violations, etc.). Growth of this attribute points to a bad (crumpled, twisted) cable or poor contacts. Similar errors also appear with PCI bus overclocking, power failures, strong electromagnetic interference and, occasionally, driver faults. A low-quality cable is a likely cause. To fix this, try a SATA cable without latches that fits tightly on the disk's contacts.
200	C8	Write Error Rate / Multi-Zone Error Rate	The total number of errors that occur when writing sectors to the disk. It can serve as an indicator of the quality of the surface and of the drive mechanics.
201	C9	Soft read error rate	The rate of "software" errors when reading data from the disk, that is, read errors caused by software rather than by the drive hardware.
202	CA	Data Address Mark errors	The number of Data Address Mark (DAM) errors, or a vendor-specific value.
203	CB	Running out cancel	Number of ECC errors.
204	CC	Soft ECC correction	Number of ECC errors corrected in software.
205	CD	Thermal asperity rate (TAR)	Number of thermal asperity errors.
206	CE	Flying height	Distance between the head and the disk surface.
207	CF	Spin high current	The current drawn while the disk spins up.
208	D0	Spin buzz	Number of buzz routines to spin up the drive.
209	D1	Offline seek performance	The drive's seek performance during offline operations.
220	DC	Disk Shift	The displacement of the platter stack relative to the spindle, mostly caused by an impact or a fall. The unit of measurement is unknown. As the attribute grows, the disk quickly becomes inoperable.
221	DD	G-Sense Error Rate	The number of errors caused by external loads and shocks. The attribute stores readings from the built-in shock sensor.
222	DE	Loaded Hours	The time the magnetic head assembly spends between being unloaded from the parking area onto the working area of the disk and being loaded back into the parking area.
223	DF	Load / Unload Retry Count	Number of repeated attempts to unload/load the magnetic head assembly to/from the parking area after a failed attempt.
224	E0	Load Friction	The frictional force on the magnetic head assembly when it is unloaded from the parking area.
225	E1	Load Cycle Count	Number of cycles of moving the magnetic head assembly to the parking area.
226	E2	Load 'In'-time	The time the drive takes to unload the magnetic heads from the parking area onto the working surface of the disk.
227	E3	Torque Amplification Count	The number of attempts to compensate for the torque.
228	E4	Power-Off Retract Cycle	The number of automatic parkings of the magnetic head assembly caused by power-off.
230	E6	GMR Head Amplitude	Amplitude of "jitter" (the distance of the repetitive movement of the magnetic head assembly).
231	E7	Temperature	The temperature of the hard drive.
240	F0	Head flying hours	Total time the head assembly has spent in the working position, in hours.
250	FA	Read error retry rate	The number of errors while reading from the hard disk.

Where:

- a higher parameter value is better;
- a lower parameter value is better;
- critical parameters are marked with a red row background.
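As an illustration of how vendor-specific an attribute can be, the formulas given in the table for attribute 190 (Airflow Temperature) can be inverted to recover the temperature from the normalized value. This is a sketch under assumptions: the offsets 100 and 125 come from the table, while the function name, the vendor keys and the use of degrees Celsius are assumed for the example.

```python
# Offsets from the table row for attribute 190; the vendor keys are illustrative.
AIRFLOW_OFFSETS = {"seagate": 100, "wdc": 125}


def airflow_temperature(normalized_value: int, vendor: str) -> int:
    """Recover the temperature, assuming value = offset - temperature (Celsius assumed)."""
    try:
        offset = AIRFLOW_OFFSETS[vendor.lower()]
    except KeyError:
        raise ValueError(f"no known formula for vendor {vendor!r}") from None
    return offset - normalized_value


print(airflow_temperature(62, "Seagate"))  # 38
print(airflow_temperature(87, "WDC"))      # 38
```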
