Limit SATA Speed

Recently we ran into a disk write problem while our service was running.

The dmesg log was as follows:

[08:32:30 2021] NET: Registered protocol family 38
[08:32:30 2021] EXT4-fs (dm-0): warning: mounting fs with errors, running e2fsck is recommended
[08:32:30 2021] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[08:32:30 2021] ata2.00: exception Emask 0x10 SAct 0x2 SErr 0x800000 action 0x6 frozen
[08:32:30 2021] ata2.00: irq_stat 0x08000000, interface fatal error
[08:32:30 2021] ata2: SError: { LinkSeq }
[08:32:30 2021] ata2.00: failed command: READ FPDMA QUEUED
[08:32:30 2021] ata2.00: cmd 60/08:08:78:19:c1/00:00:12:00:00/40 tag 1 ncq dma 4096 in
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:32:30 2021] ata2.00: status: { DRDY }
[08:32:30 2021] ata2: hard resetting link
[08:32:31 2021] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[08:32:31 2021] ata2.00: configured for UDMA/133
[08:32:31 2021] ata2: EH complete
[08:32:31 2021] ata2.00: Enabling discard_zeroes_data
[08:33:08 2021] ata2.00: exception Emask 0x0 SAct 0xc0000 SErr 0x400001 action 0x6 frozen
[08:33:08 2021] ata2: SError: { RecovData Handshk }
[08:33:08 2021] ata2.00: failed command: READ FPDMA QUEUED
[08:33:08 2021] ata2.00: cmd 60/08:90:00:1d:c1/00:00:12:00:00/40 tag 18 ncq dma 4096 in
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[08:33:08 2021] ata2.00: status: { DRDY }
[08:33:08 2021] ata2.00: failed command: WRITE FPDMA QUEUED
[08:33:08 2021] ata2.00: cmd 61/08:98:00:18:c4/00:00:6f:00:00/40 tag 19 ncq dma 4096 out
                         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[08:33:08 2021] ata2.00: status: { DRDY }
[08:33:08 2021] ata2: hard resetting link
[08:33:08 2021] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[08:33:08 2021] ata2.00: configured for UDMA/133
[08:33:08 2021] ata2.00: device reported invalid CHS sector 0
[08:33:08 2021] ata2: EH complete
[08:33:08 2021] ata2.00: Enabling discard_zeroes_data
[08:33:08 2021] ata2.00: exception Emask 0x10 SAct 0xc00000 SErr 0x400100 action 0x6 frozen
[08:33:08 2021] ata2.00: irq_stat 0x08000000, interface fatal error
[08:33:08 2021] ata2: SError: { UnrecovData Handshk }
[08:33:08 2021] ata2.00: failed command: WRITE FPDMA QUEUED
[08:33:08 2021] ata2.00: cmd 61/08:b0:00:18:c4/00:00:6f:00:00/40 tag 22 ncq dma 4096 out
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:33:08 2021] ata2.00: status: { DRDY }
[08:33:08 2021] ata2.00: failed command: READ FPDMA QUEUED
[08:33:08 2021] ata2.00: cmd 60/08:b8:00:1d:c1/00:00:12:00:00/40 tag 23 ncq dma 4096 in
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:33:08 2021] ata2.00: status: { DRDY }
[08:33:08 2021] ata2: hard resetting link
[08:33:08 2021] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[08:33:08 2021] ata2.00: configured for UDMA/133
[08:33:08 2021] ata2: EH complete
[08:33:08 2021] ata2.00: Enabling discard_zeroes_data
[08:33:08 2021] ata2: limiting SATA link speed to 1.5 Gbps
[08:33:08 2021] ata2.00: exception Emask 0x10 SAct 0x3fc SErr 0x400100 action 0x6 frozen
[08:33:08 2021] ata2.00: irq_stat 0x08000000, interface fatal error
[08:33:08 2021] ata2: SError: { UnrecovData Handshk }
[08:33:08 2021] ata2.00: failed command: WRITE FPDMA QUEUED
[08:33:08 2021] ata2.00: cmd 61/30:10:10:18:c4/00:00:6f:00:00/40 tag 2 ncq dma 24576 out
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:33:08 2021] ata2.00: status: { DRDY }
[08:33:08 2021] ata2.00: failed command: WRITE FPDMA QUEUED
[08:33:08 2021] ata2.00: cmd 61/18:18:40:18:c4/00:00:6f:00:00/40 tag 3 ncq dma 12288 out
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:33:08 2021] ata2.00: status: { DRDY }
[08:33:08 2021] ata2.00: failed command: WRITE FPDMA QUEUED
[08:33:08 2021] ata2.00: cmd 61/08:20:60:18:c4/00:00:6f:00:00/40 tag 4 ncq dma 4096 out
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:33:08 2021] ata2.00: status: { DRDY }
[08:33:08 2021] ata2.00: failed command: WRITE FPDMA QUEUED
[08:33:09 2021] ata2.00: cmd 61/08:28:68:18:c4/00:00:6f:00:00/40 tag 5 ncq dma 4096 out
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:33:09 2021] ata2.00: status: { DRDY }
[08:33:09 2021] ata2.00: failed command: WRITE FPDMA QUEUED
[08:33:09 2021] ata2.00: cmd 61/08:30:58:18:c4/00:00:6f:00:00/40 tag 6 ncq dma 4096 out
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:33:09 2021] ata2.00: status: { DRDY }
[08:33:09 2021] ata2.00: failed command: WRITE FPDMA QUEUED
[08:33:09 2021] ata2.00: cmd 61/38:38:70:18:c4/00:00:6f:00:00/40 tag 7 ncq dma 28672 out
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:33:09 2021] ata2.00: status: { DRDY }
[08:33:09 2021] ata2.00: failed command: WRITE FPDMA QUEUED
[08:33:09 2021] ata2.00: cmd 61/08:40:a8:18:c4/00:00:6f:00:00/40 tag 8 ncq dma 4096 out
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:33:09 2021] ata2.00: status: { DRDY }
[08:33:09 2021] ata2.00: failed command: WRITE FPDMA QUEUED
[08:33:09 2021] ata2.00: cmd 61/20:48:b8:18:c4/00:00:6f:00:00/40 tag 9 ncq dma 16384 out
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[08:33:09 2021] ata2.00: status: { DRDY }
[08:33:09 2021] ata2: hard resetting link
[08:33:09 2021] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[08:33:09 2021] ata2.00: configured for UDMA/133
[08:33:09 2021] ata2: EH complete
[08:33:09 2021] ata2.00: Enabling discard_zeroes_data

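A SATA link normally runs at the highest speed both ends support (up to 6.0 Gbps; in the log above the link was negotiated at 3.0 Gbps). While the device was running, a hardware problem occurred and the link could no longer sustain that speed.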

After several failed recovery attempts, each ending in a hard reset of the link, the kernel limited the link speed to 1.5 Gbps.

This sequence is normal when a disk link misbehaves, but it took 39 seconds (from 08:32:30 to 08:33:09), and during that time the disk was blocked and programs could not write data to it.
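To measure such a stall without reading the log by hand, a small helper can summarize the events for one port. This is only a sketch: the function name ata_stall_summary is made up for illustration, and it assumes the "[HH:MM:SS YYYY]" timestamp form shown in the log above, with all events falling on the same day.

```shell
#!/bin/sh
# Summarize libata link resets for one port from dmesg-style input on stdin.
# Hypothetical helper; assumes "[HH:MM:SS YYYY]" timestamps as in the log above.
ata_stall_summary() {
    port="$1"   # port name as printed by the kernel, e.g. ata2
    awk -v port="$port" '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,   p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
        # Track the first and last timestamp of any line mentioning the port.
        index($0, port) {
            ts = substr($1, 2)            # drop the leading "["
            if (first == "") first = ts
            last = ts
        }
        # Count hard resets and remember the most recent negotiated speed.
        index($0, port ": hard resetting link") { resets++ }
        index($0, port ": SATA link up")        { speed = $7 " " $8 }
        END {
            if (first == "") { print "no events for " port; exit }
            printf "%s: %d resets, %d s from first to last event, final link speed %s\n",
                   port, resets, secs(last) - secs(first), speed
        }
    '
}
```

With the log above saved to a file (dmesg.txt is just an example name), `ata_stall_summary ata2 < dmesg.txt` should report 4 resets, 39 s, and a final link speed of 1.5 Gbps.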

It is almost certainly a hardware problem, perhaps caused by dust in the disk connector or by strong vibration. But how can we mitigate it at the system level?

We checked the disk's normal write throughput and found that the lowest link speed (1.5 Gbps) is enough for our workload. So the simple approach is to cap the SATA link at that speed from the start, which reduces the number of renegotiations when the hardware problem occurs.
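As a rough check of "enough", the link rate can be converted to payload bandwidth. SATA uses 8b/10b encoding, so ten bits on the wire carry one byte of data, and a 1.5 Gbps link tops out at about 150 MB/s before protocol overhead. The function name below is made up for illustration, and the figure is an upper bound, not a measured throughput.

```shell
#!/bin/sh
# Upper bound on SATA payload bandwidth in MB/s for a given line rate.
# SATA uses 8b/10b encoding: 10 line bits carry 1 data byte.
# Hypothetical helper; real throughput is lower because of protocol overhead.
sata_payload_mb_per_s() {
    # $1 = line rate in Gbps (1.5, 3.0 or 6.0)
    awk -v gbps="$1" 'BEGIN { printf "%d\n", gbps * 1000 / 10 }'
}

sata_payload_mb_per_s 1.5   # prints 150
```

If the service's sustained write rate stays well below that figure, capping the link at 1.5 Gbps should not become the bottleneck.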

To apply this limit, add the libata.force option to the kernel command line, for example in /etc/default/grub:

GRUB_CMDLINE_LINUX="libata.force=1.5"
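The kernel parameter documentation spells the speed value 1.5Gbps and also allows a port prefix, so libata.force=2:1.5Gbps would limit only ata2. The change takes effect only after the GRUB configuration is regenerated and the machine is rebooted. The sketch below shows one way to confirm the option is active; the function name libata_force_value is made up for illustration, and the GRUB commands in the comments differ between distributions.

```shell
#!/bin/sh
# Print the libata.force value found on a kernel command line, if any.
# $1 = file holding the command line (defaults to /proc/cmdline).
# Hypothetical helper for illustration.
libata_force_value() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^libata\.force=//p'
}

# Typical sequence (needs root; adjust for your distribution):
#   1. edit /etc/default/grub and add libata.force=... to GRUB_CMDLINE_LINUX
#   2. regenerate the config: update-grub (Debian/Ubuntu) or
#      grub2-mkconfig -o /boot/grub2/grub.cfg (RHEL-like systems)
#   3. reboot, then check:
#        libata_force_value
#        dmesg | grep 'SATA link up'
```

After the reboot, the "SATA link up" line for the affected port should report 1.5 Gbps right away, without the earlier sequence of resets.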

#sata #linux #disk