I run a homelab with a dozen VLANs, 20+ monitored hosts, and a desktop workstation that doubles as my daily driver for development and gaming. Over the past few months I’ve hit four separate USB bugs on Linux that each took hours to diagnose — and they’re all related to how the Linux kernel handles USB devices. Here’s what happened, what I learned, and the fixes.
Bug 1: The Phantom I/O Wait — A USB Card Reader That Corrupts Kernel Accounting
I noticed my system was reporting 40-60% iowait in top and vmstat, with load averages spiking to 4-5 on an 8-core machine. My NVMe drive should be handling anything I throw at it. Something was wrong.
Running iotop showed zero actual disk I/O. Not a single byte being read or written. Yet the kernel insisted that processes were blocked waiting on I/O.
$ vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 4 0 7810420 1361964 3957988 0 0 0 0 8303 23388 17 2 61 21 0
0 4 0 7869484 1361964 3958112 0 0 0 0 12885 18690 13 3 50 35 0
The b column shows 4 processes in uninterruptible sleep (D state). But when I checked:
$ ps -eo pid,state,comm | awk '$2 == "D"'
(nothing)
Zero processes in D state. The kernel’s procs_blocked counter in /proc/stat was stuck at 4, permanently inflating the iowait percentage. Every monitoring tool that reads /proc/stat (top, vmstat, htop, node_exporter feeding Grafana) was showing phantom I/O wait.
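The comparison above can be wrapped in a small check. This is a minimal sketch, not a tool from the original investigation: the function names are made up for illustration, and a single sample can race with real short-lived I/O, so it is worth repeating the check a few times.

```shell
#!/bin/bash
# Sketch: compare the kernel's procs_blocked counter with the number of
# tasks actually in D state. All function names here are illustrative.

# Print the procs_blocked value from a /proc/stat-format file.
procs_blocked() {
    awk '$1 == "procs_blocked" { print $2 }' "${1:-/proc/stat}"
}

# Count D-state entries in `ps -eo pid,state,comm` output read from stdin.
count_d_state() {
    awk '$2 == "D" { n++ } END { print n + 0 }'
}

# Print how many "blocked" tasks have no matching D-state entry (never negative).
phantom_count() {
    local blocked=$1 dstate=$2
    local diff=$(( blocked - dstate ))
    (( diff < 0 )) && diff=0
    echo "$diff"
}

# Usage on a live system:
#   phantom_count "$(procs_blocked)" "$(ps -eo pid,state,comm | count_d_state)"
```

A result that stays above zero across several runs points at a leaked counter rather than real blocked I/O. Since the kernel counts tasks, feeding the check with `ps -eLo pid,state,comm` (which lists threads) may be the safer input.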
The Culprit
Initially I blamed a Genesys Logic USB card reader (05e3:0751) connected through a VIA Labs USB hub (a KVM-style switch with two upstream ports and a button to toggle between hosts). But after extensive testing, the leak turned out to have two sources: the USB card reader AND four empty AHCI/SATA ports (ata3-6) on the motherboard. Both go through the same boot-time async SCSI scan path, and both leak the nr_iowait counter.
During boot, each synchronous SCSI probe command goes through blk_execute_rq() → io_schedule_timeout(), which sets current->in_iowait = 1 and increments the kernel’s nr_iowait counter. The matching decrement is missed during the boot-time async scan — leaking one count per command. The counter is stuck until reboot.
Critically, unbinding and rebinding the USB card reader after boot triggers the exact same code path but does not leak — proving the bug is specific to the early boot async scanning environment, not the probe logic itself.
What I Tried
- Kernel upgrade from 6.8 to 6.17 (Ubuntu HWE): leaked count dropped from 4-6 to 4, a partial improvement
- Booting with an SD card inserted: still leaked
- Booting without the USB hub: clean b=0, but the counter leaked as soon as I plugged the hub back in
- usb-storage.quirks=05e3:0751:i: card reader properly ignored (confirmed in dmesg), but procs_blocked still 4, because the empty AHCI ports also leak
- scsi_mod.scan=sync: no effect
- usb-storage.quirks=05e3:0751:u (ignore UAS): no effect, the device doesn’t support UAS anyway
- usb-storage.delay_use=10: no effect, the delay just postpones the same buggy probe path
- libata.force=3:disable,4:disable,5:disable,6:disable: dropped procs_blocked from 4 to 2, confirming AHCI ports as a leak source
Update: Root Cause Identified via ftrace
Using ftrace with stack traces, I traced the exact code path that leaks the counter. Every SCSI probe command goes through:
io_schedule_timeout
← __wait_for_common
← wait_for_completion_io_timeout
← blk_execute_rq
← scsi_execute_cmd
← scsi_probe_lun
← do_scan_async
← async_run_entry_fn
The io_schedule_timeout() call sets current->in_iowait = 1 and increments the kernel’s nr_iowait counter. During normal operation, the matching decrement happens when the I/O completes. But during the boot-time async SCSI scan, the decrement is missed, leaking one count per probe command.
The critical evidence: unbinding and rebinding the USB card reader after boot triggers the exact same code path but does NOT leak the counter. This proves the bug is specific to the boot-time async scanning environment, not the probe logic itself.
Update: Two leak sources identified. Further testing revealed that ignoring the card reader (quirks=:i) still left procs_blocked=4 — the four empty AHCI/SATA ports on the motherboard also leak through the same async scan path. Adding libata.force=3:disable,4:disable,5:disable,6:disable dropped the count from 4 to 2, confirming two independent sources. Both the USB and AHCI SCSI probe paths share the same buggy boot-time io_schedule_timeout accounting.
I’ve filed this as Ubuntu bug #2146707 with full ftrace data and a suggested diagnostic kernel patch.
The Fix
There isn’t a clean one yet. The procs_blocked counter leak is a kernel bug that persists from at least 6.8 through 6.17. Your options are:
Option 1: Reduce the leak with kernel parameters:
# Edit /etc/default/grub, add to GRUB_CMDLINE_LINUX_DEFAULT:
# usb-storage.quirks=05e3:0751:i (ignores the card reader)
# libata.force=3:disable,4:disable,... (reduces AHCI leak)
# Then regenerate GRUB:
sudo update-grub
This won’t fully eliminate the leak (in my testing it dropped from 4 to 2), but it helps. Disabling unused SATA ports in BIOS may also help.
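After rebooting, it is worth confirming that the parameters actually reached the kernel before judging whether they helped. A minimal sketch, assuming the parameter is spelled exactly as it was written in GRUB; the helper name has_boot_param is hypothetical:

```shell
#!/bin/bash
# Sketch: check whether an exact parameter appears on the kernel command line.
# has_boot_param is an illustrative helper, not an existing tool.
has_boot_param() {
    local param=$1 cmdline_file=${2:-/proc/cmdline}
    local -a words
    local word
    # Split the single command-line string into words; tolerate a missing newline.
    read -r -a words < "$cmdline_file" || true
    for word in "${words[@]}"; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Usage after reboot:
#   has_boot_param usb-storage.quirks=05e3:0751:i && echo "quirk is active"
#   grep procs_blocked /proc/stat
```

The match is deliberately exact, so a truncated or mistyped quirk string shows up as missing instead of passing silently.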
Option 2: Accept the cosmetic iowait — iotop is your source of truth for actual I/O activity. The phantom counter doesn’t affect real performance.
Bonus: Re-enabling the Card Reader On Demand
Since the warm reprobe (after boot) doesn’t leak the counter, you can safely enable the card reader when you need it. The usb-storage.quirks parameter is writable at runtime:
#!/bin/bash
# card-reader.sh: toggle the Genesys Logic card reader on/off.
# Boot uses quirks=:i to prevent the phantom iowait bug.
# Warm reprobe after boot is safe; it does NOT leak the counter.
VENDOR="05e3"
PRODUCT="0751"
QUIRKS=/sys/module/usb_storage/parameters/quirks
DRIVER=/sys/bus/usb/drivers/usb-storage

# Print the sysfs name of the first USB device matching VENDOR:PRODUCT.
find_device() {
    for dev in /sys/bus/usb/devices/*; do
        [ "$(cat "$dev/idVendor" 2>/dev/null)" = "$VENDOR" ] &&
        [ "$(cat "$dev/idProduct" 2>/dev/null)" = "$PRODUCT" ] &&
        basename "$dev" && return 0
    done
    return 1
}

case "$1" in
    on)
        DEV=$(find_device) || { echo "Card reader not found" >&2; exit 1; }
        # Clear the ignore quirk (this replaces the whole runtime quirks string)
        echo "${VENDOR}:${PRODUCT}:" | sudo tee "$QUIRKS"
        # Rebind the driver to the reader's first interface
        echo "${DEV}:1.0" | sudo tee "$DRIVER/bind"
        echo "Card reader enabled"
        ;;
    off)
        DEV=$(find_device) || { echo "Card reader not found" >&2; exit 1; }
        # Detach the driver, then restore the ignore quirk
        echo "${DEV}:1.0" | sudo tee "$DRIVER/unbind"
        echo "${VENDOR}:${PRODUCT}:i" | sudo tee "$QUIRKS"
        echo "Card reader disabled"
        ;;
    *)
        echo "Usage: $0 {on|off}" >&2
        exit 1
        ;;
esac
How to Reproduce the ftrace Diagnosis
If you’re seeing phantom iowait and suspect a USB mass storage device, here’s how to trace it. As root:
# 1. Check for the phantom counter
grep procs_blocked /proc/stat # nonzero?
ps -eo pid,state,comm | awk '$2 == "D"' # no D-state processes?
# 2. Set up ftrace
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo > trace
echo function > current_tracer
echo io_schedule >> set_ftrace_filter
echo io_schedule_timeout >> set_ftrace_filter
echo blk_execute_rq >> set_ftrace_filter
echo scsi_execute_cmd >> set_ftrace_filter
echo 1 > options/func_stack_trace
echo 1 > tracing_on
# 3. Trigger a reprobe of the suspect device
echo <interface> > /sys/bus/usb/drivers/usb-storage/unbind
sleep 3
echo <interface> > /sys/bus/usb/drivers/usb-storage/bind
sleep 5
# 4. Stop and check
echo 0 > tracing_on
grep procs_blocked /proc/stat # did it increase?
cat trace > /tmp/scsi_trace.txt # save for analysis
If procs_blocked increases after the reprobe, the trace will show you exactly which SCSI commands are leaking the counter. If it stays the same (as it did for me), that narrows the bug to the boot-time async scan path.
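The `<interface>` placeholder in step 3 is the name of a USB interface bound to usb-storage, which appears as an entry in the driver's sysfs directory. A small sketch for listing candidates, assuming the usual sysfs layout; the function name bound_interfaces is illustrative:

```shell
#!/bin/bash
# Sketch: list USB interfaces currently bound to a driver.
# Entries look like "2-3.4:1.0"; bound_interfaces is an illustrative name.
bound_interfaces() {
    local driver_dir=${1:-/sys/bus/usb/drivers/usb-storage}
    local entry name
    for entry in "$driver_dir"/*; do
        name=${entry##*/}
        # Interface names contain a colon; control files such as bind/unbind do not.
        case "$name" in
            *:*) echo "$name" ;;
        esac
    done
}

# Usage:
#   bound_interfaces                                  # usb-storage interfaces
#   bound_interfaces /sys/bus/usb/drivers/uvcvideo    # another driver
```

If several interfaces are listed, the idVendor and idProduct files of the parent device (the part of the name before the colon) tell you which one belongs to the suspect hardware.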
The lesson: If your Linux system shows high iowait but iotop shows zero disk activity, check for USB mass storage devices. The SCSI probe path can permanently corrupt the kernel’s I/O accounting during boot.
Bug 2: The Camera That Kills All USB — Razer Kiyo Pro xHCI Death Spiral
My Razer Kiyo Pro webcam (1532:0e05, firmware 8.21) has a much more dramatic failure mode: it can crash the entire xHCI host controller, disconnecting every USB device on the bus — keyboard, mouse, audio interface, everything. The only recovery is a hard reboot.
The failure has two independent triggers, and the kernel’s built-in error recovery makes both of them worse.
Trigger 1: Link Power Management Resume Failure
The Kiyo Pro fails to reinitialize after USB Link Power Management (LPM) transitions. When the kernel tries to wake the camera from a low-power state, it produces EPIPE (-32) errors on UVC SET_CUR operations. The stalled endpoint triggers an xHCI stop-endpoint timeout, and the kernel declares the entire controller dead.
Trigger 2: Rapid Control Transfer Overflow
Approximately 25 rapid consecutive UVC SET_CUR operations overwhelm the camera’s firmware. The standard UVC error-recovery query (GET_CUR after EPIPE) amplifies the failure by sending a second transfer to the already-stalling device. The kernel detects the fault, resets the controller, the reset triggers another fault, and the system enters a death spiral.
Testing confirmed that disabling LPM alone is insufficient — a stress test with NO_LPM active still caused delayed controller death 13 minutes later via TRB warning escalation.
Update: Fixed in Kernel 6.17
After installing the HWE (Hardware Enablement) kernel on Ubuntu 24.04 LTS (sudo apt install linux-generic-hwe-24.04), which brought me to 6.17.0-19-generic, I retested with dynamic debug enabled for xhci_hcd and usbcore, and without any patches or workarounds applied — no udev quirks, no LPM disable, no control throttle. Completely stock kernel.
The stress test now passes 50/50 rounds with 0ms delay. On 6.8, the same test crashed consistently around round 25. The xHCI error handling improvements between 6.8 and 6.17 prevent the cascade to hc_died().
The kernel logs show two things happening under the hood:
# The EPIPE probe error still occurs at device init — but no longer cascades
uvcvideo 2-3.4:1.1: Failed to set UVC probe control : -32 (exp. 26).
# The kernel dynamically disables U1 LPM when the device responds slowly
usb 2-3.4: Hub-initiated U1 disabled due to long timeout 16800us
The kernel is now doing at runtime what my USB_QUIRK_NO_LPM patch was trying to do statically: detecting that the device can’t handle LPM transitions and disabling them automatically. Combined with the improved xHCI error recovery, the death spiral no longer occurs.
I’ve reported these findings to the kernel mailing list. The patch series may no longer be needed for current kernels, though the firmware bug itself (version 1.5.0.1, the latest from Razer) remains unfixed.
Workarounds for Older Kernels (6.8 and Earlier)
If you’re stuck on an older kernel, you’ll need multiple layers of defense:
- Kernel module quirk: snd-usb-audio quirk flags to skip sample rate readback and defer audio interface setup: options snd-usb-audio quirk_flags=0x4000001
- Kernel patches: Three patches submitted to the kernel mailing list:
  - USB_QUIRK_NO_LPM for device 1532:0e05
  - A new UVC_QUIRK_CTRL_THROTTLE that rate-limits SET_CUR transfers (50ms minimum interval) and skips error-code queries after EPIPE
  - Razer Kiyo Pro device entry combining throttle, disable-autosuspend, and no-reset-resume quirks
- Udev rule: Sets avoid_reset_quirk=1 to prevent USB errors from cascading to the xHCI controller
- Userspace watchdog: A systemd service that monitors kernel logs for xHCI fatal errors and performs escalating recovery: rebind the port, then reset the controller, then reload the driver. If all three fail, it stops. No retry loops, no death spirals.
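The detection half of such a watchdog can be sketched in a few lines. This is not the watchdog from the repository mentioned at the end of the post: the log patterns are an assumption based on messages xhci_hcd and usbcore are known to print, and the step names are placeholders for the three recovery stages described above.

```shell
#!/bin/bash
# Sketch of the detection and escalation logic only; the actual recovery
# actions are left out. Patterns below are assumptions, adjust to your logs.
XHCI_FATAL_PATTERN='xHCI host controller not responding, assume dead|HC died; cleaning up'

# Count fatal xHCI messages in kernel log text read from stdin.
count_xhci_fatal() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    grep -E -c "$XHCI_FATAL_PATTERN" || true
}

# Map an attempt number to a recovery stage; anything past three gives up.
recovery_step() {
    case "$1" in
        1) echo "rebind-port" ;;
        2) echo "reset-controller" ;;
        3) echo "reload-driver" ;;
        *) echo "give-up" ;;
    esac
}

# Usage:
#   journalctl -k -b --no-pager | count_xhci_fatal
```

Keeping the escalation as a fixed three-step table, with no loop back to step one, is what prevents the watchdog from becoming its own death spiral.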
Quick Runtime Workaround (Older Kernels)
If you have a Razer Kiyo Pro on kernel 6.8 or earlier and just want it to stop crashing your USB bus:
# Disable LPM for the Kiyo at runtime
echo "1532:0e05:k" | sudo tee /sys/module/usbcore/parameters/quirks
# Make it permanent
echo 'options usbcore quirks=1532:0e05:k' | sudo tee /etc/modprobe.d/razer-kiyo-usb.conf
sudo update-initramfs -u
This only addresses the LPM trigger. Full protection on older kernels requires the CTRL_THROTTLE kernel patch. The best fix is upgrading to kernel 6.17 or later.
Bug 3: The Initial Discovery — Tracker Meets a Slow USB Drive
The investigation that led me down this rabbit hole started with a much simpler problem. I plugged in an old 128GB LITEON SATA SSD in a USB enclosure (Realtek RTL9201 bridge) to grab some files. Within minutes, my system ground to a halt — real iowait this time, 50-60%, with the NVMe feeling sluggish.
GNOME’s tracker-miner-fs-3 had automatically started indexing the drive. The old SSD contained a full Ubuntu installation — hundreds of thousands of files under /usr, /lib, /snap. Tracker was hammering it with random reads, and the slow SATA-over-USB interface was bottlenecking everything.
Killing tracker and unmounting the drive should have fixed it. But when I ran udisksctl unmount, the USB enclosure physically disconnected mid-cache-sync:
sd 6:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
This left 6 SCSI error handler threads stuck in uninterruptible sleep, and that’s when the phantom procs_blocked counter leak began. The threads eventually recovered, but the kernel’s accounting never decremented. This is the same class of bug as the card reader issue, just triggered through a different path (disconnect during cache flush vs. probe of removable media).
Bug 4: The Webcam Mic That Vanishes — Logitech StreamCam PipeWire Conflict
The Logitech StreamCam (046d:0893) is a USB 3.0 webcam that exposes both UVC video and UAC audio interfaces. On Ubuntu with Wayland and PipeWire, it mostly works — until you try to use it for streaming.
The Problem
After switching from X11 to Wayland with PipeWire (for screen capture in OBS), the StreamCam mic worked fine in veadotube. But the moment I added a PipeWire screen capture source in OBS, the mic stopped working in both OBS and veadotube. WirePlumber logged:
Failed to call Lookup: GDBus.Error:org.freedesktop.portal.Error.NotFound: No entry for camera
The issue is a conflict between PipeWire’s camera portal and the audio routing. When OBS requests screen capture through the PipeWire portal, WirePlumber’s session management interferes with the existing audio node connections to the StreamCam. The mic’s PipeWire node gets orphaned: it still exists, but no application can connect to it.
Additional Issues
The StreamCam also exposed two other quirks:
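- USB reset permission denied: usbreset "Logitech StreamCam" fails as a normal user. The fallback is toggling the device’s sysfs authorized flag as root to force a re-enumeration: echo 0 > /sys/bus/usb/devices/<device>/authorized && echo 1 > /sys/bus/usb/devices/<device>/authorized, where <device> is the camera’s sysfs name. Target that one device instead of a * glob: in bash a redirect to a glob that matches several files fails as ambiguous, and deauthorizing every device would also drop the keyboard and mouse.
- Wrong V4L2 node in OBS: The StreamCam exposes two video nodes: /dev/video0 (main capture) and /dev/video1 (metadata). OBS can pick the wrong one, resulting in no video. Always use /dev/video0.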
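The authorized-flag fallback needs the camera's sysfs path, which can be looked up by vendor and product ID. A minimal sketch, assuming the standard /sys/bus/usb/devices layout; find_usb_device and reauthorize are illustrative names, and reauthorize needs sudo rights:

```shell
#!/bin/bash
# Sketch: locate a USB device by vendor:product ID, then toggle its
# authorized flag. Both function names are illustrative.

# Print the sysfs path of the first device matching the given IDs.
find_usb_device() {
    local vendor=$1 product=$2 root=${3:-/sys/bus/usb/devices}
    local dev
    for dev in "$root"/*; do
        # Interface entries (e.g. 2-3:1.0) have no idVendor file; skip them.
        [ -r "$dev/idVendor" ] || continue
        if [ "$(cat "$dev/idVendor")" = "$vendor" ] &&
           [ "$(cat "$dev/idProduct" 2>/dev/null)" = "$product" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}

# Deauthorize and reauthorize one device to force a re-enumeration.
reauthorize() {
    local dev=$1
    echo 0 | sudo tee "$dev/authorized" > /dev/null
    sleep 1
    echo 1 | sudo tee "$dev/authorized" > /dev/null
}

# Usage for the StreamCam:
#   dev=$(find_usb_device 046d 0893) && reauthorize "$dev"
```

Piping through sudo tee avoids the common pitfall where `sudo echo 0 > file` still performs the redirect as the unprivileged user.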
The Fix
Restart the entire PipeWire stack to clear stale node connections:
# Restart PipeWire stack (fixes mic routing)
systemctl --user restart pipewire pipewire-pulse wireplumber
# If that doesn't work, reset the USB device first
pkexec usbreset "Logitech StreamCam"
sleep 1
systemctl --user restart pipewire pipewire-pulse wireplumber
I wrote a fix-mic.sh script that automates this: it tests if the mic is working, and if not, resets the USB device and restarts PipeWire. It runs automatically as part of my streaming startup script.
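The general shape of such a script might look like the sketch below. This is not the actual fix-mic.sh: it only checks that a capture source is listed, which is weaker than testing that the mic works, and it assumes the source name contains "StreamCam" and that pactl is available through pipewire-pulse.

```shell
#!/bin/bash
# Sketch only. mic_listed and fix_mic are illustrative names, and the
# "StreamCam" name match is an assumption about how the source is labelled.

# Return 0 if `pactl list short sources` output on stdin contains a
# non-monitor source whose name includes the given pattern.
mic_listed() {
    local pattern=$1
    awk -v pat="$pattern" '
        index($2, pat) && $2 !~ /\.monitor$/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Restart the PipeWire stack only when the mic source is missing.
fix_mic() {
    if pactl list short sources | mic_listed "StreamCam"; then
        echo "mic present"
        return 0
    fi
    systemctl --user restart pipewire pipewire-pulse wireplumber
}

# Usage:
#   fix_mic
```

Because an orphaned node can still be listed, a stricter check would record a short sample and confirm it is not silent before deciding that no restart is needed.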
The Common Thread
All four bugs live in the space where USB hardware meets the Linux kernel’s subsystems (SCSI, xHCI, UVC, UAC) and, in the last case, the userspace PipeWire stack on top of them. Each layer has its own error handling and recovery logic, and when a USB device misbehaves, these layers can interact in unexpected ways.
The specific issues:
- SCSI probe accounting — The scheduler’s iowait counter can be permanently corrupted during boot-time async SCSI scanning — affects both USB mass storage and AHCI/SATA devices (bug #2146707)
- xHCI error recovery — The kernel’s “fix it by resetting everything” approach can create death spirals with devices that have buggy firmware
- USB disconnect handling — Cache sync operations on devices that physically disappear leave orphaned kernel accounting state
- PipeWire session conflicts — USB devices that expose multiple interfaces (video + audio) can have their audio routing broken when PipeWire’s portal system manages video capture separately
Recommendations
If you’re running Linux with USB peripherals (especially on a workstation or homelab):
- Install iotop: It reads per-process I/O accounting, which is more accurate than /proc/stat. When top shows high iowait but iotop shows zero, you know it’s a phantom counter.
- Check usb-storage quirks: If you have USB card readers or enclosures causing issues, the usb-storage.quirks boot parameter is your friend.
- Disable LPM for problem cameras (older kernels): On kernels before 6.17, many USB cameras have firmware bugs exposed by Linux’s aggressive power management. The usbcore quirks parameter can disable LPM per-device. On 6.17+, the kernel handles this automatically.
- Keep your kernel updated: Kernel 6.17 fixed the xHCI death spiral that affected the Razer Kiyo Pro and likely other devices with buggy firmware. The xHCI error handling no longer escalates to killing the entire controller, and LPM is dynamically disabled for slow devices. On Ubuntu 24.04 LTS, you need to install the HWE (Hardware Enablement) kernel to get 6.17: sudo apt install linux-generic-hwe-24.04. If you’re on the stock 6.8 LTS kernel, upgrading is the single best fix.
- Use udisksctl power-off before physically unplugging USB drives: this prevents the cache sync race condition that can leak kernel accounting state.
- Restart PipeWire after USB webcam issues: If your webcam’s mic stops working (especially after adding screen capture sources in OBS), run systemctl --user restart pipewire pipewire-pulse wireplumber. PipeWire’s portal system can orphan audio nodes when managing video and audio from the same USB device.
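The power-off recommendation can be turned into a small removal routine. A sketch under stated assumptions: run and safe_remove are illustrative helpers, the device paths are examples, and DRY_RUN=1 only prints the commands so the sequence can be reviewed first.

```shell
#!/bin/bash
# Sketch: flush, unmount, and power off a USB drive before unplugging it.
# run and safe_remove are illustrative names.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: the mounted partition, then the whole-disk device.
safe_remove() {
    local partition=$1 disk=$2
    run sync
    # Stop here if the unmount fails, so a busy drive is never powered off.
    run udisksctl unmount -b "$partition" || return 1
    run udisksctl power-off -b "$disk"
}

# Usage:
#   DRY_RUN=1 safe_remove /dev/sdc1 /dev/sdc   # preview
#   safe_remove /dev/sdc1 /dev/sdc             # then unplug the drive
```

Only unplug once power-off has returned successfully. A drive with several mounted partitions needs each one unmounted before the final step.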
The Linux USB subsystem handles thousands of device combinations, and most of them work flawlessly. But when they don’t, the failure modes can be subtle and deeply confusing — phantom I/O wait, cascading controller resets, kernel counters that lie to every monitoring tool on your system, and audio routing that silently breaks. Understanding the layers underneath USB devices — SCSI, xHCI, UVC, PipeWire — is the key to diagnosing these issues.
All of the patches and scripts mentioned here are available on GitHub: kiyo-xhci-fix (Razer Kiyo Pro patches and watchdog) and streamcam-fixes (Logitech StreamCam mic fix).
