The USB Bug Trilogy: Phantom I/O, Camera Crashes, and Kernel Accounting Lies

I run a homelab with a dozen VLANs, 20+ monitored hosts, and a desktop workstation that doubles as my daily driver for development and gaming. Over the past few months I’ve hit four separate USB bugs on Linux that each took hours to diagnose — and they’re all related to how the Linux kernel handles USB devices. Here’s what happened, what I learned, and the fixes.

Bug 1: The Phantom I/O Wait — A USB Card Reader That Corrupts Kernel Accounting

I noticed my system was reporting 40-60% iowait in top and vmstat, with load averages spiking to 4-5 on an 8-core machine. My NVMe drive should be handling anything I throw at it. Something was wrong.

Running iotop showed zero actual disk I/O. Not a single byte being read or written. Yet the kernel insisted that processes were blocked waiting on I/O.

$ vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  4      0 7810420 1361964 3957988    0    0     0     0  8303 23388 17  2 61 21  0
 0  4      0 7869484 1361964 3958112    0    0     0     0 12885 18690 13  3 50 35  0

The b column shows 4 processes in uninterruptible sleep (D state). But when I checked:

$ ps -eo pid,state,comm | awk '$2 == "D"'
(nothing)

Zero processes in D state. The kernel’s procs_blocked counter in /proc/stat was stuck at 4, permanently inflating the iowait percentage. Every monitoring tool that reads /proc/stat (top, vmstat, htop, Grafana node_exporter) was showing phantom I/O wait.
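The two checks above can be folded into one comparison. This is a minimal sketch, not an existing tool: the function names are my own, and it assumes a procps-style ps that accepts `-eo state=`. It prints how many blocked tasks the kernel claims beyond what ps can actually see.

```shell
# Hypothetical helper: compare the kernel's procs_blocked counter with the
# number of tasks that ps really reports in D state.

# Print the procs_blocked value from a /proc/stat-format file.
procs_blocked() {
    awk '$1 == "procs_blocked" { print $2 }' "${1:-/proc/stat}"
}

# Count lines of `ps -eo state=` output (read from stdin) that start with D.
count_d_state() {
    awk '$1 ~ /^D/ { n++ } END { print n + 0 }'
}

# Print claimed-minus-actual; a value that stays above 0 suggests a leak.
phantom_blocked() {
    claimed=$(procs_blocked "${1:-/proc/stat}")
    actual=$(ps -eo state= | count_d_state)
    echo $(( claimed - actual ))
}
```

Calling phantom_blocked a few times, a second or so apart, helps rule out a real D-state task that happened to be sampled between the two reads. A difference that never returns to 0 matches the behavior described here.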

The Culprit

Initially I blamed a Genesys Logic USB card reader (05e3:0751) connected through a VIA Labs USB hub (a KVM-style switch with two upstream ports and a button to toggle between hosts). But after extensive testing, the leak turned out to have two sources: the USB card reader AND four empty AHCI/SATA ports (ata3-6) on the motherboard. Both go through the same boot-time async SCSI scan path, and both leak the nr_iowait counter.

During boot, each synchronous SCSI probe command goes through blk_execute_rq() → io_schedule_timeout(), which sets current->in_iowait = 1 and increments the kernel’s nr_iowait counter. The matching decrement is missed during the boot-time async scan, leaking one count per command. The counter stays stuck until reboot.

Critically, unbinding and rebinding the USB card reader after boot triggers the exact same code path but does not leak — proving the bug is specific to the early boot async scanning environment, not the probe logic itself.

What I Tried

  • Kernel upgrade from 6.8 to 6.17 (Ubuntu HWE) — leaked count dropped from 4-6 to 4, partial improvement
  • Booting with an SD card inserted — still leaked
  • Booting without the USB hub — clean b=0, but the counter leaked as soon as I plugged the hub back in
  • usb-storage.quirks=05e3:0751:i — card reader properly ignored (confirmed in dmesg), but procs_blocked still 4 — the AHCI empty ports also leak
  • scsi_mod.scan=sync — no effect
  • usb-storage.quirks=05e3:0751:u (ignore UAS) — no effect, device doesn’t support UAS anyway
  • usb-storage.delay_use=10 — no effect, the delay just postpones the same buggy probe path
  • libata.force=3:disable,4:disable,5:disable,6:disable — dropped procs_blocked from 4 to 2, confirming AHCI ports as a leak source
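Each of these attempts only means something if the parameter actually reached the kernel. A small check against /proc/cmdline can confirm that before you read too much into a result. This is a sketch with a made-up function name. It treats dashes and underscores in the parameter name as equivalent, which is how the kernel matches parameter names.

```shell
# Hypothetical helper: succeed if a boot parameter is on the kernel command line.
# Usage: cmdline_has PARAM [CMDLINE_FILE]
# Matches "param" or "param=value"; '-' and '_' in the name are interchangeable.
cmdline_has() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" |
        awk -v p="$1" '
            BEGIN { want = p; gsub(/-/, "_", want) }
            { name = $0; sub(/[=].*/, "", name); gsub(/-/, "_", name) }
            name == want { found = 1 }
            END { exit !found }'
}
```

For example, `cmdline_has usb-storage.quirks && echo "quirk is active"` after a reboot shows whether the GRUB edit took effect.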

Update: Root Cause Identified via ftrace

Using ftrace with stack traces, I traced the exact code path that leaks the counter. Every SCSI probe command goes through:

io_schedule_timeout
 ← __wait_for_common
  ← wait_for_completion_io_timeout
   ← blk_execute_rq
    ← scsi_execute_cmd
     ← scsi_probe_lun
      ← do_scan_async
       ← async_run_entry_fn

The io_schedule_timeout() call sets current->in_iowait = 1 and increments the kernel’s nr_iowait counter. During normal operation, the matching decrement happens when the IO completes. But during the boot-time async SCSI scan, the decrement is missed — leaking one count per probe command.

The critical evidence: unbinding and rebinding the USB card reader after boot triggers the exact same code path but does NOT leak the counter. This proves the bug is specific to the boot-time async scanning environment, not the probe logic itself.

Update: Two leak sources identified. Further testing revealed that ignoring the card reader (quirks=:i) still left procs_blocked=4 — the four empty AHCI/SATA ports on the motherboard also leak through the same async scan path. Adding libata.force=3:disable,4:disable,5:disable,6:disable dropped the count from 4 to 2, confirming two independent sources. Both the USB and AHCI SCSI probe paths share the same buggy boot-time io_schedule_timeout accounting.

I’ve filed this as Ubuntu bug #2146707 with full ftrace data and a suggested diagnostic kernel patch.

The Fix

There isn’t a clean one yet. The procs_blocked counter leak is a kernel bug that persists from at least 6.8 through 6.17. Your options are:

Option 1: Reduce the leak with kernel parameters:

# Edit /etc/default/grub, add to GRUB_CMDLINE_LINUX_DEFAULT:
#   usb-storage.quirks=05e3:0751:i        (ignores the card reader)
#   libata.force=3:disable,4:disable,...   (reduces AHCI leak)
# Then regenerate GRUB:
sudo update-grub

This won’t fully eliminate the leak (in my testing it dropped from 4 to 2), but it helps. Disabling unused SATA ports in BIOS may also help.
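The libata.force value is just a comma-separated list of port:disable pairs, so it can be generated from the list of empty ports rather than typed by hand. A minimal sketch (the function name is my own; the port numbers are the ataN numbers shown in dmesg):

```shell
# Build a libata.force= value that disables each listed ATA port.
# Usage: libata_force_disable 3 4 5 6
libata_force_disable() {
    out=""
    for port in "$@"; do
        # Prefix a comma only when something is already in the list.
        out="${out:+$out,}${port}:disable"
    done
    echo "libata.force=$out"
}
```

The printed string goes into GRUB_CMDLINE_LINUX_DEFAULT alongside the usb-storage quirk.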

Option 2: Accept the cosmetic iowait. iotop is your source of truth for actual I/O activity, and the phantom counter doesn’t affect real performance.

Bonus: Re-enabling the Card Reader On Demand

Since the warm reprobe (after boot) doesn’t leak the counter, you can safely enable the card reader when you need it. The usb-storage.quirks parameter is writable at runtime:

#!/bin/bash
# card-reader.sh — toggle Genesys Logic card reader on/off
# Boot uses quirks=:i to prevent the phantom iowait bug.
# Warm reprobe after boot is safe — does NOT leak the counter.

VENDOR="05e3"
PRODUCT="0751"

find_device() {
    for dev in /sys/bus/usb/devices/*; do
        [ "$(cat "$dev/idVendor" 2>/dev/null)" = "$VENDOR" ] &&
        [ "$(cat "$dev/idProduct" 2>/dev/null)" = "$PRODUCT" ] &&
        basename "$dev" && return 0
    done
    return 1
}

case "$1" in
    on)
        DEV=$(find_device) || { echo "Card reader not found"; exit 1; }
        # Clear the ignore quirk
        echo "${VENDOR}:${PRODUCT}:" | sudo tee /sys/module/usb_storage/parameters/quirks
        # Rebind the driver
        echo "${DEV}:1.0" | sudo tee /sys/bus/usb/drivers/usb-storage/bind
        echo "Card reader enabled"
        ;;
    off)
        DEV=$(find_device) || { echo "Card reader not found"; exit 1; }
        echo "${DEV}:1.0" | sudo tee /sys/bus/usb/drivers/usb-storage/unbind
        echo "${VENDOR}:${PRODUCT}:i" | sudo tee /sys/module/usb_storage/parameters/quirks
        echo "Card reader disabled"
        ;;
    *)  echo "Usage: $0 {on|off}" >&2; exit 1 ;;
esac
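To see which state the toggle left the reader in, you can check whether the interface is currently bound to usb-storage. A bound interface shows up as an entry in the driver’s sysfs directory. This is a sketch with a hypothetical function name, and like the script above it assumes the storage interface is :1.0.

```shell
# Hypothetical helper: succeed if INTERFACE is bound to the usb-storage driver.
# Usage: reader_bound INTERFACE [DRIVER_DIR]
reader_bound() {
    drv_dir="${2:-/sys/bus/usb/drivers/usb-storage}"
    # Bound interfaces appear as symlinks named after the interface.
    [ -e "$drv_dir/$1" ] || [ -L "$drv_dir/$1" ]
}
```

With the device name returned by find_device, `reader_bound "${DEV}:1.0" && echo enabled || echo disabled` reports the current state.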

How to Reproduce the ftrace Diagnosis

If you’re seeing phantom iowait and suspect a USB mass storage device, here’s how to trace it. As root:

# 1. Check for the phantom counter
grep procs_blocked /proc/stat      # nonzero?
ps -eo pid,state,comm | awk '$2 == "D"'  # no D-state processes?

# 2. Set up ftrace
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo > trace
echo function > current_tracer
echo io_schedule >> set_ftrace_filter
echo io_schedule_timeout >> set_ftrace_filter
echo blk_execute_rq >> set_ftrace_filter
echo scsi_execute_cmd >> set_ftrace_filter
echo 1 > options/func_stack_trace
echo 1 > tracing_on

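# 3. Trigger a reprobe of the suspect device
#    (replace <interface> with the device's USB interface name; it is quoted
#     here so the shell does not parse the angle brackets as redirections)
echo "<interface>" > /sys/bus/usb/drivers/usb-storage/unbind
sleep 3
echo "<interface>" > /sys/bus/usb/drivers/usb-storage/bind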
sleep 5

# 4. Stop and check
echo 0 > tracing_on
grep procs_blocked /proc/stat      # did it increase?
cat trace > /tmp/scsi_trace.txt    # save for analysis

If procs_blocked increases after the reprobe, the trace will show you exactly which SCSI commands are leaking the counter. If it stays the same (as it did for me), that narrows the bug to the boot-time async scan path.
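A saved trace gets long quickly, so a per-function hit count is a useful first pass before reading stacks. The sketch below assumes the usual function-tracer entry format, in which each hit ends with "function <-caller". The stack-trace lines that start with "=>" are skipped. The function name is my own.

```shell
# Count traced function entries in a saved ftrace "function" tracer log.
# Assumes entry lines of the form "... timestamp: func <-caller".
# Usage: trace_summary [TRACE_FILE]
trace_summary() {
    awk '{ for (i = 2; i <= NF; i++) if ($i ~ /^<-/) count[$(i - 1)]++ }
         END { for (f in count) print count[f], f }' "${1:-/tmp/scsi_trace.txt}" |
        sort -k1,1nr -k2,2
}
```

Comparing the io_schedule_timeout count with the change in procs_blocked gives a rough idea of how many of those waits failed to balance.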

The lesson: If your Linux system shows high iowait but iotop shows zero disk activity, check for USB mass storage devices. The SCSI probe path can permanently corrupt the kernel’s I/O accounting during boot.

Bug 2: The Camera That Kills All USB — Razer Kiyo Pro xHCI Death Spiral

My Razer Kiyo Pro webcam (1532:0e05, firmware 8.21) has a much more dramatic failure mode: it can crash the entire xHCI host controller, disconnecting every USB device on the bus — keyboard, mouse, audio interface, everything. The only recovery is a hard reboot.

The failure has two independent triggers, and the kernel’s built-in error recovery makes both of them worse.

Trigger 1: Link Power Management Resume Failure

The Kiyo Pro fails to reinitialize after USB Link Power Management (LPM) transitions. When the kernel tries to wake the camera from a low-power state, it produces EPIPE (-32) errors on UVC SET_CUR operations. The stalled endpoint triggers an xHCI stop-endpoint timeout, and the kernel declares the entire controller dead.

Trigger 2: Rapid Control Transfer Overflow

Approximately 25 rapid consecutive UVC SET_CUR operations overwhelm the camera’s firmware. The standard UVC error-recovery query (GET_CUR after EPIPE) amplifies the failure by sending a second transfer to the already-stalling device. The kernel detects the fault, resets the controller, the reset triggers another fault, and the system enters a death spiral.

Testing confirmed that disabling LPM alone is insufficient — a stress test with NO_LPM active still caused delayed controller death 13 minutes later via TRB warning escalation.

Update: Fixed in Kernel 6.17

After installing the HWE (Hardware Enablement) kernel on Ubuntu 24.04 LTS (sudo apt install linux-generic-hwe-24.04), which brought me to 6.17.0-19-generic, I retested with dynamic debug enabled for xhci_hcd and usbcore, and without any patches or workarounds applied — no udev quirks, no LPM disable, no control throttle. Completely stock kernel.

The stress test now passes 50/50 rounds with 0ms delay. On 6.8, the same test crashed consistently around round 25. The xHCI error handling improvements between 6.8 and 6.17 prevent the cascade to hc_died().
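For readers who want to run a comparable test, here is a rough sketch of a round-based stress loop. It is not the stress test used for the numbers above. The function name and the choice of command are assumptions, and the loop simply runs whatever command you pass it a fixed number of times with an optional delay.

```shell
# Hypothetical stress loop: run COMMAND up to ROUNDS times with DELAY seconds
# between attempts, stop at the first failure, and print "passed/rounds".
# Usage: stress_controls ROUNDS DELAY COMMAND [ARGS...]
stress_controls() {
    rounds=$1; delay=$2; shift 2
    ok=0; i=0
    while [ "$i" -lt "$rounds" ]; do
        i=$((i + 1))
        if "$@" >/dev/null 2>&1; then ok=$((ok + 1)); else break; fi
        [ "$delay" = "0" ] || sleep "$delay"
    done
    echo "$ok/$rounds"
}
```

One plausible invocation is `stress_controls 50 0 v4l2-ctl -d /dev/video0 -c brightness=128`, assuming the camera is /dev/video0 and exposes a brightness control. On an affected kernel this can take down the whole USB controller, so save your work first.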

The kernel logs show two things happening under the hood:

# The EPIPE probe error still occurs at device init — but no longer cascades
uvcvideo 2-3.4:1.1: Failed to set UVC probe control : -32 (exp. 26).

# The kernel dynamically disables U1 LPM when the device responds slowly
usb 2-3.4: Hub-initiated U1 disabled due to long timeout 16800us

The kernel is now doing at runtime what my USB_QUIRK_NO_LPM patch was trying to do statically — detecting that the device can’t handle LPM transitions and disabling them automatically. Combined with the improved xHCI error recovery, the death spiral no longer occurs.

I’ve reported these findings to the kernel mailing list. The patch series may no longer be needed for current kernels, though the firmware bug itself (version 1.5.0.1, the latest from Razer) remains unfixed.

Workarounds for Older Kernels (6.8 and Earlier)

If you’re stuck on an older kernel, you’ll need multiple layers of defense:

  1. Kernel module quirk — snd-usb-audio quirk flags to skip sample rate readback and defer audio interface setup: options snd-usb-audio quirk_flags=0x4000001
  2. Kernel patches — Three patches submitted to the kernel mailing list:
    • USB_QUIRK_NO_LPM for device 1532:0e05
    • A new UVC_QUIRK_CTRL_THROTTLE that rate-limits SET_CUR transfers (50ms minimum interval) and skips error-code queries after EPIPE
    • Razer Kiyo Pro device entry combining throttle, disable-autosuspend, and no-reset-resume quirks
  3. Udev rule — Sets avoid_reset_quirk=1 to prevent USB errors from cascading to the xHCI controller
  4. Userspace watchdog — A systemd service that monitors kernel logs for xHCI fatal errors and performs escalating recovery: rebind the port, then reset the controller, then reload the driver. If all three fail, it stops — no retry loops, no death spirals.
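The watchdog’s core logic from item 4 can be sketched in a few lines. This is an illustration of the escalate-once-then-stop idea, not the code from the repository. The function names are hypothetical, and the two matched strings are the messages the kernel prints when it gives up on a host controller.

```shell
# Hypothetical watchdog core: detect a fatal xHCI log line, then run recovery
# steps in order, stopping at the first one that succeeds.

# Succeed if a kernel log line reports a dead xHCI controller.
is_xhci_fatal() {
    case "$1" in
        *"xHCI host controller not responding, assume dead"*) return 0 ;;
        *"HC died; cleaning up"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run each recovery command once, in order. No retries, no loops.
# Usage: escalate STEP1 STEP2 STEP3   (each a command or function name)
escalate() {
    for step in "$@"; do
        if "$step"; then
            echo "recovered: $step"
            return 0
        fi
    done
    echo "all recovery steps failed; giving up"
    return 1
}
```

A driver loop could feed it from `journalctl -kf`, calling `is_xhci_fatal "$line" && escalate rebind_port reset_controller reload_driver` for each line, where those three step functions are placeholders you would write for your own hardware.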

Quick Runtime Workaround (Older Kernels)

If you have a Razer Kiyo Pro on kernel 6.8 or earlier and just want it to stop crashing your USB bus:

# Disable LPM for the Kiyo at runtime
echo "1532:0e05:k" | sudo tee /sys/module/usbcore/parameters/quirks

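# Make it permanent
# (modprobe.d options only apply when usbcore is a loadable module; if it is
#  built into your kernel, add usbcore.quirks=1532:0e05:k to the kernel
#  command line in /etc/default/grub and run update-grub instead)
echo 'options usbcore quirks=1532:0e05:k' | sudo tee /etc/modprobe.d/razer-kiyo-usb.conf
sudo update-initramfs -u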

This only addresses the LPM trigger. Full protection on older kernels requires the CTRL_THROTTLE kernel patch. The best fix is upgrading to kernel 6.17 or later.
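If you manage several machines, a version check can decide whether the workaround is worth applying. A minimal sketch (the function name is my own) that compares the major and minor numbers of a kernel release string:

```shell
# Succeed if a kernel release is at least MAJOR.MINOR.
# Usage: kernel_at_least MAJOR MINOR [RELEASE]   (RELEASE defaults to uname -r)
kernel_at_least() {
    release=${3:-$(uname -r)}
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    [ "$major" -gt "$1" ] || { [ "$major" -eq "$1" ] && [ "$minor" -ge "$2" ]; }
}
```

For example, `kernel_at_least 6 17 || echo "apply the LPM quirk"`. The version number is only a proxy, since distributions backport fixes, so treat the result as a hint rather than proof.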

Bug 3: The Initial Discovery — Tracker Meets a Slow USB Drive

The investigation that led me down this rabbit hole started with a much simpler problem. I plugged in an old 128GB LITEON SATA SSD in a USB enclosure (Realtek RTL9201 bridge) to grab some files. Within minutes, my system ground to a halt — real iowait this time, 50-60%, with the NVMe feeling sluggish.

GNOME’s tracker-miner-fs-3 had automatically started indexing the drive. The old SSD contained a full Ubuntu installation — hundreds of thousands of files under /usr, /lib, /snap. Tracker was hammering it with random reads, and the slow SATA-over-USB interface was bottlenecking everything.

Killing tracker and unmounting the drive should have fixed it. But when I ran udisksctl unmount, the USB enclosure physically disconnected mid-cache-sync:

sd 6:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK

This left 6 SCSI error handler threads stuck in uninterruptible sleep — and that’s when the phantom procs_blocked counter leak began. The threads eventually recovered, but the kernel’s accounting never decremented. This is the same class of bug as the card reader issue, just triggered through a different path (disconnect during cache flush vs. probe of removable media).
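One way to narrow the window for this race is to flush and power the drive down explicitly before touching the cable. The sketch below is an assumption about what a safer removal sequence looks like, not something I verified against this particular enclosure. The function name and the RUN dry-run switch are my own, and /dev/sdc1 in the usage line is a guess at the partition name.

```shell
# Hypothetical safe-removal helper: flush, unmount, then power off the drive.
# Usage: safe_eject PARTITION DISK     e.g. safe_eject /dev/sdc1 /dev/sdc
# Set RUN=echo to print the commands instead of running them.
safe_eject() {
    ${RUN:-} sync &&
    ${RUN:-} udisksctl unmount -b "$1" &&
    ${RUN:-} udisksctl power-off -b "$2"
}
```

Each step runs only if the previous one succeeded, so a failed unmount never leads to a power-off. It will not help if the enclosure drops off the bus on its own during the unmount, which is what happened here.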

Bug 4: The Webcam Mic That Vanishes — Logitech StreamCam PipeWire Conflict

The Logitech StreamCam (046d:0893) is a USB 3.0 webcam that exposes both UVC video and UAC audio interfaces. On Ubuntu with Wayland and PipeWire, it mostly works — until you try to use it for streaming.

The Problem

After switching from X11 to Wayland with PipeWire (for screen capture in OBS), the StreamCam mic worked fine in veadotube. But the moment I added a PipeWire screen capture source in OBS, the mic stopped working in both OBS and veadotube. WirePlumber logged:

Failed to call Lookup: GDBus.Error:org.freedesktop.portal.Error.NotFound: No entry for camera

The issue is a conflict between PipeWire’s camera portal and the audio routing. When OBS requests screen capture through the PipeWire portal, WirePlumber’s session management interferes with the existing audio node connections to the StreamCam. The mic’s PipeWire node gets orphaned — it still exists, but no application can connect to it.

Additional Issues

The StreamCam also exposed two other quirks:

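  • USB reset permission denied — usbreset "Logitech StreamCam" fails as a normal user. The fallback is toggling the sysfs authorized flag on the camera’s own device directory to force a re-enumeration. Run it as root and name the device explicitly, since a * wildcard would make the redirect ambiguous and would target every USB device: echo 0 > /sys/bus/usb/devices/<device>/authorized && echo 1 > /sys/bus/usb/devices/<device>/authorized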
  • Wrong V4L2 node in OBS — The StreamCam exposes two video nodes: /dev/video0 (main capture) and /dev/video1 (metadata). OBS can pick the wrong one, resulting in no video. Select the main capture node, which is /dev/video0 on my system; the numbers can differ if other cameras are attached.
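Because video node numbers shift when other cameras are plugged in, it can help to look the capture node up instead of hard-coding it. The sketch below reads the name and index attributes under /sys/class/video4linux. It assumes the main capture node is the one with index 0 and the metadata node has a higher index, which matches what I see with uvcvideo but is not guaranteed. The function name is hypothetical.

```shell
# List /dev/video* nodes whose sysfs name matches PATTERN and whose index is 0.
# Usage: capture_nodes PATTERN [SYSFS_DIR]
capture_nodes() {
    for node in "${2:-/sys/class/video4linux}"/video*; do
        [ -r "$node/name" ] && [ -r "$node/index" ] || continue
        grep -qi -- "$1" "$node/name" || continue
        if [ "$(cat "$node/index")" = "0" ]; then
            echo "/dev/${node##*/}"
        fi
    done
}
```

Running `capture_nodes streamcam` prints the device path to select in OBS.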

The Fix

Restart the entire PipeWire stack to clear stale node connections:

# Restart PipeWire stack (fixes mic routing)
systemctl --user restart pipewire pipewire-pulse wireplumber

# If that doesn't work, reset the USB device first
pkexec usbreset "Logitech StreamCam"
sleep 1
systemctl --user restart pipewire pipewire-pulse wireplumber

I wrote a fix-mic.sh script that automates this: it tests if the mic is working, and if not, resets the USB device and restarts PipeWire. It runs automatically as part of my streaming startup script.
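The general shape of such a script is sketched below. This is not the actual fix-mic.sh: the function names, the RUN dry-run switch, and the use of `pactl list short sources` as the check are all assumptions. Since an orphaned node can still appear in that list, a listing check is only a weak proxy for a working mic.

```shell
# Hypothetical sketch of a mic-repair script (not the real fix-mic.sh).

# Succeed if a source matching PATTERN appears in the
# `pactl list short sources` output read from stdin.
mic_listed() {
    grep -qi -- "$1"
}

# Usage: fix_mic SOURCE_PATTERN USB_NAME
#   e.g. fix_mic StreamCam "Logitech StreamCam"
# Set RUN=echo to print the recovery commands instead of running them.
fix_mic() {
    if pactl list short sources 2>/dev/null | mic_listed "$1"; then
        echo "mic present"
        return 0
    fi
    # Mic missing: reset the camera, then restart the audio stack.
    ${RUN:-} pkexec usbreset "$2"
    ${RUN:-} sleep 1
    ${RUN:-} systemctl --user restart pipewire pipewire-pulse wireplumber
}
```

A stricter check would record a short sample from the source and verify that it is not silent, at the cost of a slower startup.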

The Common Thread

All four bugs live in the space where USB hardware meets Linux’s kernel subsystems (SCSI, xHCI, UVC, UAC) and, in the StreamCam case, the PipeWire userspace stack. Each layer has its own error handling and recovery logic, and when a USB device misbehaves, these layers can interact in unexpected ways.

The specific issues:

  • SCSI probe accounting — The scheduler’s iowait counter can be permanently corrupted during boot-time async SCSI scanning — affects both USB mass storage and AHCI/SATA devices (bug #2146707)
  • xHCI error recovery — The kernel’s “fix it by resetting everything” approach can create death spirals with devices that have buggy firmware
  • USB disconnect handling — Cache sync operations on devices that physically disappear leave orphaned kernel accounting state
  • PipeWire session conflicts — USB devices that expose multiple interfaces (video + audio) can have their audio routing broken when PipeWire’s portal system manages video capture separately

Recommendations

If you’re running Linux with USB peripherals (especially on a workstation or homelab):

  1. Install iotop — It reads per-process I/O accounting, which is more accurate than /proc/stat. When top shows high iowait but iotop shows zero, you know it’s a phantom counter.
  2. Check usb-storage quirks — If you have USB card readers or enclosures causing issues, the usb-storage.quirks boot parameter is your friend.
  3. Disable LPM for problem cameras (older kernels) — On kernels before 6.17, many USB cameras have firmware bugs exposed by Linux’s aggressive power management. The usbcore quirks parameter can disable LPM per-device. On 6.17+, the kernel handles this automatically.
  4. Keep your kernel updated — Kernel 6.17 fixed the xHCI death spiral that affected the Razer Kiyo Pro and likely other devices with buggy firmware. The xHCI error handling no longer escalates to killing the entire controller, and LPM is dynamically disabled for slow devices. On Ubuntu 24.04 LTS, you need to install the HWE (Hardware Enablement) kernel to get 6.17: sudo apt install linux-generic-hwe-24.04. If you’re on the stock 6.8 LTS kernel, upgrading is the single best fix.
  5. Use udisksctl power-off before physically unplugging USB drives — this prevents the cache sync race condition that can leak kernel accounting state.
  6. Restart PipeWire after USB webcam issues — If your webcam’s mic stops working (especially after adding screen capture sources in OBS), run systemctl --user restart pipewire pipewire-pulse wireplumber. PipeWire’s portal system can orphan audio nodes when managing video and audio from the same USB device.

The Linux USB subsystem handles thousands of device combinations, and most of them work flawlessly. But when they don’t, the failure modes can be subtle and deeply confusing — phantom I/O wait, cascading controller resets, kernel counters that lie to every monitoring tool on your system, and audio routing that silently breaks. Understanding the layers underneath USB devices — SCSI, xHCI, UVC, PipeWire — is the key to diagnosing these issues.

All of the patches and scripts mentioned here are available on GitHub: kiyo-xhci-fix (Razer Kiyo Pro patches and watchdog) and streamcam-fixes (Logitech StreamCam mic fix).
