SCSI pass-through devices disappearing immediately in FreeBSD guest

Jan 31, 2011 11:44


This article describes a problem with VMware ESXi 3.5 (other versions may be affected as well; feedback appreciated) in which host-side SCSI devices connected to a FreeBSD guest via SCSI pass-through disappear immediately after being attached. It then presents a workaround and explains why the problem occurs.


Environment
  • VMware ESXi 3.5
  • Guest running FreeBSD
  • One or more host SCSI devices attached to the guest via SCSI pass-through
Symptom

  • The SCSI devices are not found upon guest start-up.

  • When one runs "camcontrol rescan" as root in order to rescan the SCSI bus for the devices, they appear to be found and attached normally, only to be detached immediately.  The kernel messages (in bright white print on the system console; also printed by running “dmesg”) resemble the following sample output:
    sa0 at mpt1 bus 0 scbus3 target 0 lun 0
    sa0: Removable Sequential Access SCSI-3 device
    sa0: 320.000MB/s transfers (160.000MHz DT, offset 127, 16bit)
    (sa0:mpt1:0:0:0): lost device
    (sa0:mpt1:0:0:0): removing device entry

    The first 3 lines indicate that the device has been found and attached, then the next 2 lines indicate that it has been detached.

    Note that, on the first line, the SCSI bus (scbus) number is 3 and not 0.  “bus 0” is really a part of “mpt1 bus 0”, which means the sub-bus #0 of the SCSI controller device named mpt1; “mpt1 bus 0 scbus3” should be interpreted as “sub-bus #0 of mpt1, a.k.a. scbus3”.
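
The mapping from the attach line to an scbus number can be sketched as a small filter. This is only an illustration that assumes kernel messages shaped like the sample above; the function name lost_device_addresses is made up for this example and is not part of FreeBSD.

```sh
# Read kernel messages on stdin and print "bus:target:lun" for every
# device that was attached and then reported as lost.
# Assumes lines shaped like the sample output above.
lost_device_addresses() {
    awk '
        # Attach line, e.g. "sa0 at mpt1 bus 0 scbus3 target 0 lun 0"
        $2 == "at" && $6 ~ /^scbus[0-9]+$/ && $7 == "target" && $9 == "lun" {
            bus = $6
            sub(/^scbus/, "", bus)
            addr[$1] = bus ":" $8 ":" $10
        }
        # Loss line, e.g. "(sa0:mpt1:0:0:0): lost device"
        /\): lost device/ {
            dev = $1
            sub(/^\(/, "", dev)   # drop the opening parenthesis
            sub(/:.*/, "", dev)   # keep only the device name
            if (dev in addr) print addr[dev]
        }
    '
}
```

It would typically be fed from the kernel log, for example "dmesg | lost_device_addresses". The printed value uses the scbus number, not the controller-local sub-bus number.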
Workaround

The problem can be worked around by probing only the LUNs that actually exist on the affected SCSI target (typically just LUN 0).
Step-by-step Instruction

  1. If you already know which scbus has the affected device, skip steps 2-5 and go to step 6.  Otherwise, continue to step 2.

  2. If the problematic device is attached through the same virtual SCSI adapter where another device has been attached (a virtual disk or another SCSI pass-through device), run:
    # camcontrol devlist

    This command locates the other device.  Note its scbus number, then skip steps 3-5 and go to step 6.

    Otherwise (i.e. the problematic device is attached via a dedicated virtual SCSI adapter with no other devices on it), continue to step 3.

  3. Run:
    # camcontrol devlist -v

    You will see one or more empty scbuses with no devices.

  4. For each empty scbus identified in step 3, except “scbus-1 on xpt0”, which is not a real SCSI bus but a hidden “root” bus, run:
    # camcontrol rescan N

    N is the bus number, e.g. 6 for scbus6.

  5. Either examine /var/log/messages or run dmesg to see the kernel log.  You will find that the problematic device has been attached and then removed immediately (see the example output in the Symptom section).  Note its scbus number.

  6. At this point, you know the scbus number where the problematic device is.

  7. Consult the guest VM configuration to figure out which SCSI target number the problematic device uses.

  8. Consult the device documentation (usually its installation manual) to figure out which LUN(s) the problematic device uses.  Usually, this is just LUN 0.

  9. Now the real thing: Selectively probe the scbus:target:lun using camcontrol(8)'s rescan command, e.g. for scbus3 target 0 lun 0, run:
    # camcontrol rescan 3:0:0

  10. The kernel should now find and attach the device successfully.

  11. For future reference, record which scbus the device is on.  If you later place more pass-through devices on the same virtual SCSI controller, you will be able to skip steps 2-6 of this workaround.
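
Steps 3 and 4 can be partly automated. The following is a rough sketch, assuming the usual "camcontrol devlist -v" layout in which each bus starts with a "scbusN on ..." header line and an empty bus lists only a placeholder entry at target -1 lun -1. The function name empty_scbuses is hypothetical.

```sh
# Read "camcontrol devlist -v" output on stdin and print the number of
# every bus that lists no real device, skipping the hidden root bus
# scbus-1.
empty_scbuses() {
    awk '
        function emit() {
            if (bus != "" && bus != "-1" && !found) print bus
        }
        # Bus header, e.g. "scbus3 on mpt1 bus 0:"
        /^scbus-?[0-9]+ on / {
            emit()
            bus = $1
            sub(/^scbus/, "", bus)
            found = 0
            next
        }
        # A real device has non-negative target and LUN numbers; the
        # placeholder entry "target -1 lun -1" does not match.
        / target [0-9]+ lun [0-9]+/ { found = 1 }
        END { emit() }
    '
}
```

Used as "camcontrol devlist -v | empty_scbuses", it prints one candidate bus number per line. Each candidate still has to be probed by hand as described in step 4, and only when no working pass-through device shares that bus.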

Caution: Never initiate a full bus scan on a SCSI bus (“camcontrol rescan 3” for example; note that just a bus number (3) has been specified but target/LUN numbers are missing) if another SCSI pass-through device has been attached to the bus and is working properly!  A full bus scan may cause existing SCSI pass-through devices to disappear, and may panic the OS kernel if the disappeared device is already in use.

Caution: Never initiate a system-wide scan either (that is, “camcontrol rescan”, without bus/target/LUN numbers), because this will initiate a full bus scan on all SCSI buses!

It is recommended that you place non-pass-through devices on one virtual SCSI adapter, and pass-through devices on another virtual SCSI adapter so that you can safely initiate a full scan on the non-pass-through bus, e.g. to discover newly added virtual disks.
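
One way to guard against starting a full scan by accident is a small wrapper that accepts only a complete bus:target:lun address. This is a sketch, not an established tool: the function name selective_rescan and the CAMCONTROL override variable are assumptions introduced here for illustration and dry runs.

```sh
# Run "camcontrol rescan" only when given a full bus:target:lun address.
# A bare bus number, the word "all", or no argument at all is refused,
# because each of those would start a full scan.
# CAMCONTROL is a hypothetical override, e.g. CAMCONTROL=echo for a dry run.
selective_rescan() {
    if printf '%s\n' "${1:-}" | grep -Eq '^[0-9]+:[0-9]+:[0-9]+$'; then
        ${CAMCONTROL:-camcontrol} rescan "$1"
    else
        echo "refusing: expected bus:target:lun, got '${1:-}'" >&2
        return 1
    fi
}
```

For example, "selective_rescan 3:0:0" runs the selective probe from step 9, while "selective_rescan 3" prints an error and returns a non-zero status without touching the bus.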
Cause
Background

When scanning the SCSI bus for new devices, FreeBSD issues a SCSI INQUIRY command to every possible SCSI target and LUN (logical unit number) on the bus in sequence, waiting a short time after each INQUIRY for a positive response.  For example, when scanning for all devices on mpt1 sub-bus 0 as in the preceding example, FreeBSD sends an INQUIRY command to a total of 128 possible LUNs (16 targets with 8 LUNs each):
  • To mpt1:0:0:0 (mpt1, sub-bus 0, target 0, LUN 0)
  • To mpt1:0:0:1 (mpt1, sub-bus 0, target 0, LUN 1)
  • ...
  • To mpt1:0:0:6 (mpt1, sub-bus 0, target 0, LUN 6)
  • To mpt1:0:0:7 (mpt1, sub-bus 0, target 0, LUN 7)
  • To mpt1:0:1:0 (mpt1, sub-bus 0, target 1, LUN 0)
  • To mpt1:0:1:1 (mpt1, sub-bus 0, target 1, LUN 1)
  • ...
  • To mpt1:0:15:6 (mpt1, sub-bus 0, target 15, LUN 6)
  • To mpt1:0:15:7 (mpt1, sub-bus 0, target 15, LUN 7)
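
The full set of probed addresses can be enumerated with two nested loops. The sketch below only prints the addresses in the controller:sub-bus:target:LUN notation used above; it sends nothing to any device, and the function name scan_addresses is made up for this example.

```sh
# Print every address probed during a full scan of one bus:
# targets 0-15, each with LUNs 0-7.
scan_addresses() {
    ctrl=$1
    bus=$2
    target=0
    while [ "$target" -le 15 ]; do
        lun=0
        while [ "$lun" -le 7 ]; do
            echo "$ctrl:$bus:$target:$lun"
            lun=$((lun + 1))
        done
        target=$((target + 1))
    done
}
```

Running "scan_addresses mpt1 0 | wc -l" counts the addresses, which comes to 16 x 8 = 128.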
Expected behavior of ESXi 3.5

Most SCSI devices respond positively only for LUN 0, and either ignore inquiries to LUNs 1-7 or answer them with a negative response (i.e. absent LUN), as is the case with the HP tape drive in the sample output above:
  • To mpt1:0:0:0 - inquiry; from mpt1:0:0:0 - positive
  • To mpt1:0:0:1 - inquiry; from mpt1:0:0:1 - negative
  • To mpt1:0:0:2 - inquiry; from mpt1:0:0:2 - negative
  • To mpt1:0:0:3 - inquiry; from mpt1:0:0:3 - negative
  • To mpt1:0:0:4 - inquiry; from mpt1:0:0:4 - negative
  • To mpt1:0:0:5 - inquiry; from mpt1:0:0:5 - negative
  • To mpt1:0:0:6 - inquiry; from mpt1:0:0:6 - negative
  • To mpt1:0:0:7 - inquiry; from mpt1:0:0:7 - negative

(Negative responses may also be absent altogether; if so, the corresponding inquiry will time out and FreeBSD will assume that no device exists at that address.)
Actual behavior of ESXi 3.5 and its consequence

The virtual SCSI host adapter presented by ESXi 3.5, when responding negatively to an inquiry in order to indicate that no device exists at the requested address, erroneously reports the LUN as 0 even when the inquiry was addressed to a non-zero LUN:
  • To mpt1:0:0:0 - inquiry; from mpt1:0:0:0 - positive
  • To mpt1:0:0:1 - inquiry; from mpt1:0:0:0 - negative
  • To mpt1:0:0:2 - inquiry; from mpt1:0:0:0 - negative
  • To mpt1:0:0:3 - inquiry; from mpt1:0:0:0 - negative
  • To mpt1:0:0:4 - inquiry; from mpt1:0:0:0 - negative
  • To mpt1:0:0:5 - inquiry; from mpt1:0:0:0 - negative
  • To mpt1:0:0:6 - inquiry; from mpt1:0:0:0 - negative
  • To mpt1:0:0:7 - inquiry; from mpt1:0:0:0 - negative

FreeBSD, having received a positive response to LUN 0 and created the device successfully, interprets the negative response to the same LUN 0 as an indication that the device is gone now, and removes the newly created device.
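
The sequence of events can be illustrated with a toy model. This is not FreeBSD's CAM code; it is a deliberately simplified sketch of the behavior described above, for a single target with a device at LUN 0 only. The function name probe_target and the mode names "correct" and "esxi35" are invented for this example.

```sh
# Simulate a scan of LUNs 0-7 on one target that has a device at LUN 0.
# MODE "correct": negative replies name the LUN that was asked about.
# MODE "esxi35":  negative replies always name LUN 0 (the fault above).
probe_target() {
    mode=$1
    lun0_attached=no
    lun=0
    while [ "$lun" -le 7 ]; do
        # What the adapter answers for this inquiry.
        if [ "$lun" -eq 0 ]; then
            reply=positive
            reply_lun=0
        else
            reply=negative
            if [ "$mode" = esxi35 ]; then reply_lun=0; else reply_lun=$lun; fi
        fi
        # What the guest concludes from the answer.
        if [ "$reply" = positive ]; then
            lun0_attached=yes
            echo "lun $reply_lun: attached"
        elif [ "$reply_lun" -eq 0 ] && [ "$lun0_attached" = yes ]; then
            # A negative reply that names an attached LUN looks like the
            # device going away.
            lun0_attached=no
            echo "lun 0: lost device"
        fi
        lun=$((lun + 1))
    done
    echo "lun 0 attached at end: $lun0_attached"
}
```

In this model, "probe_target correct" ends with the device still attached, while "probe_target esxi35" loses it as soon as LUN 1 is probed. Probing only LUN 0, as in the workaround, never produces the misattributed negative reply.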
