The Linux SG_IO ioctl in the 2.6 series

Published: 2024/9/5, linux, posted by 豆豆

http://gmd20.blog.163.com/blog/static/1684392320100227396270/


Original address: http://sg.danny.cz/sg/sg_io.html


  • The Linux SG_IO ioctl in the 2.6 series
  • Introduction
  • SCSI and related command sets
  • SG_IO ioctl overview
  • SG_IO ioctl in the sg driver
  • SG_IO ioctl differences
  • open() considerations
  • SCSI command permissions
  • CAP_SYS_RAWIO from a user process
  • SG_IO and the st driver
  • Maximum transfer size per command
  • Conclusion

    Introduction

    The SG_IO ioctl permits user applications to send SCSI commands to a device. In the linux 2.4 series this ioctl was only available via the SCSI generic (sg) driver. In the linux 2.6 series the SG_IO ioctl is additionally available for block devices and SCSI tape (st) devices. So there are multiple implementations of this ioctl within the kernel with slightly different characteristics, and describing these is the purpose of this document.

    The information in this page is valid for linux kernel 2.6.16.

    SCSI and related command sets

    All SCSI devices should respond to an INQUIRY command and part of their response is the so-called peripheral device type. This is used by the linux kernel to decide which upper level driver controls the device. There are also devices that belong to other (i.e. not considered SCSI) transports that use SCSI command sets, the primary examples of this are (S-)ATAPI CD and DVD drives. Not all peripheral device types map to upper level drivers and devices of these types are usually accessed via the SCSI generic (sg) driver.

    SCSI (draft) standards are found at www.t10.org. SCSI commands common to all SCSI devices are found in SPC-4 while those specific to block devices are found in SBC-2, those for CD/DVD drives are found in MMC-5 and those for SCSI tape drives are found in SSC-3.

    The major non-SCSI command set in the storage area is for ATA non-packet devices which are typically disks. ATA packet devices use ATAPI which in the vast majority of cases carries a SCSI command set. The most recent draft ATA command set standard is ATA8-ACS and can be found at www.t13.org. To complicate things, (non-packet) ATA devices may have their native command set translated into SCSI. This can happen in the kernel (e.g. libata in linux) or in an intermediate device (e.g. in a USB external disk enclosure). Yet another possibility is disks whose firmware can be changed to allow them to use either the SCSI or ATA command set; this may happen in the SAS/SATA area since the physical (cabling) and phy (electrical signalling) levels are so similar.

    SG_IO ioctl overview

    The third argument given to the SG_IO ioctl is a pointer to an instance of the sg_io_hdr structure which is defined in the <scsi/sg.h> header file. The execution of the SG_IO ioctl can be viewed as going through three phases:
  1. do sanity checks on the metadata in the sg_io_hdr instance; read the input fields and the data pointed to by some of those fields; build a SCSI command and issue it to the device
  2. wait for either a response from the device, the command to time out or the user to terminate the process (or thread) that invoked the SG_IO ioctl
  3. write the output fields and in some cases write data to locations pointed to by some fields, then return

    Only phase 1 returns an ioctl error (i.e. a return value of -1 and a value set in errno). In phase 2, command timeouts should be used sparingly as the device (and some others on the same interconnect) may end up being reset. If the user terminates the process or thread that invoked the SG_IO ioctl then obviously phase 3 never occurs but the command execution runs to completion (or timeout) and the kernel "throws away" the results. If the command yields a SCSI status of CHECK CONDITION (in field "status") then sense data is written out in phase 3.

    Now we will assume that the SCSI command involves user data being transferred to or from the device. The SCSI subsystem does not support true bidirectional data transfers to a device. All data DMA transfers (assuming the hardware supports DMA) occur in phase 2. However, if indirect IO is being used (i.e. neither direct IO nor mmap-ed transfers) then either:
    • data is read from the user space in phase 1 into kernel buffers and DMA-ed to the device in phase 2, or
    • data is read from the device into kernel buffers in phase 2 and written into the user space in phase 3
    When direct IO or mmap-ed transfers are being used then all user data is moved in phase 2. If a process is terminated during such a data transfer then the kernel gracefully handles this (by pinning the associated memory pages until the transfer is complete).
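    The data-in case described above (device to user space) can be sketched with a standard INQUIRY command. This is a minimal illustration, not code from the original article: the function name prepare_inquiry, the 96 byte reply size and the 30 second timeout are assumptions chosen for the example, while the sg_io_hdr fields and SG_DXFER_FROM_DEV come from <scsi/sg.h>.

```c
#include <string.h>
#include <scsi/sg.h>

#define INQUIRY_OPCODE 0x12   /* standard INQUIRY opcode (SPC) */
#define INQ_CDB_LEN    6
#define TIMEOUT_MS     30000  /* illustrative: 30 seconds */

/* Hypothetical helper: fill cdb and hdr for an INQUIRY that reads
 * reply_len bytes from the device into reply. Nothing is sent here;
 * the caller passes hdr to ioctl(fd, SG_IO, hdr). */
void prepare_inquiry(struct sg_io_hdr *hdr, unsigned char cdb[INQ_CDB_LEN],
                     unsigned char *reply, unsigned char reply_len,
                     unsigned char *sense, unsigned char sense_len)
{
    memset(cdb, 0, INQ_CDB_LEN);
    cdb[0] = INQUIRY_OPCODE;
    cdb[4] = reply_len;                       /* allocation length */

    memset(hdr, 0, sizeof(*hdr));             /* safe values for unused inputs */
    hdr->interface_id = 'S';
    hdr->cmdp = cdb;
    hdr->cmd_len = INQ_CDB_LEN;
    hdr->dxfer_direction = SG_DXFER_FROM_DEV; /* device -> user memory */
    hdr->dxferp = reply;
    hdr->dxfer_len = reply_len;
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = TIMEOUT_MS;
}
```

    After a successful ioctl(fd, SG_IO, &hdr) call, the number of valid bytes in the reply buffer would be hdr.dxfer_len - hdr.resid, provided the low level driver reports resid.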

    The sg_io_hdr structure has 22 fields (members) but typically only a small number of them need to be set. The following code fragment shows the setup for a simple TEST UNIT READY SCSI command which has no associated data transfers:

    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <scsi/sg.h>

    #define TUR_CMD 0x00          /* TEST UNIT READY opcode */
    #define DEF_TIMEOUT 30000     /* milliseconds; value assumed, not given in the fragment */

    /* fd is an already opened device node */
    unsigned char sense_b[32];
    unsigned char turCmbBlk[] = {TUR_CMD, 0, 0, 0, 0, 0};
    struct sg_io_hdr io_hdr;

    memset(&io_hdr, 0, sizeof(struct sg_io_hdr));   /* safe values for unused fields */
    io_hdr.interface_id = 'S';
    io_hdr.cmd_len = sizeof(turCmbBlk);
    io_hdr.mx_sb_len = sizeof(sense_b);
    io_hdr.dxfer_direction = SG_DXFER_NONE;
    io_hdr.cmdp = turCmbBlk;
    io_hdr.sbp = sense_b;
    io_hdr.timeout = DEF_TIMEOUT;

    if (ioctl(fd, SG_IO, &io_hdr) < 0) {
        /* phase 1 failure: errno is set */
        perror("SG_IO ioctl");
    }

    The memset() call is pretty important, setting unused input fields to safe values. Setting the timeout field to zero is not a good idea; 30,000 (for 30 seconds) is a reasonable default for most SCSI commands. As always, good error processing consumes a lot more code. This is especially the case with SCSI commands that yield "sense data" when something goes wrong. For example, if there is a medium error during a disk read, the sense data will contain the logical block address (lba) of the failure. Another error processing example is a SCSI command that the device considers an "illegal request": the sense data may show the byte and bit position of the field in the command block (usually referred to as a "cdb") that it objects to. For examples of error processing please refer to the sg3_utils package, its "examples" directory and its library components: sg_lib.c (SCSI error processing and tables) and sg_cmds.c (common SCSI commands).

    Below is a grouping of important sg_io_hdr structure fields with brief summaries:
    Command block (historically referred to as the "cdb"):
    • cmdp - pointer to cdb (the SCSI command block)
    • cmd_len - length (in bytes) of cdb
    Data transfer:
    • dxferp - pointer to user data to start reading from or start writing to
    • dxfer_len - number of bytes to transfer
    • dxfer_direction - whether to read from device (into user memory) or write to device (from user memory) or transfer no data: SG_DXFER_FROM_DEV, SG_DXFER_TO_DEV or SG_DXFER_NONE respectively
    • resid - requested number of bytes to transfer (i.e. dxfer_len) less the actual number transferred
    Error indication:
    • status - SCSI status returned from the device
    • host_status - error from Host Bus Adapter including initiator (port)
    • driver_status - driver (mid level or low level driver) error and suggestion mask
    Sense data (only used when 'status' is CHECK CONDITION or (driver_status & DRIVER_SENSE) is true):
    • sbp - pointer to start writing sense data to
    • mx_sb_len - maximum number of bytes to write to sbp
    • sb_len_wr - actual number of bytes written to sbp
    The fields in the sg_io_hdr structure are defined in more detail in the SCSI-Generic-HOWTO document.
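    The error indication and sense data fields listed above can be combined into a first-pass check after the ioctl returns. The sketch below is an illustration only: the enum, the function name classify_sg_io and the two local defines are hypothetical (the DRIVER_SENSE value of 0x08 is taken from the kernel's scsi.h and defined locally here as an assumption). Real error processing, as found in sg3_utils, decodes the sense data itself.

```c
#include <scsi/sg.h>

#define STATUS_CHECK_CONDITION 0x02  /* SCSI status code for CHECK CONDITION */
#define DRV_SENSE 0x08               /* assumed value of DRIVER_SENSE, defined locally */

enum sg_outcome {
    SG_GOOD,             /* nothing reported an error */
    SG_SENSE_AVAILABLE,  /* sense data was written via sbp; decode it */
    SG_TRANSPORT_ERROR,  /* host adapter or driver reported a problem */
    SG_BAD_STATUS        /* non-zero SCSI status without sense data */
};

/* Hypothetical helper: coarse classification of a completed SG_IO call. */
enum sg_outcome classify_sg_io(const struct sg_io_hdr *hdr)
{
    int sense_flagged = hdr->status == STATUS_CHECK_CONDITION ||
                        (hdr->driver_status & DRV_SENSE);

    if (sense_flagged && hdr->sb_len_wr > 0)
        return SG_SENSE_AVAILABLE;
    if (hdr->host_status != 0 || (hdr->driver_status & ~DRV_SENSE) != 0)
        return SG_TRANSPORT_ERROR;
    if (hdr->status != 0)
        return SG_BAD_STATUS;
    return SG_GOOD;
}
```

    Checking for sense data first reflects the fact that a CHECK CONDITION usually carries the most specific explanation of what went wrong.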

    SG_IO ioctl in the sg driver

    Linux kernel 2.4.0 was the first production kernel in which the SG_IO ioctl appeared in the SCSI generic (sg) driver. The sg driver itself has been in linux since around 1993. An instance of the sg_io_hdr structure in the sg driver can either be:
    • pointed to by the third argument of the SG_IO ioctl
    • pointed to by the second argument of UNIX write() or read() system calls which have a file descriptor of a sg device node as their first argument
    The SCSI-Generic-HOWTO document describes the sg driver in the lk 2.4 series including its use of the SG_IO ioctl. Prior to the lk 2.4 series the sg driver only had the sg_header structure. It was used as an asynchronous command interface in which command, metadata and optionally user data were sent via a Unix write() system call. The corresponding response which included error information (e.g. sense data) or optionally user data was received via a Unix read() system call. Two major additions were made to the sg driver at the beginning of the lk 2.4 series:
    • a new metadata structure (sg_io_hdr) as an alternative to the original mixed metadata and data structure (sg_header)
    • the SG_IO ioctl that used the new metadata structure and was synchronous: it sent a SCSI command and waited for its reply
    The sg_io_hdr only contains metadata in the sense that it contains pointers to locations of where data will come from (command or data in) or go to (sense data or data out). These pointers have caused problems in mixed 32/64 bit environments, especially when the user application (e.g. cdrecord) is built for 32 bits and the kernel is 64 bits. The lk 2.6 series has a compatibility layer to cope with this via code specialized for the SG_IO ioctl. Unfortunately this problem was not foreseen when the sg_io_hdr structure was designed.

    A significant feature of the SG_IO ioctl in the sg driver is that it is user interruptible. This means that between issuing a command (e.g. a long duration command like a disk format) and its response arriving, a user could hit control-C on the associated application. The kernel would remain stable and resources would be cleared up at the appropriate time. The sg driver does not attempt to abort such a command that is "in flight"; it simply throws away the response and cleans up. Naturally the user has no direct way of finding out whether an interrupted command succeeded or not, but there may be indirect ways.

    A warning may also be in order here: a long duration command such as format would typically be given a long timeout value. If the user interrupted the application that sent the format command then the device may remain busy doing the format (especially if the IMMED bit is not set). So if the user then sent a short duration command such as TEST UNIT READY or REQUEST SENSE to see what the device was doing, these commands may time out. This would invoke the SCSI subsystem error handler which would most likely send a device reset, thus aborting the format, to get the device's attention. This is probably not what the user had in mind!

    SG_IO ioctl differences

    In the following table, sg_io_hdr structure fields are listed in the order they appear in that structure. Basically the "in" fields appear at the top of the structure and are read in phase 1. The latter fields are termed as "out" and are written by the SG_IO implementation in phase 3.

    Table 1. sg_io_hdr structure summary and implementation differences

    | sg_io_hdr field | in or out | type | different | brief description including differences between implementations |
    |---|---|---|---|---|
    | interface_id | in | int | | guard field. Current implementations only accept "(int)'S'". If not set, the sg driver sets errno to ENOSYS while the block layer sets it to EINVAL |
    | dxfer_direction | in | (-ve) int | minor | direction of data transfer. SG_DXFER_NONE and friends are defined as negative integers so the sg driver can discriminate between sg_io_hdr instances and those of sg_header. This nuance is irrelevant to non-sg driver usage of SG_IO. See below. |
    | cmd_len | in | unsigned char | | limits command length to 255 bytes. No SCSI commands (even variable length ones in OSD) are this long (yet) |
    | mx_sb_len | in | unsigned char | | maximum number of bytes of sense data that the driver can output via the sbp pointer |
    | iovec_count | in | unsigned short | yes | if not sg driver and greater than zero then the SG_IO ioctl fails with errno set to EOPNOTSUPP; sg driver treats dxferp as a pointer to an array of struct sg_iovec when this field is greater than zero |
    | dxfer_len | in | unsigned int | minor | number of bytes of data to transfer to or from the device. Upper limit for block devices related to /sys/block/<device>/queue/max_sectors_kb |
    | dxferp | in [*in or *out] | void * | minor | pointer to (user space) data to transfer to (if reading from device) or transfer from (if writing to device). Further level of indirection in the sg driver when iovec_count is greater than 0. |
    | cmdp | in [*in] | unsigned char * | | pointer to SCSI command. The SG_IO ioctl in the sg driver fails with errno set to EMSGSIZE if cmdp is NULL and EFAULT if it is invalid; the block layer sets errno to EFAULT in both cases. |
    | sbp | in [*out] | unsigned char * | | pointer to user data area where no more than mx_sb_len bytes of sense data from the device will be written if the SCSI status is CHECK CONDITION. |
    | timeout | in | unsigned int | yes (if = 0) | time in milliseconds that the SCSI mid-level will wait for a response. If that timer expires before the command finishes, then the command may be aborted, the device (and maybe others on the same interconnect) may be reset depending on error handler settings. Dangerous stuff, the SG_IO ioctl has no control (through this interface) of exactly what happens. In the sg driver a timeout value of 0 means 0 milliseconds, in the block layer (currently) it means 60 seconds. |
    | flags | in | unsigned int | yes | Block layer SG_IO ioctl ignores this field; the sg driver uses it to request special services like direct IO or mmap-ed transfers. It is a bit mask. |
    | pack_id | in -> out | int | | unused (for user space program tag) |
    | usr_ptr | in -> out | void * | | unused (for user space pointer tag) |
    | status | out | unsigned char | | SCSI command status, zero implies GOOD |
    | masked_status | out | unsigned char | | Logically: masked_status == ((status & 0x3e) >> 1). Old linux SCSI subsystem usage, deprecated. |
    | msg_status | out | unsigned char | | SCSI parallel interface (SPI) message status (very old, deprecated) |
    | sb_len_wr | out | unsigned char | | actual length of sense data (in bytes) output via sbp pointer. |
    | host_status | out | unsigned short | | error reported by the initiator (port). These are the "DID_*" error codes in scsi.h |
    | driver_status | out | unsigned short | | bit mask: error and suggestion reported by the low level driver (LLD). These are the "DRIVER_*" error codes in scsi.h |
    | resid | out | int | | (dxfer_len - number_of_bytes_actually_transferred). Typically only set when there is a shortened DMA transfer from the device. Not necessarily an error. Older LLDs always yield zero. |
    | duration | out | unsigned int | | number of milliseconds that elapsed between when the command was injected into the SCSI mid level and the corresponding "done" callback was invoked. Roughly the duration of the SCSI command in milliseconds. |
    | info | out | unsigned int | minor | bit mask indicating what was done (or not) and whether any error was detected. Block layer SG_IO ioctl only sets SG_INFO_CHECK if an error was detected |

    The DID_* and DRIVER_* error and suggestion codes (associated with host_status and driver_status) are discussed in more detail in the SCSI-Generic-HOWTO document.

    open() considerations

    Various drivers have different characteristics when a device node is opened. One problem with the ioctl system call is that a user only needs read permissions to execute it but may, with ioctls like SG_IO, write to a device (e.g. format it). Command (operation code) sniffing logic is used to overcome this security problem. Also users of the SG_IO ioctl need to be aware when they "share" a device with sd, st or a cdrom driver that state machines within those drivers may be tricked. This may be unavoidable but the users of the SG_IO ioctl should take appropriate care.

    Opening a file in linux with flags of zero implies the O_RDONLY flag and hence read only access. All open() system calls can yield ENOENT (no such file or directory); ENODEV (no such device) if the file exists but there is no attached device and EACCES (permission denied) if the user doesn't have appropriate permissions.

    A user with CAP_SYS_RAWIO capability (normally associated with the "root" user) bypasses all command sniffing and other access controls that would otherwise lead to EACCES or EPERM errors. With the sg driver such a user may still need to open() a device node with O_RDWR (rather than O_RDONLY) to use all SCSI commands.

    Table 2. open() flags for SG_IO ioctl usage

    | open() flags | sg notes | sd notes | st notes | cdrom notes | Comments |
    |---|---|---|---|---|---|
    | <none> or O_RDONLY | 1, 2 | 3, 4 | 3, 5 | 3, 6 | best to add O_NONBLOCK. For a device with removable media (e.g. tape drive) that depends on whether the drive or its media is being accessed. |
    | O_RDONLY \| O_NONBLOCK | 1, 7 | 3 | 3, 13 | 3 | recommended when SCSI commands are recognized as reading information from the device |
    | O_RDWR | 2 | 4, 8, 9 | 5, 8, 9 | 6, 8, 9 | again, could be better to add O_NONBLOCK |
    | O_RDWR \| O_NONBLOCK | 7 | 8, 9 | 8, 9, 13 | 8, 9 | recommended when arbitrary (including vendor specific) SCSI commands are to be sent |
    | << interaction with O_EXCL >> | 10 | 11 | 12 | 11 | only use when sure that no other application may want to access the device (or partition). A surprising number of applications do "poke around" devices. |
    | << interaction with O_DIRECT >> | - | --> | - | --> | requires sector alignment on data transfers (ignored by sg and st) |

    Notes:
  1. on subsequent SG_IO ioctl calls, the sg driver will only allow SCSI commands in its allow_ops array, others result in EPERM (operation not permitted) in errno. See the "SCSI command permissions" section below.
  2. if previous open() of this sg device node still holds O_EXCL then this open() waits until it clears.
  3. on subsequent SG_IO ioctl calls, the block layer will only allow SCSI commands listed as "safe_for_read" in the verify_command() function in the drivers/block/scsi_ioctl.c file; others result in EPERM (operation not permitted) in errno. See the "SCSI command permissions" section below.
  4. if removable media and it is not present then yields ENOMEDIUM (no medium found)
  5. if a tape is not present in drive then yields EIO (input/output error), if tape is "in use" then yields EBUSY (resource busy). Only one open file descriptor is allowed per st device node at a time (although dup() can be used).
  6. if tray closed and media is not present then yields ENOMEDIUM (no medium found); if tray open then tries to close it and if no media present then yields ENOMEDIUM
  7. if previous open() of this sg device node still holds O_EXCL then yields EBUSY (resource busy).
  8. on subsequent SG_IO ioctl calls, the block layer will allow SCSI commands listed as either "safe_for_read" or "safe_for_write". For other SCSI commands the user requires the CAP_SYS_RAWIO capability (usually associated with the "root" user); if not yields EPERM (operation not permitted). The first instance of other SCSI commands since boot sends an annoying "scsi: unknown opcode" message to the log.
  9. if the media or drive is marked as not writable then yields EROFS (read-only file system).
  10. if sg device node already has exclusive lock then a subsequent attempt to open(O_EXCL) will wait unless O_NONBLOCK is given in which case it yields EBUSY (resource busy)
  11. implemented at block device level (which knows about partitions within devices). If a previous open(O_EXCL) is active then a subsequent open(O_EXCL) yields EBUSY (resource busy). Mounted file systems typically open a device/partition with O_EXCL; as long as an application using the SG_IO ioctl does not also try and use the O_EXCL flag then it will be allowed access to the device.
  12. the st driver does not support (i.e. ignores) the O_EXCL flag. However the fact that it only permits one active open() per tape device is similar functionality.
  13. if tape is "in use" then yields EBUSY (resource busy). Only one open file descriptor is allowed per st device node at a time.

    The O_EXCL flag has a different effect in the sg driver and the block layer. In the sg driver, once O_EXCL is held on a device, all subsequent open() attempts will either wait or yield EBUSY (irrespective of whether they attempt to use the O_EXCL flag). Once a partition/device is opened successfully in the block layer (with the sd or cdrom driver) only subsequent open() attempts that also use the O_EXCL flag are rejected (with EBUSY). An O_EXCL lock held on a device in the block layer has no effect on accessing the same device via the sg driver (and vice versa).

    The first successful open on a sd or a cdrom device node that has removable media will send a PREVENT ALLOW MEDIUM REMOVAL (prevent) SCSI command to the device. If successful, this will inhibit a subsequent START STOP UNIT (eject) SCSI command and de-activate the eject button on the drive. In emergencies, the SG_IO ioctl can be used to defeat this action; an example of this is the sdparm utility, specifically "sdparm --command=unlock".

    The open() flag O_NDELAY has the same value and meaning as O_NONBLOCK. Other flags such as O_DIRECT, O_TRUNC and O_APPEND have no effect on the SG_IO ioctl.
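    The recommendations in Table 2 can be reduced to a small helper that picks the open() flags. This is a sketch only; the function names sg_io_open_flags and open_for_sg_io are hypothetical and the choice simply follows the "recommended" rows of the table (O_NONBLOCK in both cases, O_RDWR when arbitrary commands will be sent).

```c
#include <fcntl.h>

/* Hypothetical helper: choose open() flags per the Table 2 recommendations.
 * read_only_commands is non-zero when every SCSI command to be sent is
 * known to only read information from the device. */
int sg_io_open_flags(int read_only_commands)
{
    int access = read_only_commands ? O_RDONLY : O_RDWR;

    return access | O_NONBLOCK;
}

/* Open a device node for SG_IO use. Returns a file descriptor, or -1
 * with errno set (e.g. ENOENT, ENODEV, EACCES, EBUSY). */
int open_for_sg_io(const char *path, int read_only_commands)
{
    return open(path, sg_io_open_flags(read_only_commands));
}
```

    A caller would still need to inspect errno on failure, since the meaning of EBUSY or ENOMEDIUM depends on which driver owns the device node (see the notes to Table 2).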

    SCSI command permissions

    In linux a user only needs read permissions on a file descriptor to execute an ioctl() system call. In the case of the SG_IO ioctl, a SCSI command could be sent that obviously changes the state of a device (e.g. WRITE to a disk). So both implementations of the SG_IO ioctl require more than read permissions for some commands, especially those that are known to change the state of a device or those that have some unknown action (e.g. vendor specific commands).

    Here is a table of SCSI commands that don't need the user to have write permissions (or in some cases CAP_SYS_RAWIO capability which usually equates to "root" user):
    Table 3. SCSI command minimum permission requirements

    | SCSI command | (draft) standard | sg driver requires | block layer SG_IO requires (except st) | Comments |
    |---|---|---|---|---|
    | BLANK | MMC-4 | O_RDWR | O_RDWR | |
    | CLOSE TRACK/SESSION | MMC-4 | O_RDWR | O_RDWR | |
    | ERASE | MMC-4 | O_RDWR | O_RDWR | |
    | FLUSH CACHE | SBC-3, MMC-4 | O_RDWR | O_RDWR | Really SYNCHRONIZE CACHE command |
    | FORMAT UNIT | SBC-3, MMC-4 | O_RDWR | O_RDWR | default command timeout may not be long enough |
    | GET CONFIGURATION | MMC-4 | O_RDWR | O_RDONLY | reads CD/DVD metadata |
    | GET EVENT STATUS NOTIFICATION | MMC-4 | O_RDWR | O_RDONLY | |
    | GET PERFORMANCE | MMC-4 | O_RDWR | O_RDONLY | |
    | INQUIRY | SPC-4 | O_RDONLY | O_RDONLY | All SCSI devices should respond to this command |
    | LOAD UNLOAD MEDIUM | MMC-4 | O_RDWR | O_RDWR | MEDIUM may be replaced by CD, DVD or nothing |
    | LOG SELECT | SPC-4 | O_RDWR | O_RDWR | used to change logging or clear logged data |
    | LOG SENSE | SPC-4 | O_RDONLY | O_RDONLY | used to fetch logged data |
    | MAINTENANCE COMMAND IN | SPC-4 | O_RDONLY | CAP_SYS_RAWIO | various "REPORT ..." commands such as REPORT SUPPORTED OPERATION CODES in here |
    | MODE SELECT (6+10) | SPC-4 | O_RDWR | O_RDWR | Used to change SCSI device metadata |
    | MODE SENSE (6+10) | SPC-4 | O_RDONLY | O_RDONLY | Used to read SCSI device metadata |
    | PAUSE RESUME | MMC-4 | O_RDWR | O_RDONLY | |
    | PLAY AUDIO (10) | MMC-4 | O_RDWR | O_RDONLY | |
    | PLAY AUDIO MSF | MMC-4 | O_RDWR | O_RDONLY | |
    | PLAY AUDIO TI | ? | O_RDWR | O_RDONLY | opcode 0x48, unassigned to any spec in SPC-4 |
    | PLAY CD | MMC-2 | O_RDWR | O_RDONLY | old, now SPARE IN in SPC-4 |
    | PREVENT ALLOW MEDIUM REMOVAL | SPC-4, MMC-4 | O_RDWR | O_RDWR | sd, st and cdrom drivers use this internally |
    | READ (6+10+12+16) | SBC-3 | O_RDONLY | O_RDONLY | READ(16) requires O_RDWR with the sg driver before lk 2.6.11 |
    | READ BUFFER | SPC-4 | O_RDONLY | O_RDONLY | |
    | READ BUFFER CAPACITY | MMC-4 | O_RDWR | O_RDONLY | |
    | READ CAPACITY(10) | SBC-3, MMC-4 | O_RDONLY | O_RDONLY | |
    | READ CAPACITY(16) | SBC-3, MMC-4 | O_RDONLY | CAP_SYS_RAWIO | within SERVICE ACTION IN command. Needed for RAIDs larger than 2 TB |
    | READ CD | MMC-4 | O_RDWR | O_RDONLY | |
    | READ CD MSF | MMC-4 | O_RDWR | O_RDONLY | |
    | READ CDVD CAPACITY | SBC-3, MMC-4 | O_RDONLY | O_RDONLY | Strange (old ?) name from cdrom.h. Actually is READ CAPACITY. |
    | READ DEFECT (10) | SBC-3 | O_RDWR | O_RDONLY | |
    | READ DISC INFO | MMC-4 | O_RDWR | O_RDONLY | |
    | READ DVD STRUCTURE | MMC-4 | O_RDWR | O_RDONLY | |
    | READ FORMAT CAPACITIES | MMC-4 | O_RDWR | O_RDONLY | |
    | READ HEADER | MMC-2 | O_RDWR | O_RDONLY | |
    | READ LONG (10) | SBC-3 | O_RDONLY | O_RDONLY | but not READ LONG (16) |
    | READ SUB-CHANNEL | MMC-4 | O_RDWR | O_RDONLY | |
    | READ TOC/PMA/ATIP | MMC-4 | O_RDWR | O_RDONLY | |
    | READ TRACK (RZONE) INFO | MMC-4 | O_RDWR | O_RDONLY | In MMC-4 called READ TRACK INFO |
    | RECEIVE DIAGNOSTIC | SPC-4 | O_RDONLY | CAP_SYS_RAWIO | the SES command set uses this command a lot. An SES device is only accessible via an sg device node |
    | REPAIR (RZONE) TRACK | MMC-4 | O_RDWR | O_RDWR | |
    | REPORT KEY | MMC-4 | O_RDWR | O_RDONLY | |
    | REPORT LUNS | SPC-4 | O_RDONLY | CAP_SYS_RAWIO | mandatory since SPC-3 |
    | REQUEST SENSE | SPC-4 | O_RDONLY | O_RDONLY | has uses other than those displaced by autosense |
    | RESERVE (RZONE) TRACK | MMC-4 | O_RDWR | O_RDWR | |
    | SCAN | MMC-4 | O_RDWR | O_RDONLY | |
    | SEEK | MMC-4 | O_RDWR | O_RDONLY | |
    | SEND CUE SHEET | MMC-4 | O_RDWR | O_RDWR | |
    | SEND DVD STRUCTURE | MMC-4 | O_RDWR | O_RDWR | |
    | [SEND EVENT] | MMC-2 | ? | O_RDWR | cdrom.h associates opcode 0xa2 but MMC-2 uses opcode 0x5d ?? |
    | SEND KEY | MMC-4 | O_RDWR | O_RDWR | |
    | SEND OPC INFORMATION | MMC-4 | O_RDWR | O_RDWR | |
    | SERVICE ACTION IN | SPC-4, SBC-3 | O_RDONLY | CAP_SYS_RAWIO | READ CAPACITY (16) service action in here |
    | SET CD SPEED | MMC-4 | O_RDWR | O_RDWR | cdrom.h calls this SET SPEED |
    | SET STREAMING | MMC-4 | O_RDWR | O_RDWR | |
    | START STOP UNIT | SBC-3, MMC-4 | O_RDWR | O_RDONLY | hmm |
    | STOP PLAY/SCAN | MMC-4 | O_RDWR | O_RDONLY | |
    | SYNCHRONIZE CACHE | SBC-3, MMC-4 | O_RDWR | O_RDWR | cdrom.h calls this FLUSH CACHE |
    | TEST UNIT READY | SPC-4 | O_RDONLY | O_RDONLY | All SCSI devices should respond to this command |
    | VERIFY (10+16) | SBC-3, MMC-4 | O_RDWR | O_RDONLY | |
    | WRITE (6+10+12+16) | SBC-3 | O_RDWR | O_RDWR | |
    | WRITE LONG (10+16) | SBC-3 | O_RDWR | O_RDWR | |
    | WRITE VERIFY (10+16) | SBC-3, MMC-4 | O_RDWR | O_RDWR | only WRITE VERIFY(10) is in MMC-4 |

    Any other SCSI command (opcode) not mentioned for the sg driver needs O_RDWR. Any other SCSI command (opcode) not mentioned for the block layer SG_IO ioctl needs a user with CAP_SYS_RAWIO capability. All "block" SG_IO ioctl calls on st device nodes need a user with CAP_SYS_RAWIO capability. If a user does not have sufficient permissions to execute a SCSI command via the SG_IO ioctl then the system call fails (i.e. no SCSI command is sent) and errno is set to EPERM (operation not permitted).

    Both the sg driver and the block layer SG_IO code use internal tables to enforce the permissions shown in the above table (allow_ops and cmd_type [safe_for_read and safe_for_write] respectively). This technique doesn't scale well, since more advanced command sets (e.g. OSD) use service actions (and one opcode: 0x7f in the case of OSD). There may also be overlap in opcode usage between command sets, for example between SBC, MMC and SSC.
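    The table-driven check described above can be sketched as an opcode lookup. The code below is an illustration only and is not the kernel's verify_command() or the sg driver's allow_ops: the type and function names are hypothetical, and the table holds just a handful of rows from the block layer column of Table 3. It also shows why the approach struggles with service actions, since the lookup key is the opcode byte alone.

```c
#include <stddef.h>

enum sg_perm { PERM_READ, PERM_WRITE, PERM_RAWIO };

struct opcode_rule {
    unsigned char opcode;
    enum sg_perm perm;
};

/* Illustrative subset of Table 3 (block layer column). */
static const struct opcode_rule rules[] = {
    { 0x00, PERM_READ  },   /* TEST UNIT READY */
    { 0x03, PERM_READ  },   /* REQUEST SENSE */
    { 0x04, PERM_WRITE },   /* FORMAT UNIT */
    { 0x12, PERM_READ  },   /* INQUIRY */
    { 0x15, PERM_WRITE },   /* MODE SELECT(6) */
    { 0x1a, PERM_READ  },   /* MODE SENSE(6) */
    { 0x28, PERM_READ  },   /* READ(10) */
    { 0x2a, PERM_WRITE },   /* WRITE(10) */
};

/* Opcodes missing from the table need CAP_SYS_RAWIO. */
enum sg_perm required_perm(unsigned char opcode)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (rules[i].opcode == opcode)
            return rules[i].perm;
    return PERM_RAWIO;
}

/* Returns 1 if the command may be sent, 0 if the call should fail with EPERM. */
int command_allowed(unsigned char opcode, int opened_rdwr, int has_rawio)
{
    if (has_rawio)
        return 1;               /* CAP_SYS_RAWIO bypasses command sniffing */
    switch (required_perm(opcode)) {
    case PERM_READ:
        return 1;
    case PERM_WRITE:
        return opened_rdwr;
    default:
        return 0;
    }
}
```

    For example, command_allowed(0xa0, 1, 0) is 0 here because REPORT LUNS (opcode 0xa0) is absent from the table, mirroring the CAP_SYS_RAWIO entry for that command in Table 3.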

    CAP_SYS_RAWIO from a user process

    While root processes usually have CAP_SYS_RAWIO, processes running under a user's ID (i.e. non-root) typically don't. Hence non-root processes may not be able to use SG_IO to send SCSI commands that require CAP_SYS_RAWIO. This may occur even if the permission bits of the device node file allow read or write access: user processes will still receive EPERM when using SG_IO.

    By default the capability to assign capabilities to other processes (CAP_SETPCAP) is limited to very few processes, such as certain kernel threads. Changing this default would require changing and recompiling the kernel.

    Processes which are forked by a root process and call setuid later will lose the CAP_SYS_RAWIO capability the parent root process (and the child before the setuid) had. However, the child can preserve the capabilities of the root process in the permitted set and raise it after the call of setuid:

    /* ... in child after fork(), still running as root ... */
    prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0);
    setuid(...);
    cap_set_proc(cap_from_text("cap_sys_rawio+ep"));

    This way a user process with a parent root process can 'get back' the required capabilities to directly send SCSI commands to a device via SG_IO.

    The above technique may be of use to daemons that are started with root permissions (most are) and then change to another user after a fork(). It is not obvious to the author how utilities that use the SG_IO ioctl on device nodes that require CAP_SYS_RAWIO for some or all SCSI commands (e.g. nodes associated with the sd and st drivers) can use the above technique.

    SG_IO and the st driver

    In order to implement its user space API, the st driver has to maintain information about where the read head is with respect to the structural elements of the tape (filemarks, beginning of tape, end of data). Because the streaming device SCSI commands don't have addresses, the st driver has to know what commands have been sent. When reading, the filemarks are noticed when a read fails and sense data is fetched. If SG_IO is mixed with tape commands, the st driver may lose information (it does not look at the SG_IO commands and results). Because of this, the st driver may not implement the semantics the user expects. If the user accepts this or knows when using SG_IO does not cause information loss, then using SG_IO is OK.

    So mixing st driver read, write and ioctl commands with SCSI commands sent via SG_IO that change the state of the tape is not recommended. This applies whether the SG_IO SCSI commands are sent via st or sg device nodes.

    Maximum transfer size per command

    The largest amount of data that can be transferred by a single SCSI command is often a concern. Various SCSI command sets (e.g. SBC-3 for disk READs and WRITEs, SSC-3 for tape READs and WRITEs, and SPC-4 for READ+WRITE BUFFER) allow very large data transfer sizes but Linux is not so accommodating. The Host Bus Adapter (HBA) could have transfer size limits as could the transport and finally the SCSI device itself. In the latter case SBC-3 defines a "Block Limits" Vital Product Data (VPD) page while SSC has the READ BLOCK LIMITS SCSI command. SBC-3's optional Block Limits VPD page contains both maximum and optimal counts. In the author's opinion that latter distinction is very important: the block subsystem should try and use optimal sizes while pass through users should only be constrained by maximum sizes. Also if a pass through user exceeds a maximum transfer size imposed by a SCSI device, then the device can report an error. There is an underlying assumption that the applications using a pass through interface know what they are doing, or at least know more than the various kernel subsystems. On the other hand, the kernel has the responsibility to allocate critical shared resources such as memory.

    In the past, Linux used a single, "big-enough", block of memory for the source or destination of large data transfers. Then scatter-gather lists were added to break transfers up into smaller (often "page" size (4 KB on i386 architecture)) chunks which made memory management easier for the kernel. Now, in the lk 2.6 series, the single block of memory option is being phased out.

    The Linux SCSI subsystem imposes a 128 element limit on scatter gather lists via its SCSI_MAX_PHYS_SEGMENTS define. The way various memory pools are allocated by the linux SCSI subsystem, SCSI_MAX_PHYS_SEGMENTS could be increased to 256. Associated with each type of HBA there is normally a low level driver (LLD). Each LLD can further limit the maximum number of elements with the scsi_host_template::sg_tablesize field. Prior to lk 2.6.16 the sg and st drivers used the .sg_tablesize field only; since lk 2.6.16 those drivers are also constrained by SCSI_MAX_PHYS_SEGMENTS. This leads to a potential halving of the maximum transfer size. Many LLDs set the .sg_tablesize field to SG_ALL (which is 255) but they may as well set that field to 256 unless the HBA hardware has a constraint.

    User space memory may be allocated as the source and/or destination for DMA transfers from the HBA (i.e. direct IO). Even if the user space allocated a large amount of memory with a single malloc(), the HBA DMA element typically has a different view of memory. This view may well contain many "page" size discontinuous pieces. This has the effect of using up, or perhaps exhausting, scatter-gather elements.

    The sg driver attempts to build scatter gather lists with each element up to SG_SCATTER_SZ bytes large. This define is found in include/scsi/sg.h and has been set to 32 KB for some years. That is 8 times the page size (of 4 KB) on the i386 architecture. Some users who need really large transfers increase this define (and it is best to keep it a power of 2). However since lk 2.6.16 another limit comes into play: the MAX_SEGMENT_SIZE define which is set to 64 KB. MAX_SEGMENT_SIZE is a default and can be overridden by the LLD calling blk_queue_max_segment_size().
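    The element count and element size limits discussed so far combine multiplicatively. The following sketch works through the arithmetic with the values quoted in the text (128 and 255 elements, 32 KB and 64 KB sizes); the function names are hypothetical and the result is only an upper bound, since it assumes every scatter gather element is filled completely and ignores the other limits described in this section.

```c
#include <stddef.h>

size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Upper bound on one command's transfer size given:
 *   lld_sg_tablesize        - the LLD's .sg_tablesize (e.g. SG_ALL = 255)
 *   subsystem_max_segments  - SCSI_MAX_PHYS_SEGMENTS (128 in lk 2.6.16)
 *   element_bytes           - size of each element (e.g. SG_SCATTER_SZ)
 *   max_segment_bytes       - MAX_SEGMENT_SIZE or the LLD's override */
size_t max_sg_transfer_bytes(size_t lld_sg_tablesize,
                             size_t subsystem_max_segments,
                             size_t element_bytes,
                             size_t max_segment_bytes)
{
    size_t elements = min_size(lld_sg_tablesize, subsystem_max_segments);
    size_t per_element = min_size(element_bytes, max_segment_bytes);

    return elements * per_element;
}
```

    With 128 elements of 32 KB each the bound is 4 MB; if only the LLD's 255 element limit applied (as before lk 2.6.16) it would be just under 8 MB, which is the "potential halving" mentioned above. With page sized (4 KB) elements, as can happen with direct IO, 128 elements give only 512 KB.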

    In lk 2.6.16 two further LLD parameters come into play even when the sg (and st) driver is used. These are scsi_host_template::max_sectors and scsi_host_template::use_clustering.

    The .max_sectors setting in the LLD is the maximum number of 512 byte sectors allowed in a single SCSI command's scatter gather lists (for data transfers). Yes, that is a strange limit when trying to send a SCSI WRITE BUFFER command to upload firmware. Sysfs makes the LLD's .max_sectors setting visible (converted to kilobytes) in /sys/block/sd<x>/queue/max_hw_sectors_kb. The maximum allowable value in an LLD's .max_sectors seems to be 65535 (0xffff in hexadecimal). This limits the maximum transfer size to (32*1024*1024 - 512) bytes, assuming other limitations have been overcome. [The 65535 sector limit is because Scsi_Host::max_sectors has type "unsigned short". Hopefully this type is expanded to "int" in the future (or removed).]
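    The sector arithmetic above, and reading the corresponding sysfs value, can be sketched as follows. Both function names are hypothetical; the sysfs path passed to read_sysfs_kb would be the max_hw_sectors_kb file named in the text, and the helper merely assumes that file holds a single decimal number.

```c
#include <stdio.h>

#define SECTOR_BYTES 512UL

/* Largest transfer, in bytes, permitted by a given .max_sectors value. */
unsigned long max_sectors_to_bytes(unsigned short max_sectors)
{
    return (unsigned long)max_sectors * SECTOR_BYTES;
}

/* Read one decimal value (in kilobytes) from a sysfs file.
 * Returns -1 if the file cannot be opened or parsed. */
long read_sysfs_kb(const char *path)
{
    FILE *f = fopen(path, "r");
    long kb = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &kb) != 1)
        kb = -1;
    fclose(f);
    return kb;
}
```

    max_sectors_to_bytes(65535) evaluates to 33553920, which is the (32*1024*1024 - 512) figure given above.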

    The .use_clustering field should be set to ENABLE_CLUSTERING. If not, the block subsystem rebuilds the scatter gather list it gets from the sg driver with page size (e.g. 4 KB) elements. [Actually it does that anyway, but when ENABLE_CLUSTERING is set, it coalesces them again!]

    Conclusion

    In some situations, sending commands via the SG_IO ioctl may interfere with a higher level driver's use of a device. Users of the SG_IO ioctl should be aware that they are using a powerful, but low level facility, and write code accordingly. An example of this would be a utility to perform self tests on a disk: "background" self tests should be preferred over "foreground" self tests if there is a chance the computer may be using a file system on that disk at the time. Even a short foreground self test may take up to two minutes which is a long time to lock out a file system.


    Last updated: 26th July 2008


