[SCSI] libiscsi: Fix scsi command timeout oops in iscsi_eh_timed_out
Yanling Qi from LSI found the root cause of the panic, below is his analysis: Problem description: the open iscsi driver installs eh_timed_out handler to the blank_transport_template of the scsi middle level that causes panic of timed out command of other host Here are the details Iscsi Session creation During iscsi session creation time, the iscsi_tcp_session_create() of iscsi_tpc.c will create a scsi-host for the session. See the statement marked with the label A. The statement B replaces the shost->transportt point with a local struct variable. static struct iscsi_cls_session * iscsi_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max, uint16_t qdepth, uint32_t initial_cmdsn, uint32_t *hostno) { struct iscsi_cls_session *cls_session; struct iscsi_session *session; struct Scsi_Host *shost; int cmd_i; if (ep) { printk(KERN_ERR "iscsi_tcp: invalid ep %p.\n", ep); return NULL; } A shost = iscsi_host_alloc(&iscsi_sht, 0, qdepth); if (!shost) return NULL; B shost->transportt = iscsi_tcp_scsi_transport; shost->max_lun = iscsi_max_lun; Please note the scsi host is allocated by invoking isccsi_host_alloc() in libiscsi.c Polluting the middle level blank_transport_template in iscsi_host_alloc() of libiscsi.c The iscsi_host_alloc() invokes the middle level function scsi_host_alloc() in hosts.c for allocating a scsi_host. Then the statement marked with C assigns the iscsi_eh_cmd_timed_out handler to the eh_timed_out callback function. struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht, int dd_data_size, uint16_t qdepth) { struct Scsi_Host *shost; struct iscsi_host *ihost; shost = scsi_host_alloc(sht, sizeof(struct iscsi_host) + dd_data_size); if (!shost) return NULL; C shost->transportt->eh_timed_out = iscsi_eh_cmd_timed_out; Please note the shost->transport is the middle level blank_transport_template as shown in the code segment below. We see two problems here. 1. iscsi_eh_cmd_timed_out is installed to the blank_transport_template that will cause some body else problem. 2. iscsi_eh_cmd_timed_out will never be invoked when iscsi command gets timeout because the statement B resets the pointer. Middle level blank_transport_template In the middle level function scsi_host_alloc() of hosts.c, the middle level assigns a blank_transport_template for those hosts not implementing its transport layer. All HBAs without supporting a specific scsi_transport will share the middle level blank_transport_template. Please see the statement D struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) { struct Scsi_Host *shost; gfp_t gfp_mask = GFP_KERNEL; int rval; if (sht->unchecked_isa_dma && privsize) gfp_mask |= __GFP_DMA; shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask); if (!shost) return NULL; shost->host_lock = &shost->default_lock; spin_lock_init(shost->host_lock); shost->shost_state = SHOST_CREATED; INIT_LIST_HEAD(&shost->__devices); INIT_LIST_HEAD(&shost->__targets); INIT_LIST_HEAD(&shost->eh_cmd_q); INIT_LIST_HEAD(&shost->starved_list); init_waitqueue_head(&shost->host_wait); mutex_init(&shost->scan_mutex); shost->host_no = scsi_host_next_hn++; /* XXX(hch): still racy */ shost->dma_channel = 0xff; /* These three are default values which can be overridden */ shost->max_channel = 0; shost->max_id = 8; shost->max_lun = 8; /* Give each shost a default transportt */ D shost->transportt = &blank_transport_template; Why we see panic at iscsi_eh_cmd_timed_out() The mpp virtual HBA doesn’t have a specific scsi_transport. Therefore, the blank_transport_template will be assigned to the virtual host of the MPP virtual HBA by SCSI middle level. Please note that the statement C has assigned iscsi-transport eh_timedout handler to the blank_transport_template. When a mpp virtual command gets timedout, the iscsi_eh_cmd_timed_out() will be invoked to handle mpp virtual command timeout from the middle level scsi_times_out() function of the scsi_error.c. enum blk_eh_timer_return scsi_times_out(struct request *req) { struct scsi_cmnd *scmd = req->special; enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *); enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED; scsi_log_completion(scmd, TIMEOUT_ERROR); if (scmd->device->host->transportt->eh_timed_out) E eh_timed_out = scmd->device->host->transportt->eh_timed_out; else if (scmd->device->host->hostt->eh_timed_out) eh_timed_out = scmd->device->host->hostt->eh_timed_out; else eh_timed_out = NULL; if (eh_timed_out) { rtn = eh_timed_out(scmd); It is very easy to understand why we get panic in the iscsi_eh_cmd_timed_out(). A scsi_cmnd from a no-iscsi device definitely can not resolve out a session and session->lock. The panic can be happed anywhere during the differencing. static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *scmd) { struct iscsi_cls_session *cls_session; struct iscsi_session *session; struct iscsi_conn *conn; enum blk_eh_timer_return rc = BLK_EH_NOT_HANDLED; cls_session = starget_to_session(scsi_target(scmd->device)); session = cls_session->dd_data; debug_scsi("scsi cmd %p timedout\n", scmd); spin_lock(&session->lock); This patch fixes the problem by moving the setting of the iscsi_eh_cmd_timed_out to iscsi_add_host, which is after the LLDs have set their transport template to shost->transportt. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
parent
7f977ddd
Please register or sign in to comment