RDMA/nes: Fix nes_nic_cm_xmit() error handling
We are getting crash or hung situation when we are running network cable pull tests during RDMA traffic. In schedule_nes_timer(), we return an error if nes_nic_cm_xmit() returns failure. This is changed to success as skb is being put on the timer routines to be processed later. In send_syn() case, we are indicating connect failure once from nes_connect() and the other when the rexmit retries expires. The other issue is skb->users which we are incrementing before calling nes_nic_cm_xmit() which calls dev_queue_xmit() but in case of failure we are decrementing the skb->users at the same time putting the skb on the rexmit path. Even if dev_queue_xmit() fails, the skb->users is decremented already. We are removing the decrement of skb->users in case of failure from both schedule_nes_timer() as well as from nes_cm_timer_tick(). There is also extra check in nes_cm_timer_tick() for rexmit failure which does a break from the loop is removed. This causes problem as the other nodes have their cm_node->ref_count incremented and are not processed. Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
parent
79fc3d74
Please register or sign in to comment