iSCSI VMware fails (#12318192)

Postby BWaring » Sun Aug 15, 2010 1:30 pm

Had this issue with a client's 3100 running 4.2.11. iSCSI would randomly hang, and NOT just under load. A case was opened (12318192), logs were sent, and eventually 4.2.12 was installed, about 3 weeks ago. The issue seemed to be resolved until yesterday morning (8/14 5:30 AM EST). ONE of 3 LUNs disappeared from both VMware servers, bringing down the virtual servers associated with this LUN. The 3100 has two iSCSI targets, 1 with 2 LUNs and 1 with 1 LUN. It was the 1 with 1 LUN that went down. There was no access through FrontView, although SSH still responded, as did the CIFS share. Physical examination today revealed nothing, and the only solution was to hard reset the 3100. After bringing the virtual servers back up (7 virtual servers, 3 on one physical server and 4 on the other), the same issue occurred about 45 minutes later.

Searching up here, the issue and hardware configuration are extremely similar to iSCSI MCS Fails under load (#12775362) (http://www.readynas.com/forum/viewtopic.php?f=126&t=44155#p249971).

After the second failure today, I upgraded to 4.2.13 and everything is now restarted. I have logs from after each failure and will be forwarding them.
BWaring
ReadyNAS Newbie
 
Posts: 20
Joined: Sun Aug 15, 2010 1:03 pm

Re: iSCSI VMware fails (#12318192)

Postby chirpa » Sun Aug 15, 2010 10:24 pm

In the morning, the engineers will inspect the logs you sent in.
chirpa
Jedi Council Alumni
 
Posts: 15539
Joined: Mon Sep 24, 2007 11:52 am
Location: San Jose, CA
ReadyNAS: Repertoire

Re: iSCSI VMware fails (#12318192)

Postby chirpa » Mon Aug 16, 2010 2:27 pm

Back on topic.. (merged other posts in here with their original thread, was off-topic)

BWaring, can you try setting up the ACLs for the hosts accessing the LUNs? Also, can you try with jumbo frames disabled if possible?
chirpa
Jedi Council Alumni
 
Posts: 15539
Joined: Mon Sep 24, 2007 11:52 am
Location: San Jose, CA
ReadyNAS: Repertoire

Re: iSCSI VMware fails (#12318192)

Postby BWaring » Mon Aug 16, 2010 10:38 pm

This is a production server at a client. Changing ACLs or Jumbo Frames is not an easy thing to do, as it requires coordination to bring down their entire systems for these changes. And without a specific reason that relates to a specific issue, such as "we see XYZ in the log files, so do ABC to correct it" and without any specific way to reproduce the issue to confirm resolution, I hesitate to even suggest it. IS there any specific issue that you see in the log files - any of the three revisions of them that I sent in after each change? Why do you think that either of those changes would make a difference?

And in addition, although I think it is completely unrelated, this unit has a note in the case regarding a bad temp sensor - the Temp 1 sensor continually reads over 114 degrees - for instance, right now Temp 1 is 123, Temp 2 is 86, Temp 3 is 92, and the drive temps are 85-96... the unit will eventually be RMA'd for this, but if you think it should be done now rather than later, get it set up....
BWaring
ReadyNAS Newbie
 
Posts: 20
Joined: Sun Aug 15, 2010 1:03 pm

Re: iSCSI VMware fails (#12318192)

Postby chirpa » Tue Aug 17, 2010 12:18 am

I agree, I am only passing along what the engineers asked. I'll see if there is anything else to report, without causing you downtime.

I can add a note in the case regarding an RMA, but the support staff would have to do the actual RMA.
chirpa
Jedi Council Alumni
 
Posts: 15539
Joined: Mon Sep 24, 2007 11:52 am
Location: San Jose, CA
ReadyNAS: Repertoire

Re: iSCSI VMware fails (#12318192)

Postby BWaring » Tue Aug 17, 2010 4:55 am

Ok, as I said, WHY do they think those changes might help? Something in the logs, or just because? Was there nothing in the logs to indicate when/why iSCSI stopped responding?
BWaring
ReadyNAS Newbie
 
Posts: 20
Joined: Sun Aug 15, 2010 1:03 pm

Re: iSCSI VMware fails (#12318192)

Postby chirpa » Tue Aug 17, 2010 8:58 am

Just because, it seems. I asked the same question originally.
chirpa
Jedi Council Alumni
 
Posts: 15539
Joined: Mon Sep 24, 2007 11:52 am
Location: San Jose, CA
ReadyNAS: Repertoire

Re: iSCSI VMware fails (#12318192)

Postby BWaring » Tue Aug 17, 2010 9:45 am

Ok, thanks...
BWaring
ReadyNAS Newbie
 
Posts: 20
Joined: Sun Aug 15, 2010 1:03 pm

Re: iSCSI VMware fails (#12318192)

Postby BWaring » Tue Aug 17, 2010 9:46 am

Obviously, if I change either or both of those and it doesn't go down, I wouldn't know which change fixed it, or whether it was 4.2.13.
BWaring
ReadyNAS Newbie
 
Posts: 20
Joined: Sun Aug 15, 2010 1:03 pm

Re: iSCSI VMware fails (#12318192)

Postby BWaring » Wed Aug 18, 2010 8:20 am

Any update on this? Is there ANY information in the log files as to the cause of the iSCSI hanging?
BWaring
ReadyNAS Newbie
 
Posts: 20
Joined: Sun Aug 15, 2010 1:03 pm

Re: iSCSI VMware fails (#12318192)

Postby VMColonel » Thu Aug 19, 2010 7:48 am

Curious, you say "vmware servers" but do you mean VMware ESX or VMware Server? If it's ESX, you might want to have a look at the logs on the ESX host (/var/log/vmkernel) to give you a better idea as to why the connection is going down. You may also increase logging on the driver level (usually) to see if that yields more info.
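As a rough illustration of that first pass over the host logs, here is a minimal Python sketch that pulls out the lines worth reading from a copy of a log such as /var/log/vmkernel. The keyword list and the function names are assumptions made for this example, not anything taken from VMware documentation; adjust them to whatever your host actually logs.

```python
import re
from typing import Iterable, List

# Assumed search terms for this sketch; they are not an official list.
KEYWORDS = ("iscsi", "path", "lost connectivity", "apd")


def find_suspect_lines(lines: Iterable[str], keywords=KEYWORDS) -> List[str]:
    """Return the lines that mention any keyword, case-insensitively."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_log(path: str) -> List[str]:
    """Scan a log file copied off the host and return the matching lines."""
    with open(path, "r", errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling scan_log() on a copy of the log taken right after a failure, then comparing the timestamps of the matches against the time the LUN dropped, should narrow down where to look.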

Cheers!
VMColonel
ReadyNAS VMware Expert
 
Posts: 29
Joined: Tue Aug 17, 2010 8:26 am
ReadyNAS: Duo

Re: iSCSI VMware fails (#12318192)

Postby BWaring » Thu Aug 19, 2010 8:12 am

I realize only NG has access to the actual support ticket with all the info.... There are 2 physical servers running ESXi 4.0 with redundant NICs through redundant switches, with multipath configured, attaching to 3 LUNs on the 3100 - so that's 6 total paths. Basically, from the ESX side, it's as though the pair of paths to the LUN on the 3100 are simultaneously removed - there is no info other than loss of connectivity and the 'inactive' path status. With 4.2.11, all 6 paths - 2 to each LUN - went inactive. Last Sat with 4.2.12, only 1 LUN (2 paths) went inactive. When that happens, CIFS access to the 3100 is still available - I can see the shares, but FrontView is down and iSCSI to the affected LUN(s) is down. A double press of the power button does not shut down the 3100; a hard reset is required.
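To make the path arithmetic concrete, here is a small Python sketch of the layout as seen from one host: 2 NIC paths to each of 3 LUNs. The LUN and NIC labels are made up for illustration, and the sketch assumes the 6 paths are counted per host.

```python
from itertools import product

# Hypothetical labels; the real target and NIC names are not given here.
LUNS = ["lun1", "lun2", "lun3"]
NIC_PATHS = ["nic_a", "nic_b"]


def enumerate_paths(luns, nics):
    """Every (nic, lun) pair is one path as seen from a single host."""
    return [(nic, lun) for lun, nic in product(luns, nics)]


def inactive_paths(paths, failed_luns):
    """Paths that go inactive when the given LUNs stop responding."""
    return [p for p in paths if p[1] in failed_luns]


paths = enumerate_paths(LUNS, NIC_PATHS)
print(len(paths))                            # total paths per host
print(inactive_paths(paths, {"lun3"}))       # the 4.2.12 case: one LUN lost
print(len(inactive_paths(paths, set(LUNS)))) # the 4.2.11 case: all LUNs lost
```

Both paths to a LUN dropping together, while paths over the same NICs to the other LUNs stay up, is what points at the target rather than at a NIC or switch.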
BWaring
ReadyNAS Newbie
 
Posts: 20
Joined: Sun Aug 15, 2010 1:03 pm

Re: iSCSI VMware fails (#12318192)

Postby VMColonel » Thu Aug 19, 2010 9:31 am

BWaring wrote: I realize only NG has access to the actual support ticket with all the info.... There are 2 physical servers running ESXi 4.0 with redundant NICs through redundant switches, with multipath configured, attaching to 3 LUNs on the 3100 - so that's 6 total paths. Basically, from the ESX side, it's as though the pair of paths to the LUN on the 3100 are simultaneously removed - there is no info other than loss of connectivity and the 'inactive' path status. With 4.2.11, all 6 paths - 2 to each LUN - went inactive. Last Sat with 4.2.12, only 1 LUN (2 paths) went inactive. When that happens, CIFS access to the 3100 is still available - I can see the shares, but FrontView is down and iSCSI to the affected LUN(s) is down. A double press of the power button does not shut down the 3100; a hard reset is required.


Hardware or software iSCSI on the ESX end? Do you have a classic ESX host in the environment? (There should be a /var/log/vmkiscsid.log file which may give us more information)

I can't talk on behalf of NetGear here, but -generally- if it's hung, it's a hardware error (at least this would be the case if we were talking about ESX; there is enough error checking in the code that it will PSOD). Can you run a hardware diagnostic on the array itself?

When the ports simultaneously go down, do you see an APD (All Paths Down) state in the ESX host?

Have you engaged VMware on this issue at all? Although from what you're telling me it seems like an issue on the target side, VMware may still be able to provide assistance/guidance in troubleshooting this for you.

I'm not sure about error correction/handling inside the ReadyNAS, but is there a log file / core dump that can be analyzed to see why the device is 'hung'? I only have the ReadyNAS Duo, and it becomes unresponsive at times (during initial replication; it also locked up due to bad network card drivers on the client side), but it's a home unit, so I kind of expect it.

You mentioned that you have multiple paths from the ESX host to the array. Try breaking the vmkernel interfaces onto two separate vSwitches if you haven't already (I've seen clients have multiple vmkernel port groups in the same vSwitch).

Sorry I can't be of more help here. If the array isn't responding to things like FrontView, the ESX logs can only tell you the last thing the ESX host did, so you'll need Netgear support to assist you with the RCA. If you need help with the VMware logs, though, I might be of assistance.
VMColonel
ReadyNAS VMware Expert
 
Posts: 29
Joined: Tue Aug 17, 2010 8:26 am
ReadyNAS: Duo

Re: iSCSI VMware fails (#12318192)

Postby BWaring » Thu Aug 19, 2010 9:56 am

Software, but it's not specific to VMware; it's iSCSI on the NetGear. When it happens, they also lose the iSCSI connection through Windows. iSCSI is down on the NetGear. Yes, APD. This is a known issue, as other people here have had similar problems (their posts are up here) with 4.2.12 and before, and the 4.2.12 code had a 'fix' for it. I'm just saying it has now happened at least once again with the 4.2.12 code. I have no problem RMAing the unit - if nothing else, for the bad temp sensor - but it doesn't seem like it's the box to me....
BWaring
ReadyNAS Newbie
 
Posts: 20
Joined: Sun Aug 15, 2010 1:03 pm

Re: iSCSI VMware fails (#12318192)

Postby turls » Mon Aug 30, 2010 2:12 am

This is probably still the same thing from here:

viewtopic.php?f=126&t=42542

This is an ongoing issue, and I don't understand why that thread was closed, because .13 doesn't have anything in the release notes regarding iSCSI performance. I just keep coming back here hoping that this is finally fixed, and I'm currently loading .13 on one of my two boxes that I use for iSCSI. I was dumb enough to see that VMware certified these boxes and go ahead and put the ReadyNAS in my production environment, and I've been fighting it off and on ever since.
turls
ReadyNAS Expert
 
Posts: 215
Joined: Fri May 27, 2005 11:35 am
Location: Central IL
ReadyNAS: NV+
