VMware ESXi 4.1.0 stuck at “Initializing scheduler …” screen on boot up with Cisco UCS C210 M2 servers

Cisco, Uncategorized, Unified Computing System, VMware
November 6, 2010


Update March 9, 2011:

It looks like we finally have a firmware fix from Cisco: https://blog.terenceluk.com/firmware-update-fix-for-vmware-esxi-41/

------------------------------------------------------------

I have been working in two separate virtualization environments with new Cisco UCS C210 M2 servers and noticed that, in both, the VMware ESXi 4.1.0 Build 260247 boot process would intermittently get stuck at the “Initializing scheduler …” screen.

What made this difficult is that the hang is intermittent and can't be reproduced simply by rebooting the server. I spent some time searching the internet to see whether anyone else had run into it and found a post from a user with an HP server experiencing a similar problem, but nothing on UCS. Realizing I wasn’t going to get much further on my own, I posted a question on the Cisco Support Community forums and almost immediately received a reply from another user who said they had the same problem. Shortly thereafter, another user posted a reply that I’ll copy and paste here:

Hi Terence / Clint,

Cisco is aware of this problem and has been working with VMware to address this issue – as this issue has also been seen on other vendor’s servers. Additionally, it only seems to be affecting platforms running ESXi.

On the Cisco side, this issue is being tracked via CSCtj19224 – ESXi stuck at Initializing Scheduler – (CCO Account Required to view details)

The workaround is to disable legacy USB within the BIOS.

Hope that helps.

Thanks,
Michael

Link to the post: https://supportforums.cisco.com/thread/2050592?tstart=0

So, as it turns out, this isn’t specific to Cisco UCS, which is no surprise given that the similar report I found earlier involved an HP server.

In case anyone out there is wondering what disabling this looks like on a Cisco C210 M2 server, these are the steps I followed in the BIOS setup on one of the servers:

Select Legacy USB Support.

Disable it and press F10 to save.

Update November 7, 2010:

I received another response today after following up to ask whether this affects 4.0 as well:

In response to Terence, from what I can tell, it seems to mainly affect ESXi 4.1 – based on the number of cases I could find, however I have seen a couple of cases on ESXi 4.0 as well.

Update November 16, 2010:

I got another response on the forums, and it looks like this affects ESXi 4.0 as well:

I ran into this tonight while doing a 1000v upgrade. UCS B250 M1 blades with ESXi 4.0 (Build 236512). Disabling the Legacy USB resolved the issue.
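For anyone who keeps a host inventory, the reports so far can be turned into a quick checklist. The following is only a minimal sketch: the `Report` class, the `hosts_to_review` helper, and the inventory format are hypothetical names made up for illustration, and the only data in it is the two server, version, and build combinations mentioned in this post. Since the hang is intermittent and has also been reported on other vendors' hardware, a host that does not match is not guaranteed to be unaffected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One configuration reported in this post (hypothetical helper type)."""
    server: str
    esxi_version: str
    build: int


# Only the combinations quoted in this post; not an exhaustive list.
REPORTED = (
    Report("Cisco UCS C210 M2", "4.1.0", 260247),
    Report("Cisco UCS B250 M1", "4.0", 236512),
)


def matches_report(esxi_version: str, build: int) -> bool:
    """Return True if the version/build pair matches one of the reports."""
    return any(
        r.esxi_version == esxi_version and r.build == build for r in REPORTED
    )


def hosts_to_review(inventory: dict) -> list:
    """Return sorted host names whose (version, build) matches a report.

    `inventory` maps a host name to an (esxi_version, build) tuple; this
    format is an assumption, not something exported by ESXi or UCS.
    """
    return sorted(
        name
        for name, (version, build) in inventory.items()
        if matches_report(version, build)
    )


if __name__ == "__main__":
    example = {
        "esx01": ("4.1.0", 260247),
        "esx02": ("4.0", 236512),
        "esx03": ("4.1.0", 999999),  # made-up build, for illustration only
    }
    for host in hosts_to_review(example):
        print(f"{host}: consider disabling Legacy USB Support in the BIOS")
```

Any host it flags is a candidate for the Legacy USB workaround described above.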

Hello! My name is Terence Luk and welcome to my blog.


14 Responses

Unknown says:

January 11, 2011 at 3:49 pm

You can include also Cisco UCS C250 M2 in the list of affected servers.
Thanks for this post. I was resetting CMOS every time I rebooted the servers.
Regards
Anonymous says:

January 19, 2011 at 9:15 pm

Add the IBM x3650 M3 to the list as well. Unplugging the USB->PS2 adapter solved the issue.
Peter Cronwright says:

March 9, 2011 at 9:13 am

Looks like this is fixed now in 1.4(1m)

"ESXi boot up no longer intermittently hangs at the initializing scheduler. (CSCtj19224)"
Terence Luk says:

March 10, 2011 at 4:12 am

Awesome! Thanks for the heads up Peter.
Paul B says:

April 26, 2011 at 3:27 pm

Just upgraded our Test standalone UCS to 1.4m this afternoon and still hung on Initialising Scheduler after installing ESXi4.1 – disabling in the BIOS seems to have resolved the issue but again difficult to be definite at this point due to the intermittent issue.
Anonymous says:

January 10, 2012 at 4:14 pm

I am also in the same situation:
VMware ESXi 4.1.0 on HP Proliant DL385 G7 Hangs at loading VMKernel
It says "VMKernel Loaded successfully" but hangs after five bars.
Anonymous says:

October 30, 2012 at 1:59 pm

Saw this on an IBM BladeCenter model 7870 (HS22) where USB and media trays are shared with the BladeCenter chassis. Was gonna try disabling legacy USB, but a coworker suggested moving the "M/T" assignment to another blade (effectively moving USB and media to another blade). ESXi immediately continued its bootup once we did that.
Rob Rech says:

April 15, 2013 at 8:33 pm

Apparently this also affects vSphere 5.1 with the R210 too. This setting resolved the issue for me.
chrisgriner says:

October 27, 2013 at 5:42 am

This issue is also seen with 5.1.0 on IBM System x3650 M3 (7945AC1).

I tried disabling legacy USB, but the only thing that worked was unplugging the ps2-2-usb adapter for my KVM.
Unknown says:

December 23, 2014 at 8:45 pm

We have this same issue on our brand new x3650 M4 7915 server and cannot seem to resolve it with IBM. I tried legacy boot-mode and UEFI modes with no success. Can you help me resolve it?
Unknown says:

December 24, 2014 at 5:26 pm

See latest update on my issue: https://communities.vmware.com/message/2461204#2461204
VM-Ware says:

May 13, 2015 at 5:37 pm

I've installed the licensed VMware ESXi 4.1 and, most of the time, it's working perfectly. Randomly, however, I lose connectivity to the virtual machine that has the SAP application installed on it. During this timeout period, the application gets stuck at the client end.

General Server Details:

HP DL380-G5 Proliant
RAID level: 0 + 5

Separate VLAN for management

This, to me, indicates that the issue isn't with networking outside of the ESX host, but rather within the virtual machine or the virtual switch. I've moved the VM to another ESXi host but the problem persists.

Another curious sign is the ping latency from the Local Traffic Manager out to a VM node (same ESXi host):

PING 172.16.xxx.xxx (172.16.xxx.xxx) 56(84) bytes of data.
64 bytes from 172.16.xxx.xxx: icmp_seq=1 ttl=128 time=7.25 ms
64 bytes from 172.16.xxx.xxx: icmp_seq=2 ttl=128 time=9.26 ms
64 bytes from 172.16.xxx.xxx: icmp_seq=3 ttl=128 time=10.2 ms
64 bytes from 172.16.xxx.xxx: icmp_seq=4 ttl=128 time=10.2 ms
64 bytes from 172.16.xxx.xxx: icmp_seq=5 ttl=128 time=9.12 ms
64 bytes from 172.16.xxx.xxx: icmp_seq=6 ttl=128 time=10.3 ms

--- 172.16.xxx.xxx ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5035ms

rtt min/avg/max/mdev = 7.252/9.421/10.319/1.091 ms

@AndrewPWR:

1. Nothing logged to any of the /var/log files that would be of any help.

2. Performance graphs don't indicate that I'm hitting any sort of ceiling.

3. Outages last for 1 – 2 minutes, then traffic resumes on its own.

After trying different methodologies and configurations, and using different network latency test tools, we finally found, with the help of Mr. Marc (Sr. Infrastructure Specialist) at SDN Singapore, that the bug is in the VMXNET 3 driver. All the reports and statistics were forwarded to the VMware support center, and after one week they resolved the bug by releasing a driver patch. Details are below.
Name: ESXi410-201404001
Ver: 4.1.0 Patch 12
Release: 2015-04-20
Build: 1682698
I will do my best in the future to identify these types of bugs, which will help us and others run all of our live applications flawlessly.
We are also trying to upgrade and migrate to the latest versions.
