ESXi no management connection but VMs still running

In our environment one host stopped responding. You cannot reach it over vCenter, Host Client, SSH or DCUI. You cannot log in to the ESXi, but all VMs are still running. Bad news: you have to hard-restart your host, but you can shut down the VMs from inside the guest systems first, so no dirty shutdown ;).

First VMware ticket => "this happens sometimes, restart the host" …

A few days later a second host showed the same symptoms. After reopening the ticket they found an internal knowledge-base article. The reason was the "Active Directory Service" of the ESXi. ESXi uses "Likewise" to authenticate against Active Directory. In our case the Likewise cache ran out of memory and the management of the ESXi became unavailable.

So the resolution from VMware was to extend the Likewise cache…
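There is not much you can do once a host is in this state, because even SSH is gone. The only window to look at the Likewise service is beforehand, over SSH. A minimal sketch of what that could look like, assuming the standard lwsmd init script is present on your build (this is just my sketch, not the official VMware workaround):

# Check the state of the Likewise Service Manager (handles the AD integration)
/etc/init.d/lwsmd status

# Restart it proactively during a maintenance window
# (assumption: service name/path as on ESXi 6.x)
/etc/init.d/lwsmd stop
/etc/init.d/lwsmd start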

Edit:
Happened again :/ now we have activated Likewise logging and have to wait for the next crash.

Edit:

After a few VMware tickets there is now an official knowledge base article with a workaround, but at the moment no resolution:

https://kb.vmware.com/s/article/78968

ESXi 6.7U3 qfle3 PSOD

If you use a QLogic network card with the qfle driver, your ESXi host may run into a PSOD. In my case it was the qfle3f driver, and the hosts ran into a PSOD several times. The version of the driver does not matter in this case. If you have the FCoE adapters in your hosts, the hosts will always send some communication over these adapters, and in some cases a PSOD happens because nobody is answering.

When you install the driver, you always install a driver package which includes four drivers:

- qfle3 => network driver
- qfle3f => Fibre Channel over Ethernet (FCoE)
- qfle3i => iSCSI
- qcnic => another network driver (I don't know the exact usage)

After a few cases with VMware I got the tip: "When you don't use iSCSI/FCoE, why don't you remove it?"
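Before removing anything, it is worth double-checking that none of the qfle3 sub-drivers is actually in use on the host. A small sketch of how that check could look (plain esxcli, nothing fancy):

# List the installed qfle/qcnic VIBs and their versions
esxcli software vib list | grep -i -E 'qfle|qcnic'

# Show which driver each storage adapter (vmhba) is bound to;
# qfle3f or qfle3i entries mean FCoE or iSCSI is actually in use
esxcli storage core adapter list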

If you remove the drivers and your storage is connected over iSCSI/FCoE, you will lose the storage connection! Always put your host into maintenance mode before making changes!
So if you don't use these protocols/modules, here is how to remove them:

FCoE:
# esxcli software vib remove --vibname=qfle3f

iSCSI:
# esxcli software vib remove --vibname=qfle3i

Network drivers:

First check which drivers you are using, because if you remove the one you are using, your ESXi host will be disconnected from the network after the reboot.

Check network adapters and drivers:
# esxcli network nic list

# esxcli software vib remove --vibname=qcnic

# esxcli software vib remove --vibname=qfle3

After you have removed the modules reboot your hosts and you are done 🙂
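For completeness, a short sketch of the reboot and a check that the modules are really gone afterwards (assuming the host is already in maintenance mode as mentioned above):

# Enter maintenance mode (if not done already), then reboot the host
esxcli system maintenanceMode set --enable true
reboot

# After the reboot: the removed VIBs should no longer be listed
esxcli software vib list | grep -i -E 'qfle3f|qfle3i|qcnic'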

multipath.conf for DataCore and EMC

In 2017 I had a customer who uses DataCore as their storage system (still working great! ;)). In this case we needed to connect not only VMware ESXi servers to this storage system, but also two TSM ISP servers with shared storage running on SLES 11 SP3/4 (not sure which). With this post I want to share the working multipath.conf for DataCore. Please find the multipath.conf here (in the attachment, rename it from .txt to .conf):

defaults {
                polling_interval 60
}
blacklist {
        devnode "*"
}
blacklist_exceptions {
        device {
                vendor          "DataCore"
                product         "Virtual Disk"
                }
        device {
                vendor          "DGC"
                product         "VRAID"
                }
        devnode         "^sd[b-z]"
        devnode         "^sd[a-z][a-z]"
}

devices {
        device {
                vendor "DataCore"
                product "Virtual Disk"
                path_checker tur
                prio alua
                failback 10
                no_path_retry fail
                dev_loss_tmo infinity
                fast_io_fail_tmo 5
                rr_min_io_rq 100
                # Alternative option – See notes below
                # rr_min_io 100
                path_grouping_policy group_by_prio
                # Alternative policy - See notes below
                # path_grouping_policy failover
                # optional - See notes below
                # user_friendly_names yes
                }
        device {
                vendor "DGC"
                product "VRAID"
                path_checker tur
                prio alua
                failback 10
                no_path_retry fail
                dev_loss_tmo infinity
                fast_io_fail_tmo 5
                rr_min_io 1000
                path_grouping_policy group_by_prio
                }
}

multipaths {
        multipath {
                wwid                    		XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                alias                   		XXXX-ISP-ActLog
        }
        multipath {
                wwid                    		XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                alias                   		XXXX-ISP-ActLog-LibManager
        }
        multipath {
                wwid                    		XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                alias                   		XXXX-ISP-ArchLog
        }
        multipath {
                wwid                    		XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                alias                   		XXXX-ISP-ArchLog-LibManager
        }
        multipath {
                wwid                    		XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                alias                   		XXXX-ISP-ClusterDB
        }
        multipath {
                wwid                    		XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                alias                   		XXXX-ISP-ClusterQuorum
        }
        multipath {
                wwid                    		XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                alias                   		XXXX-ISP-DB2
        }
        multipath {
                wwid                    		XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                alias                   		XXXX-ISP-DB2-LibManager
        }
        multipath {
                wwid                    		XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                alias                   		XXXX-ISP-InstHome
        }
        multipath {
                wwid                    		XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                alias                   		XXXX-ISP-InstHome-LibManager
        }
        multipath {
                wwid                    XXXXXXXcXXXXXXX9473de447183e711              
                alias					XXX_L00              
        }				
        multipath {				
                wwid                    XXXXXXXcXXXXXXX9673de447183e711                
                alias					XXX_L01              
        }						
        multipath {				
                wwid                    XXXXXXXcXXXXXXX9873de447183e711                
                alias					XXX_L02              
        }				
        multipath {				
                wwid                    XXXXXXXcXXXXXXX9a73de447183e711                
                alias					XXX_L03               
        }				
        multipath {				
                wwid                    XXXXXXXcXXXXXXX9c73de447183e711                
                alias   				XXX_L04              
        }						
        multipath {				
                wwid    				XXXXXXXcXXXXXXX9e73de447183e711                
                alias   				XXX_L05              
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXXa073de447183e711                
                alias   				XXX_L06              
        }						
        multipath {						
                wwid    				XXXXXXXcXXXXXXXa273de447183e711                
                alias   				XXX_L07              
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXXa473de447183e711                
                alias   				XXX_L08              
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXXa673de447183e711                
                alias   				XXX_L09              
        }						
        multipath {						
                wwid    				XXXXXXXcXXXXXXX5ea6eb5c64a4e711                
                alias   				XXX_L10               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXX60a6eb5c64a4e711                
                alias   				XXX_L11               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXX62a6eb5c64a4e711                
                alias   				XXX_L12               
        }						
        multipath {						
                wwid    				XXXXXXXcXXXXXXX64a6eb5c64a4e711                
                alias   				XXX_L13               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXX66a6eb5c64a4e711                
                alias   				XXX_L14               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXX68a6eb5c64a4e711                
                alias   				XXX_L15               
        }						
        multipath {						
                wwid    				XXXXXXXcXXXXXXX6aa6eb5c64a4e711                
                alias   				XXX_L16               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXX6ca6eb5c64a4e711                
                alias   				XXX_L17               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXX6ea6eb5c64a4e711                
                alias   				XXX_L18               
        }						
        multipath {						
                wwid    				XXXXXXXcXXXXXXX70a6eb5c64a4e711                
                alias   				XXX_L19               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXXcdc727764a4e711                
                alias   				XXX_L20               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXXedc727764a4e711                
                alias   				XXX_L21               
        }						
        multipath {						
                wwid    				XXXXXXXcXXXXXXX50dc727764a4e711                
                alias   				XXX_L22               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXX52dc727764a4e711                
                alias   				XXX_L23               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXX54dc727764a4e711                
                alias   				XXX_L24                
        }						
        multipath {						
                wwid    				XXXXXXXcXXXXXXX56dc727764a4e711                
                alias   				XXX_L25                
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXX58dc727764a4e711                
                alias   				XXX_L26                
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXX5adc727764a4e711                
                alias   				XXX_L27                
        }						
        multipath {						
                wwid    				XXXXXXXcXXXXXXX5cdc727764a4e711                
                alias   				XXX_L28               
        }								
        multipath {				
                wwid    				XXXXXXXcXXXXXXX5edc727764a4e711                
                alias   				XXX_L29              
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXXe7fc4a9d64a4e711                
                alias   				XXX_L30               
        }						
        multipath {						
                wwid    				XXXXXXXcXXXXXXXe9fc4a9d64a4e711                
                alias   				XXX_L31               
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXXebfc4a9d64a4e711                
                alias   				XXX_L32                
        }								
        multipath {						
                wwid    				XXXXXXXcXXXXXXXedfc4a9d64a4e711                
                alias   				XXX_L33                
        }
}
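After saving the file as /etc/multipath.conf, the configuration still has to be activated and checked. A minimal sketch for SLES 11 (do this in a maintenance window, since restarting multipathd on a productive system is not without risk):

# Restart the multipath daemon so it picks up the new configuration
/etc/init.d/multipathd restart

# Rebuild the multipath maps
multipath -r

# Verify that the DataCore/DGC devices show up with their aliases and all paths
multipath -ll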

HPE SSD BUG – RPM Installation

I have a customer who, like a lot of others, is affected by the HPE SAS Solid State Drives firmware bug, where the disks will die after 32,768 power-on hours. You will find more about the bug here. In this short post I want to show you how to install the firmware update under SLES 11.

First you need to find out which disks you have. On the mentioned website there are two different model groups (HPE SAS SSD models launched in Late 2015 and HPE SAS SSD models launched in Mid 2017); in this case we had the Late 2015 disks.

You need to check with the CLI, OneView, iLO or the SSA whether you have disks listed in the bulletin: (Revision) HPE SAS Solid State Drives – Critical Firmware Upgrade Required for Certain HPE SAS Solid State Drive Models to Prevent Drive Failure at 32,768 Hours of Operation

In my case I had the following disks in the Server:

Model VO0960JFDGU
Media Type SSD
Capacity 960 GB
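If you want to see how close the drives already are to the 32,768 hours, you can also read the power-on hours directly from the disks, for example with smartmontools. A hedged sketch; the -d cciss,N syntax applies to drives behind a Smart Array controller, and the device node and index are just placeholders for your setup:

# Read the power-on hours of the first physical drive behind the Smart Array
# controller (SAS drives report "number of hours powered up")
smartctl -a -d cciss,0 /dev/sg0 | grep -i hour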

So I downloaded the Online Flash Component for Linux – HPD8, uploaded it to the SLES 11 server, and then installed the rpm with:

rpm -ivh firmware-hdd-8ed8893abd-HPD8-1.1.x86_64.rpm

After the installation of the rpm you need to change into the folder /usr/lib/x86_64-linux-gnu/scexe-compat:

cd /usr/lib/x86_64-linux-gnu/scexe-compat

and start the installation:

./CP042220.scexe

The installation of the patch starts.

And that's it, we are done.
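If you prefer the command line over OneView, the new firmware revision can also be checked with the Smart Storage Administrator CLI, assuming it is installed (ssacli, or hpssacli on older systems):

# Show the firmware revision of all physical drives; the updated disks should now report HPD8
ssacli ctrl all show config detail | grep -i "firmware revision"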

Here you see the OneView before the update:

and here after the update:

Enjoy, Problem solved 😉

Administrator@vsphere usage short hint

I have some customers who are using the local Administrator@vsphere account for different things like a backup user or reporting. From my point of view I can't recommend this; please create service users for all the different topics instead. For example, one user for backup – in my example something like _svc_veeam_bck – or _svc_vdi for Horizon. Give the local Administrator a good and secure password, write it down, put it in KeePass or something similar, and use it only when it is really needed! This is your last resort to log in to your vCenter.

What do you think about using the Administrator@vsphere User?

There is room in even the smallest hut…

At least that is what Veeam Backup & Replication thinks when space in the repository gets tight – at least too tight for another full backup.

With the classic "Forward Incremental" method, consisting of one (e.g. weekly) full backup and several (daily) incremental backups, an undersized backup repository can quickly reach its limits.

Unfortunately, I see it far too often with my customers that the storage was sized much too tightly, so that in some places fewer than three full backups fit at the same time. And hardly any decision maker thinks about the fact that Veeam also needs some room to shuffle data around, especially when synthetic fulls are created.

The increments are of course all dependent on every previous backup in the current cycle, so the loss of one backup file renders all subsequent ones completely unusable.

But back to the title of this post: if another full backup is due at the weekend although there is not enough free space in the repository, Veeam will start another incremental backup instead, true to the motto "better than no backup at all!" and "one more will fit!".

So far, so good, but this only pushes the problem further down the road, and sooner or later the last kilobyte is written and the backup job hits the proverbial wall.

That is exactly what I ran into again today. The backup job was configured for 30 restore points, but 160 existed – all in one and the same backup cycle, all dependent on their predecessors. An unpleasant situation when you have to explain to the customer that he has to part with his complete backup in order to create a current one. Lucky are those who followed the 3-2-1 rule and have additional copies.

But where is the difference to the "Forever Forward Incremental" method, in which a current full backup is also never created and the original cycle is continued for all eternity?

The answer is hidden in the configuration of the number of restore points. While the classic method quite optimistically assumes that at some point there will be room for another full backup again and that older backup files can be deleted automatically based on their retention, with "Forever" it is clear from the start that the chain will go on indefinitely.
So, to keep the repository from overflowing while still retaining the required number of restore points, Veeam applies the "transformation" method after the actual backup run: the data blocks of the oldest incremental restore point are injected into the full backup at the beginning of the chain, and the incremental file that is no longer needed afterwards is deleted from the repository.
This way you always have exactly the configured number of restore points – with 30 of them, that is one full backup and 29 increments.

The good news is: I can turn my forward incremental backup into a forever forward incremental at any time by simply removing the check marks for Synthetic Full and Active Full in the job configuration (under Storage -> Advanced) and letting the job run one more time (see screenshot).

Provided that my repository still has a bit of room to maneuver, this first creates one more incremental backup file and then transforms the oldest restore points into a "new" full backup.

Over the last 9 hours this approach has turned the 160 restore points mentioned above into 116 – the 30 will probably be reached at some point over the weekend…

Syslog HASHMAP

I have a customer who has several datacenters in the vCenter, and each datacenter needs different syslog servers. With this script you should be able to set different syslog servers for each datacenter.

# Map each datacenter to its syslog server(s)
$servermap = @{
    "DC1" = "tcp://syslog01.v-crew.int:514,tcp://syslog02.v-crew.int:514";
    "DC2" = "tcp://syslog03.v-crew.int:514,tcp://syslog04.v-crew.int:514";
    "DC3" = "tcp://syslog05.v-crew.int:514,tcp://syslog06.v-crew.int:514";
}

foreach ($vmhost in (Get-VMHost)) {

    # Look up the datacenter of the host and the matching syslog server(s)
    $DC = (Get-Datacenter -VMHost $vmhost)
    echo $vmhost.Name
    echo $servermap.($DC.Name)
    $syslog = $servermap.($DC.Name)

    # Set the remote syslog host(s) on the ESXi host
    Get-AdvancedSetting -Entity $vmhost -Name "SysLog.Global.loghost" | Set-AdvancedSetting -Value $syslog -Confirm:$false

    Write-Host "Restarting syslog daemon." -ForegroundColor Green
    $esxcli = Get-EsxCli -VMHost $vmhost -V2
    $esxcli.system.syslog.reload.Invoke()

    Write-Host "Setting firewall to allow Syslog out of $($vmhost)" -ForegroundColor Green
    Get-VMHostFirewallException -VMHost $vmhost | where {$_.name -eq 'syslog'} | Set-VMHostFirewallException -Enabled:$true

}

https://github.com/Vaiper/syslog-servermap/blob/master/syslog-servermap.ps1
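To double-check on a single host that the setting really arrived and the syslog daemon picked it up, you can also verify it in an SSH session; a small sketch:

# Show the active syslog configuration, including the remote log host(s)
esxcli system syslog config get

# Confirm that the syslog firewall ruleset is enabled
esxcli network firewall ruleset list --ruleset-id=syslog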

We hope it helps some of you.