Yes, we still support plenty of customers who use Skype for Business as their telephony system, even though new projects of this kind are becoming rarer.
Context
We have a recurring problem at one organization: the services stop at random, for no obvious reason.
The following messages appear in the Event Viewer at each occurrence of the problem:
--------------------------------------------
Log Name: Lync Server
Source: LS Protocol Stack
Date: 7/27/2018 10:47:38 AM
Event ID: 14397
Task Category: (1001)
Level: Warning
Keywords: Classic
User: N/A
Computer: FE1.contoso.com
Description:
A configured certificate could not be loaded from store. The serial number is attached for reference.
Extended Error Code: 0x80092004.
--------------------------------------------
Log Name: Lync Server
Source: LS Protocol Stack
Date: 7/27/2018 10:47:38 AM
Event ID: 14623
Task Category: (1001)
Level: Error
Keywords: Classic
User: N/A
Computer: FE1.contoso.com
Description:
A serious problem related to certificates is preventing Skype for Business Server from functioning.
Unable to use a certificate as configured.
Transport:TLS, IP address:0.0.0.0, Port:5061, Error:0xC3E93C0D(SIP_E_STACK_TRANSPORT_CERT_NOT_FOUND).
Ensure that a valid certificate is present in the local computer certificate store. Also ensure that the server has sufficient privileges to access the store. The Skype for Business Server failed to initialize with the configured certificate.
--------------------------------------------
With 4,000 users in this environment, this is a major problem with direct business impact. None of the certificate checks revealed an issue with the certificates currently assigned to the server, nor an intermediate or root certificate sitting in the wrong container.
Cause
In the end, the cause turned out to be the following:
When a server running the Lync/SfB service is joined to Azure AD, the cert store updates every 5 hours for a new cert. This causes a condition in which SfB fails to find its cert after the re-sync.
Azure AD joined devices get a new certificate every 5 hours. This causes a cert store change notification to fire, which causes Skype to resync the cert store and validate that its cert still exists in the cert store. Occasionally, when this process occurs, Skype can no longer find its cert in the cert store handle it has, and this is a fatal failure that causes the Skype FE to shut down. We just haven’t been able to determine why that last piece would fail, as the cert is indeed in the physical store and we can see the re-sync read it from the physical store. Understanding that last bit is not a trivial task, especially if, as we expect, some sort of race condition is being hit.
You can confirm whether the machine was joined to Azure AD by running the following command on one of the FE servers:
C:\Windows\system32>dsregcmd /status
Device State |
+----------------------------------------------------------------------+
AzureAdJoined : YES
Response from MS support:
If the output shows AzureAdJoined as YES, then it is confirmed that the server has been added to Azure AD. By default, Windows Server 2016 has a built-in feature that adds servers to AAD once its prerequisites are satisfied.
This causes SfB to re-sync the configured certificate, and that re-sync fails at random, bringing the FE service down.
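To run this check on several servers without reading the output by eye, the `AzureAdJoined` line can be picked out with a small script. This is only a sketch in Python, assuming the output keeps the `AzureAdJoined : YES` layout shown above; `is_azure_ad_joined` and `read_dsregcmd_status` are hypothetical helper names.

```python
import re
import subprocess


def is_azure_ad_joined(status_output: str) -> bool:
    """Return True when the dsregcmd output reports AzureAdJoined : YES."""
    match = re.search(
        r"^\s*AzureAdJoined\s*:\s*(\w+)",
        status_output,
        re.MULTILINE | re.IGNORECASE,
    )
    return bool(match) and match.group(1).upper() == "YES"


def read_dsregcmd_status() -> str:
    """Run `dsregcmd /status` locally (Windows only) and return its text."""
    result = subprocess.run(
        ["dsregcmd", "/status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a Front End server, `is_azure_ad_joined(read_dsregcmd_status())` returning `True` corresponds to the situation described by support.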
Resolution
Fairly obvious: the next step is to remove the Front End servers from Azure AD.
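The post does not say how the servers were removed. One possible approach, given here as an assumption and not as the method actually used, is the `dsregcmd /leave` command run from an elevated prompt on each Front End server. The Python sketch below wraps that command behind a dry-run flag so nothing is executed by accident; the helper name `unjoin_azure_ad` is hypothetical.

```python
import subprocess


def unjoin_azure_ad(dry_run: bool = True) -> list[str]:
    """Build, and optionally run, the command that unjoins the local machine.

    Assumption: `dsregcmd /leave` is the appropriate removal method here;
    the original post does not state which method was used.
    """
    command = ["dsregcmd", "/leave"]
    if not dry_run:
        # Needs an elevated session on the server itself (Windows only).
        subprocess.run(command, check=True)
    return command
```

With the default `dry_run=True`, the function only returns the command it would run. Keep in mind that a server may register again if whatever triggered the automatic registration is still in place, so it is worth re-running `dsregcmd /status` some time after the change.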