We had a need to create a site-to-site VPN tunnel for a POC from Azure Stack to Azure. It seemed pretty straight forward. Spoiler alert, obviously I'm writing this because it wasn't. The tunnel was created okay, but each morning it would no longer allow traffic to travel across it. The tunnel would show connected in Azure and in Azure Stack but traffic just wouldn't flow; ping, SSH, RDP, DNS and AD all wouldn't work. After some tinkering we found we would have to change the connection's sharedkey value to something random, save it, then change it back to the correct key. This only worked from the Azure Stack side of the connection, to re-initiate successfully and allow traffic to flow again (or recreate the connection from scratch). It would work for another 8 hours and then fail to pass traffic again.
My suspicion was the re-keying, as this would explain why it worked at first and would fail the next day (everyday, for the last 5 days). I tried using VPN diagnostics on the Azure side, as they don't currently support VPN diagnostics on Azure Stack (we are on update 1805). After reviewing the IKE log there were some errors, but it was hard to find something to tell me what was going wrong, more specifically something I could do to fix it. Below is the IKE log file I collected through the VPN diagnostics from Azure.