With the majority of web traffic now served over HTTPS, it is important to decrypt traffic to give visibility to network security monitoring (NSM) tools. The Palo Alto Networks next-generation firewall can decrypt inbound traffic quite effectively.
However, there is one gotcha when enabling this feature on production systems with live traffic. Beware of SSL session caching!
Identifying the SSL decryption transition issue
When I first tested SSL inbound inspection in my Palo Alto firewall, it was in a lab environment and it worked great! The URLs were showing up in the logs, I did not get any SSL errors (decrypt-error,
decrypt-unsupport-param, or decrypt-cert-validation) and it all seemed to work fine. I submitted a change request and was on my marry way.
Then I enabled the feature on a system with a fair amount of active traffic. The results were startling. There was a huge spike in “decrypt-error” logs I couldn’t explain. Enough users were complaining that I ended up reverting the change, puzzled at why what worked flawlessly in the lab didn’t work in production.
“The session terminated because you configured the firewall to block SSL forward proxy decryption or SSL inbound inspection when firewall resources or the hardware security module (HSM) were unavailable. This session end reason is also displayed when you configured the firewall to block SSL traffic that has SSH errors or that produced any fatal error alert other than those listed for the decrypt-cert-validation and decrypt-unsupport-param end reasons. “Palo Alto Networks Pan-OS® Administrator’s Guide 8.0 & 8.1
The second clue was the error that appeared in browser windows of some clients who had an active connection to the server at the time. For Google Chrome it was “ERR_SSL_VERSION_INTERFERENCE” and “ERR_SSL_PROTOCOL_ERROR” for Samung’s browser on Android. Firefox and Microsoft Edge gave similar messages.
This led me to believe that clients with cached SSL sessions were attempting to resume their SSL sessions. When that happened the firewall treated the connections as if they were a new connection but would produce a fatal error when it didn’t receive the expected payloads for a new session.
Resolution to resumed SSL sessions
To resolve this, I tried changing the decryption profile settings such as disabling “Unsupported Mode Checks,” but to no avail. On affected clients I tried clearing the SSL cache and even restarting the machines but that did not correct the issue.
Finally, I reduced the SSL session cache timeout setting on the server itself to 60 seconds. When that happened, the issue disappeared!
I wouldn’t recommend shutting SSL session caching entirely as there could be a huge performance impact to the server, but a 60 second timeout leading up to and immediately after enabling the policy should be adequate. If possible, keep it 60 seconds for as long as what the value was previously. I also found decreasing the timeout even after enabling SSL inbound inspection immediately worked.
Some forum posts suggest restarting the server(s) will also clear the server’s SSL session cache and force a new negotiation, although I did not test this. This isn’t always feasible if sessions are shared across multiple backend servers, there is a load balancer at play, or engineers are turning on decryption for a large number of servers.
When I contacted Palo Alto about this issue, they told me, “(T)here is feature request (FR ID: 5786) in addition to Jira PAN-80072,” that they did not have a work-around, and that “the only thing to be done now is to wait till further notice.” I am disappointed they do not publish the known issues surrounding decryption and did not have a work-around readily available as this would have saved me hours of troubleshooting, research, and some downtime.
If you found this write-up useful, I ask you let people on Twitter and LinkedIn know. If you liked this post, check out Is Network Security Monitoring Dead in the Age of Encryption?