The end of last year was quite exciting at work: in mid-December a government organization (which I deliberately do not name) changed its SSL certificate. We replaced our side of the pair as well, of course, but the connection still did not work. No other modification was communicated, so we tried to figure out what had gone wrong on our side.
The penalty for late data provision would have been about 1,700 USD per customer, and more than 10,000 customers were affected, so you can imagine how hard we tried to resolve the situation.
To give you some context: there was a serious chance of being penalized for this incident, because we had to provide the information before a deadline. Because of the communication error we could not send the files, and the penalty could have reached 8-9 digits in HUF. (No other official workaround exists.)
First, we imported the SSL certificates into the local Windows keystore and called the remote web service through a tunnel. That worked, so we realized that the certificates were fine and that something else must have changed in the system. We could not reach the government organization: our emails went unanswered, and unfortunately we had no direct contact either. (Later, when things got really hot, we even called their IVR system and pushed buttons at random, but in the end nobody we reached was competent enough in this problem.)
We suspected a keystore problem, because we do not use the official, recommended JBoss keystore for this certificate but our own one, for many reasons I will not explain here. We ran some tests, found nothing suspicious, and moved on.
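For readers who have not wired up their own keystore before, here is a minimal sketch of what "an own keystore" can look like in plain Java. It is not our production code: the class name `CustomKeyStoreContext` and both method names are made up for illustration, it uses only standard JSSE classes, and it assumes Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper: builds an SSLContext from an application-owned keystore
// instead of the one configured for the application server.
final class CustomKeyStoreContext {

    // Loads a keystore file of the given type (for example "JKS" or "PKCS12").
    static KeyStore load(Path file, String type, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore keyStore = KeyStore.getInstance(type);
        try (InputStream in = Files.newInputStream(file)) {
            keyStore.load(in, password);
        }
        return keyStore;
    }

    // clientKeys holds the client certificate and its private key.
    // trusted holds the server certificates we accept; passing null falls back
    // to the JDK's default trust store.
    static SSLContext fromKeyStore(KeyStore clientKeys, char[] keyPassword, KeyStore trusted)
            throws GeneralSecurityException {
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(clientKeys, keyPassword);

        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trusted);

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return context;
    }
}
```

The resulting context's `getSocketFactory()` can then be handed to whatever opens the connection, for example `HttpsURLConnection.setSSLSocketFactory`. Testing this path in isolation is a quick way to rule the keystore in or out as the culprit.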
An UntrustedURLConnectionIOException appeared in the JBoss server.log (I won’t copy the whole stack trace). After some googling we found this Apache Jira ticket, which explained what the problem might be. We checked the remote web server, and it turned out to be IIS. We currently use JBoss Application Server 6.1 with CXF, so the environment details and the JBoss stack trace were the same as those described in the ticket. I copy the explanation from Jira:
Problem is that client certificate verification is done by IIS in later time, not at the beginning of the SSL communication. I’ll try do describe the communication:
1. SSL Handshake is done between client and IIS server, but first WITHOUT client certificate verification.
2. Client sends the HTTP request (at least the header, where URL is specified).
3. IIS server recognizes the requested URL and re-initiates the SSL handshake (with SSL message Change_cipher_spec). It works in this way because in IIS you can specify different SSL behaviour for different URLs. So for example, for one endpoint you can configure SSL without client certificate and for second endpoint WITH client certificate.
4. During the second SSL handshake, the client certificate is requested by IIS server and verified.
5. Then the rest of request is processed by IIS server.
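If you want to see this behaviour from the client side, one rough way is to count how many handshakes complete on a single connection. The sketch below is only an illustration under several assumptions: the class and method names are hypothetical, the `SSLSocketFactory` is assumed to come from an `SSLContext` that already holds the client certificate, and the protocol is pinned to TLS 1.2 because TLS 1.3 has no renegotiation. The listener is called on a separate thread, so treat the number as a hint, not as proof.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import javax.net.ssl.HandshakeCompletedEvent;
import javax.net.ssl.HandshakeCompletedListener;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical diagnostic helper, not part of CXF or JBoss.
final class RenegotiationProbe {

    // Counts every completed handshake on the socket it is attached to.
    static final class HandshakeCounter implements HandshakeCompletedListener {
        final AtomicInteger count = new AtomicInteger();

        @Override
        public void handshakeCompleted(HandshakeCompletedEvent event) {
            count.incrementAndGet();
        }
    }

    // Minimal HTTP/1.1 request; the path is what lets the server pick the endpoint.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n\r\n";
    }

    // Returns the number of handshakes seen so far when the response has been read.
    // More than one suggests the server renegotiated after seeing the URL.
    static int probe(SSLSocketFactory factory, String host, int port, String path)
            throws IOException {
        HandshakeCounter counter = new HandshakeCounter();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.setEnabledProtocols(new String[] {"TLSv1.2"});
            socket.addHandshakeCompletedListener(counter);
            socket.startHandshake(); // step 1: initial handshake

            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush(); // step 2: request header is sent

            // Steps 3-5 happen while we read: a renegotiation requested by the
            // server is handled transparently inside read().
            InputStream in = socket.getInputStream();
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // discard the response body
            }
        }
        return counter.count.get();
    }
}
```

Running the JVM with `-Djavax.net.debug=ssl,handshake` gives a far more detailed view of the same exchange, at the cost of very noisy logs.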
We took a deep breath: okay, guys, this is a known error, let’s solve it. What is the workaround? None. Maybe a CXF update would help, maybe not, but Christmas was close, so there was no chance of doing that with tons of INT and UAT testing, etc. At this point we also had external developers on site to help, and we built literally dozens of variations of the SSL parametrization in the business application. Remember once more that we had not been told of any modification besides the SSL certificate change! This is why we focused on our side and tried everything we could.
We also checked what the RFC says about the certificate request:
7.4.4. Certificate Request
When this message will be sent:
A non-anonymous server can optionally request a certificate from the client, if appropriate for the selected cipher suite. This message, if sent, will immediately follow the ServerKeyExchange message (if it is sent; otherwise, this message follows the server’s Certificate message).
It was strange: nothing helped. This was the point where we asked our official JBoss support about the core CXF implementation. After we gave them all the information we had, they agreed that we had checked everything we could, so they rebuilt cxf-rt-ws-security*.jar with the security modification explained in the Jira ticket linked above, and on the second try it just worked. The penalty was off the table. Huh.
Many days later the government organization replied, and after some e-mail ping-pong they admitted that they had changed the IIS SSL settings. Once that change was reverted, the official JBoss CXF implementation worked as well.
End of story. Unfortunately, I am not allowed to publish our temporary modification of the CXF security classes, but I am pretty sure Jakub will create an official workaround here.