Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ConscryptEngineSocket.close(). #844

Merged
merged 4 commits into from
Jun 15, 2020

Conversation

prbprbprb
Copy link
Collaborator

[Apologies for the duplicate PR, got the original into a state where I can't reopen it]

Previously, this method closed the underlying socket first and then
the SSLEngine. However closing a connected SSLEngine queues a TLS
close notification which obviously can't be sent if the socket is
closed. Also, the pending bytes prevent the engine from freeing
its native resources including pipe file descriptors until the
SSLEngine is eventually garbage collected.

Fixing this exposed that the fix for #781 was incomplete and relied
on the native SSL data not being cleared on close and so that
is also fixed herein.

This may also help with #835 but didn't help me to reproduce that.

@prbprbprb prbprbprb requested review from kruton and daulet April 29, 2020 21:30
Copy link
Contributor

@kruton kruton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this break any tests? Seems kind of hairy and it would be nice to have tests.


// Release any resources we're holding
if (in != null) {
in.release();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the allocated buffer need to be freed in the exceptional case described on 465?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed it does - good catch, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to nest another try/finally around super.close because it declares throws IOException.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! Thanks again, perils of late night coding.

@prbprbprb
Copy link
Collaborator Author

No, it didn't break any tests, although fortunately our existing tests helped me to avoid some errors during development and caught the issue that we were doing the wrong thing with getApplicationProtocol().

I'm relatively confident the fix is correct, but it is a behaviour change and that might cause issues - right now we're silently failing to send TLS close notifications so this will change the wire behaviour of apps subtly. #839 is also relevant here as we also don't really track incoming close notifications either. I think I also really need to fix #839 to be able to test this properly as I think a useful test would need to close TLS connections and verify the notification propagates to the peer and back again.

I'll hold off on merging though and see if I can come up with a test that expresses the correct behaviour. It'll certainly be interesting to compare it against the file-descriptor implementation.

@prbprbprb prbprbprb force-pushed the engine_socket_close branch from d0a52f7 to 0ffa358 Compare April 29, 2020 23:14
// If we don't, then closeOutbound won't be able to free resources because there are
// bytes queued for transmission so drain the queue those and call closeOutbound a
// second time.
if (previousState >= STATE_HANDSHAKE_STARTED && out != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to me that out != null check belongs to drainOutgoingQueue()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought so too, but it can be null when if the handshake throws before it starts, e.g. due to invalid TLS params, which fortunately we have a test for. It should never be null after the handshake is started though so I removed the check here to prevent it masking bugs.

@@ -447,14 +449,22 @@ public final void close() throws IOException {
}

try {
// Close the underlying socket.
super.close();
} finally {
// Close the engine.
engine.closeInbound();
engine.closeOutbound();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are potentially calling closeOutbound() twice, can we draining before this line?

Copy link
Collaborator Author

@prbprbprb prbprbprb May 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine it's idempotent. First call closes the SSL session which queues a close alert. drainOutputQueue unqueues that from the SSLEngine and queues it for transmission on the socket. The second closeOutbound call notes that the socket is now fully closed with no remaining data to be sent and releases its resources.

However I'm still not happy about this, as we could potentially block for a long time in drainOutputQueue. We could potentially do that on a background thread or executor as suggested in the comment above, but if the app is experiencing network problems we might be queueing an unlimitted number of threads or tasks which is obviously bad.

Note that the structure in this PR isn't actually worse than the fd-based socket. That calls BoringSSL's SSL_close() which also tries to send the close notify and so can potentially block.

// After handshake has started, provide active session otherwise a null session,
// for code which needs to read session attributes without triggering the handshake.
private ConscryptSession provideAfterHandshakeSession() {
return (state < STATE_HANDSHAKE_STARTED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a test to assert that states are ordered according to handshake lifecycle? Otherwise we could introduce a new state that is > STATE_HANDSHAKE_STARTED but really is reached before STATE_HANDSHAKE_STARTED step.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I nearly added an extra state to handle this case but didn't to avoid exactly that issue. Test added.

Previoly, this method closed the underlying socket first and then
the SSLEngine.  However closing a connected SSLEngine queues a TLS
close notification which obviously can't be sent if the socket is
closed.  Also, the pending bytes prevent the engine from freeing
its native resources including pipe file descriptors until the
SSLEngine is eventually garbage collected.

Fixing this exposed that the fix for google#781 was incomplete and relied
on the native SSL data *not* being cleared on close and so that
is also fixed herein.

This may also help with google#835 but didn't help me to reproduce that.
@prbprbprb prbprbprb force-pushed the engine_socket_close branch from 0ffa358 to 8ecb289 Compare May 1, 2020 11:56
prbprbprb added a commit to prbprbprb/conscrypt that referenced this pull request May 1, 2020
Fixes google#839.

This should be _mostly_ uncontroversial as it is already documented
to do so[1] but could cause app compat issues.  A quick scan of
AOSP suggests no major issues however there is a CTS test for the
old behaviour[2] which will need changing.

The bulk of this change is regression tests for the correct behaviour
for the various possible orderings of close calls and TLS close
alerts. The behaviour change test is
closingInboundBeforeClosingOutboundShouldFail() in place of
closingInboundShouldOnlyCloseInbound().  Changes outside
ConscryptEngineTest are minimal.

Close behaviour before handshaking starts is undefined and we differ
from the RI, but I don't think that's problematic.

Obviously also needs documenting in Conscrypt and Android release
notes.

This also means that STATE_CLOSED_INBOUND is never reached, which
means it can be eliminated in a future CL allowing some minor
simplifications.

NB This can be merged independently of google#844 and I'll rebase that
change on top of it.

[1] https://developer.android.com/reference/javax/net/ssl/SSLEngine#closeInbound()
[2] https://cs.android.com/android/platform/superproject/+/master:libcore/harmony-tests/src/test/java/org/apache/harmony/tests/javax/net/ssl/SSLEngineTest.java;l=611
prbprbprb added a commit to prbprbprb/conscrypt that referenced this pull request May 1, 2020
Fixes google#839.

This should be _mostly_ uncontroversial as it is already documented
to do so[1] but could cause app compat issues.  A quick scan of
AOSP suggests no major issues however there is a CTS test for the
old behaviour[2] which will need changing.

The bulk of this change is regression tests for the correct behaviour
for the various possible orderings of close calls and TLS close
alerts. The behaviour change test is
closingInboundBeforeClosingOutboundShouldFail() in place of
closingInboundShouldOnlyCloseInbound().  Changes outside
ConscryptEngineTest are minimal.

Close behaviour before handshaking starts is undefined and we differ
from the RI, but I don't think that's problematic.

Obviously also needs documenting in Conscrypt and Android release
notes.

This also means that STATE_CLOSED_INBOUND is never reached, which
means it can be eliminated in a future CL allowing some minor
simplifications.

NB This can be merged independently of google#844 and I'll rebase that
change on top of it.

[1] https://developer.android.com/reference/javax/net/ssl/SSLEngine#closeInbound()
[2] https://cs.android.com/android/platform/superproject/+/master:libcore/harmony-tests/src/test/java/org/apache/harmony/tests/javax/net/ssl/SSLEngineTest.java;l=611
prbprbprb added a commit to prbprbprb/conscrypt that referenced this pull request May 1, 2020
Fixes google#839.

This should be _mostly_ uncontroversial as it is already documented
to do so[1] but could cause app compat issues.  A quick scan of
AOSP suggests no major issues however there is a CTS test for the
old behaviour[2] which will need changing.

The bulk of this change is regression tests for the correct behaviour
for the various possible orderings of close calls and TLS close
alerts. The behaviour change test is
closingInboundBeforeClosingOutboundShouldFail() in place of
closingInboundShouldOnlyCloseInbound().  Changes outside
ConscryptEngineTest are minimal.

Close behaviour before handshaking starts is undefined and we differ
from the RI, but I don't think that's problematic.

Obviously also needs documenting in Conscrypt and Android release
notes.

This also means that STATE_CLOSED_INBOUND is never reached, which
means it can be eliminated in a future CL allowing some minor
simplifications.

NB This can be merged independently of google#844 and I'll rebase that
change on top of it.

[1] https://developer.android.com/reference/javax/net/ssl/SSLEngine#closeInbound()
[2] https://cs.android.com/android/platform/superproject/+/master:libcore/harmony-tests/src/test/java/org/apache/harmony/tests/javax/net/ssl/SSLEngineTest.java;l=611
@prbprbprb prbprbprb merged commit 12888f5 into google:master Jun 15, 2020
@prbprbprb prbprbprb deleted the engine_socket_close branch June 15, 2020 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants