-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle customer proxy re-auth response by retrying, not prompting user for different token #6652
base: main
Are you sure you want to change the base?
Conversation
Accept: application/json
header to outgoing HTTP requestsc77e517
to
e7ed90b
Compare
/////////////////////////////////// | ||
// TODO!(sqs): remove | ||
/////////////////////////////////// | ||
if (require('node:fs').existsSync('/tmp/is-custom-auth-challenge-error')) { | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this is for local testing and will be removed before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I added https://github.com/sourcegraph/cody/blob/main/agent/scripts/reverse-proxy.py which is very simplistic for now but could be easily extended to simulate that kind of proxy response.
e7ed90b
to
1e632ce
Compare
1daa94c
to
c08e156
Compare
c08e156
to
ae6f900
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs work, some ideas inline.
@@ -35,6 +35,17 @@ async def proxy_handler(request): | |||
headers=headers, | |||
data=await request.read() | |||
) as response: | |||
if random.random() < u2f_challenge_chance: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but it depends on the frequency of requests we make. Wouldn't it be useful to have a pattern of, for example, accept for n seconds, challenge for m seconds? That would be more like an expiry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to just kill it and then restart with whatever parameter I need when I need it.
But yea, we can definitely make it more useful and support more scenarios when someone else than we will start using it :)
@@ -1,9 +1,9 @@ | |||
#!/usr/bin/env python3 | |||
|
|||
import random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to test the accept/json change that @sqs mentioned by generating JSON here?
|
||
const uri = getTestURI({ repoName: 'whatever', filePath: 'foo/bar.ts' }) | ||
expect(await provider.isUriIgnored(uri)).toBe('has-ignore-everything-filters') | ||
expect(await provider.isUriIgnored(uri)).toBe(error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to separate this PR into pieces so we have the retryable errors narrowly focused in one piece, the 2 factor challenges in another, etc.
I think this change is okay because Boolean(e)
is true
for errors e
but it would be easier to review more focused, incremental changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but fixing error handling in network code should really be first PR, and when I inherited this PR I was hoping we will be able to finish it quickly.
Do you want to split it now?
It will definitely take some time to untangle things poperly.
} | ||
|
||
this.contextFiltersSubscriber.notify(contextFilters) | ||
this.contextFiltersSubscriber.notify(this.lastContextFiltersResponse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is subtly different because of canonicalizeContextFilters
. Should we start freezing everything coming out of there, because it is getting passed around more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I know what you mean.
But in general I'm wondering if we need canonicalizeContextFilters
at all. I understand we have it so we can relay on object equality later, but I somehow does not like that. Only place which uses it anyway is:
private hasIgnoreEverythingFilters() {
return this.lastContextFiltersResponse === EXCLUDE_EVERYTHING_CONTEXT_FILTERS
}
And we could just do better check here. I may try to rework it a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We canonicalize so we can do ===
because it is fast. The field is set once but checked often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I reverted that particular change
@@ -148,12 +150,14 @@ export class ContextFiltersProvider implements vscode.Disposable { | |||
verbose: contextFilters, | |||
}) | |||
} | |||
|
|||
const filters = isError(contextFilters) ? EXCLUDE_EVERYTHING_CONTEXT_FILTERS : contextFilters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be less error prone:
const filters = isError(contextFilters) ? EXCLUDE_EVERYTHING_CONTEXT_FILTERS : contextFilters | |
if (isError(contextFilters)) { | |
contextFilters = EXCLUDE_EVERYTHING_CONTEXT_FILTERS | |
} |
This is better because after this point someone might accidentally use filters
where they meant contextFilters
or vice versa. Only the type system would flap if the point of use did not admit Error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it might be reworked to eliminate that error but I don't think overwriting contextFilters
is actually good idea. Type of contextFilters
is now ContextFilters | Error
(for better or worse) and dropping that error on this var would IMHO be not correct.
I will rework it to make everything less error prone.
@@ -6,9 +6,9 @@ export async function isUriIgnoredByContextFilterWithNotification( | |||
uri: vscode.Uri, | |||
feature: CodyIgnoreFeature | |||
): Promise<IsIgnored> { | |||
const isIgnored = await contextFiltersProvider.isUriIgnored(uri) | |||
const isIgnored = await contextFiltersProvider.isUriIgnored(uri, /* foreceFetch = */ true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling, but see other comments about this parameter
vscode/src/auth/auth.test.ts
Outdated
|
||
await showSignInMenu() | ||
expect(mockShowInputBox).toHaveBeenCalled() | ||
expect(mockShowErrorMessage).toHaveBeenCalledWith('The access token is invalid or has expired') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests need updates.
You are making life difficult for JetBrains by putting the title in the text like that (the 'Invalid Access Token:' part), it would be better to have titles separate so when we do these notifications we can conform to the JetBrains UI guidelines.
VSCode can have a thunk that does the string concatenation if it needs that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have big choice, this is VSCode API:
/**
* Show an error message.
*
* @see {@link window.showInformationMessage showInformationMessage}
*
* @param message The message to show.
* @param items A set of items that will be rendered as actions in the message.
* @return A thenable that resolves to the selected item or `undefined` when being dismissed.
*/
export function showErrorMessage<T extends string>(message: string, ...items: T[]): Thenable<T | undefined>;
private async fetchIfNeeded(): Promise<void> { | ||
if (!this.fetchIntervalId && !this.isTesting) { | ||
private async fetchIfNeeded(forceFetch = false): Promise<void> { | ||
if (forceFetch || (!this.fetchIntervalId && !this.isTesting)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is rotten and we need to clean this up. Specifically:
- startRefetchTimer's timeout calls setRefetchTimer. Thus it is like setInterval, effectively.
- Nothing calls clearTimeout (except reset, which is only used by tests or dispose, so not relevant here.)
- Thus multiple calls to startRefetchTimer will "pile up" the timeouts and hammer the network.
The above complaints have nothing to do with this PR, but we need to clean that up. The intent evidently was if fetchIntervalId
is set, then to not call setTimeout
again. But the await this.fetchContextFilters()
here means any number of fetchIfNeeded
can pile up in here before we reach startRefetchTimer
and set the fetchIntervalId
.
Basically we should put the brakes on multiple scheduled timeouts in startRefetchTimer
. And we should make the context filters fetch de-dup multiple in flight requests given the same endpoint or whatever.
That out of the way, forceFetch
makes this even more dangerous: It skips any attempt at the existing interval check. Seatbelts? Off. Let's roll. 💀
You should go with the grain of the design... it looks like fetchContextFilters
can decide some errors are transient and return a lower refetch period with exponential backoff. Integrate the refetching behavior you want with that instead of papering over it like this.
@@ -226,11 +230,11 @@ export class ContextFiltersProvider implements vscode.Disposable { | |||
return isIgnored | |||
} | |||
|
|||
public async isUriIgnored(uri: vscode.Uri): Promise<IsIgnored> { | |||
public async isUriIgnored(uri: vscode.Uri, forceFetch = false): Promise<IsIgnored> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this idea at all! Don't do it!
First, basic API design lesson, but boolean parameters are terrible because they give no semantic clue at the callsite. I guarantee if you ask three people what isUriIgnored(uri, false)
does, they're not going to say "it means don't make a network request unless the context filters refresh timeout has passed."
Second, there is no brake on isUriIgnored(..., true)
completely spamming us with requests. Today the callers are fine... some human triggered operations via the ...WithNotification method. But "drop everything and do it = true" this is simply the wrong level of abstraction.
The fetcher already has the concept of retrying transient errors like network errors more aggressively. Isn't tuning that enough? If not, let's make this semantic, like if we classified requests as being background/foreground, passive/active, something like that. With the idea being that if the user explicitly did something or is retrying something then you'd act differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second, there is no brake on isUriIgnored(..., true) completely spamming us with requests. Today the callers are fine... some human triggered operations via the ...WithNotification method. But "drop everything and do it = true" this is simply the wrong level of abstraction.
I agree and disagree at the same time.
I think explicit human actions should always force fetch.
E.g. if I had network disabled, I enable it, and then do some Cody action, there should be no situation that some internal mechanism decides "I already tried that 3 times, now I will wait 1 minute before retrying, here is a cached failed result!".
Now, I agree we would need some general mechanism which would prevent e.g. such request being repeated in a loop when triggered just once. Or being triggered by no-human just because of some refactoring in the code.
But I see no easy way to do it and surely not without some big refactoring.
I think in this specific cause if we can somehow guarantee it is called only in isUriIgnoredByContextFilterWithNotification
it will be pretty safe. If isUriIgnoredByContextFilterWithNotification
would be called in the wrong context even without my changes, it would spam user with notifications. So we are already relying on it being called in a proper context.
ae6f900
to
84f8d4e
Compare
logIgnored(document.uri, 'context-filter', isManualCompletion) | ||
const isIgnored = await contextFiltersProvider.isUriIgnored( | ||
document.uri, | ||
/* foreceFetch = */ isManualCompletion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spelling here too
onShow: () => {}, | ||
}) | ||
} | ||
|
||
// TODO(philipp-spiess): Bring back this code once we have fewer uncaught errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, is this comment related to all these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of, I'm bringing back some but not all :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some feedback inline.
} | ||
|
||
this.contextFiltersSubscriber.notify(contextFilters) | ||
this.contextFiltersSubscriber.notify(this.lastContextFiltersResponse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We canonicalize so we can do ===
because it is fast. The field is set once but checked often.
this.lastFetchDelay *= nextIntervalHint.backoff | ||
} else { | ||
this.lastFetchDelay = nextIntervalHint.initialInterval | ||
console.log(this.lastFetchDelay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this console.log
const intervalHint = await this.fetchContextFilters() | ||
this.startRefetchTimer(intervalHint) | ||
} | ||
this.lastResultLifetime = this.lastResultLifetime.then(async intervalHint => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stacking up the Promises like this is nice and simple.
I assume the VM doesn't do anything weird like trying to keep a long chain of callstacks alive for debugging...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I know it should be safe.
Something like this utilizes 100% of my CPU for a few minutes but finishes successfully:
let p = Promise.resolve(0)
for (let i = 0; i < 100000000; i++) {
p = p.then(v => v + 1)
const result = await p
if (result % 1000000 === 0) console.log(result)
}
@@ -269,17 +268,13 @@ export class ContextFiltersProvider implements vscode.Disposable { | |||
} | |||
|
|||
private reset(): void { | |||
this.lastFetchTimestamp = Date.now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to put this in the past so the first fetch happens immediately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, I changed it to be 0 👌
@@ -289,12 +284,12 @@ export class ContextFiltersProvider implements vscode.Disposable { | |||
private hasAllowEverythingFilters(): boolean { | |||
return ( | |||
isDotCom(currentAuthStatus()) || | |||
this.lastContextFiltersResponse === INCLUDE_EVERYTHING_CONTEXT_FILTERS | |||
isEqual(this.lastContextFiltersResponse, INCLUDE_EVERYTHING_CONTEXT_FILTERS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did canonicalization, ===
and hasAllowEverythingFilters
because we wanted to not slow down the common case. If you measured and we don't need it, that is fine, but please measure if you did not already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
A customer has a proxy that enforces recent 2fa on their machines to access the Sourcegraph instance. After 24h, the user needs to tap their 2fa key to make the network connection to Sourcegraph be reauthed and start working again. The responses from the proxy after the 24h period expires are being treated as "invalid token" and causing users to need to re-auth into Cody. The response looks like:
{"auth_url":"https://example.com","error":"request not authenticated","requested_url":"example.com/.api/client-config"}
with an HTTP header of the form
X-${CUSTOMER}-U2f-Challenge: true
.Accept: application/json
headers to ensure we get that JSON response and not an HTTP redirect from the proxy when the 24h period expiresTry Again
button.Fixed issues (by @pkukielka ):
Fixes https://linear.app/sourcegraph/issue/CODY-4695/handle-customer-proxy-re-auth-response-by-retrying-not-prompting-user
Screenshots
When the user needs to complete the auth challenge, they will see the following errors, depending on what they're doing.
If the extension hasn't successfully authenticated since the editor was restarted:
In chat:
After failed completion:
![image](https://private-user-images.githubusercontent.com/1519649/406888181-a7e6c411-065c-4903-a9d6-4eb8801596e0.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk5NzUzMjYsIm5iZiI6MTczOTk3NTAyNiwicGF0aCI6Ii8xNTE5NjQ5LzQwNjg4ODE4MS1hN2U2YzQxMS0wNjVjLTQ5MDMtYTlkNi00ZWI4ODAxNTk2ZTAucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTlUMTQyMzQ2WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MzNlOTRmYzNlMzYwNWZkNjEzZGNkODdlZjBjZTMzOWMzZmVmNDMzNjc4MTc2ODUwOWI0ZGRmYThhYTcxZGM4ZiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.m2gF4Xog5HFiue_R0xYZQZJtMjb20kE-PE-cRM3EMa8)
Test plan
TBD
Changelog
/^X-.*-U2f-Challenge$/i
and whose value istrue
, the user is asked to complete the authentication challenge on their device ("Tap YubiKey to Authenticate...") and is not prompted for a different access token. When the device's authentication challenge is successful, the extension will automatically proceeed to sign the user in using their stored access token (if any).