Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add agent registration timings #116

Closed
dannykopping opened this issue Oct 18, 2024 · 10 comments · Fixed by coder/coder#15276
Closed

Add agent registration timings #116

dannykopping opened this issue Oct 18, 2024 · 10 comments · Fixed by coder/coder#15276
Assignees

Comments

@dannykopping
Copy link
Collaborator

As part of #44, we should show when the agent registration occurred and its start/end time in this timeline. This gives users the ability to understand the delta between workspace provisioning completion and the agent startup, which is really an approximate measure of how quickly the compute instance spins up and executes the agent process and how long it takes to call back to the control-plane.

In https://codercom.slack.com/archives/CJURPL8DN/p1729196074736939, Spike and I discussed what would the definition for "agent registration" be. We settled upon: The time between an agent process starting and the successful completion of the call to /api/v2/workspaceagents/me/rpc.
See here: https://github.com/coder/coder/blob/7da231bc9286060e7ad499fa38cb829c4b055a7d/agent/agent.go#L750-L759

The workspace_agents.first_connected_at column seems to satisfy this definition fairly well, and is set here:
https://github.com/coder/coder/blob/5366f2576f690a3f7d3ac1a4efb8dd49cc2e9bd1/coderd/workspaceagentsrpc.go#L330-L338

workspace_agent.created_at is set here which is invoked once a provisioner job has been completed:
https://github.com/coder/coder/blob/fac77f956ec28ad04c0c90750e381e69465d2ae1/coderd/provisionerdserver/provisionerdserver.go#L1782-L1784

Based on this, I think we can simply return the value of created_at (start) to first_connected_at (end) as a reasonable approximation of agent registration time.

Nit: do we want to persist with this "registration time" terminology or is "connection time" more clear?

@BrunoQuaresma
Copy link

Ok, so from what I understand, all the data we need is already there. We just need to return this data in the timing endpoint. Is that correct? For example:

return sdk.AgentTiming{
    StartedAt: a. CreatedAt,
    EndedAt: a.FirstConnectedAt,
    Agent: a.Name
}

@BrunoQuaresma BrunoQuaresma self-assigned this Oct 23, 2024
@johnstcn
Copy link
Member

@BrunoQuaresma that sounds about right to me based on the issue description!
I would double-check that the time returned roughly matches with the time it takes to download and start the agent.

@dannykopping
Copy link
Collaborator Author

Hhmm maybe we need the agent download time as well, that would be very useful

@BrunoQuaresma
Copy link

@dannykopping so we want to display two timings for each agent, connection, and download times?

@dannykopping
Copy link
Collaborator Author

@dannykopping so we want to display two timings for each agent, connection, and download times?

I think so, WDYT?

@BrunoQuaresma
Copy link

I'm trying to figure out what is the best way to display the agent times based on this issue's discussion. I recorded a video explaining my doubts and thoughts a bit more.

Screen.Recording.2024-10-25.at.09.40.09.mp4

@chrifro
Copy link

chrifro commented Oct 25, 2024

I'm for option two, then we are consistent with the patterns.
The agent times are part of dev. So dev should have one bar including two smaller boxes. When you click on them, you can see the timings for connection and download times. For the colors, if there is no existing state like success or failed (as we have for start) we can use grey bars without a legend.

Hope that makes sense. Let me know if you prefer a mockup but I think we're aligned @BrunoQuaresma :)

@dannykopping
Copy link
Collaborator Author

OK, I think we need to take a step back here.

In the design you've got "start" and "dev"; I think I see why you're doing this (because one workspace can have many agents). Even so, the "start" section would have to be under "dev" since each agent has its own startup scripts.

Each agent will have (in order) timings for downloading, connecting, and executing startup scripts. Each should be individual lanes of the timing chart like we have already for "init", "plan", "graph", etc.

I think we should assume that 99% of the time workspaces will have a single agent, so let's keep the design simple by excluding the agent name in that case.

If there are multiple agents, we should repeat the "workspace boot" section and in that we could include the agent name.


On a related note, the main panel label "Provisioning time" is too specific, since it does not include the agent timings. I think we should rename this to "Build timeline" as I suggested here.

I also think we should be more clear and rename "workspace boot" to "agent initialization". The whole timeline is effectively the timings for workspace boot, so this will confuse folks.

@BrunoQuaresma
Copy link

@dannykopping it makes sense. It looks like we don't have the download agent timing for now, so I will open a PR to include the connection timing and group it together with the start stage in the same section for each agent. It would look like the following:

Image

  • The section title would be the agent name, in this case, "dev"
  • Connection would be added as a stage of the agent as the Start

Wdyt?

@dannykopping
Copy link
Collaborator Author

Perfect 👍

I think we should name the stages "run startup scripts" and "connect", to remain consist with the other stages (present tense verbs). The help text for the registration step could be "Time taken to establish an RPC connection with the control plane".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants