DM-48194: Deploy a dev prompt processing service for LSSTCam-ImSim #4168

Open
wants to merge 4 commits into
base: main

Contributor

@hsinfang hsinfang commented Feb 5, 2025

No description provided.

@hsinfang hsinfang force-pushed the tickets/DM-48194 branch 5 times, most recently from 11c1266 to 51176d7, on February 6, 2025 19:07
@hsinfang hsinfang requested a review from kfindeisen on February 6, 2025 21:02
Collaborator

@kfindeisen kfindeisen left a comment

Looks good, though I have questions about some of the settings.

applications/next-visit-fan-out/values.yaml (outdated, resolved)
Comment on lines 7 to 9
# Expect to need roughly n_detector × request_latency / survey_cadence pods
# But we do not have the compute yet. This will be adjusted.
autoscaling.knative.dev/max-scale: "200"
Collaborator

I think we have values for everything in the formula? Certainly 200 is much too low.

Contributor Author

For now I'm setting it to 800, based on 189 detectors × 120 s / 30 s = 756.

Expect this to be revised later, though.

Currently we have 28 × 44 = 1232 cores on d-nodes for OR5.
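
For reference, the arithmetic behind these numbers can be sketched as below. This is only an illustration of the formula quoted in the config comment, not code from the repository: the function names are hypothetical, and rounding up to the next hundred is an assumption about how 800 was picked from 756.

```python
import math


def estimate_max_pods(n_detectors: int, request_latency_s: float,
                      survey_cadence_s: float) -> int:
    """Estimate pods as n_detector * request_latency / survey_cadence.

    Hypothetical helper; it only restates the formula from the
    values.yaml comment.
    """
    if survey_cadence_s <= 0:
        raise ValueError("survey_cadence_s must be positive")
    return math.ceil(n_detectors * request_latency_s / survey_cadence_s)


def round_up(value: int, step: int = 100) -> int:
    """Round value up to the next multiple of step (assumed convention)."""
    return math.ceil(value / step) * step


# Numbers quoted in the comment above: 189 detectors, 120 s latency,
# 30 s cadence.
pods = estimate_max_pods(189, 120, 30)
max_scale = round_up(pods)

# Core count quoted for the d-nodes: 28 * 44.
cores = 28 * 44

print(pods, max_scale, cores)
```

Whether 1232 cores are enough for that many pods depends on the per-pod CPU request, which this thread does not state.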

# @default -- None, must be set
preprocessing: ""
# -- Skymap to use with the instrument
skymap: "lsst_cells_v1"
Collaborator

I assume this is the DC2 skymap? Is patchesPerImage = 16 (which I assume was copied from ComCamSim) still valid?

Contributor Author

The original plan was to actually use lsst_cells_v1.

Then we found out that DC2's DC2_cells_v1 and lsst_cells_v1 are identical, just under different names.

Only lsst_cells_v1 exists in repo embargo_or5 today, so I'd keep it for now. Later we might change it, depending on which name is chosen for the actual OR5.

Contributor Author

I wasn't thorough. It turns out that what DC2_cells_v1 and lsst_cells_v1 really are depends on the exact repo (notes here), though their configs are very close.

While what we will do with embargo_or5 and the actual OR5 remains unclear, our existing templates in s3://rubin-pp-dev-users/central_repo_2 have data id skymap=DC2_cells_v1. I'll just use DC2_cells_v1 for now.

applications/prompt-proto-service-lsstcamimsim/values.yaml (2 outdated review threads, resolved)
@hsinfang hsinfang force-pushed the tickets/DM-48194 branch 7 times, most recently from 9b94679 to 2562976, on February 12, 2025 22:39
The service is started with configs taken mostly from ComCam.

A larger cache is configured via refcatsPerImage because otherwise
the cache cannot hold all refcat inputs for the upload.py test.

More tuning is expected later.

Only a handful of LSSTCam-imSim detectors are used in the small
upload.py test. This config allows sending fanned-out messages
only for those detectors.