Clarify workflow for getting the metriq job id #114

Closed
willzeng opened this issue Jan 1, 2025 · 5 comments · Fixed by #203
Labels
documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers question Further information is requested

Comments

@willzeng
Contributor

willzeng commented Jan 1, 2025

Right now the README instructions say:

If running on quantum cloud hardware, the job will be added to a polling queue. The status of the queue can be checked with

python metriq_gym/run.py poll --job_id <METRIQ_GYM_JOB_ID>

where <METRIQ_GYM_JOB_ID> is the assigned job ID of the job that was dispatched as provided by metriq-gym.

It is not clear to the user where or how to obtain the job ID. One can find it in the .metriq_gym_jobs.jsonl file. Some possible solutions:

  1. Tell users to open the .metriq_gym_jobs.jsonl file to find their job id. Cons: (i) over time this file will get quite long and likely not that human readable; (ii) you need to switch from working in the terminal / notebook to opening a source file and locating the last job, which adds extra steps.
  2. Once the new 'list-jobs' command from PR #96 (feature: add list-jobs cli action) is merged, we could tell the user to run that command to get the job id. The downside is that this adds a separate step every time a job id is needed.
  3. We could improve the output of run.py to (i) print the job id as a string that can easily be copied and pasted and (ii) suppress the current logging of things like the IBM queue and the various transpiler passes, to make the printed job id more obvious. The logging could be put behind a --verbose flag. If there are multiple jobs, the list-jobs command can help users keep track of them on the CLI.

I lean towards doing both 2 and 3 of these options.
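
A minimal sketch of what option 3 could look like, using only the standard library. The helper names (build_parser, configure_logging, report_dispatched_job) and the exact message wording are hypothetical and are not taken from run.py; the sketch only assumes that the noisy output goes through Python's logging module.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the --verbose flag from option 3 is modeled here.
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="show detailed provider and transpiler logs",
    )
    return parser


def configure_logging(verbose: bool) -> int:
    # Without --verbose only warnings and errors are shown, so the job id stands out.
    level = logging.DEBUG if verbose else logging.WARNING
    logging.basicConfig(level=level, force=True)
    return level


def report_dispatched_job(job_id: str) -> str:
    # Print the id alone on one line so it is easy to copy and paste.
    line = f"Job dispatched with ID: {job_id}"
    print(line)
    return line
```

With this layout, a plain dispatch would print a single line containing the id, and the queue and transpiler messages would only appear when --verbose is passed.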

Thoughts? Other ideas for how to improve this workflow? @vprusso @cosenal @WrathfulSpatula

@willzeng willzeng added documentation Improvements or additions to documentation enhancement New feature or request question Further information is requested labels Jan 1, 2025
@cosenal
Contributor

cosenal commented Jan 1, 2025

We can run list-jobs (the underlying action, not the cli command) by default whenever the user doesn't pass any job_id to the poll command, and let the user pick the job from the ones listed in the same interaction.

@cosenal cosenal added the good first issue Good for newcomers label Jan 24, 2025
@cosenal
Contributor

cosenal commented Jan 26, 2025

@Changhao-Li This is a good issue to get you started with our development process, and to get you familiarized with the command-line interface of metriq-gym.

The current user workflow is the following:

  1. user dispatches a job with the run.py dispatch action
  2. user lists all dispatched jobs via run.py list-jobs action
  3. from the list of job ids that come from 2., user picks the one they are interested in, and they poll and fetch the results with the run.py poll action, by passing the job id to the --job-id parameter.

There are many things† that can be improved in this workflow, but unfortunately the team has had to focus on other priorities so far. However, we believe that step 3 is particularly unintuitive and a clear sticking point: a first-time user may not realize how to get the id for the job. One incremental improvement that we can make immediately is to allow running the poll action without specifying any job id, in which case the user is offered a selection of the jobs that are available for polling.

New workflow:

  1. python run.py dispatch ...
  2. user goes on with their life, makes ☕, etc.
  3. user runs python run.py poll
  4. metriq-gym displays a table with all the dispatched jobs, and prompts the user to choose which job they want to poll/fetch.
  5. user selects the job, and the job is polled/fetched.
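
Steps 4 and 5 could be sketched roughly as follows. This is an illustration only: format_job_table and choose_job_id are hypothetical helpers, jobs are modeled as plain (job_id, job_type) pairs rather than metriq-gym's own job objects, and the re-prompt on invalid input is an assumption about the desired behavior.

```python
from typing import Callable, Optional, Sequence


def format_job_table(jobs: Sequence[tuple[str, str]]) -> str:
    """Render (job_id, job_type) pairs as a numbered plain-text table."""
    rows = [f"[{i}] {job_id}  {job_type}" for i, (job_id, job_type) in enumerate(jobs)]
    return "\n".join(rows)


def choose_job_id(
    jobs: Sequence[tuple[str, str]],
    read: Callable[[str], str] = input,
) -> Optional[str]:
    """Show the table and keep asking until the user enters a valid index."""
    if not jobs:
        return None
    print(format_job_table(jobs))
    while True:
        raw = read("Select a job index: ").strip()
        try:
            index = int(raw)
        except ValueError:
            index = -1  # treat non-numeric input as invalid
        if 0 <= index < len(jobs):
            return jobs[index][0]
        print(f"Please enter a number between 0 and {len(jobs) - 1}.")
```

Passing the reader as a parameter keeps the prompt testable without a terminal; in the CLI it would default to the built-in input.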

We are keeping the CLI very simple, so please don't go for overly fancy solutions, as we may rewrite it in the future anyway (see #130).

†As you work on this issue and reflect on the current workflow, don't be shy about creating issues on this backlog for whatever improvements you think are necessary 🙏

@nathanshammah
Member

Thanks @cosenal for the explanation, looks like a great issue to get started. Feel free to reach out, @Changhao-Li, if you need support.

@Changhao-Li
Contributor

Changhao-Li commented Jan 30, 2025

Thanks @cosenal and @nathanshammah for the information.

I've updated the workflow accordingly by modifying poll_job in run.py and making the --job-id argument optional for the poll command.

More specifically, the modified poll_job reads:

def poll_job(args: argparse.Namespace, job_manager: JobManager) -> None:
    logger.info("Polling job...")
    # No job id given: list the dispatched jobs and let the user pick one.
    if not args.job_id:
        jobs = job_manager.get_jobs()
        if not jobs:
            logger.info("No jobs available for polling.")
            return
        print("Available jobs:")
        for i, job in enumerate(jobs):
            print(f"[{i}] {job.id} - {job.job_type}")
        # Note: a non-numeric or out-of-range answer raises an exception here.
        selected_index = int(input("Select a job index: "))
        args.job_id = jobs[selected_index].id

    # Rebuild the benchmark data, provider job class, and device for the chosen job.
    metriq_job: MetriqGymJob = job_manager.get_job(args.job_id)
    job_type: JobType = JobType(metriq_job.job_type)
    job_data: BenchmarkData = setup_job_data_class(job_type)(**metriq_job.data)
    job_class = setup_job_class(metriq_job.provider_name)
    device = setup_device(metriq_job.provider_name, metriq_job.device_name)
    handler = setup_handler(args, None, job_type)
    # One provider-side job object per provider job id stored in the job data.
    quantum_job = [job_class(job_id, device=device) for job_id in job_data.provider_job_ids]
    # Fetch results only once every provider job has completed.
    if all(task.status() == JobStatus.COMPLETED for task in quantum_job):
        result_data: list[ResultData] = [task.result().data for task in quantum_job]
        handler.poll_handler(job_data, result_data)
    else:
        logger.info("Job is not yet completed. Please try again later.")

In cli.py, set

poll_parser.add_argument("--job_id", type=str, required=False, help="Job ID to poll (optional)")
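
For reference, a small self-contained sketch of how an optional argument like this behaves in argparse. The subparser layout below is an assumption made for illustration and may not match the actual cli.py; only the add_argument call is copied from above.

```python
import argparse

# Hypothetical stand-in for the real CLI: a single "poll" sub-command.
parser = argparse.ArgumentParser(prog="run.py")
subparsers = parser.add_subparsers(dest="action", required=True)
poll_parser = subparsers.add_parser("poll", help="Poll a dispatched job")
poll_parser.add_argument("--job_id", type=str, required=False, help="Job ID to poll (optional)")

# With the flag, the value is passed through as a string.
with_id = parser.parse_args(["poll", "--job_id", "abc123"])
# Without the flag, argparse fills in the default, which is None.
without_id = parser.parse_args(["poll"])
```

Because the default is None when the flag is omitted, a check such as "if not args.job_id" in poll_job is enough to detect that the user should be prompted.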

Please let me know if you would like me to open a pull request for this to close the issue.

Some other improvements we may consider in the future: add a --status flag to poll so that users can check the status of a job before attempting to fetch results; introduce a timeout mechanism when polling jobs to avoid indefinite waits (particularly when there are multiple jobs with large circuits).
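
One possible shape for the timeout idea, as a rough sketch. wait_until_done is a hypothetical helper, not something that exists in metriq-gym; it only assumes that job completion can be checked through a callable that returns True or False.

```python
import time
from typing import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
) -> bool:
    """Call is_done repeatedly until it returns True or timeout_s seconds pass.

    Returns True if the job finished in time, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_done():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))
```

In poll_job, is_done could wrap the existing "all tasks completed" check, with the timeout value coming from a command-line option; both of those choices are left open here.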

@cosenal
Contributor

cosenal commented Jan 31, 2025

@Changhao-Li Yes, please open a pull request and we can discuss it there.
