
Quantum Volume definition


Theory

The current metriq-gym reference implementation of Quantum Volume is one of several slightly different statistical definitions of the certification protocol. It can be argued that only the definition accepted by the authors who proposed the protocol is definitive, and expert opinions may differ over whether various re-definitions of the protocol have adequate statistical power or validity. We seek to establish a consensus definition of the protocol that is quantitatively acceptable to a survey of domain experts, including the Metriq Open Quantum benchmark committee. In the meantime, the Metriq technical team puts forward the working definition below, based on the statistical arguments that follow and on the need for an economically feasible definition of the protocol that still serves its originally intended purpose.

In the context of the protocol's random unitary circuits, a "heavy output" is defined as an output whose probability exceeds the median probability under ideal execution of the trial circuit. By definition of the median, the heavy set should include exactly half (50%) of all possible bit strings of "0" and "1" with the same bit length as the number of qubits, or qubit "width," being tested. Therefore, in the absence of any a posteriori knowledge of the actual circuit measurement output distribution, uniformly random guessing of bit strings is expected to produce a 50% heavy-output generation rate. Assuming that the absolute worst case of quantum hardware performance does not anti-correlate with the ideal circuit output distribution (relative to a priori uniform random guessing), we can define the "p-value" of a Quantum Volume circuit trial as the p-value implied by the actual hardware output distribution under the null hypothesis that the measurements were produced by this worst case: uniform random guessing among all well-formed bit strings of the given qubit width.
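
As a concrete illustration of the heavy-output definition, the sketch below (a minimal example, not the metriq-gym implementation; the helper names, and the assumption that ideal probabilities are available as a NumPy array and hardware results as a counts dictionary, are ours) computes the heavy-output set and the observed heavy-output generation rate for one trial.

```python
import numpy as np


def heavy_outputs(ideal_probs: np.ndarray) -> set[int]:
    """Bit strings (as integers) with greater-than-median ideal probability.

    `ideal_probs` is assumed to be a length-2**n array of ideal measurement
    probabilities for an n-qubit trial circuit.
    """
    median = np.median(ideal_probs)
    # By definition of the median, roughly half of all bit strings are heavy.
    return {b for b, p in enumerate(ideal_probs) if p > median}


def heavy_output_rate(ideal_probs: np.ndarray, counts: dict[int, int]) -> float:
    """Fraction of hardware shots that fall in the heavy-output set."""
    heavy = heavy_outputs(ideal_probs)
    shots = sum(counts.values())
    return sum(c for b, c in counts.items() if b in heavy) / shots
```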

While our reference implementation also checks the standard 2/3 minimum heavy-output generation rate for each trial, our p-value does not depend directly on that 2/3 threshold. If the (single-tail) p-value is lower (i.e. more significant, less likely to have resulted by chance from random guessing) than some arbitrary but small threshold, such as p = 0.01 (no more than a 1% chance per trial that the result arose from random guessing) or p = 0.001 (no more than a 0.1% chance per trial), on every trial in a set, then for a sufficiently low threshold there is strong statistical evidence that the quantum computer exceeds the performance of the a priori guessing case. Hence, the most important figures of merit for assessing Quantum Volume certification become the (more-or-less continuous) set of per-trial p-values and the overall p-value across all trials. Framed this way, our "null hypothesis" is that the results could have been produced by uniformly random bit-string guessing.
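
Under this null hypothesis, each shot is a Bernoulli trial with a 50% chance of landing in the heavy set, so the single-tail p-value for a trial can be computed from the binomial survival function. The sketch below is a minimal illustration under that assumption; the function names and the per-trial pass check against a user-chosen threshold are ours, not necessarily the metriq-gym code.

```python
from scipy.stats import binom


def trial_p_value(heavy_shots: int, shots: int) -> float:
    """Single-tail p-value: probability that uniformly random guessing
    (50% heavy-output chance per shot) yields at least `heavy_shots`
    heavy outputs out of `shots` shots.
    """
    # sf(k) = P(X > k), so shift by one to include the observed count itself.
    return float(binom.sf(heavy_shots - 1, shots, 0.5))


def certify(per_trial_p_values: list[float], threshold: float = 0.01) -> bool:
    """Reject the null hypothesis only if every trial beats the threshold."""
    return all(p < threshold for p in per_trial_p_values)
```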

Schema

| Input | Meaning |
| --- | --- |
| Device | Name of the quantum computer |
| Date | When the benchmarks were collected |
| Qubits | How many qubits were used |
| Shots | The number of circuit and measurement repetitions per trial |
| Trials | The number of randomly generated unitary circuits trialed |
| Confidence | The statistical confidence threshold (user setting) to reject the null hypothesis (that the observed heavy-output generation rate could have been produced by guessing measurement outputs with an expected 50% success rate) |
| Output | Meaning |
| --- | --- |
| HOG | The average rate of "heavy-output generation" per shot (where "heavy outputs" are defined as those with above-median probability under ideal execution) |
| CLOPS | The average number of circuit layers per second, across all trials |
| XEB | The average cross entropy of all trials |
| EPLG | The average error per layer of gates, based on amortization of the cross entropy |
| Pass? | Whether the certification protocol passed for the test |
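
For the overall p-value across all trials mentioned above, one simple approach (our assumption for illustration, not necessarily the metriq-gym definition) is to pool the heavy-output counts of all trials into a single binomial test against the same 50%-guessing null hypothesis, reusing `trial_p_value` from the earlier sketch.

```python
def overall_p_value(heavy_shots_per_trial: list[int], shots_per_trial: int) -> float:
    """Pooled single-tail p-value across all trials, treating every shot of
    every trial as one Bernoulli draw with a 50% heavy-output probability
    under the null hypothesis.
    """
    total_heavy = sum(heavy_shots_per_trial)
    total_shots = shots_per_trial * len(heavy_shots_per_trial)
    return trial_p_value(total_heavy, total_shots)
```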