Parameterised quantum circuits
A PQC on n qubits is defined as a sequence of m unitary gates, each parameterised by a component of a parameter vector θ. Here, we consider the case where the gates are alternating layers of Clifford operations Ci and Pauli rotations \({R}_{i}({\theta }_{i})={e}^{-i{\theta }_{i}/2{P}_{i}}\) where Pi is a multi-qubit Pauli operator. The overall unitary is
$$U({\boldsymbol{\theta }})=\left(\mathop{\prod }\limits_{i=1}^{m}{C}_{i}{R}_{i}({\theta }_{i})\right){C}_{0}.$$
(1)
The parameters θ ∈ [0, 2π]m can therefore be equivalently described as rotation angles. This specific form is operationally relevant as it is featured in many common near-term algorithms31,32,33 and proposals for fault-tolerant architectures34,35, and since Clifford unitaries and Pauli rotations form a universal gate set, any PQC can be cast in this way (up to fixing a subset of the parameters).
Typical VQAs involve initialising the quantum computer in \(\left\vert {\boldsymbol{0}}\right\rangle ={\left\vert 0\right\rangle }^{\otimes n}\), applying the PQC, and measuring an observable to obtain a cost function. We denote the set of single-qubit Pauli operators by \({\mathbb{P}}=\{I,X,Y,Z\}\) and the expectation value for a specific n-qubit Pauli operator \(P\in {{\mathbb{P}}}^{\otimes n}\) by
$$f({\boldsymbol{\theta }}):= {\rm{tr}}(P\,{{\mathcal{U}}}_{{\boldsymbol{\theta }}}[\left\vert {\boldsymbol{0}}\right\rangle \left\langle {\boldsymbol{0}}\right\vert ]),$$
(2)
where the unitary channel is \({{\mathcal{U}}}_{{\boldsymbol{\theta }}}[\cdot ]:= U({\boldsymbol{\theta }})[\cdot ]{U}^{\dagger }({\boldsymbol{\theta }})\). The expectation value of a general observable may be linearly decomposed into expectations of Pauli observables.
Modelling noisy operations
We are interested in the classical simulatability of VQAs affected by noise, and we model the noisy PQC using general Pauli channels, which are probabilistic mixtures of unitary n-qubit Pauli operator evolutions. For a single qubit, a general Pauli channel is given by
$$\begin{array}{ll}{{\mathcal{N}}}_{Pauli}({p}_{X},{p}_{Y},{p}_{Z})[\rho ]=(1-{p}_{X}-{p}_{Y}-{p}_{Z})\rho \\\qquad\qquad\qquad\qquad\qquad +{p}_{X}X\rho X+{p}_{Y}Y\rho Y+{p}_{Z}Z\rho Z.\end{array}$$
(3)
These are often used to model local decoherent processes in quantum hardware. The dephasing channel \({{\mathcal{N}}}_{Pauli}(0,0,p)\) is a particular example which models interactions between a qubit and the external environment. The best-fit noise parameters {pX, pY, pZ} for each qubit can be estimated experimentally via procedures like cycle benchmarking36.
The noisy circuit model we consider takes the form
$${\tilde{{\mathcal{U}}}}_{{\boldsymbol{\theta }}}=\left({\bigcirc}_{i = 1}^{m}\,{{\mathcal{C}}}_{i}\,{\circ}\, {\tilde{{\mathcal{R}}}}_{i}({\theta }_{i})\right)\,{\circ}\, {{\mathcal{C}}}_{0}.$$
(4)
The resulting noisy cost function is labelled \(\tilde{f}({\boldsymbol{\theta }})\). The symbol “∘” refers to concatenation of maps and “\({\bigcirc}_{i = 1}^{m}\)” to repeated concatenation. Each noisy gate is given by the target unitary followed by a Pauli channel acting on the subset of qubits Qi where Pi acts nontrivially. Specifically, we have
$${\tilde{{\mathcal{R}}}}_{i}({\theta }_{i})=\bigotimes _{q\in {Q}_{i}}{{\mathcal{N}}}_{Pauli}^{(q)}\,{\circ}\, {{\mathcal{R}}}_{i}({\theta }_{i}).$$
(5)
We will later consider more general noise model where the Clifford operations also incur noise: \({\tilde{{\mathcal{C}}}}_{i}={{\mathcal{C}}}_{i}\,{\circ}\, {\mathcal{M}}\), where \({\mathcal{M}}\) are multi-qubit Pauli channels, and where noise is allowed to vary across the circuit.
Finally, let us define \(p={\min }_{\sigma = X,Y,Z}{p}_{\sigma }.\) Generally real devices will have a symmetric depolarising component to every operation so we can assume p > 0 holds13.
Pauli transfer matrices and simulation algorithms
When studying generic quantum operations it can often be useful to use the Pauli transfer matrix (PTM) formalism37. Let us briefly review it. In the PTM formalism, one takes the view of the normalised Pauli basis \(\widehat{{\mathbb{P}}}=\frac{1}{\sqrt{2}}\{I,X,Y,Z\}\), where a normalised Pauli operator \({\hat{P}}_{i}\in {\hat{{\mathbb{P}}}}^{\otimes n}\) is a basis vector \(\left.\left\vert {P}_{i}\right\rangle \right\rangle\) in the space \({{\mathbb{R}}}^{{4}^{n}}\). The normalisation ensures that \(\left\langle \left\langle {P}_{i}| {P}_{j}\right\rangle \right\rangle ={\rm{tr}}({\hat{P}}_{i}{\hat{P}}_{j})={\delta }_{ij}\). Quantum states can be written in this basis as \(\left.\left\vert \rho \right\rangle \right\rangle\),
$${[\left.\left\vert \rho \right\rangle \right\rangle ]}_{i}={\rm{tr}}(\rho {\hat{P}}_{i}),$$
(6)
extending the identification of a one-qubit density matrix with its Bloch vector to higher dimensions. For instance, in this basis we represent the density matrix \(\left\vert 0\right\rangle \left\langle 0\right\vert\) as \(\left.\left\vert 0\right\rangle \right\rangle =[1/\sqrt{2},0,0,1/\sqrt{2}]\). Then, a quantum channel \({\mathcal{E}}\) is a matrix (the PTM) \({\bf{E}}\in {{\mathbb{R}}}^{{4}^{n}\times {4}^{n}}\),
$${[{\bf{E}}]}_{ij}=\left\langle \left\langle {P}_{i}| {\bf{E}}| {P}_{j}\right\rangle \right\rangle ={\rm{tr}}({\hat{P}}_{i}{\mathcal{E}}[{\hat{P}}_{j}]),$$
(7)
and therefore expectation values of Pauli operators are written as \(\left\langle \left\langle {P}_{i}| {\bf{E}}| \rho \right\rangle \right\rangle ={\rm{tr}}({\hat{P}}_{i}{\mathcal{E}}[\rho ])\). Composition of quantum channels becomes matrix multiplication.
The PTM formalism can be used to calculate expectation values in the Heisenberg picture via Pauli back-propagation, where the quantum channels are seen as acting on the measurement operator instead of the state38. In PTM form this adjoint operation corresponds to simply taking the transpose of the expectation value
$$\left\langle \left\langle P| {\bf{E}}| \rho \right\rangle \right\rangle =\left\langle \left\langle \rho | {{\bf{E}}}^{{\mathsf{T}}}| P\right\rangle \right\rangle ,$$
(8)
which is possible for any \({\mathcal{E}}\). This perspective provides an efficient approach to classically computing expectation values. Take an n-qubit channel \({\mathcal{E}}\) and assume it can be decomposed as a sum of N Clifford unitary channels \({{\mathcal{E}}}_{i}\) via \({\mathcal{E}}=\mathop{\sum }\nolimits_{i = 1}^{N}{c}_{i}{{\mathcal{E}}}_{i}\), ∑ici = 1. Also consider a stabiliser state38ρ such that the expectation value with any Pauli operator can be evaluated efficiently. Then, given a Pauli P, the expectation value \(\left\langle \left\langle P| {\bf{E}}| \rho \right\rangle \right\rangle\) can be expanded as a sum of N terms \(\left\langle \left\langle P| {{\bf{E}}}_{i}| \rho \right\rangle \right\rangle\). As Clifford unitaries are generalised permutation matrices in the PTM representation we get \(\left\langle \left\langle \rho | {{\bf{E}}}_{i}^{{\mathsf{T}}}| P\right\rangle \right\rangle =\left\langle \left\langle \rho | {P}_{i}^{{\prime} }\right\rangle \right\rangle\) (up to a phase), for some other Pauli operator \({P}_{i}^{{\prime} }\). When \({{\mathcal{E}}}_{i}\) is an n-qubit Clifford unitary then it can be synthesised into at most O(n2/log(n)) gates39 and the change of Pauli frame from P to \({P}_{i}^{{\prime} }\) can be efficiently computed in O(n2)38,40. Finally, since ρ is assumed a stabiliser state, the expectation value \(\left\langle \left\langle \rho | {{\bf{E}}}_{i}^{{\mathsf{T}}}| P\right\rangle \right\rangle\) can be efficiently computed in O(n2). This gives an efficient classical algorithm to compute expectation values when N ~ poly(n).
Prior art
This approach is not new. The decomposition of general channels into sums of stabiliser channels (Cliffords and Pauli measurements) for the purpose of quantum circuit simulation was introduced in ref. 41. A similar sum-over-Clifford algorithm for unitary circuits was explored in ref. 27. A PTM-based algorithm for both exact and noisy circuit simulation has been proposed in ref. 42 from a Schrödinger perspective (state propagation). The work in ref. 43 is the closest to the method used here, as it covers the PTM representation in conjunction with a Heisenberg picture simulation method. In addition, it discusses the effect on simulatability of adding symmetric depolarising noise on z-rotation gates.
However, something that to our knowledge has not been made explicit before is that the method can be generalised beyond decompositions into Clifford unitaries (or near-Clifford unitaries27) and Pauli measurement channels. Indeed, here we will consider general processes \({{\mathcal{E}}}_{i}\) for which the expectation value \(\left\langle \left\langle P| {{\bf{E}}}_{i}| \rho \right\rangle \right\rangle\) can be evaluated efficiently. Notably, the processes \({{\mathcal{E}}}_{i}\) need not even be valid quantum channels (or completely positive trace preserving maps), we only require that its PTM representation is sufficiently sparse. This occurs when the adjoint channel \({{\mathcal{E}}}_{i}^{\dagger }\) maps every Pauli operator into a combination of small, O(poly(n)), number of Pauli operators. This echoes remarks in ref. 44, although that work is in the Schrödinger picture. In our case, the \({{\mathcal{E}}}_{i}\) will correspond to compositions of Clifford unitaries and processes that map every Pauli operator to a single (possibly distinct) Pauli operator or to zero.
Strategy
We first show how the noisy variational circuits considered in Eq. (1) admit a linear decomposition into processes that are amenable to the classical simulation outlined in “Pauli transfer matrices and simulation algorithms ”. To that aim, it turns out that a decomposition into Fourier series of the noisy channel \({\tilde{{\mathcal{U}}}}_{\theta }\), and therefore noisy cost function, results in processes that map a Pauli operator into multiple Pauli operators, and thus their composition may lead to an exponential accumulation of terms. However, a different choice of basis involving trigonometric polynomials remedies this to produce a decomposition for which the dominant coefficients in the expansion can be efficiently computed.
We now assume that all rotations are single qubit z-rotations. This is purely to make the exposition easier to follow, the theorems will be valid for any Pauli rotation. Let Rz(θ) be the PTM of \({{\mathcal{R}}}_{z}(\theta )\) and let N be the PTM of the Pauli noise channel \({{\mathcal{N}}}_{Pauli}\), N = diag(1, qX, qY, qZ). The eigenvalues of the Pauli channel are related to the error probabilities as qX = 1 − 2(pZ + pY), qY = 1 − 2(pZ + pX), qZ = 1 − 2(pX + pY). Then, the noisy channel \({\tilde{{\mathcal{R}}}}_{z}(\theta )={{\mathcal{N}}}_{Pauli}\,{\circ}\, {{\mathcal{R}}}_{z}(\theta )\) has, with respect to the orthonormal basis \(\{\left.\left\vert I\right\rangle \right\rangle ,\left.\left\vert X\right\rangle \right\rangle ,\left.\left\vert Y\right\rangle \right\rangle ,\left.\left\vert Z\right\rangle \right\rangle \}\), the PTM
$${\bf{N}}\cdot {\bf{R}}=\left(\begin{array}{cccc}1&0&0&0\\ 0&{q}_{X}\cos \theta &-{q}_{X}\sin \theta &0\\ 0&{q}_{Y}\sin \theta &{q}_{Y}\cos \theta &0\\ 0&0&0&{q}_{Z}\end{array}\right).$$
(9)
Denote the projectors by \({\Pi }_{0}=\left.\left\vert I\right\rangle \right\rangle \left\langle \left\langle I\right\vert \right.+\left.\left\vert Z\right\rangle \right\rangle \left\langle \left\langle Z\right\vert \right.\), \({\Pi }_{X}=\left.\left\vert X\right\rangle \right\rangle \left\langle \left\langle X\right\vert \right.\) and \({\Pi }_{Y}=\left.\left\vert Y\right\rangle \right\rangle \left\langle \left\langle Y\right\vert \right.\). Then we can define new quantum processes \(\{{{\mathcal{D}}}_{0},{{\mathcal{D}}}_{1},{{\mathcal{D}}}_{-1}\}\) to be used in the simulation algorithm via their PTM representation D0 = Π0NRΠ0, D1 = ΠXNRΠX + ΠYNRΠY and D-1 = ΠXNRΠY + ΠYNRΠX such that
$${\bf{N}}\cdot {\bf{R}}={{\bf{D}}}_{{\bf{0}}}+\cos \theta \,{{\bf{D}}}_{{\bf{1}}}+\sin \theta \,{{\bf{D}}}_{-{\bf{1}}}.$$
(10)
Expanding out these processes, we see that each of them maps any single Pauli operator into at most another single Pauli operator (up to a scaling),
$${{\bf{D}}}_{0}=\left(\begin{array}{cccc}1&0&0&0\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&{q}_{Z}\end{array}\right),{{\bf{D}}}_{1}=\left(\begin{array}{cccc}0&0&0&0\\ 0&{q}_{X}&0&0\\ 0&0&{q}_{Y}&0\\ 0&0&0&0\end{array}\right),$$
(11)
$${{\bf{D}}}_{-1}=\left(\begin{array}{cccc}0&0&0&0\\ 0&0&-{q}_{X}&0\\ 0&{q}_{Y}&0&0\\ 0&0&0&0\end{array}\right).$$
(12)
This decomposition allows us to expand the noisy circuits in terms of a multivariate trigonometric basis, which is a more convenient choice for the classical simulation. Consider \({\Phi }_{{\boldsymbol{\omega }}}({\boldsymbol{\theta }}):= \mathop{\prod }\nolimits_{i = 1}^{m}{\phi }_{{\omega }_{i}}({\theta }_{i})\) where \({\phi }_{0}(\theta )=1,\,{\phi }_{1}(\theta )=\cos (\theta ),\,{\phi }_{-1}(\theta )=\sin (\theta )\) are trigonometric monomials that encode the θ dependence. Then, the noisy variational circuits admit the decomposition
$${\tilde{{\mathcal{U}}}}_{\theta }=\sum _{{\boldsymbol{\omega }}\in {\{0,\pm 1\}}^{m}}{\Phi }_{{\boldsymbol{\omega }}}({\boldsymbol{\theta }}){{\mathcal{D}}}_{{\boldsymbol{\omega }}},$$
(13)
where each process \({{\mathcal{D}}}_{{\boldsymbol{\omega }}}\) is labelled by a frequency vector ω ∈ [0, ±1]m and given by
$${{\mathcal{D}}}_{{\boldsymbol{\omega }}}:= \left({\bigcirc}_{i}\,{\tilde{{\mathcal{C}}}}_{i}\,{\circ}\, {{\mathcal{D}}}_{{\omega }_{i}}\right)\,{\circ}\, {{\mathcal{C}}}_{0}.$$
(14)
In keeping with previous work26 we call these channels process modes. Crucially, each process mode maps one Pauli operator onto another Pauli operator, as can be seen from the PTMs above and the defining property of Clifford operations.
Overall, this new decomposition yields the following Fourier series representation for the cost function in Equation (2)
$$\tilde{f}({\boldsymbol{\theta }})=\sum _{{\boldsymbol{\omega }}}{d}_{{\boldsymbol{\omega }}}{\Phi }_{{\boldsymbol{\omega }}}({\boldsymbol{\theta }}).$$
(15)
The Fourier coefficients are given by
$${d}_{{\boldsymbol{\omega }}}:= {\rm{tr}}(P{{\mathcal{D}}}_{{\boldsymbol{\omega }}}[\left\vert 0\right\rangle \left\langle 0\right\vert ])=\sqrt{{2}^{n}}\left\langle \left\langle P| {{\bf{D}}}_{{\boldsymbol{\omega }}}| 0\right\rangle \right\rangle ,$$
(16)
where the factor \(\sqrt{{2}^{n}}\) is necessary since we have defined f(θ) as the expectation value of an unnormalised Pauli operator.
Note that in the above, the Clifford unitaries \({{\mathcal{C}}}_{i}\) were noise-free and the parameterised rotation gates carried a time-independent Pauli noise. A similar decomposition arises when we consider the general Pauli noise model for \({\tilde{{\mathcal{C}}}}_{i}={{\mathcal{M}}}_{i}\,{\circ}\; {{\mathcal{C}}}_{i}\). In this case, we denote the resulting process modes by \({{\mathcal{D}}}_{{\boldsymbol{\omega }}}^{{\prime} }:= ({\bigcirc}_{i}\,{{\mathcal{M}}}_{i}\,{\circ}\, {{\mathcal{C}}}_{i}\,{\circ}\, {{\mathcal{D}}}_{{\omega }_{i}})\,{\circ}\, {{\mathcal{M}}}_{0}\,{\circ}\, {{\mathcal{C}}}_{0}\) and the corresponding coefficients by \({d}_{{\boldsymbol{\omega }}}^{{\prime} }=\sqrt{{2}^{n}}\left\langle \left\langle P| {{\bf{D}}}_{{\boldsymbol{\omega }}}^{{\prime} }| 0\right\rangle \right\rangle\). We first describe and analyse the proposed classical algorithm for the simpler noise model that only affects the parameterised gates. This is purely to make the exposition easier to follow. The same principle works in the general case (see “ General Pauli noise models”). Furthermore, the analysis extends to time-dependent Pauli errors. The noise models considered here also include the local depolarising channels that have been previously used in classical algorithms for noisy random circuit sampling7. Both in our case and in previous work there is an implicit assumption that the Pauli error probabilities for each gate are known a-priori.
The LOWESA simulation algorithm
We are now in a position to state the simulation algorithm, which shares similar features to the algorithm in ref. 6, but applied to the task of estimating expectation values and to a different family of circuits. We name it lowesa for low weight efficient simulation algorithm (pronounced “low-EE-sa”).
Given a cutoff parameter ℓ, lowesa returns a function\(\tilde{g}\) approximating the noisy cost function \(\tilde{f}\) constructed from all the low Hamming weight ∣ω∣ ≔ ∥ω∥1 ≤ ℓ terms. This function is expressed as a trigonometric series and can therefore be used to evaluate the cost estimate for any parameter vector θ using
$$\tilde{g}({\boldsymbol{\theta }})=\sum _{| {\boldsymbol{\omega }}| \le \ell }{d}_{{\boldsymbol{\omega }}}{\Phi }_{{\boldsymbol{\omega }}}({\boldsymbol{\theta }})$$
(17)
with low computational effort. As the algorithm produces all \({\{{d}_{{\boldsymbol{\omega }}}\}}_{| {\boldsymbol{\omega }}| \le l}\), the function evaluation is independent of qubit number and depth.
lowesa involves the following steps:
Algorithm 1
[LOWESA] Simulating cost functions of noisy VQAs with uncorrelated angles
Input: Quantum process given by Eq. (4) defined by process modes as in Eq. (14); measurement Pauli operator P; cutoff parameter ℓ.
Output: \(\tilde{g}({\boldsymbol{\theta }})\), an approximation of \(\tilde{f}({\boldsymbol{\theta }})\).
1: procedure lowesa
2: \(\tilde{g}({\boldsymbol{\theta }})\leftarrow 0\)
3: run Branch(P, (), m) recursively to yield \({d}_{{\boldsymbol{\omega }}}=\sqrt{{2}^{n}}\left\langle \left\langle 0| {{\bf{D}}}_{{\boldsymbol{\omega }}}^{{\mathsf{T}}}| P\right\rangle \right\rangle \,\forall \,| {\boldsymbol{\omega }}| \le \ell \).
4: for all non-zero dω do
5: \(\tilde{g}({\boldsymbol{\theta }})\leftarrow \tilde{g}({\boldsymbol{\theta }})+{d}_{{\boldsymbol{\omega }}}{\Phi }_{{\boldsymbol{\omega }}}({\boldsymbol{\theta }})\)
6: end for
7: return \(\tilde{g}({\boldsymbol{\theta }})\)
8: end procedure
Subroutine: Calculate dω ∀ ∣ω∣≤ℓ via recursion.
1: procedure Branch(Q, ω, i)
2: \(Q\leftarrow {{\mathcal{C}}}_{i}^{\dagger }(Q)\)
3: if i>0 then
4: if [Q,Pi]=0 then
5: Branch(\({{\mathcal{D}}}_{0}^{i\dagger }(Q)\), append(ω←0), i−1)
6: else if ∣ω∣ < ℓ then
7: Branch(\({{\mathcal{D}}}_{1}^{i\dagger }(Q)\), append(ω←1), i−1)
8: Branch(\({{\mathcal{D}}}_{-1}^{i\dagger }(Q)\), append(ω←−1), i−1)
9: else
10: break
11: end if
12: end if
13: yield \({d}_{{\boldsymbol{\omega }}}=\sqrt{{2}^{n}}\left\langle \left\langle 0| Q\right\rangle \right\rangle \)
14: end procedure
We shall now explain how Algorithm 1 works step by step and why it is efficient. Note that, while our exposition here deals with expectation values of a single Pauli operator, the results extend immediately to general observables as explained in “ General measurement operators”.
Start with the target Pauli measurement operator P and propagate in the Heisenberg picture through the circuit. For each Clifford unitary Ci, updating the Pauli operator (by conjugation) takes at most O(n2). Each process mode \({{\mathcal{D}}}_{{\omega }_{i}}^{i}\) within a path \({{\mathcal{D}}}_{{\boldsymbol{\omega }}}\) acts with Pauli generator Pi. Note that the superscript indicates to which gate it corresponds and the subscript ωi ∈ {0, ± 1} labels the type of mode. The gate label is necessary since owing to the different generators Pi the process modes may be different, however their effect on an arbitrary Pauli be easily evaluated making classical simulation possible. If the propagated Pauli operator commutes with Pi then only \({{\mathcal{D}}}_{0}^{i}\) leads to a non-zero path, otherwise if it anticommutes either \({{\mathcal{D}}}_{1}^{i}\) or \({{\mathcal{D}}}_{-1}^{i}\) are valid choices. See Fig. 1.
Circuit model and algorithm flow. a Schematic of the parameterised quantum circuits that can be simulated by lowesa. The light boxes are arbitrary (noisy) Clifford gates, the blue boxes are parameterised Pauli rotations and the red kites represent Pauli noise channels. b Diagrammatic sketch of lowesa as described in Algorithm 1 applied to circuits given by Eq. (1). The Pauli operator P is propagated backwards through the circuit where every Clifford gate transforms it into another Pauli, and the decomposition of the parameterised Pauli rotations into process modes D0, D1, D−1 splits the propagation up into paths that may annihilate. A cutoff of ℓ = 2 is chosen which artificially annihilates paths that branch into D1, D−1 more than 2 times.
As only \({{\mathcal{D}}}_{\pm 1}^{i}\) contribute to the total weight ∣w∣ and since we impose ∣w∣ ≤ ℓ, it suggests a binary tree-like data structure with ℓ layers to keep track of the change of Pauli frame and the different branching possibilities. A branch may terminate sooner than if it propagated the Pauli through the entire circuit. The number of branches and therefore valid paths \({{\mathcal{D}}}_{\omega }\) will be at most 2ℓ. Putting everything together, this reduces the total complexity of evaluating all non-zero dω with ∣ω∣ ≤ ℓ to O(n2m2ℓ) in the worst case. We note that the quadratic scaling in n is for general n-qubit Clifford unitaries, and can be improved for k-local (or sparse) unitaries. In particular, if one fixes the set of Clifford unitaries that are executed within the circuits (for example the set {X, H, CNOT}), one can employ time-memory trade-off tools like look-up tables for each k-body Clifford operation and how they act on every k-body Pauli operator. In Fig. 2 we illustrate the runtime of lowesa using this technique on a circuit structure that is typically challenging for classical simulators.
The circuit structure consists of two parameterised layers of H − Rz(θi) − X − H on each qubit, where the Hadamard and X gates are chosen with 0.5 probability, followed by CNOTs placed on a 2D topology. a Upper bound on the number of paths for a given ℓ, which equals \(\mathop{\sum }\nolimits_{i = 0}^{\ell }\left(\begin{array}{c}m\\ i\end{array}\right){2}^{i}\), and the median number of paths empirically explored by lowesa, which is dramatically lower. b Wall time to run lowesa with truncation parameter ℓ on an average laptop without parallelisation. Each data point represents an average over 1000 different randomised circuits with Pauli Z measurement operators that act on a random subset of qubits. The shading shows the 90% confidence interval. The simulation of the Clifford gates used a look-up table, meaning that the scaling in n is entirely due the scaling of m with n.
To analyse the asymptotic complexity, we need to (1) show that each term dω can be efficiently estimated, (2) count the number of non-zero process modes \({{\mathcal{D}}}_{{\boldsymbol{\omega }}}\) with ∣ω∣ ≤ℓ , and (3) evaluate the accuracy in the approximation \(\tilde{g}\approx \tilde{f}\). Condition (1) is satisfied by construction - the choice of trigonometric basis ensures that (the adjoint of) \({{\mathcal{D}}}_{{\boldsymbol{\omega }}}\) maps a Pauli operator to either zero or a (different, scaled) Pauli operator. Each dω can be individually estimated in at most O(n2m) steps using the Pauli back-propagation method outlined in “ Pauli transfer matrices and simulation algorithms”. For (2), note that while there are \((\begin{array}{c}m\\ | {\boldsymbol{\omega }}| \end{array})\,{2}^{| {\boldsymbol{\omega }}| }\) paths with a fixed weight ∣ω∣ for a total of at most \({m}^{{\mathcal{O}}(\ell )}\) within the cutoff, many of these will be zero when acted upon the input \(\left.\left\vert P\right\rangle \right\rangle\). This is due to the fact that process modes in Equation (11) each annihilate half of the Paulis.
Finally, condition (3) remains to be verified so that lowesa yields an accurate simulation of the noisy cost function. Given a cost function \(\tilde{f}\) and its approximation \(\tilde{g}\), we define the average L2-norm error over the space of parameters Θ = [0, 2π]m or root mean squared error (RMSE)
$$\Delta (\tilde{f},\tilde{g}):= {\left(\frac{1}{| \Theta | }{\int}_{\Theta }| \tilde{f}({\boldsymbol{\theta }})-\tilde{g}({\boldsymbol{\theta }}){| }^{2}d{\boldsymbol{\theta }}\right)}^{1/2},$$
(18)
where the integration measure is dθ = d θ1d θ2…d θm and ∣ Θ ∣ = (2π)m is a normalisation factor so that \(\frac{1}{| \Theta | }\int\,d\,{\boldsymbol{\theta }}=1\). In Methods we prove the following result:
Theorem 1
Consider a n-qubit VQA with a PQC as in Eq. (1) having m independently parameterised rotations affected by a single-qubit Pauli noise channel \({{\mathcal{N}}}_{Pauli}({p}_{X},{p}_{Y},{p}_{Z})\) as in Eq. (4). Recall that \(p=\mathop{\min}\nolimits_{\sigma = X,Y,Z}{p}_{\sigma }\, > \,0\).
Then, for any weight cutoff \(\ell \in {\mathbb{N}}\), lowesa (Algorithm 1) returns an approximation \(\tilde{g}\) for the noisy cost function \(\tilde{f}\) with RMSE
$$\Delta (\tilde{f},\tilde{g})\le {(1-2p)}^{\ell +1}\le {e}^{-2p\ell }$$
(19)
and runs in time at most O(n2m 2ℓ).
It follows from Theorem 1 that lowesa is both accurate and efficient, as its runtime scales polynomially with m and n and the maximum allowed RMSE; however, the scaling with noise probability is considerably worse. For example, suppose we wish to have an error bounded by ϵ. Then one would choose \(\ell \approx \frac{1}{2p}\log {\epsilon }^{-1}\), giving a runtime \(O({\epsilon }^{-\frac{\log 2}{2p}}\,{n}^{2}m)\). While this is asymptotically efficient in the width and depth of the circuit, the dependency on the error rate limits its practicality. Notably the exponent may still be considerably large if the noise is small. When the goal is to simulate the expected outcome of a hardware implementation with a finite number of measurements Ns, the error can be chosen like \(\epsilon \in {\mathcal{O}}(\frac{1}{\sqrt{{N}_{s}}})\), thus relaxing the precision requirements.
In Fig. 3 we illustrate the mean accuracy of the algorithm for an example circuit of the hardware-efficient family. We observe that the error is typically up to two orders of magnitude lower than the bounds, suggesting these are loose and may be improved for the typical case.
We show the L2 error of a single-qubit Pauli Y operator expectation with ℓ < m = 60 for two layers of a n = 10 qubit circuit. The circuit consists of parametrised single-qubit gates Rz(θi) Rx(θi+1) Rz(θi+2) on each qubit followed by CNOT gates in a 2D topology. For this particular circuit, each entangler in the 2D topology was placed with a 0.5 probability. The noise model is symmetric depolarising noise, where the parameters are set pX = pY = pZ = p. Each point is averaged over 1000 random parameterisations of the same circuit to compare to the integral definition of our error bounds. All paths below ℓ = 3 and above ℓ = 21 annihilate. Consequently, the simulation with ℓ = 21 is exact.
Validity of error measure
The use of RMSE as error measure has limitations, the main one being that the error at any given point is in principle unbounded. However we argue that this limitation is weaker than may appear. Applying Markov’s inequality to our Theorem we have the following probabilistic bound:
Corollary 2
For a fixed circuit, choosing the parameters θ uniformly at random from [0, 2π]m, with probability ≥1 − δ the approximation error is bounded by
$$| \tilde{f}({\boldsymbol{\theta }})-\tilde{g}({\boldsymbol{\theta }})| \le {e}^{-p\ell }{\delta }^{-1/2}$$
(20)
Suppose that we wish to have error bounded by ϵ with probability 1 − δ. Then the required cutoff is \(\ell \approx {p}^{-1}\log ({\epsilon }^{-1}{\delta }^{-1/2})\), giving again a runtime that scales unfavourably with p. However for fixed p the scaling is logarithmic in both δ and ϵ meaning that the probability of encountering large deviations can be made arbitrarily small by increasing ℓ. This probabilistic formulation has practical relevance as typical VQAs have their parameters initialised uniformly at random45 and so it is valid at initialisation; however, this analysis breaks down when considering the error over the whole path of gradient descent, which may lead into a region of high deviation.
General measurement operators
Up to now we assumed that the measurement operator is Pauli, however in truth most practical algorithms have more complicated measurement operators. Generally, a measurement takes the form
$$O=\sum _{{P}_{i}\in {{\mathbb{P}}}^{n}/\{{I}^{n}\}}{c}_{i}{P}_{i}$$
(21)
where we can ignore the identity component as it contributes a constant to the cost function. We get the following result, proven in Methods:
Theorem 3
With a general measurement operator as in Eq. (21), LOWESA can simulate the noisy cost function with RMSE Δ ≤ ϵ and with runtime at most
$$O\left({(\parallel {\boldsymbol{c}}{\parallel }_{r}{\epsilon }^{-1})}^{\frac{\log 2}{2p}}{n}^{2}m \right).$$
(22)
Assuming \(p\,\ll\, \log\, \sqrt{2}\), r ≈ 1.
The approach consists in separately simulating each Pauli observable composing O, allocating to Pi a cutoff budget of \({\ell }_{i}=\frac{1}{2p+\log 2}\log | {c}_{i}| +\,\text{const.}\,\), which gives the minimal error for a given total runtime. The procedure is highly parallelisable meaning that the actual running time can be reduced considerably from these estimates.
Comparing this with the maximum number of shots required by a quantum computer to approximate a composite observable with precision ϵ, \({N}_{s}=\parallel {\boldsymbol{c}}{\parallel }_{1}^{2}{\epsilon }^{-2}\)46, we conclude that when \(p\ll\, \log\, \sqrt{2}\) lowesa incurs a cost at most polynomially larger than usual sampling cost. Once again, in practical scenarios the factor of p−1 will dominate the exponent but this does not invalidate the claim of classical simulatability. Thus we conclude that any expectation value that can be measured efficiently on a quantum computer may be efficiently simulated using our algorithm.
General Pauli noise models
The result can be extended to cover multi-qubit Pauli noise affecting all gates, not just the parameterised ones. In Methods we prove the more general result:
Theorem 4
Consider an n-qubit VQA under the noise model
$${\tilde{{\mathcal{U}}}}_{{\boldsymbol{\theta }}}=\left({\bigcirc}_{i = 1}^{m}\,{{\mathcal{M}}}_{i}\,{\circ}\, {{\mathcal{C}}}_{i}\,{\circ}\, {{\mathcal{N}}}_{i}\,{\circ}\, {{\mathcal{R}}}_{i}({\theta }_{i})\right)\,{\circ}\, {{\mathcal{M}}}_{0}\,{\circ}\, {{\mathcal{C}}}_{0}$$
(23)
where \(\{{{\mathcal{M}}}_{i}\}\) are n-qubit Pauli channels with layer-dependent noise parameters and every rotation is followed by a local multi-qubit Pauli noise \({{\mathcal{N}}}_{i}{ = \bigotimes }_{j = 1}^{n}{{\mathcal{N}}}_{Pauli}^{(j)}({p}_{X}^{ij},{p}_{Y}^{ij},{p}_{Z}^{ij})\) with \({p}^{{\prime} }=\mathop{\min}\nolimits_{ij\sigma }\{{p}_{\sigma }^{ij}\} > 0\), which depends on both layer and qubit. Then, for any weight cutoff \(\ell \in {\mathbb{N}}\), lowesa (Algorithm 1) with modified process modes \({{\mathcal{D}}}_{{\boldsymbol{\omega }}}^{{\prime} }=({\bigcirc}_{i = 1}^{m}{{\mathcal{M}}}_{i}\,{\circ}\, {{\mathcal{C}}}_{i}\,{\circ}\, {{\mathcal{D}}}_{{\omega }_{i}})\,{\circ}\, {{\mathcal{C}}}_{0}\,{\circ}\, {{\mathcal{M}}}_{0}\) and coefficients \({d}_{{\boldsymbol{\omega }}}^{{\prime} }=\sqrt{{2}^{n}}\left\langle \left\langle P| {{\bf{D}}}_{{\boldsymbol{\omega }}}^{{\prime} }| 0\right\rangle \right\rangle\) returns an approximation \(\tilde{g}\) for the cost function \(\tilde{f}\) with error
$$\Delta (\tilde{f},\tilde{g})\le {(1-2{p}^{{\prime} })}^{\ell +1}\le {e}^{-2{p}^{{\prime} }\ell }$$
(24)
and runs in time at most O(n2m2ℓ).
The result relies on the fact that any Pauli channel will map a propagated Pauli operator to itself, up to a proportionality factor that can be at most 1. In other words, this means that each of the modified process modes \({{\mathcal{D}}}_{{\boldsymbol{\omega }}}^{{\prime} }\) will act similarly to the previously considered modes \({{\mathcal{D}}}_{{\boldsymbol{\omega }}}\) arising from the simplified error model, so that \({{{\bf{D}}}^{{\prime} }}_{{\boldsymbol{\omega }}}\left.\left\vert P\right\rangle \right\rangle \propto {{\bf{D}}}_{{\boldsymbol{\omega }}}\left.\left\vert P\right\rangle \right\rangle\). Therefore the proof and the bounds follow in the same way as for Theorem 1. The only modification to the algorithm is that to compute \({d}_{{\boldsymbol{\omega }}}^{{\prime} }\) one must also keep track of these proportionality factors along with the propagated Pauli.
For generality we have not assumed that the all noise coefficients of \(\{{{\mathcal{M}}}_{i}\}\) are bigger than 0, thus it is difficult to improve upon the upper bound on the approximation error Δ since one can be in a situation where along the paths of weight ∣ω∣ = ℓ + 1 the proportionality factors might all be 1 when propagating the Pauli operator through each Pauli channel \({{\mathcal{M}}}_{i}\). In practical situations the Clifford gates will come with a depolarising component and the bound can be improved. For instance, let’s assume that in the decomposition of the n qubit Clifford operator \({{\mathcal{C}}}_{i}\) into primitive (single and two-qubit) gates each incurs a local single-qubit depolarising channel \({{\mathcal{N}}}_{dep}\) with error probability η. Then it follows we can find a tighter bound
$$\Delta (\tilde{f},\tilde{g})\le {(1-2{p}^{{\prime} })}^{\ell +1}{(1-\eta )}^{\ell +1}\le {e}^{-(2{p}^{{\prime} }+\eta )\ell }.$$
(25)
This comes from the fact that \({{\mathcal{N}}}_{dep}^{\dagger }(P)=(1-\eta )P\) if P ∈ {X, Y, Z} and \({{\mathcal{N}}}_{dep}^{\dagger }(I)=I\) along with the previous observation that for valid paths leading to non-zero coefficients, \({{\mathcal{D}}}_{\pm 1}\) are applied to qubit qi whenever the propagated Pauli on qubit qi is not I or Z. Therefore the noise from the Clifford part will contribute and at the very least contract by a factor of (1 − η) whenever we have a branching possibility to apply either \({{\mathcal{D}}}_{+1}\) or \({{\mathcal{D}}}_{-1}\), which are the only contributors to the total weight ∣ω∣. Note that this type of noise model has previously been considered in the context of noisy random circuit sampling7. The corresponding result for non-Pauli observables can be obtained similarly as before, giving the same additional factor in the runtime.
Fixed (unparameterised) non-Clifford gates
The extension of lowesa to the case where non-Clifford unparameterised gates are present is straightforward. As was done in ref. 47, one may treat non-Clifford rotation gates as parameterised rotation gates that have their parameters fixed on at a later stage. A circuit with t fixed z-rotation gates and m parameterised z-rotation gates may be transformed into a circuit with m + t z-rotations for simulation purposes, obtaining a cost function F(θ, ϕ). Then the intended cost function is obtained by fixing ϕ. It follows that any statement on the simulation runtime still applies with the substitution m → m + t. However, getting an error bound with non-Clifford gates is more complicated, since we can no longer average over the expanded parameter space owing to the fixed gates. We can still make a weaker probabilistic statement, proven in Methods.
Theorem 5
Consider a variational circuit consisting of m uncorrelated noisy parameterised rotation gates, and t noisy rotation gates with fixed random angles independently and uniformly distributed in [0, 2π]t. The noise model is that of Theorem 4. Then for weight cutoff \(\ell \in {\mathbb{N}}\), with probability ≥1 − δ the simulation error of lowesa (Algorithm 1) with modified process modes obeys
$$\Delta (\tilde{f}-\tilde{g})\le {e}^{-2{p}^{{\prime} }\ell }{\delta }^{-1/2}$$
(26)
and the Algorithm runs in time O(n2(m + t)2ℓ).
Theorem 5 implies that for a typical choice of the ϕ angles the error is still exponentially suppressed in ℓ. For fixed δ, one would choose \(\ell \approx \frac{1}{2{p}^{{\prime} }}(\log {\epsilon }^{-1}+\frac{1}{2}\log {\delta }^{-1})\), giving a runtime which is only slightly worse than the one from the previous Theorems, for reasonable choices of δ.
The case of correlated parameters
The main result has been derived assuming that the parameters controlling the rotation gates in the circuits are uncorrelated. One may therefore wonder whether it extends to correlated parameter circuits, which are ubiquitous in quantum machine learning48 as well as forming the basis of algorithms like the Quantum Approximate Optimisation Algorithm (QAOA)32,49 or the Hamiltonian Variational Ansatz (HVA) for chemistry problems31,46.
However, the argument used in the proof of Theorem 1 does not hold since with correlated angles the basis functions are no longer orthogonal over the correlated parameter space. For example, consider the following case where
$${\Phi }_{2}(\theta )={\cos }^{2}(\theta ),\,\,\,{\Phi }_{-2}(\theta )={\sin }^{2}(\theta )$$
(27)
$$\Rightarrow \frac{1}{2\pi }\int\,{\Phi }_{2}(\theta ){\Phi }_{-2}(\theta )d\theta =\frac{1}{8}\,\ne\, 0$$
(28)
Interestingly, we find that for correlated angles systems the simulation algorithm frequently returns a trivial result. Consider the following 1-qubit correlated parameter circuit
$${U}_{d}(\theta )={({R}_{x}(\theta ))}^{d}$$
(29)
It is simple to show that when Ud(θ) is applied to the initial state \(\left\vert 0\right\rangle\), then measuring the Pauli Z produces the cost function
$${f}_{d}(\theta )=\cos (d\theta )=\mathop{\sum }\limits_{i=0}^{\lfloor d/2\rfloor }{(-1)}^{i}\left(\begin{array}{c}d\\ 2i\end{array}\right)\,{\sin }^{2i}(\theta ){\cos }^{d-2i}(\theta )$$
(30)
whose terms are all of weight d. Therefore, any reconstruction with weight ℓ < d would trivially return \(\tilde{g}=0\). This behaviour can be generalised to any circuit composed of d repeated, identical, and independently parameterised layers
$$U({\boldsymbol{\theta }})=\mathop{\prod }\limits_{i=1}^{d}V({{\boldsymbol{\theta }}}_{i})=\mathop{\prod }\limits_{i=1}^{d}\left(\mathop{\prod }\limits_{j=1}^{h}{e}^{-i{H}_{j}{\theta }_{ij}}\right),$$
(31)
where each layer is generated by the same h Hamiltonians. It can be observed that both QAOA and HVA ansatzes fit in the prescription. In this situation, for lowesa to produce a non-zero approximation function \(\tilde{g}\), we show (see Methods) that the cutoff value ℓ has to be greater than the number of repeated layers.
Theorem 6
Given U(θ) as in Eq. (31) and a Pauli operator P that does not commute with at least one of the generators {Hj}. If the cutoff ℓ < d then lowesa produces a trivial approximation \(\tilde{g}=0\) of the noisy expectation value at any noise level.
This result implies that the complexity requirements lowesa will scale exponentially Ω(2d) with the number of layers. Correlating the angles further, for example by setting θ1 = θ2 = ⋯ = θp does not affect the validity of the result. Improvements to the runtime may be possible if the number of valid paths can be reduced, for instance by leveraging symmetries in the circuit.
While the simulation algorithm may appear to fail for correlated angles since the output is constant for ℓ < d, in fact we have not considered that the simulation RMSE may still be small if the noisy cost function \(\tilde{f}\) has a small variance, since (assuming ℓ < d and \({{\mathbb{E}}}_{{\boldsymbol{\theta }}}\tilde{f}=0\))
$${\Delta }^{2}(\tilde{f},\tilde{g})={\Delta }^{2}(\tilde{f},0)={{\mathbb{E}}}_{{\boldsymbol{\theta }}}{\tilde{f}}^{2}={{\rm{Var}}}_{{\boldsymbol{\theta }}}(\tilde{f}).$$
(32)
Indeed this is the case due to the phenomenon of noise-induced barren plateaus: for our model the cost function variance would decay with depth as \(O({e}^{-pd/\ln 2})\) [13, Lemma 1]. Therefore it is still possible that the result in Theorem 1 may hold for correlated parameter VQAs too. For now, however, we are unable to conclusively demonstrate it, so this leaves room for a quantum advantage in QAOA and HVA, as well as in simulating time evolution on noisy quantum devices, as such tasks commonly involve repeated gate patterns.