Almost all quantum channels are equidistant

Ion Nechita (4), Zbigniew Puchała (1, 2), Łukasz Pawela (1) and Karol Życzkowski (2, 3)

arXiv:1612.00401v1 [quant-ph] 1 Dec 2016

(1) Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Bałtycka 5, 44-100 Gliwice, Poland
(2) Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, ulica prof. Stanisława Łojasiewicza 11, Kraków, Poland
(4) Zentrum Mathematik, M5, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany and CNRS, Laboratoire de Physique Théorique, IRSAMC, Université de Toulouse, UPS, F-31062 Toulouse, France
December 2, 2016

In this work we analyze properties of generic quantum channels in the case of large system size. We use random matrix theory and free probability to show that the distance between two independent random channels tends to a constant value as the dimension of the system grows larger. As a measure of the distance we use the diamond norm. In the case of a flat Hilbert-Schmidt distribution on quantum channels, we obtain that the distance converges to $1/2 + 2/\pi$. Furthermore, we show that for a random state $\rho$ acting on a bipartite Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$, sampled from the Hilbert-Schmidt distribution, the reduced states $\operatorname{Tr}_A \rho$ and $\operatorname{Tr}_B \rho$ are arbitrarily close to the maximally mixed state. This implies that, for large dimensions, the state $\rho$ may be interpreted as a Jamiołkowski state of a unital map.


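We will denote the set of density operators on $\mathcal{X}$ by $\Omega(\mathcal{X})$. We will denote the set of quantum channels $\Phi : L(\mathcal{X}) \to L(\mathcal{Y})$ by $\Theta(\mathcal{X}, \mathcal{Y})$. We will put $\Theta(\mathcal{X}) \equiv \Theta(\mathcal{X}, \mathcal{X})$.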
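For any linear map $\Phi : M_{d_1}(\mathbb{C}) \to M_{d_2}(\mathbb{C})$, we define its Choi-Jamiołkowski matrix as
$$J(\Phi) := \sum_{i,j=1}^{d_1} |i\rangle\langle j| \otimes \Phi(|i\rangle\langle j|) \in M_{d_1}(\mathbb{C}) \otimes M_{d_2}(\mathbb{C}).$$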



This isomorphism was first studied by Choi [7] and Jamiołkowski [18]. Note that some authors prefer to add a normalization factor of $d_1^{-1}$ in front of the expression for $J(\Phi)$. Other authors use the other order for the tensor product factors, a choice resulting in an awkward order for the space in which $J(\Phi)$ lives.
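The definition of $J(\Phi)$ is easy to check numerically. The following is a minimal NumPy sketch, assuming a map is represented as a Python callable acting on $d_1 \times d_1$ arrays (an illustrative interface; `choi_matrix` and `partial_trace_2` are hypothetical helper names). It builds $J(\Phi)$ with the input space as the first tensor factor, as in the text, and checks two standard facts: the identity channel gives $|\Omega\rangle\langle\Omega|$ with $|\Omega\rangle = \sum_i |i\rangle \otimes |i\rangle$, and the transposition map gives the swap operator, which is not positive semidefinite.

```python
import numpy as np

def choi_matrix(phi, d1, d2):
    """Unnormalized Choi-Jamiolkowski matrix J(phi) = sum_ij |i><j| (x) phi(|i><j|)."""
    J = np.zeros((d1 * d2, d1 * d2), dtype=complex)
    for i in range(d1):
        for j in range(d1):
            E_ij = np.zeros((d1, d1), dtype=complex)
            E_ij[i, j] = 1.0
            # First tensor factor: input space; second factor: output space.
            J += np.kron(E_ij, phi(E_ij))
    return J

def partial_trace_2(M, d1, d2):
    """Trace out the second tensor factor of a (d1*d2) x (d1*d2) matrix."""
    return np.einsum("ijkj->ik", M.reshape(d1, d2, d1, d2))

d = 3
omega = np.eye(d).reshape(d * d)           # |Omega> = sum_i |i>|i>
J_id = choi_matrix(lambda X: X, d, d)      # identity channel
J_T = choi_matrix(lambda X: X.T, d, d)     # transposition map

print(np.allclose(J_id, np.outer(omega, omega)))            # True
print(np.allclose(partial_trace_2(J_id, d, d), np.eye(d)))  # True (trace preserving)
print(np.linalg.eigvalsh(J_T).min())                        # approximately -1: not PSD
```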
The rank of the matrix $J(\Phi)$ is called the Choi rank of $\Phi$; it is the minimum number $r$ such that the map $\Phi$ can be written as
$$\Phi(X) = \sum_{i=1}^{r} A_i X B_i^*,$$
for some operators $A_i, B_i \in M_{d_2 \times d_1}(\mathbb{C})$.

Given two pairs of unitary operators $U_{1,2} \in \mathcal{U}(d_1)$, respectively $V_{1,2} \in \mathcal{U}(d_2)$, define the rotated map
$$\Phi_{U_1,U_2;V_1,V_2}(X) := V_1 \Phi(U_1 X U_2^*) V_2^*.$$
It is an easy exercise to check that the Choi-Jamiołkowski matrix of the rotated map is given by
$$J(\Phi_{U_1,U_2;V_1,V_2}) = (U_1^\top \otimes V_1)\, J(\Phi)\, (U_2^\top \otimes V_2)^*.$$
The transposition appearing in the equation above is due to the following key fact (here $U$ is an arbitrary unitary operator):
$$(I \otimes U) \sum_i |i\rangle \otimes |i\rangle = (U^\top \otimes I) \sum_i |i\rangle \otimes |i\rangle.$$
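Both the key fact and the formula for $J(\Phi_{U_1,U_2;V_1,V_2})$ can be verified on random instances. The sketch below assumes NumPy; Haar unitaries are sampled with the standard QR-with-phase-correction recipe, and the test map $\Phi(X) = \sum_k A_k X B_k^*$ with random $A_k, B_k$ is an arbitrary choice, not taken from the text (the helper names are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(n, rng):
    """Haar-random unitary: QR of a Ginibre matrix, with the phases of diag(R) fixed."""
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def choi_matrix(phi, d1, d2):
    J = np.zeros((d1 * d2, d1 * d2), dtype=complex)
    for i in range(d1):
        for j in range(d1):
            E = np.zeros((d1, d1), dtype=complex)
            E[i, j] = 1.0
            J += np.kron(E, phi(E))
    return J

d1, d2 = 2, 3
omega = np.eye(d1).reshape(d1 * d1)          # sum_i |i>|i>

# Key fact: (I (x) U)|Omega> = (U^T (x) I)|Omega>
U = haar_unitary(d1, rng)
lhs = np.kron(np.eye(d1), U) @ omega
rhs = np.kron(U.T, np.eye(d1)) @ omega
print(np.allclose(lhs, rhs))                 # True

# Hypothetical test map Phi(X) = sum_k A_k X B_k^*
A = [rng.standard_normal((d2, d1)) + 1j * rng.standard_normal((d2, d1)) for _ in range(2)]
B = [rng.standard_normal((d2, d1)) + 1j * rng.standard_normal((d2, d1)) for _ in range(2)]
phi = lambda X: sum(a @ X @ b.conj().T for a, b in zip(A, B))

U1, U2 = haar_unitary(d1, rng), haar_unitary(d1, rng)
V1, V2 = haar_unitary(d2, rng), haar_unitary(d2, rng)
rotated = lambda X: V1 @ phi(U1 @ X @ U2.conj().T) @ V2.conj().T

J_rot = choi_matrix(rotated, d1, d2)
J_pred = np.kron(U1.T, V1) @ choi_matrix(phi, d1, d2) @ np.kron(U2.T, V2).conj().T
print(np.allclose(J_rot, J_pred))            # True
```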

The diamond norm was introduced in quantum information theory by Kitaev [17, Section 3.3] as a counterpart to the 1-norm in the task of distinguishing quantum channels. First, define the $1 \to 1$ norm of a linear map $\Phi : M_{d_1}(\mathbb{C}) \to M_{d_2}(\mathbb{C})$ as
$$\|\Phi\|_{1 \to 1} := \sup_{X \neq 0} \frac{\|\Phi(X)\|_1}{\|X\|_1}.$$

Kitaev noticed that the $1 \to 1$ norm is not stable under tensor products (as can easily be seen by looking at the transposition map), and considered the following regularization:
$$\|\Phi\|_\diamond := \sup_{n \geq 1} \|\Phi \otimes \mathrm{id}_n\|_{1 \to 1}.$$

In operator theory, the diamond norm was known before as the completely bounded trace norm; indeed, the $1 \to 1$ norm of an operator is the $\infty \to \infty$ norm of its dual, hence the diamond norm of $\Phi$ is equal to the completely bounded (operator) norm of $\Phi^*$ (see [24, Chapter 3]).
We shall need two simple properties of the diamond norm. First, note that the supremum in the definition can be replaced by taking the value $n = d_1$ (recall that $d_1$ is the dimension of the input Hilbert space of the linear map $\Phi$); actually, one could also take $n$ equal to the Choi rank of the map $\Phi$, see [28, Theorem 3.3] or [29, Theorem 3.66]. Second, using the fact that the extremal points of the unit ball of the 1-norm are unit rank matrices, we always have
$$\|\Phi\|_\diamond = \sup\left\{\|(\Phi \otimes \mathrm{id}_{d_1})(|x\rangle\langle y|)\|_1 : x, y \in \mathbb{C}^{d_1} \otimes \mathbb{C}^{d_1},\ \|x\| = \|y\| = 1\right\}.$$
Moreover, if the map $\Phi$ is Hermiticity-preserving (e.g. $\Phi$ is the difference of two quantum channels), one can optimize over $x = y$ in the formula above, see [29, Theorem 3.53].
Given a map $\Phi$, it is in general difficult to compute its diamond norm. Computationally, there is a semidefinite program for the diamond norm [30], which has a simple form and which has been implemented in various places (see, e.g. [20]). We next bound the diamond norm in terms of the partial trace of the absolute value of the Choi-Jamiołkowski matrix.
The diamond norm has applications in the problem of quantum channel discrimination. Suppose we have an experiment in which our goal is to distinguish between two quantum channels $\Phi$ and $\Psi$. Each of the channels may appear with probability $1/2$. Then, a celebrated result by Helstrom [15] gives us an upper bound on the probability $p$ of correct discrimination:
$$p \leq \frac{1}{2} + \frac{1}{4}\|\Phi - \Psi\|_\diamond.$$


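Combining this with the upper bound on the diamond norm established in Section 2 (see Remark 2), we readily arrive at
$$p \leq \frac{1}{2} + \frac{1}{4}\left\|\operatorname{Tr}_2 |J(\Phi - \Psi)|\right\|_\infty.$$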


The main goal of this work is to describe the asymptotic behavior of the diamond norm of the difference of two random quantum channels. To achieve this, in Section 2 we find a new upper bound on the diamond norm of a general map. In the case of a Hermiticity-preserving map $\Phi$ it takes the simple form
$$\|\Phi\|_\diamond \leq \left\|\operatorname{Tr}_2 |J(\Phi)|\right\|_\infty.$$


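Next, in Section 4 we prove that the well-known lower bound on the diamond norm, $d_1^{-1}\|J(\Phi - \Psi)\|_1 \leq \|\Phi - \Psi\|_\diamond$, converges to a finite value for random independent quantum channels $\Phi$ and $\Psi$ in the limit $d_{1,2} \to \infty$. We obtain that for channels sampled from the flat Hilbert-Schmidt distribution the value of the lower bound is
$$\lim_{d_{1,2} \to \infty} \frac{1}{d_1}\|J(\Phi - \Psi)\|_1 = \frac{1}{2} + \frac{2}{\pi}.$$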


The general case is discussed in depth in the aforementioned section. Finally, in Section 5 we show that the upper bound also converges to the same value as the lower bound. From these results we have, for channels sampled from the Hilbert-Schmidt distribution,
$$\lim_{d_{1,2} \to \infty} \|\Phi - \Psi\|_\diamond = \frac{1}{2} + \frac{2}{\pi}.$$


Some useful bounds for the diamond norm

We discuss in this section some bounds for the diamond norm. For a matrix $X$, we denote by $\sqrt{X^* X}$ and $\sqrt{X X^*}$ its right and left absolute values, i.e.
$$\sqrt{X^* X} = V \Sigma V^*, \qquad \sqrt{X X^*} = U \Sigma U^*,$$
where $X = U \Sigma V^*$ is the SVD of $X$. In the case where $X$ is self-adjoint, we obviously have $\sqrt{X^* X} = \sqrt{X X^*} = |X|$.
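In the result below, the lower bound is well known, while the upper bound appears in a weaker and less general form in [19, Theorem 2].
Proposition 1. For any linear map $\Phi : M_{d_1}(\mathbb{C}) \to M_{d_2}(\mathbb{C})$, we have
$$\frac{\|J(\Phi)\|_1}{d_1} \leq \|\Phi\|_\diamond \leq \frac{\left\|\operatorname{Tr}_2 \sqrt{J(\Phi)^* J(\Phi)}\right\|_\infty + \left\|\operatorname{Tr}_2 \sqrt{J(\Phi) J(\Phi)^*}\right\|_\infty}{2}. \tag{9}$$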


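Proof. Consider the semidefinite programs for the diamond norm given in [30, Section 3.2]:
Primal problem: maximize $\frac{1}{2}\langle X, J(\Phi)\rangle + \frac{1}{2}\langle X^*, J(\Phi)^*\rangle$ subject to
$$\begin{pmatrix} \rho_0 \otimes I_{d_2} & X \\ X^* & \rho_1 \otimes I_{d_2} \end{pmatrix} \geq 0, \qquad \rho_0, \rho_1 \in M_{d_1}(\mathbb{C}) \text{ density matrices}, \qquad X \in M_{d_1 d_2}(\mathbb{C}).$$
Dual problem: minimize $\frac{1}{2}\|\operatorname{Tr}_2 Y_0\|_\infty + \frac{1}{2}\|\operatorname{Tr}_2 Y_1\|_\infty$ subject to
$$\begin{pmatrix} Y_0 & -J(\Phi) \\ -J(\Phi)^* & Y_1 \end{pmatrix} \geq 0, \qquad Y_0, Y_1 \in M_{d_1 d_2}^{+}(\mathbb{C}).$$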

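The lower and upper bounds will follow from very simple feasible points for the primal, respectively the dual, problems. Let $J(\Phi) = U \Sigma V^*$ be an SVD of the Choi-Jamiołkowski matrix of the linear map. For the primal problem, consider the feasible point $\rho_0 = \rho_1 = d_1^{-1} I_{d_1}$ and $X = d_1^{-1} U V^*$ (the block matrix is positive semidefinite because $U V^*$ is unitary). The value of the primal problem at this point is
$$\frac{1}{2 d_1}\left(\langle U V^*, U \Sigma V^*\rangle + \langle V U^*, V \Sigma U^*\rangle\right) = \frac{\operatorname{Tr} \Sigma}{d_1} = \frac{\|J(\Phi)\|_1}{d_1},$$
showing the lower bound.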

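For the upper bound, set $Y_0 = \sqrt{J(\Phi) J(\Phi)^*} = U \Sigma U^*$ and $Y_1 = \sqrt{J(\Phi)^* J(\Phi)} = V \Sigma V^*$, both PSD matrices. The condition in the dual problem is satisfied:
$$\begin{pmatrix} U \Sigma U^* & -U \Sigma V^* \\ -V \Sigma U^* & V \Sigma V^* \end{pmatrix} = \begin{pmatrix} U & 0 \\ 0 & V \end{pmatrix} \left[\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \otimes \Sigma\right] \begin{pmatrix} U & 0 \\ 0 & V \end{pmatrix}^* \geq 0,$$
and the value of the dual problem at this point is the right-hand side of (9). This completes the proof.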
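Remark 2. If the map $\Phi$ is Hermiticity-preserving (i.e. the matrix $J(\Phi)$ is self-adjoint), the inequality in the statement reads simply
$$\frac{\|J(\Phi)\|_1}{d_1} \leq \|\Phi\|_\diamond \leq \left\|\operatorname{Tr}_2 |J(\Phi)|\right\|_\infty.$$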
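Remark 3. The two bounds in (9) are equal iff the PSD matrices $P := \operatorname{Tr}_2 \sqrt{J(\Phi)^* J(\Phi)}$ and $Q := \operatorname{Tr}_2 \sqrt{J(\Phi) J(\Phi)^*}$ are both scalar. Indeed, the lower bound in (9) can be rewritten as
$$\frac{\|J(\Phi)\|_1}{d_1} = \frac{\operatorname{Tr} P}{d_1} = \frac{\operatorname{Tr} Q}{d_1},$$
while $\|P\|_\infty \geq \operatorname{Tr} P / d_1$ and $\|Q\|_\infty \geq \operatorname{Tr} Q / d_1$; hence the two bounds are equal exactly when the spectra of $P$ and $Q$ are flat.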
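Remark 4. The upper bound in (9) can be seen as a strengthening of the inequality $\|\Phi\|_\diamond \leq \|J(\Phi)\|_1$, which already appeared in the literature (e.g. [29, Section 3.4]). Indeed, again in terms of $P := \operatorname{Tr}_2 \sqrt{J(\Phi)^* J(\Phi)}$ and $Q := \operatorname{Tr}_2 \sqrt{J(\Phi) J(\Phi)^*}$, we have $\|P\|_\infty \leq \|P\|_1 = \|J(\Phi)\|_1$ and $\|Q\|_\infty \leq \|Q\|_1 = \|J(\Phi)\|_1$.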
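Both bounds in (9) only require a singular value decomposition of $J(\Phi)$, so they are cheap to evaluate compared with the semidefinite program. A minimal NumPy sketch (the function name `diamond_bounds` is hypothetical): for the identity channel both bounds equal $1$, which is indeed its diamond norm, and for a generic non-Hermitian $J$ the lower bound never exceeds the upper one.

```python
import numpy as np

def partial_trace_2(M, d1, d2):
    """Trace out the second tensor factor of a (d1*d2) x (d1*d2) matrix."""
    return np.einsum("ijkj->ik", M.reshape(d1, d2, d1, d2))

def diamond_bounds(J, d1, d2):
    """Lower and upper bounds (9) on ||Phi||_diamond, computed from J = J(Phi)."""
    U, s, Vh = np.linalg.svd(J)
    left_abs = (U * s) @ U.conj().T          # sqrt(J J^*) = U Sigma U^*
    right_abs = (Vh.conj().T * s) @ Vh       # sqrt(J^* J) = V Sigma V^*
    lower = s.sum() / d1                     # ||J||_1 / d1
    # Partial traces of PSD matrices are PSD: operator norm = largest eigenvalue.
    top = lambda P: np.linalg.eigvalsh(partial_trace_2(P, d1, d2)).max()
    upper = 0.5 * (top(right_abs) + top(left_abs))
    return lower, upper

# Identity channel on C^3: both bounds equal 1 = ||id||_diamond.
d = 3
omega = np.eye(d).reshape(d * d)
print(diamond_bounds(np.outer(omega, omega), d, d))

# A generic (non-Hermitian) J.
rng = np.random.default_rng(1)
d1, d2 = 3, 4
J = rng.standard_normal((d1 * d2,) * 2) + 1j * rng.standard_normal((d1 * d2,) * 2)
lo, up = diamond_bounds(J, d1, d2)
print(lo <= up)   # True
```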

Discriminating random quantum channels


Probability distributions on the set of quantum channels

There are several ways to endow the convex body of quantum channels with probability distributions. In this section, we discuss several possibilities and the relations between them.
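Recall that the Choi-Jamiołkowski isomorphism puts into correspondence a quantum channel $\Phi : M_{d_1}(\mathbb{C}) \to M_{d_2}(\mathbb{C})$ with a bipartite matrix $J(\Phi) \in M_{d_1}(\mathbb{C}) \otimes M_{d_2}(\mathbb{C})$ having the following two properties:
1. $J(\Phi)$ is positive semidefinite;
2. $\operatorname{Tr}_2 J(\Phi) = I_{d_1}$.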
The above two properties correspond, respectively, to the fact that $\Phi$ is completely positive and trace preserving. Hence, it is natural to consider probability measures on quantum channels obtained as the image measures of probabilities on the set of bipartite matrices with the above properties. Given some fixed dimensions $d_1, d_2$ and a parameter $s \geq d_1 d_2$, let $G \in M_{d_1 d_2 \times s}(\mathbb{C})$ be a random matrix having i.i.d. standard complex Gaussian entries; such a matrix is called a Ginibre random matrix. Define then
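$$W := G G^* \in M_{d_1}(\mathbb{C}) \otimes M_{d_2}(\mathbb{C}), \tag{10}$$
$$D := \left((\operatorname{Tr}_2 W)^{-1/2} \otimes I_{d_2}\right) W \left((\operatorname{Tr}_2 W)^{-1/2} \otimes I_{d_2}\right) \in M_{d_1}(\mathbb{C}) \otimes M_{d_2}(\mathbb{C}). \tag{11}$$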


The random matrices $W$ and $D$ are called, respectively, Wishart and partially normalized Wishart. The inverse square root in the definition of $D$ uses the Moore-Penrose convention if $W$ is not invertible; note however that this is almost never the case, since a Wishart matrix with parameter $s$ at least as large as its size is invertible with unit probability. It is for this reason that we do not consider here smaller integer parameters $s$. Note that the matrix $D$ satisfies the two conditions discussed above: it is positive semidefinite and its partial trace over the second tensor factor is the identity:
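$$\operatorname{Tr}_2 D = \operatorname{Tr}_2\left[\left((\operatorname{Tr}_2 W)^{-1/2} \otimes I_{d_2}\right) W \left((\operatorname{Tr}_2 W)^{-1/2} \otimes I_{d_2}\right)\right] = (\operatorname{Tr}_2 W)^{-1/2} (\operatorname{Tr}_2 W) (\operatorname{Tr}_2 W)^{-1/2} = I_{d_1}.$$
Hence, there exists a quantum channel $\Phi_G$ such that $J(\Phi_G) = D$ (note that $D$, and thus $\Phi_G$, are functions of the original Ginibre random matrix $G$).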
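Sampling from the partially normalized Wishart measure follows (10) and (11) directly. In the minimal NumPy sketch below, it is assumed that $\operatorname{Tr}_2 W$ is invertible (which holds almost surely for $s \geq d_1 d_2$, so no Moore-Penrose inverse is needed); the helper names are hypothetical. The two defining properties of a Choi matrix are checked on the output. The scale of the Gaussian entries is irrelevant here, because $D$ is invariant under $G \mapsto cG$.

```python
import numpy as np

def partial_trace_2(M, d1, d2):
    return np.einsum("ijkj->ik", M.reshape(d1, d2, d1, d2))

def inv_sqrt_psd(M):
    """M^{-1/2} for a positive definite Hermitian matrix, via eigh."""
    w, Q = np.linalg.eigh(M)
    return (Q / np.sqrt(w)) @ Q.conj().T

def random_channel_choi(d1, d2, s, rng):
    """Choi matrix D of a channel sampled as in (10)-(11)."""
    G = rng.standard_normal((d1 * d2, s)) + 1j * rng.standard_normal((d1 * d2, s))
    W = G @ G.conj().T
    R = np.kron(inv_sqrt_psd(partial_trace_2(W, d1, d2)), np.eye(d2))
    return R @ W @ R

rng = np.random.default_rng(2)
d1, d2 = 3, 4
D = random_channel_choi(d1, d2, s=d1 * d2, rng=rng)
print(np.allclose(partial_trace_2(D, d1, d2), np.eye(d1)))   # True: trace preserving
print(np.linalg.eigvalsh(D).min() > -1e-10)                  # True: completely positive
```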
Definition 1. The image measure of the standard Gaussian measure through the map $G \mapsto \Phi_G$ defined in (10), (11) and the equation $J(\Phi_G) = D$ is called the partially normalized Wishart measure and is denoted by $\mu^W_{d_1,d_2,s}$.
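Another way of introducing a probability distribution on the set of quantum channels is via the Stinespring dilation theorem [27]: for any channel $\Phi : M_{d_1}(\mathbb{C}) \to M_{d_2}(\mathbb{C})$, there exists, for some given $s \leq d_1 d_2$, an isometry $V : \mathbb{C}^{d_1} \to \mathbb{C}^{d_2} \otimes \mathbb{C}^s$ such that
$$\Phi(X) = \operatorname{Tr}_2(V X V^*). \tag{12}$$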


Definition 2. For any integer parameter $s$, let $\mu^H_{d_1,d_2,s}$ be the image measure of the Haar distribution on isometries $V$ through the map in (12).
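A sample from $\mu^H_{d_1,d_2,s}$ can be produced in the same spirit. The sketch below (NumPy; helper names are illustrative, not from the text) takes a Haar-random isometry from the reduced QR decomposition of a $d_2 s \times d_1$ Ginibre matrix with phase correction, and applies (12), tracing out the environment $\mathbb{C}^s$, which is the second tensor factor.

```python
import numpy as np

def partial_trace_2(M, da, db):
    return np.einsum("ijkj->ik", M.reshape(da, db, da, db))

def haar_isometry(d_in, d_out, rng):
    """Haar-random isometry C^{d_in} -> C^{d_out} (reduced QR with phase correction)."""
    Z = rng.standard_normal((d_out, d_in)) + 1j * rng.standard_normal((d_out, d_in))
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def stinespring_channel(V, d2, s):
    """Phi(X) = Tr_2(V X V^*), the environment C^s being the second factor."""
    return lambda X: partial_trace_2(V @ X @ V.conj().T, d2, s)

rng = np.random.default_rng(3)
d1, d2, s = 2, 3, 4
V = haar_isometry(d1, d2 * s, rng)
phi = stinespring_channel(V, d2, s)
rho = np.array([[0.7, 0.2j], [-0.2j, 0.3]])    # an arbitrary input state
out = phi(rho)
print(np.allclose(V.conj().T @ V, np.eye(d1)))  # True: V is an isometry
print(np.isclose(np.trace(out), 1.0))           # True: trace preserving
```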
Finally, one can consider the Lebesgue measure $\mu^L_{d_1,d_2}$ on the convex body of quantum channels. In this work, we shall however be concerned only with the measure $\mu^W$ coming from partially normalized Wishart matrices.


The (two-parameter) subtracted Marčenko-Pastur distribution

In this section we introduce and study the basic properties of a two-parameter family of probability measures which will appear later in the paper. This family generalizes the symmetrized Marčenko-Pastur distributions from [26], see also [13, 23] for other occurrences of some special cases. Before we start, recall that the Marčenko-Pastur (or free Poisson) distribution of parameter $x > 0$ has density given by [22, Proposition 12.11]
$$d\mathrm{MP}_x = \max(1 - x, 0)\,\delta_0 + \frac{\sqrt{4x - (u - 1 - x)^2}}{2\pi u}\,\mathbf{1}_{[a,b]}(u)\,du,$$
where $a = (\sqrt{x} - 1)^2$ and $b = (\sqrt{x} + 1)^2$.
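The normalization of $\mathrm{MP}_x$ is easy to check numerically. In the pure-Python sketch below (our own quadrature, not from the text), the substitution $u = 1 + x + 2\sqrt{x}\cos\theta$ maps $[a, b]$ onto $[0, \pi]$ and removes the square-root behaviour of the density at the edges; the continuous part should carry mass $\min(x, 1)$, the remaining mass $\max(1 - x, 0)$ sitting in the atom at $0$, and the first moment should equal $x$.

```python
import math

def mp_density(u, x):
    """Absolutely continuous part of the Marcenko-Pastur density MP_x."""
    a, b = (math.sqrt(x) - 1) ** 2, (math.sqrt(x) + 1) ** 2
    if not (a < u < b):
        return 0.0
    return math.sqrt(4 * x - (u - 1 - x) ** 2) / (2 * math.pi * u)

def mp_integral(f, x, n=20_000):
    """Integral of f against the continuous part of MP_x.

    With u = 1 + x + 2 sqrt(x) cos(theta), the integrand becomes
    f(u) * 4x sin(theta)^2 / (2 pi u) on [0, pi]; a midpoint rule is used.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        th = (k + 0.5) * h
        u = 1 + x + 2 * math.sqrt(x) * math.cos(th)
        total += f(u) * 4 * x * math.sin(th) ** 2 / (2 * math.pi * u)
    return total * h

for x in (0.5, 1.0, 2.0):
    mass = mp_integral(lambda u: 1.0, x)   # expected: min(x, 1)
    mean = mp_integral(lambda u: u, x)     # expected: x
    print(x, round(mass, 6), round(mean, 6))

print(mp_density(1.0, 1.0))                # equals sqrt(3) / (2 pi)
```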

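Definition 3. Let $a, b$ be two free random variables having Marčenko-Pastur distributions with respective parameters $x$ and $y$. The distribution of the random variable $a/x - b/y$ is called the subtracted Marčenko-Pastur distribution with parameters $x, y$ and is denoted by $\mathrm{SMP}_{x,y}$. In other words,
$$\mathrm{SMP}_{x,y} = D_{1/x}\mathrm{MP}_x \boxplus D_{-1/y}\mathrm{MP}_y, \tag{13}$$
where $D_c$ denotes the push-forward of a measure by the dilation $u \mapsto cu$ and $\boxplus$ is the free additive convolution.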


We have the following result.

Proposition 5. Let $W_x$ (resp. $W_y$) be two independent Wishart matrices of parameters $(d, s_x)$ (resp. $(d, s_y)$). Assuming that $s_x/d \to x$ and $s_y/d \to y$ for some constants $x, y > 0$, then, almost surely as $d \to \infty$, we have
$$\lim_{d \to \infty} \left\|(x d^2)^{-1} W_x - (y d^2)^{-1} W_y\right\|_1 = \int |u|\, d\mathrm{SMP}_{x,y}(u) =: \Delta(x, y).$$

Proof. The proof follows from standard arguments in random matrix theory, and from the fact
that the Schatten 1-norm is the sum of the singular values, which are the absolute values of the
eigenvalues in the case of self-adjoint matrices.
We gather next some properties of the probability measure $\mathrm{SMP}_{x,y}$. Examples of this distribution are shown in Fig. 1.
Proposition 6. Let $x, y > 0$. Then,
1. If $x + y < 1$, then the probability measure $\mathrm{SMP}_{x,y}$ has exactly one atom, located at $0$, of mass $1 - (x + y)$. If $x + y \geq 1$, then $\mathrm{SMP}_{x,y}$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$.
2. Define
$$U_{x,y}(u) = 9u^2(x + y + 2) - 9u(x - y)(x + y - 1) + 2(x + y - 1)^3,$$
$$T_{x,y}(u) = (x + y - 1)^2 + 3u(y - x + u),$$
$$Y_{x,y}(u) = U_{x,y}(u) + \sqrt{[U_{x,y}(u)]^2 - 4[T_{x,y}(u)]^3}.$$


The support of the absolutely continuous part of $\mathrm{SMP}_{x,y}$ is the set
$$\left\{u : [U_{x,y}(u)]^2 - 4[T_{x,y}(u)]^3 \geq 0\right\}. \tag{15}$$
This set is the union of two intervals if $y \in (0, y_c)$ and it is connected when $y \geq y_c$, with
$$y_c = x + 3(2x)^{2/3} - 6(2x)^{1/3} + 4.$$
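3. On its support, the density of $\mathrm{SMP}_{x,y}$ is given by
$$\frac{d\mathrm{SMP}_{x,y}}{du}(u) = \frac{[Y_{x,y}(u)]^{2/3} - 2^{2/3}\, T_{x,y}(u)}{2^{4/3}\sqrt{3}\,\pi\,|u|\,[Y_{x,y}(u)]^{1/3}}. \tag{16}$$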



Proof. The statement regarding the atoms follows from [4, Theorem 7.4]. The formula for the density and equation (15) come from Stieltjes inversion, see e.g. [22, Lecture 12]. Indeed, since the R-transform of the Marčenko-Pastur distribution $\mathrm{MP}_x$ reads $R_x(z) = x/(1 - z)$, the R-transform of the subtracted measure reads
$$R(z) = \frac{1}{1 - z/x} - \frac{1}{1 + z/y}.$$
The Cauchy transform $G$ of $\mathrm{SMP}_{x,y}$ is the functional inverse of $K(z) = R(z) + 1/z$. To write down the explicit formula for $G$, one has to solve a degree 3 polynomial equation, and we omit here the details.
The statement regarding the number of intervals of the support follows from (15). The inequality is given by a polynomial of degree 6 which factorizes by $u^2$, hence an effective degree 4 polynomial. The nature of the roots of this polynomial is given by the sign of its discriminant, which, after some algebra, is the same as the sign of $y - y_c$, see [35].
In the case where $x = y$, some of the formulas from the result above become simpler (see also [26]). The distribution $\mathrm{SMP}_{x,x}$ is supported between
$$u_\pm = \pm\frac{1}{\sqrt{2}}\sqrt{10x - x^2 + 2 + (x + 4)^{3/2}\sqrt{x}}.$$
Finally, in the case when $x = y = 1$, which corresponds to a flat Hilbert-Schmidt measure on the set of quantum channels, we get that $\Delta(1, 1) = 1/2 + 2/\pi$.
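For $x = y = 1$ one has $U_{1,1}(u) = 36u^2 + 2$ and $T_{1,1}(u) = 1 + 3u^2$, so that $[U_{1,1}]^2 - 4[T_{1,1}]^3 = 108u^2(1 + 11u^2 - u^4)$ and the support is $|u| \leq \sqrt{(11 + 5\sqrt{5})/2} \approx 3.33$. The pure-Python sketch below integrates the density (16) with a midpoint rule (an illustrative check, not the authors' computation); the total mass should be close to $1$, the second moment close to $2$ (the sum of the variances of the two free summands), and the first absolute moment close to $\Delta(1, 1) = 1/2 + 2/\pi \approx 1.1366$.

```python
import math

def smp11_density(u):
    """Density (16) of SMP_{1,1}, with U_{1,1}(u) = 36u^2 + 2 and T_{1,1}(u) = 1 + 3u^2."""
    U = 36 * u * u + 2
    T = 1 + 3 * u * u
    disc = U * U - 4 * T ** 3              # = 108 u^2 (1 + 11u^2 - u^4)
    if disc <= 0:
        return 0.0                         # outside the support (15)
    Y = U + math.sqrt(disc)
    num = Y ** (2 / 3) - 2 ** (2 / 3) * T
    return num / (2 ** (4 / 3) * math.sqrt(3) * math.pi * abs(u) * Y ** (1 / 3))

U_EDGE = math.sqrt((11 + 5 * math.sqrt(5)) / 2)   # edge of the support

def integrate(f, n=100_000):
    """Midpoint rule on [-U_EDGE, U_EDGE] (n even, so u = 0 is never sampled)."""
    h = 2 * U_EDGE / n
    return h * sum(f(-U_EDGE + (k + 0.5) * h) for k in range(n))

mass = integrate(smp11_density)
delta_11 = integrate(lambda u: abs(u) * smp11_density(u))
print(round(mass, 4), round(delta_11, 4), round(0.5 + 2 / math.pi, 4))
```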

Figure 1: Subtracted Marčenko-Pastur distribution for $(x, y) = (1, 1)$ (a), $(1, 2)$ (b), $(0.5, 1)$ (c) and $(0.2, 0.5)$ (d). The red curve is the plot of (16), while the black histogram corresponds to a Monte Carlo simulation.


The asymptotic diamond norm of the difference of two random quantum channels


We state here the main result of the paper. For the proof, see the following two sections, each
providing one of the bounds needed to conclude.
Theorem 7. Let $\Phi$, resp. $\Psi$, be two independent random quantum channels from $\Theta(d_1, d_2)$ having $\mu^W$ distribution with parameters $(d_1, d_2, s_x)$, resp. $(d_1, d_2, s_y)$. Then, almost surely as $d_{1,2} \to \infty$ in such a way that $s_x/(d_1 d_2) \to x$, $s_y/(d_1 d_2) \to y$ (for some positive constants $x, y$), and $d_1 \ll d_2^2$,
$$\lim \|\Phi - \Psi\|_\diamond = \Delta(x, y) = \int |u|\, d\mathrm{SMP}_{x,y}(u).$$

Proof. The proof follows from Theorems 9 and 13, which give the same asymptotic value.
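Theorem 7 can be illustrated with a small Monte Carlo experiment in the spirit of Fig. 2. The sketch below (NumPy; the size $d = 12$, the seed and the helper names are arbitrary choices) samples two independent channels from $\mu^W_{d,d,d^2}$, i.e. $x = y = 1$, and evaluates the lower bound $d_1^{-1}\|J(\Phi - \Psi)\|_1$ and the upper bound $\|\operatorname{Tr}_2 |J(\Phi - \Psi)|\|_\infty$ of Remark 2. At moderate $d$ both should already be in the vicinity of $1/2 + 2/\pi$, up to finite-size corrections.

```python
import numpy as np

def partial_trace_2(M, d1, d2):
    return np.einsum("ijkj->ik", M.reshape(d1, d2, d1, d2))

def random_channel_choi(d1, d2, s, rng):
    """Choi matrix of a channel sampled from mu^W_{d1,d2,s}, following (10)-(11)."""
    G = rng.standard_normal((d1 * d2, s)) + 1j * rng.standard_normal((d1 * d2, s))
    W = G @ G.conj().T
    w, Q = np.linalg.eigh(partial_trace_2(W, d1, d2))
    R = np.kron((Q / np.sqrt(w)) @ Q.conj().T, np.eye(d2))   # (Tr_2 W)^{-1/2} (x) I
    return R @ W @ R

def bounds_hermitian(J, d1, d2):
    """Lower ||J||_1 / d1 and upper ||Tr_2 |J| ||_inf bounds of Remark 2 (J Hermitian)."""
    lam, Q = np.linalg.eigh(J)
    abs_J = (Q * np.abs(lam)) @ Q.conj().T
    lower = np.abs(lam).sum() / d1
    upper = np.linalg.eigvalsh(partial_trace_2(abs_J, d1, d2)).max()
    return lower, upper

rng = np.random.default_rng(4)
d = 12                                # d1 = d2 = d and s = d^2, i.e. x = y = 1
J = random_channel_choi(d, d, d * d, rng) - random_channel_choi(d, d, d * d, rng)
lower, upper = bounds_hermitian(J, d, d)
print(f"lower = {lower:.4f}, upper = {upper:.4f}, limit = {0.5 + 2 / np.pi:.4f}")
```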
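Corollary 8. Combining Theorem 7 with Helstrom's theorem for quantum channels, we get that the probability $p$ of correctly distinguishing two such random quantum channels converges to $1/2 + \Delta(x, y)/4$; in the flat Hilbert-Schmidt case $x = y = 1$,
$$p = \frac{5}{8} + \frac{1}{2\pi}.$$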

Additionally, any maximally entangled state may be used to achieve this value.
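For concreteness, the value in Corollary 8 is a one-line computation; the small sketch below simply evaluates $1/2 + \Delta/4$ at $\Delta = \Delta(1, 1)$ (the function name is illustrative).

```python
import math

def helstrom_success(diamond_distance):
    """Optimal success probability 1/2 + ||Phi - Psi||_diamond / 4 for two equiprobable channels."""
    return 0.5 + diamond_distance / 4

delta_11 = 0.5 + 2 / math.pi       # Delta(1, 1)
p = helstrom_success(delta_11)
print(round(p, 4))                 # 0.7842
```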


The lower bound

In this section we compute the asymptotic value of the lower bound in Theorem 7. Given two random quantum channels $\Phi, \Psi$, we are interested in the asymptotic value of the quantity $d_1^{-1}\|J(\Phi - \Psi)\|_1$.
Theorem 9. Let $\Phi$, resp. $\Psi$, be two independent random quantum channels from $\Theta(d_1, d_2)$ having $\mu^W$ distribution with parameters $(d_1, d_2, s_x)$, resp. $(d_1, d_2, s_y)$. Then, almost surely as $d_{1,2} \to \infty$ in such a way that $s_x/(d_1 d_2) \to x$ and $s_y/(d_1 d_2) \to y$ for some positive constants $x, y$,
$$\lim_{d_{1,2} \to \infty} \frac{1}{d_1}\|J(\Phi - \Psi)\|_1 = \Delta(x, y) = \int |u|\, d\mathrm{SMP}_{x,y}(u).$$
The proof of this result (as well as the proof of Theorem 13) uses in a crucial manner the approximation result for partially normalized Wishart matrices.
Proposition 10. Let $W \in M_{d_1}(\mathbb{C}) \otimes M_{d_2}(\mathbb{C})$ be a random Wishart matrix of parameters $(d_1 d_2, s)$, and consider its partial normalization $D$ as in (11). Then, almost surely as $d_{1,2} \to \infty$ in such a way that $s \sim t d_1 d_2$ for a fixed parameter $t > 0$,
$$\left\|D - (t d_1 d_2^2)^{-1} W\right\|_\infty = O(d_2^{-2}).$$
Note that in the statement above, the matrix $W$ is not normalized; we have
$$\frac{1}{d_1 d_2}\sum_{i=1}^{d_1 d_2} \delta_{\lambda_i((d_1 d_2)^{-1} W)} \to \mathrm{MP}_t,$$
the Marčenko-Pastur distribution of parameter $t$. In other words, $W = G G^*$, where $G$ is a random matrix of size $d_1 d_2 \times s$, having i.i.d. standard complex Gaussian entries.
Let us introduce the random matrices
$$X = (t d_1 d_2^2)^{-1} \operatorname{Tr}_2 W \qquad \text{and} \qquad Y = X^{-1/2} \otimes I_{d_2}.$$
The first observation we make is that the random matrix $X$ is also a (rescaled) Wishart matrix. Indeed, the partial trace operation can be seen, via duality, as a matrix product, so we can write
$$X = \frac{1}{t d_1 d_2^2}\, \tilde{G} \tilde{G}^*,$$
where $\tilde{G}$ is a complex Gaussian matrix of size $d_1 \times d_2 s$; remember that $s$ scales like $t d_1 d_2$. Since, in our model, both $d_1, d_2$ grow to infinity, the behavior of the random matrix $X$ follows from [10].

Lemma 11. As $d_{1,2} \to \infty$, the random matrix $\sqrt{t}\, d_2 (X - I_{d_1})$ converges in moments toward a standard semicircular distribution. Moreover, almost surely, the extremal eigenvalues converge to the edges of the support of the limiting distribution:
$$\sqrt{t}\, d_2\, \lambda_{\min}(X - I_{d_1}) \to -2, \qquad \sqrt{t}\, d_2\, \lambda_{\max}(X - I_{d_1}) \to 2.$$

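Proof. The proof is a direct application of [10, Corollary 2.5 and Theorem 2.7]; we just need to check the normalization factors. In the setting of [10, Section 2], the Wishart matrices are not normalized, so the convergence result deals with the random matrices (here $d = d_1$ and $s = t d_1 d_2^2$)
$$\sqrt{\frac{s}{d}}\left(\frac{\tilde{G}\tilde{G}^*}{s} - I_d\right) = \sqrt{\frac{t d_1 d_2^2}{d_1}}\left(\frac{\tilde{G}\tilde{G}^*}{t d_1 d_2^2} - I_{d_1}\right) = \sqrt{t}\, d_2 (X - I_{d_1}).$$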
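Lemma 11 can be observed numerically. In the NumPy sketch below (sizes and seed are arbitrary choices), the Ginibre entries are normalized so that $\mathbb{E}|g_{ij}|^2 = 1$, which makes $\mathbb{E} X = I_{d_1}$; the rescaled extremal eigenvalues of $\sqrt{t}\, d_2 (X - I_{d_1})$ should approach $\mp 2$, up to finite-size corrections that are still noticeable at these sizes.

```python
import numpy as np

def partial_trace_2(M, d1, d2):
    return np.einsum("ijkj->ik", M.reshape(d1, d2, d1, d2))

rng = np.random.default_rng(5)
t, d1, d2 = 1.0, 20, 20
s = int(round(t * d1 * d2))
# Standard complex Gaussian entries: E|g|^2 = 1
G = (rng.standard_normal((d1 * d2, s)) + 1j * rng.standard_normal((d1 * d2, s))) / np.sqrt(2)
W = G @ G.conj().T

X = partial_trace_2(W, d1, d2) / (t * d1 * d2 ** 2)
scaled = np.sqrt(t) * d2 * (np.linalg.eigvalsh(X) - 1.0)
print(scaled.min(), scaled.max())   # should approach -2 and 2 as d1, d2 grow
```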

We look now for a similar result for the matrix Y ; the result follows by functional calculus.

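Lemma 12. Almost surely as $d_{1,2} \to \infty$, the extremal eigenvalues of the random matrix $\sqrt{t}\, d_2 (Y - I_{d_1 d_2})$ converge respectively to $\mp 1$:
$$\sqrt{t}\, d_2\, \lambda_{\min}(Y - I_{d_1 d_2}) \to -1, \qquad \sqrt{t}\, d_2\, \lambda_{\max}(Y - I_{d_1 d_2}) \to 1.$$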

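Proof. By functional calculus, we have $\lambda_{\max}(Y) = [\lambda_{\min}(X)]^{-1/2}$, so, using the previous lemma, we get
$$\lambda_{\max}(Y) = \left(1 - \frac{2}{\sqrt{t}\, d_2} + o(d_2^{-1})\right)^{-1/2} = 1 + \frac{1}{\sqrt{t}\, d_2} + o(d_2^{-1}),$$
and the conclusion follows. The case of $\lambda_{\min}(Y)$ is similar.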
We have now all the ingredients to prove Proposition 10.
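Proof of Proposition 10. We have
$$\begin{aligned} \left\|D - (t d_1 d_2^2)^{-1} W\right\|_\infty &= (t d_1 d_2^2)^{-1} \|Y W Y - W\|_\infty \\ &= (t d_1 d_2^2)^{-1} \|(Y - I) W Y + W (Y - I)\|_\infty \\ &\leq (t d_1 d_2^2)^{-1} \|Y - I\|_\infty \|W\|_\infty (1 + \|Y\|_\infty) \\ &= t^{-3/2} d_2^{-2} \left[\sqrt{t}\, d_2 \|Y - I\|_\infty\right] \left[(d_1 d_2)^{-1} \|W\|_\infty\right] (1 + \|Y\|_\infty). \end{aligned}$$
Note that, almost surely, the three random factors in the last line above converge respectively to the following finite quantities:
$$\sqrt{t}\, d_2 \|Y - I\|_\infty \to 1, \qquad (d_1 d_2)^{-1} \|W\|_\infty \to (\sqrt{t} + 1)^2, \qquad 1 + \|Y\|_\infty \to 2.$$
The first and the third limit above follow from Lemma 12, while the second one is the Bai-Yin theorem [3, Theorem 2] or [2, Theorem 5.11]. Together with the prefactor $t^{-3/2} d_2^{-2}$, this gives the claimed $O(d_2^{-2})$ bound.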
Let us now prove Theorem 9.
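Proof of Theorem 9. The result follows easily by approximating the partially normalized Wishart matrices with scalar normalizations. By the triangle inequality, with $D_x := J(\Phi)$ and $D_y := J(\Psi)$, we have
$$\begin{aligned} &\left|\frac{1}{d_1}\|D_x - D_y\|_1 - \frac{1}{d_1}\left\|(x d_1 d_2^2)^{-1} W_x - (y d_1 d_2^2)^{-1} W_y\right\|_1\right| \\ &\quad \leq \frac{1}{d_1}\left\|D_x - (x d_1 d_2^2)^{-1} W_x\right\|_1 + \frac{1}{d_1}\left\|D_y - (y d_1 d_2^2)^{-1} W_y\right\|_1 \\ &\quad \leq d_2 \left\|D_x - (x d_1 d_2^2)^{-1} W_x\right\|_\infty + d_2 \left\|D_y - (y d_1 d_2^2)^{-1} W_y\right\|_\infty. \end{aligned}$$
The conclusion follows from Propositions 5 and 10.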

The upper bound

The core technical result of this work consists of deriving the asymptotic value of the upper bound in Theorem 7. Given two random quantum channels $\Phi, \Psi$, we are interested in the asymptotic value of the quantity $\|\operatorname{Tr}_2 |J(\Phi - \Psi)|\|_\infty$.
Theorem 13. Let $\Phi$, resp. $\Psi$, be two independent random quantum channels from $\Theta(d_1, d_2)$ having $\mu^W$ distribution with parameters $(d_1, d_2, s_x)$, resp. $(d_1, d_2, s_y)$. Then, almost surely as $d_{1,2} \to \infty$ in such a way that $s_x/(d_1 d_2) \to x$, $s_y/(d_1 d_2) \to y$ (for some positive constants $x, y$), and $d_1/d_2^2 \to 0$,
$$\lim \left\|\operatorname{Tr}_2 |J(\Phi - \Psi)|\right\|_\infty = \Delta(x, y) = \int |u|\, d\mathrm{SMP}_{x,y}(u).$$

The proof of Theorem 13 is presented at the end of this Section. It is based on the following
lemma which appears in [12], the proof being left to the reader; see also [6, Eq. (5.10)] or [5,
Chapter X].
Lemma 14. For any matrices $A, B$ of size $d$, the following holds:
$$\left\||A| - |B|\right\|_\infty \leq C \log d\, \|A - B\|_\infty,$$
for a universal constant $C$ which does not depend on the dimension $d$.

For the sake of completeness, we give here a proof, relying on a similar estimate for the Schatten classes proved in [12].
Proof. Using [12, Theorem 8], we have, for any $p \in [2, \infty)$:
$$\left\||A| - |B|\right\|_\infty \leq \left\||A| - |B|\right\|_p \leq 4(1 + cp)\|A - B\|_p \leq 4(1 + cp)\, d^{1/p} \|A - B\|_\infty,$$
for some universal constant $c \geq 1$. Choosing $p = \log d$ (so that $d^{1/p} = e$) gives the desired bound, for $d$ large enough. The case of small values of $d$ is obtained by a standard embedding argument.
Proof of Theorem 13. Using the triangle inequality and Lemma 14, we first prove an approximation result (as before, we write $D_x := J(\Phi)$ and $D_y := J(\Psi)$, and we set $\tilde{Z} := (x d_1 d_2^2)^{-1} W_x - (y d_1 d_2^2)^{-1} W_y$):
$$\begin{aligned} \Big|\left\|\operatorname{Tr}_2 |D_x - D_y|\right\|_\infty - \left\|\operatorname{Tr}_2 |\tilde{Z}|\right\|_\infty\Big| &\leq \left\|\operatorname{Tr}_2 |D_x - D_y| - \operatorname{Tr}_2 |\tilde{Z}|\right\|_\infty \\ &= \left\|\operatorname{Tr}_2\left(|D_x - D_y| - |\tilde{Z}|\right)\right\|_\infty \\ &\leq d_2 \left\||D_x - D_y| - |\tilde{Z}|\right\|_\infty \\ &\leq C d_2 \log(d_1 d_2) \left\|(D_x - D_y) - \tilde{Z}\right\|_\infty \\ &\leq C d_2 \log(d_1 d_2) \left(\left\|D_x - (x d_1 d_2^2)^{-1} W_x\right\|_\infty + \left\|D_y - (y d_1 d_2^2)^{-1} W_y\right\|_\infty\right) \\ &= \frac{\log(d_1 d_2)}{d_2}\, O(1) \to 0, \end{aligned}$$
where we have used Proposition 10 and the fact that $d_1 \ll d_2^2$ implies $\log(d_1) \ll d_2$. This proves the approximation result, and we focus now on the simpler case of Wishart matrices. Let us define
$$Z := (x d_1 d_2)^{-1} W_x - (y d_1 d_2)^{-1} W_y, \qquad Z_1 := \operatorname{tr}_2 |Z| = \operatorname{Tr}_2 \left|(x d_1 d_2^2)^{-1} W_x - (y d_1 d_2^2)^{-1} W_y\right|,$$
where $\operatorname{tr}_2 = d_2^{-1} \operatorname{Tr}_2$ is the normalized partial trace.
It follows from [16, Proposition 4.4.9] that the random matrix $Z$ converges almost surely (see Appendix A for the definition of almost sure convergence for a sequence of random matrices) to a non-commutative random variable having distribution $\mathrm{SMP}_{x,y}$, see (13). Moreover, using a standard strong convergence argument [21], the extremal eigenvalues of $Z$ converge almost surely to the extremal points of the support of the limiting probability measure $\mathrm{SMP}_{x,y}$. Hence, the almost sure convergence extends from the traces of the powers of $Z$ to any continuous bounded function (on the support of $\mathrm{SMP}_{x,y}$), in particular to the absolute value, i.e. to $|Z|$. From Proposition 16, the asymptotic spectrum of the random matrix $Z_1$ is flat, with all the eigenvalues being equal to
$$a = \lim_{d_1, d_2 \to \infty} \mathbb{E}\, \frac{\operatorname{Tr}\left|(x d_1 d_2)^{-1} W_x - (y d_1 d_2)^{-1} W_y\right|}{d_1 d_2} = \int |u|\, d\mathrm{SMP}_{x,y}(u),$$
which, by Proposition 5, is equal to $\Delta(x, y)$, finishing the proof.

Remark 15. We think that the condition $d_1 \ll d_2^2$ in the statement can be replaced by a much weaker condition.

Concluding remarks

In this work we analyzed properties of generic quantum channels, concentrating on the case of large system size. Using tools provided by the theory of random matrices and free probability calculus, we showed that the diamond norm of the difference between two random channels asymptotically tends to a constant specified in Theorem 7. For channels corresponding to the simplest case $x = y = 1$, the limit value of the diamond norm of the difference is $\Delta(1, 1) = 1/2 + 2/\pi$. In Fig. 2 we illustrate the convergence of the upper and lower bounds to this value. This statement allows us to quantify the mean distinguishability between two random channels.
To arrive at this result we considered an ensemble of normalized random density matrices $\rho$, acting on a bipartite Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$, and distributed according to the flat (Hilbert-Schmidt) measure. Such matrices can be generated with the help of a complex Ginibre matrix $G$ as $\rho = G G^* / \operatorname{Tr} G G^*$. In the simplest case of square matrices $G$ of order $d = d_1^2$, the average trace distance $\frac{1}{2}\|\rho - \rho_*\|_1$ of a random state $\rho$ from the maximally mixed state $\rho_* = I/d$ tends asymptotically to $3\sqrt{3}/(4\pi)$ [26]. However, analyzing both reduced matrices $\rho_A = \operatorname{Tr}_B \rho$ and $\rho_B = \operatorname{Tr}_A \rho$, we can show that they become close to the maximally mixed state in the sense of the operator norm, so that their smallest and largest eigenvalues asymptotically coincide.
This observation implies that the state $\rho$ can be directly interpreted as a Jamiołkowski state $J$ representing a stochastic map $\Phi$, as its partial trace $\rho_A$ is (asymptotically) proportional to the identity. Furthermore, as it becomes asymptotically equal to the other partial trace $\rho_B$, it follows that a generic quantum channel (stochastic map) becomes unital and thus bistochastic.
The partial trace of a random bipartite state is shown to be close to a multiple of the identity provided the support of the limiting measure characterizing the bipartite state is bounded. In particular, this holds for the family of subtracted Marčenko-Pastur distributions defined in Eq. (13) as a free additive convolution of two rescaled Marčenko-Pastur distributions with different parameters, which determines the density of a difference of two random density matrices. In this way we could establish the upper bound for the average diamond norm between two channels and show that it asymptotically converges to the lower bound $\Delta(x, y)$ given in Theorem 9. The results obtained can be understood as an application of the measure concentration paradigm [1] to the space of quantum channels.
Acknowledgments. I.N. would like to thank Anna Jenčová and David Reeb for very insightful discussions regarding the diamond norm, which led to several improvements and simplifications of the proof of Proposition 1. I.N.'s research has been supported by the ANR projects RMTQIT ANR-12-IS01-0001-01 and StoQ ANR-14-CE25-0003-01, as well as by a von Humboldt fellowship.

Figure 2: The convergence of upper (green circles) and lower (blue triangles) bounds on the distance (vertical axis: diamond norm) between two random quantum channels sampled from the Hilbert-Schmidt distribution ($d_1 = d_2 = d$). The results were obtained via Monte Carlo simulation with 100 samples for each data point.

On the partial traces of unitarily invariant random matrices

In this section we show a general result about unitarily invariant random matrices: under some
technical convergence assumptions, the partial trace of a unitarily invariant random matrix is
flat, i.e. it is close in norm to its average.
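Recall that the normalized trace functional can be extended to arbitrary permutations as follows: for a matrix $X \in M_d(\mathbb{C})$ and a permutation $\alpha \in S_p$, write
$$\operatorname{tr}_\alpha(X) := \prod_{c \in \alpha} \frac{1}{d} \operatorname{Tr}\left(X^{|c|}\right),$$
where the product runs over the cycles $c$ of $\alpha$ and $|c|$ denotes the length of the cycle $c$.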
Recall the following definition from [16, Section 4.3].
Definition 4. A sequence of random matrices $X_d \in M_d(\mathbb{C})$ is said to have almost surely limit distribution $\mu$ if
$$\forall p \geq 1, \qquad \text{a.s.} \lim_{d \to \infty} \operatorname{tr}(X_d^p) = \int x^p\, d\mu(x).$$

Proposition 16. Consider a sequence of Hermitian random matrices $A_d \in M_{d_1(d)}(\mathbb{C}) \otimes M_{d_2(d)}(\mathbb{C})$ and assume that:
1. Both functions $d_{1,2}(d)$ grow to infinity, in such a way that $d_1/d_2^2 \to 0$.
2. The matrices $A_d$ are unitarily invariant.
3. The family $(A_d)$ has almost surely limit distribution $\mu$, for some compactly supported probability measure $\mu$.
Then, the normalized partial traces $B_d := d_2^{-1} [\mathrm{id} \otimes \operatorname{Tr}](A_d)$ converge almost surely to a multiple of the identity matrix:
$$\text{a.s.} \lim_{d \to \infty} \left\|B_d - a I_{d_1(d)}\right\|_\infty = 0,$$
where $a$ is the average of $\mu$:
$$a := \int x\, d\mu(x).$$
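Proposition 16 can be illustrated by fixing a spectrum and conjugating it by a Haar unitary. In the NumPy sketch below, the spectrum (evenly spaced on $[0, 2]$, so that $a = 1$) is an arbitrary compactly supported choice, not taken from the text, and the helper names are hypothetical. As $d_2$ grows with $d_1$ fixed, the eigenvalues of $B = d_2^{-1} \operatorname{Tr}_2 A$ should cluster around $a$, while their mean equals $a$ exactly.

```python
import numpy as np

def partial_trace_2(M, d1, d2):
    return np.einsum("ijkj->ik", M.reshape(d1, d2, d1, d2))

def haar_unitary(n, rng):
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def partial_trace_spread(d1, d2, rng):
    """Sample A = U diag(lambda) U^* and return (spread, mean) of the eigenvalues
    of B = Tr_2(A) / d2, where spread = lambda_max(B) - lambda_min(B)."""
    n = d1 * d2
    lam = np.linspace(0.0, 2.0, n)           # fixed spectrum with average a = 1
    U = haar_unitary(n, rng)
    A = (U * lam) @ U.conj().T
    ev = np.linalg.eigvalsh(partial_trace_2(A, d1, d2) / d2)
    return ev.max() - ev.min(), ev.mean()

rng = np.random.default_rng(6)
for d2 in (4, 16, 64):
    spread, mean = partial_trace_spread(4, d2, rng)
    print(d2, round(spread, 4), round(mean, 6))
```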


Proof. In the proof, we shall drop the parameter $d \to \infty$, but the reader should remember that the matrix dimensions $d_{1,2}$ are functions of $d$ and that all the matrices appearing are indexed by $d$. To conclude, it is enough to show that
$$\mathbb{P}\left(\lim_{d \to \infty} \lambda_{\max}(B_d) = a\right) = 1,$$
since the statement for the smallest eigenvalue follows in a similar manner. Let us denote by
$$b := \frac{1}{d_1}\sum_{i=1}^{d_1} \lambda_i(B) = \operatorname{tr}_{(1)}(B),$$
$$v := \frac{1}{d_1}\sum_{i=1}^{d_1} (\lambda_i(B) - b)^2 = \frac{1}{d_1}\sum_{i=1}^{d_1} \lambda_i(B)^2 - \left(\frac{1}{d_1}\sum_{i=1}^{d_1} \lambda_i(B)\right)^2 = \operatorname{tr}_{(12)}(B) - [\operatorname{tr}_{(1)}(B)]^2$$
the average eigenvalue and, respectively, the variance of the eigenvalues of $B$; these are real random variables (actually, sequences of random variables indexed by $d$). By Chebyshev's inequality, we have
$$\lambda_{\max}(B) \leq b + \sqrt{v d_1}.$$
Note that one could replace the $d_1$ factor in the inequality above by $d_1 - 1$ by using Samuelson's inequality [31, 32], but the weaker version is enough for us.
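The Chebyshev-type bound above is a deterministic statement about any list of reals: since $(\lambda_{\max} - b)^2 \leq \sum_i (\lambda_i - b)^2 = d_1 v$, one gets $\lambda_{\max} \leq b + \sqrt{d_1 v}$, and Samuelson's inequality sharpens $d_1$ to $d_1 - 1$. A small pure-Python check (the function name is illustrative); a single outlier shows that Samuelson's bound can be attained.

```python
import math
import random

def chebyshev_samuelson(values):
    """Return (max, b + sqrt(n v), b + sqrt((n - 1) v)) for a list of reals,
    where b is the mean and v the (population) variance."""
    n = len(values)
    b = sum(values) / n
    v = sum((x - b) ** 2 for x in values) / n
    return max(values), b + math.sqrt(n * v), b + math.sqrt((n - 1) * v)

random.seed(0)
vals = [random.gauss(0.0, 1.0) for _ in range(10)]
m, cheb, sam = chebyshev_samuelson(vals)
print(m <= sam <= cheb)                        # True

# Samuelson's bound is attained by a single outlier:
print(chebyshev_samuelson([1.0] + [0.0] * 9))
```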
We shall now prove that $b \to a$ almost surely and later that $d_1 v \to 0$ almost surely, which is what we need to conclude. To do so, we shall use the Weingarten formula [11, 33]. In the graphical formalism for the Weingarten calculus introduced in [8], the expectation value of an expression involving a random Haar unitary matrix can be computed as a sum over diagrams indexed by permutation matrices; we refer the reader to [8] or [9] for the details.
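Using the unitary invariance of $A$, we write $A = U \operatorname{diag}(\lambda) U^*$, for a Haar-distributed random unitary matrix $U \in \mathcal{U}(d_1 d_2)$ and some (random) eigenvalue vector $\lambda$. Note that traces of powers of $A$ depend only on $\lambda$, so we shall write $\operatorname{tr}_\alpha(\lambda) := \operatorname{tr}_\alpha(A)$. We apply the Weingarten formula to a general moment of $B$, given by a permutation $\alpha \in S_p$:
$$\mathbb{E}_U \operatorname{tr}_\alpha(B) = \mathbb{E}_U \prod_{i=1}^{\#\alpha} \frac{1}{d_1} \operatorname{Tr} B^{|c_i|},$$
where $c_1, \ldots, c_{\#\alpha}$ are the cycles of $\alpha \in S_p$, and $\mathbb{E}_U$ denotes the conditional expectation with respect to the Haar random unitary matrix $U$. From the graphical representation of the Weingarten formula [8, Theorem 4.1], we can compute the conditional expectation over $U$ (note that below, the vector of eigenvalues $\lambda$ is still random):
$$\mathbb{E}_U \operatorname{tr}_\alpha(B) = d_1^{-\#\alpha} d_2^{-p} \sum_{\sigma, \tau \in S_p} d_1^{\#(\alpha^{-1}\sigma)}\, d_2^{\#\sigma}\, (d_1 d_2)^{\#\tau}\, \operatorname{tr}_\tau(\lambda)\, \mathrm{Wg}_{d_1 d_2}(\sigma \tau^{-1}). \tag{19}$$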



Above, $\mathrm{Wg}$ is the Weingarten function [11] and $\operatorname{tr}_\tau(\lambda)$ is the moment of the diagonal matrix $\operatorname{diag}(\lambda)$ corresponding to the permutation $\tau$. The combinatorial factors $d_1^{\#(\alpha^{-1}\sigma)}$ and $d_2^{\#\sigma}$ come from the initial wirings of the boxes respective to the vector spaces of dimensions $d_1$ (initial wiring given by $\alpha$) and $d_2$ (initial wiring given by the identity permutation), see Figure 3. The pre-factors $d_1^{-\#\alpha} d_2^{-p}$ contain the normalization from the (partial) traces. Finally, the (random) factors $\operatorname{tr}_\tau(\lambda)$ are the normalized power sums of $\lambda$:
$$\operatorname{tr}_\tau(\lambda) = \prod_{i=1}^{\#\tau} \frac{1}{d_1 d_2} \sum_{j=1}^{d_1 d_2} \lambda_j^{|w_i|},$$


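where $w_1, \ldots, w_{\#\tau}$ are the cycles of $\tau$. Recall that we have assumed almost sure convergence for the sequence $(A_d)$ (and, thus, for $(\lambda_d)$):
$$\text{a.s.} \lim_{d \to \infty} \operatorname{tr}_\alpha(\lambda) = \prod_{i=1}^{\#\alpha} \int x^{|w_i|}\, d\mu(x) =: m_\alpha(\mu), \tag{20}$$
where now $w_1, \ldots, w_{\#\alpha}$ are the cycles of $\alpha$.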




Figure 3: The $i$-th group in the diagram corresponding to $m_\alpha(B)$.

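As a first application of the Weingarten formula (19), let us find the distribution of the random variable $b = \operatorname{Tr}(B)/d_1$. Obviously, since $\mathrm{Wg}_{d_1 d_2}$ on $S_1$ equals $1/(d_1 d_2)$,
$$\mathbb{E}_U b = \mathbb{E}_U \operatorname{tr}_{(1)}(B) = d_1^{-1} d_2^{-1} \cdot d_1 \cdot d_2 \cdot (d_1 d_2) \operatorname{tr}_{(1)}(\lambda) \cdot \frac{1}{d_1 d_2} = \operatorname{tr}_{(1)}(\lambda).$$
Actually, $b$ does not depend on the random unitary matrix $U$, since
$$b = \frac{1}{d_1} \operatorname{Tr}(B) = \frac{1}{d_1 d_2} \operatorname{Tr}(A) = \frac{1}{d_1 d_2} \operatorname{Tr}(U \operatorname{diag}(\lambda) U^*) = \frac{1}{d_1 d_2} \sum_{i=1}^{d_1 d_2} \lambda_i = \operatorname{tr}_{(1)}(\lambda). \tag{21}$$
From the hypothesis (20) (with $\alpha = (1)$), we have that, almost surely as $d \to \infty$, the random variable $b$ converges to the scalar $a = m_{(1)}(\mu)$.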
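Let us now move on to the variance $v$ of the eigenvalues. First, we compute its expectation $\mathbb{E}_U v = \mathbb{E}_U \operatorname{tr}_{(12)}(B) - \mathbb{E}_U \operatorname{tr}_{(1)(2)}(B)$. We apply now the Weingarten formula (19) for $\mathbb{E}_U \operatorname{tr}_{(12)}(B)$; the sum has $2!^2 = 4$ terms, which we compute below, using $\mathrm{Wg}_n((1)(2)) = 1/(n^2 - 1)$ and $\mathrm{Wg}_n((12)) = -1/(n(n^2 - 1))$ with $n = d_1 d_2$:
$$\sigma = \tau = (1)(2): \quad T_1 = \frac{d_1^2 d_2^2\, \operatorname{tr}_{(1)(2)}(\lambda)}{d_1^2 d_2^2 - 1},$$
$$\sigma = (1)(2),\ \tau = (12): \quad T_2 = -\frac{\operatorname{tr}_{(12)}(\lambda)}{d_1^2 d_2^2 - 1},$$
$$\sigma = (12),\ \tau = (1)(2): \quad T_3 = -\frac{d_1^2\, \operatorname{tr}_{(1)(2)}(\lambda)}{d_1^2 d_2^2 - 1},$$
$$\sigma = \tau = (12): \quad T_4 = \frac{d_1^2\, \operatorname{tr}_{(12)}(\lambda)}{d_1^2 d_2^2 - 1}.$$
Combining the expressions above with (21), which gives $\mathbb{E}_U \operatorname{tr}_{(1)(2)}(B) = b^2 = \operatorname{tr}_{(1)(2)}(\lambda)$, we get
$$\mathbb{E}_U v = \frac{d_1^2 - 1}{d_1^2 d_2^2 - 1}\left(\operatorname{tr}_{(12)}(\lambda) - \operatorname{tr}_{(1)(2)}(\lambda)\right).$$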

Using the hypothesis (20), we have thus, as $d_{1,2} \to \infty$,
$$\mathbb{E} v = (1 + o(1))\, d_2^{-2} \left(m_{(12)}(\mu) - m_{(1)(2)}(\mu)\right).$$
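The four-term computation above can be automated. The pure-Python sketch below evaluates formula (19) exactly with rational arithmetic, for $p \leq 2$ only, using the closed forms of the unitary Weingarten function on $S_1$ and $S_2$ quoted above; the moments $\operatorname{tr}_\tau(\lambda)$ are treated as free parameters, and the function names are illustrative. It recovers $\mathbb{E}_U b = \operatorname{tr}_{(1)}(\lambda)$ and the formula for $\mathbb{E}_U v$.

```python
from fractions import Fraction
from itertools import permutations

def cycle_type(p):
    """Sorted tuple of cycle lengths of a permutation given as a tuple p[i]."""
    seen, lengths = set(), []
    for i in range(len(p)):
        if i not in seen:
            length, j = 0, i
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

def compose(p, q):
    """(p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def weingarten(perm, n):
    """Unitary Weingarten function Wg_n on S_1 and S_2 (closed forms)."""
    if len(perm) == 1:
        return Fraction(1, n)
    if cycle_type(perm) == (1, 1):
        return Fraction(1, n * n - 1)
    return Fraction(-1, n * (n * n - 1))

def expected_tr_alpha(alpha, d1, d2, tr_lam):
    """E_U tr_alpha(B) from formula (19); tr_lam maps a cycle type to tr_tau(lambda)."""
    p, n = len(alpha), d1 * d2
    total = Fraction(0)
    for sigma in permutations(range(p)):
        for tau in permutations(range(p)):
            total += (Fraction(d1) ** len(cycle_type(compose(inverse(alpha), sigma)))
                      * Fraction(d2) ** len(cycle_type(sigma))
                      * Fraction(n) ** len(cycle_type(tau))
                      * tr_lam[cycle_type(tau)]
                      * weingarten(compose(sigma, inverse(tau)), n))
    return total / (Fraction(d1) ** len(cycle_type(alpha)) * Fraction(d2) ** p)

# Example: d1 = 3, d2 = 5 and arbitrary values of tr_(1)(lambda), tr_(12)(lambda)
d1, d2 = 3, 5
t1, t2 = Fraction(1), Fraction(5, 2)
tr_lam = {(1,): t1, (1, 1): t1 * t1, (2,): t2}
E_v = expected_tr_alpha((1, 0), d1, d2, tr_lam) - t1 * t1   # E_U tr_(12)(B) - b^2
print(E_v == Fraction(d1**2 - 1, d1**2 * d2**2 - 1) * (t2 - t1 * t1))   # True
```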

Let us now proceed and estimate the variance of $v$; more precisely, let us compute $\mathbb{E}(v^2)$. As before, we shall compute the expectation in two steps: first with respect to the random Haar unitary matrix $U$, and then, using our assumption (20), with respect to $\lambda$, in the asymptotic limit. To perform the unitary integration, note that the Weingarten sum is indexed by a couple $(\sigma, \tau) \in S_4^2$, so it contains $4!^2 = 576$ terms, see [35]. In Appendix B we have computed the variance of $v$ with the use of symmetry arguments. The result, to first order, reads
$$\mathbb{E}_U(v^2) - (\mathbb{E}_U v)^2 = (1 + o(1))\, 2 d_1^{-2} d_2^{-4} \left[m_{(12)}(\mu) - m_{(1)(2)}(\mu)\right]^2.$$
Taking the expectation over $\lambda$ and the limit (we are allowed to, by dominated convergence), we obtain
$$\operatorname{Var}(v) = (1 + o(1))\, 2 d_1^{-2} d_2^{-4} \left[m_{(12)}(\mu) - m_{(1)(2)}(\mu)\right]^2.$$
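We now put all the ingredients together:
$$\mathbb{P}\left(\sqrt{d_1 v} \geq \varepsilon\right) = \mathbb{P}\left(v \geq \varepsilon^2 d_1^{-1}\right) \leq \frac{\operatorname{Var}(v)}{\left[\varepsilon^2 d_1^{-1} - \mathbb{E} v\right]^2} = \frac{(1 + o(1))\, C' d_1^{-2} d_2^{-4}}{\left[\varepsilon^2 d_1^{-1} - (1 + o(1))\, C d_2^{-2}\right]^2},$$
where $C, C'$ are non-negative constants depending on the limiting measure $\mu$. Using $d_1 \ll d_2^2$, the dominating term in the denominator above is $\varepsilon^2 d_1^{-1}$, and thus we have:
$$\mathbb{P}\left(\sqrt{d_1 v} \geq \varepsilon\right) \lesssim \frac{C'}{\varepsilon^4}\, d_2^{-4}.$$
Since the series $\sum_d d_2(d)^{-4}$ is summable, we obtain the announced almost sure convergence by the Borel-Cantelli lemma, finishing the proof.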

Calculation of the variance

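We recall that $A \in \mathcal{M}_{d_1}(\mathbb{C}) \otimes \mathcal{M}_{d_2}(\mathbb{C})$ and $B = d_2^{-1}[\mathrm{id} \otimes \operatorname{Tr}](A)$. Because we assume that $A$ has a unitarily invariant distribution, we can write
$$A = U \operatorname{diag}(\lambda) U^* = \sum_{i=0}^{d_1 d_2 - 1} \lambda_i |U_i\rangle\langle U_i|, \qquad B = d_2^{-1} \operatorname{Tr}_2 A = d_2^{-1} \sum_{i=0}^{d_1 d_2 - 1} \lambda_i \operatorname{Tr}_2 |U_i\rangle\langle U_i|,$$
where $|U_i\rangle = U|i\rangle$ is the $i$-th column of the matrix $U$. We denote $\rho_i = \operatorname{Tr}_2 |U_i\rangle\langle U_i|$ and consider the mixed moments, computed in Appendix C,
$$M(i, j, k, l) = \mathbb{E} \operatorname{Tr}(\rho_i \rho_j) \operatorname{Tr}(\rho_k \rho_l),$$
and the symmetric mixed moments
$$SM(i, j, k, l) = \mathbb{E} \operatorname{Tr}(\rho_i \rho_j) \operatorname{Tr}(\rho_k \rho_l) - \mathbb{E} \operatorname{Tr}(\rho_i \rho_j)\, \mathbb{E} \operatorname{Tr}(\rho_k \rho_l).$$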
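The mixed moments can also be estimated by direct sampling, which gives an independent check of the exact formulas in Appendix C. The sketch below is an illustration under stated assumptions:

- it builds the first columns of a Haar unitary by Gram-Schmidt on complex Gaussian vectors;
- it uses the basis ordering $|a\rangle\otimes|b\rangle \mapsto a d_2 + b$ for the partial trace $\operatorname{Tr}_2$;
- it relies only on the Python standard library.

All function names are hypothetical.

```python
import math
import random


def haar_columns(n, k, rng):
    """First k columns of an n x n Haar-random unitary.

    Gram-Schmidt applied to i.i.d. standard complex Gaussian vectors yields
    columns distributed as those of a Haar unitary.
    """
    cols = []
    for _ in range(k):
        v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        for c in cols:
            overlap = sum(ci.conjugate() * vi for ci, vi in zip(c, v))  # <c|v>
            v = [vi - overlap * ci for ci, vi in zip(c, v)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        cols.append([x / norm for x in v])
    return cols


def partial_trace_2(u, d1, d2):
    """rho = Tr_2 |u><u| for u in C^{d1} (x) C^{d2}, basis index a*d2 + b."""
    return [[sum(u[a * d2 + b] * u[c * d2 + b].conjugate() for b in range(d2))
             for c in range(d1)] for a in range(d1)]


def tr_prod(r, s):
    """Tr(r s) for two Hermitian matrices stored as nested lists (real part)."""
    n = len(r)
    return sum(r[a][c] * s[c][a] for a in range(n) for c in range(n)).real


def mixed_moment_mc(idx, d1, d2, samples, seed=0):
    """Monte Carlo estimate of M(i, j, k, l) = E Tr(rho_i rho_j) Tr(rho_k rho_l)."""
    rng = random.Random(seed)
    i, j, k, l = idx
    total = 0.0
    for _ in range(samples):
        cols = haar_columns(d1 * d2, max(idx) + 1, rng)
        rhos = [partial_trace_2(u, d1, d2) for u in cols]
        total += tr_prod(rhos[i], rhos[j]) * tr_prod(rhos[k], rhos[l])
    return total / samples
```

For example, `mixed_moment_mc((0, 0, 0, 0), 2, 2, 20000)` should come out close to $23/35 \approx 0.657$, the value of $M(0,0,0,0)$ at $d_1 = d_2 = 2$ given by Lemma 18.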


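Proposition 17. Let $v = \operatorname{tr}(B^2) - (\operatorname{tr} B)^2$. Then
$$\operatorname{Var}(v) = \frac{2(\kappa_1^2 - \kappa_2)^2}{d_1^2 d_2^4}(1 + o(1))$$
as $d_1, d_2 \to \infty$, where $\kappa_k = \frac{1}{d_1 d_2}\sum_{i=0}^{d_1 d_2 - 1} \lambda_i^k$.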


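Direct computations using the symmetric moments $SM$ give
$$\begin{aligned}
\operatorname{Var}(v) ={}& \frac{1}{d_1^2 d_2^4}\Big[(d_1 d_2)^4 \kappa_1^4\, SM(0,1,2,3)\\
&+ 2(d_1 d_2)^3 \kappa_2 \kappa_1^2 \big(SM(0,0,1,2) + 2SM(0,1,0,2) - 3SM(0,1,2,3)\big)\\
&+ 4(d_1 d_2)^2 \kappa_3 \kappa_1 \big(SM(0,0,0,1) - SM(0,0,1,2) - 2SM(0,1,0,2) + 2SM(0,1,2,3)\big)\\
&+ (d_1 d_2)^2 \kappa_2^2 \big(SM(0,0,1,1) - 2SM(0,0,1,2) + 2SM(0,1,0,1) - 4SM(0,1,0,2) + 3SM(0,1,2,3)\big)\\
&+ d_1 d_2\, \kappa_4 \big(SM(0,0,0,0) - 4SM(0,0,0,1) - SM(0,0,1,1) + 4SM(0,0,1,2) - 2SM(0,1,0,1)\\
&\qquad + 8SM(0,1,0,2) - 6SM(0,1,2,3)\big)\Big]\\
={}& \frac{2(d_1^2 - 1)(d_2^2 - 1)}{d_2^2 (d_1^2 d_2^2 - 1)^2 (d_1^4 d_2^4 - 13 d_1^2 d_2^2 + 36)}\Big[d_1^4 d_2^4 (\kappa_1^2 - \kappa_2)^2\\
&+ d_1^2 d_2^2 \big(11\kappa_1^4 - 22\kappa_2\kappa_1^2 + 20\kappa_3\kappa_1 - 4\kappa_2^2 - 5\kappa_4\big) + 5\big(3\kappa_2^2 - 4\kappa_1\kappa_3 + \kappa_4\big)\Big].
\end{aligned}$$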

Hence, as $d_1, d_2 \to \infty$,
$$\operatorname{Var}(v) = \frac{2(\kappa_1^2 - \kappa_2)^2}{d_1^2 d_2^4}(1 + o(1)).$$
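Two checks on the proof above can be run with exact arithmetic; the function names are hypothetical and the closed form is transcribed from the display above.

1. Each $SM$ coefficient collects $\lambda_i\lambda_j\lambda_k\lambda_l$ over the index tuples with one coincidence pattern. Inclusion-exclusion rewrites such sums through the power sums $p_m = \sum_i \lambda_i^m = d_1 d_2 \kappa_m$. The sketch brute-forces two patterns:
   - all indices distinct, which gives the coefficient of $SM(0,1,2,3)$;
   - $i = j$ with $i, k, l$ distinct, which gives that of $SM(0,0,1,2)$ up to a factor $2$.
2. It evaluates the closed form. For a flat spectrum, $A$ is a multiple of the identity, so $v = 0$ and the variance must vanish. For large dimensions, the ratio to $2(\kappa_1^2 - \kappa_2)^2/(d_1^2 d_2^4)$ should approach $1$.

```python
from fractions import Fraction
from itertools import product


def pattern_sums_bruteforce(x):
    """Brute-force sums of x_i x_j x_k x_l over two index-coincidence patterns."""
    n = len(x)
    all_distinct = 0
    first_two_equal = 0          # i = j, and i, k, l pairwise distinct
    for i, j, k, l in product(range(n), repeat=4):
        if len({i, j, k, l}) == 4:
            all_distinct += x[i] * x[j] * x[k] * x[l]
        if i == j and len({i, k, l}) == 3:
            first_two_equal += x[i] * x[j] * x[k] * x[l]
    return all_distinct, first_two_equal


def pattern_sums_power(x):
    """The same two sums expressed with power sums p_m = sum_i x_i^m."""
    p1, p2, p3, p4 = (sum(v**m for v in x) for m in range(1, 5))
    all_distinct = p1**4 - 6 * p2 * p1**2 + 3 * p2**2 + 8 * p1 * p3 - 6 * p4
    first_two_equal = p2 * p1**2 - p2**2 - 2 * p1 * p3 + 2 * p4
    return all_distinct, first_two_equal


def var_v_closed(d1, d2, k1, k2, k3, k4):
    """Closed-form Var(v) in the normalized moments kappa_1..kappa_4."""
    D2 = d1**2 * d2**2
    pref = Fraction(2 * (d1**2 - 1) * (d2**2 - 1),
                    d2**2 * (D2 - 1)**2 * (D2**2 - 13 * D2 + 36))
    bracket = (D2**2 * (k1**2 - k2)**2
               + D2 * (11 * k1**4 - 22 * k2 * k1**2 + 20 * k3 * k1 - 4 * k2**2 - 5 * k4)
               + 5 * (3 * k2**2 - 4 * k1 * k3 + k4))
    return pref * bracket
```

The brute force is only practical for short lists, which is all an identity check needs.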

Mixed moments calculation

Lemma 18. The mixed moments are given by the following formulas, which, by symmetry, cover all possible cases:
$$M(0,0,0,0) = \frac{d_2 d_1^3 + 2(d_2^2 + 2)d_1^2 + d_2(d_2^2 + 10)d_1 + 4d_2^2 + 2}{(d_1 d_2 + 1)(d_1 d_2 + 2)(d_1 d_2 + 3)},$$
$$M(0,0,0,1) = \frac{(d_2^2 - 1)\big(d_1(d_1 + d_2)(d_1 d_2 + 4) + 2\big)}{(d_1 d_2 - 1)(d_1 d_2 + 1)(d_1 d_2 + 2)(d_1 d_2 + 3)},$$
$$M(0,0,1,1) = \frac{\big(d_1 d_2(d_1 d_2 + 2) - 4\big)(d_1 + d_2)^2 + 4}{d_1 d_2 (d_1 d_2 - 1)(d_1 d_2 + 2)(d_1 d_2 + 3)},$$
$$M(0,0,1,2) = \frac{(d_2^2 - 1)\big(d_1(d_1 + d_2)(d_1 d_2(d_1 d_2 + 4) + 2) - 2\big)}{d_1 d_2 (d_1 d_2 - 1)(d_1 d_2 + 1)(d_1 d_2 + 2)(d_1 d_2 + 3)},$$
$$M(0,1,0,1) = \frac{(d_2^2 - 1)\big(d_1(6d_2 + d_1(d_2^2(d_1 d_2 + 5) - 2)) + 2\big)}{d_1 d_2 (d_1 d_2 - 1)(d_1 d_2 + 1)(d_1 d_2 + 2)(d_1 d_2 + 3)},$$
$$M(0,1,0,2) = \frac{(d_2^2 - 1)\Big(d_1\big(d_1\big(d_1 d_2(3d_2^2 + d_1(d_2^2 - 1)d_2 - 4) - 3d_2^2 + 2\big) - 8d_2\big) - 2\Big)}{d_1 d_2 (d_1 d_2 - 2)(d_1 d_2 - 1)(d_1 d_2 + 1)(d_1 d_2 + 2)(d_1 d_2 + 3)},$$
$$M(0,1,2,3) = \frac{(d_2^2 - 1)\big(d_2^2(d_2^2 - 1)d_1^4 + 2(7 - 6d_2^2)d_1^2 + 22\big)}{d_1^2 d_2^2 (d_1^2 d_2^2 - 7)^2 - 36}.$$
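The formulas of Lemma 18 can be sanity-checked exactly. The sketch below uses hypothetical helper names and transcribes the formulas as written above. It checks three properties:

- For $d_1 = 1$, every $\rho_i$ is the scalar $1$, so every mixed moment equals $1$.
- Since $\sum_j \rho_j = d_2 \mathbf{1}_{d_1}$ is deterministic, $\sum_j \operatorname{Tr}(\rho_i\rho_j) = d_2$ has zero covariance with everything. This gives linear relations among the $SM$ values.
- For $d_1 = d_2 = 2$, the $SM$ combination in the proof of Proposition 17 reproduces the closed form for $\operatorname{Var}(v)$.

```python
from fractions import Fraction


def lemma18(d1, d2):
    """Lemma 18 as exact fractions, keyed by the index pattern (i, j, k, l)."""
    a, b, D = d1, d2, d1 * d2
    P = (D - 1) * (D + 1) * (D + 2) * (D + 3)
    return {
        (0, 0, 0, 0): Fraction(b * a**3 + 2 * (b**2 + 2) * a**2 + b * (b**2 + 10) * a
                               + 4 * b**2 + 2, (D + 1) * (D + 2) * (D + 3)),
        (0, 0, 0, 1): Fraction((b**2 - 1) * (a * (a + b) * (D + 4) + 2), P),
        (0, 0, 1, 1): Fraction((D * (D + 2) - 4) * (a + b)**2 + 4,
                               D * (D - 1) * (D + 2) * (D + 3)),
        (0, 0, 1, 2): Fraction((b**2 - 1) * (a * (a + b) * (D * (D + 4) + 2) - 2), D * P),
        (0, 1, 0, 1): Fraction((b**2 - 1) * (a * (6 * b + a * (b**2 * (D + 5) - 2)) + 2),
                               D * P),
        (0, 1, 0, 2): Fraction((b**2 - 1) * (a * (a * (D * (3 * b**2 + a * (b**2 - 1) * b - 4)
                                                     - 3 * b**2 + 2) - 8 * b) - 2),
                               D * (D - 2) * P),
        (0, 1, 2, 3): Fraction((b**2 - 1) * (b**2 * (b**2 - 1) * a**4
                                             + 2 * (7 - 6 * b**2) * a**2 + 22),
                               D**2 * (D**2 - 7)**2 - 36),
    }


def centered(d1, d2):
    """SM(i,j,k,l) = M(i,j,k,l) - E Tr(rho_i rho_j) E Tr(rho_k rho_l)."""
    D = d1 * d2
    e_same = Fraction(d1 + d2, D + 1)            # E Tr rho_0^2
    e_diff = (d2 - e_same) / (D - 1)             # E Tr(rho_0 rho_1), by symmetrization

    def e(i, j):
        return e_same if i == j else e_diff

    return {(i, j, k, l): m - e(i, j) * e(k, l)
            for (i, j, k, l), m in lemma18(d1, d2).items()}


def covariance_constraints(d1, d2):
    """Covariances with sum_j Tr(rho_i rho_j) = d2; all of them must vanish."""
    S, D = centered(d1, d2), d1 * d2
    return [S[0, 0, 0, 0] + (D - 1) * S[0, 0, 0, 1],
            S[0, 0, 0, 1] + S[0, 0, 1, 1] + (D - 2) * S[0, 0, 1, 2],
            S[0, 0, 0, 1] + S[0, 1, 0, 1] + (D - 2) * S[0, 1, 0, 2],
            2 * S[0, 1, 0, 2] + S[0, 0, 1, 2] + (D - 3) * S[0, 1, 2, 3]]


def var_from_sm(d1, d2, k1, k2, k3, k4):
    """Var(v) as the kappa-weighted combination of SM values (proof of Prop. 17)."""
    S, D = centered(d1, d2), d1 * d2
    bracket = (D**4 * k1**4 * S[0, 1, 2, 3]
               + 2 * D**3 * k2 * k1**2 * (S[0, 0, 1, 2] + 2 * S[0, 1, 0, 2] - 3 * S[0, 1, 2, 3])
               + 4 * D**2 * k3 * k1 * (S[0, 0, 0, 1] - S[0, 0, 1, 2] - 2 * S[0, 1, 0, 2]
                                       + 2 * S[0, 1, 2, 3])
               + D**2 * k2**2 * (S[0, 0, 1, 1] - 2 * S[0, 0, 1, 2] + 2 * S[0, 1, 0, 1]
                                 - 4 * S[0, 1, 0, 2] + 3 * S[0, 1, 2, 3])
               + D * k4 * (S[0, 0, 0, 0] - 4 * S[0, 0, 0, 1] - S[0, 0, 1, 1] + 4 * S[0, 0, 1, 2]
                           - 2 * S[0, 1, 0, 1] + 8 * S[0, 1, 0, 2] - 6 * S[0, 1, 2, 3]))
    return bracket / (d1**2 * d2**4)


def var_v_closed(d1, d2, k1, k2, k3, k4):
    """Closed-form Var(v) from the proof of Proposition 17."""
    D2 = d1**2 * d2**2
    pref = Fraction(2 * (d1**2 - 1) * (d2**2 - 1),
                    d2**2 * (D2 - 1)**2 * (D2**2 - 13 * D2 + 36))
    return pref * (D2**2 * (k1**2 - k2)**2
                   + D2 * (11 * k1**4 - 22 * k2 * k1**2 + 20 * k3 * k1 - 4 * k2**2 - 5 * k4)
                   + 5 * (3 * k2**2 - 4 * k1 * k3 + k4))
```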
First we note that
$$M(0,0,0,0) = \mathbb{E}\big(\operatorname{Tr}\rho_0^2\big)^2 = \frac{d_2 d_1^3 + 2(d_2^2 + 2)d_1^2 + d_2(d_2^2 + 10)d_1 + 4d_2^2 + 2}{(d_1 d_2 + 1)(d_1 d_2 + 2)(d_1 d_2 + 3)}.$$



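Next we consider the case $M(0,0,0,1)$. Since $\sum_{i \neq 0} \rho_i = \operatorname{Tr}_2\big(\mathbf{1} - |U_0\rangle\langle U_0|\big) = d_2 \mathbf{1}_{d_1} - \rho_0$, the columns $U_1, \dots, U_{d_1 d_2 - 1}$ are exchangeable given $U_0$, and $\operatorname{Tr}\rho_0 = 1$, we have
$$\begin{aligned}
M(0,0,0,1) &= \mathbb{E}\operatorname{Tr}\rho_0^2 \operatorname{Tr}(\rho_0\rho_1) = \frac{1}{d_1 d_2 - 1}\mathbb{E}\operatorname{Tr}\rho_0^2 \operatorname{Tr}\big(\rho_0(d_2 \mathbf{1}_{d_1} - \rho_0)\big)\\
&= \frac{1}{d_1 d_2 - 1}\Big(d_2 \mathbb{E}\operatorname{Tr}\rho_0^2 - \mathbb{E}\big(\operatorname{Tr}\rho_0^2\big)^2\Big) = \frac{1}{d_1 d_2 - 1}\left(d_2 \frac{d_1 + d_2}{d_1 d_2 + 1} - M(0,0,0,0)\right)\\
&= \frac{(d_2^2 - 1)\big(d_1(d_1 + d_2)(d_1 d_2 + 4) + 2\big)}{(d_1 d_2 - 1)(d_1 d_2 + 1)(d_1 d_2 + 2)(d_1 d_2 + 3)}.
\end{aligned}$$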
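The symmetrization step used here replaces $\rho_1$ by the average of the remaining columns, which sum to $d_2\mathbf{1}_{d_1} - \rho_0$. It can be checked against the closed form of Lemma 18 with exact fractions. This is a minimal sketch: the function names are hypothetical, and $\mathbb{E}\operatorname{Tr}\rho_0^2 = (d_1 + d_2)/(d_1 d_2 + 1)$ is the value used in the derivation above.

```python
from fractions import Fraction


def mean_purity(d1, d2):
    """E Tr rho_0^2 = (d1 + d2) / (d1 d2 + 1)."""
    return Fraction(d1 + d2, d1 * d2 + 1)


def m0000(d1, d2):
    """M(0,0,0,0) = E (Tr rho_0^2)^2, from Lemma 18."""
    D = d1 * d2
    num = d2 * d1**3 + 2 * (d2**2 + 2) * d1**2 + d2 * (d2**2 + 10) * d1 + 4 * d2**2 + 2
    return Fraction(num, (D + 1) * (D + 2) * (D + 3))


def m0001_by_symmetrization(d1, d2):
    """(d2 E Tr rho_0^2 - E (Tr rho_0^2)^2) / (d1 d2 - 1)."""
    return (d2 * mean_purity(d1, d2) - m0000(d1, d2)) / (d1 * d2 - 1)


def m0001_closed(d1, d2):
    """Closed form of M(0,0,0,1) from Lemma 18."""
    D = d1 * d2
    return Fraction((d2**2 - 1) * (d1 * (d1 + d2) * (D + 4) + 2),
                    (D - 1) * (D + 1) * (D + 2) * (D + 3))
```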


In order to get other mixed moments we need to perform another integration.


Inner integral

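Here we consider expectations of the following kind:
$$\mathbb{E}\operatorname{Tr}\rho_0^2 \operatorname{Tr}\rho_1^2 = \mathbb{E}\operatorname{Tr}\big(\operatorname{Tr}_1|U_0\rangle\langle U_0|\big)^2 \operatorname{Tr}\big(\operatorname{Tr}_1|U_1\rangle\langle U_1|\big)^2.$$
Note that multiplying the matrix $U$ by a unitary matrix which does not change the first column does not change the expectation; in fact, we can integrate over the subgroup of matrices which do not change the first column of $U$. Now, for a moment, we fix the matrix $U$ and consider
$$\operatorname{Tr}\big(\operatorname{Tr}_1|U_0\rangle\langle U_0|\big)^2\, \mathbb{E}_V \operatorname{Tr}\big(\operatorname{Tr}_1 UV|1\rangle\langle 1|V^*U^*\big)^2,$$
where the matrices $V$ are of the form
$$V = \begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & v_{1,1} & \cdots & v_{1,d_1 d_2 - 1}\\
0 & v_{2,1} & \cdots & v_{2,d_1 d_2 - 1}\\
\vdots & \vdots & \ddots & \vdots\\
0 & v_{d_1 d_2 - 1,1} & \cdots & v_{d_1 d_2 - 1,d_1 d_2 - 1}
\end{pmatrix}.$$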

Here $\mathbb{E}_V$ is the expectation with respect to the Haar measure on $U(d_1 d_2 - 1)$, embedded in $U(d_1 d_2)$ in the above way. Note that the vector $UV|1\rangle$ is a uniformly random unit vector orthogonal to $|U_0\rangle = U|0\rangle$.
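First we calculate
$$\begin{aligned}
\mathbb{E}_V (UV|1\rangle\langle 1|V^*U^*) \otimes (UV|1\rangle\langle 1|V^*U^*) &= \mathbb{E}_V (U|V_1\rangle\langle V_1|U^*) \otimes (U|V_1\rangle\langle V_1|U^*)\\
&= (U \otimes U)\, \mathbb{E}_V\big[|V_1\rangle\langle V_1| \otimes |V_1\rangle\langle V_1|\big] (U \otimes U)^*,
\end{aligned}$$
where $|V_1\rangle = V|1\rangle$.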

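Now, using standard integrals, we obtain
$$\mathbb{E}_V |V_1\rangle\langle V_1| \otimes |V_1\rangle\langle V_1| = \frac{1}{(d_1 d_2 - 1) d_1 d_2} \sum_{i_1 i_2 j_1 j_2} \big(\delta_{i_1 j_1}\delta_{i_2 j_2} + \delta_{i_1 j_2}\delta_{i_2 j_1}\big)\, \theta(i_1 i_2 j_1 j_2)\, |i_1 i_2\rangle\langle j_1 j_2|,$$
where $\theta(x) = 1 - \delta_{x,0}$ incorporates the condition that the first element (index $0$) of the vector $|V_1\rangle$ is zero: since indices start at $0$, $\theta(i_1 i_2 j_1 j_2)$ vanishes exactly when one of the indices is $0$.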
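This standard integral can be checked by Monte Carlo. The vector $|V_1\rangle = V|1\rangle$ is a uniformly random unit vector in the $(d_1 d_2 - 1)$-dimensional subspace orthogonal to $|0\rangle$. The sketch samples it by normalizing a complex Gaussian vector whose $0$-th entry is set to zero. It then compares a few matrix elements with the formula, writing $D = d_1 d_2$. Names are hypothetical, and the tolerance in the usage is chosen loosely for the sample size.

```python
import math
import random
from fractions import Fraction


def formula_entry(D, i1, i2, j1, j2):
    """<i1 i2| E_V |V1><V1| (x) |V1><V1| |j1 j2> according to the formula above."""
    theta = 0 if i1 * i2 * j1 * j2 == 0 else 1          # theta(i1 i2 j1 j2)
    deltas = int(i1 == j1 and i2 == j2) + int(i1 == j2 and i2 == j1)
    return Fraction(theta * deltas, (D - 1) * D)


def sample_v1(D, rng):
    """Uniform random unit vector in C^D whose 0-th entry vanishes."""
    v = [0j] + [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(D - 1)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def mc_entry(D, i1, i2, j1, j2, samples, seed=1):
    """Monte Carlo estimate of E v_{i1} v_{i2} conj(v_{j1}) conj(v_{j2})."""
    rng = random.Random(seed)
    acc = 0j
    for _ in range(samples):
        v = sample_v1(D, rng)
        acc += v[i1] * v[i2] * v[j1].conjugate() * v[j2].conjugate()
    return acc / samples
```

For $D = 4$ the formula gives $\mathbb{E}|v_1|^4 = 1/6$ and $\mathbb{E}|v_1|^2|v_2|^2 = 1/12$, and the operator has unit trace.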
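Now, after elementary calculations using the unitarity of $U$, we obtain
$$\begin{aligned}
(U \otimes U)\, \mathbb{E}_V\big[|V_1\rangle\langle V_1| \otimes |V_1\rangle\langle V_1|\big] (U \otimes U)^*
&= \frac{1}{(d_1 d_2 - 1) d_1 d_2} \sum_{i_1 i_2 j_1 j_2} \Big[(\delta_{i_1 j_1} - u_{i_1,0}\bar{u}_{j_1,0})(\delta_{i_2 j_2} - u_{i_2,0}\bar{u}_{j_2,0})\\
&\qquad + (\delta_{i_1 j_2} - u_{i_1,0}\bar{u}_{j_2,0})(\delta_{i_2 j_1} - u_{i_2,0}\bar{u}_{j_1,0})\Big] |i_1 i_2\rangle\langle j_1 j_2|\\
&= \frac{1}{(d_1 d_2 - 1) d_1 d_2} \Big[\big(\mathbf{1}_{(d_1 d_2)^2} + |U_0\rangle\langle U_0| \otimes |U_0\rangle\langle U_0| - \mathbf{1}_{d_1 d_2} \otimes |U_0\rangle\langle U_0| - |U_0\rangle\langle U_0| \otimes \mathbf{1}_{d_1 d_2}\big)\\
&\qquad + S_{d_1 d_2, d_1 d_2}\big(\mathbf{1}_{(d_1 d_2)^2} + |U_0\rangle\langle U_0| \otimes |U_0\rangle\langle U_0| - \mathbf{1}_{d_1 d_2} \otimes |U_0\rangle\langle U_0| - |U_0\rangle\langle U_0| \otimes \mathbf{1}_{d_1 d_2}\big)\Big],
\end{aligned}$$
where $S_{d_1 d_2, d_1 d_2}$ is the swap operation on two systems of dimension $d_1 d_2$, i.e. $S = \sum_{i_1, i_2} |i_1 i_2\rangle\langle i_2 i_1|$. So we get
$$(U \otimes U)\, \mathbb{E}_V\big[|V_1\rangle\langle V_1| \otimes |V_1\rangle\langle V_1|\big] (U \otimes U)^* = \frac{\big(\mathbf{1}_{(d_1 d_2)^2} + S_{d_1 d_2, d_1 d_2}\big)\big(\mathbf{1}_{(d_1 d_2)^2} + |U_0\rangle\langle U_0| \otimes |U_0\rangle\langle U_0| - \mathbf{1}_{d_1 d_2} \otimes |U_0\rangle\langle U_0| - |U_0\rangle\langle U_0| \otimes \mathbf{1}_{d_1 d_2}\big)}{(d_1 d_2 - 1) d_1 d_2}.$$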
(d1 d2 1)d1 d2
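Next we consider
$$\begin{aligned}
\mathbb{E}_V \operatorname{Tr}\big(\operatorname{Tr}_2 U|V_1\rangle\langle V_1|U^*\big)^2 &= \mathbb{E}_V \operatorname{Tr} S_{d_1, d_1} \big(\operatorname{Tr}_2 U|V_1\rangle\langle V_1|U^*\big) \otimes \big(\operatorname{Tr}_2 U|V_1\rangle\langle V_1|U^*\big)\\
&= \mathbb{E}_V \operatorname{Tr} S_{d_1, d_1} \operatorname{Tr}_{2,4}\big(U|V_1\rangle\langle V_1|U^* \otimes U|V_1\rangle\langle V_1|U^*\big)\\
&= \operatorname{Tr} S_{d_1, d_1} \operatorname{Tr}_{2,4}\Big[(U \otimes U)\, \mathbb{E}_V\big[|V_1\rangle\langle V_1| \otimes |V_1\rangle\langle V_1|\big] (U \otimes U)^*\Big]\\
&= \frac{1}{(d_1 d_2 - 1) d_1 d_2} \operatorname{Tr} S_{d_1, d_1} \operatorname{Tr}_{2,4}\Big[\big(\mathbf{1}_{(d_1 d_2)^2} + S_{d_1 d_2, d_1 d_2}\big)\\
&\qquad \big(\mathbf{1}_{(d_1 d_2)^2} + |U_0\rangle\langle U_0| \otimes |U_0\rangle\langle U_0| - \mathbf{1}_{d_1 d_2} \otimes |U_0\rangle\langle U_0| - |U_0\rangle\langle U_0| \otimes \mathbf{1}_{d_1 d_2}\big)\Big].
\end{aligned}$$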

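So we have obtained
$$\begin{aligned}
\mathbb{E}_V \operatorname{Tr}\big(\operatorname{Tr}_1 U|V_1\rangle\langle V_1|U^*\big)^2 &= \frac{d_1 d_2^2 + \operatorname{Tr}\rho_0^2 - d_2 \operatorname{Tr}\rho_0 - d_2 \operatorname{Tr}\rho_0 + d_1^2 d_2 + \operatorname{Tr}(\rho_0')^2 - d_1 \operatorname{Tr}\rho_0' - d_1 \operatorname{Tr}\rho_0'}{(d_1 d_2 - 1) d_1 d_2}\\
&= \frac{d_1 d_2^2 + d_1^2 d_2 - 2d_1 - 2d_2 + 2\operatorname{Tr}\rho_0^2}{(d_1 d_2 - 1) d_1 d_2};
\end{aligned}$$
in the above formulas we used $\rho_0' = \operatorname{Tr}_1 |U_0\rangle\langle U_0|$, the normalization $\operatorname{Tr}\rho_0 = \operatorname{Tr}\rho_0' = 1$, and the fact that $\operatorname{Tr}\rho_0^2 = \operatorname{Tr}(\rho_0')^2$.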
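Using the above, we write
$$\begin{aligned}
M(0,0,1,1) &= \mathbb{E}\operatorname{Tr}\rho_0^2 \operatorname{Tr}\rho_1^2 = \mathbb{E}\Big[\operatorname{Tr}\rho_0^2\, \mathbb{E}_V \operatorname{Tr}\big(\operatorname{Tr}_1 U|V_1\rangle\langle V_1|U^*\big)^2\Big]\\
&= \frac{(d_1 d_2^2 + d_1^2 d_2 - 2d_1 - 2d_2)\, \mathbb{E}\operatorname{Tr}\rho_0^2 + 2\mathbb{E}\big(\operatorname{Tr}\rho_0^2\big)^2}{(d_1 d_2 - 1) d_1 d_2}\\
&= \frac{(d_1 d_2^2 + d_1^2 d_2 - 2d_1 - 2d_2)\frac{d_1 + d_2}{d_1 d_2 + 1} + 2M(0,0,0,0)}{(d_1 d_2 - 1) d_1 d_2}\\
&= \frac{\big(d_1 d_2(d_1 d_2 + 2) - 4\big)(d_1 + d_2)^2 + 4}{d_1 d_2 (d_1 d_2 - 1)(d_1 d_2 + 2)(d_1 d_2 + 3)}.
\end{aligned}$$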
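The inner-integral formula is affine in $p_0 = \operatorname{Tr}\rho_0^2$, so averaging over $U$ only needs $\mathbb{E}\,p_0$ and $\mathbb{E}\,p_0^2 = M(0,0,0,0)$. The sketch below uses hypothetical names. It checks the resulting expression against the closed form. It also checks that averaging the inner formula alone returns $\mathbb{E}\operatorname{Tr}\rho_0^2$, as exchangeability of the columns requires.

```python
from fractions import Fraction


def mean_purity(d1, d2):
    """E Tr rho_0^2 = (d1 + d2) / (d1 d2 + 1)."""
    return Fraction(d1 + d2, d1 * d2 + 1)


def m0000(d1, d2):
    """M(0,0,0,0) = E (Tr rho_0^2)^2, from Lemma 18."""
    D = d1 * d2
    num = d2 * d1**3 + 2 * (d2**2 + 2) * d1**2 + d2 * (d2**2 + 10) * d1 + 4 * d2**2 + 2
    return Fraction(num, (D + 1) * (D + 2) * (D + 3))


def inner_purity(d1, d2, p0):
    """E_V Tr(Tr_1 U|V1><V1|U^*)^2 for fixed U, given p0 = Tr rho_0^2."""
    D = d1 * d2
    return Fraction(d1 * d2**2 + d1**2 * d2 - 2 * d1 - 2 * d2 + 2 * p0, (D - 1) * D)


def m0011_inner(d1, d2):
    """E[p0 * inner_purity(p0)], using E p0 and E p0^2 = M(0,0,0,0)."""
    D = d1 * d2
    lin = d1 * d2**2 + d1**2 * d2 - 2 * d1 - 2 * d2
    return (lin * mean_purity(d1, d2) + 2 * m0000(d1, d2)) / ((D - 1) * D)


def m0011_closed(d1, d2):
    """Closed form of M(0,0,1,1) from Lemma 18."""
    D = d1 * d2
    return Fraction((D * (D + 2) - 4) * (d1 + d2)**2 + 4, D * (D - 1) * (D + 2) * (D + 3))
```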


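Using the inner integral we can also calculate $M(0,1,0,1)$; here we write
$$\begin{aligned}
M(0,1,0,1) &= \mathbb{E}\operatorname{Tr}(\rho_0\rho_1)\operatorname{Tr}(\rho_0\rho_1) = \mathbb{E}\operatorname{Tr}(\rho_0 \otimes \rho_0)(\rho_1 \otimes \rho_1)\\
&= \mathbb{E}\operatorname{Tr}(\rho_0 \otimes \rho_0) \operatorname{Tr}_{2,4}\Big[(U \otimes U)\, \mathbb{E}_V\big[|V_1\rangle\langle V_1| \otimes |V_1\rangle\langle V_1|\big] (U \otimes U)^*\Big]\\
&= \frac{1}{(d_1 d_2 - 1) d_1 d_2} \mathbb{E}\operatorname{Tr}(\rho_0 \otimes \rho_0) \operatorname{Tr}_{2,4}\Big[\big(\mathbf{1}_{(d_1 d_2)^2} + S_{d_1 d_2, d_1 d_2}\big)\\
&\qquad \big(\mathbf{1}_{(d_1 d_2)^2} + |U_0\rangle\langle U_0| \otimes |U_0\rangle\langle U_0| - \mathbf{1}_{d_1 d_2} \otimes |U_0\rangle\langle U_0| - |U_0\rangle\langle U_0| \otimes \mathbf{1}_{d_1 d_2}\big)\Big]\\
&= \frac{1}{(d_1 d_2 - 1) d_1 d_2} \mathbb{E}\Big[\operatorname{Tr}(\rho_0 \otimes \rho_0)\big(d_2^2 \mathbf{1}_{d_1^2} + \rho_0 \otimes \rho_0 - d_2 \mathbf{1}_{d_1} \otimes \rho_0 - d_2 \rho_0 \otimes \mathbf{1}_{d_1}\big)\\
&\qquad + \operatorname{Tr} S_{d_1 d_2, d_1 d_2}\big(\mathbf{1}_{(d_1 d_2)^2} + |U_0\rangle\langle U_0| \otimes |U_0\rangle\langle U_0| - \mathbf{1}_{d_1 d_2} \otimes |U_0\rangle\langle U_0| - |U_0\rangle\langle U_0| \otimes \mathbf{1}_{d_1 d_2}\big)(\rho_0 \otimes \mathbf{1}_{d_2} \otimes \rho_0 \otimes \mathbf{1}_{d_2})\Big]\\
&= \frac{1}{(d_1 d_2 - 1) d_1 d_2} \mathbb{E}\Big[d_2^2 + \big(\operatorname{Tr}\rho_0^2\big)^2 - 2d_2 \operatorname{Tr}\rho_0 \operatorname{Tr}\rho_0^2\\
&\qquad + \operatorname{Tr}(\rho_0^2 \otimes \mathbf{1}_{d_2}) + \big(\operatorname{Tr}|U_0\rangle\langle U_0|(\rho_0 \otimes \mathbf{1}_{d_2})\big)^2 - 2\operatorname{Tr}(\rho_0 \otimes \mathbf{1}_{d_2})|U_0\rangle\langle U_0|(\rho_0 \otimes \mathbf{1}_{d_2})\Big]\\
&= \frac{1}{(d_1 d_2 - 1) d_1 d_2} \mathbb{E}\Big[d_2^2 - 2d_2 \operatorname{Tr}\rho_0^2 + \big(\operatorname{Tr}\rho_0^2\big)^2 + d_2 \operatorname{Tr}\rho_0^2 + \big(\operatorname{Tr}\rho_0^2\big)^2 - 2\operatorname{Tr}\rho_0^3\Big]\\
&= \frac{1}{(d_1 d_2 - 1) d_1 d_2}\left(d_2^2 - d_2 \frac{d_1 + d_2}{d_1 d_2 + 1} + 2M(0,0,0,0) - 2\mathbb{E}\operatorname{Tr}\rho_0^3\right)\\
&= \frac{(d_2^2 - 1)\big(d_1(6d_2 + d_1(d_2^2(d_1 d_2 + 5) - 2)) + 2\big)}{d_1 d_2 (d_1 d_2 - 1)(d_1 d_2 + 1)(d_1 d_2 + 2)(d_1 d_2 + 3)}.
\end{aligned}$$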

This is because
$$\mathbb{E}\operatorname{Tr}\rho_0^3 = \frac{(d_1 + d_2)^2 + d_1 d_2 + 1}{(d_1 d_2 + 1)(d_1 d_2 + 2)},$$
see [34].
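The last line for $M(0,1,0,1)$ combines $\mathbb{E}\operatorname{Tr}\rho_0^2$, $M(0,0,0,0)$ and the value of $\mathbb{E}\operatorname{Tr}\rho_0^3$ quoted above. The following is a short exact check of that combination against the closed form; the helper names are hypothetical.

```python
from fractions import Fraction


def mean_purity(d1, d2):
    """E Tr rho_0^2 = (d1 + d2) / (d1 d2 + 1)."""
    return Fraction(d1 + d2, d1 * d2 + 1)


def m0000(d1, d2):
    """M(0,0,0,0) = E (Tr rho_0^2)^2, from Lemma 18."""
    D = d1 * d2
    num = d2 * d1**3 + 2 * (d2**2 + 2) * d1**2 + d2 * (d2**2 + 10) * d1 + 4 * d2**2 + 2
    return Fraction(num, (D + 1) * (D + 2) * (D + 3))


def mean_tr_cube(d1, d2):
    """E Tr rho_0^3 = ((d1 + d2)^2 + d1 d2 + 1) / ((d1 d2 + 1)(d1 d2 + 2))."""
    D = d1 * d2
    return Fraction((d1 + d2)**2 + D + 1, (D + 1) * (D + 2))


def m0101_inner(d1, d2):
    """(d2^2 - d2 E Tr rho_0^2 + 2 M(0,0,0,0) - 2 E Tr rho_0^3) / ((D - 1) D)."""
    D = d1 * d2
    return (d2**2 - d2 * mean_purity(d1, d2) + 2 * m0000(d1, d2)
            - 2 * mean_tr_cube(d1, d2)) / ((D - 1) * D)


def m0101_closed(d1, d2):
    """Closed form of M(0,1,0,1) from Lemma 18."""
    D = d1 * d2
    return Fraction((d2**2 - 1) * (d1 * (6 * d2 + d1 * (d2**2 * (D + 5) - 2)) + 2),
                    D * (D - 1) * (D + 1) * (D + 2) * (D + 3))
```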
Using the above results, we obtain the remaining moments.

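$$\begin{aligned}
M(0,0,1,2) &= \mathbb{E}\operatorname{Tr}\rho_0^2 \operatorname{Tr}(\rho_1\rho_2) = \frac{1}{d_1 d_2 - 2}\mathbb{E}\operatorname{Tr}\rho_0^2 \operatorname{Tr}\big(\rho_1(d_2 \mathbf{1}_{d_1} - \rho_0 - \rho_1)\big)\\
&= \frac{1}{d_1 d_2 - 2}\Big(d_2 \mathbb{E}\operatorname{Tr}\rho_0^2 - \mathbb{E}\operatorname{Tr}\rho_0^2 \operatorname{Tr}(\rho_1\rho_0) - \mathbb{E}\operatorname{Tr}\rho_0^2 \operatorname{Tr}\rho_1^2\Big)\\
&= \frac{1}{d_1 d_2 - 2}\left(d_2 \frac{d_1 + d_2}{d_1 d_2 + 1} - M(0,0,0,1) - M(0,0,1,1)\right)\\
&= \frac{(d_2^2 - 1)\big(d_1(d_1 + d_2)(d_1 d_2(d_1 d_2 + 4) + 2) - 2\big)}{d_1 d_2 (d_1 d_2 - 1)(d_1 d_2 + 1)(d_1 d_2 + 2)(d_1 d_2 + 3)}.
\end{aligned}$$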
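The same symmetrization as for $M(0,0,0,1)$, now with $d_1 d_2 - 2$ remaining columns, can be checked exactly. The inputs are the closed forms of $M(0,0,0,1)$ and $M(0,0,1,1)$ from Lemma 18; the helper names are hypothetical.

```python
from fractions import Fraction


def mean_purity(d1, d2):
    """E Tr rho_0^2 = (d1 + d2) / (d1 d2 + 1)."""
    return Fraction(d1 + d2, d1 * d2 + 1)


def m0001_closed(d1, d2):
    D = d1 * d2
    return Fraction((d2**2 - 1) * (d1 * (d1 + d2) * (D + 4) + 2),
                    (D - 1) * (D + 1) * (D + 2) * (D + 3))


def m0011_closed(d1, d2):
    D = d1 * d2
    return Fraction((D * (D + 2) - 4) * (d1 + d2)**2 + 4, D * (D - 1) * (D + 2) * (D + 3))


def m0012_by_symmetrization(d1, d2):
    """(d2 E Tr rho_0^2 - M(0,0,0,1) - M(0,0,1,1)) / (d1 d2 - 2)."""
    return (d2 * mean_purity(d1, d2) - m0001_closed(d1, d2)
            - m0011_closed(d1, d2)) / (d1 * d2 - 2)


def m0012_closed(d1, d2):
    """Closed form of M(0,0,1,2) from Lemma 18."""
    D = d1 * d2
    return Fraction((d2**2 - 1) * (d1 * (d1 + d2) * (D * (D + 4) + 2) - 2),
                    D * (D - 1) * (D + 1) * (D + 2) * (D + 3))
```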


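Next we consider the mixed moment of type $(0,1,0,2)$:
$$\begin{aligned}
M(0,1,0,2) &= \mathbb{E}\operatorname{Tr}(\rho_0\rho_1)\operatorname{Tr}(\rho_0\rho_2) = \frac{1}{d_1 d_2 - 2}\mathbb{E}\operatorname{Tr}(\rho_0\rho_1)\operatorname{Tr}\big(\rho_0(d_2 \mathbf{1}_{d_1} - \rho_0 - \rho_1)\big)\\
&= \frac{1}{d_1 d_2 - 2}\Big(d_2 \mathbb{E}\operatorname{Tr}(\rho_0\rho_1) - \mathbb{E}\operatorname{Tr}(\rho_0\rho_1)\operatorname{Tr}\rho_0^2 - \mathbb{E}\operatorname{Tr}(\rho_0\rho_1)\operatorname{Tr}(\rho_0\rho_1)\Big)\\
&= \frac{1}{d_1 d_2 - 2}\Big(d_2 \mathbb{E}\operatorname{Tr}(\rho_0\rho_1) - M(0,0,0,1) - M(0,1,0,1)\Big)\\
&= \frac{(d_2^2 - 1)\Big(d_1\big(d_1\big(d_1 d_2(3d_2^2 + d_1(d_2^2 - 1)d_2 - 4) - 3d_2^2 + 2\big) - 8d_2\big) - 2\Big)}{d_1 d_2 (d_1 d_2 - 2)(d_1 d_2 - 1)(d_1 d_2 + 1)(d_1 d_2 + 2)(d_1 d_2 + 3)},
\end{aligned}$$
where $\mathbb{E}\operatorname{Tr}(\rho_0\rho_1) = \frac{d_2 - \mathbb{E}\operatorname{Tr}\rho_0^2}{d_1 d_2 - 1} = \frac{d_1(d_2^2 - 1)}{d_1^2 d_2^2 - 1}$ follows from the same symmetrization.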
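An exact check of this step can be sketched as follows. It uses $\mathbb{E}\operatorname{Tr}(\rho_0\rho_1) = (d_2 - \mathbb{E}\operatorname{Tr}\rho_0^2)/(d_1 d_2 - 1)$ and the closed forms of $M(0,0,0,1)$ and $M(0,1,0,1)$; the names are hypothetical.

```python
from fractions import Fraction


def mean_purity(d1, d2):
    """E Tr rho_0^2 = (d1 + d2) / (d1 d2 + 1)."""
    return Fraction(d1 + d2, d1 * d2 + 1)


def mean_overlap(d1, d2):
    """E Tr(rho_0 rho_1) = (d2 - E Tr rho_0^2) / (d1 d2 - 1)."""
    return (d2 - mean_purity(d1, d2)) / (d1 * d2 - 1)


def m0001_closed(d1, d2):
    D = d1 * d2
    return Fraction((d2**2 - 1) * (d1 * (d1 + d2) * (D + 4) + 2),
                    (D - 1) * (D + 1) * (D + 2) * (D + 3))


def m0101_closed(d1, d2):
    D = d1 * d2
    return Fraction((d2**2 - 1) * (d1 * (6 * d2 + d1 * (d2**2 * (D + 5) - 2)) + 2),
                    D * (D - 1) * (D + 1) * (D + 2) * (D + 3))


def m0102_by_symmetrization(d1, d2):
    """(d2 E Tr(rho_0 rho_1) - M(0,0,0,1) - M(0,1,0,1)) / (d1 d2 - 2)."""
    return (d2 * mean_overlap(d1, d2) - m0001_closed(d1, d2)
            - m0101_closed(d1, d2)) / (d1 * d2 - 2)


def m0102_closed(d1, d2):
    """Closed form of M(0,1,0,2) from Lemma 18."""
    a, b, D = d1, d2, d1 * d2
    num = (b**2 - 1) * (a * (a * (D * (3 * b**2 + a * (b**2 - 1) * b - 4)
                                  - 3 * b**2 + 2) - 8 * b) - 2)
    return Fraction(num, D * (D - 2) * (D - 1) * (D + 1) * (D + 2) * (D + 3))
```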


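Now the last case, with all indices different:
$$\begin{aligned}
M(0,1,2,3) &= \mathbb{E}\operatorname{Tr}(\rho_0\rho_1)\operatorname{Tr}(\rho_2\rho_3) = \frac{1}{d_1 d_2 - 3}\mathbb{E}\operatorname{Tr}(\rho_0\rho_1)\operatorname{Tr}\big(\rho_2(d_2 \mathbf{1}_{d_1} - \rho_0 - \rho_1 - \rho_2)\big)\\
&= \frac{1}{d_1 d_2 - 3}\mathbb{E}\operatorname{Tr}(\rho_0\rho_1)\big(d_2 - \operatorname{Tr}\rho_2\rho_0 - \operatorname{Tr}\rho_2\rho_1 - \operatorname{Tr}\rho_2^2\big)\\
&= \frac{1}{d_1 d_2 - 3}\Big(d_2 \mathbb{E}\operatorname{Tr}(\rho_0\rho_1) - 2M(0,1,0,2) - M(0,0,1,2)\Big)\\
&= \frac{(d_2^2 - 1)\big(d_2^2(d_2^2 - 1)d_1^4 + 2(7 - 6d_2^2)d_1^2 + 22\big)}{d_1^2 d_2^2 (d_1^2 d_2^2 - 7)^2 - 36}.
\end{aligned}$$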
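Finally, a sketch checks the last recursion exactly. It also checks the factorization $D^2(D^2 - 7)^2 - 36 = (D-3)(D-2)(D-1)(D+1)(D+2)(D+3)$ with $D = d_1 d_2$. This shows that the denominator of $M(0,1,2,3)$ has the same linear factors as the other moments. Helper names are hypothetical.

```python
from fractions import Fraction


def mean_purity(d1, d2):
    """E Tr rho_0^2 = (d1 + d2) / (d1 d2 + 1)."""
    return Fraction(d1 + d2, d1 * d2 + 1)


def mean_overlap(d1, d2):
    """E Tr(rho_0 rho_1) = (d2 - E Tr rho_0^2) / (d1 d2 - 1)."""
    return (d2 - mean_purity(d1, d2)) / (d1 * d2 - 1)


def m0012_closed(d1, d2):
    D = d1 * d2
    return Fraction((d2**2 - 1) * (d1 * (d1 + d2) * (D * (D + 4) + 2) - 2),
                    D * (D - 1) * (D + 1) * (D + 2) * (D + 3))


def m0102_closed(d1, d2):
    a, b, D = d1, d2, d1 * d2
    num = (b**2 - 1) * (a * (a * (D * (3 * b**2 + a * (b**2 - 1) * b - 4)
                                  - 3 * b**2 + 2) - 8 * b) - 2)
    return Fraction(num, D * (D - 2) * (D - 1) * (D + 1) * (D + 2) * (D + 3))


def m0123_by_symmetrization(d1, d2):
    """(d2 E Tr(rho_0 rho_1) - 2 M(0,1,0,2) - M(0,0,1,2)) / (d1 d2 - 3)."""
    return (d2 * mean_overlap(d1, d2) - 2 * m0102_closed(d1, d2)
            - m0012_closed(d1, d2)) / (d1 * d2 - 3)


def m0123_closed(d1, d2):
    """Closed form of M(0,1,2,3) from Lemma 18."""
    D = d1 * d2
    num = (d2**2 - 1) * (d2**2 * (d2**2 - 1) * d1**4 + 2 * (7 - 6 * d2**2) * d1**2 + 22)
    return Fraction(num, D**2 * (D**2 - 7)**2 - 36)


def denominator_factorizes(D):
    """Check D^2 (D^2 - 7)^2 - 36 == (D-3)(D-2)(D-1)(D+1)(D+2)(D+3)."""
    return D**2 * (D**2 - 7)**2 - 36 == (D - 3) * (D - 2) * (D - 1) * (D + 1) * (D + 2) * (D + 3)
```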



