Networks of Companies From Stock Price Correlations

J. Kertész (1,2), L. Kullmann (1), J.-P. Onnela (2), A. Chakraborti (2), K. Kaski (2), A. Kanto (3)
(1) Department of Theoretical Physics, Budapest University of Technology and Economics, Hungary
(2) Laboratory of Computational Engineering, Helsinki University of Technology, Finland
(3) Department of Quantitative Methods in Economics and Management Science, Helsinki School of Economics, Finland
Motivation
• The financial market is a self-adaptive complex system with many interacting units and obvious networking.
Networks:
• Cooperation (most important and most difficult)
• Activity, ownership
• Similarity
• Temporal aspects:
• Networks generated by time dependencies
• Time-dependent networks
• Revealing network structure is crucial for understanding, and also for pragmatic reasons (e.g., portfolio optimization).
Many groups are active in the field: Palermo, Rome, Seoul, etc.

Outline
• Classification by Minimum Spanning Trees (MST) (Mantegna)
• Temporal evolution
• Relation to portfolio optimization
• Correlations vs. noise: parametrized aggregated classification
• Temporal correlations: directed network of influence
Data: price and return
• Daily price data for N = 477 NYSE stocks (CRSP, University of Chicago), such as GE, MOT, and KO
• Time span S = 5056 trading days: Jan 1980 – Dec 1999
Daily closing price of GE: $P_{GE}(t)$
Daily logarithmic price: $\ln P_{GE}(t)$
Daily logarithmic return: $r_{GE}(t) = \ln P_{GE}(t) - \ln P_{GE}(t-1)$
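A minimal sketch of the return definition above in plain Python. The function name `log_returns` and the price list are hypothetical illustrations, not CRSP data.

```python
import math

def log_returns(prices):
    """Daily log returns r(t) = ln P(t) - ln P(t-1) for a list of closing prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical closing prices (not real GE data)
prices = [100.0, 102.0, 101.0, 105.0]
returns = log_returns(prices)
print(len(returns))  # 3
```

Log returns are additive over time, so their sum over the window equals $\ln P(\text{end}) - \ln P(\text{start})$.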
Correlations and distances
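For each time window $R^t$ (an $N \times T$ matrix of returns), an $N \times N$ correlation matrix $C^t$ is defined, with elements being the equal-time correlation coefficients:
$$\rho_{ij}^t = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left[\langle r_i^2 \rangle - \langle r_i \rangle^2\right]\left[\langle r_j^2 \rangle - \langle r_j \rangle^2\right]}}, \quad -1 \le \rho_{ij}^t \le 1,$$
where $r_i, r_j \in R^t$ and $\langle \cdots \rangle$ denotes a time average over the window. These are transformed into an $N \times N$ distance matrix $D^t$ with elements:
$$d_{ij}^t = \sqrt{2\left(1 - \rho_{ij}^t\right)}, \quad 0 \le d_{ij}^t \le 2.$$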
Minimum spanning tree (MST): a graph linking the N vertices (stocks) with N − 1 edges such that the sum of the edge distances is minimal. Efficient algorithms exist for constructing it (e.g., Kruskal's and Prim's algorithms).
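As one illustration of such an algorithm, here is a sketch of Kruskal's algorithm with a union-find structure. The slides do not say which algorithm was used; the function name and the 4-stock distance matrix are hypothetical.

```python
def minimum_spanning_tree(d):
    """Kruskal's algorithm on a symmetric N x N distance matrix.

    Returns the N - 1 tree edges as (i, j, d_ij) tuples.
    """
    n = len(d)
    parent = list(range(n))  # union-find forest over the vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Consider all pairs in order of increasing distance (strongest correlation first)
    candidates = sorted((d[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for dist, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components, so it creates no cycle
            parent[ri] = rj
            tree.append((i, j, dist))
            if len(tree) == n - 1:
                break
    return tree

# Hypothetical 4-stock distance matrix
d = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 6],
     [3, 5, 6, 0]]
tree = minimum_spanning_tree(d)
print(sum(dist for _, _, dist in tree))  # 6
```

Sorting all N(N − 1)/2 pairs costs O(N² log N), which is unproblematic for N = 477.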
Central vertex
To characterise the positions of companies in the tree, the concept of a central vertex is introduced:
• A reference vertex is needed to measure the locations of other vertices and to extract further information from asset trees.
• The central vertex should be a company whose price changes strongly affect the market; three possible criteria:
(1) Vertex degree criterion (local): the vertex with the highest vertex degree, i.e., the number of incident edges.
(2) Weighted vertex degree criterion (local): the vertex with the highest vertex degree weighted by correlation coefficients.
(3) Center of mass criterion (global): the vertex $v_i$ giving the minimum value of the mean occupation layer $l(t, v_i)$.
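The two local criteria can be sketched as follows. This assumes the weighted vertex degree is the sum of the correlation coefficients of a vertex's incident tree edges, with each $\rho_{ij}$ recovered from $d_{ij} = \sqrt{2(1-\rho_{ij})}$; the function name and tree are hypothetical.

```python
from collections import defaultdict

def local_central_vertices(tree):
    """tree: list of MST edges (i, j, d_ij) with d_ij = sqrt(2 * (1 - rho_ij)).

    Returns (vertex with the highest degree, vertex with the highest
    correlation-weighted degree). The weighting is an assumed reading of criterion (2).
    """
    degree = defaultdict(int)
    weighted = defaultdict(float)
    for i, j, d in tree:
        rho = 1.0 - d * d / 2.0  # invert the distance transformation
        for v in (i, j):
            degree[v] += 1
            weighted[v] += rho
    by_degree = max(degree, key=degree.get)
    by_weighted = max(weighted, key=weighted.get)
    return by_degree, by_weighted

# Hypothetical tree: vertex 0 has the most links, but they are weak
tree = [(0, 1, 1.2), (0, 2, 1.2), (0, 3, 1.2), (3, 4, 0.1), (4, 5, 0.1)]
print(local_central_vertices(tree))  # (0, 4)
```

The example shows why the two local criteria can disagree: many weakly correlated neighbours raise the plain degree, while a few strongly correlated ones raise the weighted degree.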
Central vertex: comparison
(1) Vertex degree criterion (local): GE 67.2%
(2) Weighted vertex degree criterion (local): GE 65.6%
(3) Center of mass criterion (global): GE 52.8%
Asset tree and clusters
[Figure: asset tree vs. business sectors (Forbes); Yahoo data]
[Figure: Potts superparamagnetic clustering with antiferromagnetic bonds (Kullmann, JK, Mantegna)]
Asset tree clustering
Mismatch between tree clusters and business sectors?
1. Random price fluctuations introduce noise into the system
2. Business sector definitions vary between institutions (Forbes…)
3. Historical data should be matched with a contemporary business sector definition
4. Classifications are ambiguous and less informative for highly diversified companies
5. The MST classification mechanism imposes constraints
6. The uniformity and strength of correlations vary by business sector (cf. Energy sector vs. Technology)
Mean occupation layer
In order to characterise the spread of vertices on the asset tree, the concept of the mean occupation layer is introduced:
$$l(t, v_c) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{lev}(v_i^t),$$
where $v_c$ is the central vertex and $\mathrm{lev}(v_i)$ denotes the level of vertex $v_i$, such that $\mathrm{lev}(v_c) = 0$.
Either a static or a dynamic central vertex may be used; both exhibit similar behaviour, which indicates robustness.
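The mean occupation layer, and with it the center of mass criterion (3), can be sketched with a breadth-first search that assigns levels outward from the chosen center. Function names and the example tree are hypothetical.

```python
from collections import defaultdict, deque

def mean_occupation_layer(tree, center):
    """l(t, v_c) = (1/N) * sum_i lev(v_i), with lev(v_c) = 0 (levels found by BFS)."""
    adj = defaultdict(list)
    for i, j, *_ in tree:  # extra fields such as distances are ignored
        adj[i].append(j)
        adj[j].append(i)
    lev = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in lev:
                lev[u] = lev[v] + 1
                queue.append(u)
    return sum(lev.values()) / len(lev)

def center_of_mass_vertex(tree):
    """Criterion (3): the vertex that minimises the mean occupation layer."""
    vertices = {v for i, j, *_ in tree for v in (i, j)}
    return min(vertices, key=lambda v: mean_occupation_layer(tree, v))

# Hypothetical 5-vertex tree: a star around 0 with one extra branch 3-4
tree = [(0, 1), (0, 2), (0, 3), (3, 4)]
print(mean_occupation_layer(tree, 0))  # 1.0
```

Because every vertex is tried as a candidate center, this brute-force version costs O(N²) per tree, which is still cheap for a few hundred stocks.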
Asset tree: topology change
[Figure: normal market topology vs. crash topology; Yahoo data]
Robustness: single-step survival
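The robustness of the dynamic asset tree topology is measured as the fraction of connections that survive when the window moves by one step:
Single-step survival ratio: $\sigma(t) = \frac{1}{N-1}\left|E(t) \cap E(t-1)\right|$,
where $E(t)$ is the set of edges of the tree at time $t$.
Window width T = 4 years, step δT = 1 month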
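A minimal sketch of the single-step survival ratio, treating tree edges as undirected pairs. The function name and the two example trees are hypothetical.

```python
def single_step_survival(edges_now, edges_prev):
    """sigma(t) = |E(t) & E(t-1)| / (N - 1) for two trees on the same N stocks.

    Edges may be (i, j) or (i, j, d_ij); only the endpoints are compared.
    """
    now = {frozenset(e[:2]) for e in edges_now}  # undirected: (i, j) == (j, i)
    prev = {frozenset(e[:2]) for e in edges_prev}
    return len(now & prev) / len(now)  # a tree has N - 1 edges

# Hypothetical trees on 4 stocks: two of the three connections survive
print(single_step_survival([(0, 1), (1, 2), (2, 3)], [(1, 0), (1, 2), (0, 3)]))
```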
Tree evolution: multi-step survival
Connections surviving vs. time, measured by the multi-step survival ratio:
$$\sigma(t,k) = \frac{1}{N-1}\left|E(t) \cap E(t+1) \cap \dots \cap E(t+k)\right|$$
• Within the first region the decay is exponential.
• After this there is a crossover to power-law decay, $\sigma(t,k) \sim k^{-z}$ with $z \approx 1.2$.
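The multi-step ratio generalises the single-step one by intersecting the edge sets of k consecutive trees. A sketch with a hypothetical function name and a made-up sequence of trees:

```python
def multi_step_survival(trees, t, k):
    """sigma(t, k) = |E(t) & E(t+1) & ... & E(t+k)| / (N - 1).

    trees: list of edge lists, one per window position; edges are (i, j) or (i, j, d_ij).
    """
    common = {frozenset(e[:2]) for e in trees[t]}
    for step in range(1, k + 1):
        common &= {frozenset(e[:2]) for e in trees[t + step]}
    return len(common) / len(trees[t])

# Hypothetical sequence of three trees on 4 stocks
trees = [[(0, 1), (1, 2), (2, 3)],
         [(0, 1), (1, 2), (1, 3)],
         [(1, 0), (0, 2), (1, 3)]]
print([multi_step_survival(trees, 0, k) for k in range(3)])
```

Evaluating this for increasing k and plotting on log-log axes is one way to look for the crossover from exponential to power-law decay.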
Half life vs. window width
T (years) | $t_{1/2}$ (years)
2 | 0.22
4 | 0.46
6 | 0.75
$t_{1/2} \approx 0.12\,T$
[Figure: curves for k = 0, 1, 2, 4, 6, 12, 24, 36, 48]
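As a quick arithmetic check of the proportionality $t_{1/2} \approx 0.12\,T$, a least-squares line through the origin can be fitted to the three table rows. This is only a consistency check, not necessarily how the slide's coefficient was obtained.

```python
# (window width T in years, half-life t_1/2 in years) from the table above
data = [(2, 0.22), (4, 0.46), (6, 0.75)]

# Least-squares slope of a line through the origin: a = sum(T * t) / sum(T^2)
slope = sum(T * t for T, t in data) / sum(T * T for T, _ in data)
print(round(slope, 3))  # 0.121
```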
Distribution of vertex degrees
The topological nature of the network is studied by analysing the distribution of vertex degrees (exponent $\alpha$ of $P(k) \sim k^{-\alpha}$):
• A power-law distribution would indicate scale-free topology, a feature not expected from random network models
• Vandewalle et al. find $\alpha \approx 2.2$ for one year of data, while we found $1.8 \pm 0.1 \le \alpha \le 2.1 \pm 0.1$
• The power-law fit is ambiguous due to the limited range of the data
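A sketch of how the degree distribution can be tabulated and an exponent estimated. The estimator used here is the approximate discrete maximum-likelihood formula of Clauset, Shalizi and Newman, a swapped-in technique; the slides do not state how their fit was done. Function names and data are hypothetical.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Return {degree: number of vertices with that degree} for an edge list."""
    deg = Counter(v for e in edges for v in e[:2])
    return dict(Counter(deg.values()))

def powerlaw_exponent(degrees, k_min=1):
    """Approximate discrete MLE of alpha in P(k) ~ k^-alpha:
    alpha approx. 1 + n / sum(ln(k / (k_min - 1/2))), using only degrees k >= k_min."""
    ks = [k for k in degrees if k >= k_min]
    return 1.0 + len(ks) / sum(math.log(k / (k_min - 0.5)) for k in ks)

# Hypothetical tree and its degree sequence
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
print(degree_distribution(edges))  # {3: 1, 1: 3, 2: 1}
print(round(powerlaw_exponent([1, 1, 1, 2, 3]), 3))
```

With degrees spanning only about one decade in an MST, any such estimate carries large uncertainty, in line with the ambiguity noted above.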

Distribution of vertex degrees
[Figure: left, normal market; right, crash]
Portfolio optimisation
In Markowitz portfolio optimisation theory, the risks of financial assets are characterised by the standard deviations of their returns.
The aim is to optimise the asset weights $w_i$ so that the overall portfolio risk is minimised for a given portfolio return (the minimum-risk portfolio is uniquely defined):
$$\frac{1}{2}\sum_{i,j=1}^{N} w_i w_j\, \mathrm{Cov}(r_i, r_j) \to \min,$$
given $r = \sum_{i=1}^{N} w_i r_i$ and $\sum_{i=1}^{N} w_i = 1$.
No short-selling: $w_i \ge 0$.
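With short-selling allowed, the problem has only equality constraints, so the optimum follows from the linear Lagrange (KKT) system $C w - \lambda r - \gamma \mathbf{1} = 0$, $r^\top w = r_{\text{target}}$, $\mathbf{1}^\top w = 1$. The sketch below solves it in pure Python; the function names are hypothetical, and the no-short-selling case $w_i \ge 0$ needs a quadratic-programming solver with bounds, which is not shown.

```python
def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting; a is n x n, b has length n."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def markowitz_weights(cov, mean_returns, target):
    """Minimise (1/2) w^T C w subject to sum_i w_i r_i = target and sum_i w_i = 1.
    Short-selling is allowed (no w_i >= 0 constraint)."""
    n = len(cov)
    # KKT system: C w - lam * r - gam * 1 = 0, r^T w = target, 1^T w = 1
    kkt = [list(cov[i]) + [-mean_returns[i], -1.0] for i in range(n)]
    kkt.append(list(mean_returns) + [0.0, 0.0])
    kkt.append([1.0] * n + [0.0, 0.0])
    rhs = [0.0] * n + [target, 1.0]
    return solve_linear(kkt, rhs)[:n]

# Hypothetical 3 uncorrelated assets with unit variance
w = markowitz_weights([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.1, 0.2, 0.3], 0.2)
```

For uncorrelated, equal-variance assets and a target equal to the average expected return, the optimum spreads the weight equally.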

Weighted portfolio layer
How are the assets of the minimum-risk portfolio located on the graph?
• The weighted portfolio layer is defined by imposing no short-selling, i.e., $w_i \ge 0$, and it is compared with the mean occupation layer $l(t)$.
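A sketch assuming the weighted portfolio layer is the weight-averaged tree level of the portfolio's assets, $l_P = \sum_i w_i\, \mathrm{lev}(v_i)$ with $w_i \ge 0$ and $\sum_i w_i = 1$; this formula is an assumption, not taken from the slides. The function name and inputs are hypothetical.

```python
def weighted_portfolio_layer(weights, levels):
    """Assumed definition: l_P = sum_i w_i * lev(v_i) for non-negative weights.

    weights: {asset: w_i}; levels: {asset: lev(v_i)} measured from the central vertex.
    Weights are renormalised to sum to 1.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("no short-selling: all weights must be >= 0")
    total = sum(weights.values())
    return sum(w * levels[a] for a, w in weights.items()) / total

# Hypothetical two-asset portfolio on levels 1 and 3
print(weighted_portfolio_layer({"A": 0.5, "B": 0.5}, {"A": 1, "B": 3}))  # 2.0
```

A value above the mean occupation layer would mean the portfolio sits on the outskirts of the tree.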
Portfolio layer
[Figure: portfolio layer and mean occupation layer; panels for no short-selling and short-selling, each with static and dynamic central vertex (c.v.)]
Correlations vs. noise
The correlation matrix contains both systematic information and noise.
The MST is a non-parametric, unique classification scheme, but even an uncorrelated random matrix would lead to an MST classification.
Meaningful clustering and robustness already signal significance.
Different methods to separate noise from information:
• Eigenvalue spectra (Boston, Paris)
• Independent/principal component analysis (economists)
Here: building up the fully connected graph (FCG)
The tree condition may ignore important correlations (a general classification problem).
Visualization through Parametrized Aggregated Classification (PAC): add links to the graph one by one according to their rank, starting with the strongest correlation and ending with the fully connected graph (FCG). Strongly correlated parts become interconnected early, and their clustering coefficient becomes high:
$C_i = \dfrac{\#\text{ of triangles containing } i}{k_i(k_i - 1)/2}$, where $k_i$ is the degree of node $i$.
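A sketch of one PAC step: take the `size` most strongly correlated pairs as edges, then compute the clustering coefficient of a node from the number of triangles through it. Function names and the correlation matrix are hypothetical.

```python
from itertools import combinations

def pac_graph(rho, size):
    """Add the `size` most strongly correlated pairs as edges (one PAC snapshot)."""
    n = len(rho)
    pairs = sorted(combinations(range(n), 2), key=lambda p: rho[p[0]][p[1]], reverse=True)
    adj = {v: set() for v in range(n)}
    for i, j in pairs[:size]:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def clustering_coefficient(adj, i):
    """C_i = (# triangles through i) / [k(k-1)/2]; 0 when k < 2."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return triangles / (k * (k - 1) / 2)

# Hypothetical correlations: stocks 0, 1, 2 form a strongly correlated group
rho = [[1.0, 0.9, 0.8, 0.1],
       [0.9, 1.0, 0.7, 0.1],
       [0.8, 0.7, 1.0, 0.1],
       [0.1, 0.1, 0.1, 1.0]]
adj = pac_graph(rho, 3)
print(clustering_coefficient(adj, 0))  # 1.0
```

Sweeping `size` from 0 up to N(N − 1)/2 reproduces the sequence of snapshots from the empty graph to the FCG.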
Data: price time series for a set of 477 companies; window width T = 1000 business days (4 years), located at the beginning of the 1980s.
Comparison with a random graph (obtained by shuffling the data)
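A sketch of the random reference, assuming "shuffling the data" means permuting each stock's return series independently in time. This keeps every stock's return distribution but destroys the equal-time cross-correlations. The function name and seed are hypothetical.

```python
import random

def shuffled_copy(series_list, seed=0):
    """Independently permute each return series in time (inputs are not modified)."""
    rng = random.Random(seed)
    out = []
    for s in series_list:
        s = list(s)  # copy before shuffling in place
        rng.shuffle(s)
        out.append(s)
    return out

# Hypothetical return series for two stocks
shuffled = shuffled_copy([[1, 2, 3, 4], [5, 6, 7]], seed=3)
```

Building the PAC graphs from both the original and the shuffled data gives the empirical vs. random comparison.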
[Figure: graph snapshots at graph sizes 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 300, 400, 500, 600, 700, 800, 900, 1000 edges]
Elementary graph concepts
Graph size: number of edges in the graph (variable)
Graph order: number of vertices in the graph (constant)
Spanned graph order: number of vertices in the subgraph spanned by the edges, thus excluding isolated vertices (variable)
These definitions can also be applied to clusters, of which there are two types:
(1) edge cluster
(2) vertex cluster
Edge clusters are more meaningful in the asset graph context.
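The three graph measures above can be read off an edge list directly. A minimal sketch with a hypothetical function name:

```python
def graph_measures(n_vertices, edges):
    """Return (graph size, graph order, spanned graph order)."""
    size = len(edges)                                  # number of edges
    order = n_vertices                                 # all vertices, isolated ones included
    spanned = len({v for e in edges for v in e[:2]})   # vertices touched by at least one edge
    return size, order, spanned

# Hypothetical graph: 5 stocks, 2 edges, so stocks 3 and 4 are isolated
print(graph_measures(5, [(0, 1), (1, 2)]))  # (2, 5, 3)
```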

Cluster growth
The growth patterns of clusters can be divided into four
topologically different types:
(I) Create a new cluster (two nodes and the incident edge) when neither of the two end nodes is part of an existing edge cluster (spanned cluster order +2, size +1)
(II) Add a node and the incident edge to an already existing
edge cluster (spanned cluster order +1, size +1)
(III) Merge two edge clusters by adding an edge between
them (combined spanned cluster size +1)
(IV) Add an edge to an already existing edge cluster, thus
creating a cycle in it (spanned cluster size +1)
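The four growth types can be told apart with a union-find structure over the vertices that already belong to an edge cluster. A sketch with a hypothetical class name; self-loops are not considered.

```python
class EdgeClusters:
    """Tracks edge clusters while edges are added and classifies each addition."""

    def __init__(self):
        self.parent = {}  # only vertices already in some edge cluster appear here

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def add_edge(self, i, j):
        """Add edge (i, j) and return its growth type 'I', 'II', 'III' or 'IV'."""
        in_i, in_j = i in self.parent, j in self.parent
        if not in_i and not in_j:  # (I) new cluster from two fresh nodes
            self.parent[i] = i
            self.parent[j] = i
            return "I"
        if in_i != in_j:  # (II) attach one fresh node to an existing cluster
            old, new = (i, j) if in_i else (j, i)
            self.parent[new] = self.find(old)
            return "II"
        ri, rj = self.find(i), self.find(j)
        if ri != rj:  # (III) merge two existing clusters
            self.parent[ri] = rj
            return "III"
        return "IV"  # (IV) both ends already in the same cluster: creates a cycle

clusters = EdgeClusters()
types = [clusters.add_edge(i, j) for i, j in [(0, 1), (1, 2), (3, 4), (2, 3), (0, 2)]]
print(types)  # ['I', 'II', 'I', 'III', 'IV']
```

Feeding the edges in PAC rank order and counting the types gives the cluster growth statistics.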
Cluster growth
[Figure: N = 477; empirical vs. random]
Spanned graph order
[Figure: N = 477; empirical vs. random]
Number of vertex clusters
[Figure: N = 477; empirical vs. random]
Cluster size for edge clusters
[Figure: N = 477; empirical vs. random]
Vertex degree distribution
[Figure: N = 477, p = 0.01; empirical vs. random]
Vertex degree distribution
[Figure: N = 477, p = 0.25; empirical vs. random]
Clustering coefficient
[Figure: N = 116; empirical vs. random]
Mean clustering coefficient
[Figure: N = 116; empirical vs. random]
NO TIME REVERSAL SYMMETRY ON THE MARKETS
Physics close to equilibrium: time reversal symmetry (TRS) ⇒ detailed balance ⇒ symmetric correlation functions, fluctuation-dissipation theorem (FDT)
No fundamental principle forces TRS on the market.
In contrast, the elementary process, a transaction, is irreversible: although the price is set by equilibrating supply and demand, both parties (or at least one of them) feel that the transaction is to their advantage and would not agree to reverse it.
This opens the possibility of:
• Asymmetry in the cross-correlation functions
• Differences between the decay of spontaneous fluctuations and that of the response to external perturbations
Time dependent cross correlations
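$r_A(t) = \ln P_A(t + \Delta t) - \ln P_A(t)$: log return of stock A between $t$ and $t + \Delta t$
Correlation function between the returns of companies A and B:
$$C_{AB}(\tau) = \frac{\langle r_A(t)\, r_B(t+\tau)\rangle - \langle r_A\rangle \langle r_B\rangle}{\sigma_A \sigma_B},$$
where $\sigma_A$ and $\sigma_B$ are the standard deviations of the returns.
It depends on $\Delta t$ and $\tau$. Is it symmetric in $\tau$?
Difficulties:
• Trades are not synchronized, and trading frequencies differ greatly
• Bad signal-to-noise ratio
Remedy: appropriate averaging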
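A sketch of the time-shifted correlation $C_{AB}(\tau)$ for two return series already sampled on a common, evenly spaced grid (lag in grid steps). The averaging needed to get from irregular trades to such a grid is not shown; the function name and series are hypothetical.

```python
import math

def lagged_correlation(a, b, tau):
    """C_AB(tau): correlation of a(t) with b(t + tau) over the overlapping samples."""
    if tau >= 0:
        x, y = a[:len(a) - tau], b[tau:]
    else:
        x, y = a[-tau:], b[:len(b) + tau]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
    sx = math.sqrt(sum((u - mx) ** 2 for u in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    return cov / (sx * sy)

# Hypothetical series where B repeats A's moves two steps later
a = [1, 3, 2, 5, 4, 6, 2, 8, 1, 7, 3, 9]
b = [0, 0] + a[:-2]
best = max(range(-3, 4), key=lambda tau: lagged_correlation(a, b, tau))
print(best)  # 2
```

A peak at positive $\tau$ means that A's returns precede B's; comparing $C_{AB}(\tau)$ with $C_{AB}(-\tau)$ tests the symmetry question above.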
Toy model to test the method:
Persistent 1d random walk (increments $x = \pm 1$):
We take two such walks, which are correlated, with increments $x$ and $y$.
(o=200,  =1000,  =0.99)
We corrupt the data so that they have a quality similar to real data: only 1% of the data are kept.
The correlations measured on a finite data set depend on the averaging procedure (moving average).
The appropriate choice is $\Delta t_{\min} < \Delta t < \tau_0$.
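A sketch of the ingredients of such a toy test, under assumptions not stated on the slides: each ±1 increment repeats the previous sign with probability p, and corruption keeps each observation independently with a fixed probability. How the two walks are correlated with each other is not specified here, so it is not modelled. Function names are hypothetical.

```python
import random

def persistent_walk_increments(n, p, rng):
    """Assumed model: increments x_t = +/-1, each repeating the previous sign with probability p."""
    x = [rng.choice((-1, 1))]
    for _ in range(n - 1):
        x.append(x[-1] if rng.random() < p else -x[-1])
    return x

def thin(series, keep_fraction, rng):
    """Keep a random subset of (time, value) observations, mimicking sparse trades."""
    return [(t, v) for t, v in enumerate(series) if rng.random() < keep_fraction]

rng = random.Random(1)
x = persistent_walk_increments(10000, 0.99, rng)
sparse = thin(x, 0.01, rng)  # roughly 1% of the samples survive
```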
Data set: Trade and Quote (TAQ) database, 10,000 companies, tick by tick
Over 54 days, 195 companies were traded more than 15,000 times.
Δt = 100 s, but results were checked for 50–500 s.
• We measure $\tau_{\max}$, $C(\tau_{\max})$, and $R = C(\tau_{\max})/\text{noise}$
• We consider $|\tau_{\max}| > 100$, $C(\tau_{\max}) > 0.04$, and $R > 6$ as an ‘effect’
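The thresholds above can be applied mechanically once $C(\tau)$ has been measured on a grid of lags. In this sketch the noise level is an input supplied by the caller, since its estimation is not described here; the function name and data are hypothetical.

```python
def is_effect(corr_by_lag, noise, min_lag=100, min_peak=0.04, min_ratio=6):
    """Apply the thresholds |tau_max| > 100, C(tau_max) > 0.04 and R > 6.

    corr_by_lag: {lag: C(lag)}; noise: noise level of C, supplied by the caller.
    Returns (effect?, tau_max, C(tau_max), R).
    """
    tau_max = max(corr_by_lag, key=corr_by_lag.get)
    c_max = corr_by_lag[tau_max]
    ratio = c_max / noise
    found = abs(tau_max) > min_lag and c_max > min_peak and ratio > min_ratio
    return found, tau_max, c_max, ratio

# Hypothetical measurement with a shifted peak
print(is_effect({0: 0.01, 200: 0.06, -200: 0.02}, noise=0.005)[:2])  # (True, 200)
```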
Results
[Figure: example pair XON (Exxon, oil) and ESV (Ensco, oil wells)]
• Not all pairs of companies show the effect
• The peak is not only shifted but also asymmetric
• Large, frequently traded companies ‘pull’ the smaller ones
• The effect is weak and its characteristic time is short (minutes)
Directed network of influence
• No chains
• Many leaders for a follower
• Many followers for a leader
• Disconnected graph
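A sketch of turning detected effects into a directed network. It assumes the sign convention of $C_{AB}(\tau)$ as the correlation of $r_A(t)$ with $r_B(t+\tau)$, so a peak at $\tau > 0$ makes A the leader; the function name and the pairs are hypothetical.

```python
from collections import defaultdict

def influence_network(effects):
    """effects: iterable of (A, B, tau_max) for pairs that passed the 'effect' test.

    A positive tau_max means A's moves precede B's, giving an edge A -> B (leader -> follower).
    Returns (followers of each leader, leaders of each follower).
    """
    followers = defaultdict(set)
    leaders = defaultdict(set)
    for a, b, tau in effects:
        if tau > 0:
            leader, follower = a, b
        elif tau < 0:
            leader, follower = b, a
        else:
            continue  # no time shift: no direction
        followers[leader].add(follower)
        leaders[follower].add(leader)
    return followers, leaders

# Hypothetical effects: two leaders share one follower
followers, leaders = influence_network([("A", "B", 200), ("C", "B", 150), ("B", "D", 0)])
```

Counting in- and out-degrees of this graph shows directly whether followers have many leaders and leaders many followers.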
Conclusions
• Networks are constructed from cross-correlations of stock price time series (MST, PAC)
• Though $C_{ij}$ is noisy, it has much information content, useful for portfolio optimization
• The MST is robust, gives a reasonable classification, and shows interesting dynamics at crash time
• Clusters (branches) are not equally correlated; PAC reveals the differences and separates noise from information
• Asymmetric time-dependent cross-correlations lead to a directed network of influence
