
LETTER OPEN

doi:10.1038/nature14668

The octopus genome and the evolution of cephalopod neural and morphological novelties
Caroline B. Albertin1*, Oleg Simakov2,3*, Therese Mitros4, Z. Yan Wang5, Judit R. Pungor5, Eric Edsinger-Gonzales2,4,
Sydney Brenner2, Clifton W. Ragsdale1,5 & Daniel S. Rokhsar2,4,6

Coleoid cephalopods (octopus, squid and cuttlefish) are active, resourceful predators with a rich behavioural repertoire1. They have the largest nervous systems among the invertebrates2 and present other striking morphological innovations including camera-like eyes, prehensile arms, a highly derived early embryogenesis and a remarkably sophisticated adaptive colouration system1,3. To investigate the molecular bases of cephalopod brain and body innovations, we sequenced the genome and multiple transcriptomes of the California two-spot octopus, Octopus bimaculoides. We found no evidence for hypothesized whole-genome duplications in the octopus lineage4–6. The core developmental and neuronal gene repertoire of the octopus is broadly similar to that found across invertebrate bilaterians, except for massive expansions in two gene families previously thought to be uniquely enlarged in vertebrates: the protocadherins, which regulate neuronal development, and the C2H2 superfamily of zinc-finger transcription factors. Extensive messenger RNA editing generates transcript and protein diversity in genes involved in neural excitability, as previously described7, as well as in genes participating in a broad range of other cellular functions. We identified hundreds of cephalopod-specific genes, many of which showed elevated expression levels in such specialized structures as the skin, the suckers and the nervous system. Finally, we found evidence for large-scale genomic rearrangements that are closely associated with transposable element expansions. Our analysis suggests that substantial expansion of a handful of gene families, along with extensive remodelling of genome linkage and repetitive content, played a critical role in the evolution of cephalopod morphological innovations, including their large and complex nervous systems.

Soft-bodied cephalopods such as the octopus (Fig. 1a) show remarkable morphological departures from the basic molluscan body plan, including dexterous arms lined with hundreds of suckers that function as specialized tactile and chemosensory organs, and an elaborate chromatophore system under direct neural control that enables rapid changes in appearance1,8. The octopus nervous system is vastly modified in size and organization relative to other molluscs, comprising a circumesophageal brain, paired optic lobes and axial nerve cords in each arm2,3. Together these structures contain nearly half a billion neurons, more than six times the number in a mouse brain2,9. Extant coleoid cephalopods show extraordinarily sophisticated behaviours including complex problem solving, task-dependent conditional discrimination, observational learning and spectacular displays of camouflage1,10 (Supplementary Videos 1 and 2).

To explore the genetic features of these highly specialized animals, we sequenced the Octopus bimaculoides genome by a whole-genome shotgun approach (Supplementary Note 1) and annotated it using extensive transcriptome sequence from 12 tissues (Methods and Supplementary Note 2). The genome assembly captures more than 97% of expressed protein-coding genes and 83% of the estimated 2.7 gigabase (Gb) genome size (Methods and Supplementary Notes 1–3). The unassembled fraction is dominated by high-copy repetitive sequences (Supplementary Note 1). Nearly 45% of the assembled genome is composed of repetitive elements, with two bursts of transposon activity occurring ~25-million and ~56-million years ago (Mya) (Supplementary Note 4).

We predicted 33,638 protein-coding genes (Methods and Supplementary Note 4) and found alternate splicing at 2,819 loci, but no locus showed an unusually high number of splice variants (Supplementary Note 4). A-to-G discrepancies between the assembled genome and transcriptome sequences provided evidence for extensive mRNA editing by adenosine deaminases acting on RNA (ADARs). Many candidate edits are enriched in neural tissues7 and are found in a range of gene families, including housekeeping genes such as the tubulins, which suggests that RNA edits are more widespread than previously appreciated (Extended Data Fig. 1 and Supplementary Note 5).

Based primarily on chromosome number, several researchers proposed that whole-genome duplications were important in the evolution of the cephalopod body plan4–6, paralleling the role ascribed to the independent whole-genome duplication events that occurred early in vertebrate evolution11. Although this is an attractive framework for both gene family expansion and increased regulatory complexity across multiple genes, we found no evidence for it. The gene family expansions present in octopus are predominantly organized in clusters along the genome, rather than distributed in doubly conserved synteny as expected for a paleopolyploid12,13 (Supplementary Note 6.2). Although genes that regulate development are often retained in multiple copies after paleopolyploidy in other lineages, they are not generally expanded in octopus relative to limpet, oyster and other invertebrate bilaterians11,14 (Table 1 and Supplementary Notes 7.4 and 8).

Hox genes are commonly retained in multiple copies following whole-genome duplication15. In O. bimaculoides, however, we found only a single Hox complement, consistent with the single set of Hox transcripts identified in the bobtail squid Euprymna scolopes with PCR16. Remarkably, octopus Hox genes are not organized into clusters as in most other bilaterian genomes15, but are completely atomized (Extended Data Fig. 2 and Supplementary Note 9). Although we cannot rule out whole-genome duplication followed by considerable gene loss, the extent of loss needed to support this claim would far exceed that which has been observed in other paleopolyploid lineages, and it is more plausible that chromosome number in coleoids increased by chromosome fragmentation.

Mechanisms other than whole-genome duplications can drive genomic novelty, including expansion of existing gene families, evolution of novel genes, modification of gene regulatory networks, and reorganization of the genome through transposon activity. Within the O. bimaculoides genome, we found evidence for all of these
1Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois 60637, USA.
2Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904-0495, Japan.
3Centre for Organismal Studies, University of Heidelberg, 69117 Heidelberg, Germany.
4Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA.
5Department of Neurobiology, University of Chicago, Chicago, Illinois 60637, USA.
6Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA.
*These authors contributed equally to this work.

220 | NATURE | VOL 524 | 13 AUGUST 2015
©2015 Macmillan Publishers Limited. All rights reserved

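[Figure 1 image. Panel a: schematic of O. bimaculoides anatomy. Panel b: heat map of Pfam domain representation (colour scale: row Z-score, −2 to 2) for the species Mmu, Lch, Bfl, Obi, Cgi, Hro, Lgi, Pfu, Dme, Cte, Gga, Hsa, Cel, Xtr and Dre. Row labels (Pfam accession and domain name): PF08266/00028 Cadherin; PF05375 Pacifastin_I; PF02868/01447 Peptidase_M4; PF02037 SAP; PF06083 IL17; PF00002 7tm_2; PF07690 MFS_1; PF14830 Haemocyan_bet_s; PF13465/00096 zf-C2H2; PF05970 PIF1; PF00264 Tyrosinase; PF00582 Usp; PF00147 Fibrinogen_C; PF00024 PAN_1; PF01582 TIR; PF00092 VWA; PF01531 Glyco_transf_11; PF02931 Neur_chan_LBD; PF01607 CBM_14; PF05485 THAP.]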

Figure 1 | Octopus anatomy and gene family representation analysis. a, Schematic of Octopus bimaculoides anatomy, highlighting the tissues sampled for transcriptome analysis: viscera (heart, kidney and hepatopancreas), yellow; gonads (ova or testes), peach; retina, orange; optic lobe (OL), maroon; supraesophageal brain (Supra), bright pink; subesophageal brain (Sub), light pink; posterior salivary gland (PSG), purple; axial nerve cord (ANC), red; suckers, grey; skin, mottled brown; stage 15 (St15) embryo, aquamarine. Skin sampled for transcriptome analysis included the eyespot, shown in light blue. b, C2H2 and protocadherin domain-containing gene families are expanded in octopus. Enriched Pfam domains were identified in lophotrochozoans (green) and molluscs (yellow), including O. bimaculoides (light blue). For a domain to be labelled as expanded in a group, at least 50% of its associated gene families need a corrected P value of 0.01 against the outgroup average. Some Pfams (for example, Cadherin and Cadherin_2) may occur in the same gene, however multiple domains in a given gene were counted only once. Bfl, Branchiostoma floridae; Cel, Caenorhabditis elegans; Cgi, Crassostrea gigas; Cte, Capitella teleta; Dme, Drosophila melanogaster; Dre, Danio rerio; Gga, Gallus gallus; Hsa, Homo sapiens; Hro, Helobdella robusta; Lch, Latimeria chalumnae; Lgi, Lottia gigantea; Mmu, Mus musculus; Obi, O. bimaculoides; Pfu, Pinctada fucata; Xtr, Xenopus tropicalis.
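The labelling rule in the Figure 1 caption (a domain counts as expanded when at least 50% of its gene families reach a corrected P value of 0.01) can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `domain_is_expanded`, its parameters, and the reading of the threshold as "P at or below 0.01" are hypothetical, and the caption does not say which statistical test or multiple-testing correction produced the P values.

```python
from typing import Sequence


def domain_is_expanded(
    corrected_p_values: Sequence[float],
    alpha: float = 0.01,        # threshold quoted in the caption
    min_fraction: float = 0.5,  # "at least 50% of its associated gene families"
) -> bool:
    """Return True if enough gene families of one domain pass the threshold.

    corrected_p_values holds one already-corrected P value per gene family
    that carries the domain (assumed input; how it is computed is not shown).
    """
    if not corrected_p_values:
        # With no gene families there is nothing to call expanded.
        return False
    # Assumption: a family "passes" when its corrected P value is <= alpha.
    passing = sum(1 for p in corrected_p_values if p <= alpha)
    return passing / len(corrected_p_values) >= min_fraction


if __name__ == "__main__":
    # Hypothetical P values for three gene families sharing one domain.
    print(domain_is_expanded([0.001, 0.004, 0.3]))
```

Keeping the threshold and the fraction as parameters makes it easy to see how sensitive a label is to either cutoff.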

mechanisms, including expansions in several gene families, a suite of octopus- and cephalopod-specific genes, and extensive genome shuffling.

In gene family content, domain architecture and exon–intron structure, the octopus genome broadly resembles that of the limpet Lottia gigantea17, the polychaete annelid Capitella teleta17 and the cephalochordate Branchiostoma floridae14 (Supplementary Note 7 and Extended Data Fig. 3). Relative to these invertebrate bilaterians, we found a fairly standard set of developmentally important transcription factors and signalling pathway genes, suggesting that the evolution of the cephalopod body plan did not require extreme expansions of these toolkit genes (Table 1 and Supplementary Note 8.2). However, statistical analysis of protein domain distributions across animal genomes did identify several notable gene family expansions in octopus, including protocadherins, C2H2 zinc-finger proteins (C2H2 ZNFs), interleukin-17-like genes (IL17-like), G-protein-coupled receptors (GPCRs), chitinases and sialins (Figs 1b, 2 and 3; Extended Data Figs 4–6 and Supplementary Notes 8 and 10).

Table 1 | Metazoan developmental control genes

                              Obi    Lgi    Cte    Dme    Cel    Bfl    Hsa
Ligands
Fibroblast growth factor        3      2      1      3      3      8     22
Wnt                            12     10     12      7      5     17     19
TGFβ/BMP                       12      9     14      6      5     22     33
Delta/Jagged                    4      1      1      2      4      2      7
Hedgehog                        1      1      1      1      0      1      3
Axon guidance                  10      9      9      6      8     23     33
Transcription factors
C2H2 zinc-finger            1,790    413    222    326    211  1,338    764
Homeodomain                   114    121    111    104     99    133    333
High mobility group            23     15     14     13     16     51    125
Helix loop helix               50     63     64     59     42     78    118
Nuclear hormone receptor       40     44     45     16    274     33     48
Fox                            16     28     26     17     18     42     43
Tbox                            9      9      7      8     21      9     18

Number of members of developmental ligand and transcription factor families from O. bimaculoides and selected other taxa. Dendrogram above species names reflects their evolutionary relationships. Bfl, Branchiostoma floridae; Cel, Caenorhabditis elegans; Cte, Capitella teleta; Dme, Drosophila melanogaster; Hsa, Homo sapiens; Lgi, Lottia gigantea; Obi, O. bimaculoides.

The octopus genome encodes 168 multi-exonic protocadherin genes, nearly three-quarters of which are found in tandem clusters on the genome (Fig. 2b), a striking expansion relative to the 17–25 genes found in Lottia, Crassostrea gigas (oyster) and Capitella genomes. Protocadherins are homophilic cell adhesion molecules whose function has been primarily studied in mammals, where they are required for neuronal development and survival, as well as synaptic specificity18. Single protocadherin genes are found in the invertebrate deuterostomes Saccoglossus kowalevskii (acorn worm) and Strongylocentrotus purpuratus (sea urchin), indicating that their absence in Drosophila melanogaster and Caenorhabditis elegans is due to gene loss. Vertebrates also show a remarkable expansion of the protocadherin repertoire, which is generated by complex splicing from a clustered locus rather than tandem gene duplication (reviewed in ref. 19). Thus both octopuses and vertebrates have independently evolved a diverse array of protocadherin genes.

A search of available transcriptome data from the longfin inshore squid Doryteuthis (formerly, Loligo) pealeii20 also demonstrated an expanded number of protocadherin genes (Supplementary Note 8.3). Surprisingly, our phylogenetic analyses suggest that the squid and octopus protocadherin arrays arose independently. Unlinked octopus protocadherins appear to have expanded ~135 Mya, after octopuses diverged from squid. In contrast, clustered octopus protocadherins are much more similar in sequence, either due to more recent duplications or gene conversion as found in clustered protocadherins in zebrafish and mammals21.

The expression of protocadherins in octopus neural tissues (Fig. 2) is consistent with a central role for these genes in establishing and maintaining cephalopod nervous system organization as they do in vertebrates. Protocadherin diversity provides a mechanism for regulating the short-range interactions needed for the assembly of local neural circuits18, which is where the greatest complexity in the cephalopod nervous system appears2. The importance of local neuropil interactions, rather than long-range connections, is probably due to the limits placed on axon density and connectivity by the absence of myelin, as thick axons are then required for rapid high-fidelity signal conduction over long distances. The sequence divergence between octopus and


[Figure 2 image. Panel a: circular phylogenetic tree of cadherin genes with clades labelled I to IX; one branch is marked with an asterisk. Panel b: gene maps of Scaffold 30672 and Scaffold 9600, with scale bars of 100 kb and 20 kb. Panel c: expression heat map of protocadherins and cadherins across Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra, Sub and ANC (colour scale: row Z-score, −3 to 3).]

Figure 2 | Protocadherin expansion in octopus. a, For a larger version of panel a, see Extended Data Fig. 11. Phylogenetic tree of cadherin genes in Hsa (red), Dme (orange), Nematostella vectensis (mustard yellow), Amphimedon queenslandica (yellow), Cte (green), Lgi (teal), Obi (blue), and Saccoglossus kowalevskii (purple). I, Type I classical cadherins; II, calsyntenins; III, octopus protocadherin expansion (168 genes); IV, human protocadherin expansion (58 genes); V, dachsous; VI, fat-like; VII, fat; VIII, CELSR; IX, Type II classical cadherins. Asterisk denotes a novel cadherin with over 80 extracellular cadherin domains found in Obi and Cte. b, Scaffold 30672 and Scaffold 9600 contain the two largest clusters of protocadherins, with 31 and 17, respectively. Clustered protocadherins vary greatly in genomic span and are oriented in a head-to-tail manner along each scaffold. c, Expression profiles of 161 protocadherins and 19 cadherins in 12 octopus tissues; 7 protocadherins were not detected in the tissues sampled. Cells are coloured according to number of standard deviations from the mean expression level. Protocadherins have high expression in neural tissues. Cadherins generally show a similar expression pattern, with the exception of a group of sucker-specific cadherins.
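The colouring described for panel c (number of standard deviations from the mean expression level, shown on the figure as a row Z-score) can be illustrated with a short sketch. The function name `row_z_scores` and the example values are hypothetical. The caption does not say whether expression was log-transformed first or whether the population or sample standard deviation was used, so the population form below is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def row_z_scores(expression: Sequence[float]) -> List[float]:
    """Standardise one gene's expression values across tissues.

    Each value becomes (value - row mean) / row standard deviation, so the
    result says how many standard deviations a tissue lies from that gene's
    own mean. Population standard deviation is an assumption made here.
    """
    mu = mean(expression)
    sigma = pstdev(expression)
    if sigma == 0:
        # A gene with identical expression everywhere has no variation to show.
        return [0.0 for _ in expression]
    return [(value - mu) / sigma for value in expression]


if __name__ == "__main__":
    # Hypothetical expression of one gene in three tissues.
    print(row_z_scores([1.0, 2.0, 3.0]))
```

Because each row is scaled by its own mean and spread, the heat map compares tissues within a gene rather than absolute expression between genes.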

[Figure 3 image. Panel a: gene map of Scaffold 19852 (scale bar 100 kb). Panel b: expression heat map of Scaffold 19852 C2H2 genes across Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra, Sub and ANC (colour scale: row Z-score, −3 to 3). Panel c: histogram of fraction (frequency) against 4DTv distance (0 to 1) for tandem (clustered) and non-tandem (not clustered) C2H2 zinc-finger genes.]

Figure 3 | C2H2 ZNF expansion in octopus. a, Genomic organization of the largest C2H2 cluster. Scaffold 19852 contains 58 C2H2 genes that are transcribed in different directions. b, Expression profile of C2H2 genes along Scaffold 19852 in 12 octopus transcriptomes. Neural and developmental transcriptomes show high levels of expression for a majority of these C2H2 genes. In a and b, arrow denotes scaffold orientation. c, Distribution of fourfold synonymous site transversion distances (4DTv) between C2H2-domain-containing genes.
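The 4DTv distance plotted in Fig. 3c counts transversions at fourfold synonymous (fourfold degenerate) third codon positions. A minimal sketch of the uncorrected quantity follows, assuming two coding sequences that are already codon-aligned, of equal length and free of gaps, and assuming the standard genetic code. The function name `fourfold_transversion_distance` and the example sequences are hypothetical; the alignment step and any multiple-substitution correction the authors may have applied are not shown.

```python
from typing import Optional

# First two bases of codons whose third position is fourfold degenerate in the
# standard genetic code (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN,
# Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def fourfold_transversion_distance(cds_a: str, cds_b: str) -> Optional[float]:
    """Fraction of shared fourfold degenerate sites that differ by a transversion.

    Returns None when the two sequences share no usable fourfold site.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = 0
    transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Only codons with the same fourfold degenerate prefix in both
        # sequences contribute a site.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        base_a, base_b = codon_a[2], codon_b[2]
        if base_a not in BASES or base_b not in BASES:
            continue  # skip ambiguous bases such as N
        sites += 1
        # A transversion swaps a purine for a pyrimidine or the reverse.
        if (base_a in PURINES) != (base_b in PURINES):
            transversions += 1
    if sites == 0:
        return None
    return transversions / sites


if __name__ == "__main__":
    # Hypothetical aligned fragments: Ala, Gly, Thr, then Met (not fourfold).
    print(fourfold_transversion_distance("GCTGGAACCATG", "GCAGGCACCATG"))
```

Smaller distances indicate more recent duplication, which is how a histogram such as panel c can separate recently expanded tandem genes from older paralogues.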


squid protocadherin expansions may reflect the notable differences most of which are tandemly arrayed in clusters (Extended Data Fig. 7).
between octopuses and decapodiforms in brain organization, which These subunits lack several residues identified as necessary for the
have been most clearly demonstrated for the vertical lobe, a key struc- binding of acetylcholine26, so it is unlikely that they function as acetyl-
ture in cephalopod learning and memory circuits2,22. Finally, the inde- choline receptors. The high level of expression of these divergent sub-
pendent expansions and nervous system enrichment of protocadherins units within the suckers raises the interesting possibility that they act as
in coleoid cephalopods and vertebrates offers a striking example of sensory receptors, as do some divergent glutamate receptors in other
convergent evolution between these clades at the molecular level. protostomes27. In addition, we identified 74 Aplysia-like and 11 verte-
As with the protocadherins, we found multiple clusters of C2H2 brate-like candidate chemoreceptors among the octopus GPCR super-
ZNF transcription factor genes (Fig. 3a and Supplementary Note 8.4). family of ,330 genes (Extended Data Fig. 6).
The octopus genome contains nearly 1,800 multi-exonic C2H2- We found, amid extensive transcription of octopus transposons,
containing genes (Table 1), more than the 200400 C2H2 ZNFs found that a class of octopus-specific short interspersed nuclear element
in other lophotrochozoans and the 500700 found in eutherian sequences (SINEs) is highly expressed in neural tissues (Supplemen-
mammals, in which they form the second-largest gene family23. tary Note 4 and Extended Data Fig. 8). Although the role of active
C2H2 ZNF transcription factors contain multiple C2H2 domains that, transposons is unclear, elevated transposon expression in neural
in combination, result in highly specific nucleic acid binding. The tissues has been suggested to serve an important function in learning
octopus C2H2 ZNFs typically contain 1020 C2H2 domains but some and memory in mammals and flies28.
have as many as 60 (Supplementary Note 8.4). The majority of the Transposable element insertions are often associated with genomic
transcripts are expressed in embryonic and nervous tissues (Fig. 3b). rearrangements29 and we found that the transposon-rich octopus gen-
This pattern of expression is consistent with roles for C2H2 ZNFs ome displays substantial loss of ancestral bilaterian linkages that are
in cell fate determination, early development and transposon silencing, conserved in other species (Supplementary Note 6 and Extended Data
as demonstrated in genetic model systems23. Fig. 9). Interestingly, genes that are linked in other bilaterians but not
The expansion of the O. bimaculoides C2H2 ZNFs coincides with a in octopus are enriched in neighbouring SINE content. SINE inser-
burst of transposable element activity at ,25 Mya (Fig. 3c). The flank- tions around these genes date to the time of tandem C2H2 expansion
ing regions of these genes show a significant enrichment in a 7090 base (Extended Data Fig. 9d), pointing to a crucial period of genome evolu-
pair (bp) tandem repeat (31% for C2H2 genes versus 4% for all genes; tion in octopus. Other transposons such as Mariner show no such
Fishers exact test P value ,1 3 10216), which parallels the linkage of enrichment, suggesting distinct roles for different classes of transpo-
C2H2 gene expansions to b-satellite repeats in humans24. We also sons in shaping genome structure (Extended Data Fig. 9c).
found an expanded C2H2 ZNF repertoire in amphioxus (Table 1), Transposable element activity has been implicated in the modifica-
showing a similar enrichment in satellite-like repeats. These parallels tion of gene regulation across several eukaryotic lineages29. We found
suggest a common mode of expansion of a highly dynamic transcrip- that in the nervous system, the degree to which a genes expression is
tion factor family implicated in lineage-specific innovations. tissue-specific is positively correlated with the transposon load around
To investigate further the evolution of gene families implicated in that gene (r2 values ranging from 0.49 in the optic lobe to 0.81 in
nervous system development and function, we surveyed genes associated with axon guidance (Table 1) and neurotransmission (Table 2), identifying their homologues in octopus and comparing numbers across a diverse set of animal genomes (Supplementary Notes 8–10). Several patterns emerged from this survey. The gene complements present in the model organisms D. melanogaster and C. elegans often showed striking departures from those seen in lophotrochozoans and vertebrates (Table 2 and Supplementary Note 10). For example, D. melanogaster encodes one member of the discs large (DLG) family, a key component of the postsynaptic scaffold. In contrast, mammals have four DLGs, which (along with other observations) led to suggestions that vertebrates possess uniquely complex synaptic machinery25. However, we found three DLGs in both octopus and limpet, suggesting that vertebrate and fly gene number differences are not necessarily diagnostic of exceptional vertebrate synaptic complexity (Supplementary Note 10.6).

Overall, neurotransmission gene family sizes in the octopus were very similar to those seen in other lophotrochozoans (Table 2 and Supplementary Note 10), except for a few strikingly expanded gene families such as the sialic acid vesicular transporters (sialins) (Supplementary Note 10.2). We did find variations in the sizes of neurotransmission gene families between human and lophotrochozoans (Table 2 and Supplementary Note 10), but no evidence for systematic expansion of these gene families in vertebrates relative to octopus or other lophotrochozoans. Although some gene families were larger in mammals or absent in lophotrochozoans (for example, ligand-gated 5-HT receptors), others were absent in mammals and present in invertebrates (for example, anionic glutamate and acetylcholine receptors). The complement of neurotransmission genes in octopus may be broadly typical for a lophotrochozoan, but our findings suggest it is also not obviously smaller than is found in mammals.

Among the octopus complement of ligand-gated ion channels, we identified a set of atypical nicotinic acetylcholine receptor-like genes,

the subesophageal brain; Extended Data Fig. 8 and Supplementary Note 4). This correlation may reflect modulation of gene expression by transposon-derived enhancers or a greater tolerance for transposon insertion near genes with less complex patterns of tissue-specific gene regulation.

Using a relaxed molecular clock, we estimate that the octopus and squid lineages diverged ~270 Mya, emphasizing the deep evolutionary history of coleoid cephalopods8,30 (Supplementary Note 7.1 and Extended Data Fig. 10a). Our analyses found hundreds of coleoid- and octopus-specific genes, many of which were expressed in tissues containing novel structures, including the chromatophore-laden skin, the suckers and the nervous system (Extended Data Fig. 10 and Supplementary Note 11). Taken together, these novel genes, the

Table 2 | Ion channel subunits

                                              Obi  Aca  Lgi  Cte  Dme  Cel  Hsa
Voltage-gated calcium channels                  8    8    6   10    9   10   10
Voltage-gated sodium channels                   3    2    3    2    4    0   13
Transient receptor potential channels          36   45   40   43   13   23   29
K+ channels
Voltage-gated                                  30   23   29   20   10   51   40
Calcium-activated, small/large conductance     12    8    9    6    3    6    8
Inward rectifying                               3    4    5    6    4    3   16
Two pore                                       12    9   12   14   11   47   15
Non-voltage-gated                              27   21   26   26   18   72   39
Cys-loop receptors
Glutamate                                      21   15   47   36   30   15   18
Nicotinic acetylcholine                        53   16   52   77   10   88   16
Inhibitory acetylcholine                        3    2    5    2    0    4    0
5-HT3                                           0    0    0    0    0    1    5
GABA                                            6    5    4    9    3    7   19
Glutamate-gated chloride channels               7    5    8    5    1    6    0

Number of subunits of representative ion channel families in O. bimaculoides and across examined taxa. Dendrogram above species names shows their evolutionary relationships. Aca, Aplysia californica.
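The octopus-versus-human comparison drawn from Table 2 can be tallied directly. The following Python sketch is illustrative only and is not part of the published analysis: it copies the Obi and Hsa columns of Table 2, treats every row as an independent family, and counts which species has more subunits. The names TABLE2_OBI_HSA and compare_families, and the "K+:" row prefix, are hypothetical labels introduced here.

```python
# Octopus (Obi) and human (Hsa) subunit counts copied from Table 2.
# The "K+:" prefix is an illustrative label for rows listed under "K+ channels".
TABLE2_OBI_HSA = {
    "Voltage-gated calcium channels": (8, 10),
    "Voltage-gated sodium channels": (3, 13),
    "Transient receptor potential channels": (36, 29),
    "K+: Voltage-gated": (30, 40),
    "K+: Calcium-activated, small/large conductance": (12, 8),
    "K+: Inward rectifying": (3, 16),
    "K+: Two pore": (12, 15),
    "K+: Non-voltage-gated": (27, 39),
    "Glutamate": (21, 18),
    "Nicotinic acetylcholine": (53, 16),
    "Inhibitory acetylcholine": (3, 0),
    "5-HT3": (0, 5),
    "GABA": (6, 19),
    "Glutamate-gated chloride channels": (7, 0),
}


def compare_families(table):
    """Sort families by which species has more subunits, and flag absences."""
    summary = {
        "larger_in_octopus": [],
        "larger_in_human": [],
        "absent_in_human": [],
        "absent_in_octopus": [],
    }
    for family, (obi, hsa) in table.items():
        if obi > hsa:
            summary["larger_in_octopus"].append(family)
        elif hsa > obi:
            summary["larger_in_human"].append(family)
        if hsa == 0 and obi > 0:
            summary["absent_in_human"].append(family)
        if obi == 0 and hsa > 0:
            summary["absent_in_octopus"].append(family)
    return summary


summary = compare_families(TABLE2_OBI_HSA)
for key, families in summary.items():
    print(f"{key}: {len(families)}")
```

On these rows, six families are larger in octopus and eight are larger in human, which fits the statement that the differences run in both directions rather than showing a systematic vertebrate expansion.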


expansion of C2H2 ZNFs, genome rearrangements, and extensive transposable element activity yield a new landscape for both trans- and cis-regulatory elements in the octopus genome, resulting in changes in an otherwise typical lophotrochozoan gene complement that contributed to the evolution of cephalopod neural complexity and morphological innovations.

Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

Received 26 December 2014; accepted 16 June 2015.

1. Hanlon, R. T. & Messenger, J. B. Cephalopod Behaviour (Cambridge Univ. Press, 1996).
2. Young, J. Z. The Anatomy of the Nervous System of Octopus vulgaris (Clarendon Press, 1971).
3. Wells, M. J. Octopus: Physiology and Behaviour of an Advanced Invertebrate (Chapman and Hall, 1978).
4. Bonnaud, L., Ozouf-Costaz, C. & Boucher-Rodoni, R. A molecular and karyological approach to the taxonomy of Nautilus. C. R. Biol. 327, 133–138 (2004).
5. Hallinan, N. M. & Lindberg, D. R. Comparative analysis of chromosome counts infers three paleopolyploidies in the mollusca. Genome Biol. Evol. 3, 1150–1163 (2011).
6. Yoshida, M. A. et al. Genome structure analysis of molluscs revealed whole genome duplication and lineage specific repeat variation. Gene 483, 63–71 (2011).
7. Rosenthal, J. J. & Seeburg, P. H. A-to-I RNA editing: effects on proteins key to neural excitability. Neuron 74, 432–439 (2012).
8. Kröger, B., Vinther, J. & Fuchs, D. Cephalopod origin and evolution. Bioessays 33, 602–613 (2011).
9. Herculano-Houzel, S., Mota, B. & Lent, R. Cellular scaling rules for rodent brains. Proc. Natl Acad. Sci. USA 103, 12138–12143 (2006).
10. Grasso, F. W. & Basil, J. A. The evolution of flexible behavioral repertoires in cephalopod molluscs. Brain Behav. Evol. 74, 231–245 (2009).
11. Holland, P. W., Garcia-Fernandez, J., Williams, N. A. & Sidow, A. Gene duplications and the origins of vertebrate development. Development (Suppl.), 125–133 (1994).
12. Dietrich, F. S. et al. The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science 304, 304–307 (2004).
13. Kellis, M., Birren, B. W. & Lander, E. S. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617–624 (2004).
14. Putnam, N. H. et al. The amphioxus genome and the evolution of the chordate karyotype. Nature 453, 1064–1071 (2008).
15. Duboule, D. The rise and fall of Hox gene clusters. Development 134, 2549–2560 (2007).
16. Callaerts, P. et al. HOX genes in the sepiolid squid Euprymna scolopes: implications for the evolution of complex body plans. Proc. Natl Acad. Sci. USA 99, 2088–2093 (2002).
17. Simakov, O. et al. Insights into bilaterian evolution from three spiralian genomes. Nature 493, 526–531 (2013).
18. Zipursky, S. L. & Sanes, J. R. Chemoaffinity revisited: Dscams, protocadherins, and neural circuit assembly. Cell 143, 343–353 (2010).
19. Chen, W. V. & Maniatis, T. Clustered protocadherins. Development 140, 3297–3302 (2013).
20. Brown, C. T., Graveley, B. & Rosenthal, J. J. Loligo pealeii (Squid) Data Dump (http://ivory.idyll.org/blog/2014-loligo-transcriptome-data.html) (2014).
21. Noonan, J. P., Grimwood, J., Schmutz, J., Dickson, M. & Myers, R. M. Gene conversion and the evolution of protocadherin gene cluster diversity. Genome Res. 14, 354–366 (2004).
22. Shomrat, T. et al. Alternative sites of synaptic plasticity in two homologous fan-out fan-in learning and memory networks. Curr. Biol. 21, 1773–1782 (2011).
23. Liu, H., Chang, L. H., Sun, Y., Lu, X. & Stubbs, L. Deep vertebrate roots for mammalian zinc finger transcription factor subfamilies. Genome Biol. Evol. 6, 510–525 (2014).
24. Eichler, E. E. et al. Complex β-satellite repeat structures and the expansion of the zinc finger gene cluster in 19p12. Genome Res. 8, 791–808 (1998).
25. Nithianantharajah, J. et al. Synaptic scaffold evolution generated components of vertebrate cognitive complexity. Nature Neurosci. 16, 16–24 (2013).
26. Brejc, K. et al. Crystal structure of an ACh-binding protein reveals the ligand-binding domain of nicotinic receptors. Nature 411, 269–276 (2001).
27. Croset, V. et al. Ancient protostome origin of chemosensory ionotropic glutamate receptors and the evolution of insect taste and olfaction. PLoS Genet. 6, e1001064 (2010).
28. Erwin, J. A., Marchetto, M. C. & Gage, F. H. Mobile DNA elements in the generation of diversity and complexity in the brain. Nature Rev. Neurosci. 15, 497–506 (2014).
29. Chénais, B., Caruso, A., Hiard, S. & Casse, N. The impact of transposable elements on eukaryotic genomes: from genome size increase to genetic adaptation to stressful environments. Gene 509, 7–15 (2012).
30. Strugnell, J., Norman, M., Jackson, J., Drummond, A. J. & Cooper, A. Molecular phylogeny of coleoid cephalopods (Mollusca: Cephalopoda) using a multigene approach; the effect of data partitioning on resolving phylogenies in a Bayesian framework. Mol. Phylogenet. Evol. 37, 426–441 (2005).

Supplementary Information is available in the online version of the paper.

Acknowledgements We thank C. T. Brown and J. Rosenthal for making Doryteuthis RNA-seq data available before publication; C. Ha, J. Orenstein, J. Brandenburger, M. Glotzer and H. Gui for bioinformatic assistance; S. Shigeno for help with tissue dissection; C. Huffard and R. Caldwell for providing the O. bimaculoides specimen used for genomic DNA isolation; and E. Begovic for genomic DNA preparation. This work was supported by the Molecular Genetics Unit of the Okinawa Institute of Science and Technology Graduate University (S.B. and D.S.R.) and by funding from the NSF (IOS-1354898) and NIH (R03 HD064887) to C.W.R. and from the NSF (DGE-0903637) to Z.Y.W. This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 instrumentation grants S10RR029668 and S10RR027303, and the University of Chicago Functional Genomics Facility, supported by NIH grant UL1 TR000430.

Author Contributions The Chicago and the OIST/Berkeley groups initiated their transcriptome and genome projects independently. In the subsequent collaboration, both groups worked closely on every aspect of the project. Chicago group: C.B.A., Z.Y.W., J.R.P. and C.W.R.; OIST/Berkeley group: O.S., T.M., E.E.-G., S.B. and D.S.R.

Author Information Genome and transcriptome sequence reads have been deposited in the SRA as BioProjects PRJNA270931 and PRJNA285380. A browser of this genome assembly is available at (http://octopus.metazome.net/). Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper. Correspondence and requests for materials should be addressed to C.W.R. (cragsdale@uchicago.edu) or D.S.R. (dsrokhsar@gmail.com).

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported licence. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons licence, users will need to obtain permission from the licence holder to reproduce the material. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-sa/3.0


METHODS

Data access. Genome and transcriptome sequence reads are deposited in the SRA as BioProjects PRJNA270931 and PRJNA285380. The genome assembly and annotation are linked to the same BioProject ID. A browser of this genome assembly is available at (http://octopus.metazome.net/).

Genome sequencing and assembly. Genomic DNA from a single male Octopus bimaculoides31 was isolated and sequenced using Illumina technology to 60-fold redundant coverage in libraries spanning a range of pairs from ~350 bp to 10 kilobases (kb). These data were assembled with meraculous32 achieving a contig N50-length of 5.4 kb and a scaffold N50-length of 470 kb. The longest scaffold contains 99 genes and half of all predicted genes are on scaffolds with 8 or more genes (Supplementary Note 1).

Genome size and heterozygosity. The O. bimaculoides haploid genome size was estimated to be ~2.7 gigabases (Gb) based on fluorescence (2.66–2.68 Gb) and k-mer (2.86 Gb) measurements (Supplementary Notes 1 and 2), making it several times larger than other sequenced molluscan and lophotrochozoan genomes17. We observed nucleotide-level heterozygosity within the sequenced genome to be 0.08%, which may reflect a small effective population size relative to broadcast-spawning marine invertebrates.

Transcriptome sequencing. Twelve transcriptomes were sequenced from RNA isolated from ova, testes, viscera, posterior salivary gland (PSG), suckers, skin, developmental stage 15 (St15)33, retina, optic lobe (OL), supraesophageal brain (Supra), subesophageal brain (Sub), and axial nerve cord (ANC) (Supplementary Note 2). RNA was isolated using TRIzol (Invitrogen) and 100-bp paired-end reads (insert size 300 bp) were generated on an Illumina HiSeq2000 sequencing machine.

De novo transcriptome assembly. Adapters and low-quality reads were removed before assembling transcriptomes using the Trinity de novo assembly package (version r2013-02-25 (refs 34, 35)). Assembly statistics are summarized in Supplementary Table 2.2. Following assembly, peptide-coding regions were translated using TransDecoder in the Trinity package. We compared the de novo assembled RNA-seq output to the genome to evaluate the completeness of the genome assembly. To minimize the number of spuriously assembled transcripts, only transcripts with ORFs predicted by TransDecoder were mapped onto the genome with BLASTN. Only 1,130 out of 48,259 transcripts with ORFs (2.34%) did not have a match in the genome with a minimum identity of 95%.

Annotation of transposable elements. Transposable elements were identified with RepeatScout and RepeatModeler36, and the masking was done with RepeatMasker37, as outlined in Supplementary Note 4.2. The most abundant transposable element is a previously identified octopus-specific SINE38 that accounts for 4% of the assembled genome.

Annotation of protein-coding genes. Protein-coding genes were annotated by combining transcriptome evidence with homology-based and de novo gene prediction methods (Supplementary Note 4). For homology prediction we used predicted peptide sets of three previously sequenced molluscs (L. gigantea, C. gigas, and A. californica) along with selected other metazoans. Alternative splice isoforms were identified with PASA39. Annotation statistics are provided in Supplementary Table 4.1.1. Genes known in vertebrates to have many isoforms, such as ankyrin, TRAK1 and LRCH1, also show alternative splicing in octopus but at a more limited level. Octopus genes with ten or more alternative splice forms are provided in Supplementary Table 4.1.2.

Calibration of sequence divergence with respect to time. The divergence between squid and octopus was estimated using r8s40 by fixing cephalopod divergence from bivalves and gastropods to 540 Mya8. Our estimate of 270 Mya for the squid–octopus divergence corresponds to mean neutral substitution rate of dS ~2 based on the protein-directed CDS alignments between the species (Supplementary Fig. 6.1.2) and a dS estimation using the yn00 program41. Throughout the manuscript we convert from sequence divergence to time by assuming that dS ~1 corresponds to 135 million years. For example, unlinked octopus protocadherins appear to have expanded ~135 Mya based on mean pairwise dS ~1, after octopuses diverged from squid. In contrast, clustered octopus protocadherins are much more similar in sequence (mean pairwise dS ~0.4, or ~55 Mya).

Quantifying gene expression. Transcriptome reads were mapped to the genome assembly with TopHat 2.0.11 (ref. 42). A range of 76–90% of reads from the different samples mapped to the genome. Mapped reads were sorted and indexed with SAMtools43. The read counts in each tissue were produced with BEDTools multicov program44 using the gene model coordinates. The counts were normalized by the total transcriptome size of each tissue and by the length of the gene. Heat maps showing expression patterns were generated in R using the heatmap.2 function.

Gene complement. Gene families of particular interest, including developmental regulatory genes, neural-related genes, and gene families that appear to be expanded in O. bimaculoides, were manually curated and analysed. We searched the octopus genome and transcriptome assemblies using BLASTP and TBLASTN with annotated sequences from human, mouse and D. melanogaster. Bulk analyses were also performed using Pfam45 and PANTHER46. We used BLASTP and TBLASTX to search for specific gene families in deposited genome and transcriptome databases for L. gigantea, A. californica, C. gigas, C. teleta, T. castaneum, D. melanogaster, C. elegans, N. vectensis, A. queenslandica, S. kowalevskii, B. floridae, C. intestinalis, D. rerio, M. musculus and H. sapiens. Candidate genes were verified with BLAST47 and Pfam45 analysis. Genes identified in the octopus genome were confirmed and extended using the transcriptomes. Multiple gene models that matched the same transcript were combined. The identified sequences from octopus and other bilaterians were aligned with either MUSCLE48, CLUSTALO49, MacVector 12.6 (MacVector, North Carolina), or Jalview50. Phylogenetic trees were constructed with FastTree51 using the Jones–Taylor–Thornton model of amino acid evolution, and visualized with FigTree v1.3.1.

Synteny. Microsynteny was computed based on metazoan node gene families (Supplementary Note 7). We used Nmax 10 (maximum of 10 intervening genes) and Nmin 3 (minimum of three genes in a syntenic block) according to the pipeline described in ref. 17 (Supplementary Note 6). To simplify gene family assignments we limited our analyses to 4,033 gene families shared among human, amphioxus, Capitella, Helobdella, Octopus, Lottia, Crassostrea, Drosophila and Nematostella. We required ancestral bilaterian syntenic blocks to have a minimum of one species present in both ingroups, or in one ingroup and one outgroup. To examine the effect of fragmented genome assemblies, we simulated shorter assemblies by artificially fragmenting genomes to contain on average 5 genes per scaffold (Supplementary Note 6).

In comparison with other bilaterian genomes, we find that the octopus genome is substantially rearranged. In looking at microsyntenic linkages of genes with a maximum of 10 intervening genes, we found that octopus conserves only 34 out of 198 ancestral bilaterian microsyntenic blocks; the limpet Lottia and amphioxus retain more than twice as many such linkages (96 and 140, respectively). This difference remains significant after accounting for genes missed through orthology assignment as well as simulations of shorter scaffold sizes (Supplementary Note 6; Extended Data Fig. 9b). Scans for intra-genomic synteny, and doubly conserved synteny with Lottia, were performed as described in Supplementary Note 6.

Transposable elements and synteny dynamics. The 5 kb upstream and downstream regions of genes were surveyed for transposable element (TE) content. For genes with non-zero TE load, their assignment to either conserved or lost bilaterian synteny in octopus was done using the microsynteny calculation described above. The number of genes for each category and TE class were as follows: 484 genes for retained synteny and 1,193 genes in lost synteny for all TE classes; 440 and 1,107, respectively, for SINEs; and 116 and 290, respectively, for Mariner. Wilcoxon U-tests for the difference of TE load in linked versus non-linked genes were conducted in R.

To assess transposon activity we assigned transcriptome reads aligned to 5,496,558 annotated transposon loci using BEDTools44. Of these, 2,685,265 loci showed expression in at least one of the tissues.

RNA editing. RNA-seq reads were mapped to the genome with TopHat52, and SAMtools43 was used to identify SNPs between the genomic and the RNA sequences. To identify polymorphic positions in the genome, SNPs and indels were predicted using GATK HaplotypeCaller version 3.1-1 in discovery mode with a minimum Phred scaled probability score of 30, based on an alignment of the 350 bp and 500 bp genomic fragment libraries using BWA-MEM version 0.7.6a. Using BEDTools44, we removed SNPs predicted in both the transcriptome and the genome and discarded SNPs that had a Phred score below 40 or were outside of predicted genes. SNPs were binned according to the type of nucleotide change and the direction of transcription. Candidate edited genes were taken as those having SNPs with A-to-G substitutions in the predicted mRNA transcripts.

Cephalopod-specific genes. Cephalopod novelties were obtained by BLASTP and TBLASTN searches against the whole NR database53 and a custom database of several mollusc transcriptomes (Supplementary Note 11.1). To ensure that we had as close to full-length sequence as possible, we extended proteins predicted from octopus genomic sequence with our de novo assembled transcriptomes, using the longest match to query NR, transcriptome and EST sequences from other animals. Gene sequences with transcriptome support but without a match to non-cephalopod animals at an e-value cutoff of 1 × 10⁻³ were considered for further analysis. Octopus sequences with a match of 1 × 10⁻⁵ or better to a sequence from another cephalopod were used to construct gene families, which were characterized by their BLAST alignments, HMM, PFAM-A/B, and UNIREF90 hits. The cephalopod-specific gene families are listed in the Source


Data file for Extended Data Fig. 10. Octopus-specific novelties were defined as sequences with transcriptome support but without any matches to sequences from any other animals (<1 × 10⁻³), including nautiloid and decapodiform cephalopods.

31. Pickford, G. E. & McConnaughey, B. H. The Octopus bimaculatus problem: a study in sibling species. Bulletin of the Bingham Oceanographic Collection 12, 1–66 (1949).
32. Chapman, J. A. et al. Meraculous: de novo genome assembly with short paired-end reads. PLoS ONE 6, e23501 (2011).
33. Naef, A., Boletzky, S. v. & Roper, C. F. E. Cephalopoda. Embryology (Smithsonian Institution Libraries, 2000).
34. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nature Biotechnol. 29, 644–652 (2011).
35. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols 8, 1494–1512 (2013).
36. Smit, A. & Hubley, R. RepeatModeler Open-1.0. (2008–2010).
37. Smit, A., Hubley, R. & Green, P. RepeatMasker Open-3.0. (1996–2010).
38. Ohshima, K. & Okada, N. Generality of the tRNA origin of short interspersed repetitive elements (SINEs). Characterization of three different tRNA-derived retroposons in the octopus. J. Mol. Biol. 243, 25–37 (1994).
39. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
40. Sanderson, M. J. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19, 301–302 (2003).
41. Yang, Z. & Nielsen, R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17, 32–43 (2000).
42. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
43. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
44. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
45. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
46. Mi, H., Muruganujan, A. & Thomas, P. D. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res. 41, D377–D386 (2013).
47. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
48. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).
49. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
50. Waterhouse, A. M., Procter, J. B., Martin, D. M., Clamp, M. & Barton, G. J. Jalview Version 2 – a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).
51. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
52. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7, 562–578 (2012).
53. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, D501–D504 (2005).
54. Palavicini, J. P., O'Connell, M. A. & Rosenthal, J. J. An extra double-stranded RNA binding domain confers high activity to a squid RNA editing enzyme. RNA 15, 1208–1218 (2009).
55. Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).
56. Starnes, T., Broxmeyer, H. E., Robertson, M. J. & Hromas, R. Cutting edge: IL-17D, a novel member of the IL-17 family, stimulates cytokine production and inhibits hemopoiesis. J. Immunol. 169, 642–646 (2002).
57. Cummins, S. F. et al. Candidate chemoreceptor subfamilies differentially expressed in the chemosensory organs of the mollusc Aplysia. BMC Biol. 7, 28 (2009).
58. van Nierop, P. et al. Identification of molluscan nicotinic acetylcholine receptor (nAChR) subunits involved in formation of cation- and anion-selective nAChRs. J. Neurosci. 25, 10617–10626 (2005).
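The conversion stated in Methods under "Calibration of sequence divergence with respect to time" amounts to a linear scaling of dS. The Python sketch below is a minimal illustration of that stated assumption (dS ~1 corresponds to 135 million years); it is not the r8s or yn00 analysis itself, and the names MYR_PER_DS and ds_to_mya are hypothetical.

```python
# Stated in Methods: dS of about 1 corresponds to 135 million years.
MYR_PER_DS = 135.0


def ds_to_mya(ds):
    """Convert a synonymous divergence (dS) to an approximate age in Mya."""
    if ds < 0:
        raise ValueError("dS must be non-negative")
    return ds * MYR_PER_DS


# dS values quoted in Methods; the labels are descriptive only.
examples = {
    "squid-octopus divergence": 2.0,
    "unlinked octopus protocadherins": 1.0,
    "clustered octopus protocadherins": 0.4,
}
for label, ds in examples.items():
    print(f"{label}: dS ~{ds} -> ~{ds_to_mya(ds):.0f} Mya")
```

The scaling gives 270 Mya for dS ~2 and 135 Mya for dS ~1. For dS ~0.4 it gives about 54 Mya, which Methods reports as ~55 Mya.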


[Figure: a, phylogenetic tree of ADAT, ADAR1, ADAR2 and ADAR-like/ADAD sequences; b, domain diagrams (Z-alpha, dsRBD, adenosine deaminase); c, expression heat map (row Z-score) of ADAR1, ADAR2 and ADAR-like/ADAD across Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra, Sub and ANC; d, histogram of the number of DNA–RNA differences per tissue, by change type.]

Extended Data Figure 1 | RNA editing in octopus. a, Approximate maximum likelihood tree of adenosine deaminases acting on RNA (ADARs) in bilaterians. ADAR1, ADAR2, ADAR-like/ADAD, and ADAT (tRNA-specific adenosine deaminase) were identified in Hsa, Mmu, Cin, Dme, Cte, Lgi, D. opalescens (Dop54), and Obi with Shimodaira–Hasegawa-like support indicated at the nodes. b, O. bimaculoides ADAR1, ADAR2 and ADAR-like proteins contain one or two double-stranded RNA binding domains (dsRBD) as well as an adenosine deaminase domain. ADAR1 also has a z-alpha domain. c, Expression profiles of the three ADAR genes found in 12 O. bimaculoides tissues by RNA-seq profiling. d, DNA–RNA differences in O. bimaculoides show prominent A-to-G changes. Histogram illustrates the number of DNA–RNA differences detected between coding sequences in the genome and 12 O. bimaculoides transcriptomes after filtering out polymorphisms identified in genomic sequencing. Differences were binned by the type of change (see key) in the direction of transcription. A-to-G changes are the most prevalent, particularly in neural tissues and during development, paralleling the expression of octopus ADARs in c. Other types of changes were also detected at lower levels, possibly resulting from uncharacterized polymorphisms.
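Binning differences "in the direction of transcription", as described for panel d, means that a mismatch in a gene on the minus strand must be complemented before it is classified. The Python sketch below illustrates that one step under stated assumptions: bases are reported on the reference plus strand, each difference carries the strand of its gene, and the input records are invented toy data. The function names are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def oriented_change(genome_base, rna_base, strand):
    """Express a DNA-RNA difference in the direction of transcription."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if strand == "-":
        # Gene is on the minus strand: complement both bases.
        genome_base = COMPLEMENT[genome_base]
        rna_base = COMPLEMENT[rna_base]
    return f"{genome_base}-{rna_base}"


def bin_changes(differences):
    """Count differences by oriented change type (for example 'A-G')."""
    return Counter(oriented_change(g, r, s) for g, r, s in differences)


# Toy records: (genomic base, RNA base, gene strand). Not real data.
toy = [("A", "G", "+"), ("T", "C", "-"), ("C", "T", "+"), ("T", "C", "+")]
counts = bin_changes(toy)
print(counts)
```

In the toy input, the plus-strand T-to-C mismatch in a minus-strand gene is counted as A-G, the signature expected of A-to-I editing, whereas the same mismatch in a plus-strand gene stays T-C.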


[Figure: Hox gene arrangement with cluster or scaffold sizes. H. sapiens: four clusters (107 kb, 199 kb, 117 kb, 98 kb). B. floridae: 448 kb; Hox1 to Hox15. D. melanogaster: 392 kb // 320 kb; Lab, Pb, Zen (overlapping boxes), Dfd, Scr, Ftz, Antp, Ubx, Abd-A, Abd-B. C. teleta: 243 kb // 22 kb; Lab, Pb, Hox3, Dfd, Scr, Lox5, Antp, Lox4 // Lox2, Post2, Post1. L. gigantea: 471 kb; Lab, Pb, Hox3, Dfd, Scr, Lox5, Antp, Lox4, Lox2, Post2, Post1. O. bimaculoides (scaffold size, gene): 421 kb, Lab; 474 kb, Scr; 751 kb, Lox5; 53 kb, Antp; 137 kb, Lox4; 437 kb, Lox2; 231 kb, Post2; 187 kb, Post1.]

Extended Data Figure 2 | Local arrangement of Hox gene complement in O. bimaculoides and selected bilaterians. At the top, the four compact Hox clusters of H. sapiens and the single B. floridae cluster are depicted. The D. melanogaster Hox complex is split into two clusters. We included genes in the D. melanogaster locus that are homologues of Hox genes but have lost their homeotic function, such as fushi tarazu (ftz), bicoid, zen and zen2 (the latter three are represented as overlapping boxes). Hox genes in C. teleta are found on three scaffolds17. L. gigantea has a single cluster with the full known lophotrochozoan gene complement. In O. bimaculoides many of the scaffolds are several hundred kb long, and no two Hox genes are on the same scaffold. The positions of O. bimaculoides genes approximate their locations on scaffolds. Dashed lines indicate that the scaffold continues beyond what is shown. Scaffold length is depicted to scale with size noted on the left. Genes are positioned to illustrate orthology, which is also highlighted by colour.


Extended Data Figure 3 | Gene complement and gene architecture evolution in metazoans. a, Principal component analysis of gene family counts. O. bimaculoides highlighted in green. Deuterostomes are indicated in blue, ecdysozoans in red, lophotrochozoans in green, and sponges and cnidarians in orange. Xtr, Xenopus tropicalis; Gga, Gallus gallus; Tca, Tribolium castaneum; Dpu, Daphnia pulex; Isc, Ixodes scapularis; Ava, Adineta vaga; Spu, S. purpuratus; Hma, Hydra magnipapillata; Adi, Acropora digitifera. For methods, see Supplementary Note 7.4. b–d, MrBayes55 tree (constrained topology) on binary characters of presence or absence of Pfam domain architectures (b), introns (c), or indels (d); scale bar represents estimated changes per site. For methods, see Supplementary Note 7.3.


[Figure: panels a–d. Panels a and c, expression heat maps (row Z-score) across Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra, Sub and ANC for gene models on Scaffold 30672 and Scaffold 9600. Panels b and d, trees of protocadherin (PCDH) sequences, with labels for H. sapiens (Hsa_PCDH alpha, beta and gamma), amphioxus (AmphiPCDH), C. teleta (Cte) and octopus sequences.]
e_ e8 23 56 oB 840_ C4S_ _ZV
18 36 oB 318_ _C X_ Ct 17 60 2 oR 317_ C4X_ ZV
_2 9 oB 316_ C6 6_IZ IZUW e_ 17 0
oB R075 C6DS IZ
C te C te 3328 96 7 oB 310_ C6 D2_Z UOW JQ Ct C te 72 80 3 oB 318_ _C
18 36 X_
_2 19 63 4 oB 309_ C4_SD_I UW J V _2 9 oB 316_ C6 6_IZ IZUW
C te _2 56 03 5 oB 31 C Z ZUO JQV C te C te 3328 96 7 oB 310_ C6 D2_Z UOW JQ
C te _1 92 66 3 oL 31 1_C 4_Z JV _2 19 63 4 oB 309_ C4_SD_I UW J V
C te C te 95 13 5 oR 48 2_C 3_Z C te _2 56 03 5 oB 311_ C Z ZUO JQV
_1 32 6873 1 oB R882_C 4_Z C te _1 92 66 3 oL 312_ C 4_Z JV
C te C te 62 86 0 oK 33 3_ 4_ C te C te 95 13 5 oR 48 3_
7 oB 74 2_C C3_ Z C
oB R882_C 4_Z
Z
C tete 96 54 8 oK 32 9_C 6T Z _1 32 6873 1
C 44 7 0 3 oK 75 9_ 6M DS C te C te 62 86 0 oK 33 3_ 4_
_1 i7 1 7 0 a oK 74 3_ C6S S_Z F_ZU 7 oB 74 2_C C3_ Z
C te L g i7 1 d_ a oB 74 7_ C6T N_I W O
C tete 96 54 8 oK 32 9_C 6T Z
44 7 0 3 oK 75 9_ 6M DS
L g inke d_ 2 p oR 32 6_ C6S S ZU JV WJQ C
_1 i7 1 7 0 a
ke oF R 8_ C6T F_ NF_ O ooK 74 3_ C6S S_Z F_ZU
X-l in 0 9 6 5 oR 98 82 C6D P ZU ZU WJQ V C te L g i7 1 d_ a oB 74 7_ C6T N_I W O
1_ Y-l 1 2 3 1 9 2 7 1 oR R 5_ 6_C T SFD OW OW V L g inke d_ 2 p oR 32 6_ C6S S ZU JV WJQ
_ 0
H-1 -11 g i_ L g i5 i8 2 4 5 R12 712_ C6T 6S SF _I JQ Q ke
X-l in 0 9 6 5 o R 8_ C6T F NF O V
CD H L
R45908_ 3 9 4 2 4

9 P _U Z 1_ Y-l 1 2 3 1 9 2 7 1 o F9 82 C6D P _Z _Z WJQ


oF 67 2_ C6S S _IZ WJQZUO JQV

Lg 20
oR R i_ 1 i6 2 9 0 3

V
6_ C5_ MF _Z O WJQ V
0 7

_P CD oR RR 85_ 6_C T SF UOW UOW V


oT 98 5_ C6T S UO
97 1_ C6T B E_Z UW

_ 0
H-1 -11 g i_ L g i5 i8 2 4 5
L 0_ C7_ 1 5

i_ C6S U W
oR L g L g 1 3

P U _IZ O R12 712 C6 6S SF D_I JQ Q


oF 97 9_ C6D IZ SNF_ O

7_ C6M S NX U JQ

Hsasa_ Lg JQ V
L g g i5 C7X U

CD H L
R45 8_ 3 9 4 2 4

F_I OW UO WJV V 9
i_

oF 81 6_ C6_ 6M F_ZU

C6M

TM P_Z _U ZW V V

oF 67 2_ C6S S _IZ WJQZUO JQV


Lg 20
oR R90 i_ 1 i6 2 9 0 3

_
L g L g i7 1 0 6 2

0 7
Lg

_P CD 6_ C5 F
oC 81 5_ C S

oT 98 5_ C6T S UO F_ W
O

97 1_ C6T B E_Z UW
Lg i_ 1 i5 1 6 9 2

ZW V W C6S _ _ UO W JQV
L 0_ C7_ 1 5

i_
oE 67 3_ 6M W

oR L g L g 1 3
SF FX_Z UO UO JQV

oF 97 9_ C6D IZ SN

7_ C6M S NX U JQ
Q Hsasa_ Lg
Lg i_ 10 0 4 6 8 3 2

U IZ JQ
L g g i5 C7X U
oT R95 7_C 2_IZ _IZ

F_I OW UO WJV V
S _Z Z W V

JQ V
i_
X2_ U W WJQ

oF 81 6_ C6_ 6M F_ZU

C6M
2

L g L g i7 1 0 6 2
oR 81 C 2X 2X Z

Lg

oC 81 5_ C S
48 3 2

H
X _I O
oC 316_ C 6_C _I

Lg i_ 1 i5 1 6 9 2
UO OW V V

ZW V W
R74 Lg i_ i5 51 1090

oE 67 3_ C6M IZW

SF X_Z UO O
6

oI 315_ 88 C3X Z

Lg i_ 10 0 4 6 8 3 2

Q
W JQ

oT R95 7_ 2_ _IZ

SF _Z ZU WJQV
JQ V

X2_ U W WJQ V
oI RR 4_

2
R 6_C 51 26

JQ V

oR 81 C 2X 2X Z
op R31 C5D4DTS

V
2
oR oR 242_ 7_ 12

X _I O
oC 316_ C 6_C _I

UO OW V V
V
oR 318_ C IZ UO UOW JQ V

63
10 15

JQ

90
Lg 10

48 827_ 3_ 7_ZV V

oI 315_ 88 C3X Z
oI 657_ C3_ SF_Z IZ UOW JQ
R85 C ZU

oR 746_ i_ 10 15 10

W JQ
48
i_

oN 321_ C6 TSF_ _Z

oI RR 4_ 5D_I F_
oA 0_ _C6T C6_ 6_ZU6_Z

26

JQ V
oI 818_ C6 6DFX FX_Z WJQ WJQ

i5 51

op R31 C 4DTS
4_ TP X_IZ IZU W

oR 242_ C7_ 51 12
oC 854_ _C

V
oR 318_ C IZ UO UOW JQ V
V

JQ
10

R8 oL48R82 853_ C7_ V


7_ C C

oH R763 _C6H TS FX_IZ Q

oI 657_ C3_ SF_Z IZ UOW JQ


C6 BF UO W
2_ C6TS oA UO WJQQ

ZU
7_ 7_C C ZV
oR R276 _C6H TS UW W

i_

oN 321_ C6 TSF_ _Z
_Z O W

oR R277 _C6H B_IZ UO

oA 0_ _C6T C6_ 6_ZU6_Z


Lg
oA TSGFFX_Z 1_ QV

oI 818_ C6 6DFX FX_Z WJQ WJQ


W
oR R760 C6HT TG_Z WV

4_ TP X_IZ IZU W
_I F_IZ
C4 ZU WQVO

oC 854_ _C
oR 849_ _C6K IZUO Q

V
JQ
X_ UO ZU
85 W

oH R763 _C6H TS FX_IZ Q


oH R682 C6X_ G_ZO WJV V
oR
oR

C6 BF UO W
2_ C6TS oA UO WJQQ
R Lg
oA oA84 F_UO QV

oR 847_ _C6S

oR R276 _C6H TS IZUW W


TF X_

V
C5 C5 WV

O
_Z O W
oH R571 SGFX _IUOW

oR R277 _C6H B_
oA X_ZU _ZU

oA TSGFFX_Z 1_ QV
oR 2_C6 6TSX
ST OW
R

oR R760 C6HT TG_Z WV

W
oA 848_ OW

C4 ZU WQVO
oH
oA83 411_ C5_Z _Z

oR 849_ _C6K IZUO Q


R
R
oR 4_C6 6DTS X2_IZ

JQ
UO
9_C5 C5_Z U

X_ UO ZU
85 W

oH R682 C6X_ G_ZO


oR
oL48 83_C 6TSF OWJQ

oA oA84 F_UO QV
opA8 _Z UO

85
oL

oR 847_ _C6S
oD 846_ C5

oRR6 31_C 6TSF_ZUWJQV

TF X_

V
40_CUOW

C5 C5 WV
R4

TS E_ZO UO
oRR7 21_C TS_Z UOWJ
85 C4

WJV QV V

oH R571 SGFX _IUOW


W
_C6TS5TFX _ZUO V

oA X_ZU _ZU
83 55

oRR5 75_C6 TSF_Z OWJQV


4_

oR 852_C6 6TSX
OW
DFX__ZUOWVW

oRR1 69_C6 TS_ZU JQV

oA 848_ OW
4__

JQ
47_C _ZU FX4_ OWJQ

IZU
oH
oRR5 19_C6 ZUOW QV

oR

oA83 411_ C5_Z _Z


R8

KTSF 438_C WQV


C4

oR 4_C6 6DTS X2_IZ


oRR5 C6T_I UOWJ
oD439 UOWQ U

9_ C5_Z U
oP0 218_C6

oL48 83_C 6TSF OWJQ


_C4T_ V

UO
oRR

OW
ZV
oR

5_Z

oD 846_ C5

oRR6 31_C 6TSF_ZUWJQV


oRR

W
oD440 855_C 2_C4 0_

R4

TS E_ZO UO
oD437_ oRR837 _C3_Z

oRR7 21_C TS_Z UOWJ


oRR

85 C4

WJV QV V
OW V V
oA

oRR027C6TSF_ _C4_Z

_C6TS5TFX _ZUO V
843_
841_

oRR

83 55

oRR5 75_C6 TSF_Z OWJQV


SF_ZUO QV

4_
oRR172 C5TS_IZ

DFX__ZUOWVW
oA827_C6AS_ZUOW WV

93_

oA84 A840_ZUO
ZO

oRR1 69_C6 TS_ZU JQV


ST
oRR549_ C6TS_Z WJQV

oD440 855_C 2_C4 _C4_


84

JQ
oA 443_

oA826_C66S_ZUOW JQV

47_C _ZU FX4_ OWJQ


C6

oRR169_ 6T_IZUO UOWJQV

oRR5 19_C6 ZUOW QV


oA833_C6 T_ZUOWJJQV

oM055_C TKDF_IZ JQV

KTSF 438_C WQV


QV

oM061_C6 HTS_IZUW

727_C6 BX_IZU WJQV

oRR5 C6T_I UOWJ


WJQV

oN658_C6 _ZU

oD439 UOWQ U
ZOW

oN659_C6 T_IZUOWJQV

799_C6 TF_ZUO JQV

oP0 218_C6
oT078_C5 5_IZUOWJQ

V
_ZUO JQ
oA84 pA

oRR _C4T_ZVV
opRR161_C

117_C6 T_IZOW JQV


oA810_C6TF _C6_ZUOW

oRR
opT761_C3_OJQV

oR

5_Z
WJ

oRR
OWJQV
T_IZUOWJQ

WJQV
550

oD437_ oRR837 _C3_Z


JQ
oD

X_Z

oRR
oRR730_C6TX_ZUOWJQV

OW
oM065_C6TFX_IZUOWJQ

oA
oR

SF_ZOW _Z
o
85

843_
841_

oRR
op C5
_UOWJQ

QV

oRR172 C5TS_IZ
oA811_C6TSF_ZUWJQ

oA827_C6AS_ZUOW WV

93_
ZO

oRR549_ C6TS_Z WJQV


oA 443_

oA826_C66S_ZUOW JQV
C6

oRR169_ 6T_IZUO UOWJQV


_C6
oA819_C6SX_IZUOW

oA833_C6 T_ZUOWJJQV
oD

_C4
_ZUOW

oM055_C TKDF_IZ
oRR

QV
TX_IZUO

oM061_C6 HTS_IZUW
T_Z

727_C6 BX_IZU WJQV

UO
WJQV

oN658_C6 _ZU
oN659_C6 T_IZUOWJQV

799_C6 TF_ZUO JQV


T_IUW OWJQV

oT078_C5 5_IZUOWJQ

_ZUO JQ
V

opRR161_C _ZUJQV

117_C6 T_IZOW JQV


oA810_C6TF _C6_ZUOW
ZUO

oRR783_C6T ZOWJQV
opT761_C3_OJQV

oA836_C6TX_ ZOWV
WJ

opB320
oA820_C6TS_ZUO

OWJQV
T_IZUOWJQ

oA838_C
oA838

V
WJQV
550
IZW V

oA837_C6TSX_ZUO
Q
oD

opT761_C2DTS_OWJQ
X_Z

oA811_C6TSF_ZUWJQV

opB320_C6TS_ZOWV OWV
oRR730_C6TX_ZUOWJQV
_C6

JQ

oM065_C6TFX_IZUOWJQ
oA816_C6HTBFDX_ZUOWJQV

oRR261_C6TX_IZUOWJQ
opT235-12_C5_U0JQ

oA835_C6_ZUO
oA821_C6TD_ZUOWJQV
oA822_C6T_ZUOWJQV oRR804_C6_IZUOWV
85
opA824_C5TF_U

oA819_C6SX_IZUOWJ
_
53_C6

_C6
oD

oA828_C _C6SF_
oA

_ZUOW

opT235-14_C5_U0

WJV V
QV

TX_IZUO

T_Z
oA832

V
JQV

T_IUW OWJQV
V
oP088_C6
oA828_C

UOW
UOW

oA820_C6TS_ZUO
QV

IZW
C6T

_C6TS_
C6TS_IZ
opA824_C5TF_U
oA8

6TS_IZO
53_C6
oA

QV
oA832
oRR027

JQV
JQV

V
oP088_C6

UOW
UOW

QV

V
oA8

JQV
JQV
WV
V

50.0

Extended Data Figure 4 | Protocadherin genes within a genomic cluster are similar in sequence and sites of expression. a, Expression profile of the 31 protocadherin genes located on Scaffold 30672 in 12 octopus transcriptomes. Over three-quarters of the protocadherins are highly expressed throughout central brain, OL and ANC, while the others show more mixed distributions. b, Phylogenetic tree highlighting Scaffold 30672 protocadherins in grey bars. c, Expression profile of the 17 protocadherin genes located on Scaffold 9600. Almost all of these protocadherins are most highly expressed in nervous tissues, with the exception of Ocbimv220039316m, which is most highly expressed in the St15 sample. d, Phylogenetic tree highlighting Scaffold 9600 protocadherins in grey bars. As seen in b, protocadherins of the same scaffold tend to cluster together on the tree. Order of the genes in the heat maps (a, c) follows the ordering on the corresponding scaffold.

© 2015 Macmillan Publishers Limited. All rights reserved



[Extended Data Figure 5: a, phylogenetic tree with clades labelled mammalian IL1 and IL7, mammalian IL17, annelid and non-cephalopod mollusc IL17-like, and octopus IL17-like (oIL17L_A902 to oIL17L_E183); b, expression heat map of the octopus IL17-like genes (row Z-score key, -3 to +3) across PSG, Retina, Supra, St15, Sub, ANC, Testes, Skin, OL, Ova, Suckers and Viscera; c, cysteine-motif alignment for octopus scaffolds A to F rows, human IL17, and Lottia, Capitella and Crassostrea IL17-like sequences. Image not reproduced.]

Extended Data Figure 5 | Expansion of interleukin 17 (IL17)-like genes. a, Phylogenetic tree of interleukin genes in Obi, Cte, Cgi, and Lgi. Mammalian IL1A, IL1B, and IL7 were used as outgroups. Human and mouse IL17s branch from other members of the IL family. Octopus ILs (as well as all identified invertebrate ILs) group with the mammalian IL17 branch and are named IL17-like. The 31 octopus genes are distributed across 5 scaffolds: scaffold A (Obi_A), 23 members; scaffold B (Obi_B), 4 members; scaffold C (Obi_C), 2 members; scaffolds D (Obi_D) and E (Obi_E), 1 member each. b, Expression profile of 31 octopus IL17-like genes. Heat map rows are arranged by order on each scaffold. Blank rows indicate genes not expressed in our transcriptomes. The 27 genes found in our transcriptomes have strong expression in the suckers and skin. The scaffold C genes are enriched in the PSG and the scaffold D gene is enriched in the viscera. c, Conserved cysteine residues in human IL17 and invertebrate IL17-like proteins. The human IL17 proteins share a conserved cysteine motif comprising 4 cysteine residues, which may form interchain disulfide bonds and facilitate dimerization (ref. 56). Octopus IL17-like proteins also contain this four-cysteine motif, highlighted in yellow. One octopus sequence encodes only 3 of these highly conserved cysteine residues. These four cysteines are also present to varying degrees in Lottia, Capitella and Crassostrea sequences. Two additional conserved cysteine residues were found in the octopus sequences and are highlighted in red. The first cysteine residue is found in all invertebrate sequences examined, and in none of the mammalian IL17 sequences.
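The heat map in b is drawn against a "Row Z-score" colour key. A minimal sketch of that kind of scaling is given below, assuming a genes-by-tissues matrix of expression values. The function name, the example values and the choice of the population standard deviation are illustrative assumptions, not the authors' pipeline.

```python
from statistics import mean, pstdev


def row_zscores(matrix):
    """Scale each row (gene) to mean 0 and unit standard deviation across columns (tissues)."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)  # population SD; an assumption made for this sketch
        if sigma == 0:
            # A gene with identical values in every tissue has no defined Z-score;
            # report zeros instead of dividing by zero.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(x - mu) / sigma for x in row])
    return scaled


# Hypothetical expression values for two genes in three tissues.
example = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
for gene_row in row_zscores(example):
    print(gene_row)
```

Because each row is scaled independently, the colours compare tissues within a gene, not expression levels between genes.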


[Extended Data Figure 6: expression heat maps (row Z-score key, -3 to +3) for panels a to e, titled Aplysia-like chemosensory, vertebrate-like chemosensory, opsin, Frizzled and adhesion, across Sub, Supra, ANC, PSG, Testes, Ova, Viscera, Suckers, Skin, St15, OL and Retina. Image not reproduced.]

Extended Data Figure 6 | G-protein-coupled receptors. GPCRs, also known as 7-transmembrane (7TM) or serpentine receptors, form a large superfamily that activates intracellular second messenger systems upon ligand binding. This figure considers a subset of the 329 GPCRs we identified in O. bimaculoides. The full complement of GPCRs is presented in Supplementary Note 8.5. a, b, As reported for other lophotrochozoan genomes, the octopus genome contains chemosensory-like GPCRs; 74 GPCRs are similar to the Aplysia chemosensory GPCRs (ref. 57) and 11 GPCRs are similar to vertebrate olfactory receptors. c, We identified 4 opsins in the octopus genome (from top to bottom): rhodopsin, rhabdomeric opsin, peropsin, and retinochrome. d, The octopus class F GPCRs comprise 6 genes: 5 Frizzled genes and 1 Smoothened gene (*). e, Thirty octopus genes show similarity to vertebrate adhesion GPCRs.


[Extended Data Figure 7, panels a and b: a, phylogenetic tree of AChR subunits with tip labels for Hsa, Mmu, Dme, Cte, Lgi and Obi sequences and clade labels including Alpha 1-4, Alpha 7, Alpha 9/10, Beta 1, Beta 3, mammalian Alpha 1-6, Beta 1-4, Delta, Gamma, Epsilon, Non-Alpha and putative non-binding; b, expression heat map (row Z-score key, -3 to +3) across Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra, Sub and ANC. Image not reproduced. The panel c alignment follows.]

Lst_AchBP YN-AISKPEVLTPQLARVVSDGEVLYMPSIRQRFSCDVSGVDTESG-ATCRIKIGSWTHHSREISVDPTT--NSDDSEYFSQYSRFEILDVTQKKNSVTYSCCP-EAY
Hsa_AchR7 YNSADERFDATFHTNVLVNSSGHCQYLPPGIFKSSCYIDVRWFPFDVQHCKLKFGSWSYGGWSLDLQMQ---EADISGYIPNGEWDLVGIPG-KRSERFYECCK-EPY
Obi_10697+ YNSANEVFDATYPTNVLVSYNGFCHWVPPGMFKSTCQIDIAWFPFDDQKCTLKFGSWTHDGRYLDLQLDGDGNGDTSSFIRNGEWKLIAVPG-SRNVVKYDCCPQIYL
Obi_12266 CNSVSGDFSFDVDKEVTVKYDGFVHLHIDKIFKTYCRINVENYPFDQHECDITVCLEHQMYMEETIEDF---VIDVKLKTKSNQWNFSFEET-EMEKDD-------VI
Obi_12265 CNSVTGKFSFDDNKEVTVNRNGDVNLYIDKIFETYCRINVEKYPFDEHECDISVCFEHQMYVEETVGEF---DYEVKLQSASNQWDFNFEKS-DVENDN-------IV
Obi_12263+ CNGVMDRFKLDEDTEIFLTNEGTVFLYIDNVFQTYCRINVNKYPFDEHECDLLVCLNHQMRERKRKPSK---------------------------------------
Obi_29097 CNSASGKFTFDEDTGVTLTSNGNTSLYIDRIVNTYCKVNINKYPFDEHECDISVCFRHQINTEETLNNF---VYNVTYNPTYNQWEYTFKEK-DILKEG-------II
Obi_29099 CNSVTGKFTFDGNWGVTIKSDGSVHLHIDQIFHTYCKVNVNKYPFDEHECDISVCFEHQMNLEVMLHDF---MYRVTYKPISNQWDYRYEYR-EVEKEE-------II
Obi_12259 CNDMSGNFAVHK-GGATIEYDGTVTFHMDGIFQTYCTIDMHKYPFDEHECYIKSCLRHQKYKEQTIQNF---SFYNMYNSSSDTWDYKFVVG-DVMENG-------II
Obi_04961 CNDMSGKFSQHEGEGATVKYDGSVSLHMDGIFQTHCTINMLKYPLDEHECNITVCLGHQENIEKTMQSF---SFNNLHNAEADKWEYKFAVG-NVTEKE-------II
Obi_18127 CNSVEGKFKFDEDKQVSVRHDGIVNLNTEGIFNTYCEINMENYPFDEHIC----------------------------------------------------------
Obi_18129+ RNSAEDKFIFQKNKQVFIKYDGTINLHIDGTYRIYCRIDIDKYPFDEHICYLSICLGTEMENQETIQFQ--------------EWEFRLEKT-SEE----------NF
Obi_12260 SNAEKVTKIPSLSEYITVSYDGRTSYFIRSIYRTYCSIDFYKYPFSLNVCKIYFWLSNEMVSYLKLQNV---DLANTSNIWTTIWNIQLDGH-KYDDDNS------TD
Obi_12262 CNAETIYNVSTPIPEAVMLSNGTIKTSTTLVYTIHCKIDNTKYPFDKQACEVHICLPLSKLNNVRIKTI----TTFKKQVTLRNWNVEIDKV-LQNHNER------IF
Obi_12261 CNAIKLIGNHRYERQVTVWQNGVVEEESFYTYQLFCGVDNSRYPFDVQNCPTYICLPYQMNNLTLIKSL---RTDPVKNMEG--WHIHTSTE-PPITYND------QE
Obi_30094 FVNGLSAVESAAEPAIRLEYSGNLNKYQKLSLKTFCPTEKDQYSFS---CPFMLKTYPLPSTQERLRVT---DFEVNEKFQSHQWNAEVNTN-ETRIYNED-------
Obi_30842 CNSVNEQDDSNINREVYVHYNGTVELWSLKYIETYCQVNAYTYPFDDQKCKIQMCVGLHSPDETRLKTI---CYWNMKFTESYKWDIHFSGK-ANGINSQ------SS
Obi_30840 CNSVNEQGDSNINREVYVDYEGTVYLWSLKYIETYCQVNAYTYPFDVHYCGIEMCVPLHSPNETRIQTI---YYRNMNFTENYKWDIHFSGE-ANGKVEE------FS
Obi_30843 CNSMTQSEEKDSLDDVLIYYDGFVRMLSFTLLQTYCQVNAYSYPFDEHKCEIRMCSATYHTDEANVTSF---LLNVYSEEENYKWYMSISDQ-ETY----------SS
Obi_06517 CNSMENSEDKDDFPELWIFSNGHVVMYSFRLLNTYCEVNAYTYPFDKHMCEIYMCVALHSVQHTRIKTL---DYHELNFIQNYKWDITLEGT-VNATNDK------FN
Obi_06518 CNSMDKSEENDGVGELMLTYTGWINMWSFRLLHTYCQINAYTYPFDEHTCEIYLCVALHTINHTRIKEL---IYEDSKFTQNYKWDINVSGK-VNGTDEL------FS
Obi_15560 CNAMKESEEKGSFLEVKVFNNGRVQMRSLKLLKSYCTFDAYAYPFDQHDCEIYICVALHDPVHTRIRTL---TYDNLNYSPNYNWDIDYNGI-KNASDQR------FS
Obi_06519 CNSMKESDDEDNFPEVRIFNNGLVERWSLPLLQSYCEVNAYAYPFDEHICKIYMCIALHTPQHTQINTL---IYYDADHTQNYKWNVNISGE-MKGIKFS------FS
Obi_06522 CNTMKQSEDKDNPSEVSVYFNGSIEISLIKLLHTYCEINAYTFPFDEHTCNVSMCVSLQELHHAKRTKL---TYKS-RQAKHSKWDIKFSGG-TNGTNYYH-----YS
Obi_06521 FNSRTESKYKYSYQDVTVYSKGSVEMVSIRFLHTDCQIEAYIYPFDLQTCYIFLGIPTYKPQDTKIKEI---LCGKENDTTNYQWDITLYCN-VDSANKH------YN
Obi_06520 LNTLMETQSKNNFLEMTVDFNGSVTMVEIKLLQTFCEIYVYNYPFDAQTCVISMGIPSHKFQDTKIKEL---SCYRKSDISDSEWGISFSCN-VHGTNNS------FS

Extended Data Figure 7 | O. bimaculoides acetylcholine receptor (AChR) subunits. a, Phylogenetic tree of AChR subunit genes identified in Hsa, Mmu, Dme, Cte, Lgi and Obi. Black asterisk indicates a Dme sequence that groups with alpha 1-4-like subunits despite lacking two defining cysteine residues. b, Expression profiles of octopus AChR subunits. Genes are ordered as in the tree (a), starting from the grey arrow and continuing anticlockwise. Putative non-ACh-binding subunits are highly expressed in the suckers. One sequence was not detected in our transcriptome data sets. In a and b, red asterisks indicate subunits with the substitution known to confer anionic permissivity (ref. 58). c, Divergent octopus subunits lack nearly all residues necessary for ACh binding. Alignment of sequence flanking the cysteine loop (yellow) of the L. stagnalis ACh-binding protein (Lst_AchBP), the human and octopus alpha-7 receptor subunits (Hsa_AchR7, Obi_10697+), and the 23 divergent AChR subunits. Essential ACh-binding residues on the primary (pink) and complementary (blue) side of the ligand-binding domain are indicated (ref. 26), with conservative substitutions in a lighter shade. Outside of the binding residues, residues shared between the alpha-7 subunits are shaded in light grey, with bold letters for conservative substitutions.
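The cysteine loop referred to in c can be located in an aligned row with a short pattern search. The sketch below assumes the loop is bounded by two cysteines with 12 or 13 residues between them (13 in the Hsa_AchR7 row of the alignment, 12 in the Lst_AchBP row once its gap character is removed). The function name and spacing parameters are hypothetical, and this is not the procedure used to build the figure.

```python
import re


def find_cys_loop(aligned_seq, min_gap=12, max_gap=13):
    """Return the first C...C stretch with min_gap to max_gap residues between the cysteines."""
    # Alignment gaps ('-') and spaces are not residues, so drop them before measuring spacing.
    seq = aligned_seq.replace("-", "").replace(" ", "").upper()
    pattern = re.compile(r"C[A-Z]{%d,%d}C" % (min_gap, max_gap))
    match = pattern.search(seq)
    return match.group(0) if match else None


# Fragments copied from the Hsa_AchR7 and Lst_AchBP rows of the alignment.
hsa_fragment = "GIFKSSCYIDVRWFPFDVQHCKLKFGSW"
lst_fragment = "RQRFSCDVSGVDTESG-ATCRIKIG"
print(find_cys_loop(hsa_fragment))
print(find_cys_loop(lst_fragment))
```

The search returns only the first stretch that fits the spacing, so it is a way to locate the loop in a row, not a test of whether the ACh-binding residues are conserved.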


Extended Data Figure 8 | Active transposable elements and gene expression specificity. a, Transposable element expression across 12 tissues. b, Correlation between the total transposable element (TE) load (in bp) in the 5 kb regions flanking the gene and the fraction of genes with tissue-specific expression (defined as having at least 75% of expression in a single tissue; see Source Data file for this figure). P value indicates the F-statistic for the significance of linear regression (H0: r² = 0), with tissues with a P value ≤ 0.05 indicated in pink.
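The tissue-specificity criterion in b (at least 75% of a gene's expression in a single tissue) can be written as a short check. The sketch below assumes per-gene expression values keyed by tissue name. The function name and the example values are hypothetical; the authors' own calculation is in the Source Data file.

```python
def specific_tissue(expression, threshold=0.75):
    """Return the tissue holding at least `threshold` of a gene's total expression, else None."""
    total = sum(expression.values())
    if total <= 0:
        # A gene with no detected expression cannot be assigned to a tissue.
        return None
    tissue, top_value = max(expression.items(), key=lambda item: item[1])
    return tissue if top_value / total >= threshold else None


# Hypothetical expression values for two genes.
print(specific_tissue({"OL": 80.0, "Skin": 10.0, "ANC": 10.0}))
print(specific_tissue({"OL": 50.0, "Skin": 50.0}))
```

The fraction of tissue-specific genes for a given tissue is then the number of genes assigned to it divided by the number of genes considered.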


Extended Data Figure 9 | Synteny dynamics in octopus and the effect of transposable element (TE) expansions. a, Circos plot showing shared synteny across 6 genomes. Individual scaffolds are plotted according to bp length; scaffolds with no synteny are merged together (lighter arcs). Despite the large size of the octopus genome, only a small proportion of the scaffolds show synteny. b, Synteny reduction in octopus quantified based on synteny inference using gene families with at least one representative in human, amphioxus, Capitella, Helobdella, Octopus, Lottia, Crassostrea, Drosophila, and Nematostella. Drosophila, Helobdella and Octopus show the highest synteny loss rates. Branch lengths, estimated with MrBayes (ref. 55), reflect extent of local genome rearrangement (Supplementary Note 6). c, Enrichment of overall and specific TE classes (base pairs masked) around genes from ancient bilaterian synteny blocks, including those absent in octopus (see key). Asterisks indicate Mann-Whitney U-test with P value < 0.02. d, Transposable element insertion history (Jukes-Cantor distance adjusted, see text) into the vicinity of genes from lost synteny blocks. Note that only one SINE peak is present; a more recent peak (visible in 'All genomic SINEs') cannot be recovered from those insertions.


Extended Data Figure 10 | Cephalopod phylogeny and novelties. a, Whole-genome-derived phylogeny of molluscs and select other phyla showing the relative position of octopus at the base of the coleoid cephalopods. For methods see Supplementary Note 7.1. Members of the cephalopod class are indicated in blue; scale indicates number of substitutions per site. b, Phylogenetic tree of reflectin genes. Reflectins are cephalopod-specific genes that allow for rapid and reversible changes in iridescence. Six reflectin genes were identified in the octopus genome. c, d, Novel gene expression across multiple tissues. Bars depict all cephalopod novelties; dark grey indicates sequences with no similarity to non-cephalopod genes using HMM searches (see Source Data for this figure). c, Counts of tissue-specific novelties in a given tissue. d, Proportion of expression of novel genes versus total expression in individual tissues. CNS (central nervous system) combines Supra, Sub, OL and ANC expression data.


[Figure: phylogenetic tree of cadherin-superfamily genes (tip labels include cadherins, protocadherins, dachsous, FAT, flamingo/CELSR, calsyntenin, desmoglein, desmocollin and Ret) from human, Drosophila, amphioxus, Nematostella, Capitella, Lottia and octopus, with Roman-numeral clade labels. Image not reproduced.]
oD43 _ZUOW

oRR5 69_C6TS_UOWJQV
oA82 C6S_ZU WJQV

G
2_ _ C 6

oRR1 5_C6T_IZ _IZUOWJQ

V
oR

oA833_ 6_C6T_ZU OWJQV

oM05 1_C6TKDF
8

oP088_ TX_IZUOW JQV

oM06 C6HTS_IZU
oN659_ C6T_IZUOW JQV

oN658_ C5_ZU
JQV

_
oT078_61_C5_IZUOWJ
oA832_C OWJQV

opRR1 C6T_ZUJQV
6_ZUOW

oRR783_ X_ZOWJQV
C

QV

oA836_C6T TS_ZOWV
opA824_C5761_C3_OWJ
pA
84

opB320_C6 _IZOWV
oA820_C6TS UOWJQV

oA838_C6TS ZUOWV
X_

oA819_C6SX_I OWJQV

oA837_C6TSX_ OWJQV
oA811_C6TSF_ZUWWJQ

opT761_C2DTS_
JQV

oRR730_C6TX_ZUO WJQ
Q
oA 443

oM065_C6TFX_IZUO
JQV
C

oRR261_C6TX_IZUOWJQ
opT235-12_C5_U0JQ

oA835_C6_ZUO
oA821_C6TD_ZUOWJQV
oA822_C6T_ZUOWJQV oRR804_C6_IZUOWV
F_ZU

OW

opT235-14_C5_U0J
o

F_ZUOWJ
oA827_C6AS_ZUO

V
oD

SFX
85

oA816_C6HTBFDX_ZUOW
ZUO
C

C6T_IZU

4_IZ WJQ
o

_ZU
TF_

V
40_

WQ V
3_C

oA810_C6T
o

opT
C6
oD43

8_

V
oA85

WJQV

JQV

V
WJQV

Extended Data Figure 11 | Phylogenetic tree of cadherin genes. This is a larger image of Fig. 2a. Phylogenetic tree of cadherin genes in Hsa (red), Dme (orange), Nematostella vectensis (mustard yellow), Amphimedon queenslandica (yellow), Cte (green), Lgi (teal), Obi (blue), and Saccoglossus kowalevskii (purple). I, Type I classical cadherins; II, calsyntenins; III, octopus protocadherin expansion (168 genes); IV, human protocadherin expansion (58 genes); V, dachsous; VI, fat-like; VII, fat; VIII, CELSR; IX, Type II classical cadherins. Asterisk denotes a novel cadherin with over 80 extracellular cadherin domains found in Obi and Cte.

© 2015 Macmillan Publishers Limited. All rights reserved
