doi:10.1038/nature14668
Coleoid cephalopods (octopus, squid and cuttlefish) are active, resourceful predators with a rich behavioural repertoire1. They have the largest nervous systems among the invertebrates2 and present other striking morphological innovations including camera-like eyes, prehensile arms, a highly derived early embryogenesis and a remarkably sophisticated adaptive colouration system1,3. To investigate the molecular bases of cephalopod brain and body innovations, we sequenced the genome and multiple transcriptomes of the California two-spot octopus, Octopus bimaculoides. We found no evidence for hypothesized whole-genome duplications in the octopus lineage4-6. The core developmental and neuronal gene repertoire of the octopus is broadly similar to that found across invertebrate bilaterians, except for massive expansions in two gene families previously thought to be uniquely enlarged in vertebrates: the protocadherins, which regulate neuronal development, and the C2H2 superfamily of zinc-finger transcription factors. Extensive messenger RNA editing generates transcript and protein diversity in genes involved in neural excitability, as previously described7, as well as in genes participating in a broad range of other cellular functions. We identified hundreds of cephalopod-specific genes, many of which showed elevated expression levels in such specialized structures as the skin, the suckers and the nervous system. Finally, we found evidence for large-scale genomic rearrangements that are closely associated with transposable element expansions. Our analysis suggests that substantial expansion of a handful of gene families, along with extensive remodelling of genome linkage and repetitive content, played a critical role in the evolution of cephalopod morphological innovations, including their large and complex nervous systems.

Soft-bodied cephalopods such as the octopus (Fig. 1a) show remarkable morphological departures from the basic molluscan body plan, including dexterous arms lined with hundreds of suckers that function as specialized tactile and chemosensory organs, and an elaborate chromatophore system under direct neural control that enables rapid changes in appearance1,8. The octopus nervous system is vastly modified in size and organization relative to other molluscs, comprising a circumesophageal brain, paired optic lobes and axial nerve cords in each arm2,3. Together these structures contain nearly half a billion neurons, more than six times the number in a mouse brain2,9. Extant coleoid cephalopods show extraordinarily sophisticated behaviours including complex problem solving, task-dependent conditional discrimination, observational learning and spectacular displays of camouflage1,10 (Supplementary Videos 1 and 2).

To explore the genetic features of these highly specialized animals, we sequenced the Octopus bimaculoides genome by a whole-genome shotgun approach (Supplementary Note 1) and annotated it using extensive transcriptome sequence from 12 tissues (Methods and Supplementary Note 2). The genome assembly captures more than 97% of expressed protein-coding genes and 83% of the estimated 2.7 gigabase (Gb) genome size (Methods and Supplementary Notes 1-3). The unassembled fraction is dominated by high-copy repetitive sequences (Supplementary Note 1). Nearly 45% of the assembled genome is composed of repetitive elements, with two bursts of transposon activity occurring ~25-million and ~56-million years ago (Mya) (Supplementary Note 4).

We predicted 33,638 protein-coding genes (Methods and Supplementary Note 4) and found alternate splicing at 2,819 loci, but no locus showed an unusually high number of splice variants (Supplementary Note 4). A-to-G discrepancies between the assembled genome and transcriptome sequences provided evidence for extensive mRNA editing by adenosine deaminases acting on RNA (ADARs). Many candidate edits are enriched in neural tissues7 and are found in a range of gene families, including housekeeping genes such as the tubulins, which suggests that RNA edits are more widespread than previously appreciated (Extended Data Fig. 1 and Supplementary Note 5).

Based primarily on chromosome number, several researchers proposed that whole-genome duplications were important in the evolution of the cephalopod body plan4-6, paralleling the role ascribed to the independent whole-genome duplication events that occurred early in vertebrate evolution11. Although this is an attractive framework for both gene family expansion and increased regulatory complexity across multiple genes, we found no evidence for it. The gene family expansions present in octopus are predominantly organized in clusters along the genome, rather than distributed in doubly conserved synteny as expected for a paleopolyploid12,13 (Supplementary Note 6.2). Although genes that regulate development are often retained in multiple copies after paleopolyploidy in other lineages, they are not generally expanded in octopus relative to limpet, oyster and other invertebrate bilaterians11,14 (Table 1 and Supplementary Notes 7.4 and 8).

Hox genes are commonly retained in multiple copies following whole-genome duplication15. In O. bimaculoides, however, we found only a single Hox complement, consistent with the single set of Hox transcripts identified in the bobtail squid Euprymna scolopes with PCR16. Remarkably, octopus Hox genes are not organized into clusters as in most other bilaterian genomes15, but are completely atomized (Extended Data Fig. 2 and Supplementary Note 9). Although we cannot rule out whole-genome duplication followed by considerable gene loss, the extent of loss needed to support this claim would far exceed that which has been observed in other paleopolyploid lineages, and it is more plausible that chromosome number in coleoids increased by chromosome fragmentation.

Mechanisms other than whole-genome duplications can drive genomic novelty, including expansion of existing gene families, evolution of novel genes, modification of gene regulatory networks, and reorganization of the genome through transposon activity. Within the O. bimaculoides genome, we found evidence for all of these
1Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois 60637, USA. 2Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904-0495, Japan. 3Centre for Organismal Studies, University of Heidelberg, 69117 Heidelberg, Germany. 4Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA. 5Department of Neurobiology, University of Chicago, Chicago, Illinois 60637, USA. 6Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA.
*These authors contributed equally to this work.
220 | NATURE | VOL 524 | 13 AUGUST 2015
©2015 Macmillan Publishers Limited. All rights reserved
LETTER RESEARCH
[Figure 1 graphic not reproduced. Panel a: anatomical schematic. Panel b: heat map of Pfam domain representation (colour scale: row Z-score). Rows: PF08266/00028 Cadherin; PF05375 Pacifastin_I; PF02868/01447 Peptidase_M4; PF02037 SAP; PF06083 IL17; PF00002 7tm_2; PF07690 MFS_1; PF14830 Haemocyan_bet_s; PF13465/00096 zf-C2H2; PF05970 PIF1; PF00264 Tyrosinase; PF00582 Usp; PF00147 Fibrinogen_C; PF00024 PAN_1; PF01582 TIR; PF00092 VWA; PF01531 Glyco_transf_11; PF02931 Neur_chan_LBD; PF01607 CBM_14; PF05485 THAP. Columns: Mmu, Lch, Bfl, Obi, Cgi, Hro, Lgi, Pfu, Dme, Cte, Gga, Hsa, Cel, Xtr, Dre.]

Figure 1 | Octopus anatomy and gene family representation analysis. a, Schematic of Octopus bimaculoides anatomy, highlighting the tissues sampled for transcriptome analysis: viscera (heart, kidney and hepatopancreas), yellow; gonads (ova or testes), peach; retina, orange; optic lobe (OL), maroon; supraesophageal brain (Supra), bright pink; subesophageal brain (Sub), light pink; posterior salivary gland (PSG), purple; axial nerve cord (ANC), red; suckers, grey; skin, mottled brown; stage 15 (St15) embryo, aquamarine. Skin sampled for transcriptome analysis included the eyespot, shown in light blue. b, C2H2 and protocadherin domain-containing gene families are expanded in octopus. Enriched Pfam domains were identified in lophotrochozoans (green) and molluscs (yellow), including O. bimaculoides (light blue). For a domain to be labelled as expanded in a group, at least 50% of its associated gene families need a corrected P value of 0.01 against the outgroup average. Some Pfams (for example, Cadherin and Cadherin_2) may occur in the same gene, however multiple domains in a given gene were counted only once. Bfl, Branchiostoma floridae; Cel, Caenorhabditis elegans; Cgi, Crassostrea gigas; Cte, Capitella teleta; Dme, Drosophila melanogaster; Dre, Danio rerio; Gga, Gallus gallus; Hsa, Homo sapiens; Hro, Helobdella robusta; Lch, Latimeria chalumnae; Lgi, Lottia gigantea; Mmu, Mus musculus; Obi, O. bimaculoides; Pfu, Pinctada fucata; Xtr, Xenopus tropicalis.
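The labelling rule stated in the caption of Fig. 1b can be illustrated with a minimal Python sketch. It assumes that each Pfam domain maps to a list of corrected P values, one per associated gene family, and it reads "a corrected P value of 0.01" as a threshold (P <= 0.01). The function name `is_expanded`, the threshold comparison and the toy values are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, List


def is_expanded(corrected_p_values: List[float],
                alpha: float = 0.01,
                min_fraction: float = 0.5) -> bool:
    """Label a domain as expanded if at least `min_fraction` of its
    associated gene families reach the corrected P-value threshold."""
    if not corrected_p_values:
        return False  # no gene families, nothing to label
    n_significant = sum(p <= alpha for p in corrected_p_values)
    return n_significant / len(corrected_p_values) >= min_fraction


# Made-up P values for two hypothetical domains (illustration only).
toy: Dict[str, List[float]] = {
    "domain_A": [0.001, 0.004, 0.2, 0.009],  # 3 of 4 families pass
    "domain_B": [0.3, 0.008, 0.6],           # 1 of 3 families pass
}
labels = {name: is_expanded(p_values) for name, p_values in toy.items()}
```

Under these assumptions a domain with exactly half of its families at or below the threshold is still labelled as expanded, because the caption says "at least 50%".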
mechanisms, including expansions in several gene families, a suite of octopus- and cephalopod-specific genes, and extensive genome shuffling.

In gene family content, domain architecture and exon-intron structure, the octopus genome broadly resembles that of the limpet Lottia gigantea17, the polychaete annelid Capitella teleta17 and the cephalochordate Branchiostoma floridae14 (Supplementary Note 7 and Extended Data Fig. 3). Relative to these invertebrate bilaterians, we found a fairly standard set of developmentally important transcription factors and signalling pathway genes, suggesting that the evolution of the cephalopod body plan did not require extreme expansions of these toolkit genes (Table 1 and Supplementary Note 8.2). However, statistical analysis of protein domain distributions across animal genomes did identify several notable gene family expansions in octopus, including protocadherins, C2H2 zinc-finger proteins (C2H2 ZNFs), interleukin-17-like genes (IL17-like), G-protein-coupled receptors (GPCRs), chitinases and sialins (Figs 1b, 2 and 3; Extended Data Figs 4-6 and Supplementary Notes 8 and 10).

Table 1 | Metazoan developmental control genes

                              Obi    Lgi    Cte    Dme    Cel    Bfl    Hsa
Ligands
Fibroblast growth factor        3      2      1      3      3      8     22
Wnt                            12     10     12      7      5     17     19
TGFβ/BMP                       12      9     14      6      5     22     33
Delta/Jagged                    4      1      1      2      4      2      7
Hedgehog                        1      1      1      1      0      1      3
Axon guidance                  10      9      9      6      8     23     33
Transcription factors
C2H2 zinc-finger            1,790    413    222    326    211  1,338    764
Homeodomain                   114    121    111    104     99    133    333
High mobility group            23     15     14     13     16     51    125
Helix loop helix               50     63     64     59     42     78    118
Nuclear hormone receptor       40     44     45     16    274     33     48
Fox                            16     28     26     17     18     42     43
Tbox                            9      9      7      8     21      9     18

Number of members of developmental ligand and transcription factor families from O. bimaculoides and selected other taxa. Dendrogram above species names reflects their evolutionary relationships. Bfl, Branchiostoma floridae; Cel, Caenorhabditis elegans; Cte, Capitella teleta; Dme, Drosophila melanogaster; Hsa, Homo sapiens; Lgi, Lottia gigantea; Obi, O. bimaculoides.

The octopus genome encodes 168 multi-exonic protocadherin genes, nearly three-quarters of which are found in tandem clusters on the genome (Fig. 2b), a striking expansion relative to the 17-25 genes found in Lottia, Crassostrea gigas (oyster) and Capitella genomes. Protocadherins are homophilic cell adhesion molecules whose function has been primarily studied in mammals, where they are required for neuronal development and survival, as well as synaptic specificity18. Single protocadherin genes are found in the invertebrate deuterostomes Saccoglossus kowalevskii (acorn worm) and Strongylocentrotus purpuratus (sea urchin), indicating that their absence in Drosophila melanogaster and Caenorhabditis elegans is due to gene loss. Vertebrates also show a remarkable expansion of the protocadherin repertoire, which is generated by complex splicing from a clustered locus rather than tandem gene duplication (reviewed in ref. 19). Thus both octopuses and vertebrates have independently evolved a diverse array of protocadherin genes.

A search of available transcriptome data from the longfin inshore squid Doryteuthis (formerly, Loligo) pealeii20 also demonstrated an expanded number of protocadherin genes (Supplementary Note 8.3). Surprisingly, our phylogenetic analyses suggest that the squid and octopus protocadherin arrays arose independently. Unlinked octopus protocadherins appear to have expanded ~135 Mya, after octopuses diverged from squid. In contrast, clustered octopus protocadherins are much more similar in sequence, either due to more recent duplications or gene conversion as found in clustered protocadherins in zebrafish and mammals21.

The expression of protocadherins in octopus neural tissues (Fig. 2) is consistent with a central role for these genes in establishing and maintaining cephalopod nervous system organization as they do in vertebrates. Protocadherin diversity provides a mechanism for regulating the short-range interactions needed for the assembly of local neural circuits18, which is where the greatest complexity in the cephalopod nervous system appears2. The importance of local neuropil interactions, rather than long-range connections, is probably due to the limits placed on axon density and connectivity by the absence of myelin, as thick axons are then required for rapid high-fidelity signal conduction over long distances. The sequence divergence between octopus and
[Figure 2 graphic not reproduced. The extraction retained only scattered tree-tip labels from panel a, the group label "Protocadherins", the scale bars (100 kb and 20 kb) and the Scaffold 9600 label from panel b, and the tissue columns of the panel c heat map (Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra, Sub, ANC; colour scale: row Z-score).]
Figure 2 | Protocadherin expansion in octopus. a, For a larger version of panel a, see Extended Data Fig. 11. Phylogenetic tree of cadherin genes in Hsa (red), Dme (orange), Nematostella vectensis (mustard yellow), Amphimedon queenslandica (yellow), Cte (green), Lgi (teal), Obi (blue), and Saccoglossus kowalevskii (purple). I, Type I classical cadherins; II, calsyntenins; III, octopus protocadherin expansion (168 genes); IV, human protocadherin expansion (58 genes); V, dachsous; VI, fat-like; VII, fat; VIII, CELSR; IX, Type II classical cadherins. Asterisk denotes a novel cadherin with over 80 extracellular cadherin domains found in Obi and Cte. b, Scaffold 30672 and Scaffold 9600 contain the two largest clusters of protocadherins, with 31 and 17, respectively. Clustered protocadherins vary greatly in genomic span and are oriented in a head-to-tail manner along each scaffold. c, Expression profiles of 161 protocadherins and 19 cadherins in 12 octopus tissues; 7 protocadherins were not detected in the tissues sampled. Cells are coloured according to number of standard deviations from the mean expression level. Protocadherins have high expression in neural tissues. Cadherins generally show a similar expression pattern, with the exception of a group of sucker-specific cadherins.
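The colouring described for panel c of Fig. 2 (number of standard deviations from the mean expression level, shown on the scale bar as a row Z-score) can be sketched in Python as follows. This is a minimal illustration that assumes one list of expression values per gene, ordered by tissue, and uses the population standard deviation; the source does not state which estimator or expression units were used, and the gene names and values below are invented.

```python
from statistics import mean, pstdev
from typing import Dict, List


def row_z_scores(row: List[float]) -> List[float]:
    """Standardize one gene's expression values across tissues."""
    mu = mean(row)
    sigma = pstdev(row)  # population SD; an assumption of this sketch
    if sigma == 0:
        # A gene with identical expression in every tissue has no spread.
        return [0.0 for _ in row]
    return [(x - mu) / sigma for x in row]


# Invented expression values for two hypothetical genes in three tissues.
expression: Dict[str, List[float]] = {
    "gene_1": [1.0, 2.0, 3.0],
    "gene_2": [5.0, 5.0, 5.0],
}
z = {gene: row_z_scores(values) for gene, values in expression.items()}
```

Because each row is standardized on its own, the heat map compares tissues within a gene rather than absolute expression between genes.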
[Figure 3 graphic not reproduced. Only part of the caption survives: c, Distribution of fourfold synonymous site transversion distances (4DTv) between C2H2-domain-containing genes. Remaining residue: histogram axes labelled "Fraction" and "frequency", and an expression heat map (colour scale: row Z-score) over the tissues ANC, Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra and Sub.]
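For readers unfamiliar with the 4DTv measure named in the Fig. 3c caption, the following Python sketch computes an uncorrected 4DTv for one pair of aligned, in-frame coding sequences. It assumes the standard genetic code, counts only codons in which both sequences share the same first two bases from a fourfold-degenerate family, and skips gaps or ambiguous bases at the third position. The function names and example sequences are illustrative; the source does not describe the authors' exact procedure.

```python
from typing import Optional

# First two codon bases for which any third base gives the same amino acid
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(a: str, b: str) -> bool:
    """True for a purine/pyrimidine mismatch."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def four_dtv(seq1: str, seq2: str) -> Optional[float]:
    """Fraction of shared fourfold-degenerate third positions that differ by
    a transversion; None if the pair has no comparable sites."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = 0
    transversions = 0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1[:2] != c2[:2] or c1[:2] not in FOURFOLD_PREFIXES:
            continue
        if c1[2] not in "ACGT" or c2[2] not in "ACGT":
            continue  # gap or ambiguous base
        sites += 1
        if is_transversion(c1[2], c2[2]):
            transversions += 1
    return transversions / sites if sites else None


# Codons GCT/GCA (T vs A, transversion), CCA/CCG (transition), GGA/GGC (transversion).
example = four_dtv("GCTCCAGGA", "GCACCGGGC")
```

Some analyses additionally apply a correction for multiple substitutions at the same site; that step is omitted here, so the value is a raw proportion.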
squid protocadherin expansions may reflect the notable differences between octopuses and decapodiforms in brain organization, which have been most clearly demonstrated for the vertical lobe, a key structure in cephalopod learning and memory circuits2,22. Finally, the independent expansions and nervous system enrichment of protocadherins in coleoid cephalopods and vertebrates offers a striking example of convergent evolution between these clades at the molecular level.

As with the protocadherins, we found multiple clusters of C2H2 ZNF transcription factor genes (Fig. 3a and Supplementary Note 8.4). The octopus genome contains nearly 1,800 multi-exonic C2H2-containing genes (Table 1), more than the 200-400 C2H2 ZNFs found in other lophotrochozoans and the 500-700 found in eutherian mammals, in which they form the second-largest gene family23. C2H2 ZNF transcription factors contain multiple C2H2 domains that, in combination, result in highly specific nucleic acid binding. The octopus C2H2 ZNFs typically contain 10-20 C2H2 domains but some have as many as 60 (Supplementary Note 8.4). The majority of the transcripts are expressed in embryonic and nervous tissues (Fig. 3b). This pattern of expression is consistent with roles for C2H2 ZNFs in cell fate determination, early development and transposon silencing, as demonstrated in genetic model systems23.

The expansion of the O. bimaculoides C2H2 ZNFs coincides with a burst of transposable element activity at ~25 Mya (Fig. 3c). The flanking regions of these genes show a significant enrichment in a 70-90 base pair (bp) tandem repeat (31% for C2H2 genes versus 4% for all genes; Fisher's exact test P value < 1 × 10^-16), which parallels the linkage of C2H2 gene expansions to β-satellite repeats in humans24. We also found an expanded C2H2 ZNF repertoire in amphioxus (Table 1), showing a similar enrichment in satellite-like repeats. These parallels suggest a common mode of expansion of a highly dynamic transcription factor family implicated in lineage-specific innovations.

To investigate further the evolution of gene families implicated in nervous system development and function, we surveyed genes associated with axon guidance (Table 1) and neurotransmission (Table 2), identifying their homologues in octopus and comparing numbers across a diverse set of animal genomes (Supplementary Notes 8-10). Several patterns emerged from this survey. The gene complements present in the model organisms D. melanogaster and C. elegans often showed striking departures from those seen in lophotrochozoans and vertebrates (Table 2 and Supplementary Note 10). For example, D. melanogaster encodes one member of the discs large (DLG) family, a key component of the postsynaptic scaffold. In contrast, mammals have four DLGs, which (along with other observations) led to suggestions that vertebrates possess uniquely complex synaptic machinery25. However, we found three DLGs in both octopus and limpet, suggesting that vertebrate and fly gene number differences are not necessarily diagnostic of exceptional vertebrate synaptic complexity (Supplementary Note 10.6).

Overall, neurotransmission gene family sizes in the octopus were very similar to those seen in other lophotrochozoans (Table 2 and Supplementary Note 10), except for a few strikingly expanded gene families such as the sialic acid vesicular transporters (sialins) (Supplementary Note 10.2). We did find variations in the sizes of neurotransmission gene families between human and lophotrochozoans (Table 2 and Supplementary Note 10), but no evidence for systematic expansion of these gene families in vertebrates relative to octopus or other lophotrochozoans. Although some gene families were larger in mammals or absent in lophotrochozoans (for example, ligand-gated 5-HT receptors), others were absent in mammals and present in invertebrates (for example, anionic glutamate and acetylcholine receptors). The complement of neurotransmission genes in octopus may be broadly typical for a lophotrochozoan, but our findings suggest it is also not obviously smaller than is found in mammals.

Table 2 | Ion channel subunits

                                              Obi  Aca  Lgi  Cte  Dme  Cel  Hsa
Voltage-gated calcium channels                  8    8    6   10    9   10   10
Voltage-gated sodium channels                   3    2    3    2    4    0   13
Transient receptor potential channels          36   45   40   43   13   23   29
K+ channels
Voltage-gated                                  30   23   29   20   10   51   40
Calcium-activated, small/large conductance     12    8    9    6    3    6    8
Inward rectifying                               3    4    5    6    4    3   16
Two pore                                       12    9   12   14   11   47   15
Non-voltage-gated                              27   21   26   26   18   72   39
Cys-loop receptors
Glutamate                                      21   15   47   36   30   15   18
Nicotinic acetylcholine                        53   16   52   77   10   88   16
Inhibitory acetylcholine                        3    2    5    2    0    4    0
5-HT3                                           0    0    0    0    0    1    5
GABA                                            6    5    4    9    3    7   19
Glutamate-gated chloride channels               7    5    8    5    1    6    0

Number of subunits of representative ion channel families in O. bimaculoides and across examined taxa. Dendrogram above species names shows their evolutionary relationships. Aca, Aplysia californica.

Among the octopus complement of ligand-gated ion channels, we identified a set of atypical nicotinic acetylcholine receptor-like genes, most of which are tandemly arrayed in clusters (Extended Data Fig. 7). These subunits lack several residues identified as necessary for the binding of acetylcholine26, so it is unlikely that they function as acetylcholine receptors. The high level of expression of these divergent subunits within the suckers raises the interesting possibility that they act as sensory receptors, as do some divergent glutamate receptors in other protostomes27. In addition, we identified 74 Aplysia-like and 11 vertebrate-like candidate chemoreceptors among the octopus GPCR superfamily of ~330 genes (Extended Data Fig. 6).

We found, amid extensive transcription of octopus transposons, that a class of octopus-specific short interspersed nuclear element sequences (SINEs) is highly expressed in neural tissues (Supplementary Note 4 and Extended Data Fig. 8). Although the role of active transposons is unclear, elevated transposon expression in neural tissues has been suggested to serve an important function in learning and memory in mammals and flies28.

Transposable element insertions are often associated with genomic rearrangements29 and we found that the transposon-rich octopus genome displays substantial loss of ancestral bilaterian linkages that are conserved in other species (Supplementary Note 6 and Extended Data Fig. 9). Interestingly, genes that are linked in other bilaterians but not in octopus are enriched in neighbouring SINE content. SINE insertions around these genes date to the time of tandem C2H2 expansion (Extended Data Fig. 9d), pointing to a crucial period of genome evolution in octopus. Other transposons such as Mariner show no such enrichment, suggesting distinct roles for different classes of transposons in shaping genome structure (Extended Data Fig. 9c).

Transposable element activity has been implicated in the modification of gene regulation across several eukaryotic lineages29. We found that in the nervous system, the degree to which a gene's expression is tissue-specific is positively correlated with the transposon load around that gene (r² values ranging from 0.49 in the optic lobe to 0.81 in the subesophageal brain; Extended Data Fig. 8 and Supplementary Note 4). This correlation may reflect modulation of gene expression by transposon-derived enhancers or a greater tolerance for transposon insertion near genes with less complex patterns of tissue-specific gene regulation.

Using a relaxed molecular clock, we estimate that the octopus and squid lineages diverged ~270 Mya, emphasizing the deep evolutionary history of coleoid cephalopods8,30 (Supplementary Note 7.1 and Extended Data Fig. 10a). Our analyses found hundreds of coleoid- and octopus-specific genes, many of which were expressed in tissues containing novel structures, including the chromatophore-laden skin, the suckers and the nervous system (Extended Data Fig. 10 and Supplementary Note 11). Taken together, these novel genes, the
expansion of C2H2 ZNFs, genome rearrangements, and extensive transposable element activity yield a new landscape for both trans- and cis-regulatory elements in the octopus genome, resulting in changes in an otherwise typical lophotrochozoan gene complement that contributed to the evolution of cephalopod neural complexity and morphological innovations.

Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

Received 26 December 2014; accepted 16 June 2015.

1. Hanlon, R. T. & Messenger, J. B. Cephalopod Behaviour (Cambridge Univ. Press, 1996).
2. Young, J. Z. The Anatomy of the Nervous System of Octopus vulgaris (Clarendon Press, 1971).
3. Wells, M. J. Octopus: Physiology and Behaviour of an Advanced Invertebrate (Chapman and Hall, 1978).
4. Bonnaud, L., Ozouf-Costaz, C. & Boucher-Rodoni, R. A molecular and karyological approach to the taxonomy of Nautilus. C. R. Biol. 327, 133–138 (2004).
5. Hallinan, N. M. & Lindberg, D. R. Comparative analysis of chromosome counts infers three paleopolyploidies in the mollusca. Genome Biol. Evol. 3, 1150–1163 (2011).
6. Yoshida, M. A. et al. Genome structure analysis of molluscs revealed whole genome duplication and lineage specific repeat variation. Gene 483, 63–71 (2011).
7. Rosenthal, J. J. & Seeburg, P. H. A-to-I RNA editing: effects on proteins key to neural excitability. Neuron 74, 432–439 (2012).
8. Kroger, B., Vinther, J. & Fuchs, D. Cephalopod origin and evolution. Bioessays 33, 602–613 (2011).
9. Herculano-Houzel, S., Mota, B. & Lent, R. Cellular scaling rules for rodent brains. Proc. Natl Acad. Sci. USA 103, 12138–12143 (2006).
10. Grasso, F. W. & Basil, J. A. The evolution of flexible behavioral repertoires in cephalopod molluscs. Brain Behav. Evol. 74, 231–245 (2009).
11. Holland, P. W., Garcia-Fernandez, J., Williams, N. A. & Sidow, A. Gene duplications and the origins of vertebrate development. Development (Suppl.), 125–133 (1994).
12. Dietrich, F. S. et al. The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science 304, 304–307 (2004).
13. Kellis, M., Birren, B. W. & Lander, E. S. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617–624 (2004).
14. Putnam, N. H. et al. The amphioxus genome and the evolution of the chordate karyotype. Nature 453, 1064–1071 (2008).
15. Duboule, D. The rise and fall of Hox gene clusters. Development 134, 2549–2560 (2007).
16. Callaerts, P. et al. HOX genes in the sepiolid squid Euprymna scolopes: implications for the evolution of complex body plans. Proc. Natl Acad. Sci. USA 99, 2088–2093 (2002).
17. Simakov, O. et al. Insights into bilaterian evolution from three spiralian genomes. Nature 493, 526–531 (2013).
18. Zipursky, S. L. & Sanes, J. R. Chemoaffinity revisited: Dscams, protocadherins, and neural circuit assembly. Cell 143, 343–353 (2010).
19. Chen, W. V. & Maniatis, T. Clustered protocadherins. Development 140, 3297–3302 (2013).
20. Brown, C. T., Graveley, B. & Rosenthal, J. J. Loligo pealeii (Squid) Data Dump (http://ivory.idyll.org/blog/2014-loligo-transcriptome-data.html) (2014).
21. Noonan, J. P., Grimwood, J., Schmutz, J., Dickson, M. & Myers, R. M. Gene conversion and the evolution of protocadherin gene cluster diversity. Genome Res. 14, 354–366 (2004).
22. Shomrat, T. et al. Alternative sites of synaptic plasticity in two homologous fan-out fan-in learning and memory networks. Curr. Biol. 21, 1773–1782 (2011).
23. Liu, H., Chang, L. H., Sun, Y., Lu, X. & Stubbs, L. Deep vertebrate roots for mammalian zinc finger transcription factor subfamilies. Genome Biol. Evol. 6, 510–525 (2014).
24. Eichler, E. E. et al. Complex β-satellite repeat structures and the expansion of the zinc finger gene cluster in 19p12. Genome Res. 8, 791–808 (1998).
25. Nithianantharajah, J. et al. Synaptic scaffold evolution generated components of vertebrate cognitive complexity. Nature Neurosci. 16, 16–24 (2013).
26. Brejc, K. et al. Crystal structure of an ACh-binding protein reveals the ligand-binding domain of nicotinic receptors. Nature 411, 269–276 (2001).
27. Croset, V. et al. Ancient protostome origin of chemosensory ionotropic glutamate receptors and the evolution of insect taste and olfaction. PLoS Genet. 6, e1001064 (2010).
28. Erwin, J. A., Marchetto, M. C. & Gage, F. H. Mobile DNA elements in the generation of diversity and complexity in the brain. Nature Rev. Neurosci. 15, 497–506 (2014).
29. Chenais, B., Caruso, A., Hiard, S. & Casse, N. The impact of transposable elements on eukaryotic genomes: from genome size increase to genetic adaptation to stressful environments. Gene 509, 7–15 (2012).
30. Strugnell, J., Norman, M., Jackson, J., Drummond, A. J. & Cooper, A. Molecular phylogeny of coleoid cephalopods (Mollusca: Cephalopoda) using a multigene approach; the effect of data partitioning on resolving phylogenies in a Bayesian framework. Mol. Phylogenet. Evol. 37, 426–441 (2005).

Supplementary Information is available in the online version of the paper.

Acknowledgements We thank C. T. Brown and J. Rosenthal for making Doryteuthis RNA-seq data available before publication; C. Ha, J. Orenstein, J. Brandenburger, M. Glotzer and H. Gui for bioinformatic assistance; S. Shigeno for help with tissue dissection; C. Huffard and R. Caldwell for providing the O. bimaculoides specimen used for genomic DNA isolation; and E. Begovic for genomic DNA preparation. This work was supported by the Molecular Genetics Unit of the Okinawa Institute of Science and Technology Graduate University (S.B. and D.S.R.) and by funding from the NSF (IOS-1354898) and NIH (R03 HD064887) to C.W.R. and from the NSF (DGE-0903637) to Z.Y.W. This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 instrumentation grants S10RR029668 and S10RR027303, and the University of Chicago Functional Genomics Facility, supported by NIH grant UL1 TR000430.

Author Contributions The Chicago and the OIST/Berkeley groups initiated their transcriptome and genome projects independently. In the subsequent collaboration, both groups worked closely on every aspect of the project. Chicago group: C.B.A., Z.Y.W., J.R.P. and C.W.R.; OIST/Berkeley group: O.S., T.M., E.E.-G., S.B. and D.S.R.

Author Information Genome and transcriptome sequence reads have been deposited in the SRA as BioProjects PRJNA270931 and PRJNA285380. A browser of this genome assembly is available at (http://octopus.metazome.net/). Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper. Correspondence and requests for materials should be addressed to C.W.R. (cragsdale@uchicago.edu) or D.S.R. (dsrokhsar@gmail.com).

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported licence. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons licence, users will need to obtain permission from the licence holder to reproduce the material. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-sa/3.0
Data file for Extended Data Fig. 10. Octopus-specific novelties were defined as sequences with transcriptome support but without any matches to sequences from any other animals (<1 × 10⁻³), including nautiloid and decapodiform cephalopods.

31. Pickford, G. E. & McConnaughey, B. H. The Octopus bimaculatus problem: a study in sibling species. Bulletin of the Bingham Oceanographic Collection 12, 1–66 (1949).
32. Chapman, J. A. et al. Meraculous: de novo genome assembly with short paired-end reads. PLoS ONE 6, e23501 (2011).
33. Naef, A., Boletzky, S. v. & Roper, C. F. E. Cephalopoda. Embryology (Smithsonian Institution Libraries, 2000).
34. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nature Biotechnol. 29, 644–652 (2011).
35. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols 8, 1494–1512 (2013).
36. Smit, A. & Hubley, R. RepeatModeler Open-1.0. (2008–2010).
37. Smit, A., Hubley, R. & Green, P. RepeatMasker Open-3.0. (1996–2010).
38. Ohshima, K. & Okada, N. Generality of the tRNA origin of short interspersed repetitive elements (SINEs). Characterization of three different tRNA-derived retroposons in the octopus. J. Mol. Biol. 243, 25–37 (1994).
39. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
40. Sanderson, M. J. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19, 301–302 (2003).
41. Yang, Z. & Nielsen, R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17, 32–43 (2000).
42. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
43. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
44. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
45. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
46. Mi, H., Muruganujan, A. & Thomas, P. D. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res. 41, D377–D386 (2013).
47. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
48. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).
49. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
50. Waterhouse, A. M., Procter, J. B., Martin, D. M., Clamp, M. & Barton, G. J. Jalview Version 2 - a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).
51. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 - approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
52. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7, 562–578 (2012).
53. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, D501–D504 (2005).
54. Palavicini, J. P., O'Connell, M. A. & Rosenthal, J. J. An extra double-stranded RNA binding domain confers high activity to a squid RNA editing enzyme. RNA 15, 1208–1218 (2009).
55. Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).
56. Starnes, T., Broxmeyer, H. E., Robertson, M. J. & Hromas, R. Cutting edge: IL-17D, a novel member of the IL-17 family, stimulates cytokine production and inhibits hemopoiesis. J. Immunol. 169, 642–646 (2002).
57. Cummins, S. F. et al. Candidate chemoreceptor subfamilies differentially expressed in the chemosensory organs of the mollusc Aplysia. BMC Biol. 7, 28 (2009).
58. van Nierop, P. et al. Identification of molluscan nicotinic acetylcholine receptor (nAChR) subunits involved in formation of cation- and anion-selective nAChRs. J. Neurosci. 25, 10617–10626 (2005).
[Figure: panels a to d, not reproduced. a, phylogenetic tree of ADAT, ADAR1, ADAR2 and ADAR-like/ADAD sequences with node support values; b, domain diagrams (Z-alpha, dsRBD, adenosine deaminase); c, expression heat map of ADAR1, ADAR2 and ADAR-like/ADAD across tissues (row Z-score colour key); d, histogram of the number of DNA-RNA differences per tissue, keyed by change type (A-C, A-G, A-T, C-A, C-G, C-T, G-A, G-C, G-T, T-A, T-C, T-G).]

Extended Data Figure 1 | RNA editing in octopus. a, Approximate maximum likelihood tree of adenosine deaminases acting on RNA (ADARs) in bilaterians. ADAR1, ADAR2, ADAR-like/ADAD, and ADAT (tRNA-specific adenosine deaminase) were identified in Hsa, Mmu, Cin, Dme, Cte, Lgi, D. opalescens (Dop54), and Obi with Shimodaira–Hasegawa-like support indicated at the nodes. b, O. bimaculoides ADAR1, ADAR2 and ADAR-like proteins contain one or two double-stranded RNA binding domains (dsRBD) as well as an adenosine deaminase domain. ADAR1 also has a z-alpha domain. c, Expression profiles of the three ADAR genes found in 12 O. bimaculoides tissues by RNA-seq profiling. d, DNA–RNA differences in O. bimaculoides show prominent A-to-G changes. Histogram illustrates the number of DNA–RNA differences detected between coding sequences in the genome and 12 O. bimaculoides transcriptomes after filtering out polymorphisms identified in genomic sequencing. Differences were binned by the type of change (see key) in the direction of transcription. A-to-G changes are the most prevalent, particularly in neural tissues and during development, paralleling the expression of octopus ADARs in c. Other types of changes were also detected at lower levels, possibly resulting from uncharacterized polymorphisms.
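The binning described for panel d (DNA–RNA differences grouped by type of change, read in the direction of transcription) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the input format (genome base, RNA base and gene strand, with both bases reported on the genomic plus strand) and the function names are assumptions made here.

```python
from collections import Counter

# Watson-Crick complements, used to re-express minus-strand sites
# in the direction of transcription.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def change_type(genome_base, rna_base, strand):
    """Label a difference such as 'A-G' in the direction of transcription.

    Assumes both bases are given relative to the genomic plus strand,
    and that `strand` is '+' or '-' for the transcribed gene.
    """
    if strand == "-":
        genome_base = COMPLEMENT[genome_base]
        rna_base = COMPLEMENT[rna_base]
    return f"{genome_base}-{rna_base}"


def bin_differences(sites):
    """Count DNA-RNA differences per change type, skipping matching sites."""
    counts = Counter()
    for genome_base, rna_base, strand in sites:
        if genome_base != rna_base:
            counts[change_type(genome_base, rna_base, strand)] += 1
    return counts


if __name__ == "__main__":
    # Toy input: a plus-strand A-to-G, a minus-strand T-to-C (which reads
    # as A-to-G once oriented), a plus-strand C-to-T and a matching site.
    toy_sites = [("A", "G", "+"), ("T", "C", "-"), ("C", "T", "+"), ("A", "A", "+")]
    print(bin_differences(toy_sites))
```

Orienting each site by strand before counting is what allows an A-to-G signal on minus-strand genes to be pooled with the same signal on plus-strand genes, rather than appearing as T-to-C.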
[Figure: Hox cluster schematic, not reproduced. Labels as shown: H. sapiens (107 kb, 199 kb, 117 kb, 98 kb); B. floridae (448 kb; Hox1 to Hox15); C. teleta (243 kb // 22 kb; Lab, Pb, Hox3, Dfd, Scr, Lox5, Antp, Lox4 // Lox2, Post2, Post1); L. gigantea (471 kb; Lab, Pb, Hox3, Dfd, Scr, Lox5, Antp, Lox4, Lox2, Post2, Post1); O. bimaculoides scaffolds (421 kb, Lab; 474 kb, Scr; 751 kb, Lox5; 53 kb, Antp; 137 kb, Lox4; 437 kb, Lox2; 231 kb, Post2; 187 kb, Post1).]

Extended Data Figure 2 | Local arrangement of Hox gene complement in O. bimaculoides and selected bilaterians. At the top, the four compact Hox clusters of H. sapiens and the single B. floridae cluster are depicted. The D. melanogaster Hox complex is split into two clusters. We included genes in the D. melanogaster locus that are homologues of Hox genes but have lost their homeotic function, such as fushi tarazu (ftz), bicoid, zen and zen2 (the latter three are represented as overlapping boxes). Hox genes in C. teleta are found on three scaffolds17. L. gigantea has a single cluster with the full known lophotrochozoan gene complement. In O. bimaculoides many of the scaffolds are several hundred kb long, and no two Hox genes are on the same scaffold. The positions of O. bimaculoides genes approximate their locations on scaffolds. Dashed lines indicate that the scaffold continues beyond what is shown. Scaffold length is depicted to scale with size noted on the left. Genes are positioned to illustrate orthology, which is also highlighted by colour.
Extended Data Figure 3 | Gene complement and gene architecture evolution in metazoans. a, Principal component analysis of gene family counts. O. bimaculoides highlighted in green. Deuterostomes are indicated in blue, ecdysozoans in red, lophotrochozoans in green, and sponges and cnidarians in orange. Xtr, Xenopus tropicalis; Gga, Gallus gallus; Tca, Tribolium castaneum; Dpu, Daphnia pulex; Isc, Ixodes scapularis; Ava, Adineta vaga; Spu, S. purpuratus; Hma, Hydra magnipapillata; Adi, Acropora digitifera. For methods, see Supplementary Note 7.4. b–d, MrBayes55 tree (constrained topology) on binary characters of presence or absence of Pfam domain architectures (b), introns (c), or indels (d); scale bar represents estimated changes per site. For methods, see Supplementary Note 7.3.
[Figure: panels a to d, not reproduced. a and c, expression heat maps (row Z-score colour key) of protocadherin genes across 12 tissues (Testes, Sub, St15, Retina, OL, PSG, Supra, ANC, Viscera, Ova, Skin, Suckers); b and d, phylogenetic trees of protocadherin sequences, with tip labels from H. sapiens (Hsa), C. teleta (Cte), L. gigantea (Lgi), amphioxus and O. bimaculoides.]

Extended Data Figure 4 | Protocadherin genes within a genomic cluster are similar in sequence and sites of expression. a, Expression profile of the 31 protocadherin genes located on Scaffold 30672 in 12 octopus transcriptomes. Over three-quarters of the protocadherins are highly expressed throughout central brain, OL and ANC, while the others show more mixed distributions. b, Phylogenetic tree highlighting Scaffold 30672 protocadherins in grey bars. c, Expression profile of the 17 protocadherin genes located on Scaffold 9600. Almost all of these protocadherins are most highly expressed in nervous tissues, with the exception of Ocbimv220039316m, which is most highly expressed in the St15 sample. d, Phylogenetic tree highlighting Scaffold 9600 protocadherins in grey bars. As seen in b, protocadherins of the same scaffold tend to cluster together on the tree. Order of the genes in the heat maps (a, c) follows the ordering on the corresponding scaffold.
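The heat maps in the Extended Data figures are keyed by row Z-score, meaning that each gene's expression values are standardized across the tissues before plotting. A minimal sketch of that scaling is given below. The use of the sample standard deviation and the handling of constant rows are assumptions made here, since the source does not state how the scores were computed, and the function name `row_z_scores` is illustrative.

```python
from statistics import mean, stdev


def row_z_scores(values):
    """Standardize one gene's expression values across tissues.

    Assumption: sample standard deviation (n - 1 denominator).
    A row with no variation, or fewer than two values, is returned as zeros.
    """
    if len(values) < 2:
        return [0.0] * len(values)
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        return [0.0] * len(values)
    return [(value - centre) / spread for value in values]


if __name__ == "__main__":
    # Hypothetical expression values for one gene in three tissues.
    print(row_z_scores([1.0, 2.0, 3.0]))
```

Because each row is scaled independently, the colours show where a gene is relatively high or low among the tissues, and cannot be used to compare absolute expression between genes.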
[Figure: panels a to c, not reproduced. a, phylogenetic tree with clades labelled Mammalian IL1A, IL1B and IL7; Mammalian IL17; Annelid and non-cephalopod mollusc IL17-like; Octopus IL17-like. b, expression heat map (row Z-score colour key) of genes A902 to E183 across 12 tissues. c, alignment rows labelled Octopus IL17-like, Human IL17 A to F, Lottia IL17-like, Capitella IL17-like and Crassostrea IL17.]

Extended Data Figure 5 | Expansion of interleukin 17 (IL17)-like genes. a, Phylogenetic tree of interleukin genes in Obi, Cte, Cgi, and Lgi. Mammalian IL1A, IL1B, and IL7 used as outgroups. Human and mouse IL17s branch from other members of the IL family. Octopus ILs (as well as all identified invertebrate ILs) group with the mammalian IL17 branch and are named IL17-like. The 31 octopus genes are distributed across 5 scaffolds: scaffold A (Obi_A), 23 members; scaffold B (Obi_B), 4 members; scaffold C (Obi_C), 2 members; scaffolds D (Obi_D) and E (Obi_E), 1 member each. b, Expression profile of 31 octopus IL17-like genes. Heat map rows are arranged by order on each scaffold. Blank rows indicate genes not expressed in our transcriptomes. The 27 genes found in our transcriptomes have strong expression in the suckers and skin. The scaffold C genes are enriched in the PSG and the Scaffold D gene is enriched in the viscera. c, Conserved cysteine residues in human IL17 and invertebrate IL17-like proteins. The human IL17 proteins share a conserved cysteine motif comprising 4 cysteine residues, which may form interchain disulfide bonds and facilitate dimerization56. Octopus IL17-like proteins also contain this four-cysteine motif, highlighted in yellow. One octopus sequence encodes only 3 of these highly conserved cysteine residues. These four cysteines are also present to varying degrees in Lottia, Capitella and Crassostrea sequences. Two additional conserved cysteine residues were found in the octopus sequences and are highlighted in red. The first cysteine residue is found in all invertebrate sequences examined, and none of the mammalian IL17 sequences.
[Figure: panels a to e, not reproduced. Expression heat maps (row Z-score colour key) across 12 tissues for GPCR groups labelled Vertebrate-like Chemosensory, Aplysia-like Chemosensory, Opsin, Frizzled and Adhesion.]

Extended Data Figure 6 | G-protein-coupled receptors. GPCRs, also known as 7-transmembrane (7TM) or serpentine receptors, form a large superfamily that activates intracellular second messenger systems upon ligand binding. This figure considers a subset of the 329 GPCRs we identified in O. bimaculoides. The full complement of GPCRs is presented in Supplementary Note 8.5. a, b, As reported for other lophotrochozoan genomes, the octopus genome contains chemosensory-like GPCRs; 74 GPCRs are similar to the Aplysia chemosensory GPCRs57 and 11 GPCRs are similar to vertebrate olfactory receptors. c, We identified 4 opsins in the octopus genome (from top to bottom): rhodopsin, rhabdomeric opsin, peropsin, and retinochrome. d, The octopus class F GPCRs comprise 6 genes: 5 Frizzled genes and 1 Smoothened gene (*). e, Thirty octopus genes show similarity to vertebrate adhesion GPCRs.
[Extended Data Figure 7 image, panels a and b: phylogenetic tree of AChR subunit genes from Hsa, Mmu, Dme, Cte, Lgi and Obi, with clade labels including Alpha 1-4 like, Alpha 7, Alpha 9/10, Beta 1 like, Mammalian Alpha 1-6, Beta 1-4, Delta, Gamma, Epsilon, Non-Alpha, Putative Non-binding Alpha and Putative Non-binding Non-Alpha; expression heat map across Ova, Testes, Viscera, PSG, Suckers, Skin, St15, Retina, OL, Supra, Sub and ANC. Colour scale: row Z-score, -3 to 3. Individual tip labels are not reproduced here.]
c
Lst_AchBP YN-AISKPEVLTPQLARVVSDGEVLYMPSIRQRFSCDVSGVDTESG-ATCRIKIGSWTHHSREISVDPTT--NSDDSEYFSQYSRFEILDVTQKKNSVTYSCCP-EAY
Hsa_AchR7 YNSADERFDATFHTNVLVNSSGHCQYLPPGIFKSSCYIDVRWFPFDVQHCKLKFGSWSYGGWSLDLQMQ---EADISGYIPNGEWDLVGIPG-KRSERFYECCK-EPY
Obi_10697+ YNSANEVFDATYPTNVLVSYNGFCHWVPPGMFKSTCQIDIAWFPFDDQKCTLKFGSWTHDGRYLDLQLDGDGNGDTSSFIRNGEWKLIAVPG-SRNVVKYDCCPQIYL
Obi_12266 CNSVSGDFSFDVDKEVTVKYDGFVHLHIDKIFKTYCRINVENYPFDQHECDITVCLEHQMYMEETIEDF---VIDVKLKTKSNQWNFSFEET-EMEKDD-------VI
Obi_12265 CNSVTGKFSFDDNKEVTVNRNGDVNLYIDKIFETYCRINVEKYPFDEHECDISVCFEHQMYVEETVGEF---DYEVKLQSASNQWDFNFEKS-DVENDN-------IV
Obi_12263+ CNGVMDRFKLDEDTEIFLTNEGTVFLYIDNVFQTYCRINVNKYPFDEHECDLLVCLNHQMRERKRKPSK---------------------------------------
Obi_29097 CNSASGKFTFDEDTGVTLTSNGNTSLYIDRIVNTYCKVNINKYPFDEHECDISVCFRHQINTEETLNNF---VYNVTYNPTYNQWEYTFKEK-DILKEG-------II
Obi_29099 CNSVTGKFTFDGNWGVTIKSDGSVHLHIDQIFHTYCKVNVNKYPFDEHECDISVCFEHQMNLEVMLHDF---MYRVTYKPISNQWDYRYEYR-EVEKEE-------II
Obi_12259 CNDMSGNFAVHK-GGATIEYDGTVTFHMDGIFQTYCTIDMHKYPFDEHECYIKSCLRHQKYKEQTIQNF---SFYNMYNSSSDTWDYKFVVG-DVMENG-------II
Obi_04961 CNDMSGKFSQHEGEGATVKYDGSVSLHMDGIFQTHCTINMLKYPLDEHECNITVCLGHQENIEKTMQSF---SFNNLHNAEADKWEYKFAVG-NVTEKE-------II
Obi_18127 CNSVEGKFKFDEDKQVSVRHDGIVNLNTEGIFNTYCEINMENYPFDEHIC----------------------------------------------------------
Obi_18129+ RNSAEDKFIFQKNKQVFIKYDGTINLHIDGTYRIYCRIDIDKYPFDEHICYLSICLGTEMENQETIQFQ--------------EWEFRLEKT-SEE----------NF
Obi_12260 SNAEKVTKIPSLSEYITVSYDGRTSYFIRSIYRTYCSIDFYKYPFSLNVCKIYFWLSNEMVSYLKLQNV---DLANTSNIWTTIWNIQLDGH-KYDDDNS------TD
Obi_12262 CNAETIYNVSTPIPEAVMLSNGTIKTSTTLVYTIHCKIDNTKYPFDKQACEVHICLPLSKLNNVRIKTI----TTFKKQVTLRNWNVEIDKV-LQNHNER------IF
Obi_12261 CNAIKLIGNHRYERQVTVWQNGVVEEESFYTYQLFCGVDNSRYPFDVQNCPTYICLPYQMNNLTLIKSL---RTDPVKNMEG--WHIHTSTE-PPITYND------QE
Obi_30094 FVNGLSAVESAAEPAIRLEYSGNLNKYQKLSLKTFCPTEKDQYSFS---CPFMLKTYPLPSTQERLRVT---DFEVNEKFQSHQWNAEVNTN-ETRIYNED-------
Obi_30842 CNSVNEQDDSNINREVYVHYNGTVELWSLKYIETYCQVNAYTYPFDDQKCKIQMCVGLHSPDETRLKTI---CYWNMKFTESYKWDIHFSGK-ANGINSQ------SS
Obi_30840 CNSVNEQGDSNINREVYVDYEGTVYLWSLKYIETYCQVNAYTYPFDVHYCGIEMCVPLHSPNETRIQTI---YYRNMNFTENYKWDIHFSGE-ANGKVEE------FS
Obi_30843 CNSMTQSEEKDSLDDVLIYYDGFVRMLSFTLLQTYCQVNAYSYPFDEHKCEIRMCSATYHTDEANVTSF---LLNVYSEEENYKWYMSISDQ-ETY----------SS
Obi_06517 CNSMENSEDKDDFPELWIFSNGHVVMYSFRLLNTYCEVNAYTYPFDKHMCEIYMCVALHSVQHTRIKTL---DYHELNFIQNYKWDITLEGT-VNATNDK------FN
Obi_06518 CNSMDKSEENDGVGELMLTYTGWINMWSFRLLHTYCQINAYTYPFDEHTCEIYLCVALHTINHTRIKEL---IYEDSKFTQNYKWDINVSGK-VNGTDEL------FS
Obi_15560 CNAMKESEEKGSFLEVKVFNNGRVQMRSLKLLKSYCTFDAYAYPFDQHDCEIYICVALHDPVHTRIRTL---TYDNLNYSPNYNWDIDYNGI-KNASDQR------FS
Obi_06519 CNSMKESDDEDNFPEVRIFNNGLVERWSLPLLQSYCEVNAYAYPFDEHICKIYMCIALHTPQHTQINTL---IYYDADHTQNYKWNVNISGE-MKGIKFS------FS
Obi_06522 CNTMKQSEDKDNPSEVSVYFNGSIEISLIKLLHTYCEINAYTFPFDEHTCNVSMCVSLQELHHAKRTKL---TYKS-RQAKHSKWDIKFSGG-TNGTNYYH-----YS
Obi_06521 FNSRTESKYKYSYQDVTVYSKGSVEMVSIRFLHTDCQIEAYIYPFDLQTCYIFLGIPTYKPQDTKIKEI---LCGKENDTTNYQWDITLYCN-VDSANKH------YN
Obi_06520 LNTLMETQSKNNFLEMTVDFNGSVTMVEIKLLQTFCEIYVYNYPFDAQTCVISMGIPSHKFQDTKIKEL---SCYRKSDISDSEWGISFSCN-VHGTNNS------FS
Extended Data Figure 7 | O. bimaculoides acetylcholine receptor (AChR) subunits. a, Phylogenetic tree of AChR subunit genes identified in Hsa, Mmu, Dme, Cte, Lgi and Obi. Black asterisk indicates a Dme sequence that groups with alpha 1-4-like subunits despite lacking two defining cysteine residues. b, Expression profiles of octopus AChR subunits. Genes ordered as in the tree (a), starting from the grey arrow and continuing anticlockwise. Putative non-ACh-binding subunits are highly expressed in the suckers. One sequence was not detected in our transcriptome data sets. In a and b, red asterisks indicate subunits with the substitution known to confer anionic permissivity (ref. 58). c, Divergent octopus subunits lack nearly all residues necessary for ACh binding. Alignment of sequence flanking the cysteine loop (yellow) of the L. stagnalis ACh-binding protein (Lst_AchBP), the human and octopus alpha-7 receptor subunits (Hsa_AchR7, Obi_10697+), and the 23 divergent AChR subunits. Essential ACh-binding residues on the primary (pink) and complementary (blue) side of the ligand-binding domain are indicated (ref. 26), with conservative substitutions in a lighter shade. Outside of the binding residues, residues shared between the alpha-7 subunits are shaded in light grey, with bold letters for conservative substitutions.
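The heat-map colour scale in b is labelled as a row Z-score, meaning that each gene's expression is standardized across tissues. A minimal sketch of that standardization follows. It assumes the sample standard deviation is used (the legend does not say which estimator the authors applied), and the function name and expression values are hypothetical.

```python
from statistics import mean, stdev


def row_z_scores(row):
    """Standardize one gene's expression values across tissues."""
    mu = mean(row)
    sigma = stdev(row)  # assumption: sample standard deviation
    if sigma == 0:
        # A gene with identical expression everywhere has no spread to scale by.
        return [0.0 for _ in row]
    return [(x - mu) / sigma for x in row]


# Hypothetical expression values for one gene in four tissues.
example_row = [2.0, 4.0, 4.0, 6.0]
z = row_z_scores(example_row)
print(z)
```

After this transformation every row has mean 0, so the colours show in which tissues a gene is expressed above or below its own average rather than its absolute expression level.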
Extended Data Figure 8 | Active transposable elements and gene expression specificity. a, Transposable element expression across 12 tissues. b, Correlation between the total transposable element (TE) load (in bp) in the 5 kb regions flanking the gene and the fraction of genes with tissue-specific expression (defined as having at least 75% of expression in a single tissue; see Source Data file for this figure). P value indicates the F-statistic for the significance of linear regression (H0: r² = 0), with tissues with a P value ≤0.05 indicated in pink.
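The tissue-specificity criterion used in b (at least 75% of a gene's expression in a single tissue) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name `specific_tissue`, the gene names and the expression values are all hypothetical.

```python
def specific_tissue(expression, threshold=0.75):
    """Return the tissue holding at least `threshold` of a gene's total
    expression, or None if no single tissue reaches that share."""
    total = sum(expression.values())
    if total <= 0:
        return None
    tissue, value = max(expression.items(), key=lambda kv: kv[1])
    return tissue if value / total >= threshold else None


# Hypothetical expression values per tissue for two genes.
genes = {
    "gene_a": {"Retina": 90.0, "Skin": 5.0, "Suckers": 5.0},
    "gene_b": {"Retina": 40.0, "Skin": 30.0, "Suckers": 30.0},
}

calls = {name: specific_tissue(expr) for name, expr in genes.items()}
fraction_specific = sum(c is not None for c in calls.values()) / len(calls)
print(calls, fraction_specific)
```

Here `gene_a` is called Retina-specific (90% of its expression) and `gene_b` is not. The resulting fraction of tissue-specific genes is the kind of quantity that the legend describes being regressed against flanking TE load.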
Extended Data Figure 9 | Synteny dynamics in octopus and the effect of transposable element (TE) expansions. a, Circos plot showing shared synteny across 6 genomes. Individual scaffolds are plotted according to bp length; scaffolds with no synteny are merged together (lighter arcs). Despite the large size of the octopus genome, only a small proportion of the scaffolds show synteny. b, Synteny reduction in octopus quantified based on synteny inference using gene families with at least one representative in human, amphioxus, Capitella, Helobdella, Octopus, Lottia, Crassostrea, Drosophila, and Nematostella. Drosophila, Helobdella and Octopus show the highest synteny loss rates. Branch lengths, estimated with MrBayes (ref. 55), reflect extent of local genome rearrangement (Supplementary Note 6). c, Enrichment of overall and specific TE classes (base pairs masked) around genes from ancient bilaterian synteny blocks, including those absent in octopus (see key). Asterisks indicate Mann-Whitney U-test with P value <0.02. d, Transposable element insertion history (Jukes-Cantor distance adjusted, see text) into the vicinity of genes from lost synteny blocks. Note that only one SINE peak is present; a more recent peak (visible in 'All genomic SINEs') cannot be recovered from those insertions.
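The Jukes-Cantor adjustment mentioned in d converts an observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3), which corrects for multiple substitutions at the same site. The sketch below shows only this standard formula; the function name is hypothetical, and how the authors applied the adjustment to TE copies is described in their text and not reproduced here.

```python
import math


def jukes_cantor_distance(p):
    """Standard Jukes-Cantor (JC69) distance for a proportion p of
    differing nucleotide sites. Defined only for 0 <= p < 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical example: 10% of aligned sites differ.
observed = 0.10
corrected = jukes_cantor_distance(observed)
print(observed, corrected)
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as p grows, so older insertions are pushed further back than raw divergence alone would suggest.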
Extended Data Figure 10 | Cephalopod phylogeny and novelties. a, Whole-genome-derived phylogeny of molluscs and select other phyla showing the relative position of octopus at the base of the coleoid cephalopods. For methods see Supplementary Note 7.1. Members of the cephalopod class are indicated in blue, scale indicates number of substitutions per site. b, Phylogenetic tree of reflectin genes. Reflectins are cephalopod-specific genes that allow for rapid and reversible changes in iridescence. Six reflectin genes were identified in the octopus genome. c, d, Novel gene expression across multiple tissues. Bars depict all cephalopod novelties; dark grey indicates sequences with no similarity to non-cephalopod genes using HMM searches (see Source Data for this figure). c, Counts of tissue-specific novelties in a given tissue. d, Proportion of expression of novel genes versus total expression in individual tissues. CNS (central nervous system) combines Supra, Sub, OL and ANC expression data.
[Extended Data Figure 11 image: phylogenetic tree of cadherin genes with clades labelled I to IX and one branch marked with an asterisk. Individual tip labels are not reproduced here.]
Extended Data Figure 11 | Phylogenetic tree of cadherin genes. This is a larger image of Fig. 2a. Phylogenetic tree of cadherin genes in Hsa (red), Dme (orange), Nematostella vectensis (mustard yellow), Amphimedon queenslandica (yellow), Cte (green), Lgi (teal), Obi (blue), and Saccoglossus kowalevskii (purple). I, Type I classical cadherins; II, calsyntenins; III, octopus protocadherin expansion (168 genes); IV, human protocadherin expansion (58 genes); V, dachsous; VI, fat-like; VII, fat; VIII, CELSR; IX, Type II classical cadherins. Asterisk denotes a novel cadherin with over 80 extracellular cadherin domains found in Obi and Cte.