1. Abstract
2. Introduction
2.1 Wnt family as cell surface ligands
2.2 WNT3A protein
3. Materials and methods
3.1 Sequence selection: creating the database. Massive alignment
3.2 Multiple alignment: CLUSTAL-W
3.3 Phylogenetic tree confection. WNT3A ortholog characterization
3.4 HMM profiles and Logo confection
3.5 Paralog analysis
3.6 Protein modelling
3.7 WNT3A characterization. 3D HMM printing
3.8 WNT3A antigenicity
3.9 Protein docking
3.10 SNPs, directed mutations and copy number variations
4. Results and Discussion
4.1 Mass alignment
4.2 Multiple alignment
4.3 Phylogenetic analysis
4.4 HMM and logo profile
4.5 Paralog analysis
4.6 Protein modelling
4.7 WNT3A characterization results
4.8 Antigenicity analysis
4.9 Protein docking analysis
4.10 Mutations and pathological implications
5. Conclusions
6. References
HUMAN PROTEIN WNT3A PHYLOGENETIC AND FUNCTIONAL BIOINFORMATIC ANALYSIS
1. Abstract
In this report we set out to perform an exhaustive analysis of the human WNT3A protein, based
on the consistency between its predicted functional features and the data available from current
research. To this end, we first created a database of 150 homologous sequences covering a wide
range of clades, and used it to build a phylogenetic tree showing the putative
evolution of the protein. From these results we defined a group of orthologous
proteins, which was used to build a profile HMM that reports the positions conserved for functional
reasons. Similar approaches were used to search for and analyze paralogous sequences. Since
WNT3A is involved in a signal transduction pathway, we also evaluated its 3D structure
and its potential interactions with its receptors. For that, we docked the
modelled protein and analyzed features such as hydrophobicity and electrostatic potential.
2. Introduction
2.1 Wnt family as cell surface ligands
The Wnt¹ (wingless-type MMTV integration site family) gene family consists of structurally related
genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis
and in several developmental processes, including regulation of cell fate and patterning during
embryogenesis (NCBI, 2017).
¹ The all-capitals designation (WNT) is reserved for the human isoforms. The protein name refers to its
first identification in a wingless Drosophila mutant line.
Figure 1. Wnt-induced Fz-LRP5/6 complex formation promotes the recruitment of the Axin-GSK3
phosphorylating complex via Dvl (Zeng et al., 2008).
Wnt proteins play an important role as activating ligands in the so-called Wnt signaling
pathways. These transduction pathways are made of proteins that pass signals into a cell through cell
surface receptor binding. Three Wnt signaling pathways have been characterized: the canonical Wnt
pathway, the noncanonical planar cell polarity pathway, and the noncanonical Wnt/calcium pathway
(Angers & Moon, 2009). All of them are activated by the binding of a Wnt protein ligand to
a Frizzled family receptor, which passes the biological signal to a series of protein complexes leading
mainly to β-catenin-mediated gene transcription or to structural cytoskeleton rearrangements.
The conserved canonical pathway, also called the Wnt/β-catenin pathway, regulates stem cell pluripotency
and cell fate decisions during development. The Wnt ligands are secreted glycoproteins that bind to Frizzled receptors,
leading to the formation of a larger cell surface complex with LRP5/6. Activation of the Wnt receptor
complex triggers displacement of the multifunctional kinase GSK-3β from a regulatory APC/Axin/GSK-3β
complex. In the absence of Wnt signal (Off-state), β-catenin, an integral E-cadherin cell-cell adhesion
adaptor protein and a transcriptional co-regulator, is targeted by coordinated phosphorylation by CK1 and
the APC/Axin/GSK-3β complex, leading to its ubiquitination and proteasomal degradation. In the
presence of Wnt ligand (On-state), the co-receptor LRP5/6 is brought into complex with Wnt-bound
Frizzled. This leads to activation of Dishevelled (Dvl) by sequential phosphorylation, poly-ubiquitination,
and polymerization, which displaces GSK-3β from APC/Axin through an unclear mechanism that may
involve substrate trapping and/or endosome sequestration. Stabilized β-catenin is translocated to the
nucleus via Rac1 and other factors, where it binds to LEF/TCF transcription factors, displacing
co-repressors and recruiting additional co-activators to Wnt target genes (McDonald, Tamai, & He, 2009).
Additionally, β-catenin cooperates with several other transcription factors to regulate specific targets.
Importantly, researchers have found β-catenin point mutations in human tumors that prevent GSK-3β
phosphorylation and thus lead to its aberrant accumulation. Other proteins involved in this pathway have
also been reported to carry mutations in different tumor samples. Wnt signaling has also been
shown to promote nuclear accumulation of other transcriptional regulators implicated in cancer.
Furthermore, GSK-3β is involved in glycogen metabolism and other signaling pathways, which has made
its inhibition relevant to diabetes and neurodegenerative disorders (Clevers & Nusse, 2012).
Wnt signaling pathways operate either in paracrine communication between nearby cells or in
autocrine communication within the same cell. They are highly evolutionarily conserved in animals (Nusse &
Varmus, 1992). Wnt signaling was first identified for its role in carcinogenesis, then for its function in
embryonic development. The embryonic processes it controls include body axis patterning, cell
proliferation and cell migration. These processes are necessary for proper formation of important tissues
including bone, heart and muscle. Wnt signaling also controls tissue regeneration in adult bone marrow,
skin and intestine. Later research found that the genes responsible for developmental abnormalities also influenced
breast cancer development in mice. The clinical importance of this pathway was demonstrated by mutations
that lead to various diseases, including breast and prostate cancer, glioblastoma, type II diabetes and others
(Logan & Nusse, 2004; Komiya & Habas, 2008). The role of these proteins has been largely related to
differentiation. For instance, studies on hematopoietic stem cells (HSCs), which have the ability to renew themselves
and to give rise to all lineages of the blood, showed that the Wnt signaling pathway has an important role
in this process. The ectopic expression of axin or of a Frizzled ligand-binding domain, both inhibitors of the
Wnt signaling pathway, led to inhibition of HSC growth in vitro and reduced reconstitution in
vivo. Although Wnt proteins are secreted from cells, secretion is usually inefficient, and attempts to
characterize Wnt proteins have been hampered by their high degree of insolubility. A genome-wide RNA
interference screen was performed in Drosophila cells to find regulators of the Wnt pathway
(DasGupta, Kaykas, Moon, & Perrimon, 2005). It identified 238 potential regulators, including known pathway
components, genes with functions not previously linked to this pathway, and genes with no previously assigned
functions. Reciprocal best-BLAST analyses revealed that 50% of the genes identified
in the screen had human orthologs, of which approximately 18% were associated with human disease.
We will focus on member 3A of the Wnt family because of its importance both in embryonic
development and in cell proliferation. The human WNT3A gene encodes a protein of the same name, which
shows 96% amino acid identity to the mouse Wnt3A protein and 84% to the human WNT3 protein, another Wnt
gene product. This gene is clustered with the WNT14 gene, another family member, in the chromosome 1q42
region, with an interval of about 58 kb (NCBI, 2017). Using the NCBI ortholog browser, the gene was
found to have 178 orthologous sequences from species ranging from Drosophila to
chimpanzee, all of them with considerable homology, showing
how conserved this signaling protein is. Northern blot analysis
revealed WNT3A transcripts in different tissues, with placenta
and lung showing the highest expression levels (Figure 2).
It has been found that Wnt3a, like many other proteins of the
Wnt family, is palmitoylated on a particularly conserved cysteine
(Cys77) and on Ser209. The latter is of the utmost importance
for its function, since enzymatic removal of the modification at this site resulted in a
complete loss of activity. However, the recent X-ray crystal
structure of Xenopus Wnt8 in complex with the extracellular
domain of Frizzled-8 revealed that Cys55 on XWnt8 (equivalent
to Cys77 on Wnt3a) is involved in a disulfide bond, raising the
question of whether Cys77 is really palmitoylated in vivo. The Wnt3a
protein is post-translationally glycosylated at two sites, Asn87 and Asn298, and site-directed mutagenesis
experiments indicated that glycosylation seems to be important for Wnt3a secretion (Gao & Hannoush,
2013). The purified Wnt3a protein induced self-renewal of HSCs, suggesting its potential use in tissue
engineering. In 2008, Yamamoto et al. found that after stimulation by WNT3A, the receptor
complex formed by Frizzled-5 (Fz5) and phosphorylated LRP6 is internalized, leading to the
stabilization of the transcriptional co-regulator β-catenin, which would otherwise remain bound to a
destruction complex, and therefore to the expression of new genes (Reactome Database, 2014).
Figure 2. Wnt3a transcription levels show its importance in development as well as in cell
proliferation. Available in the Human Protein Atlas, WNT3A.
In this report, we analyze the phylogeny of WNT3A as well as the functional relevance of some of its
residues that have been shown to be important for signaling. To this end, we use different
bioinformatic tools, including docking to model the LRP6-WNT3A-FZD5 complex. Finally, we
evaluate the consistency of the results with the reported mutational alterations and
their role in pathology.
Figure 3. Overview of Wnt3a function and role as well as its positive expression findings in mouse models. Data obtained from the Mouse Genome
Informatics, Wnt3a. Available in: http://www.informatics.jax.org/marker/MGI:98956
After that, a massive alignment against online databases was performed. To do that, we used
two different tools: Protein BLAST from NCBI (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins)
and pHMMER from EBI (https://www.ebi.ac.uk/Tools/hmmer/search/phmmer). Although both
algorithms compare the input sequence with their databases, there are some differences between them. On
the one hand, BLAST is a heuristic tool, so there is no guarantee of finding the optimal alignment.
Additionally, the only criteria used by this tool are based on the similarity of the sequences and not on their
function. On the other hand, more recent developments in homology inference involve profile-based tools for
detecting remote homologies, using, for instance, Hidden Markov Models (HMMs). pHMMER builds a
profile HMM from the query sequence and compares the obtained profile with a database.
The advantage of this tool is that the HMM profile
highlights the residues that are more likely to be responsible for the function of the protein, so that homology
at these positions becomes more important than homology at the others.
Although both analyses were performed, only the pHMMER results were used to
create a database of 150 homologous proteins, since this algorithm was more appropriate for our goals.
Additionally, EBI pHMMER allows the user to browse the taxonomy of the organisms
in which homologs were found. To create a representative database, we kept the ratio between the
total number of homologs found and the number belonging to each taxon,
extracting a proportional number of sequences from each taxon, so that all the phylogenetic groups
are balanced and proportionally weighted in the 150-homolog database.
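The proportional selection described above can be sketched as follows. This is a minimal illustration, not the procedure actually run for this report: the function name, the taxon names and the hit counts are hypothetical, and the largest-remainder rounding is an assumption made so that the quotas add up exactly to the target size.

```python
from math import floor

def proportional_allocation(taxon_counts, sample_size):
    """Split sample_size across taxa in proportion to their hit counts
    (largest-remainder rounding, so the quotas sum exactly to sample_size)."""
    total = sum(taxon_counts.values())
    exact = {t: n * sample_size / total for t, n in taxon_counts.items()}
    quotas = {t: floor(x) for t, x in exact.items()}
    leftover = sample_size - sum(quotas.values())
    # Hand the remaining slots to the taxa with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - quotas[t], reverse=True)
    for t in by_remainder[:leftover]:
        quotas[t] += 1
    return quotas

# Hypothetical hit counts per taxon (illustrative only, not the report's data).
hits = {"Chordata": 2000, "Arthropoda": 900, "Nematoda": 300, "Other": 199}
quotas = proportional_allocation(hits, 150)
```

Each quota then tells how many sequences to pick from the corresponding taxon; which sequences are picked within a taxon is a separate choice that this sketch does not cover.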
you to realign a set of aligned sequences to refine the alignment (Baxevanis & Ouellette, 2001). The
aligned database was exported in FASTA format for further analysis.
In a phylogenetic tree, members of a group share a common evolutionary history and are more
closely related to each other than to members of any outgroup. According to this definition,
proteins that are closely related to human WNT3A in a phylogenetic tree are likely to be orthologs and,
therefore, to have the same function.
Three different types of phylogenetic trees were built in order to identify the orthologous sequences
more accurately. All three were obtained using UGENE tools.
First, we used a distance-based method, which builds a tree based on the amount of
dissimilarity between pairs of aligned sequences. A distance method would only
reconstruct the true tree if all the divergence events were accurately recorded in the sequences. This
condition is hardly ever met in practice (Baxevanis & Ouellette, 2001), meaning that the results should not
be taken as definitive, at least not before comparing them with other, more accurate kinds of
trees. In this case, the distance-based algorithm we used was Neighbor Joining.
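To make the notion of pairwise dissimilarity concrete, the sketch below computes the simplest possible distance, the uncorrected p-distance, for a toy alignment. It is only an illustration under stated assumptions: the sequences and function names are hypothetical, and a real Neighbor Joining run with a protein substitution model would use model-corrected distances rather than raw p-distances.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        differing += a != b
    return differing / compared if compared else 0.0

def distance_matrix(alignment):
    """Pairwise p-distances for every ordered pair of sequence names."""
    names = list(alignment)
    return {(x, y): p_distance(alignment[x], alignment[y]) for x in names for y in names}

# Toy aligned sequences (hypothetical, not taken from the 150-sequence database).
toy = {"seq1": "MCK-LA", "seq2": "MCKWLA", "seq3": "MSKWIA"}
dist = distance_matrix(toy)
```

A matrix of this kind is the only input a distance-based method sees, which is why information lost in the sequences cannot be recovered by the tree-building step.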
Last, we used a Bayesian method to build a third tree. Bayesian inference seeks the probability of
a tree conditional on the data. Bayesian estimation of phylogeny is focused on a quantity called the
posterior probability distribution of trees. For a given tree, the posterior probability is the probability that
the tree is correct, and the goal is to identify the tree with the highest probability of describing reality. We
applied a Bayesian inference through the MrBayes algorithm (Pevsner, 2009).
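For reference, the posterior probability mentioned above follows from Bayes' theorem. The symbols are notation introduced here for illustration and do not come from the cited sources: \tau is a candidate tree, D is the alignment, P(D \mid \tau) is the likelihood under the chosen substitution model and P(\tau) is the prior over trees.

```latex
P(\tau \mid D) = \frac{P(D \mid \tau)\, P(\tau)}{\sum_{\tau'} P(D \mid \tau')\, P(\tau')}
```

The sum in the denominator runs over all possible trees and cannot be evaluated directly for realistic data sets, which is why MrBayes approximates the posterior distribution by Markov chain Monte Carlo sampling.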
In all cases, the substitution model, which describes (in terms of log probabilities) the process
by which one residue changes into another, was the Jones-Taylor-Thornton (JTT) matrix,
which is derived from large protein databases.
As expected, the probability of finding a substitution is not the same at every position of the sequences.
To account for this substitution rate heterogeneity, we used gamma distribution models,
which assign a substitution rate to each site by assuming that, for a given sequence, the rates
vary according to a gamma distribution (Baxevanis & Ouellette, 2001). The shape parameter
was set to a value of 1.
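With a shape parameter of 1 and a mean rate of 1, the gamma distribution reduces to an exponential distribution, so the mean rate of each equal-probability rate category has a closed form. The sketch below computes these category rates. It rests on assumptions not stated in this report: that the rates are discretized into categories at all, and that four categories are used.

```python
from math import exp, log

def exponential_category_rates(k):
    """Mean relative rate of each of k equal-probability categories when
    site rates follow a gamma distribution with shape 1 and mean 1
    (i.e. an exponential distribution)."""
    def partial(x):
        # Value of (x + 1) * exp(-x); it tends to 0 as x grows without bound.
        return (x + 1.0) * exp(-x)
    # Lower bound of each category: the i/k quantile of the exponential distribution.
    bounds = [-log(1.0 - i / k) for i in range(k)]
    rates = []
    for i in range(k):
        upper = partial(bounds[i + 1]) if i + 1 < k else 0.0
        # Conditional mean = k * integral of x * exp(-x) over the category.
        rates.append(k * (partial(bounds[i]) - upper))
    return rates

rates = exponential_category_rates(4)  # four categories is an assumption
```

The category rates average to 1, so the heterogeneity model redistributes the substitution rate among sites without changing the overall rate.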
To assess the reliability of the phylogenetic tree, the algorithm should include a bootstrap
analysis, which describes the robustness of the tree topology. Once the tree is built, the program makes an
artificial data set of the same size as the original data set by randomly picking columns from the multiple
sequence alignment. This is performed with replacement, meaning that any individual column
may appear multiple times. A tree is generated from the resampled data set. A large number of bootstrap
replicates are generated, and they are compared to the original inferred tree. The information obtained from
bootstrapping is the frequency with which each group in the original tree is observed (Pevsner, 2009). All
the trees were built using 1000 bootstrap replicates.
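The resampling step and the support calculation can be sketched as below. This is a hedged illustration rather than the UGENE implementation: the function names and the toy alignment are hypothetical, the tree-building step for each replicate is not shown, and the clade sets of the replicate trees are hand-written placeholders.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: same sequences, columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

def clade_support(clade, replicate_clade_sets):
    """Fraction of replicate trees whose set of clades contains the given clade."""
    hits = sum(1 for clades in replicate_clade_sets if clade in clades)
    return hits / len(replicate_clade_sets)

rng = random.Random(0)
toy = {"human": "MCKWLA", "mouse": "MCKWLS", "ciona": "MSKFIA"}
replicate = bootstrap_alignment(toy, rng)

# Hypothetical clade sets from three replicate trees (tree building is not shown).
replicates = [
    {frozenset({"human", "mouse"})},
    {frozenset({"human", "mouse"})},
    {frozenset({"mouse", "ciona"})},
]
support = clade_support(frozenset({"human", "mouse"}), replicates)
```

The same column indices are applied to every sequence, so each replicate column is a genuine column of the original alignment; only their multiplicities change.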
Comparing the results obtained with all the approaches, we selected a group of closely
related proteins that seemed to be orthologs of human WNT3A. To verify these results,
we consulted the Ensembl genome browser, where we could examine a more robust gene tree for WNT3A
and the diverse organisms in which it is present.
The visual representation of the profile was obtained using the online tool Skylign, which creates a graphic
logo providing a compact representation of the conservation pattern of a set of sequences. This logo
renders the information contained in the profile HMM by drawing a stack of letters representing
amino acid residues for each position, where the height of the stack corresponds to the conservation at that
position, and the height of each letter within a stack depends on the frequency of that particular amino acid
at that position (Wheeler & Clements, 2014). The logo therefore indicates which
residues are likely to be important for the function of WNT3A proteins, as they will be well conserved in all
of them. Since WNT3A is a signalling protein that binds to two receptors, the conserved amino acids
would be expected to correspond to the binding sites.
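The relation between conservation and stack height can be illustrated with a simplified calculation on raw alignment columns. This is a sketch under stated assumptions, not Skylign's own computation: Skylign works from the emission probabilities of the profile HMM and a background distribution, whereas the hypothetical function below uses observed residue frequencies and a uniform background over 20 amino acids.

```python
from collections import Counter
from math import log2

def column_information(column, alphabet_size=20, gap="-"):
    """Information content (bits) of one alignment column, measured against a
    uniform background: log2(alphabet_size) minus the Shannon entropy.
    Also returns the height of each letter within the stack."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0, {}
    counts = Counter(residues)
    freqs = {r: n / len(residues) for r, n in counts.items()}
    entropy = -sum(p * log2(p) for p in freqs.values())
    info = log2(alphabet_size) - entropy
    # Each letter's height is its share of the stack height.
    return info, {r: p * info for r, p in freqs.items()}

# Toy columns (hypothetical): a fully conserved cysteine and a mixed position.
info_conserved, _ = column_information("CCCCCCCC")
info_mixed, heights = column_information("CCCCSSAA")
```

A fully conserved column reaches the maximum stack height, while a mixed column yields a shorter stack in which the most frequent residue is still the tallest letter.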
A similar profile HMM was also built with the 150-sequence database in order to look for
differences between the whole set of homologous sequences (which may contain other proteins from the WNT
family) and the true orthologs of WNT3A.
we used the I-TASSER prediction online software that builds up a 3D model for any input protein FASTA
sequence and offers an evaluation of the reliability of the results among other parameters.
A protein can be described in terms of many different parameters, all of them important for an
overall understanding of its function and structure. Once the WNT3A model was successfully built,
we set out to analyze the biophysical properties that emerge from the structure. We used the ConSurf
Online Server (http://consurf.tau.ac.il), which reports on many of these aspects and allows the user to
analyze the results visually with specialized software such as Chimera 1.11.2. For any input PDB, the server
uses a homolog search algorithm and picks similar proteins with already known properties to produce
a description of the protein domains of interest. We searched for homologs using the jackHMMER
algorithm against the UniProt reference database with 5 iterations and an E-value cutoff
of 0.00001. The maximal identity was set to 95 % in order to avoid redundant sequences, while the
minimal identity was set to 35 %. The calculation method was based on Bayesian analysis and the
evolutionary substitution model was left in the default mode (best model), which the server resolved to the WAG
substitution model. The tool extracts the sequence from the PDB file,
finds the homologs, aligns the sequences, selects the best evolutionary model based on the feasibility of
the results, calculates the conservation scores based on the HMMER algorithm and finally projects these
conservation scores onto the molecule, so that they can be observed graphically on the 3D structure with no need
to create a Skylign logo. Once the program finished, a PDB file and a colouring script were obtained. These
files could be opened in Chimera for further analysis.
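The effect of the two identity thresholds can be sketched as a simple filter. This is a simplification under stated assumptions: the function names and sequences are hypothetical, each hit is compared with the query only, and the way the server itself applies the thresholds (for example, whether redundancy is also removed among the hits) is not described in this report.

```python
def percent_identity(query, hit, gap="-"):
    """Percent identical residues over columns where both sequences are ungapped."""
    pairs = [(q, h) for q, h in zip(query, hit) if q != gap and h != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == h for q, h in pairs) / len(pairs)

def filter_homologs(query, hits, min_identity=35.0, max_identity=95.0):
    """Keep hits whose identity to the query lies within [min_identity, max_identity]."""
    kept = {}
    for name, seq in hits.items():
        identity = percent_identity(query, seq)
        if min_identity <= identity <= max_identity:
            kept[name] = identity
    return kept

# Toy aligned sequences (hypothetical, for illustration only).
query = "MCKWLAGSCE"
hits = {
    "near_duplicate": "MCKWLAGSCE",
    "homolog": "MCKFLSGTCE",
    "distant": "MAQFITNPRE",
}
kept = filter_homologs(query, hits)
```

The upper bound discards near-identical sequences that add no evolutionary information, and the lower bound discards sequences too distant to be aligned reliably.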
As already mentioned, WNT3A has been reported to play a role in a wide variety of cancers. This
is consistent with its high expression during embryonic development, a period in which active
proliferation and cell fate decisions are processes of the utmost importance that need to be correctly regulated.
Because of that, it might be a potential pharmacological target in future approaches. The current trend
in the field, particularly concerning pathological protein-protein interactions, is to block the signaling
pathway with antibodies directed against the binding site. With this in mind, we
analyzed putative B-cell epitopes on the surface of the protein model. For this, WNT3A antigenicity
was checked using the ElliPro tool of the IEDB Analysis Resource (http://tools.iedb.org/ellipro), which searches
for both linear and discontinuous epitopes. In contrast to other epitope predictors, this one does not use the FASTA
sequence of the protein but its PDB file, which carries more information about the protein structure.
Once the docked complex was satisfactorily obtained, we went a step further and analyzed the
consistency of our model with the WNT3A HMM logo and the 3D analysis, and with data from
different references on the apparent relevance of SNP variations and of intentionally
performed mutational assays in mouse affecting this protein. We checked whether the residues reported
to be modified and to lead to different anomalies are present in the interaction region and play an
important role in protein function, which would explain the biological effects of their mutation. For this we
gathered information from various papers (already mentioned in the Introduction) and from databases such as
NCBI, UniProt and OMIM. We also gathered information on the described copy number
variations involved in pathology.
Given our goals, we built our database from the results given by
pHMMER. Using the taxonomy option, we were offered a diagram with all the taxonomic groups in
which sequences homologous to human WNT3A could be picked throughout the phylogeny. All the matching
sequences belonged to metazoan organisms, Chordata being the most represented group (nearly 2000
sequences out of the 3399 results). Using this information, the 150-sequence database was built
maintaining the taxonomic proportions and trying to avoid redundant sequences, since they might cause
failures when running the HMMER algorithms.
Figure 5. Screenshot of the UGENE interface showing part of the results of the multiple alignment. The figure below shows a profile of the
conserved positions amongst the different species.
amino acids than the human WNT3A. Be that as it may, these insertions probably do not interfere with the function
of the proteins, since otherwise they would not have been retained by selection. In other words, these modifications
might have emerged in portions of the sequence that are not important for the function, as all the resulting
proteins are still functional and in fact share a functional background with our WNT3A, since
the database that contains them was built from a pHMMER search.
instance, the MrBayes tree consisted of one big node that merely contained a list of all the organisms
studied; only a few small secondary branches appeared. However, the ML tree showed a more complex
branching pattern, in which we could differentiate the different groups of organisms. This latter tree did
in fact resemble the Ensembl reference model, supporting our results.
Although human WNT3A was placed in a group containing other WNT3 proteins, these were also
found in a closely related group, and the common root of both contained nearly all the Chordata
organisms used for the analysis. Although this group was the one initially selected as containing
the orthologous sequences, we found, by consulting the Ensembl database, that the previous two branches
also contained sequences orthologous to human WNT3A. In fact, the first organism showing a WNT3A
protein belongs to the genus Ciona, which is found in the ML tree close to the aforementioned group.
Once the new branches were considered to contain WNT3A proteins, we concluded that this molecule was
present throughout the Chordata sample, and the sequences from the original database belonging to this group
were the ones used to build the HMM profile (see Figures 8 and 9).
A new subdatabase including only the Chordata organisms was created, aligned and used to build
the HMM profile. The output .hmm file was uploaded to Skylign, producing the following logo.
Figure 7. Profile HMM logo from the ortholog group (Chordata) and a zoomed-in view of it. Several conserved amino acids can be seen
along the sequence, especially cysteine residues, which several studies have described as important, for instance, for palmitoylation.
Figure 8. Phylogenetic tree for the WNT3A gene obtained from Ensembl database. The nodes in red represent a gene duplication, so the first individual showing WNT3A
protein would be Ciona sea squirts (highlighted in green)
Figure 9. Extract from maximum likelihood phylogenetic tree. Labeled with a red spot we can see the query protein, human WNT3A. In yellow, other homolog proteins which are
certainly WNT3A. In green, two species of Ciona, which were shown to be the first organisms presenting an ortholog for the human WNT3A.
As shown in Figure 7, some positions are clearly well conserved
among all the Chordata species. Most of these positions are occupied by a cysteine residue, and they
are likely to be important for the function of the protein. We will try to clarify this point
in subsequent analyses such as the protein docking, with which we will try to find out which residues
most likely take part in binding to the receptors.
The logo built with all the homologous proteins (the 150-sequence database) shows a greater number
of conserved sites. However, the numeric description appearing below the letters indicates that this
construction contains a considerable amount of random noise, as none of the positions is completely blank. This
happens because we are comparing many different sequences, some of which may not have
the same function as human WNT3A and might in fact be quite distant from it.
There are important differences between the logo obtained with all the sequences and the one obtained with
only the orthologs. Whereas the ortholog logo shows some positions with conserved cysteine residues,
these residues are no longer found in the noisy logo. Thus, the latter is not
informative and we cannot infer any useful information from it.
Figure 10. Logo built from the profile HMM of all the homologous sequences tested. There is a considerable amount of background
noise and the conserved amino acids differ significantly from the ones obtained with the ortholog analysis.
Figure 11. Whole paralog alignment and its homology profile. As the picture shows, there are fewer conserved positions than in the
previous analysis.
The conserved positions were analyzed using the profile HMM logo. Although there is again
a considerable amount of background noise in the results, we can infer that the positions
conserved across the whole human WNT family are the ones that characterize this kind of protein, and the ones
that differ determine the specific function of each molecule within the WNT family.
Figure 12. General logo from the profile HMM of the paralog sequences. The non-conserved positions are likely to be important for the
specific function of each protein from the family.
The I-TASSER bioinformatic tool is claimed to be one of the best in the protein structure prediction
field and provides a complete model of the whole sequence based on several feasibility scores.
For the WNT3A prediction, the algorithm used as a template the Wnt8x-Frizzled-8 crystal complex
(Accession: 4f0a), which shows 81 % homology with the predicted protein and offers a quite
acceptable result. Indeed, since docking Frizzled-5 and WNT3A is one of our objectives, this template
seems to be a very good choice. Analogous procedures were carried out for the Frizzled-5
model. For this protein, the most likely model was the one based on the human smoothened 7TM
receptor in complex with an antitumor agent (Accession: 4jkv), a protein closely related to the
Frizzled receptors according to the Ensembl phylogeny. The LRP6 structure did not need to be modelled, since
the crystal structure was already available in the RCSB Protein Data Bank. UCSF Chimera was used
to visualize the results shown in the following pictures.
Figure 17. Top: General overview of the SwissProt Modelling Server results. The non-coloured sequence regions are the ones that were not
modelled in the biased structured. The pictures below show the feasibility of the predicted structure obtained using the Wnt8x structure
(left) and the goodness of the modelling throughout the sequence.
Figure 18. I-TASSER WNT3A model.
Figure 19. Frizzled-5 receptor after I-TASSER modelling.
Figure 20. LRP6 crystal structure. The model includes post-translational modifications.
The ConSurf Online Server offered a 3D HMM profile representation that could be opened with
Chimera. We strictly followed the instructions given on the website to produce a high-resolution Chimera
representation in which the functionally conserved residues of WNT3A are depicted in purple tones,
while the more variable regions are shown in green.
A first inspection of the HMM logo of the ortholog dataset made it clear that most of the
functionally important residues were cysteines. The problem with analysing cysteines in these predictions
is the lack of information about whether they take part in disulfide bonds. However, distance analysis
within the 3D structure can indicate how likely a pair of cysteine residues is to form a bridge. As
mentioned in the Introduction, Cys77 (equivalent to Cys55 in Wnt8x) was described as palmitoylated,
but the recent crystallization of Wnt8x showed that this particular residue is in fact involved in a
disulfide bond, raising questions about what really happens in WNT3A. In our model, the conserved Cys77
lies in the hydrophobic interior of the protein and cannot even be seen when the surface properties are
displayed in Chimera. In addition, because the also well-conserved Cys88 lies in its close proximity, we
suggest that Cys77 might form a disulfide bond with a structural rather than an interacting role,
meaning that it might not be palmitoylated as was described some years ago.
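The distance analysis mentioned above can be sketched as follows: the sulfur (SG) atoms of all cysteines are read from PDB-format ATOM records and every pair closer than a cutoff is flagged as a disulfide candidate. This is a minimal sketch under stated assumptions: the 2.5 Å cutoff, the helper `toy_atom_line` and all coordinates are hypothetical and chosen only for illustration. Residue numbers 77 and 88 echo the text, while residue 104 and every coordinate are made up and do not come from the WNT3A model.

```python
import math
from itertools import combinations

def toy_atom_line(serial, res_seq, x, y, z):
    """Hypothetical helper: build a fixed-column PDB ATOM record for a CYS SG atom."""
    return (f"ATOM  {serial:>5}  SG  CYS A{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00           S")

def cysteine_sg_atoms(pdb_text):
    """Map residue number -> (x, y, z) for every cysteine SG atom."""
    atoms = {}
    for line in pdb_text.splitlines():
        is_sg = line[12:16].strip() == "SG" and line[17:20] == "CYS"
        if line.startswith("ATOM") and is_sg:
            res_seq = int(line[22:26])
            atoms[res_seq] = (float(line[30:38]),
                              float(line[38:46]),
                              float(line[46:54]))
    return atoms

def candidate_disulfides(pdb_text, cutoff=2.5):
    """Return (res_i, res_j, distance) for SG-SG pairs within `cutoff` angstroms.

    The cutoff is an assumed, illustrative value.
    """
    atoms = cysteine_sg_atoms(pdb_text)
    pairs = []
    for (i, a), (j, b) in combinations(sorted(atoms.items()), 2):
        distance = math.dist(a, b)
        if distance <= cutoff:
            pairs.append((i, j, round(distance, 2)))
    return pairs

# Toy input: invented coordinates, not taken from any real structure.
toy_pdb = "\n".join([
    toy_atom_line(1, 77, 0.0, 0.0, 0.0),
    toy_atom_line(2, 88, 2.05, 0.0, 0.0),
    toy_atom_line(3, 104, 10.0, 0.0, 0.0),
])
pairs = candidate_disulfides(toy_pdb)
```

On a predicted model such a check only reports geometric compatibility with a bridge; it cannot by itself rule out other modifications of the residue.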
Figure 21. Top left: WNT3A logo showing the conserved Cys77 and Cys88. Top right: Potential microenvironment for the disulfide
bonding. Bottom left: surface-coloured model; the cysteines are coloured red, but even so, neither Cys77 nor Cys88 can be observed
because of their inaccessibility. Bottom right: Cys77 can be located within the ribbon structure.
It is true, however, that we used Wnt8x as the template to create the WNT3A model, so these
residues may not adopt this conformation in the real protein. On the other hand, the cysteines are well
conserved throughout the family, which supports the idea that these residues are involved in a
disulfide bond.
Another aspect that catches the eye in the HMM logo is that some of these conserved
cysteines appear in a recurrent pattern consisting of a pair of cysteines separated by 8 to 12
amino acids. What is more, the most frequent amino acids following the large C consistently
appear in the same order, CAVSL. This finding led us to think that an underlying pattern might
recur along the sequence. To check this, we built a dot plot of the WNT3A sequence against
itself with a threshold of 10 conserved amino acids, and found quite a few tandemly repeated
sequences. These might represent evolutionary duplications of a common micro-domain motif.
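A self dot plot of this kind can be sketched in a few lines. The version below marks a dot wherever two windows of the sequence are identical; reading the "threshold of 10" as an exact-match window length is an assumption, as are the function name `self_dotplot` and the toy sequence, which is invented and is not WNT3A.

```python
def self_dotplot(seq, window=10):
    """Return (i, j) start positions, with i < j, of identical windows.

    Each pair corresponds to one off-diagonal dot in a dot plot of `seq`
    against itself; the main diagonal (i == j) is omitted.
    """
    positions = {}  # window text -> start positions seen so far
    hits = []
    for j in range(len(seq) - window + 1):
        word = seq[j:j + window]
        for i in positions.get(word, []):
            hits.append((i, j))
        positions.setdefault(word, []).append(j)
    return hits

# Toy sequence (hypothetical) with the motif CAVSL repeated twice.
toy_sequence = "MKCAVSLGGTCAVSLPP"
hits = self_dotplot(toy_sequence, window=5)
```

Here `hits` contains a single pair, the two start positions of the repeated CAVSL window. Tools that tolerate mismatches inside the window would report more dots than this exact-match sketch.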
The graphic representation offered by the ConSurf Server (Figure 23) shows a conserved pocket
surrounded by two also conserved arms (A and B) that might be a well-suited place for protein-protein
interaction. The great majority of the remaining cysteines appeared on the surface, and Ser209, which
has been reported to be palmitoylated, is located at the tip of Arm B, a well-suited place for protein
recognition. The inner pocket contains a functionally conserved Phe224, probably involved in the
interaction. Arm A is quite probably involved in the protein function, since most of its sequence is
coloured purple and it contains up to 60 % of the most important residues, such as tryptophans or cysteines.
Figure 22. Dot plot of WNT3A against itself.
Figure 23. 3D printed HMM, with Arm A, Arm B and the inner pocket labelled. Purple residues represent functionally
conserved atoms, while green residues indicate non-conserved ones.
Figure 24. Below, a close view of the inner pocket is shown. On the left, Arm A is coloured red, showing Trp333, Trp336, Cys334 and Cys335, all conserved according to
both the WNT3A logo and the 3D HMM printing (purple atoms). On the right, Arm B, crowned by the palmitoylated Ser209. In the middle, Phe224, located
in the interior of the described pocket. The top left image combines the HMM 3D printing with the electrostatic potential model obtained from the PDB2PQR Server, showing a
strongly positive electrostatic potential in the interaction pocket. This idea is supported by the top right image, which shows a hydrophobic patch there.
In addition, residues Arg85, Trp86, Asn87 and Cys88 appeared in both the logo and the 3D
conservation model as potential sites for protein-protein interaction. In support of this idea, a large
negatively charged area was found close to these residues, just behind Arm A. The back of Arm B
shows similar properties. These findings suggest a second potential interaction site.
Figure 26. Left: Side view of docking point 2, with Arm A and Arm B labelled and the RWNC sequence in orange. Right: Protein viewed from
below, with Arm A labelled. Red and blue electrostatic pockets indicate electrically charged areas.
WNT3A was predicted to have a discontinuous B-cell epitope with a predictability of 93.5 %,
which turned out to correspond to the area of Arm A. The sequence included residues R322, R324, C327,
R328, C329, V330, F331, H332, W333, C334, C335, Y336, V337, S338, C339, Q340, E341, C342 and
T343. Arm B was likewise found to have relevant properties as a linear B-cell epitope with 90 %
confidence, including the already mentioned sequence Cys201-Cys211.
Figure 27. Left: Discontinuous B-cell epitope in Arm A. Right: Linear B-cell epitope in Arm B.
have been described in some papers. For all these reasons, we believe that this model is likely to resemble
what happens during the real interaction.
Figure 28. Frizzled-5-WNT3A-LRP6 complex obtained by ClusPro docking and visualized with Chimera. Frizzled-5 is shown in shades of
blue; WNT3A, in the middle, is shown in orange (Arm B) and red (Arm A); the LRP6 transmembrane protein appears in green
on the right.
Different experiments performed in mice have shown that Wnt3a is related to neuronal
progenitor differentiation and is involved in some cancers such as synovial sarcoma. In addition to
these experimental clues, up to 21 copy number variations that include WNT3A within the repeated
fragment have been reported, 19 of them likely pathogenic, which points to a potentially malignant behavior.
The tool BioMuta (https://hive.biochemistry.gwu.edu/biomuta) reports the described human SNPs
and their role as possible risk factors. For instance, 29 lung cancer patients showed SNPs in the WNT3A
gene, which might indicate that these mutations predispose the individual to this kind of cancer.
This is compatible with the usually high expression levels of this protein in the pulmonary tract and
hence points to its relevance in this tissue. Of the 170 SNPs collected, around 30 % were classified as
potentially malignant. However, when we compared their positions with the residues that we had
found to be most conserved, and thus functionally important, we did not find any match. In other words,
the important residues do not seem to tolerate SNPs, perhaps because, given their relevance, any
mutation there is lethal. It is quite clear, however, that the protein arms and some of the other regions
mentioned gathered most of the SNPs, although not at the important residues. For instance, potentially
malignant SNPs were found all around Cys77 and Cys88, but not at those positions. A mild interaction
leading to a slight modification of the local environment may be the cause of the pathology in these
reported cancers.
Figure 30. Count of cancers with reported WNT3A SNPs. Frequency of SNP per aminoacid.
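The comparison between SNP positions and conserved residues described above amounts to a set intersection plus a neighbourhood check. The sketch below illustrates it; the function name `snp_overlap`, the flank of 3 residues and the toy SNP positions are all hypothetical and are not the BioMuta data. Only positions 77 and 88 are taken from the text.

```python
def snp_overlap(snp_positions, conserved_positions, flank=3):
    """Compare SNP positions with conserved residue positions.

    Returns (direct, nearby): `direct` lists conserved positions that carry
    an SNP; `nearby` maps each conserved position to the SNPs found within
    `flank` residues of it (excluding the position itself).
    """
    snps = set(snp_positions)
    conserved = set(conserved_positions)
    direct = sorted(snps & conserved)
    nearby = {}
    for c in sorted(conserved):
        close = sorted(p for p in snps if p != c and abs(p - c) <= flank)
        if close:
            nearby[c] = close
    return direct, nearby

# Conserved cysteines named in the text; the SNP positions are invented.
conserved_positions = [77, 88]
toy_snps = [75, 79, 86, 90, 150]
direct, nearby = snp_overlap(toy_snps, conserved_positions)
```

With this toy input `direct` is empty while `nearby` lists SNPs on both sides of each cysteine, which mirrors the pattern reported for Cys77 and Cys88.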
5. Conclusions
WNT3A and its orthologs were found to be widely distributed across the Chordata clade.
The HMM and the 3D protein model were compatible and gave coherent results concerning the
function of the molecule. We analyzed the possibility of Cys77 being palmitoylated and concluded that
the more likely hypothesis is that it participates in a disulfide bond with Cys88. We satisfactorily
evaluated the interaction of the secreted glycoprotein with its receptors, concluding that both of its arms
are responsible for the interaction with LRP6, together with certain residues of an electrically charged
pocket. The so-called Arm B and part of the back of WNT3A are responsible for the Frizzled-5 interaction.
In addition, LRP6 and Frizzled-5 may share an interaction space that might be important for the signal
transduction pathway.
6. References
Angers, S., & Moon, R. (2009). Proximal events in Wnt signal transduction. Nat Rev Mol Cell Biol, 10(7), 468-
77.
Baxevanis, A. D., & Ouellette, B. F. (2001). Bioinformatics: A practical Guide to the Analysis of Genes and
Proteins. New York: Wiley.
Clevers, H., & Nusse, R. (2012). Wnt/β-catenin signaling and disease. Cell, 149(6), 1192-205.
DasGupta, R., Kaykas, A., Moon, R. T., & Perrimon, N. (2005). Functional genomic analysis of the Wnt-
Wingless signaling pathway. Science, 308, 826-33.
Gao, X., & Hannoush, R. N. (2013). Single-cell imaging of Wnt palmitoylation by the acyltransferase porcupine.
Nature Chemical Biology, 10.
Komiya, Y., & Habas, R. (2008). Wnt signal transduction pathways. Organogenesis, 4(2), 68-75.
Lewis, C. (2017). Definition of Homolog, Ortholog and Paralog. Retrieved from
http://homepage.usask.ca/~ctl271/857/def_homolog.shtml
Logan, C. Y., & Nusse, R. (2004). The Wnt signaling pathway in development and disease. Annual Review of
Cell and Developmental Biology, 20, 781-810.
Maleki, M., Vasudev, G., & Rueda, L. (2013). The role of electrostatic energy in prediction of obligate protein-
protein interactions. Proteome Science, 11(Suppl 1).
McDonald, B., Tamai, K., & He, X. (2009). Wnt/β-catenin signaling: components, mechanisms, and disease. Dev
Cell, 17(1), 9-26.
NCBI. (2017, May 25). WNT3A Wnt family member 3A [ Homo sapiens (human) ]. Retrieved from
https://www.ncbi.nlm.nih.gov/gene/89780
Nusse, R., & Varmus, H. E. (1992). Wnt genes. Cell, 69(7), 1073-87.
Pevsner, J. (2009). Bioinformatics and functional genomics. Baltimore: Wiley.
Reactome Database. (2014). WNT3A stimulates the caveolin-dependent internalization of FZD5:p-LRP6.
Retrieved from http://reactome.org/content/detail/R-HSA-5368596
Search, N. O. (2017, May 22). WNT3A Orthologs . Retrieved from
https://www.ncbi.nlm.nih.gov/gene/?Term=ortholog_gene_89780[group]
Wheeler, T. J., & Clements, J. F. (2014). Skylign: a tool for creating informative, interactive logos representing
sequence alignments and profile hidden Markov models. BMC Bioinformatics, 15(7).
Zeng, X., Huang, H., Tamai, K., Zhang, X., Harada, Y., & Yokota, C. (2008). Initiation of Wnt signaling: control
of Wnt coreceptor Lrp6 phosphorylation/activation via frizzled, dishevelled and axin functions.
Development, 367-75.