Bioinformatics Report

EBT-IV
UNIVERSITY OF OVIEDO 2016/2017
HUMAN PROTEIN WNT3A

PHYLOGENETIC AND FUNCTIONAL
BIOINFORMATIC ANALYSIS
Alfonso Peñarroya Rodríguez & María Pedrosa Laza

0. INDEX
1. Abstract
2. Introduction
2.1 Wnt family as cell surface ligands
2.2 WNT3A protein
3. Materials and methods
3.1 Sequence selection: creating the database. Massive alignment
3.2 Multiple alignment: CLUSTAL-W
3.3 Phylogenetic tree confection. WNT3A ortholog characterization
3.4 HMM profiles and Logo confection
3.5 Paralog analysis
3.6 Protein modelling
3.7 WNT3A characterization. 3D HMM printing
3.8 WNT3A antigenicity
3.9 Protein docking
3.10 SNPs, directed mutations and copy number variations
4. Results and Discussion
4.1 Massive alignment
4.2 Multiple alignment
4.3 Phylogenetic analysis
4.4 HMM and logo profile
4.5 Paralog analysis
4.6 Protein modelling
4.7 WNT3A characterization results
4.8 Antigenicity analysis
4.9 Protein docking analysis
4.10 Mutations and pathological implications
5. Conclusions
6. References
1. Abstract
In this report we set out to perform an exhaustive analysis of the human WNT3A protein, based
on the coherence between its predicted functional features and the data gathered from current
research. To this end, we first created a database of 150 homologous sequences covering a wide
range of clades, and used this information to build a phylogenetic tree showing the putative
evolution of the protein. From these results we were able to define a group of orthologous
proteins, which was used to build a profile HMM that reports the positions conserved in functional
terms. Similar approaches were used to search for and analyze paralogous sequences. Since
WNT3A is involved in a signal transduction pathway, we considered it worthwhile to evaluate its 3D
structure and to examine its potential interactions with its receptors. For that, we docked the
modelled protein and analyzed important features such as hydrophobicity and electrostatic potential.
2. Introduction
2.1 Wnt family as cell surface ligands
The Wnt [1] (wingless-type MMTV integration site family) gene family consists of structurally related
genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis
and in several developmental processes, including regulation of cell fate and patterning during
embryogenesis (NCBI, 2017).
Figure 1. Wnt-induced Fz-LRP5/6 complex formation promotes the recruitment of the Axin-GSK3 phosphorylating complex via Dvl (Zeng et al., 2008).
Wnt proteins play an important role as activating ligands in the so-called Wnt signaling
pathways. These transduction pathways consist of proteins that pass signals into a cell through cell
surface receptors. Three Wnt signaling pathways have been characterized: the canonical Wnt
pathway, the noncanonical planar cell polarity pathway, and the noncanonical Wnt/calcium pathway
(Angers & Moon, 2009). All of them are activated by the binding of a Wnt protein ligand to
a Frizzled family receptor, which passes the biological signal to a series of protein complexes leading
mainly to β-catenin-mediated gene transcription or to structural cytoskeleton rearrangements.
The conserved canonical pathway, also called the Wnt/β-catenin pathway, regulates stem cell pluripotency
and cell fate decisions during development. The Wnt ligands are secreted glycoproteins that bind to
Frizzled receptors, leading to the formation of a larger cell surface complex with LRP5/6. Activation of
the Wnt receptor complex triggers displacement of the multifunctional kinase GSK-3 from a regulatory
APC/Axin/GSK-3 complex. In the absence of Wnt signal (off state), β-catenin, an integral E-cadherin
cell-cell adhesion adaptor protein and a transcriptional co-regulator, is targeted by coordinated
phosphorylation by CK1 and the APC/Axin/GSK-3 complex, leading to its ubiquitination and proteasomal
degradation. In the presence of Wnt ligand (on state), the co-receptor LRP5/6 is brought into complex
with Wnt-bound Frizzled. This leads to activation of Dishevelled (Dvl) by sequential phosphorylation,
poly-ubiquitination, and polymerization, which displaces GSK-3 from APC/Axin through an unclear
mechanism that may involve substrate trapping and/or endosome sequestration. Stabilized β-catenin is
translocated to the nucleus via Rac1 and other factors, where it binds to LEF/TCF transcription factors,
displacing co-repressors and recruiting additional co-activators to Wnt target genes (McDonald, Tamai,
& He, 2009). Additionally, β-catenin cooperates with several other transcription factors to regulate
specific targets. Importantly, researchers have found β-catenin point mutations in human tumors that
prevent GSK-3 phosphorylation and thus lead to its aberrant accumulation. Other proteins involved in
this pathway have also been reported to carry mutations in different tumor samples. Wnt signaling has
also been shown to promote nuclear accumulation of other transcriptional regulators implicated in cancer.
Furthermore, GSK-3 is involved in glycogen metabolism and other signaling pathways, which has made
its inhibition relevant to diabetes and neurodegenerative disorders (Clevers & Nusse, 2012).
Footnote 1: The all-capital designation (WNT) is reserved for the human isoforms. The protein name refers to its first identification in a wingless Drosophila mutant line.
Wnt signaling operates both in paracrine communication between nearby cells and in
autocrine communication within the same cell. These pathways are highly evolutionarily conserved in
animals (Nusse & Varmus, 1992). Wnt signaling was first identified for its role in carcinogenesis, then
for its function in embryonic development. The embryonic processes it controls include body axis
patterning, cell proliferation and cell migration. These processes are necessary for the proper formation
of important tissues including bone, heart and muscle. Wnt signaling also controls tissue regeneration in
adult bone marrow, skin and intestine. Later research found that genes responsible for developmental
abnormalities also influenced breast cancer development in mice. The clinical importance of this pathway
was demonstrated by mutations that lead to various diseases, including breast and prostate cancer,
glioblastoma, type II diabetes and others (Logan & Nusse, 2004; Komiya & Habas, 2008). The role of these
proteins has been largely related to differentiation. For instance, studies of hematopoietic stem cells
(HSCs), which are able to renew themselves and to give rise to all blood lineages, showed that the WNT
signaling pathway has an important role
in this process. Ectopic expression of axin or of a Frizzled ligand-binding domain, both inhibitors of the
WNT signaling pathway, led to inhibition of HSC growth in vitro and reduced reconstitution in
vivo. Although Wnt proteins are secreted from cells, secretion is usually inefficient, and attempts to
characterize Wnt proteins have been hampered by their high degree of insolubility. A genome-wide RNA
interference screen has been performed in Drosophila cells to find regulators of the Wnt pathway
(DasGupta, Kaykas, Moon, & Perrimon, 2005). It identified 238 potential regulators, including known
pathway components, genes with functions not previously linked to this pathway, and genes with no
previously assigned functions. Reciprocal best-BLAST analyses revealed that 50% of the genes identified
in the screen had human orthologs, of which approximately 18% were associated with human disease.
2.2 WNT3A protein
We focus on member 3A of the Wnt family because of its importance both in embryonic
development and in cell proliferation. The human WNT3A gene encodes a protein of the same name, which
shows 96% amino acid identity to mouse Wnt3a protein and 84% to human Wnt3 protein, another Wnt
gene product. This gene is clustered with the Wnt14 gene, another family member, in the chromosome 1q42
region, with an interval of about 58 kb (NCBI, 2017). Using the NCBI ortholog browser, the gene was
found to have 178 orthologous sequences from species ranging from Drosophila to chimpanzee, all of
them with considerable homology, which shows how conserved this signaling protein is. Northern blot
analysis revealed WNT3A transcripts in different tissues, with placenta and lung showing the highest
expression levels (Figure 2). Wnt3a, like many other proteins of the Wnt family, has been found to be
palmitoylated on a particularly conserved cysteine (Cys77) and on Ser209, the latter being of the utmost
importance for its function, since enzymatic removal of the modification at this site resulted in a
complete loss of activity. However, the recent X-ray crystal structure of Xenopus Wnt8 in complex with
the extracellular domain of Frizzled-8 revealed that Cys55 on XWnt8 (equivalent to Cys77 on Wnt3a) is
involved in a disulfide bond, raising the question of whether Cys77 is really palmitoylated in vivo. Wnt3a
protein is post-translationally glycosylated at two sites, Asn87 and Asn298, and site-directed mutagenesis
experiments indicated that glycosylation seems to be important for Wnt3a secretion (Gao & Hannoush,
2013). Purified Wnt3a protein induced self-renewal of HSCs, suggesting its potential use in tissue
engineering. In 2008, Yamamoto et al. found that, after stimulation by WNT3A, the receptor
complex formed by Frizzled-5 (Fz5) and phosphorylated LRP6 is internalized, leading to the
stabilization of the transcriptional co-regulator β-catenin, which would otherwise remain bound to a
destruction complex, and thereby activating the expression of new genes (Reactome Database, 2014). In
this report, we analyze the phylogeny of WNT3A as well as the functional relevance of some of its
residues that have proved important for signaling. To this end, we use different
bioinformatic tools, including docking-based modelling of the LRP6-WNT3A-FZD5 complex. Finally, we
evaluate the consistency of the results obtained with the reported mutational alterations and
their role in pathology.
Figure 2. Wnt3a transcription levels show its importance in development as well as in cell proliferation. Available in the Human Protein Atlas, WNT3A.
Figure 3. Overview of Wnt3a function and role as well as its positive expression findings in mouse models. Data obtained from the Mouse Genome
Informatics, Wnt3a. Available in: http://www.informatics.jax.org/marker/MGI:98956
3. Materials and methods

3.1 Sequence selection: creating the database. Massive alignment
First, we obtained the amino acid sequence of the WNT3A protein, the basis of the whole analysis,
from the NCBI protein database (https://www.ncbi.nlm.nih.gov/protein).
After that, a massive alignment against the online databases was performed. To do so, we used
two different tools: Protein BLAST from NCBI (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins)
and pHMMER from EBI (https://www.ebi.ac.uk/Tools/hmmer/search/phmmer). Although both of the
algorithms compare the input sequence with their databases, there are some differences between them. On
the one hand, BLAST is a heuristic tool, so there is no guarantee of finding the optimal alignment.
Additionally, the only criteria used by this tool are based on the similarity of the sequence and not on its
function. On the other hand, more recent developments in homology inference involve profile-based tools
for detecting remote homologies, using, for instance, Hidden Markov Models (HMMs). pHMMER builds a
profile HMM from the query sequence and compares the resulting profile with a database. The advantage
of this tool is that the HMM profile weights the residues that are more likely responsible for the function
of the protein, so that homology at these positions counts more than homology at the others.
Although both analyses were performed, only the pHMMER results were used to
create a database of 150 homologous proteins, since this algorithm was more appropriate for our goals.
Additionally, EBI-pHMMER allows the user to browse the taxonomy of the organisms
corresponding to the homologs found. In order to create a representative database, the ratio between the
total number of homologs found and the number of organisms belonging to each taxon was maintained
by extracting a proportional number of sequences from each taxon, so that all the phylogenetic groups
are balanced and proportionally weighted in the 150-homolog database.
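The proportional selection described above can be sketched in Python with a largest-remainder allocation. This is only an illustrative sketch: the function name, the taxon list and the hit counts are hypothetical placeholders, and the actual selection in this work was done by hand through the pHMMER taxonomy view.

```python
def allocate_proportionally(hits_per_taxon, total_selected):
    """Split total_selected among taxa in proportion to their hit counts.

    Uses the largest-remainder method so the quotas add up exactly.
    """
    total_hits = sum(hits_per_taxon.values())
    exact = {t: n * total_selected / total_hits for t, n in hits_per_taxon.items()}
    quotas = {t: int(x) for t, x in exact.items()}  # floor of each exact share
    leftover = total_selected - sum(quotas.values())
    # Hand the remaining slots to the taxa with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - quotas[t], reverse=True)
    for taxon in by_remainder[:leftover]:
        quotas[taxon] += 1
    return quotas


# Hypothetical hit counts per taxon (NOT the real pHMMER output).
example_hits = {"Chordata": 2000, "Arthropoda": 800, "Nematoda": 300,
                "Cnidaria": 200, "Porifera": 99}
quotas = allocate_proportionally(example_hits, 150)
print(quotas)
```

With this scheme a very small taxon can receive a quota of zero, so a minimum of one sequence per taxon would have to be enforced separately if every clade must be represented.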
3.2 Multiple alignment: CLUSTAL-W

The 150-sequence database was opened in UGENE, a free and open-source software package for the
analysis of biological sequence data, such as raw sequences, multiple alignments or phylogenetic trees.
UGENE integrates dozens of well-known tools and algorithms, including CLUSTAL-W,
which was used here to perform a multiple alignment of all the sequences gathered in the
database. This tool uses a series of different pair-score matrices, biases the location of gaps and allows
the user to realign a set of aligned sequences to refine the alignment (Baxevanis & Ouellette, 2001). The
aligned database was exported in FASTA format for further analysis.
3.3 Phylogenetic tree confection. WNT3A ortholog characterization

With the aligned sequences, we were then able to build a phylogenetic tree in order to determine
which sequences were actual orthologs of human WNT3A. Orthologs are defined as genes in different
species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same
function in the course of evolution (Lewis, 2017). Another type of homologous sequences are paralogs,
which are genes related by duplication within a genome. Paralogs tend to evolve new functions, even if
these are usually related to the original one.
In a phylogenetic tree, members of a group share a common evolutionary history and are more
closely related to each other than to members of any other group. According to this definition, those
proteins which are closely related to human WNT3A in a phylogenetic tree are likely to be orthologs and,
therefore, to have the same function.
Three different types of phylogenetic trees were built in order to identify the orthologous sequences
more accurately. All three were obtained using UGENE tools.
First, we used a distance-based method, which builds a tree based on the amount of
dissimilarity between pairs of aligned sequences. It should be noted that a distance method would only
reconstruct the true tree if all the divergence events were accurately recorded in the sequences. This
condition is hardly ever met in practice (Baxevanis & Ouellette, 2001), meaning that the results should not
be taken as definitive, at least not before comparing them with other, more accurate kinds of
trees. In this case, the distance-based algorithm that we used was Neighbor Joining.
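To make the notion of dissimilarity between aligned sequences concrete, the sketch below computes the simplest such measure, the p-distance (fraction of differing residues), for every pair in a toy alignment. It is an assumption-laden illustration: the sequences are invented, and the Neighbor Joining run in UGENE relies on model-corrected distances rather than on raw p-distances.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(aligned)
    return {(x, y): p_distance(aligned[x], aligned[y]) for x in names for y in names}


# Toy aligned sequences, invented for illustration only.
toy_alignment = {
    "seq1": "MAPLG-YFLL",
    "seq2": "MAPLGAYFLL",
    "seq3": "MSPLG-YWLV",
}
matrix = distance_matrix(toy_alignment)
print(matrix[("seq1", "seq3")])
```

A distance method such as Neighbor Joining takes a matrix of this kind as its only input, which is why information lost when collapsing sequences into pairwise distances cannot be recovered later.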
Second, we applied a character-based method to build a new tree. In particular, we built a
Maximum Likelihood (ML) tree, that is, the tree that, under a given evolutionary model, has the highest
likelihood of producing the observed data. This algorithm is, compared to the former one, quite accurate
but has a higher computational cost.
Last, we used a Bayesian method to build a third tree. Bayesian inference seeks the probability of
a tree conditional on the data. Bayesian estimation of phylogeny is focused on a quantity called the
posterior probability distribution of trees. For a given tree, the posterior probability is the probability that
the tree is correct, and the goal is to identify the tree with the highest probability of describing reality. We
applied Bayesian inference through the MrBayes algorithm (Pevsner, 2009).
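As a minimal numerical illustration of a posterior distribution over trees, the following sketch applies Bayes' rule to three candidate trees with a flat prior. The tree names and log-likelihood values are invented, and the enumeration is an assumption made for clarity: MrBayes explores tree space by Markov chain Monte Carlo sampling rather than by listing every tree.

```python
import math


def tree_posteriors(log_likelihoods, priors=None):
    """Posterior probability of each candidate tree given its log-likelihood."""
    names = list(log_likelihoods)
    if priors is None:
        priors = {n: 1.0 / len(names) for n in names}  # flat prior over trees
    log_joint = {n: log_likelihoods[n] + math.log(priors[n]) for n in names}
    top = max(log_joint.values())
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    weights = {n: math.exp(v - top) for n, v in log_joint.items()}
    norm = sum(weights.values())
    return {n: w / norm for n, w in weights.items()}


# Hypothetical log-likelihoods for three candidate topologies.
toy_log_likelihoods = {"tree_A": -1000.0, "tree_B": -1001.0, "tree_C": -1005.0}
posteriors = tree_posteriors(toy_log_likelihoods)
print(posteriors)
```

Even a difference of one log-likelihood unit shifts most of the posterior mass towards the better tree, which is why the posterior is usually summarized by the best-supported topology and the support of its clades.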
In all cases, the substitution model, which describes (in terms of log probabilities) the process by
which one sequence of symbols changes into another, was the Jones-Taylor-Thornton (JTT) matrix,
which is derived from large protein databases.
As expected, the probability of finding a substitution is not the same at every position of the
sequences. To model this substitution rate heterogeneity, we used a gamma distribution model,
which assigns a substitution probability to each site by assuming that, for a given sequence, the
probabilities vary according to a gamma distribution (Baxevanis & Ouellette, 2001). The shape parameter
was set to a value of 1.
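To give a feeling for what a shape parameter of 1 implies, the sketch below computes the mean relative rate of each category when a gamma distribution with shape 1 and mean 1 (which is an exponential distribution) is split into equal-probability categories. The use of 4 categories is an assumption for illustration; the number of categories used by the tree-building tools is not stated in this report.

```python
import math


def discrete_rates_shape_one(categories=4):
    """Mean relative rate of each equal-probability category for a gamma
    distribution with shape 1 and mean 1 (an exponential distribution)."""
    def partial_mean(x):
        # Integral of t * exp(-t) from 0 to x; tends to 1 as x grows.
        return 1.0 - (x + 1.0) * math.exp(-x)

    # Category boundaries are the quantiles of the exponential distribution.
    bounds = [-math.log(1.0 - k / categories) for k in range(categories)]
    bounds.append(math.inf)
    rates = []
    for low, high in zip(bounds, bounds[1:]):
        upper = 1.0 if math.isinf(high) else partial_mean(high)
        # Each category has probability 1/categories, hence the rescaling.
        rates.append(categories * (upper - partial_mean(low)))
    return rates


rates = discrete_rates_shape_one(4)
print([round(r, 4) for r in rates])
```

The slowest category evolves at roughly one seventh of the average rate and the fastest at more than twice the average, which shows that a shape of 1 already models substantial rate variation among sites.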
To assess the reliability of the phylogenetic tree, the analysis should include a bootstrap
step, which describes the robustness of the tree topology. Once the tree is built, the program makes an
artificial data set of the same size as the original one by randomly picking columns from the multiple
sequence alignment. This is usually performed with replacement, meaning that any individual column
may appear multiple times. A tree is generated from the resampled data set. A large number of bootstrap
replicates are generated, and they are compared to the original inferred tree. The information obtained
from bootstrapping is the frequency with which each group in the original tree is observed (Pevsner,
2009). All the trees were built using 1000 bootstrap replicates.
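The column resampling step described above can be sketched as follows. The toy alignment and the helper name are hypothetical, and the sketch only produces the pseudo-replicate alignments; inferring a tree from each replicate and counting how often each group reappears is left to the phylogenetic software.

```python
import random


def bootstrap_alignment(aligned, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(aligned.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every sequence, so columns stay intact.
    return {name: "".join(seq[i] for i in columns) for name, seq in aligned.items()}


# Toy aligned sequences, invented for illustration only.
toy_alignment = {"seq1": "MAPLGYFLL", "seq2": "MSPLGYWLV", "seq3": "MAPIGYFLV"}
rng = random.Random(42)  # fixed seed so the example is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(replicates[0])
```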
Comparing the results obtained with all the approaches, we selected a group of closely
related proteins that seemed to be orthologs of human WNT3A. In order to verify these results,
we consulted the Ensembl genome browser, where we could examine a more robust gene tree for WNT3A
and the diverse organisms in which it is present.
3.4 HMM profiles and Logo confection

Once the phylogenetic analysis had established the relationship between WNT3A and its
homologs, we isolated the potential orthologs and built a new set from them. This new set, which
was first aligned using CLUSTAL-W, was used to build an HMM profile, which identifies features
in sequences, such as conserved residues, that define a particular protein family (Pevsner, 2009). The
HMM profile was built from the aligned ortholog sequences using the HMMER3 tool included in UGENE.
The visual results of the profile were obtained using the online tool Skylign, which creates a graphic
logo providing a compact representation of the conservation pattern of the set of sequences. This logo
renders the information contained in the profile HMM by drawing a stack of letters representing
amino acid residues for each position, where the height of the stack corresponds to the conservation at
that position, and the height of each letter within a stack depends on the frequency of that particular
amino acid at that position (Wheeler & Clements, 2014). The logo therefore indicates which
residues are likely to be important for the function of WNT3A proteins, as they will be well conserved in
all the proteins. Since WNT3A is a signalling protein that binds to two receptors, the conserved amino
acids would be expected to correspond to the binding sites.
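The relationship between conservation and stack height can be illustrated with a simplified calculation on a single alignment column. This sketch assumes raw residue frequencies and a uniform background over 20 amino acids; Skylign itself works from the emission probabilities and background distribution of the profile HMM, so the numbers below are only indicative.

```python
import math
from collections import Counter


def column_stack(column, alphabet_size=20):
    """Return (stack_height_bits, {residue: letter_height_bits}) for one column."""
    residues = [r for r in column if r != "-"]  # ignore gap characters
    total = len(residues)
    if total == 0:
        return 0.0, {}
    freqs = {r: c / total for r, c in Counter(residues).items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    # Information content: maximum possible entropy minus observed entropy.
    stack_height = math.log2(alphabet_size) - entropy
    letters = {r: p * stack_height for r, p in freqs.items()}
    return stack_height, letters


conserved_height, _ = column_stack("CCCCCCCC")       # fully conserved cysteine
variable_height, variable_letters = column_stack("ACDEFGHK")  # eight different residues
print(conserved_height, variable_height)
```

A fully conserved column reaches the maximum of log2(20) bits, about 4.32, whereas a column with eight equally frequent residues carries only about 1.32 bits, so its stack is much shorter.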
A similar HMM profile was also built with the 150-sequence database in order to look for
differences between the whole set of homologous sequences (which may contain other proteins from the
WNT family) and the true orthologs of WNT3A.
3.5 Paralog analysis

The same tools described above were used to search for WNT3A paralog sequences in
humans. Using the pHMMER tool with WNT3A as query and restricting the results to Homo sapiens
proteins, a 50-sequence database of paralogs was built. This set of sequences was aligned using
CLUSTAL-W in UGENE and an HMM profile was obtained with HMMER3, following the procedures
already described. A new logo was built with Skylign and compared with the one obtained
from the ortholog analysis.
3.6 Protein modelling

As mentioned above, current research has clearly shown that WNT3A
plays an important role in a wide variety of diseases and morphological malformations. Since the
protein interactions within the pathway in which it is involved are well known, and given how difficult
it is to isolate the secreted protein and to characterize it through X-ray diffraction, we decided to model
the three-dimensional structure of the protein based on already known fragments of proteins of the Wnt
family. We particularly focused on the interaction shown in Figure 1, in which, according to current
theories, the Wnt3a ligand triggers a response right after binding to both LRP5/6, whose crystal structure
is already known, and Frizzled-5, whose structure has not yet been described. In order to analyze how this
interaction takes place, we imported the LRP6 crystal structure from the RCSB PDB database, and we also
modelled the Frizzled-5 protein using Frizzled and phylogenetically related fragments as reference
templates. For this purpose we used the I-TASSER online prediction software, which builds a 3D model for
any input protein FASTA sequence and offers, among other parameters, an evaluation of the reliability of
the results.
3.7 WNT3A characterization. 3D HMM printing
A protein can be described in terms of many different parameters, all of them important for an
overall understanding of its function and structure. Once the WNT3A model was successfully built,
we set out to analyze the biophysical properties that emerge from the structure. We used the ConSurf
online server (http://consurf.tau.ac.il), which reports many aspects of interest and allows the user to
analyze the results visually using specialized software such as Chimera 1.11.2. For any input PDB, the
server uses a homolog search algorithm and picks similar proteins with already known properties to
produce a description of the domains of the protein of interest. We searched for homologs using the
jackHMMER algorithm against the UniProt reference database with 5 iterations and an E-value cutoff
of 0.00001. The maximal identity was set to 95% in order to avoid redundant sequences, while the
minimal identity was set to 35%. The calculation method was based on Bayesian analysis and the
evolutionary substitution model was left in the default mode (best model), which turned out to be the WAG
substitution model according to the server's criteria. The tool first extracts the sequence from the PDB
file, finds the homologs, aligns the sequences, selects the best evolutionary model based on the feasibility
of the results, calculates the conservation scores from the homologs found with HMMER and finally
projects these conservation scores onto the molecule, so that they can be observed graphically on the 3D
structure without creating a Skylign logo. Once the program finished, a PDB file and a colouring script
were obtained. These files could be opened in Chimera for further analysis.
Electrostatic interactions are also of great importance in understanding inter-molecular
interactions, since they are long-range and because of their influence on charged molecules (Maleki,
Vasudev, & Rueda, 2013). We therefore considered it worthwhile to evaluate the electrostatic energy
profiles to predict the protein-protein interactions between WNT3A and its receptors. For
this, we used the PDB2PQR Server 2.0.0 (http://nbcr-222.ucsd.edu/pdb2pqr_2.0.0). PQR files are PDB
files in which the occupancy and B-factor columns have been replaced by per-atom charge and radius.
These files can also be visualized in Chimera. We used the Amber force field and set the prediction pH to 7.
Finally, a hydrophobicity profile was also created with one of the Chimera extensions.
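To illustrate how the charge and radius replace the occupancy and B-factor fields, the following sketch reads one atom record from a PQR file. It assumes the whitespace-separated layout written by PDB2PQR; the record shown is made up for illustration and is not taken from the WNT3A model, and the function name is hypothetical.

```python
def parse_pqr_atom(line):
    """Extract coordinates, charge and radius from a whitespace-separated PQR record."""
    fields = line.split()
    if not fields or fields[0] not in ("ATOM", "HETATM"):
        return None  # not an atom record (e.g. REMARK or END lines)
    # The last five fields are x, y, z, charge and radius; a chain ID may or
    # may not be present earlier in the line, so we count from the end.
    x, y, z, charge, radius = (float(v) for v in fields[-5:])
    return {"atom": fields[2], "residue": fields[3],
            "xyz": (x, y, z), "charge": charge, "radius": radius}


# Made-up record for illustration only.
example = "ATOM      1  N   MET     1      12.345  -4.210   7.890  0.1592 1.8240"
atom = parse_pqr_atom(example)
print(atom)
```

Summing the charge field over all atom records of a file gives the net charge of the molecule at the chosen pH, which is a quick sanity check before visualizing the surface potential.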
3.8 WNT3A antigenicity
As already mentioned, WNT3A has been reported to act in a wide variety of cancers. This
is consistent with its high expression during embryonic development, a period in which active
proliferation and cell fate determination are processes of the utmost importance that need to be correctly
regulated. Because of that, it might be a potential pharmacological target in future approaches. The
current trend in the field, particularly concerning pathological protein-protein interactions, is to block
the signaling pathway by generating antibodies directed against the binding site. We therefore considered
it worthwhile to analyze putative B-cell epitopes on the surface of the protein model. For this, WNT3A
antigenicity was checked using the ElliPro tool of the IEDB Analysis Resource
(http://tools.iedb.org/ellipro), which searches for both linear and discontinuous epitopes. In contrast to
other epitope predictors, this one does not use the FASTA sequence of the protein but its PDB file, which
contains more information about the protein structure.
3.9 Protein docking

Once we had reliable protein models, we decided to evaluate how they might interact
in three-dimensional space. This process is called protein docking. For this purpose we used the ClusPro
online server, which offers up to 10 possible models ordered by their reliability. In addition,
the software provides different models that focus mainly on one parameter or another (electrostatic
interactions, hydrophobic profile), or that take all of them into
account in equal proportion (balanced model). Since the complex we intended to elucidate is formed
by 3 different proteins (WNT3A, Frizzled-5 and LRP6), we first needed to dock 2 of them and then dock
the resulting complex with the third protein. The most likely result was evaluated in terms of its surface
potential and conservation, as described in the WNT3A characterization section.
3.10 SNPs, directed mutations and copy number variations
Once we had satisfactorily obtained the docked complex, we went a step further and analyzed the
consistency of our model with both the WNT3A HMM logo and the 3D analysis, and with the data obtained
from different references concerning the apparent relevance of SNP variations and of intentionally
performed mutational assays in mouse on this protein. We checked whether the residues that have been
reported to be modified, leading to different anomalies, are present in the interaction region and play an
important role in protein function, which would explain the biological effects of their mutation. For this
we gathered information from various papers (already mentioned in the Introduction) and databases such
as NCBI, UniProt and OMIM. We also gathered information concerning the described copy number
variations involved in pathology.
4. Results and Discussion

4.1 Massive alignment
The sequence of the WNT3A protein was found in the corresponding database and was used
as input for the pHMMER and BLASTp analyses. The results obtained with the two approaches were
significantly different: although in both of them the first positions are held by primates, which are likely
to have a similar protein sharing both sequence homology and function, the first non-primate result
appears sooner in the pHMMER results than in those from BLASTp: in the tenth position for
pHMMER versus the seventeenth position for BLASTp. This could be explained by the
importance that pHMMER gives to the conserved amino acids: although a sequence might not be very
similar in its residues, if the differences appear in parts of the sequence that are considered not to be
relevant for the protein function, the algorithm penalizes those substitutions more mildly.
Figure 4. Extract of the taxonomic distribution of the results for the pHMMER massive alignment. The arrows contain the number of sequences found within the clade.
Given our aims, we built our database from the results given by
pHMMER. Using the taxonomy option, we were offered a diagram with all the taxonomic classifications,
from which we could pick sequences homologous to human WNT3A throughout the phylogeny. All the
matching sequences belonged to metazoan organisms, Chordata being the most represented group (nearly
2000 sequences out of the 3399 results). Using this information, the 150-sequence database was built
maintaining the taxonomic proportions and trying to avoid redundant sequences, since they might cause
failures when running the HMMER algorithms.
4.2 Multiple alignment

The alignment showed several conserved regions within the compared sequences.
However, a considerable number of insertions emerged from the multiple alignment.
Figure 5. Screenshot of the UGENE interface showing part of the results of the multiple alignment. The figure below shows a profile of the
conserved positions amongst the different species.
The consensus sequence obtained from this analysis was compared to our query using a simple dot plot.
The tool was set to report a match only after a minimum run of 5 coinciding residues. The result shows
important insertions and deletions, which may have arisen during the evolution of the studied sequence.
In fact, the consensus that gathers all these insertions contains 3 times more amino acids than the human
WNT3A. Be that as it may, these insertions might not interfere with the function of the proteins, since
otherwise they would not have been evolutionarily selected. In other words, these modifications might
have emerged in portions of the sequence that are not important for the function, as all the resulting
proteins are still functional and, in fact, share a common functional background with our WNT3A, since
the database that contains them was built after a pHMMER search.
Figure 6. Dot plot comparing the human WNT3A sequence (vertical) with the consensus sequence extracted
from the alignment (horizontal).
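The logic of a windowed dot plot can be sketched in a few lines. This is only an illustration under the assumption that a dot requires an exact run of identical residues; the dot-plot tool actually used may score matches differently. The function name and the toy sequences are hypothetical and are not WNT3A data.

```python
def dot_plot(seq_a, seq_b, window=5):
    """Return the set of (i, j) start positions (0-based) where `window`
    consecutive residues of seq_a and seq_b are identical."""
    # Index every window of seq_b so that each window of seq_a is a lookup.
    index = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in index.get(seq_a[i:i + window], []):
            dots.add((i, j))
    return dots

# Toy sequences: the second one carries a 4-residue insertion, which breaks
# the main diagonal and shifts its second half by 4 positions.
query = "ACDEFGHIKLMNPQRSTVWY"
consensus = "ACDEFGHIKL" + "GGGG" + "MNPQRSTVWY"
dots = dot_plot(query, consensus, window=5)
print(sorted(dots))
```

An insertion in one sequence appears as a jump of the diagonal, which is the kind of pattern described above for the consensus sequence.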
4.3 Phylogenetic analysis

For the tree analysis, the homologous sequence found in Porifera was set as the outgroup, given its low
homology and its phylogenetic distance. The best result was the one obtained with the ML approach. The
MrBayes tree, for instance, consisted of one big node that contained little more than a list of all the
organisms studied; only some small secondary branches turned up. The ML tree, however, showed a more
complex branching pattern in which we could distinguish the different groups of organisms. This latter
tree did in fact resemble the Ensembl reference model, supporting our results.
Although the human WNT3A was placed in a group containing other WNT3 proteins, these were also
found in a closely related group, and the common root of both contained nearly all the Chordata
organisms used for the analysis. Although this group was the one initially selected as containing the
ortholog sequences, we found, by consulting the Ensembl database, that the two previous branches also
contained sequences orthologous to human WNT3A. In fact, the first organism showing a WNT3A protein
belongs to the genus Ciona, which is found in the ML tree close to the aforementioned group. Once the
new branches were considered to contain WNT3A proteins, we concluded that this molecule was present
in the whole Chordata sample, and the sequences from the original database belonging to this group
were the ones used to build the HMM profile (see Figures 8 and 9).
4.4 HMM and logo profile
A new sub-database including only the Chordata organisms was created, aligned and used to build
the HMM profile. The resulting .hmm file was uploaded to Skylign, which produced the following logo.
Figure 7. Profile HMM logo from the ortholog group (Chordata) and a zoomed-in view of it. Several conserved amino acids can be seen
along the sequence, especially cysteine residues, which several studies describe as important for palmitoylation.
Figure 8. Phylogenetic tree for the WNT3A gene obtained from Ensembl database. The nodes in red represent a gene duplication, so the first individual showing WNT3A
protein would be Ciona sea squirts (highlighted in green)
Figure 9. Extract from maximum likelihood phylogenetic tree. Labeled with a red spot we can see the query protein, human WNT3A. In yellow, other homolog proteins which are
certainly WNT3A. In green, two species of Ciona, which were shown to be the first organisms presenting an ortholog for the human WNT3A.
As shown in Figure 7, some positions are clearly well conserved amongst all the Chordata species.
Most of these positions are occupied by a cysteine residue, and they are likely to be important for the
function of the protein. We will try to clarify this point in later analyses such as the protein docking,
with which we will try to find out which residues most likely take part in the binding to the receptors.
The logo built with all the homologous proteins (the 150-sequence database) shows a greater number
of conserved sites. However, the numeric description below the letters indicates that this construction
carries an important amount of random noise, as none of the positions is completely blank. This happens
because we are comparing many different sequences, some of which may not have the same function as
human WNT3A and might in fact be quite distant from it.
There are important differences between the logo obtained with all the sequences and the one obtained
with just the orthologs. Whereas the ortholog logo showed some positions with conserved cysteine
residues, these residues are no longer found in the noisy figure. Thus, the information given by the
latter is not relevant and we cannot infer anything of interest from it.
Figure 10. Logo built with the profile HMM from all the homologous sequences tested. There is an important amount of background
noise, and the conserved amino acids differ significantly from the ones obtained in the ortholog analysis.
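To give an idea of why a heterogeneous set of sequences flattens a logo, the sketch below computes the classical sequence-logo information content of each alignment column. It is a simplification and an assumption on our part: it works on raw residue counts with a uniform 20-letter background and no small-sample correction, so it may differ from the heights that Skylign derives from the profile HMM. The function name and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2

def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gaps:
    log2(alphabet_size) minus the Shannon entropy of the residue frequencies."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return log2(alphabet_size) - entropy

# Toy alignment (hypothetical sequences, not taken from the report's database).
alignment = ["CAVSL", "CAISM", "CGVTL", "CAVSL"]
info = [column_information(col) for col in zip(*alignment)]
for position, bits in enumerate(info, start=1):
    print(position, round(bits, 2))
```

A fully conserved column, such as the cysteine in the first position of the toy alignment, reaches the maximum of log2(20) bits, while every additional residue type seen at a position lowers its value.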
4.5 Paralog analysis

Fifty paralog sequences were found in the pHMMER mass alignment against the human protein
database. These sequences were aligned with the following results: although not many large gaps were
found, the residues differed far more and the regions of high homology were few.
Figure 11. Whole paralog alignment and its homology profile. As the picture shows, there are fewer conserved positions than in the
previous analysis.
The conserved positions were analyzed using the profile HMM logo. Although the results again show
an important amount of background noise, we can infer that the positions conserved in the whole human
WNT family are the ones that characterize this kind of protein, and that the positions that differ
determine the specific function of each molecule within the family.
Figure 12. General logo from the profile HMM of the paralog sequences. The non-conserved positions are likely to be important for the
specific function of each protein from the family.
4.6 Protein modelling

Starting from the protein FASTA sequences previously obtained from the NCBI, we modelled the 3D
structure of the molecule using the ExPASy SwissModel online software. This tool provides a potential
structure based on the lowest energy state, the feasibility of the domains when compared with the
structures already known in the Protein Data Bank, and other parameters that inform about the
reliability of the prediction. It also shows the variability that arises from using one protein or another
as a template, allowing the user to pick the one that makes most sense in each particular case. However,
in order to be accurate, this software cuts off the unclear domains, leaving blank gaps in the sections
that could not be predicted with high reliability. The docking assay that we intended to perform requires
the whole protein structure, with no gaps allowed. We therefore resorted to the predictions obtained with
the I-TASSER online modelling software, provided by the University of Michigan. Be that as it may, the
SwissModel modelling gives some interesting results that are worth considering. For instance, it shows
the regional quality of the estimation using the 4f0a template (Wnt8x). It also informs about the
feasibility of the model through a Z-score, which falls quite close to the cluster of already known
protein crystal structures in the reference database.
The I-TASSER bioinformatic tool is claimed to be one of the best in the protein structure prediction
field and provides a complete model of the whole sequence based on many different feasibility scales.
In the WNT3A prediction, the algorithm used as a template the Wnt8x-Frizzled-8 crystal complex
(Accession: 4f0a), which shows 81 % homology with the predicted protein and offers a quite acceptable
result. Moreover, since the Frizzled-5 and WNT3A docking is one of our objectives, this template seems
to be a perfect choice. Analogous procedures were carried out for the Frizzled-5 modelling. For this
protein, the most likely model was the one built on the human Smoothened 7TM receptor in complex with
an antitumor agent (Accession: 4jkv), a kind of protein closely related to the Frizzled receptors
according to the Ensembl phylogeny. The LRP6 structure did not need to be modelled, since its crystal
structure was already available in the RCSB Protein Data Bank. UCSF Chimera was used to visualize the
results shown in the following pictures.
Figure 17. Top: General overview of the SwissModel server results. The non-coloured sequence regions are the ones that were not
modelled in the biased structure. The pictures below show the feasibility of the predicted structure obtained using the Wnt8x structure
(left) and the goodness of the modelling throughout the sequence.
Figure 18. I-TASSER WNT3A model.
Figure 19. Frizzled-5 receptor after I-TASSER modelling.
Figure 20. LRP6 crystal structure. The model includes post-translational modifications.
4.7 WNT3A characterization results
The ConSurf online server offered a 3D representation of the HMM profile that could be opened with
Chimera. We strictly followed the instructions given by the website to produce a high-resolution Chimera
rendering in which the functionally conserved residues of WNT3A were depicted in purple tones,
while the more variable regions were shown in green.
After a first visual evaluation of the logo from the ortholog dataset HMM, it was clear that most of the
functionally important residues were cysteines. The problem with analysing cysteines in these
predictions is the lack of information about whether they take part in disulfide bonds. However, distance
analysis within the 3D structure can indicate how likely a pair of cysteine residues is to form a bridge.
As we mentioned in the Introduction, Cys77 (equivalent to Cys55 in Wnt8x) was described as
palmitoylated, but the recent crystallization of Wnt8x showed that this particular residue is in fact
involved in a disulfide bond, raising questions about what really happens in WNT3A. In our model, the
conserved Cys77 is found in the hydrophobic inner part of the protein, and it cannot even be seen when
the surface properties are displayed in the Chimera software. In addition, because the also
well-conserved Cys88 lies in its close proximity, we suggest that Cys77 might form a disulfide bond with
a structural rather than an interacting role, meaning that it might not be palmitoylated as was described
some years ago.
Figure 21. Top left: WNT3A logo showing the conserved Cys77 and Cys88. Top right: Potential microenvironment for the disulfide
bonding. Bottom left: surface-coloured model; the cysteines are coloured red but, even so, neither Cys77 nor Cys88 can be observed
because of their inaccessibility. Bottom right: Cys77 can be located within the ribbon structure.
It is true, however, that we used Wnt8x as a template to create the WNT3A model, so these residues
may not adopt this conformation in the real protein. On the other hand, the cysteines are functionally
well conserved throughout the family, which supports the idea that these residues are involved in the
disulfide bond.
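The distance criterion mentioned above can be illustrated with a minimal sketch that measures the separation between the sulfur (SG) atoms of two cysteines. The coordinates are hypothetical and are not taken from our WNT3A model, and the 3.0 angstrom cutoff is an assumed, deliberately lenient threshold for a predicted structure (a real S-S bond is roughly 2 angstroms long).

```python
from math import dist

def sulfur_distance(sg_a, sg_b):
    """Euclidean distance (in angstroms) between two cysteine SG atoms."""
    return dist(sg_a, sg_b)

def may_form_disulfide(sg_a, sg_b, cutoff=3.0):
    """Flag a cysteine pair whose SG-SG distance is within `cutoff` angstroms.

    The cutoff is an assumption chosen for illustration, not a value used
    in the report.
    """
    return sulfur_distance(sg_a, sg_b) <= cutoff

# Hypothetical SG coordinates in angstroms (NOT from the WNT3A model).
cys_a_sg = (10.0, 4.0, 7.5)
cys_b_sg = (11.2, 5.1, 8.6)
cys_far_sg = (20.0, 9.0, 1.0)

print(round(sulfur_distance(cys_a_sg, cys_b_sg), 2), may_form_disulfide(cys_a_sg, cys_b_sg))
print(round(sulfur_distance(cys_a_sg, cys_far_sg), 2), may_form_disulfide(cys_a_sg, cys_far_sg))
```

In practice the SG coordinates would be read from the ATOM records of the model file, and a short distance would only make a bridge plausible, not prove it, particularly in a template-based model.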
Another aspect that catches the eye in the HMM logo is that some of these conserved cysteines
appear in a recurrent pattern consisting of a pair of them separated by 8 to 12 amino acids.
What is more, the most frequent amino acids following the big C always appear in the same
order, CAVSL. This finding made us suspect a background inner pattern appearing time and again
along the sequence. To check this, we built a dot plot of the WNT3A sequence against itself
with a threshold of 10 conserved amino acids, finding quite a few tandemly repeated sequences.
These might represent evolutionary duplications of a common micro-domain motif.
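The spacing pattern described above can also be searched for directly in a sequence. The following is a minimal sketch, assuming that "separated by 8 to 12 amino acids" refers to the number of residues lying between the two cysteines; the function name and the toy sequence are hypothetical. Under that reading, Cys77 and Cys88 have 10 residues between them and would be reported as a pair.

```python
def cysteine_pairs(sequence, min_gap=8, max_gap=12):
    """Return 1-based (i, j) positions of cysteine pairs that have between
    min_gap and max_gap residues lying between them."""
    cys = [i for i, aa in enumerate(sequence, start=1) if aa == "C"]
    return [(i, j) for i in cys for j in cys
            if min_gap <= j - i - 1 <= max_gap]

# Toy sequence (not WNT3A): cysteines at positions 2, 12, 17 and 18.
toy = "ACAVSLGGGGGCAVSLCC"
print(cysteine_pairs(toy))
```

Only the cysteines at positions 2 and 12 of the toy sequence are reported, since every other pair is either closer or further apart than the chosen range.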
The graphic representation offered by the ConSurf server (Figure 23) shows a conserved pocket
surrounded by two also conserved arms (A and B) that might be a perfect place for protein-protein
interaction. The great majority of the remaining cysteines did appear on the surface and, in fact,
Ser209, which has been reported to be palmitoylated, is located at the tip of Arm B, a perfect place
for protein recognition. The inner pocket contains a functionally conserved Phe224, probably involved in
the interaction. Arm A is quite probably involved in the protein function, since most of its sequence is
coloured purple and it contains up to 60 % of the most important residues, such as tryptophans and
cysteines.
Figure 22. Dot plot of WNT3A against itself.
Figure 23. 3D printed HMM, with Arm A, Arm B and the inner pocket labelled. Purple residues represent functionally
conserved atoms, while green residues imply non-conserved ones.
Figure 24. Below, a close look at the inner pocket is shown. On the left, Arm A is coloured red, showing Trp333, Trp336, Cys334 and Cys335, all conserved according to
both the WNT3A logo and the 3D HMM printing (purple atoms). On the right, Arm B, crowned by the palmitoylated Ser209. In the middle, Phe224, located
inside the described pocket. The top left picture combines the 3D HMM printing with the electrostatic potential model obtained from the PDB2PQR server, confirming a
strongly positive electrostatic potential in the interaction pocket. This idea is supported by the top right picture, which shows a hydrophobic patch there.
Arm B contains Ser209, identified as a palmitoylated residue, which turned out to be quite important
for the protein function given its conserved status. The arm also shows a considerable number of purple
residues and contains an also conserved sequence (Cys201 to Cys211) that might be involved in
recognition as well. We expect to find one of the proteins interacting with both arms and probably with
the inner pocket containing Phe224.
Figure 25. Exposed cysteine distribution in WNT3A (red labelled).
In addition, residues Arg85, Trp86, Asn87 and Cys88 appeared in both the logo and the 3D
conservation model as potential sites for protein-protein interaction. Supporting this idea, an
important negatively charged area was found in the proximity of these residues, just behind Arm A.
The back of Arm B shows similar properties. These findings suggest a second potential place for
interaction.
Figure 26. Left: Side view of the docking point 2. RWNC sequence labelled in orange. Right: Protein view from below. Red and blue
electrostatic pockets indicate electrically charged areas. Arm A and Arm B are labelled in the pictures.
4.8 Antigenicity analysis
WNT3A was predicted to have a discontinuous B-cell epitope with a predictability of 93.5 %. It turned
out to correspond to the area of Arm A. The sequence included residues R322, R324, C327, R328, C329,
V330, F331, H332, W333, C334, C335, Y336, V337, S338, C339, Q340, E341, C342 and T343. Arm B was in
turn found to have important properties as a linear B-cell epitope, with 90 % confidence, including the
already mentioned sequence Cys201-Cys211.
Figure 27. Left: Discontinuous B-cell epitope in Arm A. Right: Linear B-cell epitope in Arm B.
4.9 Protein docking analysis

After the ClusPro docking, we picked the model that was consistently obtained through both possible
paths (WNT3A-Frizzled-5 docked with LRP6, and WNT3A-LRP6 docked with Frizzled-5) and that showed the
best score based on hydrophobic interactions. We chose this approach for two main reasons: first, as
mentioned in the Introduction, WNT3A forms the activating complex over the cell membrane, with LRP6
and Frizzled-5 clustered within the membrane bilayer; and second, this model provided a trimeric
structure that is coherent with the fact that the 3 proteins reportedly end up interacting for the
phosphorylation of LRP6, as recent studies show, each of the 3 proteins being accessible to the other 2
at some point of its structure. What we found is partially what we expected: both WNT3A arms surround
the LRP6 protein, which lies in the proximity of the electrostatic pocket. The back of the WNT3A
protein did show purple balls in the 3D HMM model, and we were indeed able to find a secondary docking
place there by analyzing certain electrostatic patches. All this fits with the docking model,
suggesting that Frizzled-5 interacts with this particular area of the protein. What is more, we found
an interaction area between LRP6 and Frizzled-5, which has in fact been described in some papers. For
all these reasons, we believe that this model is likely to be similar to what happens during the real
interaction.
Figure 28. Frizzled-5-WNT3A-LRP6 complex obtained by ClusPro docking and visualized with Chimera. Frizzled-5 is shown in bluish
colours; WNT3A, in the middle, is shown in orange (Arm B) and red (Arm A); the LRP6 transmembrane protein appears in green
on the right.
4.10 Mutations and pathological implications
Different experiments have been performed in mice, showing how Wnt3a is related to neuronal
progenitor differentiation or how it is involved in some cancers such as synovial sarcoma. In addition
to those experimental clues, up to 21 copy number variations, 19 of them likely pathogenic, have been
reported that include WNT3A within the repeated fragment, showing a potentially malignant behavior.
The tool BioMuta (https://hive.biochemistry.gwu.edu/biomuta) informs about the described human SNPs
and their role as possible risk factors. For instance, 29 lung cancer patients showed SNPs in the WNT3A
gene, which might indicate that these mutations predispose the individual to develop this kind of
cancer. This is compatible with the usually high expression levels of this protein in the pulmonary
tract, and hence shows its relevance in this tissue. Of the 170 SNPs collected, around 30 % were
classified as potentially malignant. However, when we compared their positions with the residues that
we had found to be most conserved, and thus most important in functional terms, we did not find any
coincidence. In other words, the important residues do not tolerate SNPs, perhaps because, given their
relevance, any mutation leads to lethal results. It is nevertheless quite clear that the protein arms
and some of the other regions mentioned gather most of the SNPs, although not at the important
residues. For instance, potentially malignant SNPs were found all around Cys77 and Cys88, but not at
those positions. A mild interaction leading to a slight modification of the environment may be the
cause of the pathology in these reported cancers.
Figure 29. Available in: https://www.ebi.ac.uk/gxa/genes/
Figure 30. Count of cancers with reported WNT3A SNPs. Frequency of SNP per aminoacid.
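The comparison between SNP positions and conserved residues can be expressed as a simple classification. The sketch below is only an illustration of that comparison, not the procedure we followed: the function name and the 3-residue window are assumptions, and the SNP positions are hypothetical rather than taken from BioMuta. The conserved positions are residues named in the text.

```python
def classify_snps(snp_positions, conserved_positions, window=3):
    """Split SNP positions into those falling on a conserved residue, those
    within `window` residues of one, and the rest."""
    conserved = set(conserved_positions)
    on, near, other = [], [], []
    for pos in sorted(set(snp_positions)):
        if pos in conserved:
            on.append(pos)
        elif any(abs(pos - c) <= window for c in conserved):
            near.append(pos)
        else:
            other.append(pos)
    return on, near, other

# Conserved residues named in the text (Cys77, Cys88, Ser209, Phe224).
conserved = [77, 88, 209, 224]
# Hypothetical SNP positions, for illustration only.
snps = [75, 79, 90, 150, 210]
on, near, other = classify_snps(snps, conserved)
print("on conserved:", on)
print("near conserved:", near)
print("elsewhere:", other)
```

An empty first list together with a populated second list would correspond to the situation described above, with SNPs surrounding the conserved residues without hitting them.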
5. Conclusions
WNT3A and its ortholog proteins were found to be widely distributed along the Chordata clade.
Both the HMM and the protein 3D model were compatible and showed coherent results concerning the
function of the molecule. We analyzed the possibility of Cys77 being palmitoylated, concluding that the
more likely hypothesis is its participation in a disulfide bond with Cys88. We satisfactorily evaluated
the interaction of the secreted glycoprotein with its receptors, concluding that both of its arms are
responsible for the interaction with LRP6, as well as certain residues of an electrically charged
pocket. The so-called Arm B and part of the back of WNT3A are responsible for the Frizzled-5
interaction. In addition, LRP6 and Frizzled-5 may share an interaction space which might be important
for the signal transduction pathway.
6. References
Angers, S., & Moon, R. (2009). Proximal events in Wnt signal transduction. Nat Rev Mol Cell Biol, 10(7), 468-
77.
Baxevanis, A. D., & Ouellette, B. F. (2001). Bioinformatics: A practical Guide to the Analysis of Genes and
Proteins. New York: Wiley.
Clevers, H., & Nusse, R. (2012). Wnt/-catenin signaling and disease. Cell, 149(6), 1192-205.
DasGupta, R., Kaykas, A., Moon, R. T., & Perrimon, N. (2005). Functional genomic analysis of the Wnt-
Wingless signaling pathway. Science, 308, 826-33.
Gao, X., & Hannoush, R. N. (2013). Single-cell imaging of Wnt palmitoylation by the acyltransferase porcupine.
Nature Chemical Biology, 10.
Komiya, Y., & Habas, R. (2008). Wnt Signl transduction pathways. Organogenesis, 4(2), 68-75.
Lewis, C. (2017). Definition of Homolog, Ortholog and Paralog. Retrieved from
http://homepage.usask.ca/~ctl271/857/def_homolog.shtml
Logan, C. Y., & Nusse, R. (2004). The Wnt signaling pathway in development and disease. Annual Review of
Cell and Developmentalm Biology, 20, 781-810.
Maleki, M., Vasudev, G., & Rueda, L. (2013). The role of electrostatic energy in prediction of obligate protein-
protein interactions. Proteome Science, 11(Suppl 1).
McDonald, B., Tamai, K., & He, X. (2009). Wnt/-catenin signaling: components, mechanisms, and disease. Dev
Cell, 17(1), 9-26.
NCBI. (2017, May 25). WNT3A Wnt family member 3A [ Homo sapiens (human) ]. Retrieved from
https://www.ncbi.nlm.nih.gov/gene/89780
Nusse, R., & Varmus, H. E. (1992). Wnt genes. Cell, 69(7), 1073-87.
Pevsner, J. (2009). Bioinformatics and functional genomics. Baltimore: Wiley.
Reactome Database. (2014). WNT3A stimulates the caveolin-dependent internalization of FZD5:p-LRP6.
Retrieved from http://reactome.org/content/detail/R-HSA-5368596
Search, N. O. (2017, May 22). WNT3A Orthologs . Retrieved from
https://www.ncbi.nlm.nih.gov/gene/?Term=ortholog_gene_89780[group]
Wheeler, T. J., & Clements, J. F. (2014). Skylign: a tool for creating informative, interactive logos representing
sequence alignments and profile hidden Markov models. BMC Bioinformatics, 15(7).
Zeng, X., Huang, H., Tamai, K., Zhang, X., Harada, Y., & Yokota, C. (2008). Initiation of Wnt signaling: control
of Wnt coreceptor Lrp6 phosphorylation/activation via frizzled, dishevelled and axin functions.
Development, 367-75.