UCL-CS Bioinformatics: PSIPRED Help http://bioinf.cs.ucl.ac.uk/index.php?id=5399

UCL Department Of Computer Science

PSIPRED HELP & TUTORIALS
Introduction
The PSIPRED Protein Structure Analysis Workbench aggregates several UCL structure prediction methods into one location, allowing users to run a number
of analyses simultaneously. This document briefly describes the services and how to use them, and summarises the results that each
analysis produces.
This guide is divided into three main sections. The first two sections explain the Input Form and the Results pages; the last section redirects to our
Tutorials page, where a few cases are examined in more detail. You can view the input form at the main web page for the PSIPRED Server. A fully
interactive mock version of a typical results page is also available.
CONTENTS
INPUT
Input Form
Choose Method
Sequence Input
Email Address
Password
Identifier
Filtering Options
DOMPRED Options
DISOPRED Options
BioSerf Options
RESULTS
Summary Page
Sequence Map
Sequence Resubmission
GenTHREADER Summary
BioSerf Output
DISOPRED Output
DOMPRED Output
FFPred Output
GenTHREADER Outputs
MEMSATSVM Output
MEMPACK Output
PSIPRED Output
Downloads
TUTORIALS

PSIPRED INPUT
The input form allows users to select the analyses they wish to perform and input their
query sequence. There are a number of mandatory fields.

Choose Method
You must choose at least one method to run. If no method is chosen, the PSIPRED secondary
structure prediction will run by default.

Input Sequence
Type your AMINO ACID sequence here. Please do not try to enter a nucleic acid
sequence. We recommend that you enter your sequence as a plain single-letter string
like this:

ALGSNLNTPVEQLHAALKAISQLSNTHLVTTSSFYKSKPLGPQDQPDYVNAVAKIETEL

Alternatively, you can enter your sequence in FASTA format, but the description text will
be ignored by the server.

Note that there is an upper limit to the length of sequences which can be submitted. For
mGenTHREADER that limit is 1000 residues. For the other methods, the limit is 1500
residues. If your sequence is longer than this, try breaking it into likely domains before submitting it. Our DomPred server can help you in doing this.
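
The input rules above can be checked before submitting. The following is a minimal sketch, not the server's own validation: the function names, the 20-letter residue alphabet and the nucleic acid heuristic are assumptions made for illustration, and only the two length limits come from the text above. It handles a plain string or a single FASTA record, not an MSA.

```python
# Hypothetical pre-submission check. Only the length limits are taken from
# the help text; the alphabet and the nucleotide heuristic are assumptions.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids
MAX_LENGTH = {"mGenTHREADER": 1000}           # limit stated for mGenTHREADER
DEFAULT_MAX_LENGTH = 1500                     # limit stated for other methods


def parse_sequence(text):
    """Return the bare sequence from plain or single-record FASTA input."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # FASTA description lines start with '>' and are dropped here,
    # mirroring the fact that the server ignores the description text.
    return "".join(l for l in lines if l and not l.startswith(">")).upper()


def check_sequence(text, method="PSIPRED"):
    """Return a list of problems; an empty list means the input looks fine."""
    seq = parse_sequence(text)
    problems = []
    if not seq:
        problems.append("empty sequence")
    unknown = sorted(set(seq) - VALID_RESIDUES)
    if unknown:
        problems.append("unexpected characters: " + "".join(unknown))
    # Crude heuristic: a sequence made only of these letters is probably DNA/RNA.
    if seq and set(seq) <= set("ACGTUN"):
        problems.append("looks like a nucleic acid sequence")
    limit = MAX_LENGTH.get(method, DEFAULT_MAX_LENGTH)
    if len(seq) > limit:
        problems.append(
            f"{len(seq)} residues exceeds the {limit}-residue limit for {method}"
        )
    return problems
```

For example, `check_sequence("L" * 1001, "mGenTHREADER")` reports a length problem, whereas the same input passes for the other methods.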

You can also input a Multiple Sequence Alignment (MSA) in FASTA format. Please be aware that not every method will run with MSA input.

Submission Details
Email Address

Enter your e-mail address here. Results will be returned as soon as they are available, usually within 40 minutes, though sometimes longer depending on
the server load. Bear in mind that if you enter an incorrect e-mail address, or do not provide an e-mail address, there is no way the server can contact
you! Also check that your anti-spam software isn't rejecting the messages from our server. You are not required to enter your e-mail address, but we
recommend that you provide one.

Password

This field should be ignored if you are accessing the server from an academic site (i.e. a University). If you are a commercial user who has a current
license to use the PSIPRED server then you should enter your password here. Please contact us if your password does not work for some reason. Note
that if your e-mail address is commercial - e.g. ends in .com or .co.uk - then you must enter your PSIPRED password in order to use the server. This applies
even if you are an academic user who is using a private e-mail account. PSIPRED passwords are only granted to licensed users or commercial
collaborators.

Short Identifier

Use this field to assign a short memorable name to your prediction job. This is useful so that you can identify particular jobs in your mailbox. This is
particularly important because PSIPRED will not necessarily return your results in the order you submitted them! Generally speaking, shorter jobs will be
returned first. The name you specify will be included in the subject line of the e-mail messages sent to you from the server. For example, here is a
possible message header for a job called "MySeq":

From: psipred@cs.ucl.ac.uk

Date: Fri, 14 Jan 2002 14:55:39 GM

To: Some.User@somesite.somewhere.edu

Subject: PSIPRED Sequence analysis results for job ID:dfec480c-01fc-11e4-883f-00163e110593/MySeq
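
If you sort result e-mails automatically, the job ID and short identifier can be pulled out of the subject line. This is a rough sketch that assumes every subject follows the pattern of the single example above ("job ID:" followed by the server's ID, a slash, and your identifier); the function name and the regular expression are illustrative and not part of the server.

```python
import re

# Assumed format, inferred from the one example header shown above:
#   "... job ID:<server-assigned ID>/<short identifier>"
SUBJECT_PATTERN = re.compile(r"job ID:\s*([0-9A-Fa-f-]+)/(.+)$")


def parse_subject(subject):
    """Return (job_id, short_identifier), or None if the pattern is absent."""
    match = SUBJECT_PATTERN.search(subject.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Applied to the example header, this returns the pair `("dfec480c-01fc-11e4-883f-00163e110593", "MySeq")`.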

Filtering Options

Once you have filled in the main form you can switch tabs to select any filtering options.
To reduce the false positive rate of fold recognition methods, particularly when applied
to long sequences, it is important that biased regions of the target sequence are filtered
out before the prediction is carried out. The PSIPRED server uses the PFILT program to
perform the masking and has 3 filtering options, which will filter out low complexity
regions, likely transmembrane segments and coiled-coil regions. The default setting is
for just low-complexity regions of the sequence to be masked out. Regions which are
masked out will be replaced with 'X' (unknown) residues.
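
To illustrate what masking does to a sequence, here is a small sketch of the final replacement step only. It does not reproduce how PFILT detects low-complexity, transmembrane or coiled-coil regions; the region coordinates are assumed to be supplied from elsewhere, and the function name is hypothetical.

```python
def mask_regions(sequence, regions):
    """Replace residues in each 1-based inclusive (start, end) region with 'X'.

    Illustration only: the regions are assumed inputs, not PFILT output.
    """
    residues = list(sequence)
    for start, end in regions:
        # Clamp to the sequence so out-of-range coordinates are ignored.
        for i in range(max(start, 1) - 1, min(end, len(residues))):
            residues[i] = "X"
    return "".join(residues)
```

Masking residues 3 to 5 of `ACDEFGHIKL` gives `ACXXXGHIKL`; the masked sequence keeps its original length, so residue numbering in the results is unchanged.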

Obviously, if you filter out transmembrane helices and then try to use MEMSAT3 to predict the transmembrane topology, you will not get sensible results.
For GenTHREADER and mGenTHREADER we recommend turning on all filtering if you are expecting matches to globular proteins.

DOMPRED Options
If you have selected a DOMPRED job then the DOMPRED tab will appear in the input form.
DOMPRED runs 2 independent protein structural domain prediction algorithms, DOMPRED
and DomSSEA. This tab allows you to control options for both methods.

PSI-BLAST sequence alignment domain prediction


The PSI-BLAST sequence alignment domain prediction searches the query sequence
against a large database of sequences (nrdb90), including sequences from Pfam-A.

Pfam-A search
Domain sequences from Pfam-A are searched against the query sequence, and if
significant sequence matches are found (as defined by the chosen E-value cut-off), this is
indicated on the DomPred results page. A separate table displaying such hits is accessible from the results page.

Query vs sequence database


In cases where no clear homology exists to known domain sequences, such as Pfam-A domain sequences, a different strategy is required. Here, the query
sequence is searched against a non-redundant sequence database (nrdb90), using the parameters specified in the input form to identify
significantly matching sequence homologues. These matching sequences are then used to identify possible domain boundaries within the query sequence
(and therefore to predict whether it is single- or multi-domain).
The domain boundary prediction procedure identifies the residue positions in the query sequence to which the N and C termini of matching database hits
are aligned. The positions of the N and C termini from all the PSI-BLAST database matches are simply summed along the query
sequence. Cases where both N and C termini hits are found in similar regions along the query sequence are given a higher weighting.
The summed profile is then smoothed using a window of 15 residues, and Z-scores are calculated over this profile. Significant peaks (Z-score > 1.5) over the
mean termini value of the query are assigned as putative domain boundaries. Termini hits to the first and last 50 residues of the alignment profile are not
considered, as these regions often contain a large number of alignment termini that correspond to the true termini of the query sequence.
The alignment profile generated by the PSI-BLAST alignments (and drawn by gnuplot) is shown at the top of the results page. Putative domain boundaries
are indicated by peaks in the plot. Peaks considered to be significant by the algorithm are indicated.
In cases where significant peaks are found, and the query sequence is predicted to be multi-domain, multi-domain predictions given by DomSSEA are
given higher significance.
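
The aligned-termini procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the DomPred implementation: hits are given as 1-based (start, end) positions on the query, the smoothing is a simple centred moving average, a "peak" is taken to be the midpoint of each contiguous stretch whose Z-score exceeds the cut-off, and the extra weighting for co-located N and C termini is omitted because the text does not specify it. All function names are hypothetical.

```python
from statistics import mean, pstdev


def termini_profile(query_length, hits, edge=50):
    """Count aligned hit termini at each query position."""
    profile = [0.0] * query_length
    for start, end in hits:
        for pos in (start, end):
            # Termini in the first or last `edge` residues are not considered.
            if edge < pos <= query_length - edge:
                profile[pos - 1] += 1.0
    return profile


def smooth(profile, window=15):
    """Centred moving average; the window is truncated at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        chunk = profile[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def putative_boundaries(query_length, hits, z_cutoff=1.5):
    """Return 1-based midpoints of stretches whose Z-score exceeds the cut-off."""
    smoothed = smooth(termini_profile(query_length, hits))
    mu, sigma = mean(smoothed), pstdev(smoothed)
    if sigma == 0:
        return []  # flat profile: no evidence for any boundary
    above = [(value - mu) / sigma > z_cutoff for value in smoothed]
    boundaries, run_start = [], None
    # A trailing False closes any stretch that reaches the end of the profile.
    for i, flag in enumerate(above + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            boundaries.append((run_start + i - 1) // 2 + 1)
            run_start = None
    return boundaries
```

With ten hits covering residues 1 to 150 and ten covering 151 to 300 of a 300-residue query, the termini at positions 1 and 300 are discarded and the remaining termini produce a single putative boundary at residue 150.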

Input E-value cut-off (default 0.01)


Optimisation of the PSI-BLAST sequence alignment domain prediction showed an E-value cut-off of 0.01 to give the best trade-off between the sensitivity
and selectivity of domain boundary prediction. Decreasing the E-value cut-off (i.e. reducing the number of 'significant' aligned sequences) was found to
reduce the sensitivity but increase the selectivity of domain boundary prediction.

Input number of PSI-BLAST iterations (default 5)


The default number of PSI-BLAST iterations used is 5. Decreasing the iteration number may increase the speed of the PSI-BLAST search, but may also
result in the failure to identify more distant homologues. The user should be aware that the higher the iteration value, the higher the risk of introducing
profile wander into the PSI-BLAST sequence search.

DomSSEA Prediction
This is always turned on.

DOMPRED PSIPRED options


You can also select whether the DOMPRED analysis performs a PSIPRED secondary structure prediction and displays those results.

DISOPRED Options
The DISOPRED options allow the user to control the underlying sensitivity by setting the False Positive Rate, and to choose whether a PSIPRED secondary
structure prediction should be included.

Additionally, users can choose whether the underlying PSI-BLAST output is made available
for download.

BioSerf Options
BioSerf is a fully automated homology modelling pipeline which uses MODELLER to
construct a final homology model. Because of the licence terms, if you select a BioSerf
job you are required to provide the MODELLER key available from the Sali Lab.

RESULTS
The PSIPRED server produces a large number of different results pages. Here we briefly describe these outputs. At any point you can follow this link to try
the static example results and explore the functionality of the results pages.

Sequence Summary Page


The results summary page is the main output page for PSIPRED server sequence
results. This gives a brief summary of the results returned, annotated on the
sequence you submitted to the server. At the top of the page the Job ID details are
listed, including the short identifier you provided for the job and the unique private ID
assigned by our server. Below this, a series of tabs allows you to view the specific
outputs for each analysis that was run. The Summary page is then divided into 3
sections.

Secondary Structure Map/TM Helix Map


The first region lays out the query sequence and annotates the residues as per the key.
If you have run a PSIPRED job, residues will be annotated with the predicted
secondary structure. If you have run a MEMSAT, MEMSATSVM or MEMPACK job, residues
will be annotated with the location of predicted TM helices. If you have run both types
of analysis you can toggle between these annotations with the appropriate buttons. Also
note that if a DISOPRED or DOMPRED job has been run then predicted disordered
residues and any putative domain boundaries will be marked. Please note that all
domain boundaries will be annotated; this does not imply that they are all
simultaneously applicable.

Sequence Resubmission

This section of the summary page allows you to resubmit your sequence or a
subsequence of it for further analysis. First use the slider to select the sequence
region you wish to resubmit (or input the linear coordinates in the Start and Stop
boxes). Next click the 'Select Methods' button. This will bring up a panel that
allows you to select new analysis methods for your sequence or subsequence. Finally click the new "Resubmit" button to submit a new job to the server.
One obvious use would be to resubmit domain subsequences after running a DOMPRED job.

GenTHREADER, pDomTHREADER or pGenTHREADER Summary


The final, lower section of the Summary Page presents a simple alignment cartoon of any GenTHREADER hits found, if you also ran a
GenTHREADER, pDomTHREADER or pGenTHREADER analysis. Each hit is laid out according to the region of your query sequence that it matched, with the
left-hand side of the cartoon being the first residue and the right-hand border being the final residue. Each row represents one structural hit calculated by one of
the GenTHREADER methods. The PDB chain ID or CATH domain ID appears at the left. Each bar is coloured as per the GenTHREADER confidence regions.
If you mouse over any of the hits, a further summary of the alignment is given. On the right-hand side of each row you can choose to have a simple
homology model built for that structural alignment with your query sequence. This only works if you provide a valid MODELLER key.

BioSerf Output
If you provided a valid MODELLER key you will have been able to run a BioSerf job. BioSerf is a fully automated homology
modelling service which integrates PSI-BLAST, HHBlits, PSIPRED, GenTHREADER and MODELLER. The final output is a
PDB file which can be viewed by clicking the BioSerf tab on the results page. The file is viewed using the Jmol plugin, which
requires that your web browser has Java installed and enabled. All standard Jmol commands can be used to explore the
structure.

DISOPRED Output
If you asked for disordered region predictions, the DISOPRED tab will be available with the disorder profile plot. The graph
shows the DISOPRED3 disorder confidence levels against the sequence positions as a solid blue line. The grey dashed
horizontal line marks the threshold above which amino acids are regarded as disordered. For disordered residues, the
orange line shows the confidence of disordered residues being involved in protein-protein interactions. The Summary Tab
annotates this information on the query sequence.
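
If you download the per-residue scores, the disordered regions shown in the plot can be recovered by collecting the stretches that lie above the threshold line. The sketch below assumes the scores are already available as a list of numbers, one per residue; the threshold is left as a required argument because its value is not stated here, and the function name is hypothetical.

```python
def disordered_segments(scores, threshold):
    """Return 1-based inclusive (start, end) runs where score > threshold."""
    segments, start = [], None
    # A trailing -inf sentinel closes a run that reaches the last residue.
    for i, score in enumerate(list(scores) + [float("-inf")], start=1):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    return segments
```

For instance, scores of `[0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.6]` with an illustrative threshold of 0.5 give two segments, residues 1 to 2 and 5 to 7.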

DOMPRED Output
Clicking the DOMPRED tab brings up the DOMPRED output. This output is divided into 2
sections: the DOMPRED output and the DomSSEA output. The DOMPRED output shows
the graph produced by the PSI-BLAST aligned termini algorithm. The graph annotates
secondary structure regions; peaks in the aligned termini profile indicate regions that
may form a structural domain boundary. The putative domain boundaries are listed in
the summary statistics immediately below the graph.

Below the PSI-BLAST summary is the DomSSEA table. In this method SCOP structural
domains are matched to the query sequence. Where more than one domain matches
sequentially along the query sequence, it may be possible to predict a domain
boundary.

All the possible domain boundaries are annotated on the query sequence available via the Summary Tab.

FFPred Output
The FFPred tab gives a summary of the FFPred output. FFPred attempts to predict GO terms for eukaryotic proteins using
a series of Support Vector Machines (SVMs). The top of the page gives three tables which summarise these predictions,
one table for each Gene Ontology domain (Biological Process, Molecular Function, Cellular Component). The tables
provide the scoring for each GO term, equal to the posterior probability for the query protein to be annotated with that
GO term. Also, note that predictions obtained using less reliable SVMs are shown at the bottom of each table over a red
background. SVMs are regarded as reliable when their MCC, sensitivity, specificity and precision are jointly above a given
threshold.

Below the tables are summaries of the features that were calculated for the incoming query sequence, and were used by
the SVMs to obtain the predictions.

GenTHREADER Outputs
The GenTHREADER, pDomTHREADER and pGenTHREADER tabs all link to tables of the output statistics for each
GenTHREADER job. Each table shows the structural hits for the query sequence. These are full PDB chains for
GenTHREADER and pGenTHREADER and CATH domains for pDomTHREADER. For each structure the first portion of the
table gives summary statistics:

Conf.: The hit confidence category based on p-value; GUESS (<1), LOW (<=0.1), MEDIUM (<=0.01), HIGH (<=0.001), CERT (<=0.0001)
Net Score: The GenTHREADER raw score
P-Value : The p-value
Pair E: The Pairwise Energy
Solv E: The solvation Energy
Aln Score: The Pairwise alignment score
Aln Len: The length of the alignment
Str Len: The length of the structural hit
Seq Len: The length of the query sequence
Domain Start: The start of the domain (pDomTHREADER only)
Domain End: The end of the domain (pDomTHREADER only)
Domain Code: The CATH code for the domain hit (pDomTHREADER only)
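
The confidence categories in the Conf. column follow directly from the p-value cut-offs listed above. As a small sketch (the function name is hypothetical, and the cut-offs are copied from the list rather than from the GenTHREADER source), the mapping can be written as:

```python
def confidence_category(p_value):
    """Map a GenTHREADER p-value to the category shown in the Conf. column."""
    # Cut-offs as listed in the table description, strictest first.
    cutoffs = (("CERT", 0.0001), ("HIGH", 0.001), ("MEDIUM", 0.01), ("LOW", 0.1))
    for label, cutoff in cutoffs:
        if p_value <= cutoff:
            return label
    return "GUESS"
```

Checking the strictest cut-off first means each hit receives the highest category it qualifies for.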
The latter portion of the table links out to other resources and has the following columns:

View Alignment: A button that opens JalView to view an annotated alignment. Known ligand binding residues are annotated on the hit
SCOP Codes: A link that searches SCOP for the PDB chain (GenTHREADER and pGenTHREADER only)
CATH Codes: A link that searches CATH for the PDB chain (GenTHREADER and pGenTHREADER only)
Structure: A thumbnail image of the hit, clicking the link will take you to PDBSum
CATH Entry: A link that searches CATH web services to summarise the hit.

MEMSAT-SVM Output
In the MEMSATSVM tab there are several diagrams and reports which summarise the
MEMSAT-SVM output. Importantly, MEMSAT-SVM jobs also run MEMSAT3, which allows
you to compare the predictions of both methods. The first diagram shows a cartoon of
the MEMSATSVM and MEMSAT3 TM helix predictions. MEMSATSVM predictions now
include a prediction of pore-lining helices. The key for the schematic can be found at
the bottom of the diagram. Below the schematic are the traces for the assorted SVM
outputs that the MEMSATSVM prediction was based on. Further down the page are a
series of cartoon diagrams of the membrane topology annotated with the predicted
helix coordinates. Finally at the bottom of the page are the output reports from both the MEMSAT3 and MEMSATSVM
methods.

MEMPACK Output
If you select a MEMPACK job, the MEMPACK tab will take you to the diagram of transmembrane helix packing which
MEMPACK outputs. Running a MEMPACK job will also run a MEMSATSVM job. The MEMPACK output shows a top-down
diagram of the possible packing of the predicted transmembrane helices. Possible residue contacts are predicted
between helices; the helices are then arranged and oriented to maximise the number of helix contacts that face one
another.

PSIPRED Output
The last analysis page gives the PSIPRED diagrammatic output. These diagrams annotate the query sequence with
secondary structure cartoons and a confidence value at each position in the sequence. The confidence is given as a series
of blue bar graphs.

Downloads
The final tab offers any plain text and ancillary downloads for each of the methods you have chosen. These are broken
up in sections as per each analysis method.

TUTORIALS
Finally, you can find examples of use of the PSIPRED server at our Tutorials page.
