PENCE Proteome Analyst

Proteome Analyst Specialized Subcellular Localization Server
Output Description

Understanding Analysis Results

When your analysis is finished you will see a page like this:

Follow the link provided. It will lead you to the Card Set Index of your results.

When Proteome Analyst performs an analysis, it creates a PA Card Set with one card for each protein you submitted. Each card is a lot like a baseball card: it contains the "stats" on its protein, including BLAST results and a Subcellular Prediction.

This is an example Card Set Index:

Each of the headings of the table corresponds to a field by which the Card Set can be sorted. By default the Card Set is sorted by card number, which is the order of the proteins as they appeared in the FastA file you submitted.

You can click any of the headings of the table to sort your Card Set by that field.

For example, let's sort the sample Card Set by Subcellular Prediction class:

Now the Card Set is sorted alphabetically by the class name assigned by the Subcellular Predictor.
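The sorting behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the server's code: the `Card` class, its field names, and the sample values are invented here, and breaking ties by card number is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Card:
    number: int         # position of the protein in the submitted FastA file
    prediction: str     # class name assigned by the Subcellular Predictor
    probability: float  # probability attached to that prediction


def sort_cards(cards, field="number"):
    """Return a new list sorted by `field`.

    Ties are broken by card number (an assumption made for this sketch).
    """
    return sorted(cards, key=lambda c: (getattr(c, field), c.number))


# Invented sample data, not taken from the example Card Set.
cards = [
    Card(1, "nucleus", 0.91),
    Card(2, "cytoplasm", 0.88),
    Card(3, "mitochondrion", 0.75),
]

by_number = sort_cards(cards)               # default: card-number order
by_class = sort_cards(cards, "prediction")  # alphabetical by class name
print([c.number for c in by_class])
```

Sorting by `"prediction"` puts the card whose class name comes first alphabetically at the top, which is how a card other than #1 can become the first card in the set.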

Let's look at Card #2 which is now the first card in the set:

Displayed are:
  • The Definition line (i.e. the line beginning with a ">") of the sequence as it appeared in the FastA file.
  • The first few letters of the protein sequence and a link to the actual FastA entry for this protein.
  • The BLAST results for this protein, including a brief summary of the top 3 BLAST hits.
  • The Prediction that the Subcellular Predictor made for this protein, including:
    • The name of the class that the Subcellular Predictor predicted for this protein.
    • The probability of this prediction.
    • A link to the Explain page for the prediction (discussed later).
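To make the first two items concrete, here is a minimal sketch of how a Definition line and the start of a sequence can be read from FastA text. The function name `read_fasta` and the two sample entries are hypothetical; this is not how the server itself parses submissions.

```python
def read_fasta(text):
    """Yield (definition, sequence) pairs from FastA-formatted text."""
    definition, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new Definition line closes the previous entry, if any.
            if definition is not None:
                yield definition, "".join(chunks)
            definition, chunks = line[1:].strip(), []
        elif definition is not None:
            chunks.append(line)
    if definition is not None:
        yield definition, "".join(chunks)


# Hypothetical input; the sequences are made up for illustration.
sample = """>example_protein_1 hypothetical entry
MKTAYIAKQR
QISFVKSHFS
>example_protein_2 another hypothetical entry
MSDNELQ
"""

records = list(read_fasta(sample))
for number, (definition, sequence) in enumerate(records, start=1):
    # Card number = order of appearance; show only the first few letters.
    print(number, definition, sequence[:10])
```

The card number in this sketch is simply the position of the entry in the file, matching the default sort order of the Card Set.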

The arrows on the cards help you navigate through your Card Set. As you navigate, the cards are displayed in the order of the last sort you performed on the index page. If we were to press the "Next" button on this example card, we would be shown the Card for protein #6, since it is the card after Card #2 when the Card Set is sorted by class name.

Here is a brief summary of the arrows on this page:
  • The "First" button takes you to the first Card in the set (by the current sorting).
  • The "Last" button takes you to the last Card in the set.
  • The "Next" button takes you to the next Card in the set.
  • The "Previous" button (not shown) takes you to the previous Card in the set.
If you are currently looking at the first card in the set (as above), no "Previous" arrow is available. Likewise, no "Next" arrow is available on the last card in the set.
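The arrow behaviour can be summarized with a small sketch. The function `navigation` and the sort order below are invented for illustration; only the fact that Card #6 follows Card #2 comes from the example above. "Previous" is taken to mean the card immediately before the current one in the current sort order.

```python
def navigation(order, current):
    """Return the card number each arrow leads to from `current`.

    `order` is the list of card numbers in the most recent sort order.
    An arrow that is not available maps to None.
    """
    i = order.index(current)
    return {
        "first": order[0],
        "previous": order[i - 1] if i > 0 else None,
        "next": order[i + 1] if i < len(order) - 1 else None,
        "last": order[-1],
    }


# Invented sort order; only "2 followed by 6" is taken from the text.
order = [2, 6, 1, 4, 3, 5]

at_first = navigation(order, 2)  # no "previous" on the first card
at_last = navigation(order, 5)   # no "next" on the last card
print(at_first)
print(at_last)
```

Because the arrows follow `order` rather than the card numbers, re-sorting on the index page changes which card each arrow leads to.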

The "Index" link will bring you back to the Card Set index where you may re-sort the Card Set or view a different Card.

Clicking the class name in the Prediction row of the card ("cytoplasm" for this card) brings you to the probability distribution page with a graph like this on it:

This graph summarizes the probability the predictor assigned to each of the classes. In this example the probability is divided between "cytoplasm" and "nucleus".

If we go back to the Card page and click the Explain link we are sent to a page with a graph similar to this:
(the page may take a moment to load)

The names of the possible Subcellular Localization sites are listed along the y-axis, and the probability assigned to each site is shown by the total length of the corresponding bar. It is important to note that this graph uses a logarithmic scale, so a small difference in bar length corresponds to a large difference in probability.
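A short calculation shows why bar-length differences on a logarithmic scale matter. The base of the logarithm used by the graph is not stated on this page; base 10 is assumed below purely for illustration, and `ratio_from_log_gap` is a hypothetical helper name.

```python
def ratio_from_log_gap(gap, base=10.0):
    """Ratio of two values whose logarithms (in `base`) differ by `gap`."""
    return base ** gap


# How much larger one value is than another for a given gap in bar length,
# measured in log units (base 10 is an assumption, see above).
for gap in (0.1, 0.5, 1.0, 2.0):
    ratio = ratio_from_log_gap(gap)
    print(f"gap of {gap} log units -> ratio of about {ratio:.2f}")
```

Under this assumption, a bar that is longer by one log unit represents a value ten times larger, and two log units a value one hundred times larger.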

The legend of the graph assigns a color to each word. These words were extracted from the BLAST output for the corresponding protein. Within each bar, the length of each colored segment represents how much the word with that color contributes to the probability of that class.

For example, in the graph above the word "cytoplasm" is assigned the color red, and the red segment contributes the most length to the bar for the class "cytoplasm" (as expected). The word also contributes to the other classes, because "cytoplasm" is sometimes found in the BLAST results of proteins that are active in other areas of the cell.
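The relationship between colored segments and whole bars can be sketched as follows. All words and numbers here are invented, and the sketch assumes only what is described above: each bar is the sum of its per-word segments. The exact formula behind the Explain graph is not given on this page.

```python
# Invented per-word segment lengths for two classes (arbitrary log units).
contributions = {
    "cytoplasm": {"cytoplasm": 3.0, "kinase": 1.0, "binding": 0.5},
    "nucleus": {"cytoplasm": 1.0, "kinase": 1.5, "binding": 0.5},
}


def bar_lengths(contributions):
    """Total bar length per class: the sum of its colored segments."""
    return {cls: sum(words.values()) for cls, words in contributions.items()}


def top_word(contributions, cls):
    """The word whose segment is longest in the bar for class `cls`."""
    words = contributions[cls]
    return max(words, key=words.get)


totals = bar_lengths(contributions)
print(totals)
print(top_word(contributions, "cytoplasm"))
print(top_word(contributions, "nucleus"))
```

In this invented data the word "cytoplasm" dominates the "cytoplasm" bar but still adds a shorter segment to the "nucleus" bar, mirroring the situation described above.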

There are many details of the Explain graph which are not discussed here. For further information, please see:
D. Szafron, P. Lu, R. Greiner, D. Wishart, Z. Lu, B. Poulin, R. Eisner, J. Anvik and C. Macdonell, Proteome Analyst - Transparent High-throughput Protein Annotation: Function, Localization and Custom Predictors, International Conference on Machine Learning Workshop on Machine Learning in Bioinformatics (ICML Workshop - Bioinformatics), August 2003, Washington, U.S.A., pp. 2-10.

Please contact us if you have any further questions.