A Guide to Structure Prediction (version 2.1)

Rob Russell
EMBL
Meyerhofstrasse, 1
D-69117 Heidelberg
Germany

These pages were created to accompany a joint CCP11/British Biophysical Society Meeting, "Getting the Most from your Protein Sequence", which was held at the Wellcome Trust, London, on the 11th of March 1996. You can see most of the slides from the talk (in order) by clicking here.

Let me know if you would like these pages maintained (russell@embl.de).

Preface to version 2.1 (September 2002)

This is just a move of the site to my new group. I hope to update the content more substantially... some day.

Preface to version 2 (September 1999)

Thanks to the hundreds of people who e-mailed me saying that they wanted this site maintained. I only had one day to update the server, so I expect that there are going to be several omissions and problems. Please let me know if you find any, or if you have any suggestions.

The world has moved on considerably since I did the original pages. Many of the details have therefore been removed or modified substantially to reflect the changing times. Please let me know if, in the process of doing this, I have removed anything that you found useful.

Introduction

This is by no means intended to be a comprehensive guide to predicting protein 3D structure. Rather, I have tried as best I can to summarise my general approach to the problem in a manner that I hope is useful and not too difficult to follow. I apologise in advance for failing to include various references, WWW sites, etc. I would strongly recommend exploring the WWW pages given here, and looking for "related sites", etc. In this way you should get a more comprehensive picture of what is available.

The assumption is that you have a sequence of a protein that you want to know more about. Before you start, remember that this approach will not always provide satisfying or complete answers. However, it is increasingly rare that the techniques described here fail to shed any light on a protein sequence. Spending just a little time analysing a sequence can save time and money by aiding experimental design.

I should emphasise that the title of my talk for the above meeting was "Secondary structure prediction and fold recognition". The contents of these pages are thus heavily biased towards these two subjects (e.g. there are no figures for most of the other sections). Most of the other sections do, however, contain links that can give you more information about them.

A Flowchart for Structure Prediction

The clickable flowchart shows what I think is a generalised approach to predicting protein structure. Most regions of the flowchart are described in separate sections below. Ideally, a protein sequence goes in one end and a protein structure comes out of the other. Be warned that this is not always possible. Nevertheless, the other sections of the flowchart can provide useful insights into protein structure and function, and provide information that can aid experimental design.

The contents of this guide (all reachable via the flowchart) have been divided into several sections:

Relevant experimental data

Sequence data/preliminary analysis

Sequence Database searching

Domain assignment

Multiple sequence alignment

Comparative or homology modelling

Secondary structure prediction

Fold Recognition

Analysis of folds and alignment of secondary structures

Sequence to structure alignment

About the figures

Quite a few people have asked me about the figures in the pages above. Pictures of protein three-dimensional structures were drawn using Suhail Islam's program PREPI (available from this server).

The pretty alignment shown in the secondary structure prediction section was drawn using Geoff Barton's ALSCRIPT program.

I recommend them both.

More information

There are hundreds, if not thousands, of WWW sites offering more information about tools for analysis in molecular biology. Some good starting points are:

Protein Structure Prediction Centre: site dedicated to the Critical Assessment of Structure Prediction (CASP) experiments, run every two years
ExPASy Molecular Biology Server (Switzerland)
Tools for analysis of primary sequence data (at ExPASy)
European Bioinformatics Institute (Cambridge, UK)
National Center for Biotechnology Information (NIH, USA)
The Protein Databank
Principles of Protein Structure (Birkbeck College, London)

I hope this guide has helped, and I wish you every success in your predictions. Please send me comments if you have them, particularly if you think of other things that should be included in this document.