The relationship between sequence and interaction divergence in proteins


These pages relate to Aloy et al., J. Mol. Biol., 332, 989-998, 2003 (PubMed 14499603).
Go straight to the interactive browsing tool



In this paper we tried to do something similar to what Chothia & Lesk did for protein structures in the 1980s (Chothia & Lesk, EMBO J., 5, 823-6, 1986). Their aim was to measure how much protein structures change as a function of sequence divergence. The classic plot is one of root mean square deviation (RMSD) of C-alpha atoms versus percentage sequence identity, which showed a gradual increase in RMSD as identity dropped.
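As a reminder of what the vertical axis of such a plot measures, here is a minimal sketch of an RMSD calculation over C-alpha coordinates. It assumes the two structures have already been superimposed and that the i-th coordinate in each list belongs to an equivalent (aligned) residue; the function name `ca_rmsd` is ours and is only for illustration. It is not the procedure used in either paper.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the structures are already superimposed and the i-th
    entries of the two lists are equivalent residues.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("need at least one atom pair")
    # Sum of squared distances between equivalent atoms.
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    # Root of the mean squared distance.
    return math.sqrt(total / len(coords_a))
```

Identical coordinate sets give an RMSD of zero, and the value grows as equivalent atoms move further apart.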

The difference is that we looked not at how individual proteins diverged, but at how their interactions with other proteins changed as a function of sequence divergence. All of this is discussed in more detail in the paper, but the main result is a similar plot: RMSD versus sequence identity. For this we had to devise a slightly modified interaction RMSD (iRMSD) that measures the difference between two interactions in different structures (PDB files).

We also classified the similarities between the proteins. When comparing one interaction, A-B, with another one of the same type, A'-B', the As are similar in structure to each other, as are the Bs. We graded these similarities by the level at which the proteins are grouped together in the SCOP database:

  1. fol are proteins or domains from the same fold, but in different superfamilies
  2. sup are from the same superfamily, but in different families
  3. fam are from the same family
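The three levels above can be sketched in code. This example assumes each domain is described by a SCOP sccs identifier of the form class.fold.superfamily.family (for example "b.1.1.1"); the function name `scop_similarity` is hypothetical, and the sketch is not necessarily how the classification was carried out for the paper.

```python
def scop_similarity(sccs_a, sccs_b):
    """Return 'fam', 'sup', 'fol', or None for two SCOP sccs strings.

    Assumption: identifiers have four dot-separated fields,
    class.fold.superfamily.family, e.g. 'b.1.1.1'.
    """
    parts_a = sccs_a.split(".")
    parts_b = sccs_b.split(".")
    if len(parts_a) != 4 or len(parts_b) != 4:
        raise ValueError("expected identifiers like 'b.1.1.1'")
    # Different class or fold: not similar at any of the three levels.
    if parts_a[:2] != parts_b[:2]:
        return None
    # Same fold, different superfamily.
    if parts_a[2] != parts_b[2]:
        return "fol"
    # Same superfamily, different family.
    if parts_a[3] != parts_b[3]:
        return "sup"
    # Same family.
    return "fam"
```

For a pair of interactions, the function would be applied twice: once to A and A', and once to B and B'.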

This provides a good guide to how similar one can expect the proteins to be in terms of function and evolution. For example, fol similarities usually have different functions, sup similarities often have similar functions despite low or undetectable sequence similarity, and so forth.

We also classified interacting pairs of domains (i.e. A-B) as inter- or intramolecular, the former meaning they lie in separate polypeptide chains and the latter that they lie in the same chain. We defined fusions as those pairs of interactions (A-B compared to A'-B') where one was intermolecular and the other intramolecular.
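The inter/intra/fusion labels can be illustrated with a short sketch. It takes intermolecular in its usual sense of two domains lying on separate polypeptide chains, and it assumes each domain sits on a single chain identified by a PDB chain ID; the names `interaction_kind` and `is_fusion` are ours and are only illustrative.

```python
def interaction_kind(chain_a, chain_b):
    """Label a domain pair by the chains its two domains lie on.

    Assumption: each domain lies entirely on one chain.
    """
    # Same chain: intramolecular. Different chains: intermolecular.
    return "intra" if chain_a == chain_b else "inter"


def is_fusion(pair_1, pair_2):
    """True if one interaction is inter- and the other intramolecular.

    Each pair is a (chain_of_first_domain, chain_of_second_domain) tuple.
    """
    return interaction_kind(*pair_1) != interaction_kind(*pair_2)
```

For example, comparing an A-B pair on chains ("A", "B") with an A'-B' pair on chains ("C", "C") would be flagged as a fusion.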

The main plots are below. We have also created a simple, interactive tool for inspecting the data. For more information, please consult the paper.


Following on from the "Note added in proof", we have created improved data sets, which are also available on the web tool. For more information on how these were constructed, and to download some original data, click here.


To get the data from this study, see here.

Please cite Aloy et al., J. Mol. Biol., 332, 989-998, 2003 (PubMed 14499603).