Rob Russell looked through each target where results were sorted by
raw GDT_TS score, judging whether or not groups had a correct prediction of overall fold.
Two points were awarded for "excellent" predictions (those with a correct or nearly correct
structure), one point for "good" predictions (those where features of the fold
were correct, but with distortions or differences), and zero points otherwise. He went down
the ranked list until it became clear that no more points would be awarded; the number
of models considered varied from 20 to 124, and roughly 1000 models were inspected
manually.
Summary for NF/FR
Initial inspection of the predictions revealed a number of models where we suspected
that overlapping coordinates were giving artificially high GDT_TS scores. We therefore
repeated all calculations after first applying a filter to try to remove this effect.
The filter removed any prediction that had 10 or more clashes (defined as
Calpha-to-Calpha distances <= 3.0 angstroms, ignoring adjacent residues).
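The clash filter described above can be sketched roughly as follows. This is only an illustration, not the code used for the assessment: the function names `count_ca_clashes` and `passes_clash_filter` are hypothetical, and it assumes the C-alpha coordinates are given in residue order for a single contiguous chain, so that "adjacent residues" means neighbouring list positions.

```python
import math

def count_ca_clashes(ca_coords, cutoff=3.0):
    """Count C-alpha pairs at or below `cutoff` angstroms.

    `ca_coords` is a list of (x, y, z) tuples in residue order (an assumption).
    Sequence-adjacent residues (i, i+1) are ignored, as in the text.
    """
    clashes = 0
    n = len(ca_coords)
    for i in range(n):
        # Start at i + 2 so that adjacent residues are skipped.
        for j in range(i + 2, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                clashes += 1
    return clashes

def passes_clash_filter(ca_coords, max_clashes=10):
    """A model is kept only if it has fewer than `max_clashes` clashes."""
    return count_ca_clashes(ca_coords) < max_clashes

# An extended chain with 3.8 angstrom spacing has no clashes,
# whereas six coincident atoms give 10 non-adjacent clashing pairs.
extended = [(3.8 * i, 0.0, 0.0) for i in range(6)]
collapsed = [(0.0, 0.0, 0.0)] * 6
print(passes_clash_filter(extended), passes_clash_filter(collapsed))
```

The double loop is quadratic in chain length, which is adequate for a few thousand models of typical protein size.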
More discussion of problems with automated GDT_TS-based evaluations can be
found here and on the FORCASP website.
See here for some of our findings on how overlapping coordinates can increase GDT_TS.
Inspection of known 3D structures shows that they never have more than one
pair of C-alpha atoms closer than 3.7 angstroms. We also took precedent
from the Critical Assessment of PRedictions of Interactions meeting (assessment of docking;
CAPRI), where a similar filter is applied for similar reasons.
A total of 146 (out of 4916) models were removed by this filter. Results
below are either presented "raw" or "filtered".
We calculated the mean, standard deviation and Z-score for all predictions per target. For each group, we then calculated the sum of the best (positive) Z-score per target over all targets, and the average Z-score (performance: the sum divided by the number of predictions made). Z-scores acknowledge outstanding performance on difficult targets.
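As a rough sketch of the Z-score summary, the calculation might look like the code below. Several details are assumptions rather than statements from the text: the population standard deviation is used, negative best Z-scores contribute zero to the sum, and "number of predictions made" is read as the number of targets a group predicted. The function name `group_z_summary` and the input layout are hypothetical.

```python
from statistics import mean, pstdev

def group_z_summary(gdt_by_target):
    """Return {group: (sum_of_best_positive_z, average_z)}.

    `gdt_by_target` maps target -> {group: [GDT_TS of each model]}.
    """
    totals, counts = {}, {}
    for by_group in gdt_by_target.values():
        # Mean and standard deviation over all predictions for this target.
        all_scores = [s for scores in by_group.values() for s in scores]
        mu = mean(all_scores)
        sigma = pstdev(all_scores)  # population SD: an assumption
        for group, scores in by_group.items():
            best_z = (max(scores) - mu) / sigma if sigma > 0 else 0.0
            # Only positive Z-scores add to the sum (assumed reading).
            totals[group] = totals.get(group, 0.0) + max(best_z, 0.0)
            counts[group] = counts.get(group, 0) + 1
    return {g: (totals[g], totals[g] / counts[g]) for g in totals}

example = {
    "T1": {"A": [10.0], "B": [20.0], "C": [30.0]},
    "T2": {"A": [40.0, 35.0], "B": [20.0]},
}
for group, (total, average) in sorted(group_z_summary(example).items()):
    print(group, round(total, 3), round(average, 3))
```

Dividing by the number of targets predicted means a group that attempts few targets is not penalised in the average, only in the sum.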
Summaries for: NF, NF (filtered), NF+NF/FR, NF+NF/FR (filtered)

We calculated the percentile (0-100) for each GDT_TS score per target. For the best prediction per group, we then gave points depending on the percentile of the GDT_TS score (>= 90: 6; >= 80: 4; >= 70: 3; >= 60: 2; >= 50: 1; < 50: 0) and normalized them in the range 0-100 (see Lesk et al., Proteins Suppl 5:98-118 (2001)). Percentiles acknowledge good sustained performance but remove differences within the different ranges. All targets have equal weights.
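The percentile point scheme can be illustrated with a short sketch. The point thresholds come from the text; everything else is an assumption made for illustration: the percentile is taken as the percentage of predictions for the target scoring at or below the given score, and the 0-100 normalization is taken as the fraction of the maximum attainable points (6 per target). The function names are hypothetical.

```python
def percentile_rank(score, all_scores):
    """Percentage of predictions for a target scoring <= `score` (assumed definition)."""
    return 100.0 * sum(1 for s in all_scores if s <= score) / len(all_scores)

def percentile_points(pct):
    """Map a percentile to points using the thresholds quoted in the text."""
    for threshold, points in ((90, 6), (80, 4), (70, 3), (60, 2), (50, 1)):
        if pct >= threshold:
            return points
    return 0

def normalized_total(points_per_target):
    """Scale summed points to 0-100, assuming 6 points per target is the maximum."""
    return 100.0 * sum(points_per_target) / (6 * len(points_per_target))

target_scores = [10.0, 20.0, 30.0, 40.0]
pct = percentile_rank(30.0, target_scores)
print(pct, percentile_points(pct))
```

Because every score within a band earns the same points, small GDT_TS differences inside a band have no effect on the total.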
Summaries for: NF, NF (filtered), NF/FR, NF/FR (filtered)

We sorted the groups according to the GDT_TS score and normalized the ranking for each target by removing the lower half of the predictions and giving points (0: worst to 100: best) for the remaining ones. We then added up the points for the best prediction per group for all targets. This acknowledges good performance and takes small differences into account. All targets have equal weights.
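One possible reading of the normalized ranking for a single target is sketched below. The text does not say how points are spread between 0 and 100, how ties are handled, or how an odd number of entries is halved, so this sketch assumes a linear spread over rank, breaks ties by input order, and keeps the larger half. The function name `normalized_rank_points` is hypothetical.

```python
def normalized_rank_points(best_scores):
    """Return {group: points} for one target.

    `best_scores` maps group -> best GDT_TS for the target.
    Groups in the lower half of the ranking receive 0 points.
    """
    ranked = sorted(best_scores, key=best_scores.get, reverse=True)
    keep = ranked[: (len(ranked) + 1) // 2]  # upper half; rounding is an assumption
    points = {group: 0.0 for group in best_scores}
    if len(keep) == 1:
        points[keep[0]] = 100.0
        return points
    for i, group in enumerate(keep):
        # Linear scale over rank: best kept entry 100, worst kept entry 0 (assumed).
        points[group] = 100.0 * (len(keep) - 1 - i) / (len(keep) - 1)
    return points

scores = {"A": 50.0, "B": 40.0, "C": 30.0, "D": 20.0, "E": 10.0, "F": 5.0}
print(normalized_rank_points(scores))
```

Summing these per-target points over all targets for each group would then give the totals described above.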
Summaries for: NF, NF (filtered), NF/FR, NF/FR (filtered)

These tables report the ranks for each group and target based on GDT_TS score. Sum and group average are the same as for GDT_TS Z-score.
Summaries for: NF, NF (filtered), NF/FR, NF/FR (filtered)