The HiQ54+ dataset (HiQ for High Quality) is designed for critical testing of methods, especially computational ones. Its reference crystal structures have no significant level of evident errors, and they avoid the complications of tightly bound non-protein ligands and of multimer interfaces. For large-scale statistical use in structural bioinformatics, we recommend our Top8000 dataset instead.

It consists of all 55 non-redundant, single-chain, monomeric proteins from the PDB through 2010 that have 60-200 residues (average 134) and no tightly bound or large ligands. All were required to have both resolution (in Å) and MolProbity score (Chen et al., 2010) at or below (better than) 1.4, very few bond-length or bond-angle outliers, and deposited structure-factor data. The structures as available here have hydrogens added and optimized by Reduce, including the correction of Asn/Gln/His flips where needed. However, we have not yet gone through each structure to either validate or correct the very few remaining outliers.
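The selection criteria above can be sketched as a simple filter. This is only an illustration, not the code used to build the dataset: the `Candidate` record and its field names are hypothetical, and the ligand and monomer checks are assumed to have been computed elsewhere. The "very few bond-length or angle outliers" criterion is left out because no numeric threshold is stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-structure record; field names are assumptions."""
    pdb_id: str
    n_residues: int
    resolution: float            # crystallographic resolution in Å
    molprobity_score: float
    has_structure_factors: bool
    is_single_chain_monomer: bool
    has_tight_or_large_ligand: bool


# Cutoffs taken from the text: 60-200 residues, and both resolution
# and MolProbity score at or below 1.4 (lower is better).
MIN_RESIDUES, MAX_RESIDUES = 60, 200
MAX_QUALITY = 1.4


def passes_hiq_filters(c: Candidate) -> bool:
    """Return True if the candidate meets the stated numeric criteria."""
    return (
        MIN_RESIDUES <= c.n_residues <= MAX_RESIDUES
        and c.resolution <= MAX_QUALITY
        and c.molprobity_score <= MAX_QUALITY
        and c.has_structure_factors
        and c.is_single_chain_monomer
        and not c.has_tight_or_large_ligand
    )
```

Both cutoffs are inclusive, matching the "at or below" wording.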

Within each PDB homology cluster, we selected the best chain, meaning the one with the lowest average of resolution and MolProbity score. There were a small number of ties within clusters (affecting < 1% of the final chain tallies); these were resolved, arbitrarily but reproducibly, by alphabetical order of PDB ID + single-character chain ID.
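The per-cluster choice and its tie-break can be sketched as follows. This is a minimal illustration under stated assumptions, not the original selection script: the `Chain` record, its field names, and the cluster labels are hypothetical, and PDB IDs are assumed to be given in a consistent letter case so that alphabetical order is well defined.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Chain(NamedTuple):
    """Hypothetical record for one chain; field names are assumptions."""
    cluster: str            # homology-cluster label
    pdb_id: str
    chain_id: str           # single character
    resolution: float       # in Å, lower is better
    molprobity_score: float # lower is better


def rank_key(c: Chain) -> Tuple[float, str]:
    """Sort key: average quality first, then PDB ID + chain ID for ties."""
    average = (c.resolution + c.molprobity_score) / 2.0
    return (average, c.pdb_id + c.chain_id)


def best_chain_per_cluster(chains: Iterable[Chain]) -> Dict[str, Chain]:
    """Keep the lowest-ranked (best) chain seen for each cluster."""
    best: Dict[str, Chain] = {}
    for c in chains:
        current = best.get(c.cluster)
        if current is None or rank_key(c) < rank_key(current):
            best[c.cluster] = c
    return best
```

Because the key is a tuple, the alphabetical comparison is used only when two averages are exactly equal, which keeps the result independent of input order.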

The reference for publication of the HiQ54 is:

Leaver-Fay A, O'Meara MJ, Tyka M, Jacak R, Song Y, Kellogg EH, Thompson J, Davis IW, Pache RA, Lyskov S, Gray JJ, Kortemme T, Richardson JS, Havranek JJ, Snoeyink J, Baker D, Kuhlman B (2013) "Scientific benchmarks for updating the Rosetta energy function", Meth Enzymol 523:109-143

The dataset was used there for testing some of the proposed changes to the Rosetta energy function. The HiQ54 performed better for that purpose than an older Rosetta test library of 50 small, high-resolution structures which included much higher levels of validation outliers, redundancy, ligands, and multimers. One of the 55 PDB files was inadvertently dropped from those tests, so this table shows the 54 used in that paper in the main section, with the extra file (1fj8) below, and thus is called the HiQ54+.

Below is a table of the HiQ54+ files and their characteristics, plus a downloadable PDF of that table. MolProbity has just been revised to incorporate updated H atom parameters, and the HiQ54 files are being recalculated to reflect the resulting small changes in H positions and in scores. The recalculated files will be available for download here soon.

(page revision February 28, 2013)

Download: HiQ54+_library_table.pdf