Structural Model Analysis
All-atoms Contact Analysis
Once all H atoms have been added and optimized (with Reduce), analysis of atomic overlaps (both clashes and H-bonds) provides a very sensitive and reliable way to find the problem areas in a model, and often indicates how to fix them. In assessing a deposited crystal structure, the local presence or absence of all-atom clashes is an excellent way to gauge the local accuracy of that part of the structure. The method is sensitive because H atom clashes are almost never minimized in refinement. That means, however, that you should not be surprised to find such problems in respectable, well-done structures.
The figure at top left shows the all-atom contacts for a Thr fit backwards in a 1.7 Å structure (not uncommon, since electron density for branched sidechains is sometimes straight and bar-shaped rather than showing the tetrahedral carbon position). Such misfittings result in poor rotamers and characteristic bond-angle distortions that aid in diagnosis. Below are contacts for the same Thr, idealized and refit.
To get a graphics display of the serious clashes (non-Hbond overlaps >= 0.4 Å) in the 3D structure for any PDB file (fetched from the PDB or uploaded by you), use our MolProbity web server at http://molprobity.biochem.duke.edu/. There you can add hydrogens, run various model quality analyses (Rama, rotamers, clashscores, Cbeta deviations), view the results and models, and download the results and kinemages.
If you turn off sidechains, het groups, distant and close contacts, small overlaps, and Hbonds (leaving only backbone and serious overlaps >= 0.4 Å), you can easily see the red spikes that indicate problems (as in the figure below).
Pickcenter on each such area in turn, zoom in, and turn on sidechains again to see what is clashing. If there are too many clashes, try running Probe again with the "all" selection replaced by "blt40", to concentrate on the more important, and more fixable, clashes at low B.
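The selection logic described above (serious, non-Hbond overlaps of at least 0.4 Å, optionally restricted to low-B atoms) can be sketched in a few lines of Python. This is only an illustration, not Probe's or MolProbity's code: the Contact record, its field names, and the convention that overlap is stored as a positive number of Å are all assumptions made for the example, as is reading "blt40" as "both atoms have B below 40".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Contact:
    """Hypothetical record for one atom-atom contact (not a Probe format)."""
    atom_a: str
    atom_b: str
    overlap: float   # Å of van der Waals overlap; positive = interpenetration (assumed)
    is_hbond: bool   # True if the overlap is a hydrogen bond, not a clash
    b_a: float       # B-factor of atom_a
    b_b: float       # B-factor of atom_b


SERIOUS_OVERLAP = 0.4  # Å, the cutoff quoted in the text


def serious_clashes(contacts: List[Contact],
                    max_b: Optional[float] = None) -> List[Contact]:
    """Keep non-Hbond overlaps >= 0.4 Å; if max_b is given, also require
    both atoms to have B below max_b (our reading of a low-B selection)."""
    kept = []
    for c in contacts:
        if c.is_hbond or c.overlap < SERIOUS_OVERLAP:
            continue
        if max_b is not None and (c.b_a >= max_b or c.b_b >= max_b):
            continue
        kept.append(c)
    return kept


# Made-up example data, for illustration only.
example = [
    Contact("A 12 THR HG21", "A 45 LEU HD11", 0.62, False, 18.0, 22.5),
    Contact("A 30 GLN HE21", "A 33 ASN OD1", 0.55, True, 15.0, 16.0),
    Contact("A 50 VAL HG11", "A 52 ILE HD12", 0.25, False, 20.0, 21.0),
    Contact("A 90 LYS HZ1", "A 94 GLU HB2", 0.48, False, 55.0, 20.0),
]

all_serious = serious_clashes(example)
low_b_serious = serious_clashes(example, max_b=40.0)
print(len(all_serious), len(low_b_serious))
```

Filtering on B first is useful when a structure has many clashes, since the low-B ones lie in well-ordered regions and are usually the more fixable.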
If you prefer to work with a list of problem areas (clusters of clashes), MolProbity can provide tables too. Also check out the multi-crit kinemage for a graphical summary view.
Asn / Gln / His Flips
Addition of all-atom clash information to H-bond analysis allows clear resolution of the 180° orientation ambiguities of sidechain amides or imidazoles in almost all cases, as described in Word et al. (1999) J. Mol. Biol. 285: 1735. (For more info about this and other Richardson Lab citations, see our publications page.) The figure shows the correct (green) and incorrect (pink) flip orientations for a doubly H-bonded Gln-Asn pair, which have equivalent H-bonds in the two states but are easily assigned because of the physically impossible overlap of an epsilon-NH and the C-alpha H in the lower panel.
MolProbity recommends flips for Asn, Gln, and His sidechains as part of its default Reduce run of adding hydrogens. It generates views of both flipped and non-flipped orientations so that the user can (and SHOULD!) judge the best orientation.
"Structure Validation by Cα Geometry: φ, ψ and Cβ Deviation", PROTEINS: Structure, Function, and Genetics, 50: 437-450 (2003) is our lab's paper about model quality assessments. Again, see our publications page for more info about this paper.
Improved definitions are given for the "Preferred" versus "Allowed but Disfavored" versus "Forbidden" regions of phi-psi space, obtained from a large, accurate, B-factor filtered database (500 selected proteins at 1.8 Å resolution or better, residues with backbone B values less than 30 Å²: about 100,000 data points).
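As a rough sketch of the filtering criteria just described, the following Python keeps a residue only if its structure is at 1.8 Å or better and its backbone B values are below 30 Å². The Residue record and its fields are hypothetical, and applying the B cutoff to the maximum over the backbone atoms is an assumption; the text does not say exactly how the backbone B criterion was evaluated.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Residue:
    """Hypothetical residue record, for illustration only."""
    pdb_id: str
    resolution: float               # Å, resolution of the parent structure
    backbone_b: Tuple[float, ...]   # B-factors (Å²) of the backbone atoms


MAX_RESOLUTION = 1.8   # Å, from the text
MAX_BACKBONE_B = 30.0  # Å², from the text


def passes_filter(res: Residue) -> bool:
    """True if the residue meets both the resolution and the B-factor cutoffs.

    Assumption: the B cutoff is applied to the highest backbone B value.
    """
    return (res.resolution <= MAX_RESOLUTION
            and max(res.backbone_b) < MAX_BACKBONE_B)


# Made-up residues, not taken from the actual 500-protein database.
good = Residue("xxxx", 1.5, (12.0, 14.5, 13.2, 15.0))
high_b = Residue("xxxx", 1.5, (25.0, 31.0, 28.0, 29.0))
low_res = Residue("yyyy", 2.2, (10.0, 11.0, 12.0, 12.5))

print([passes_filter(r) for r in (good, high_b, low_res)])
```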
This figure shows the phi, psi distribution of data points for the non-Gly, non-Pro general case, with the preferred (core) region outlined in orange and the disfavored but allowed region outlined in gold. This new data shows a core region (98% of the data) that almost exactly matches the "strict" single region defined by Kleywegt & Jones (1996, Structure, 4, 1395), but it has in common with ProCheck (Laskowski et al., 1993, J. Appl. Crystallogr. 26, 283) the definition of an allowed outer region. However, it is now clear that the early ProCheck regions were too permissive in many places and missed the now-quite-distinct gamma-turn region near +70°, -60°.
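For readers who want to place their own residues on such a plot, the phi and psi values are ordinary dihedral angles: phi is defined by the atoms C(i-1), N, CA, C and psi by N, CA, C, N(i+1). The following is a minimal, dependency-free Python sketch of that calculation; the function names are ours, and this is not the code used to build the distributions shown here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in the range -180 to +180,
    for four points given as (x, y, z) tuples."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)   # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """phi = C(i-1)-N-CA-C, psi = N-CA-C-N(i+1), both in degrees."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Simple geometric check with made-up points: a quarter turn about the z axis.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
               (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Using atan2 rather than acos keeps the sign of the angle and avoids numerical trouble near 0° and 180°.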
A kinemage file, Rama500noGPc.kin (2.6MB), of the above distribution and outlines is available. When viewed in either Mage or King, any data point can be clicked on to find its residue number and PDB code, and, if the "X Y Z point" tool is active, its phi-psi values as well. The kinemage also includes the high-B data, to show its greater scatter.
Although peak heights and shapes differ among the 18 non-Gly, non-Pro amino acids, the outlines of the preferred and allowed regions are remarkably consistent. However, Gly and Pro are each very different, and using their individual distributions for structure validation is very desirable.
Gly, in spite of its greater flexibility, still occupies only about half the area of the Ramachandran plot, and it is the residue whose phi, psi values are most prone to error (since it has fewer observable atoms). A twofold-symmetrized version of the Gly distribution is shown at left.
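One way to picture the twofold symmetrization is as averaging the Gly phi, psi histogram with a copy of itself rotated 180° about the origin, so that each point (phi, psi) is paired with (-phi, -psi). That reading follows from Gly being achiral, but the exact procedure used for the figure is not stated here, so the Python below is a sketch under that assumption. The grid layout (square, with bins evenly spanning -180° to +180° on both axes) is also assumed.

```python
def symmetrize_twofold(hist):
    """Average a square phi-psi histogram with its 180-degree rotation
    about the origin, pairing (phi, psi) with (-phi, -psi).

    hist[i][j] holds the count for phi-bin i and psi-bin j, with bins
    assumed to span -180 to +180 degrees evenly on both axes, so that
    bin i mirrors onto bin n-1-i.
    """
    n = len(hist)
    if any(len(row) != n for row in hist):
        raise ValueError("histogram must be square")
    return [[0.5 * (hist[i][j] + hist[n - 1 - i][n - 1 - j])
             for j in range(n)]
            for i in range(n)]


# Tiny made-up 2x2 grid: all counts sit in the negative-phi, negative-psi bin.
raw = [[4, 0],
       [0, 0]]
sym = symmetrize_twofold(raw)
print(sym)
```

Averaging rather than summing keeps the total count unchanged, so the symmetrized grid can be contoured at the same levels as the raw one.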
A kinemage file (Rama500Glyc.kin, 336KB) can be downloaded to view in Mage or King. It shows the non-symmetrized Gly data points, which fit the symmetrized region boundaries well but include more points with positive phi, because those conformations are where Gly is most uniquely useful to a protein.
The closed ring of Pro makes it the most constrained residue type, with Phi values between about -50° and -90°. In addition to the expected peaks in the helical region and the poly-Pro region of beta, the Pro distribution at left shows a minor peak in between them, which corresponds to the inverse gamma-turn region.
This distribution includes both trans and cis Pro, although the differences between them are substantial, with cis Pro confined to the lower phi values. A kinemage file (Rama500Proc.kin, 208KB) may be downloaded; it allows animating between the cis and trans data.