Part 3: MolProbity-style Validation in PHENIX and Coot

PHENIX has many built-in MolProbity tools, and can call and interact with the crystallographic rebuilding program Coot, which also has the capability of using Probe dots as an aid in model changes.

  1. Launch the Phenix GUI, and select “Set up tutorial data...”. From the “Tutorial data” drop-down menu, select “Protein Kinase A (structure comparison)”, and select an appropriate destination for the data files. Once the project is set up, launch it.
  2. From the main Phenix GUI window, click on “Validation” in the menu on the right, and then select “Comprehensive Validation”.
  3. In the “Comprehensive Validation” window, browse to the tutorial directory that you specified above, and select 3dnd.pdb as your PDB file. Also specify the matching Reflections file, 3dnd.mtz. With both files specified, select “Run” from the top menu bar.
  4. When your “Comprehensive Validation” run has completed, you will be greeted with a “Model and data” tab, which summarizes the overall statistics for this model (R values, RMS bonds and angles, etc.). Below these statistics, you will see a button to “Open in Coot”. Click on this now, as we’ll need Coot later.
  5. Next, click on the “Compare statistics” button. This will launch a tool called “Polygon”, which is used to simultaneously compare R-work, R-free, RMSD-bonds, RMSD-angles, Clashscore, and Average B factor. Well-built models will usually have a small, fairly equilateral polygon, whereas larger or significantly asymmetric deviations are indicative of model problems. As you can see for 3dnd, both RMSD-bonds and RMSD-angles are a bit large, which could indicate some misfit areas of the structure that are causing geometric strain.
  6. Return to the main validation window, and select the “MolProbity” tab. Under this tab, you will see sub-tabs for “Summary”, “Basic geometry”, “Protein”, and “Clashes”. If your model had RNA, there would also be an “RNA” tab. Under the “Summary” tab are most of the same overall statistics that you would find when running the MolProbity webserver. Notice the high percentage of rotamer outliers (7.7%), and large number of C-beta deviations (those combine geometry problems around the Calpha into a single measure, as explained in Lovell 2003). Click on the “Show validation in KiNG” button, which will launch KiNG with a multi-criterion kinemage, similar to what you have seen before from the MolProbity webserver.

  7. As you’ve seen in previous exercises, the multi-criterion kinemage displays a variety of validation criteria in a visual 3D context. Have a look at the molecule, noting the rotamer outliers, serious steric clashes, and handful of bond length, bond angle, and C-beta deviations. Near bottom center there's a cluster of 5 gold sidechain outliers - right-click on the middle one (Leu 268), or on the clash between it and another close Leu outlier (272). Turn mainchain and sidechain on, turn Calphas and dots (near the bottom) off, and clip tighter. Choose Tools/Structural biology/sidechain rotator, and middle-click on an atom in Leu 268. [Notice that the new Probe dots do not show a clash. That's because the 3dnd.pdb file does not have explicit hydrogens added - they are the crucial element that gives the all-atom contact method its diagnostic power.] Turn on "Probedots" in the Model Manager window. In the Leu 268 rotator window, click through the rotamer choices to find a high-scoring one that fits the density as well as the original outlier did. Exit without saving.
  8. Back in the "Comprehensive validation" window, click on the “Basic geometry” tab. Here you find a summary of all bond, angle, dihedral, chirality, and planarity outliers (from Phenix, in this case, rather than MolProbity - this includes sidechain and het group geometry). Outliers are shown in the associated lists, and clicking on any item will center on it in Coot. Note the bond outlier for Ile A 163 – we’ll be seeing this residue again later.
  9. Next, go to the “Protein” tab. This section contains validation information for Ramachandran and rotamer outliers, C-beta deviations, and recommended Asn/Gln/His sidechain flips (these have NOT already been done for you, as you can tell if you click on His 39 in the flip list to see it in Coot).

    Find the list of rotamer outliers, and click on Leu 27 from the A chain, which will center on this residue in the Coot window. As you can see in Coot, this orientation is not a terrible fit to the density, but it is a rotamer outlier and energetically unfavorable due to an eclipsed Chi angle, and it has a suggestive difference peak. We’ll use the tools in Coot to fix this sidechain. In Coot, click Calculate -> Model/Fit/Refine to bring up the window of modeling tools. First, we need to select a map, which you can do by pressing the “Select Map” button. Choose the 2FOFCWT map. Next, select “Auto Fit Rotamer”, and then click on an atom in the Leu A 27 sidechain. Did you see it rotate ~180°? These kinds of misfit Leu residues are common in crystal structures, but are easy to identify and fix. Change your point of view (maybe center on Cgamma) and click “Undo” and “Redo” a few times until you are comfortable with how this change is carried out. Similarly, you can correct many of the other sidechain outliers in the GUI list.

  10. Return to the Validation GUI window, and navigate to the list of C-beta position outliers. Select “Ile 163” from the A chain. Do you recall this sidechain from the bond-length outlier list? Misfit sidechains will often have multiple diagnostic indicators of a problem, which makes it easier to identify the worst offenders. Notice that the CD1 atom is so far away from the CG1 atom that Coot does not draw the bond vector. Also notice the blob of positive density near the sidechain, indicating that it may not be in an optimal position. Correcting misfit sidechains such as these can be tricky, as many rounds of refinement have caused distortions in the model to accommodate the misfit. One approach that works well to fix this type of problem is to first mutate the offending residue to an alanine, and then run real-space refinement. These steps allow the C-beta position to be properly refined, without being trapped by the other misfit sidechain atoms (similar to the backrub in KiNG). To do this, select “Simple Mutate”, click on an atom in Ile A 163, and then select Ala from the pop-up window. Next, select “Real Space Refine Zone”, and click on 2 atoms to specify a range that runs at least 2 residues on either side of Ile A 163 (or pick Ile 163 and hit the "a" key for autozone). Notice the subtle, but distinct movement of the C-beta position. Accept the change. Next, we’ll mutate the residue back to Ile. To do this, select “Mutate and Auto Fit”, click on an atom in Ile A 163, and select Ile from the pop-up menu. Notice that the newly fit sidechain is now rotated ~180°, has a much better density fit for the CD1 atom, and has now placed the CG2 atom in the positive density peak. Upon further refinement, the position of this Ile will further improve, as the neighboring atoms are able to recover from the strain caused by the initial outlier.
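
The bond-length outlier seen for Ile A 163 in steps 8 and 10 comes down to simple arithmetic: how far the observed CG1-CD1 distance lies from its target value, in units of the target's standard deviation. The following is a minimal Python sketch of that check, not the Phenix implementation. The target length, sigma, and 4-sigma cutoff are assumed Engh & Huber-style values, and the coordinates are hypothetical (they are not taken from 3dnd.pdb).

```python
import math

# Assumed restraint values for the Ile CG1-CD1 bond; check the restraint
# library you actually use for the real numbers.
IDEAL_CG1_CD1 = 1.513   # Angstroms (assumption)
SIGMA_CG1_CD1 = 0.039   # Angstroms (assumption)
OUTLIER_CUTOFF = 4.0    # flag deviations larger than 4 sigma (assumption)


def bond_z_score(atom_a, atom_b, ideal, sigma):
    """Return (observed length, deviation from ideal in units of sigma)."""
    length = math.dist(atom_a, atom_b)
    return length, (length - ideal) / sigma


# Hypothetical coordinates for illustration only, NOT from 3dnd.pdb.
cg1 = (10.000, 5.000, 2.000)
cd1_strained = (11.300, 6.100, 2.600)   # stretched bond, as in a misfit sidechain
cd1_relaxed = (11.100, 5.900, 2.520)    # near-ideal bond, as after a refit

for label, cd1 in (("strained", cd1_strained), ("relaxed", cd1_relaxed)):
    length, z = bond_z_score(cg1, cd1, IDEAL_CG1_CD1, SIGMA_CG1_CD1)
    flag = "OUTLIER" if abs(z) > OUTLIER_CUTOFF else "ok"
    print(f"{label}: {length:.3f} A, {z:+.1f} sigma, {flag}")
```

A stretched bond like the "strained" case is a symptom rather than the problem itself: refinement has pulled the atoms apart to reach density that the misfit rotamer cannot otherwise fill, which is why the fix in step 10 is to refit the sidechain and not to tighten the restraint.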

The kind of sidechain fixups you've done in KiNG or Coot can mostly be accomplished using the -autofit_rotamer flag in phenix.refine, at resolutions up to somewhere between 2 and 2.5 Å (and of course NQH flips are done automatically by Reduce in either MolProbity or Phenix). But since autofit-rotamer only uses real-space refinement and rotamer scores, it can't reliably get the right answer at poorer resolutions. That's a hard job for people as well, but can often be done if you have the interactive information on clashes and H-bonds from the non-pairwise, H-aware all-atom-contact dots.
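
To see why explicit hydrogens matter so much for clash detection (as noted for Leu 268 in step 7), here is a small Python sketch that compares van der Waals overlaps for two neighboring methyl groups with and without their hydrogens. It is a deliberately simplified pairwise distance check, not Probe's dot-based, non-pairwise algorithm. The radii are approximate assumptions, the 0.4 Å cutoff follows the usual "serious clash" convention, and the atom names and coordinates are hypothetical.

```python
import math
from itertools import product

# Approximate van der Waals radii in Angstroms (illustrative assumptions,
# not the exact table used by Probe).
VDW_RADIUS = {"C": 1.75, "H": 1.17}
SERIOUS_OVERLAP = 0.4  # overlap (A) at or above which a contact is a serious clash


def overlap(atom_a, atom_b):
    """vdW overlap of two (name, element, xyz) atoms; positive means interpenetration."""
    _, elem_a, xyz_a = atom_a
    _, elem_b, xyz_b = atom_b
    return VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b] - math.dist(xyz_a, xyz_b)


def serious_clashes(residue_a, residue_b):
    """List (name_a, name_b, overlap) for every inter-residue pair that clashes."""
    hits = []
    for atom_a, atom_b in product(residue_a, residue_b):
        ov = overlap(atom_a, atom_b)
        if ov >= SERIOUS_OVERLAP:
            hits.append((atom_a[0], atom_b[0], round(ov, 2)))
    return hits


# Hypothetical methyl carbons from two neighboring Leu sidechains.
heavy_a = [("CD1", "C", (0.0, 0.0, 0.0))]
heavy_b = [("CD2", "C", (3.7, 0.0, 0.0))]

# The same groups with one hydrogen each, pointing roughly toward the other.
with_h_a = heavy_a + [("HD11", "H", (1.0, 0.43, 0.0))]
with_h_b = heavy_b + [("HD21", "H", (2.7, 0.43, 0.0))]

print("heavy atoms only:", serious_clashes(heavy_a, heavy_b))
print("with hydrogens:  ", serious_clashes(with_h_a, with_h_b))
```

In this toy geometry the two carbons sit comfortably apart, so a heavy-atom-only check reports nothing, while the hydrogens attached to them interpenetrate well past the cutoff. That is the same effect that hides the Leu 268 clash when the model has no explicit hydrogens.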

http://kinemage.biochem.duke.edu/teaching/workshop/Erice2012
Jane & Dave Richardson