William Sheffler
Joined Program: 2004
Previous Degrees: M.S. Computer Science, Brown University; B.S. Mathematics & Engineering, Brown University
Baker Lab
wsheffle (at) u.washington.edu
Research:
The first goal of my research with Dr. Baker is to determine when
Rosetta has generated a “correct” candidate structure for a protein or
protein interaction. In a recent benchmark of de novo protein structure
prediction, Rosetta was used to predict the structures of 16 small
proteins. For 5 of the 16 proteins, Rosetta was able to generate correct
structures that matched the X-ray crystal structure down to the
positioning of individual amino acid side chains. This shows that
Rosetta can generate correct protein structure predictions in some
cases; unfortunately, there is currently no reliable way to decide
whether structure prediction in a given case was successful. We will be
collaborating with the Noble lab to develop a machine learning approach to
determine the success or failure of a protein structure prediction or
protein-protein docking attempt.
A second and related goal is to create local measures of protein
structure quality. Many structures generated by Rosetta are correct
within a region of the protein, but not correct over the entire
structure. If correct pieces of a structure could be detected, this
information could be used to guide future sampling and help to generate
globally correct structures. Detection of correct pieces could be used in
de novo protein structure prediction as the basis of a
divide-and-conquer approach, where whole structures are built up out of
smaller components previously determined to be likely correct. A
local quality measure would also be useful in homology modeling and
protein design, where it is important to know which regions of a
structure are correct and which require further refinement.
Both of my research goals involve analysis of the quality of protein
structures. Many measures of structure quality are already available in
Rosetta, in the score functions used in various optimization procedures.
These scores used in optimization must be fast to compute and pairwise
additive, and previous work has focused on quality measures that satisfy
these constraints. I will be developing metrics of protein quality to be
used after optimizations are complete, which can thus afford to be more
sophisticated and expensive to compute. For example, I have considered
patterns of unsatisfied hydrogen bond donor and acceptor groups buried
within the core of a protein and inaccessible to solvent, as well as the
presence of voids or cavities in a protein core. Such measures
complement preexisting scores used in Rosetta and should help in the
identification of both globally and locally correct protein structures.