How to optimally combine experimental data with molecular modeling to describe the structure of complex protein mixtures in solution
Ingemar Andre
Biochemistry & Structural Biology, Lund University, Sweden
Many experimental techniques enable structural characterization of complex mixtures of proteins, but do not provide the resolution needed to construct detailed molecular models. However, by combining experimental data with structural modeling and simulation, it is possible to construct molecular models of protein mixtures that could not be obtained from data or simulation alone. Optimally combining data with simulation can be challenging, and many different approaches have been used to address this problem. In this talk I will describe how statistical modeling methods can serve as a basis to marry data with simulation, with an emphasis on Bayesian statistical approaches. Our group uses small-angle scattering (SAS) of X-rays and neutrons to study proteins that form structural ensembles in solution or self-associate to form higher-order complexes. SAS is a powerful method for studying complex mixtures, but the information content of the data is not sufficient to define three-dimensional structures. However, by using statistical modeling methods to combine SAS data with models generated through protein structure prediction, atomistic models of protein mixtures can be generated. We have used this approach to study the mechanism by which virus capsids self-assemble from capsid proteins, combining time-resolved SAS measurements, modeling of capsid oligomers and statistical modeling. The result is a kinetic model for capsid assembly that explains how protein capsids are built up sequentially from simple protein building blocks into fully formed capsids with hundreds of subunits, a description that would be difficult to achieve with experiments or simulation alone. The methodology presented in this talk provides a general framework for analyzing many types of experimental data guided by molecular modeling, and can readily be used to combine many different experimental data sources.
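
As a rough illustration of the general idea of combining a scattering curve with model-derived profiles in a Bayesian way, the sketch below estimates the population fraction of one state in a two-state mixture. It is not the method used in the work described here, only a minimal toy under stated assumptions: the observed intensity is taken to be a linear combination of the two component profiles, the errors are Gaussian with known standard deviation, and the prior on the fraction is flat. The function name `mixture_posterior`, the Guinier-like profiles and all numerical values are hypothetical and synthetic.

```python
import numpy as np

def mixture_posterior(profiles, observed, sigma, n_grid=1001):
    """Grid posterior over the fraction w of component 0 in a two-state mixture.

    Assumes I_obs(q) = w * I_0(q) + (1 - w) * I_1(q) + Gaussian noise,
    with a flat prior on w in [0, 1]. Purely illustrative.
    """
    w = np.linspace(0.0, 1.0, n_grid)
    # Predicted curve for every candidate fraction: shape (n_grid, n_q)
    predicted = w[:, None] * profiles[0] + (1.0 - w[:, None]) * profiles[1]
    chi2 = np.sum(((predicted - observed) / sigma) ** 2, axis=1)
    log_post = -0.5 * chi2        # flat prior: posterior proportional to likelihood
    log_post -= log_post.max()    # shift for numerical stability before exponentiating
    post = np.exp(log_post)
    post /= post.sum()            # normalise over the grid
    return w, post

# Synthetic stand-ins for model-derived profiles (Guinier-like decays,
# I(q) = I0 * exp(-(q * Rg)**2 / 3), with made-up Rg and I0 values).
q = np.linspace(0.01, 0.3, 60)
state_a = 1.0 * np.exp(-(q * 15.0) ** 2 / 3.0)
state_b = 2.0 * np.exp(-(q * 25.0) ** 2 / 3.0)

# Synthetic "measurement": 70% state A, 30% state B, plus Gaussian noise.
true_w = 0.7
sigma = 0.01
rng = np.random.default_rng(0)
observed = true_w * state_a + (1.0 - true_w) * state_b + rng.normal(0.0, sigma, q.size)

w, post = mixture_posterior(np.stack([state_a, state_b]), observed, sigma)
w_mean = float(np.sum(w * post))
w_std = float(np.sqrt(np.sum(post * (w - w_mean) ** 2)))
print(f"posterior mean fraction of state A = {w_mean:.3f} +/- {w_std:.3f}")
```

The posterior width, not just the best-fit value, is the point of the exercise: it indicates how strongly the low-resolution data constrain the populations given the candidate models. A realistic analysis would involve many candidate structures, profiles computed from atomic models, and a more careful treatment of priors and experimental errors.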