Skip to content

Latest commit

 

History

History
695 lines (513 loc) · 24.8 KB

scattering.rst

File metadata and controls

695 lines (513 loc) · 24.8 KB

Scattering

On this page:

X-ray form factors

The X-ray scattering contribution from an isolated atom is described by the atomic form factor. Nowadays, two form factor parametrizations are commonly used: one from the International Tables of Crystallography Vol. C (4 Gaussians + constant factor), and the other from Waasmaier & Kirfel (1995), Acta Cryst. A51, 416 (5 Gaussians). Other parametrizations exist (for example, with only two Gaussians), but are not widely used.

Currently, Gemmi includes only the ITC parametrization.

In C++, the form factor coefficients are listed in the :file:`it92.hpp` header. In Python, these coefficients can be accessed as a property of an element (only coefficients for neutral atoms):

>>> gemmi.Element('Fe').it92  #doctest: +ELLIPSIS
<gemmi.IT92Coef object at 0x...>
>>> gemmi.Element('Es').it92  # -> None (no parametrization for Einsteinium)

or by using the function IT92_get_exact() that takes an element and a charge as arguments and returns None if this exact atom or ion is absent in the table.

The charges listed in a coordinates file might not be reliable for determining the scattering of an atom, so by default, we use coefficients for neutral atoms regardless of the formal charge. To use coefficients for ions, you first need to change IT92::ignore_charge:

>>> gemmi.IT92_set_ignore_charge(False)
>>> gemmi.IT92_get_exact(gemmi.Element('Mg'), +2)  # for Mg2+ #doctest: +ELLIPSIS
<gemmi.IT92Coef object at 0x...>

You can get the coefficients as numbers:

>>> fe_coef = gemmi.Element('Fe').it92
>>> fe_coef.a
[11.7695, 7.3573, 3.5222, 2.3045]
>>> fe_coef.b
[4.7611, 0.3072, 15.3535, 76.8805]
>>> fe_coef.c
1.0369

The a's and c should sum up to the number of electrons (Z - charge), but the numbers from the Tables may differ slightly:

>>> gemmi.Element('Fe').atomic_number
26
>>> sum(fe_coef.a) + fe_coef.c  # doctest: +ELLIPSIS
25.9904...

Some programs (Refmac) normalize the coefficients, which means multiplying them by a factor between 0.99995 and 1.00113.

>>> # let's store the original values first
>>> orig_coefs = {i : gemmi.Element(i).it92.get_coefs() for i in range(1, 99)}
>>> # now we normalize the values
>>> gemmi.IT92_normalize()
>>> # and we can see that the values have changed
>>> fe_coef.a  # doctest: +ELLIPSIS
[11.7738..., 7.3600..., 3.5235..., 2.30535...]

Now let's use the function set_coefs() to change the coefficients back to the original values (for neutral atoms):

>>> for i in range(1, 99):
...     gemmi.Element(i).it92.set_coefs(orig_coefs[i])

Macromolecular models may have unknown atoms with the element specified as X. By default, we use oxygen's coefficients for X, but you may change it (as well as the coefficients of any other atom):

>>> c_coefs = gemmi.Element('C').it92.get_coefs()
>>> gemmi.Element('X').it92.set_coefs(c_coefs)

The coefficients can be used to directly calculate the sum of Gaussians -- the structure factor contribution:

>>> fe_coef.calculate_sf(stol2=0.4)  # argument: (sin(theta)/lambda)^2
9.303603172302246

The large number of reflections in macromolecular crystallography makes direct calculation of structure factors inefficient. Instead, we can calculate electron density on a grid and use FFT to obtain structure factors. In the simplest case, the atom's contribution to the electron density at a grid point can be calculated as:

>>> # arguments are distance^2 and isotropic ADP
>>> fe_coef.calculate_density_iso(r2=2.3, B=50)
0.5279340744018555

The C++ interface provides more functions to calculate the electron density. We have separate functions to work with isotropic and anisotropic ADPs. For efficiency, the computations are split into two stages: the first one is performed once per atom, and the second one for each nearby grid point.

In the usual scenario, we add f' (the real component of anomalous scattering -- see the next section) to the constant coefficient c.

Anomalous scattering

The anomalous dispersion is wavelength-dependent. Gemmi provides the function cromer_liberman, which calculates the real and imaginary components f' and f" for isolated atoms from Z=3 to Z=92.

As the name suggests, this function uses the Cromer-Liberman algorithm. This algorithm, as noted on the pyFprime website, "fails in computing f' for wavelengths < 0.16 Å (> 77.48 keV) for the heaviest elements (Au-Cf) and fails to correctly compute f', f" and μ for wavelengths > 2.67 Å (< 4.64 keV) for very heavy elements (Am-Cf)."

The implementation is contained in a single C++ header file :file:`fprime.hpp` with no dependencies. All the data is embedded in the code. The binary size after compilation is about 100kB.

Reportedly, the data tables synthesized by C.T. Chantler are more accurate. Consider using them instead. They are available from the NIST website and the XrayDB project.

The code in :file:`fprime.hpp` is based on the X-ray spectroscopy project Larch and should give the same results as the f1f2_cl function there. The Fortran code in Larch is, in turn, based on the Brennan and Cowan routines. Which, in turn, were based on the original program from Don Cromer. Along the way, the code was extensively modified. Importantly, the Jensen correction has been removed (as is recommended in the chapter 4.2.6 of ITvC) and the Kissel and Pratt (1990) correction has been added. Therefore, it gives different results than the crossec program, which was contributed to CCP4 directly by Don Cromer in the 1990's.

The cromer_liberman function is available in both C++ and Python:

>>> gemmi.Element('Se').atomic_number
34
>>> gemmi.cromer_liberman(z=_, energy=10332.0) # energy in eV  #doctest: +ELLIPSIS
(-1.41862..., 0.72389...)
>>> # use gemmi.hc to convert wavelength [A] to energy [eV]
>>> gemmi.cromer_liberman(z=34, energy=gemmi.hc/0.71073)  #doctest: +ELLIPSIS
(-0.09201..., 2.23336...)

The same values can be printed from the command line program :ref:`gemmi-fprime <fprime>`.

Electron form factors

The electron form factors are parametrized as 5 Gaussians, using coefficients from International Tables for Crystallography Volume C, edition 2011, table 4.3.2.2 (pp. 282-283), "Elastic atomic scattering factors of electrons for neutral atoms and s up to 2.0 A--1". The same parametrization is used in cctbx and CCP4.

In gemmi, we refer to these coefficients as C4322 (after the volume and table number in the International Tables). For C++, the form factor coefficients are tabulated in the :file:`c4322.hpp` header. In Python, they can be accessed as a property of an element:

>>> fe_coef = gemmi.Element('Fe').c4322
>>> fe_coef.a
[0.3946, 1.2725, 1.7031, 2.314, 1.4795]
>>> fe_coef.b
[0.2717, 2.0443, 7.6007, 29.9714, 86.2265]

Similarly to the X-ray coefficients, they can be used to calculate structure factors and density:

>>> fe_coef.calculate_sf(stol2=0.4)  # argument: (sin(theta)/lambda)^2
0.9971507787704468
>>> fe_coef.calculate_density_iso(r2=2.3, B=50)
0.13794779777526855

Unlike for X-ray form factors, we do not add anomalous scattering here.

Neutron scattering

For neutrons, we use bound coherent scattering lengths from NCNR, which are based on Neutron News, Vol. 3, No. 3, 1992.

In C++, this data is in the :file:`neutron92.hpp` header. In Python, it can be accessed as a property of an element:

>>> h_neut = gemmi.Element('H').neutron92
>>> h_neut.get_coefs()  # bound coherent scat. length in fm (10^-15 m)
[-3.739]
>>> h_neut.calculate_sf(0)  # the argument is ignored
-3.739
>>> h_neut.calculate_density_iso(r2=1.3, B=25)
-0.17104358254308388

We don't store data for individual isotopes, except for deuterium.

>>> gemmi.Element('D').neutron92.get_coefs()
[6.671]

Direct summation

Gemmi has functions to calculate structure factor by direct summation of structure factors from individual atoms. This route is not commonly used in macromolecular crystallography and it was implemented primarily to check the accuracy of FFT-based computations. Therefore, the efficiency was not a priority here. In particular, space group specific optimizations, described by Bourhis et al (2014), are not included.

In Python classes StructureFactorCalculatorX and StructureFactorCalculatorE perform direct summation using X-ray and electron form factors, respectively. Class StructureFactorCalculatorN does calculations for neutrons. The C++ interface is similar, although it uses a single templated class StructureFactorCalculator. These classes are constructed with UnitCell as a parameter (we don't pass SpaceGroup because UnitCell already contains a list of symmetry operations).

>>> st = gemmi.read_structure('../tests/4oz7.pdb')
>>> calc_e = gemmi.StructureFactorCalculatorE(st.cell)

Now we can compute structure factors from Model for any (hkl):

>>> calc_e.calculate_sf_from_model(st[0], (3,4,5))  #doctest: +ELLIPSIS
(54.50873...+53.39498...j)

Similarly, for small molecules:

>>> small = gemmi.read_small_structure('../tests/1011031.cif')
>>> small.change_occupancies_to_crystallographic()
>>> calc_x = gemmi.StructureFactorCalculatorX(small.cell)
>>> calc_x.calculate_sf_from_small_structure(small, (0,2,4))
(17.849694728851315-6.557871454633539e-15j)

For each atom, the Debye-Waller factor (used in the structure factor calculation) is obtained using either isotropic or anisotropic ADPs (B-factors). If anisotropic ADPs are non-zero, isotropic ADP is ignored.

Addends

StructureFactorCalculator also contains addends -- per-element values that are to be added to the form factor coefficient c (which is the same as adding it to the total atomic contribution to the structure factor):

>>> calc_x = gemmi.StructureFactorCalculatorX(st.cell)
>>> calc_x.addends  #doctest: +ELLIPSIS
<gemmi.Addends object at 0x...>

When calculating X-ray structure factors, one may want to include f' (real part of the :ref:`anomalous scattering <anomalous>`). To do this, f' needs to be set for each element present in the system. This is done with the function set_addend(), which sets an angle-independent value that will be added to the value calculated from the form factors.

>>> energy = gemmi.hc / 0.8  # for wavelength 0.8A
>>> for symbol in ['C', 'N', 'O', 'S', 'Cu']:
...     el = gemmi.Element(symbol)
...     fp, _ = gemmi.cromer_liberman(z=el.atomic_number, energy=energy)
...     calc_x.addends.set(el, fp)
...
>>> calc_x.addends.get(gemmi.Element('Cu'))
0.265503853559494

Alternatively, you could call a single function (here we first need to remove the previous values, so it makes two functions):

>>> calc_x.addends.clear()
>>> calc_x.addends.add_cl_fprime(energy)

which calculates f' for all elements handled by the Cromer-Liberman algorithm (Z from 3 to 92). Although it seems wasteful, it takes well below 1ms.

Structure factors calculated at this point incorporate the addends:

>>> calc_x.calculate_sf_from_model(st[0], (3,4,5))  #doctest: +ELLIPSIS
(182.3655...+269.0002...j)

Addends can also be used to calculate electron scattering from X-ray form factors according to the Mott–Bethe formula, as shown in a :ref:`separate section <mott_bethe>`.

Density for FFT

To use FFT to calculate structure factors, we first need to calculate the density of the scatterer (e.g. electron density for X-rays) on a grid. For this we use:

  • in C++ -- class DensityCalculator templated with a form factor table,
  • in Python -- classes DensityCalculatorX (corresponding to X-ray form factors), DensityCalculatorE (electron form factors) and DensityCalculatorN (neutrons).

DensityCalculator contains a grid. The size of the grid is determined from two parameters that we need to set: d_min which corresponds to our resolution limit, and rate -- the oversampling rate (1.5 by default).

>>> dencalc = gemmi.DensityCalculatorE()
>>> dencalc.d_min = 2.5 # 2.5A
>>> dencalc.rate = 1.5  # we could skip it, this is the default value

These two parameters are only used to determine the spacing of the grid. In this case, about 2.5Å / (2 * 1.5) = 0.83Å.

As with StructureFactorCalculator, here we also have :ref:`addends <addends>`:

>>> dencalc.addends  #doctest: +ELLIPSIS
<gemmi.Addends object at 0x...>

To create a grid and calculate the density we use two function calls:

>>> dencalc.grid.setup_from(st)
>>> dencalc.put_model_density_on_grid(st[0])

Calling put_model_density_on_grid is equivalent to these three functions:

>>> dencalc.initialize_grid()
>>> dencalc.add_model_density_to_grid(st[0])
>>> dencalc.grid.symmetrize_sum()

initialize_grid(), in this case, uses d_min and rate to determine the required grid spacing and uses this spacing to setup the grid.

If d_min is not set but the grid size is already set, initialize_grid() zeros the grid values without changing its size.

At this point, we have a grid with density:

>>> dencalc.grid
<gemmi.FloatGrid(48, 48, 50)>

which can be transformed (as described :ref:`above <fft>`) into a structure factor grid:

>>> sf_grid = gemmi.transform_map_to_f_phi(dencalc.grid)
>>> sf_grid
<gemmi.ReciprocalComplexGrid(48, 48, 50)>
>>> sf_grid.get_value(3, 4, 5)  #doctest: +ELLIPSIS
(54.5276...+53.4189...j)

In addition to d_min and rate, which govern the grid density, DensityCalculator has two more parameters that affect the accuracy of the calculated structure factors:

  • cutoff (default: 1e-5) -- the density cut-off in the same unit as the map. It determines the atomic radius r within which the density is calculated (density at distance r will be approximately equal to cutoff). A smaller cutoff gives more accurate but slower calculations.
  • blur (default: 0) -- Gaussian dampening (blurring) factor, an artificial temperature factor Bextra added to all atomic B-factors (the structure factors must be corrected later to cancel it out).

Choosing these parameters is a trade-off between efficiency and accuracy. Bextra is the most interesting one. It is discussed in the ITfC vol B, section 1.3.4.4.5 by G. Bricogne, and further in papers by J. Navaza (2002) and P. Afonine and A. Urzhumtsev (2003), but no formula for the optimal value exists. The value of Bextra that gives the most accurate results depends on resolution, oversampling, atomic radius cut-off, and the distribution of B-factors (particularly the minimal B-factor in the model). Additionally, increasing the dampening slows down computations because it increases the atomic radius.

Bextra can be set explicitly (it can even be negative):

>>> dencalc.blur = 10

or using the formula from Refmac (a function of the grid spacing and Bmin). If the formula results in a negative number, Refmac sets Bextrato 0, while Servalcat uses the negative value.

>>> dencalc.set_refmac_compatible_blur(st[0], allow_negative=False)
>>> dencalc.blur
31.01648695048263

The :ref:`sfcalc <sfcalc>` program can be used to test different choices of Bextra.

If the density was blurred, it needs to accounted for by either adding the unblur option to prepare_asu_data():

>>> asu_data = sf_grid.prepare_asu_data(dmin=dencalc.d_min, unblur=dencalc.blur)

or by multiplying individual structure factors by dencalc.reciprocal_space_multiplier(inv_d2).

Mott-Bethe formula

(This is a niche topic that most readers should skip.)

Electron scattering can be computed using X-ray form factors. To do this, we use fx--Z as form factors and multiply the result by a factor derived from the Mott-Bethe formula. This factor includes d 2 (the squared length of the scattering vector) and is negated because we use fx--Z instead of Z--fx.

>>> calc_x = gemmi.StructureFactorCalculatorX(st.cell)
>>> calc_x.addends.subtract_z()
>>> # this also stores d^2 in calc_x, which is then used in mott_bethe_factor()
>>> sf = calc_x.calculate_sf_from_model(st[0], (3,4,5))
>>> calc_x.mott_bethe_factor()  #doctest: +ELLIPSIS
-2.95382566...
>>> calc_x.mott_bethe_factor() * sf  #doctest: +ELLIPSIS
(54.065...+52.968...j)

We can use the same trick when calculating structure factors through FFT:

>>> dc = gemmi.DensityCalculatorX()
>>> dc.d_min = 2.5
>>> dc.addends.subtract_z()
>>> dc.grid.setup_from(st)
>>> dc.set_refmac_compatible_blur(st[0])
>>> dc.put_model_density_on_grid(st[0])
>>> grid = gemmi.transform_map_to_f_phi(dc.grid)

Then we multiply it by –1/(2π2a0d 2) to get fe.

We either multiply individual values by mott_bethe_factor() (which includes reciprocal_space_multiplier()):

>>> dc.mott_bethe_factor([3,4,5]) * grid.get_value(3,4,5)  #doctest: +ELLIPSIS
(54.063...+52.970...j)

or we call prepare_asu_data() with mott_bethe=True:

>>> asu_data = grid.prepare_asu_data(dmin=2.5, mott_bethe=True, unblur=dencalc.blur)
>>> asu_data.value_array[numpy.all(asu_data.miller_array == [3,4,5], axis=1)]  #doctest: +ELLIPSIS
array([54.063...+52.970...j], dtype=complex64)

That is all. If you want to separate the positions of hydrogen nuclei and electron clouds, first use a model in which hydrogen coordinates point to electron clouds. Call subtract_z() as:

>>> dc.addends.subtract_z(except_hydrogen=True)

Then adjust the hydrogen coordinates to proton positions and subtract Z=1 by adding c=-1:

>>> for cra in st[0].all():
...     if cra.atom.is_hydrogen():
...         dc.add_c_contribution_to_grid(cra.atom, -1)

Scaling and bulk solvent correction

Anisotropic scaling of calculated structure factors F and bulk solvent correction are usually handled together because the bulk solvent parameters are optimized along with the scaling parameters. In Gemmi, both are implemented in one class: Scaling.

The bulk solvent correction is optional and used only with crystallographic data. The solvent (not modeled as individual atoms) occupies a significant volume of a macromolecular crystal, so accounting for it makes a difference. If we have a model with atomic coordinates, we can create a mask, as described in the section about :ref:`solvent masking <solventmask>`, to model bulk solvent as a flat scatterer. Then, we Fourier-transform the mask and obtain Fmask.

The solvent correction is parametrized as:

Fsol = ksol exp(–Bsol s2/4) Fmask

ksol and Bsol are refined together with the overall anisotropic scaling parameters, kov and Bov:

Ftot = kov exp(–sT Bov s / 4) (Fcryst + Fsol)

Bov is a crystallographic anisotropic tensor that obeys the symmetry; effectively, from a single fittable parameter for a cubic crystal to 6 parameters for a primitive crystal. All the parameters (kov, Bov, ksol and Bsol), or only selected ones, are optimized to make the total calculated amplitude Ftot = |Ftot| match the diffraction data Fobs. But according to what criterion?

First, there is least-squares scaling, which minimizes ∑(FtotFobs)2. While this target function has fallen out of favor in macromolecular crystallography (because the errors are not really Gaussian-distributed), it is still routinely used for the calculation of statistics such as R-factors.

The workhorse algorithm for non-linear least squares optimization is the Levenberg-Marquardt method. Unlike other popular algorithms, such as BFGS and conjugate gradient methods, L-M works only for least squares. This is because while the other methods estimate the Hessian from the result of the previous optimization step (BFGS) or ignore the second derivatives (CG), L-M approximates the Hessian as a squared Jacobian (JT J), which holds only for quadratic forms. However, for these forms L-M tends to converge faster than the other methods.

Here is a full example. We read Fobs from an mtz file, then we read a corresponding model and use it to calculate Fcryst and Fmask. These functions were described in the previous sections. Then we instantiate the Scaling class, pass the data points to it, and finally we perform the least squares scaling.

Note: functions in class Scaling are likely to change soon.

>>> # read Fobs
>>> mtz = gemmi.read_mtz_file('../tests/5e5z.mtz')
>>> mtz.update_reso()
>>> f_obs = mtz.get_value_sigma('FP', 'SIGFP')

>>> # calculate Fcryst
>>> st = gemmi.read_structure('../tests/5e5z.pdb')
>>> d_min = mtz.resolution_high() - 1e-9
>>> dc = gemmi.DensityCalculatorX()
>>> dc.d_min = d_min
>>> dc.set_refmac_compatible_blur(st[0])
>>> dc.grid.setup_from(st)
>>> dc.put_model_density_on_grid(st[0])
>>> grid = gemmi.transform_map_to_f_phi(dc.grid, half_l=True)
>>> f_cryst = grid.prepare_asu_data(dmin=d_min, unblur=dc.blur)

>>> # calculate Fmask
>>> mask_grid = gemmi.FloatGrid()
>>> mask_grid.setup_from(st, spacing=min(0.6, d_min / 2 - 1e-9))
>>> masker = gemmi.SolventMasker(gemmi.AtomicRadiiSet.Refmac)
>>> masker.put_mask_on_float_grid(mask_grid, st[0])
>>> fmask_gr = gemmi.transform_map_to_f_phi(mask_grid, half_l=True)
>>> f_mask = fmask_gr.prepare_asu_data(dmin=d_min)

>>> # now we can start with Scaling
>>> scaling = gemmi.Scaling(st.cell, st.find_spacegroup())
>>> scaling.use_solvent = True
>>> scaling.prepare_points(f_cryst, f_obs, f_mask)
>>> scaling.fit_isotropic_b_approximately()
>>> wssr = scaling.fit_parameters()
>>> print(f'RMSE: {(wssr / len(f_obs)) ** 0.5:.4f}')
RMSE: 6.4176
>>> print(f'R-factor: {scaling.calculate_r_factor():.2%}')
R-factor: 17.73%

>>> # get scaled Ftotal
>>> f_tot = f_cryst.copy()
>>> scaling.scale_data(f_tot, f_mask)

Least squares are sensitive to outliers. To make the scaling less sensitive, we must change the target function. The absolute differences in R-factor are less affected by outliers than the squared differences. If we choose a target that's more similar to the R-factor, we'll get more robust scaling, and also a lower R-factor.

This requires a different optimization algorithm. Actually, it's also possible to robustify least squares and keep using (modified) Levenberg-Marquardt, as was reviewed for the bundle adjustment procedure, but I haven't found any research indicating these methods are preferable to simply using another algorithm.

TBC