Skip to content

Commit

Permalink
Stacking (GrossfieldLab#57)
Browse files Browse the repository at this point in the history
* preliminary commit of a simple stacking store -- totally untested

* it does help to do principle axes of both groups... idiot

* switch to logistic function

* program to test the stacking (also does packing score as a control)

* ugh, I had the logistic implemented backward.

* make switch distance of logistic user-settable

* finish making distance user-settable, and update comment

* optional options

* removed commented-out code

* preliminary version -- adapted to work on trajectories, not tested

* added program to do stacking analysis of a trajectory

* interim: norm by a-form, something's wrong

* reset default box per traj

* merge from main

* messed up merge

* add description of --normfile option and install the example norm file
  • Loading branch information
agrossfield authored Jun 15, 2021
1 parent b3297d7 commit 11c54b1
Show file tree
Hide file tree
Showing 8 changed files with 229 additions and 5 deletions.
3 changes: 0 additions & 3 deletions ChangeLog
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,6 @@
2021-05-05 Alan Grossfield <alan>
* fixed bug in reading TinkerArc files with periodic boxes

2021-04-26 Alan Grossfield <alan>
* add new tool contact_distance (like rmsds, but in residue contact space)

2021-04-08 Alan Grossfield <alan>
* add frameBoundaries to VirtualTrajectory

Expand Down
2 changes: 1 addition & 1 deletion Packages/PyLOOS/SConscript
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ apps += 'simple_model_transform all_contacts hierarchical-cluster '
apps += 'simple_traj_calc axis_with_membrane inside_helices '
apps += 'simple_traj_transform center_molecule native-hbs total_charge '
apps += 'cluster-structures packing_score_per_res cylindrical-density '
apps += 'protein_tilt scattering contact_distance '
apps += 'protein_tilt scattering all_stacking contact_distance'

# Moved libraries out of Packages/PyLOOS to loos/pyloos
#files = 'ConvexHull NAMDBin'
Expand Down
167 changes: 167 additions & 0 deletions Packages/PyLOOS/all_stacking.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
#!/usr/bin/env python3

"""
Track a base stacking through a trajectory. Intended for use with
nucleic acids, to identify base stacking.
"""

"""
This file is part of LOOS.
LOOS (Lightweight Object-Oriented Structure library)
Copyright (c) 2021 Alan Grossfield, Grossfield Lab
Department of Biochemistry and Biophysics
School of Medicine & Dentistry, University of Rochester
This package (LOOS) is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation under version 3 of the License.
This package is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
"""
import sys
import loos
import loos.pyloos
import numpy as np
import argparse
import json


def fullhelp():
print("""
all_stacking.py tracks the stacking of a set of cylindrical objects. In
principle, these could be anything, but the program is designed to work
with nucleic acid bases, where it will naturally track the average
stacking score for each base in the selection with all others.
Usage: all_stacking.py [OPTIONS] system_file selection_string outfile traj1 [traj2 ...]
-- system_file is a PDB, PSF, prmtop, etc
-- selection_string identifies the atoms to be analyzed. They will be
split by residue number to identify the individual sets to be
evaluated.
-- outfile will be a numpy-style text-file containing the results. It
will be a square matrix (num_res x num_res, where num_res is the
number of residues in the selection)
-- traj1 is a trajectory file, eg DCD, xtc. Its contents must precisely
match the system file. Optionally, you can include more than 1
trajectory file
Options
--skip: # of residues to exclude from the front of each trajectory
--stride: how to step through the trajectory (eg --stride 10 will read
every 10th frame)
--fullhelp: print this message
--base_carbons: apply an additional selection string to the one given
on the command line, which will exclude the backbone and only
include carbons atoms. This is for convenience only, since you
could do this yourself with the command line selection. We exclude
hydrogens because for RNA the base hydrogens are generally coplanar
with the carbons, and we exclude the nitrogens because in some
bases they're out of the plane.
--normfile: attempt to normalize the stacking relative to that present
for A-form RNA helices.
An example input for the --normfile option is stack_vals.json, found
in the share/ directory of the LOOS distribution, and installed in
$CONDA_PREFIX/share/loos/stack_vals.json if you installed LOOS into
a conda environment. These values were determined from a structure built as a perfect A-form RNA helix using NAB from AmberTools. The file
is basically a dictionary with 2 dash-separated residue names as the key and the stacking value as the value.
""")


class FullHelp(argparse.Action):
def __init__(self, option_strings, dest, nargs=None, **kwargs):
kwargs['nargs'] = 0
super(FullHelp, self).__init__(option_strings, dest, **kwargs)

def __call__(self, parser, namespace, values, option_string=None):
fullhelp()
parser.print_help()
setattr(namespace, self.dest, True)
parser.exit()


parser = argparse.ArgumentParser(description="Track stacking")
parser.add_argument('system_file', help="File describing the system")
parser.add_argument('selection_string',
help="Selection string describing which residues to use")
parser.add_argument('outfile_name',
help="File with the average contact occupancies")
parser.add_argument('traj_files', nargs='+')
parser.add_argument('--skip', type=int, default=0,
help="# of frames to skip")
parser.add_argument('--stride', type=int, default=1,
help="Ready every nth frame")
parser.add_argument('--base_carbons', action='store_true',
help="Just select the carbons in the base")
parser.add_argument('--fullhelp',
help="Print detailed description of all options",
action=FullHelp)
parser.add_argument('--normfile',
help="JSON file containing base normalizers",
default=None)

args = parser.parse_args()

header = " ".join([f"'{i}'" for i in sys.argv])

system = loos.createSystem(args.system_file)
all_trajs = []
for t in args.traj_files:
traj = loos.pyloos.Trajectory(t, system,
skip=args.skip,
stride=args.stride)
all_trajs.append(traj)

selection = loos.selectAtoms(system, args.selection_string)
if args.base_carbons:
selection = loos.selectAtoms(selection, '!backbone && name =~"^C"')

residues = selection.splitByResidue()

scores = np.zeros((len(residues), len(residues)))

# default box in case the system isn't actually periodic

total_frames = 0
for traj in all_trajs:
# Reset box in case some trajectories are periodic and others aren't.
# That would be very strange, but why not be safe.
box = loos.GCoord(1000., 1000., 1000.)
for frame in traj:
if frame.isPeriodic():
box = frame.periodicBox()
for i in range(len(residues)-1):
for j in range(i+1, len(residues)):
p = residues[i].stacking(residues[j], box, 5.0)
scores[i, j] += p
scores[j, i] += p
total_frames += 1
scores /= total_frames

# Normalize, if they asked for it
if args.normfile:
with open(args.normfile) as jsfile:
stack_values = json.load(jsfile)

resnames = []
for r in residues:
resnames.append(r[0].resname())

for i in range(len(scores)-1):
for j in range(i+1, len(scores)):
key = resnames[i] + "-" + resnames[j]
newval = scores[i, j] / stack_values[key]
scores[i, j] = newval
scores[j, i] = newval

np.savetxt(args.outfile_name, scores, header=header)
19 changes: 19 additions & 0 deletions Packages/PyLOOS/try_stacking.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#!/usr/bin/env python3

import loos
import loos.pyloos
import sys

print("#", " ".join(sys.argv))

system = loos.createSystem(sys.argv[1])
traj = loos.pyloos.Trajectory(sys.argv[2], system)

res1 = loos.selectAtoms(system, sys.argv[3])
res2 = loos.selectAtoms(system, sys.argv[4])

for frame in traj:
box = system.periodicBox()
stacking = res1.stacking(res2, box, 5)
packing = res1.packingScore(res2, box, norm=True)
print(traj.index(), stacking, packing)
2 changes: 1 addition & 1 deletion share/SConscript
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ Import('loos')
clone = env.Clone()
clone.Prepend(LIBS = [loos])

files = 'suitename_definitions.dat'
files = 'suitename_definitions.dat stack_vals.json'
PREFIX = env['PREFIX']

if env.USING_CONDA:
Expand Down
1 change: 1 addition & 0 deletions share/stack_vals.json
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"A-A": 0.5470554662549612, "A-U": 0.7722368874390322, "U-A": 0.27274844191431374, "A-C": 0.7657718319442695, "C-A": 0.2777381562201163, "A-G": 0.5422063399652962, "G-U": 0.7762320695815688, "U-U": 0.5164280783727266, "U-C": 0.5082728495378227, "C-U": 0.5229066443373507, "U-G": 0.2686105784227347, "G-C": 0.7698004475247754, "C-G": 0.27364915269837153, "C-C": 0.5147964714538514, "G-A": 0.5523359629418022, "G-G": 0.5477505960608738}
34 changes: 34 additions & 0 deletions src/AG_numerical.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -459,6 +459,39 @@ namespace loos {
}
}

greal AtomicGroup::stacking(const AtomicGroup& other,
const GCoord& box,
const double threshold=5.0) const {
GCoord c1 = centroid();
GCoord c2 = other.centroid();

GCoord dx = c2 - c1;
dx.reimage(box);
greal dx2 = dx.length2();
dx /= dx.length();

std::vector<GCoord> axes1 = principalAxes();
GCoord n1 = axes1[2];
std::vector<GCoord> axes2 = other.principalAxes();
GCoord n2 = axes2[2];

greal dot = n1*n2;
GCoord ave = 0.0;
if (dot < 0) {
ave = 0.5*(n2 - n1);
}
else {
ave = 0.5*(n2 + n1);
}

greal threshold2 = threshold * threshold;
greal mult = dx2/threshold2;
greal denom = 1 + mult*mult*mult;

greal val = dot * dot * (ave*dx) / denom;
return fabs(val);
}

std::vector<double> AtomicGroup::scattering(const double qmin, const double qmax,
const uint numValues,
loos::FormFactorSet &formFactors) {
Expand Down Expand Up @@ -513,4 +546,5 @@ namespace loos {
return values;
}


}
6 changes: 6 additions & 0 deletions src/AtomicGroup.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -611,6 +611,12 @@ namespace loos {
greal sphericalVariance(const pAtom) const;
greal sphericalVariance(const GCoord) const;

//! Estimate stacking, as between two nucleobases
/** Algorithm: n1=normal to self; n2=normal to other, dx = difference between
centroids
* stacking = (n1*n2)^2 *[(n1 + n2)/2 * dx]/|dx| * 1/1 + (dx/threshold)^6
*/
greal stacking(const AtomicGroup&, const GCoord& box, const double threshold) const;

//! Compute the RMSD between two groups
/** Assumes a 1:1 correspondence between ith atoms.
Expand Down

0 comments on commit 11c54b1

Please sign in to comment.