Extreme efficiency enhancements #151
Draft
rwsmith7531 wants to merge 97 commits into MaginnGroup:master from rwsmith7531:extreme_efficiency_enhancements
Previously, the CBMC Fragment_Placement subroutine would sometimes choose a dihedral trial that had an overlap, even though its weight was zero. This was fixed by changing the "<=" operator to "<" and flagging cbmc_overlap if none of the trials is picked.
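The effect of the operator change can be illustrated with a minimal Python sketch (Cassandra itself is Fortran; pick_trial and its arguments are hypothetical names, not Cassandra's API). It assumes a trial is selected by comparing a uniform random number in [0, 1) against normalized cumulative weights, with overlapping trials carrying weight zero.

```python
def pick_trial(weights, rand, strict=True):
    """Pick a trial index by comparing rand (uniform in [0, 1)) with normalized cumulative weights.

    strict=True uses "<", so a zero-weight trial can never be chosen: its cumulative
    value equals the previous trial's, and any rand below that already picked an
    earlier trial. strict=False reproduces the old "<=" behavior.
    Returns None when no trial is picked, which the caller treats as an overlap.
    """
    total = sum(weights)
    if total <= 0.0:
        return None  # every trial overlaps (all weights zero)
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w / total
        if (rand < cumulative) if strict else (rand <= cumulative):
            return i
    return None  # nothing picked (e.g. rounding left the last cumulative value below rand)


# With rand == 0.0, the old "<=" test selects the zero-weight (overlapping) first trial.
print(pick_trial([0.0, 1.0], 0.0, strict=False))  # 0
print(pick_trial([0.0, 1.0], 0.0))                # 1
```

Returning None instead of silently falling back to the last trial mirrors the described fix of flagging cbmc_overlap when nothing is picked.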
…enerated with the bug.
This results in the creation of an rminsq file for each tolerance. It also outputs information about the maximum individual widom_var for insertions that would have been excluded if the rminsq table had been used.
Combine xtc reading capability with atompair rminsq table feature.
…he stack or the heap.
…rgy_table usage in documentation.
This updates the Makefiles and improves the linking of the xtc reader libraries. It also adds more tests.
…or list. Members of gathered overlap cells and cell neighbor lists are now filtered by proximity. The CBMC cell list option would now be more appropriately called a cell neighbor list method, since the possible neighbors for a cell are now gathered and filtered by proximity. CBMC cells are now the same size as overlap cells; the gathering algorithm just searches more cells to capture all possible neighbors. Trial insertions of the first fragment in CBMC are now greatly vectorized. CBMC dihedral trials are not vectorized yet, but applying vectorization and bitcell overlap detection to them should be fairly straightforward. Dimension padding currently assumes a vector size no greater than 256 bits (the size of AVX2 vector registers). Supporting AVX-512 in Cassandra would require changes, since it would violate the alignment assumptions made in some ifort compiler directives. While intermolecular CBMC energy estimation is vectorized when used with CBMC cell neighbor lists, it can sometimes still be slightly slower than computing the energy directly, most likely due to slower memory access for the very large precomputed energy table. I still left it as an option, because it may be faster for more expensive force fields. Some cheap WRITE statements used for debugging are still present in the code and should probably be removed to avoid excessive verbosity, especially to STDOUT. Repeating an old simulation (from before this commit) with the same seeds and simulation options will not give identical results, even with a single thread, because CBMC insertion trial positions are now calculated from the random numbers differently than before; for example, rranf() - 0.5 is now used instead of 0.5 - rranf() as a fractional COM coordinate.
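A proximity-filtered cell neighbor list can be sketched in Python as follows, assuming an orthogonal periodic box divided into n_cells³ cubic cells. Cassandra's own data layout is not shown in this PR, and all names here are hypothetical. Offsets are searched out to ceil(rcut / cell_len) cells, then kept only if the closest points of the two cells lie within the cutoff; the same filtered offsets then apply to every cell.

```python
import itertools
import math


def neighbor_offsets(cell_len, rcut):
    """Cell offsets whose closest points lie within rcut of the home cell."""
    reach = math.ceil(rcut / cell_len)  # how many cells to search in each direction
    offsets = []
    for d in itertools.product(range(-reach, reach + 1), repeat=3):
        # Gap between the two cells along each axis (zero for touching cells).
        gap_sq = sum((max(abs(di) - 1, 0) * cell_len) ** 2 for di in d)
        if gap_sq < rcut * rcut:
            offsets.append(d)
    return offsets


def cell_neighbor_list(n_cells, cell_len, rcut):
    """Map each cell (ix, iy, iz) of a periodic cubic grid to its filtered neighbor cells."""
    offsets = neighbor_offsets(cell_len, rcut)
    neighbors = {}
    for cell in itertools.product(range(n_cells), repeat=3):
        # A set avoids duplicates when offsets wrap around a small periodic grid.
        found = {tuple((c + o) % n_cells for c, o in zip(cell, off)) for off in offsets}
        neighbors[cell] = sorted(found)
    return neighbors
```

For cell_len = 0.4 and rcut = 1.0, the search cube spans 7³ = 343 cells, of which 275 pass the proximity filter; an unfiltered cube search would visit the other 68 for nothing.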
Restricted insertion trial coordinates are now generated within the inner volume on the first attempt, rather than being generated anywhere in the box and regenerated within the inner volume if they fall outside it, as was done previously; this process is now vectorized. Widom insertions will never be restricted, even if the inserted species is designated with restricted GCMC insertions. It's likely this was never a problem for anyone, but this fix should ensure it won't become one. If restricted Widom insertions are allowed in the future, additional changes will be needed for them to be done properly.
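The difference between the two approaches can be sketched in Python, assuming (purely for illustration) a spherical inner volume; the actual restricted-insertion shapes and routine names in Cassandra are not shown in this PR. The direct function produces a uniformly distributed point in one pass, while the rejection function mirrors the previous generate-and-retry approach.

```python
import math
import random


def point_in_sphere_direct(center, radius, rng):
    """Uniform point inside a sphere in one pass: no point is ever rejected."""
    r = radius * rng.random() ** (1.0 / 3.0)  # cube root gives uniform density in volume
    cos_t = 2.0 * rng.random() - 1.0           # uniform cos(theta) in [-1, 1)
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    phi = 2.0 * math.pi * rng.random()
    return (center[0] + r * sin_t * math.cos(phi),
            center[1] + r * sin_t * math.sin(phi),
            center[2] + r * cos_t)


def point_in_sphere_rejection(center, radius, box_len, rng):
    """Old-style approach: sample anywhere in a cubic box, resample until inside the sphere."""
    while True:
        p = tuple(box_len * rng.random() for _ in range(3))
        if sum((pi - ci) ** 2 for pi, ci in zip(p, center)) < radius * radius:
            return p


rng = random.Random(12345)
pts = [point_in_sphere_direct((5.0, 5.0, 5.0), 2.0, rng) for _ in range(20000)]
```

For a radius-2 sphere in a 10 x 10 x 10 box, the rejection version accepts only about 3% of its draws, and the number of retries varies from point to point; the direct version uses a fixed number of random numbers per point, which is what makes it straightforward to vectorize.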
… to RB form where possible. All OPLS dihedrals are now internally converted to RB torsions, because RB torsions are much faster to compute. CHARMM-style dihedrals are converted to RB torsions when possible (I don't think I've seen one that can't be converted, but they might exist) and are left in CHARMM style otherwise. All dihedrals in RB form (whether explicitly input or internally converted) that are stacked on the same or reversed 4-atom sequence are collapsed into a single RB torsion by adding together their coefficients. RB torsions are implemented in the protein convention (based on phi) in Cassandra, like the other dihedral types. This differs from GROMACS, which uses the polymer convention (based on psi, which is phi - pi). To convert from one convention to the other (in either direction), flip the sign of the coefficients of the odd-powered terms of the series, since cos(psi) = -cos(phi). I also commented out the code that reads parameters for AMBER-style dihedrals, because Cassandra has no code to compute AMBER-style dihedral energies and does not convert them to another style either. Dihedral styles may now be specified in all caps or all lowercase in the mcf files, to make things more user-friendly. I also renamed get_internal_coords.f90 to internal_coordinate_routines.f90 and made Internal_Coordinate_Routines a module, since the file previously just contained a collection of subroutines not encompassed by a module, one of which is named Get_Internal_Coords.
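These conversions can be checked numerically with a short Python sketch. It assumes the OPLS form a0 + a1(1 + cos phi) + a2(1 - cos 2phi) + a3(1 + cos 3phi) and RB torsions written as the sum of C_n cos^n of the angle; the function names are hypothetical, not Cassandra routines. The expansion uses cos 2phi = 2c² - 1 and cos 3phi = 4c³ - 3c with c = cos phi.

```python
import math


def opls_to_rb(a0, a1, a2, a3):
    """OPLS coefficients -> RB coefficients C0..C3 in the protein (phi) convention."""
    return [a0 + a1 + 2.0 * a2 + a3, a1 - 3.0 * a3, -2.0 * a2, 4.0 * a3]


def switch_convention(coeffs):
    """Protein (phi) <-> polymer (psi = phi - pi): cos^n(psi) = (-1)^n cos^n(phi)."""
    return [c if n % 2 == 0 else -c for n, c in enumerate(coeffs)]


def collapse_stacked(*coeff_lists):
    """Add the coefficients of RB torsions stacked on the same (or reversed) 4-atom sequence."""
    length = max(len(c) for c in coeff_lists)
    return [sum(c[n] if n < len(c) else 0.0 for c in coeff_lists) for n in range(length)]


def rb_energy(coeffs, angle):
    """Evaluate sum_n C_n cos^n(angle)."""
    c = math.cos(angle)
    return sum(cn * c ** n for n, cn in enumerate(coeffs))


def opls_energy(a0, a1, a2, a3, phi):
    return (a0 + a1 * (1 + math.cos(phi)) + a2 * (1 - math.cos(2 * phi))
            + a3 * (1 + math.cos(3 * phi)))
```

Collapsing works for a reversed sequence because a dihedral angle is unchanged when its four atoms are listed in reverse order, so both torsions are evaluated at the same angle.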
…ore vectorization. Also designate several procedures as ELEMENTAL for ease of use and optimization. Add optional argument l_skip_dihed_vec to Compute_Molecule_Dihedral_Energy, which specifies the dihedrals whose energy should not be computed. Allow different species to use different CBMC kappa values. Add a way to specify a minimum ideal_bitcell_length, which overrides the ideal_bitcell_length computed by the default method if it is greater than the computed value, but not if it is smaller, since the computed value is the minimum required for the algorithm to work properly. The user-defined minimum is an option because lowering the resolution of the bitcell grid can be beneficial: the grid occupies less memory, allows faster memory access, and probably has a lower cache miss rate. The bitcell overlap method will then catch fewer overlaps (any true overlaps it misses are caught by cell list overlap detection instead), but checking the bitcells faster can be worth it (tested with min_ideal_bitcell_length = 0.2). New function Excess_Molecule_Intrafragment_Energy was added, and optional excess_flag_o and/or minimg_flag_o arguments were added to a few subroutines. These cause the subroutines to compute instead the "excess" energy (the energy minus what it would be if computed with the minimum image sum style, as during fragment library generation) or the minimum image energy (essentially forcing the subroutine to act as if the sum style were minimum image, even if it isn't), so that only the intramolecular parts needed for Widom insertions are computed. Interfragment intramolecular energy is now optionally output by Build_Molecule as E_interfrag, though the logic in Fragment_Placement causes it to do so only during Widom insertions. That should probably be changed if/when the new intramolecular energy accounting used in Widom insertions is applied to other CBMC moves.
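The override rule for the bitcell length, and why coarsening helps memory use, can be sketched in Python. This assumes one bit per cell on an orthogonal grid covering the box; the real grid layout and routine names are not shown in this PR, so the helpers here are hypothetical.

```python
import math


def effective_bitcell_length(computed, user_min=None):
    """A user minimum can only coarsen the grid, never refine it below the computed value."""
    if user_min is None:
        return computed
    return max(computed, user_min)


def bitcell_grid_bytes(box_lengths, bitcell_length):
    """Bytes needed for one bit per cell on an orthogonal grid covering the box."""
    n_cells = 1
    for length in box_lengths:
        n_cells *= math.ceil(length / bitcell_length)
    return math.ceil(n_cells / 8)


# Doubling the bitcell length cuts memory by roughly a factor of 8 in 3D.
print(bitcell_grid_bytes((30.0, 30.0, 30.0), 0.125))  # 1728000
print(bitcell_grid_bytes((30.0, 30.0, 30.0), 0.25))   # 216000
```

The cubic scaling is what makes a modest increase in the minimum length attractive: the grid shrinks much faster than the fraction of overlaps it can still resolve.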
Widom insertions now include no intramolecular energy except for what is computed by Excess_Molecule_Intrafragment_Energy and the interfragment intramolecular energy, since any remaining parts were already used to generate the fragment libraries and would have to be subtracted back out if included, which is inefficient. This method should be more robust than what was done previously, and should probably be applied to other CBMC moves as well. Use of the undamped shifted force method for Coulombic interactions in CBMC trial energies is only partially implemented.
…ments. Improve vectorization of the reciprocal Ewald energy calculation for Widom insertions. Add vectorized random number generation subroutines, which are used in Build_Molecule. Fix a bug that caused problems when writing fragment mcf files with RB torsions. Change file unit numbers so they are no longer problematic for Widom insertion simulations with more than 10 species. Stop wasting time setting and applying the bitcell overlap mask where mask bits are known to be permanently zero. Allow the user to specify the use of shifted force electrostatics for CBMC trial energy calculation.
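For reference, the commonly used undamped shifted-force Coulomb pair energy is sketched below in Python; it is shifted so that both the energy and the force go to zero at the cutoff rc. Whether Cassandra's CBMC trial energies use exactly this form, and with what unit prefactor k, is an assumption here.

```python
def shifted_force_coulomb(qi, qj, r, rc, k=1.0):
    """Undamped shifted-force Coulomb pair energy.

    E(r) = k qi qj [1/r - 1/rc + (r - rc)/rc^2] for r < rc, else 0.
    The (r - rc)/rc^2 term cancels the force at rc; the -1/rc term cancels the energy.
    """
    if r >= rc:
        return 0.0
    return k * qi * qj * (1.0 / r - 1.0 / rc + (r - rc) / rc ** 2)
```

Because the energy and its derivative are continuous at rc, truncating at the cutoff introduces no impulsive force, which is why this form is a common cheap alternative to Ewald summation for trial energies.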
… and optimize overlap voxel grid setup. Add a minor optimization to vectorized random number generation. Correct stack memory inflation caused by certain array bounds increasing by 8 every Widom insertion frame. Add some code to help visualize bitcell overlap detection masks and grids outside Cassandra; this will need to be removed eventually.
Cavity biasing is implemented. This is the version of Cassandra used for the simulations in Ryan Smith's dissertation chapter 4 and the test particle insertion enhancement paper, unless they are later rerun with a faster version. BOVINE overlap checking code is made more concise with forced inlining. The portion of widom_insert that creates the histogram for optimizing atom ID pair overlap radii is made robust to atomic overlaps going undetected due to floating-point rounding; this hasn't been a problem, but it could have become one without this change. It is also now parallelized with OMP WORKSHARE without leaving and re-entering the parallel region.
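A generic cavity-biased insertion scheme can be sketched in Python as follows. The PR does not describe BOVINE-Cavs's cavity criterion or estimator, so this is standard cavity biasing under stated assumptions, not necessarily Cassandra's method: a cubic periodic box, one overlap radius for all atoms, and hypothetical function names. A voxel is discarded only when every point in it is guaranteed to overlap an atom, so restricting insertions to the remaining voxels and scaling by their volume fraction leaves the Widom average unchanged.

```python
import itertools
import math
import random


def find_cavity_voxels(atoms, box_len, n_vox, r_overlap):
    """Return voxels that are not entirely inside the overlap sphere of any atom.

    A voxel is excluded only if its center distance plus half its diagonal is below
    r_overlap, so every point in an excluded voxel overlaps that atom.
    Cubic periodic box, minimum-image distances, brute force for clarity.
    """
    h = box_len / n_vox
    half_diag = 0.5 * h * math.sqrt(3.0)
    cavities = []
    for idx in itertools.product(range(n_vox), repeat=3):
        center = [(i + 0.5) * h for i in idx]
        covered = False
        for atom in atoms:
            d2 = 0.0
            for c, a in zip(center, atom):
                d = c - a
                d -= box_len * round(d / box_len)  # minimum image
                d2 += d * d
            if math.sqrt(d2) + half_diag < r_overlap:
                covered = True
                break
        if not covered:
            cavities.append(idx)
    return cavities


def sample_cavity_points(cavities, box_len, n_vox, n_points, rng):
    """Uniform points within the union of cavity voxels (all voxels have equal volume)."""
    h = box_len / n_vox
    points = []
    for _ in range(n_points):
        idx = rng.choice(cavities)
        points.append(tuple((i + rng.random()) * h for i in idx))
    return points


def cavity_biased_average(boltzmann_factors, n_cavity, n_total):
    """Scale the cavity-restricted Boltzmann-factor average by the cavity volume fraction."""
    return (n_cavity / n_total) * sum(boltzmann_factors) / len(boltzmann_factors)
```

The scaling is exact because points outside the cavity voxels would contribute a Boltzmann factor of zero; the gain is that no insertion attempts are wasted on them.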
…matrix basis. Previously, for polyatomic molecules, Cassandra's trajectory reader worked only if the trajectory coordinates were PBC-wrapped by molecule center of mass or not wrapped at all (unwrapped); it failed when coordinates were wrapped atom by atom, because Cassandra wraps molecules by center of mass, which requires molecules to be intact. This commit allows the trajectory reader to repair partially wrapped molecules. It also optimizes parts of the trajectory reader, including more vectorization. The LAMMPS trajectory conversion script now accepts wrapped coordinates, not just unwrapped coordinates. The subroutine Load_Next_Frame and the other (non-XTC) trajectory reader procedures are now included in a module, Trajectory_Reader_Routines, rather than Load_Next_Frame being a non-module subroutine that contains the other (non-XTC) trajectory reader procedures. Box cell matrices are now automatically converted to the upper triangular form used by LAMMPS, since it allows better optimization. Coordinates loaded from a trajectory file, checkpoint file, or configuration file are automatically converted to the new basis if the basis is changed.
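Both ideas can be sketched together in Python. The cell conversion uses the formulas from the LAMMPS triclinic-box documentation (lx = |a|, xy = b·â, and so on), assuming right-handed cell vectors; the upper triangular matrix then turns fractional-coordinate computation into a short back-substitution. The molecule repair assumes consecutive atoms in the list are well under half a box apart (for example, bonded neighbors). All function names are hypothetical, not Cassandra routines.

```python
import math


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def to_upper_triangular(a, b, c):
    """Rotate right-handed cell vectors a, b, c into LAMMPS-style form.

    Returns H as rows, with the rotated cell vectors as its columns:
    H = [[lx, xy, xz], [0, ly, yz], [0, 0, lz]].
    """
    lx = math.sqrt(_dot(a, a))
    xy = _dot(b, a) / lx
    ly = math.sqrt(_dot(b, b) - xy * xy)
    xz = _dot(c, a) / lx
    yz = (_dot(b, c) - xy * xz) / ly
    lz = math.sqrt(_dot(c, c) - xz * xz - yz * yz)
    return [[lx, xy, xz], [0.0, ly, yz], [0.0, 0.0, lz]]


def minimum_image(d, H):
    """Minimum-image displacement; triangular H makes fractional coords a back-substitution."""
    (lx, xy, xz), (_, ly, yz), (_, _, lz) = H
    sz = d[2] / lz
    sy = (d[1] - yz * sz) / ly
    sx = (d[0] - xy * sy - xz * sz) / lx
    sx, sy, sz = sx - round(sx), sy - round(sy), sz - round(sz)
    return (lx * sx + xy * sy + xz * sz, ly * sy + yz * sz, lz * sz)


def repair_molecule(coords, H):
    """Rebuild an intact molecule from atom-wise wrapped coordinates.

    Each atom is placed at the minimum-image position relative to the previous,
    already repaired atom.
    """
    fixed = [tuple(coords[0])]
    for atom in coords[1:]:
        prev = fixed[-1]
        d = minimum_image([x - p for x, p in zip(atom, prev)], H)
        fixed.append(tuple(p + di for p, di in zip(prev, d)))
    return fixed
```

Once a molecule is intact, it can be wrapped by center of mass as usual.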
…ength to detect box size/shape change.
Improve vectorization and multithreading, and improve the mathematical formulation of the Ewald summation. Also overhaul the data structures for Ewald data and molecule pair energy, and replace large array copies with memory allocation transfers, or remove them entirely where they're unnecessary. The arbitrary limit on the number of kspace vectors was removed and replaced with a much larger limit imposed by the implementation, which sane systems are unlikely to reach. If the new limit ever becomes too low for a sane application, it may be raised by encoding the integer components of a kspace vector in a 64-bit integer instead of a 32-bit integer. For triclinic boxes and non-cubic orthogonal boxes, the range within which to check kspace vectors is now computed automatically from the face distances of the reciprocal-space box, whose cell matrix is the transpose of the inverse of the real box's cell matrix. Previously, this range was hardcoded for triclinic and non-cubic orthogonal boxes.
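A Python sketch of how such a range can be derived, assuming the real cell vectors a_i are the columns of H and kspace vectors are k = 2π(n1 b1 + n2 b2 + n3 b3), with b_i the columns of the inverse transpose of H; the function names are hypothetical. Because n_i = a_i · k / (2π), each |n_i| is bounded by kcut / (2π d_i), where d_i = 1/|a_i| is the distance between the reciprocal-cell faces spanned by the other two reciprocal vectors.

```python
import math


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def reciprocal_vectors(H):
    """Columns of inverse(H)^T: b_i satisfies a_i . b_j = delta_ij (no 2*pi factor)."""
    a = [tuple(H[r][c] for r in range(3)) for c in range(3)]  # real cell vectors = columns of H
    vol = _dot(a[0], _cross(a[1], a[2]))
    return [tuple(x / vol for x in _cross(a[(i + 1) % 3], a[(i + 2) % 3])) for i in range(3)]


def kspace_ranges(H, kcut):
    """Largest |n_i| that can satisfy |2*pi*(n_1 b_1 + n_2 b_2 + n_3 b_3)| <= kcut."""
    b = reciprocal_vectors(H)
    vol_recip = _dot(b[0], _cross(b[1], b[2]))
    ranges = []
    for i in range(3):
        face = _cross(b[(i + 1) % 3], b[(i + 2) % 3])
        d_i = vol_recip / math.sqrt(_dot(face, face))  # face distance of the reciprocal cell
        ranges.append(math.floor(kcut / (2.0 * math.pi * d_i)))
    return ranges


# Orthogonal 10 x 20 x 5 box: the longer the real edge, the more kspace vectors along it.
print(kspace_ranges([[10.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 5.0]], 2 * math.pi * 0.53))  # [5, 10, 2]
```

The same code handles triclinic cells with no special cases, which is the advantage over hardcoded ranges.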
Also improve cavity biasing random position generation, and use 32-bit integers to encode cavity voxel coordinates when the voxel grid is small enough. This commit also adds a "compatibility mode" that is enabled by default. When compatibility_mode is true, several changes are made to the CBMC routines to emulate their old implementation. Although the new implementation is correct, it generates and uses random numbers differently, which currently causes many tests to fail. Also improve trajectory reader parallelization and efficiency. Add a special system Ewald reciprocal energy routine for simulations using the trajectory reader. Trajectory reader simulations (sim type pregen) don't need sin_mol and cos_mol, so these are not allocated.
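One way to encode voxel coordinates in a single 32-bit integer is bit packing, sketched in Python below. The PR does not describe Cassandra's actual layout, so the field order and function names here are assumptions. Each axis gets just enough bits for its index range, and the encoding is only used when the three fields fit in 32 bits.

```python
def bits_needed(n):
    """Bits required to store indices 0 .. n-1."""
    return max(1, (n - 1).bit_length())


def fits_in_32_bits(grid_shape):
    return sum(bits_needed(n) for n in grid_shape) <= 32


def make_voxel_codec(grid_shape):
    """Return encode/decode functions packing (ix, iy, iz) into one unsigned 32-bit value."""
    if not fits_in_32_bits(grid_shape):
        raise ValueError("grid too large for 32-bit codes; use 64-bit storage instead")
    bx, by, _ = (bits_needed(n) for n in grid_shape)

    def encode(ix, iy, iz):
        return ix | (iy << bx) | (iz << (bx + by))

    def decode(code):
        return code & ((1 << bx) - 1), (code >> bx) & ((1 << by) - 1), code >> (bx + by)

    return encode, decode
```

A 1024³ grid needs 10 bits per axis (30 in total) and fits; a 2048³ grid needs 33 bits and would fall back to wider storage.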
…t padding and vectorization accordingly. This commit also adds lossless compression for cavity_locs and cavity_locs_int32, which store cavity voxel locations. Target architecture optimization flags were added to the gfortran Makefiles. This commit also reduces stack usage when creating atompair_nrg_table_reduced, which previously could cause Cassandra to run out of stack space unless the stack size limit was raised above the default, depending on that limit and the memory requirements. For the Intel compiler, Cassandra derives memory padding parameters from the -align arraynbyte compiler option. For the gfortran compiler, Cassandra derives them from the -m option, such as -mavx2 or -msse4.2. With gfortran, the -m option should always be included, even if it is redundant with -march, since Cassandra uses it to determine memory padding and, in rare cases, vector size.
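Deriving padded array lengths from compiler options can be sketched in Python as follows. The flag-to-width table and helper names are hypothetical; the register widths themselves (16 bytes for SSE, 32 for AVX/AVX2, 64 for AVX-512) are standard, but how Cassandra maps options to padding is not shown in this PR.

```python
import math

# Hypothetical mapping from gfortran -m flags to SIMD register width in bytes.
SIMD_BYTES = {"-msse4.2": 16, "-mavx": 32, "-mavx2": 32, "-mavx512f": 64}


def intel_align_bytes(option_value):
    """Parse N from the value of an Intel '-align arrayNbyte' option, e.g. 'array32byte'."""
    prefix, suffix = "array", "byte"
    if not (option_value.startswith(prefix) and option_value.endswith(suffix)):
        raise ValueError(f"unrecognized -align value: {option_value!r}")
    return int(option_value[len(prefix):-len(suffix)])


def padded_length(n, elem_bytes, simd_bytes):
    """Round n elements up to a whole number of SIMD registers."""
    per_vec = max(1, simd_bytes // elem_bytes)
    return math.ceil(n / per_vec) * per_vec


# 1001 double-precision values padded for AVX2 (4 doubles per register).
print(padded_length(1001, 8, SIMD_BYTES["-mavx2"]))  # 1004
```

Padding every array to a whole number of registers lets vector loops run without a scalar remainder loop, at the cost of a few unused elements per array.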
Description
Pairwise intermolecular energy computation vectorization (except for compute system total energy)
Cell list overhaul and switch to cell neighbor lists
Add option to use cell neighbor lists for CBMC neighbor-finding
Refactored and vectorized CBMC routines (mainly just those used for Widom insertions)
Vectorized RNG for CBMC insertion trial positions
BOVINE
BOVINE-Cavs
Refactored and vectorized Ewald summation reciprocal part.
Widom insertion energy computation redundancy elimination
Double-precision fast inverse square root function in energy_routines.f90
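For reference, a double-precision fast inverse square root can be sketched in Python as a bit-level initial guess refined by Newton-Raphson steps. The magic constant 0x5FE6EB50C7B537A9 is a commonly cited 64-bit value; the constant and iteration count used in energy_routines.f90 are not shown in this PR, so both are assumptions.

```python
import math
import struct

MAGIC_64 = 0x5FE6EB50C7B537A9  # commonly cited 64-bit analogue of 0x5F3759DF


def fast_inv_sqrt(x, newton_steps=3):
    """Approximate 1/sqrt(x) for x > 0: bit-level initial guess plus Newton-Raphson steps."""
    i = struct.unpack("<q", struct.pack("<d", x))[0]  # reinterpret the double's bits as int64
    i = MAGIC_64 - (i >> 1)                           # halve and negate the exponent, roughly
    y = struct.unpack("<d", struct.pack("<q", i))[0]
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)  # each step roughly squares the relative error
    return y


print(abs(fast_inv_sqrt(2.0) * math.sqrt(2.0) - 1.0) < 1e-6)  # True
```

The initial guess is within a few percent, so the number of Newton steps sets the final accuracy; fewer steps trade accuracy for speed.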
Describe your changes in detail
ppvdw_table2_sp. The intermolecular portion is computed in double precision; it could be done in single precision in the future. This change affects any CBMC move for a multifragment molecule (i.e., anything that uses fragment_placement, such as GEMC, NVT, or NPT).
Related Issue
This project only accepts pull requests related to open issues
If suggesting a new feature or change, please discuss it in an issue first
If fixing a bug, there should be an issue describing it with steps to reproduce
Please include a reference to the issue.
How Has This Been Tested?
Please describe in detail how you tested your changes.
Include details and the tests you ran to see how your change affects other areas of the code, etc.
Backward Compatibility
Please state whether any changes in the pull request break backward compatibility for inputs and, if so, explain what has been changed and why.
Post Submission Checklist
Please check the fields below as they are completed
/Documentation/source/reference/acknowledgements.rst
Further Information, Files, and Links
Add any additional information here, and attach relevant text or image files, URLs to external sites, publications, etc.