-
Notifications
You must be signed in to change notification settings - Fork 116
Matrix solvers: PETSc
PETSc is a library, used in fluidity, for solving linear systems. It contains optimised implementations of many algorithms, such as conjugate gradient, GMRES, etc. PETSc also links against and provides a consistent interface to lots and lots of other libraries implementing algorithms for solving linear systems.
Until recently fluidity used its own solver routines, SOLCG and GMRES, for solving linear systems. The reasons for switching to PETSc were:
- PETSc is very well optimised.
- PETSc is designed from the ground up for scalability. PETSc has been scaled to over 4000 processors in tests.
- PETSc is maintained by a dedicated team of programmers.
- PETSc has become a means for authors of new algorithms to publicise their algorithms to the wider scientific community. Several of the algorithms contained in PETSc were contributed by their authors, such as Prometheus and LGMRES. When new algorithms are added we get the extra benefit for no extra work.
Previously sections of fluidity called either the SOLCG or GMRES subroutine to solve a linear system, depending on whether the linear system is symmetric or not. In the new code, both of these are dummy wrappers around petsc_solve. petsc_solve takes in the CSR matrix, constructs a PETSc matrix datatype, sets the solver and preconditioner options, and then passes the system to PETSc to solve.
A full list of solvers is available at the PETSc solver list (http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/KSP/KSPType.html). The most common options are 'cg' for symmetric matrices and 'gmres' for asymmetric matrices (those solved with GMRES). However, several variants are available, including: biconjugate gradient ('bicg'), conjugate gradient squared ('cgs'), conjugate gradient on the normal equations ('cgne'), flexible GMRES ('fgmres'), LGMRES ('lgmres').
Nearly all solvers have options giving finer control over the solving process. See the PETSc documentation at http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/ for details.
If the ksp_type is set to 'legacy', the old solver code is used.
##Note:## if you use rotated boundary conditions you must use legacy as your solver.
A full list of preconditioners is available at the PETSc preconditioner list (http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/PC/PCType.html). Choosing a preconditioner is a little more complicated. Some only work in serial; trying to use one of these in parallel will result in an error.
Rules of thumb:
- in serial, use incomplete Cholesky factorisation ('icc') for symmetric systems, and incomplete LU factorisation ('ilu') or Eisenstat's version of SSOR ('eisenstat') for asymmetric systems.
- In parallel, use SSOR ('sor') for small to medium systems.
- Use the algebraic multigrid methods for large or ill-conditioned systems.
Algebraic multigrid is a solver technique that can be applied as a preconditioner. Two options available in PETSc are:
- [http://www.columbia.edu/~ma2325/ Prometheus], written by Mark Adams (-''matrixtype''_pc_type prometheus)
- BoomerAMG, part of the [http://www.llnl.gov/CASC/linear_solvers/ HYPRE] package (-''matrixtype''_pc_type hypre, -pc_hypre_type boomeramg) Algebraic multigrid methods are the best choice for large systems (because of their better scaling properties), and ill-conditioned systems, such as those in large aspect ratio problems, or more generally problems in which there is a large variety in length scales. An algebraic multigrid method targeted specifically at large-scale, large aspect ratio ocean problems is developed for the ICOM model (ask Stephan Kramer). It can be switched on with -''matrixtype''_pc_type mg (until I find a more original name for it). Note that it is best to use BoomerAMG with gmres because the AMG preconditioner by default is not symmetric.
Whenever in fluidity a linear solve hasn't completed succesfully it dumps out the equation in a file called ''matrixdump''. This file can be used to analyse the behaviour of this solve without having to rerun the model. For this purpose one can use petsc_readnsolve, one of the programs that are made by ''make fltools''. It reads in the ''matrixdump'' and tries to solve it with the solver options specified in the .flml file.
It is always a good idea to first reproduce the failing solve with the same options as it happened in fluidity. After that one can try to change the solver options in the .flml to see if that fixes the problem. The options under .../solver/diagnostics are particularly useful to diagnose the problem.
<fluidity_source_path>/bin/petsc_readnsolve FILENAME FIELDNAME [OPTIONS]
where FILENAME is the .flml filename, FIELDNAME is the name of the field for which the solve was failing (this will be used to look up its solver options in the .flml). If you are using code is based on one of the other available schemas, you can still use petsc_readnsolve by specifying -prns_flml FILENAME and -prns_field FIELDNAME where FILENAME is the name of your options file (e.g. file.swml) and FIELDNAME is as described above.
Further OPTIONS that may be provided:
Option | Description |
---|---|
-l | Write the information that is normally printed in the terminal to a log file called petsc_readnsolve.log-0, similar to the -l option of fluidity. This is particularly useful for parallel runs as the output gets split per process. |
-prns_verbosity N | Determines the amount of information that is print to the terminal. By default petsc_readnsolve uses the maximum verbosity (equivalent to -v3 in fluidity), this can be lowered with this option. |
-prns_filename file | reads from the specified ''file'' instead of ''matrixdump''. |
-prns_zero_init_guess | No initial guess is read from ''matrixdump'' instead the initial guess is zeroed. |
-prns_write_solution file | Writes solution vector to specified file so that it can be used for comparison in next runs of petsc_readnsolve (provided we are sufficiently confident in the accuracy of the obtained solution). |
-prns_read_solution file | Reads solution vector from the specified file, so that exact errors can be calculated(*). |
-prns_scipy | Writes out several files that can be read in scipy. This is a simple way of inspecting the actual values of the matrix, as these files are all in ascii format. |
-prns_random_rhs | Instead of the rhs in the ''matrixdump'', use a random rhs vector. |
Additionally all options that are available from the PETSc library may be added to the command line. Options specified in the .flml always take precedence however. Some PETSc options that may be useful:
Option | Description |
---|---|
-ksp_view | Information on all the solver settings. |
-mat_view_info | Information on matrix size |
-mat_view_draw | Draw the matrix nonzero structure |
-help | Gives an overview of all PETSc options that can be given for the selected solver/preconditioner combination. |
(*) For small matrices a good approximation of the exact solution can be found using a direct method: select iterative_method "preonly" and preconditioner "lu" in the .flml. Note however that for ill-conditioned matrices direct methods are very sensitive to round off errors.
When a solve fails in a parallel run, a single ''matrixdump'' file is written that represent the entire, global matrix equation that failed. Therefore one can run petsc_readnsolve ''in serial'' on this ''matrixdump''. Note however that typical parallel runs are very big and can easily be too big to run in serial. Moreover the behaviour of a solver in parallel is often different than in serial, so to reproduce the problem it's better to run petsc_readnsolve in parallel as well. This is done as usual by prepending "mpirun -np N" on the command line (where N is the number of processes).
petsc_readnsolve in parallel always needs to have the mesh files of the '''same mesh''' as the one used by fluidity during the failing solve. That means for adaptive runs, one needs to have a checkpoint at the point of the failing solve (for instance by selecting /io/checkpointing/checkpoint_at_end), and use the checkpoint flml for petsc_readnsolve. This is because petsc_readnsolve needs to use the same domain decomposition as used by fluidity. Running petsc_readnsolve on the same global mesh, but (re)decomposed differently than in the fluidity run will generally lead to a very bad solver performance.
In most cases the mesh files are not needed for serial runs of petsc_readnsolve, even if the fluidity run was parallel. With some solver settings however, at the moment precondtioner "mg" with vertical_lumping, the solver itself uses the geometric mesh information. In this case petsc_readnsolve needs the mesh files in serial as well and one cannot run petsc_readnsolve in serial after a parallel fluidity run (not even with the original serial mesh before fldecomp).
By setting the Local:Diagnostic_Output to at least 2, each solve call prints out the performed number of iterations and the reason why it returned from the solver, i.e. whether it converged or produced an error. So by running fluidity
fluidity -v2 ${PROJECT} > out
you can see the subsequent number of iterations in the different solves with:
grep iterations out
and the 'reason of convergence' with:
grep reason out
A negative number means the solve has not converged. See [http://www-unix.mcs.anl.gov/petsc/petsc-2/snapshots/petsc-current/docs/manualpages/KSP/KSPConvergedReason.html KSPConvergedReason] for its specific meaning. Negative numbers nearly always mean there is something wrong with your model. After all the basic flow equations of the system are no longer accurately solved! There is one exception:
- During spin up time of the model, the mismatch between initial and boundary conditions may result in a very hard to solve system of equations. In this case you may get away with not accurately solving the equations while the system is adjusting. Bear in mind however that the results in this period should still be considered wrong and your run is only to produce more 'physical' initial conditions for the rest of your run (maybe after a restart), in which no further non-convergences should occur.
If a solve has not converged, the matrix equation will be dumped in a file called 'matrixdump'. This file can be further analysed with petsc_readnsolve, so that you can see whether there is something wrong with the matrix, or that other solver/preconditioner options work better. If the solver has not converged within the maximum number of iterations (reason -3), only the first matrix will be dumped and the model will continue running. All other reasons of non-converging/diverging (e.g. NaNs or wrong preconditioner) will halt fluidity at the end of the time step.
PETSc reason of convergence: 0 This usually means that PETSc has given some other error. Check the log output for error messages given by PETSc - these should be clearly visible, see the example below. Unfortunately they might not always exactly line up with the rest of the log output in terms of when it occurred.
An example of a failed PETSc solve returning with 'reason of convergence: 0' (in this case because the non-existent preconditioner 'foo' has been selected):
Going into petsc_(block_)solve_csr.
Solving matrix type: solmom_
Number of rows == 18
Matrix assembly completed.
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Unknown type. Check for miss-spelling or missing external package needed for type!
[0]PETSC ERROR: Unable to find requested PC type foo!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 2.3.3, Patch 0, Wed May 23 23:31:05 CDT 2007 HG revision: a170d2d2f21bc221150abdc41467e476f37518fd
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: ../../bin/dfluidity-debug on a amcg-gcc4 named ese-luigi by skramer Tue May 20 15:28:06 2008
[0]PETSC ERROR: Libraries linked from /data/stephan/debian/petsc/petsc-amcg-2.3.3/lib/amcg-gcc4-opt
[0]PETSC ERROR: Configure run at Mon Oct 8 19:38:43 2007
[0]PETSC ERROR: Configure options --with-debugging=0 PETSC_ARCH=amcg-gcc4-opt --with-parmetis-dir=/usr/ --with-hypre=yes --download-hypre=yes --with-prometheus=yes --download-prometheus=yes --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpiCC --with-mpi-dir=/usr/lib/mpich-amcg-gcc4 --with-mpi-compilers=1 --with-blas-lib=-lblas-3 --with-lapack-lib=-llapack-3 --with-mpi-shared=1 --download-ml=yes --with-ml=yes --with-shared=0
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: PCSetType() line 68 in src/ksp/pc/interface/pcset.c
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Unknown type. Check for miss-spelling or missing external package needed for type!
[0]PETSC ERROR: Unable to find requested PC type foo!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 2.3.3, Patch 0, Wed May 23 23:31:05 CDT 2007 HG revision: a170d2d2f21bc221150abdc41467e476f37518fd
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: ../../bin/dfluidity-debug on a amcg-gcc4 named ese-luigi by skramer Tue May 20 15:28:06 2008
[0]PETSC ERROR: Libraries linked from /data/stephan/debian/petsc/petsc-amcg-2.3.3/lib/amcg-gcc4-opt
[0]PETSC ERROR: Configure run at Mon Oct 8 19:38:43 2007
[0]PETSC ERROR: Configure options --with-debugging=0 PETSC_ARCH=amcg-gcc4-opt --with-parmetis-dir=/usr/ --with-hypre=yes --download-hypre=yes --with-prometheus=yes --download-prometheus=yes --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpiCC --with-mpi-dir=/usr/lib/mpich-amcg-gcc4 --with-mpi-compilers=1 --with-blas-lib=-lblas-3 --with-lapack-lib=-llapack-3 --with-mpi-shared=1 --download-ml=yes --with-ml=yes --with-shared=0
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: PCSetType() line 68 in src/ksp/pc/interface/pcset.c
[0]PETSC ERROR: PCSetFromOptions() line 180 in src/ksp/pc/interface/pcset.c
[0]PETSC ERROR: KSPSetFromOptions() line 214 in src/ksp/ksp/interface/itcl.c
ksp_type:gmres
pc_type: foo
ksp_max_it, ksp_atol, ksp_rtol, ksp_dtol: 1000 9.999999999999999E-012 1.000000000000000E-010 10000.0000000000
startfromzero: F
Assembling RHS.
RHS assembly completed.
Assembling initial guess.
Initial guess assembly completed.
Entering solver.
Out of solver.
solmom_ PETSc reason of convergence: 3
solmom_ PETSc n/o iterations: 0
Out of petsc_(block_)solve_csr.