This brief guide contains an introduction to the OPM Flow --linear-solver command line option. We show basic and advanced usage of this feature, and how it can be used to run the linear solver on GPUs with OPM's gpu-ISTL framework.
Basic Usage
Choosing the right linear solver can have a large impact on simulation runtime. OPM Flow provides the --linear-solver option, which lets the user choose which linear solver to use. The default linear solver is a BiCGSTAB solver with the CPRW preconditioner, which is equivalent to specifying --linear-solver=cprw. The option can be set to other values such as dilu, ilu0 or cpr_trueimpes, which will also select BiCGSTAB as the linear solver, with the preconditioner of the given name. All possible values of this option are described in the OPM Flow reference manual. To adjust the residual reduction target of the linear solver and the maximum number of iterations, use --linear-solver-reduction and --linear-solver-max-iter.
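For example, assuming a deck at /path/to/CASE.DATA (an illustrative path, as are the numeric values below), a run with the DILU preconditioner, a tighter residual reduction target and a lower iteration cap could be started like this:
flow /path/to/CASE.DATA --linear-solver=dilu --linear-solver-reduction=0.001 --linear-solver-max-iter=100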
Advanced Usage
You can inspect the generated .DBG file of a completed run and find a JSON object containing the property tree of the linear solver. You can locate it by searching for a left brace character “{” in the file. For example, if you set the linear solver to ilu0, it looks like this:
{
  "tol": "0.01",
  "maxiter": "200",
  "verbosity": "0",
  "solver": "bicgstab",
  "preconditioner": {
    "type": "ParOverILU0",
    "relaxation": "0.90000000000000002",
    "ilulevel": "0"
  }
}
To tinker with these low-level properties of the linear solver, you can instead set the --linear-solver option to the path of a .json file in your file system. The easiest way to do this is to copy the property tree from the debug output and start adjusting the values. This lets you adjust tolerances and maximum iterations, choose experimental preconditioners, choose other linear solvers, and much more. Different preconditioners will also have different sets of parameters that can be adjusted.
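As a sketch, assuming the copied property tree is saved as my_solver.json (a file name chosen purely for illustration), the run would then be started with:
flow /path/to/CASE.DATA --linear-solver=/path/to/my_solver.json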
Note: If you provide an argument both on the command line and in the JSON file, the value from the JSON file is the one actually used.
Linear solver parameters:
tol – The residual reduction target of the linear solver. If the linear solver does not manage to reduce the residual by this factor within maxiter linear iterations, the linear solver is considered to have failed, and the length of the current timestep will be cut. There are exceptions to this that can be controlled with other Flow arguments, such as --relaxed-linear-solver-reduction, but they are not set within the linear solver JSON.
maxiter – The maximum number of linear iterations to complete per linear solve.
verbosity – The verbosity of the linear solver dictates how much extra information will be printed out during the simulation. Hack warning: if set to 11 or higher, this will print out each linear system to solve, quickly filling up the entire disk if one is not careful!
solver – Which linear solver to use. The possible values are bicgstab, loopsolver, gmres, flexgmres and umfpack. If you have compiled with GPU support, gpubicgstab is also available.
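For example, starting from the ilu0 property tree shown earlier, one could tighten the tolerance, raise the iteration cap and turn on a little verbosity. The numbers below are purely illustrative, not recommendations:
{
  "tol": "0.001",
  "maxiter": "300",
  "verbosity": "1",
  "solver": "bicgstab",
  "preconditioner": {
    "type": "ParOverILU0",
    "relaxation": "0.9",
    "ilulevel": "0"
  }
}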
CPRW JSON Example
The CPRW preconditioner has many more adjustable parameters than the ILU-type preconditioners we saw in the previous JSON example. It is a two-stage preconditioner, meaning that it consists of two different preconditioners. The extracted pressure system is referred to as the coarse system, and the full linear system is referred to as the fine system. Usually we precondition the extracted scalar pressure systems with an AMG preconditioner, and the full linear system with an ILU-type preconditioner. Here are some fields that can be beneficial to tinker with:
finesmoother – The preconditioner.finesmoother.type field can take on the value of any preconditioner in OPM Flow, and is used to precondition the full linear system.
type (within preconditioner.coarsesolver.preconditioner) – The coarse solver (i.e. the pressure system within CPRW) preconditioner type. While typically an AMG variant, it can be any OPM preconditioner.
smoother (within preconditioner.coarsesolver.preconditioner) – The coarse smoother is used on the hierarchical scalar linear systems of the AMG preconditioner. This smoother can also take on the value of any preconditioner in OPM Flow.
pre_smooth & post_smooth (within preconditioner.coarsesolver.preconditioner) – The number of times to apply the smoother before and after each AMG level.
weight_type – Which weight type is used to extract the pressure system. Possible values include trueimpes, quasiimpes and trueimpesanalytic.
Here is what the default-generated JSON currently looks like:
{
  "maxiter": "20",
  "tol": "0.0050000000000000001",
  "verbosity": "0",
  "solver": "bicgstab",
  "preconditioner": {
    "type": "cprw",
    "use_well_weights": "false",
    "add_wells": "true",
    "weight_type": "trueimpes",
    "finesmoother": {
      "type": "ParOverILU0",
      "relaxation": "1"
    },
    "verbosity": "0",
    "coarsesolver": {
      "maxiter": "1",
      "tol": "0.10000000000000001",
      "solver": "loopsolver",
      "verbosity": "0",
      "preconditioner": {
        "type": "amg",
        "alpha": "0.33333333333300003",
        "relaxation": "1",
        "iterations": "1",
        "coarsenTarget": "1200",
        "pre_smooth": "1",
        "post_smooth": "1",
        "beta": "0",
        "smoother": "ILU0",
        "verbosity": "0",
        "maxlevel": "15",
        "skip_isolated": "0",
        "accumulate": "1",
        "prolongationdamping": "1",
        "maxdistance": "2",
        "maxconnectivity": "15",
        "maxaggsize": "6",
        "minaggsize": "4"
      }
    }
  }
}
Tuning Example
We can run flow on the Norne case from the opm-data repository without specifying the linear solver to get a run with the CPRW preconditioner. To improve the runtime we run it in parallel using MPI. We run the following command on a system with an AMD Ryzen 7950x processor (16 cores):
mpirun -np 16 flow /path/to/NORNE_ATW2013.DATA --threads-per-process=1
We complete another run where we append --linear-solver=ilu0 to the command to switch to the simpler preconditioner.
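In other words, the second run is launched as:
mpirun -np 16 flow /path/to/NORNE_ATW2013.DATA --threads-per-process=1 --linear-solver=ilu0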
On this system the ILU0 run uses 30105 linear iterations, whereas the CPRW run needs only 2508 linear iterations. The runtime of the simulation on the tested system is 74 seconds with the ILU0 preconditioner and 68 seconds with CPRW-preconditioned BiCGSTAB. By tuning the JSON from the .DBG file of the CPRW run we can improve upon this. By changing the preconditioner’s finesmoother.type and the smoother to DILU we reduce the runtime to 59 seconds. This is mainly because the construction and update of DILU is cheaper than that of ILU0. Further tweaks of this form can improve the runtime of the linear solver even more. This sort of low-level tuning, especially of tolerances, will often be very case dependent.
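As a sketch, the two edits to the default-generated CPRW JSON shown above would look as follows (all other fields left unchanged; whether DILU accepts additional tuning parameters in these positions is not covered here). The fine smoother block becomes
"finesmoother": {
  "type": "DILU"
},
and inside the AMG block of the coarse solver the smoother entry becomes
"smoother": "DILU",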
gpu-ISTL Solvers
OPM Flow has a linear solver working on AMD and Nvidia GPUs called gpu-ISTL, which is based on the Dune-ISTL library. Use it by setting the solver field in the linear solver JSON to gpubicgstab. The option --matrix-add-well-contributions must be set to true when running the gpubicgstab linear solver, because we currently do not support the special well operators for linear solves on the GPU. Note that you can still solve cases with wells on the GPU! This linear solver currently supports four GPU preconditioners:
GPUJAC – a Jacobi preconditioner which takes a relaxation factor in the field relaxation.
GPUILU0 – an ILU(0) implementation based on the vendor libraries hipSPARSE and cuSPARSE; takes a relaxation factor in the field relaxation.
OPMGPUILU0 – an ILU(0) implementation created by OPM developers, typically faster than GPUILU0. Takes the bool split_matrix, which adjusts an underlying data structure, and the bool tune_gpu_kernels, which adds a calibration phase that optimizes parameters determining the runtime of the preconditioner.
GPUDILU – a DILU implementation which also takes split_matrix and tune_gpu_kernels.
Here is an example JSON using the gpu-ISTL solver:
{
  "tol": "0.01",
  "maxiter": "200",
  "verbosity": "0",
  "solver": "gpubicgstab",
  "preconditioner": {
    "type": "OPMGPUILU0",
    "split_matrix": "true",
    "tune_gpu_kernels": "true"
  }
}
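Assuming this JSON is saved as gpu_solver.json (a file name chosen for illustration), a GPU run could then be started with:
flow /path/to/CASE.DATA --linear-solver=/path/to/gpu_solver.json --matrix-add-well-contributions=true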
Running gpubicgstab with both DILU and ILU0 as preconditioners on Norne with an Nvidia RTX 4070Ti, we see that the DILU run spends 224 seconds in the linear solver, whereas the ILU0 run spends 251 seconds. Since only the linear solver is currently supported on the GPU, there is significant memory traffic between the CPU and the GPU during the simulation. Because of this, a case should usually contain at least a few hundred thousand cells before it is worth using the GPU backend. On SPE11C, for instance, cases containing at least 270k cells are typically faster on a GPU with ILU0 than on a single multicore CPU with CPR. Norne has only 44 thousand cells, and is thus not expected to be any faster on a GPU. Since CPR-type preconditioners are currently not supported on the GPU, there are also some cases with very difficult linear systems that will be solved slowly on the GPU.
The gpu-ISTL solvers have mainly been tested using one MPI process, although they do support using multiple GPUs with one GPU per MPI process. Performance will be very poor if there is more than one MPI process per available GPU.
The GPU preconditioners are still in an experimental state and may change in the future. This is the interface as of OPM Flow version 24.10.
Tobias Meyer Andersen, SINTEF