[Opm] ParMETIS error on HPC

Markus Blatt markus.blatt at opm-op.com
Wed Oct 21 20:28:15 UTC 2020


Hi,

On Mon, Oct 19, 2020 at 12:22:13PM +0000, Antoine B Jacquey wrote:
> 
> But during the first time step calculation, I get the following errors:
> 
> Time step 0, stepsize 1 days, at day 0/7, date = 01-Jan-2015
>     Switching control mode for well INJ from RATE to BHP on rank 20
>     Switching control mode for well INJ from BHP to RATE on rank 20
> PARMETIS ERROR: Poor initial vertex distribution. Processor 2 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 4 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 6 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 8 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 12 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 14 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 16 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 18 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 20 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 0 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 10 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 22 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 26 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 24 has no vertices assigned to it!
> 
> Does anyone know what this error means? Is it coming because of a bad mesh partitioning or is it due to something else?
>

In the AMG of dune-istl we try to agglomerate the linear system onto successively fewer processes (N -> n, with n < N) if the
number of unknowns drops below a certain threshold. This uses ParMETIS or the ParMETIS bindings of PT-Scotch. This code should
be heavily tested on HPC computers with 300000 and more processes. I am not sure whether there were recent changes to this code.
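
To make the N -> n idea concrete, here is a toy sketch in Python. It is not the dune-istl implementation: the function name, the parameter min_unknowns_per_proc and the rounding rule are all assumptions made for illustration only.

```python
import math

def target_process_count(total_unknowns, current_procs, min_unknowns_per_proc):
    """Toy rule (hypothetical, not dune-istl's actual logic).

    Keep all current processes while each of them can hold at least
    min_unknowns_per_proc unknowns; otherwise shrink to the number of
    processes that restores that minimum load.
    """
    if total_unknowns >= current_procs * min_unknowns_per_proc:
        return current_procs  # enough work per process, no agglomeration
    # Fewer unknowns than the threshold: agglomerate onto n <= N processes.
    return max(1, math.ceil(total_unknowns / min_unknowns_per_proc))
```

With 28 processes and an assumed minimum of 1000 unknowns per process, a coarse level with 14000 unknowns would be moved onto 14 processes under this rule.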

This error would mean that every process with an even rank ends up with a linear system that has zero rows (owned by this
process). That seems a bit strange/unlikely, but you never know. There is an option in the AMG to skip isolated rows (I am not 100%
sure whether we use this in OPM). This would explain the situation if all rows on a process were isolated.
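
For reference, ParMETIS takes the distribution of vertices as an array vtxdist, where rank p owns the vertices vtxdist[p] up to vtxdist[p+1]-1. As far as I know the message above is printed when that range is empty for some rank. A minimal Python sketch of that condition follows; the helper names are made up, and the row counts in the usage note are invented rather than taken from your run.

```python
from itertools import accumulate

def build_vtxdist(owned_rows):
    """Build a ParMETIS-style vtxdist array from per-rank owned row counts.

    owned_rows[p] is the number of rows (graph vertices) owned by rank p.
    """
    return [0] + list(accumulate(owned_rows))

def ranks_without_vertices(vtxdist):
    """Return the ranks whose vertex range in vtxdist is empty."""
    return [p for p in range(len(vtxdist) - 1)
            if vtxdist[p + 1] - vtxdist[p] < 1]
```

For example, with owned row counts [0, 5, 0, 7] the array is [0, 0, 5, 5, 12] and ranks 0 and 2 are reported as empty, which is the even-rank pattern seen in the log.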

There has been quite some refactoring of the AMG bindings and the CPR in OPM during the last year. I assume the people
who did this refactoring have tested these changes in parallel, but I personally have not. As AMG/CPR are not the default
solver, they might be tested less than the rest.

Hence there are multiple places where these errors might come from. To replicate, please provide the DUNE and OPM versions that
you are using together with CMakeCache.txt and config.h from the build tree of opm-simulators. We can only replicate if
this happens for our test cases. If this is not the case, we can only help if you provide a test case.

In addition, please perform a run with the parameter "--flow-linear-solver-verbosity=2" and provide the standard output. With this
we can check whether my assumption above actually holds (or I have guessed wrong).
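
If it helps when collecting the output, a small Python sketch like the following lists which ranks print the ParMETIS message and whether they are all even. The regular expression matches the message text exactly as quoted above; the function names are only illustrative.

```python
import re

# Matches the message exactly as it appears in the quoted output.
ERROR_RE = re.compile(
    r"PARMETIS ERROR: Poor initial vertex distribution\. "
    r"Processor (\d+) has no vertices assigned to it!"
)

def empty_ranks(log_text):
    """Return the sorted, de-duplicated ranks reported as having no vertices."""
    return sorted({int(m.group(1)) for m in ERROR_RE.finditer(log_text)})

def all_even(ranks):
    """True if at least one rank was reported and every reported rank is even."""
    return bool(ranks) and all(r % 2 == 0 for r in ranks)
```

Calling empty_ranks(open("flow.log").read()) on the saved output (flow.log is a placeholder file name) gives the list of affected ranks.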

Cheers,

Markus

-- 
Markus Blatt
CTO @ OPM-OP AS, Heyerdahlsvei 12b, 0777 Oslo, Norway
https://opm-op.com | +4916097590858

