[Opm] ParMETIS error on HPC

Atgeirr Rasmussen Atgeirr.Rasmussen at sintef.no
Tue Oct 20 09:43:14 UTC 2020


Hi Antoine!

Our partitioning scheme starts with the whole graph on a single process, so indeed this is a "bad" starting partition. The partition we end up with does not seem any worse for it, although for very large process counts this approach could become a bottleneck.
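
For illustration, here is a minimal sketch of the vertex-distribution array ("vtxdist") that ParMETIS uses to describe how graph vertices are spread over the ranks. This is just the standard ParMETIS input convention, not OPM or dune-istl code, and the names are made up; but it shows why a "whole graph on one process" situation means every other rank has an empty slice, which is exactly what the "Poor initial vertex distribution" message complains about:

    // Sketch only (not OPM or dune-istl code): the ParMETIS "vtxdist"
    // convention.  vtxdist has nranks+1 entries, and rank r owns the global
    // vertices in the half-open range [vtxdist[r], vtxdist[r+1]).  ParMETIS
    // prints "Poor initial vertex distribution" for every rank r with
    // vtxdist[r] == vtxdist[r+1], i.e. every rank that owns no vertices.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank = 0;
        int nranks = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const long nGlobalVertices = 1012500; // e.g. your 27-process case

        // "Whole graph on one process": vtxdist = {0, N, N, ..., N}, so every
        // rank other than rank 0 gets an empty range.
        std::vector<long> vtxdist(nranks + 1, nGlobalVertices);
        vtxdist[0] = 0;

        if (rank == 0) {
            for (int r = 0; r < nranks; ++r) {
                if (vtxdist[r] == vtxdist[r + 1]) {
                    std::printf("rank %d owns no vertices -> ParMETIS would "
                                "complain about the initial distribution\n", r);
                }
            }
        }

        MPI_Finalize();
        return 0;
    }

So, as far as I can tell, these are warnings about the distribution handed to ParMETIS, not necessarily a sign that the partitioning shown in your load-balancing table went wrong.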

I am a little confused, though, as OPM Flow uses Zoltan for partitioning, not ParMETIS; this is because ParMETIS is not open source. That said, if you do have access to ParMETIS, I believe you can configure the dune-istl parallel linear solvers (which are in turn used by OPM Flow) to use ParMETIS (or the PT-Scotch workalike library) for redistributing coarse systems within the algebraic multigrid (AMG) solver. However, that is not the default linear solver for OPM Flow, so I am a bit at a loss as to where those ParMETIS messages come from! Did you run with the default linear solver or not? I assume the simulation actually runs?
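
If you want a rough check of whether your build can call ParMETIS at all, my understanding (this is an assumption about the usual dune build conventions, not taken from the OPM sources) is that the dune CMake modules define HAVE_PARMETIS in config.h when the library is found, and dune-istl only uses ParMETIS for that coarse-system redistribution when the macro is set, falling back to its own simpler scheme otherwise. A small compile-time probe could look like the sketch below; note that dune sometimes only passes such flags per build target, so treat it as a rough indication rather than a definitive check:

    // Rough probe (an assumption about the dune build conventions, not taken
    // from the OPM sources): HAVE_PARMETIS is expected in config.h when
    // ParMETIS was found at configure time.
    #include <config.h>
    #include <iostream>

    int main()
    {
    #if HAVE_PARMETIS
        std::cout << "Configured with ParMETIS: dune-istl may call it when "
                     "redistributing coarse AMG systems.\n";
    #else
        std::cout << "Configured without ParMETIS: dune-istl would use its "
                     "built-in fallback instead.\n";
    #endif
        return 0;
    }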

Atgeirr
________________________________
From: Opm <opm-bounces at opm-project.org> on behalf of Antoine B Jacquey <ajacquey at mit.edu>
Sent: 19 October 2020 14:22
To: opm at opm-project.org <opm at opm-project.org>
Subject: [Opm] ParMETIS error on HPC

Hi OPM community,

I recently compiled OPM Flow on a local cluster at my institute. I linked against the ParMETIS library during configuration to make use of mesh partitioning when using a large number of MPI processes.
When I run a flow simulation, the mesh seems to be partitioned automatically. Here is part of the output I get for a simulation with 8 MPI processes:

Load balancing distributes 300000 active cells on 8 processes as follows:
  rank   owned cells   overlap cells   total cells
--------------------------------------------------
     0         36960            2760         39720
     1         40110            3720         43830
     2         38100            4110         42210
     3         38940            2250         41190
     4         36600            2280         38880
     5         33660            3690         37350
     6         37800            3690         41490
     7         37830            2730         40560
--------------------------------------------------
   sum        300000           25230        325230

The problem occurs when I use a larger number of MPI processes (here 27). The mesh is again partitioned:

Load balancing distributes 1012500 active cells on 27 processes as follows:
  rank   owned cells   overlap cells   total cells
--------------------------------------------------
     0         40230            6390         46620
     1         40185            5175         45360
     2         40635            4050         44685
     3         40230            6255         46485
     4         40905            5850         46755
     5         39825            6030         45855
     6         37035            2610         39645
     7         36945            5625         42570
     8         40680            4185         44865
     9         35835            5460         41295
    10         41250            6765         48015
    11         39825            5310         45135
    12         36855            2655         39510
    13         32850            3690         36540
    14         38790            5400         44190
    15         36540            5625         42165
    16         30105            3105         33210
    17         40320            5400         45720
    18         35685            4185         39870
    19         39465            5490         44955
    20         20160            1800         21960
    21         39915            4860         44775
    22         40050            6165         46215
    23         34020            2475         36495
    24         39645            6345         45990
    25         36990            6570         43560
    26         37530            4005         41535
--------------------------------------------------
   sum       1012500          131475       1143975

But during the first time step calculation, I get the following errors:

Time step 0, stepsize 1 days, at day 0/7, date = 01-Jan-2015
    Switching control mode for well INJ from RATE to BHP on rank 20
    Switching control mode for well INJ from BHP to RATE on rank 20
PARMETIS ERROR: Poor initial vertex distribution. Processor 2 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 4 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 6 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 8 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 12 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 14 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 16 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 18 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 20 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 0 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 10 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 22 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 26 has no vertices assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 24 has no vertices assigned to it!

Does anyone know what this error means? Is it caused by a bad mesh partitioning, or is it due to something else?

I would appreciate any advice or tips on how to solve this issue.
Thank you in advance.

Best,

Antoine
