[Opm] ParMETIS error on HPC

Antoine B Jacquey ajacquey at mit.edu
Tue Oct 20 13:05:36 UTC 2020


Hi Atgeirr,

I use AMG as the preconditioner (--use-amg=true). When the ParMETIS errors occur, the simulation crashes.
I indeed linked to the ParMETIS library when configuring DUNE.
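
For reference, something along these lines should show which repartitioning libraries DUNE actually picked up at configure time (the build-tree path is a placeholder for whatever your dunecontrol setup uses):

    # Look for the repartitioning macros (HAVE_PARMETIS / HAVE_PTSCOTCH)
    # in the generated config header of a dunecontrol build tree.
    grep -E "HAVE_PARMETIS|HAVE_PTSCOTCH" dune-istl/build-cmake/config.h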

Do you actually advise using PTScotch instead of ParMETIS? I could try recompiling DUNE + OPM with PTScotch to see whether the simulation runs with that configuration.
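
Something like the following is what I have in mind for the rebuild (a rough sketch; the opts file name and the PTScotch install path are placeholders):

    # my.opts -- dunecontrol options file; point CMake at the PTScotch
    # install instead of ParMETIS when reconfiguring.
    CMAKE_FLAGS="-DCMAKE_PREFIX_PATH=/path/to/ptscotch -DCMAKE_BUILD_TYPE=Release"

    # Reconfigure and rebuild all DUNE (and, if managed the same way, OPM)
    # modules against PTScotch:
    ./dune-common/bin/dunecontrol --opts=my.opts all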

Thanks for your answer.

Antoine

> On Oct 20, 2020, at 05:43, Atgeirr Rasmussen <Atgeirr.Rasmussen at sintef.no> wrote:
> 
> Hi Antoine!
> 
> Our partitioning scheme starts with the whole graph on a single process, so indeed this would be a "bad" starting partition. The partitioning we end up with does not seem any worse for it, although for very large process counts this could become a bottleneck.
> 
> I am a little confused, though, as OPM Flow uses Zoltan for partitioning, not ParMETIS (ParMETIS is not open source). However, if you do have access to ParMETIS, I believe you can configure the dune-istl parallel linear solvers (which are in turn used by OPM Flow) to use ParMETIS (or the PTScotch workalike library) for redistribution of coarse systems within the algebraic multigrid (AMG) solver. That is not the default linear solver for OPM Flow, however, so I am a bit at a loss as to where those ParMETIS messages come from. Did you run with the default linear solver or not? I assume that the simulation actually runs?
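
For reference, toggling between the default solver and the AMG path is just the command-line flag mentioned above; the deck name and process count here are placeholders:

    # Default linear solver path (no AMG coarse-system redistribution):
    mpirun -np 27 flow CASE.DATA

    # AMG path, where dune-istl may hand coarse systems to ParMETIS/PTScotch
    # for redistribution:
    mpirun -np 27 flow CASE.DATA --use-amg=true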
> 
> Atgeirr
> ________________________________
> From: Opm <opm-bounces at opm-project.org> on behalf of Antoine B Jacquey <ajacquey at mit.edu>
> Sent: 19 October 2020 14:22
> To: opm at opm-project.org <opm at opm-project.org>
> Subject: [Opm] ParMETIS error on HPC
> 
> Hi OPM community,
> 
> I recently compiled OPM Flow on a local cluster at my institute. I linked to the ParMETIS library during configuration to make use of mesh partitioning when using a large number of MPI processes.
> When I run a flow simulation, it seems that the mesh is partitioned automatically. Here is part of the output I get for a simulation with 8 MPI processes:
> 
> Load balancing distributes 300000 active cells on 8 processes as follows:
>  rank   owned cells   overlap cells   total cells
> --------------------------------------------------
>     0         36960            2760         39720
>     1         40110            3720         43830
>     2         38100            4110         42210
>     3         38940            2250         41190
>     4         36600            2280         38880
>     5         33660            3690         37350
>     6         37800            3690         41490
>     7         37830            2730         40560
> --------------------------------------------------
>   sum        300000           25230        325230
> 
> The problem occurs when I use a larger number of MPI processes (here 27). The mesh is again partitioned:
> 
> Load balancing distributes 1012500 active cells on 27 processes as follows:
>  rank   owned cells   overlap cells   total cells
> --------------------------------------------------
>     0         40230            6390         46620
>     1         40185            5175         45360
>     2         40635            4050         44685
>     3         40230            6255         46485
>     4         40905            5850         46755
>     5         39825            6030         45855
>     6         37035            2610         39645
>     7         36945            5625         42570
>     8         40680            4185         44865
>     9         35835            5460         41295
>    10         41250            6765         48015
>    11         39825            5310         45135
>    12         36855            2655         39510
>    13         32850            3690         36540
>    14         38790            5400         44190
>    15         36540            5625         42165
>    16         30105            3105         33210
>    17         40320            5400         45720
>    18         35685            4185         39870
>    19         39465            5490         44955
>    20         20160            1800         21960
>    21         39915            4860         44775
>    22         40050            6165         46215
>    23         34020            2475         36495
>    24         39645            6345         45990
>    25         36990            6570         43560
>    26         37530            4005         41535
> --------------------------------------------------
>   sum       1012500          131475       1143975
> 
> But during the first time step calculation, I get the following errors:
> 
> Time step 0, stepsize 1 days, at day 0/7, date = 01-Jan-2015
>    Switching control mode for well INJ from RATE to BHP on rank 20
>    Switching control mode for well INJ from BHP to RATE on rank 20
> PARMETIS ERROR: Poor initial vertex distribution. Processor 2 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 4 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 6 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 8 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 12 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 14 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 16 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 18 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 20 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 0 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 10 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 22 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 26 has no vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 24 has no vertices assigned to it!
> 
> Does anyone know what this error means? Does it come from a bad mesh partitioning, or is it due to something else?
> 
> I would appreciate any advice or tips for solving this issue.
> Thank you in advance.
> 
> Best,
> 
> Antoine
> _______________________________________________
> Opm mailing list
> Opm at opm-project.org
> https://opm-project.org/cgi-bin/mailman/listinfo/opm
