[Opm] OPM Flow multi-node simulations stuck at domain decomposition step

Kai Bao Kai.Bao at sintef.no
Wed Mar 11 07:53:40 UTC 2020


Hi, Yogi,

I see you are trying to run Norne with 144 processes.  Do you see the problem with far fewer processes, for example 4 or 8?
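One way to try the smaller process counts suggested here is sketched below. It only prints candidate command lines and does not run them. The paths are copied from the original command further down; the helper name flow_cmd and the chosen process counts are illustrative assumptions, and the UCX/HCOLL transport options from the original command are omitted for brevity and would need to be carried over.

```shell
# Paths copied from the original mpirun command in this thread; adjust as needed.
FLOW=/mnt/nfs-share/etc/opm-flow/opm-simulators/build/bin/flow
PARAMS=/mnt/nfs-share/data/norne/params
HOSTFILE=/etc/opt/rdma/hostfile

# flow_cmd is a hypothetical helper: it prints (does not execute) an mpirun
# command line for the given number of MPI processes.
flow_cmd() {
    echo "mpirun -np $1 --hostfile $HOSTFILE $FLOW --parameter-file=$PARAMS"
}

# Process counts chosen for illustration, starting with the 4 and 8 suggested above.
for np in 4 8 16 32; do
    flow_cmd "$np"
done
```

Running the printed commands one at a time, from the smallest count upward, would show at which process count the hang first appears.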

In my opinion, the current domain decomposition approach can make it challenging to run Norne with so many processes, given the relatively small size of the model and its long wells.  I am not totally sure, though.
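To put the model-size concern in numbers, here is a rough sketch of the average number of cells each process would own. The figure of about 44,000 active cells for Norne is an assumption that is not stated in this thread, and the helper name cells_per_rank is illustrative.

```shell
# Assumption: roughly 44,000 active cells in Norne (not stated in this thread).
active_cells=44000

# cells_per_rank is a hypothetical helper: integer average of cells per MPI process.
cells_per_rank() {
    echo $(( $1 / $2 ))
}

for np in 4 8 32 144; do
    echo "np=$np: about $(cells_per_rank "$active_cells" "$np") cells per process"
done
```

Under that assumption, 144 processes leave only about 300 cells per process on average, which gives the partitioner little room to keep long wells together.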

Best Regards,
Kai Bao
________________________________
From: Opm <opm-bounces at opm-project.org> on behalf of Yogi Pandey <yogi.pandey at oracle.com>
Sent: Tuesday, March 10, 2020 10:15 PM
To: opm at opm-project.org <opm at opm-project.org>
Subject: [Opm] OPM Flow multi-node simulations stuck at domain decomposition step

All,



I am trying to run OPM Flow simulations on multiple nodes. I have built OPM Flow from source on Oracle Linux 7 OS (binary compatible with RHEL) with:

- GCC-8.3.1
- openmpi-4.0.2 (built from source)
- boost-1.72.0 (built from source)
- cmake-3.16.4 (built from source)
- parmetis-4.0.3 (built from source)
- dune-2.6.0: dune-common, dune-geometry, dune-grid, dune-istl (built from source)
- Zoltan-3.83 (built from source)
- OPM Flow modules are built using the following commands:

    cmake -DCMAKE_BUILD_TYPE=Release -DUSE_MPI=ON -DUSE_OPENMP=ON -DBLAS_LIBRARIES=/usr/lib64 -DCMAKE_INSTALL_PREFIX=/usr/local ..

    sudo make



For the Norne data set, the parameter file (params) contains the following:

ecl-deck-file-name=NORNE_ATW2013.DATA

output-dir=out_parallel

output-mode=none

output-interval=1000000

enable-opm-rst-file=false

threads-per-process=1



The simulation is being run on 4 nodes with 32 processors each, using the following command:

mpirun --display-map -mca btl self -x UCX_TLS=rc,self,sm -x HCOLL_ENABLE_MCAST_ALL=0 -mca coll_hcoll_enable 0 -x UCX_IB_TRAFFIC_CLASS=105 -x UCX_IB_GID_INDEX=3 --cpu-set 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35 -np 144 --hostfile /etc/opt/rdma/hostfile /mnt/nfs-share/etc/opm-flow/opm-simulators/build/bin/flow --parameter-file=/mnt/nfs-share/data/norne/params
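As a quick consistency check on the command above, the sketch below counts the cores named in the --cpu-set list and multiplies by the number of nodes. It is plain arithmetic on values copied from the command; the helper name count_cpus is illustrative, and the check says nothing about what causes the hang.

```shell
# The --cpu-set list copied from the mpirun command above.
cpu_set="0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35"

# count_cpus is a hypothetical helper: counts comma-separated entries in a list.
count_cpus() {
    echo "$1" | tr ',' '\n' | wc -l | tr -d ' '
}

nodes=4
per_node=$(count_cpus "$cpu_set")
echo "cores listed in --cpu-set: $per_node"
echo "processes if every listed core is used on $nodes nodes: $(( per_node * nodes ))"
```

The list names 36 cores, and 36 cores on 4 nodes gives the 144 passed to -np, whereas 32 processors per node would give 128. It may be worth confirming which of the two figures matches the actual hardware.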



The simulation gets stuck indefinitely at the domain decomposition step. I am able to finish a parallel run on up to 3 nodes, but it always gets stuck at 4 nodes.



I have also created some customized simulation decks with about 11 million cells to rule out the small number of cells in the Norne model as a cause, but with those decks the simulation gets stuck as soon as I scale from 1 node to 2 nodes. Can someone please help me understand what might be causing this?



Thank you,

Yogi

