[Opm] OPM Flow multi-node simulations stuck at domain decomposition step
Yogi Pandey
yogi.pandey at oracle.com
Tue Mar 10 21:15:19 UTC 2020
All,
I am trying to run OPM Flow simulations on multiple nodes. I have built OPM Flow from source on Oracle Linux 7 (binary-compatible with RHEL) using:
. GCC-8.3.1
. openmpi-4.0.2 (built from source)
. boost-1.72.0 (built from source)
. cmake-3.16.4 (built from source)
. parmetis-4.0.3 (built from source)
. dune-2.6.0: dune-common, dune-geometry, dune-grid, dune-istl (built from source)
. Zoltan-3.83 (built from source)
. The OPM Flow modules were built using the following commands:
o cmake -DCMAKE_BUILD_TYPE=Release -DUSE_MPI=ON -DUSE_OPENMP=ON -DBLAS_LIBRARIES=/usr/lib64 -DCMAKE_INSTALL_PREFIX=/usr/local ..
o sudo make
For the Norne data set, the input parameter file (params) contains:
ecl-deck-file-name=NORNE_ATW2013.DATA
output-dir=out_parallel
output-mode=none
output-interval=1000000
enable-opm-rst-file=false
threads-per-process=1
The simulation is run on 4 nodes with 32 processors each, using the following command:
mpirun --display-map -mca btl self -x UCX_TLS=rc,self,sm -x HCOLL_ENABLE_MCAST_ALL=0 -mca coll_hcoll_enable 0 -x UCX_IB_TRAFFIC_CLASS=105 -x UCX_IB_GID_INDEX=3 --cpu-set 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35 -np 144 --hostfile /etc/opt/rdma/hostfile /mnt/nfs-share/etc/opm-flow/opm-simulators/build/bin/flow --parameter-file=/mnt/nfs-share/data/norne/params
The simulation gets stuck indefinitely at the domain decomposition step. I can complete parallel runs on up to 3 nodes, but the run always hangs on 4 nodes.
I have also created some customized simulation decks with about 11 million cells, to rule out the possibility that the relatively small number of cells in the Norne model is the cause. With these decks, however, the simulation gets stuck as soon as I scale from 1 node to 2 nodes. Can someone please help me understand what might be causing this?
Thank you,
Yogi