[Opm] OPM Flow multi-node simulations stuck at domain decomposition step
Yogi Pandey
yogi.pandey at oracle.com
Wed Mar 11 14:56:50 UTC 2020
Hi Dr. Blatt,
I am using OPM Flow version 2020.04-pre. Each node has 384 GB of memory, and I don't see kswapd appearing, so memory does not seem to be the problem.
With some trial and error I arrived at a model with just 1 producer and 1 injector, with 11 million cells, which scales to 4 nodes. My impression is that the well locations and groups may be causing the problem during domain decomposition.
Thank you,
Yogi
-----Original Message-----
From: Markus Blatt [mailto:markus at dr-blatt.de]
Sent: Wednesday, March 11, 2020 5:04 AM
To: Yogi Pandey <yogi.pandey at oracle.com>
Cc: opm at opm-project.org
Subject: Re: [Opm] OPM Flow multi-node simulations stuck at domain decomposition step
On Tue, Mar 10, 2020 at 02:15:19PM -0700, Yogi Pandey wrote:
> All,
> I am trying to run OPM Flow simulations on multiple nodes. I have built OPM Flow from source on Oracle Linux 7 OS (binary compatible with RHEL) with:
>
> [...]
>
> . OPM Flow modules are built using the following commands:
>
> o cmake -DCMAKE_BUILD_TYPE=Release -DUSE_MPI=ON -DUSE_OPENMP=ON -DBLAS_LIBRARIES=/usr/lib64 -DCMAKE_INSTALL_PREFIX=/usr/local ..
>
> o sudo make
>
>
>
> For the Norne data set, the content of the input file (params) is as follows:
>
> ecl-deck-file-name=NORNE_ATW2013.DATA
> [...]
>
> The simulation is being run on 4 nodes with 32 processors each using the following command:
>
> mpirun --display-map -mca btl self -x UCX_TLS=rc,self,sm -x HCOLL_ENABLE_MCAST_ALL=0 -mca coll_hcoll_enable 0 -x UCX_IB_TRAFFIC_CLASS=105 -x UCX_IB_GID_INDEX=3 --cpu-set 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35 -np 144 --hostfile /etc/opt/rdma/hostfile /mnt/nfs-share/etc/opm-flow/opm-simulators/build/bin/flow --parameter-file=/mnt/nfs-share/data/norne/params
>
>
>
> The simulation gets stuck indefinitely at the domain decomposition step. I am able to finish a parallel run on up to 3 nodes, but it always gets stuck at 4 nodes.
>
>
>
> I have also created some customized simulation decks with about 11 million cells to rule out the small number of cells in the Norne model as a reason, but the simulation gets stuck as soon as I scale from 1 node to 2 nodes. Can someone please help me understand what might be causing it?
>
>
Which version of OPM are you using? If you are using the release, then chances are that you might simply be running out of available memory. You could check that with top or htop on one of the machines and look for the kswapd process popping up.
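
A minimal sketch of a non-interactive version of that check, assuming a Linux node where /proc/meminfo and the procps ps tool are available. The function names swap_used_kb and show_kswapd are made up for this illustration and are not part of OPM.

```shell
#!/bin/sh
# Hypothetical helpers for a quick memory-pressure check on one compute node.

# swap_used_kb [FILE]: print the swap currently in use (kB), parsed from a
# /proc/meminfo-style file. FILE defaults to /proc/meminfo.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { avail = $2 }
         END           { print total - avail }' "${1:-/proc/meminfo}"
}

# show_kswapd: list kswapd kernel threads with their CPU share.
# The [k] in the pattern keeps grep from matching its own command line.
show_kswapd() {
    ps -eo pid,pcpu,comm | grep '[k]swapd'
}

# Example use while the simulation is running:
#   echo "swap used: $(swap_used_kb) kB"
#   show_kswapd
```

A growing swap figure, or kswapd threads with a noticeable CPU share while flow is running, would point towards memory pressure; if both stay quiet, memory is less likely to be the cause.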
>
--
Dr. Markus Blatt
OPM-OP AS