[Opm] File formats

Tue May 19 08:33:20 UTC 2020

I've used HDF5 for 15+ years and it has never failed me. It has parallel output support, compression support, slicing support, good tooling for all relevant languages (C/C++/Fortran), and support in MATLAB, Octave, Python, and on the command line.

It being old is an advantage, not a disadvantage as I see it.
________________________________
From: Opm <opm-bounces at opm-project.org> on behalf of Alf Birger Rustad <abir at equinor.com>
Sent: Tuesday, 19 May 2020 09:49
To: Joakim Hove <joakim.hove at opm-op.com>; opm at opm-project.org <opm at opm-project.org>
Subject: Re: [Opm] File formats

> The feasibility of implementing/using said format in post processing
> tools should therefore be an important criterion.

I would even say a prerequisite. We already have it in opm-common in a shape that can be used without post-processing tools, but if we are to support it within Flow, I believe we must have support in at least ResInsight.

> I *think* Petrel / eclrun / eclipse has some functionality in this
> regard - if this is a file we can be compatible with that would make
> very much sense.

Thanks for pointing it out. Yes, there is such a format. There are still a number of unknowns related to that format. What I believe is already clear is that it is not supported by Eclipse directly, so it is also of the type that is created after the simulation is done. If anybody knows more about this format, please share.

> In addition to HDF5 I would consider looking into Parquet which at
> least is a much newer format than HDF5

Thanks for the suggestion! Yes, we should read up on alternatives before deciding. If anybody has any experience with or knowledge of any of these containers, please share. I am in deep water here 😊

-----Original Message-----
From: Opm <opm-bounces at opm-project.org> On Behalf Of Joakim Hove
Sent: Tuesday, 19 May 2020 07:23
To: opm at opm-project.org
Subject: Re: [Opm] File formats

My take on this is:

 1. Yes, I see the value of a transposed file format - however the value
    is quite limited before it is implemented in post processing tools.
    The feasibility of implementing/using said format in post processing
    tools should therefore be an important criterion.
 2. I *think* Petrel / eclrun / eclipse has some functionality in this
    regard - if this is a file we can be compatible with that would make
    very much sense.
 3. In addition to HDF5 I would consider looking into Parquet which at
    least is a much newer format than HDF5

Here is an extensive file-format comparison:
https://indico.cern.ch/event/613842/contributions/2585787/attachments/1463230/2260889/pivarski-data-formats.pdf

On 5/18/20 5:51 PM, Alf Birger Rustad wrote:
> Dear community,
>
> We are at a crossroads with respect to file formats, and I hope you are motivated to help us arrive at the best solution. We need better load-on-demand performance for summary files than what is currently possible with the default Eclipse format for summary files. Currently you will find an implementation in opm-common that simply transposes the summary vectors, while still using the same Fortran77 binary format. That approach has three main drawbacks. The first is that it is not supported by any post-processing application (yet).
> The second is that it can only be created from a finished simulation, so you need to wait for simulations to finish before you get the performant result file.

For any traditional column-oriented file format, I think you will need to write out the file in full, i.e. I think this limitation will apply anyway. Use of a database format might resolve this, or at least handle the appending transparently, but that is maybe a bit overkill?
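
To make the trade-off concrete, here is a minimal sketch of the two layouts. It is not the opm-common implementation and not the Eclipse/Fortran77 record format: the flat little-endian float layout, the function names and the sample values are all assumptions made for illustration. It shows that a transposed (vector-major) buffer gives one contiguous read per vector, while adding a report step to it forces a rewrite, whereas the step-major buffer can simply be appended to.

```python
import struct

# Assumption: plain 4-byte little-endian floats with no record markers.
FLOAT = struct.Struct("<f")


def pack_step_major(steps):
    """One record per report step: [v0, v1, v2][v0, v1, v2]..."""
    return b"".join(FLOAT.pack(v) for step in steps for v in step)


def pack_vector_major(steps):
    """Transposed: all values of vector 0, then all values of vector 1, ..."""
    n_vectors = len(steps[0])
    return b"".join(FLOAT.pack(step[i]) for i in range(n_vectors) for step in steps)


def read_vector_step_major(buf, index, n_vectors):
    """Strided access: one small read per report step."""
    n_steps = len(buf) // (FLOAT.size * n_vectors)
    return [
        FLOAT.unpack_from(buf, (s * n_vectors + index) * FLOAT.size)[0]
        for s in range(n_steps)
    ]


def read_vector_vector_major(buf, index, n_steps):
    """Contiguous access: a single read covers the whole vector."""
    start = index * n_steps * FLOAT.size
    return list(struct.unpack_from(f"<{n_steps}f", buf, start))


# Two report steps, three hypothetical summary vectors.
steps = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
row_buf = pack_step_major(steps)
col_buf = pack_vector_major(steps)

# Adding a third report step.
new_step = [7.0, 8.0, 9.0]
row_buf_appended = row_buf + b"".join(FLOAT.pack(v) for v in new_step)  # pure append
col_buf_rewritten = pack_vector_major(steps + [new_step])  # every vector block moves
```

In the step-major case the existing bytes are untouched by the append; in the vector-major case the existing bytes are no longer a prefix of the new file, which is why such a file is normally produced once the simulation has finished.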

> The third is that it is not suited for parallel processing, so forget about each process writing out its part.

For the summary files that is not so relevant, because the final calculation of summary properties like WWCT = WWPR / (WWPR + WOPR) is only done on the IO rank anyway.
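
As a small illustration of the formula above, a derived vector such as WWCT can be computed from the two rates in a single place. The function name and the handling of a zero total rate are assumptions for this sketch, not taken from the Flow source.

```python
def water_cut(wwpr: float, wopr: float) -> float:
    """WWCT = WWPR / (WWPR + WOPR).

    Assumption: a well producing neither water nor oil reports a
    water cut of zero instead of raising a division error.
    """
    total = wwpr + wopr
    if total == 0.0:
        return 0.0
    return wwpr / total


# Hypothetical rates: 25 units of water and 75 units of oil.
wwct = water_cut(25.0, 75.0)
```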

Joakim

_______________________________________________
Opm mailing list
Opm at opm-project.org
https://opm-project.org/cgi-bin/mailman/listinfo/opm
