HDF5 IO library for OpenFOAM

#1 haakon (Senior Member), August 22, 2013, 11:32
I have for some time been developing an HDF5 IO library for OpenFOAM. This library can write the results of a simulation into an HDF5 archive instead of the usual (horrible) file-based structure OpenFOAM uses by default. The major benefits show up when you increase the number of processes (say, to the range 1000-10000) and want to write more than a few timesteps to disk. It is also highly useful if you use the Lagrangian particle functionality of OpenFOAM, as this produces ~50 files per timestep per process. A nice bonus is that the savings in disk space are significant, although this depends on which IO format you compare against (ASCII or binary, with or without compression).

When the simulation is finished, the HDF5 archive can be parsed and an XDMF metadata file written. This XDMF file can be opened in, for example, ParaView, VisIt or EnSight, and the visualization is performed as for any other OpenFOAM case.
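To make the XDMF idea concrete, here is a minimal Python sketch, built with the standard library's `xml.etree.ElementTree`, of what such a metadata file contains: a `Grid` with a `Mixed` topology, XYZ point coordinates and cell-centred attributes, each pointing into the HDF5 archive through a `file.h5:/dataset` reference. The file name `case.h5`, the dataset paths (`/MESH/...`, `/DATA/...`) and the function names are hypothetical placeholders, not the layout IOH5Write actually writes.

```python
import xml.etree.ElementTree as ET


def data_item(h5_file, dataset, dims, number_type="Float", precision=8):
    """Return an XDMF DataItem that references a dataset inside an HDF5 file."""
    item = ET.Element(
        "DataItem",
        Dimensions=" ".join(str(d) for d in dims),
        NumberType=number_type,
        Precision=str(precision),
        Format="HDF",
    )
    item.text = f"{h5_file}:{dataset}"
    return item


def mesh_xdmf(h5_file, n_points, n_cells, topo_len, fields):
    """Build an XDMF tree for one unstructured mesh with cell-centred fields.

    fields maps a field name to its number of components (1 = scalar, 3 = vector).
    All dataset paths used here are hypothetical placeholders.
    """
    root = ET.Element("Xdmf", Version="2.0")
    domain = ET.SubElement(root, "Domain")
    grid = ET.SubElement(domain, "Grid", Name="mesh", GridType="Uniform")

    # Mixed topology: one flat integer array with a type ID plus point labels per cell
    topo = ET.SubElement(grid, "Topology", TopologyType="Mixed",
                         NumberOfElements=str(n_cells))
    topo.append(data_item(h5_file, "/MESH/cells", [topo_len], "Int", 4))

    geom = ET.SubElement(grid, "Geometry", GeometryType="XYZ")
    geom.append(data_item(h5_file, "/MESH/points", [n_points, 3]))

    for name, ncomp in fields.items():
        kind = "Scalar" if ncomp == 1 else "Vector"
        attr = ET.SubElement(grid, "Attribute", Name=name,
                             AttributeType=kind, Center="Cell")
        dims = [n_cells] if ncomp == 1 else [n_cells, ncomp]
        attr.append(data_item(h5_file, f"/DATA/{name}", dims))
    return root


if __name__ == "__main__":
    # 9x9x9 hexahedra: 1000 points, and 9 integers per hex (type ID 9 + 8 point labels)
    tree = mesh_xdmf("case.h5", n_points=1000, n_cells=729, topo_len=729 * 9,
                     fields={"p": 1, "U": 3})
    print(ET.tostring(tree, encoding="unicode"))
```

Since the XDMF file only holds metadata, it stays small regardless of case size; the heavy data remains in the HDF5 archive.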

Another benefit is the ability to easily load the data into a tool like Matlab or Python to perform calculations or processing of the results. Personally I have used this to process data from fluid-particle simulations.

The code is found in a GitHub repository, https://github.com/hakostra/IOH5Write, together with some installation instructions and hints. I hope that this code can be useful for the OpenFOAM community, in particular for those of you who have access to an HPC system. If you have any suggestions for improvements, please feel free to use this thread to discuss them.

#2 ngj (Niels Gjoel Jacobsen, Senior Member), August 25, 2013, 07:29
Hi Håkon,

That is interesting; I have been thinking myself about how to change the write format in OF. My motivation was long simulations, where I needed to output 2400 time folders for a lot of post-processing. The simulation was decomposed on 6 processors and each time folder (per processor) contained 35-45 individual files, i.e. 0.5M-0.65M files in total.
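The total above is simply the product of written times, processor directories and files per time folder. A trivial Python check of that arithmetic (the helper name is illustrative only):

```python
def total_files(written_times, processor_dirs, files_per_dir):
    """Files produced when every processor directory gets one folder per written time."""
    return written_times * processor_dirs * files_per_dir


# 2400 written times, 6 processors, 35-45 files per time folder
print(total_files(2400, 6, 35))  # 504000
print(total_files(2400, 6, 45))  # 648000
```

The same product explains why the problem grows so quickly with the processor count: the number of files scales linearly with both the number of processes and the number of written times.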

In principle this should not have been a problem, because most of the files are based on the faMesh, so they are pretty small (say 1 KB), but if you are downloading them from a cluster with an overloaded or slow infrastructure, transferring that many small files takes ages.

I see that you are doing it through function objects, so OF essentially keeps doing its own output as well. My question is therefore whether you have considered making your code an additional option in the controlDict, on the same level as ascii, binary, compressed and uncompressed? It would interfere somewhat more with the core of OF, but on the other hand the output would not be written twice.

Kind regards

Niels

#3 haakon (Senior Member), August 26, 2013, 04:19
Thank you for your interest in my work. If you ever try to compile it and try it out, I would appreciate your feedback and suggestions for improvement.

Your case is an example of why I developed this code. Even though I have never been in a situation where I needed 2400 timesteps written to disk, I often want to decompose a case massively and run it on several hundred or several thousand processes. As in your case, my simulations produce approx. 40-50 files per process per timestep, hence the total number of files would be about 45 × 1000 = 45,000 per timestep for 1000 processes. This is very problematic, especially on a parallel file system designed to handle a few very large files. As far as I know, there are no HPC file systems on the market that are designed to cope efficiently with this number of files of that size.

Regarding the implementation, I think the current approach is a good one, as it allows for (relatively) easy transitions between OpenFOAM versions. It does not need a single modification to the OpenFOAM core, and hopefully the amount of work needed when a new OF version is released is limited. For example, when going from 2.1.1 to 2.2.0 (or from 2.1.x to 2.2.x if you prefer), I only needed to change a single line of code (if my memory serves me right).

Another reason for doing it the current way is that there is no restart functionality in the HDF5 plugin, i.e. currently you cannot take a field from the HDF5 file and use it as an initial condition when restarting a simulation. Therefore, I always specify a few writes in the "native" format (perhaps once every 6-24 hours of walltime), so that I can always restart a simulation in case of a crash.

#4 akidess (Anton Kidess, Senior Member), August 26, 2013, 04:40
Quote (originally posted by haakon):
"Another reason for doing it the current way is that there is no restart functionality in the HDF5 plugin, i.e. currently you cannot take a field from the HDF5 file and use it as an initial condition when restarting a simulation. Therefore, I always specify a few writes in the "native" format (perhaps once every 6-24 hours of walltime), so that I can always restart a simulation in case of a crash."

I think this was pretty much Niels' point - having full HDF5 capabilities (not just for postprocessing) would be great! I'd even try submitting it to the OpenFOAM foundation: http://www.openfoam.org/contrib/unsupported.php

In any case, thanks for sharing your code! Even "just" for postprocessing, it's quite a nice thing to have.

#5 haakon (Senior Member), August 26, 2013, 08:22
Quote (originally posted by akidess):
"I think this was pretty much Niels' point - having full HDF5 capabilities (not just for postprocessing) would be great! I'd even try submitting it to the OpenFOAM foundation: http://www.openfoam.org/contrib/unsupported.php"

I think implementing full HDF5 capabilities in OF is a big task since, as far as I can see, there is no central IO class or library. As long as the various parts of the code just dump data into streams that end up in files, changing this behaviour is a big job, since every single part of the code that does IO needs to be modified.

Quote (originally posted by akidess):
"In any case, thanks for sharing your code! Even "just" for postprocessing, it's quite a nice thing to have."

I would dare to say that "just" postprocessing is the main task and purpose of writing data. I really cannot think of many good reasons for writing gigabytes (or terabytes) of data to disk without any need to postprocess or visualize it.

And in case there is a need for e.g. restarting simulations, I think it would be fairly easy to create an "HDF5ToFoam" converter, based on the many "xxxxToFoam" converters already available.
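Such a converter would ultimately have to emit fields in OpenFOAM's native ASCII format. The Python sketch below covers only that last step: it formats cell values as the `internalField nonuniform List<scalar>` entry of a scalar field, assuming the values have already been read from the archive (e.g. with h5py). The function name is hypothetical, and the FoamFile header, `dimensions` and `boundaryField` entries that a complete field file needs are deliberately left out.

```python
def format_internal_field(values, precision=6):
    """Format cell values as an OpenFOAM 'internalField nonuniform List<scalar>' entry.

    Only the internalField entry is produced; a complete field file also needs a
    FoamFile header, a dimensions entry and a boundaryField dictionary.
    """
    lines = ["internalField   nonuniform List<scalar>", str(len(values)), "("]
    # One value per line, written with 'precision' significant digits
    lines.extend(f"{v:.{precision}g}" for v in values)
    lines.extend([")", ";"])
    return "\n".join(lines) + "\n"


print(format_internal_field([0.1, 0.25, 3.0]))
```

A fuller converter would probably also write `uniform` for constant fields and handle vector fields (`List<vector>` with `(x y z)` entries); both are left out here to keep the sketch short.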

#6 haakon (Senior Member), November 18, 2013, 05:31
A short update: I have now written a simple Python program that uses h5py to read the metadata from the HDF5 files and write the corresponding XDMF files. Both field data and Lagrangian clouds are supported, and all attributes present are included in the file. One XDMF file is generated for the field (mesh) data, and one for each cloud.
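For the Lagrangian clouds, each particle can be described in XDMF as a `Polyvertex` topology, i.e. a set of unconnected points, with per-particle (node-centred) attributes. The following Python sketch builds one XDMF document per cloud, matching the one-file-per-cloud layout described above. The `/CLOUDS/<cloud>/...` dataset paths, the function names and the cloud name in the test are hypothetical assumptions, not what writeXDMF.py necessarily uses.

```python
import xml.etree.ElementTree as ET


def cloud_xdmf(h5_file, cloud, n_particles, attributes):
    """Build an XDMF tree describing one Lagrangian cloud as unconnected points.

    attributes maps an attribute name to its number of components.
    The /CLOUDS/<cloud>/... dataset paths are hypothetical.
    """
    def item(name, dims):
        el = ET.Element("DataItem", Format="HDF", NumberType="Float",
                        Precision="8", Dimensions=" ".join(map(str, dims)))
        el.text = f"{h5_file}:/CLOUDS/{cloud}/{name}"
        return el

    root = ET.Element("Xdmf", Version="2.0")
    domain = ET.SubElement(root, "Domain")
    grid = ET.SubElement(domain, "Grid", Name=cloud, GridType="Uniform")
    # Polyvertex: each particle is an isolated vertex, so no connectivity array is given
    ET.SubElement(grid, "Topology", TopologyType="Polyvertex",
                  NumberOfElements=str(n_particles), NodesPerElement="1")
    geometry = ET.SubElement(grid, "Geometry", GeometryType="XYZ")
    geometry.append(item("positions", [n_particles, 3]))
    for name, ncomp in attributes.items():
        attr = ET.SubElement(grid, "Attribute", Name=name, Center="Node",
                             AttributeType="Scalar" if ncomp == 1 else "Vector")
        attr.append(item(name, [n_particles] if ncomp == 1 else [n_particles, ncomp]))
    return root


def cloud_files(h5_file, clouds):
    """Return {'<cloud>.xdmf': xml_text}, one entry per cloud.

    clouds maps a cloud name to (n_particles, attributes).
    """
    return {
        f"{name}.xdmf": ET.tostring(cloud_xdmf(h5_file, name, n, attrs), encoding="unicode")
        for name, (n, attrs) in clouds.items()
    }
```

Writing each returned string to its own file would give the one-file-per-cloud layout. Some XDMF readers may expect an explicit connectivity `DataItem` even for `Polyvertex`, so that detail is worth checking against the reader you use.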

The program is called 'writeXDMF.py', and a help message is displayed if you run it with the '--help' argument. It requires Python 3. The program/script is installed to $FOAM_USER_APPBIN when you run the ./Allwmake script.

'writeXDMF.py' makes the accompanying Matlab files obsolete; however, I have not yet removed them from the repository in case someone wants to use them as a basis for further work in Matlab.

My next area of focus will be to clean up some really, really bad code in the writer module itself...

#7 wyldckat (Bruno Santos, Retired Super Moderator), November 23, 2013, 09:04
Greetings to all!

@Håkon: I picked up on this thread when you made the recent post above. This is a very nice function object and I've taken the liberty of adding a quick reference page for it at openfoamwiki.net: http://openfoamwiki.net/index.php/Contrib/IOH5Write - Feel free to update the wiki page!
It's accessible from here: http://openfoamwiki.net/index.php/Ma...nction_objects


I have also been wondering how to add an optional input/output file format for field files in OpenFOAM, but I was thinking more along the lines of an in-place replacement for OpenFOAM's IOstream-related classes, specifically one based on SQLite.
But HDF5 makes a lot more sense! Although using HDF5 would require considerably more hacking than a mere replacement for IOstream... mmm... then again, maybe it wouldn't be all that hard.

Best regards,
Bruno

#8 ganeshv (Ganesh Vijayakumar, Member), December 1, 2013, 16:41
Dude,

This is super awesome! I will try this out. I have a couple of questions though.. feel free to ignore them. What you've done is more than enough!!

1. Have you benchmarked/recorded the speed-up in write time, especially for large parallel cases? I'm running cases with 1760 and 4600 procs now... I'd be super happy if this speeds things up!
2. Does it make loading of large data sets any faster in ParaView? I have one case running with 60 million cells and another with 150 million. I'd be blessed if this turns out to be faster!

Thanks again. Big fan!

#9 haakon (Senior Member), December 2, 2013, 03:56
1: I have done some benchmarking, yes. My conclusions are:
  • With few processes (relative to the case size), there is no difference in performance.
  • With a large number of processes, my HDF5 writer is faster. But it is difficult to estimate how much faster, since that of course depends on how many timesteps you want to write, the number of variables, the disk system, the numerical precision etc.
  • I have done all my benchmarks on a parallel file system capable of MASSIVELY parallel I/O, so my conclusions might not be valid on a small cluster with serial IO. On such a platform I suspect that HDF5 IO might be faster even for a smaller number of processes, since using less disk space also means less data to write.
  • The number of variables to write is also significant. "My method" gives the user the opportunity not to write variables that are of no interest. If you are only interested in, say, velocity and pressure, and not omega/k/epsilon etc., the gain might be larger. The comparisons and benchmarks are, however, based on a case where all variables present are written.
  • The main advantage is really the ability to store more data in a more optimal way. Compared to the uncompressed OpenFOAM ASCII format, you can store ~5 times as much data in the same disk space!
2: I think ParaView is dead slow anyway, but that is more a ParaView memory-handling issue than anything else. To be honest, I haven't benchmarked this, but for everyday purposes I don't think there is a big difference.
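Much of the storage saving mentioned in point 1 comes from representation: an ASCII value costs one byte per character plus a separator, whereas a binary value has a fixed width. The small Python illustration below uses made-up sample values and compares raw byte counts only; it does not attempt to reproduce the ~5x figure, which also depends on precision, compression and what is written.

```python
import struct


def ascii_bytes(values, precision=6):
    """Bytes needed to store values as text, one per line (ASCII-style output)."""
    return sum(len(f"{v:.{precision}g}\n") for v in values)


def binary_bytes(values, fmt="d"):
    """Bytes needed to store values as packed binary ('d' = float64, 'f' = float32)."""
    return len(struct.pack(f"<{len(values)}{fmt}", *values))


# Arbitrary sample values, chosen only for illustration
samples = [1.234567e-05, -0.0123456, 101325.3, 2.5e3, -7.77777e-09]
for label, size in [("ascii, 6 digits", ascii_bytes(samples)),
                    ("ascii, 15 digits", ascii_bytes(samples, 15)),
                    ("float64", binary_bytes(samples)),
                    ("float32", binary_bytes(samples, "f"))]:
    print(f"{label:>16}: {size} bytes")
```

Short values such as 2500 are cheap in ASCII, while values with exponents and full precision are expensive, so the actual ratio depends strongly on the data.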

If you end up testing it, I would really appreciate some feedback! But please remember that there are some limitations... I mainly developed this as a way of storing large amounts of particle data (order of magnitude 200 GB) and have not cared too much about flow fields.

#10 ganeshv (Ganesh Vijayakumar, Member), December 2, 2013, 12:13
Thanks. I think this is awesome and the way to go for future large parallel datasets.

As for your comparison with uncompressed ASCII, I think it would be better to compare against the binary output in OpenFOAM. I think switching from uncompressed ASCII to compressed ASCII to binary by itself results in savings like the ones you mention. However, I think IO would be greatly improved simply by writing to one file using optimized HDF5 rather than to several thousand files... not to mention the ease of handling the files if you're transferring them to a different visualization cluster.

You mention in your README file that you haven't implemented writing out the boundary mesh and data simply because you are lazy. Could you tell me how to do that? I wouldn't mind implementing it.

btw.. I got your code to run on OpenFOAM-2.1.x and Python 2.6. It required some changes. I think I'll fork your repo and upload it there.

#11 ganeshv (Ganesh Vijayakumar, Member), December 2, 2013, 13:54
Never mind explaining the boundary data part. I just realized that as far as XDMF is concerned, there's no difference between a volume element and a face element.... it treats both as cells. You've already written the point data out. So I just need to add the boundary topology at the end with quad/tri elements, corresponding data and almost no modifications to the XDMF file. I think that will work.
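In an XDMF `Mixed` topology, each element is stored as a type ID followed by its point labels, and polygons additionally carry their point count. As a sketch of the boundary-face idea above, the Python function below (hypothetical name, assuming each face is given as a list of point labels into the already-written points array) encodes faces using the XDMF type IDs 4 (triangle), 5 (quadrilateral) and 3 (polygon).

```python
# XDMF cell type IDs used in a "Mixed" topology array
XDMF_POLYGON = 3
XDMF_TRIANGLE = 4
XDMF_QUADRILATERAL = 5


def encode_faces_mixed(faces):
    """Flatten boundary faces (lists of point labels) into an XDMF Mixed array.

    Triangles and quads become [type, p0, p1, ...]; faces with more than four
    points become polygons, which also carry their point count:
    [XDMF_POLYGON, n, p0, ..., p(n-1)].
    """
    flat = []
    for face in faces:
        n = len(face)
        if n < 3:
            raise ValueError(f"a face needs at least 3 points, got {n}")
        if n == 3:
            flat.append(XDMF_TRIANGLE)
        elif n == 4:
            flat.append(XDMF_QUADRILATERAL)
        else:
            flat.extend([XDMF_POLYGON, n])
        flat.extend(face)
    return flat


# A triangle and a quad sharing point 2
print(encode_faces_mixed([[0, 1, 2], [2, 3, 4, 5]]))  # [4, 0, 1, 2, 5, 2, 3, 4, 5]
```

The resulting array, with the number of faces as `NumberOfElements`, could then be stored in the HDF5 file and referenced from a `Topology` element in the same way as the volume cells, reusing the existing points array as the `Geometry`.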

However I have a mesh that's rotating with no topology change but the geometry points are changing. So this currently requires a lot more work. I'll get to it some day!

#12 haakon (Senior Member), December 2, 2013, 16:51
Quote (originally posted by ganeshv):
"Never mind explaining the boundary data part. I just realized that as far as XDMF is concerned, there's no difference between a volume element and a face element.... it treats both as cells. You've already written the point data out. So I just need to add the boundary topology at the end with quad/tri elements, corresponding data and almost no modifications to the XDMF file. I think that will work."

Yes, that is more or less correct. It should not be too difficult, I have just not found time to do it myself yet. Programming CFD codes is not among my core tasks, and since I have not yet needed the boundary fields, I have not written that code.

Quote (originally posted by ganeshv):
"However I have a mesh that's rotating with no topology change but the geometry points are changing. So this currently requires a lot more work. I'll get to it some day!"

I certainly don't think that is too much work either. I have implemented some moving-mesh functionality in the code, but it writes both the points and the cells if it detects a transient simulation. The Python script that creates the XDMF files will also need some updates for this to work.
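For a moving mesh without topology changes, one way to describe the results in XDMF is a temporal collection in which every time step gets its own `Geometry` (the moving points), while all steps reference the same `Topology` dataset. The Python sketch below assumes hypothetical dataset paths (`/MESH/cells`, `/MESH/<time>/points`) and assumes the writer stores the points once per time step; that is an assumption about how the code could be changed, not what it currently does.

```python
import xml.etree.ElementTree as ET


def temporal_xdmf(h5_file, times, n_points, n_cells, topo_len):
    """XDMF temporal collection: shared cell topology, per-time point coordinates.

    Dataset paths (/MESH/cells, /MESH/<time>/points) are hypothetical.
    """
    def item(path, dims, number_type="Float", precision="8"):
        el = ET.Element("DataItem", Format="HDF", NumberType=number_type,
                        Precision=precision, Dimensions=" ".join(map(str, dims)))
        el.text = f"{h5_file}:{path}"
        return el

    root = ET.Element("Xdmf", Version="2.0")
    domain = ET.SubElement(root, "Domain")
    series = ET.SubElement(domain, "Grid", Name="mesh",
                           GridType="Collection", CollectionType="Temporal")
    for t in times:
        grid = ET.SubElement(series, "Grid", Name=f"mesh_{t}", GridType="Uniform")
        ET.SubElement(grid, "Time", Value=str(t))
        topo = ET.SubElement(grid, "Topology", TopologyType="Mixed",
                             NumberOfElements=str(n_cells))
        # Connectivity never changes, so every step references the same dataset
        topo.append(item("/MESH/cells", [topo_len], "Int", "4"))
        geom = ET.SubElement(grid, "Geometry", GeometryType="XYZ")
        # Point coordinates move, so each step gets its own dataset
        geom.append(item(f"/MESH/{t}/points", [n_points, 3]))
    return root
```

Sharing the topology this way keeps the per-step data to the point coordinates only, which is the part that actually changes for a rotating mesh.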

Anyways, see https://github.com/hakostra/IOH5Writ...h5Write.C#L174
BTW: I have actually NEVER tried to use this on a dynamic mesh...

And as you correctly state in your previous post, the point of this code is not to save disk space; it is to make one's life easier when working on large clusters and post-processing these large datasets. As an example, I am currently working on a simulation with 50 million Lagrangian particles, and opening the HDF5 dataset in Python, calculating statistics, and making plots and distributions based on the particle data is EASY. Parsing the OpenFOAM file format to do the same would have required a lot of coding just to read in particle locations and velocities.
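As an illustration of this kind of post-processing, the Python sketch below computes the particle count, mean and maximum speed, and a speed histogram. To stay self-contained it takes plain lists of velocity vectors; in practice these would be read from the archive (for example with h5py), but the dataset layout is not specified in this thread, so that step is left out and the function name is hypothetical.

```python
import math
import statistics


def speed_statistics(velocities, bin_width=1.0):
    """Count, mean/max particle speed and a histogram of speeds in bins of bin_width.

    velocities is an iterable of (ux, uy, uz) tuples; histogram keys are the
    lower edges of the bins.
    """
    speeds = [math.sqrt(ux * ux + uy * uy + uz * uz) for ux, uy, uz in velocities]
    histogram = {}
    for s in speeds:
        lower = math.floor(s / bin_width) * bin_width
        histogram[lower] = histogram.get(lower, 0) + 1
    return {
        "count": len(speeds),
        "mean": statistics.mean(speeds),
        "max": max(speeds),
        "histogram": dict(sorted(histogram.items())),
    }


print(speed_statistics([(3, 4, 0), (0, 0, 1), (1, 2, 2)], bin_width=2))
```

For tens of millions of particles one would use NumPy arrays instead of Python lists, but the logic stays the same.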

#13 gigilentini8 (Member), January 23, 2014, 07:50
Congrats!! Very interesting project!! I hope it will improve the parallel visualization of big simulations.
However, I am not even able to run the tutorial; the compilation log is attached.
I am using Ubuntu 12.04 + OF 2.2.2 + system HDF5 and system OpenMPI.
This is the error I get when running:
Code:
HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) MPI-process 3:
  #000: ../../../src/H5D.c line 141 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../../../src/H5Gloc.c line 241 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
Attached file: compile-log.txt (21.3 KB)

#14 haakon (Senior Member), January 23, 2014, 07:55
I strongly suspect that your HDF5 version is too old. I know that I am using some features of the HDF5 library that were introduced recently, but I do not know exactly where the version cut-off is with respect to compatibility. Perhaps you can try version 1.8.9 or newer?
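When checking whether an installed HDF5 is new enough, version strings such as `1.8.4-patch1` or `1.8.12` (both appear in the error logs in this thread) must be compared numerically: a plain string comparison ranks `1.8.12` below `1.8.9`. A small Python sketch; the function names are illustrative, and the 1.8.9 threshold is only the tentative cut-off suggested above, not a confirmed requirement.

```python
import re


def parse_hdf5_version(text):
    """Extract (major, minor, release) from strings like '1.8.4-patch1' or 'HDF5 (1.8.12)'.

    Any patch suffix is ignored.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no HDF5 version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_new_enough(text, minimum=(1, 8, 9)):
    """Tuple comparison orders versions numerically: (1, 8, 12) > (1, 8, 9)."""
    return parse_hdf5_version(text) >= minimum


print(is_new_enough("HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1)"))  # False
print(is_new_enough("1.8.12"))  # True
print("1.8.12" >= "1.8.9")  # False: plain string comparison gets this wrong
```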

#15 gigilentini8 (Member), January 23, 2014, 11:47
I tried installing the new version from source, but it still gives the same error.
Did you check the warnings I get during compilation? They could be related.


#16 gigilentini8 (Member), January 24, 2014, 09:55
Thanks, Haakon.
I solved it by switching to OF 2.2.x and using Gcc instead of Intel.
Looking forward to testing it on big cases.


#17 gigilentini8 (Member), January 27, 2014, 04:19
Different computer, same error:
Code:
HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 1:
  #000: hdf5-1.8.12/src/H5D.c line 141 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: hdf5-1.8.12/src/H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
I tried both Icc and Gcc, with OF 2.2.2 and OF 2.2.x.
It only worked on my laptop with Gcc 4.6; Icc and Gcc 4.7 gave me this error.

Any hints?

#18 dhuckaby (David Huckaby, New Member), January 28, 2014, 18:00
I think you can fix this error by commenting out line 222 in h5WriteCloud.C, which currently reads:
H5Sclose(fileSpace);

Thanks Haakon for developing and releasing this tool.

#19 haakon (Senior Member), February 3, 2014, 07:44
Sorry for my late reply to the issues that have come up here. I want to comment on a few things:

1: Line 222 of h5WriteCloud.C is now removed. Thanks for the bug report!

2: I am doing all development with Gcc, so if anyone has problems with the Intel compilers, please let me know and I will check it out. I have access to Icc as well, but do not use it on a daily basis.

3: I think you will need HDF5 version 1.8.9 or newer regardless of this error/bug, but do not take that version for granted.

It now works for me, with Gcc 4.8, Linux Mint 16 and OpenFOAM 2.2.x, please let me know if anyone else encounter any issues.

#20 gigilentini8 (Member), February 3, 2014, 10:10
Thanks, Haakon, but it still does not work with OF 2.2.2 and Icc:
Code:
h5Write::fileCreate:
HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 0:
  #000: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5F.c line 1503 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5F.c line 1274 in H5F_open(): unable to open file: time = Mon Feb  3 17:05:34 2014
, name = 'h5Data/h5Data0.h5', tent_flags = 13
    major: File accessibilty
    minor: Unable to open file
  #002: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5FD.c line 987 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5FDmpio.c line 1057 in H5FD_mpio_open(): MPI_File_open failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #004: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5FDmpio.c line 1057 in H5FD_mpio_open(): MPI_ERR_OTHER: known error not in list
    major: Internal error (too specific to document in detail)
    minor: MPI Error String

All times are GMT -4.