August 22, 2013, 11:32 |
HDF5 IO library for OpenFOAM
|
#1 |
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
I have for some time been developing an HDF5 IO library for OpenFOAM. This library can write the results from a simulation into an HDF5 archive instead of the usual (horrible) file-based structure OpenFOAM uses by default. The major benefits show up when you increase the number of processes (say into the range 1000-10000) and want to write more than a few timesteps to disk. It is also highly useful if you are using the Lagrangian particle functionality of OpenFOAM, as this produces ~50 files per timestep per process. A nice addition is that the savings in terms of disk space are significant, although this depends on what IO format you compare against (ASCII, binary, with or without compression).
When the simulation is finished, the HDF5 archive can be parsed and an XDMF metadata file written. This XDMF file can be opened in, for example, ParaView, VisIt or EnSight, and the visualization is performed as for any other OpenFOAM case. Another benefit is the ability to easily load the data into a tool like Matlab or Python to perform calculations or processing of the results. Personally I have used this to process data from fluid-particle simulations. The code is found in a GitHub repository, https://github.com/hakostra/IOH5Write, together with some installation instructions and hints. I hope that this code can be useful for the OpenFOAM community, especially those of you who have access to an HPC system. In case any of you have suggestions for improvements, please feel free to use this thread to discuss them. |
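As a quick illustration of that last point, here is a minimal sketch of reading a field out of an HDF5 archive with h5py. The group and dataset names below are invented for the example; the actual layout written by IOH5Write is documented in the repository.

```python
import numpy as np
import h5py

# Create a tiny stand-in archive; the group/dataset names here are
# invented for this example and do NOT match IOH5Write's real layout.
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("fields/0/U", data=np.random.rand(100, 3))

# Reading the field back into NumPy is a one-liner -- no OpenFOAM
# dictionary parsing required.
with h5py.File("demo.h5", "r") as f:
    U = f["fields/0/U"][:]

print(U.shape)         # (100, 3)
print(U.mean(axis=0))  # componentwise averages, ready for post-processing
```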
|
August 25, 2013, 07:29 |
|
#2 |
Senior Member
Niels Gjoel Jacobsen
Join Date: Mar 2009
Location: Copenhagen, Denmark
Posts: 1,903
Rep Power: 37 |
Hi Håkan,
That is interesting, and I have myself been thinking of how to change the write format in OF. My motivation was long simulations, where I needed to output 2400 time folders for a lot of post-processing. The simulation was decomposed on 6 processors and each time folder contained 35-45 individual files, thus 0.5M-0.65M files in total. Essentially, this should not have been a problem, because most of the files are based on the faMesh, so they are pretty small (say 1 KB), but if you are downloading these from a cluster with an overloaded/slow infrastructure, it takes ages with a lot of small files. I see that you are doing it through the functionObjects, so essentially OF keeps doing its own outputting. My question is therefore whether you have considered making your code an additional option in the controlDict at the same level as ascii, binary, compressed, uncompressed? It would be somewhat more intrusive in the core of OF, but on the other hand the outputting would not be a dual process. Kind regards, Niels
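For context, the function-object route means the writer is enabled purely from the case's controlDict, with no solver changes. A sketch of such an entry follows; the keyword names here follow the general OpenFOAM functionObject pattern, and the repository's example cases should be treated as the authoritative syntax.

```cpp
functions
{
    h5Data
    {
        // Load the compiled writer at run time; no OpenFOAM core changes.
        functionObjectLibs  ("libIOH5Write.so");
        type                h5Write;

        // Fields and Lagrangian clouds to archive (example names).
        objectNames         ( U p );
        cloudNames          ( kinematicCloud );

        // Archive every 5th timestep, independently of writeControl.
        writeInterval       5;
    }
}
```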
__________________
Please note that I do not use the Friend-feature, so do not be offended, if I do not accept a request. |
|
August 26, 2013, 04:19 |
|
#3 |
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
Thank you for your interest in my work. If you ever compile it and try it out, I would appreciate your feedback and suggestions for improvement.
Your case is an example of why I developed this code. Even though I have never been in the situation where I need 2400 timesteps written to disk, I often want to decompose the case massively, running it on several hundreds or thousands of processes. As in your case, my simulations produce approx. 40-50 files per process per timestep, hence the total number of files would become 45*1000 = 45 000 per timestep for 1000 processors. This is very problematic, especially on a parallel file system designed to handle a few, very large files. As far as I know, there are no HPC file systems on the market that are designed to cope with this number of files of that size in an efficient manner. Regarding the implementation, I think the current way is a good one, as it allows for (relatively) easy transitions between OpenFOAM versions. I do not need a single modification to the OpenFOAM core, and hopefully the amount of work needed when a new OF version is released is limited. For example, when going from 2.1.1 to 2.2.0 (or from 2.1.x to 2.2.x if you prefer that), I only needed to change one single line of code (if my memory is not playing tricks on me). Another factor for doing it the current way is that there is no restart functionality in the HDF5 plugin, i.e. currently you cannot take a field from the HDF5 file and use it as an initial condition for the restart of a simulation. Therefore, I always specify a few writes in the "native" way (perhaps once every approx. 6-24 hours of walltime); that way I can always restart a simulation in case of a crash. |
|
August 26, 2013, 04:40 |
|
#4 | |
Senior Member
Anton Kidess
Join Date: May 2009
Location: Germany
Posts: 1,377
Rep Power: 30 |
In any case, thanks for sharing your code! Even "just" for postprocessing, it's quite a nice thing to have
__________________
*On twitter @akidTwit *Spend as much time formulating your questions as you expect people to spend on their answer. |
|
August 26, 2013, 08:22 |
|
#5 | ||
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
And in case there is a need for f.ex. restarting of simulations, I think it would be fairly easy to create a "HDF5ToFoam" converter, based on many of the "xxxxToFoam" converters already available. |
|
November 18, 2013, 05:31 |
|
#6 |
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
A short update: I have now made a simple Python program that uses h5py to read the metadata from the HDF5 files and write the corresponding XDMF files. Both field data and Lagrangian clouds are supported, and all attributes present will be included in the file. One XDMF file is generated for the field (mesh) data, and one for each cloud.
The program is called 'writeXDMF.py', and a help message is displayed if you run it with the '--help' argument. It requires Python 3. The program/script is installed to $FOAM_USER_APPBIN when you run the ./Allwmake script. 'writeXDMF.py' makes the attached Matlab files obsolete; however, I have not yet removed them from the repository in case someone wants to use them as a basis for further work in Matlab. My next area of focus will be to clean up some really, really bad code in the writer module itself... |
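To give the flavour of what such a script does, here is a heavily simplified sketch: it walks an HDF5 file with h5py and emits XDMF attribute entries that point back into the archive. All names are invented for the example, and a real XDMF file also needs topology and geometry sections, which 'writeXDMF.py' handles.

```python
import numpy as np
import h5py

# A toy archive standing in for the real thing (layout invented).
with h5py.File("case.h5", "w") as f:
    f.create_dataset("fields/0/p", data=np.zeros(8))
    f.create_dataset("fields/0/T", data=np.ones(8))

# XDMF is just XML metadata: each <DataItem> points at a dataset inside
# the HDF5 file, so the heavy data is never duplicated.
entries = []
with h5py.File("case.h5", "r") as f:
    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            entries.append(
                '<Attribute Name="{}" Center="Cell">\n'
                '  <DataItem Format="HDF" Dimensions="{}">case.h5:/{}</DataItem>\n'
                '</Attribute>'.format(name.split("/")[-1], obj.shape[0], name))
    f.visititems(visit)

xdmf = '<Xdmf Version="2.0">\n' + "\n".join(entries) + "\n</Xdmf>"
print(xdmf)
```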
|
November 23, 2013, 09:04 |
|
#7 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,982
Blog Entries: 45
Rep Power: 128 |
Greetings to all!
@Håkon: I picked up on this thread when you made the recent post above. This is a very nice function object and I've taken the liberty of adding a quick reference page for it at openfoamwiki.net: http://openfoamwiki.net/index.php/Contrib/IOH5Write - Feel free to update the wiki page! It's accessible from here: http://openfoamwiki.net/index.php/Ma...nction_objects And I have also been wondering how to add an optional input/output file format for field files in OpenFOAM, but I was thinking more along the lines of having an in-place replacement for OpenFOAM's IOstream related classes, in fact by using SQLite. But HDF5 makes a lot more sense! Although using HDF5 would require considerably more hacking than a mere replacement for IOstream... mmm... then again, maybe it wouldn't be all that hard. Best regards, Bruno
__________________
|
|
December 1, 2013, 16:41 |
|
#8 |
Member
Ganesh Vijayakumar
Join Date: Jan 2010
Posts: 44
Rep Power: 16 |
Dude,
This is super awesome! I will try this out. I have a couple of questions though... feel free to ignore them. What you've done is more than enough!! 1. Have you benchmarked/recorded the speed-up in write time, esp. for large parallel cases? I'm running cases with 1760 and 4600 procs now... will be super happy if this speeds things up! 2. Does it make loading of large datasets any faster in ParaView? I have a case running with 60 million cells and another with 150 million. I'd be blessed if this works out to be faster! Thanks again. Big fan! |
|
December 2, 2013, 03:56 |
|
#9 |
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
1: I have done some benchmarking, yes. My conclusion is:
If you end up testing it, I would really appreciate some feedback! But please remember that there are some limitations... I mainly developed this as a way of storing large amounts of particle data (order of magnitude 200 GB) and have not cared too much about flow fields. |
|
December 2, 2013, 12:13 |
|
#10 |
Member
Ganesh Vijayakumar
Join Date: Jan 2010
Posts: 44
Rep Power: 16 |
Thanks. I think this is awesome and the way to go for future large parallel datasets.
As far as your comparison to uncompressed ASCII goes, I think it would be better to compare against the binary output in OpenFOAM. I think switching from uncompressed ASCII to compressed ASCII to binary itself yields savings like the ones you mention. However, I think the IO would be greatly improved simply because of writing to one file using optimized HDF5 rather than multiple thousands of files... not to mention the ease of handling the files if you're transferring them to a different visualization cluster. You mention in your README file that you haven't implemented writing out the boundary mesh and data simply because you are lazy. Could you tell me how to do that? I wouldn't mind implementing it. BTW, I got your code to run on OpenFOAM-2.1.x and Python 2.6. It required some changes. I think I'll fork your repo and upload it there. |
|
December 2, 2013, 13:54 |
|
#11 |
Member
Ganesh Vijayakumar
Join Date: Jan 2010
Posts: 44
Rep Power: 16 |
Never mind explaining the boundary data part. I just realized that as far as XDMF is concerned, there is no difference between a volume element and a face element; it treats both as cells. You've already written the point data out, so I just need to add the boundary topology at the end with quad/tri elements and the corresponding data, with almost no modifications to the XDMF file. I think that will work.
However I have a mesh that's rotating with no topology change but the geometry points are changing. So this currently requires a lot more work. I'll get to it some day! |
|
December 2, 2013, 16:51 |
|
#12 | ||
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
Anyway, see https://github.com/hakostra/IOH5Writ...h5Write.C#L174 BTW: I have actually NEVER tried to use this on a dynamic mesh... And as you correctly state in your previous post, the point of this code is not to save space on the disk; it is to make one's life easier when working on large clusters and postprocessing these large datasets. As an example, I am working on a simulation with 50 million Lagrangian particles at the moment, and opening the HDF5 dataset in Python, calculating statistics, and making plots and distributions based on these particle data is EASY. Parsing the OpenFOAM file format to do the same would have required a lot of coding just to read in particle locations and velocities. |
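As an illustration of how little code that kind of post-processing takes, here is a sketch using a synthetic particle cloud; the group names are invented for the example, and a real IOH5Write archive stores clouds under its own layout.

```python
import numpy as np
import h5py

rng = np.random.default_rng(0)

# Synthetic cloud standing in for a Lagrangian dataset (names invented).
with h5py.File("cloud.h5", "w") as f:
    f.create_dataset("cloud/0/position", data=rng.random((1000, 3)))
    f.create_dataset("cloud/0/U", data=rng.normal(size=(1000, 3)))

# Statistics over the whole cloud reduce to a few vectorized NumPy calls,
# even when the cloud holds tens of millions of particles.
with h5py.File("cloud.h5", "r") as f:
    U = f["cloud/0/U"][:]

speed = np.linalg.norm(U, axis=1)
print("mean particle speed: %.3f" % speed.mean())

# A speed distribution for plotting is equally direct:
counts, edges = np.histogram(speed, bins=20)
```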
|
January 23, 2014, 07:50 |
|
#13 |
Member
Join Date: Aug 2012
Posts: 33
Rep Power: 14 |
Congrats!! Very interesting project!! I hope it can improve the parallel visualization of big simulations.
However, I am not able even to run the tutorial; please find attached the log file of the compilation. I am using Ubuntu 12.04 + OF 2.2.2 + system HDF5 and system OMPI. This is the error I am getting while running: Code:
HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) MPI-process 3:
  #000: ../../../src/H5D.c line 141 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../../../src/H5Gloc.c line 241 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value |
|
January 23, 2014, 07:55 |
|
#14 |
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
I strongly suspect that your HDF5 version is too old. I know that I am using some features of the HDF5 library that were introduced recently, but I do not know exactly where the version cut-off is wrt. compatibility. Perhaps you can try version 1.8.9 or newer?
|
|
January 23, 2014, 11:47 |
|
#15 |
Member
Join Date: Aug 2012
Posts: 33
Rep Power: 14 |
I tried to install the new version from source, but it still gives the same error.
Did you check the warning that I get during the compilation? It could be related to that. |
|
January 24, 2014, 09:55 |
|
#16 |
Member
Join Date: Aug 2012
Posts: 33
Rep Power: 14 |
Thanks Haakon,
I solved it by switching to OF 2.2.x and using Gcc instead of Intel. Looking forward to testing it in big test cases. |
|
January 27, 2014, 04:19 |
|
#17 |
Member
Join Date: Aug 2012
Posts: 33
Rep Power: 14 |
After changing computer, I get the same error:
Code:
HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 1:
  #000: hdf5-1.8.12/src/H5D.c line 141 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: hdf5-1.8.12/src/H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
It only worked on my laptop with gcc 4.6; icc and gcc 4.7 gave me this error. Any hint? |
|
January 28, 2014, 18:00 |
|
#18 |
New Member
David Huckaby
Join Date: Jul 2009
Posts: 21
Rep Power: 17 |
I think you can fix this error by commenting out line 222 in h5WriteCloud.C, which currently reads
"H5Sclose(fileSpace);". Thanks, Haakon, for developing and releasing this tool. |
|
February 3, 2014, 07:44 |
|
#19 |
Senior Member
Join Date: Dec 2011
Posts: 111
Rep Power: 19 |
I am sorry for my late reply on the issues that have come up here. I want to comment on a few things:
1: Line 222 of h5WriteCloud.C is now removed. Thanks for the bug report! 2: I am doing all development with Gcc, so if anyone has problems with the Intel compilers, please let me know and I will check it out. I have access to Icc as well, but do not use it on a daily basis. 3: I think you will need HDF5 version 1.8.9 or above independently of this error/bug, but do not take that version for granted. It now works for me with Gcc 4.8, Linux Mint 16 and OpenFOAM 2.2.x; please let me know if anyone else encounters any issues. |
|
February 3, 2014, 10:10 |
|
#20 | |
Member
Join Date: Aug 2012
Posts: 33
Rep Power: 14 |
Thanks Haakon, but it is still not working with OF 2.2.2 and Icc
Code:
h5Write::fileCreate: HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 0:
  #000: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5F.c line 1503 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5F.c line 1274 in H5F_open(): unable to open file: time = Mon Feb 3 17:05:34 2014 , name = 'h5Data/h5Data0.h5', tent_flags = 13
    major: File accessibilty
    minor: Unable to open file
  #002: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5FD.c line 987 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5FDmpio.c line 1057 in H5FD_mpio_open(): MPI_File_open failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #004: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5FDmpio.c line 1057 in H5FD_mpio_open(): MPI_ERR_OTHER: known error not in list
    major: Internal error (too specific to document in detail)
    minor: MPI Error String |
|