Parallel Computing: HDF5 and MPIO for Fedora Core 6

As a professor once told me, “70% of a typical modeling problem is about pre-processing of data”. One of the biggest problems in data pre-processing is simply storing, representing and loading the data. Fortunately, the folks at the NCSA took the time to create tools we can actually use to solve these problems.

What is HDF5?

  • HDF5 is a general purpose library and file format for storing scientific data. It comes with a lot of commonly used functions and utilities in library form for easy integration into existing programs.
  • Efficient storage and I/O. This is much better than plain text files and easier than building your own binary format.
  • Software. You can build, compile and use it in a number of ways.
  • Emphasis on standards. Uses a standards-compliant storage format to allow ease of use amongst different development teams and organizations.
  • Large and varied user community. If you have a problem then there are lots of people to ask help from.

The problem is that the OpenMPI and HDF5 packages in FC6 aren’t quite what I need. The OpenMPI packages don’t have the necessary development libraries to work with Eclipse and PTP. I blogged about this previously. Here are the steps to get HDF5 working in Fedora Core 6, along with an example to work with.

  1. Install OpenMPI as instructed in the previous post, or download my pre-packaged RPM for FC6 here.
  2. Install HDF5 for use in a parallel environment. It seems that the HDF5 binary RPM that comes with Fedora Extras is not compiled with parallel MPIO support, so we have to build one ourselves. I made a few modifications to the SPEC file to remove Fortran and C++ support (which I don’t need) and enabled parallel MPIO support with the “--enable-parallel” configure flag. You can download a pre-built RPM for FC6 here and here.
  3. Then you can try out the following test application, which creates an HDF5 file using MPIO:
    #include <mpi.h>
    #include <hdf5.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    int main (int argc, char *argv[])
    {
            hid_t fd, plist_id, dspace, dset;
            int i, rank, size;
    
            hsize_t dims[2] = {8, 5};
            int *data;
    
            /* create sample data */
            data = (int *) malloc(sizeof(int)*dims[0]*dims[1]);
            for (i=0; i < dims[0]*dims[1]; i++)
                    data[i] = i;
    
            /* initialize mpi */
            MPI_Init (&argc, &argv);
            MPI_Comm_size (MPI_COMM_WORLD, &size);
            MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    
            /* create properties list to define parallel i/o */
            plist_id = H5Pcreate (H5P_FILE_ACCESS);
            H5Pset_fapl_mpio (plist_id, MPI_COMM_WORLD, MPI_INFO_NULL);
    
            /* create file */
            fd = H5Fcreate ("pdata.h5", H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
    
            /* close properties list */
            H5Pclose (plist_id);
    
            /* create a 2 dimensional data space */
            dspace = H5Screate_simple (2, dims, NULL);
    
            /* create data set (HDF5 1.6-style call; releases from 1.8 on */
            /* take additional property list arguments) */
            dset = H5Dcreate (fd, "Parallel", H5T_NATIVE_INT, dspace, H5P_DEFAULT);
    
            /* set property list to write data collectively */
            /* change COLLECTIVE to INDEPENDENT for independent write */
            plist_id = H5Pcreate (H5P_DATASET_XFER);
            H5Pset_dxpl_mpio (plist_id, H5FD_MPIO_COLLECTIVE);
    
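            /* every process writes the whole data set */
            /* HDF5 does not set errno, so report the failure directly */
            if (H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, plist_id, data) < 0)
                    fprintf (stderr, "parallel: H5Dwrite failed\n");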
    
            /* free data */
            free (data);
    
            /* close properties list */
            H5Pclose (plist_id);
    
            /* close data set */
            H5Dclose (dset);
    
            /* close data space */
            H5Sclose (dspace);
    
            /* close file */
            H5Fclose (fd);
    
            /* close mpi */
            MPI_Finalize ();
    
            return 0;
    } /* main */
    
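  4. You can also use the following Makefile. If you copy it from this page, make sure the command lines under each target start with a tab, not spaces.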
    CFLAGS=-Wall -Werror
    RM=rm -rf
    MCC=mpicc
    MLIBS=-lhdf5 -lmpi
    
    parallel:       parallel.o
            $(MCC) $(CFLAGS) -o parallel parallel.o $(MLIBS)
    
    clean:
            $(RM) parallel *.o *.h5
    
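  5. Put both files in the same directory, save the C source as parallel.c, and type ‘make’. Then run the program using ‘mpirun -np 2 ./parallel’. You can replace 2 with another process count; since every process writes the full data set, nothing in this example depends on the number.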
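To check the result, it helps to know which value to expect at each position. The test application fills its buffer with data[i] = i, and the HDF5 C API treats that buffer as row-major, so the element at (row, col) should equal row * ncols + col. Here is a minimal sketch of that arithmetic in plain C; the function name expected_value is my own invention and not part of HDF5.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of HDF5: the value the test application
 * stores at (row, col) of an nrows x ncols data set, given that it fills
 * data[i] = i and that the buffer is laid out in row-major order. */
int expected_value(size_t row, size_t col, size_t nrows, size_t ncols)
{
    /* Reject positions outside the data set. */
    assert(row < nrows && col < ncols);
    return (int)(row * ncols + col);
}
```

For the 8 x 5 data set, the first row should hold 0 through 4 and the last element, at (7, 4), should be 39. The h5dump utility that ships with HDF5 prints the contents of pdata.h5, so you can compare them against these values.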

The test application above does not do much. It demonstrates creating a file in parallel and letting the different processes write into it. Notice that all processes have access to the same file, and that every process writes the same full data set. The library coordinates this access so that we avoid all sorts of parallel I/O messiness. My next post will be a bit more useful and will use the concept of a hyperslab to do truly parallel I/O on a single HDF5 dataset.
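As a small preview of the idea, a hyperslab lets each process select its own rectangular piece of the dataset. One common way to choose the pieces is to split the rows as evenly as possible among the processes. The sketch below shows only that arithmetic, assuming a row-wise split; the row_block type and the rows_for_rank function are hypothetical names of mine, and the HDF5 calls themselves (such as H5Sselect_hyperslab) are left out.

```c
#include <assert.h>

/* Hypothetical type, not part of HDF5 or MPI: a contiguous block of rows. */
typedef struct {
    unsigned long long start;  /* first row owned by the rank */
    unsigned long long count;  /* number of rows owned by the rank */
} row_block;

/* Hypothetical helper: split nrows rows across nprocs ranks as evenly as
 * possible. The first (nrows % nprocs) ranks each get one extra row. */
row_block rows_for_rank(unsigned long long nrows, int nprocs, int rank)
{
    row_block b;
    unsigned long long p, r, base, extra;

    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    p = (unsigned long long)nprocs;
    r = (unsigned long long)rank;
    base = nrows / p;   /* rows every rank gets */
    extra = nrows % p;  /* leftover rows, handed to the lowest ranks */

    b.count = base + (r < extra ? 1 : 0);
    b.start = r * base + (r < extra ? r : extra);
    return b;
}
```

With the 8 rows from the example and 2 processes, rank 0 gets rows 0 to 3 and rank 1 gets rows 4 to 7. With 3 processes the blocks are 3, 3 and 2 rows long, starting at rows 0, 3 and 6.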
