Thursday, 14 July 2016


DataMPI is an efficient, flexible, and productive communication library that provides a set of key-value-pair-based communication interfaces extending MPI for Big Data. By using the efficient communication technologies of the High-Performance Computing (HPC) area, DataMPI can speed up emerging data-intensive computing applications. DataMPI takes a step toward bridging the two fields of HPC and Big Data.
We are working on a draft of our proposed specification to extend MPI for Big Data. We call the specification MPI-D; it is still under revision and will be published later. DataMPI is a Java binding implementation of MPI-D.
DataMPI is designed to support multiple modes for various Big Data computing applications, including Common, MapReduce, Streaming, and Iteration. The current version implements the functionality and features of the Common mode, which supports single program, multiple data (SPMD) applications. The remaining modes will be released in the future.
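To make the idea of key-value pair based communication in an SPMD program more concrete, here is a minimal single-process sketch in plain Java. It does not use the DataMPI API, which is not shown on this page; the class name `KeyValueRoutingSketch`, the methods `partition` and `exchange`, and the hash-based routing rule are all assumptions chosen for illustration. Every simulated rank runs the same logic on its own input and sends each key-value pair to the rank selected by the key's hash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: simulates ranks inside one JVM and
// does NOT call DataMPI, mpiJava, or any MPI implementation.
public class KeyValueRoutingSketch {

    // Assumed routing rule: a key goes to rank (hash(key) mod numRanks).
    // floorMod keeps the result non-negative for negative hash codes.
    static int partition(String key, int numRanks) {
        return Math.floorMod(key.hashCode(), numRanks);
    }

    // Each inner list is the local input of one simulated rank.
    // Every rank emits (word, 1) pairs; the pair is delivered to the
    // destination rank, which sums the values it receives per key.
    static List<Map<String, Integer>> exchange(List<List<String>> wordsPerRank) {
        int numRanks = wordsPerRank.size();
        List<Map<String, Integer>> received = new ArrayList<>();
        for (int rank = 0; rank < numRanks; rank++) {
            received.add(new TreeMap<>());
        }
        for (List<String> localWords : wordsPerRank) {
            for (String word : localWords) {
                int dest = partition(word, numRanks);
                received.get(dest).merge(word, 1, Integer::sum);
            }
        }
        return received;
    }

    public static void main(String[] args) {
        List<List<String>> input = Arrays.asList(
                Arrays.asList("hpc", "big", "data"),
                Arrays.asList("data", "mpi"),
                Arrays.asList("big", "data", "hpc"));
        List<Map<String, Integer>> result = exchange(input);
        for (int rank = 0; rank < result.size(); rank++) {
            System.out.println("rank " + rank + " received " + result.get(rank));
        }
    }
}
```

Because the destination depends only on the key, all pairs with the same key end up on the same rank, whichever rank emitted them. In a real deployment the exchange step would be carried out by the communication library over the network rather than by a shared in-memory list.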
The current implementation of DataMPI extends mpiJava. We also integrate some features from Hadoop, under the Apache License 2.0. The current evaluations of DataMPI use MVAPICH2 as the backend. DataMPI also supports other MPI implementations, such as MPICH2.
Please read the Quick Start to get started with DataMPI. For detailed usage instructions, please refer to the User Guide.

If you would like to get the source code, please contact us by email ( at

DataMPI 0.6.0

  • DataMPI Package [Tarball (x86_64) | Changelog].
  • DataMPI Quick Start [PDF | HTML]: A short document with the information users need to build and configure DataMPI and run the example programs.
  • DataMPI User Guide [PDF | HTML]: A detailed user guide with instructions to build and configure DataMPI and execute data computing programs.