Section: New Software and Platforms
Hadoop_g5k
Participants: Reza Akbarinia, Miguel Liroz-Gistau, Patrick Valduriez.
URL: https://www.grid5000.fr/mediawiki/index.php/Hadoop_On_Execo
Apache Hadoop provides an open-source framework for reliable, scalable, parallel computing. It can be deployed and used on large-scale platforms such as Grid'5000. However, configuring and managing it is very difficult, especially given the dynamic nature of clusters. We therefore built Hadoop_g5k (Hadoop easy deployment in clusters), a tool that makes it easier to manage Hadoop clusters and to prepare reproducible experiments. Hadoop_g5k offers a set of command-line scripts and a Python interface. It is currently used by Grid'5000 users and saves them considerable time when running MapReduce experiments.