Sunday, July 10, 2011

Getting Started With Mahout

This is a quick tutorial on building and running Mahout from the Subversion repository. The first step is installing Hadoop; there is a great tutorial at the following link.

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

I am currently using Hadoop 0.20.2 because Mahout does not support 0.21.0 yet. To get started, install maven2 and subversion from the Ubuntu repository using the following:

sudo aptitude install maven2 
sudo aptitude install subversion
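
To confirm that both tools ended up on your PATH before continuing, a small check like the following may help. This is only a sketch: have_tool is a hypothetical helper name of my own, not something shipped with Maven or Subversion.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the two tools this tutorial relies on.
for tool in mvn svn; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If either tool is reported as missing, rerun the matching install command above.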

Now you can go ahead and check out the latest Mahout from Subversion. At the time of writing, I am using Mahout 0.5-SNAPSHOT.

svn co http://svn.apache.org/repos/asf/mahout/trunk

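Next, cd into the checked-out directory and install it using Maven. Subversion names the directory trunk by default; the paths later in this post assume it is called mahout. The install downloads all the necessary dependencies and builds the latest jar.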

mvn install

Once you have the core files installed, make sure your environment variables are set correctly. For me, this is:


export JAVA_HOME=/usr/lib/jvm/java-6-sun/
export HADOOP_CONF_DIR=/home/hadoop/hadoop-0.20.2/conf
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2
export MAHOUT_CONF_DIR=/home/hadoop/mahout/src/conf
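
A quick way to catch a mistyped path is to check that each variable is set and names a directory that exists. The following is a minimal sketch, assuming a POSIX shell; check_dir_var is a hypothetical helper, not part of Hadoop or Mahout.

```shell
#!/bin/sh
# Hypothetical helper: report whether the variable whose NAME is passed in
# is set and points at an existing directory.
check_dir_var() {
    name=$1
    # Indirectly read the value of the variable called $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name ok: $value"
}

# Check the four variables exported above.
for var in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR MAHOUT_CONF_DIR; do
    check_dir_var "$var" || true
done
```

Any line that does not end in "ok" points at a variable worth fixing before running Mahout.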

My home directory now contains:

/home/hadoop/mahout
/home/hadoop/hadoop-0.20.2

At this point Mahout should be ready to run. Switch into the directory

/home/hadoop/mahout/bin/

and execute the mahout shell script. If all goes well, you should see all of the available functions.
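
That last step can be sketched as a small wrapper that checks the script is present and executable before running it. The names run_mahout and MAHOUT_BIN are hypothetical, introduced here only for illustration; the default path assumes the directory layout shown above.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given script only if it exists and is executable.
run_mahout() {
    script=$1
    if [ ! -x "$script" ]; then
        echo "not found or not executable: $script" >&2
        return 1
    fi
    "$script"
}

# MAHOUT_BIN is an assumed override; the default matches the layout above.
run_mahout "${MAHOUT_BIN:-/home/hadoop/mahout/bin/mahout}" || true
```

If your checkout lives elsewhere (for example in a directory named trunk), set MAHOUT_BIN to the matching path.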


2 comments:

  1. Hi Vincent, thanks for the tutorial.
    I am getting the following error when running mahout. My .bashrc settings are included after the error output.

    hadoop@master:~/trunk/bin$ ./mahout
    MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
    hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
    Error occurred during initialization of VM
    Could not reserve enough space for object heap
    Could not create the Java virtual machine.

    export HADOOP_HOME=/home/hadoop/hadoop/bin/
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/
    export HADOOP_CONF_DIR=/home/hadoop/hadoop/conf/
    export MAHOUT_CONF_DIR=/home/hadoop/trunk/src/conf/
    export MAHOUT_LOCAL=/home/hadoop/trunk/bin/

  2. I have the same problem. Did you solve it?
