Wednesday, July 27, 2011

Removing SVN files

Removing .svn files

find . -type d -name ".svn" -prune -exec rm -rf {} +
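
Because rm -rf is unforgiving, it can help to preview what is about to be deleted. The following is only a minimal sketch: the function names list_svn_dirs and remove_svn_dirs are made up for illustration, and both default to the current directory when no argument is given.

```shell
#!/bin/sh
# Hypothetical helpers (the names are illustrative, not from any tool).

# Print every .svn directory below the given root (default: current directory).
# -prune keeps find from descending into the .svn directories themselves.
list_svn_dirs() {
    find "${1:-.}" -type d -name ".svn" -prune -print
}

# Delete every .svn directory below the given root.
# -prune avoids "No such file or directory" noise from find trying to
# walk into a directory that has just been removed.
remove_svn_dirs() {
    find "${1:-.}" -type d -name ".svn" -prune -exec rm -rf {} +
}
```

Run list_svn_dirs first, look over the output, and only then call remove_svn_dirs on the same path.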


.... Git is so much easier ^.^

Sunday, July 10, 2011

Getting Started With Mahout

This is a quick tutorial on getting started with Mahout from the Subversion repository. The first step is installing Hadoop. There is a great tutorial at the following link:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

I am currently using Hadoop 0.20.2 because Mahout does not support 0.21.0 yet. To get started, install maven2 and subversion from the Ubuntu repository using the following:

sudo aptitude install maven2 
sudo aptitude install subversion

Now you can go ahead and download the latest Mahout from Subversion. At the time of writing, I am using Mahout 0.5-SNAPSHOT.

svn co http://svn.apache.org/repos/asf/mahout/trunk mahout

Next, cd into the checked-out directory and install it using Maven. This downloads all the necessary dependencies and builds the latest jar.

mvn install

Once you have the core files installed, make sure your environment variables are set correctly. For me, this is:


export JAVA_HOME=/usr/lib/jvm/java-6-sun/
export HADOOP_CONF_DIR=/home/hadoop/hadoop-0.20.2/conf
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2
export MAHOUT_CONF_DIR=/home/hadoop/mahout/src/conf
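
A typo in one of these paths tends to show up later as a confusing error, so a quick sanity check can save time. This is a rough sketch only: check_dir_vars is a made-up helper name, and it simply tests that each named variable is set and points to an existing directory.

```shell
#!/bin/sh
# Hypothetical helper: for each variable name given, report whether it is
# set and whether its value is an existing directory.
# Returns 0 only if every variable passes.
check_dir_vars() {
    status=0
    for name in "$@"; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        else
            echo "$name ok: $value"
        fi
    done
    return $status
}
```

For the setup in this post the call would be: check_dir_vars JAVA_HOME HADOOP_CONF_DIR HADOOP_HOME MAHOUT_CONF_DIR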

My home directory now looks like this:

/home/hadoop/mahout
/home/hadoop/hadoop-0.20.2

At this point Mahout should be ready to run. Switch into the directory

/home/hadoop/mahout/bin/

and execute the mahout shell script (./mahout). If all goes well, you should see a list of all the available functions.
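
That last step can be wrapped in a small function, sketched here. run_mahout is a made-up name, and the default path is just the checkout location used in this post, so adjust it for your own layout.

```shell
#!/bin/sh
# Hypothetical helper: run the mahout launcher from a checkout directory.
# The default path below is the one used in this post (an assumption
# about your layout, not something Mahout requires).
run_mahout() {
    launcher="${1:-/home/hadoop/mahout}/bin/mahout"
    if [ ! -x "$launcher" ]; then
        echo "No executable launcher at $launcher" >&2
        return 1
    fi
    # Run with no arguments, as in the step above.
    "$launcher"
}
```

Calling run_mahout with no argument uses the default path, while run_mahout /some/other/checkout points it elsewhere.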