Tuesday, July 27, 2010

Printer issues in the Lab, how to do proper setup

Since updating the lab to the new machines, we have been experiencing problems printing to the network-enabled printer. Originally, we had set up the computers to print using the HP JetDirect protocol, but that proved to be flaky. Instead, we now use the Line Printer Daemon Protocol (LPD) to print to this printer.


In order to set up LPR (the "client" for LPD) properly in Ubuntu, we first need the printer's IP address, which could be, for example, 192.168.1.127.

Next, go to System->Administration->Printing, and press the Add button.

Then, click "Network Printer" in the Devices list on the left, and choose "LPD/LPR Host or Printer".

Type the printer's IP address into the "Host:" text box, and leave the Queue blank.
(We leave the queue blank because there is only one printer hosted on this LPD server: the printer itself. If we had a print server with two or more printers connected to it, we would have to specify the queue tied to the specific printer we want to add. For instance, if the print server had an HP LaserJet and an HP DeskJet connected to it, with queue names HPLaserJet and HPDeskJet respectively, then to add the HP LaserJet we would put the print server's IP address in the "Host:" field and HPLaserJet in the "Queue:" field.)


Getting back on track, press the Forward button. Ubuntu will (hopefully) find the proper driver, offer you some print options (which for our purposes we did not need to change; press "Forward" to continue), and automatically generate a name for the printer, which you can change if you want.

Finish up by pressing Apply, and it will ask you if you would like to print a test page.



Et voilà! Your printer should now be up and running!
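For the terminal-inclined, the same queue can be added with CUPS's lpadmin tool instead of the GUI. This is a sketch, not what we actually ran; the queue name LabPrinter is our own choice, the address is the example one from above, and driver selection still has to happen on the CUPS side:

```shell
# Add an LPD queue named "LabPrinter" (name is our choice) pointing at the
# printer's address; -E enables the queue and makes it accept jobs.
sudo lpadmin -p LabPrinter -E -v lpd://192.168.1.127/

# Confirm the queue exists, then send a quick test job.
lpstat -p LabPrinter
lp -d LabPrinter /etc/hosts
```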

Tuesday, July 20, 2010

Setting up Hadoop cluster

In order to make SSH keys work, and to avoid giving superuser permissions to students who want to run a map/reduce program, we first create a user called hadoop with a home folder in /ahome (because this is safe from NFS). Then we extract the Hadoop distribution and change its owner to hadoop. To keep this user from being shared across NFS, we choose a user ID and group ID lower than 700 (to be safe; in this case, 666):

cd /ahome
sudo mkdir hadoop
sudo groupadd -g 666 hadoop
sudo useradd -g hadoop -u 666 -d /ahome/hadoop hadoop
sudo passwd hadoop
cd /usr/local
sudo tar xvf /ahome/sadmin/Desktop/hadoop-0.20.2.tar.gz
sudo chown -R hadoop:hadoop hadoop-0.20.2/
cd /ahome
sudo chown -R hadoop:hadoop hadoop/



Next, we switch to the hadoop user, generate a new password-less RSA key (the -P "" parameter), and copy the public key into the SSH authorized_keys file so that hadoop can ssh to a computer without a password:

su - hadoop
ssh-keygen -t rsa -P ""
cat /ahome/hadoop/.ssh/id_rsa.pub >> /ahome/hadoop/.ssh/authorized_keys
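One gotcha worth noting: sshd silently ignores authorized_keys if the .ssh directory or the file itself is group- or world-readable. The snippet below demonstrates the required modes on a scratch directory under /tmp (so it is harmless to run anywhere); the same two chmod lines apply to /ahome/hadoop/.ssh on a real node:

```shell
# Demonstrate the permissions sshd requires, using a scratch directory.
# On a real node, run the chmod lines against /ahome/hadoop/.ssh instead.
demo=/tmp/ssh-perms-demo
mkdir -p "$demo/.ssh"
touch "$demo/.ssh/authorized_keys"
chmod 700 "$demo/.ssh"                   # directory: owner-only access
chmod 600 "$demo/.ssh/authorized_keys"   # key file: owner read/write only
stat -c '%a %n' "$demo/.ssh" "$demo/.ssh/authorized_keys"
```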



Next, we have to change Hadoop's hadoop-env.sh configuration file to work around problems with IPv6 (by telling Java to prefer the IPv4 stack) and to specify the location of the Java virtual machine:

cd /usr/local/hadoop-0.20.2/conf
pico hadoop-env.sh
Add (or uncomment and edit) the following lines:
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export JAVA_HOME=/usr/lib/jvm/java-6-sun
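Those two lines can also be appended non-interactively, which is handy when setting up several nodes. This sketch writes to a scratch file under /tmp so nothing real is touched; on an actual node the target would be conf/hadoop-env.sh:

```shell
# Append the two settings to a scratch stand-in for hadoop-env.sh.
# On a real node, point env_file at conf/hadoop-env.sh instead.
env_file=/tmp/hadoop-env.sh
: > "$env_file"   # start from an empty scratch file for this demo
cat >> "$env_file" <<'EOF'
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export JAVA_HOME=/usr/lib/jvm/java-6-sun
EOF
grep 'IPv4Stack' "$env_file"
```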



Next, we need to specify Hadoop's temporary directory, point fs.default.name and the jobtracker at the master of the cluster, and set the DFS replication number to the number of slaves in the cluster:

pico core-site.xml
Change:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-0.20.2/hadoop-temp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>



pico mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>


pico hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>10</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>



We make the temp directory that was specified in core-site.xml:

cd /usr/local/hadoop-0.20.2
mkdir hadoop-temp



Next, we add the IP address and host name pairs to the /etc/hosts file on each machine:

In /etc/hosts:
192.168.0.1 master
192.168.0.2 slave1
192.168.0.3 slave2


etc...
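Since the addresses follow a simple pattern, the slave entries can be generated rather than typed. This sketch writes a sample list to a scratch file under /tmp (the 192.168.0.x scheme is the one above; the slave count of 3 is just illustrative); append the output to the real /etc/hosts after reviewing it:

```shell
# Generate /etc/hosts entries for one master and three slaves into a
# scratch file; review, then append to the real /etc/hosts.
hosts_file=/tmp/hosts.cluster
{
  echo "192.168.0.1 master"
  for n in 1 2 3; do
    echo "192.168.0.$((n + 1)) slave$n"
  done
} > "$hosts_file"
cat "$hosts_file"
```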


To complete the SSH portion of the setup, we ssh as the hadoop user from every computer in the cluster to every computer in the cluster (accepting each host-key prompt, so later logins are unattended):

su - hadoop
ssh master
ssh slave1
ssh slave2

etc...


Then we go back to the conf/ directory and put the name of the master into the masters file (THIS IS ONLY FOR THE MASTER):

pico masters
(then add the name of the master)


Then we edit the slaves file in the conf/ directory to include all of the slaves in the cluster (DO THIS ON EVERY MACHINE IN THE CLUSTER):

pico slaves
(then add the names of the slaves)
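Both masters and slaves are plain text files with one host name per line. For the example cluster above they would look like this (written to scratch files under /tmp here so the sketch is harmless to run; the real files live in conf/):

```shell
# Example contents of conf/masters and conf/slaves for the cluster above,
# written to scratch files purely for illustration.
cat > /tmp/masters.example <<'EOF'
master
EOF
cat > /tmp/slaves.example <<'EOF'
slave1
slave2
EOF
cat /tmp/slaves.example
```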


And finally, we format the namenode (THIS IS ONLY FOR THE MASTER):

bin/hadoop namenode -format
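The notes stop at formatting; for completeness, on this 0.20 release the daemons are then typically started from the master with the helper scripts shipped in bin/, run as the hadoop user (paths per this install):

```shell
# From /usr/local/hadoop-0.20.2 on the master, as the hadoop user:
bin/start-dfs.sh              # starts the namenode, plus datanodes on the slaves
bin/start-mapred.sh           # starts the jobtracker, plus tasktrackers on the slaves
bin/hadoop dfsadmin -report   # sanity check: lists the live datanodes
```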

Thursday, June 10, 2010

Using Hadoop on a single node

After you download the hadoop software from Apache's website (http://hadoop.apache.org/common/releases.html), extract it to /usr/local.

Since this folder already contains the bin folder with the Hadoop binaries, there is little configuration to be done. The one thing that does need to be changed, however, is hadoop-env.sh.

Open conf/hadoop-env.sh in Hadoop's directory (/usr/local/hadoop-*) in an editor (make sure to sudo it), uncomment the line with export JAVA_HOME=, and append the Java directory to the end of that line (in our case, /usr/lib/jvm/java-6-sun).

Finally, make an input folder in the Hadoop directory, put the input files in that folder, and run

sudo bin/hadoop jar hadoop-*-examples.jar grep input output 'WORDTOBEFOUND'

This command is specific to the word search example that comes with Hadoop.
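Put together, a full standalone run looks roughly like this (the dfs[a-z.]+ pattern is the one used in the Apache quickstart; any regex and any text input files will do):

```shell
cd /usr/local/hadoop-0.20.2
mkdir input
cp conf/*.xml input        # any text files work as sample input
sudo bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
cat output/*               # the matches land in output/part-*
```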

Tuesday, June 8, 2010

Google Chrome Profile Problems

If Chrome gives you an error about not being able to load your profile, enter this command in the terminal (note that this deletes all of your Chrome settings):

rm -r ~/.config/google-chrome
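Since that command throws away all of your Chrome settings, a more cautious variant is to move the profile aside instead of deleting it; Chrome recreates the directory on its next launch, and you can delete the old copy once everything works:

```shell
# Move the profile aside rather than deleting it; Chrome will create a
# fresh ~/.config/google-chrome the next time it starts.
mv ~/.config/google-chrome ~/.config/google-chrome.old
```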

Thursday, May 20, 2010

GNOME/Firefox errors

I recently encountered two problems, one of which is a GNOME problem and the other a Firefox problem.

The GNOME problem occurs when the user logs on and an error pops up reporting a problem with the GNOME TrashApplet. To fix this problem, start Synaptic from the terminal

synaptic

and then search for trash. Find the GNOME package in the search results, mark it, and then generate a download script for the package you marked. Run the script, which will wget the package, and then open the downloaded package to reinstall it. This, I believe, fixed the problem.


The Firefox problem was that Firefox simply would not load, stating that there was already a Firefox session in progress. This could be due to a corrupted upgrade (caused, for example, by an abrupt stop in the Firefox upgrade process). To fix this problem, type into the terminal

mv ~/.mozilla/firefox ~/.mozilla/firefox.old

This moves the broken profile to ~/.mozilla/firefox.old. At this point, all you need to do is start Firefox (it will create a fresh profile) and it should work like brand-spankin' new.
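If the profile itself is fine and only the "session in progress" message is the problem, a gentler first step is to remove the stale lock files a crashed Firefox can leave behind (these are the lock file names Firefox uses on Linux):

```shell
# Remove leftover profile locks; harmless if Firefox exited cleanly.
rm -f ~/.mozilla/firefox/*/.parentlock
rm -f ~/.mozilla/firefox/*/lock
```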

Wednesday, May 19, 2010

Installation of Eclipse

Some information about Eclipse:

Dave wants to use Eclipse Galileo, which can be downloaded from the Eclipse website.
Once you download the file, extract it with the Unix command

tar xzf [DIRECTORY]/eclipse-cpp-galileo-SR2-linux-gtk-x86_64.tar.gz

where [DIRECTORY] should be replaced with the directory that the download was saved in. At this point, I moved the extracted directory (should be, by default, named "eclipse") to /usr/local by using the command

mv eclipse /usr/local/eclipse3.5

in the directory where the extracted directory is stored. Notice that this command also renames the moved folder to "eclipse3.5".

Finally, to allow eclipse to be run at the command line, I went to the /usr/local/bin directory, and performed

ln -s /usr/local/eclipse3.5/eclipse eclipse

which creates a symbolic link to the program in the bin directory.


If you run into permission problems, put a sudo in front of the command you are typing.

In order to install the Java and Pydev plugins, I just added them under Help->Install New Software, choosing "All Available Sites" under "Work with:". They can be found by scrolling down the list generated after changing "Work with:".


To distribute the Eclipse configuration across all of the machines, I used rsync to copy the eclipse folder to the same place on each machine, and then created a symbolic link to the program in /usr/local/bin so that eclipse can be run from the command line.
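The distribution step can be sketched as a loop like the one below. The host names are purely illustrative, and this assumes passwordless SSH to each machine and that the configured copy lives on the machine you run it from:

```shell
# Push the configured Eclipse install to each lab machine and recreate
# the /usr/local/bin symlink there. Host names are examples only.
for host in lab01 lab02 lab03; do
  rsync -a /usr/local/eclipse3.5/ "$host:/usr/local/eclipse3.5/"
  ssh "$host" "ln -sf /usr/local/eclipse3.5/eclipse /usr/local/bin/eclipse"
done
```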