Tuesday, July 27, 2010

Printer issues in the Lab, how to do proper setup

After updating the lab to the new machines, we began experiencing problems printing to the network-enabled printer. Originally, we had set up the computers to print using the HP JetDirect protocol, but that proved flaky. Instead, we switched to the Line Printer Daemon protocol (LPD) to print to this printer.


To set up LPR (the "client" for LPD) properly in Ubuntu, we first need the printer's IP address, which could be, for example, 192.168.1.127.

Next, go to System->Administration->Printing and press the Add button.

Then, click "Network Printer" in the Devices list on the left, and choose "LPD/LPR Host or Printer".

Type the printer's IP address into the "Host:" text box and leave the "Queue:" field blank.
(There is no queue name because only one printer is hosted on this LPD server: the printer itself. If we had a print server with two or more printers attached, we would have to specify the queue tied to the specific printer we want to add. For instance, if the print server had an HP LaserJet and an HP DeskJet attached, with queue names HPLaserJet and HPDeskJet respectively, then to add the HP LaserJet we would put the print server's IP address in the "Host:" field and HPLaserJet in the "Queue:" field.)
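For the command-line inclined, the same host/queue logic maps directly onto the lpd:// device URI that CUPS uses. A small sketch, using the hypothetical IP and queue name from the examples above:

```shell
# Build the lpd:// device URI CUPS expects for an LPD printer.
# The IP and queue name are the example values from the text above.
PRINTER_IP="192.168.1.127"

# Case 1: the printer itself is the LPD host, so no queue name is needed.
URI_DIRECT="lpd://${PRINTER_IP}/"

# Case 2: a print server hosts several printers, so a queue selects one.
URI_QUEUED="lpd://${PRINTER_IP}/HPLaserJet"

echo "$URI_DIRECT"
echo "$URI_QUEUED"
```

With such a URI in hand, the printer could also be added without the GUI via CUPS's administration tool, e.g. sudo lpadmin -p LabPrinter -E -v "$URI_DIRECT" (the printer name LabPrinter is, of course, up to you).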


Getting back on track, press the Forward button. Ubuntu will (hopefully) find the proper driver, offer some print options (which for our purposes we did not need to change; press "Forward" to continue), and automatically generate a name for the printer, which you can change if you want.

Finish up by pressing Apply, and you will be asked whether you would like to print a test page.



Et voilà! Your printer should now be up and running!

Tuesday, July 20, 2010

Setting up Hadoop cluster

To make SSH keys work, and to avoid giving superuser permissions to students who want to run a MapReduce program, we first create a user called hadoop with a home folder in /ahome (because this location is not shared over NFS). Then we extract the Hadoop distribution and change its owner to hadoop. To keep this user from being shared across NFS, we choose a user ID and group ID lower than (to be safe) 700, in this case 666:

sudo mkdir hadoop (in /ahome)
sudo groupadd -g 666 hadoop
sudo useradd -g hadoop -u 666 -d /ahome/hadoop hadoop
sudo passwd hadoop
sudo tar xvf /ahome/sadmin/Desktop/hadoop-0.20.2.tar.gz (in /usr/local)
sudo chown -R hadoop:hadoop hadoop-0.20.2/
cd /ahome
sudo chown -R hadoop:hadoop hadoop/



Next, we switch to the hadoop user, generate a new password-less RSA key (the -P "" parameter), and append its public half to the SSH authorized_keys file so that hadoop can ssh to a computer without a password:

su - hadoop
ssh-keygen -t rsa -P ""
cat /ahome/hadoop/.ssh/id_rsa.pub >> /ahome/hadoop/.ssh/authorized_keys
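One pitfall worth noting here: sshd silently ignores authorized_keys if the file or the .ssh directory is too permissive, so it is worth tightening the permissions after the cat above. A minimal sketch, demonstrated on a throwaway directory (on the real account the directory is /ahome/hadoop/.ssh):

```shell
# Demo on a temporary directory; substitute /ahome/hadoop/.ssh on the cluster.
SSH_DIR=$(mktemp -d)
touch "$SSH_DIR/authorized_keys"

# sshd expects the directory to be 700 and the key file 600.
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"

stat -c '%a' "$SSH_DIR"                    # prints 700
stat -c '%a' "$SSH_DIR/authorized_keys"    # prints 600
```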



Next, we have to edit Hadoop's hadoop-env.sh configuration file to work around problems with IPv6 (by telling Java to prefer the IPv4 stack) and to specify the location of the Java virtual machine:

cd /usr/local/hadoop-0.20.2/conf
pico hadoop-env.sh
Change:
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export JAVA_HOME=/usr/lib/jvm/java-6-sun
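When setting up many machines, those two hadoop-env.sh lines can also be appended from a script instead of an editor. A sketch, demonstrated on a temporary file (on the cluster, CONF would be /usr/local/hadoop-0.20.2/conf/hadoop-env.sh):

```shell
# Demo on a temp file; point CONF at the real hadoop-env.sh on the cluster.
CONF=$(mktemp)

# Append the IPv4-preference flag and the JVM location from the text above.
echo 'export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true' >> "$CONF"
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-sun' >> "$CONF"

grep -c '^export' "$CONF"   # prints 2
```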



Next, we need to state where Hadoop's temporary directory is, point fs.default.name and the jobtracker address at the name of the master in the cluster, and set the DFS replication number to the number of slaves in the cluster:

pico core-site.xml
Change:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-0.20.2/hadoop-temp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>



pico mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>


pico hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>10</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
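A caveat on dfs.replication: HDFS cannot place more replicas of a block than there are datanodes, so the value of 10 above assumes (at least) ten slaves. A quick sanity check is to compare it against the line count of conf/slaves; sketched here on a stand-in file with three hypothetical slaves:

```shell
# Stand-in for conf/slaves; on the cluster, point SLAVES at the real file.
SLAVES=$(mktemp)
printf 'slave1\nslave2\nslave3\n' > "$SLAVES"

# dfs.replication should be <= this count.
NUM_SLAVES=$(wc -l < "$SLAVES")
echo "$NUM_SLAVES"   # prints 3
```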



We make the temp directory (the one specified in core-site.xml):

cd ..
mkdir hadoop-temp (in /usr/local/hadoop-0.20.2)



Next, we add the IP address and name combinations into the /etc/hosts file:

In /etc/hosts:
192.168.0.1 master
192.168.0.2 slave1
192.168.0.3 slave2


etc...
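With many slaves, those host entries can be generated rather than typed. A sketch using the hypothetical 192.168.0.x addressing above (adjust the loop range to match your cluster):

```shell
# Emit the master line plus numbered slave lines for /etc/hosts.
HOSTS=$(
    echo '192.168.0.1 master'
    for i in 1 2 3; do
        printf '192.168.0.%d slave%d\n' "$((i + 1))" "$i"
    done
)
echo "$HOSTS"
```

The output can then be reviewed and appended to /etc/hosts on every machine.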


To complete the SSH portion of the setup, we ssh as the hadoop user from every computer in the cluster to every computer in the cluster (this accepts each host's key, so later logins run unattended):

su - hadoop
ssh master
ssh slave1
ssh slave2

etc...


Then we go back to the conf/ directory and change the masters file to contain the name of the master (THIS IS ONLY FOR THE MASTER):

pico masters
(then add the name of the master)


And then we change the slaves file in the conf/ directory to include all of the slaves in the cluster (DO THIS ON EVERY MACHINE IN THE CLUSTER):

pico slaves
(then add the names of the slaves)
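To make the two files concrete, this is what they would contain for the example master/slave1/slave2 cluster used throughout (demonstrated on temporary files; the real ones live in conf/):

```shell
# Stand-ins for conf/masters and conf/slaves.
MASTERS=$(mktemp)
SLAVES=$(mktemp)

printf 'master\n' > "$MASTERS"           # masters: the master node only
printf 'slave1\nslave2\n' > "$SLAVES"    # slaves: every slave node

cat "$MASTERS"   # prints master
cat "$SLAVES"
```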


Finally (THIS IS ONLY FOR THE MASTER), we format the NameNode:

bin/hadoop namenode -format