Installing Hadoop on Ubuntu
Environment:
Ubuntu 12.04 LTS (x86)
jdk 1.7.0_67
Hadoop 2.2.0
Install the JDK
Install the JDK first; the steps below assume it is located at /usr/lib/jvm/java-7-oracle.
Install SSH
$ cd ~
$ sudo apt-get install openssh-server
Start the service
$ sudo /etc/init.d/ssh start
Check that the service is running
$ ps -e | grep ssh
Set up passwordless login
$ mkdir -p ~/.ssh
$ cd ~/.ssh
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
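Re-running the cat >> authorized_keys step above appends the same key again each time, and sshd with its default StrictModes setting can refuse key logins when ~/.ssh or authorized_keys is writable by other users. Here is a minimal sketch of an idempotent variant. add_key_once is a hypothetical helper name, not part of OpenSSH, and the 700/600 modes are the conventional ones rather than something the original steps set.

```shell
# add_key_once PUBLIC_KEY_FILE AUTHORIZED_KEYS_FILE
# Hypothetical helper: appends the public key only if it is not already listed,
# then tightens permissions on the directory and the file.
add_key_once() {
    pub_key_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")

    mkdir -p "$auth_dir"
    touch "$auth_file"

    # -F: fixed strings, -x: whole-line match, -q: no output,
    # -f: read the pattern(s) from the public key file.
    if ! grep -qxF -f "$pub_key_file" "$auth_file"; then
        cat "$pub_key_file" >> "$auth_file"
    fi

    chmod 700 "$auth_dir"
    chmod 600 "$auth_file"
}

# Usage on the machine from this guide:
#   add_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```

Calling it twice leaves a single copy of the key in the file, so the step is safe to repeat.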
Install Hadoop
$ cd ~
$ wget http://www.trieuvan.com/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.2.0/ hadoop/
$ sudo chown -R $USER /usr/local/hadoop
$ cd ~
$ gedit ~/.bashrc
Append the following lines to the end of the file:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
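The exports above only take effect in shells started after the edit, or after reloading the file. Here is a small sketch of how to reload and sanity-check the result. check_hadoop_env is a hypothetical helper written for this guide, not a Hadoop command, and it checks only the variables listed above.

```shell
# Reload the file in the current shell after editing it:
#   source ~/.bashrc

# check_hadoop_env (hypothetical helper): reports the first variable from the
# list above that is unset, or the first Hadoop directory missing from PATH.
check_hadoop_env() {
    for name in JAVA_HOME HADOOP_INSTALL HADOOP_MAPRED_HOME \
                HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME; do
        # Indirect lookup: copy the value of the variable called $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            return 1
        fi
    done

    for dir in "$HADOOP_INSTALL/bin" "$HADOOP_INSTALL/sbin"; do
        case ":$PATH:" in
            *":$dir:"*) ;;
            *) echo "PATH is missing $dir"; return 1 ;;
        esac
    done

    echo "environment looks complete"
}
```

Run check_hadoop_env once after reloading; any other message names the line in .bashrc that needs another look.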
$ cd /usr/local/hadoop/etc/hadoop
$ sudo gedit hadoop-env.sh
Append the following line to the end of the file:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
Verify the Hadoop installation
$ /usr/local/hadoop/bin/hadoop version
You should see output similar to the following ٩(。・ω・。)و
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar
Single Node
$ cd /usr/local/hadoop/etc/hadoop
$ gedit core-site.xml
Change its contents to the following:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
<description>The name of the default file system. Either the literal string "local" or a host:port for NDFS.
</description>
<final>true</final>
</property>
</configuration>
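On a machine without a desktop, the same core-site.xml can be written from the shell instead of gedit. This is only a sketch: write_core_site is a hypothetical helper name, it overwrites any existing core-site.xml in the target directory, and it leaves out the optional description element.

```shell
# write_core_site CONF_DIR [FS_URI]
# Hypothetical helper: writes a core-site.xml holding only fs.defaultFS.
# FS_URI defaults to the value used in this guide.
write_core_site() {
    conf_dir=$1
    fs_uri=${2:-hdfs://localhost:8020}

    # Unquoted EOF so that $fs_uri is expanded inside the here-document.
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>$fs_uri</value>
    <final>true</final>
  </property>
</configuration>
EOF
}

# Usage:
#   write_core_site /usr/local/hadoop/etc/hadoop
```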
$ gedit hdfs-site.xml
Change its contents to the following:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
</property>
</configuration>
$ gedit yarn-site.xml
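Change its contents to the following (note that mapreduce.framework.name is conventionally placed in mapred-site.xml rather than here):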
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
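With three files edited by hand, a quick way to read a setting back helps catch typos before starting anything. The sketch below uses a hypothetical get_property helper. It assumes each name and value element sits on its own line, as in the files above, and it is not a general XML parser.

```shell
# get_property FILE NAME
# Hypothetical helper: prints the <value> that follows <name>NAME</name>.
# Assumes <name> and <value> are on separate lines, as in this guide.
get_property() {
    awk -v want="$2" '
        # Remember that the requested name line has been seen.
        index($0, "<name>" want "</name>") { found = 1; next }
        # The first value line after it holds the setting.
        found && /<value>/ {
            sub(/^.*<value>/, "")
            sub(/<\/value>.*$/, "")
            print
            exit
        }
    ' "$1"
}

# Usage:
#   get_property /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS
```

An empty result means the property was not found in that file.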
Format the file system
$ cd /usr/local/hadoop
$ export HADOOP_ROOT_LOGGER=INFO,console
$ mkdir dfs/
$ mkdir dfs/name
$ mkdir dfs/data
$ ./bin/hdfs namenode -format
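The directory preparation above stops with an error if a directory already exists, and it fails outright if the current user cannot write under /usr/local/hadoop (the archive was extracted with sudo). Here is a rough sketch of a repeatable version. prepare_dfs_dirs is a hypothetical helper name, and the directory layout is the one from hdfs-site.xml above.

```shell
# prepare_dfs_dirs HADOOP_DIR
# Hypothetical helper: creates dfs/name and dfs/data under HADOOP_DIR and
# confirms that the current user can write to both.
prepare_dfs_dirs() {
    base=$1

    # -p creates missing parents and does not fail if a directory exists.
    mkdir -p "$base/dfs/name" "$base/dfs/data" || return 1

    for dir in "$base/dfs/name" "$base/dfs/data"; do
        if [ ! -w "$dir" ]; then
            echo "$dir is not writable by $(id -un)"
            return 1
        fi
    done

    echo "dfs directories ready under $base"
}

# Usage:
#   prepare_dfs_dirs /usr/local/hadoop
```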
Start the daemons
$ sbin/hadoop-daemon.sh start namenode
$ sbin/hadoop-daemon.sh start datanode
$ sbin/hadoop-daemon.sh start secondarynamenode
$ sbin/start-yarn.sh
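After the start commands, the jps tool that ships with the JDK lists the running Java processes, which makes it easy to see whether a daemon died on startup. The sketch below wraps that check. missing_daemons is a hypothetical helper, and the list of four daemon names is an assumption about what a single-node setup like this one should be running.

```shell
# missing_daemons (hypothetical helper): reads jps output on stdin and
# reports which of the expected daemons are absent.
missing_daemons() {
    jps_output=$(cat)
    missing=""

    for daemon in NameNode DataNode ResourceManager NodeManager; do
        # jps prints one "<pid> <MainClass>" pair per line; compare the
        # second field exactly so SecondaryNameNode does not count as NameNode.
        if ! printf '%s\n' "$jps_output" |
                awk -v d="$daemon" '$2 == d { hit = 1 } END { exit !hit }'; then
            missing="$missing $daemon"
        fi
    done

    if [ -n "$missing" ]; then
        echo "not running:$missing"
        return 1
    fi
    echo "all expected daemons are running"
}

# Usage:
#   jps | missing_daemons
```

If a daemon is reported as not running, its log file under /usr/local/hadoop/logs is the first place to look.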
Check the status
$ ./bin/hdfs dfsadmin -report
Seeing "Datanodes available: 1 (1 total, 0 dead)" means the node started successfully.
Open http://127.0.0.1:50070/ in a browser to confirm that Hadoop is running ✧*。٩(ˊᗜˋ*)و✧*。
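For scripted checks, the DataNode count can be pulled out of the report text instead of read by eye. This is a sketch only: live_datanodes is a hypothetical helper, and it matches the exact "Datanodes available:" wording shown above, which other Hadoop releases may phrase differently.

```shell
# live_datanodes (hypothetical helper): reads the dfsadmin report on stdin
# and prints the number from the "Datanodes available: N (...)" line.
live_datanodes() {
    sed -n 's/^Datanodes available: \([0-9][0-9]*\) .*/\1/p' | head -n 1
}

# Usage:
#   ./bin/hdfs dfsadmin -report | live_datanodes
```

An empty result means the line was not found, for example because the NameNode is not reachable yet.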
Test
$ cd /usr/local/hadoop
$ mkdir input/
$ cd input
$ gedit file1.txt
hello world
hello ray
hello Hadoop
$ gedit file2.txt
Hadoop ok
Hadoop fail
Hadoop 2.3
$ cd ..
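The two input files above can also be created without an editor, which is handy over SSH. Here is a minimal sketch; make_wordcount_input is a hypothetical helper name, and the file contents are copied from the listing above.

```shell
# make_wordcount_input DIR
# Hypothetical helper: writes file1.txt and file2.txt with the sample lines
# from this guide into DIR, creating DIR if needed.
make_wordcount_input() {
    dir=$1
    mkdir -p "$dir"

    # printf repeats the '%s\n' format for every argument: one line each.
    printf '%s\n' 'hello world' 'hello ray' 'hello Hadoop' > "$dir/file1.txt"
    printf '%s\n' 'Hadoop ok' 'Hadoop fail' 'Hadoop 2.3' > "$dir/file2.txt"
}

# Usage:
#   make_wordcount_input /usr/local/hadoop/input
```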
Add the files to HDFS
$ ./bin/hadoop fs -mkdir /data
$ ./bin/hadoop fs -put -f input/file1.txt input/file2.txt /data
(If the message "Name node is in safe mode" appears, run
$ cd /usr/local/hadoop
$ ./bin/hdfs dfsadmin -safemode leave
to leave safe mode, then repeat the -put command.)
$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data /output
View the result
$ ./bin/hadoop fs -cat /output/part-r-00000
The following output indicates success (っ●ω●)っ
2.3 1
Hadoop 4
fail 1
hello 3
ok 1
ray 1
world 1
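To cross-check the job, the same counts can be computed locally from the input files with standard tools. This is a sketch for comparison only, not how the Hadoop example is implemented. local_wordcount is a hypothetical helper, and it splits on whitespace, which matches the sample data here.

```shell
# local_wordcount FILE...
# Hypothetical helper: prints "word<TAB>count" for every whitespace-separated
# word in the given files.
# Steps: put one word per line, drop empty lines, sort in byte order
# (LC_ALL=C, so uppercase sorts before lowercase), count duplicates,
# then swap the columns of the uniq -c output.
local_wordcount() {
    cat "$@" |
        tr -s '[:space:]' '\n' |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Usage:
#   cd /usr/local/hadoop
#   local_wordcount input/file1.txt input/file2.txt
```

If the counts differ from the contents of part-r-00000, check that both files were uploaded to /data and that /output does not hold results from an earlier run.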