I recently learned how to set up a Hadoop cluster; I'm writing it down here to consolidate what I learned.
Environment setup
Installation steps
Details
1. Install VMware Workstation 14 Pro
There is nothing special about this installation: search for the software online, download it, and follow the installer's prompts.
Reference: https://jingyan.baidu.com/article/7f41ecec2df104593c095c6f.html
2. Install the CentOS 7 Linux system in VMware Workstation 14 Pro
For the CentOS installation I followed the blog below, so I won't go through every detail; I'll just write down a few points worth watching out for.
Reference: 使用VMware安裝CentOS7詳請 - CSDN博客
(1) Generally choose "install the operating system later",
then select the downloaded ISO image in the virtual machine settings.
(2) Network adapter selection
There are three network connection modes: bridged, NAT, and host-only. I was at a loss when choosing between them, so I read the blog below and finally got some understanding. I chose host-only mode: the VM can communicate directly with the host and with other VMs, but it cannot reach the external network. The Hadoop setup I'm building doesn't need internet access, so this mode is enough.
Reference: https://jingyan.baidu.com/article/546ae1852778811149f28c8c.html
(3) Once the CentOS installer starts, most options can be left at their defaults. For the software selection I initially picked the minimal install, and later found I could only get a terminal, with no graphical desktop. If you want a GUI, choose the corresponding software group during installation; I installed the GNOME desktop.
(4) Change the hostname
The hostname can be set during installation; change it there if needed. I set mine to hadoop01.
3. Network connection
Enter the Linux desktop -> right-click the two-computer icon at the top right -> click Edit Connections -> select the current connection (System eth0) -> click Edit -> open the IPv4 tab -> set Method to Manual -> click Add -> enter IP 192.168.1.101, netmask 255.255.255.0, gateway 192.168.1.1 -> Apply
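The clicks above amount to writing a static-IP interface configuration. As a sketch, the equivalent file can be generated directly; note it is written to a temp file here, while the real path is /etc/sysconfig/network-scripts/ifcfg-&lt;interface&gt; (the interface name, e.g. eth0 or ens33, depends on the system), and networking must be restarted afterwards, e.g. with systemctl restart network:

```shell
#!/bin/sh
# Write a static-IP interface config matching the GUI steps above.
# A temp file stands in for /etc/sysconfig/network-scripts/ifcfg-eth0;
# the interface name is an assumption and may differ on your system.
IFCFG="$(mktemp)"
cat > "$IFCFG" <<'EOF'
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.101
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
EOF
echo "wrote $IFCFG"
```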
4. Adjust a few Linux system settings (usually done in a terminal)
Map the hostname to the static IP:
vim /etc/hosts
192.168.1.101 hadoop01   (the hostname defined earlier)
# check the firewall status
service iptables status
# stop the firewall
service iptables stop
# check whether the firewall starts at boot
chkconfig iptables --list
# disable firewall autostart
chkconfig iptables off
Note: these are CentOS 6-style commands. On a stock CentOS 7 install the firewall is firewalld, managed with systemctl (e.g. systemctl stop firewalld and systemctl disable firewalld).
reboot
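The /etc/hosts edit is easy to typo, so a quick scripted sanity check helps. A sketch that works on a temporary stand-in file rather than the real /etc/hosts (adjust the path for real use):

```shell
#!/bin/sh
# Append the hostname mapping and verify it. A temp file stands in
# for /etc/hosts so this can be run without touching the system.
HOSTS_FILE="$(mktemp)"
echo "192.168.1.101 hadoop01" >> "$HOSTS_FILE"

# Verify the entry is present and well-formed
if grep -Eq '^192\.168\.1\.101[[:space:]]+hadoop01$' "$HOSTS_FILE"; then
    echo "hosts entry OK"
else
    echo "hosts entry missing" >&2
    exit 1
fi
```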
5. Create a user group and user
Creating a dedicated group and user is mainly about permission management and safe day-to-day use. I don't fully understand this part yet, so for now I'm following the procedure and will look into it further later.
sudo groupadd hadoop
sudo useradd -g hadoop hadoop
(Note: addgroup and adduser -ingroup are the Debian/Ubuntu commands; on CentOS the equivalents are groupadd and useradd -g. Remember to set a password with passwd hadoop.)
sudo gedit /etc/sudoers
# this opens /etc/sudoers; give the hadoop user the same privileges as root:
# under the line   root ALL=(ALL:ALL) ALL
# add              hadoop ALL=(ALL:ALL) ALL
# (use the Tab key rather than spaces between the fields; editing with visudo
# is safer, since it validates the syntax before saving)
6. Install the JDK (I still did this as the root user)
To keep things manageable, I install everything under /home/hadoop/app.
The JDK can be downloaded from Oracle's website; this time I used jdk-8u171-linux-x64.tar.gz.
Getting the file onto the VM gave me some trouble at first. With host-only networking the VM cannot reach the internet itself, and since I had only installed the terminal mode at that point, I couldn't find a way to upload the file. After a lot of fiddling I finally got it uploaded with WinSCP in SFTP mode. Now that the desktop is installed, files can simply be copied from Windows straight into the Linux guest, which is much more convenient.
# go to the target directory
cd /home/hadoop/app
tar -zxvf jdk-8u171-linux-x64.tar.gz
# rename the extracted directory so it is easier to remember
# (jdk-8u171 unpacks to jdk1.8.0_171)
mv jdk1.8.0_171 jdk
# delete the archive
rm -f jdk-8u171-linux-x64.tar.gz
vim /etc/profile
# append the following at the end of the file
# java
export JAVA_HOME=/home/hadoop/app/jdk
export CLASSPATH=$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
java -version
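Besides java -version, the exports themselves can be sanity-checked. A minimal sketch that simulates the /etc/profile additions against a throwaway fake JDK directory (all paths here are placeholders, not the real install):

```shell
#!/bin/sh
# Simulate the /etc/profile additions against a fake JDK directory,
# then confirm PATH picks up the JDK bin directory. Paths are placeholders.
FAKE_JDK="$(mktemp -d)/jdk"
mkdir -p "$FAKE_JDK/bin" "$FAKE_JDK/lib"

export JAVA_HOME="$FAKE_JDK"
export CLASSPATH="$JAVA_HOME/lib"
export PATH="$PATH:$JAVA_HOME/bin"

# PATH should now contain the JDK bin directory
case ":$PATH:" in
    *":$JAVA_HOME/bin:"*) echo "JAVA_HOME on PATH" ;;
    *) echo "PATH not updated" >&2; exit 1 ;;
esac
```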
7. Install Hadoop (from here on, everything is done as the hadoop user)
cd /home/hadoop/app
tar -zxvf hadoop-2.7.3.tar.gz
vim /etc/profile
# append the following
export HADOOP_HOME=/home/hadoop/app/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# then reload it
source /etc/profile
8. Edit the Hadoop configuration files
The files in this step all live under /home/hadoop/app/hadoop-2.7.3/etc/hadoop.
# point Hadoop at the JDK
vim hadoop-env.sh
# line 27
export JAVA_HOME=/home/hadoop/app/jdk
core-site.xml:
<!-- the filesystem schema (URI) Hadoop uses, i.e. the address of the NameNode, HDFS's master -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
</property>
<!-- the directory where Hadoop stores files generated at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/app/hadoop-2.7.3/tmp</value>
</property>
hdfs-site.xml:
<!-- the number of HDFS replicas -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
mapred-site.xml (in 2.7.3 this file must first be copied from mapred-site.xml.template):
<!-- run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
yarn-site.xml:
<!-- the address of the ResourceManager, YARN's master -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
</property>
<!-- how reducers fetch data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
[root@hadoop01 hadoop]# cat slaves
localhost
[root@hadoop01 hadoop]# vim slaves
[root@hadoop01 hadoop]# cat slaves
hadoop01
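The property blocks in this step go into four separate files under $HADOOP_HOME/etc/hadoop: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. As a sketch of how such edits could be scripted, the following writes two of them with here-documents; it targets a temporary directory, while in real use the target is $HADOOP_HOME/etc/hadoop, and the values are the ones used in this guide:

```shell
#!/bin/sh
# Generate two of the Hadoop config files with here-documents.
# A temp dir stands in for $HADOOP_HOME/etc/hadoop.
CONF_DIR="$(mktemp -d)"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <!-- NameNode address -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
  <!-- working directory for Hadoop runtime files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/app/hadoop-2.7.3/tmp</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<configuration>
  <!-- single node, so one replica is enough -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

echo "wrote $(ls "$CONF_DIR" | wc -l) config files to $CONF_DIR"
```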
In the simplest case, editing the files above is all that's needed; if a real deployment has further requirements, consult the official documentation.
# format the NameNode (only needed before the first start)
hadoop namenode -format
Reference: Hadoop namenode重新格式化需注意問題
# start HDFS first; answer yes or enter the login password where prompted
sbin/start-dfs.sh
# then start YARN
sbin/start-yarn.sh
# verify the startup with jps; normally the following processes are listed
jps
27408 NameNode
28218 Jps
27643 SecondaryNameNode
28066 NodeManager
27803 ResourceManager
27512 DataNode
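Rather than eyeballing the jps output, the check can be scripted. A sketch, using a hard-coded copy of the output above in place of a live $(jps) call:

```shell
#!/bin/sh
# Check that jps reports every daemon a pseudo-distributed setup needs.
# Sample output is hard-coded here; in real use replace it with $(jps).
JPS_OUTPUT="27408 NameNode
27512 DataNode
27643 SecondaryNameNode
27803 ResourceManager
28066 NodeManager
28218 Jps"

MISSING=""
for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    echo "$JPS_OUTPUT" | grep -qw "$daemon" || MISSING="$MISSING $daemon"
done

if [ -z "$MISSING" ]; then
    echo "all daemons running"
else
    echo "missing:$MISSING" >&2
    exit 1
fi
```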
The two web interfaces below should also be reachable.
# HDFS web UI
http://192.168.1.101:50070
# YARN (MapReduce) web UI
http://192.168.1.101:8088
Next, check the port settings:
# list listening ports
netstat -nltp
9. Set up passwordless SSH login
The start-dfs.sh and start-yarn.sh scripts log in to each node over SSH, so key-based login saves typing a password at every start. Here is the setup:
# generate a key pair; accept the defaults by pressing Enter at every prompt
[root@hadoop01 hadoop]# ssh-keygen -t rsa
Generating public/private rsa key pair.
# the next prompt shows where the key pair will be stored
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
ca:a5:00:29:94:84:9e:86:b3:0e:38:96:2b:b2:c6:12 root@hadoop01
The key's randomart image is:
+--[ RSA 2048]----+
|oo.              |
|o. .             |
|+ +              |
|o= .             |
|oo. . S          |
|Eo o +           |
|=o. +            |
|+=               |
|*.               |
+-----------------+
[root@hadoop01 hadoop]# cd /root/.ssh
[root@hadoop01 .ssh]# ll
total 12
# private key
-rw-------. 1 root root 1679 Apr 19 19:32 id_rsa
# public key
-rw-r--r--. 1 root root  395 Apr 19 19:32 id_rsa.pub
-rw-r--r--. 1 root root  524 Apr 16 15:10 known_hosts
[root@hadoop01 .ssh]# touch authorized_keys
[root@hadoop01 .ssh]# cat id_rsa.pub >> authorized_keys
[root@hadoop01 .ssh]# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDro4OxWo5z3jA57yxLv2zgU9f05/rS3bk/S2LkaShP2O5g/hp4geC3kf6IBOi1JDTkbJc3HE/MYW7+XYeR3MUh0XqmqAiAKkn1Fbs+TFJjwU6JQfSe3bbnnujlJYs4BfLyTOzgPd2Zm2hfLF3odAQJI9NDP+I1E6BqXaPDNA4sf2pMPprKgDJnHAXfCGORuWmgRXV69uhqdzrUWx7fAW9N8NSePfJBr1OoLotph3e2i8sEqRyKF/1jeKTSoK2H1imldDckdUh5b8yFkhDGDrypE260FJFMS2lNQn4dQBJAv1E97TV3Twm2mshZWZN7cLXuj3AIaNRWD6s86mnj+T13 root@hadoop01
[root@hadoop01 .ssh]# chmod 600 ./authorized_keys
[root@hadoop01 .ssh]# ssh hadoop01
Last login: Thu Apr 19 19:41:16 2018 from localhost
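The manual key steps above can also be wrapped into a single script. A sketch that performs them against a temporary directory instead of the real ~/.ssh (for actual use, swap in ~/.ssh, or simply run ssh-copy-id):

```shell
#!/bin/sh
# Generate a key pair and build authorized_keys, mirroring the manual
# steps above. A temp directory stands in for ~/.ssh.
SSH_DIR="$(mktemp -d)"
ssh-keygen -t rsa -N "" -q -f "$SSH_DIR/id_rsa"

# authorize the public key and restrict the file's permissions
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"

echo "key installed: $(wc -l < "$SSH_DIR/authorized_keys") entry"
```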
10. Start Hadoop and test the system
[root@hadoop01 sbin]# start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /home/hadoop/app/hadoop-2.7.3/logs/hadoop-root-namenode-hadoop01.out
localhost: starting datanode, logging to /home/hadoop/app/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop01.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-hadoop01.out
[root@hadoop01 sbin]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-hadoop01.out
localhost: starting nodemanager, logging to /home/hadoop/app/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop01.out
[root@hadoop01 sbin]# jps
8194 SecondaryNameNode
8468 NodeManager
7989 DataNode
8358 ResourceManager
7863 NameNode
8815 Jps
[root@hadoop01 sbin]# netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      8194/java
tcp        0      0 127.0.0.1:46734         0.0.0.0:*               LISTEN      7989/java
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      1586/dnsmasq
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      7863/java
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1139/sshd
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      1114/cupsd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1600/master
tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      7989/java
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      7989/java
tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      7989/java
tcp        0      0 192.168.1.101:9000      0.0.0.0:*               LISTEN      7863/java
tcp6       0      0 :::8042                 :::*                    LISTEN      8468/java
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
tcp6       0      0 :::22                   :::*                    LISTEN      1139/sshd
tcp6       0      0 ::1:631                 :::*                    LISTEN      1114/cupsd
tcp6       0      0 192.168.1.101:8088      :::*                    LISTEN      8358/java
tcp6       0      0 ::1:25                  :::*                    LISTEN      1600/master
tcp6       0      0 :::13562                :::*                    LISTEN      8468/java
tcp6       0      0 192.168.1.101:8030      :::*                    LISTEN      8358/java
tcp6       0      0 192.168.1.101:8031      :::*                    LISTEN      8358/java
tcp6       0      0 192.168.1.101:8032      :::*                    LISTEN      8358/java
tcp6       0      0 192.168.1.101:8033      :::*                    LISTEN      8358/java
tcp6       0      0 :::42946                :::*                    LISTEN      8468/java
tcp6       0      0 :::8040                 :::*                    LISTEN      8468/java
[root@hadoop01 sbin]#
[root@hadoop01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 20 20
Number of Maps  = 20
Samples per Map = 20
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Wrote input for Map #16
Wrote input for Map #17
Wrote input for Map #18
Wrote input for Map #19
Starting Job
18/04/19 20:00:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.1.101:8032
18/04/19 20:00:45 INFO input.FileInputFormat: Total input paths to process : 20
18/04/19 20:00:45 INFO mapreduce.JobSubmitter: number of splits:20
18/04/19 20:00:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524138918742_0001
18/04/19 20:00:46 INFO impl.YarnClientImpl: Submitted application application_1524138918742_0001
18/04/19 20:00:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1524138918742_0001/
18/04/19 20:00:46 INFO mapreduce.Job: Running job: job_1524138918742_0001
18/04/19 20:01:07 INFO mapreduce.Job: Job job_1524138918742_0001 running in uber mode : false
18/04/19 20:01:07 INFO mapreduce.Job:  map 0% reduce 0%
18/04/19 20:01:59 INFO mapreduce.Job:  map 30% reduce 0%
18/04/19 20:02:49 INFO mapreduce.Job:  map 35% reduce 0%
18/04/19 20:02:50 INFO mapreduce.Job:  map 55% reduce 0%
18/04/19 20:02:53 INFO mapreduce.Job:  map 60% reduce 0%
18/04/19 20:03:38 INFO mapreduce.Job:  map 70% reduce 0%
18/04/19 20:03:39 INFO mapreduce.Job:  map 85% reduce 0%
18/04/19 20:03:41 INFO mapreduce.Job:  map 85% reduce 28%
18/04/19 20:04:01 INFO mapreduce.Job:  map 95% reduce 28%
18/04/19 20:04:02 INFO mapreduce.Job:  map 100% reduce 28%
18/04/19 20:04:06 INFO mapreduce.Job:  map 100% reduce 100%
18/04/19 20:04:08 INFO mapreduce.Job: Job job_1524138918742_0001 completed successfully
18/04/19 20:04:09 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=446
		FILE: Number of bytes written=2499688
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=5270
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=83
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters
		Launched map tasks=20
		Launched reduce tasks=1
		Data-local map tasks=20
		Total time spent by all maps in occupied slots (ms)=901301
		Total time spent by all reduces in occupied slots (ms)=74288
		Total time spent by all map tasks (ms)=901301
		Total time spent by all reduce tasks (ms)=74288
		Total vcore-milliseconds taken by all map tasks=901301
		Total vcore-milliseconds taken by all reduce tasks=74288
		Total megabyte-milliseconds taken by all map tasks=922932224
		Total megabyte-milliseconds taken by all reduce tasks=76070912
	Map-Reduce Framework
		Map input records=20
		Map output records=40
		Map output bytes=360
		Map output materialized bytes=560
		Input split bytes=2910
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=560
		Reduce input records=40
		Reduce output records=0
		Spilled Records=80
		Shuffled Maps =20
		Failed Shuffles=0
		Merged Map outputs=20
		GC time elapsed (ms)=54427
		CPU time spent (ms)=27910
		Physical memory (bytes) snapshot=1635778560
		Virtual memory (bytes) snapshot=43597676544
		Total committed heap usage (bytes)=2485833728
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=2360
	File Output Format Counters
		Bytes Written=97
Job Finished in 206.209 seconds
Estimated value of Pi is 3.17000000000000000000
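For intuition about what the pi example computes: it estimates π by sampling points in the unit square and counting how many land inside the quarter circle. (Hadoop's version distributes the samples across map tasks and uses a Halton quasi-random sequence rather than a plain PRNG.) A single-machine sketch of the same idea in awk:

```shell
#!/bin/sh
# Monte Carlo estimate of pi: sample random points in the unit square
# and count how many fall inside the quarter circle of radius 1.
# (Hadoop's pi example splits the samples across map tasks instead.)
PI_EST=$(awk 'BEGIN {
    srand(42); n = 100000; inside = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x * x + y * y <= 1) inside++
    }
    printf "%.4f", 4 * inside / n
}')
echo "Estimated value of Pi is $PI_EST"
```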
Here are some problems I ran into along the way:
1. vi vs. vim
I used to use vi on Linux, but the reference blogs all use vim, so I looked up the difference: vim is essentially an enhanced version of vi.
Reference: vi 和vim 的區別 - KiraEXA - 博客園
2. Network adapter selection