Medicare Fraud Detection Using Open Source Data
By: xinyulrsm • March 17, 2019
hadoop install
Steps and necessary files for installing hadoop + yarn 2.6 on ubuntu 14.10 (from http://releases.ubuntu.com/14.10/ubuntu-14.10-desktop-amd64.iso)
I collected as many instructions as I could (see the refs below), then cherry-picked the steps I liked and put them here. These steps were tested on my hadoop cluster and they work. There are three big steps: install the packages, configure them, and edit the hadoop xml files. I used tmux with synchronize-panes to set up all the machines at once.
##machines
- pocoyo-1 192.168.1.72 (master)
- pocoyo-2 192.168.1.52 (data node)
- pocoyo-3 192.168.1.44 (data node)
edit hosts
- sudo vi /etc/hostname
  - check the machine name on each machine and change it if you want, for example
    - pocoyo-1
- sudo vi /etc/hosts
  - add the following lines on each machine (or scp the file to the others)
    - 127.0.0.1 localhost
    - 192.168.1.72 pocoyo-1 # nameNode
    - 192.168.1.52 pocoyo-2 # secondary nameNode
    - 192.168.1.44 pocoyo-3 # data node
- sudo scp 192.168.1.72:/etc/hosts /etc/hosts (run this on slaves)
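The hosts entries above can also be added with a small script so that re-running it does not duplicate lines. This is only a sketch: `add_host` and `HOSTS_FILE` are hypothetical names, and by default it writes to a scratch file rather than `/etc/hosts`.

```sh
#!/bin/sh
# add_host (hypothetical helper): append "IP hostname" to a hosts file
# unless that hostname is already listed in it.
add_host() {
    file=$1 ip=$2 name=$3
    # -F: fixed string, -w: whole-word match, so pocoyo-1 does not match pocoyo-10
    if ! grep -qwF -- "$name" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Assumption: write to a scratch file unless HOSTS_FILE is set.
# For real use, run as root with HOSTS_FILE=/etc/hosts.
HOSTS_FILE=${HOSTS_FILE:-$(mktemp)}
add_host "$HOSTS_FILE" 192.168.1.72 pocoyo-1
add_host "$HOSTS_FILE" 192.168.1.52 pocoyo-2
add_host "$HOSTS_FILE" 192.168.1.44 pocoyo-3
cat "$HOSTS_FILE"
```

Running it twice against the same file should leave the three entries in place without adding copies.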
##create hadoop user and user group on each machine
- sudo addgroup hadoop
- sudo adduser --ingroup hadoop hduser
- sudo adduser hduser sudo
- sudo chown -R hduser:hadoop /usr/local/
##install ssh on each machine (the following is not secure, but it is faster for testing)
- su - hduser
- sudo apt-get install openssh-server
- ssh localhost
on master
- ssh-keygen -t rsa -P ""
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
on slaves (pocoyo-2 and pocoyo-3)
- mkdir -p ~/.ssh
on master
- ssh-copy-id hduser@pocoyo-2 (do the same for pocoyo-3)
- ssh hduser@pocoyo-2
- ssh hduser@pocoyo-3
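The key distribution on the master can be looped over the slaves. The sketch below is an assumption-laden convenience, not part of the tested steps: `run` and `distribute_keys` are hypothetical helpers, and `DRY_RUN` defaults to 1 so the commands are only printed. `BatchMode=yes` is a standard ssh option that makes the login fail instead of asking for a password, which confirms the key is being used.

```sh
#!/bin/sh
# run (hypothetical helper): print the command when DRY_RUN=1 (the default
# here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# distribute_keys (hypothetical helper): copy the master's public key to each
# host, then check that a key-only login works.
distribute_keys() {
    for host in "$@"; do
        run ssh-copy-id "hduser@$host"
        run ssh -o BatchMode=yes "hduser@$host" hostname
    done
}

distribute_keys pocoyo-2 pocoyo-3
```

Set `DRY_RUN=0` on the master (as hduser) to actually run the commands.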
##disable ipv6 for each machine (:setw synchronize-panes in tmux worked for me)
- sudo vi /etc/sysctl.conf
- add the following lines
  - net.ipv6.conf.all.disable_ipv6 = 1
  - net.ipv6.conf.default.disable_ipv6 = 1
  - net.ipv6.conf.lo.disable_ipv6 = 1
##### run
* sudo service networking restart
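To confirm the setting took effect, the kernel exposes the current value under `/proc`. A minimal check, assuming a Linux host; `ipv6_state` is a hypothetical helper and takes an optional file path so it can be tried against a test file:

```sh
#!/bin/sh
# ipv6_state (hypothetical helper): report whether IPv6 is disabled, based on
# the kernel's disable_ipv6 flag (1 means disabled).
ipv6_state() {
    f=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
    if [ ! -r "$f" ]; then
        echo unknown
    elif [ "$(cat "$f")" = 1 ]; then
        echo disabled
    else
        echo enabled
    fi
}

ipv6_state
```

If this still reports `enabled`, `sudo sysctl -p` re-reads `/etc/sysctl.conf` without a reboot.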
##download hadoop for each machine
(once one machine has downloaded it, you can use scp to copy it to the others)
* su - hduser
* cd /usr/local
* wget http://mirror.reverse.net/pub/apache/hadoop/common/stable2/hadoop-2.6.0.tar.gz
* tar -xzf hadoop-2.6.0.tar.gz
* ln -s /usr/local/hadoop-2.6.0 /usr/local/hadoop
##install java 1.7 for all machines.
(once one machine has downloaded it, you can use scp to copy it to the others)
We chose 1.7 because it is the version reported on http://wiki.apache.org/hadoop/HadoopJavaVersions
* su - hduser
* cd /usr/local
* wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u75-b13/jdk-7u75-linux-x64.tar.gz"
* tar -xzf jdk-7u75-linux-x64.tar.gz
* ln -s /usr/local/jdk1.7.0_75 /usr/local/jdk (the tarball extracts to jdk1.7.0_75; check with ls)
## edit /etc/profile for master
(:setw synchronize-panes in tmux worked for me)
* sudo vi /etc/profile
* add following lines
```sh
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export JAVA_HOME=/usr/local/jdk
export CLASSPATH=$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
```
- source /etc/profile
- java -version (to test)
on slaves
- sudo scp hduser@pocoyo-1:/etc/profile /etc/profile
- source /etc/profile
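After sourcing the profile, a quick sanity check can show whether both variables point at real directories (for example, whether the `/usr/local/jdk` symlink resolves). This is a sketch; `check_dir` is a hypothetical helper.

```sh
#!/bin/sh
# check_dir (hypothetical helper): report whether a variable's value is an
# existing directory; returns non-zero when it is not.
check_dir() {
    name=$1 path=$2
    if [ -n "$path" ] && [ -d "$path" ]; then
        echo "$name ok: $path"
    else
        echo "$name missing: ${path:-<unset>}"
        return 1
    fi
}

# "|| true" keeps the script going so both variables are always reported.
check_dir HADOOP_HOME "${HADOOP_HOME:-}" || true
check_dir JAVA_HOME "${JAVA_HOME:-}" || true
```

Running it on each machine (with synchronize-panes, for instance) shows at a glance which node still needs the profile copied over.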
##config hadoop xml files.
modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh on all machines, add
- export JAVA_HOME=/usr/local/jdk
modify $HADOOP_HOME/etc/hadoop/slaves on all machines
- add
  - pocoyo-1
  - pocoyo-2
  - pocoyo-3
copy xml files first
- cd $HADOOP_HOME
- cp ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml ./etc/hadoop/core-site.xml
- cp ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml ./etc/hadoop/hdfs-site.xml
- cp ./share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml ./etc/hadoop/yarn-site.xml
- cp ./share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml ./etc/hadoop/mapred-site.xml
modify core-site.xml
| property | value | machines |
| --- | --- | --- |
| fs.defaultFS | hdfs://pocoyo-1:9000 | all |
| hadoop.tmp.dir | /usr/local/hadoop/tmp | all |
| io.file.buffer.size | 131072 | all |
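For reference, each row of the table corresponds to one `<property>` element with a `<name>` and a `<value>`. The sketch below prints a core-site.xml that contains only these three overrides; `property` and `core_site` are hypothetical helpers, and this is an alternative to editing the copied default file by hand, not the procedure tested above.

```sh
#!/bin/sh
# property (hypothetical helper): print one Hadoop <property> element.
property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# core_site (hypothetical helper): print a core-site.xml holding only the
# overrides from the table.
core_site() {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    property fs.defaultFS hdfs://pocoyo-1:9000
    property hadoop.tmp.dir /usr/local/hadoop/tmp
    property io.file.buffer.size 131072
    echo '</configuration>'
}

core_site
```

To use it, redirect the output, for example `core_site > "$HADOOP_HOME/etc/hadoop/core-site.xml"`. Hadoop reads its built-in defaults separately, so a site file normally needs only the overridden properties.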
modify hdfs-site.xml
| property | value | machines |
| --- | --- | --- |
| dfs.namenode.rpc-address | pocoyo-1:9001 | all |
| dfs.namenode.secondary.http-address | pocoyo-2:50090 | nameNode and secondary nameNode |
| dfs.namenode.name.dir | /usr/local/hadoop/dfs/name | nameNode and secondary nameNode |
| dfs.datanode.data.dir | /usr/local/hadoop/data | datanodes |
modify mapred-site.xml
...