Sunday, August 27, 2017
Setup CentOS VM in VirtualBox for Software Development and Distributed Computation on Spark and HDFS
This post summarizes my experience setting up a simple CentOS VM in VirtualBox as a development environment, which I used to test-run distributed jobs on a Spark and HDFS cluster.
1. Create vbox centos VM
Launch VirtualBox and create a VM from the downloaded CentOS ISO image. In the "Network" tab of the VM settings, set the network adapter to "Host-only Adapter".
If you need to share a folder on the host computer with the CentOS VM, add it in the "Shared Folders" tab of the VM settings and configure it with:
1. Full Access
2. Auto Mount
In this example, the shared folder is assumed to be named "git" and located at "C:\Users\xschen\git" on the host computer.
2. Install the development tools in the centos VM
Launch the CentOS VM and follow the standard installation steps. Make sure that the three network adapters are enabled during the CentOS installation. After the installation is complete, run the following commands to install the necessary tools:
```bash
yum update
yum install -y java-1.8.0-openjdk-devel
yum install -y maven
yum install -y kernel-devel
yum install -y gcc
yum install -y bzip2
```
The java-1.8.0-openjdk-devel and maven packages are used for Java development; kernel-devel, gcc, and bzip2 are used for compiling C or C++ source code (which will be needed later to install vboxsf).
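As a quick sanity check after the installation, the sketch below lists any expected command that is still missing from the PATH. The helper name `missing_tools` is made up for this post, and the assumption is that the packages above provide the `java`, `javac`, `mvn`, `gcc`, and `bzip2` commands.

```bash
# Print every command from the argument list that is not on the PATH.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage inside the VM; no output means everything was found:
# missing_tools java javac mvn gcc bzip2
```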
3. Access shared folder on the host computer
To access the shared folder "git" on the host computer, we must first install the VirtualBox Guest Additions so that vboxsf is available in the CentOS VM.
3.1. Mount and install VBoxGuestAdditions
Click "Devices --> Insert Guest Additions CD image" in the menu of the VM window, and VBoxGuestAdditions.iso will be attached to the VM's CD-ROM drive. To see which device holds VBoxGuestAdditions.iso, run the following command in the CentOS VM:
```bash
ls -l /dev | grep cd
```
You should see something like /dev/sr0, which indicates that the ISO is attached there. Run the following commands to mount it:
```bash
mkdir /mnt/dvd
mount -r -t iso9660 /dev/sr0 /mnt/dvd
```
Now run the following commands to install vboxsf:
```bash
cd /mnt/dvd
sh VBoxLinuxAdditions.run
```
If you encounter any errors, run the following command and then run the installer again:
```bash
yum groupinstall "Development Tools"
```
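One possible cause of a failed build, beyond missing compilers, is that the installed kernel-devel headers do not match the running kernel, for example when `yum update` installed a newer kernel that has not been booted yet. The sketch below compares the two; `headers_match_kernel` is a hypothetical helper name, and the assumption is that `uname -r` and the kernel-devel version strings use the same `version-release.arch` form, as they do on CentOS 7.

```bash
# Succeed if the running kernel release ($1) equals one of the
# installed kernel-devel versions (remaining arguments).
headers_match_kernel() {
  local running="$1" version
  shift
  for version in "$@"; do
    [ "$version" = "$running" ] && return 0
  done
  return 1
}

# Usage inside the VM (the rpm output is left unquoted on purpose so that
# each installed version becomes its own argument):
# headers_match_kernel "$(uname -r)" \
#   $(rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel) \
#   || echo "kernel-devel does not match the running kernel; try rebooting"
```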
After the above commands execute successfully, you can confirm that vboxsf is available in the CentOS VM by running the following command:
```bash
lsmod | grep vbox
```
3.2. Mount and access the shared folder
Run the following commands to mount the shared folder:
```bash
mkdir /mnt/git
mount -t vboxsf git /mnt/git
```
Now, to access the shared folder, enter the following commands:
```bash
cd /mnt/git
ls
```
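Since the mount above does not survive a reboot, it can be handy to re-run it only when needed. The following is a minimal sketch, assuming the mount table is read in the /proc/mounts format; `is_vboxsf_mounted` is a hypothetical helper name.

```bash
# Succeed if the mount point given as $1 is listed with filesystem type
# vboxsf in the mount table read from stdin (/proc/mounts format).
is_vboxsf_mounted() {
  awk -v mp="$1" '$2 == mp && $3 == "vboxsf" { found = 1 } END { exit !found }'
}

# Usage inside the VM: mount the "git" share only if it is not mounted yet.
# is_vboxsf_mounted /mnt/git < /proc/mounts || mount -t vboxsf git /mnt/git
```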
4. Assign static ip address to a network adapter
In this example, we want to assign a static IP address to the network adapter. It is assumed that:
1. the host computer is in the subnet "192.168.56.*"
2. the static IP address of the host-only network adapter is "192.168.56.101" (which is in the same subnet as the host computer)
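The "same subnet" assumption can be checked with a small sketch like the one below. It only handles the /24 case used here, by comparing the first three octets. The helper name `same_subnet24` is made up, and 192.168.56.1 is an assumed host-side address on the host-only network, so substitute your own.

```bash
# Succeed if two IPv4 addresses share their first three octets,
# i.e. they are in the same /24 subnet.
same_subnet24() {
  # ${var%.*} strips the last dot and everything after it.
  [ "${1%.*}" = "${2%.*}" ]
}

# 192.168.56.1 is an assumed host address; 192.168.56.101 is the VM's.
if same_subnet24 192.168.56.1 192.168.56.101; then
  echo "same /24 subnet"
else
  echo "different subnets"
fi
```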
In the CentOS VM, install and run the ifconfig tool:
```bash
yum -y install net-tools
ifconfig
```
On my computer, enp0s3 is the host-only network adapter. Its configuration file, ifcfg-enp0s3, can be found in the folder "/etc/sysconfig/network-scripts" (if it does not exist, simply create a text file with that name).
4.1 Configure static IP address in the network adapter
Run the following commands to open and edit ifcfg-enp0s3:
```bash
cd /etc/sysconfig/network-scripts
vi ifcfg-enp0s3
```
Add or modify the following settings in the ifcfg-enp0s3:
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.56.101
A sample ifcfg-enp0s3 is shown below:
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME=enp0s3
DEVICE=enp0s3
ONBOOT=yes
IPADDR=192.168.56.101
PREFIX=24
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_PRIVACY=no
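When the same setup has to be repeated on several VMs, the IPv4 part of the file can be generated instead of typed. The following is a minimal sketch that emits only the IPv4-related keys from the sample above; `render_ifcfg` is a hypothetical helper name, and the prefix length defaults to 24 as in the sample.

```bash
# Print a minimal static-IP ifcfg file.
# $1: device name, $2: IPv4 address, $3: prefix length (defaults to 24)
render_ifcfg() {
  local device="$1" ip="$2" prefix="${3:-24}"
  cat <<EOF
TYPE=Ethernet
BOOTPROTO=static
NAME=${device}
DEVICE=${device}
ONBOOT=yes
IPADDR=${ip}
PREFIX=${prefix}
EOF
}

# Preview the result on the terminal first.
render_ifcfg enp0s3 192.168.56.101

# Once it looks right, write it to the real file as root:
# render_ifcfg enp0s3 192.168.56.101 > /etc/sysconfig/network-scripts/ifcfg-enp0s3
```

Giving each VM in the cluster a different last octet (for example 192.168.56.102 for the next node) keeps them all on the same host-only subnet.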
4.2. Restart the network service
Run the following commands to restart the network service and re-check the network configuration:
```bash
service network restart
ifconfig
```
You should see that the static IP address 192.168.56.101 is now assigned to enp0s3.
5. Security: Disable firewall and selinux
For the project I am working on, I need to run distributed computation jobs on an Apache Spark cluster and HDFS. For the Spark cluster to work, the firewall and SELinux must be properly configured. As a quick-and-dirty approach, the firewall and (optionally) SELinux can be disabled on all VMs in the cluster (including the Spark master and slave nodes and the HDFS namenodes and datanodes).
5.1. Disable firewalld
To disable the firewall, run the following commands:
```bash
systemctl stop firewalld.service
systemctl disable firewalld.service
```
To re-enable the firewall later, run the following commands:
```bash
systemctl start firewalld.service
systemctl enable firewalld.service
```
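If you would rather keep firewalld running, the alternative to disabling it is to open the ports the cluster uses. The sketch below only prints the `firewall-cmd` commands so they can be reviewed first. `firewall_open_commands` is a hypothetical helper name, and the port numbers in the usage line are assumed defaults (7077 and 8080 for the Spark standalone master and its web UI, 8020 and 50070 for the HDFS namenode RPC and web UI in Hadoop 2.x); check them against your own configuration. Spark drivers and executors also pick ports dynamically unless they are pinned in the Spark configuration, so this list is illustrative rather than complete.

```bash
# Print the firewall-cmd invocations that would permanently open the
# given TCP ports, followed by the reload that applies them.
firewall_open_commands() {
  local port
  for port in "$@"; do
    echo "firewall-cmd --permanent --add-port=${port}/tcp"
  done
  echo "firewall-cmd --reload"
}

# Usage: review the printed commands, then pipe them to sh as root.
# firewall_open_commands 7077 8080 8020 50070
# firewall_open_commands 7077 8080 8020 50070 | sh
```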
5.2. Disable selinux (Optional)
Run the following command to edit /etc/selinux/config:
```bash
vi /etc/selinux/config
```
In /etc/selinux/config, change the following lines:
SELINUX=enforcing
SELINUXTYPE=targeted
to:
SELINUX=disabled
#SELINUXTYPE=targeted
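The SELINUX line can also be changed without opening an editor, which helps when the same change is needed on every node. The following is a minimal sketch; `set_selinux_mode` is a hypothetical helper name, and it deliberately touches only the SELINUX= line, leaving SELINUXTYPE as it is.

```bash
# Rewrite the SELINUX= line of a config file in place.
# $1: mode (enforcing, permissive or disabled), $2: path to the config file
set_selinux_mode() {
  local mode="$1" file="$2" tmp status
  tmp="$(mktemp)" || return 1
  # The pattern requires "=" right after SELINUX, so SELINUXTYPE= and
  # commented-out lines are left untouched.
  sed "s/^SELINUX=.*/SELINUX=${mode}/" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage as root inside the VM:
# set_selinux_mode disabled /etc/selinux/config
```

Writing the result back with `cat` rather than moving the temporary file over the original keeps the original file's ownership and permissions.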
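The change in /etc/selinux/config takes effect after a reboot. To stop SELinux from enforcing immediately (this switches it to permissive mode), run the following command:
```bash
setenforce 0
```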