big-data Archives - Oakdale Software Ltd

Accumulo on Hortonworks Sandbox

by Alex on May 26, 2015 in Big Data

Accumulo is not included in the Ambari installation, so it has to be installed manually. If you want to do some development with it, the quickest way to get an instance up and running is the Hortonworks Sandbox. However, because the installation procedure differs from a standard one, getting it working isn’t quite as straightforward as it could be.

Here are some notes on the procedure to help you on your way.

Prerequisites:

Download the Hortonworks Sandbox and start it in your virtual machine manager; I’m using VirtualBox here. The networking settings are important too: I set the adapter to NAT so that the VM runs on the 10.0.2.0 network and the management web pages are reached from the host at http://127.0.0.1/. This keeps everything simple, and external repos can be accessed through the host’s internet connection.

Sandbox address: http://127.0.0.1:8000/about/
Ambari address: http://127.0.0.1:8080/

Go to the Ambari management page (login admin/admin) and verify that the services we need are up and running: HDFS, MapReduce2, YARN and ZooKeeper. I also like to start Ambari Metrics and its collector so that I can see the activity, but it’s not required.

Procedure:

Log in to the sandbox via ssh as root (password hadoop) and install the package:

yum install accumulo

Accumulo is installed under /usr/hdp/2.2.4.2-2/accumulo/ (version numbers may differ).

Copy one of the example configuration sets into the root config directory. Select a set according to your memory constraints, but always use a standalone one, e.g.

cp /usr/hdp/2.2.4.2-2/accumulo/conf/examples/2GB/standalone/* /usr/hdp/2.2.4.2-2/accumulo/conf
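
Since the version directory differs between sandbox releases, it can help to resolve the path rather than type it. The following is a minimal sketch, assuming the /usr/hdp/<version>/accumulo layout shown above; find_accumulo_home is a hypothetical helper name, not something shipped with HDP or Accumulo.

```shell
# Hypothetical helper: print the first <hdp_root>/<version>/accumulo directory found.
# The default root /usr/hdp is taken from the install path shown above.
find_accumulo_home() {
    hdp_root="${1:-/usr/hdp}"
    for dir in "$hdp_root"/*/accumulo; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # Nothing matched: signal failure so callers can stop early.
    return 1
}

# Possible usage (not run here):
#   ACCUMULO_HOME=$(find_accumulo_home) &&
#       cp "$ACCUMULO_HOME"/conf/examples/2GB/standalone/* "$ACCUMULO_HOME"/conf
```

If more than one HDP version is installed, this picks whichever the shell glob lists first, so check the result before copying.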


Edit the file accumulo-env.sh and set the following variables accordingly.

JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
HADOOP_PREFIX=/usr/hdp/2.2.4.2-2/hadoop
ZOOKEEPER_HOME=/usr/hdp/2.2.4.2-2/zookeeper
test -z "$ACCUMULO_HOME" && export ACCUMULO_HOME=/usr/hdp/2.2.4.2-2/accumulo
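
A mistyped path in any of these variables can be hard to spot later, so a quick check is worthwhile. This sketch prints the name of each variable that does not point at an existing directory. missing_env_dirs is a hypothetical helper, and it assumes the variables are visible in the current shell, e.g. after sourcing accumulo-env.sh.

```shell
# Hypothetical helper: list which of the four variables set above do not
# refer to an existing directory. Prints nothing when all of them are fine.
missing_env_dirs() {
    for var in JAVA_HOME HADOOP_PREFIX ZOOKEEPER_HOME ACCUMULO_HOME; do
        # Indirect lookup: copy the value of the variable named in $var into dir.
        eval "dir=\${$var:-}"
        [ -d "$dir" ] || printf '%s\n' "$var"
    done
}

# Possible usage (not run here):
#   . /usr/hdp/2.2.4.2-2/accumulo/conf/accumulo-env.sh
#   missing_env_dirs
```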

Uncomment the line which reads:

ACCUMULO_MONITOR_BIND_ALL="true"
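
If you prefer to script this edit, the sketch below strips the leading "#" from that one line. It assumes the stock file carries the setting as a commented-out line beginning with "#"; uncomment_monitor_bind_all is a hypothetical helper name.

```shell
# Hypothetical helper: uncomment the ACCUMULO_MONITOR_BIND_ALL line in the given file.
# Other commented lines are left untouched.
uncomment_monitor_bind_all() {
    env_file="$1"
    tmp_file="$env_file.tmp"
    # Remove a leading "#" (plus surrounding spaces) only on the matching line,
    # then write the result back in place so ownership and mode are preserved.
    sed 's/^[[:space:]]*#[[:space:]]*\(.*ACCUMULO_MONITOR_BIND_ALL=.*\)$/\1/' "$env_file" > "$tmp_file" &&
        cat "$tmp_file" > "$env_file" &&
        rm -f "$tmp_file"
}

# Possible usage (not run here):
#   uncomment_monitor_bind_all /usr/hdp/2.2.4.2-2/accumulo/conf/accumulo-env.sh
```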


Edit the file accumulo-site.xml and set the value tags shown below to hadoop. This is very important so that Accumulo can interact with ZooKeeper.

<property>
    <name>instance.secret</name>
    <value>hadoop</value>
    <description>A secret unique to a given instance that all servers must know in order to communicate with one another. Change it before initialization. To change it later use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new [newpasswd], and then update this file.</description>    
</property>


<property>
    <name>trace.token.property.password</name>
    <value>hadoop</value>
    <!-- change this to the root user's password, and/or change the user below -->
</property>
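
To double-check both values after editing, a small lookup like the following can help. It is a sketch rather than a real XML parser: site_property is a hypothetical helper, and it assumes each name tag is followed by its value tag on a later line, as in the listings above.

```shell
# Hypothetical helper: print the <value> that follows <name>PROP</name> in a
# Hadoop-style site file. Assumes name and value sit on separate lines.
site_property() {
    site_file="$1"
    prop_name="$2"
    awk -v name="$prop_name" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Skip the 7 characters of "<value>" and drop the 8 of "</value>".
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$site_file"
}

# Possible usage (not run here):
#   site_property /usr/hdp/2.2.4.2-2/accumulo/conf/accumulo-site.xml instance.secret
```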

Now change the accumulo user’s group and home directory. Edit /etc/passwd and change:

accumulo:x:495:486:accumulo:/var/lib/accumulo:/bin/bash


to

accumulo:x:495:501:accumulo:/home/accumulo:/bin/bash


Note that group 501 in this case is the hadoop group.
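
Group and user IDs may differ on your sandbox, so it is worth reading the values back rather than trusting the numbers above. A minimal sketch, with passwd_entry_field as a hypothetical helper that pulls one colon-separated field out of a passwd-format file:

```shell
# Hypothetical helper: print field N of the passwd-format entry for USER.
# Field 4 is the primary group ID and field 6 is the home directory.
passwd_entry_field() {
    passwd_file="$1"
    awk -F: -v user="$2" -v field="$3" '$1 == user { print $field; exit }' "$passwd_file"
}

# Possible usage (not run here):
#   passwd_entry_field /etc/passwd accumulo 4   # group ID of the accumulo user
#   passwd_entry_field /etc/passwd accumulo 6   # home directory
#   getent group hadoop | cut -d: -f3           # group ID of the hadoop group
```
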
Create the home directory (you need to su - hdfs to run the hadoop commands):

mkdir -p /home/accumulo/data
hadoop fs -mkdir -p /home/accumulo/data

Change permissions and ownership

chown accumulo:hadoop /home/accumulo
chown accumulo:hadoop /home/accumulo/data
hadoop fs -chown -R accumulo:hadoop /home/accumulo/data
chmod 777 /home/accumulo/data
hadoop fs -chmod -R 777 /home/accumulo/data
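
To confirm that the local side of the change took effect, something like the following can be used. dir_mode is a hypothetical helper that returns the type-and-permission string reported by ls -ld; the HDFS check in the trailing comment relies on hadoop fs -ls -d.

```shell
# Hypothetical helper: print the 10-character type and permission string
# for a path, e.g. drwxrwxrwx for a directory with mode 777.
dir_mode() {
    ls -ld "$1" | cut -c1-10
}

# Possible usage (not run here):
#   dir_mode /home/accumulo/data
#   hadoop fs -ls -d /home/accumulo/data   # same check on the HDFS side
```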


Now you are ready to initialize Accumulo. This step writes the configuration information into ZooKeeper.

su - accumulo
cd /usr/hdp/2.2.4.2-2/accumulo/conf
. ./accumulo-env.sh
accumulo init

When prompted, enter the instance name, which can be anything you like, and the secret (prompted for as the initial password for root in the output below), which must be hadoop.

2015-05-21 16:09:27,651 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
2015-05-21 16:09:27,653 [init.Initialize] INFO : Hadoop Filesystem is dfs://sandbox.hortonworks.com:8020
2015-05-21 16:09:27,654 [init.Initialize] INFO : Accumulo data dirs are [hdfs://sandbox.hortonworks.com:8020/accumulo]
2015-05-21 16:09:27,654 [init.Initialize] INFO : Zookeeper server is localhost:2181
2015-05-21 16:09:27,654 [init.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running
Instance name : horton
Enter initial password for root (this may not be applicable for your security setup): ******
Confirm initial password for root: ******
2015-05-21 16:11:46,827 [Configuration.deprecation] INFO : dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
2015-05-21 16:11:47,156 [Configuration.deprecation] INFO : dfs.block.size is deprecated. Instead, use dfs.blocksize
2015-05-21 16:11:47,598 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKAuthorizor
2015-05-21 16:11:47,600 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKAuthenticator
2015-05-21 16:11:47,603 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKPermHandler


You are now ready to start Accumulo:

/usr/hdp/2.2.4.2-2/accumulo/bin/start-all.sh


Starting monitor on localhost
WARN : Max open files on localhost is 1024, recommend 32768
Starting tablet servers .... done
Starting tablet server on localhost
WARN : Max open files on localhost is 1024, recommend 32768
2015-05-21 16:16:11,222 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
2015-05-21 16:16:11,270 [server.Accumulo] INFO : Attempting to talk to zookeeper
2015-05-21 16:16:11,539 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
2015-05-21 16:16:11,966 [server.Accumulo] INFO : Connected to HDFS
Starting master on localhost
WARN : Max open files on localhost is 1024, recommend 32768
Starting garbage collector on localhost
WARN : Max open files on localhost is 1024, recommend 32768
Starting tracer on localhost
WARN : Max open files on localhost is 1024, recommend 32768
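
The repeated "Max open files" warning compares the current limit for the user with the recommended 32768. The check can be reproduced before starting the servers with a sketch like the one below; open_files_ok is a hypothetical helper, and the default of 32768 is simply the figure quoted in the warning.

```shell
# Hypothetical helper: succeed when the given open-files limit meets the
# recommendation (default 32768, the value named in the startup warning).
open_files_ok() {
    current="$1"
    recommended="${2:-32768}"
    [ "$current" = "unlimited" ] || [ "$current" -ge "$recommended" ]
}

# Possible usage (not run here), as the accumulo user:
#   open_files_ok "$(ulimit -n)" || echo "open files limit is below 32768"
# The limit is usually raised with nofile entries in /etc/security/limits.conf.
```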
