Local setup, on an Ubuntu 16.04 machine, of an environment for local development testing; the installed applications are:
Prerequisite: install Java and configure the environment variables.
If things go wrong, check the logs... and the ports: sudo netstat -tulpn | grep 22
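The same port check extends to the services configured below; a small sketch (the port list reflects the defaults assumed in this setup, and uses ss, the modern netstat replacement):

```shell
# Check which of the default ports used in this setup are already taken.
# 2181 = ZooKeeper, 9092 = Kafka, 9000 = HDFS NameNode RPC, 16010 = HBase master UI.
for port in 2181 9092 9000 16010; do
    if ss -tuln | grep -q ":$port "; then
        echo "port $port: in use"
    else
        echo "port $port: free"
    fi
done
```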
Example environment variable settings:
alias ll='ls -lah'
alias gg='git status -s'
alias python=python3
export SBT_OPTS="-Xmx2G -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=2G -Xss2M -Duser.timezone=GMT"
#export JAVA_HOME=/home/simon/programmi/java/jdk1.8.0_171_64
export JAVA_HOME=/home/simon/programmi/java/openjdk-java-se-8u40-ri
export M2_HOME=/home/simon/programmi/apache-maven-3.6.1
export M2=$M2_HOME/bin
export MAVEN_OPTS="-Xms256m -Xmx512m"
export SPARK_HOME=/home/simon/programmi/spark-2.4.3-bin-hadoop2.7
export HADOOP_HOME=/home/simon/programmi/hadoop-3.1.2
export SQOOP_HOME=/home/sqoop-1.4.7.bin__hadoop-2.6.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HIVE_HOME=/home/simon/programmi/apache-hive-3.1.2-bin
export HBASE_HOME=/home/simon/programmi/hbase-2.2.1
export PYENV_ROOT=$HOME/.pyenv
export PATH=$PYENV_ROOT/bin:$PATH
export PATH=$JAVA_HOME/bin:$M2:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$SQOOP_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH
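After sourcing the profile, checking that each *_HOME variable actually points at an existing directory saves debugging time later; a minimal sketch (the variable list mirrors the exports above, and the loop only prints a status per variable):

```shell
# Print OK/MISSING for each home-directory variable defined in the profile above.
for var in JAVA_HOME M2_HOME SPARK_HOME HADOOP_HOME SQOOP_HOME HIVE_HOME HBASE_HOME; do
    dir=$(eval echo "\$$var")
    if [ -n "$dir" ] && [ -d "$dir" ]; then
        echo "$var=$dir OK"
    else
        echo "$var='$dir' MISSING"
    fi
done
```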
At first I use HBase's ZooKeeper, since HBase starts an embedded instance on startup! But to have the client scripts available, ZooKeeper must be downloaded from:
Basic commands:
cd /home/simon/programmi/apache-zookeeper-3.5.5-bin/bin
./zkCli.sh -server localhost:2181
ls /<hive.server2.zookeeper.namespace>
Kafka can be downloaded from:
Kafka's configuration file is server.properties, where we set:
listeners=PLAINTEXT://localhost:9092
delete.topic.enable=true
ZooKeeper's configuration file is zookeeper.properties, where we set:
dataDir=/tmp/zookeeper
clientPort=2181
The basic Kafka commands (see https://kafka.apache.org/quickstart), run from the bin folder:
./zookeeper-server-start.sh config/zookeeper.properties
./kafka-server-start.sh config/server.properties
./kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
./kafka-topics.sh --list --bootstrap-server localhost:9092
./kafka-console-producer.sh --broker-list localhost:9092 --topic test
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
./kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test
./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test --time -2   # -2 = earliest offset, -1 = latest offset
Notes on consumers: https://stackoverflow.com/questions/38024514/understanding-kafka-topics-and-partitions
Download link: https://hbase.apache.org/
The configuration file is hbase-site.xml, where we set:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/simon/programmi/hbase-2.2.1/data</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/simon/programmi/hbase-2.2.1/zookeeper</value>
  </property>
</configuration>
HBase starts with an embedded ZooKeeper!! From the bin folder:
./start-hbase.sh
./stop-hbase.sh
The HBase shell starts with the command hbase shell from the bin folder; the basic commands follow:
status
version
table_help
whoami
create '<tablename>', '<columnfamilyname>'
list
describe '<tablename>'
disable '<tablename>'
disable_all '<matching regex>'
enable '<tablename>'
show_filters
drop '<tablename>'
drop_all '<regex>'
count '<tablename>', CACHE => 1000
put '<tablename>', '<rowkey>', '<columnfamily:qualifier>', '<value>'
get '<tablename>', '<rowkey>', {<additional parameters>}   # e.g. {TIMERANGE => [ts1, ts2]} or {COLUMN => ['c1', 'c2', 'c3']}
delete '<tablename>', '<rowkey>', '<columnname>'
truncate '<tablename>'
scan '<tablename>', {<optional parameters>}
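A concrete round trip makes the placeholders above easier to follow; a hedged example session inside hbase shell (the table name 'users', column family 'cf', and values are made up for illustration):

```
create 'users', 'cf'
put 'users', 'r1', 'cf:name', 'Alice'
get 'users', 'r1'
scan 'users'
count 'users'
disable 'users'
drop 'users'
```

Note that a table must be disabled before it can be dropped.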
Download Hadoop (HDFS + YARN) from https://hadoop.apache.org/
Hadoop works over the network, and as a prerequisite the user who launches the processes must be able to log in via ssh without a password.
The .ssh directory (shown here with permissions drwxr-xr-x 2 simon simon 4096 set 24 14:14 .ssh/) must contain the files below; the permissions on the directory and on every file in it are critical.
sudo apt-get install openssh-server openssh-client
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost   # should log in without a password
# if needed: sudo systemctl restart sshd
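sshd typically refuses key authentication when permissions are too open: the .ssh directory should be 700 and authorized_keys 600. A small demo of the expected layout, done in a throwaway directory so the real ~/.ssh is untouched:

```shell
# Reproduce the permission layout sshd expects, in a temp dir for safety.
demo=$(mktemp -d)
mkdir "$demo/.ssh"
chmod 700 "$demo/.ssh"
touch "$demo/.ssh/authorized_keys"
chmod 600 "$demo/.ssh/authorized_keys"
stat -c '%a %n' "$demo/.ssh" "$demo/.ssh/authorized_keys"
rm -rf "$demo"
```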
In the etc/hadoop folder we configure the following files (the directories referenced inside the xml files must be created beforehand):
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/simon/programmi/hadoop-3.1.2/hadooptmpdata</value>
  </property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/simon/programmi/hadoop-3.1.2/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/simon/programmi/hadoop-3.1.2/hdfs/datanode</value>
  </property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Initialize the namenode from the bin folder: hdfs namenode -format
From the sbin folder: ./start-all.sh
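Once the daemons are up, a quick smoke test confirms HDFS is working; a sketch, assuming the services started cleanly (the target path /user/simon and the uploaded file are illustrative):

```
jps                                  # should list NameNode, DataNode, ResourceManager, NodeManager
hdfs dfs -mkdir -p /user/simon
hdfs dfs -put /etc/hostname /user/simon/
hdfs dfs -ls /user/simon
```

If jps is missing any of the daemons, check the log files under $HADOOP_HOME/logs.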
NameNode Web UI: http://localhost:9870 (the default port in Hadoop 3.x; it was http://localhost:50070 in Hadoop 2.x)