Reference site: http://sqoop.apache.org/
Reference documentation: http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html
The export tool inserts the data stored on HDFS into a table that already exists in a database. The default operation is to transform these records into a set of INSERT statements that inject them into the database. In "update mode," Sqoop generates UPDATE statements that replace existing records in the database, and in "call mode" it makes a stored procedure call for each record.
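As a rough illustration of the difference between the default insert mode and update mode, the sketch below builds the kind of statement Sqoop emits per record. This is a hand-written approximation, not Sqoop's actual generated code; the table and column names are the ones used later in this guide.

```python
# Hypothetical sketch of the per-record statements Sqoop generates.
# Real Sqoop uses generated code and prepared statements, not string
# formatting; this only illustrates the insert vs. update semantics.

def insert_statement(table, columns, record):
    """Default mode: one INSERT per exported record."""
    cols = ", ".join(columns)
    vals = ", ".join(f"'{v}'" for v in record)
    return f"INSERT INTO {table} ({cols}) VALUES ({vals})"

def update_statement(table, columns, record, update_key):
    """Update mode: UPDATE anchored on the --update-key column."""
    key_idx = columns.index(update_key)
    sets = ", ".join(
        f"{c} = '{v}'" for c, v in zip(columns, record) if c != update_key
    )
    return (f"UPDATE {table} SET {sets} "
            f"WHERE {update_key} = '{record[key_idx]}'")

row = ("Carlo", "Rossi", "34", "1974-01-10")
cols = ("nome", "cognome", "anni", "compleanno")
print(insert_statement("miatabella", cols, row))
print(update_statement("miatabella", cols, row, update_key="nome"))
```

With `--update-mode allowinsert`, records whose key does not match an existing row fall back to an INSERT instead of being silently skipped.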
```shell
$ sqoop export (generic-args) (export-args)
$ sqoop-export (generic-args) (export-args)
```
Argument | Description |
---|---|
--connect <jdbc-uri> | Specify JDBC connect string |
--connection-manager <class-name> | Specify connection manager class to use |
--driver <class-name> | Manually specify JDBC driver class to use |
--hadoop-mapred-home <dir> | Override $HADOOP_MAPRED_HOME |
--help | Print usage instructions |
--password-file | Set path for a file containing the authentication password |
-P | Read password from console |
--password <password> | Set authentication password |
--username <username> | Set authentication username |
--verbose | Print more information while working |
--connection-param-file <filename> | Optional properties file that provides connection parameters |
--relaxed-isolation | Set connection transaction isolation to read uncommitted for the mappers. |
MySQL recap
MySQL is the reference database for these tests; the following commands create a database and a table.
```sql
mysql -u root -p

CREATE DATABASE miodatabase;
USE miodatabase;
CREATE TABLE miatabella (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    nome VARCHAR(20),
    cognome VARCHAR(30),
    anni INT,
    compleanno DATE);

-- cleanup, when you need to rerun the tests:
DELETE FROM miatabella;
DROP TABLE miatabella;
```
Let's load some data into HDFS. With your favorite text editor, create the file anagrafica.csv:
```
Carlo;Rossi;34;1974-01-10
Angelo;Bianchi;33;1972-01-10
Romeo;Frassino;57;1938-01-10
```
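If you prefer to script it, the same file can be generated with a few lines of Python (nothing Sqoop-specific here, just writing the three semicolon-separated rows shown above):

```python
# Write the sample file with ';' as the field separator, matching the
# --fields-terminated-by ";" option used in the export commands below.
rows = [
    ("Carlo", "Rossi", "34", "1974-01-10"),
    ("Angelo", "Bianchi", "33", "1972-01-10"),
    ("Romeo", "Frassino", "57", "1938-01-10"),
]
with open("anagrafica.csv", "w") as f:
    for row in rows:
        f.write(";".join(row) + "\n")
```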
Now put the data on HDFS. First start Hadoop:

```shell
/.../hadoop-2.7.4/sbin/start-all.sh
```

Then create the directory and upload the file:

```shell
hdfs dfs -mkdir /tmp/anagrafica/
hdfs dfs -put anagrafica.csv /tmp/anagrafica/anagrafica.csv
```
Argument | Description |
---|---|
--columns <col,col,col…> | Columns to export to table |
--direct | Use direct export fast path |
--export-dir <dir> | HDFS source path for the export |
-m,--num-mappers <n> | Use n map tasks to export in parallel |
--table <table-name> | Table to populate |
--call <stored-proc-name> | Stored Procedure to call |
--update-key <col-name> | Anchor column to use for updates. Use a comma-separated list if more than one column is needed. |
--update-mode <mode> | Specify how updates are performed when new rows are found with non-matching keys in database. Legal values for mode include updateonly (default) and allowinsert. |
--input-null-string <null-string> | The string to be interpreted as null for string columns |
--input-null-non-string <null-string> | The string to be interpreted as null for non-string columns |
--staging-table <staging-table-name> | The table in which data will be staged before being inserted into the destination table. |
--clear-staging-table | Indicates that any data present in the staging table can be deleted. |
--batch | Use batch mode for underlying statement execution. |
Argument | Description |
---|---|
--input-enclosed-by <char> | Sets a required field encloser |
--input-escaped-by <char> | Sets the input escape character |
--input-fields-terminated-by <char> | Sets the input field separator |
--input-lines-terminated-by <char> | Sets the input end-of-line character |
--input-optionally-enclosed-by <char> | Sets a field enclosing character |
Argument | Description |
---|---|
--enclosed-by <char> | Sets a required field enclosing character |
--escaped-by <char> | Sets the escape character |
--fields-terminated-by <char> | Sets the field separator character |
--lines-terminated-by <char> | Sets the end-of-line character |
--mysql-delimiters | Uses MySQL's default delimiter set: fields `,`, lines `\n`, escaped-by `\`, optionally-enclosed-by `'` |
--optionally-enclosed-by <char> | Sets a field enclosing character |
A first export of the file loaded above:

```shell
sqoop export -libjars /home/simon/sqldeveloper/jdbc/lib/mysql-connector-java-5.1.14-bin.jar \
  --connect jdbc:mysql://127.0.0.1:3306/miodatabase \
  --driver com.mysql.jdbc.Driver \
  --username root --password password \
  --export-dir /tmp/anagrafica \
  --table miatabella \
  --fields-terminated-by ";" \
  --columns "nome,cognome,anni,compleanno"
```
--escaped-by
Add a new row to our file containing an escaped quote `\'`:

```
Ciccio;L\'Infante;23;2001-10-13
```
```shell
sqoop export -libjars /home/simon/sqldeveloper/jdbc/lib/mysql-connector-java-5.1.14-bin.jar \
  --connect jdbc:mysql://127.0.0.1:3306/miodatabase \
  --driver com.mysql.jdbc.Driver \
  --username root --password password \
  --export-dir /tmp/anagrafica \
  --table miatabella \
  --fields-terminated-by ";" \
  --columns "nome,cognome,anni,compleanno" \
  --escaped-by "\\"
```
--input-optionally-enclosed-by
Add a new row that uses both the `\` escape and the single quote as a field encloser:

```
Gianpietro;'L\'Arcere';23;2001-10-13
```
```shell
sqoop export -libjars /home/simon/sqldeveloper/jdbc/lib/mysql-connector-java-5.1.14-bin.jar \
  --connect jdbc:mysql://127.0.0.1:3306/miodatabase \
  --driver com.mysql.jdbc.Driver \
  --username root --password password \
  --export-dir /tmp/anagrafica \
  --table miatabella \
  --fields-terminated-by ";" \
  --columns "nome,cognome,anni,compleanno" \
  --escaped-by "\\" \
  --input-optionally-enclosed-by "\'"
```
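To see what the input parser does with that last row, here is a minimal sketch of splitting on `;` while honoring an escape character `\` and an optional encloser `'`. This is an assumption about the semantics written for illustration, not Sqoop's actual parser:

```python
def parse_line(line, sep=";", escape="\\", encloser="'"):
    """Toy field parser: escape takes the next char literally,
    enclosers are stripped, separators inside enclosures are kept."""
    fields, field = [], []
    enclosed = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            field.append(line[i + 1])   # escaped char taken literally
            i += 2
        elif ch == encloser:
            enclosed = not enclosed     # toggle enclosure, drop the quote
            i += 1
        elif ch == sep and not enclosed:
            fields.append("".join(field))
            field = []
            i += 1
        else:
            field.append(ch)
            i += 1
    fields.append("".join(field))
    return fields

print(parse_line(r"Gianpietro;'L\'Arcere';23;2001-10-13"))
# → ['Gianpietro', "L'Arcere", '23', '2001-10-13']
```

The enclosing quotes and the escape backslash are consumed during parsing, so the value that reaches the `cognome` column is `L'Arcere`.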