Reference site: http://sqoop.apache.org/
Reference documentation: http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html
The export tool inserts the data stored on HDFS into a table that already exists in a database. The default operation is to transform these records into a set of INSERT statements that inject them into the database. In "update mode," Sqoop generates UPDATE statements that replace existing records in the database, and in "call mode" it makes a stored procedure call for each record.
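As a rough illustration of the difference between the default insert mode and update mode, the sketch below builds the kind of statement Sqoop emits per record. This is a hand-written approximation, not Sqoop's actual generated code; the table and column names are the ones used later in this guide.

```python
# Hypothetical sketch of the per-record statements Sqoop generates.
# Real Sqoop uses generated code and prepared statements, not string
# formatting; this only illustrates the insert vs. update semantics.

def insert_statement(table, columns, record):
    """Default mode: one INSERT per exported record."""
    cols = ", ".join(columns)
    vals = ", ".join(f"'{v}'" for v in record)
    return f"INSERT INTO {table} ({cols}) VALUES ({vals})"

def update_statement(table, columns, record, update_key):
    """Update mode: UPDATE anchored on the --update-key column."""
    key_idx = columns.index(update_key)
    sets = ", ".join(
        f"{c} = '{v}'" for c, v in zip(columns, record) if c != update_key
    )
    return (f"UPDATE {table} SET {sets} "
            f"WHERE {update_key} = '{record[key_idx]}'")

row = ("Carlo", "Rossi", "34", "1974-01-10")
cols = ("nome", "cognome", "anni", "compleanno")
print(insert_statement("miatabella", cols, row))
print(update_statement("miatabella", cols, row, update_key="nome"))
```

With `--update-mode allowinsert`, records whose key does not match an existing row fall back to an INSERT instead of being silently skipped.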
```shell
$ sqoop export (generic-args) (export-args)
$ sqoop-export (generic-args) (export-args)
```
Argument | Description |
---|---|
--connect <jdbc-uri> | Specify JDBC connect string |
--connection-manager <class-name> | Specify connection manager class to use |
--driver <class-name> | Manually specify JDBC driver class to use |
--hadoop-mapred-home <dir> | Override $HADOOP_MAPRED_HOME |
--help | Print usage instructions |
--password-file | Set path for a file containing the authentication password |
-P | Read password from console |
--password <password> | Set authentication password |
--username <username> | Set authentication username |
--verbose | Print more information while working |
--connection-param-file <filename> | Optional properties file that provides connection parameters |
--relaxed-isolation | Set connection transaction isolation to read uncommitted for the mappers. |
MySQL recap
MySQL is the reference database for these tests; the following commands create a database and a table.
```sql
mysql -u root -p

CREATE DATABASE miodatabase;
USE miodatabase;
CREATE TABLE miatabella (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    nome VARCHAR(20),
    cognome VARCHAR(30),
    anni INT,
    compleanno DATE);

-- cleanup, when you need to rerun the tests:
DELETE FROM miatabella;
DROP TABLE miatabella;
```
Let's load some data into HDFS. With your favorite text editor, create the file anagrafica.csv:
```
Carlo;Rossi;34;1974-01-10
Angelo;Bianchi;33;1972-01-10
Romeo;Frassino;57;1938-01-10
```
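If you prefer to script it, the same file can be generated with a few lines of Python (nothing Sqoop-specific here, just writing the three semicolon-separated rows shown above):

```python
# Write the sample file with ';' as the field separator, matching the
# --fields-terminated-by ";" option used in the export commands below.
rows = [
    ("Carlo", "Rossi", "34", "1974-01-10"),
    ("Angelo", "Bianchi", "33", "1972-01-10"),
    ("Romeo", "Frassino", "57", "1938-01-10"),
]
with open("anagrafica.csv", "w") as f:
    for row in rows:
        f.write(";".join(row) + "\n")
```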
Now put the data on HDFS. First start Hadoop:

```shell
/.../hadoop-2.7.4/sbin/start-all.sh
```

Then create the directory and upload the file:

```shell
hdfs dfs -mkdir /tmp/anagrafica/
hdfs dfs -put anagrafica.csv /tmp/anagrafica/anagrafica.csv
```
Argument | Description |
---|---|
--columns <col,col,col…> | Columns to export to table |
--direct | Use direct export fast path |
--export-dir <dir> | HDFS source path for the export |
-m,--num-mappers <n> | Use n map tasks to export in parallel |
--table <table-name> | Table to populate |
--call <stored-proc-name> | Stored Procedure to call |
--update-key <col-name> | Anchor column to use for updates. Use a comma-separated list if more than one column is needed. |
--update-mode <mode> | Specify how updates are performed when new rows are found with non-matching keys in database. Legal values for mode include updateonly (default) and allowinsert. |
--input-null-string <null-string> | The string to be interpreted as null for string columns |
--input-null-non-string <null-string> | The string to be interpreted as null for non-string columns |
--staging-table <staging-table-name> | The table in which data will be staged before being inserted into the destination table. |
--clear-staging-table | Indicates that any data present in the staging table can be deleted. |
--batch | Use batch mode for underlying statement execution. |
Argument | Description |
---|---|
--input-enclosed-by <char> | Sets a required field encloser |
--input-escaped-by <char> | Sets the input escape character |
--input-fields-terminated-by <char> | Sets the input field separator |
--input-lines-terminated-by <char> | Sets the input end-of-line character |
--input-optionally-enclosed-by <char> | Sets a field enclosing character |
Argument | Description |
---|---|
--enclosed-by <char> | Sets a required field enclosing character |
--escaped-by <char> | Sets the escape character |
--fields-terminated-by <char> | Sets the field separator character |
--lines-terminated-by <char> | Sets the end-of-line character |
--mysql-delimiters | Uses MySQL's default delimiter set: fields `,`, lines `\n`, escaped-by `\`, optionally-enclosed-by `'` |
--optionally-enclosed-by <char> | Sets a field enclosing character |
A first export of the file loaded above:

```shell
sqoop export -libjars /home/simon/sqldeveloper/jdbc/lib/mysql-connector-java-5.1.14-bin.jar \
  --connect jdbc:mysql://127.0.0.1:3306/miodatabase \
  --driver com.mysql.jdbc.Driver \
  --username root --password password \
  --export-dir /tmp/anagrafica \
  --table miatabella \
  --fields-terminated-by ";" \
  --columns "nome,cognome,anni,compleanno"
```
--escaped-by
Add a new row to our file containing an escaped quote `\'`:

```
Ciccio;L\'Infante;23;2001-10-13
```
```shell
sqoop export -libjars /home/simon/sqldeveloper/jdbc/lib/mysql-connector-java-5.1.14-bin.jar \
  --connect jdbc:mysql://127.0.0.1:3306/miodatabase \
  --driver com.mysql.jdbc.Driver \
  --username root --password password \
  --export-dir /tmp/anagrafica \
  --table miatabella \
  --fields-terminated-by ";" \
  --columns "nome,cognome,anni,compleanno" \
  --escaped-by "\\"
```
--input-optionally-enclosed-by
Add a new row that uses both the `\` escape and the single quote as a field encloser:

```
Gianpietro;'L\'Arcere';23;2001-10-13
```
```shell
sqoop export -libjars /home/simon/sqldeveloper/jdbc/lib/mysql-connector-java-5.1.14-bin.jar \
  --connect jdbc:mysql://127.0.0.1:3306/miodatabase \
  --driver com.mysql.jdbc.Driver \
  --username root --password password \
  --export-dir /tmp/anagrafica \
  --table miatabella \
  --fields-terminated-by ";" \
  --columns "nome,cognome,anni,compleanno" \
  --escaped-by "\\" \
  --input-optionally-enclosed-by "\'"
```
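To see what the input parser does with that last row, here is a minimal sketch of splitting on `;` while honoring an escape character `\` and an optional encloser `'`. This is an assumption about the semantics written for illustration, not Sqoop's actual parser:

```python
def parse_line(line, sep=";", escape="\\", encloser="'"):
    """Toy field parser: escape takes the next char literally,
    enclosers are stripped, separators inside enclosures are kept."""
    fields, field = [], []
    enclosed = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            field.append(line[i + 1])   # escaped char taken literally
            i += 2
        elif ch == encloser:
            enclosed = not enclosed     # toggle enclosure, drop the quote
            i += 1
        elif ch == sep and not enclosed:
            fields.append("".join(field))
            field = []
            i += 1
        else:
            field.append(ch)
            i += 1
    fields.append("".join(field))
    return fields

print(parse_line(r"Gianpietro;'L\'Arcere';23;2001-10-13"))
# → ['Gianpietro', "L'Arcere", '23', '2001-10-13']
```

The enclosing quotes and the escape backslash are consumed during parsing, so the value that reaches the `cognome` column is `L'Arcere`.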