Import RDBMS table to HDFS with sqoop from postgreSQL

Steps:

1. Download JDBC driver

 

[code lang=”bash”]$wget http://jdbc.postgresql.org/download/postgresql-9.3-1102.jdbc4.jar[/code]

 

 2. Copy: 

[code lang=”bash”]cp /home/cloudera/Desktop/postgresql-9.3-1102.jdbc4.jar /usr/lib/sqoop/lib/ [/code]

 

3. Configure: 

[code lang=”bash”]/var/lib/pgsql/data/pg_hba.conf[/code]

file. You need to allow the IP/host of machine running Hadoop.

Restart postgreSQL using 

[code lang=”bash”]$pg_ctl restart[/code]

 

4. Run sqoop: Open the terminal on machine running hadoop and type the below command.

 

[code lang=”bash”] cloudera@cloudera-vm:/usr/lib/sqoop bin/sqoop import –connect jdbc:postgresql://192.168.0.34:5432/Testdb–table employee –username postgres -P –target-dir /sqoopOut1 -m 1 [/code]

 

Enter password:

 

prerequisites:

  • Cloudera hadoop VM distribution or any other machine running hadoop.
  • postgreSQL installation.
  • database Testdb and employee table on a running instance of postgreSQL (e.g.; 192.168.0.34:5432 in point 4).

 

All set! Your pgsql table data is now available on HDFS of  VM hadoop cluster.

 

Enjoy hadoop learning!

PostgreSQL useful tips

I am posting some of the PostgreSQL commands which I use frequently.

Create a new database name testdb

# createdb <dbname>
e.g:
[code lang=”sql”]
# createdb testdb
[/code]
Remove a PostgreSQL database
# dropdb <dbname>
e.g:
[code lang=”sql”]
# dropdb testdb
[/code]

Backing up a PostgreSQL database
[code lang=”sql”]
# su – postgres
[/code]

[code lang=”sql”]
# pg_dump –blob -Fc testdb -f /var/lib/pgsql/backups/testdb_backup
[/code]

Restoring PostgreSQL database from back up dump
[code lang=”sql”]
# pg_restore –dbname=testdb /var/lib/pgsql/backups/testdb_backup
[/code]

Writing query output to a CSV file:
[code lang=”sql”]
# \o ‘tmp/logs/query_out_dump.csv’
[/code]

After this operation all the query results will be stored in a CSV file.
Using console again for query output:
[code lang=”sql”]
# \o
[/code]
For more on pg_dump and pg_restore pl. check the documentation