Import an RDBMS table into HDFS with Sqoop from PostgreSQL

Steps:

1. Download the PostgreSQL JDBC driver:

http://jdbc.postgresql.org/download/postgresql-9.3-1102.jdbc4.jar
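
If you prefer to stay in the terminal, a wget fetch works just as well. A small sketch; the Desktop path below simply mirrors the copy command in step 2, and any writable directory will do:

# Fetch the driver jar into the Desktop directory (any writable directory works)
wget -P /home/cloudera/Desktop http://jdbc.postgresql.org/download/postgresql-9.3-1102.jdbc4.jar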

2. Copy the driver into Sqoop's lib directory: cp /home/cloudera/Desktop/postgresql-9.3-1102.jdbc4.jar /usr/lib/sqoop/lib/

3. Configure /var/lib/pgsql/data/pg_hba.conf. You need to allow connections from the IP/host of the machine running Hadoop.
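
A minimal sketch of such an entry, assuming the Hadoop machine's address is 192.168.0.10 (a placeholder; substitute the real IP of your Hadoop VM) and md5 password authentication:

# pg_hba.conf: allow the Hadoop machine to connect to Testdb as user postgres
# TYPE  DATABASE  USER      ADDRESS           METHOD
host    Testdb    postgres  192.168.0.10/32   md5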

Restart PostgreSQL using pg_ctl restart so the change takes effect.
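
For example, assuming the default data directory edited above and that pg_ctl is run as the postgres OS user:

# Restart the server, pointing pg_ctl at the data directory that holds pg_hba.conf
su - postgres -c "pg_ctl -D /var/lib/pgsql/data restart"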

4. Run Sqoop: open a terminal on the machine running Hadoop and type the command below.

cloudera@cloudera-vm:/usr/lib/sqoop$ bin/sqoop import --connect jdbc:postgresql://192.168.0.34:5432/Testdb --table employee --username postgres -P --target-dir /sqoopOut1 -m 1

Enter password:
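
Once the job finishes, you can check the result directly on HDFS. With -m 1 the import typically lands in a single part-m-00000 file under the target directory:

# List the target directory and inspect the imported rows
# (on older Cloudera VMs, hadoop fs works in place of hdfs dfs)
hdfs dfs -ls /sqoopOut1
hdfs dfs -cat /sqoopOut1/part-m-00000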

 

Prerequisites:

  • Cloudera Hadoop VM distribution, or any other machine running Hadoop.
  • A PostgreSQL installation.
  • A database Testdb with an employee table on a running PostgreSQL instance (e.g. 192.168.0.34:5432 in step 4); a quick way to create them is sketched after this list.
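
If you still need the sample database and table, here is a minimal sketch using psql. The column layout and sample rows are hypothetical, since the post does not define the employee schema; note the quotes around Testdb so the mixed-case name matches the JDBC URL in step 4:

# Create the database, a simple employee table, and a couple of sample rows
psql -U postgres -c "CREATE DATABASE \"Testdb\";"
psql -U postgres -d Testdb -c "CREATE TABLE employee (id INT PRIMARY KEY, name VARCHAR(50), salary NUMERIC(10,2));"
psql -U postgres -d Testdb -c "INSERT INTO employee VALUES (1, 'Alice', 50000.00), (2, 'Bob', 45000.00);"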

 

All set! Your PostgreSQL table data is now available on the HDFS of your VM's Hadoop cluster.

 

Enjoy Hadoop learning!