Import an RDBMS table from PostgreSQL to HDFS with Sqoop


1. Download the PostgreSQL JDBC driver (e.g., postgresql-9.3-1102.jdbc4.jar).




2. Copy the downloaded driver jar into Sqoop's lib directory:

cp /home/cloudera/Desktop/postgresql-9.3-1102.jdbc4.jar /usr/lib/sqoop/lib/


3. Configure: Edit the pg_hba.conf file. You need to allow the IP/host of the machine running Hadoop.

Restart PostgreSQL using

$ pg_ctl restart
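For example, a pg_hba.conf entry allowing the Hadoop machine might look like the following (the IP address is a placeholder for your Hadoop host; md5 password authentication is an assumption):

```
# TYPE  DATABASE  USER      ADDRESS           METHOD
host    all       postgres  192.168.1.10/32   md5
```

You may also need to set listen_addresses in postgresql.conf so PostgreSQL accepts connections from other machines.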


4. Run Sqoop: Open a terminal on the machine running Hadoop and type the command below.


cloudera@cloudera-vm:/usr/lib/sqoop$ bin/sqoop import --connect jdbc:postgresql://<host>/Testdb --table employee --username postgres -P --target-dir /sqoopOut1 -m 1


Enter password:
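Once the import finishes, the result can be inspected directly on HDFS. A quick check might look like this (paths as used above; the exact part-file name and output depend on your data):

```shell
# List the files Sqoop wrote under the target directory
hadoop fs -ls /sqoopOut1

# Print the imported rows (Sqoop writes comma-separated text files by default)
hadoop fs -cat /sqoopOut1/part-m-00000
```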



Prerequisites:
  • Cloudera Hadoop VM distribution, or any other machine running Hadoop.
  • A PostgreSQL installation.
  • A Testdb database with an employee table on the running PostgreSQL instance (as used in step 4).
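The Testdb database and employee table from the prerequisites can be set up like this; the column names and sample row are illustrative assumptions, not part of the original walkthrough:

```shell
# Create the database as the postgres user
createdb -U postgres Testdb

# Create a minimal employee table and insert a sample row (schema is an assumption)
psql -U postgres -d Testdb <<'SQL'
CREATE TABLE employee (
    id    serial PRIMARY KEY,
    name  text NOT NULL,
    dept  text
);
INSERT INTO employee (name, dept) VALUES ('Alice', 'Engineering');
SQL
```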


All set! Your PostgreSQL table data is now available on HDFS of the Hadoop cluster in the VM.


Enjoy learning Hadoop!

PostgreSQL useful tips

Here are some of the PostgreSQL commands I use frequently.

Create a new database named testdb

# createdb <dbname>

# createdb testdb

Remove a PostgreSQL database
# dropdb <dbname>

# dropdb testdb

Backing up a PostgreSQL database

# su - postgres

# pg_dump --blobs -Fc testdb -f /var/lib/pgsql/backups/testdb_backup

Restoring PostgreSQL database from back up dump

# pg_restore --dbname=testdb /var/lib/pgsql/backups/testdb_backup

Writing query output to a file:

# \o '/tmp/logs/query_out_dump.csv'

After this, all query results will be written to that file instead of the console.
To switch query output back to the console:

# \o
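Note that \o captures psql's formatted output rather than true CSV. For a proper CSV file with quoting and a header row, the \copy meta-command can be used instead (the database, table, and path here are examples):

```shell
# Export the employee table as real CSV, including a header row
psql -d testdb -c "\copy employee TO '/tmp/employee.csv' WITH (FORMAT csv, HEADER)"
```

Unlike the server-side COPY command, \copy runs on the client, so the file is written with your own permissions rather than the server's.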

For more on pg_dump and pg_restore, please check the PostgreSQL documentation.