Import an RDBMS table from PostgreSQL into HDFS with Sqoop
Steps:
1. Download the JDBC driver:
[code lang="bash"]$ wget http://jdbc.postgresql.org/download/postgresql-9.3-1102.jdbc4.jar[/code]
2. Copy the driver into Sqoop's lib directory:
[code lang="bash"]$ cp /home/cloudera/Desktop/postgresql-9.3-1102.jdbc4.jar /usr/lib/sqoop/lib/[/code]
3. Configure client authentication: edit the
[code lang="bash"]/var/lib/pgsql/data/pg_hba.conf[/code]
file to allow connections from the IP/host of the machine running Hadoop.
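A minimal sketch of the kind of entry you might add, assuming the Hadoop machine sits at 192.168.0.33 (the address, database, user, and auth method here are placeholders; adjust them to your network):
[code lang="bash"]
# TYPE  DATABASE  USER      ADDRESS            METHOD
host    Testdb    postgres  192.168.0.33/32    md5
[/code]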
Restart PostgreSQL (as the postgres user, pointing at the data directory) using
[code lang="bash"]$ pg_ctl -D /var/lib/pgsql/data restart[/code]
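If pg_ctl is not on your PATH, restarting via the init script should work as well on most Linux distributions (assuming the service is named postgresql):
[code lang="bash"]$ sudo service postgresql restart[/code]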
4. Run Sqoop: open a terminal on the machine running Hadoop and type the command below.
[code lang="bash"]cloudera@cloudera-vm:/usr/lib/sqoop$ bin/sqoop import --connect jdbc:postgresql://192.168.0.34:5432/Testdb --table employee --username postgres -P --target-dir /sqoopOut1 -m 1[/code]
When Sqoop prompts with "Enter password:", type the password of the postgres user (the -P flag reads it interactively instead of exposing it on the command line).
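Once the map job finishes, you can verify the import with standard HDFS shell commands; with -m 1 the whole table lands in a single part file:
[code lang="bash"]
$ hadoop fs -ls /sqoopOut1
$ hadoop fs -cat /sqoopOut1/part-m-00000
[/code]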
Prerequisites:
- Cloudera Hadoop VM distribution, or any other machine running Hadoop.
- A PostgreSQL installation.
- The database Testdb and an employee table on a running PostgreSQL instance (e.g., 192.168.0.34:5432 in step 4); a sample setup script follows this list.
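If you still need that test data, here is a minimal sketch that creates both with psql (the employee columns and rows are assumptions purely for illustration; substitute your own schema):
[code lang="bash"]
# create the database and a tiny employee table (hypothetical schema)
$ psql -U postgres -c 'CREATE DATABASE "Testdb";'
$ psql -U postgres -d Testdb -c 'CREATE TABLE employee (id INT PRIMARY KEY, name VARCHAR(50));'
$ psql -U postgres -d Testdb -c "INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob');"
[/code]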
All set! Your PostgreSQL table data is now available on the HDFS of your VM's Hadoop cluster.
Enjoy Hadoop learning!