Stanford Machine Learning class slides

Andrew Ng's Machine Learning class is the best class I have taken online so far.

Apart from the course videos, the lecture slides are also useful for quick reference. For quite some time I was looking for them, as they are not available on the course home page.

All the lecture slides are available here:
https://d396qusza40orc.cloudfront.net/ml/docs/slides/Lecture1.pdf

Lecture2.pdf

Lecture3.pdf

Lecture4.pdf

and so on…

 

In my own experience, the slides only make sense if you go through the full video course. Professor Ng is an amazing teacher.

 

Enjoy learning.

 

Getting and Cleaning Data with R: project notes

Brief notes on what I learned from the course project of the Getting and Cleaning Data course from Johns Hopkins University.

The purpose of this project is to demonstrate the ability to collect, work with, and clean a data set. The final goal is to prepare tidy data that can be used for later analysis.

One of the most exciting areas in data science right now is wearable computing: companies like Fitbit, Nike, TomTom, and Garmin are racing to develop the most advanced algorithms to attract new users. In this case study, the data is collected from the accelerometers of the Samsung Galaxy S smartphone. A full description is available at the site where the data was obtained:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

Here is the dataset for the project:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

I have created an R script called run_analysis.R which does the following:

  • Merges the training and the test sets to create one data set.
  • Extracts only the measurements on the mean and standard deviation for each measurement.
  • Uses descriptive activity names to name the activities in the data set.
  • Appropriately labels the data set with descriptive variable names.
  • Finally, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

References:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
https://www.coursera.org/learn/data-cleaning
https://github.com/ppant/getting-and-cleaning-data-project-coursera

 

For the full working code and the tidy dataset, please check my GitHub repo.
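For a quick idea of the approach, here is a minimal sketch (base R only) of what run_analysis.R does. It assumes the UCI HAR Dataset zip above has been unzipped into the working directory; the exact column selection and renaming may differ from the script in the repo.

## Minimal sketch of run_analysis.R (assumes "UCI HAR Dataset" is unzipped in the working directory)
features   <- read.table("UCI HAR Dataset/features.txt",        col.names = c("index", "name"))
activities <- read.table("UCI HAR Dataset/activity_labels.txt", col.names = c("id", "activity"))

## Load one partition ("train" or "test") and bind subject, activity id and measurements
read_partition <- function(part) {
  x <- read.table(file.path("UCI HAR Dataset", part, paste0("X_", part, ".txt")))
  y <- read.table(file.path("UCI HAR Dataset", part, paste0("y_", part, ".txt")),
                  col.names = "activity_id")
  s <- read.table(file.path("UCI HAR Dataset", part, paste0("subject_", part, ".txt")),
                  col.names = "subject")
  names(x) <- as.character(features$name)
  cbind(s, y, x)
}

## 1. Merge the training and the test sets
merged <- rbind(read_partition("train"), read_partition("test"))

## 2. Keep only the mean() and std() measurements (plus the id columns)
merged <- merged[, grepl("subject|activity_id|mean\\(\\)|std\\(\\)", names(merged))]

## 3. Use descriptive activity names
merged$activity    <- activities$activity[merged$activity_id]
merged$activity_id <- NULL

## 4. Label the variables with descriptive names
names(merged) <- gsub("^t", "Time", names(merged))
names(merged) <- gsub("^f", "Frequency", names(merged))
names(merged) <- gsub("[()-]", "", names(merged))

## 5. Average of each variable for each activity and each subject
tidy <- aggregate(. ~ subject + activity, data = merged, FUN = mean)
write.table(tidy, "tidy_data.txt", row.names = FALSE)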

 

Accessing the GitHub API with OAuth: an example using R

Modern APIs provided by Google, Twitter, Facebook, GitHub, etc. use OAuth for authentication and authorization. In this example I am using the GitHub API: we get a JSON response which can be used to fetch specific information. In this code I have used my own GitHub account. The code is written in the R programming language.

Here are the steps:
1. Find the OAuth settings for GitHub
2. Create an application on GitHub
3. Add/modify the client key and secret
4. Get OAuth credentials
5. Finally, use the API and parse the JSON data to show the response

## Load required modules
library(httr)
library(httpuv)
require(jsonlite)

# 1. Find OAuth settings for github:
# http://developer.github.com/v3/oauth/
oauth_endpoints("github")

# 2. To make your own application, register at
# https://github.com/settings/applications.
## https://github.com/settings/applications/321837
## Use any URL for the homepage URL (http://github.com is fine)
## and http://localhost:1410 as the callback URL. You will need httpuv.

## Add the client key and secret
## These can be obtained from the application's settings page on GitHub
myapp <- oauth_app("github",
key = "7cd28c82639b7cf76fcc",
secret = "d1c90e32e12baa81dabec79cd1ea7d8edfd6bf53")

# 3. Get OAuth credentials
github_token <- oauth2.0_token(oauth_endpoints("github"), myapp)
## Authentication will be done automatically

# 4. Use API
gtoken <- config(token = github_token)
req <- GET("https://api.github.com/users/ppant/repos", gtoken)
stop_for_status(req)
output <- content(req)
## Fetch the required info: the name and creation date of the repo ProgrammingAssignment3
out <- list(output[[30]]$name, output[[30]]$created_at)

## Alternatively, open the response in a browser using basic auth with a personal access token
BROWSE("https://api.github.com/users/ppant/repos", authenticate("Access Token", "x-oauth-basic", "basic"))
# OR:
req <- with_config(gtoken, GET("https://api.github.com/users/ppant/repos"))
stop_for_status(req)
content(req)
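Since jsonlite is loaded at the top, the raw JSON body can also be parsed into a data frame. This snippet is only an illustration and is not part of the original script:

## Illustrative: parse the raw JSON response with jsonlite
repos <- jsonlite::fromJSON(content(req, as = "text", encoding = "UTF-8"))
## e.g. repository names and creation dates
repos[, c("name", "created_at")]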


For the updated code, please check GitHub.

Creating recurring date patterns using Perl

This program will be helpful if someone wants to create recurring date patterns based on criteria (yearly, monthly, weekly, and daily). The program is written in Perl using an older version of the Date::Manip CPAN module.

#!/usr/local/bin/perl -w
# Script to calculate recurrence dates based on given criteria using the Perl Date::Manip module.
# All the input dates here are hard coded; they could instead be passed in from an external program.
use strict;
use Date::Manip; 
use Data::Dumper;  
# Calculate the dates for yearly, monthly, weekly and daily patterns.
&yearly();
&monthly();
&weekly();
&daily();
sub yearly {
	my $base = "2015-10-29";
	my $start_date = "2015-10-29";
	my $end_date = "2018-01-01";
	my $yearly_recur_every ="1";
	my $yearly_on_month = "10";
	my $yearly_on_week = "0";
	my $yearly_on_day = "29";
	my $yearly_on_the_month = "10";
	my $yearly_on_the_week = "1";
	my $yearly_on_the_day = "1";
	my $frequency = "";
	my $frequency_pattern_yearly_on = "$yearly_recur_every*$yearly_on_month:$yearly_on_week:$yearly_on_day:0:0:0";
	my $frequency_pattern_yearly_on_the = "$yearly_recur_every*$yearly_on_the_month:$yearly_on_the_week:$yearly_on_the_day:0:0:0";
	my @yearly_dates_on = ParseRecur($frequency_pattern_yearly_on,$base,$start_date,$end_date); # On a certain day of a month
	my @yearly_dates_on_the = ParseRecur($frequency_pattern_yearly_on_the,$base,$start_date,$end_date); # First Monday of Oct 
	print "\n";
	print "******************************************************************************\n";
	print "**************************** YEARLY *******************************************\n";
	print "*******************************************************************************\n";
print "Start date :". $start_date."\n";
print "End date :". $end_date."\n";
print "\n";
print "******************************************************************************\n";
print "Temporal expression: every 1 year on October 29\n";
print "Rule: ".$frequency_pattern_yearly_on;
print "\n";
print "Dates:\n";
print Dumper (\@yearly_dates_on);
print "\n";
print "Temporal expression: every 1 year on the first Monday of October\n";
print "Rule: ".$frequency_pattern_yearly_on_the;
print "\n";
print "Dates:\n";
print Dumper (\@yearly_dates_on_the);
print "\n";
}
# Monthly
sub monthly () {
	my $base = "2015-10-29";
	my $start_date = "2016-01-22";
	my $end_date = "2017-06-01";
	my $monthly_recur_every ="1";
	my $monthly_day_of = "29";
	my $monthly_the_day = "1";
	my $monthly_the_week = "1";
	my $frequency = "";
	my $frequency_pattern_monthly_day = "0:$monthly_recur_every*0:$monthly_day_of:0:0:0";
	my $frequency_pattern_monthly_the_day ="0:1*-2:5:0:0:0"; # Every month on the 2nd last Friday
	my @monthly_dates_day = ParseRecur($frequency_pattern_monthly_day,$base,$start_date,$end_date); # On a certain day of a month
	my @monthly_dates_the_day = ParseRecur($frequency_pattern_monthly_the_day,$base,$start_date,$end_date); # Second-to-last Friday of each month
	
print "\n";
print "******************************************************************************\n";
print "**************************** MONTHLY *******************************************\n";
print "*******************************************************************************\n";
print "Start date :". $start_date."\n";
print "End date :". $end_date."\n";
print "\n";
print "******************************************************************************\n";
print "Temporal expression: Day 29 of every 1 month\n";
print "Rule: ".$frequency_pattern_monthly_day;
print "\n";
print "Dates:\n";
print Dumper (\@monthly_dates_day);
print "\n";
print "Temporal expression: The first monday of every month\n";
print "Rule: ".$frequency_pattern_monthly_the_day;
print "\n";
print "Dates:\n";
print Dumper (\@monthly_dates_the_day);
print "\n";
}
# Weekly
sub weekly () {
	my $base = "2015-10-29";
	my $start_date = "2016-01-22";
	my $end_date = "2016-03-01";
	my $weekly_recur_every ="1";
	# Append a comma to each day-of-week value selected in the UI; if a field is not
	# selected (no value), no comma is added.
	my $first_day_of_the_week = ""; # Monday
	my $second_day_of_the_week = "2,"; # Tuesday
	my $third_day_of_the_week = ""; # Wednesday
	my $fourth_day_of_the_week = "4,"; # Thursday
	my $fifth_day_of_the_week = ""; # Friday
	my $sixth_day_of_the_week = ""; # Saturday
	my $seventh_day_of_the_week = ""; # Sunday
	# my $weekly_the_day = "1";
	# my $weekly_the_week = "1";
	my $frequency = "";
	my $frequency_pattern_weekly_day = "0:0:$weekly_recur_every*$first_day_of_the_week$second_day_of_the_week$third_day_of_the_week$fourth_day_of_the_week$fifth_day_of_the_week$sixth_day_of_the_week$seventh_day_of_the_week:0:0:0";
	my @weekly_dates_day = ParseRecur($frequency_pattern_weekly_day,$base,$start_date,$end_date); # Selected days of each week
	
print "\n";
print "******************************************************************************\n";
print "**************************** WEEKLY *******************************************\n";
print "*******************************************************************************\n";
print "Start date :". $start_date."\n";
print "End date :". $end_date."\n";
print "\n";
print "Temporal expression: Every every week on Tuesday and Thrusday\n";
print "Rule: ".$frequency_pattern_weekly_day;
print "\n";
print "Dates:\n";
print Dumper (\@weekly_dates_day);
print "\n";
}
# Daily
sub daily () {
	my $base = "2015-10-29";
	my $start_date = "2016-01-22";
	my $end_date = "2016-02-05";
	my $daily_recur_everyday ="1";
	# Append a comma to each weekday value selected in the UI; if a field is not
	# selected (no value), no comma is added.
	my $first_day_of_the_weekday = "1,"; # Monday
	my $second_day_of_the_weekday = "2,"; # Tuesday
	my $third_day_of_the_weekday = "3,"; # Wednesday
	my $fourth_day_of_the_weekday = "4,"; # Thursday
	my $fifth_day_of_the_weekday = "5"; # Friday
	my $daily_start_time = "8:00"; # 8AM
	my $frequency = "";
	my $frequency_pattern_daily_everyday = "0:0:0:$daily_recur_everyday*0:0:0";
	# 0:1*1-5:$dow:0:0:0";
	# "0:0:0:$n*0:0:0";  # Every nth day
	my $frequency_pattern_daily_every_weekday = "0:0:$daily_recur_everyday*$first_day_of_the_weekday$second_day_of_the_weekday$third_day_of_the_weekday$fourth_day_of_the_weekday$fifth_day_of_the_weekday:0:0:0";
	my @daily_dates_everyday = ParseRecur($frequency_pattern_daily_everyday,$base,$start_date,$end_date); # Every day
	my @daily_dates_every_weekday = ParseRecur($frequency_pattern_daily_every_weekday,$base,$start_date,$end_date); # Every weekday (Mon-Fri)
	print "\n";
	print "******************************************************************************\n";
	print "**************************** DAILY *******************************************\n";
	print "******************************************************************************\n";
	print "Start date: ". $start_date."\n";
	print "End date: ". $end_date."\n";
	print "\n";
	print "Temporal expression: Everyday\n";
	print "Rule: ".$frequency_pattern_daily_everyday;
	print "\n";
	print "Dates:".@daily_dates_everyday."\n";
	print Dumper (\@daily_dates_everyday);
	print "\n";
	print "Temporal expression: Every weekday\n";
	print "Rule: ".$frequency_pattern_daily_every_weekday;
	print "\n";
	print "Dates:\n";
	print Dumper (\@daily_dates_every_weekday);
	print "\n";
	}
	
	# End of script

Full working code is available on GitHub with documentation.

Enjoy,

Apache Lucy search examples

I have been investigating search engines, this time Apache Lucy 0.4.2. I am showing a basic indexer and a small search application. See the code below for the indexer (it takes documents one by one and indexes them). The search module takes the query term as a command-line argument and shows the search results.

This is a pure command-line utility, just to show how basic indexing and searching work with Apache Lucy.

indexer.pl

#!/usr/local/bin/perl

use strict;
use warnings;
use Lucy::Simple;

#
# Ensure the index directory is both available and empty.
#
my $index = "/ppant/LucyTest/index";
system( "rm", "-rf", $index );
system( "mkdir", "-p", $index );
# Create the helper...a new Lucy::Simple object
my $lucy = Lucy::Simple->new( path => $index, language => 'en' );

# Add the first "document". (We are mainly adding metadata of the document.)
my %one = ( title => "This is a title of first article", body => "some text inside the body we need to test the implementation of lucy", id => 1 );
$lucy->add_doc( \%one );

# Add the second "document".
my %two = ( title => "This is another article", body => "I am putting some basic content, using some words which are also in first document like implementation", id => 2 );
$lucy->add_doc( \%two );

# Both the documents are now indexed in path

Once indexing of the documents is done, we'll write a small search script.

search.cgi

#!/usr/local/bin/perl

use strict;
use warnings;

use Lucy::Search::IndexSearcher;

my $term = shift || die "Usage: $0 search-term";

my $searcher = Lucy::Search::IndexSearcher->new( index => '/ppant/LucyTest/index' );
# A basic command-line search which looks for indexed items matching the query term
# and shows which documents the query string was found in.
my $hits = $searcher->hits( query => $term );
while ( my $hit = $hits->next ) {
    print "Title: $hit->{title} - ID: $hit->{id}\n";
}
# End of search.cgi


***********************************************************************

If you want to explore more, check the full code on GitHub.

Implementing mapper attachments in Elasticsearch

Create a new index

curl -X PUT "192.168.0.37:9200/test" -d '{
"settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}
}'

Mapping the attachment type

curl -X PUT "192.168.0.37:9200/test/attachment/_mapping" -d '{
"attachment" : {
"properties" : {
"file" : {
"type" : "attachment",
"fields" : {
"title" : { "store" : "yes" },
"file" : { "term_vector":"with_positions_offsets", "store":"yes" }
} } } } }'

Shell script to convert the content to base64 encoding and index it

#!/bin/sh

coded=`cat TestPDF.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
json="{\"file\":\"${coded}\"}"
echo "$json" &gt; json.file
curl -X POST "192.168.0.37:9200/test/attachment/" -d @json.file


Query (search results will be highlighted)
curl "192.168.0.37:9200/_search?pretty=true" -d '{
"fields" : ["title"],
"query" : {
"query_string" : {
"query" : "Cycling tips"
}
},
"highlight" : {
"fields" : {
"file" : {}
} } }'

***********************************************************************

If you want to explore more, check the full code on GitHub.

RESTful: a brief overview

Putting down some thoughts, mainly for newbies trying to understand REST. I have observed that much of the documentation on the web is too technical, abstract, and scattered, which makes it difficult for newcomers. So I am writing down some broad points on what, in my view, REST is and how it relates to HTTP. These points are based on my reading and my experience working with REST; they do not replace the REST documentation. I advise going through the references given at the end for detailed study.
  • REST (REpresentational State Transfer) is a design pattern/architectural concept used in web applications for managing state information. REST is not a tool/technology/specification/protocol. In other words, REST isn’t a tangible thing like a piece of software or even a specification; it’s a selection of ideals and best practices distilled from the HTTP specs.
  • If you are using HTTP you are already being RESTful to some degree, since HTTP was designed along REST principles, but to take full advantage of the platform, APIs should use RESTful practices as much as possible.
  • We can make our application more RESTful by using the correct HTTP methods.
  • A RESTful web service is required to be stateless in its communication between server and client. REST also does not mandate a message format, so you should be able to request almost any representation (JSON, XML, etc.), while non-REST approaches, mainly SOAP, use XML.
  • The URL is an important part of REST, and REST is more than GET/POST. Browsers pretty much just GET stuff; they don’t do many other types of interaction with resources, which has led many people to assume that HTTP is just for GETting. But HTTP is actually a general-purpose protocol for applying HTTP verbs (GET, POST, PUT, DELETE, etc.) to nouns (objects/web pages identified by a URL). A small sketch of mapping these verbs onto a resource is given below.
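To make the verbs-and-nouns idea concrete, here is a small, purely illustrative sketch in R using the httr package (the same package used in the GitHub API example above). The https://api.example.com/notes endpoint is hypothetical; replace it with a real API.

## Illustrative only: mapping CRUD operations onto HTTP verbs with httr
## (the https://api.example.com/notes endpoint is hypothetical)
library(httr)

base <- "https://api.example.com/notes"

# Create a resource: POST to the collection URL
POST(base, body = list(title = "First note", text = "hello"), encode = "json")

# Read a resource: GET the noun identified by its URL
GET(paste0(base, "/1"))

# Replace/update a resource: PUT to its URL
PUT(paste0(base, "/1"), body = list(title = "First note", text = "updated"), encode = "json")

# Remove a resource: DELETE its URL
DELETE(paste0(base, "/1"))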

Recommended reading:

Paper by Roy Fielding http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

HTTP Specs http://www.w3.org/Protocols/Specs.html

How I Explained REST to My Wife:
http://www.looah.com/source/view/2284

Happy reading!

Web development on LAMP: which programming language should be used? Some thoughts

Nowadays people keep asking which technology stack to use for web development (LAMP, Java, Microsoft) and, finally, which server-side programming language. Most experts say to use whichever you like and are comfortable with, and I totally agree. If you intend to use a Java or Microsoft based environment then you don't have much choice, but if you are using the LAMP stack then you have a lot of options, so the question arises again: which language should be used? I personally think the decision should be based mainly on the requirements, experience, comfort, team, and so on. Still, here is my take, based on my own limited experience working with these languages:

Perl:
Pros: An old fellow, still widely used; very powerful, secure, and well tested over the years in web development; a very good reputation among its users; a huge collection of open-source libraries; new frameworks like Dancer and Mojolicious are a positive sign.
Cons: Can be difficult to maintain (messy syntax, etc.); hard to find developers; the industry is not very positive about its future versions.

Python:
Pros: Powerful; widely used in scientific computing, academia, analytics, and system administration; market sentiment is positive; very good frameworks like Django.
Cons: Less flexible; performance issues, mainly around threading.

PHP:
Pros: One of the most preferred languages; widely used; fast development; a big community; a huge pool of available developers.
Cons: Some reported security loopholes; perceived as less trustworthy; a market image as the cheap-and-dirty option for quick development; multi-threading issues; debugging issues.

Ruby:
Pros: Very flexible; good support; a positive image in developer communities; a very popular web framework (Ruby on Rails).
Cons: Some benchmarks show that its request-response time is a bit slower than others in the same category; getting good developers can be difficult.

Again, these things differ from project to project, so choose based on your own requirements.

I personally prefer Perl 5.

Importing an RDBMS table into HDFS from PostgreSQL with Sqoop

Steps:

1. Download JDBC driver

 

$wget http://jdbc.postgresql.org/download/postgresql-9.3-1102.jdbc4.jar

 

2. Copy the JDBC jar into Sqoop's lib directory:

cp /home/cloudera/Desktop/postgresql-9.3-1102.jdbc4.jar /usr/lib/sqoop/lib/ 

 

3. Configure:

/var/lib/pgsql/data/pg_hba.conf

You need to allow the IP/host of the machine running Hadoop in this file, for example with an entry like "host all all 192.168.0.0/24 md5" (adjust the address range and auth method to your setup).

Restart PostgreSQL using:

$pg_ctl restart

 

4. Run Sqoop: open a terminal on the machine running Hadoop and type the command below.

 

cloudera@cloudera-vm:/usr/lib/sqoop$ bin/sqoop import --connect jdbc:postgresql://192.168.0.34:5432/Testdb --table employee --username postgres -P --target-dir /sqoopOut1 -m 1

 

Enter password:

 

Prerequisites:

  • Cloudera Hadoop VM distribution, or any other machine running Hadoop.
  • A PostgreSQL installation.
  • A database Testdb with an employee table on a running PostgreSQL instance (e.g., 192.168.0.34:5432 in step 4).

 

All set! Your PostgreSQL table data is now available on HDFS of the VM's Hadoop cluster.

 

Enjoy learning Hadoop!