grep and map-Two magical operators

Over the years I have extensively used map and grep in Perl, JavaScript, Python, Linux. I am sure most of the programmers love using these two operators. Read below some of my personal notes gathered from several resources and own understanding.

map – transforms the values of a list

The “map” function applies a transformation to each element of a list and returns the result, leaving the original list unchanged. A map can also be seen as the form of foreach loop but with the map, the implementation is much cleaner.

Ex: map { $_ => undef} is better than, map{$_=>1} the former one will save memory. It’s a better idea to use undef instead of 1.

Instead of one scalar of the value ‘1’ for every key, you get to share the same undef value for all the keys and thus don’t have to allocate tons of memory you aren’t going to use anyway.

The above idiom is a simple way of creating a list of unique values from another list, as the output of the code aptly demonstrates. However, with all those curly braces it may not be immediately obvious what’s going on, so let’s break it down.

map { $_ => 1 } @list

This is pretty straight-forward – create a list of key-value pairs where the keys are the values from @list

{ map { $_ => 1 } @list }

The surrounding pair of curly braces creates an anonymous hash which is populated with they key-value pairs from the map statement. So we now have a hash reference to an anonymous hash whose keys are the elements from @list, and because hash keys are unique, the keys of the anonymous hash represents a unique set of values.

keys %{ { map { $_ => 1 } @list } }

Finally, with the last pair of curly braces, the hash reference to the anonymous hash is dereferenced and we get its list of keys.

grep – filters a list

The “grep” function returns only the elements of a list that meet a certain condition:

@positive_numbers = grep($_ > 0, @numbers);

As you can see, each element is refered to as “$_”. This (plus the fact that parentheses are optional) allows you write commands that look similar to invocations of the Unix “grep” program:

@non_blank_lines = grep /S/, @lines;

In addition, you can specify a code block rather than a single condition:

@non_blank_lines = grep { /S/ } @lines; # Equivalent to the above.

Obviously it doesn’t matter in this case, but code blocks are helpful when you want a complex filter with multiple lines of code. The result of the code block is the result of the last statement executed:

# All positive numbers can be used as exponents,
# but negative exponents must be integers.
@can_be_used_as_exponent = grep {
if ( $_ < 0 ) {
! /./; # No decimal point -> integer.
}
else {
1; # Always true.
}
} @array;

What “grep” and “map” have in common?

"grep" and "map" have a lot in common. They both “magically” take a piece of code (either an expression or a code block) as a parameter. You need to put a comma after an expression but shouldn’t put a comma after a code block. Changing "$_" in "grep" or "map" will change the original list.

This isn’t generally a good idea because it makes the code hard to read. Remember that "map" builds a list of results by evaluating an expression, NOT by setting "$_". A side effect of this fact is that you should not use "s///" with "map". The "s///" operator changes "$_" rather than returning a result, so you won’t get what you would expect if you use it with "map" (and you CERTAINLY shouldn’t use it with "grep").

Happy programming!

References:
Discussion on PerlMonks
perldoc

Creating recurring date patterns using Perl

This program will be helpful if someone want to create recur date patterns based on criteria (yearly, monthly,weekly and daily). Program is written in Perl using old version of Date::Manip CPAN module.

# Script to calculate recurrence dates based on given criteria using Perl Date::Manip module.
# All the input dates in this are given hard coded. These shall be passed through external program etc.
#!/usr/local/bin/perl -w
use strict;
use Date::Manip; 
use Data::Dumper;  
# calculate the dates for yearly patterns.
&yearly();
&monthly();
&weekly();
&daily();
sub yearly {
	my $base = "2015-10-29";
	my $start_date = "2015-10-29";
	my $end_date = "2018-01-01";
	my $yearly_recur_every ="1";
	my $yearly_on_month = "10";
	my $yearly_on_week = "0";
	my $yearly_on_day = "29";
	my $yearly_on_the_month = "10";
	my $yearly_on_the_week = "1";
	my $yearly_on_the_day = "1";
	my $frequency = "";
	my $frequency_pattern_yearly_on = "$yearly_recur_every*$yearly_on_month:$yearly_on_week:$yearly_on_day:0:0:0";
	my $frequency_pattern_yearly_on_the = "$yearly_recur_every*$yearly_on_the_month:$yearly_on_the_week:$yearly_on_the_day:0:0:0";
	my @yearly_dates_on = ParseRecur($frequency_pattern_yearly_on,$base,$start_date,$end_date); # On a certain day of a month
	my @yearly_dates_on_the = ParseRecur($frequency_pattern_yearly_on_the,$base,$start_date,$end_date); # First Monday of Oct 
	print "\n";
	print "******************************************************************************\n";
	print "**************************** YEARLY *******************************************\n";
	print "*******************************************************************************\n";
print "Start date :". $start_date."\n";
print "End date :". $end_date."\n";
print "\n";
print "******************************************************************************\n";
print "Temporal expression: every 1 year on October 29\n";
print "Rule: ".$frequency_pattern_yearly_on;
print "\n";
print "Dates:\n";
print Dumper (\@yearly_dates_on);
print "\n";
print "Temporal expression: every 1 year on the first Monday of October\n";
print "Rule: ".$frequency_pattern_yearly_on_the;
print "\n";
print "Dates:\n";
print Dumper (\@yearly_dates_on_the);
print "\n";
}
# Monthly
sub monthly () {
	my $base = "2015-10-29";
	my $start_date = "2016-01-22";
	my $end_date = "2017-06-01";
	my $monthly_recur_every ="1";
	my $monthly_day_of = "29";
	my $monthly_the_day = "1";
	my $monthly_the_week = "1";
	my $frequency = "";
	my $frequency_pattern_monthly_day = "0:$monthly_recur_every*0:$monthly_day_of:0:0:0";
	my $frequency_pattern_monthly_the_day ="0:1*-2:5:0:0:0"; # Every month on the 2nd last Friday
	my @monthly_dates_day = ParseRecur($frequency_pattern_monthly_day,$base,$start_date,$end_date); # On a certain day of a month
	my @monthly_dates_the_day = ParseRecur($frequency_pattern_monthly_the_day,$base,$start_date,$end_date); # First Monday of Oct 
	
print "\n";
print "******************************************************************************\n";
print "**************************** MONTHLY *******************************************\n";
print "*******************************************************************************\n";
print "Start date :". $start_date."\n";
print "End date :". $end_date."\n";
print "\n";
print "******************************************************************************\n";
print "Temporal expression: Day 29 of every 1 month\n";
print "Rule: ".$frequency_pattern_monthly_day;
print "\n";
print "Dates:\n";
print Dumper (\@monthly_dates_day);
print "\n";
print "Temporal expression: The first monday of every month\n";
print "Rule: ".$frequency_pattern_monthly_the_day;
print "\n";
print "Dates:\n";
print Dumper (\@monthly_dates_the_day);
print "\n";
}
# Weekly
sub weekly () {
	my $base = "2015-10-29";
	my $start_date = "2016-01-22";
	my $end_date = "2016-03-01";
	my $weekly_recur_every ="1";
	# We need to add comma on the value we are getting from UI .. if the field is not selected means no value then 
	# no comma will be added
	my $first_day_of_the_week = ""; # Monday
	my $second_day_of_the_week = "2,"; # Tuesday
	my $third_day_of_the_week = ""; # Wednesday
	my $fourth_day_of_the_week = "4,"; #Thrusday
	my $fifth_day_of_the_week = ""; # Friday
	my $sixth_day_of_the_week = ""; # Saturday
	my $seventh_day_of_the_week = ""; # Sunday
	# my $weekly_the_day = "1";
	# my $weekly_the_week = "1";
	my $frequency = "";
	my $frequency_pattern_weekly_day = "0:0:$weekly_recur_every*$first_day_of_the_week$second_day_of_the_week$third_day_of_the_week$fourth_day_of_the_week$fifth_day_of_the_week$sixth_day_of_the_week$seventh_day_of_the_week:0:0:0";
		my @weekly_dates_day = ParseRecur($frequency_pattern_weekly_day,$base,$start_date,$end_date); # On a certain day of a month
	
print "\n";
print "******************************************************************************\n";
print "**************************** WEEKLY *******************************************\n";
print "*******************************************************************************\n";
print "Start date :". $start_date."\n";
print "End date :". $end_date."\n";
print "\n";
print "Temporal expression: Every every week on Tuesday and Thrusday\n";
print "Rule: ".$frequency_pattern_weekly_day;
print "\n";
print "Dates:\n";
print Dumper (\@weekly_dates_day);
print "\n";
}
# Daily
sub daily () {
	my $base = "2015-10-29";
	my $start_date = "2016-01-22";
	my $end_date = "2016-02-05";
	my $daily_recur_everyday ="1";
	# We need to add comma on the value we are getting from UI .. if the field is not selected means no value then 
	# no comma will be added
	my $first_day_of_the_weekday = "1,"; # Monday
	my $second_day_of_the_weekday = "2,"; # Tuesday
	my $third_day_of_the_weekday = "3,"; # Wednesday
	my $fourth_day_of_the_weekday = "4,"; #Thrusday
	my $fifth_day_of_the_weekday = "5"; # Friday
	my $daily_start_time = "8:00"; # 8AM
	my $frequency = "";
	my $frequency_pattern_daily_everyday = "0:0:0:$daily_recur_everyday*0:0:0";
	# 0:1*1-5:$dow:0:0:0";
	# "0:0:0:$n*0:0:0";  # Every nth day
	my $frequency_pattern_daily_every_weekday = "0:0:$daily_recur_everyday*$first_day_of_the_weekday$second_day_of_the_weekday$third_day_of_the_weekday$fourth_day_of_the_weekday$fifth_day_of_the_weekday:0:0:0";
	my @daily_dates_everyday = ParseRecur($frequency_pattern_daily_everyday,$base,$start_date,$end_date); # On a certain day of a month
	my @daily_dates_every_weekday = ParseRecur($frequency_pattern_daily_every_weekday,$base,$start_date,$end_date); # On a certain day of a month
	print "\n";
	print "******************************************************************************\n";
	print "**************************** DAILY *******************************************\n";
	print "******************************************************************************\n";
	print "Start date: ". $start_date."\n";
	print "End date: ". $end_date."\n";
	print "\n";
	print "Temporal expression: Everyday\n";
	print "Rule: ".$frequency_pattern_daily_everyday;
	print "\n";
	print "Dates:".@daily_dates_everyday."\n";
	print Dumper (\@daily_dates_everyday);
	print "\n";
	print "Temporal expression: Every weekday\n";
	print "Rule: ".$frequency_pattern_daily_every_weekday;
	print "\n";
	print "Dates:\n";
	print Dumper (\@daily_dates_every_weekday);
	print "\n";
	}
	
	# End of script

Full working code is available on GitHub with documentation.

Enjoy,

apache lucy search examples

Investigating search engines and this time apache Lucy 0.4.2. I am showing a basic indexer and a small search application. See below code for indexer (This will take documents one by one and then index them). Search module will take arugument as STDIN and then will show the search result.

This is pure command line utility just to show how basic indexing and searching works using apache lucy.

indexer.pl

#!/usr/local/bin/perl

use strict;
use warnings;
use Lucy::Simple;

#
# Ensure the index directory is both available and empty.
#
my $index = "/ppant/LucyTest/index";
system( "rm", "-rf", $index );
system( "mkdir", "-p", $index );
# Create the helper...a new Lucy::Simple object
my $lucy = Lucy::Simple new( path = $index, language = 'en', );

# Add the first "document". (We are mainly adding meta data of the document)
my %one = ( title ="This is a title of first article" , body ="some text inside the body we need to test the implementaion of lucy", id =1 );
$lucy-add_doc( \%one );

# Add the second "document".
my %two = ( title ="This is another article" , body ="I am putting some basic content, using some words which are also in first document like implementation", id =2 );
$lucy add_doc( \%two );

# Both the documents are now indexed in path

One indexing of the documents is done we'll make a small search script.

search.cgi

#!/usr/local/bin/perl

use strict;
use warnings;

use Lucy::Search::IndexSearcher;

my $term = shift || die "Usage: $0 search-term";

my $searcher = Lucy::Search::IndexSearcher new( index ='/ppant/LucyTest/index');
# A basic search command line which will look for indexed items based on STDIN and will show that in which document query string is found and no of hits
my $hits = $searcher hits( query =$term );
while ( my $hit = $hits next ) {
print "Title: $hit {title} - ID: $hit {id}\n";
}
# End of search.cgi


***********************************************************************

If you want to explore more check Full Code on GitHub

Implementing mapper attachment in elasticsearch

Create a new index

curl -X PUT "192.168.0.37:9200/test" -d '{
"settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}
}'

Mapping attachement type

curl -X PUT "192.168.0.37:9200/test/attachment/_mapping" -d '{
"attachment" : {
"properties" : {
"file" : {
"type" : "attachment",
"fields" : {
"title" : { "store" : "yes" },
"file" : { "term_vector":"with_positions_offsets", "store":"yes" }
} } } } }'

Shell script to convert content to base64 encoding and index

#!/bin/sh</code>

coded=`cat TestPDF.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
json="{\"file\":\"${coded}\"}"
echo "$json" &gt; json.file
curl -X POST "192.168.0.37:9200/test/attachment/" -d @json.file


Query  (Search esults will be highlighted)
curl "192.168.0.37:9200/_search?pretty=true" -d '{
"fields" : ["title"],
"query" : {
"query_string" : {
"query" : "Cycling tips"
}
},
"highlight" : {
"fields" : {
"file" : {}
} } }'

***********************************************************************

If you want to explore more check Full Code on GitHub

Web development: LAMP: which programming languages should be used: Some thoughts

Now a days people keep asking which technology stack to be used for web development (LAMP, Java, Microsoft) and finally which programming language mainly server-side. Most of the expert says that use whichever you like and comfortable and I totally agree. If you intend to use Java and Microsoft based env then you don’t have much choice but if you are using LAMP stack then you have a lot of options so question again arises which language should be used? Again, I personally think that decision should mainly on based on the requirement, experience, comfort, team etc. Still here is my take based on my little own experiences working with languages:

Perl:
Pros: Old fellow still widely used, Very powerful, secure, well tested over the years in web dev, very good market repo among users, huge collection of open source libraries, new framework like Dancer, Mojolicious are positive sign.
Cons: Difficult to maintain (dirty syntax etc), Hard to get resources, industry is not very positive about its future versions.

Python:
Pros: Powerful, widely used in handling scientific data, academics, analytics, system administrators, Market sentiment is positive, Very good framework like Django.
Cons: Less flexible, performance issues mainly threading.

PHP:
Pros: Most preferred language, widely used, fast development, big community, huge available resource pool.
Cons: Some reported security loopholes, Less trustworthy, Market image as cheap and dirty option for quick development, multi-threading issue, debugging issues.

Ruby:
Pros: Very flexible, good support, positive image in communities, Very popular framework for web development (ROR).
Cons: Some benchmarks shows that its request-response time is a bit slow than others in same category, Getting good resources can be difficult.

Again few things differ project to project so choose based on your own requirement.

I personally prefer Perl 5.