grep and map-Two magical operators

Over the years I have extensively used map and grep in Perl, JavaScript, Python, Linux. I am sure most of the programmers love using these two operators. Read below some of my personal notes gathered from several resources and own understanding.

map – transforms the values of a list

The “map” function applies a transformation to each element of a list and returns the result, leaving the original list unchanged. A map can also be seen as the form of foreach loop but with the map, the implementation is much cleaner.

Ex: map { $_ => undef} is better than, map{$_=>1} the former one will save memory. It’s a better idea to use undef instead of 1.

Instead of one scalar of the value ‘1’ for every key, you get to share the same undef value for all the keys and thus don’t have to allocate tons of memory you aren’t going to use anyway.

The above idiom is a simple way of creating a list of unique values from another list, as the output of the code aptly demonstrates. However, with all those curly braces it may not be immediately obvious what’s going on, so let’s break it down.

map { $_ => 1 } @list

This is pretty straight-forward – create a list of key-value pairs where the keys are the values from @list

{ map { $_ => 1 } @list }

The surrounding pair of curly braces creates an anonymous hash which is populated with they key-value pairs from the map statement. So we now have a hash reference to an anonymous hash whose keys are the elements from @list, and because hash keys are unique, the keys of the anonymous hash represents a unique set of values.

keys %{ { map { $_ => 1 } @list } }

Finally, with the last pair of curly braces, the hash reference to the anonymous hash is dereferenced and we get its list of keys.

grep – filters a list

The “grep” function returns only the elements of a list that meet a certain condition:

@positive_numbers = grep($_ > 0, @numbers);

As you can see, each element is refered to as “$_”. This (plus the fact that parentheses are optional) allows you write commands that look similar to invocations of the Unix “grep” program:

@non_blank_lines = grep /S/, @lines;

In addition, you can specify a code block rather than a single condition:

@non_blank_lines = grep { /S/ } @lines; # Equivalent to the above.

Obviously it doesn’t matter in this case, but code blocks are helpful when you want a complex filter with multiple lines of code. The result of the code block is the result of the last statement executed:

# All positive numbers can be used as exponents,
# but negative exponents must be integers.
@can_be_used_as_exponent = grep {
if ( $_ < 0 ) {
! /./; # No decimal point -> integer.
}
else {
1; # Always true.
}
} @array;

What “grep” and “map” have in common?

"grep" and "map" have a lot in common. They both “magically” take a piece of code (either an expression or a code block) as a parameter. You need to put a comma after an expression but shouldn’t put a comma after a code block. Changing "$_" in "grep" or "map" will change the original list.

This isn’t generally a good idea because it makes the code hard to read. Remember that "map" builds a list of results by evaluating an expression, NOT by setting "$_". A side effect of this fact is that you should not use "s///" with "map". The "s///" operator changes "$_" rather than returning a result, so you won’t get what you would expect if you use it with "map" (and you CERTAINLY shouldn’t use it with "grep").

Happy programming!

References:
Discussion on PerlMonks
perldoc

Data Structures and Algorithms in Python – Sorting

Bubble Sort Implementation

The bubble sort makes multiple passes through a list. It compares adjacent items and exchanges those that are out of order. Each pass through the list places the next largest value in its proper place. In essence, each item “bubbles” up to the location where it belongs.

  • Regardless of how the items are arranged in the initial list, n−1 passes will be made to sort a list of size n, so 1 pass n-1 comparisons, 2 pass n-2 comparions and n-1 is 1 comparions.
  • A bubble sort is often considered the most inefficient sorting method since it must exchange items before the final location is known. These “wasted” exchange operations are very costly. However, because the bubble sort makes passes through the entire unsorted portion of the list, it has the capability to do something most sorting algorithms cannot. In particular, if during a pass there are no exchanges, then we know that the list must be sorted. A bubble sort can be modified to stop early if it finds that the list has become sorted. This means that for lists that require just a few passes, a bubble sort may have an advantage in that it will recognize the sorted list and stop.
  • Performance: – Worst case: O(n2) n-square – Best case: O(n) – Average case: O(n2) n-square
arr = [2,7,1,8,5,9,11,35,25]
bubble_sort(arr)
print (arr)
[1, 2, 5, 7, 8, 11, 25, 35]
Selection Sort Implementation
  • Selection sort is a in-place algorithm
  • It works well with small files
  • It is used for sorting the files with large values and small keys this is due to the fact that selection is based on keys and swaps are made only when required
  • The selection sort improves on the bubble sort by making only one exchange for every pass through the list. In order to do this, a selection sort looks for the largest value as it makes a pass and, after completing the pass, places it in the proper location. As with a bubble sort, after the first pass, the largest item is in the correct place. After the second pass, the next largest is in place. This process continues and requires n−1 passes to sort n items, since the final item must be in place after the (n−1) st pass
  • Performance: – Worst case: O(n2) n-square – Best case: O(n) – Average case: O(n2) n-square – worst case space complexity: O(1)
arr = [2,7,1,8,5,9,11,35,25]
selection_sort(arr)
print (arr)
[1, 2, 5, 7, 8, 11, 25, 35]

 

Insertion Sort Implementation

Insertion sort always maintains a sorted sub list in the lower portion of the list Each new item is then “inserted” back into the previous sublist such that the sorted sub list is one item larger complexity O(n2) square

arr = [2,7,1,8,5,9,11,35,25]
insertion_sort(arr)
print (arr)
[1, 2, 5, 7, 8, 11, 25, 35]

 

Merge Sort Implementation

Merge sort is a recursive algorithm (example of divide and conquer) that continually splits a list in half.

  • If the list is empty or has one item, it is sorted by definition (the base case).
  • If the list has more than one item, we split the list and recursively invoke a
  • Merge sort on both halves.
  • Once the two halves are sorted, the fundamental operation, called a merge, is performed.
  • Merging is the process of taking two smaller sorted lists and combining them
  • together into a single, sorted, new list.
  • This algorithm is used to sort a linked list
  • Performance: – Worst case: O(nlog n) – Best case: O(nlog n) – Average case: O(nlog n)
arr = [11,2,5,4,7,6,8,1,23]
merge_sort(arr)
print (arr)
[1, 2, 4, 5, 6, 7, 8, 11, 23]

 

Quick Sort Implementation

The quick sort uses divide and conquer to gain the same advantages as the merge sort, while not using additional storage also known as “partition exchange sort”.

  • As a trade-off, however, it is possible that the list may not be divided in half.
  • When this happens, we will see that performance is diminished.
  • A quick sort first selects a value, which is called the pivot value.
  • The role of the pivot value is to assist with splitting the list.
  • The actual position where the pivot value belongs in the final sorted list, commonly called the split point, will
    be used to divide the list for subsequent calls to the quick sort.
  • Performance:
    • Worst case: O(n square)
    • Best case: O(nlog n)
    • Average case: O(nlog n)
arr = [2,7,1,8,5,9,11,35,25]
quick_sort(arr)
print (arr)
[1, 2, 5, 7, 8, 11, 25, 35]

 

Shell Sort Implementation

This is also called diminishing incremental sort

  • The shell sort improves on insertion sort by breaking the original list into a number of smaller sublists.
  • The unique way these sun lists are chosen is the key to the shell sort
  • Instead of breaking the list into sublists of contiguous items, the shell sort uses an increment ”i” to create a sublist by choosing all items that are ”i” items apart.
  • Shell sort is efficient for medium size lists
  • Complexity somewhere between O(n) and O(n2) square
arr = [45,67,23,45,21,24,7,2,6,4,90]
shell_sort(arr)
print (arr)
[2, 4, 6, 7, 21, 23, 24, 45, 45, 67, 90]

Check GitHub for the full working code.

I will keep adding more problems/solutions.

Stay tuned!

Ref:  The inspiration of implementing DS in Python is from this course

Implemeting Data Structures and Algorithms in Python: Problems and solutions

Recently I have started using Python in a lot of places including writing algorithms for MI/data science,  so I thought to try to implement some common programming problems using data structures in Python. As I have mostly implemented in C/C++ and Perl.

Let’s get started with a very basic problem.

Anagram algorithm

An algorithm will take two strings and check to see if they are anagrams. An anagram is when the two strings can be written using the exact same letters, in other words, rearranging the letters of a word or phrase to produce a new word or phrase, using all the original letters exactly once

Some examples of anagram:
“dormitory” is an anagram of “dirty room”
“a perfectionist” is an anagram of “I often practice.”
“action man” is an anagram of “cannot aim”

Our anagram check algorithm with take two strings and will give a boolean TRUE/FALSE depends on anagram found or not?
I have used two approaches to solve the problem. First is to sorted function and compare two string after removing white spaces and changing to lower case. This is straightforward.

like


def anagram(str1,str2):
# First we'll remove white spaces and also convert string to lower case letters
str1 = str1.replace(' ','').lower()
str2 = str2.replace(' ','').lower()
# We'll show output in the form of boolean TRUE/FALSE for the sorted match hence return
return sorted(str1) == sorted(str2)

The second approach is to do things manually, this is because to learn more about making logic to check. In this approach, I have used a counting mechanism and Python dictionary to store the count letter. Though one can use inbuilt Python collections idea is to learn a bit about the hash table.

Check GitHub for the full working code.

I will keep adding more problems/solutions.

Stay tuned!

Ref:  The inspiration of implementing DS in Python is from this course

apache lucy search examples

Investigating search engines and this time apache Lucy 0.4.2. I am showing a basic indexer and a small search application. See below code for indexer (This will take documents one by one and then index them). Search module will take arugument as STDIN and then will show the search result.

This is pure command line utility just to show how basic indexing and searching works using apache lucy.

indexer.pl

#!/usr/local/bin/perl

use strict;
use warnings;
use Lucy::Simple;

#
# Ensure the index directory is both available and empty.
#
my $index = "/ppant/LucyTest/index";
system( "rm", "-rf", $index );
system( "mkdir", "-p", $index );
# Create the helper...a new Lucy::Simple object
my $lucy = Lucy::Simple new( path = $index, language = 'en', );

# Add the first "document". (We are mainly adding meta data of the document)
my %one = ( title ="This is a title of first article" , body ="some text inside the body we need to test the implementaion of lucy", id =1 );
$lucy-add_doc( \%one );

# Add the second "document".
my %two = ( title ="This is another article" , body ="I am putting some basic content, using some words which are also in first document like implementation", id =2 );
$lucy add_doc( \%two );

# Both the documents are now indexed in path

One indexing of the documents is done we'll make a small search script.

search.cgi

#!/usr/local/bin/perl

use strict;
use warnings;

use Lucy::Search::IndexSearcher;

my $term = shift || die "Usage: $0 search-term";

my $searcher = Lucy::Search::IndexSearcher new( index ='/ppant/LucyTest/index');
# A basic search command line which will look for indexed items based on STDIN and will show that in which document query string is found and no of hits
my $hits = $searcher hits( query =$term );
while ( my $hit = $hits next ) {
print "Title: $hit {title} - ID: $hit {id}\n";
}
# End of search.cgi


***********************************************************************

If you want to explore more check Full Code on GitHub

RESTfull brief overview

Putting some thoughts mainly for newbies trying to understand REST. Sometimes I observed that the actual documentation on the web is too technical or abstract and scattered which a bit difficult for newbies..  I am writing down some broad points which in my view REST is and it’s equation with HTTP … these points based on my reading and experience working with REST. It doesn’t replace any of REST documentation. Advised to go through references given at the end for detaled study.
  • REST  REpresentational State Transfer is a design pattern/design concept (architecture) used in web applications for managing state information-> REST is not a tool/techology/specification/protocol. In other words, REST isn’t a tangible thing like a piece of software or even a specification, it’s a selection of ideals, of best practices distilled from the HTTP specs.
  •  If you are using HTTP you are being RESTful to some degree since HTTP is a REST protocol, but to take full advantage of the platform, APIs should use RESTful practices as much as possible.
  • We can make our application more RESTful by using the correct HTTP methods.
  • RESTful Web service is required to be stateless in it’s communication between server and client  so you should be able to request almost any format while non-REST mainly SOAP uses XML.
  • URL is the important part of REST. REST is more than GET/POST actually browsers pretty much just GET stuff. They don’t do a lot of other types of interaction with resources. This is a problem because it has led many people to assume that HTTP is just for GETing. But HTTP is actually ageneral purpose protocol for applying http verbs (HTTP verbs (GET, POST, PUT, DELETE, etc.) to nouns (object/web page which has a URL).

Recommended reading:

Paper by Roy Fielding http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

HTTP Specs http://www.w3.org/Protocols/Specs.html

How I Explained REST to My Wife:
http://www.looah.com/source/view/2284

Happy reading!