Data Structures and Algorithms in Python – Graphs

Graph Implementation – Adjacency list

  • We use dictionaries to implement the adjacency list in Python, which is the easiest way.
  • To implement the Graph ADT we’ll create two classes: Graph, which holds the master list of vertices, and Vertex, which represents each vertex in the graph.
  • Each Vertex uses a dictionary to keep track of the vertices to which it is connected, and the weight of each edge. This dictionary is called connectedTo.
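The post does not show the Graph and Vertex classes themselves. Below is a minimal sketch consistent with the calls the example makes (addVertex, addEdge, vertList, getConnections, getId, and iterating over g). The other methods (addNeighbor, getWeight, getVertex) are assumptions added for illustration.

```python
class Vertex:
    """A vertex that records its neighbours and edge weights in connectedTo."""

    def __init__(self, key):
        self.id = key
        self.connectedTo = {}  # neighbour Vertex -> edge weight

    def addNeighbor(self, nbr, weight=0):
        self.connectedTo[nbr] = weight

    def getConnections(self):
        return self.connectedTo.keys()

    def getId(self):
        return self.id

    def getWeight(self, nbr):
        return self.connectedTo[nbr]

    def __repr__(self):
        return f"Vertex({self.id})"


class Graph:
    """Holds the master dictionary of vertices, keyed by vertex id."""

    def __init__(self):
        self.vertList = {}
        self.numVertices = 0

    def addVertex(self, key):
        self.numVertices += 1
        newVertex = Vertex(key)
        self.vertList[key] = newVertex
        return newVertex

    def getVertex(self, key):
        return self.vertList.get(key)

    def __contains__(self, key):
        return key in self.vertList

    def addEdge(self, f, t, weight=0):
        # Create any missing endpoint on demand, then add a directed edge f -> t
        if f not in self.vertList:
            self.addVertex(f)
        if t not in self.vertList:
            self.addVertex(t)
        self.vertList[f].addNeighbor(self.vertList[t], weight)

    def getVertices(self):
        return self.vertList.keys()

    def __iter__(self):
        # Iterating over a Graph yields its Vertex objects
        return iter(self.vertList.values())
```

Because each Vertex keeps its own connectedTo dictionary, looking up the weight of an edge takes constant time on average.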
# Create six vertices numbered 0 through 5. 
# Display the vertex dictionary
g = Graph()
for i in range(6):
    g.addVertex(i)
print(g.vertList)

# Add the edges that connect the vertices together
g.addEdge(0,1,5)
g.addEdge(0,5,2)
g.addEdge(1,2,4)
g.addEdge(2,3,9)
g.addEdge(3,4,7)
g.addEdge(3,5,3)
g.addEdge(4,0,1)
g.addEdge(5,4,8)
g.addEdge(5,2,1)
# Nested loop verifies that each edge in the graph is properly stored.
for v in g:
    for w in v.getConnections():
        print("( %s , %s )" % (v.getId(), w.getId()))

Graph Implementation – Solving the Word Ladder Problem Using Breadth First Search (BFS)

Let’s consider the following puzzle, called a word ladder: transform the word “FOOL” into the word “SAGE”. In a word ladder puzzle the change must happen gradually, one letter at a time. At each step you must transform one word into another word; you are not allowed to transform a word into a non-word. The following sequence of words shows one possible solution to the problem posed above.

  • FOOL
  • POOL
  • POLL
  • POLE
  • PALE
  • SALE
  • SAGE

The graph is implemented using a dictionary.

# The Graph class, contains a dictionary that maps vertex names to vertex objects.
# Graph() creates a new, empty graph.
Graph()   

buildGraph()
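The buildGraph function itself is not shown. One common approach, assumed in the sketch below, puts every word into a "bucket" for each letter position, using a pattern with that letter blanked out (POPE goes into _OPE, P_PE, PO_E, POP_). Words that share a bucket differ by exactly one letter, so they get an edge between them. To keep the sketch standalone, it takes a list of words rather than a file and returns a plain dictionary of sets instead of a Graph object. Both of these are assumptions.

```python
from collections import defaultdict


def buildGraph(words):
    """Return an adjacency dict linking words that differ by exactly one letter.

    Assumption: `words` is an iterable of equal-length words. The result maps
    each word to the set of its one-letter neighbours.
    """
    graph = {}
    buckets = defaultdict(list)
    # Put each word into one bucket per letter position, e.g. POPE -> _OPE, P_PE, ...
    for word in words:
        graph.setdefault(word, set())
        for i in range(len(word)):
            bucket = word[:i] + "_" + word[i + 1:]
            buckets[bucket].append(word)

    # Words in the same bucket differ only at the blanked position: connect them
    for bucket_words in buckets.values():
        for w1 in bucket_words:
            for w2 in bucket_words:
                if w1 != w2:
                    graph[w1].add(w2)
    return graph
```

Bucketing avoids comparing every pair of words in the whole list. Only words that share a pattern are ever compared.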

# BFS begins at the starting vertex start and colors it gray to show that
# it is currently being explored. Two other values, the distance and the
# predecessor, are initialized to 0 and None respectively for the starting
# vertex. Finally, start is placed on a queue. The next step is to
# systematically explore vertices at the front of the queue. We explore
# each new node at the front of the queue by iterating over its adjacency
# list. As each node on the adjacency list is examined, its color is
# checked. If it is white, the vertex is unexplored, and four things happen:
#   * The new, unexplored vertex nbr is colored gray.
#   * The predecessor of nbr is set to the current node currentVert.
#   * The distance to nbr is set to the distance to currentVert + 1.
#   * nbr is added to the end of the queue. Adding nbr to the end of the queue
#     effectively schedules this node for further exploration, but not until
#     all the other vertices on the adjacency list of currentVert have been explored.

bfs()
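Here is a minimal sketch of the BFS described in the comments above. To keep it standalone, it makes a few assumptions. The graph is a plain dictionary mapping each vertex to its neighbours, and every vertex appears as a key. Colors, distances and predecessors live in dictionaries rather than on Vertex objects. collections.deque stands in for the Queue class. The traverse helper, which follows predecessor links back to recover the ladder, is also an assumption; it expects the end vertex to be reachable from start.

```python
from collections import deque


def bfs(graph, start):
    """Breadth first search over {vertex: iterable of neighbours}.

    Returns (distance, predecessor) dicts. Colors follow the description:
    white = unexplored, gray = discovered and queued, black = fully explored.
    """
    color = {v: "white" for v in graph}
    distance = {start: 0}
    predecessor = {start: None}
    color[start] = "gray"
    queue = deque([start])
    while queue:
        currentVert = queue.popleft()
        for nbr in graph[currentVert]:
            if color[nbr] == "white":
                color[nbr] = "gray"                           # discovered
                predecessor[nbr] = currentVert                # how we reached it
                distance[nbr] = distance[currentVert] + 1     # one step further
                queue.append(nbr)                             # explore later
        color[currentVert] = "black"
    return distance, predecessor


def traverse(predecessor, end):
    """Follow predecessor links back from `end`; return the path start -> end."""
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = predecessor[node]
    return path[::-1]
```

BFS visits vertices in order of their distance from start, so the path traverse returns is a shortest ladder (fewest changes).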

Graph Implementation – Solving the Knight’s Tour Problem Using Depth First Search (DFS)

The knight’s tour puzzle is played on a chess board with a single chess piece, the knight. The object of the puzzle is to find a sequence of moves that allow the knight to visit every square on the board exactly once. One such sequence is called a “tour.”
We will solve the problem in two main steps. First, represent the legal moves of a knight on a chessboard as a graph. Second, use a graph algorithm to find a path of length rows × columns − 1 in which every vertex on the graph is visited exactly once. To represent the knight’s tour problem as a graph we will use the following two ideas: each square on the chessboard can be represented as a node in the graph, and each legal move by the knight can be represented as an edge in the graph.

# The Graph class, contains a dictionary that maps vertex names to vertex objects.
# Graph() creates a new, empty graph.
Graph()

# To represent the knight’s tour problem as a graph we will use the 
# following two ideas: Each square on the chessboard can be represented 
# as a node in the graph. Each legal move by the knight can be represented
# as an edge in the graph. 

knightGraph()

# The genLegalMoves function takes the position of the knight on the 
# board and generates each of the eight possible moves. The legalCoord 
# helper function makes sure that a particular move that is generated is 
# still on the board.
genLegalMoves()
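Here is a minimal sketch of knightGraph, genLegalMoves and legalCoord as described above. It makes three assumptions. Squares are numbered row * bdSize + col by a hypothetical posToNodeId helper. The board size is a bdSize parameter. The graph is a plain adjacency dictionary rather than the Graph class.

```python
def posToNodeId(row, col, bdSize):
    """Map a (row, col) square to a unique integer vertex id."""
    return row * bdSize + col


def legalCoord(x, bdSize):
    """True if coordinate x lies on a bdSize x bdSize board."""
    return 0 <= x < bdSize


def genLegalMoves(x, y, bdSize):
    """Return every on-board square a knight at (x, y) can move to."""
    newMoves = []
    moveOffsets = [(-1, -2), (-1, 2), (-2, -1), (-2, 1),
                   (1, -2), (1, 2), (2, -1), (2, 1)]
    for dx, dy in moveOffsets:
        newX, newY = x + dx, y + dy
        if legalCoord(newX, bdSize) and legalCoord(newY, bdSize):
            newMoves.append((newX, newY))
    return newMoves


def knightGraph(bdSize):
    """Build an adjacency dict: one vertex per square, one edge per legal move."""
    ktGraph = {}
    for row in range(bdSize):
        for col in range(bdSize):
            nodeId = posToNodeId(row, col, bdSize)
            ktGraph[nodeId] = [posToNodeId(r, c, bdSize)
                               for r, c in genLegalMoves(row, col, bdSize)]
    return ktGraph
```

A corner square has only two legal moves and a central square has eight. This uneven degree is what makes some orderings of the search much faster than others.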

# DFS implementation
        
# We will look at two algorithms that implement a depth first search.
# The first algorithm we will look at directly solves the knight’s tour 
# problem by explicitly forbidding a node to be visited more than once. 
# The second implementation is more general, but allows nodes to be visited 
# more than once as the tree is constructed. The second version is used in 
# subsequent sections to develop additional graph algorithms.

# The depth first exploration of the graph is exactly what we need in 
# order to find a path that has exactly 63 edges. We will see that when 
# the depth first search algorithm finds a dead end (a place in the graph 
# where there are no more moves possible) it backs up the tree to the next
# deepest vertex that allows it to make a legal move.
        
# The knightTour function takes four parameters: n, the current depth in 
# the search tree; path, a list of vertices visited up to this point; u, 
# the vertex in the graph we wish to explore; and limit, the number of nodes
# in the path. The knightTour function is recursive. When the knightTour 
# function is called, it first checks the base case condition. If we have 
# a path that contains 64 vertices, we return from knightTour with a status 
# of True, indicating that we have found a successful tour. If the path is not 
# long enough we continue to explore one level deeper by choosing a new vertex 
# to explore and calling knightTour recursively for that vertex.

# DFS also uses colors to keep track of which vertices in the graph have been visited. 
# Unvisited vertices are colored white, and visited vertices are colored gray. 
# If all neighbors of a particular vertex have been explored and we have not yet reached 
# our goal length of 64 vertices, we have reached a dead end. When we reach a dead end we 
# must backtrack. Backtracking happens when we return from knightTour with a status of False. 
# In the breadth first search we used a queue to keep track of which vertex to visit next. 
# Since depth first search is recursive, we are implicitly using a stack to help us with 
# our backtracking. When a call to knightTour returns a status of False,
# we remain inside the while loop and look at the next vertex in nbrList.

knightTour()
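Here is a minimal sketch of the recursive knightTour described above. It differs from the description in a few assumed ways, chosen to keep it standalone. The graph is a plain adjacency dictionary, built by a compact buildKnightGraph helper. graph and color are passed as two extra parameters. n counts the vertices on the path once u is added, so the first call uses n = 1 and limit = bdSize * bdSize.

```python
def buildKnightGraph(bdSize):
    """Adjacency dict for a bdSize x bdSize board (vertex id = row * bdSize + col)."""
    offsets = [(-1, -2), (-1, 2), (-2, -1), (-2, 1),
               (1, -2), (1, 2), (2, -1), (2, 1)]
    graph = {}
    for row in range(bdSize):
        for col in range(bdSize):
            graph[row * bdSize + col] = [
                (row + dr) * bdSize + (col + dc)
                for dr, dc in offsets
                if 0 <= row + dr < bdSize and 0 <= col + dc < bdSize
            ]
    return graph


def knightTour(n, path, u, limit, graph, color):
    """DFS for a path of `limit` vertices that visits each vertex exactly once."""
    color[u] = "gray"            # mark u as visited
    path.append(u)
    if n < limit:
        done = False
        nbrList = list(graph[u])
        i = 0
        # Try each unvisited neighbour until one leads to a full tour
        while i < len(nbrList) and not done:
            if color[nbrList[i]] == "white":
                done = knightTour(n + 1, path, nbrList[i], limit, graph, color)
            i += 1
        if not done:             # dead end: undo this step and backtrack
            path.pop()
            color[u] = "white"
    else:
        done = True              # the path holds `limit` vertices: tour found
    return done
```

For example, on a 5 × 5 board you would call `knightTour(1, path, 0, 25, graph, color)` with every vertex initially colored "white". On an 8 × 8 board this naive neighbour order can take a very long time. Ordering neighbours by Warnsdorff's heuristic (fewest onward moves first) is the usual speedup.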

Please check GitHub for the full working code.

I will keep adding more problems/solutions.

Stay tuned!

Ref: The inspiration for implementing DS in Python comes from this course

Data Structures and Algorithms in Python – Sorting

Bubble Sort Implementation

The bubble sort makes multiple passes through a list. It compares adjacent items and exchanges those that are out of order. Each pass through the list places the next largest value in its proper place. In essence, each item “bubbles” up to the location where it belongs.

  • Regardless of how the items are arranged in the initial list, n−1 passes will be made to sort a list of size n. The first pass makes n−1 comparisons, the second n−2, and so on down to 1 comparison on the (n−1)th pass, for n(n−1)/2 comparisons in total.
  • A bubble sort is often considered the most inefficient sorting method since it must exchange items before the final location is known. These “wasted” exchange operations are very costly. However, because the bubble sort makes passes through the entire unsorted portion of the list, it has the capability to do something most sorting algorithms cannot. In particular, if during a pass there are no exchanges, then we know that the list must be sorted. A bubble sort can be modified to stop early if it finds that the list has become sorted. This means that for lists that require just a few passes, a bubble sort may have an advantage in that it will recognize the sorted list and stop.
  • Performance: – Worst case: O(n²) – Best case: O(n) (an already sorted list, with the early-stop modification) – Average case: O(n²)
arr = [2,7,1,8,5,9,11,35,25]
bubble_sort(arr)
print(arr)
[1, 2, 5, 7, 8, 9, 11, 25, 35]
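The bubble_sort function called in the example is not shown. Here is a minimal sketch that sorts the list in place, as the example's usage implies, and includes the early-stop check described above.

```python
def bubble_sort(arr):
    """Sort arr in place; stop early if a pass makes no exchanges."""
    n = len(arr)
    for pass_num in range(n - 1, 0, -1):
        exchanged = False
        # Compare neighbours in the unsorted part arr[0 .. pass_num]
        for i in range(pass_num):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                exchanged = True
        if not exchanged:   # no swaps means the list is already sorted
            break
```

The exchanged flag is what gives the O(n) best case. On sorted input, the first pass makes no swaps and the loop stops.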
Selection Sort Implementation
  • Selection sort is an in-place algorithm.
  • It works well with small files.
  • It is used for sorting files with large values and small keys, because selection is based on keys and swaps are made only when required.
  • The selection sort improves on the bubble sort by making only one exchange for every pass through the list. To do this, a selection sort looks for the largest value as it makes a pass and, after completing the pass, places it in the proper location. As with a bubble sort, after the first pass the largest item is in the correct place. After the second pass, the next largest is in place. This process continues and requires n−1 passes to sort n items, since the final item must be in place after the (n−1)th pass.
  • Performance: – Worst case: O(n²) – Best case: O(n²) – Average case: O(n²) – Worst-case space complexity: O(1)
arr = [2,7,1,8,5,9,11,35,25]
selection_sort(arr)
print(arr)
[1, 2, 5, 7, 8, 9, 11, 25, 35]
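Here is a minimal sketch of the selection_sort used in the example. It follows the description: each pass finds the largest remaining item and moves it into place with at most one exchange. It sorts in place to match the usage above.

```python
def selection_sort(arr):
    """Sort arr in place, making at most one exchange per pass."""
    for fill_slot in range(len(arr) - 1, 0, -1):
        # Find the position of the largest item in arr[0 .. fill_slot]
        max_pos = 0
        for i in range(1, fill_slot + 1):
            if arr[i] > arr[max_pos]:
                max_pos = i
        # A single exchange puts that item into its final place
        if max_pos != fill_slot:
            arr[max_pos], arr[fill_slot] = arr[fill_slot], arr[max_pos]
```

The inner scan always runs to fill_slot, even on sorted input. That is why the best case is still O(n²), although the number of swaps is at most n−1.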

 

Insertion Sort Implementation

Insertion sort always maintains a sorted sublist in the lower portion of the list. Each new item is then “inserted” back into the previous sublist so that the sorted sublist becomes one item larger. Complexity: O(n²) in the worst and average cases, and O(n) in the best case (an already sorted list).

arr = [2,7,1,8,5,9,11,35,25]
insertion_sort(arr)
print(arr)
[1, 2, 5, 7, 8, 9, 11, 25, 35]
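Here is a minimal sketch of the insertion_sort used in the example, sorting in place. Larger items in the sorted sublist are shifted right to open a gap for each new item.

```python
def insertion_sort(arr):
    """Sort arr in place by growing a sorted sublist at the front."""
    for index in range(1, len(arr)):
        current = arr[index]
        position = index
        # Shift larger items in the sorted sublist one slot to the right
        while position > 0 and arr[position - 1] > current:
            arr[position] = arr[position - 1]
            position -= 1
        arr[position] = current   # insert the item into the gap
```

Shifting instead of swapping takes one assignment per step rather than three.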

 

Merge Sort Implementation

Merge sort is a recursive algorithm (example of divide and conquer) that continually splits a list in half.

  • If the list is empty or has one item, it is sorted by definition (the base case).
  • If the list has more than one item, we split the list and recursively invoke merge sort on both halves.
  • Once the two halves are sorted, the fundamental operation, called a merge, is performed.
  • Merging is the process of taking two smaller sorted lists and combining them into a single, sorted, new list.
  • Merge sort is also well suited to sorting linked lists.
  • Performance: – Worst case: O(n log n) – Best case: O(n log n) – Average case: O(n log n)
arr = [11,2,5,4,7,6,8,1,23]
merge_sort(arr)
print (arr)
[1, 2, 4, 5, 6, 7, 8, 11, 23]
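Here is a minimal sketch of the merge_sort used in the example. It follows the steps above: a base case, recursion on each half, then a merge. It writes the merged result back into the original list, so that `merge_sort(arr)` sorts arr in place as the example expects. The halves are copies, so this version uses O(n) extra space.

```python
def merge_sort(arr):
    """Sort arr in place by recursively sorting each half and merging them."""
    if len(arr) <= 1:                    # base case: already sorted
        return
    mid = len(arr) // 2
    left, right = arr[:mid], arr[mid:]   # copies of the two halves
    merge_sort(left)
    merge_sort(right)

    # Merge: repeatedly take the smaller front item of the two sorted halves
    i = j = k = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    # One half is exhausted; copy whatever remains of the other
    arr[k:] = left[i:] + right[j:]
```

Using `<=` when comparing front items keeps equal elements in their original order, so this merge sort is stable.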

 

Quick Sort Implementation

The quick sort, also known as “partition exchange sort”, uses divide and conquer to gain the same advantages as the merge sort while not using additional storage.

  • As a trade-off, however, it is possible that the list may not be divided in half.
  • When this happens, we will see that performance is diminished.
  • A quick sort first selects a value, which is called the pivot value.
  • The role of the pivot value is to assist with splitting the list.
  • The actual position where the pivot value belongs in the final sorted list, commonly called the split point, will
    be used to divide the list for subsequent calls to the quick sort.
  • Performance:
    • Worst case: O(n²)
    • Best case: O(n log n)
    • Average case: O(n log n)
arr = [2,7,1,8,5,9,11,35,25]
quick_sort(arr)
print(arr)
[1, 2, 5, 7, 8, 9, 11, 25, 35]
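Here is a minimal sketch of the quick_sort used in the example, sorting in place. It assumes the first item of each sublist is the pivot. Two markers move toward each other, swapping out-of-place items, and the pivot is then moved to the split point. The helper names _quick_sort and _partition are hypothetical.

```python
def quick_sort(arr):
    """Sort arr in place using the first item of each sublist as the pivot."""
    _quick_sort(arr, 0, len(arr) - 1)


def _quick_sort(arr, first, last):
    if first < last:
        split = _partition(arr, first, last)
        _quick_sort(arr, first, split - 1)   # items left of the split point
        _quick_sort(arr, split + 1, last)    # items right of the split point


def _partition(arr, first, last):
    pivot = arr[first]
    left, right = first + 1, last
    while True:
        # Advance the left marker past items that belong on the pivot's left
        while left <= right and arr[left] <= pivot:
            left += 1
        # Move the right marker back past items that belong on the right
        while left <= right and arr[right] >= pivot:
            right -= 1
        if left > right:                     # markers crossed: partition done
            break
        arr[left], arr[right] = arr[right], arr[left]
    # right is the split point: move the pivot into its final position
    arr[first], arr[right] = arr[right], arr[first]
    return right
```

With a first-item pivot, an already sorted list splits into sublists of size 0 and n−1 at every step. That is the O(n²) worst case mentioned above.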

 

Shell Sort Implementation

This is also called diminishing increment sort.

  • The shell sort improves on insertion sort by breaking the original list into a number of smaller sublists.
  • The unique way these sublists are chosen is the key to the shell sort.
  • Instead of breaking the list into sublists of contiguous items, the shell sort uses an increment “i” to create a sublist by choosing all items that are “i” items apart.
  • Shell sort is efficient for medium-size lists.
  • Its complexity lies somewhere between O(n) and O(n²), depending on the increment sequence used.
arr = [45,67,23,45,21,24,7,2,6,4,90]
shell_sort(arr)
print (arr)
[2, 4, 6, 7, 21, 23, 24, 45, 45, 67, 90]
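Here is a minimal sketch of the shell_sort used in the example, sorting in place. It assumes the simple increment sequence n//2, n//4, ..., 1. For each increment it runs an insertion sort over every sublist of items that are that many positions apart.

```python
def shell_sort(arr):
    """Sort arr in place with gapped insertion sorts (gaps n//2, n//4, ..., 1)."""
    gap = len(arr) // 2
    while gap > 0:
        # Insertion sort on every sublist of items that are `gap` apart
        for index in range(gap, len(arr)):
            current = arr[index]
            position = index
            while position >= gap and arr[position - gap] > current:
                arr[position] = arr[position - gap]
                position -= gap
            arr[position] = current
        gap //= 2
```

The last pass (gap 1) is a plain insertion sort. The earlier passes leave the list nearly sorted, so this last pass has little work left to do.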

Check GitHub for the full working code.

I will keep adding more problems/solutions.

Stay tuned!

Ref: The inspiration for implementing DS in Python comes from this course

Implementing Data Structures and Algorithms in Python: Problems and solutions

Recently I have started using Python in a lot of places, including writing algorithms for ML/data science, so I decided to implement some common programming problems using data structures in Python. Until now I have mostly implemented them in C/C++ and Perl.

Let’s get started with a very basic problem.

Anagram algorithm

The algorithm takes two strings and checks whether they are anagrams. Two strings are anagrams when they can be written using exactly the same letters. In other words, an anagram is formed by rearranging the letters of a word or phrase to produce a new word or phrase, using all the original letters exactly once.

Some examples of anagrams:
“dormitory” is an anagram of “dirty room”
“a perfectionist” is an anagram of “I often practice.”
“action man” is an anagram of “cannot aim”

Our anagram check takes two strings and returns a boolean, True or False, depending on whether they are anagrams.
I have used two approaches to solve the problem. The first uses the sorted function: compare the two strings after removing white space and converting them to lower case. This is straightforward.

For example:


def anagram(str1, str2):
    # First we'll remove white spaces and also convert the strings to lower case
    str1 = str1.replace(' ', '').lower()
    str2 = str2.replace(' ', '').lower()
    # Return a boolean True/False depending on whether the sorted letters match
    return sorted(str1) == sorted(str2)

The second approach is to do things manually, in order to learn more about building the checking logic. In this approach, I have used a counting mechanism with a Python dictionary to store the count of each letter. Although one could use Python's built-in collections module, the idea is to learn a bit about hash tables.
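Here is a minimal sketch of the counting approach, under the same normalisation as the first version (remove spaces, lower-case). The name anagram2 is hypothetical. The dictionary acts as the hash table: counts go up for the first string and down for the second.

```python
def anagram2(str1, str2):
    """Check for an anagram by counting letters in a dictionary (hash table)."""
    str1 = str1.replace(' ', '').lower()
    str2 = str2.replace(' ', '').lower()
    if len(str1) != len(str2):
        return False
    count = {}
    # Count each letter of the first string
    for letter in str1:
        count[letter] = count.get(letter, 0) + 1
    # Subtract for the second string; a letter with no count left fails at once
    for letter in str2:
        if count.get(letter, 0) == 0:
            return False
        count[letter] -= 1
    # Equal lengths and no failed subtraction mean every count is back to zero
    return True
```

This runs in O(n) time, compared with O(n log n) for the sorting approach.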

Check GitHub for the full working code.

I will keep adding more problems/solutions.

Stay tuned!

Ref: The inspiration for implementing DS in Python comes from this course

Choropleth Maps in Python

Choropleth maps are a great way to represent geographical data. I have made basic choropleth maps for two different data sets, using a Jupyter notebook to show the plots.

World Power Consumption 2014

First, do the Plotly imports:

import plotly.graph_objs as go
from plotly.offline import init_notebook_mode,iplot
init_notebook_mode(connected=True)

The next step is to fetch the dataset. We’ll use the Python pandas library to read the CSV file:

import pandas as pd
df = pd.read_csv('2014_World_Power_Consumption')

Next, we need to create the data variable, which contains a dict:

data = dict(type='choropleth',
            locations=df['Country'],
            locationmode='country names',
            z=df['Power Consumption KWH'],
            text=df['Country'],
            colorbar={'title': 'Power Consumption KWH'},
            colorscale='Viridis',
            reversescale=True)

Let’s make a layout

layout = dict(title='2014 World Power Consumption',
              geo=dict(showframe=False, projection={'type': 'mercator'}))

Pass the data and layout to go.Figure and plot it using iplot:

choromap = go.Figure(data = [data],layout = layout)
iplot(choromap,validate=False)

The output looks like this:

Check GitHub for the full code.

In the next post I will try to make a choropleth for a different data set.

References:

https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp

https://plot.ly/python/choropleth-maps/

Web development with LAMP: which programming language should be used? Some thoughts

Nowadays people keep asking which technology stack should be used for web development (LAMP, Java, Microsoft) and, finally, which programming language, mainly on the server side. Most experts say to use whichever you like and are comfortable with, and I totally agree. If you intend to use a Java or Microsoft based environment, you don’t have much choice. But if you are using the LAMP stack, you have a lot of options, so the question arises again: which language should be used? Again, I personally think the decision should mainly be based on requirements, experience, comfort, team, etc. Still, here is my take, based on my own limited experience working with these languages:

Perl:
Pros: The old fellow, still widely used. Very powerful, secure, and well tested over the years in web development, with a very good reputation among users and a huge collection of open source libraries. New frameworks like Dancer and Mojolicious are a positive sign.
Cons: Difficult to maintain (messy syntax, etc.), hard to find resources, and the industry is not very positive about its future versions.

Python:
Pros: Powerful. Widely used for handling scientific data, in academia, in analytics, and by system administrators. Market sentiment is positive, and it has very good frameworks like Django.
Cons: Less flexible; performance issues, mainly with threading.

PHP:
Pros: Most preferred language, widely used, fast development, big community, huge available resource pool.
Cons: Some reported security loopholes, Less trustworthy, Market image as cheap and dirty option for quick development, multi-threading issue, debugging issues.

Ruby:
Pros: Very flexible, good support, positive image in communities, Very popular framework for web development (ROR).
Cons: Some benchmarks show that its request-response time is a bit slower than others in the same category, and getting good resources can be difficult.

Again, things differ from project to project, so choose based on your own requirements.

I personally prefer Perl 5.