WebDAV: an old horse but still useful!

My first encounter with WebDav protocol back in 2004-05 when I was writing the web part for SharePoint. WebDav is an old protocol of the ’90s but still very useful in certain scenarios.

WebDAV (RFC 4918) is an extension to HTTP, the protocol that web browsers and web servers use to communicate with each other. The WebDAV protocol enables a webserver to behave like a fileserver too, supporting collaborative authoring of web content. For an example, one can edit a word document online directly on the server using WebDav protocol. A web server that supports WebDAV simultaneously works like a fileserver. So, that’s a powerful capability.

In many of its use cases, WebDAV is being replaced by more modern mechanisms like Wikis, cloud solutions, etc. But it is still a reliable workhorse when the right servers and clients are matched, so it’s still encountered in many different applications.

Some of the servers which have implemented WebDav:

  • Apache HTTP Server
  • Microsoft IIS
  • Box.com
  • WordPress
  • Drupal
  • Microsoft Sharepoint
  • Subversion
  • Git
  • Microsoft Office
  • Apple iWork
  • Adobe Photoshop etc.

References:
WebDav RFC: https://tools.ietf.org/html/rfc4918

Reinforcement Learning Explained in brief for a layperson

As we know, Machine Learning algorithms can broadly be divided into 3 main categories:

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning (RL)

Let’s understand in layman term what is reinforcement learning. The main thing RL does is Learning Control – This is neither supervised or unsupervised learning but typically these are problems where you are learning to control the behavior of a system.

Example:

How to cycle.
Remember the days when you are trying to ride a cycle…. It’s trial and error. Actually, it is some kind of feedback which is not fully unsupervised. So we can say that this is a type of learning where you are trying to control the system with trial and error and with minimum feedback. RL learns from the close interaction with the environment, close interaction means in this context is that an agent senses that state of the environment and takes the appropriate action. So the agent takes feedback from the close environment and we typically assume that the environment is stochastic means every time you take action you are not getting the same response from the env.

Apart from the feedback, there is an evaluation measure from the env which tells how well you are performing in a particular task. So each Reinforcement learning algorithm’s goal is to implement a policy that maximizes some measure of long term performance.

Just to summarize:

Reinforcement learning algorithm:

  • Learn from close interaction
  • Stochastic environment
  • Noisy delayed scalar evaluation
  • Learn policy – Maximize a measure of long term performance

Some applications:        

  • Game playing  – Games like backgammon (One of the oldest board game), Atari
  • Robot navigation
  • Helicopter pilot      
  • VLSI placement 

This was a brief introduction to RL for an easy understanding of the concept. For further study look for a good book or course.

My recommendation:

Happy learning!

elasticserach FScrawler intermediate file location? help

I am looking for a method that can intercept the output of FSCrwaler before this gets passed to the elastic engine for indexing.

Just take a case, what I have is a PDF file which I wanted to index plus I also have some attributes/metadata for the same PDF which are stored in the database. I want to index both the content (PDF file content + attributes stored in PostgreSQL) in a single index file so that I can refine my search criteria and get correct results. As far I understand for PDF indexing text needs to be extracted from the file and stored in a text format and then passed to the indexing engine. I am relatively new (an HTDIG veteran -:D ) to the elastic ecosystem so looking for a way to intercept the text extracted from a PDF file so that I can append other text (in this scenario fetched from PostgreSQL) and then pass to the indexing mechanism, a kind of single index file for both the content.

Wondering if anyone has encountered this kind of scenario and can provide some pointers? It will be good to know that where does FSCrawler stores intermediate files created during the indexing process in elastic search? Can we intercept them and add some custom info?

Comments will be much appreciated!

Thanks!

Debugging a crucial skill but very rarely taught

Listening on software engineering radio podcast Diomidis Spinellis mentions how debugging is so much important in software development but still, we don’t teach much in-depth this skill in our universities. I and believe any other programmer will agree that debugging tools are the key arsenal in fixing bugs and even understanding the system.

Either you use modern tools or just by basic print/printf statements that don’t matter. Students should learn these key skills and professors should emphasize on educating and not only in universities even in industry set-up when a new developer joins in there should be good exposure to debugging so that they dissect code base and become productive fast.

Worth considering I think …

What do you think? Please share in comments.

Date print formats Perl Date::Manip

Perl Date::Manip is one of the modules which I use a lot. It’s a wonderful lib and has very clean API with great documentation. Below is a quick look at Date::Manip print format options which sometimes is very handy. For detailed interpretation and other options encourage to go through Date::Manip on CPAN

Example:

my $present_date_hash = Date::Manip::Date->new("today");
my $present_date = $present_date_hash->printf("%Y-%m-%d %H:%M:%S");

Happy Coding!

also, can check my previous article on generating date patterns using Date::Manip

Algorithms performance: big O notation: simplified short notes

 The big O notation is used to analyze runtime time complexity. big O notation provides an abstract measurement by which we can judge the performance of algorithms without using mathematical proofs. Some of the most common big O notations are:

  • O(1) : constant: the operation doesn’t depend on the size of its input, e.g. adding a node to the tail of a linked list where we always maintain a pointer to the tail node.
  • O(n): linear: the run time complexity is proportionate to the size of n.
  • O(log n): logarithmic: normally associated with algorithms that break the problem into similar chunks per each invocation, e.g. searching a binary search tree.
  • O(n log n): just n log n: usually associated with an algorithm that breaks the problem into smaller chunks per each invocation, and then takes the results of these smaller chunks and stitches them back together, e.g, quicksort.
  • O(n2): quadratic: e.g. bubble sort.
  • O(n3): cubic: very rare
  • O(2n): exponential: incredibly rare.

Brief explanation:     
Cubic and exponential algorithms should only ever be used for very small problems (if ever!); avoid them if feasibly possible. If you encounter them then this is really a signal for you to review the design of your algorithm always look for algorithm optimization particularly loops and recursive calls. 

The biggest asset that big O notation gives us is that it allows us to essentially discard things like hardware means if you have two sorting algorithms, one with a quadric run time and the other with a logarithmic run time then logarithmic algorithm will always be faster than the quadratic one when the data set becomes suitably large. This applies even if the former is ran on a machine that is far faster than the latter, Why?

Because big O notation isolates a key factor in algorithm analysis: growth. An algorithm with quadratic run time grows faster than one with logarithmic run time.

Note: The above notes are for quick reference. Understanding algorithmic performance is a complex but interesting field. I would recommend picking a good book to understand the nitty-gritty of big O and other notations.

Python learning: scrapbook

While browsing my Evernote I found a scrapbook which I have made while learning Python some years back. Thought to share if this helps someone. I am pasting directly (no editing so there might be some spell and grammar mistake).

  • Python 2 division, just use integer part (3/2=1) whereas Python 3 uses real division 3/2 = 1.5
  • Strings in Python are immutable means you can’t change the in-place value of a char. Once string is created you can’t change/replace its elements
  •  s= “Hello World” s[::-1] this will reverse string s “dlroW olleH” double colon is used to tell the range and also how many elements can be skipped
  • if you want to use Python 3 functions in Python 2 then use ‘from __future__ import print_function‘ and similarly other functions 
  • List are mutable but tuples are not mutable (does not support item assignment) aka immutable, fewer methods in tuples then why to use instead of a list? The key is immutability. in a program if you want sequence/Val does not to get changed then tuple is a solution e.g.; storing calendar dates which know will not change during your programs. 
  • Set is a collection of un-ordered unique items it looks like a dictionary (in notation) but only keys which are unique. It can help in removing repeated items means you can use set to cast list.
  • List comprehensive are an excellent way to write clean and efficient code – they are actually de-constructed for loop flatted out in a list
  • Lambda expressions can be used to shorten function this is really useful when used with map(), reduce() and filter() functions
  • First class functions: Treat functions like any other object, we can pass functions, we can return functions, we can assign functions to a variable
  • Closure: Closure takes advantage of first-class functions and returns inner functions and variables local to them.
  • Decorators: It is a function which takes another function as an argument and returns as a function without changing the source code of the original function. Decorator allows easily to add functionality inside our wrapper without modifying original function. 

Note: These are notes for quick reference. If you are serious in learning Python I encourage you to take a book or a tutorial.

Enjoy learning! More to come …

Perl development journey and books collection

While doing routine cleaning of my personal library I was surprised to see the Perl book collection I have made over the period of time. My Perl dev journey started in a full-fledged manner way back in fall 2007. Prior to that was mainly developing using C, C++, assembly language. My first impression with Perl was not very exciting mainly due to ugly syntax and the way the OO is achieved and being from C++ background initially it was really difficult to grasp. But over the years working with language and while developing a large scale web application I learned a lot of nitty-gritty of the language and still learning…  Today I can vouch for Perl for its speed, portability, great module system CPAN and excellent dedicated community. Thanks to all the module authors and contributors on Perl Monks and StackOverflow. You guys are amazing! 
Now, the books which helped me immensely to wrote better Perl programs.

New coder and Experienced coders

I’ve found that a big difference between new coders and experienced coders is faith: faith that things are going wrong for a logical and discoverable reason, faith that problems are fixable, faith that there is a way to accomplish the goal. The path from “not working” to “working” might not be obvious, but with patience, you can usually find it.

Read the above quote somewhere on medium, myself as a programmer can easily sync with the saying. Have observed this many many times in my coding experience of the last two decades. To add, patience is needed much more while dealing with critical real-time production bugs.

I personally feel that analysis of the problem with a cool head and focus and pateince is is the key to debug and resolve critical problems.

Happy Coding!