Apache Lucy search examples

I have been investigating search engines, and this time it is Apache Lucy 0.4.2. Below are a basic indexer and a small search application. The indexer takes documents one by one and adds them to the index. The search script takes the search term as a command-line argument and prints the matching documents.

This is a pure command-line utility, just to show how basic indexing and searching work with Apache Lucy.

indexer.pl

#!/usr/local/bin/perl

use strict;
use warnings;
use Lucy::Simple;

#
# Ensure the index directory is both available and empty.
#
my $index = "/ppant/LucyTest/index";
system( "rm", "-rf", $index );
system( "mkdir", "-p", $index );
# Create the helper: a new Lucy::Simple object
my $lucy = Lucy::Simple->new( path => $index, language => 'en' );

# Add the first "document". (We are mainly adding metadata about the document.)
my %one = (
    title => "This is a title of first article",
    body  => "some text inside the body we need to test the implementation of lucy",
    id    => 1,
);
$lucy->add_doc( \%one );

# Add the second "document".
my %two = (
    title => "This is another article",
    body  => "I am putting some basic content, using some words which are also in first document like implementation",
    id    => 2,
);
$lucy->add_doc( \%two );

# Both documents are now indexed under $index

Once the documents are indexed, we'll write a small search script.

search.cgi

#!/usr/local/bin/perl

use strict;
use warnings;

use Lucy::Search::IndexSearcher;

my $term = shift or die "Usage: $0 search-term\n";

my $searcher = Lucy::Search::IndexSearcher->new( index => '/ppant/LucyTest/index' );
# A basic command-line search: look up the term given as the first argument,
# print the number of hits, then the title and ID of each matching document.
my $hits = $searcher->hits( query => $term );
print "Total hits: ", $hits->total_hits, "\n";
while ( my $hit = $hits->next ) {
    print "Title: $hit->{title} - ID: $hit->{id}\n";
}
# End of search.cgi


***********************************************************************

If you want to explore further, see the full code on GitHub.

Citrus Perl Raspberry Pi dev

Anyone interested in GUI Perl development on the Pi? Follow the links below and download the distribution from the SourceForge project site.

I have been using Citrus Perl on the Pi (Raspbian Wheezy) for quite some time with no major issues.

Enjoy GUI development on the Pi.

Links:

http://raspberrypi.citrusperl.com/perl

http://www.citrusperl.com/

 

Out of memory Cent OS Linux: unicode_start hang issue: Solved

My CentOS 5.6 (RHEL-based) virtual machine suddenly became very slow and almost died 🙁. After a few minutes I saw an “Out of memory” message on the console. After a bit of analysis and searching, I found that this is a known issue that is reported in the Red Hat bug tracker (and resolved in upstream versions).
Reason: setting BASH_ENV=~/.bashrc causes /etc/profile to run, which sources /etc/profile.d/lang.sh. If TERM=linux, lang.sh runs /bin/unicode_start, which sends the whole process into an infinite loop.

I found two solutions (as of now):

1. Open /etc/sysconfig/i18n in vi. This file contains references to Unicode (UTF-8), and we have to take out all of them.

Code before:


LANG="en_US.UTF-8"
SUPPORTED="en_US.UTF-8:en_US:en"
SYSFONT="latarcyrheb-sun16"

Code after:


LANG="en_US"
SUPPORTED="en_US:en"
SYSFONT="latarcyrheb-sun16"

2. Another solution is to change the shell from “bash” to “sh” in the unicode_start script.
Open /bin/unicode_start in vi and change the first line from “#!/bin/bash” to “#!/bin/sh”.
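
If you would rather script the two edits than make them by hand in vi, something along these lines should work. This is only a sketch: the function names strip_utf8_locale and use_sh_shebang are made up for this example, and the demo runs on scratch copies in a temporary directory, not on the real files under /etc and /bin.

```shell
#!/bin/sh
# Sketch only: both helpers edit the file you pass in (keeping a .bak copy),
# so try them on a copy first. The function names are hypothetical.

# Fix 1: remove the UTF-8 references from an i18n-style file.
strip_utf8_locale() {
    sed -i.bak \
        -e '/^SUPPORTED=/s/[^":]*\.UTF-8://g' \
        -e 's/\.UTF-8//g' "$1"
}

# Fix 2: change a "#!/bin/bash" first line to "#!/bin/sh".
use_sh_shebang() {
    sed -i.bak '1s|^#!/bin/bash|#!/bin/sh|' "$1"
}

# Demo on scratch copies instead of the real system files.
demo_dir=$(mktemp -d)

cat > "$demo_dir/i18n" <<'EOF'
LANG="en_US.UTF-8"
SUPPORTED="en_US.UTF-8:en_US:en"
SYSFONT="latarcyrheb-sun16"
EOF
strip_utf8_locale "$demo_dir/i18n"
cat "$demo_dir/i18n"

printf '#!/bin/bash\necho placeholder body\n' > "$demo_dir/unicode_start"
use_sh_shebang "$demo_dir/unicode_start"
head -n 1 "$demo_dir/unicode_start"
```

The first sed expression drops whole "xx.UTF-8:" entries from the SUPPORTED line and the second strips any ".UTF-8" suffix that is left, so the scratch file ends up like the "Code after" listing. On a real machine you would point the helpers at /etc/sysconfig/i18n or /bin/unicode_start as root, after checking the .bak copy can be restored.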

References: https://bugzilla.redhat.com/show_bug.cgi?id=622981

Solving the authentication problem while opening Office documents hosted on Apache in IE8/IE9 on Windows 7

We were facing a problem in IE 8/9 on Windows 7 when opening Office 2007/Office 2010 documents hosted on Apache (CentOS 4.6). After some analysis I found the cause and eventually a fix. My findings and the solution are below. I hope they help.

The main issue is Microsoft’s way of implementing the WebDAV protocol for accessing web content through the Microsoft Web Client. When we click on an Office document, the web client sends an HTTP/1.1 OPTIONS request to the server to check for WebDAV support (my server doesn’t have WebDAV). Apache answers with a 200 OK response, which leads Windows 7 to prompt for authentication.

IE does have an option to pass the login credentials automatically, but that would be a security risk because it exposes your machine’s credentials to the internet, so I would not recommend it. The best way is to configure Apache to reject these requests. This is how I solved it. The changes go in the httpd.conf file in the /etc/httpd/conf folder (CentOS 4.6).

# One way of doing it: deny access based on the request method


RewriteEngine On
RewriteCond %{REQUEST_METHOD} ^(OPTIONS|PROPFIND)$ [NC]
RewriteRule ^.*$ - [F,L]

# Another way of implementing it: deny access based on the user agent (Vista and Windows 7 use the same user agent with different version numbers, so this regex should work for both)

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Microsoft-WebDAV-MiniRedir
RewriteRule ^.*$ - [F,L]

Explanation of the flags:

1. The [F] flag causes the server to return a 403 Forbidden status code to the client.

2. The [NC] flag makes the pattern match case-insensitively. Here it is set on the RewriteCond, so the request method is compared without caring whether the letters are upper-case or lower-case.

3. The [L] flag causes mod_rewrite to stop processing the rule set. In most contexts, this means that if the rule matches, no further rules will be processed. This corresponds to the last command in Perl.
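
To get a feel for what the two conditions catch, the patterns can be tried locally with grep. This is a rough sketch, not Apache itself: mod_rewrite uses its own regex engine, and the function names and the sample user agent string are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: imitates the two RewriteCond patterns with grep so you can see
# which requests they would catch. Function names are hypothetical.

# Mirrors: RewriteCond %{REQUEST_METHOD} ^(OPTIONS|PROPFIND)$ [NC]
# grep -i plays the role of the [NC] flag.
method_is_blocked() {
    printf '%s\n' "$1" | grep -Eiq '^(OPTIONS|PROPFIND)$'
}

# Mirrors: RewriteCond %{HTTP_USER_AGENT} ^Microsoft-WebDAV-MiniRedir
# There is no [NC] on this condition, so case matters here.
agent_is_blocked() {
    printf '%s\n' "$1" | grep -Eq '^Microsoft-WebDAV-MiniRedir'
}

for m in OPTIONS options PROPFIND GET POST; do
    if method_is_blocked "$m"; then
        echo "$m: would get 403"
    else
        echo "$m: passed through"
    fi
done

# Assumed example value for a WebDAV mini-redirector user agent.
if agent_is_blocked 'Microsoft-WebDAV-MiniRedir/6.1.7601'; then
    echo "WebDAV mini-redirector: would get 403"
fi
```

Against a live server, a request such as curl -i -X OPTIONS http://your-server/ (with your own host name in place of the placeholder) should show a 403 status line once either rule set is active.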

 Some References:

Microsoft knowledge base article on authentication requests from Office documents

Apache mod_rewrite rule documentation

Fiddler, a tool for debugging HTTP requests

Installing Microsoft TrueType fonts (TTF) on CentOS 4.6 and RHEL 4

A default Linux installation (CentOS in this case) doesn’t include TrueType fonts. Applications such as OpenOffice and PDF generators need the proper fonts to embed. Without them they fall back to the free system fonts, which can cause problems such as PDFs not displaying content properly. It’s always a good idea to install the MS core fonts. You can also buy more fonts if you need them.
To install the MS core fonts, follow the steps below (logged in as root):
  • Install cabextract: change to the download folder and run the following command:
# rpm -ivh cabextract-0.6-1.i386.rpm

  • Build the font RPM: change to the download folder and run the following command:
# rpmbuild -bb msttcorefonts-1.3-4.spec

This step downloads the Microsoft CAB files, extracts the fonts and builds an RPM. It uses system utilities (wget, rpm-build, chkfontpath, fc-cache, ttmkfdir), so make sure they are installed and that outbound HTTP (port 80) is open. The process downloads the executable for each of the font files.
The RPM is created as /usr/src/redhat/RPMS/noarch/msttcorefonts-1.3-4.noarch.rpm
  • Install the font RPM: change to /usr/src/redhat/RPMS/noarch/ and run the following command:
# rpm -ivh msttcorefonts-1.3-4.noarch.rpm

  • Restart the X font server (xfs):
# /sbin/service xfs restart

Now you can check the newly installed fonts in /usr/X11R6/lib/X11/fonts/TTF
Important note: Sometimes you might need to modify the msttcorefonts-1.3-4.spec file to add the new address of the mscorefonts location on sourceforge.net. If you still have problems downloading the font files, you can use my own font RPM [msttcorefonts-1.3-4.noarch.rpm] and install it directly with # rpm -ivh msttcorefonts-1.3-4.noarch.rpm.
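
A quick way to confirm that the fonts actually landed is to count the .ttf files in the font directory. A minimal sketch, assuming the fonts end up in the directory named above; count_ttf is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: count the TrueType files in a font directory.
# count_ttf is a hypothetical helper; the default path is the one from this post.
count_ttf() {
    dir=${1:-/usr/X11R6/lib/X11/fonts/TTF}
    # -iname matches .ttf and .TTF alike; a missing directory simply counts as 0.
    find "$dir" -type f -iname '*.ttf' 2>/dev/null | wc -l | tr -d ' '
}

echo "TrueType fonts found: $(count_ttf)"
```

Pass a different directory as the first argument if your fonts live elsewhere. If fontconfig is installed, fc-list piped through grep -i for a font name such as arial is another way to check.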
Good Luck!

Enjoy the new fonts on your Linux machine.

Apache JIRA compromised

This is how Apache JIRA was hacked.

See the details at https://blogs.apache.org/infra/entry/apache_org_04_09_2010

If you have an account on apache.org for any of its projects, you might have received mail from them asking you to change your password. If not, change the password for your Apache account anyway (JIRA, Bugzilla etc.).

I must say this is a big lesson. Let’s be a bit more alert.