PDL: A way to deal with larger arrays

If you work in a scientific domain, or use BioPerl, and need to process data in bulk, you should seriously consider learning PDL (“Perl Data Language”). PDL is a way to deal with large arrays in Perl. It allows large N-dimensional data sets, such as large images and spectrograms, to be stored efficiently and manipulated quickly.

In the words of Karl Glazebrook, initiator of the PDL project:

    “The PDL concept is to give standard perl5 the ability
    to COMPACTLY store and SPEEDILY manipulate the large
    N-dimensional data sets which are the bread and butter
    of scientific computing. e.g. $a=$b+$c can add two
    2048x2048 images in only a fraction of a second.”
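
The element-wise addition mentioned in the quote can be sketched as follows. This is a minimal illustration, assuming the PDL module is installed from CPAN. The variable names and the small 3x3 size are my own choices, and I use $x and $y rather than $a and $b because the latter are the special variables used by Perl's sort.

```perl
use strict;
use warnings;
use PDL;

# Two small 3x3 arrays stand in for the 2048x2048 images of the quote.
my $x = sequence(3, 3);   # values 0..8 laid out in a 3x3 array
my $y = ones(3, 3);       # a 3x3 array filled with 1

# One expression adds the arrays element by element; no explicit loops.
my $total = $x + $y;

print $total;
```

The same expression works unchanged for much larger arrays, for example ones created with zeroes(2048, 2048).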

PDL is well suited for matrix computations, general handling of multidimensional data, image processing, general scientific computation, and numerical applications. It supports I/O for many popular image and data formats, as well as 1D (line plots), 2D (images) and 3D (volume visualization, surface plots via OpenGL) graphics.

At the time of writing, the latest stable release of PDL is PDL-2.4.7, which can be found on CPAN or SourceForge.

Happy learning.

Perl 6 documentation and screencasts

Perl 6 is a new language from the Perl community. As always, good documentation plays a key role in learning a new language quickly. Plenty of resources are listed on the official Perl 6 documentation page hosted by the Perl Foundation.

I also found some interesting screencasts by Gabor Szabo, which were very helpful for understanding Perl 6 data structures.

Happy learning!

Data Serialization in Perl

Before I dig into the details, let’s go over some basic facts about serialization.

What exactly is serialization?

Serialization is the process of converting a data structure or object into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and “resurrected” later in the same or another computer environment. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object.

Serialization in Perl:

In Perl, data represented as flat key-value pairs can be stored in a file, for example by tying a hash to a file. This makes the data persistent, so that it can be retrieved and manipulated by many independent, but possibly related, programs. However, if the structure of the data is more complex, a plain hash of strings is not an adequate representation. For example, if the data item is an anonymous array of arrays, a hash of hashes, or an object belonging to some class, tying a file to a hash is not enough to save the data to a file. In such cases we need serialization, which converts the contents of any complex data structure into a string by following a well-defined algorithm.
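
As a minimal sketch of this idea, the snippet below flattens a hash of hashes into a single string and rebuilds it, using the freeze and thaw functions of the core Storable module. The %inventory data is made up for illustration. Note that the string Storable produces is binary rather than human-readable.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# A nested structure (hypothetical data): a hash of hashes with array values.
my %inventory = (
    apple  => { count => 3, tags => ['red', 'fruit'] },
    banana => { count => 5, tags => ['yellow'] },
);

# Serialize: the whole structure becomes one (binary) string.
my $frozen = freeze(\%inventory);

# Deserialize: rebuild an independent copy from that string.
my $copy = thaw($frozen);

print "apple count: $copy->{apple}{count}\n";
print "banana tag:  $copy->{banana}{tags}[0]\n";
```

Because $frozen is an ordinary scalar, it can be written to a file, stored as a value in a DBM file, or sent over a socket.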

The string produced by the serialization algorithm can be deserialized later in the program when and if necessary. The serialized data structure can be sent over socket connections or used in remote procedure calls, as long as the receiving end deserializes it using the matching algorithm. The serialized string can be stored in a text file, in a DBM file if it can be associated with a key, or in a database. The only requirement is that the string must be deserialized before the data is used again. Several Perl modules provide serialization and deserialization facilities, including FreezeThaw, Storable, Data::Dumper, Data::Denter, XML::Dumper, and JSON::XS. A module called Data::Serializer provides a common interface to some of these serialization modules. The default module used by Data::Serializer is Data::Dumper.
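
To give a rough feel for the text-based approach, here is a small sketch using the core Data::Dumper module directly, not through Data::Serializer. Data::Dumper serializes a structure as Perl source code, and evaluating that code rebuilds the structure. The $record data is invented for illustration, and string eval should only be used on data you trust.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so the serialized text is the same on every run.
$Data::Dumper::Sortkeys = 1;

# Hypothetical record mixing a scalar, an array and a nested hash.
my $record = {
    name   => 'sample',
    scores => [90, 85],
    meta   => { lang => 'perl' },
};

# Serialize: the result is Perl code of the form "$VAR1 = { ... };".
my $text = Dumper($record);

# Deserialize: evaluate that code with a lexical $VAR1 in scope.
my $restored = do {
    my $VAR1;
    eval $text;
    die "could not deserialize: $@" if $@;
    $VAR1;
};

print "name: $restored->{name}\n";
print "second score: $restored->{scores}[1]\n";
```

The serialized text is human-readable and can be saved to a plain text file, at the cost of being larger and slower to restore than a binary format such as Storable's.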

In my next post I will try to show some examples.