#! /usr/bin/perl -w

use strict;
#use lib ".";
use MRML::Server::CLIAdapter;
use Getopt::Long;

=pod

=head1 NAME

CLIAdapter.pl - Adapting a command line based tool to MRML

=head1 SYNOPSIS

perl CLIAdapter.pl --config-file Misc/cli-adapter.config \
                   --image-list Misc/list-of-images.txt 

=head1 DESCRIPTION

The goal of this tool is to turn a Content Based Image Retrieval 
(CBIRS) tool that is able to do Query By Example into an MRML server. 
To use CLIAdapter.pl, you will have to edit a configuration file.
You will find one in this distribution in Misc/cli-adapter.config; the
format of the configuration file is documented below.

After startup, CLIAdapter.pl will listen on port 10101 (unless another
port is given with --port) for incoming MRML messages and will answer
communication requests etc. such that the client will be satisfied.
I<Queries> will be translated into calls to your command line tool.
I<Results> will be collected from your command line tool and then
translated into MRML responses that will be sent to the MRML client.


=head1 MOTIVATION

At the origin of this tool is the Benchathlon contest. The Benchathlon
is an international initiative working on providing a common benchmark
for CBIR tools. Such a benchmark draws its value from wide use. In
particular, we want command-line tools to be comparable to
client-server retrieval systems.

Furthermore, we want overhead to be low. We want small groups focusing
on image retrieval algorithms being able to compete with industrial
players and large research groups that are able to afford full-time
software engineers.

=head2 SURVEY: PLEASE CONTRIBUTE!

The next step in making this tool was a serial mail that the author of
this file sent out to a number of CBIRS professionals. There were only
two detailed responses: one concerning a client/server based tool, and
one concerning a command line based tool. Our contact person for the
command line based tool was Dr. Tuytelaars from Prof. van Gool's group
at the KU Leuven in Belgium (thank you very much!).

If you are not content with what this tool offers, please do not
hesitate to describe your system to wolfgang.mueller@cui.unige.ch,
and we will try to extend this tool to suit your needs.

=head1 OPTIONS

Mandatory options:

=over 4

=item 

--config-file <CONFIGFILE>
CONFIGFILE has to be a valid configuration file
as described in the "CONFIGURATION" section.


=item 

--image-list <IMAGELIST>
The file given by <IMAGELIST> contains a list of image URLs, one URL per line.
These URLs should be I<remote> image URLs that should be accessible by a 
web client. The term I<remote URL> is to be understood in the same sense
as in the description of the variable REMOTE_URL_PREFIX given in the
"CONFIGURATION" section.

=back

OPTIONS that have default values:

=over 4

=item

--host <HOST>
The server will accept connections to <HOST>. The default is the value
of the HOST environment variable, or "localhost" if HOST is not set.


=item

--port <PORT>
The port on which the server listens. The default is 10101.

=item

--temporary-directory <TMPDIR>
The directory in which temporary files are written. You need write
permission for <TMPDIR>. The default is /tmp/.

=back

=head1 CONFIGURATION

=head2 CONFIGURATION FILE FORMAT

=head3 BASICS

In the following, we will describe how a CLIAdapter can be configured
using the cli-adapter.config file. Here we quickly summarise the
format. 

Each line is either 

  empty

  a comment (if it starts with a "#")

  a variable definition.

Variable definitions start with the variable name (this name I<must>
start in the first column of the line), I<immediately> (no whitespace!)
followed by a colon (":") and then I<immediately> followed by the value
assigned to the variable. Any whitespace in the line is significant.

Each variable definition can contain I<variable expansions>,
i.e. uses of previously defined variables. Identifiers
preceded by the dollar sign $ will be expanded; all other sequences of
alphanumeric characters are interpreted as strings and remain unchanged.
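
The format described above can be read with very little code. The
following is a minimal sketch and not the code used by
MRML::Server::CLIAdapter. The helper names C<parse_config> and
C<expand_variables> are hypothetical. The sketch assumes that variables
are expanded when a definition is read, and that names which are not
defined yet (such as the system variables C<QUERY> or C<URL_POSTFIX>)
are left untouched for later expansion.

```perl
use strict;
use warnings;

# Hypothetical helper: expand $NAME references using the variables
# defined so far. Unknown names are left as they are (assumption).
sub expand_variables {
    my ($value, $vars) = @_;
    $value =~ s/\$([A-Za-z_][A-Za-z0-9_]*)/exists $vars->{$1} ? $vars->{$1} : '$' . $1/ge;
    return $value;
}

# Hypothetical helper: parse the text of a configuration file.
# Returns a reference to a hash mapping variable names to values.
sub parse_config {
    my ($text) = @_;
    my %vars;
    for my $line (split /\n/, $text) {
        next if $line eq '';       # empty line
        next if $line =~ /^#/;     # comment line
        # NAME:VALUE, name in the first column, no whitespace around ":"
        if ($line =~ /^([A-Za-z_][A-Za-z0-9_]*):(.*)$/) {
            # all whitespace in the value is kept
            $vars{$1} = expand_variables($2, \%vars);
        }
    }
    return \%vars;
}
```

With this sketch, a file containing the two lines C<LOCAL_URL_PREFIX:.>
and C<QUERY_ITEM:$LOCAL_URL_PREFIX$URL_POSTFIX > (with a trailing space)
yields a C<QUERY_ITEM> value in which the local prefix is expanded, the
postfix variable is still present, and the trailing space is preserved.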

=head3 RESERVED VARIABLES

The following variables are to be set by you, the user of the
CLIAdapter.pl. Exceptions to the rule (i.e. variables set by the
system) are explicitly marked.

The following variables are related to I<query formulation>:

=over 4

=item *

C<COMMAND_LINE_TO_START> contains a template of the command
line that is to be executed by CLIAdapter.pl on each query. You can use
here any variable that has been defined within the config file, and
the reserved variables C<QUERY> and C<TMP_FILE>. After the command
has finished, CLIAdapter.pl will expect the results to be in the file
named by $TMP_FILE.

=item *

C<QUERY> contains the complete query string assembled from C<QUERY_ITEM>s.

=item *

C<MAX_NUMBER_OF_QUERY_IMAGES_PER_CALL> is the maximum number of query
images per call of your tool. For retrieval systems that allow only one
query image, choose "1" here. If your system allows an arbitrary
number of images, choose a large value. 9999 should
be enough for the Benchathlon.

=item *

C<REMOTE_URL_PREFIX>:
when a query comes in, this prefix is 
REMOVED from the front of all query image URLs.

=item *

C<LOCAL_URL_PREFIX>: this prefix I<can> be prepended
to query image URLs after the remote URL prefix
has been removed. Whether this is done depends on the
C<QUERY_ITEM> template.

=item *

C<QUERY_ITEM>: in query by example, a query will consist of a list of
images, possibly with assigned relevances. We call such a group of
values (e.g. URL, relevance) a I<query item>. We assume that each
query will be expressed on the command line as a string of query
items. C<QUERY_ITEM> is a template that describes each query item.

Currently we can use in C<QUERY_ITEM> two special variables that are
set by the system for each query item:

=over 4

=item -

C<URL_POSTFIX> is the I<URL postfix>, i.e. the query image URL with
the C<REMOTE_URL_PREFIX> removed. 

=item -

C<RELEVANCE_LEVEL> is the relevance level assigned to the image. This can
be a value between 0 and 1.

=back

=back
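
The interplay of the query formulation variables can be sketched as
follows. This is an illustration under stated assumptions, not the
implementation in MRML::Server::CLIAdapter: the helper names
C<fill_template> and C<build_command_line> are hypothetical, query
items are assumed to be simply concatenated to form C<QUERY>, and the
configuration is assumed to be available as a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $NAME placeholders in a template by the
# values found in a hash; unknown names are left untouched.
sub fill_template {
    my ($template, $vars) = @_;
    (my $out = $template) =~ s/\$([A-Za-z_][A-Za-z0-9_]*)/exists $vars->{$1} ? $vars->{$1} : '$' . $1/ge;
    return $out;
}

# Hypothetical helper: build the command line for one query.
# $config: hash ref of config variables
# $items:  array ref of [ remote URL, relevance level ] pairs
sub build_command_line {
    my ($config, $items, $tmp_file) = @_;
    my $remote_prefix = $config->{REMOTE_URL_PREFIX};
    my $query = '';
    for my $item (@$items) {
        my ($url, $relevance) = @$item;
        # remove the remote prefix from the front of the URL
        (my $postfix = $url) =~ s/^\Q$remote_prefix\E//;
        my %item_vars = (%$config,
                         URL_POSTFIX     => $postfix,
                         RELEVANCE_LEVEL => $relevance);
        # assumption: QUERY is the concatenation of all query items
        $query .= fill_template($config->{QUERY_ITEM}, \%item_vars);
    }
    my %call_vars = (%$config, QUERY => $query, TMP_FILE => $tmp_file);
    return fill_template($config->{COMMAND_LINE_TO_START}, \%call_vars);
}
```

Note that in this sketch every character of the templates is copied
literally. If C<QUERY_ITEM> ends with a space and C<$QUERY> is followed
by a space in C<COMMAND_LINE_TO_START>, the resulting command line
contains two consecutive spaces, which is harmless for a shell.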

The following variables are for configuring I<result analysis>. They
will be used on reading a result file.

=over 4

=item *

C<SKIP_FIRST_LINES>: we expect the result file to start with a
fixed-length header, followed by a sequence of query results
sorted in descending order. C<SKIP_FIRST_LINES> specifies
how many lines will be skipped before the first result line is expected.

=item *

C<PARSE>: This template describes how each result line is read. Like
the other templates, it is comprised of strings and variables. Variables are
designated by the dollar sign $, followed by alphanumeric characters
or underscores. All characters between variables are interpreted as
regular expressions. In particular, a plain string matches itself.

There are two important variables:

=over 4

=item -

C<IMAGE>: the URL of the image returned in a given result line

=item -

C<SCORE>: this is the score the C<$IMAGE> has obtained.

=back

=back
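
One way to picture a C<PARSE> template is as a recipe for a regular
expression. The sketch below is an assumption about how this could be
done, not the code of MRML::Server::CLIAdapter. The helper names
C<compile_parse_template> and C<parse_result_line> are hypothetical,
and the sketch assumes that the value of a variable never contains
whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a PARSE template into a compiled regular
# expression and the list of variable names in capture order.
sub compile_parse_template {
    my ($template) = @_;
    my @names;
    my $pattern = '';
    # the capturing group makes split return the variable names too
    my @parts = split /\$([A-Za-z0-9_]+)/, $template;
    while (@parts) {
        # text between variables is used as a regular expression
        $pattern .= shift @parts;
        last unless @parts;
        push @names, shift @parts;
        $pattern .= '(\S+)';   # assumption: values contain no whitespace
    }
    return (qr/^$pattern$/, \@names);
}

# Hypothetical helper: parse one result line. Returns a hash ref
# mapping variable names to values, or nothing if the line does not match.
sub parse_result_line {
    my ($line, $regex, $names) = @_;
    my @values = $line =~ $regex or return;
    my %fields;
    @fields{@$names} = @values;
    return \%fields;
}
```

For the template C<$SCORE $IMAGE>, this sketch produces a pattern that
expects a score, exactly one space, and an image name, and fills the
variables C<SCORE> and C<IMAGE> from each matching line.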

That is all the syntax and semantics for the moment. If you need more
information, please contact wolfgang.mueller@cui.unige.ch and suggest
extensions to this document or to the corresponding Perl modules.

To give more insight into the use and the inner workings of this
program, we give a complete example in the following section.


=head2 EXAMPLE: MY_CBIRS_SIMULATOR.PL

=head3 COMMAND LINE

=head4 EXAMPLE

The tool we are looking at is a command line based tool that takes
one single image as input. We can choose the algorithm used (matching
function etc.) and we tell the tool to write its output into a
result file. A typical command line can look like:

  perl Misc/my_cbirs_simulator.pl ./image33.ppm my_algorithm result-image33.txt

=head4 EXPRESSING THE COMMAND LINE FORMAT IN CLI-ADAPTER.CONFIG

In the command line for calling Misc/my_cbirs_simulator.pl, we need
two variables: C<$QUERY> will be the query string that depends on the
actual query, and C<$TMP_FILE> will name the file that receives the
result. This makes our definition line in the config file (here and
below, "|" marks the start of the line and is not part of the file):

  |COMMAND_LINE_TO_START:perl Misc/my_cbirs_simulator.pl $QUERY my_algorithm $TMP_FILE

However, we also know that our query can contain at most one image, which 
makes it:

  |MAX_NUMBER_OF_QUERY_IMAGES_PER_CALL:1

We will receive queries for images on the Benchathlon server. The
Benchathlon server is remote, and the query URLs will have the prefix
C<http://www.benchathlon.net/img/done>, which makes the line in the
configuration file:

  |REMOTE_URL_PREFIX:http://www.benchathlon.net/img/done

Please note that there is I<no whitespace> before or after the URL prefix.

Locally, the images will be used within the current working directory.
This makes the local URL prefix

  |LOCAL_URL_PREFIX:.

("." without any spaces before or after).

Now we can assemble the query items:
for each query image we want to have
the URL generated by combining the local URL prefix
and the URL postfix of the query image, followed by one space " ",
which makes it:

  |QUERY_ITEM:$LOCAL_URL_PREFIX$URL_POSTFIX 

I<Note that the space at the end of the line counts!>

Voila! A query by example with example image 
C<http://www.benchathlon.net/img/done/image33.ppm> will now become 
a call to the command line

  perl Misc/my_cbirs_simulator.pl ./image33.ppm my_algorithm /tmp/some-unique-file.txt

Now we just need to parse the result file.

=head3 RESULT FILE

=head4 EXAMPLE

The result file is comprised of lines of the format

  |<score> <filename>

(The | marks the left limit of the line here.) For example, a result
file could look like

  |1 image15.ppm
  |0.95 image12.ppm
  |0.01 image01.ppm

=head4 EXPRESSING THE RESULT FILE FORMAT IN CLI-ADAPTER.CONFIG

There is no header, so we do not need to skip any lines at the 
beginning of the result file:

  |SKIP_FIRST_LINES:0

Each line will be parsed using the C<PARSE> expression.

  |PARSE:$SCORE $IMAGE

The score starts at the beginning of the line, followed by exactly
I<one> space as a separator, and then the image URL.


I<Please note> that after parsing the result file, an MRML 
message will be generated. For this purpose, the content of the $IMAGE 
variable will be stripped of the C<$LOCAL_URL_PREFIX>, and then the 
C<$REMOTE_URL_PREFIX> will be prepended.
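
The result handling of this example can be sketched end to end as
follows. Again this is only an illustration under stated assumptions and
not the code of MRML::Server::CLIAdapter. The helper names
C<local_to_remote_url> and C<read_results> are hypothetical, the regular
expression is a hand-written equivalent of C<PARSE:$SCORE $IMAGE>, and
the sample result lines are assumed to carry the local prefix (e.g.
C<./image15.ppm>).

```perl
use strict;
use warnings;

# Hypothetical helper: map an image name from a result line back to
# the remote URL that is sent to the MRML client.
sub local_to_remote_url {
    my ($image, $local_prefix, $remote_prefix) = @_;
    # strip the local prefix from the front, then prepend the remote one
    (my $postfix = $image) =~ s/^\Q$local_prefix\E//;
    return $remote_prefix . $postfix;
}

# Hypothetical helper: turn the lines of a result file into a list of
# { score, url } hashes, skipping a fixed-length header.
sub read_results {
    my ($lines, $skip, $local_prefix, $remote_prefix) = @_;
    my @body = @$lines;
    splice @body, 0, $skip;                  # SKIP_FIRST_LINES
    my @results;
    for my $line (@body) {
        # hand-written equivalent of PARSE:$SCORE $IMAGE
        next unless $line =~ /^(\S+) (\S+)$/;
        my ($score, $image) = ($1, $2);
        push @results, {
            score => $score,
            url   => local_to_remote_url($image, $local_prefix, $remote_prefix),
        };
    }
    return \@results;
}
```

Called with the local prefix C<.> and the remote prefix
C<http://www.benchathlon.net/img/done>, a result line naming
C<./image15.ppm> is mapped back to the remote URL of that image under
the Benchathlon prefix.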

=head1 CONCLUSION

C<CLIAdapter.pl> permits interfacing a command line tool with an MRML
compliant client. We propose a simple, yet flexible configuration
mechanism, which is described above together with a complete usage
example.

=head1 CONTRIBUTIONS

While we have taken care to generate something flexible and useful, 
we need more input about needed features. Please contact us.

=head1 SEE

  http://www.benchathlon.net
  http://www.mrml.net

  Misc/cli-adapter.config
  MRML::Server::CLIAdapter

=head1 BUGS

If you find bugs, please report them to the author.

=head1 AUTHOR

  wolfgang.mueller@cui.unige.ch

=cut


my $VERSION="0.01";

package main;

use vars qw(%l_options);

#  
# Helper procedure
# reads a list of images and creates a reference to the URLs
#
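sub create_image_list_reference{
  my $in_file_name=shift;
  defined $in_file_name or die "Missing mandatory option --image-list\n";
  # die with the name of the file that could not be opened
  open(my $image_list_fh,'<',$in_file_name) or die "Could not open image list '$in_file_name': $!\n";
  my @l_image_list;
  
  # one image URL per line
  while(<$image_list_fh>){
    chomp;
    push @l_image_list,$_;
  }
  close $image_list_fh;
  return \@l_image_list;
}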

#
# Get the options
#
local(%l_options)=();
GetOptions (\%l_options,
	    "host:s",
	    "port:i",
	    "temporary-directory:s",
	    "image-list=s",
	    "config-file=s"
	   );



my $lDefaultPort=10101;

# the MRML server will listen on port 10101 unless
# another value was given on the command line
my $lAddress= $l_options{host} || $ENV{HOST} || "localhost";
my $lPort= $l_options{port} || $lDefaultPort;

print STDERR "$0 --config-file CONFIGFILE --image-list <IMAGELIST> \\
         [--host HOST] [--port PORT] \\
         [--temporary-directory <TMPDIR>] 
Default for address is: $ENV{HOST} or 'localhost' if \$HOST is not set
Default for port is:          $lDefaultPort

MANDATORY OPTIONS:
-----------------

--config-file <CONFIGFILE>
  The file given in CONFIGFILE has to contain a valid $0 configuration file

--image-list <IMAGELIST>
  The file given by <IMAGELIST> contains a list of image URLs, one URL per line.


OPTIONS that have default values:
---------------------------------

--host <HOST>
  The server will accept connections to <HOST>.

--port <PORT>
  The port on which the server listens.

--temporary-directory <TMPDIR>
  You have to have write rights to <TMPDIR>. We will use this directory to write 
  temporary files.

WARNING: with the current parameters, the server
will accept connections *to* $lAddress. In case of problems, use 
your *full host name* (e.g. viper.unige.ch) to allow people to connect 
from the outside! 

In case of problems, mailto:wolfgang.mueller\@cui.unige.ch
";

my $l_harness=MRML::Server::CLIAdapter->new(
    address => $lAddress,
    port    => $lPort,
    # the configuration file is mandatory
    "config-file" => $l_options{"config-file"} || die("Missing mandatory option --config-file\n"),
    "temporary-directory" => $l_options{"temporary-directory"} || "/tmp/",
    images  => create_image_list_reference($l_options{'image-list'})
);

$l_harness->serve();


