EazyErl !: 2007

Thursday, December 20, 2007

Server DNS problems, and no longer available downloads :/

My own box has some troubles with its DNS entries, this means that you can no longer reach the LibTre (treregex-0.7)...

I've set up an googlepage to store it in the mean time: easyerl.googlepages.com.

Why did I take some much time to fix it ? That's because I've left Paris to work in the South of France, exactly "sur la côté d'Azur".

I'll publish some other simple things in the near future... So you just need to subscribe to my Feed :).

Download libTre.

PS: Now if you read the mailing list you'll be able to retrieve the package...

Monday, November 19, 2007

NAGIOS (beurk) nrpe support for erlang

NAGIOS a pretty bad software uses a pretty bad protocol, but NAGIOS seems to be installed everywhere...
I needed a way to bypass its really poor scheduling process, and naturally erlang comes to my rescue... But everything is not so simple.

NRPE this horrible protocols uses fixed length packets (from the code the 2 last characters are never sets to 0, sizeof seems to be really misunderstood by the nagios developer :p).

But NRPE is another crap CRC32 code, and for efficiency and time saving I didn't wanted to reimplemented it in Erlang, so I wrote a nrpe_crc32 port...

Here's the crc32 code:



#include <unistd.h>
#include <stdio.h>
#include <string.h>

static unsigned long crc32_table[256];

typedef struct packet_struct
{
 int16_t   packet_version;
 int16_t   packet_type;
 u_int32_t crc32_value;
 int16_t   result_code;
 char      buffer[MAX_PACKETBUFFER_LENGTH];
} packet;

/* build the crc table - must be called before calculating the crc value */
void generate_crc32_table(void){
 unsigned long crc, poly;
 int i, j;

 poly=0xEDB88320L;
 for(i=0;i<256;i++){
  crc=i;
  for(j=8;j>0;j--){
   if(crc & 1)
    crc=(crc>>1)^poly;
   else
    crc>>=1;
  }
  crc32_table[i]=crc;
 }

 return;
}

/* calculates the CRC 32 value for a buffer */
unsigned long calculate_crc32(char *buffer, unsigned int buffer_size){
 register unsigned long crc;
 int this_char;
 int current_index;

 crc=0xFFFFFFFF;

 for(current_index=0;current_index  this_char=(int)buffer[current_index];
  crc=((crc>>8) & 0x00FFFFFF) ^ crc32_table[(crc ^ this_char) & 0xFF];
 }

 return (crc ^ 0xFFFFFFFF);
}

unsigned long test(const char *value)
{
 return calculate_crc32((char *) value, strlen(value));
}

The port_driver:


/* port_driver.c */

#include "erl_driver.h"

extern void generate_crc32_table(void);
extern unsigned long calculate_crc32(char *, unsigned int);

typedef struct {
 ErlDrvPort port;
} crc32_data;

static ErlDrvData crc32_drv_start(ErlDrvPort port, char *buff)
{
 crc32_data* d = (crc32_data*)driver_alloc(sizeof(crc32_data));
 d->port = port;

 /* init crc32 table */
 generate_crc32_table();
 return (ErlDrvData) d;
}

static void crc32_drv_stop(ErlDrvData handle)
{
 driver_free((char*)handle);
}

static void crc32_drv_output(ErlDrvData handle, char *buff, int bufflen)
{
 crc32_data* d = (crc32_data*)handle;

 char fn = buff[0];
 char *arg = &buff[1];
 unsigned long res;

 switch (fn) {
  case 1:
   res = calculate_crc32(arg, bufflen - 1);
   driver_output(d->port, (char *) &res, (sizeof(unsigned long)));
   break;
  default:
   break;
 }
}

ErlDrvEntry crc32_driver_entry = {
 NULL,                       /* F_PTR init, N/A */
 crc32_drv_start,          /* L_PTR start, called when port is opened */
 crc32_drv_stop,           /* F_PTR stop, called when port is closed */
 crc32_drv_output,         /* F_PTR output, called when erlang has sent */
 NULL,                       /* F_PTR ready_input, called when input descriptor ready */
 NULL,                       /* F_PTR ready_output, called when output descriptor ready */
 "crc32_drv",              /* char *driver_name, the argument to open_port */
 NULL,                       /* F_PTR finish, called when unloaded */
 NULL,                       /* F_PTR control, port_command callback */
 NULL,                       /* F_PTR timeout, reserved */
 NULL                        /* F_PTR outputv, reserved */
};

DRIVER_INIT(crc32_drv) /* must match name in driver_entry */
{
 return &crc32_driver_entry;
}

The crc32 module, initializing the lib, and calling the crc32 fun:



-module(crc32).

-export([start/0,init/1,compute/1]).

start() ->
 start("crc32_drv").

start(SharedLib) ->
 case erl_ddll:load_driver(".", SharedLib) of
  ok -> ok;
  {error, already_loaded} -> ok;
  _E ->  io:format("Error: ~p~n", [_E]),
   exit({error, could_not_load_driver})
 end,
 spawn(?MODULE, init, [SharedLib]).

init(SharedLib) ->
 register(?MODULE, self()),
 Port = open_port({spawn, SharedLib}, [binary]),
 loop(Port).


compute(X) ->
 Bin = iolist_to_binary(X),
 call_port(<<1, Bin/binary>>).

call_port(Msg) ->
 ?MODULE ! {call, self(), Msg},
 receive
  {?MODULE, Result} ->
   Result
 end.

loop(Port) ->
 receive
  {call, Caller, Msg} ->
   Port ! {self(), {command, Msg}},
   receive
    {Port, {data, Data}} ->
     Caller ! {?MODULE, decode(Data)}
   end,
   loop(Port);

  stop ->
   Port ! {self(), close},
   receive
    {Port, closed} ->
     exit(normal)
   end;

  {'EXIT', Port, Reason} ->
   io:format("~p ~n", [Reason]),
   exit(port_terminated)
 end.

% Also, Valid for Network 
decode(<<U:32/big-unsigned>> = Bin) when is_binary(Bin) ->
 U.

decode(X) -> X.

Now the nrpe module, there you'll see why the nrpe is pure crap, fixed packet length for this type of tool is nonsense...



-module(nrpe).

-export([encode/1, request/1, crc32/1, connect/1, connect/2]).


encode(Bin) ->
 { Crc, _} = crc32:compute(Bin),
 <<Crc:32, Bin>>.

request(Query) ->
 Version = 2,
 Type = 1,
 Crc = 0,
 Code = 0,
 Blank = <<0:32/unit:256>>, % 1024 bytes 
 Q = iolist_to_binary(Query),
 Padlen = 1024 - size(Q),
 {C, _} = crc32:compute(
 <<Version:16, Type:16, Crc:32, Code:16, Q/binary, 0, 0, Blank:Padlen/binary>>),
 <<Version:16, Type:16, C:32, Code:16, Q/binary, 0, 0, Blank:Padlen/binary>>.

Building two binaries to only send one, is completely dump. But this is required... Thanks
to nrpe...



crc32(Bin) ->
 {Crc, _} = crc32:compute(Bin),
 {Crc, Bin}.

%
% send_packet.packet_version=(int16_t)htons(NRPE_PACKET_VERSION_2);
% send_packet.packet_type=(int16_t)htons(QUERY_PACKET);
% strncpy(&send_packet.buffer[0],query,MAX_PACKETBUFFER_LENGTH);
% send_packet.buffer[MAX_PACKETBUFFER_LENGTH-1]='\x0';
% 
% send_packet.crc32_value=(u_int32_t)0L;
% calculated_crc32=calculate_crc32((char *)&send_packet,sizeof(send_packet));
% send_packet.crc32_value=(u_int32_t)htonl(calculated_crc32);

%% #define QUERY_PACKET  1  /* id code for a packet containing a query */
%% #define RESPONSE_PACKET  2  /* id code for a packet containing a response */
%% 
%% #define NRPE_PACKET_VERSION_2   2               /* packet version identifier */
%% #define NRPE_PACKET_VERSION_1 1  /* older packet version identifiers (no longer supported) */
%% 
%% #define MAX_PACKETBUFFER_LENGTH 1024  /* max amount of data we'll send in one query/response */

%% typedef struct packet_struct{
%%  int16_t   packet_version;
%%  int16_t   packet_type;
%%  u_int32_t crc32_value;
%%  int16_t   result_code;
%%  char      buffer[MAX_PACKETBUFFER_LENGTH];
%%         }packet;

connect(Host) ->
 connect(Host, 5666). 

connect(Host, Port) ->
 case gen_tcp:connect(Host, Port, [binary, {active, false}]) of
  {ok, Sock} ->
   Query = request("test"),
   send(Sock, Query),
   io:format("Response: '~s'~n", [recv(Sock)]), 
   close(Sock);

  {error, Error} ->
   io:format("Connect-error: ~p~n", [Error])
 end.

send(Sock, Data) ->
 case gen_tcp:send(Sock, Data) of
  ok ->
   ok;

  {error, Error} ->
   io:format("send-error: ~p~n", [Error])
 end.

recv(Sock) ->
 case gen_tcp:recv(Sock, 0, 2000) of
  {ok, Packet} ->
   io:format("read: ~p~n", [Packet]),
   decode(Packet);

  {error, Error} ->
   io:format("recv-error: ~p~n", [Error])
 end.
  
  
close(Sock) ->
 gen_tcp:close(Sock).


decode(<<Version:16, Type:16, Crc:32, 0, 0, Rest/binary>>) ->
 io:format("Version: ~p, Type: ~p, Crc: ~p~n", [Version, Type, Crc]),
 decode_response(Rest).

decode_response(Bin) ->
 Len = msg_len(Bin, 0),
 {Msg, _} = split_binary(Bin, Len),
 Msg.
 

msg_len(<<0, Rest/binary>>, Len) ->
 Len;
msg_len(Bin, Len) ->
 {_, Next} = split_binary(Bin, 1),
 msg_len(Next, Len + 1).

I hope someone will find this interesting :p

Sunday, November 18, 2007

Treregex-0.7 download now

You can download the treregex-0.7 from here.

To test it you need to install the libtre library, normally you can do this by using apt-get:


-apt-cache search libtre
libtre-dev - development package for the libtre4 regexp matching library
libtre4 - regexp matching library with approximate matching

Install the libtre4 and libtre-dev, and then try to ./configure the treregex. You may need to edit Makefile adjusting path for your needs...

And as usual, feedback is welcome :p

Digraph and your network, too easy

The digraph module can help you build directed graph (or not directed) very easily. I need to know the status of all my hosts within my network, and I make statistics about service availability.
With the digraph module I am able to write links between my hosts, my hosts are: karoten, ultraten, muloten, arsen, masculen, colen, pollen.
So I define them at the first time


(beta@karoten)425> f(D), D = digraph:new(). % a new digraph
{graph,22,23,24,true}
(beta@karoten)426> servers:add(D, [karoten,ultraten,muloten,arsen,masculen,colen,pollen]). 
ok

Now I can manipulate my nodes (my servers)


(beta@karoten)427> servers:connect(D, karoten, [ultraten,{muloten,http}, arsen, {masculen,ssh}]).
ok

karoten can reach ultraten, and muloten with http, arsen, and masculen with ssh.


(beta@karoten)428> servers:connect(D, colen, [{muloten,http}, arsen, {pollen,ssh}]).             
ok

colen can reach muloten with http, arsen, and pollen with ssh.

So let's find colen links:


(beta@karoten)429> servers:links(D, colen).
[{colen,muloten,http},{colen,arsen,[]},{colen,pollen,ssh}]

This exactly what I've written before, good ...

And muloten links:


(beta@karoten)430> servers:links(D, muloten).
[{karoten,muloten,http},{colen,muloten,http}]

This is deduced from what I've describe before...

Now let's imagine we want to find a way to reach one node from another:


(beta@karoten)431> digraph:get_path(D, karoten, arsen).
[karoten,arsen]

Karoten seems to be connected with arsen.

Let's create a new link, between ultraten and colen:


(beta@karoten)434> servers:connect(D, ultraten, colen). 
['$e'|7]

Let's try to reach pollen from karoten:


(beta@karoten)435> digraph:get_path(D, karoten, pollen).
[karoten,ultraten,colen,pollen]

So the way is: thru ultraten, colen, karoten can reach pollen...

Now let's design a more web design approach, with a firewall, a load balancer lb, and various httpd and application servers, finally databases:


(beta@karoten)436> servers:add(D, [firewall,lb,http1,http2,http3,app1,app2,app3,app4,db1,db2]).
ok

The firewall is directly connected to the load balancer:


(beta@karoten)437> servers:connect(D, firewall, lb).
ok

The load balancer distribute the load to three httpd:


(beta@karoten)438> servers:connect(D, lb, [http1,http2,http3]).
ok
(beta@karoten)439> servers:connect(D, http1, [app1,app2,app3]).
ok
(beta@karoten)440> servers:connect(D, http2, [app2,app3]).
ok
(beta@karoten)441> servers:connect(D, http3, [app3]).
ok
(beta@karoten)442> servers:connect(D, app3,[db1,db2]).
ok
(beta@karoten)443> servers:connect(D, app2, [db1]).
ok
(beta@karoten)444> servers:connect(D, app1, db2).
['$e'|21]

Finally I can find a path between the firewall and the database 2:


(beta@karoten)445> digraph:get_path(D, firewall, db2).
[firewall,lb,http3,app3,db2]

Now the code:


module(servers).

-export([add/2,del/2,connect/3,links/2,reachable/2]).

add(Graph, Servers) when list(Servers) ->
 lists:foreach(fun(X) -> digraph:add_vertex(Graph, X) end, Servers);

add(Graph, Server) ->
 digraph:add_vertex(Graph, Server).

del(Graph, Servers) when list(Servers) ->
 lists:foreach(fun(X) -> digraph:del_vertex(Graph, X) end, Servers);

del(Graph, Server) ->
 digraph:del_vertex(Graph, Server).

connect(_Graph, _Server, []) ->
 ok;
connect(Graph, Server, [ {S, L} | Servers ]) ->
 digraph:add_edge(Graph, Server, S, L),
 connect(Graph, Server, Servers);
connect(Graph, Server, [ S | Servers ]) ->
 digraph:add_edge(Graph, Server, S),
 connect(Graph, Server, Servers);
 
% connect(Graph, Server, Servers) when list(Servers) ->
%   lists:foreach(fun(X) -> digraph:add_edge(Graph, Server, X) end, Servers);

connect(Graph, Server, S) ->
 digraph:add_edge(Graph, Server, S).

links(Graph, Server) ->
 lists:map(fun(X) -> {_, S1, S2, Label} = digraph:edge(Graph, X), {S1, S2, Label} end, digraph:edges(Graph, Server)).

reachable(Graph, Server) when list(Server) ->
 digraph_utils:reachable(Server, Graph);
reachable(Graph, Server) ->
 digraph_utils:reachable([Server], Graph).

Saturday, November 17, 2007

Experimenting with the erlang SSH support, or remote 'tail' with SSH...

While designing my monitoring tool, and working and treregex, I found the ssh documentation and realize that it can be very useful for my tool.

A simple question needed to be answered, is the ssh module able to easily spawn a remote process for me ?
To verify, I tried to build a remote tail module called ssh_tail :)


-module(ssh_tail).
        
-export([tail/3]).
-define(TIMEOUT, 5000). 

tail(Host, User, Pass) ->
 case ssh_cm:connect(Host, 22, [{user_dir, "/var/tmp/ssh"}, {user, User}, {password, Pass}]) of
  {ok, CM} ->
   session(CM, fun(X) -> io:format("-ssh: ~p~n", [X]) end);

  Error ->
   Error
 end.

From the ssh documentation user_dir let you decide where you want to store keys, from my experience it's better to use a separate directory from the ~/.ssh.
It happens that latest version of ssh add meta information to their files that the ssh module can't handle. (more on this in another post).

For the test I wanted to do a "tail -f" on a specific file ie "/var/log/syslog".


session(CM, Callback) ->
 case ssh_cm:session_open(CM, ?TIMEOUT) of
  {ok, Channel} ->
   case ssh_cm:shell(CM, Channel) of 
    ok ->
     ssh_cm:send(CM, Channel, "tail --follow=name /var/log/syslog\n"),
     ssh_loop(CM, Channel, Callback);

    Error ->
     error_logger:error_msg("Error: ~p~n", [Error])
   end;
  Error ->
   error_logger:error_msg("Session Error: ~p~n", [Error])
 end.

ssh_cm is responsible for starting a shell, and sending commands to the remote shell process. I send


tail --follow=name /var/log/syslog\n

Don't forget the final '\n' character, since you won't get any results if you don't send it :p
(I didn't think of that while testing for the first time and think that the code didn't work at all...)


ssh_loop(CM, Channel, Callback) ->
 receive
  stop ->
   % Closing channel
   % ssh_cm:detach(CM, ?TIMEOUT),
   ssh_cm:close(CM, Channel);

  {ssh_cm, CM, {data, _Channel, 0, Data}} ->
   Callback(Data), 
   ssh_loop(CM, Channel, Callback);

  {ssh_cm, CM, {data, Channel, Type, Data}} ->
   io:format("extended (~p): ~p~n", [Type, Data]),
   ssh_loop(CM, Channel, Callback);

  {ssh_cm, CM, {closed, _Channel}} ->
   ssh_cm:detach(CM, ?TIMEOUT);
   
  E ->
   error_logger:info_msg("[~p] Received: ~p~n", [?MODULE, E]),
   ssh_loop(CM, Channel, Callback) 
 end.

ssh_cm sends various message to the calling process, more important tuples are


{ssh_cm, CM, {data, _Channel, 0, Data}}

Data holds what you want, and in our case a line sent by the tail process...
The callback defined at the beginning is then executed:


tail(Host, User, Pass) ->
 case ssh_cm:connect(Host, 22, [{user_dir, "/var/tmp/ssh"}, {user, User}, {password, Pass}]) of
  {ok, CM} ->
   session(CM, 
    fun(X) ->  % Our Callback 
     io:format("-ssh: ~p~n", [X]) % simple display...
    end);

  Error ->
   Error
 end.

To conclude this simple module is able to spawn a remote "tail -f" using a ssh connection and using a callback function on every data received.

The code was designed from the ssh_ssh module that you can find in the ssh module source code, because the ssh documentation is really sparse for now...

Thursday, November 15, 2007

An gen_server for mass regexp computing... (LibTre)

This is the first test session of my 'tregex_srv' that provides some nice regexp features:


266> l(tregex_srv).                                                 
{module,tregex_srv}
267> tregex_srv:start_link().                                       
{ok,<0.4045.0>}
268> tregex_srv:store( [<<"[0-9+] pid">>, <<"[a-z]+.tmp">>]).       
ok
269> tregex_srv:grep(<<"test 9405904.tmp acuu.tmpmulaor 10+ pid">>).
[[[{34,39,<<"+ pid">>}],[{17,25,<<"acuu.tmp">>}]]]
270> tregex_srv:store( [{ <<"test">>, fun(X) -> io:format("found: ~p~n", [X]) end},  <<"[0-9][0-9]">>]).
ok
271> tregex_srv:grep(<<"test 9405904.tmp acuu.tmpmulaor 10+ pid">>).                                    
found: [{0,4,<<"test">>}]
[[[{34,39,<<"+ pid">>}],[{17,25,<<"acuu.tmp">>}]],
 [[{0,4,<<"test">>}],[{5,7,<<"94">>}]]]
272> tregex_srv:store( [{ <<"SRC=[^ ]+">>, fun(X) -> 
 [{_,_,M}] = X, io:format("Source: ~p~n", [M]) 
end}]).                   
ok
273> tregex_srv:grep(<<"test 9405904.tmp acuu.tmpmulaor 10+ pid">>).                                      
found: [{0,4,<<"test">>}]
[[[{34,39,<<"+ pid">>}],[{17,25,<<"acuu.tmp">>}]],
 [[{0,4,<<"test">>}],[{5,7,<<"94">>}]],
 []]
274> tregex_srv:grep(<<"tst SRC=192.135.15.1 pid">>).               
Source: <<"SRC=192.135.15.1">>
[[[{19,24,<<"1 pid">>}]],
 [[{8,10,<<"19">>}]],
 [[{4,20,<<"SRC=192.135.15.1">>}]]]
275> tregex_srv:store( [{ <<"SRC=([^ ]+)">>, fun(X) -> 
 [{_,_,_}, {_,_,M}] = X, io:format("Source IP: ~p~n", [M]) 
 end}]).
ok
276> tregex_srv:grep(<<"tst SRC=192.135.15.1 pid">>).                                                                   
Source IP: <<"192.135.15.1">>
Source: <<"SRC=192.135.15.1">>
[[[{19,24,<<"1 pid">>}]],
 [[{8,10,<<"19">>}]],
 [[{4,20,<<"SRC=192.135.15.1">>}]],
 [[{4,20,<<"SRC=192.135.15.1">>},{8,20,<<"192.135.15.1">>}]]]

As you can see, you can associate Funs with regexp Matches. This means that you can bind action to regexp...
First we store (in fact add regexp to the existing regexp list) new tuples {RE, Fun}:


275> tregex_srv:store( [{ <<"SRC=([^ ]+)">>, fun(X) -> 
 [{_,_,_}, {_,_,M}] = X, io:format("Source IP: ~p~n", [M]) 
 end}]).
ok

Now the exec does call already registered funs, but call the new one since our regexp matches and you can see that the IP number is only printed, the "submatches" feature works as expected:


276> tregex_srv:grep(<<"tst SRC=192.135.15.1 pid">>).                                                                   
Source IP: <<"192.135.15.1">>
Source: <<"SRC=192.135.15.1">>
...

The gen_server state is the following:


-record(state, {
                requests,
                reindex,
                re = [],
                pids = []
                }).

Its init function is:


init(_Args) ->
        process_flag(trap_exit, true),
        {ok, #state{  
 re = ets:new(?MODULE, [set,private]), 
 requests = 0, 
 reindex = 1 }}.

Internally the module calls 'treregex:compile' to compile regexp and store the resulting #port into a list that is stored in the 'ets' table. Every call to 'tregex_srv:store' create a new entry in the ets table.


%% Storing RE and Funs
%% Creating simple fun when there's none provided...
%%
store([], Res, State) ->
        ets:insert(State#state.re, { State#state.reindex, Res});
store([ { Regexp, Fun } | List ], Res, State) ->
        {ok, Re } = treregex:compile(iolist_to_binary(Regexp), [extended]),
        store(List, [ { Re, Fun } | Res ], State);
store([ Regexp | List ], Res, State) ->
        {ok, Re } = treregex:compile(iolist_to_binary(Regexp), [extended]),
        store(List, [ { Re, fun(_) -> false end} | Res ], State).

The 'tregex_srv:grep' just uses 'ets:foldl' to compute results:


handle_call({grep, Line}, _Node, State) ->
        Requests = State#state.requests,
        Grep = fun({_Reindex, ReList}, Acc) ->
                        [ exec(ReList, Line, []) | Acc]
                end,
        {reply, ets:foldl(Grep, [], State#state.re), State#state{ requests = Requests + 1} }.

%% exec, using a List of {Re, Funs}
exec([], _Line, Acc) ->
        Acc;
exec([ { Re, Fun } | ReList ], Line, Acc) ->
        case treregex:exec(Re, Line) of
                {ok, Matches} ->
                        Fun(Matches),
                        exec(ReList, Line, [ Matches | Acc ]);

                {error, nomatch} ->
                        exec(ReList, Line, Acc)
        end;
exec([ _Any | ReList ], Line, Acc) ->
        exec(ReList, Line, Acc).

The code is still young, but seems to work.

The main purpose here, is to be able to massively process lines of logs. I want to be able to
spawn multiple process on multiples nodes that will be able to extract valuable content from
various lines. This is the first step forward :-)

I may cleanup the 'grep' fun since it will returns empty list whenever a regexp didn't match anything from the supplied line...

I'm really excited to think that I will be able to use the 'gen_server:multi_call' with this module :)

Monday, October 22, 2007

LibTRE returning matching values

New version of libTRE driver, now exec returns also matching binaries:


Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.5.5  (abort with ^G)
1> erl_ddll:load_driver(code:priv_dir(treregex)++"/bin", "TRE_drv").
ok
2> {ok, RE} = treregex:compile(<<"([a-z]+)([0-9]+)">>, [extended]).
{ok,#Port<0.79>}
3> treregex:exec(RE, <<"this is a test9234 of blast">>). 
{ok,[{10,18,<<"test9234">>},{10,14,<<"test">>},{14,18,<<"9234">>}]}
4> 
4>  treregex:exec(RE, <<"this is arolpghin39235 test9234 of blast">>). 
{ok,[{8,22,<<"arolpghin39235">>},{8,17,<<"arolpghin">>},{17,22,<<"39235">>}]}
5>

Soon to be released !

Wednesday, October 17, 2007

LibTRE in Action, the approximative match

Here's a sample session showing some of the libTre features:


Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.5.5  (abort with ^G)
1> erl_ddll:load_driver(code:priv_dir(treregex)++"/bin", "TRE_drv").
ok
2> f(RE),  {ok, RE} =  treregex:compile(<<"fear">>, [extended]).
{ok,#Port<0.74>}
3> treregex:approx(RE, <<"fir">>, 0, []).
{error,nomatch}
4> treregex:approx(RE, <<"fir">>, 1, []).
{error,nomatch}
5> treregex:approx(RE, <<"fir">>, 2, []).
{ok,[{0,3}]}
6> treregex:approx(RE, <<"fir">>, 1, []).
{error,nomatch}
7> treregex:exec(RE, <<"fir">>).         
{error,nomatch}
8> treregex:free(RE).
ok
9> f(RE),  {ok, RE} =  treregex:compile(<<"loubov">>, [extended]).
{ok,#Port<0.76>}
10>  treregex:approx(RE, <<"love">>, 3, []).
{ok,[{0,3}]}
11>  treregex:approx(RE, <<"to love">>, 0, []). 
{error,nomatch}
12>  treregex:approx(RE, <<"to love">>, 1, []).
{error,nomatch}
13>  treregex:approx(RE, <<"to love">>, 2, []).
{error,nomatch}
14>  treregex:approx(RE, <<"to love">>, 3, []).
{ok,[{0,6}]}
15>  treregex:approx(RE, <<"aimer">>, 3, []).  
{error,nomatch}
16>  treregex:approx(RE, <<"aimour">>, 3, []).
{error,nomatch}
17>  treregex:approx(RE, <<"amour">>, 3, []). 
{error,nomatch}
18>  treregex:approx(RE, <<"amour">>, 10, []).
{ok,[{1,6}]}

LibTRE, I finally got it working !

Hi !
I'm please to say that I finally managed the 'TRE_drv.c' code to work as expected. This time there's no more 'segfault'. The solution is to use only non conflicting function names:

regncomp and not regcomp
regnexec and not regexec
tree_free and not regfree

Since every posix Regex function will be used instead of their Tre equivalent, I was forced do find a special case for 'regfree'. There's no special regX function for 'regfree'.
Because regfree can be called at many places, when an erlang error occurs, or simply when we quit the shell; the function is trying to free the posix regex_t with a pointer to a TRE regex_t, so the crash is always there.
While looking at 'tre-internal.h' I've found the 'tre_free' function that will do the real job of freeing the TRE regex_t... So I've just declared as 'extern' this function directly into the TRE_drv.c file...

extern void tree_free(regex_t *preg);
static void tre_stop(ErlDrvData drv_data)
{
    struct driver *d = (struct driver*) drv_data;

//    if (d->compiled)
//      regfree(&(d->re));

        if (d->compiled)
                tre_free(&d->re);

    driver_free(d);
}

This time I've also added a call to the 'reganexec' function that's able to find approximate matches. The code is of course located into the 'tre_from_erlang' function. (this function gets called whenever erlang talks to the driver).


... snip ...
                case APPROX: 
                        
                        /* <> */

                        flags = get_int32(buf + 4);
                        cost = get_int32(buf + 8);

                        memset(&amatches, 0, sizeof(amatches));
                        amatches.pmatch = matches;
                        amatches.nmatch = MAX_MATCHES;

                        regaparams_default(®aparams);
                        regaparams.max_cost = cost;


                        status = reganexec(&d->re, buf + 12,  len - 12, &amatches, regaparams,  flags);
                        if (status != 0) {
                                driver_send_status(d, status);
                                return;
                        }

                        driver_send_matches(d, matches, MAX_MATCHES);
                        break;
... snip ...

With this you're able to make things like this:


{ok, RE} = treregex:compile(<<"test">>, [extended]).
treregex:approx(RE, <<"this is a tast">>, 0, []).

the 'fun approx/4' has the 'cost' (an integer) as extra argument, this correspond to the cost of manipulation characters to find a match. You'll find more about this here.


Approximate pattern matching allows matches to be approximate, that is, allows the matches to be 
close to the searched pattern under some measure of closeness. TRE uses the edit-distance measure
 (also known as the Levenshtein distance) where characters can be inserted, deleted, or 
substituted in the searched text in order to get an exact match. Each insertion, deletion, or 
substitution adds the distance, or cost, of the match. TRE can report the matches which have a 
cost lower than some given threshold value. TRE can also be used to search for matches with the 
lowest cost.

Finally, now that the driver and the erlang module works well I'll upload it to various repositories...

Saturday, October 13, 2007

LibTre and Posix, the segfault explanation

You already know that I want to make a libTRE driver such as the RE posix driver... But what you may not know is that I've never lost this amount of time debugging things...
I've, for a long long time ago, called 'gdb' my friend to rescue me ...
Let me explain the problem: we want to call 'regcomp' and 'regexec' from the libTRE package. Take your time and reread this sentence:


We want to call 'regcomp' and 'regexec' from the libTRE package.

This seems really easy, but we want to make this from a shared library... A shared library that we will open at runtime.
And lauching our erlang shell make the ld process finds symbols addresses in various .so files for us automagically...

Since I'm using ubuntu, the posix regexp are located in the libc.so.6 library, so ld knows about regcomp and regexec before I can even call 'erl_ddll:load_driver/2'...

Loading the driver into the erlang vm from my freshly built .so file, don't really work as expected... 'regcomp' is still the one from the glibc, and not the one from the libTRE driver...

But 'regex_t' is the one from tre, and 'regcomp' is the one from posix... There definition of the regex_t struct isn't the same :p

The driver structure:


typedef struct _desc {
    ErlDrvPort     port;
    ErlDrvTermData dport;          /* the port identifier as ErlDrvTermData */
    regex_t re;
    regmatch_t pm[16];
    int compiled;
} Desc;

Now the 'sizeof(regex_t)' from Tre is 8 and the posix is a lot more ... So when I call the 'regcomp' function like this:


        switch(op) {
                case COMPILE: 
... snip ...
                        status = regcomp(&d->re, buf+8, flags);
                        d->compiled = 1;
... snip ...

The 'regmatch_t' is exactly overflowed by exactly

( sizeof (posix regex_t) - (2 * sizeof (tre regex_t)) )

bytes ...

This is the reason why I get a segfault (from my last post) when I try to call 'regexec'...

I'm now rewriting a simpler interface to libtre that will not have names that can collide with posix...

Wednesday, October 10, 2007

LibTre for Fast regular expression

While reading the almost famous article about regular expressions, I tried to use TRE.
Since TRE is posix Compliant, but unfortunately don't have any erlang driver I downloaded ''posregex'' and hack it a little to make it use TRE.

Now here's some experiment with it:


22> Init = fun() ->  erl_ddll:load_driver(code:priv_dir(treregex)++"/bin", "TRE_drv") end.
#Fun
23> Init().
ok
24> f(List), List = treutils:build([<<"test">>, <<"toto">>, <<"[a-z][0-9]$">>, <<"^[a-zA-Z][a-zA-Z0-9_]+">>]).
[{<<"test">>,#Port<0.84>},
{<<"toto">>,#Port<0.85>},
{<<"[a-z][0-9]$">>,#Port<0.86>},
{<<"^[a-zA-Z][a-zA-Z0-9_]+">>,#Port<0.87>}]

25> treutils:exec(List, <<"alkjlskdjflskjglksjflakgjlkfgjl;dkgjklsdjglkdsjfglksd
jlkgjsdlkfgjlsdk fg989t9sgdkgj lkyrdy sjd gyrdsl;gkj test asl;dksdf">>).
{<<"alkjlskdjflskjglksjflakgjlkfgjl;dkgjklsdjglkdsjfglksdjlkgjsdlkfgjlsdk 
fg989t9sgdkgj lkyrdy sjd gyrdsl;gkj test a"...>>,

[{ok,<<"^[a-zA-Z][a-zA-Z0-9_]+">>},{ok,<<"test">>}]}

26> treutils:exec(List, <<"10alkjlskdjflskjglksjflakgjlkfgjl;dkgjklsdjglkdsjfglksdjlkgjsdlkfgjlsdk 
fg989t9sgdkgj lkyrdy sjd gyrdsl;gkj test asl;dksdf">>).                   
{<<"10alkjlskdjflskjglksjflakgjlkfgjl;dkgjklsdjglkdsjfglksdjlkgjsdlkfgjlsdk 
fg989t9sgdkgj lkyrdy sjd gyrdsl;gkj test"...>>,

[{ok,<<"test">>}]}

I build a List of Port that are compiled regular expressions, then I iterate thru the list matching "Line".

Here's the TreUtils module (BTW, you can replace treregex with posregex if you want...)


-module(treutils).
-export([build/1, exec/2, destroy/1]).

build(List) ->
       Fun = fun(X) ->
               {ok, RE} = treregex:compile(X, [extended]),
               {X, RE}
       end,
       lists:map(Fun, List).

destroy(List) ->
       lists:foreach( fun({_Name, RE}) -> treregex:free(RE) end, List).

exec(List, Line) ->
       exec(List, Line, []).

exec([], Line, Acc) ->
       {Line, Acc};
exec([H|Rest], Line, Acc) ->
       {Name, RE} = H,
       case treregex:match(RE, Line) of
               ok ->
                       exec(Rest, Line, [ {ok, Name} | Acc ]);

               {error, nomatch} ->
                       exec(Rest, Line, Acc)
       end.

Other libraries are also available.

Unfortunately I'm unable to make this module rock solid since it segfault if I ever call the 'exec' method two times... I think that there's a problem in the TRV_drv.c (heavily copied from the RE_drv.c) in the 'RE_from_erlang' function. The regexec call may garbage some of its internal...


typedef struct _desc {
    ErlDrvPort     port;
    ErlDrvTermData dport;          /* the port identifier as ErlDrvTermData */
    regex_t re;
    regmatch_t pm[16];
    int compiled;
} Desc;

... snip ...

static void RE_from_erlang(ErlDrvData drv_data, char *buf, int len)
{
        int status;
        unsigned int op = get_int32(buf);
        unsigned int flags = get_int32(buf+4);
        Desc *d = (Desc*) drv_data;

        switch(op) {

... snip ...

                case EXEC: 
                        status = regexec(&d->re, buf+8, (size_t) 16, &d->pm[0], flags);
                        if (status != 0) {
                                driver_send_status(d, status);
                                return;
                        }
                        driver_send_pm(d);
                        break;

I hope I'll come back soon with good news, since this TRE library looks very very promising...
If anyone have any clue ;p, please comment !

Tuesday, October 2, 2007

High Order Functions must be tested before use

From my previous article comments it is a small and efficient method of filtering datas:


Cpu = fun({cpu, _, _}) -> true; (_) false end.

But whenever we want to transposing it into a HOF (high order function):


Filter = fun(Elem) ->
 fun({Elem, _, _}) -> true; (_) false end
 end.

This naive approach doesn't work:


1> Filter = fun(Elem) -> fun({Elem, _, _}) -> true; (_) -> false end end. 
#Fun<erl_eval.6.49591080>
2> C = Filter(cpu).
#Fun<erl_eval.6.49591080>
3> C({test, t, t}).
true % this should have been false ...

As explained in this document, we need to use guards to make our high order function effective:


The rules for importing variables into a fun has the consequence that certain pattern matching 
operations have to be moved into guard expressions and cannot be written in the head of the fun.

The correct way is:


Filter = fun(Elem) ->
 fun({X, _, _}) when X == Elem -> true; (_) false end
 end.

So we must keep in mind that sometimes we should really check that our HOF is working as expected !

Monday, October 1, 2007

High Order Functions, filtering lists...

I have a list of collected cpu and network values, from eth0 and eth1:

 L =
[{cpu,user,<<"3.05">>},
 {cpu,nice,<<"0.00">>},
 {cpu,system,<<"0.72">>},
 {cpu,iowait,<<"0.03">>},
 {cpu,steal,<<"0.00">>},
 {cpu,idle,<<"96.20">>},
 {eth0,rxpck,<<"2.52">>},
 {eth0,txpck,<<"0.15">>},
 {eth0,rxbyt,<<"173.80">>},
 {eth0,txbyt,<<"44.68">>},
 {eth0,rxcmp,<<"0.00">>},
 {eth0,txcmp,<<"0.00">>},
 {eth0,rxmcst,<<"1.25">>},
 {eth1,rxpck,<<"0.00">>},
 {eth1,txpck,<<"0.02">>},
 {eth1,rxbyt,<<"0.00">>},
 {eth1,txbyt,<<"1.00">>},
 {eth1,rxcmp,<<"0.00">>},
 {eth1,txcmp,<<"0.00">>},
 {eth1,rxmcst,<<"0.00">>}]

If I want to manipulate such data set I'll need some filter functions that will help me to extract values, I need cpu values and eth0 values. High order function can do that for me !

First we need to select tuples, 'cpu' tuples:


Cpu = fun({X, Y, Z}) -> 
 if X == cpu -> 
  true; 
 true ->
  false 
 end 
end.

A better and far more erlangish method (thanks Zvi):


Cpu = fun({cpu,_,_}) -> true;
         (_) -> false
end.

Okay this fun will return true whenever the first element of the tuple is 'cpu'.
But this fun is static, since 'cpu' is written in the function body. Let's make it dynamic:


Filter = fun(Motif) ->
  fun({X, Y, Z}) ->
   if X == Motif ->
    true;
   true ->
    false 
   end
  end 
 end.

The new High order function 'Filter' is generated by 'fun(Motif)' and takes as argument a tuple '{X, Y, Z}', this is a fun that return a fun...
This function can be used like this:


List = {cpu, test, dummy}. % a sample list
Cpu = Filter(cpu). % generate the Cpu fun
Cpu(List). %executing the Cpu fun
true.
List2 = {test, cpu, dummy}. %Other dummy list
Cpu(List2).
false.

Come back to our initial data set, and realize that we must iterate thru the list to extract possible values. Iterate and apply a fun to every element is what we need to do, futhermore we need to retrieve matching values... In fact we need to build the list of extracted values, and this can be accomplished by 'lists:foldl':


extractor(Motif) ->
 fun(L) -> 
  lists:foldl(
   fun({X, Y, Z}, List) -> 
    if X == Motif 
     -> [ {Y,Z} | List ]; 
     true -> List 
    end 
   end, 
   [], L) 
 end.

In details, 'List' is the accumulator list, the one that will grow with valid tuple, the one we will return. The fun is the same as described before. To make things clear 'extractor/1' returns a fun that will parse a list of tuple extracting values that matches 'Motif'.

Another easier method, using only list comprehension:


extractor(Motif) ->
        fun(L) when is_list(L) ->
                [ {Y, Z} || {X, Y, Z} <- L, X == Motif]
        end.

Usage:


38> Cpu = module:extractor(cpu).
#Fun<module.1.131615259>
39> Cpu(L).
[{idle,<<"96.20">>},
 {steal,<<"0.00">>},
 {iowait,<<"0.03">>},
 {system,<<"0.72">>},
 {nice,<<"0.00">>},
 {user,<<"3.05">>}]
40> Eth0 = module:extractor(eth0).
41> Eth0(L).
[{rxmcst,<<"1.25">>},
 {txcmp,<<"0.00">>},
 {rxcmp,<<"0.00">>},
 {txbyt,<<"44.68">>},
 {rxbyt,<<"173.80">>},
 {txpck,<<"0.15">>},
 {rxpck,<<"2.52">>}]

I use such code for my monitoring project, I use the sar output (sysstat package), and I graph using rrdtool... This is a part of a bigger project that will compose a 'scheduler'.

Saturday, September 29, 2007

Building Things with high order functions...

One thing that always surprise me, is code that extensively uses high order function or let's call that "functions that returns functions" (may be known also as "closure")...
What can I, myself, do with such things ?
After a little time reading and thinking, and reading and reading, I designed a really simple usage of all of this. I'll create some helper functions to build xml tags...


-module(tags).
-export([tags/1]).

tags(Elem) ->
       fun(X) ->
               "<" ++ Elem ++ ">" ++ X ++ "</" ++ Elem ++ ">"
       end.

Let me explain:

Elem will be the enclosing tag
X is the parameter the function will receive at call time

The module in action:


1> c(tags).
{ok,tags}
2> Div = tags:tags("div").
#Fun<tags.0.28130594>
3> Div("html text").
"<div>html text</div>"
4>

'tags:tags' returns a function that will create div tags...

Now you're able to build a list of functions that will create valid output, without knowing the real syntax. You can see 'tags:tags' as a function that abstract the final notation of an element of your choice.

For example, building a 'title' tags is done by :


Title = tags:tags("title").

Building a list of functions for your language can be done like this:


lists:map(fun tags:tags/1, ["title", "div", "p", "ul", "li", "script"]).

Tuesday, September 11, 2007

Erlang LDAP support and Active Directory

Where I work now, LDAP and Active Directory are used everywhere, since this is a nice idea and everybody is used to it, I have the obligation to use the same framework for users authentication...
Once there's a central authentication mecanism somewhere it would be idiotic to not use it :p

This is why I started looking the Erlang LDAP support...

Everybody knows that erlang is really good at doing ASN1, but more people may not know that this feature is very very powerful... And while using the LDAP protocol this is really a killer feature, because once all ELDAP.* generated files are written you have access to the full power of LDAPv3...

I was idling a little on #erlang and talked a bit about LDAP and finally 'etnt' shows me the 'eldap' module... The first article I found was the one on Trapexit.org.
Since I use CEAN to work I just need to do


cean:install(eldap).

To get the 'eldap' module installed, I also use the developer cean package so I can retrieve source files aka *.erl files...

Then comes problems...
I wasn't able to connect to any of both servers. I was able to see that the tcp connection was correct but that the module don't want to bind (LDAP meaning).

I then activate the debug level (really nice feature):


eldap:debug_level("ldap", 2).

After that everything becomes clearer... The 'check_Pkt/1' fun incorrectly drop the packet !
I modified the fun, and make it returns 'ok'. Right after that every connection (bind) attempt worked flawlessly...

Another thing I need to do was searching DN from a DN attributes, this cannot be done by anything else than a 'distinguishedNameMatch' only available with an 'ExtensibleMatch':

Performing ExtensibleMatch (LDAPv3):
We need to add this line into the 'v_filter/1' fun:


v_filter({extensibleMatch, AV}) -> {extensibleMatch, AV};

The rest is fully handled by the ASN1 part...

Making an ExtensibleMatch part 2:


Base = "dc=example,dc=com".
Filter = {extensibleMatch, {'MatchingRuleAssertion',"2.5.13.1", "CN", "Username", false}}
eldap:search("ldap", [ Base, Filter ]).

While this query perfectly works with OpenLdap it doesn't with Active Directory !
In fact AD closes the connection directly... (May be a bug ?)

Wednesday, August 15, 2007

Erlang Picasa API, it works, and here's some reason why...

The Post request to the picasa web service:


request_picasa(AuthToken, User, ImgName, Data) ->
     Body = iolist_to_binary(Data),
     Authorization = <<"GoogleLogin auth=", AuthToken/binary>>,
     Url = "http://picasaweb.google.com/data/feed/api/user/" ++ User ++ "/album/blog",
     case http:request(post,
                     {       Url,
                             [ { "Authorization", binary_to_list(Authorization) },
                               { "Slug", ImgName }
                             ],
                             "image/jpeg",                           % Content-type
                             Body                                    % Body
                     },
             [ {timeout, 30000}, {sync, false} ],            % HTTPOptions
             [ {body_format, binary} ]) of                   % Options

             {ok, Result} ->
                     case Result of
                             {{_, 201, _}, Headers, Response} ->
                                     {ok, Headers, Response};

                             {_,_,Response} ->
                                     Response
                     end;

             {error, Reason} ->
                     io:format("Error: ~p~n", [Reason])
     end.

And what's you must note is the following:

a new url to Post your image.
a new header named 'Slug' holding the value of the filename you want for your image
a timeout of 30 seconds, letting enough time for the upload process to complete.

Other headers are mandatory, and hopefully already set by 'http:request':

content-length holding the size of the binary data of your image
content-type holding the image type, here 'image/jpeg'

I choose to use the 'prim_file:load_file/1' fun to retrieve a binary stream of octet composing the image I want to upload:


      {ok, File} = prim_file:read_file(Image),
      {value, {_, Auth}} = lists:keysearch(picasa, 1, State#state.auth),
      case request_picasa(Auth, User, ImgName, File ) of

The variable File holds the binary data, the Auth variable holds the 'correct' Picasa AuthToken, so the 'request_picasa/4' fun can be called correctly...

I'll upgrade later the project http://code.google.com/p/googerl, because I want the 'request_picasa' fun to return {ok, ImageUrl}, and to do that I need to parse the atom response sent by Google... And I'm asking myself wether I doit using xmerl or leex ;]

Tuesday, August 14, 2007

Picasa API for erlang, within the googerl project...

I'm working on the Picasa module for googerl, and was unable to get something else than:


Token Invalid

While scouting my network traffic to really see what I sent:


POST /data/feed/api/user/XXXXXXX/album/blog HTTP/1.1.
content-type: image/jpeg.
content-length: 24425.
te: .
host: picasaweb.google.com.
authorization: GoogleLogin auth=DQAAAIAAAAA5xws0PFfzZEzJ9JsboXhEtKXCdV06ZobagrB61zPWy9lU4j9dOQEK247yR8aQMS83mr3AmPPjkGLTnk3aGUjWsaYmUjmyw20gWrgrZUMcLijbRJUyg5J_t1x45qbglqn7Z07NtIHXudh8GpmnWkTE9T-sghc7AokEiDMzOFmGRg.
connection: keep-alive.
slug: test-v.jpg.
.
......JFIF.....H.H......Exif..MM.*.....................

I realize that my post was correct !
So why this error ?

The answer is in the Picasa Documentation, and it's hard to find but here it is:

Include the relevant parameters in the body of the POST request, as described in the ClientLogin documentation. Use lh2 as the service name.

(extracted from http://code.google.com/apis/picasaweb/gdata.html#Add_Album_Manual_Installed)
I must retrieve a token for the 'lh2' service ! The one I've sent was for the 'blogger' service.

So there's a problem with my current implementation, since my 'google.erl' module is only storing one AuthToken... I'll have to fix this, I'll use a list of AuthToken tuple:


[ {picasa, "authtoken"}, {blogger, "authtoken"} ]

So the code will be heavily modified, and for the time being doesn't work anymore :p
I'll come back soon with good news for everyone !

Saturday, August 11, 2007

Managing lists of binaries or strings...

When you want to deal with binaries, the preserve speed and 'normal' memory footprint there's one fun that you must be aware !

This is 'iolist_to_binary' and it's brother 'iolist_size'.

With theses you can do whatever you need to manager large array of strings, or manage small strings or small binaries... In fact you can do everything.

From my blogger module I need a fun to give me back only binary stuff, for example this fun that just returns a valid 'div' element:


test(Title) ->
       [ <<"<div class='title'>">>, iolist_to_binary(Title), <<"</div">> ].

With the 'iolist_to_binary' fun I don't have to deal with guards... ie: I don't need to call 'list_to_binary' if the argument is a list...
Here's some tests:


35> blogger_utils:test("blah"). %this is a string
[<<'"<div class='title'>">>,<<"blah">>,<<"</div">>]

36> blogger_utils:test(<<"blah">>). %this is a binary
[<<"<div class='title'>">>,<<"blah">>,<<"</div">>]

37> blogger_utils:test([<<"blah">>, "bli"]). %this is a list of mixed types
[<<"<div class='title'>">>,<<"blahbli">>,<<"</div">>]

Convenient isn't it ?

Tuesday, August 7, 2007

Blogger_srv.erl version 0.1 is here !

Here's the link to find the 'blogger_srv' you're waiting for: blogger_srv.erl.

This is the first version of the code, I think I'll add other features later:

auto reauth whenever the authtoken gets invalid
add some support for posting binary data, like images..
replace those antislashed double quotes by simple quotes ;)

For the time now, the code works, and all you need to do is starting the server:

blogger_srv:init().

Call the 'auth' fun to obtain a valid 'AuthToken':

blogger_srv:auth("youremail@gmail.com", "yourpass").

Finally call the 'new/4' or 'new/5' fun to post a message:

blogger_srv:new(yourBlogId, Title, [Tags], <<Content>>).
blogger_srv:new(yourBlogId, Title, [Tags], <<Content>>, AuthorName, AuthorEmail).

When the post is successfull, 'new/4' or 'new/5' will returns:

{ok, NewPostId}

That's all for now !

Blogger gen_server, a running session

Here's a screenshots of a sample session using 'blogger_srv'.
You can see some of available funs 'auth', 'snap' and others...

Here's the link of the google project.

Monday, August 6, 2007

Blogger API gen_server !

I'm proud to announce the start of the Blogger API gen_server !

I'll use the code I already wrote for testing the blogger API, and put some OTP requirement around it... For the moment there's three gen_server call:

new: to create a new post
reset: to reset credentials, i.e. retrieve another AuthToken
auth: to retrieve the AuthToken (recomputing it if needed)

Extracted from the 'blogger_srv.erl' file:


auth(Username, Password) ->
 gen_server:call(?MODULE, {auth, Username, Password}).

new(BlogId, Title, Tags, Content, {AuthorName, AuthorEmail} ) ->
 gen_server:call(?MODULE, {post, BlogId, Title, Tags, Content, AuthorName, AuthorEmail}).

reset() ->
 gen_server:cast(?MODULE, reset).

For example, the 'post' fun:


handle_call({post, BlogId, Title, Tags, Content, {AuthorName, AuthorEmail}}, _Node, State) ->
 Requests = State#state.requests,
 Auth = State#state.auth,
 Data = entry_new(Title, {AuthorName, AuthorEmail}, Content, Tags),
 case request(Auth, BlogId, Data) of 
  {ok, PostId} ->
   {reply, {ok, PostId}, State#state{ requests = Requests + 1 } };

  Msg ->
   {reply, {err, Msg}, State#state{ requests = Requests + 1 } }
 end;

The final request the post your Atom message.

The final request to post the message in the atom format:


request(AuthToken, BlogId, Data) ->
        Body = iolist_to_binary(Data),
        io:format("Sending: ~nContent-length: ~p~nBody:~n~s~n", [ size(Body), Body ]),
        Authorization = "GoogleLogin auth=" ++ AuthToken,
        Url = "http://www.blogger.com/feeds/" ++ BlogId ++ "/posts/default",
        case http:request(post,                                         % Method ;)
                        {
                                Url,                                    % URL
                                [ { "Authorization", Authorization } ], % Headers
                                "application/atom+xml; charset=utf-8",  % Content-type
                                Body                                    % Body
                        },
                [ {timeout, 3000}, {sync, false} ],             % HTTPOptions
                [ {body_format, binary} ]) of                   % Options

                {ok, Result} ->
                        case Result of
                                {{_, 201, _}, Headers, _Response} ->
                                        PostId = get_postid(Headers),
                                        {ok, PostId};

                                {_,_,Response} ->
                                        Response
                        end;

                {error, Reason} ->
                        io:format("Error: ~p~n", [Reason])
        end.

The thing that's cool overthere is the 'get_postid/1' to retrieve the postId generated by the blogger api... This fun will be called only when the HTTP 'status' code will be "201", which means "created".

I parse response headers and search for 'location' string, and once found I extract the last part of the url:


get_postid([]) ->
 "none";
get_postid(Headers) ->
 case lists:keysearch("location", 1, Headers) of
  {value, {_, Value}} ->
   lists:last( string:tokens(Value, "/") );

  _ ->
   "none"
 end.

I've look at the Zend Framework, and found that it parse the entire response body to extract the same value... I think that my method is simpler and works better. That's also why erlang is for smarter people than php (gratuitous troll :p)

More things about the blogger API...

I've reworked many times the atom template I use to post things to blogger, and what I'll describe here is a design I found simple and efficient, so here's the code :

This is the main function:


entry_new(Title, Author, { Content, ContentType }, Tags) when is_list(Tags) ->
        NTitle = post_title(Title),
        NContent = post_content(Content, ContentType),
        NTags = post_tags(Tags),
        NAuthor = post_author(Author),
        [
                entry_header(),
                NTitle,
                NContent,
                NAuthor,
                NTags,
                entry_footer()
        ];

The main idea is to use a list of binary, using this simple methods we don't mess with multiple copies or padding or strings management... This is quick and clean.

Here's some little function to create simple tempates:


entry_new(Title, Author, Content, Tags) ->
        entry_new(Title, Author, {Content, text}, Tags).

entry_header() ->
        <<"<entry xmlns=\"http://www.w3.org/2005/Atom\">">>.

entry_footer() ->
        <<"</entry>">>.
        
post_title(Title) ->
        [ <<"<title type=\"text\">">>, list_to_binary(Title), <<"</title>">> ].

post_author({ AuthorName, AuthorEmail }) ->
        [ 
        <<"<author><name>">>, list_to_binary(AuthorName), <<"</name><email>">>,
        list_to_binary(AuthorEmail), <<"</email></author>">> ];
post_author(AuthorEmail) ->
        post_author({ "", AuthorEmail }).

Here's some other function to carefully set the content-type with the post content:


post_content(Content, ContentType) when is_list(Content) ->
        post_content(list_to_binary(Content), ContentType);
post_content(Content, html) ->
        post_content(Content, "html");
post_content(Content, xhtml) ->
        post_content(Content, "xhtml");
post_content(Content, text) ->
        post_content(Content, "text");
post_content(Content, ContentType) ->
        [ <<"<content type=\"">>, list_to_binary(ContentType), <<"\">">>,
        Content,
        <<"</content>">> ].

And now the code the post 'tags':


post_tags([]) ->
        <<>>;
post_tags(List) ->
        lists:map(
                fun(X) ->
                        [ <<"<category scheme='http://www.blogger.com/atom/ns#' term='">>,
                        list_to_binary(X) ,
                        <<"'/>">> ]
                end,
                List).

The magic stands in the 'category' atom tag.

Thursday, August 2, 2007

Blogger API, posting a message (with tags)

Here's some code from the current version:


Data = entry_new("Test no1", {"tonio", "ako@gmail.com"}, <<"TestContent">>, ["erlang", "test"]),
request(Auth, iolist_to_binary(Data)).

The Title,
The author, a tuple with authorName and authorEmail
A binary holding the content of the post
A list of tags

I think I'm on the right track...
More news later...

Erlang Blogger API is working !!!

I've finally manage to get it working !
The solution was in the AuthToken from my code, I didn't squeezed the final '\n' character !

Later when the http request was generated, headers were split, making the GFE returning 400 Bad Request.

This is the corrected 'extract_auth/1':


extract_auth(<<"Auth=", Rest/binary>>) ->
        Size = size(Rest) - 1,
        <<Auth:Size/binary, _/binary>> = Rest,
        {ok, Auth};

I squeeze the final character !

Here's a link to the google groups discussion.

Wednesday, August 1, 2007

Blogger API, xml sample

Here's the XML code is used as test:


simple_post(AuthToken) ->
 Data = 
<<"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>
<entry xmlns=\"http://www.w3.org/2005/Atom\">
<title type=\"text\">UberKwl</title>
<content type=\"xhtml\">
<div xmlns=\"http://www.w3.org/1999/xhtml\">
       <p>test post</p>
</div>
</content>
<author> <name> Gautier </name> <email> gauth@gmail.com </email>
</author>
</entry>">>,
 request(AuthToken, Data).

Really, really simple, and this was extracted from a previous post somewhere in the google Blogger API groups...

I've 'xmllint'ed it and of course it was correct...

Tuesday, July 31, 2007

Connecting Erlang to Blogger (Part 2) - Adding an entry

For this second part, we start where we left the last time.
We were able to read the response of a succesful login, data was three lines of key value pairs.
The last line holds the final 'AuthToken' we need to send to the blogger atom post service...

So here's the code to extract the line that begins with the 'Auth' keyword and store the value after the '=' and before the end of line:


extract_auth(<<>>) ->
 {error, not_found};
extract_auth(<<"Error=", Rest/binary>>) ->
 Size = size(Rest) - 1,
 <<Msg:Size/binary, _/binary>> = Rest,
 {error, binary_to_list(Msg)};

extract_auth(<<"Auth=", Rest/binary>>) ->
 {ok, Rest};
extract_auth(<<_:1/binary, Rest/binary>>) ->
 extract_auth(Rest).

Note that we are also able to read 'Error' lines, those lines are sent in case of an failed login attempt...
I can describe what 'extract_auth/1' do like this:

if binary in empty returns {error, not_found}
if binary begins with 'Error' catch it and returns its content with the tuple {error, Msg}
if binary begins with the 'Auth' keyword extract everything till the end of the binary
in any other case extract one character and parse the rest of the binary

To conclude, upon successful login this fun will return:


{ok, "AAAAAQAARAFA..."}

This is, of course our Blogger AuthToken...

Now how can we use it ? This is Simple !
Open a new file named blogger.erl and write something like this:


-module(blogger).
-export([ new/3, post/3, test/2, template/3 ]).

post(AuthToken, Title, Content) ->
 Data = template(Title, Content),
 request(AuthToken, iolist_to_binary(Data)).

template(Title, Content) when is_list(Content) ->
 template(Title, Content, {"none", "none"}). 

template(Title, Content, Author) when is_list(Content) ->
 template(Title, list_to_binary(Content), Author);

template(Title, Content, Author) ->
 {AuthorName, AuthorEmail} = Author,
 [ <<"<entry xmlns=\"http://www.w3.org/2005/Atom\">\n<title type=\"text\">\n">>, 
 list_to_binary(Title), 
% <<"</title><content type='xhtml'><div xmlns='http://www.w3.org/1999/xhtml'>">>, 
 <<"</title>\n<content type=\"text\">">>, 
 Content, 
 <<"</content>\n<author><name>">>,
 list_to_binary(AuthorName),
 <<"</name>">>,
 list_to_binary(AuthorEmail),
 <<"<</author>\n</entry>\n">> ].

request(AuthToken, Data) when is_binary(AuthToken) ->
 request(binary_to_list(AuthToken), Data);

request(AuthToken, Data) ->
 io:format("Sending: ~nContent-length: ~p~nBody:~n~s~n", [ size(Data), Data ]),
 Authorization = "GoogleLogin auth=" ++ AuthToken,
 Url = "http://www.blogger.com/feeds/199963XXXX081936700/posts/default", % Put your BlogID, this one is invalid
 case http:request(post,  
  { Url, 
   [ { "Authorization", Authorization } ], 
   "application/atom+xml; charset=utf-8", Data}, 
  [ {timeout, 3000}, {sync, false} ], 
  [ {body_format, binary} ]) of 

  {ok, Result} -> 
   %io:format("Received: ~p~n", [Result]),
   {_,_,Body} = Result,
   Body;
 
  {error, Reason} ->
   io:format("Error: ~p~n", [Reason])
 end.

Once you're done, you'll be pleased to found that this doesn't work :/.
Yep, I wasn't able to post anything !
May be I've missed something in the documentation, but all I get is an nice SAXexception...

Testing my Erlang Blogger API ...

I'm stuck on this error message !


<<"org.xml.sax.SAXParseException: Content is not allowed in prolog.">>

I'm simply using this code:


request(AuthToken, Data) ->
        io:format("Sending: ~nContent-length: ~p~nBody:~n~s~n", [ size(Data), Data ]),
        Authorization = "GoogleLogin auth=" ++ AuthToken,
        Url = "http://www.blogger.com/feeds/19996386XX081936700/posts/default",
        case http:request(post,
                { Url,
                        [ { "Authorization", Authorization } ],
                        "application/atom+xml; charset=utf-8", Data},
                [ {timeout, 3000}, {sync, false} ],
                [ {body_format, binary} ]) of

                {ok, Result} ->
                        %io:format("Received: ~p~n", [Result]),
                        {_,_,Body} = Result,
                        Body;

                {error, Reason} ->
                        io:format("Error: ~p~n", [Reason])
        end.

And the resulting Body is always the error:


<<"org.xml.sax.SAXParseException: Content is not allowed in prolog.">>

Help Meeee !

Monday, July 30, 2007

Connecting Erlang to Blogger (Part 1) - Auth with ClientLogin

With the Gdata API from google you can connect your application to some nice services... Calendar, Blogger etc.
Since this is completly REST based you can of course use your 'http:request' to connect and exploit those services. Let's begin with the ClientLogin process.

For this article we will focus on the Blogger API, the main purpose is of course create an Erlang client for Blogger :)

Connecting to google is as simple as sending something like this:


accountType=HOSTED_OR_GOOGLE&Email=YOURGOOGLEACOUNT&Passwd=YOURPASSWORD&source=SelfCo-TestApp-1&service=blogger

Now we can do it in Erlang too !. First we need to build the query string, second we need to send it to the ClientLogin service using 'http:request'.


auth(Username, Password, Application) ->
    Sep = <<"&">>,
    Post = [
            <<"accountType=HOSTED_OR_GOOGLE&">>,
            <<"Email=">>, list_to_binary(Username), Sep,
            <<"Passwd=">>, list_to_binary(Password), Sep,
            <<"source=">>, list_to_binary(Application), Sep,
            <<"service=blogger">> ],
    request(erlang:iolist_to_binary(Post)).

The fun 'erlang:iolist_to_binary/1' transforms the list of binaries to a simple binary, this is not really necessary but this will ease yourself later for debugging...

Now we can send this query string to the google ClientLogin process:


request(Data) ->
      case http:request(post,
          {"https://www.google.com/accounts/ClientLogin", [],
           "application/x-www-form-urlencoded", Data},
          [ {timeout, 3000} ], [{stream, "/tmp/google.test"}, {body_format, binary}]) of

              {ok, saved_to_file}  ->
                      io:format("Saved to file~n");

              {ok, Result} ->
                      io:format("Received: ~p~n", [Result]);

              {error, Reason} ->
                      io:format("Error: ~p~n", [Reason])
      end.

This is a POST query
The service is https://www.google.com/accounts/ClientLogin
The content-type is application/x-www-form-urlencoded
We sets the timeout to 3 seconds
We store the result (if successful to '/tmp/google.test')

Let's try this code:


65> google:auth("test@gmail.com", "secretcode").
Received: {{"HTTP/1.1",403,"Forbidden"},
           [{"cache-control","no-cache"},
            {"date","Sun, 29 Jul 2007 20:44:20 GMT"},
            {"pragma","no-cache"},
            {"server","GFE/1.3"},
            {"content-length","24"},
            {"content-type","text/plain"}],
           <<"Error=BadAuthentication\n">>}
ok

The connection fails, so let's try with a valid user account:


70> google:auth("validaccount@gmail.com", "validpassword").
Saved to file
ok

Success !

The content of '/tmp/google.test':


SID=DQAAAG8AAACuATb7YJxMdqQhp0LIf546SWLfDNfTlANffRc0B6OGbTat4Ebdj89s6hVEzfNZRL...
LSID=DQAAAHEAAAAG1iqBgOrgzrY5cdgpBv9y42HxkvjNuUaYKImw6yH7xh0GtL5EG19C9GkGdPEb1...
Auth=DQAAAHAAAAAG1iqBgOrgzrY5cdgpBv9y42HxkvjNuUaYKImw6yH7xh0GtL5EG19C9GkGdPEb1...

The final token we need is the 'Auth=' one, this string will be passed with every new query as an 'Authorization' header:


Authorization: GoogleLogin auth=DQAAAHAAAA...

Next Time in Part 2, I'll show you how we'll use this AuthToken and how we will be able to post a message to our blog !

Tuesday, July 24, 2007

Erlang and JBOSS, talking AJP13 ! (PART I)


-module(ajp13).
-export([get/3, cping/2, request/1, hexdump/1]).

Every ajp packets starts with 0x1234. In Erlang when you need to express this thing you just need to use the notation 'Base#number'.
So for our example, here's the 'ajp_header' fun :

   
ajp_header() ->
        <<16#12, 16#34>>.

We use the binary notation to write 2 bytes expressed on base 16 (hexadecimal). To be crystal clear hex notation can be written with '16#'... :


Eshell V5.5.1  (abort with ^G)
1> 16#deadbeef. 
3735928559

I'm sure you get the point !

Let's comes back to our AJP problem... Now that we can write hexadecimal number we can reread the ajp13 protocol description,
and succesfully start to build a simple packet:


get(Host, Port, Url) -> 
        H = ajp_header(), 
        Request = request(Url),
        Length = size(Request), 
        Data = <<H/binary, Length:16, Request/binary>>,
        
        case gen_tcp:connect(Host, Port, [binary, {packet, 0}]) of
                {ok, Socket} ->
                        send(Socket, Data),
                        loop(Socket);

                {error, Msg} ->
                        io:format("error: ~p~n", [Msg])
        end.

Let's look at a simple command in the ajp13 protocol, the 'ping', here's its implementation:

                                   
cping() ->
        <<                
        10:8                            
        >>.

cping(Host, Port) ->
        H = ajp_header(),
        Request = cping(),
        Length = size(Request),
        Data = <<H/binary, Length:16, Request/binary>>,

        case gen_tcp:connect(Host, Port, [binary, {packet, 0}]) of
                {ok, Socket} ->
                        send(Socket, Data),
                        loop(Socket);

                {error, Msg} ->
                        io:format("error: ~p~n", [Msg])
        end.

What's important to see is also that ajp13 is derived from the xdr protocol, where every type is always written with its length... In ajp13 this length is always encoded as two bytes (16bits so max size is 16#ffff ;)

The 'Data' variable is what you should look at :


        Data = <<H/binary, Length:16, Request/binary>>,

H is the ajp header
Length is the length of Request written on 2 bytes (2 * 8)
Request is the request

Now that we've sent the packet, we need to catch the response, so here's the 'loop' fun:


loop(Socket) ->
        receive
                {tcp, Socket, Data} ->
                        % io:format("~p~n", [Data]),

                        case ajp_response(Data, Socket) of
                                {ok, continue} ->
                                        loop(Socket);

                                {ok, body, Bin} ->
                                        io:format("Body: read ~p bytes~n", [size(Bin)]),
                                        loop(Socket);

                                {ok, closed} ->
                                        gen_tcp:close(Socket)
                        end;

                {tcp_error, Socket, Error} ->
                        io:format("Error: ~p~n", [Error]),
                        loop(Socket);

                {tcp_closed, Socket} ->
                        io:format("Closed~n")

        after 8000 ->
                io:format("Timeout~n"),
                gen_tcp:close(Socket)

        end.

Whenever our erlang process will receive a message matching the '{tcp, Socket, Data}' tuple we will parse the 'Data' with the 'ajp_response' fun:


ajp_response(<<65,66,0,2,5,1>>, _Socket) ->
        {ok, closed};
ajp_response(<<65,66,Rest/binary>>, Socket) ->
        ajp_data_length(Rest, Socket);
ajp_response(Bin, Socket) ->
        {ok, body, Bin}.

Yeah ! Polymorphism ! Or matching power ?! Whatever, this completely rox the programming planet !
We are simply matching binary data... Binary data that's sent back to us from the jboss server (in our case).

Ajp13 protocol describes the termination of the request by a packet containing 'AB' followed by the response length '2' bytes which are '5' and '1'.


ajp_response(<<65,66,0,2,5,1>>, _Socket) ->
        {ok, closed};

Remember Length is encoded on two bytes: '0,2'...

Now comes the AJP13_FORWARD_REQUEST !!!


request(Request) ->
        {Protocol, L0} = ajp_string("HTTP/1.1"),
        {Request_uri, L1} = ajp_string(Request),
        {Remote_addr, L2} = ajp_string("127.0.0.1"),
        {Remote_host, L3} = ajp_string("ajbchecker"),
        {Server_name, L4} = ajp_string("www.server-example.com"),

        <<
        2:8,                            %byte JK_AJP13_FORWARD_REQUEST
        2:8,                            %byte GET
        L0:16, Protocol/binary,         %string
        L1:16, Request_uri/binary,      %string
        L2:16, Remote_addr/binary,      %string
        L3:16, Remote_host/binary,      %string
        L4:16, Server_name/binary,      %string
        80:16,                          %integer
        0:8,                            %boolean
        1:16,                           %integer
        16#A0, 16#0B,                   %Header: Host
        L4:16, Server_name/binary,      %Servername
        16#ff                           %terminator
        >>.

ajp_string(String) ->
        S = list_to_binary(String),
        Bin = <<S/binary, 0>>,
        {Bin, size(Bin) - 1}.

The 'ajp_string/1' is used to calculate the final size of the binary data, and is simply used with the ajp13 string encoding format...

BTW, it's really time consuming to explain code ! When I started this article I was thinking that I'll finish it rather quickly, and I realise now that's not the case, there's so many things to say...
This is why I stop here for the First Part, the next part will come tomorrow...

Thursday, July 19, 2007

Parallelizing simple external commands ... Part II

Our loop/3 fun looks like this:


loop(_Max, 0, []) ->
     unregister(computing_master),
     exit(normal);

Whenever our list of jobs is empty, we deregister the 'computing_master' process and quit normally.


loop(Max, Current, []) ->
     receive
             stop ->
                     unregister(computing_master),
                     exit(normal);

             {exited, _Result} ->
                     io:format("Still ~p childs~n", [Current]),
                     loop(Max, Current - 1, []);

             E ->
                     io:format("Unhandled message: ~p~n", [E])

     after 60000 ->
             io:format("~p: Waiting for the last process ~p/~p~n", [erlang:now(), Max, Current]),
             loop(Max, Current, [])
     end;

In this case, we have are computing the last external process since our job list is empty.
And finally this version of loop/3 is the main one:


loop(Max, Current, List)  ->
     receive
             stop ->
                     unregister(computing_master),
                     exit(normal);

             {update, NewMax} ->
                     upto(NewMax, Max, List);

             {exited, _Result} ->
                     io:format("Still ~p childs~n", [Max]),
                     upto(Max, Max - 1, List);

             E ->
                     io:format("Unhandled message: ~p~n", [E])

     after 60000 ->
             io:format("~p: Running ~p processes~n", [erlang:now(), Max]),
             upto(Max, Current, List)
     end.

Here we have a non empty list of job and a number of job to start.

Every 60 seconds we write how many processes are running.
The message {update, NewMax} let's you alter the number max of concurrent tasks
The message {exited, _Result} is received whenever a child process dies, so we restart another job...

Bonus Code, a simple function to test the code:


sleep(Ident) ->
        io:format("Waiting ~p~n", [Ident]),
        Delay = [ "5", "3", "15", "8" ],
        Time = lists:nth(random:uniform(4), Delay),
        Cmd = [ "sleep ", Time ], 
        io:format("Starting: ~p~n", [Cmd]),
        Status = os:cmd(Cmd),
        computing_master ! {exited, Status}.

This code just calls the 'sleep' command with various arguments picked randomly... Once a process stops the 'os:cmd/1' fun exits and 'computing_master' will receive the {exited, Status} message (explained above)

Tuesday, July 17, 2007

Parallelizing simple external commands ... Part I

Once upon a time I need to parse enormous files to find simple patterns... My prefered tools were so far the shell based one, i.e. 'grep'.

But now I have a Magical Ability, Erlang Magic... So I decide to split this enormous file, using the 'split' comand, 'split -l 10000' for example.

Now that I have a lot of smaller file, I can parallelize their parsing, and this is were erlang comes...

First, let's design a bit:

I need a central process that will control all my processes
Processes and master must be able to communicate

That's all. Hopefully the latter is directly provided by erlang, this the ! operator.
The master process will be a little more tricky, but this is 'easyerl' remember, so here we go:


doit(Step) ->
    Master = spawn(?MODULE, test, [Step]),
    register(computing_master, Master).

test(Step) ->
    file:set_cwd("/home/rolphin/Work"),
    List = filelib:wildcard("seg-a*"),
    upto(Step, 0, List).

We create a process running the test function, whose job is starting the upto/3 fun...
What's interesting here is the 'filelib' function that provides me the list of file contained in the directory '/home/rolphin/Work'.

Now we go describe the 'upto/3' fun :


upto(Max, Current, []) ->
    loop(Max, Current, []);

upto(Max, Max, List) ->
    loop(Max, Max, List);

upto(Max, Current, [New|List]) ->
    io:format("upto: ~p/~p~n", [Max, Current]),
    spawn(?MODULE, grep, ["user.list", New, ["result-", New]]),
    upto(Max, Current + 1, List).

More details:

upto with an empty list will just call the loop/3 fun
upto with the Max number of processe allowed equals the current number of process, will call loop/3
upto with less active process than the max, with a non empty list, will spawn a child process

The child process is a 'grep' command, and here it is:


grep(File, Source, Result) ->
       % Command line is: "grep -f motif_file sourcefile > result"
       Cmd = [ "grep -f ", File , $ , Source, $>, Result ],
       io:format("Starting: ~p~n", [Cmd]),
       Status = os:cmd(Cmd),
       computing_master ! {exited, Status}.

Okay that's it for today ! It's a little late now ! And I need some sleep to succesfully pass the required skill tests for my new job !

More of this tomorrow...

Tuesday, July 3, 2007

Simple command execution

Sometimes you need to run external commands, and just need the return value or exit code...
One simple way to do this is the following:


-module(tport).
-export([ execute/2 ]).

execute(Host, Cmd) ->
 Port = open_port({spawn, Cmd}, [ {cd, Host}, exit_status, binary ] ),
 wait(Port).

wait(Port) ->
 receive 
  {Port, {data, BinData}} ->
   io:format("dump:~n~p~n", [BinData]),
   wait(Port);
  {Port, {exit_status, Status}} ->
   io:format("exit_code: ~p~n", [Status]);
  %% {Port, eof} ->
  %%  port_close(Port);
  {Port, exit} ->
   io:format("Received : ~p~n", [Port])
 end.

Once a port opened your process will receive various messages and one we're interested in is the 'exit_status' one:


  {Port, {exit_status, Status}} ->
   io:format("exit_code: ~p~n", [Status]);

The variable 'Status' will hold the exit code.

Simple isn't it ?

Tuesday, June 26, 2007

Using ports for fast templating...

I find myself many times searching for fast templating system, and I always stop my search by developing one myself. ( the one is done using flex generated code, which means that it is very fast... )

Here, I need some fast (rather simple) template system that gives me some data to send later to an SMTP server (the body of the mail).

First the code:

get(Template, Args) ->
     NewArgs = list_to_shell(Args),
     Cmd = lists:flatten([ ?engine, $ , "conf/template/", Template, $ , NewArgs ]),
     Port = open_port({spawn, Cmd}, [ {cd, code:lib_dir(mmailer)}, stream, binary]),
     process_flag(trap_exit, true),
     loop(Port).

You'll find some unknown notation: '?engine' and $.

?engine is a macro, defined like this -define(engine, "templater").
$ is the character notation, the character that follows the dollar sign is the character I want. Actually I need the space character...

Now the function list_to_shell:

list_to_shell(Args) ->
      L = lists:map(fun(X) -> [X, $ ] end, Args),
      lists:flatten(L).

This function add a space after after element of the array passed as parameter. I construct the shell command this way.

The call to open_port is the main part of the code:

Port = open_port({spawn, Cmd}, [ {cd, code:lib_dir(mmailer)}, stream, binary]),

The following describe what the above line does:

spawn the command 'Cmd'
in the directory code:lib_dir(mmailer),
this command will write data as a stream
and I want erlang to give me a binary stream.

(code:lib_dir(mmailer) gives me the directory where the mmailer module is located)

More about my engine... My engine takes as firts arguments the template file to parse and every other parameter is saved as $1, or $2, or $3 etc.

Example of a template 'test.template'

Hello $1 $2 !
We have a good news for you, join us at url/$3

So calling:

templater test.template Mister John connect?John

Will compute the following:

Hello Mister John !
We have a good news for you, join us at url/connect?John

Conclusion, you can use whatever template system efficiently with the open_port function, since you can dynamically build the command line.

NB: I've not secure the line generated by list_to_shell, any '; rm -rf *' in the command line will do nasty things with your stuff :)

Atoms...

Atoms, Variable, what they are and what you can do with them ?

Atoms are what you'll find strange at the first look, you may ask yourself
continously, but where the hell this thing is defined !!

Relax and just read the following words, atoms are just there to document your code...
So they can appear anywhere, have no special meaning or just may be funny !
You'll find them most of the time tagging tuples, example:


test() ->
        % code that fails
        {error, "Can't find file"}.

This way you can try to match 'error', and bind a Variable to the reason:


case test() of
        {ok, Result} ->
                Result;
        {error, Reason} ->
                io:format("Error: because ~p~n", [Reason])
end;

Simple isn't it ?

Tuesday, June 12, 2007

Erlang and calling functions

In Erlang, when you want to call a function you can use various notations:


modulename:functionname(arguments).
functionname(arguments).

Calling a function in another process:


NewPid = spawn(modulename, functionname, [arguments]).

Now a simple test module:


-module(test).
-export([test/1]).

test(String) ->
  { erlang:now(), String }.

This module has one function named 'test', its arity is one (one parameter), and this function
returns a tuple that contains the datetime and the 'string' passed as the parameter.

Calling 'erlang:now()':


(master@karoten)52> erlang:now().
{1181,599629,877590}
(master@karoten)53>

This is some sort of UNIX timestamp (number of seconds since 1970).

Now test our module:

Compiling it, ie compiling 'test.erl' located in the current directory:


(master@karoten)52> c(test).
ok

Calling the funcname 'test':


(master@karoten)53> test:test("test").
{{1181,599702,993670},"test"}
(master@karoten)54>

Here's we have used the notation 'modulename:functionname(arguments)'.
But within the module itself we could use the notation 'functionname(arguments)'...


-module(test).
-export([test/1, test/0]).

test(String) ->
  { erlang:now(), String }.

test() ->
  test("test").

Here's come what's really important to understand in Erlang, a function is defined by its name AND its arity.
So 'test/0' is NOT 'test/1'.

In our module, what the function 'test/0' do is just calling 'test/1' with a string that contains 'test'.

Erlang is a functionnal language, so the parameter of the function 'test/1' can really be what you want, from simple string to integer to complex list of tuple...

Just try this, and you'll understand :


test:test( [ complex, list, of, atoms, "and a string" ] ).

Sunday, June 10, 2007

Shrink or Strip a binary octet-stream easily

I've found a simple way to suppress ending characters of a Binary without parsing it or change it into a list !

I wanted to remove those <<"\r\n">> characters from lines read from a file, and know that my line are made of 29 characters (reading date strings).

So with this function:


shrinkbin(Data, Size) ->
  <<Data:Size/binary-unit:8>>.

I can do:


node> test:shrinkbin(<<"Sun Jun 10 15:20:53 CEST 2007\r\n">>, 29).
<<"Sun Jun 10 15:20:53 CEST 2007">>.

That's it !

Matching Protocol status code using Binary notation .

Once you're succesfully connected to some remote host, you may need to read what this peer sends you...

Sometimes the protocol is using simple integer number to describe what's going on. Let's have a look at the SMTP protocol:
2XX are ok codes,
5XX are error codes.

Testing Binary matching:


-module(bintest).
-export([test/0, check/1]).

test() ->
   Bin = <<"200 OK\r\n">>,
   check(Bin).

check(<<"200", Rest/binary>>) ->
   {200, Rest};
check(<<"300", Rest/binary>>) ->
   {300, Rest};
check(_Bin) ->
   {800, unknown}.

Saturday, June 9, 2007

Sorting Mx servers using their preference number...

Once you've retrieved your mx lists, you want to be kind enough to gently contact mx server in their prefered order...

In erlang, sorting a list is done with sort function like 'keysort'.


SortedList = lists:keysort(1, List).

lists:keysort uses as first parameter, the nth element of the tuple to use in the sort, and as second parameter the tuple list you want to sort...

Once you've sorted your tuple list, you may want to remove the key you use to sort, lists:unzip remove this...

Finally, retrieving a mx list, sorting it, and directly using it can be done like this:


mxlist(Domain) ->
 List = get_mx(Domain),
 {_, Hosts} = lists:unzip( lists:keysort(1, List) ),
 Hosts.

Thursday, June 7, 2007

Configuring your DNS server for inet_res.

When using inet_res, you must configure a DNS server to query, this could be done like this:


-define(MASTER_DNS, {212,XX,XX,252}).

init() ->
 inet_db:add_ns(?MASTER_DNS).

Once you've called your ?MODULE:init/1 you're able to use your ?MODULE:get_mx/1 function.


node> mail:init().
ok

node> mail:get_mx("yahoo.com").                                                       
[{1,"e.mx.mail.yahoo.com"},
 {1,"a.mx.mail.yahoo.com"},
 {1,"b.mx.mail.yahoo.com"},
 {1,"c.mx.mail.yahoo.com"},
 {1,"d.mx.mail.yahoo.com"},
 {1,"f.mx.mail.yahoo.com"},
 {1,"g.mx.mail.yahoo.com"}]

The result is a list of tuples, the first element is the server weight, the second the server name :p

Wednesday, June 6, 2007

Retrieve MX DNS record using erlang inet_res, it's easy !

Ever needed to retrieve some MX servers from any big domains out there ?

I'm pretty sure that you've asked yourself this question, right ?

It's EasyErl here, so let's go to the code directly:


get_mx(Domain) ->
       {ok, {hostent, Domain, _, _, _Len, List}} = inet_res:getbyname(Domain, mx),
       List.

Woh ! This wasn't too much difficult !

Okay, inet_res:getbyname isn't really well documented that's right, but Erlang comes with its source code, what's a better Erlang documentation than Erlang himself ?

The next time, I'll show you how you can easily sort the List by using unzip to obtain a correct list of MX servers... (where servers weight are correctly used)