EazyErl !: 2008

Monday, November 3, 2008

CEAN 1.4 is released

Tdoay is a great day, today you can read on the erlang mailling list that "CEAN 1.4 is released" ! ( this time it's R12B4 based release )
There's also a new website, new design and new Cean packages.

Go grab it !

Saturday, October 18, 2008

Secure Cookies for your web application...

Now that new erlang web framework are here, I think that sessions are still today a weakness.

Session and Cookies must be secure, there's no single day without some new vulnerability about session hijacking.

That's why very clever people design the secure cookie protocol [PDF].

Here's the Cookie value:

user name|expiration time|(data)k|HMAC( user name|expiration time|data|session key, k)

where

k=HMAC(user name|expiration time, sk)

and where sk is a secret key

Now you can verify the cookie using theses techniques:

1. Compare the cookie’s expiration time and the server’s current
  time. If the cookie has expired, then return FALSE.
2. Compute the encryption key as follows:
 k=HMAC(user name|expiration time, sk)
3. Decrypt the encrypted data using k.
4. Compute HMAC(user name|expiration time|data|session key, k),
 and compare it with the keyed-hash message authentication code
 of the cookie. If they match, then return TRUE;
 otherwise return FALSE.
 TRUE

Here's the erlang module

-module(scookies).
-export([start/0, gen_auth/1, gen_build/2, gen_check/2, read/1, check/4, test/0]).
-export([message/1]).

start() ->
 application:start(crypto).

gen_build(ServerKey, IVec) ->
 fun(Username, D, SessionKey) ->
  Expiration = integer_to_list(1212559656),
  Key = crypto:md5_mac( [Username, Expiration], ServerKey), %16bytes
  Data = crypto:aes_cbc_128_encrypt(Key, IVec, D),
  Hmac = crypto:sha_mac([Username, Expiration, Data, SessionKey], Key),
  io:format("Build: ~p ~p ~p ~p ~p~n",[Username, Expiration, Data, SessionKey, Key]),
  iolist_to_binary([ Username, $,, Expiration, $,, Data, $,, Hmac ])
 end.

read(Cookie) ->
 {A, B, C} = Cookie,
 {A, B, C}.

 
gen_check(ServerKey, IVec) ->
  fun(Cookie, SessionKey) ->
  [ Username, Expiration, Crypted, Hmac ] = string:tokens(binary_to_list(Cookie), ","),
  Key = crypto:md5_mac([ Username, Expiration ], ServerKey),
  Data = crypto:aes_cbc_128_decrypt(Key, IVec, Crypted),
  MAC = crypto:sha_mac([ Username, Expiration, Crypted, SessionKey], Key),
  io:format("Check: ~p ~p ~p ~p ~p~n",[Username, Expiration, Data, SessionKey, Key]),
  <<Len:16,Message:Len/binary,_/binary>> = Data,
  io:format("Decrypted: ~p '~s'~n'~p'~n'~p'~n", [Len, Message, MAC, list_to_binary(Hmac)]),
  [ Username, Expiration, {Len, Message}, MAC, list_to_binary(Hmac)]
 end.

% Returns the build fun and check fun 
% This is a helper fun to let you build in q simple way bot the build fun and
% the decode fun...
gen_auth(ServerKey) -> 
 IVec = <<"3985928509201031">>, %16bytes Must be Random
 [ gen_build(ServerKey, IVec), gen_check(ServerKey, IVec) ].
 

check(Cookie, ServerKey, InitVec, SessionKey) ->
 {Username, ExpirationTime, Crypted, CookieMAC} = read(Cookie),
 case check_time(ExpirationTime) of % see later check_time...
  ok ->
   Key = crypto:sha_mac([ Username, ExpirationTime ], ServerKey),
   Data = crypto:aes_cbc_128_decrypt(Key, InitVec, Crypted),
   MAC = crypto:sha_mac([ Username, ExpirationTime, Data, SessionKey], Key),
   compare(MAC, CookieMAC, Data);

  _E -> 
   {error, _E}
 end.

compare(_A, _A, Data) ->
 {ok, Data};
compare(_A, _B, _Data) ->
 {error, nomatch}.

check_time(1212559656) -> % It's up to you to set it 
 true;
check_time(_) ->
 false.

message(Text) ->
 Len = size(Text),
 Pad = 64 - Len - 2,
 <<Len:16,Text/binary, 0:Pad/unit:8>>.
 
test() ->
 ServerKey = <<"serverkey">>,
 SessionKey = <<"3ID409a0sd09">>,
 [ Enc, Dec ] = gen_auth(ServerKey),
 CCookie = Enc("rolphin", message(<<"stream/128693">>), SessionKey), 
 DCookie = Dec(CCookie, SessionKey),
 io:format("C: ~s~n", [ CCookie ]),
 display(DCookie).

display([Username, Expiration, {Len, Message}, _Mac, _Mac]) ->
 io:format("Message ok: ~s (~s) ~p: '~s'~n", [Username, Expiration, Len, Message]);
display([Username, Expiration, {Len, Message}, _Mac, _OtherMac]) ->
 io:format("Invalid Mac ! ~s (~s) ~p: '~s'~n", [Username, Expiration, Len, Message]).

Thursday, October 9, 2008

Benchmarking must be done carefully !

An example is better than a long post !

Timeout were hidden, smp by default modified the expected behaviour, and more...
Don't assume things, just take time to verify :)

Explained here

From the beginning:
started here

Sunday, September 28, 2008

gen_fsm vs a simple fsm for sending emails...

I needed to send emails, and check for emails validity.
I choose to use at the beginning the gen_fsm used to send email, for
example this one.

I found that I just need a simple fun that just send a mail, and don't want a full gen_fsm that need to be called for every steps involved in sending a simple mail...

I needed a way to mail efficiently, by efficiently I mean:
- the mx server will be contacted directly
- if one mx server fail try another one
- handle the greylisting transparently, ( don't block the send queue for one message )

My fun then becomes:


start_link(Server,Port,ServerName,MailFrom,To,Data) ->
  spawn_link(?MODULE, init, [Server,Port,ServerName,MailFrom,To,Data]).

And my module gets a name 'smtp_client'.

While hacking with the gen_fsm above, I've found that there was no binary strings usage at all,
manipulations are all done using lists... The binary type fit perfectly for this task so I've used
it instead.

So, I've rewrote everything, I've build a simple "smtp_proto.erl" file that just send binary strings for every state in the smtp protocol, and wait for a smtp response from the server. And write a simple "smtp_client.erl" that use only one single loop and call the needed "StateName" fun whenever it is needed.

Here's the loop fun, where everything takes place:


loop(Phase,State) ->
 receive 
  {'$c', Who, info} ->
   Who ! {ok, Phase, State#state.to},
   loop(Phase, State); 

  {'$c', stop} ->
   error_logger:info_msg("Forced to quit~n"),
   [_,_, Email ] = State#state.to,
    exit({stop, Email});

  {tcp, Socket, Data} ->
   [_,_, Email ] = State#state.to,
   case smtp_proto:check(Data) of
    {more, _Rest} ->
     inet:setopts(Socket, [{active,once}]),
     loop(Phase, State);

    {ok, _Rest} ->
      inet:setopts(Socket, [{active,once}]),
     case ?MODULE:Phase({Socket, Data}, State) of

      {ok, NewPhase, NewState} ->
       loop(NewPhase, NewState);
 
      {stop, success, _NewState} ->
       mail_stats:add(success),
       exit({ok, Email});

      {stop, How, _NewState} ->
       exit({How, Email});

      {error, _NewPhase, _NewState} ->
       error_logger:error_msg("Error: ~p '~p'~n", [Phase, Email]),
       exit({error, Email});

      _Any ->
       error_logger:msg("What: ~p~n", [_Any]),
       exit({error, _Any})
     end;

    {error, 421, Rest} ->
     error_logger:error_msg("Error ~p for ~s (greylist): ~p ~p~n", [Phase, Email, 421, Rest]),
     greylist(State),
     exit({error, Rest});
     
    {error, Code, Rest} ->
     error_logger:error_msg("Error ~p for ~s: ~p ~p~n", [Phase, Email, Code, Rest]),
     exit({error, Rest});

    {error, Rest} ->
     error_logger:error_msg("Error ~p for ~s: ~p~n", [Phase, Email, Rest]),
     exit({error, Rest}) 
   end;  

  {tcp_error, _Socket, timeout} ->
   [_,_, Email ] = State#state.to,
   error_logger:error_msg("Error: ~p 'timeout' for ~s~n", [Phase, Email]),
   exit({error, timeout});
   
  {tcp_closed, _Socket} ->
   [_,_, Email ] = State#state.to,
   error_logger:error_msg("Error: ~p 'connection closed' for ~s~n", [Phase, Email]),
   exit({error, closed});

  _Any ->
   error_logger:error_msg("Unhandled message: ~p~n", [_Any]),
   loop(Phase,State) 

 after ?INACTIVITY ->
  exit({error, inactivity})
 end.

What's important here, is that the gen_tcp sends various messages to the process,
and those messages controls the state of the fsm.


    {ok, _Rest} ->
      inet:setopts(Socket, [{active,once}]),
     case ?MODULE:Phase({Socket, Data}, State) of

My module "smtp_client" calls a fun "Phase" which correspond to a SMTP state...


loop(Phase,State) ->
...
    {ok, _Rest} ->
      inet:setopts(Socket, [{active,once}]),
     case ?MODULE:Phase({Socket, Data}, State) of
      {ok, NewPhase, NewState} ->
       loop(NewPhase, NewState);

Let's take a look at the helo fun:


helo({Socket, _Data}, State) ->
 smtp_proto:mailfrom(Socket, State#state.mailfrom),
 {ok, mailfrom, State}.

It's job is simply calling "smtp_proto:mailfrom" and returning the tuple "{ok, mailfrom, State}".
Then in the main loop the consequence is:


     case ?MODULE:Phase({Socket, Data}, State) of
      {ok, NewPhase, NewState} ->
       loop(NewPhase, NewState);

NewPhase = mailfrom, and NewState = State...

Now the mailfrom fun use exactly the same method:


mailfrom({Socket, _Data}, State) ->
 [_,_,Email] = State#state.to,
     smtp_proto:rcptto(Socket, Email),
 {ok, rcptto, State}.

The fun extract the email, then call the "smtp_proto:rcptto" on the Socket with the argument Email.
Technically, it just write this on the Socket:


RCPTO TO:<Email>\r\n

Then it returns the tuple holding the new Phase "rcptto" and the new State (which is the same unmodified)

So what's nice with this method, is that every tcp related
actions are handled within only one and only loop. None of Phase fun need
to catch {tcp_ messages or handle tcp disconnections.

The last thing before the full code, remember the mx trick that returns a list of valid servers.
Then you'll understand why the start_link fun takes a list of servers as first parameter...
(if one mx is down, connect to the next one)

Now the full source code:


-module(smtp_client).

-record(state,{
 socket,
 servername,
 mailfrom, to,
 data }).

%% Extra long timeout for strange SMTPs...
-define(TIMEOUT, 10000).
-define(INACTIVITY, 100000).
-define(GREYSLEEP, 300000).   %5 minutes

% States
-export([connect/2,helo/2,mailfrom/2,rcptto/2,data/2,quit/2]).

%% External Exports
-export([start_link/1,start_link/6,stop/1]).

%% gen_server callbacks
-export([init/6]).


%%%----------------------------------------------------------------------
%%% API
%%%----------------------------------------------------------------------
% Manual Start

% Supervised Start
start_link(Server,Port,ServerName,MailFrom,To,Data) ->
 spawn_link(?MODULE, init, [Server,Port,ServerName,MailFrom,To,Data]).

stop(Pid) ->
 Pid ! {'$c', stop}.
 
%%%----------------------------------------------------------------------
%%% HELO State
%%%----------------------------------------------------------------------
 
helo(timeout, State) ->
 error_logger:error_msg("helo timeout (~p)~n", [State]),
  {stop, normal, State};

helo({timeout, _Ref, Reason}, State) ->
 [_,_,Email] = State#state.to,
 error_logger:error_msg("Timeout: ~p email: '~s'~n", [Reason, Email]),
  {stop, normal, State};

 
helo({Socket, _Data}, State) ->
 smtp_proto:mailfrom(Socket, State#state.mailfrom),
 {ok, mailfrom, State}.


%%
%% Right after the connection... 
%%
connect({Socket, _Data}, State) ->
 smtp_proto:helo(Socket, State#state.servername),
 {ok, helo, State}.


%%%----------------------------------------------------------------------
%%% MAILFROM State
%%%----------------------------------------------------------------------
 
mailfrom(timeout, State) ->
 error_logger:error_msg("timeout (~p)~n", [State]),
  {stop, normal, State};

mailfrom({timeout, _Ref, Reason}, State) ->
 [_,_,Email] = State#state.to,
 error_logger:error_msg("Timeout: ~p email: '~s'~n", [Reason, Email]),
  {stop, normal, State};

mailfrom({Socket, _Data}, State) ->
 [_,_,Email] = State#state.to,
     smtp_proto:rcptto(Socket, Email),
 {ok, rcptto, State}.

%%%----------------------------------------------------------------------
%%% RCPT TO State
%%%----------------------------------------------------------------------

rcptto(timeout, State) ->
 error_logger:error_msg("rcptto timeout (~p)~n", [State]),
  {stop, normal, State};

rcptto({timeout, _Ref, Reason}, State) ->
 [_,_,Email] = State#state.to,
 error_logger:error_msg("Timeout: ~p email: '~s'~n", [Reason, Email]),
  {stop, normal, State};

rcptto({Socket, _Data}, State) ->
       smtp_proto:data(Socket),
   {ok, data, State}.

%%%----------------------------------------------------------------------
%%% DATA State
%%%----------------------------------------------------------------------

data(timeout, State) ->
 error_logger:error_msg("data timeout (~p)~n", [State]),
  {stop, normal, State};

data({timeout, _Ref, Reason}, State) ->
 [_,_,Email] = State#state.to,
 error_logger:error_msg("Timeout: ~p email: '~s'~n", [Reason, Email]),
  {ok, data, State};

data({Socket, _Data}, State) ->
 smtp_proto:write(Socket, [ State#state.data, <<"\r\n.">> ]),
 {ok, quit, State}.
 
%%%----------------------------------------------------------------------
%%% QUIT State
%%%----------------------------------------------------------------------

quit(timeout, State) ->
 error_logger:error_msg("data timeout (~p)~n", [State]),
  {stop, normal, State};

quit({_Socket, _Data}, State) ->
  [_,_,Email] = State#state.to,
 smtp_proto:quit(State#state.socket),
  error_logger:info_msg("Sent to ~s : OK~n", [Email]),
  {stop, success, State}.

%%%----------------------------------------------------------------------
%%% Callback functions from gen_fsm
%%%----------------------------------------------------------------------
%% Timeout from ?TIME
%% gen_fsm:start_timer(70000, slow),

init(Servers,Port,ServerName,MailFrom,To,Data) ->
 process_flag(trap_exit, true),
 connect(Servers,Port,ServerName,MailFrom,To,Data).

connect([], _,_,_,To,_) ->
 [_,_,Email] = To,
 exit({error, Email});

connect([H | T],Port,ServerName,MailFrom,To,Data) ->
 case gen_tcp:connect(H,Port,[binary,{packet,0},{active,once}], ?TIMEOUT) of 
  {ok,Socket} ->
   inet:setopts(Socket, [{active,once}]),
   loop(connect, #state{ 
     socket=Socket,
     servername=ServerName,
     to=To,
     mailfrom=MailFrom,
     data=Data});
  {error, timeout} ->
   connect(T,Port,ServerName,MailFrom,To,Data);

  {error, Reason} ->
   exit({error, Reason})
 end.

loop(Phase,State) ->
 receive 
  {'$c', Who, info} ->
   Who ! {ok, Phase, State#state.to},
   loop(Phase, State); 

  {'$c', stop} ->
   error_logger:info_msg("Forced to quit~n"),
   [_,_, Email ] = State#state.to,
    exit({stop, Email});

  {tcp, Socket, Data} ->
   [_,_, Email ] = State#state.to,
   case smtp_proto:check(Data) of
    {more, _Rest} ->
     inet:setopts(Socket, [{active,once}]),
     loop(Phase, State);

    {ok, _Rest} ->
      inet:setopts(Socket, [{active,once}]),
     case ?MODULE:Phase({Socket, Data}, State) of

      {ok, NewPhase, NewState} ->
       loop(NewPhase, NewState);
 
      {stop, success, _NewState} ->
       mail_stats:add(success),
       exit({ok, Email});

      {stop, How, _NewState} ->
       exit({How, Email});

      {error, _NewPhase, _NewState} ->
       error_logger:error_msg("Error: ~p '~p'~n", [Phase, Email]),
       exit({error, Email});

      _Any ->
       error_logger:msg("What: ~p~n", [_Any]),
       exit({error, _Any})
     end;

    {error, 421, Rest} ->
     error_logger:error_msg("Error ~p for ~s (greylist): ~p ~p~n", [Phase, Email, 421, Rest]),
     greylist(State),
     exit({error, Rest});
     
    {error, Code, Rest} ->
     error_logger:error_msg("Error ~p for ~s: ~p ~p~n", [Phase, Email, Code, Rest]),
     exit({error, Rest});

    {error, Rest} ->
     error_logger:error_msg("Error ~p for ~s: ~p~n", [Phase, Email, Rest]),
     exit({error, Rest}) 
   end;  

  {tcp_error, _Socket, timeout} ->
   [_,_, Email ] = State#state.to,
   error_logger:error_msg("Error: ~p 'timeout' for ~s~n", [Phase, Email]),
   exit({error, timeout});
   
  {tcp_closed, _Socket} ->
   [_,_, Email ] = State#state.to,
   error_logger:error_msg("Error: ~p 'connection closed' for ~s~n", [Phase, Email]),
   exit({error, closed});

  _Any ->
   error_logger:error_msg("Unhandled message: ~p~n", [_Any]),
   loop(Phase,State) 

 after ?INACTIVITY ->
  exit({error, inactivity})
 end.

greylist(State) ->
 spawn(greylist, start, [ ?GREYSLEEP, State#state.servername,
  State#state.mailfrom,State#state.to,State#state.data]).

Here's the smtp_proto module:


-module(smtp_proto).
-export([read/1, 
 read/2, 
 write/2,
 helo/2,
 ehlo/2,
 mailfrom/2,
 rcptto/2,
 data/1,
 noop/1,
 rset/1,
 help/1,
 check/1,
 quit/1]).

read(Socket) ->
 read(Socket, 5000).

read(Socket, Timeout) ->
 case gen_tcp:recv(Socket, 0, Timeout) of
  {ok, Packet} ->
   check(Packet);

  {error, Why} ->
   {error, Why} 
 end.

%% 2XX codes are OK
check(<<"250 ", Rest/binary>>) ->
 {ok, Rest};
check(<<"214", Rest/binary>>) ->
 {ok, Rest};
check(<<"220", Rest/binary>>) ->
 {ok, Rest};
check(<<"221", Rest/binary>>) ->
 {ok, Rest};
check(<<"354", _Rest/binary>>) ->
 {ok, data};

%Errors
check(<<"421", Rest/binary>>) ->
 {error, 421, Rest};
check(<<"503", Rest/binary>>) ->
 {error, 503, Rest};
check(<<"511", Rest/binary>>) ->
 {error, 511, Rest};
check(<<"540", Rest/binary>>) ->
 {error, 540, Rest};
check(<<"550", Rest/binary>>) ->
 {error, 550, Rest};
check(<<"554", Rest/binary>>) ->
 {error, 554, Rest};
check(Bin) ->
 {more, Bin}.

write(Socket, Msg) ->
 gen_tcp:send(Socket, [Msg, <<"\r\n">>] ).

helo(Socket, Name) ->
 Msg = [ <<"HELO ">>, Name ],
 write(Socket, iolist_to_binary(Msg)).

ehlo(Socket, Name) ->
 Msg = [ <<"EHLO ">>, list_to_binary(Name) ],
 write(Socket, Msg).

rcptto(Socket, Name) ->
 Msg = iolist_to_binary([ <<"RCPT TO:<">>, Name, <<">">> ]),
 write(Socket, Msg).

mailfrom(Socket, Name) ->
 Msg = [ <<"MAIL FROM:<">>, list_to_binary(Name), <<">">> ],
 write(Socket, Msg).

help(Socket) ->
 Msg = <<"HELP">>,
 write(Socket, Msg).

noop(Socket) ->
 Msg = <<"NOOP">>,
 write(Socket, Msg).

quit(Socket) ->
 Msg = <<"QUIT">>,
 write(Socket, Msg).

rset(Socket) ->
 Msg = <<"RSET">>,
 write(Socket, Msg).

data(Socket) ->
 Msg = <<"DATA">>,
 write(Socket, Msg).

But wait ! There's more !
BONUS: the greylist client
(where you can see the smtp_client in action):


-module(greylist).

-export([start/5, stop/1, flush/1]).
-define(TIMEOUT, 100000).

start(Sleep,ServerName,MailFrom,To,Data) ->
        receive 
                {'$c', flush} ->
                        ok;

                {'$c', stop} ->
                        error_logger:info_msg("~p: Manual stop~n", [?MODULE]),
                        exit(normal)
        after Sleep -> 
                ok 
        end,
        [ _, _, Email ] = To,
        [ _, Domain ] = string:tokens( Email, "@"),
        List = mmailer:get_mx(Domain),
        {_, Servers} = lists:unzip( lists:keysort(1, List) ),
        Pid = smtp_client:start_link(Servers, 25, ServerName,MailFrom,To,Data),
        loop(Pid).

flush(Pid) ->
        Pid ! {'$c', flush}.

stop(Pid) ->
        Pid ! {'$c', stop}.

loop(Child) ->
        receive 
                {'$c', stop} ->
                        smtp_client:stop(Child),
                        error_logger:info_msg("Greylist: Manual stop~n");

                {'EXIT', Child, Reason} ->
                        error_logger:info_msg("~p child ~p died : ~p~n", [?MODULE, Child, Reason]);
                
                {'EXIT', Pid, Reason} ->
                        error_logger:info_msg("~p not own child ~p died : ~p~n", [?MODULE, Pid, Reason]),
                        loop(Child);

                Msg -> 
                        error_logger:error_msg("~p: Unhandled message received: '~p'", [?MODULE, Msg]),
                        loop(Child) 

        after ?TIMEOUT ->
                smtp_client:stop(Child)
        end.

Tuesday, July 15, 2008

From Googerl to googleatom_app

Hi, (a quick post)
I've been busy this weekend working on a more generic layer for google API. Here comes the 'googleatom_app'.

I've uploaded it in the googerl project.
The application needs 'crypto, ssl, inets' to work.

[ application:start(X) || X <- [crypto, ssl, inets, googleatom] ]

should start the application.
There's currently not much documentation, this is still at a very early stage. Here's a list of what you can do now:

Post an article into blogger
Post a photo into picasa, taken from a file or from a URL (inets download the image and send it to picasa)

Once the application has started, you'll see two new registered processes 'blogger_srv' and 'picasa_srv'. The first thing you need to do before being able to use thoses process is setting your google credentials:

picasa_srv:auth(Username, Password).

Once the call returns you'll receive a google token string. If not you'll have to investigate a little :).
Ok now that you're authenticated you can post a nice photo into your gallery:

picasa_srv:new(Username, Album, "http://www.aniceimageservier.com/superimage.jpg", "Fake Image Name").

Since the process of posting images takes time, picasa_srv uses 'gen_server:cast', and now the success is indicated by some 'error_logger:info_msg' ... (xmerl is used to extract various new urls of your posted photo).

That's it for this really quick introduction, I'll will update files within days, actually some funs are not available everywhere, I'll fix this.

I'll post a googleatom_app tutorial later this week...

Thursday, July 10, 2008

My mochiweb webservice...

Now that mochiweb is popular, and that I use it for some of my work, I can publish here some webservice thingy that I use.

I wanted to have this simple thing: modulename,funname, and parameters inside the URL:


http://www.server.tld/modulename/fun/arg1/arg2 
wmodulename:Fun(arg1, arg2)

(Note the 'w' prefix, I explain it later)

Here's the code:


-module(webservice).
-export([start/0, start/2, loop/2, stop/0, test/0]).
-export([dolog/2]).
-define(PORT, 8080).

start() ->
 start("/home/rolphin/Devel/Web", ?PORT).

start(Wwwroot, Port) ->
 Loop = fun (Req) ->
  ?MODULE:loop(Req, Wwwroot)
 end,
 mochiweb_http:start([{loop, Loop}, {name, ?MODULE}, {port, Port}]).

stop() ->
    mochiweb_http:stop(?MODULE).

loop(Req, DocRoot) ->
 log(Req),
 case string:tokens(Req:get(path), "/") of
  [ "dump" ] ->
   Req:ok({"text/plain",
    io_lib:format("~p~n", [Req:dump()])});

  [ "favicon.ico" ] ->
   Req:respond({404, [], ""});

  [ "codepath" ] ->
   Req:ok({"text/plain",
    io_lib:format("codepath: ~p~n", [code:get_path()])});

  [ "codepath", "json" ] ->
   Req:ok({"text/plain",
    mochijson:encode({array, code:get_path()})});

  [ Path, Fun | Elems ] ->
   % Every module name should begin with 'w'
   dispatch(Req, DocRoot, list_to_atom("w" ++ Path), Fun, Elems);

  [] ->
   launch(Req, DocRoot, wdefault, do, []);

  _ ->
   Req:respond({502, [], []})
   
 end.

dispatch(Req, DocRoot, Module, Fun, Elems) ->
 M = Req:get(method),
 case M of
  'GET' ->
   launch(Req, DocRoot, Module, Fun, Elems);
  'POST' ->
   launch(Req, DocRoot, Module, Fun, Elems);
  'PUT' ->
   launch(Req, DocRoot, Module, Fun, Elems);
  'DELETE' ->
   launch(Req, DocRoot, Module, Fun, Elems);
  'HEAD' ->
   launch(Req, DocRoot, Module, Fun, Elems);
  _Any ->
   launch(Req, DocRoot, wdefault, get, [])
 end.

launch(Req, DocRoot, wcontent, Fun, Args) ->
 case catch wcontent:default(Req, DocRoot, [ Fun | Args] ) of
  {'EXIT', {Type, _Error}} ->
   Req:ok({"text/plain",  
    io_lib:format("GET Error: '~p' for '~p' ~p ~p~n~p~n", [Type, wcontent, Fun, Args, _Error])});
  _ ->
   ok
 end;
 
launch(Req, DocRoot, Module, Fun, Args) ->
 F = list_to_atom(Fun),
 case catch Module:F(Req, DocRoot, Args) of
  {'EXIT', {Type, _Error}} ->
   Req:ok({"text/plain",  
    io_lib:format("~p Error: '~p' for ~p ~p ~p~n~p~n", [Req:get(method), Type, Module, Fun, Args, _Error])});
  _ ->
   ok
 end.

log(Req) ->
 Ip = Req:get(peer),
 spawn(?MODULE, dolog, [Req, Ip]).

dolog(Req, Ip) ->
 stat_logger:log("~p ~p", [Ip, Req:get(path)]).

First you can see that I use "mochiweb:start" and "mochiweb:stop" in the "start" and "stop" funs.
Then parameters are extracted from the URI:


 case string:tokens(Req:get(path), "/") of

I split the URI on the "/" character, and compare the resulting list with various possibilities.

Then the main part of the code:


launch(Req, DocRoot, Module, Fun, Args) ->
 F = list_to_atom(Fun),
 case catch Module:F(Req, DocRoot, Args) of
  {'EXIT', {Type, _Error}} ->
   Req:ok({"text/plain",  
    io_lib:format("~p Error: '~p' for ~p ~p ~p~n~p~n", [Req:get(method), Type, Module, Fun, Args, _Error])});
  _ ->
   ok
 end.

Using "catch" prevent your app from crashing without any information. If a module doesn't exists, you'll be able to see why (the main problem is sometimes the module is not in the code:path...)

Every webservice module name should start with "w", it's some sort of namespace :), now you know the reason for the 'w' prefix:


  [ Path, Fun | Elems ] ->
   % Every module name should begin with 'w'
   dispatch(Req, DocRoot, list_to_atom("w" ++ Path), Fun, Elems);

Now, how can I use it ? It's simple, you just have to build a module with such a template:


-module(wsample).
-export([do/3]).

do(Req, _DocRoot, Args) ->
 Req:ok({"text/plain", [ <<"Hello ">>, hd(Args) , <<", you are authorized :)">>] }).

Then you build it, and store it somewhere in your "code:path". And call


webservice:start().

in your erl shell. Then locate your webrowser to the URI

www.server.tld:8080/sample/do/PUT-ANYTHING-YOU-WANT-HERE

As a sample module, here is a module that is used everyday the "mobile tags" generator:


-module(wbarcode).
-export([do/3]).

-define(PNG, "image/png").

do(Req, _DocRoot, _Args) ->
 barcode(Req).

% the process 'barcode' must be there
barcode(Req) ->
 Options = Req:parse_qs(),
 %io:format("~p~n", [Options]),

 Text = proplists:get_value("text", Options, "http://www.shootit.fr"),
 Res = proplists:get_value("res", Options, "72"),
 Z = proplists:get_value("z", Options, "1"),

 case barcode:generate('barcode', Text) of 
  timeout ->
   Req:ok({"text/plain", [], <<"Timeout">>});
  
  {data, {Width, Height}, Data} ->
   Zoom = list_to_integer(Z),
   Xres = integer_to_list(list_to_integer(Res)  * Zoom),
   Yres = integer_to_list(list_to_integer(Res)  * Zoom),
   Params = [ {"PS", [integer_to_list(Width + 1), integer_to_list(Height + 1)]}, {"HW", [Xres, Yres]} ],
   case gs:draw(gs, Data, Params) of
    {png, Image} ->
     Req:ok({?PNG, Image});

    _E ->
     Req:ok({"text/plain", [], [ <<"Error: ">>, 
      io_lib:format("~s", [_E])]})
   end
 end.

Wednesday, July 9, 2008

Dealing with multiple pages of results easily

We you want to deal with multiples pages of responses and using MySQL, you are always dealing with some sort of "LIMIT" statement. Since erlang is a vm, you can efficiently retrieving results while the user is not asking you to... You can transparently in the background prefetch data for him. This is what I call a pager.

I've build simple module to that for me :


-module(db_pager).
-export([retrieve/1, loop/2]).

new(Request, Start, Step) ->
 Prepared = 
 Pid = spawn(?MODULE, loop, [Start, Step]),
 retrieve(Pid),

retrieve(Pid) ->
 io:format("DEBUG: ~p ! {next, ~p}~n", [ Pid, self() ]),
 Pid ! { next, self() },
 receive
  {eof, Msg} ->
   io:format("eof: ~p~n", [Msg]),
   {eof, Msg};

  {data, Data} ->
   io:format("Received: ~p~n", [Data]),
   timer:sleep(3000),
   retrieve(Pid); 

  stop ->
   stop;

  _E ->
   io:format("Unhandled: ~p~n", [_E]),
   client(Pid) 
 after 2000 ->
  timeout
 end.

select(Fields, Table) ->
 [ Head | Rest ] = Fields,
 [ <<"select ">>, atom_to_list(Head),
  lists:foldl(fun(X, Acc) ->
   [ [ <<",">> |  atom_to_list(X) ] | Acc ]
   end, [], Rest),
  <<" from ">>, atom_to_list(Table) ].

sql(Current, Step) ->
 Query = select([id,lastname,firstname], users),
 Next = Current + Step,
 case mysql:fetch(mysql, [ 
  Query, <<" where id between ">>, 
  integer_to_list(Current), <<" and ">>, integer_to_list(Next) ]) of

  {data, Data} -> 
   case mysql:get_result_rows(Data) of
    [] ->
     {eof, empty};

    Res ->
     %io:format("DEBUG: sql: data: ~p~n", [Data]),
     {data, Res}  
   end;

  {error, Data} -> 
   io:format("DEBUG: sql: data: ~p~n", [Data]),
   {eof, mysql:get_result_reason(Data)}
 end.

loop(Current, Step) ->
 loop(Current, Step, []).

loop(Current, Step, []) ->
 Res = sql(Current, Step),
 loop(Current + Step, Step, Res);

loop(Current, Step, Res) ->
 Next = Current + Step,
 receive
  {next, Who} ->
   Who ! Res,
   NewRes = sql(Current, Step),
   loop(Next, Step, NewRes);

  stop ->
   stop

 after 30000 ->
  timeout
 end.

This code build a sql query that contains some "BETWEEN" statement for some "id" field:


 Query = select([id,lastname,firstname], users),
 Next = Current + Step,
 case mysql:fetch(mysql, [ 
  Query, <<" where id between ">>, 
  integer_to_list(Current), <<" and ">>, integer_to_list(Next) ]) of

This part specifically builds a string like this:


select id,lastname,firstname where id between X and Y

Where X and Y are variable, X stands for the Current offset and Y the Next offset.

When the code is called multiple times the X and Y are automatically computed, so every row from the table will be returned one page (Y - X elems) at a time. The "loop/1" function is here for that:


loop(Current, Step, Res) ->
 Next = Current + Step,
 receive
  {next, Who} ->
   Who ! Res,
   NewRes = sql(Current, Step),
   loop(Next, Step, NewRes);

  stop ->
   stop

 after 30000 ->
  timeout
 end.

The receiver Who will have informations one page at a time. The data sent to him is fetched before he needs to, except the first time:


loop(Current, Step, []) ->
 Res = sql(Current, Step),
 loop(Current + Step, Step, Res);

Monitoring your servers with sysstat (sar)

There's sometimes things that are so helpfull that you think that everyone is aware of them, but sometimes this is not the case. Here I'll talk about a little package that is so powerful and efficient that you won't change anymore...
Taken from the ubuntu man page:


DESCRIPTION
       The sar command writes to standard output the contents of  selected  cumula-
       tive activity counters in the operating system. The accounting system, based
       on the values in the count and interval parameters, writes  information  the
       specified  number of times spaced at the specified intervals in seconds.  If
       the interval parameter is set to zero, the sar command displays the  average
       statistics  for the time since the system was started. The default value for
       the count parameter is 1. If its value is set to zero, then reports are gen-
       erated continuously.  The collected data can also be saved in the file spec-
       ified by the -o filename flag, in  addition  to  being  displayed  onto  the
       screen.  If filename is omitted, sar uses the standard system activity daily
       data file, the /var/log/sysstat/sadd file, where the dd parameter  indicates
       the  current  day.   By  default  all the data available from the kernel are
       saved in the data file. Exceptions are interrupts and disks data, for  which
       the  relevant  options  must  be explicitly passed to sar (or to its backend
       sadc ) when the data file is created (see options below).

"sar" comes with the sysstat package. Once it's installed you can monitor your server like never before...

Here's the description of the sysstat package from the author

The sysstat utilities are a collection of performance monitoring tools for Linux. 
These include sar, sadf, mpstat, iostat, pidstat and sa tools. Go to the Features page to display 
a list of sysstat's features, or see the Documentation page to learn some more about them.

For example, you can watch realtime the network usage:


# sar -n DEV 1 0
Linux 2.6.22-15-generic (xXxXx)  07/09/2008

11:26:36 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
11:26:37 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
11:26:37 AM      eth0      5.05      0.00      0.86      0.00      0.00      0.00      0.00

11:26:37 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
11:26:38 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
11:26:38 AM      eth0      4.00      0.00      0.45      0.00      0.00      0.00      0.00
...

Today, I'll introduce the erlang-sar package that's able to retrieve information from the sar command.

The application is composed of a collector "sar_collector", a helper module "sar_values" and the main module "sar".
Here comes a quick sample session:


% Starting the collector
sar_collect:start().

% Retrieving the data
sar:stats(cpu).
[{cpu,idle,<<"98.62">>},
 {cpu,steal,<<"0.00">>},
 {cpu,iowait,<<"0.00">>},
 {cpu,system,<<"0.18">>},
 {cpu,nice,<<"0.00">>},
 {cpu,user,<<"1.20">>}]

% Retrieving more data
sar:stats([cpu,mem]).
[{swap,swpcad,<<"33236">>},
 {swap,usage,<<"64.72">>},
 {swap,used,<<"389872">>},
 {swap,free,<<"212492">>},
 {mem,kbcached,<<"84496">>},
 {mem,kbbuffers,<<"63408">>},
 {mem,memused,<<"98.78">>},
 {mem,kbmemused,<<"508984">>},
 {mem,kbmemfree,<<"6308">>},
 {cpu,idle,<<"97.83">>},
 {cpu,steal,<<"0.00">>},
 {cpu,iowait,<<"0.75">>},
 {cpu,system,<<"0.20">>},
 {cpu,nice,<<"0.00">>},
 {cpu,user,<<"1.22">>}]

The module "sar_values" also export an "extractor" function that can be used to build fun()s:


% build a Mem fun()
Mem = sar_values:extractor(mem).

% Calling Mem fun() on sar:stats()
Mem(sar:stats([cpu,mem])).
[{kbcached,<<"84496">>},
 {kbbuffers,<<"63480">>},
 {memused,<<"98.77">>},
 {kbmemused,<<"508976">>},
 {kbmemfree,<<"6316">>}]

% Calling it on sar:stats()
Mem(sar:stats()).
[{kbcached,<<"84496">>},
 {kbbuffers,<<"63520">>},
 {memused,<<"98.80">>},
 {kbmemused,<<"509100">>},
 {kbmemfree,<<"6192">>}]

With this package you have access to all the data sar can export for you.
Here's the "sar.erl" code:


-module(sar).

-export([systat/0, stats/0, stats/1, option/1]).
-export([extract/1]).
-define(OPTIONS, "-u -r -v -c -q -n DEV").
-define(DATA, "/tmp/last").

systat() ->
 Cmd = "sadf " ++ ?DATA ++ " -- " ++ ?OPTIONS,
 execute(".", Cmd).

stats() ->
 Cmd = "sadf " ++ ?DATA ++ " -- " ++ ?OPTIONS,
 {ok, _, Bin} = execute(".", Cmd),
 extract(Bin).

stats(List) when is_list(List) ->
 Args = lists:foldl(fun(X, Acc) -> case option(X) of
     error ->
      Acc;
     T ->
      [ $ , T | Acc ]
    end end, [], List),
 Cmd = "sadf " ++ ?DATA ++ " -- " ++ lists:reverse(Args),
 {ok, _, Bin} = execute(".", lists:flatten(Cmd)),
 extract(Bin);

stats(Elem) ->
 stats([Elem]).

option(cpu) ->
 "-u";
option(disk) ->
 "-d";
option(sock) ->
 "-n SOCK";
option(eth0) ->
 "-n DEV";
option(eth1) ->
 "-n DEV";
option(eth2) ->
 "-n DEV";
option(proc) ->
 "-c";
option(run) ->
 "-q";
option(mem) ->
 "-r";
option(inode) ->
 "-v";
option(switch) ->
 "-w";
option(swaping) ->
 "-W";
option(_) ->
 error.

execute(_Host, Cmd) ->
 Port = open_port({spawn, Cmd}, [ exit_status, binary ] ),
 wait(Port, []).

wait(Port, Content) ->
 receive 
  {Port, {data, BinData}} ->
   %error_logger:info_msg("dump:~n~p~n", [BinData]),
   NewContent = [ BinData | Content ],
   wait(Port, NewContent);

  {Port, {exit_status, Status}} ->
   %error_logger:info_msg("exit_code: ~p~n", [Status]),
   {ok, Status, Content};

  {Port, eof} ->
   %error_logger:info_msg("Port closed"),
   port_close(Port),
   {ok, eof, Content};

  {Port, exit} ->
   error_logger:info_msg("Received : ~p~n", [Port]),
   Content
 end.

extract(Bin) ->
 sar_values:extract(iolist_to_binary(Bin)).

You can see the "option/1" function that let you convert atoms into command line arguments easily. I use also this function to test if sar is able to handle a specific parameter. For example and with the help of my webservice I can query remote stats easily:

http://monitoring.lan/stats/q/cpu/servername

Here's the "sar_collect" module


-module(sar_collect).

-export([systat/1, sartime/1, start/0, start/1]).
-export([extract/1]).
        spawn(?MODULE, systat, []).

start(Seconds) ->
        spawn(?MODULE, systat, [Seconds]).

% update the file every second        
systat(0) -> 
        loop(1);

systat(Seconds) ->
        loop(Seconds).

%update the file every 59 seconds
systat() ->
        loop(59).

loop(Seconds) when Seconds < 60 ->
 Cmd = lists:flatten([ "sar -o /tmp/last.tmp ", integer_to_list(Seconds),  " 1" ]),
 execute(".", Cmd),
 file:rename("/tmp/last.tmp", "/tmp/last"),
 timer:sleep(60 - Seconds),
 receive
  stop ->
   exit(normal);

  {interval, NewSeconds} ->
   loop(NewSeconds);

  _A ->
   loop(Seconds)

 after 0 ->
  loop(Seconds) 

 end;

%default update 20 seconds (arbitrary chosen)
loop(_Seconds) ->
 loop(20).

execute(Host, Cmd) ->
 Port = open_port({spawn, Cmd}, [ {cd, Host}, exit_status, binary ] ),
 wait(Port, []).

wait(Port, Content) ->
 receive 
  {Port, {data, _BinData}} ->
   wait(Port, Content);

  {Port, {exit_status, _Status}} ->
   ok;

  {Port, eof} ->
   port_close(Port),
   Content;

  {Port, exit} ->
   error_logger:info_msg("Received : ~p~n", [Port]),
   Content
 end.

Finally there is the "sar_values" source code:


-module(sar_values).

-export([extract/1, extractor/1, sort/1]).
-export([parse/1, parse_value/2]).

extract(Bin) ->
 extract(Bin, []).

extract(Bin, Stats) ->
 case parse(Bin) of
  {Class, Type, Rest} ->
   %io:format("~p.~p", [Class, Type]),
   case  parse_value(Rest, <<>>) of
    {more, Value, More} ->
     NewStats = [ {Class, Type, Value} | Stats ],
     extract(More, NewStats);

    {eof, Value} ->
     NewStats = [ {Class, Type, Value} | Stats ],
     NewStats 
   end;

  eof -> 
   Stats 
 end.

parse(<<"%user", Rest/binary >>) -> {cpu, user, Rest};
parse(<<"%nice", Rest/binary>>) -> {cpu, nice, Rest};
parse(<<"%system", Rest/binary>>) -> {cpu, system, Rest};
parse(<<"%iowait", Rest/binary>>) -> {cpu, iowait, Rest};
parse(<<"%steal", Rest/binary>>) ->  {cpu, steal, Rest};
parse(<<"%idle", Rest/binary>>) -> {cpu, idle, Rest};

parse(<<"kbmemfree", Rest/binary>>) -> {mem, kbmemfree, Rest};
parse(<<"kbmemused", Rest/binary>>) -> {mem, kbmemused, Rest};
parse(<<"%memused", Rest/binary>>) -> {mem, memused, Rest};
parse(<<"kbbuffers", Rest/binary>>) -> {mem, kbbuffers, Rest};
parse(<<"kbcached", Rest/binary>>) -> {mem, kbcached, Rest};

parse(<<"kbswpfree", Rest/binary>>) -> {swap, free, Rest};
parse(<<"kbswpused", Rest/binary>>) -> {swap, used, Rest};
parse(<<"%swpused", Rest/binary>>) -> {swap, usage, Rest};
parse(<<"kbswpcad", Rest/binary>>) -> {swap, swpcad, Rest};

parse(<<"dentunusd", Rest/binary>>) -> {inode, dentryunused, Rest};
parse(<<"file-sz", Rest/binary>>) -> {inode, fileopened, Rest};
parse(<<"inode-sz", Rest/binary>>) -> {inode, inodes, Rest};
parse(<<"super-sz", Rest/binary>>) -> {inode, super, Rest};
parse(<<"%super-sz", Rest/binary>>) -> {inode, superusage, Rest};
parse(<<"dquot-sz", Rest/binary>>) -> {inode, dquotsz, Rest};
parse(<<"%dquot-sz", Rest/binary>>) -> {inode, dquotszusage, Rest};
parse(<<"rtsig-sz", Rest/binary>>) -> {rtsig, count , Rest};
parse(<<"%rtsig-sz", Rest/binary>>) -> {rtsig, usage, Rest};

parse(<<"totsck", Rest/binary>>) -> {sock, total, Rest};
parse(<<"tcpsck", Rest/binary>>) -> {sock, tcp, Rest};
parse(<<"udpsck", Rest/binary>>) -> {sock, udp, Rest};
parse(<<"rawsck", Rest/binary>>) -> {sock, raw, Rest};
parse(<<"ip-frag", Rest/binary>>) -> {sock, ipfrag, Rest};

parse(<<"runq-sz", Rest/binary>>) -> {procs, running, Rest};
parse(<<"plist-sz", Rest/binary>>) -> {procs, total, Rest};

parse(<<"ldavg-15", Rest/binary>>) -> {load, min15, Rest};
parse(<<"ldavg-1", Rest/binary>>) -> {load, min1, Rest};
parse(<<"ldavg-5", Rest/binary>>) -> {load, min5, Rest};

parse(<<"pswpin/s", Rest/binary>>) -> {swaping, pswpin, Rest};
parse(<<"pswpout/s", Rest/binary>>) -> {swaping, pswpout, Rest};

parse(<<"l0", Rest/binary>>)   -> parsebis(Rest, l0);
parse(<<"eth0", Rest/binary>>) -> parsebis(Rest, eth0);
parse(<<"eth1", Rest/binary>>) -> parsebis(Rest, eth1);
parse(<<"eth2", Rest/binary>>) -> parsebis(Rest, eth2);

parse(<<>>) -> eof;
 
parse(Bin) ->
 {_, Next} = split_binary(Bin, 1), 
 parse(Next).

parsebis(<<"rxpck/s", Rest/binary>>,  Category) -> {Category, rxpck, Rest};
parsebis(<<"txpck/s", Rest/binary>>,  Category) -> {Category, txpck, Rest};
parsebis(<<"rxbyt/s", Rest/binary>>,  Category) -> {Category, rxbyt, Rest};
parsebis(<<"txbyt/s", Rest/binary>>,  Category) -> {Category, txbyt, Rest};
parsebis(<<"rxcmp/s", Rest/binary>>,  Category) -> {Category, rxcmp, Rest};
parsebis(<<"txcmp/s", Rest/binary>>,  Category) -> {Category, txcmp, Rest};
parsebis(<<"rxmcst/s", Rest/binary>>, Category) -> {Category, rxmcst, Rest};
parsebis(Bin, Category) -> 
 {_, Next} = split_binary(Bin, 1), 
 parsebis(Next, Category).

parse_value(<<$\t, Rest/binary>>, _Value) ->
 parse_value(Rest, _Value);
parse_value(<<$ , Rest/binary>>, _Value) ->
 parse_value(Rest, _Value);

parse_value(<<$\n, _Rest/binary>>, Value) ->
 {more, Value, _Rest};

parse_value(<<>>, Value) ->
 {eof, Value};

parse_value(Bin, Value) ->
 {H, Next} = split_binary(Bin, 1),
 parse_value(Next, iolist_to_binary([Value, H])).

extractor(Motif) ->
 fun(L) when is_list(L) ->
  [ {Y, Z} || {X, Y, Z} <- L, X == Motif]
 end.

sort(List) ->
 lists:sort( fun({X, _V}, {Y, _W}) when X < Y -> 
   true; 
   (_A, _B) -> false 
  end, List).

Now that Erlang is R12B, I'm not so sure if "binary parsing code" is really as efficient as it can...

Thursday, June 26, 2008

Parsing Binaries with erlang, lamers inside

Edit: I don't want to offend anyone with the following, this is just an expression of what i encounter every day with people that do technology but don't know anything. I don't say anyone on the mailling list is a lamer, I say that question like the one evocated here exists because there's too much ignorance in the technology world. And lastly, don't forget this is a personal rant...

It's seems that there's a really high expectation on "parsing binaries" with erlang and reach others languages performance. For me this is a complete nonsense.
What's the meaning of "parsing a binary", a binary is not a text, it's a binary, a sequence of bytes...
A sequence must be defined by its length. You must know before reading anything, the size that'll be needed store what's coming.
Every crap software you can find has always prefered to use strcpy instead of memcpy. Whatever the language you use, you MUST know the size of what you're working with, this is not an advice this is mandatory.

From the post above, you can find that the only delimiter seems to be "\r\n". So if someone sends you 4Gb of data not ending with "\r\n" you'll keep reading it... (and of course blow your memory because this was not supposed to be)

While working at low level with C and flex scanners, I've always ask me this question: "What's the max size of the element I can accept ?". This simple question helps me build software that don't break with a simple 'perl print Ax60000' trick...

So is HTTP badly designed, because delimiters are "\r\n" and headers can spread on multiple lines ? The answer is absolutely YES.
Was'it difficult to build something more secure, using prefixed elements with their size ? The answer is absolutely NO ! (take ajp13 for example...)

Now that erlang is becoming more and more popular, lamers are lurking in the erlang direction. This is life, but will the erlang mailling list suffer from this ? The answer is yes :/

Someone with knowledge must not try to resolve someone's else problem, he must help him by asking the good question. (do you know the size a priori ?)

Why parsing binaries in java is faster than erlang ? Who cares, since parsing binaries is of course stupid !
Parsing real world protocol with erlang is lightning fast, both for writing and for executing. So teach lamers how to build real protocols and don't try help them with some trickery.

That's my rant for today :)

Friday, June 20, 2008

Quick Tip, list join

Another "lists:join" :


Join = fun([X|Rest], D) -> 
  [ X | [ [D,E] || E <- Rest ] ] 
  end.

Usage:


io:format("~s~n", [Join(["a", "b", "cde", "h", "k", "lm"], $,)]).
a,b,cde,h,k,lm
ok

Wednesday, June 18, 2008

Ubuntu and Ghostscript

On my development box, I've recently upgraded my ubuntu. The ghostscript package was also upgraded, but my erlang webservice wasn't able anymore to draw any mobile tag...

I've found that the new ghostscript binary 'gs' has new command line parameters that are incompatible with their previous version...

I used to initialize gs like this:


 Cmd = "gs -sDEVICE=pngalpha -q -dNOPLATFONTS -dNOPAUSE -dGraphicsAlphaBits=2 -sOutputFile=- -",

But that no longer works since this flushing is done only when the process quits...

The correct command line is then:


 Cmd = "gs -sDEVICE=pngalpha -q -dGraphicsAlphaBits=2 -sOutputFile=%stdout -dNOPROMPT",

Everything works fine now, but I've re-read the very long Ghostscript documentation, and the solution comes from this page.

Friday, June 13, 2008

Quick Bash Script for Checking HTTP Headers

Whenever it comes to efficiently configure Web Servers and setting headers like ETAg, Cache-Control, or Expires and you have more than one server, you need to check them all since any user may hit different servers.

This script is meant to call wget on every IP returned by dig and display http response headers.
Usage is simple:


./script.sh http://www.example.com/ressource/with-long-cache-and-no-etag/big-image.jpg

Here's the code:


#!/bin/bash

BIN=${0##*/}
URL=${1?usage: $BIN url}
HOST=${URL#http://*}
REQ=${HOST#*/}
HOST=${HOST%%/*}

function check
{
	wget -S --header="Host: $HOST" "http://$line/$REQ" -O/dev/null
}

dig +short $HOST | \
while read line
do
	case $line in	
		[a-z]*)
		;;
		[0-9]*.[0-9]*.[0-9]*)
			echo === Testing $HOST with IP $line
			check $line
		;;
		*)
		;;
	esac
done

Monday, May 19, 2008

Monitoring log files with 'tail'

When you need to look for specific events from logfiles, your first idea is to use 'tail'. Tail is obviously the number one command that any sysadmin knows about.

From the first version of Tail and nowadays, some really nice feature have been implemented, one of those is the "follow=name" feature...

Since your erlang node will stay alive for many days, you'll end up meeting some logrotation tool that will replace the file you're lurking... So "follow=name" is for you !

Extract from a manual page:


     There are two ways to specify how you'd like to track files with
     this option, but that difference is noticeable only when a
     followed file is removed or renamed.  If you'd like to continue to
     track the end of a growing file even after it has been unlinked,
     use `--follow=descriptor'.  This is the default behavior, but it
     is not useful if you're tracking a log file that may be rotated
     (removed or renamed, then reopened).  In that case, use
     `--follow=name' to track the named file by reopening it
     periodically to see if it has been removed and recreated by some
     other program.

Implementing this feature in pure erlang is of course possible, but why loose time when you can directly use the "tail" binary already installed on your system ?


-module(tail).
-export([start/1, start/2, start/3, stop/1, snapshot/1, display/1, init/3]).

start(File) ->
	start(File, fun display/1, "/var/log").

start(File, Callback) ->
	Dir = "/var/log",
	start(File, Callback, Dir).

start(File, Callback, Dir) ->
	spawn_link(?MODULE, init, [File, Callback, Dir]).

snapshot(Pid) ->
	Pid ! {snap, self() },
	receive
		{Port, Callback} ->
			{Port, erlang:fun_info(Callback)};
		_Any ->
			_Any
	end.

stop(Pid) ->
	Pid ! stop.

init(File, Callback, Dir) ->
	Cmd = "/usr/bin/tail --follow=name "++ File,
	Port = open_port({spawn, Cmd}, [ {cd, Dir}, stderr_to_stdout, {line, 256}, exit_status, binary]), 
	tail_loop(Port, Callback).

tail_loop(Port, Callback) ->
	receive
		{Port, {data, {eol, Bin}}} ->
			Callback(Bin),
			tail_loop(Port, Callback);

		{Port, {data, {noeol, Bin}}} ->
			Callback(Bin),
			tail_loop(Port, Callback);

		{Port, {data, Bin}} ->
			Callback(Bin),
			tail_loop(Port, Callback);

		{Port, {exit_status, Status}} ->
			{ok, Status};
			%tail_loop(Port, Callback);
	
		{Port, eof} ->
			port_close(Port),
			{ok, eof};

		{snap, Who} ->
			Who ! { Port, Callback},
			tail_loop(Port, Callback);

		stop ->
			port_close(Port),
			{ok, stop};
		
		_Any ->
			tail_loop(Port, Callback) 
	end.

display(Bin) ->
	Content = iolist_to_binary(Bin),
	io:format("[INFO] ~s~n", [Content]).

Let's say you want to monitor "/var/log/messages", here's how you can do it:


Shell> Tail = tail:start("messages").

This will display every new line (running in background) in your shell session.

Now let's say you want to do some tricky things with every line, you can pass as a parameter a callback fun:


Shell> Pid = logger_new(). % an example
Shell> Callback = fun(X) -> Pid ! {line, X} end. % sending a tuple to Pid
Shell> Tail = tail:start("message", Callback).

Finally, you'll be able to hack the code and transform this method to "tail" multiple files since "tail" is able to watch more than one file...

Quick tip :


init(ListOfFiles, Callback, Dir) ->
	Args = [ [ X, $ ] || X <- ListOfFiles ]
	Cmd = "/usr/bin/tail --follow=name "++ lists:flatten(Args),

Happy Tailing !

Wednesday, May 14, 2008

Following your directories

While designing some "integrity checker" tool, I've found myself in trouble whenever I need to
manage directories...

It seems that erlang, for now, is not able to detect "symbolic links" (Unix definition). Opening "kernel/include/file.hrl" holds the truth... The module "filelib" doesn't have any thing related to 'links' neither.

So if you have some links that point to "." you'll observe that "filelib:fold_files" follow the link many times (hopefully its stop somewhere) but you loose some precious time and increase disk accesses...

Then I've rewrote the fold_files to detect links by searching in the path some identical elements:


checktree([]) ->
	true;
checktree([_Elem,_Elem|_Rest]) ->
	false;
checktree([_|Rest]) ->
	checktree(Rest).

This is not really high quality but seems to work as expected...

I've also added some "maxdeep" functionality that prevent the script to go too many deeper.

I've named this module "wfile" for no particular reason :) and here's the full code:


-module(wfile).
-export([list/1, list/2]).

list(Dir) ->
	Fun = fun(X, _Acc) -> io:format("+ ~s~n", [X]) end,
	list(Dir, Fun).

list(Dir, Fun) ->
	fold_files(Dir, Fun, []).

fold_files(Dir, Fun, Acc) ->
	fold_files(Dir, true, Fun, Acc).

fold_files(Dir, Recursive, Fun, Acc) ->
	fold_files1(Dir, Recursive, Fun, Acc).

fold_files1(Dir, Recursive, Fun, Acc) ->
	case file:list_dir(Dir) of
		{ok, Files} -> fold_files2(Files, Dir, Recursive, Fun, Acc);
		{error, _}  -> Acc
	end.

fold_files2([], _Dir, _Recursive, _Fun, Acc) -> 
	Acc;
fold_files2([File|T], Dir, Recursive, Fun, Acc0) ->
	FullName = filename:join(Dir, File),
	case filelib:is_regular(FullName) of
		true  -> 
			Acc = Fun(FullName, Acc0),
			fold_files2(T, Dir, Recursive, Fun, Acc);

		false ->
			case Recursive and filelib:is_dir(FullName) and maxdeep(FullName, 6) of
				true ->
					Acc1 = fold_files1(FullName, Recursive, Fun, Acc0),
					fold_files2(T, Dir, Recursive, Fun, Acc1);
				false ->
					fold_files2(T, Dir, Recursive, Fun, Acc0)
		end
    end.

maxdeep(Filename, Max) ->
	Elems = filename:split(Filename),
	( Max > length(Elems) ) and checktree(Elems).

checktree([]) ->
	true;
checktree([_Elem,_Elem|_Rest]) ->
	false;
checktree([_|Rest]) ->
	checktree(Rest).

To end to story, here's how I used this module:


erl> IC = integrity_checker:start().
erl> wfile:list("/home/rolphin/tmp", fun(X, Acc) -> IC ! {add, X}, io:format("added: ~s~n", [X]) end).

Thursday, May 8, 2008

EasyErl goes Mobile Friendly

Hi, I'm experiencing Mobile Tagging.
You can see it at the left side of this page.
Tags are generated dynamically, I'm using of course Erlang, Mochiweb, Ghostscript (rendering backend) and iec16022 to build the semacode.

Since I own a Nokia N95, I can efficiently scan my own codes, but I don't know if it's the case for everyone :)

So leave me a comment if you find those code hard to read...
Thanks

Sunday, March 9, 2008

Directories Recursively and Simple Binary Matching

Hi, It has been a long time :)

So today two simple things first, some high order fun to work on tree:


-module(dir).
-export([create/1]).

create(List) ->
        H = fun(X) ->
                fun(Y) ->
                        Dir = filename:join([X,Y]),
                        io:format("mkdir(~s)~n", [Dir]),
                        Dir
                        end
                end,
        build(fun(X) -> X end, H,  List).

build(_Fun, _Builder, []) ->
        ok;
build(Fun, Builder, [Elem|List]) ->
        NewFun = Builder(Fun(Elem)),
        build(NewFun, Builder, List).

A sample session:


5> dir:create(["ab","cd", "ef","12", "35", "av"]).
mkdir(ab/cd)
mkdir(ab/cd/ef)
mkdir(ab/cd/ef/12)
mkdir(ab/cd/ef/12/35)
mkdir(ab/cd/ef/12/35/av)
ok

What's interesting is the Builder fun that construct the fun that will be called the next time.
That way we know were we are in the tree... The last Fun has all the knowledge of its ancestors :)

Now a simple binary matcher code, I need it while extracting values from regex results. It takes a list of offsets and returns what's inside:


slice(Slices, Bin) ->
       slice(Slices, [], Bin).

slice([], Acc, _Bin) ->
       lists:reverse(Acc);
slice([ {Start, Stop} | Rest ], Acc, Bin) ->
       Len = Stop - Start,
       <<_:Start/binary,Value:Len/binary,_/binary>> = Bin,
       slice(Rest, [ Value | Acc ], Bin).