Tuesday, September 22, 2009

erlang: Parsing binary data dynamically

Here's a quick tip for parsing binary data whose format is unknown at compile time...

Let's say that you have a binary and that you only receive its structure later. Take for
example the code below:

-module(binm).
-export([test/0, test/2]).

test() ->
        test(<<4,0,0,0,5,0,0,0,7,0,8,0,33,1>>, [ 4, 4, 2, 2]).

% Slice Bin into chunks whose sizes (in bytes) are given by List.
% Returns {Chunks, UnparsedRest}.
test(Bin, List) ->
        {Final, End} = lists:foldl( fun(Len, {Res, Rest}) ->
                case Rest of 
                        <<M:Len/binary, NewRest/binary>> ->
                                {[ M | Res ], NewRest};
                        % fewer than Len bytes left: skip one byte
                        <<_:1/binary, NewRest/binary>> ->
                                { Res, NewRest}
                end
        end, {[], Bin}, List),
        {lists:reverse(Final), End}.

More precisely, we want to slice the binary into 4 parts described by '[4, 4, 2, 2]', where each element is the size of one part in bytes:
test() ->
        test(<<4,0,0,0,5,0,0,0,7,0,8,0,33,1>>, [ 4, 4, 2, 2]).

Let's compile and run:
2> c(binm).    
3> binm:test().
Isn't this nice ? :p
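
To make the result concrete: for the 14-byte binary above, the slicing is meant to give four chunks plus the two bytes that were not described. The sketch below is not part of the original module. It rewrites the same idea with explicit recursion (it stops when too few bytes remain, instead of skipping a byte) and then decodes each chunk with binary:decode_unsigned/2, assuming every field is a little-endian unsigned integer. The module name binm_demo and both function names are made up for the illustration.

```erlang
-module(binm_demo).
-export([slice/2, decode/2]).

%% Cut Bin into chunks whose byte sizes are listed in Sizes.
%% Returns {Chunks, Rest}; stops early if fewer bytes remain
%% than the next requested size.
slice(Bin, Sizes) ->
    slice(Bin, Sizes, []).

slice(Bin, [Len | Sizes], Acc) when byte_size(Bin) >= Len ->
    <<Chunk:Len/binary, Rest/binary>> = Bin,
    slice(Rest, Sizes, [Chunk | Acc]);
slice(Rest, _Sizes, Acc) ->
    {lists:reverse(Acc), Rest}.

%% Assumption: each chunk holds a little-endian unsigned integer.
decode(Bin, Sizes) ->
    {Chunks, Rest} = slice(Bin, Sizes),
    {[binary:decode_unsigned(C, little) || C <- Chunks], Rest}.
```

With the test data of this post, binm_demo:decode(<<4,0,0,0,5,0,0,0,7,0,8,0,33,1>>, [4,4,2,2]) gives {[4,5,7,8], <<33,1>>}.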

Tuesday, August 18, 2009

erlang: testing many conditions easily with lists of funs...

Sometimes you have to test many things before being able to choose the next action...

In many languages, you'll end up using a bunch of "if then else".
But in Erlang, with the power of funs, you can write a simple function that does the whole job for you :p

Here is our purpose: call many functions with one argument.
For this example, we need to determine a file type from its filename.

Let's say that:
- the filename could be a valid 'word' temporary file,
- or an 'excel' temporary file or
- a known file type.

First, let's define a simple function that takes a list of funs and stops evaluating them as soon as a result is found:
% the simple case where the list is empty
any(_, []) -> undefined;

% the general case when the list contains funs...
any(Arg, [ {F, PrepareFun} | Funs] ) ->
        case F( PrepareFun(Arg) ) of
                undefined ->
                        any(Arg, Funs);

                V ->
                        % the first result that is not 'undefined' wins
                        V
        end.

In this code you'll notice that there are two fun()s:
- the 'F',
- the 'PrepareFun'.
The idea is that 'PrepareFun' will be called before calling 'F' to filter the argument 'Arg'.
Imagine that sometimes you need to extract the basename from the filename, or whatever else...

The code is a simple list iteration that recurses only if the result of the function call is 'undefined'.
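
Here is a self-contained sketch of that iteration that can be compiled and tried on its own. The checks in classify/1 are hypothetical and only exist to exercise any/2. The second one shows a 'PrepareFun' (erlang:abs/1) transforming the argument before the check sees it.

```erlang
-module(any_demo).
-export([any/2, classify/1]).

%% Try each {F, PrepareFun} pair in order; return the first result
%% that is not 'undefined', or 'undefined' when the list is exhausted.
any(_Arg, []) ->
    undefined;
any(Arg, [{F, PrepareFun} | Funs]) ->
    case F(PrepareFun(Arg)) of
        undefined -> any(Arg, Funs);
        Result    -> Result
    end.

%% Hypothetical checks, evaluated in list order.
classify(N) when is_integer(N) ->
    Identity = fun(X) -> X end,
    any(N, [
        {fun(0) -> zero; (_) -> undefined end, Identity},
        %% PrepareFun takes the absolute value before the check runs
        {fun(X) when X > 100 -> big; (_) -> undefined end, fun erlang:abs/1},
        {fun(X) when X < 0 -> negative; (_) -> undefined end, Identity}
    ]).
```

any_demo:classify(-500) returns big because the second pair matches first; the third check is never called for that value.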

Now that we have a valid fun that can iterate over a list of funs and stop whenever a valid result is found (or end of list), let's get back to our example, and build our 'filetype' function:
fileType(File) ->
        any( File, [ 
                        {fun word_temp/1, fun filename:basename/1}, 
                        {fun db_extension/1, fun lists:reverse/1},
                        {fun excel_temp/1, fun filename:basename/1}
                   ] ).

You can read this code like this:
"any of the funs from the list may determine the type of the file".
As soon as one of them does, the evaluation stops.

Let's describe those called functions 'db_extension/1', 'word_temp/1', 'excel_temp/1'...

First 'db_extension':
You'll notice that we test only the end of the filename, that's why the filename is
reversed before being passed to the function:
db_extension( "pmt."  ++ _ ) -> temp;
db_extension( "PMT."  ++ _ ) -> temp;
db_extension( "xcod." ++ _ ) -> doc;
db_extension( "cod."  ++ _ ) -> doc;
db_extension( "xslx." ++ _ ) -> xls;
db_extension( "slx."  ++ _ ) -> xls;
db_extension( "xtpp." ++ _ ) -> ppt;
db_extension( "tpp." ++ _ ) -> ppt;
db_extension( _ ) -> undefined.
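
The reversal is needed because a pattern like "abc" ++ _ can only match the beginning of a list, so testing a suffix means turning it into a prefix first. A minimal sketch follows; the module name suffix_demo is made up and it reuses only two of the clauses above. For comparison, it also shows the same test written with lists:suffix/2, which avoids the reversal at the cost of one call per extension.

```erlang
-module(suffix_demo).
-export([by_reverse/1, by_suffix/1]).

%% Reverse the name so the extension becomes a prefix that the
%% "..." ++ _ pattern can match.
by_reverse(File) ->
    ext(lists:reverse(File)).

ext("xcod." ++ _) -> doc;   % ".docx" reversed
ext("cod."  ++ _) -> doc;   % ".doc" reversed
ext(_)            -> undefined.

%% Alternative without reversing: lists:suffix/2 checks the end of a list.
by_suffix(File) ->
    case lists:suffix(".docx", File) orelse lists:suffix(".doc", File) of
        true  -> doc;
        false -> undefined
    end.
```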

'word_temp/1' only needs the basename of the file, not the full path, so 'PrepareFun' is simply 'filename:basename/1' in this case:
word_temp( "~$"   ++ _) -> temp;
word_temp( "~WRD" ++ _) -> temp;
word_temp( "~WRL" ++ _) -> temp;
word_temp( _ ) -> undefined.

For 'excel_temp/1', the temp file is recognised by its name, a number written as 8 hexadecimal digits. We use the re module to easily match this against the filename. In this case the 'PrepareFun' is also 'filename:basename/1':
excel_temp( File ) ->
        ReList = [ <<"^[0-9A-Z]{8}$">> ],
        do_re(File, ReList).
% We are able to test many re but in the specific 
% case the list contains only one element...
do_re(_, []) -> undefined;
do_re(Subject, [ Re | Rest ]) ->
        case re:run(Subject, Re, [{capture,none}]) of
                nomatch ->
                        do_re(Subject, Rest);

                match ->
                        % assumed: a matching name is reported as a temp file
                        temp
        end.

From the re module, the option {capture,none} is used so that re:run only returns whether the re matches, and not the part that successfully matched...
(this is a simple optimisation, since we don't care about the matching part)
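
A small sketch of the difference, using the same expression as above (the module name re_demo and the function names are made up for the illustration):

```erlang
-module(re_demo).
-export([is_eight_chars/1, first_match/1]).

%% With {capture, none}, re:run/3 returns just the atom match or nomatch.
is_eight_chars(Name) ->
    re:run(Name, <<"^[0-9A-Z]{8}$">>, [{capture, none}]).

%% Without the option, a successful match also carries
%% {Start, Length} pairs describing what was matched.
first_match(Name) ->
    re:run(Name, <<"^[0-9A-Z]{8}$">>).
```

re_demo:is_eight_chars("0A1B2C3D") returns match, while re_demo:first_match("0A1B2C3D") returns {match,[{0,8}]}.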

Looking back at what we've done here, we can see that
fileType(File) ->
        any( File, [ 
                        {fun word_temp/1, fun filename:basename/1}, 
                        {fun db_extension/1, fun lists:reverse/1},
                        {fun excel_temp/1, fun filename:basename/1}
                   ] ).

can really easily be extended with other functions, as long as those new functions take only one parameter...
fileType(File) ->
        any( File, [ 
                        {fun word_temp/1, fun filename:basename/1}, 
                        {fun db_extension/1, fun lists:reverse/1},
                        {fun excel_temp/1, fun filename:basename/1},
                        {fun firefox_temp/1, fun filename:basename/1},
                        {fun directory_temp/1, fun(X) -> X end}
                   ] ).

Building a list of functions is an efficient way of "testing many conditions".

erlang: how to make a windows service

Tired of fighting with the command line to make erlsrv work ?
I have a solution for you !
The problem is always the quotes: you have quotes for erlang and quotes for the windows command line...
Here's what I use to test my service:

(Pack everything in a simple "install.bat")

erlsrv remove "YourService"
erlsrv add "YourService" -stopaction "init:stop()." -sname Service -debugtype reuse -args "-kernel error_logger {file,\\""C:/Test/kernel.txt\\""} -setcookie YourCookie -s YourInit"

YourInit is the name of the module you want to start. Its "start/0" function will be called by "erl".
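
For reference, here is a minimal sketch of what such a start module could look like. The name my_service is hypothetical (you would pass "-s my_service" instead of "-s YourInit"), and the loop is only a placeholder for the real work. Since "-s" runs start/0 in the init process, the function should return quickly and leave the work to a spawned process.

```erlang
-module(my_service).
-export([start/0]).

%% Entry point for "erl -s my_service".
%% Spawns the worker and returns so that the boot sequence can finish.
start() ->
    Pid = spawn(fun loop/0),
    register(my_service, Pid),
    ok.

%% Placeholder worker: waits for messages until it is told to stop.
loop() ->
    receive
        stop   -> ok;
        _Other -> loop()
    end.
```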

This install.bat is meant to be the debug version of your service, because the debug log file will grow indefinitely.

See the documentation for more information:

DebugType: Can be one of none (default), new, reuse or console. Specifies that output from the Erlang shell should be sent to a "debug log". The log file is named "servicename".debug or "servicename".debug."n", where "n" is an integer between 1 and 99. The log-file is placed in the working directory of the service (as specified in WorkDir). The reuse option always reuses the same log file ("servicename".debug) and the new option uses a separate log file for every invocation of the service ("servicename".debug."n"). The console option opens an interactive Windows® console window for the Erlang shell of the service.
The console option automatically disables the StopAction and a service started with an interactive console window will not survive logouts, OnFail actions do not work with debug-consoles either. If no DebugType is specified (none), the output of the Erlang shell is discarded.
The console DebugType is not in any way intended for production. It is only a convenient way to debug Erlang services during development. The new and reuse options might seem convenient to have in a production system, but one has to take into account that the logs will grow indefinitely during the systems lifetime and there is no way, short of restarting the service, to truncate those logs. In short, the DebugType is intended for debugging only. Logs during production are better produced with the standard Erlang logging facilities.

If you don't define the "WorkDir" (-w option) your debug file will be located in the "WINDOWS\system32" directory.

Finally, the service will be described in the registry in


Monday, August 17, 2009

erlang: Extracting values from binary streams with macros

Since I write a lot of binary matching patterns, I now use simple macros to keep the erlang code in sync with code written in other languages...
Let me explain a bit: there were many lines that looked like this:

parse(<< Id:32/little-unsigned, Oid:32/little-unsigned, Soid:16/little-unsigned >>, State) ->

Now I really prefer to see lines looking like this:

parse(<< ?UINT32( Id ),
?UINT32( Oid ),
?UINT16( Soid ) >>, State) ->

The magic trick was to define macros at the beginning of the erl module like this:

-define( UINT32(X), X:32/little-unsigned).
-define( UINT16(X), X:16/little-unsigned).

Now everyone can read those parse lines easily...
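
Put together, a complete module could look like the sketch below. The module name macro_demo, the Rest/binary tail, the return tuples and the build/3 function are additions for the illustration; only the two macros come from the snippets above.

```erlang
-module(macro_demo).
-export([parse/1, build/3]).

-define(UINT32(X), X:32/little-unsigned).
-define(UINT16(X), X:16/little-unsigned).

%% The macros expand inside the binary pattern, so this clause is
%% the same as matching Id:32/little-unsigned, and so on.
parse(<< ?UINT32(Id), ?UINT32(Oid), ?UINT16(Soid), Rest/binary >>) ->
    {ok, {Id, Oid, Soid}, Rest};
parse(_Other) ->
    {error, badformat}.

%% The same macros also work when constructing a binary.
build(Id, Oid, Soid) ->
    << ?UINT32(Id), ?UINT32(Oid), ?UINT16(Soid) >>.
```

macro_demo:build(4, 5, 7) gives <<4,0,0,0,5,0,0,0,7,0>>, and feeding that binary back to macro_demo:parse/1 returns {ok,{4,5,7},<<>>}.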

erlang: Unicode support for your filenames...

From R13B you have full unicode support for strings.

I'm involved in some kind of interface between the windows kernel and an erlang vm, and I find this "unicode" module really useful.

For your information, internally every file path or file name is encoded as a little-endian utf16 string in the windows kernel.
Exchanging information between those two worlds means that you'll have to convert utf16 into ansi strings.

For example you can create a utf16 binary string using this:

unicode:characters_to_binary("your string", latin1, {utf16,little}).

This means that your string is encoded as "latin1" and you want a little-endian utf16 binary in return.
Really easy !
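
A minimal round-trip sketch of the two conversions (the module name utf16_demo and the function names are made up for the illustration):

```erlang
-module(utf16_demo).
-export([to_utf16/1, from_utf16/1]).

%% latin1 string -> little-endian utf16 binary
to_utf16(String) ->
    unicode:characters_to_binary(String, latin1, {utf16, little}).

%% little-endian utf16 binary -> list of characters
from_utf16(Bin) ->
    unicode:characters_to_list(Bin, {utf16, little}).
```

utf16_demo:to_utf16("AB") gives <<65,0,66,0>>: each latin1 character becomes two bytes, low byte first.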

Here's some free code that lets you easily manipulate file paths and filenames...
I hope this will help someone :p


-export([extension/1, basename/1, dirname/1]).
-export([normalize/1, utf16toansi/1, test/1]).

extension(Bin) ->
        filename:extension( utf16toansi(Bin) ).

basename(Bin) ->
        filename:basename( utf16toansi(Bin) ).

dirname(Bin) ->
        filename:nativename( filename:dirname( utf16toansi(Bin) ) ).

% Mode is the name of one of the exported functions, e.g. test(basename).
test(Mode) ->
        Word = "C:\\Program Files\\WINWORD.EXE",
        File = unicode:characters_to_binary(Word, latin1, {utf16,little}),
        ?MODULE:Mode( File ).

% Convert a little-endian utf16 binary into a list of characters.
utf16toansi(Bin) ->
        unicode:characters_to_list(Bin, {utf16,little}).

normalize(File) when is_list(File) ->
        Path = filename:dirname(File),
        Base = filename:basename(File),
        Ext = filename:extension(File),
        {Base, Path, Ext};

normalize(Bin) when is_binary(Bin) ->
        Path = dirname(Bin),
        Base = basename(Bin),
        Ext = extension(Bin),
        {Base, Path, Ext}.