Tuesday, July 31, 2007

Connecting Erlang to Blogger (Part 2) - Adding an entry

For this second part, we start where we left off last time.
We were able to read the response of a successful login: the data was three lines of key=value pairs.
The last line holds the final 'AuthToken' we need to send to the Blogger Atom post service...

So here's the code to extract the line that begins with the 'Auth' keyword and store the value after the '=' and before the end of line:

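%% Walk the binary one byte at a time until a known key is found.
extract_auth(<<>>) ->
    {error, not_found};
extract_auth(<<"Error=", Rest/binary>>) ->
    %% Drop the last byte (the trailing newline) from the message.
    Size = size(Rest) - 1,
    <<Msg:Size/binary, _/binary>> = Rest,
    {error, binary_to_list(Msg)};
extract_auth(<<"Auth=", Rest/binary>>) ->
    %% Everything after "Auth=" is the token (trailing newline included, if any).
    {ok, Rest};
extract_auth(<<_:1/binary, Rest/binary>>) ->
    extract_auth(Rest).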

Note that we are also able to read 'Error' lines; those lines are sent after a failed login attempt...
Here is what 'extract_auth/1' does:
  1. if the binary is empty, it returns {error, not_found}
  2. if the binary begins with 'Error=', it returns the message in the tuple {error, Msg}
  3. if the binary begins with the 'Auth=' keyword, it returns everything up to the end of the binary
  4. in any other case, it skips one byte and parses the rest of the binary


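To conclude, upon successful login this fun will return:

{ok, <<"AAAAAQAARAFA...">>}

This is, of course, our Blogger AuthToken (as a binary, not a string, and still carrying the trailing newline if the response ends with one)...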
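
The same extraction can also be done line by line instead of byte by byte. This is only a sketch under assumptions: the module name 'auth_sketch' and the fun 'token/1' are made up for this example, and 'binary:split/3' comes from a newer Erlang/OTP release than the one used in this post. Unlike 'extract_auth/1', it drops the trailing newline from the token.

```erlang
-module(auth_sketch).
-export([token/1]).

%% Split the ClientLogin response into lines, then look for a known key.
%% (auth_sketch and token/1 are hypothetical names, not part of the post's code.)
token(Body) when is_binary(Body) ->
    find(binary:split(Body, <<"\n">>, [global])).

find([<<"Auth=", Token/binary>> | _]) -> {ok, Token};
find([<<"Error=", Msg/binary>> | _])  -> {error, binary_to_list(Msg)};
find([_ | Lines])                     -> find(Lines);
find([])                              -> {error, not_found}.
```

Because each line is matched from its first byte, a key is only recognised at the start of a line, which is a little stricter than scanning the whole binary.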

Now how can we use it ? This is Simple !
Open a new file named blogger.erl and write something like this:

-module(blogger).
-export([ post/3, template/2, template/3 ]).

%% Build the Atom entry and send it to Blogger.
post(AuthToken, Title, Content) ->
    Data = template(Title, Content),
    request(AuthToken, iolist_to_binary(Data)).

%% Without an author, fall back to a placeholder name and email.
template(Title, Content) when is_list(Content) ->
    template(Title, Content, {"none", "none"}).

template(Title, Content, Author) when is_list(Content) ->
    template(Title, list_to_binary(Content), Author);
template(Title, Content, Author) ->
    {AuthorName, AuthorEmail} = Author,
    %% The result is an iolist; post/3 flattens it with iolist_to_binary/1.
    %% Atom's <author> element holds <name> and <email> children.
    [ <<"<entry xmlns=\"http://www.w3.org/2005/Atom\">\n<title type=\"text\">\n">>,
      list_to_binary(Title),
      % <<"</title><content type='xhtml'><div xmlns='http://www.w3.org/1999/xhtml'>">>,
      <<"</title>\n<content type=\"text\">">>,
      Content,
      <<"</content>\n<author><name>">>,
      list_to_binary(AuthorName),
      <<"</name><email>">>,
      list_to_binary(AuthorEmail),
      <<"</email>\n</author>\n</entry>\n">> ].

request(AuthToken, Data) when is_binary(AuthToken) ->
    request(binary_to_list(AuthToken), Data);
request(AuthToken, Data) ->
    io:format("Sending: ~nContent-length: ~p~nBody:~n~s~n", [ size(Data), Data ]),
    Authorization = "GoogleLogin auth=" ++ AuthToken,
    Url = "http://www.blogger.com/feeds/199963XXXX081936700/posts/default", % Put your BlogID, this one is invalid
    %% 'sync' is not an HTTP option, so it is left out here; the request is
    %% synchronous by default, which the {ok, Result} match below relies on.
    case http:request(post,
                      { Url,
                        [ { "Authorization", Authorization } ],
                        "application/atom+xml; charset=utf-8", Data},
                      [ {timeout, 3000} ],
                      [ {body_format, binary} ]) of
        {ok, Result} ->
            %io:format("Received: ~p~n", [Result]),
            {_,_,Body} = Result,
            Body;

        {error, Reason} ->
            io:format("Error: ~p~n", [Reason])
    end.


Once you're done, you'll be pleased to find that this doesn't work :/.
Yep, I wasn't able to post anything !
Maybe I've missed something in the documentation, but all I get is a nice SAXParseException...
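
One thing worth ruling out when the server complains about the XML: the template above inserts Title and Content verbatim, so a '<' or '&' in the text would produce a malformed entry. The sketch below escapes those characters first. It is not claimed to be the cause of the exception shown here, and the module name 'xml_sketch' and the fun 'escape/1' are made up for this example.

```erlang
-module(xml_sketch).
-export([escape/1]).

%% Escape the characters that have a special meaning in XML text.
%% (xml_sketch and escape/1 are hypothetical names.)
escape(Text) when is_list(Text) ->
    lists:flatmap(fun esc/1, Text).

esc($<) -> "&lt;";
esc($>) -> "&gt;";
esc($&) -> "&amp;";
esc($") -> "&quot;";
esc(C)  -> [C].
```

It would be used as 'template(xml_sketch:escape(Title), xml_sketch:escape(Content))', assuming both are plain strings.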

Testing my Erlang Blogger API ...

I'm stuck on this error message !

<<"org.xml.sax.SAXParseException: Content is not allowed in prolog.">>


I'm simply using the 'request/2' code shown above, and the resulting Body is always this same error.


Help Meeee !

Monday, July 30, 2007

Connecting Erlang to Blogger (Part 1) - Auth with ClientLogin

With the GData API from Google you can connect your application to some nice services... Calendar, Blogger etc.
Since this is completely REST based you can of course use 'http:request' to connect and exploit those services. Let's begin with the ClientLogin process.

For this article we will focus on the Blogger API; the main purpose is of course to create an Erlang client for Blogger :)

Connecting to Google is as simple as sending something like this:

accountType=HOSTED_OR_GOOGLE&Email=YOURGOOGLEACCOUNT&Passwd=YOURPASSWORD&source=SelfCo-TestApp-1&service=blogger


Now we can do it in Erlang too ! First we need to build the query string, then we need to send it to the ClientLogin service using 'http:request'.


auth(Username, Password, Application) ->
    Sep = <<"&">>,
    %% Note: the values are inserted as-is, without any URL-encoding.
    Post = [
        <<"accountType=HOSTED_OR_GOOGLE&">>,
        <<"Email=">>, list_to_binary(Username), Sep,
        <<"Passwd=">>, list_to_binary(Password), Sep,
        <<"source=">>, list_to_binary(Application), Sep,
        <<"service=blogger">> ],
    request(erlang:iolist_to_binary(Post)).



The fun 'erlang:iolist_to_binary/1' flattens the list of binaries into a single binary. This is not strictly necessary, but it will make debugging easier later...
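
Since the content-type is form-urlencoded, values such as an email address or a password with special characters should normally be percent-encoded before they go into the query string. Here is a small sketch of that idea. The module name 'form_sketch' and the funs 'encode/1' and 'body/1' are made up for this example, values are assumed to be strings of bytes, and 'lists:join/2' needs a recent Erlang/OTP.

```erlang
-module(form_sketch).
-export([encode/1, body/1]).

%% Percent-encode one form value (a string of bytes, 0..255).
%% (form_sketch, encode/1 and body/1 are hypothetical names.)
encode(Str) when is_list(Str) ->
    lists:flatmap(fun enc/1, Str).

enc(C) when C >= $a, C =< $z; C >= $A, C =< $Z; C >= $0, C =< $9 -> [C];
enc(C) when C =:= $-; C =:= $_; C =:= $. -> [C];
enc($\s) -> "+";
enc(C) when C >= 0, C =< 255 -> [$%, hex(C bsr 4), hex(C band 15)].

hex(N) when N < 10 -> $0 + N;
hex(N) -> $A + N - 10.

%% Build "k1=v1&k2=v2" as one binary from a list of {Key, Value} strings.
body(Pairs) ->
    Fields = [[Key, $=, encode(Value)] || {Key, Value} <- Pairs],
    iolist_to_binary(lists:join($&, Fields)).
```

For example, 'form_sketch:body([{"Email", "a@b.c"}, {"service", "blogger"}])' gives the binary 'Email=a%40b.c&service=blogger', built as an iolist and flattened only once at the end.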

Now we can send this query string to the Google ClientLogin process:

request(Data) ->
    case http:request(post,
                      {"https://www.google.com/accounts/ClientLogin", [],
                       "application/x-www-form-urlencoded", Data},
                      [ {timeout, 3000} ],
                      [ {stream, "/tmp/google.test"}, {body_format, binary} ]) of
        {ok, saved_to_file} ->
            io:format("Saved to file~n");

        {ok, Result} ->
            io:format("Received: ~p~n", [Result]);

        {error, Reason} ->
            io:format("Error: ~p~n", [Reason])
    end.


  • This is a POST query
  • The service is https://www.google.com/accounts/ClientLogin
  • The content-type is application/x-www-form-urlencoded
  • We set the timeout to 3 seconds
  • We store the result in '/tmp/google.test' (if successful)


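Let's try this code (the session below calls a two-argument 'auth/2', presumably a small wrapper that fills in the application name for 'auth/3'):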

65> google:auth("test@gmail.com", "secretcode").
Received: {{"HTTP/1.1",403,"Forbidden"},
[{"cache-control","no-cache"},
{"date","Sun, 29 Jul 2007 20:44:20 GMT"},
{"pragma","no-cache"},
{"server","GFE/1.3"},
{"content-length","24"},
{"content-type","text/plain"}],
<<"Error=BadAuthentication\n">>}
ok

The connection fails, so let's try with a valid user account:

70> google:auth("validaccount@gmail.com", "validpassword").
Saved to file
ok

Success !

The content of '/tmp/google.test':

SID=DQAAAG8AAACuATb7YJxMdqQhp0LIf546SWLfDNfTlANffRc0B6OGbTat4Ebdj89s6hVEzfNZRL...
LSID=DQAAAHEAAAAG1iqBgOrgzrY5cdgpBv9y42HxkvjNuUaYKImw6yH7xh0GtL5EG19C9GkGdPEb1...
Auth=DQAAAHAAAAAG1iqBgOrgzrY5cdgpBv9y42HxkvjNuUaYKImw6yH7xh0GtL5EG19C9GkGdPEb1...


The final token we need is the 'Auth=' one; this string will be passed with every new query in an 'Authorization' header:

Authorization: GoogleLogin auth=DQAAAHAAAA...


Next time, in Part 2, I'll show you how to use this AuthToken to post a message to our blog !

Tuesday, July 24, 2007

Erlang and JBOSS, talking AJP13 ! (PART I)


-module(ajp13).
-export([get/3, cping/2, request/1, hexdump/1]).


Every ajp packet sent to the container starts with 0x1234. In Erlang, when you need to express such a value you just use the notation 'Base#Number'.
So for our example, here's the 'ajp_header' fun:

ajp_header() ->
    <<16#12, 16#34>>.


We use the binary notation to write 2 bytes expressed in base 16 (hexadecimal). To be crystal clear, hex numbers are written with the '16#' prefix:

Eshell V5.5.1 (abort with ^G)
1> 16#deadbeef.
3735928559

I'm sure you get the point !
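
To make the link between 'Base#Number' and the binary syntax explicit, here is a tiny sketch. The module name 'hex_sketch' and its funs are made up for this example; it only shows that the header can be written either as two bytes or as one 16-bit integer.

```erlang
-module(hex_sketch).
-export([header/0, same_header/0, bytes/0]).

%% Two bytes written one by one, in base 16.
header() -> <<16#12, 16#34>>.

%% The same two bytes written as a single 16-bit big-endian integer.
same_header() -> <<16#1234:16>>.

%% The same two bytes again, this time in base 10.
bytes() -> <<18, 52>>.
```

All three funs return the same binary, since 16#12 is 18 and 16#34 is 52.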

Let's come back to our AJP problem... Now that we can write hexadecimal numbers we can reread the ajp13 protocol description,
and successfully start to build a simple packet:

get(Host, Port, Url) ->
    H = ajp_header(),
    Request = request(Url),
    Length = size(Request),
    Data = <<H/binary, Length:16, Request/binary>>,

    case gen_tcp:connect(Host, Port, [binary, {packet, 0}]) of
        {ok, Socket} ->
            gen_tcp:send(Socket, Data),
            loop(Socket);

        {error, Msg} ->
            io:format("error: ~p~n", [Msg])
    end.

Let's look at a simple command in the ajp13 protocol, the 'ping'. Here's its implementation:

%% 10 is the prefix code of the CPing packet.
cping() ->
    <<10:8>>.

cping(Host, Port) ->
    H = ajp_header(),
    Request = cping(),
    Length = size(Request),
    Data = <<H/binary, Length:16, Request/binary>>,

    case gen_tcp:connect(Host, Port, [binary, {packet, 0}]) of
        {ok, Socket} ->
            gen_tcp:send(Socket, Data),
            loop(Socket);

        {error, Msg} ->
            io:format("error: ~p~n", [Msg])
    end.

It's also important to see that ajp13 is loosely derived from the xdr protocol, where every variable-size value is written with its length... In ajp13 this length is always encoded as two bytes (16 bits, so the max size is 16#ffff ;)

The 'Data' variable is what you should look at :

Data = <<H/binary, Length:16, Request/binary>>,

  • H is the ajp header
  • Length is the length of Request written in 2 bytes (2 * 8 bits)
  • Request is the request
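
The framing described above can be sketched as a pair of funs, one building the packet and one taking it apart again with the same bit syntax. The module name 'frame_sketch' and the funs 'frame/1' and 'unframe/1' are made up for this example.

```erlang
-module(frame_sketch).
-export([frame/1, unframe/1]).

%% Header (0x1234), then the payload length on 16 bits, then the payload.
%% (frame_sketch, frame/1 and unframe/1 are hypothetical names.)
frame(Payload) when is_binary(Payload), byte_size(Payload) =< 16#ffff ->
    <<16#12, 16#34, (byte_size(Payload)):16, Payload/binary>>.

%% The length read from the packet tells the match how many bytes to take.
unframe(<<16#12, 16#34, Length:16, Payload:Length/binary, Rest/binary>>) ->
    {ok, Payload, Rest};
unframe(_Other) ->
    {error, incomplete_or_invalid}.
```

With the one-byte ping request, 'frame_sketch:frame(<<10>>)' gives '<<18,52,0,1,10>>': two header bytes, two length bytes, one payload byte.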

Now that we've sent the packet, we need to catch the response, so here's the 'loop' fun:

loop(Socket) ->
    receive
        {tcp, Socket, Data} ->
            % io:format("~p~n", [Data]),

            case ajp_response(Data, Socket) of
                {ok, continue} ->
                    loop(Socket);

                {ok, body, Bin} ->
                    io:format("Body: read ~p bytes~n", [size(Bin)]),
                    loop(Socket);

                {ok, closed} ->
                    gen_tcp:close(Socket)
            end;

        {tcp_error, Socket, Error} ->
            io:format("Error: ~p~n", [Error]),
            loop(Socket);

        {tcp_closed, Socket} ->
            io:format("Closed~n")

    after 8000 ->
        io:format("Timeout~n"),
        gen_tcp:close(Socket)

    end.

Whenever our erlang process receives a message matching the '{tcp, Socket, Data}' tuple, we parse the 'Data' with the 'ajp_response' fun:

ajp_response(<<65,66,0,2,5,1>>, _Socket) ->
    {ok, closed};
ajp_response(<<65,66,Rest/binary>>, Socket) ->
    %% ajp_data_length/2 is not shown in this part.
    ajp_data_length(Rest, Socket);
ajp_response(Bin, _Socket) ->
    {ok, body, Bin}.

Yeah ! Polymorphism ! Or matching power ?! Whatever, this completely rox the programming planet !
We are simply matching binary data... Binary data that's sent back to us from the jboss server (in our case).

The ajp13 protocol marks the end of the response with a packet containing 'AB' (65,66), followed by the payload length (2, encoded as '0,2') and the two payload bytes: '5' (END_RESPONSE) and '1' (the reuse flag).

ajp_response(<<65,66,0,2,5,1>>, _Socket) ->
    {ok, closed};

Remember Length is encoded on two bytes: '0,2'...
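
A slightly more general way to match a reply is to let the length field drive the match and then look at the first payload byte. This is just a sketch: the module name 'reply_sketch' and the fun 'parse/1' are made up for this example, and only a few packet types are decoded (3 is a body chunk, 5 is END_RESPONSE, 9 is the CPong reply).

```erlang
-module(reply_sketch).
-export([parse/1]).

%% 'AB', then the payload length on 16 bits, then the payload itself.
%% (reply_sketch and parse/1 are hypothetical names.)
parse(<<"AB", Length:16, Payload:Length/binary, Rest/binary>>) ->
    {ok, decode(Payload), Rest};
parse(_Other) ->
    {error, incomplete_or_invalid}.

%% The first payload byte is the packet type.
decode(<<3, ChunkLen:16, Chunk:ChunkLen/binary, _/binary>>) ->
    {body_chunk, Chunk};
decode(<<5, Reuse>>) ->
    {end_response, Reuse =:= 1};
decode(<<9>>) ->
    cpong;
decode(<<Code, _/binary>>) ->
    {other, Code};
decode(<<>>) ->
    empty.
```

Here 'reply_sketch:parse(<<65,66,0,2,5,1>>)' returns '{ok, {end_response, true}, <<>>}'. The third element is whatever follows the packet, which matters because one '{tcp, Socket, Data}' message may carry more than one packet, or only part of one.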

Now comes the AJP13_FORWARD_REQUEST !!!

request(Request) ->
    {Protocol, L0} = ajp_string("HTTP/1.1"),
    {Request_uri, L1} = ajp_string(Request),
    {Remote_addr, L2} = ajp_string("127.0.0.1"),
    {Remote_host, L3} = ajp_string("ajbchecker"),
    {Server_name, L4} = ajp_string("www.server-example.com"),

    <<
      2:8,                          %byte    JK_AJP13_FORWARD_REQUEST
      2:8,                          %byte    method: GET
      L0:16, Protocol/binary,       %string  protocol
      L1:16, Request_uri/binary,    %string  request uri
      L2:16, Remote_addr/binary,    %string  remote address
      L3:16, Remote_host/binary,    %string  remote host
      L4:16, Server_name/binary,    %string  server name
      80:16,                        %integer server port
      0:8,                          %boolean is_ssl
      1:16,                         %integer number of headers
      16#A0, 16#0B,                 %Header: Host (coded name)
      L4:16, Server_name/binary,    %Servername (header value)
      16#ff                         %terminator
    >>.

%% Returns the string with its terminating 0 byte, and the length
%% of the string without that terminator.
ajp_string(String) ->
    S = list_to_binary(String),
    Bin = <<S/binary, 0>>,
    {Bin, size(Bin) - 1}.



The 'ajp_string/1' fun implements the ajp13 string encoding: it appends the terminating 0 byte and returns the binary together with the length of the string (terminator excluded), which is what goes into the two length bytes...
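
The string format can also be sketched as one fun that returns the complete encoded field, plus its inverse. The module name 'ajp_string_sketch' and the funs 'encode/1' and 'decode/1' are made up for this example; they are not part of the 'ajp13' module above.

```erlang
-module(ajp_string_sketch).
-export([encode/1, decode/1]).

%% Length on 16 bits (terminator not counted), the bytes, then a 0 byte.
%% (ajp_string_sketch, encode/1 and decode/1 are hypothetical names.)
encode(String) when is_list(String) ->
    Bin = list_to_binary(String),
    <<(byte_size(Bin)):16, Bin/binary, 0>>.

%% Read one encoded string and return it with the remaining bytes.
decode(<<Length:16, Bin:Length/binary, 0, Rest/binary>>) ->
    {binary_to_list(Bin), Rest}.
```

For instance 'ajp_string_sketch:encode("abc")' gives '<<0,3,97,98,99,0>>', and 'decode/1' turns it back into '{"abc", <<>>}'.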

BTW, it's really time consuming to explain code ! When I started this article I thought I'd finish it rather quickly, and I realise now that's not the case; there are so many things to say...
This is why I stop here for the first part; the next part will come tomorrow...

Thursday, July 19, 2007

Parallelizing simple external commands ... Part II

Our loop/3 fun looks like this:

loop(_Max, 0, []) ->
    unregister(computing_master),
    exit(normal);

Whenever our list of jobs is empty, we deregister the 'computing_master' process and quit normally.

loop(Max, Current, []) ->
    receive
        stop ->
            unregister(computing_master),
            exit(normal);

        {exited, _Result} ->
            io:format("Still ~p children~n", [Current]),
            loop(Max, Current - 1, []);

        E ->
            %% Keep looping, otherwise an unknown message would end the master.
            io:format("Unhandled message: ~p~n", [E]),
            loop(Max, Current, [])

    after 60000 ->
        io:format("~p: Waiting for the last process ~p/~p~n", [erlang:now(), Max, Current]),
        loop(Max, Current, [])
    end;

In this case, we are waiting for the last external processes to finish, since our job list is empty.
And finally this version of loop/3 is the main one:

loop(Max, Current, List) ->
    receive
        stop ->
            unregister(computing_master),
            exit(normal);

        {update, NewMax} ->
            upto(NewMax, Current, List);

        {exited, _Result} ->
            io:format("Still ~p children~n", [Current]),
            upto(Max, Current - 1, List);

        E ->
            %% Keep looping, otherwise an unknown message would end the master.
            io:format("Unhandled message: ~p~n", [E]),
            loop(Max, Current, List)

    after 60000 ->
        io:format("~p: Running ~p processes~n", [erlang:now(), Current]),
        upto(Max, Current, List)
    end.


Here we have a non-empty list of jobs and a maximum number of jobs to run at once.
  • Every 60 seconds we write how many processes are running.
  • The message {update, NewMax} lets you alter the maximum number of concurrent tasks
  • The message {exited, _Result} is received whenever a child process dies, so we start another job...
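
The idea behind loop/3 and upto/3 (start jobs until the limit is reached, then wait for an {exited, Result} message before starting the next one) can be condensed into a small self-contained sketch. The module name 'pool_sketch' and the fun 'run/2' are made up for this example; jobs are plain funs instead of external commands, and there is no 'stop' or 'update' message.

```erlang
-module(pool_sketch).
-export([run/2]).

%% Run every fun in Jobs, never more than Max at once.
%% Returns the results in completion order.
%% (pool_sketch and run/2 are hypothetical names.)
run(Max, Jobs) when is_integer(Max), Max > 0 ->
    fill(Max, 0, Jobs, []).

%% Room left and jobs left: start one more child.
fill(Max, Running, [Job | Rest], Acc) when Running < Max ->
    Parent = self(),
    spawn(fun() -> Parent ! {exited, Job()} end),
    fill(Max, Running + 1, Rest, Acc);
%% Nothing running and nothing left to start: we are done.
fill(_Max, 0, [], Acc) ->
    lists:reverse(Acc);
%% Otherwise wait for a child to finish, which frees one slot.
fill(Max, Running, Jobs, Acc) ->
    receive
        {exited, Result} ->
            fill(Max, Running - 1, Jobs, [Result | Acc])
    end.
```

A call such as 'pool_sketch:run(2, [fun() -> N * N end || N <- [1, 2, 3, 4]])' returns the four squares in whatever order the children finish. Note that a job that crashes never sends its message, so this sketch would wait forever; a real version would monitor its children.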


Bonus Code, a simple function to test the code:

sleep(Ident) ->
    io:format("Waiting ~p~n", [Ident]),
    Delay = [ "5", "3", "15", "8" ],
    Time = lists:nth(random:uniform(4), Delay),
    Cmd = [ "sleep ", Time ],
    io:format("Starting: ~p~n", [Cmd]),
    Status = os:cmd(Cmd),
    computing_master ! {exited, Status}.

This code just calls the 'sleep' command with an argument picked randomly... Once the command finishes, the 'os:cmd/1' fun returns and 'computing_master' receives the {exited, Status} message (explained above).

Tuesday, July 17, 2007

Parallelizing simple external commands ... Part I

Once upon a time I needed to parse enormous files to find simple patterns... My preferred tools so far were the shell based ones, i.e. 'grep'.

But now I have a Magical Ability, Erlang Magic... So I decided to split this enormous file using the 'split' command, 'split -l 10000' for example.

Now that I have a lot of smaller files, I can parallelize their parsing, and this is where erlang comes in...

First, let's design a bit:
  • I need a central process that will control all my processes
  • Processes and master must be able to communicate
That's all. Happily, the latter is directly provided by erlang: this is the ! operator.
The master process will be a little more tricky, but this is 'easyerl' remember, so here we go:

doit(Step) ->
    Master = spawn(?MODULE, test, [Step]),
    register(computing_master, Master).

test(Step) ->
    file:set_cwd("/home/rolphin/Work"),
    List = filelib:wildcard("seg-a*"),
    upto(Step, 0, List).


We create a process running the test function, whose job is to start the upto/3 fun...
What's interesting here is the 'filelib:wildcard/1' function, which gives me the list of files matching 'seg-a*' in the directory '/home/rolphin/Work'.

Now let's describe the 'upto/3' fun:

upto(Max, Current, []) ->
    loop(Max, Current, []);

upto(Max, Current, List) when Current >= Max ->
    %% Enough children are running (this also holds right after Max was lowered).
    loop(Max, Current, List);

upto(Max, Current, [New|List]) ->
    io:format("upto: ~p/~p~n", [Max, Current]),
    spawn(?MODULE, grep, ["user.list", New, ["result-", New]]),
    upto(Max, Current + 1, List).


More details:
  • upto with an empty list will just call the loop/3 fun
  • upto with the current number of processes equal to the Max allowed will call loop/3
  • upto with fewer active processes than the Max, and a non-empty list, will spawn a child process
The child process is a 'grep' command, and here it is:

grep(File, Source, Result) ->
    % Command line is: "grep -f motif_file sourcefile > result"
    Cmd = [ "grep -f ", File, $\s, Source, $>, Result ],
    io:format("Starting: ~p~n", [Cmd]),
    Status = os:cmd(Cmd),
    computing_master ! {exited, Status}.
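
The 'Cmd' above is a nested list of strings and characters, which is why it prints oddly with '~p'. To see the command line that actually gets run, it can be flattened first. A tiny sketch, where the module name 'cmd_sketch' and the fun 'grep_cmd/3' are made up for this example:

```erlang
-module(cmd_sketch).
-export([grep_cmd/3]).

%% Build the same nested list as grep/3, then flatten it into a plain string.
%% (cmd_sketch and grep_cmd/3 are hypothetical names.)
grep_cmd(File, Source, Result) ->
    lists:flatten(["grep -f ", File, $\s, Source, $>, Result]).
```

With the arguments used by upto/3, 'cmd_sketch:grep_cmd("user.list", "seg-aa", ["result-", "seg-aa"])' gives "grep -f user.list seg-aa>result-seg-aa".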

Okay that's it for today ! It's a little late now ! And I need some sleep to successfully pass the required skill tests for my new job !

More of this tomorrow...

Tuesday, July 3, 2007

Simple command execution

Sometimes you need to run external commands, and just need the return value or exit code...
One simple way to do this is the following:

-module(tport).
-export([ execute/2 ]).

execute(Host, Cmd) ->
    %% Despite its name, Host is used as the working directory of the command.
    Port = open_port({spawn, Cmd}, [ {cd, Host}, exit_status, binary ] ),
    wait(Port).

wait(Port) ->
    receive
        {Port, {data, BinData}} ->
            io:format("dump:~n~p~n", [BinData]),
            wait(Port);
        {Port, {exit_status, Status}} ->
            io:format("exit_code: ~p~n", [Status]);
        %% {Port, eof} ->
        %%     port_close(Port);
        {Port, exit} ->
            io:format("Received : ~p~n", [Port])
    end.


Once a port is opened, your process will receive various messages; the one we're interested in is the 'exit_status' one:

        {Port, {exit_status, Status}} ->
            io:format("exit_code: ~p~n", [Status]);

The variable 'Status' will hold the exit code.

Simple isn't it ?
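
If you'd rather get the exit code back as a value instead of printing it, the same receive loop can collect the output and return both. This is a sketch assuming a Unix-like system; the module name 'port_sketch' and the fun 'run/1' are made up for this example.

```erlang
-module(port_sketch).
-export([run/1]).

%% Run Cmd through a port and return {ExitStatus, Output}.
%% (port_sketch and run/1 are hypothetical names.)
run(Cmd) ->
    Port = open_port({spawn, Cmd}, [exit_status, binary, stderr_to_stdout]),
    collect(Port, []).

%% Accumulate the data chunks until the exit status arrives.
collect(Port, Acc) ->
    receive
        {Port, {data, Bin}} ->
            collect(Port, [Bin | Acc]);
        {Port, {exit_status, Status}} ->
            {Status, iolist_to_binary(lists:reverse(Acc))}
    end.
```

For example 'port_sketch:run("echo hi")' should return the status 0 together with the command's output as a binary.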
