Thursday, June 26, 2008

Parsing Binaries with erlang, lamers inside

Edit: I don't want to offend anyone with the following; it is just an expression of what I encounter every day with people who work in technology without understanding it. I'm not saying anyone on the mailing list is a lamer. I'm saying that questions like the one mentioned here exist because there's too much ignorance in the technology world. And lastly, don't forget this is a personal rant...

It seems there's a really high expectation that "parsing binaries" with erlang should match the performance of other languages. To me this is complete nonsense.
What does "parsing a binary" even mean? A binary is not text, it's a binary, a sequence of bytes...
A sequence must be defined by its length. Before reading anything, you must know how much space will be needed to store what's coming.
Every crap piece of software you can find has always preferred strcpy to memcpy. Whatever language you use, you MUST know the size of what you're working with. This is not advice, it is mandatory.

From the post above, you can see that the only delimiter seems to be "\r\n". So if someone sends you 4 GB of data that never ends with "\r\n", you'll keep reading it... (and of course blow your memory, because that was never supposed to happen)

While working at a low level with C and flex scanners, I've always asked myself this question: "What's the max size of the element I can accept?" This simple question helps me build software that doesn't break under a simple perl -e 'print "A" x 60000' trick...
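As a rough sketch of that question in practice, here is a bounded line read in bash. The function name read_bounded_line and the MAX_LINE limit are hypothetical, chosen only for illustration; the idea is simply to stop reading once the limit is passed instead of waiting for a "\r\n" that may never come.

```shell
#!/bin/bash
# Hypothetical default limit; pick whatever your protocol allows.
MAX_LINE=8192

# Read one line from stdin, refusing anything longer than $1 characters.
# Prints the line (without the trailing CR) and returns 0,
# or returns 1 when the limit is exceeded or there is no input.
read_bounded_line() {
    local max=${1:-$MAX_LINE} line
    # -n makes read give up after max+1 characters even without a newline
    IFS= read -r -n $((max + 1)) line || [ -n "$line" ] || return 1
    # more than max characters means the sender ignored the limit
    [ "${#line}" -le "$max" ] || return 1
    printf '%s\n' "${line%$'\r'}"
}
```

Feeding it the output of the perl one-liner with a limit of 1024 makes it return 1 after reading 1025 characters, rather than buffering all 60000.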

So is HTTP badly designed, because its delimiter is "\r\n" and headers can span multiple lines? The answer is absolutely YES.
Was it difficult to build something more secure, using elements prefixed with their size? The answer is absolutely NO! (take ajp13 for example...)
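To make the idea of size-prefixed elements concrete, here is a minimal bash sketch. It assumes a made-up text framing of the form "<decimal length>:<payload>", which is NOT the ajp13 wire format (ajp13 uses binary length prefixes); take_field, MAX_FIELD, FIELD and REST are hypothetical names used only for this illustration.

```shell
#!/bin/bash
# Hypothetical upper bound on a single field.
MAX_FIELD=4096

# Take one "<decimal length>:<payload>" field from the front of $1.
# On success sets FIELD to the payload and REST to what follows it.
# Returns 1 if the prefix is malformed, too large, or the payload is short.
take_field() {
    local input=$1 len body
    len=${input%%:*}
    # a colon must be present and the prefix must be 1 to 9 digits
    [[ $input == *:* && $len =~ ^[0-9]+$ && ${#len} -le 9 ]] || return 1
    len=$((10#$len))
    # the size is known before any payload is looked at
    (( len <= MAX_FIELD )) || return 1
    body=${input#*:}
    (( ${#body} >= len )) || return 1
    FIELD=${body:0:len}
    REST=${body:len}
}
```

Calling take_field "5:hello3:abc" leaves "hello" in FIELD and "3:abc" in REST. The receiver can reject an oversized element as soon as it has read the prefix, which is exactly what a delimiter-only format cannot do.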

Now that erlang is becoming more and more popular, lamers are heading in the erlang direction. That's life, but will the erlang mailing list suffer from it? The answer is yes :/

Someone with knowledge should not try to solve someone else's problem for them; they should help by asking the right question. (Do you know the size a priori?)

Why is parsing binaries in java faster than in erlang? Who cares, since parsing binaries is of course stupid!
Parsing real-world protocols with erlang is lightning fast, both to write and to execute. So teach lamers how to build real protocols, and don't try to help them with some trickery.

That's my rant for today :)

Friday, June 20, 2008

Quick Tip, list join

Another "lists:join" :

Join = fun([], _D) -> [];   % nothing to join
          ([X|Rest], D) ->
              % result is an iolist: X followed by [D, E] pairs
              [ X | [ [D,E] || E <- Rest ] ]
       end.

Usage:

io:format("~s~n", [Join(["a", "b", "cde", "h", "k", "lm"], $,)]).
a,b,cde,h,k,lm
ok

Wednesday, June 18, 2008

Ubuntu and Ghostscript

On my development box, I recently upgraded ubuntu. The ghostscript package was upgraded too, and my erlang webservice could no longer draw any mobile tag...

I found that the new ghostscript binary 'gs' takes command line parameters that are incompatible with the previous version...

I used to initialize gs like this:

Cmd = "gs -sDEVICE=pngalpha -q -dNOPLATFONTS -dNOPAUSE -dGraphicsAlphaBits=2 -sOutputFile=- -",

But that no longer works, since the output is now flushed only when the process quits...

The correct command line is then:

Cmd = "gs -sDEVICE=pngalpha -q -dGraphicsAlphaBits=2 -sOutputFile=%stdout -dNOPROMPT",


Everything works fine now, but I had to re-read the very long Ghostscript documentation; the solution comes from this page.

Friday, June 13, 2008

Quick Bash Script for Checking HTTP Headers

Whenever you need to configure web servers efficiently and set headers like ETag, Cache-Control, or Expires, and you have more than one server, you need to check them all, since any user may hit a different server.

This script calls wget on every IP returned by dig and displays the HTTP response headers.
Usage is simple:


./script.sh http://www.example.com/ressource/with-long-cache-and-no-etag/big-image.jpg


Here's the code:

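#!/bin/bash
# Print the HTTP response headers served by every IP behind a hostname.

BIN=${0##*/}
URL=${1?usage: $BIN url}
HOST=${URL#http://}            # strip the scheme
REQ=${HOST#*/}                 # path after the first slash
[ "$REQ" = "$HOST" ] && REQ=   # the URL had no path at all
HOST=${HOST%%/*}               # keep only the hostname

# Fetch the resource from one IP ($1), keeping the original Host header.
function check
{
    wget -S --header="Host: $HOST" "http://$1/$REQ" -O /dev/null
}

dig +short "$HOST" | \
while read -r line
do
    case $line in
        [a-z]*)
            # CNAME line, not an address
            ;;
        [0-9]*.[0-9]*.[0-9]*)
            echo "=== Testing $HOST with IP $line"
            check "$line"
            ;;
        *)
            ;;
    esac
done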
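The host and path extraction at the top of the script relies on bash parameter expansion. The sketch below isolates that step so it can be tried on its own; split_url, URL_HOST and URL_PATH are hypothetical names that do not appear in the script above, and only plain http:// URLs are assumed.

```shell
#!/bin/bash
# Split an http:// URL into host and path using parameter expansion only.
# Sets URL_HOST and URL_PATH (hypothetical names for this sketch).
split_url() {
    local url=$1 rest
    rest=${url#http://}        # drop the shortest leading match of "http://"
    URL_HOST=${rest%%/*}       # drop the longest trailing match of "/*"
    case $rest in
        */*) URL_PATH=${rest#*/} ;;   # everything after the first slash
        *)   URL_PATH= ;;             # no slash: the URL has no path
    esac
}
```

For example, split_url http://www.example.com/a/b.jpg sets URL_HOST to www.example.com and URL_PATH to a/b.jpg. The explicit no-slash case matters: without it, a bare http://www.example.com would yield the hostname again as the path.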
