Thursday, June 26, 2008

Parsing Binaries with Erlang, lamers inside

Edit: I don't want to offend anyone with the following. This is just an expression of what I encounter every day with people who work in technology without understanding it. I'm not saying anyone on the mailing list is a lamer; I'm saying that questions like the one mentioned here exist because there's too much ignorance in the technology world. And lastly, don't forget this is a personal rant...

It seems there's a really high expectation that "parsing binaries" with Erlang should match the performance of other languages. For me this is complete nonsense.
What does "parsing a binary" even mean? A binary is not text, it's a binary, a sequence of bytes...
A sequence must be defined by its length. Before reading anything, you must know the size needed to store what's coming.
Every crap piece of software you can find has always preferred strcpy to memcpy. Whatever language you use, you MUST know the size of what you're working with. This is not advice, this is mandatory.

From the post above, you can see that the only delimiter seems to be "\r\n". So if someone sends you 4 GB of data that doesn't end with "\r\n", you'll keep reading it... (and of course blow your memory, because this was not supposed to happen)

While working at a low level with C and flex scanners, I've always asked myself this question: "What's the max size of the element I can accept?" This simple question helps me build software that doesn't break with a simple perl -e 'print "A" x 60000' trick...
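To make the "max size" question concrete, here is a minimal sketch in Python of a delimiter-based reader that refuses to grow past a fixed limit. The names read_crlf_line, LineTooLong and MAX_LINE, and the 8192-byte limit, are made up for illustration; they are not taken from the mailing list thread or from any library.

```python
import io

MAX_LINE = 8192  # hypothetical limit, pick whatever your protocol allows


class LineTooLong(Exception):
    """Raised when no CRLF shows up within the allowed size."""


def read_crlf_line(stream, max_len=MAX_LINE):
    """Read one CRLF-terminated line of at most max_len bytes (CRLF excluded)."""
    buf = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended before CRLF")
        buf += byte
        if buf.endswith(b"\r\n"):
            return bytes(buf[:-2])
        # Stop as soon as the line cannot fit in max_len bytes anymore,
        # instead of buffering whatever the peer decides to send.
        if len(buf) > max_len + 1:
            raise LineTooLong(f"no CRLF within {max_len} bytes")


# A well-formed line is returned without its delimiter.
line = read_crlf_line(io.BytesIO(b"GET / HTTP/1.1\r\nHost: example\r\n"))

# 60000 bytes without a delimiter are rejected early.
try:
    read_crlf_line(io.BytesIO(b"A" * 60000))
    rejected = False
except LineTooLong:
    rejected = True
```

The reader never holds more than max_len + 2 bytes, whatever the other side sends.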

So is HTTP badly designed, because the delimiters are "\r\n" and headers can span multiple lines? The answer is absolutely YES.
Was it difficult to build something more secure, using elements prefixed with their size? The answer is absolutely NO! (take ajp13 for example...)
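As an illustration of size-prefixed elements, here is a minimal sketch in Python of a reader for frames made of a 4-byte big-endian length followed by the payload. This layout is an assumption chosen for the example; it is not the ajp13 wire format, and the names read_exact, read_frame, FrameTooLarge and MAX_PAYLOAD are hypothetical.

```python
import io
import struct

MAX_PAYLOAD = 65536  # hypothetical limit for this sketch


class FrameTooLarge(Exception):
    """Raised when the announced size exceeds what we accept."""


def read_exact(stream, n):
    """Read exactly n bytes, or raise EOFError if the stream ends early."""
    chunks = []
    remaining = n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended in the middle of a frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(stream, max_payload=MAX_PAYLOAD):
    """Read one frame: 4-byte big-endian length, then that many payload bytes."""
    (size,) = struct.unpack(">I", read_exact(stream, 4))
    # The size is known before any payload byte is read, so an oversized
    # frame is rejected without allocating anything for it.
    if size > max_payload:
        raise FrameTooLarge(f"{size} bytes announced, limit is {max_payload}")
    return read_exact(stream, size)


stream = io.BytesIO(struct.pack(">I", 5) + b"hello" + b"rest")
payload = read_frame(stream)
leftover = stream.read()
```

The receiver decides whether to accept the element after four bytes, not after scanning an unbounded stream for a delimiter.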

Now that Erlang is becoming more and more popular, lamers are drifting in Erlang's direction. This is life, but will the Erlang mailing list suffer from this? The answer is yes :/

Someone with knowledge must not try to solve someone else's problem; he must help by asking the right question (do you know the size a priori?).

Why is parsing binaries in Java faster than in Erlang? Who cares, since parsing binaries is of course stupid!
Parsing real-world protocols with Erlang is lightning fast, both to write and to execute. So teach lamers how to build real protocols, and don't try to help them with some trickery.

That's my rant for today :)

3 comments:

Unknown said...

Sebastian Dehne posed a problem he encountered to the erlang-questions list. It was initially met with skepticism, thinly veiled incredulity and unhelpful Erlang apologetics. Thankfully Per Gustafsson posted a simple solution (update Erlang to R12B-3) that addressed Sebastien's performance concern.

I would also not label genuinely interested newcomers as 'lamers'. The attitude of this post does injury to the reputation of the Erlang community. It lowers my impression of your blog substantially. Please retract this harsh and disingenuous commentary.

Antoine said...

Hi,
I'm sorry for that post; it is not aimed at anyone in particular. It is just an expression of a problem I'm facing every day.

I'm not saying anyone is a lamer, I just want to say that stupidity rules too much of technology.

I'll re-edit this post to make my point clearer...
Thanks for your comment.

Unknown said...

Thank you rolphin :)

I know where you are coming from. I find a lot of topics on erlang-questions deeply frustrating to read. There is a serious problem with the average developer either asking questions prior to looking at any documentation or standing their ground that Erlang should be altered to match the way they think. It is really disappointing.
