But now I have a Magical Ability, Erlang Magic... So I decide to split this enormous file using the 'split' command, 'split -l 10000' for example.
Now that I have a lot of smaller files, I can parallelize their parsing, and this is where Erlang comes in...
First, let's design a bit:
- I need a central process that will control all my processes
- Processes and master must be able to communicate
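To make those two points concrete, here is a minimal sketch of a master talking to its workers with plain message passing. The module name master_sketch, the {done, Id} message and the three dummy workers are made up for illustration; they are not the code used in the rest of this post.

```erlang
-module(master_sketch).
-export([run/0, worker/2]).

%% Hypothetical example: the master spawns three workers and waits
%% for one {done, Id} message from each of them.
run() ->
    Master = self(),
    Ids = [1, 2, 3],
    [spawn(?MODULE, worker, [Master, Id]) || Id <- Ids],
    collect(length(Ids), []).

%% A worker does its job (nothing here) and reports back to the master.
worker(Master, Id) ->
    Master ! {done, Id}.

%% Receive exactly N replies, whatever their arrival order,
%% and return the collected ids sorted.
collect(0, Acc) ->
    lists:sort(Acc);
collect(N, Acc) ->
    receive
        {done, Id} -> collect(N - 1, [Id | Acc])
    end.
```

Here the workers get the master's pid as an argument; registering the master under a name, as done further down, is another way to let them find it.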
The master process will be a little more tricky, but this is 'easyerl' remember, so here we go:
doit(Step) ->
    Master = spawn(?MODULE, test, [Step]),
    register(computing_master, Master).

test(Step) ->
    file:set_cwd("/home/rolphin/Work"),
    List = filelib:wildcard("seg-a*"),
    upto(Step, 0, List).
We create a process running the test function, whose job is to start the upto/3 function...
What's interesting here is the 'filelib:wildcard' function, which gives me the list of files in the directory '/home/rolphin/Work' whose names match 'seg-a*'.
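As a side note, filelib:wildcard also exists in a two-argument form that takes the directory to search, so the same list can be built without changing the working directory. A small sketch, where the module name wildcard_sketch and the function segments/1 are invented for the example:

```erlang
-module(wildcard_sketch).
-export([segments/1]).

%% Return the sorted list of file names matching "seg-a*" in Dir.
%% filelib:wildcard/2 takes the directory as its second argument,
%% so the working directory of the VM is left untouched.
%% The names returned are relative to Dir, not full paths.
segments(Dir) ->
    lists:sort(filelib:wildcard("seg-a*", Dir)).
```

Calling wildcard_sketch:segments("/home/rolphin/Work") should give the same names as the test function above, but anything that opens those files later would then need the directory prepended.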
Now let's describe the 'upto/3' function:
upto(Max, Current, []) ->
    loop(Max, Current, []);
upto(Max, Max, List) ->
    loop(Max, Max, List);
upto(Max, Current, [New|List]) ->
    io:format("upto: ~p/~p~n", [Max, Current]),
    spawn(?MODULE, grep, ["user.list", New, ["result-", New]]),
    upto(Max, Current + 1, List).
More details:
- upto with an empty list will just call the loop/3 function
- upto with the current number of processes equal to the maximum number allowed will call loop/3
- upto with fewer active processes than the maximum, and a non-empty list, will spawn a child process and count it as active
grep(File, Source, Result) ->
    % Command line is: "grep -f pattern_file sourcefile > result"
    % ($\s is the space character)
    Cmd = [ "grep -f ", File, $\s, Source, $>, Result ],
    io:format("Starting: ~p~n", [Cmd]),
    Status = os:cmd(Cmd),
    % Tell the registered master that this child is done
    computing_master ! {exited, Status}.
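The loop/3 function isn't shown in this post. As a rough guess at its shape only, here is a self-contained sketch of the same throttling idea: wait for an {exited, Status} message, free one slot, and go back to upto. Everything in it is an assumption made for the example: the module name throttle_sketch, the extra Done counter (so these functions take four arguments instead of three), the dummy job/2 that stands in for grep/3, and passing the master's pid instead of using the registered name.

```erlang
-module(throttle_sketch).
-export([run/2, job/2]).

%% Run one job per item, with at most Max jobs alive at once
%% (Max is expected to be at least 1).
%% Returns the number of jobs that reported back.
run(Max, Items) ->
    upto(Max, 0, Items, 0).

%% Nothing left to start, or the limit is reached: wait for an exit.
upto(Max, Current, [], Done) ->
    loop(Max, Current, [], Done);
upto(Max, Max, Items, Done) ->
    loop(Max, Max, Items, Done);
%% Room for one more job: start it and count it as active.
upto(Max, Current, [Item | Items], Done) ->
    spawn(?MODULE, job, [self(), Item]),
    upto(Max, Current + 1, Items, Done).

%% No job active and nothing left to start: we are finished.
loop(_Max, 0, [], Done) ->
    Done;
%% Otherwise block until a job exits, free its slot, try to refill.
loop(Max, Current, Items, Done) ->
    receive
        {exited, _Status} ->
            upto(Max, Current - 1, Items, Done + 1)
    end.

%% Stand-in for grep/3: pretend to work, then notify the master.
job(Master, Item) ->
    Master ! {exited, Item}.
```

With this sketch, throttle_sketch:run(2, [a, b, c]) never has more than two jobs alive at the same time and returns once all three have reported back.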
Okay, that's it for today! It's a little late now, and I need some sleep to successfully pass the required skill tests for my new job!
More of this tomorrow...