Wednesday, July 9, 2008

Monitoring your servers with sysstat (sar)

Some tools are so helpful that you assume everyone already knows about them, but that is not always the case. Here I'll talk about a small package that is so powerful and efficient that you won't want to switch to anything else...
Taken from the Ubuntu man page:

DESCRIPTION
The sar command writes to standard output the contents of selected cumulative
activity counters in the operating system. The accounting system, based on
the values in the count and interval parameters, writes information the
specified number of times spaced at the specified intervals in seconds. If
the interval parameter is set to zero, the sar command displays the average
statistics for the time since the system was started. The default value for
the count parameter is 1. If its value is set to zero, then reports are
generated continuously. The collected data can also be saved in the file
specified by the -o filename flag, in addition to being displayed onto the
screen. If filename is omitted, sar uses the standard system activity daily
data file, the /var/log/sysstat/sadd file, where the dd parameter indicates
the current day. By default all the data available from the kernel are
saved in the data file. Exceptions are interrupts and disks data, for which
the relevant options must be explicitly passed to sar (or to its backend
sadc) when the data file is created (see options below).


"sar" comes with the sysstat package. Once it's installed you can monitor your server like never before...

Here's the description of the sysstat package from its author:
The sysstat utilities are a collection of performance monitoring tools for Linux. 
These include sar, sadf, mpstat, iostat, pidstat and sa tools. Go to the Features page to display
a list of sysstat's features, or see the Documentation page to learn some more about them.

For example, you can watch network usage in real time:

# sar -n DEV 1 0
Linux 2.6.22-15-generic (xXxXx) 07/09/2008

11:26:36 AM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
11:26:37 AM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00
11:26:37 AM eth0 5.05 0.00 0.86 0.00 0.00 0.00 0.00

11:26:37 AM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
11:26:38 AM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00
11:26:38 AM eth0 4.00 0.00 0.45 0.00 0.00 0.00 0.00
...

Today, I'll introduce the erlang-sar package that's able to retrieve information from the sar command.

The application is composed of a collector "sar_collect", a helper module "sar_values" and the main module "sar".
Here comes a quick sample session:

% Starting the collector
sar_collect:start().

% Retrieving the data
sar:stats(cpu).
[{cpu,idle,<<"98.62">>},
{cpu,steal,<<"0.00">>},
{cpu,iowait,<<"0.00">>},
{cpu,system,<<"0.18">>},
{cpu,nice,<<"0.00">>},
{cpu,user,<<"1.20">>}]

% Retrieving more data
sar:stats([cpu,mem]).
[{swap,swpcad,<<"33236">>},
{swap,usage,<<"64.72">>},
{swap,used,<<"389872">>},
{swap,free,<<"212492">>},
{mem,kbcached,<<"84496">>},
{mem,kbbuffers,<<"63408">>},
{mem,memused,<<"98.78">>},
{mem,kbmemused,<<"508984">>},
{mem,kbmemfree,<<"6308">>},
{cpu,idle,<<"97.83">>},
{cpu,steal,<<"0.00">>},
{cpu,iowait,<<"0.75">>},
{cpu,system,<<"0.20">>},
{cpu,nice,<<"0.00">>},
{cpu,user,<<"1.22">>}]


The module "sar_values" also exports an "extractor" function that can be used to build fun()s:

% build a Mem fun()
Mem = sar_values:extractor(mem).

% Calling Mem fun() on sar:stats()
Mem(sar:stats([cpu,mem])).
[{kbcached,<<"84496">>},
{kbbuffers,<<"63480">>},
{memused,<<"98.77">>},
{kbmemused,<<"508976">>},
{kbmemfree,<<"6316">>}]

% Calling it on sar:stats()
Mem(sar:stats()).
[{kbcached,<<"84496">>},
{kbbuffers,<<"63520">>},
{memused,<<"98.80">>},
{kbmemused,<<"509100">>},
{kbmemfree,<<"6192">>}]
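All the values come back as binaries, so if you want to graph or compare them you'll probably need numbers. Here is a minimal sketch of how that could be done. The module name "sar_numbers" and both functions are my own illustration and are not part of the erlang-sar package:

```erlang
-module(sar_numbers).
-export([to_number/1, numeric/1]).

%% Hypothetical helper, not part of erlang-sar.
%% Convert a binary such as <<"98.62">> or <<"33236">> to a float or an
%% integer. Anything that parses as neither is returned unchanged.
to_number(Bin) when is_binary(Bin) ->
    Str = binary_to_list(Bin),
    case string:to_float(Str) of
        {Float, []} ->
            Float;
        _ ->
            case string:to_integer(Str) of
                {Int, []} -> Int;
                _ -> Bin
            end
    end.

%% Apply to_number/1 to every {Key, Value} pair returned by an extractor fun.
numeric(Pairs) when is_list(Pairs) ->
    [{Key, to_number(Value)} || {Key, Value} <- Pairs].
```

With the Mem fun() built above, something like sar_numbers:numeric(Mem(sar:stats(mem))) should then give you a list of {Key, Number} pairs.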


With this package you have access to all the data sar can export for you.
Here's the "sar.erl" code:

-module(sar).

-export([systat/0, stats/0, stats/1, option/1]).
-export([extract/1]).
-define(OPTIONS, "-u -r -v -c -q -n DEV").
-define(DATA, "/tmp/last").

systat() ->
Cmd = "sadf " ++ ?DATA ++ " -- " ++ ?OPTIONS,
execute(".", Cmd).

stats() ->
Cmd = "sadf " ++ ?DATA ++ " -- " ++ ?OPTIONS,
{ok, _, Bin} = execute(".", Cmd),
extract(Bin).

stats(List) when is_list(List) ->
Args = lists:foldl(fun(X, Acc) -> case option(X) of
error ->
Acc;
T ->
[ $ , T | Acc ]
end end, [], List),
Cmd = "sadf " ++ ?DATA ++ " -- " ++ lists:reverse(Args),
{ok, _, Bin} = execute(".", lists:flatten(Cmd)),
extract(Bin);

stats(Elem) ->
stats([Elem]).

option(cpu) ->
"-u";
option(disk) ->
"-d";
option(sock) ->
"-n SOCK";
option(eth0) ->
"-n DEV";
option(eth1) ->
"-n DEV";
option(eth2) ->
"-n DEV";
option(proc) ->
"-c";
option(run) ->
"-q";
option(mem) ->
"-r";
option(inode) ->
"-v";
option(switch) ->
"-w";
option(swaping) ->
"-W";
option(_) ->
error.

execute(_Host, Cmd) ->
Port = open_port({spawn, Cmd}, [ exit_status, binary ] ),
wait(Port, []).

wait(Port, Content) ->
receive
{Port, {data, BinData}} ->
%error_logger:info_msg("dump:~n~p~n", [BinData]),
% chunks are accumulated newest first
NewContent = [ BinData | Content ],
wait(Port, NewContent);

{Port, {exit_status, Status}} ->
%error_logger:info_msg("exit_code: ~p~n", [Status]),
% restore arrival order, otherwise multi-chunk output is scrambled
{ok, Status, lists:reverse(Content)};

{Port, eof} ->
%error_logger:info_msg("Port closed"),
port_close(Port),
{ok, eof, lists:reverse(Content)};

{Port, exit} ->
error_logger:info_msg("Received : ~p~n", [Port]),
lists:reverse(Content)
end.

extract(Bin) ->
sar_values:extract(iolist_to_binary(Bin)).


You can see the "option/1" function, which lets you convert atoms into command-line arguments easily. I also use this function to test whether sar is able to handle a specific parameter. For example, with the help of my web service I can query remote stats easily:
http://monitoring.lan/stats/q/cpu/servername
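To illustrate the "test if sar can handle a parameter" idea, here is a small sketch that splits a list of atoms into supported and unsupported ones. The module "sar_options" and the function "split_supported/2" are hypothetical names of mine. The only thing taken from the code above is the contract of option/1: a string for known atoms, the atom error otherwise.

```erlang
-module(sar_options).
-export([split_supported/2]).

%% Hypothetical helper, not part of erlang-sar.
%% OptionFun must behave like sar:option/1: return a command-line string
%% for a known atom and the atom 'error' for anything else.
%% Returns {Supported, Unsupported}, each in the original order.
split_supported(OptionFun, Atoms)
  when is_function(OptionFun, 1), is_list(Atoms) ->
    lists:partition(fun(Atom) -> OptionFun(Atom) =/= error end, Atoms).
```

With the "sar" module compiled, you would call it as sar_options:split_supported(fun sar:option/1, [cpu, mem, bogus]), and a web service could answer with an error for anything that lands in the second list.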


Here's the "sar_collect" module:

-module(sar_collect).

-export([systat/0, systat/1, start/0, start/1]).

% start the collector with the default interval
start() ->
spawn(?MODULE, systat, []).

start(Seconds) ->
spawn(?MODULE, systat, [Seconds]).

% update the file every second
systat(0) ->
loop(1);

systat(Seconds) ->
loop(Seconds).

%update the file every 59 seconds
systat() ->
loop(59).

loop(Seconds) when Seconds < 60 ->
Cmd = lists:flatten([ "sar -o /tmp/last.tmp ", integer_to_list(Seconds), " 1" ]),
execute(".", Cmd),
file:rename("/tmp/last.tmp", "/tmp/last"),
% timer:sleep/1 takes milliseconds, so this is only a brief pause between samples
timer:sleep(60 - Seconds),
receive
stop ->
exit(normal);

{interval, NewSeconds} ->
loop(NewSeconds);

_A ->
loop(Seconds)

after 0 ->
loop(Seconds)

end;

%default update 20 seconds (arbitrarily chosen)
loop(_Seconds) ->
loop(20).

execute(Host, Cmd) ->
Port = open_port({spawn, Cmd}, [ {cd, Host}, exit_status, binary ] ),
wait(Port, []).

wait(Port, Content) ->
receive
{Port, {data, _BinData}} ->
wait(Port, Content);

{Port, {exit_status, _Status}} ->
ok;

{Port, eof} ->
port_close(Port),
Content;

{Port, exit} ->
error_logger:info_msg("Received : ~p~n", [Port]),
Content
end.
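The collector loop listens for two messages, stop and {interval, NewSeconds}, so you can drive it from the shell with the pid returned by start(). Here is a small sketch of wrappers for that. The module "sar_control" and its functions are my own naming, not something shipped with the package:

```erlang
-module(sar_control).
-export([set_interval/2, stop/1]).

%% Hypothetical wrappers around the messages understood by sar_collect's loop.

%% Ask a running collector to switch to a new sampling interval, in seconds.
%% The loop only accepts values below 60 and falls back to 20 otherwise.
set_interval(Pid, Seconds)
  when is_pid(Pid), is_integer(Seconds), Seconds > 0 ->
    Pid ! {interval, Seconds},
    ok.

%% Ask a running collector to terminate.
stop(Pid) when is_pid(Pid) ->
    Pid ! stop,
    ok.
```

Keep in mind that the loop only looks at its mailbox after the current sar run has finished, so a change takes effect at the end of the sample in progress, not immediately.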


Finally there is the "sar_values" source code:

-module(sar_values).

-export([extract/1, extractor/1, sort/1]).
-export([parse/1, parse_value/2]).

extract(Bin) ->
extract(Bin, []).

extract(Bin, Stats) ->
case parse(Bin) of
{Class, Type, Rest} ->
%io:format("~p.~p", [Class, Type]),
case parse_value(Rest, <<>>) of
{more, Value, More} ->
NewStats = [ {Class, Type, Value} | Stats ],
extract(More, NewStats);

{eof, Value} ->
NewStats = [ {Class, Type, Value} | Stats ],
NewStats
end;

eof ->
Stats
end.

parse(<<"%user", Rest/binary >>) -> {cpu, user, Rest};
parse(<<"%nice", Rest/binary>>) -> {cpu, nice, Rest};
parse(<<"%system", Rest/binary>>) -> {cpu, system, Rest};
parse(<<"%iowait", Rest/binary>>) -> {cpu, iowait, Rest};
parse(<<"%steal", Rest/binary>>) -> {cpu, steal, Rest};
parse(<<"%idle", Rest/binary>>) -> {cpu, idle, Rest};

parse(<<"kbmemfree", Rest/binary>>) -> {mem, kbmemfree, Rest};
parse(<<"kbmemused", Rest/binary>>) -> {mem, kbmemused, Rest};
parse(<<"%memused", Rest/binary>>) -> {mem, memused, Rest};
parse(<<"kbbuffers", Rest/binary>>) -> {mem, kbbuffers, Rest};
parse(<<"kbcached", Rest/binary>>) -> {mem, kbcached, Rest};

parse(<<"kbswpfree", Rest/binary>>) -> {swap, free, Rest};
parse(<<"kbswpused", Rest/binary>>) -> {swap, used, Rest};
parse(<<"%swpused", Rest/binary>>) -> {swap, usage, Rest};
parse(<<"kbswpcad", Rest/binary>>) -> {swap, swpcad, Rest};

parse(<<"dentunusd", Rest/binary>>) -> {inode, dentryunused, Rest};
parse(<<"file-sz", Rest/binary>>) -> {inode, fileopened, Rest};
parse(<<"inode-sz", Rest/binary>>) -> {inode, inodes, Rest};
parse(<<"super-sz", Rest/binary>>) -> {inode, super, Rest};
parse(<<"%super-sz", Rest/binary>>) -> {inode, superusage, Rest};
parse(<<"dquot-sz", Rest/binary>>) -> {inode, dquotsz, Rest};
parse(<<"%dquot-sz", Rest/binary>>) -> {inode, dquotszusage, Rest};
parse(<<"rtsig-sz", Rest/binary>>) -> {rtsig, count , Rest};
parse(<<"%rtsig-sz", Rest/binary>>) -> {rtsig, usage, Rest};

parse(<<"totsck", Rest/binary>>) -> {sock, total, Rest};
parse(<<"tcpsck", Rest/binary>>) -> {sock, tcp, Rest};
parse(<<"udpsck", Rest/binary>>) -> {sock, udp, Rest};
parse(<<"rawsck", Rest/binary>>) -> {sock, raw, Rest};
parse(<<"ip-frag", Rest/binary>>) -> {sock, ipfrag, Rest};

parse(<<"runq-sz", Rest/binary>>) -> {procs, running, Rest};
parse(<<"plist-sz", Rest/binary>>) -> {procs, total, Rest};

parse(<<"ldavg-15", Rest/binary>>) -> {load, min15, Rest};
parse(<<"ldavg-1", Rest/binary>>) -> {load, min1, Rest};
parse(<<"ldavg-5", Rest/binary>>) -> {load, min5, Rest};

parse(<<"pswpin/s", Rest/binary>>) -> {swaping, pswpin, Rest};
parse(<<"pswpout/s", Rest/binary>>) -> {swaping, pswpout, Rest};

parse(<<"lo", Rest/binary>>) -> parsebis(Rest, lo);
parse(<<"eth0", Rest/binary>>) -> parsebis(Rest, eth0);
parse(<<"eth1", Rest/binary>>) -> parsebis(Rest, eth1);
parse(<<"eth2", Rest/binary>>) -> parsebis(Rest, eth2);

parse(<<>>) -> eof;

parse(Bin) ->
{_, Next} = split_binary(Bin, 1),
parse(Next).

parsebis(<<"rxpck/s", Rest/binary>>, Category) -> {Category, rxpck, Rest};
parsebis(<<"txpck/s", Rest/binary>>, Category) -> {Category, txpck, Rest};
parsebis(<<"rxbyt/s", Rest/binary>>, Category) -> {Category, rxbyt, Rest};
parsebis(<<"txbyt/s", Rest/binary>>, Category) -> {Category, txbyt, Rest};
parsebis(<<"rxcmp/s", Rest/binary>>, Category) -> {Category, rxcmp, Rest};
parsebis(<<"txcmp/s", Rest/binary>>, Category) -> {Category, txcmp, Rest};
parsebis(<<"rxmcst/s", Rest/binary>>, Category) -> {Category, rxmcst, Rest};
parsebis(Bin, Category) ->
{_, Next} = split_binary(Bin, 1),
parsebis(Next, Category).

parse_value(<<$\t, Rest/binary>>, Value) ->
parse_value(Rest, Value);
parse_value(<<$ , Rest/binary>>, Value) ->
parse_value(Rest, Value);

parse_value(<<$\n, Rest/binary>>, Value) ->
{more, Value, Rest};

parse_value(<<>>, Value) ->
{eof, Value};

parse_value(Bin, Value) ->
{H, Next} = split_binary(Bin, 1),
parse_value(Next, iolist_to_binary([Value, H])).

extractor(Motif) ->
fun(L) when is_list(L) ->
[ {Y, Z} || {X, Y, Z} <- L, X == Motif]
end.

sort(List) ->
lists:sort( fun({X, _V}, {Y, _W}) when X < Y ->
true;
(_A, _B) -> false
end, List).


Now that Erlang is at R12B, I'm not so sure this "binary parsing code" is as efficient as it could be...
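For comparison, here is a sketch of another way to pull a field and its value out of one record without walking the binary byte by byte. It relies on binary:split/3 with the trim_all option, which is not available in R12B (the binary module arrived in later OTP releases, and trim_all later still), so treat it as an idea to measure and not as a drop-in replacement. The module name "sar_line" is hypothetical, and I'm assuming each record is a whitespace-separated line whose last two columns are the field name and its value:

```erlang
-module(sar_line).
-export([field_value/1]).

%% Hypothetical alternative parser, not part of erlang-sar.
%% Assumes a record whose last two whitespace-separated columns are the
%% field name and its value. Returns {Field, Value} as binaries, or
%% 'error' when the record has fewer than two columns.
field_value(Line) when is_binary(Line) ->
    Separators = [<<"\t">>, <<" ">>, <<"\n">>],
    Tokens = binary:split(Line, Separators, [global, trim_all]),
    case lists:reverse(Tokens) of
        [Value, Field | _] -> {Field, Value};
        _ -> error
    end.
```

The returned field name could then be mapped to a {Class, Type} pair with a simple lookup instead of the long list of parse/1 clauses.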

2 comments:

nem said...

Hi there - this looks really useful, maybe akin to or better than os_mon for platforms with sar available.

Is this code available from a source control repository anywhere?

Antoine said...

I'll put it on google I think.
Or somewhere else, really soon.
