Thursday, January 08, 2009

Using AWK for log file analysis: sample code

Here is an AWK source file, profiling.awk:

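BEGIN {
    FS = "\n"      # every line of a record becomes one field
    RS = "\n\n"    # a blank line ends a record (multi-character RS is a gawk extension)
    count = 0
    time_total = 0
    cpu_time_total = 0
}

# Only look at records that mention the doGet call; the dots are escaped
# so that they match literal dots rather than any character.
/com\.struts\.BaseActionServlet\.doGet\[[0-9]+\]\(/ {
    for (j = 1; j <= NF; j++) {
        if ($j ~ /^Elapsed Time:/) {
            count++
            sub(/Elapsed Time: /, "", $j)
            sub(/ milliseconds/, "", $j)
            # print $j
            time_total += $j
        }

        if ($j ~ /^Elapsed CPU Time:/) {
            sub(/Elapsed CPU Time: /, "", $j)
            sub(/ milliseconds/, "", $j)
            # print $j
            cpu_time_total += $j
        }
    }
}

END {
    # Avoid a division by zero when no record matched.
    if (count == 0) { print "count: 0"; exit }
    print "count:", count
    print "Elapsed Time total:", time_total, "ms"
    print "Elapsed Time average:", time_total / count, "ms"
    print "Elapsed CPU Time total:", cpu_time_total, "ms"
    print "Elapsed CPU Time average:", cpu_time_total / count, "ms"
}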
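
To see how the FS and RS settings split a log into records and fields, here is a small sketch. The two-record log is made up for illustration (the real layout of profiling.log is an assumption here), and it uses RS="" (awk's portable paragraph mode) instead of the gawk-specific RS="\n\n", which should behave the same for input like this.

```shell
# Hypothetical two-record log; the layout is assumed, not copied from a real profiling.log.
log='com.struts.BaseActionServlet.doGet[12](request)
Elapsed Time: 120.5 milliseconds
Elapsed CPU Time: 30 milliseconds

com.example.Other.run[3](x)
Elapsed Time: 999 milliseconds
Elapsed CPU Time: 1 milliseconds'

# RS="" is paragraph mode: blank lines separate records.
# FS="\n" then makes every line of a record one field.
fields=$(printf '%s\n' "$log" |
  awk 'BEGIN { FS = "\n"; RS = "" } { print NR ": " NF " fields, first = " $1 }')
printf '%s\n' "$fields"
```

Each record shows up with three fields, one per line, which is why the script can loop over $1..$NF looking for the "Elapsed" lines.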

Then run it with GNU Awk (gawk) from the command line:
$ gawk -f profiling.awk profiling.log > output.txt

This means I use the profiling.awk source file to process the profiling.log file and redirect the result to output.txt, which looks like this:

count: 6508
Elapsed Time total: 4.83727e+06 ms
Elapsed Time average: 743.281 ms
Elapsed CPU Time total: 366172 ms
Elapsed CPU Time average: 56.2649 ms
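
The total shows up as 4.83727e+06 because print converts non-integer numbers with OFMT, whose default is "%.6g". If you prefer fixed-point output, printf with an explicit format is one option. A minimal sketch, using a made-up value rather than the real total:

```shell
# 4837270.25 is an invented number, chosen only to trigger the default formatting.
default_out=$(awk 'BEGIN { t = 4837270.25; print t }')
fixed_out=$(awk 'BEGIN { t = 4837270.25; printf "%.2f\n", t }')
printf 'print:  %s\nprintf: %s\n' "$default_out" "$fixed_out"
```

Setting OFMT = "%.2f" in the BEGIN block would be another way to get a similar effect without touching the print statements.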

The performance of awk itself is preeeeeeetty good! :) So I love using it to analyze raw performance-testing data or profiling log files.

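For your reference:

Variable    Description
NF          The number of fields in the current record.
NR          The number of records read so far.
FS          The input field separator.
RS          The input record separator.
OFS         The output field separator.
ORS         The output record separator.
FILENAME    The name of the input file currently being read.
IGNORECASE  When IGNORECASE is set to a nonzero or non-empty value, gawk ignores case in pattern matching (gawk only).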
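
A quick way to get a feel for these variables is a one-liner like the sketch below. The colon-separated input is invented for the example; FS splits each line on ":", OFS is what print puts between its comma-separated arguments, and NR and NF change from record to record.

```shell
# Hypothetical colon-separated input, used only to show the variables in action.
vars_out=$(printf 'a:b:c\nd:e\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1 }')
printf '%s\n' "$vars_out"
```

The first line has three fields and the second has two, so the output lists the record number, the field count, and the first field of each, joined by "-".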
