Tuesday, January 27, 2009

Gran Torino,you are not alone!

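A moving song that lets you read a man's inner world.
A weathered face, full of inner conflict and loneliness (the film also tells us that dogs are man's loyal companions)...
Everywhere he shows his rejection of the present, a stubborn and tough character, and sharply drawn love and hate.
He is like a vintage car: it never changes or decorates itself to suit other people's taste, yet it is the most reliable and dependable thing there is!
The tragedy makes the old man's death somewhat heroic, but tragedy leaves a deep impression, and it gives the film a fitting ending: everything passes, everyone grows old and dies... yet his spirit is inherited by a young man and carried on!

For the closing song, I like the version sung by both Eastwood and Jamie Cullum; the old man's voice is the best interpretation of this song: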
http://www.youtube.com/watch?v=HEXF7U5TYV8



Gran Torino 老爷车
(Jamie Cullum & Clint Eastwood)


[*Sung By Clint Eastwood*]
So tenderly 如此温和
your story is 你的故事只是
nothing more than what you see 你所见到的和你所做过的事情
or what you've done or will become 你会变得坚强
standing strong do you belong 你会表里如一
in your skin; just wondering 只是还有点怀疑

gentle now the tender breeze blows 微风轻轻地吹
whispers through my Gran Torino 飒飒吹过我的老爷车
whistling another tired song 吹哨着另一首疲惫的歌

engine hums and bitter dreams grow 引擎嗡鸣 苦涩的梦继续
heart locked in a Gran Torino 心锁于老爷车
it beats a lonely rhythm all night long 整晚敲击孤独的节奏
it beats a lonely rhythm all night long 整晚敲击孤独的节奏
it beats a lonely rhythm all night long 整晚敲击孤独的节奏


[*sung by Jamie Cullum*]
Realign all the stars above my head 重新排列头顶上的所有星星
Warning signs travel far 警告牌越行越远
I drink instead on my own Oh! how I've known 我借酒消愁
the battle scars and worn out beds 我多么熟悉战争的伤痕和磨损的睡床


gentle now the tender breeze blows 微风轻轻地吹
whispers through my Gran Torino 飒飒吹过我的老爷车
whistling another tired song 吹哨着另一首疲惫的歌


engines hum and bitter dreams grow 引擎嗡鸣 苦涩的梦继续
heart locked in a Gran Torino 心锁于老爷车
it beats a lonely rhythm all night long 整晚敲击孤独的节奏

these streets are old they shine 这些街道都老了
with the things I've known 我熟悉的那些事物闪耀着光
and breaks through the trees 闪闪光辉穿过树丛
their sparkling

your world is nothing more than all the tiny things you've left behind
你的世界不过是你抛弃的那些微小的事物


So tenderly 如此温和
your story is 你的故事只是
nothing more than what you see 你所见到的和你所做过的事情
or what you've done or will become 你会变得坚强
standing strong do you belong 你会表里如一
in your skin; just wondering 只是还有点怀疑



gentle now the tender breeze blows 微风轻轻地吹
whispers through my Gran Torino 飒飒吹过我的老爷车
whistling another tired song 吹哨着另一首疲惫的歌
engines hum and bitter dreams grow 引擎嗡鸣苦涩的梦继续
heart locked in a Gran Torino 心锁于老爷车
it beats a lonely rhythm all night long 整晚敲击孤独的节奏


may I be so bold and stay 我可以鲁莽地留下来吗
I need someone to hold 我需要一个可以拥抱的人
that shudders my skin 使我的肌肤战悚
their sparkling 的是他们的光芒


your world is nothing more than all the tiny things you've left behind
你的世界不过是你抛弃的那些微小的事物


Realign all the stars above my head 重新排列头顶上的所有星星
Warning signs travel far 警告牌越行越远
I drink instead on my own Oh! how I've known 我借酒消愁
the battle scars and worn out beds 我多么熟悉战争的伤痕和磨损的睡床


gentle now the tender breeze blows 微风轻轻地吹
whispers through my Gran Torino 飒飒吹过我的老爷车
whistling another tired song 吹哨着另一首疲惫的歌
engines hum and bitter dreams grow 引擎嗡鸣 苦涩的梦继续
heart locked in a Gran Torino 心锁于老爷车
it beats a lonely rhythm all night long 整晚敲击孤独的节奏
it beats a lonely rhythm all night long 整晚敲击孤独的节奏
it beats a lonely rhythm all night long 整晚敲击孤独的节奏

Sunday, January 25, 2009

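Today is Chinese New Year's Eve...

On New Year's Eve, I wish everyone peace and health! 2009 is a hopeful start; it may be hard, but follow the morning glow and keep walking!
Yesterday I watched the film "救火员" (Firefighter) and took something away from it. The plot is average, but for me it really counted as "situational education" :) I give it 80 points!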

Tuesday, January 20, 2009

JMeter load decreases during a long-duration test

Recently I ran a long-duration test with a constant load against the server to see how stable our application is.

Our application survived almost 12 hours of torment, but JMeter could not sustain the load: it dropped a lot after about 5 hours. At first I could not tell whether the problem was in our application or in JMeter....

I restarted only JMeter and the load went back to normal, so the problem should be on the JMeter side.

BTW, the only listener I added during the long-duration test was the Aggregate Report, with no other log output.

Has anyone met this problem before? Someone found that the Aggregate Report drags the load down because the 90% line calculation keeps getting more expensive as samples accumulate.
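
To show why a 90% line can get more expensive over a long run, here is a minimal C sketch of a nearest-rank percentile. It is only an illustration and assumes the listener keeps every sample in memory; I have not checked this against the JMeter source. The name line_90 is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Nearest-rank 90th percentile of n response times.
 * Every sample must be kept and sorted, so memory use grows with the
 * number of samples and each recalculation costs O(n log n).
 * Returns -1.0 if n is 0 or memory cannot be allocated.
 */
double line_90(const double *samples, size_t n)
{
    double *sorted;
    double result;
    size_t rank;

    assert(samples != NULL || n == 0);
    if (n == 0)
        return -1.0;
    sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1.0;
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);
    rank = (9 * n + 9) / 10;      /* ceil(0.9 * n), 1-based rank */
    result = sorted[rank - 1];
    free(sorted);
    return result;
}
```

With a constant load for 12 hours the sample array only ever grows, which would fit the symptom that a restart brings the load back to normal.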

Friday, January 16, 2009

Speed simulator, added into my toolbox :)

Found a cool tool for network speed simulation testing: Speed Simulator. It can simulate connection speeds from 4.8 Kbps to 1 Mbps.

You can find it here: http://sourceforge.net/project/showfiles.php?group_id=212588&package_id=255744&release_id=624415


Speed simulator is a simple throttling proxy which allows you to see how your site behaves under different connection speeds.

LR scripts debugging on controller, try to make a "product" without a bug

These days a team member ran into problems with LoadRunner scripts when running them in the Controller, although the scripts ran smoothly in VuGen over many iterations...

He got a lot of errors from the LR Controller with 10+ concurrent vusers.
First, always debug starting from the first error:
I want to make sure which error is the very first one, then analyze it to see which page may have the problem.

Second, can we reproduce this error in the LR Controller with only a single user? (Precondition: the scripts run smoothly in VuGen.)

If yes, enable logging in the run-time settings: I usually enable "always send messages"-->"extended log"-->"parameter substitution", and I also add the code below:
lr_set_debug_message(LR_MSG_CLASS_EXTENDED_LOG | LR_MSG_CLASS_RESULT_DATA, LR_SWITCH_ON);

/* ... the failing transaction goes here; locate it from the very first error message ... */

lr_set_debug_message(LR_MSG_CLASS_EXTENDED_LOG | LR_MSG_CLASS_RESULT_DATA, LR_SWITCH_OFF);

Uncheck "continue on error" in the run-time settings, then run the test in the Controller again.

After the run, the error shows up at the same place. Open the log file in the result location that you set in the report menu.

Read the log as you would in VuGen and you can find which unexpected page (response data) you got. Is it related to some parameter setting, or is it really a server problem? (You can verify manually whether it is a server error.)

If no (it only happens under concurrent load), the most likely cause is the parameters you set. I would enable "always send messages"-->"extended log"-->"parameter substitution" and uncheck "continue on error", then check in the log file which parameter values were passed into the scripts during the run.

Alternatively, you can write some code to print to the Controller console as an error message... Either way, use whatever means make sense to see exactly what happened when there is an error message. Do not let any small error slip by in the script tuning phase... we are making a "product" without bugs :)!

Last thing: content-checking code is very, very important, yet I find some people do not like to write it.
It saves us from fake performance testing: it is really harmful and misleading if our scripts do not work correctly but never trigger a failed transaction!

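Lock in happiness, lock in joy -- Lock&Lock

This year's annual party added a gift-exchange round, and I prepared a Lock&Lock gift set. I have to admit I am a very "practical" person, or rather someone who is not into romance.... What I think about more is how to solve other people's food-and-clothing problems :)

Now for a short TV-shopping commercial:
With Lock&Lock, bringing lunch to work is convenient (there are lunch boxes);
With Lock&Lock, making tea and pouring water at work is easy (there is a tea cup);
With Lock&Lock, storing coins and odds and ends is convenient (there is a small storage box);
Hehe, very practical, but without any beauty; it gets things done, but perhaps nobody will care;

I wish that everyone can cherish happiness and joy in the new year, because they are hard to come by yet slip away quietly. If one day we really could lock up happiness and joy, and when we hit difficulties open the box and use just a little to cheer ourselves up, how good that would be!

BTW, three years now and I have never won a prize in the draw. Has my luck really been hidden inside the Lock&Lock? :)
Happy Spring Festival!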

Sunday, January 11, 2009

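Train tickets: truly "one ticket is hard to get"!

I stayed up all night without sleeping and left home at 4:40. To warm up I jogged all the way to the ticket office and arrived just after 5....
I did not think I was late, but more than 30 people were already lined up ahead of me. The first guy had been queuing since 10 pm the night before --- stool, blanket, towel, kettle, hat, army coat..... What can you say, the man does this for a living; one look at his gear tells you he is an old hand!! Don't think too much, just queue. Ticket sales start at 6:30, so first I have to stand for an hour and a half.
I chatted with the guys in front of and behind me to kill time, and it really did pass quickly. Before long the ticket office door opened. A few policemen going through the motions stood by watching, as if keeping order, while from the front came the occasional shout: "Hey, stop cutting in line, where did you come from?"

There were two ticket windows and people formed two lines to go in one by one. At first, whenever the door opened everyone pushed in at once (resource contention). I said to the policeman next to me, "With two lines, why not let in one person from each side at a time? Then nobody would complain." The policeman "suddenly saw the light" and went straight off to keep order, and things did become much more orderly :)

Then the tragedy began. The first few people could still buy tickets; some held them like a hand of playing cards, counting them one by one, a sight both enviable and annoying. Some people could not help going up for a look and came back smiling with relief: "Luckily they are not for my destination, they are all for Sichuan".... Still, most people managed to buy 3-4 tickets, mostly standing-room. So where were the tickets with seats??
After more than an hour of torment, it was finally my turn. I felt uneasy, especially after hearing the "no tickets" answers given to the people ahead of me; I felt that three hours of effort might really not be enough. I had barely opened my mouth to ask the ticket lady, "Excuse me, are there any tickets to Yanzhou?" when she said flatly, "No!" Very decisive, as if there were no room for discussion. No, I could not give up this quickly: "Could you check for me? Maybe you misread it." She seemed to tap the keyboard twice, I could not see the screen, and then: "No!" An icy answer...
--"What about Taishan? T178 or any other train will do"
--"No!"
--"What about Jinan?"
--"No!"
--"Then the D32 bullet train to Yanzhou?" That train is the most expensive, but I had little hope of getting it either....
--"Sold out." Finally an answer that was not "No"; I was satisfied
--"What about Xuzhou?" I was making my last struggle
--"None either!" My heart sank
It was past 8 and already bright outside. I walked out of the ticket office. Another guy had bought a ticket to the Northeast; it had no seat and was not a direct train, but he was clearly very satisfied
--"These days you are lucky to get a ticket at all :) Don't worry, brother, try the scalpers." He was comforting me, and I thank him
--"I will try to think of something." When I said this I felt only helplessness. Still, I manage to get home every year, and I hope I can get home smoothly this year too!
A ticket is hard to get, but what I ask for is happiness and safety for my family and friends! That is my greatest wish, even if I have to skip this ritual!!
While looking into "online ticket transfers" I met a kind person who is selling at only 20 yuan over face value (hard-earned), and we are about to close the deal!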

Saturday, January 10, 2009

"testing is overrated" by Luke Francl, so funny and meaningful

It reveals some facts about the software development cycle, mostly emphasizing that
developers should test better and more efficiently to prevent bugs at an early phase...

However, testing today (by both dev and QA) is not good enough ("good enough is not good" :) )

The video and slides tell the truth in a very funny way and impressed me a lot. I watched it 3 times (of course, one reason is that it is just 20 mins, lol)

I recommend this video because it is short :)
http://www.infoq.com/presentations/francl-testing-overrated

If you want to download Luke Francl's PPT, go to this link:
http://railspikes.com/2008/12/2/testing-is-overrated-great-talk

Thursday, January 08, 2009

Gzip matters!

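I always thought Gzip was a way to get twice the performance-tuning result for half the effort, but getting the Gzip trade-off right matters a lot. Today, with Gzip enabled, a heavy load test (with download of all non-HTML resources enabled) drove the Apache server's CPU% very high, to the point that it hung.

Potential solutions I can think of:

1. Simply do not use Gzip, but watch out for network latency... :)

2. Give the web server more hardware

3. Gzip only some of the files; experiments are needed to determine which files give the best gain

4. Use a Gunzip approach; still under further testing

5. Lower the Gzip level. Gzip compression levels generally run from 1 (lowest) to 9 (highest) and are configurable; testing is needed to find the best setting

6. Watch for compatibility problems between browsers and Gzip; Gzip can be selectively disabled for certain browsers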

how to detect memory leak issue in JVM

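A comment I wrote yesterday on performanceengineer.com, quoted here for myself. The blogger had not updated the site for a long time, and when I looked yesterday there was suddenly quite a lot of new writing :)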

12 May 2008 - 11:22pm — joychester
why not use jconsole6 first

Here is my practice based on my daily testing work; it is not that hard if you do it systematically:
1. jconsole is a very good tool for first detecting a Java memory leak.
2. Narrow down the test cases or scenarios to find which part may cause the problem.
3. I also love jmap for taking a heap dump snapshot for deeper analysis once it is clear that we have a memory issue.
4. Use a tool to read the binary heap dump file if necessary.

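Well done, people of China!

I wrote this meaningful blog post on May 22, 2008, and am moving it here:

When I saw this chart posted on Google 黑板报, I was stunned; it really is beautiful. The Chinese people are united as one, and China is great! See Google's performance analysis report for the 3 minutes after 14:28 on May 19, the national day of mourning: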

Loadrunner download PDF file into local

Following Motevich's blog, today I solved one issue: using LoadRunner to download a PDF file to local disk, or just read it as a stream:

*fp=fopen("c:\\temp\\my_file.pdf","wb");

web_set_max_html_param_len("1000");
web_reg_save_param("PDFlink",
"LB=reporting.cgi/",
"RB=\"",
"Ord=1",
"RelFrameId=1",
"Search=Body",
"IgnoreRedirections=Yes",
LAST);

web_submit_form("reporting.cgi_13",
"Action=/cgi-bin/reporting.cgi",
//"Snapshot=t13.inf",
ITEMDATA,
LAST);

web_set_max_html_param_len("300000");

web_reg_save_param("PDF_content",
"LB=",
"RB=",
"Search=Body",
LAST);

web_url("PDFparser",
"URL={servername}/cgi-bin/reporting.cgi/{PDFlink}",
LAST);

lr_eval_string_ext("{PDF_content}",strlen("{PDF_content}"),&data,&prmLen,0,0,-1);

*fwrite(data,prmLen,1,fp);

*fclose(fp);

lr_output_message("123456: %s", lr_eval_string("{PDF_content}"));

// release the buffer allocated by lr_eval_string_ext
lr_eval_string_ext_free(&data);

Lines prefixed with an asterisk can be omitted if you do not want to download the file to local disk.

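In C, I have decided to stop using static arrays

With static arrays, records left in memory are hard to clear, which caused problems when I later operated on the same array. I chose to allocate the array space dynamically and free it afterwards; each use still works like an array but is much more flexible. Consider it a C refresher: after three years, I am finally writing C again:

char *string1;
char *string2;

string1 = (char *)malloc(140 * sizeof(char));
string2 = (char *)malloc(140 * sizeof(char));

...
// clear the whole buffer, not just the first byte
memset(string1, 0, 140 * sizeof(char));
memset(string2, 0, 140 * sizeof(char));
// or, to terminate the string at position j:
// string1[j] = '\0';
// string2[j] = '\0';

...

free(string1);
free(string2);

How I did it before:

char string1[140];
char string2[140];

Note:
If you see garbage characters printed at the start of the array, initialize it first:
memset(string1, 0, 140 * sizeof(char));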
performance thoughts from me

Encountered environment issues and solutions:
1. Issues:
1. Application code change or upgrade
2. LoadRunner crashes when collecting results from load generators
Solutions:
1> Rewrite LR scripts + maintain existing scripts and functions
Notes: if the original scripts cannot run smoothly against the latest application, evaluate how big the change is by re-recording the scripts and comparing them with the originals. If there are many differences, rewriting the scripts is a better solution than modifying the original code. Some common functions and methods can still be reused in the new scripts.

2> I have no good solution for this so far. My suggestion is to keep the log files small, and meanwhile we have to look for a solution together (we should find a tool that can open the .eve files on the load generators). We will also have a local LoadRunner installation, which should make the situation much better.

2. Encountered scripting issues and solutions:
Issues:
1. Writing scripts for an unfamiliar or third-party application
2. Dealing with strings that contain special characters
3. Error handling in LR scripts
Solutions:
1> Reduce unnecessary parameters as much as possible. Read the SDK document yourself and try to understand how the application passes parameters in the URL. Use the Ajax (Click & Script) protocol and the Web (HTTP/HTML) protocol together to keep the scripts as simple as you can.
2> Write C code to deal with this, using the standard C functions such as strcpy(), strlen(), etc. Learn C coding skills, especially array and pointer usage. Make full use of the help documentation!
3> Use content checks as much as you can, and when a script hits an error, print useful information to the console or log file so that it helps you identify the problem.

If you want to see some samples, I have source code to share with you. If you have best practices of your own, I would like to listen, so we can learn from each other :) Thanks!

Loadrunner scripts writing tips

Posted by joychester on 2008-7-11 13:36:00

This is a summary from my everyday study and practice, shared in the hope that it offers some hints. Many details are not written down; you will have to experience and try them for yourself:

1. Content check and error handling: use web_reg_find() everywhere, and write error-handling code so that LR handles requests properly, for example lr_exit() together with lr_error_message() to write error messages to the Controller console, so that we can track errors in time.
2. Dealing with special characters in C: define a dynamic array and use pointers to handle it, and remember to free the memory at the end.
3. Write a DLL to make scripts shorter: use an open-source tool to build a DLL file containing common functions, then call those functions dynamically to keep the scripts short. Currently I use Code::Blocks and the MinGW compiler; I do not use VC++, because it is not free :)
4. Look at detailed log messages on demand: with the lr_set_debug_message() function you can switch them on and off as needed, which makes debugging more efficient.

5. Use Ajax (Click & Script) instead of Web (Click & Script) and Web (HTTP/HTML); you can combine the Ajax and HTTP methods when writing the scripts.

6. Learn how to use regular expressions, a very powerful and helpful "tool"!

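My comment on someone else's blog --- a discussion about performance test environments

Based on some of my testing experience, let me add a few points:
This topic also came up in earlier discussions with colleagues, though my title was "How to Keep Performance test simple, and Why?"
Testing in an environment that simulates production is desirable but not mandatory; it is best to run one full test near the end of the project, together with stress testing and long-duration stability testing.
Performance testing in a relatively simple, even crude, environment finds basic mistakes faster and more easily.
Of course, as with any kind of testing, reasonable requirements should be set for such a minimal-resource environment; for example, do not blindly pile on load during performance testing and cause unnecessary misunderstanding and misleading results.
In day-to-day testing it is also important to establish an accurate baseline, so that routine tests can be compared against it to judge whether the current tuning or newly added code brings a benefit.
Continuous integration + a minimal-resource performance test environment + all kinds of monitoring tools (front end to back end) can serve as the 3 hallmarks of agile performance testing :)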
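
The baseline idea can be sketched in a few lines of C. This is only an illustration: the function names and the tolerance value are made up for this example, and a real comparison would probably look at more than a single number.

```c
#include <assert.h>

/* Percent change of the current measurement against the baseline.
 * Positive means slower than the baseline. */
double pct_change(double baseline_ms, double current_ms)
{
    assert(baseline_ms > 0.0);
    return (current_ms - baseline_ms) / baseline_ms * 100.0;
}

/* Returns 1 if the current run is slower than the baseline by more
 * than tolerance_pct percent, otherwise 0. */
int is_regression(double baseline_ms, double current_ms, double tolerance_pct)
{
    return pct_change(baseline_ms, current_ms) > tolerance_pct;
}
```

For example, with a 200 ms baseline and a 10% tolerance, a 250 ms result would be flagged and a 205 ms result would not.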

recap on G**gle performance testing's key point

Really like Bjedov's presentation on performance testing, so here is a recap of her ideas one more time :)

1. Mentally: tell yourself performance testing is not that hard!

2. The concept of performance testing: different people have different ideas. Pay close attention to the benchmark, which includes a set of environments, a suite of test scenarios and a group of profiling points. All of these can be adjusted according to the testing context/situation.

3. Google focuses more on the service tier/back end than on the web tier/front end; they think front-end performance issues can be found through functional testing. Yahoo is different: they focus more on front-end tuning and have developed many rules to follow.

4. You cannot cover everything the way functional testing does; follow the 80/20 rule.

5. I'll run this benchmark with different loads against a loosely controlled system (it would be nice to have 100 machines all to myself for every service we have, which I can use once a day or once a week, but that would be expensive and unrealistic) and investigate its behavior. Which transactions are taking the most time? Which transactions seem to get progressively worse with increasing load? Which transactions seem unstable (I cannot explain their behavior)?

6. If you have questions, ask dev, but do not believe them too easily :) All you need to do is run the benchmark again to verify.

7. Create all kinds of diagrams and results, discuss them with dev, take their suggestions, and refine the benchmark into the final one. Keep tracking the performance trend daily. Avoid big surprises.

8. Do profiling in time, and do isolation testing. It helps you find the bottleneck more deeply and accurately.

9. Monitor your services during performance tests; monitor CPU, memory and I/O usage at a minimum.

performance testing improving,too much to say

My friend and I discussed how to improve our performance testing in the future. In 30 minutes we covered the topics below, though not everything:

Version control,maintenance
- script
- result/raw data
- log

Env (server esp xeon, build control) common service perf test env and schedule

metrics (complete set used in tuning, typical set used in daily tracking)
- LR/JMeter: DB size, concurrent user, think time, env, 90% line... load resource
- other: log, os...

LR or JMeter
- understand difference and which one to choose

Process/scheduling - daily/formal

knowledge (tuning, isolation...)
- accumulate / display in organized way
- sharing

planning (case design)

script writing
- error handling
- common practice/FAQ

simulate real load
- prod monitor
- reflect to local env
- PPM/BAC

profiling tool

app profile log and log analyzer

test on load balanced env

stress test (analyze the throughput, trend and Knee)

single step test (resource, SQL happened, YSLOW...)

analyze & tuning
- Deployment diagram
- SQL (deadlock, lock)
- app code
- DB
- JBoss/JVM
- Apache (esp. the ability to analyze logs and find the bottleneck)
- Load balance/cluster
- Hardware load balancer
- Front end
- CPU
- memory
- App thread
- Disk IO
- Network

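Throughput and response time

In performance testing, throughput and response time are two important indicators of how good performance is, and they are easily confused.
Throughput: literally, the capacity to process work per unit of time. It is usually expressed in hits/sec or MB/sec. It is measured against system resources, so how well the system performs directly determines the (theoretical) upper limit of throughput.

Response time: here, the time taken by the whole process from sending a request to completing the response. It is usually expressed in s or ms. It is measured per request, so the size and complexity of the request directly affect how long the response takes.

A concept worth mentioning here is "queueing theory" (http://en.wikipedia.org/wiki/Queuing_theory). It comes up everywhere in computer systems, and understanding it helps a lot in understanding how throughput and response time relate and differ!
Usually, the shorter the average response time, the higher the system throughput; the longer the average response time, the lower the throughput;
However, higher system throughput does not necessarily mean a shorter average response time, because in some situations (for example, with no hardware added) throughput is increased at the expense of average response time, in order to process more requests over a period.
An example: a barbershop originally has just one barber who, being poor, bought only one barber chair plus a bench for waiting customers. The barber can serve only one customer at a time; the customers waiting grow impatient, and people outside who meant to come in give up on this shop...
One day the barber has some money and buys 2 more barber chairs, so he can work on 3 customers at once: when one customer reaches a stage that needs time to set or hold, he moves on to cut another customer's hair, and so on. He finds he can serve more customers per day than before, but some late arrivals still complain that the wait is too long.
Later the barber decides to hire 2 apprentices to work with him. He finds that daily output rises by nearly 2 times, and customers' waiting time drops noticeably. But costs go up too: tools, shampoo, wages. This makes him realise that even running a barbershop takes careful budgeting :)

A consultant's explanation, very vivid and concrete: http://www.forsythesunsolutions.com/node/114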
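
One way to put numbers on this is the queueing-theory relationship usually called Little's Law, N = X * R (requests in the system = throughput * average response time). The post above does not name it, so treat this C sketch as my own illustration; the function names and the numbers are made up.

```c
#include <assert.h>

/* Little's Law rearranged for throughput: X = N / R.
 * concurrent: average number of requests in the system (N)
 * response_sec: average response time in seconds (R) */
double littles_throughput(double concurrent, double response_sec)
{
    assert(response_sec > 0.0);
    return concurrent / response_sec;
}

/* Little's Law rearranged for response time: R = N / X.
 * throughput: completed requests per second (X) */
double littles_response_time(double concurrent, double throughput)
{
    assert(throughput > 0.0);
    return concurrent / throughput;
}
```

In barbershop terms (working in hours instead of seconds): 3 customers being served at once with an average of 0.5 hours each gives 3 / 0.5 = 6 customers per hour. If the shop still finishes 6 customers per hour but 9 people are inside counting the bench, each one spends 9 / 6 = 1.5 hours there, which is the case of throughput holding up while response time suffers.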

Detecting Memory leak issue step by step

As a performance testing engineer, I want to make issues reproducible and easy to trace to a cause at the application level (not the code level).
Take a memory leak as an example:
1. Run the formal performance test as usual; it contains many important actions that simulate real users as closely as possible. I like to use JConsole to monitor the JVM status, so that I can spot potential problems as early as possible during testing.

2. Suppose I find that my application has a "memory leak" and want to know the root cause. (For how to confirm that your application has a memory leak, please refer to other articles :) )

3. I run the test script by script, where each script contains a certain part of the test steps. Note that for this kind of testing I add more load and remove all sleep time, to accelerate heap growth and save time.
This way you can tell which scripts in your test scenario have problems. Pick out the "sick ones" and you have narrowed down your view.

4. Each "sick" test script should have several steps, like S1, S2, S3, S4.
Take a minimal set of steps, for example S1 and S4 first as tuning script 1, and see whether the problem still exists. If not, adjust the test steps, for example S1, S2, S4.
Go through every "sick" script this way and find the step(s) that are the root cause, then work with your developer friends to see what is actually happening in the code....

Four steps. They may cost a lot of time (depending on how complex your application is), but they are free, and they exercise your isolation skills :)

Of course you can use some cool profiling tool to do this, but I love this exploration journey.
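
When comparing scripts in step 3, it can help to turn "the heap keeps growing" into a number. Here is a rough C sketch that fits a straight line through heap readings (for example, used heap after GC, noted by hand from JConsole) and returns the growth rate. The function name is made up, and what counts as a suspicious slope is left to your own judgment.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Least-squares slope of heap usage over time.
 * minutes[k] is when reading k was taken, heap_mb[k] is the used heap.
 * Returns MB per minute; a value that stays clearly positive across
 * long runs marks the script as a "sick one" worth isolating.
 */
double heap_growth_slope(const double *minutes, const double *heap_mb, size_t n)
{
    double mean_t = 0.0, mean_h = 0.0, sxy = 0.0, sxx = 0.0;
    size_t k;

    assert(minutes != NULL && heap_mb != NULL && n >= 2);
    for (k = 0; k < n; k++) {
        mean_t += minutes[k];
        mean_h += heap_mb[k];
    }
    mean_t /= (double)n;
    mean_h /= (double)n;
    for (k = 0; k < n; k++) {
        sxy += (minutes[k] - mean_t) * (heap_mb[k] - mean_h);
        sxx += (minutes[k] - mean_t) * (minutes[k] - mean_t);
    }
    assert(sxx > 0.0);   /* readings must not all share one timestamp */
    return sxy / sxx;
}
```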

Cognos8 performance testing-Loadrunner scripts

Note: This post is kind of out-of-date and some patterns are not good, so please read《Scripting a Basic IBM Cognos 8 Report Execution using LoadRunner》post from official http://www.ibm.com/developerworks/data/library/cognos/page406.html

I want to share my experience of writing LoadRunner scripts for Cognos BI. Here I only describe how to run reports in Cognos Viewer.
General steps:
1. Record the initial scripts using Ajax (Click & Script), not Web (Click & Script).

2. Record initial scripts with Web (HTTP/HTML) as well, because some report-generation requests cannot be processed properly using Ajax (Click & Script) alone.

3. Copy the relevant parts of the Web (HTTP/HTML) scripts into the Ajax (Click & Script) scripts.

4. Parameterize and correlate both scripts.

5. Add content checks and transactions.

6. Some C coding is needed to handle more dynamic situations, for example string processing, file read/write, or control logic such as loops and if-then-else.

7. Test with more accounts and tune the scripts.

Code samples (just an idea, not real code you can use directly :) ):

// minimum set of correlation parameters
web_reg_save_param("m_tracking_para7_2",
"LB=\"tracking\": \"",
"RB=\",",
"Notfound=warning",
LAST);

web_reg_save_param("ui_conversation_para7_2",
"LB=\"conversation\": \"",
"RB=\",",
"Notfound=warning",
LAST);

web_reg_save_param("caf_para7_2",
"LB=\"caf\": \"",
"RB=\",",
"Notfound=warning",
LAST);

web_reg_save_param("action_state_para7_2",
"LB=\"action_state\": \"",
"RB=\",",
"Notfound=warning",
LAST);
//reminder: set the Notfound=warning, instead of default value Notfound=error, we do not //want to see any errors in the console when we might not use these parameters

//transaction get started
lr_start_transaction("XLS_view");

// get the XLS format path from html source, i just found there are two patterns in response source
//or you can use the "status":"complete/working", as the indicator as well, this can make your scripts shorter
web_reg_find("Text=sURL = '/cgi-bin/reporting.cgi/gd/",
"SaveCount=XLS_path_Count1_7",
LAST);

web_reg_find("Text=sURL = '/cgi-bin/reporting.cgi/gd/",
"SaveCount=XLS_path_Count2_7",
LAST);

// send the request to the cognos server to generate XLS format reports
web_submit_data("reporting.cgi_17",
"Action={servername1}/cgi-bin/reporting.cgi",
"Method=POST",
"RecContentType=text/html",
"Referer={servername1}/cgi-bin/reporting.cgi",
"Snapshot=t17.inf",
"Mode=HTML",
ITEMDATA,
"Name=b_action", "Value=cognosViewer", ENDITEM,
"Name=ui.gateway", "Value=/cgi-bin/reporting.cgi", ENDITEM,
"Name=ui.action", "Value=run", ENDITEM,
"Name=ui.object", "Value=/content/folder[@name='{custom_folder_name1_new}'] /folder[@name='Performance Test Reports']/query[@name='AD_HOC_Report']", ENDITEM,
"Name=run.outputFormat", "Value=XLWA", ENDITEM,
"Name=run.outputLocale", "Value=en", ENDITEM,
"Name=run.prompt", "Value=true", ENDITEM,
"Name=ui.name", "Value=AD_HOC_Report", ENDITEM,
"Name=ui.runOptions", "Value=true", ENDITEM,
LAST);

//get the int number of XLS path occurrence in the HTML source
i =atoi(lr_eval_string("{XLS_path_Count1_7}"));

j =atoi(lr_eval_string("{XLS_path_Count2_7}"));

// if the XLS path occurs at least once, end the transaction

if (i!=0||j!=0) { // XLS path found without the waiting dialog popping up

lr_end_transaction("XLS_view",LR_AUTO); // must match the name used in lr_start_transaction
}

// else if number of XLS path occurrence does not happen, then use "wait" action
if (i==0&&j==0) {

//wait until the XLS path come back
while(i==0&&j==0){
web_reg_find("Text=sURL = '/cgi-bin/reporting.cgi/gd/",
"SaveCount=XLS_path_Count3_7",
LAST);

web_reg_find("Text=sURL = '/cgi-bin/reporting.cgi/gd/",
"SaveCount=XLS_path_Count4_7",
LAST);
//wait action
web_submit_data("reporting.cgi_19",
"Action={servername1}/cgi-bin/reporting.cgi",
"Method=POST",
"RecContentType=text/html",
"Snapshot=t19.inf",
"Mode=HTML",
ITEMDATA,
"Name=b_action", "Value=cognosViewer",ENDITEM,
"Name=cv.id", "Value=_NS_",ENDITEM,
"Name=ui.action", "Value=wait",ENDITEM,
"Name=cv.actionState","Value={action_state_para7_2}",ENDITEM,
"Name=ui.primaryAction", "Value=runSpecification",ENDITEM,
"Name=run.outputFormat", "Value=XLWA",ENDITEM,
"Name=ui.conversation", "Value={ui_conversation_para7_2}",ENDITEM,
"Name=m_tracking", "Value={m_tracking_para7_2}",ENDITEM,
"Name=ui.cafcontextid", "Value={caf_para7_2}",ENDITEM,
"Name=cv.responseFormat", "Value=data",ENDITEM,
LAST);

i=atoi(lr_eval_string("{XLS_path_Count3_7}"));

j=atoi(lr_eval_string("{XLS_path_Count4_7}"));

k++;

if (k>5) {

lr_error_message("XLS view failed by: %s",lr_eval_string("{username1}"));
lr_exit(LR_EXIT_ACTION_AND_CONTINUE ,LR_FAIL);
break;
}

}

//get the XLS path in wait action response data
web_reg_save_param("XLS_src_path7",
"LB=sURL = '/cgi-bin/reporting.cgi/gd/",
"RB='",
"Notfound=warning",
LAST);

web_reg_save_param("XLS_src_path7_2",
"LB=sURL = '/cgi-bin/reporting.cgi/gd/",
"RB='",
"Notfound=warning",
LAST);

web_submit_data("reporting.cgi_29",
"Action={servername1}/cgi-bin/reporting.cgi",
"Method=POST",
"RecContentType=text/html",
"Snapshot=t29.inf",
"Mode=HTML",
ITEMDATA,
"Name=b_action", "Value=cognosViewer",ENDITEM,
"Name=cv.id", "Value=_NS_",ENDITEM,
"Name=ui.action", "Value=wait",ENDITEM,
"Name=cv.actionState","Value={action_state_para7_2}",ENDITEM,
"Name=ui.primaryAction", "Value=runSpecification",ENDITEM,
"Name=run.outputFormat", "Value=XLWA",ENDITEM,
"Name=ui.conversation", "Value={ui_conversation_para7_2}",ENDITEM,
"Name=m_tracking", "Value={m_tracking_para7_2}",ENDITEM,
"Name=ui.cafcontextid", "Value={caf_para7_2}",ENDITEM,
"Name=cv.responseFormat", "Value=data",ENDITEM,
LAST);

//choose one XLS path value, and get the XLS report from cognos server
if (strlen(lr_eval_string("{XLS_src_path7}"))) {

web_url("XLSviewer",
"URL={servername1}/cgi-bin/reporting.cgi/gd/{XLS_src_path7}",
LAST);
}

else if (strlen(lr_eval_string("{XLS_src_path7_2}"))) {

web_url("XLSviewer",
"URL={servername1}/cgi-bin/reporting.cgi/gd/{XLS_src_path7_2}",
LAST);
}
//transaction get done
lr_end_transaction("XLS_view",LR_AUTO);

}

These are my test scripts, not the formal ones, but they show that there are two cases for report generation: in one the report runs without any extra wait, in the other you have to wait some time for the report to finish, so the two differ from a scripting perspective. If you have a better solution, please leave me a message.

The limitation of my solution is that it cannot measure the report download time, which means it does not include client-side processing time, so it is not the real user experience. You can write some C code using fopen, lr_eval_string_ext and fwrite to simulate the client-side download, but it makes the scripts slower and slower (a memory issue in LR??), so I leave it out as a trade-off; finding server problems is sufficient for me right now :)

If you run into any problems or have questions, feel free to share and discuss them on this blog.
---Cheng Chi

A workaround for LoadRunner script errors caused by mismatched machine locales

First, a recommendation:
- Recording of an application in a specific language (e.g., French, Japanese) must be performed in a machine whose default locale (in Settings > Control Panel > Regional Options) is the same language
- Load generator machines must have exactly the same default locale as the recording machine

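If the locale of the script development environment really does differ from that of the actual runtime environment, then when you run the script in VuGen you will first see this warning: Script code generation code page (XXX) does not match the current locale code page (xxxx)
The consequence is that some elements or objects that could previously be located may no longer match, and the script fails. Here is a temporary workaround:
- Create a new script in the actual runtime environment
- Copy the original script into the corresponding actions of the new script
- Debug the script and see whether it passes (at this point the earlier warning "Script code generation code page does not match the current locale code page" no longer appears; the script runs under the locale of the actual runtime environment)
- If it does not pass, look for the cause. Some characters (special characters) may need converting after the copy. The best approach is to re-record the same flow for the failing action (with the same protocol); comparing the two will usually reveal the problem.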

front-end tuning Vs back-end tweaking

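Steve Souders said:"So if what you are really trying to do is deal with scalability issues, the place to focus is the back-end. If you’re having a huge spike in traffic or a large increase in the amount of data or back-end calculations that you need to do, then the place of focus is on the back-end. But if your objective is making user experience faster, the place of focus is the front-end."

What I take from this is that if the software architecture is done well and developers do their job properly, the focus of performance tuning falls on the front end, that is, on everything that happens after the HTML is generated.

But the problems we usually run into require first eliminating serious and obvious back-end defects: deadlock, memory leak, high CPU% utilization, GC efficiency, caching mechanism, indexing, isolation level setting, and the rules for creating and deleting temporary files. These all go back to the initial design and affect performance and scalability! We should plan to consider different approaches at different stages.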

using AWK to do log file analysis-- a sample code

here is an awk source file:
profiling.awk

BEGIN {
    FS = "\n"      # each line of a record is one field
    RS = "\n\n"    # records are separated by a blank line (multi-character RS needs gawk)
    i = 0
    time_total = 0
    elapsed_time_total = 0
}
{
    if ($0 ~ /com\.struts\.BaseActionServlet\.doGet\[[0-9]+\]\(/)
    {
        for (j = 1; j <= NF; j++) {
            if ($j ~ /^Elapsed Time:/) {
                i++
                sub(/Elapsed Time: /, "", $j)
                sub(/ milliseconds/, "", $j)
                # print $j
                time_total += $j
            }

            if ($j ~ /^Elapsed CPU Time:/) {
                sub(/Elapsed CPU Time: /, "", $j)
                sub(/ milliseconds/, "", $j)
                # print $j
                elapsed_time_total += $j
            }
        }
    }
}
END {
    print "count: ", i
    print "Elapsed Time total:", time_total, "ms"
    print "Elapsed CPU Time total:", elapsed_time_total, "ms"
    if (i > 0) {   # avoid dividing by zero when nothing matched
        print "Elapsed Time average:", time_total/i, "ms"
        print "Elapsed CPU Time average:", elapsed_time_total/i, "ms"
    }
}

Then start GNU Awk with this command line:
$ gawk -f profiling.awk profiling.log > output.txt

This uses the profiling.awk source file to process profiling.log, and produces output like this:

count: 6508
Elapsed Time total: 4.83727e+06 ms
Elapsed Time average: 743.281 ms
Elapsed CPU Time total: 366172 ms
Elapsed CPU Time average: 56.2649 ms

The performance of awk itself is preeeeeeetty good! :) So I love using it to analyze raw performance-testing data or profiling log files.

For your Reference:

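Variable Description
NF Number of fields in the current record.
NR Number of records read so far.
FS Field separator.
RS Record separator.
OFS Output field separator.
ORS Output record separator.
FILENAME Name of the input file being read.
IGNORECASE When IGNORECASE is set to a non-empty value, GAWK ignores case in pattern matching.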

LoadRunner -- explaining Elapsed Time for a Vuser

Elapsed Time. Displays the amount of time that has elapsed in the scenario since the Vuser began running.
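That is the explanation from the help file. LR mentions Elapsed Time in several places, but the meaning may differ from place to place. The one I want to explain here is a micro-level concept that applies to each individual Vuser, so every Vuser has its own Elapsed Time, depending on how the ramp-up time is distributed.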
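
A tiny C sketch of that idea, assuming a simple ramp-up where a fixed number of Vusers start at every interval. The function names and the ramp-up model are my own assumptions for illustration, not something taken from the LR documentation.

```c
#include <assert.h>

/* Seconds after scenario start at which the Vuser with the given
 * 0-based index begins running, when users_per_step Vusers are
 * started every step_interval_sec seconds. */
long vuser_start_offset(long vuser_index, long users_per_step, long step_interval_sec)
{
    assert(vuser_index >= 0 && users_per_step > 0 && step_interval_sec >= 0);
    return (vuser_index / users_per_step) * step_interval_sec;
}

/* Elapsed Time of one Vuser: time since that Vuser began running.
 * Returns 0 if the Vuser has not started yet. */
long vuser_elapsed(long scenario_elapsed_sec, long vuser_index,
                   long users_per_step, long step_interval_sec)
{
    long start = vuser_start_offset(vuser_index, users_per_step, step_interval_sec);
    return scenario_elapsed_sec > start ? scenario_elapsed_sec - start : 0;
}
```

So 600 seconds into a scenario that starts 2 Vusers every 30 seconds, the first Vuser shows 600 seconds while the sixth (index 5) shows 540.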

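[Repost] GC tuning in HotSpot JVM 5

Many people are talking about GC today. What exactly is GC, and how do you tune it? I found this write-up of a tuning process, which is more fun than the official articles, and am reposting it. Although it is already two or three years old, for newcomers it is a really good template, a template for thinking...

GC tuning in HotSpot JVM 5

Written by Halatu Hubisi

Introduction
Anyone with Java development experience has surely run into this: after the application has been running for a while, its performance or responsiveness drops noticeably. This has many causes, including how well the program itself is optimized and the runtime environment, which covers both hardware and software. The first fix most people think of is upgrading the hardware, overlooking that the program's own runtime environment, the JVM, also offers quite a few tuning options. This article focuses on using some JVM options to tune GC.


Conventions:
1. Readers should have some knowledge of Java.

2. The JVM options in this article are those of the HotSpot JVM 5 released by Sun (though most of the options are also available in JVM 1.3 and JVM 1.4).

3. demo/jfc/SwingSet2/SwingSet2.jar under JAVA_HOME is used as the example.

4. Reading this article requires some knowledge of GC, which can be found in Appendix A.

Keywords:
JVM (Java virtual machine), tuning, GC (garbage collection)

JVM GC tuning
So that JVM GC tuning can be applied in practice, several examples are used below to illustrate it.
Example 1: Heap size settings
The JVM heap setting is the memory space that the JVM may allocate and use while a Java program runs. At startup the JVM sets the heap size automatically: the initial size (-Xms) is 1/64 of physical memory and the maximum size (-Xmx) is 1/4 of physical memory. These can be set with the JVM options -Xmn, -Xms, -Xmx and so on. The heap size is the sum of the Young Generation and the Tenured Generation.
Run the following command in the demo/jfc/SwingSet2/ directory under JAVA_HOME.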
java -jar -Xmn4m -Xms16m -Xmx16m SwingSet2.jar
The system prints:
Exception in thread "Image Fetcher 0" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Image Fetcher 3" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Image Fetcher 1" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Image Fetcher 2" java.lang.OutOfMemoryError: Java heap space
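Besides these exceptions, the program also responds more slowly. This shows that the heap size is too small: GC takes up more of the time and the application gets less time to execute.
Tip: the JVM throws this error when 98% of the time is spent in GC and less than 2% of the heap is available.
Replace the command above with the following one and the application runs normally without throwing any exception: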
java -jar -Xmn4m -Xms16m -Xmx32m SwingSet2.jar
Tip: the heap size should not exceed 80% of available physical memory. In general, set -Xms and -Xmx to the same value and set -Xmn to 1/4 of -Xmx.

Example 2: Setting the Young Generation (-Xmn)
This example looks at what happens with different Young Generation settings.
Set the Young Generation to 4M by running java -jar -verbose:gc -Xmn4m -Xms32m -Xmx32m -XX:+PrintGCDetails SwingSet2.jar. The screen output is (excerpt):
[GC [DefNew: 3968K->64K(4032K), 0.0923407 secs] 3968K->2025K(32704K), 0.0931870 secs]
[GC [DefNew: 4021K->64K(4032K), 0.0356847 secs] 5983K->2347K(32704K), 0.0365441 secs]
[GC [DefNew: 3995K->39K(4032K), 0.0090603 secs] 6279K->2372K(32704K), 0.0093377 secs]
[GC [DefNew: 3992K->23K(4032K), 0.0057540 secs] 6325K->2356K(32704K), 0.0060290 secs]
[GC [DefNew: 3984K->27K(4032K), 0.0013058 secs] 6317K->2360K(32704K), 0.0015888 secs]
[GC [DefNew: 3981K->59K(4032K), 0.0023307 secs] 6315K->2422K(32704K), 0.0026091 secs]
Stop the program and set the Young Generation to 8M by running java -jar -verbose:gc -Xmn8m -Xms32m -Xmx32m -XX:+PrintGCDetails SwingSet2.jar. The screen output is (excerpt):
[GC [DefNew: 7808K->192K(8000K), 0.1016784 secs] 7808K->2357K(32576K), 0.1022834 secs]
[GC [DefNew: 8000K->70K(8000K), 0.0149659 secs] 10165K->2413K(32576K), 0.0152557 secs]
[GC [DefNew: 7853K->59K(8000K), 0.0069122 secs] 10196K->2403K(32576K), 0.0071843 secs]
[GC [DefNew: 7867K->171K(8000K), 0.0075745 secs] 10211K->2681K(32576K), 0.0078376 secs]
[GC [DefNew: 7970K->192K(8000K), 0.0201353 secs] 10480K->2923K(32576K), 0.0206867 secs]
[GC [DefNew: 7979K->30K(8000K), 0.1787079 secs] 10735K->4824K(32576K), 0.1790065 secs]
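Now compare the minor collections using the GC output (taking the first line of each run). The two minor collections reclaimed 3904K (3968K->64K) and 7616K (7808K->192K) from the Young Generation, while the JVM as a whole reclaimed 1943K (3968K->2025K) and 5451K (7808K->2357K). In the first case the minor collection collected about 50% (1943/3904) of the objects, and the other 50% were moved to the Tenured Generation. In the second case the minor collection collected about 72% (5451/7616) of the objects, and fewer than 30% were moved to the Tenured Generation. This shows that a Young Generation of 4m is too small for this application.
Tip: the Young Generation is generally 1/4 of the whole heap size, and its minor collection rate should generally be above 70%. In real applications, of course, adjust this to the situation.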
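The Example 2 arithmetic can be sketched in Python as below. It is only an illustration: it assumes the -XX:+PrintGCDetails minor collection line format shown in the excerpts above, and the function name minor_collection_rate is made up for this example.

```python
import re

# Matches lines such as:
# [GC [DefNew: 3968K->64K(4032K), 0.0923407 secs] 3968K->2025K(32704K), 0.0931870 secs]
MINOR_GC = re.compile(
    r"\[GC \[\w+: (\d+)K->(\d+)K\(\d+K\), [\d.]+ secs\] "
    r"(\d+)K->(\d+)K\(\d+K\), [\d.]+ secs\]"
)


def minor_collection_rate(line):
    """Share of the space freed in the young generation that left the heap entirely."""
    match = MINOR_GC.search(line)
    if match is None:
        raise ValueError("not a minor GC line in the assumed format")
    young_before, young_after, heap_before, heap_after = map(int, match.groups())
    freed_in_young = young_before - young_after   # collected or promoted
    freed_in_heap = heap_before - heap_after      # really collected
    return freed_in_heap / freed_in_young
```

The remainder, 1 minus this rate, is the share that was promoted to the Tenured Generation.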

Example 3: How the Young Generation affects application response
Compare -Xmn4m and -Xmn8m again. First run the following command:

java -jar -verbose:gc -Xmn4m -Xms32m -Xmx32m -XX:+PrintGCDetails -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime SwingSet2.jar
The screen output is (excerpt):
Application time: 0.5114944 seconds
[GC [DefNew: 3968K->64K(4032K), 0.0823952 secs] 3968K->2023K(32704K), 0.0827626 secs]
Total time for which application threads were stopped: 0.0839428 seconds
Application time: 0.9871271 seconds
[GC [DefNew: 4020K->64K(4032K), 0.0412448 secs] 5979K->2374K(32704K), 0.0415248 secs]
Total time for which application threads were stopped: 0.0464380 seconds
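The share of time taken by Young Generation minor collections can be calculated as: total time application threads were stopped / (total application run time + total time application threads were stopped). In this example garbage collection takes roughly 5% to 14% of the time. The larger the share of time spent in garbage collection, the slower the system responds.
Tip: for Internet applications, users can accept a slightly slower response, but for GUI applications a slow response makes for a very poor user experience.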
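Here is a small Python sketch of the Example 3 formula, stopped / (application + stopped). It assumes the "Application time" and "Total time for which application threads were stopped" lines alternate in the log as in the excerpt above, and the name gc_time_shares is made up for this example.

```python
import re

APP = re.compile(r"Application time: ([\d.]+) seconds")
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([\d.]+) seconds"
)


def gc_time_shares(log_text):
    """Return stopped / (running + stopped) for each run/stop pair in the log."""
    app_times = [float(v) for v in APP.findall(log_text)]
    stopped_times = [float(v) for v in STOPPED.findall(log_text)]
    # Pair each application interval with the pause that follows it.
    return [
        stopped / (running + stopped)
        for running, stopped in zip(app_times, stopped_times)
    ]
```

For the two pairs in the excerpt this gives about 0.14 and 0.045, which matches the roughly 5% to 14% quoted in the article.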

Example 4: How to decide the size of the Tenured Generation
Compare -Xmn8m -Xmx32m with -Xmn8m -Xmx64m. First run
java -verbose:gc -Xmn8m -Xmx32m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps <Java class>. The command line prints (major collections only):

111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.0000505 secs]111.042: [Tenured: 18154K->2311K(24576K), 0.1290354 secs] 26282K->2311K(32704K), 0.1293306 secs]
122.463: [GC 122.463: [DefNew: 8128K->8128K(8128K), 0.0000560 secs]122.463: [Tenured: 18630K->2366K(24576K), 0.1322560 secs] 26758K->2366K(32704K), 0.1325284 secs]
133.896: [GC 133.897: [DefNew: 8128K->8128K(8128K), 0.0000443 secs]133.897: [Tenured: 18240K->2573K(24576K), 0.1340199 secs] 26368K->2573K(32704K), 0.1343218 secs]
144.112: [GC 144.112: [DefNew: 8128K->8128K(8128K), 0.0000544 secs]144.112: [Tenured: 16564K->2304K(24576K), 0.1246831 secs] 24692K->2304K(32704K), 0.1249602 secs]
Then run java -verbose:gc -Xmn8m -Xmx64m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps <Java class>. The command line prints (major collections only):
90.597: [GC 90.597: [DefNew: 8128K->8128K(8128K), 0.0000542 secs]90.597: [Tenured: 49841K->5141K(57344K), 0.2129882 secs] 57969K->5141K(65472K), 0.2133274 secs]
120.899: [GC 120.899: [DefNew: 8128K->8128K(8128K), 0.0000550 secs]120.899: [Tenured: 50384K->2430K(57344K), 0.2216590 secs] 58512K->2430K(65472K), 0.2219384 secs]
153.968: [GC 153.968: [DefNew: 8128K->8128K(8128K), 0.0000511 secs]153.968: [Tenured: 51164K->2309K(57344K), 0.2193906 secs] 59292K->2309K(65472K), 0.2196372 secs]
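With a 32m heap the pause is about 0.13 seconds, and with 64m it grows to about 0.22 seconds. However, with 32m the interval between major collections is about 10 seconds, and when the heap grows to 64m it is about 30 seconds. So should the application run with 32m or 64m? For a web-type application (one that needs high throughput), 64m (a larger heap) is the better choice. Where real-time response matters more (a GUI application, for example), 32m is better.
Notes:
1. JVM 5 already optimizes the heap size at runtime, so if you are sure the Java application will not exceed the default heap size, it is best not to change these values.
2. Do not set the heap size options -Xms and -Xmn larger than physical memory. Otherwise you get "Error occurred during initialization of VM Could not reserve enough space for object heap".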
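The interval and pause comparison in Example 4 can be sketched in Python as below. It assumes the -XX:+PrintGCTimeStamps line format shown above, where each line starts with a timestamp in seconds and the last "secs]" value is the total pause. The name major_gc_summary is made up for this example.

```python
import re

PAUSE = re.compile(r"([\d.]+) secs\]")


def major_gc_summary(lines):
    """Return (mean interval between major GCs, mean total pause), in seconds."""
    stamps, pauses = [], []
    for line in lines:
        if "[Tenured:" not in line:
            continue  # only lines that include a Tenured (major) collection
        stamps.append(float(line.split(":", 1)[0]))       # leading timestamp
        pauses.append(float(PAUSE.findall(line)[-1]))     # last pause on the line
    intervals = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    mean_interval = sum(intervals) / len(intervals) if intervals else None
    mean_pause = sum(pauses) / len(pauses) if pauses else None
    return mean_interval, mean_pause
```

Running it once per heap setting gives the two numbers to weigh against each other: longer intervals favor throughput, shorter pauses favor responsiveness.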

Example 5: How to shorten minor collection time
The following compares minor collections with and without the -XX:+UseParNewGC option. First run
java -jar -server -verbose:gc -Xmn8m -Xms32m -Xmx32m SwingSet2.jar
The system prints the following (excerpt):
[GC 7807K->2641K(32576K), 0.0676654 secs]
[GC 10436K->3108K(32576K), 0.0245328 secs]
[GC 10913K->3176K(32576K), 0.0072865 secs]
[GC 10905K->4097K(32576K), 0.0223928 secs]
Then run java -jar -server -verbose:gc -XX:+UseParNewGC -Xmn8m -Xms32m -Xmx32m SwingSet2.jar
The system prints the following (excerpt):
[ParNew 7808K->2656K(32576K), 0.0447687 secs]
[ParNew 10441K->3143K(32576K), 0.0179422 secs]
[ParNew 10951K->3177K(32576K), 0.0031914 secs]
[ParNew 10985K->3867K(32576K), 0.0154991 secs]
Clearly, minor collections take less time with the -XX:+UseParNewGC option than without it.

Example 6: How to shorten major collection time
The following compares major collections with and without the -XX:+UseConcMarkSweepGC option. First run
java -jar -verbose:gc -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xmn64m -Xms256m -Xmx256m SwingSet2.jar
The system prints the following (excerpt):
[Full GC 22972K->18690K(262080K), 0.2326676 secs]
[Full GC 18690K->18690K(262080K), 0.1701866 secs]
Then run java -jar -verbose:gc -XX:+UseParNewGC -Xmn64m -Xms256m -Xmx256m SwingSet2.jar
The system prints the following (excerpt):
[Full GC 56048K->18869K(260224K), 0.3104852 secs]
Tip: this option fits best when the heap size is fairly large and major collections take a long time.

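Example 7: The -server option
Use this option when the JVM should treat the machine it runs on as server-class. Sun's HotSpot JVM 5 automatically treats the machine as server-class when the system configuration meets the following conditions, and then sets the JVM options automatically (when they have not been set by hand):
*. 2 or more physical processors
*. 2G or more of physical memory
HotSpot JVM 5 also provides automatic tuning: it adjusts itself according to how the JVM is running, so without special needs not much manual intervention is required. Sun gives this mechanism the vivid name "Ergonomics". For details see http://java.sun.com/docs/hotspot/gc5.0/ergo5.html
Tip: put this option before all other options. For example: java -server <other options> <Java class>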

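Appendix A: Background
.How the JVM classifies and manages objects

The JVM roughly divides the objects running in it into three kinds by lifetime, and stores each kind in a different memory area that the JVM obtains from the system. This way of managing object storage is called generational management.
1. Young Generation: holds "short-lived" (transient) objects, for example temporary objects or local variables used when creating objects or calling methods.
2. Tenured Generation: holds "resident" objects (objects that stay referenced for a long time), typically global objects in a large program or objects that are in use for a long time.
3. Perm Generation: holds "permanent" objects, which describe the classes and methods running in the JVM.

.Categories of JVM options

The JVM offers these kinds of options.
1. -X options, also called non-standard options. They differ between vendors; for example, some options of IBM's JVM may not work in Sun's JVM. They are used like this:
java -Xmn16m -Xms64m -Xmx64m <Java class name>
2. -XX options. These may require permission to access system information, so use them with care. They are used like this:
java -XX:MaxHeapFreeRatio=70 -XX:+PrintGCDetails <Java class name>
3. Standard java options (the ones listed when you run java on the command line).
java -server -verbose:gc -d64 <Java class name>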

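.Types of garbage collection

The JVM has two kinds of garbage collection: minor and major. A minor collection runs when the Young Generation space is completely filled with objects, and mainly collects objects in the Young Generation. A major collection covers the whole heap. Minor collections happen often and take little system time. A major collection is an "expensive" collection, because it has to collect across the entire heap, which makes the application pause for longer.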

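.Format of GC messages

[GC [<collector>: <young before> -> <young after>, <minor pause> secs] <heap before> -> <heap after>, <total pause> secs]
<collector>: internal name of the garbage collector used for the minor collection.
<young before>: space used by objects in the young generation before the collection.
<young after>: space used by objects in the young generation after the collection.
<minor pause>: how long the minor collection paused the application (seconds).
<heap before>: space used by objects in the whole heap before the collection.
<heap after>: space used by objects in the whole heap after the collection.
<total pause>: how long the whole garbage collection paused the application (seconds), including the pause of the major collection if one occurred.
.GC message options
-XX:+PrintGCDetails shows detailed GC information
-XX:+PrintGCApplicationConcurrentTime prints the time the application was running
-XX:+PrintGCApplicationStoppedTime prints the time the application was stopped
Tips: 1. A "+" after the ":" turns the option on; a "-" turns it off.
2. The output format differs with different options, collection modes and collector types.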

Appendix B: HotSpot JVM options
See Java HotSpot VM Options
Appendix C: Other resources
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
http://java.sun.com/docs/hotspot/gc5.0/ergo5.html

If you want an English version of the original article, see:
http://www.anyang-window.com.cn/hot-pot-jvm5-tuning-in-the-gc/

Web service performance testing

I like to use both JMeter and SoapUI for web service performance testing. SoapUI provides more options and strategies for running performance tests, and it supports both connection reuse and non-reuse, which simulates the real situation more accurately.
However, I am used to JMeter and trust its results and its GUI :) so combining SoapUI and JMeter is a good way to do web service performance testing.

1. Get the WSDL from a developer friend
2. Select the Load WSDL from URL option in both SoapUI and the JMeter SOAP sampler
3. Copy the SOAP/XML-RPC data from SoapUI into the WebService(SOAP) Request in JMeter,
or create some XML files and load them by filename
4. Do parameterization and correlation
5. Design the scenarios and scheduler for the performance test in JMeter

Here is an example of SOAP/XML-RPC data:







${uniqueID}
${uniqueID2}




One tip for making your test scripts easier is to make full use of the XPath Extractor post-processor. Here is an example of the XPath query format:
//results[${__Random(16,30,)}]/uniqueID
Because the response is a tree structure, this makes it easier to look up the elements you want.

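Lunch and performance

At lunch today a colleague complained that serving the food was too slow and asked me for advice on how to tune it...
I immediately suggested adding more serving staff, though that seems a bit wasteful :)
There is other tuning we can do for serving, though. For example, prepare enough rice in advance for customers to take themselves, so they do not have to ask the serving lady every time, and sometimes come back for a refill when the portion is small.
This is just like adding a cache to a system. To the diners one bowl of rice is basically the same as another, so preparing enough in advance speeds up how fast customers get their rice and also takes pressure off the serving lady.
If every diner queues up to ask the serving lady for rice, it is like every request hitting the DB to fetch data that is almost always the same (a bowl of rice), which adds load on the DB and hurts performance.
Of course, preparing the rice in advance has drawbacks too. Rice left out for a long time goes cold (the cache expires), so a system usually has a mechanism to control cache expiry, such as an Expires header or versioning, to keep clients from getting stale data.
In short, if the serving lady prepares as much rice as possible before the lunch rush, serving gets more efficient :)

Serving the dishes, of course, is not as simple as preparing rice. There are many dishes, tastes and preferences differ, and the combinations vary, so to raise throughput it is still better to have several ladies serving customers at the same time.

Batch processing: buying meal tickets illustrates this principle. When a group goes to eat, usually everyone hands their money to one person who buys the tickets while the others queue for food. If everyone bought their own ticket, they would all find the food queue already very long by the time they had their tickets. One mistake of performance practice: chatty instead of batch processing.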

HTTPfox for firefox,really good

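HttpFox is a Firefox add-on for monitoring the details of HTTP traffic. It is easy to use and very thorough, and it helps a lot when analyzing the interaction between the browser and the server.
For each URL it shows categorized information, including: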
response time
cache or not
resource type
URL
Headers--request header & response header
Cookies
Query string
Post Data
HTML source
By comparison, we can say bye-bye to the paid tools for IE.

Server monitoring intelligence tool---Hyperic HQ

Today I installed Hyperic on my application server successfully!

It is COOL. I followed the steps on this page: http://support.hyperic.com/display/DOC/Installation+Windows#InstallationWindows-startServer

It is intelligent, I have to say.

Performance test training for just 7.5 hours

This week I was asked to hold sessions to train team members who are interested in performance testing. I only have 1.5 hours per day, so 7.5 hours in total to give a whole picture of my daily performance testing work. Really challenging and fun!

Basically, most people would rather do hands-on work than listen to methodologies or principles, so I arranged the last 3 days for more scripting work... I do not really like this situation.

On the first day I had so much passion for describing and explaining everything I think and have learned about performance that I lost my voice after almost 80 minutes of non-stop talking, lol

Personally I am a fan of JMeter, not LR, but I have to bring this "big giant" to my team, you know...

I drew up my initial schedule and prepared my PPT, documentation and test scripts one week before:
Day1: Performance engineering overview
Performance test overview PPT: 1 hour
Q&A: 10 mins
Quiz: 20 mins
What kind of measurements or metrics should we collect during performance testing?
What's the difference between response time and throughput?
How can you make a plan or test scenarios before you conduct performance testing?
What's the life cycle of performance engineering?

Day2: Loadrunner Controller introduction
Fundamental of LR PPT: 50 mins
Q&A:10 mins
Quiz: 30 mins
Describe the main components of Loadrunner and what each of them is responsible for.
Run demo scripts in the LR Controller: add load generators, set the run-time settings, then collect and analyze the performance results.

Day3: Loadrunner Vugen introduction
Loadrunner VuGen Script Development Process introduction: 40 mins
Q&A: 10 mins
Quiz: 30 mins
Writing and tuning scripts on the demo application

Day4: Loadrunner scripting tips
Introduce some important functions and methods in Loadrunner and C: 40 mins
Q&A: 10 mins
Quiz: 30 mins
Creating one script on the XXX application.

Day5: Lab practice and Recap
45 mins-1 hour: creating Yahoo! suggest perf scripts, using the Ajax (web & click) protocol combined with the web (HTML & HTTP) protocol
Recap and Q&A: 30 mins

performance trend diagram for load testing

I made some diagrams for the performance results. For the perf trend diagram below, I normalized the actual data so the results can be compared in a single chart, so the unit is "1":

It is useful for tracking the performance trend under different loads or even across different versions of the code. You can make two sets of these diagrams to track both.
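A minimal Python sketch of one way to do such a normalization. The post does not say which method was used, so dividing each series by its own peak is an assumption, and the function name normalize is made up for this example.

```python
def normalize(series):
    """Scale a metric series so that its largest value becomes 1.0."""
    peak = max(series)
    if peak == 0:
        # An all-zero series stays flat instead of dividing by zero.
        return [0.0 for _ in series]
    return [value / peak for value in series]
```

With this, response time in ms and throughput in requests per second both land between 0 and 1, so they can share one axis in a trend chart.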

Four Laws for page response time

Alberto Savoia wrote this article seven years ago, but it still reads as a classic: it simplifies a complex problem without oversimplifying it, and the analysis is very clear.

1> Four Laws for page response time:
- The Law of Stickiness: users come to depend on a system, but that is not loyalty. If one day they can no longer put up with it, or an easier system appears, they leave.
- The Law of User Perspective: no result from internal testing fully represents the user experience; only what real users experience is the right answer.
- The Law of Responsibility: the user is always right; all slowness is rooted in problems the developers failed to foresee.
- The Law of Expectation: every user has their own yardstick for fast and slow, shaped mostly by comparison with similar products, so comparing against competitors' products is an important part of testing. At the very least, do not be slower than the others.

2> Inside of web page response time
- Page size: do not overlook any component on the web page. It usually includes the HTML source, JS, images, CSS etc. One way to find them all is to use a tool; I use YSlow to catch all components and calculate the page size.
- Minimum bandwidth between server and client: the weakest-link principle; this usually ends up being the client's network speed. A 56K modem usually gets about 4 KBytes/second on average.
- Round-trip time: from sending a request to the first bytes of data returned. It can be estimated with the "ping" command.
- Turns: it takes several cycles to send the entire content of a page, due to browser design and HTTP 1.1.
Turns = 4 + 3*NumberObjects/4 + 3*JSObjects
4 is the total turns for the base page: one round of DNS lookup and three turns to download the base page.
3*NumberObjects means each object takes 3 turns to download, and dividing by 4 means the browser can download 4 objects concurrently.
A JavaScript download blocks any other object from downloading, so it takes dedicated turns.
- Processing time:
Server side: static page < dynamic page < complex transactions; load sensitive; measure with JMeter or another load test tool.
Client side: depends on number of objects, page parsing, CSS rendering and JS execution; browser type and client machine may have an effect as well; not load sensitive; measure with a stopwatch or watir/JMeter (single-user run).
- Browser cache matters: with the browser cache enabled, objects other than the base page (HTML doc) do not need to be downloaded.

Web page response time:
Question: so how can we make response time shorter? Steve Souders has answered much of this with his front-end tuning tips.
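The Turns formula above can be tried out with a small Python sketch. The function names are made up for this example, the 4 KBytes/second default is the 56K modem figure quoted above, and whether JavaScript files are also counted in NumberObjects is not stated in the post, so here the two counts are simply taken as given.

```python
def estimated_turns(num_objects, js_objects):
    """Turns = 4 + 3*NumberObjects/4 + 3*JSObjects, as quoted in the post."""
    base_page_turns = 4                    # 1 DNS lookup + 3 turns for the base page
    object_turns = 3 * num_objects / 4     # 3 turns each, 4 downloads in parallel
    script_turns = 3 * js_objects          # scripts block, so no parallelism
    return base_page_turns + object_turns + script_turns


def payload_seconds(page_kbytes, kbytes_per_second=4):
    """Time to push the page bytes through the slowest link."""
    return page_kbytes / kbytes_per_second
```

For a page with 20 objects and 2 scripts this gives 25 turns, and a 100 KByte page over a 56K modem needs about 25 seconds for the payload alone.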

Getting random records with TOP and NewID()

I learned a way to pick random records from a database, using TOP and NewID():
Select top (@top) tables.fields
FROM MytablesJoinedTogether
Where stuff = otherstuff
Order by newid()

This returns @top random records...

Beanshell kickoff

Lately I have been writing BeanShell scripts for JMeter, in this age of scripts flying everywhere:
${__BeanShell(String newstr = "helloworld".replace("e"\,"j"); return newstr;,)}

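I bought "The Last Lecture"

I bought The Last Lecture. The whole book is the original English edition, 108 yuan, so far the most expensive and also the thinnest foreign-language book I have bought. I had been looking forward to it for a long time, since before Professor Randy passed away. This book about achieving childhood dreams is the last treasure Randy left us...
Professor Randy, rest in peace!

Dreams

1. Open a bookstore
2. Travel
3. Be a university teacher, the kind who is truly good to the students
4. Play ball and commentate on matches (Chinese football excepted)
5. Make animated films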

Set unique variable value in Loadrunner

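A common way to set a unique value is to use the current timestamp (accurate to the millisecond):
1. First define timestamp as a parameter of type date/time.
2. Then save the parameter into a variable, so every iteration can use this fixed timestamp: lr_save_var( lr_eval_string("{timestamp}"), 14, 0, "timestampstring");
3. Then reference it in other statements: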

web_edit_field("userName_2",
"Snapshot=t7.inf",
DESCRIPTION,
"Type=text",
"Name=userName",
ACTION,
"SetValue=atestperf{timestampstring}{threadnum}",
LAST);

I added {threadnum} to keep each thread unique under concurrency. Of course there are other ways and formats to achieve this :)

How monitoring data helps Performance analysis

This message focuses on how monitoring different data during a performance test helps analysis. It is only a draft and a brainstorming list; all the cases are ones we met during the performance test and tuning phase :)

This message does NOT cover the following 2 categories.

* LoadRunner/JMeter test tech and result analysis(90% response time, throughput, etc)
* Regular profiling (Web/App profiling, DB profiling, JProfiler, etc)

Total CPU of each server

* Case 1: "X Application" calls the common search service. The search server CPU was low while the "X Application" server CPU was extremely high, so we realized the bottleneck was on the "X Application" side.
* Case 2: web server CPU was high but app server CPU was low. Later we found that the gzip function in the web server took a lot of CPU, so we reduced the compression level to solve it. In the same way you can quickly judge which tier is the bottleneck: DB, app, web, etc.
* Case 3: if DB server CPU is very low but SQL Profiler shows slow SQL, there may be DB locking or too much data being fetched.

CPU / Disk read write/Memory/GDI Objects of each process especially java process

* Case 1: Rtvscan.exe is busy, so obviously a virus scan is running
* Case 2: csrss.exe is very high. This is a kernel process that handles Windows graphical work. Later we found that in the performance environment the app server console was not minimized and a lot of log output was going to the console. Painting it took a lot of CPU and slowed down response time.

Check server log

* You might find a lot of debug logging in it, which indicates the log level is wrong. The log level usually needs a double check after the performance environment deployment is complete.
* See if there are any errors in the log
o If there are a lot of errors, the response time is meaningless and the errors should be resolved first.
o If there are only a few, we still should not let them go, even if the results look great.

File system

* If your application uses the file system for temporary data transfer and is designed to delete the files after processing completes, check whether any files were not cleaned up.
o If so, there is a problem in the temp file clean-up, or an exception kept the clean-up from happening
o Also, if files accumulate, it can hurt performance greatly! File reads and writes become very slow when there are too many files/folders under the same folder.
o A design review is needed at this point

Message Queue (check if it queues up)

* Case 1: We once found that after each test round there were 10000 unprocessed messages, and discovered that sending email had an issue.
* Case 2: while "X Application" was calling the search service, we found that after a few minutes several hundred messages queued up at ActiveMQ, which blocked the communication between "X Application" and the search service
* Also check how many messages are left in the DLQ. That indicates your test was not as successful as it looked from JMeter/Loadrunner

JVM Memory usage

* If memory after each full GC is higher than after the previous full GC, there may be a memory leak. (There are plenty of docs describing how to tune and debug GC problems.)
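A rough Python sketch of that check against a -verbose:gc log. It assumes Full GC lines in the "[Full GC before->after(capacity), N secs]" format that the HotSpot 5 output earlier in this blog uses. The function names and the min_samples threshold are made up for this example, and a rising trend is only a hint, not proof of a leak.

```python
import re

FULL_GC = re.compile(r"\[Full GC (\d+)K->(\d+)K\(\d+K\), [\d.]+ secs\]")


def heap_after_full_gcs(log_text):
    """Heap occupancy in KB right after each full GC, in log order."""
    return [int(after) for _before, after in FULL_GC.findall(log_text)]


def looks_like_leak(after_sizes, min_samples=3):
    """True if occupancy after full GC rose every single time."""
    if len(after_sizes) < min_samples:
        return False  # too few full GCs to say anything
    return all(later > earlier
               for earlier, later in zip(after_sizes, after_sizes[1:]))
```

If the check fires, a heap dump or profiler session is the next step to see which objects are piling up.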

Cache hit rate

* If the cache hit rate is very high during the test, you had better question whether the case is designed realistically and whether it hides an issue

Connection pool status check(web server connections, Tomcat connections, DB connections)

Actually we do not have a real case of finding an issue by monitoring connection pool status, but we do double check the connection pool size settings after deployment completes.

All comments and feedback are welcome. :P (Neil wrote this; I am his assistant and first reviewer, haha)

Isolation issues on performance test and tuning

This message focuses on how we isolate performance issues from a testing perspective
Found many errors (from Loadrunner/Jmeter or from back-end server log)

* Try a single user to see if it works fine, then add load slowly until the error happens
* Try different data (e.g. account / user)
* Remove certain scenarios from the complete set and see which scenario is the troublemaker. E.g. in the reporting performance test, the error rate is very different with and without scenarios 6, 7 and 8

Note that we should of course analyze the loadrunner/jmeter/server error logs as early as possible. I always look at Loadrunner/Jmeter errors starting from the very first occurrence (look at the error message timestamp); usually the errors that follow are related to the first one.
Certain scenario / step is slow in formal performance test

* Tune the scenario / step using a single user
E.g. when tuning the website publish module, we first focused on a single user, went through every method/SQL and optimized the code (especially removing unnecessary SQL). After this step we began to tune performance under multiple users.
Note that tuning under a single user is helpful in the beginning phase, but analyzing multi-user results (profiling results, etc.) is also critical for performance tuning.
* Disable other scenarios / steps and focus on a few cases with multiple users
E.g. we had a combined case of login, sign up, search and vendor detail. When executing the full test set we found vendor detail relatively slow, so we decided to run vendor detail only: all the threads log in once, then loop forever viewing vendor detail. This highlighted the issue inside view vendor detail, and the bottleneck was easily found and fixed.
Note that a performance issue is often caused by contention between different cases, but before looking at that we can try to tune a single step with multiple users first.

If you have a memory leak or connections being killed, you can do the same isolation work to narrow down the issue and run the specific step tests together with dev or the DBA; then you can catch the killer quickly.
When doing these tests you can drop the think time and add more load to reproduce this kind of issue. Do not be shy, be rude :)

thread Dump analysis on Myloadtest

Recently I have often been working with Neil on performance tuning through thread dump analysis, so I needed to find some material about thread dumps:
http://www.myloadtest.com/java-thread-dump/

I also just noticed one very interesting discussion on the Myloadtest blog. It is a pity that Myloadtest.com has not been updated for a long time... or did he move his blog? :)

Analyzing thread dump practice

With this message I would like to share some real cases where we identified a performance bottleneck by analyzing thread dumps and correlating them with our code.

The basic idea: while the performance test is running, take a snapshot of all threads several times (I usually take 5-7 at random intervals) and see what they are busy with.

* You might see that most threads have similar stack traces and are pending on the same method
o Maybe most of them are waiting to get a lock, i.e. there is lock contention (waiting for monitor entry)
o Maybe that method is really time consuming, so there is a high chance most threads are working in it (runnable and high CPU utilization)

The app profiling log may also help break down the time and find which method takes the most time, but with a thread dump you can also look into third-party libraries down to the low-level API.

To generate the thread dump (other than using Ctrl+Break in console mode on Windows):

* http://www.jboss.org/community/docs/DOC-9804
* http://www.jboss.org/community/docs/DOC-12300
* https://visualvm.dev.java.net/threads.html
* Jconsole,Mbean

You can easily find articles on the internet about how to analyze them; here are a few:

* http://java.sun.com/developer/technicalArticles/Programming/Stacktrace/
* http://www.0xcafefeed.com/2004/06/of-thread-dumps-and-stack-traces/
* http://www.myloadtest.com/java-thread-dump/

Analyzing a thread dump normally takes some knowledge, but the real cases shown below are kind of "idiot-proof", as you will see…
Case 1 – solved issue caused by BeanUtils.copyProperties() and log.trace()

During the performance test we found that view vendor summary was very slow. This page calls the search web service to get data; the search server CPU was low but the "X application" server CPU was high.

Looking at the thread dump, we found most threads were in VendorAdapter.unmarshal() and blocking each other at javax.management.modelmbean.DescriptorSupport.clone().

Reading a little more code, we found

* VendorAdapter.unmarshal() used BeanUtils.copyProperties() in a loop
* BeanUtils.copyProperties() triggers log.trace() without checking isTraceEnabled().
* That triggered our customized logger and finally config.getVersion(), which caused the lock contention.

Solution

* In XXXLog4JLogger.trace(), add an isTraceEnabled() check before calling super.trace(this.constructLogData(message));
Result
* Response time decreased from 6.2 seconds to 3.9 seconds

Later, with the same approach, we found that BeanUtils.copyProperties() caused another lock contention at MappedPropertyDescriptor.getPublicDeclaredMethods, so we gave up BeanUtils.copyProperties() completely.
With this change, response time decreased further.

The total improvement is from 6.2 seconds to 2.8 seconds under expected load.
Case 2 – identify issue caused by call to ActiveMQ

This happens on search page.

While testing with 50 users, we found that after 5 minutes the CPU usage of the X application app and the search app decreased dramatically (from 90% to less than 50%). Meanwhile server response time became very slow.

By taking a thread dump on the X application app console, we found that at that point half of the HTTP request handler threads were at com.XXX.commonsearch.client.util.SearchJMSQueueConnectionUtil.sendObjMsg, while the rest were waiting.
Justin also found around 400 messages pending on ActiveMQ

Then we quickly made a temporary code change to remove that call to ActiveMQ and tested again. The issue was gone.
This allowed the X application team to continue with other testing and tuning while the search team investigated the solution.

Case 3 – confirmed the issue caused by creating dispatch object for web service call

Again, this happened on the vendor detail page while calling the search service.
From the profiling log we found that most of the page time was spent calling the web service, but when we tested the web service directly with jmeter, it was fast. That indicated the client side calling the web service might have a problem.

By looking at the thread dump we realized that most of the time / CPU was spent reading the WSDL file and creating the dispatch object for the web service call, every time.

Well, thread dump analysis did not really contribute to solving the problem this time. Our dev guys figured out a solution by reading the code used to call the web service: after investigation the dispatch object turned out to be thread-safe, so we can cache it.

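Talent recommendation

Recommending: a fresh master's graduate in communications and electronics, female, (extremely) good at math, a natural; does not like to show off, good at spotting detail problems, once passed the third interview round at Bai×du, no work experience, positive attitude, sincere with people.
If you trust me, please trust her too;
if you do not trust me, please trust her anyway! :)
Target: software testing and software development...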

No need for Ctrl+Break to generate a thread dump

I tried using SendSignal.exe to generate thread dumps in our application. It is easy to use, and I send the thread dump to a log file instead of the console...
Start the application from CMD: XXX.bat > logfile1.log
Then just call SendSignal, and it writes the thread dump into logfile1.log
I set the log level to Error, so logfile1.log contains only thread dump info if there are no errors.
It saves me time :)

here is the Link for sendSignal:
http://www.latenighthacking.com/projects/2003/sendSignal/

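When you find that the script LR recorded is empty...

Today I recorded a script with the Loadrunner Web (HTTP & HTML) protocol and suddenly found that the recorded script was empty...
I went straight to the recording options and found that Network --> Port mapping --> capture level had changed to socket level... who changed it, who knows???
Anyway, I changed it back to WinINet level and now the script is back, like magic...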

Thread Dump analysis--Awk scripts

Because some external thread dump analyzer tools throw exceptions, I decided to write some tiny scripts to pull data out of the thread dump log.
I use Awk for this:

State1 : in Object.wait()
$ awk '($1~/"http-0.0.0.0/)&&$6=="in" {print $6,$7, i++}' Threaddump_1_Original.txt

State2 : waiting for monitor entry/ waiting on condition
$ awk '($1~/"http-0.0.0.0/)&&($6~/waiting/) {print $6,$7, i++}' Threaddump_1_Original.txt

State3: runnable
$ awk '($1~/"http-0.0.0.0/)&&($6~/runnable/) {print $6,$7, i++}' Threaddump_1_Original.txt

First I got the total number of threads in each state (BTW this covers HTTP threads only),
before tuning my application code:

in Object.wait() --------15

waiting for monitor entry ----------23(waiting to lock)

runnable------------4

After Tuning Phase 1:

in Object.wait() --------16

waiting for monitor entry ----------20(waiting to lock)

runnable------------6

After tuning Phase 2:

in Object.wait() --------30

waiting for monitor entry ----------3(waiting to lock)

runnable------------9

During tuning I use scripts to split the log file by thread state, for example:

$ awk 'BEGIN {RS=""; FS="\n"} {if(($1~/http-0.0.0.0/)&&($1~/waiting on/)) {print $0,i++}}' ThreadDumpA.txt >TDlog1.txt

This may help dev focus on the real problem in the thread dump and work more efficiently. If you have any other great ideas for this kind of tuning, please leave your comments here :)
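The same per-state counting can be sketched in Python. It assumes each thread's header line starts with the quoted thread name and contains the state text, which is the shape the awk field tests above rely on. The function name, the STATES tuple and the default prefix are assumptions for this example.

```python
from collections import Counter

# State texts as they appear in the thread header line, checked in this order.
STATES = (
    "in Object.wait()",
    "waiting for monitor entry",
    "waiting on condition",
    "runnable",
)


def count_http_thread_states(dump_text, name_prefix='"http-0.0.0.0'):
    """Count HTTP worker threads per state in one thread dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        if not line.startswith(name_prefix):
            continue  # stack frames and non-HTTP threads are skipped
        for state in STATES:
            if state in line:
                counts[state] += 1
                break
    return counts
```

Running it on the dumps taken before and after each tuning phase gives the same kind of table as the counts listed above.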

Performance testers, you do need friends

Sometimes I see performance testers being blamed, or blaming themselves, because customers found performance issues in production... You can read the helplessness on their faces: "I am the only one who did the performance planning, testing and tuning, coding, verification, deployment... I know I should be blamed for today's result..."

However, I do not feel lonely working as a performance test engineer. I have a lot of dev friends, and even managers, who help solve the problems and make my guesses come true (although sometimes we take an indirect route). Fortunately my friends trust me and we work closely together!!

1. We work as a team; if there is a problem, every team member shares the blame
2. We do not spend unnecessary time arguing about how to simulate real production data and behavior; you know it is Utopia :)
3. We focus on solving the problem instead of avoiding it in a self-deceiving way; we cannot let things get worse and worse as time goes by
4. Everything is iterative; do not expect to reach your goal in one move. We improve our planning, testing and tuning work step by step, accumulating a lot of our own "best practices" in daily work

So do not do performance work alone, and do not think a performance problem belongs to one person. Let friends get involved; it is collaborative work by nature :)

JMeter While/if controller---set the conditions

In a JMeter While Controller I wanted to control the loop with multiple conditions. It supports the __javaScript function, so at first I used the following:

${__javaScript(${usefulnessflag_g1}<0,)}&&${__javaScript(${pagenum}<${pagenummax1}),)}

But JMeter stopped at that While Controller, and I realized the condition does not work this way.

The right way is to use a single __javaScript function that contains all the conditions; then it works well:

${__javaScript((${usefulnessflag_g1}<0)&&(${pagenum}<${pagenummax1}),)}

Meanwhile, I found one issue with JMeter.

I used an If Controller and found this error message in jmeter.log:

ERROR - jmeter.control.IfController: missing ; before statement (#1) org.mozilla.javascript.EvaluatorException: missing ; before statement (#1)

Then I isolated the problem. It happens on this line: ${__javaScript("${reviewflag2}"=="",)}

I put this string comparison into the If Controller condition field and it failed, though not always... I tried removing the __javaScript function and changing the condition to:

"${reviewflag2}"==""

and it worked!! So it might be a bug in the JMeter If Controller...

For ${usefulnessflag_g1}>0, it works just as well as ${__javaScript(${usefulnessflag_g1}>0,)} :)

Hope it helps you!

Table column type: NVARCHAR Vs VARCHAR

“We should standardize the data type to be NVARCHAR” from a performance perspective :)

http://sqlserverexperthelp.com/javamiddleware.aspx

First, please tune your DB

Before doing the rest of the tuning work, our team always likes to tune the DB side first, to make sure it is not the bottleneck.

Besides planning a good DB schema design, which is important to our performance, we can do several "simple" things to tune our DB afterwards:

1. Open SQL Profiler (in MS SQL Server) and walk through all the steps/scenarios with a single user thread

2. We expect the execution time of each SQL statement to stay within 20 ms

3. Find the slow or questionable SQL statements and then check:

1> whether a proper index is missing, or an index should be removed

2> whether unnecessary SQL statements can be removed

3> whether to use NVARCHAR instead of VARCHAR for the columns...

http://joychester.xhblog.com/archives/2008/372384.shtml

4. Then run the combined performance tests and check that there are no deadlock issues and no SQL statements whose duration gradually increases.

A dictionary of unfamiliar processes

I look up unfamiliar processes here:
http://www.processlibrary.com/

Before doing performance testing, I usually check for and clean up unnecessary processes on the servers that take CPU time (user CPU/system CPU).

Take the total number of errors as a key indicator in your results

In LR, there are two indicators that describe how many errors or failures your scripts or application generate: the number of errors and the number of failed transactions. Personally, I prefer to take the total number of errors, rather than the number of failed transactions, as the key indicator of how successful a test run was.

Take the error rate into consideration, not only the failed transactions:

1. Every error has its own reason
2. If many errors occurred, the test results are questionable and the response times may be misleading…
3. Errors and failed transactions should be correlated, so the errors should form the complete set that covers the failed transactions.
4. Try to solve the errors, or at least find out what happened behind each one: it may be an issue in the scripts themselves, an application issue (raise a bug), or a problem with the test data…
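
As a minimal sketch of the points above, the error rate can be reported together with a count of errors per suspected cause. The records, the "cause" labels and the request count below are made-up illustration values, not LoadRunner output, and the function names are only for this example.

```python
from collections import Counter


def error_rate(errors, total_requests):
    """Error rate = errors / total number of requests."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return errors / total_requests


def errors_by_cause(error_records):
    """Count errors per suspected cause so each group can be followed up."""
    return Counter(record["cause"] for record in error_records)


# Hypothetical error records from one run; "cause" is filled in by hand
# after looking at each error (script, application or test data).
error_records = [
    {"message": "HTTP 500 on search.do", "cause": "application"},
    {"message": "parameter not found", "cause": "script"},
    {"message": "HTTP 500 on search.do", "cause": "application"},
    {"message": "login failed for user", "cause": "test data"},
]
total_requests = 800  # hypothetical number of requests in the run

print("error rate: %.2f%%" % (100 * error_rate(len(error_records), total_requests)))
for cause, count in errors_by_cause(error_records).most_common():
    print(cause, count)
```

Grouping by cause keeps "every error has its own reason" visible in the report instead of hiding it behind one number.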

Cool scrum short intro video

Learn how to make scrum happen in just 10 minutes:
http://www.youtube.com/watch?v=Q5k7a9YEoUI&fmt=22

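The Chinese edition of the web application performance testing guide has been published

I do not have the original, so for now I can only see how good the Chinese translation is. It is the work of J.D. Meier and Scott Barber, so let me give it a plug first. Although I already have the electronic version, I will still keep a copy: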

If you meet high CPU utilization on your application server...

I found that "Performance Analysis for Java Web Sites" covers the analysis of high CPU utilization problems; it is useful and very practical for our performance work.



During our local performance testing, if we find:
1. High User CPU%
High user CPU% means the server is very busy handling requests from the web application. Possible causes:
----Heavy load
----Bad code design and implementation
----Poor hardware
So from a performance testing perspective, we should consider:
----Is the performance testing load pattern realistic, and does it make sense against the real-world model? If not, adjust it first, including user load, user distribution and test scenarios
----Take thread dump snapshots or use a profiling tool to find out what kind of tasks/methods every thread is executing (especially the methods related to the slowest actions), then do a code review after identifying the bottlenecks
----If nothing else can be done, upgrade your hardware to meet your target :)

2. High System CPU% (Task Manager: kernel mode; Perfmon: privileged mode)
----The HTTP server listener will impact your app server CPU%; you need to separate your web server and app server
----"Kill" or reduce other unnecessary processes (in Task Manager, the user name of these processes is SYSTEM), like the anti-virus process or csrss.exe.

3. High wait CPU% (total CPU% - User CPU% - System CPU% = wait CPU%??)
----Take a thread dump to see whether most of the threads are "waiting for monitor entry" or "waiting on condition"; if so, you can focus your tuning work there at once
----You might need to check your logging settings or the code related to logging
----You might check disk availability on the server
----You might check database efficiency (SQL Profiler may help with this)
----You might check the asynchronous process design and whether it is implemented properly
----Check for network or remote issues?

enjoy!

Performance testing/tuning metrics, updated version

A long time ago... I wrote an initial draft of the key performance metrics I always consider during performance testing and tuning:
http://joychester.xhblog.com/archives/2008/318932.shtml

This time I want to emphasize them and thin them out:
Performance testing metrics:
1. Test duration and time period
2. User load and user distribution across all test scenarios (a diagram is better)
3. Server response time (top 5 + detailed 90% line)
4. Response time highlights for the top 5 transactions and their trend (diagram)
5. Error rate = errors / total number of requests (or the failure rate of a particular action)
6. Throughput (KB/sec, hits/sec)
7. OS resource monitoring for all servers
- CPU% on average and its trend: total CPU%, User CPU%, System CPU%
- Memory% committed in use
- Disk I/O (DB server, app server and file server)
8. Raw data for all response data (attachment)
9. Server error logs---Apache/JBoss error logs, if any; Dev should investigate and solve them first (whether it is an application, configuration or test script issue)!
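
A small sketch of the "90% line" from item 3, read here as the 90th percentile computed with the nearest-rank method. That reading is an assumption, and load testing tools may interpolate differently. The sample response times are invented.

```python
def percentile_line(samples_ms, percent=90):
    """Nearest-rank percentile: the smallest sample such that at least
    `percent` percent of all samples are less than or equal to it.
    `percent` is expected to be an integer between 1 and 100."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of percent * n / 100 gives the 1-based rank.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in milliseconds for one transaction.
samples = [420, 380, 510, 1450, 460, 390, 405, 440, 2300, 415]

print("average :", sum(samples) / len(samples))
print("90% line:", percentile_line(samples, 90))
```

Reporting the 90% line next to the average shows how much a few slow requests pull the average away from what most users experience.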


Supplementary measurements for performance tuning
1. SQL Profiler log and trend (average, max, where duration > 500ms order by duration desc)
2. Thread dump
3. GC trend, or even a heap dump
4. Profiling logs--method-level time consumption
5. Apache access logs
6. JMS tables (including DLQs) and ActiveMQ monitoring when there are asynchronous processes on the back end
7. User accounts and other data-sensitive parameters should be taken into consideration and correlated with the response time
- different accounts have different volumes of data
- different keyword searches may fetch different volumes of data
8. Results of other kinds of perf testing attached, for example configuration-change testing or single-step isolation testing
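
Item 1 above (average, max, duration > 500ms order by duration desc) can be sketched as follows, assuming the SQL Profiler trace has already been exported into rows. The field names "sql" and "duration_ms" and the sample rows are hypothetical.

```python
def slow_statements(rows, threshold_ms=500):
    """Return the rows slower than threshold_ms, slowest first."""
    slow = [row for row in rows if row["duration_ms"] > threshold_ms]
    return sorted(slow, key=lambda row: row["duration_ms"], reverse=True)


def summarize(rows):
    """Average and max duration over all rows of the trace."""
    durations = [row["duration_ms"] for row in rows]
    return {
        "count": len(durations),
        "average_ms": sum(durations) / len(durations),
        "max_ms": max(durations),
    }


# Hypothetical rows exported from a profiler trace.
trace = [
    {"sql": "SELECT ... FROM vendor WHERE ...", "duration_ms": 12},
    {"sql": "SELECT ... FROM review WHERE ...", "duration_ms": 820},
    {"sql": "UPDATE ... SET ...", "duration_ms": 18},
    {"sql": "SELECT ... FROM search_index WHERE ...", "duration_ms": 1350},
]

print(summarize(trace))
for row in slow_statements(trace):
    print(row["duration_ms"], row["sql"])
```

The same function can be called with threshold_ms=20 to check a single-user walkthrough against a stricter per-statement budget.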

jTDS: useCursors + prepareSQL performance benchmark

I ran a performance benchmark on different combinations of useCursors and prepareSQL; here is the comparison result:

Round  useCursors  prepareSQL  Search.do  Throughput
1      true        3           1507 ms    28.88 hits/sec
2      true        2           1597 ms    28.7 hits/sec
3      false       3           1488 ms    29.2 hits/sec
4      false       2           1446 ms    29.8 hits/sec

So the winner is useCursors=false;prepareSQL=2, and it is suitable for both perf testing and tuning (capturing SQL Profiler logs).

Here is the link to the description of useCursors and prepareSQL:
http://jtds.sourceforge.net/faq.html#urlFormat
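
For illustration, the four rounds can be compared in a few lines, and the winning combination can be written out in the jTDS URL form described in the FAQ linked above (jdbc:jtds:sqlserver://host:port/database;property=value). The host "dbhost", the port and the database name "perfdb" are placeholders, not values from the benchmark, and the helper names are made up.

```python
# Results copied from the four benchmark rounds.
results = [
    {"useCursors": True, "prepareSQL": 3, "search_ms": 1507, "hits_per_sec": 28.88},
    {"useCursors": True, "prepareSQL": 2, "search_ms": 1597, "hits_per_sec": 28.7},
    {"useCursors": False, "prepareSQL": 3, "search_ms": 1488, "hits_per_sec": 29.2},
    {"useCursors": False, "prepareSQL": 2, "search_ms": 1446, "hits_per_sec": 29.8},
]


def best_round(rounds):
    """Pick the round with the lowest Search.do response time."""
    return min(rounds, key=lambda r: r["search_ms"])


def jtds_url(host, port, database, properties):
    """Build a jTDS SQL Server URL; `properties` is a list of (name, value) pairs."""
    def render(value):
        # Booleans are written in lower case, as in the URL properties.
        return str(value).lower() if isinstance(value, bool) else str(value)

    suffix = "".join(";%s=%s" % (name, render(value)) for name, value in properties)
    return "jdbc:jtds:sqlserver://%s:%d/%s%s" % (host, port, database, suffix)


winner = best_round(results)
print(jtds_url("dbhost", 1433, "perfdb",
               [("useCursors", winner["useCursors"]),
                ("prepareSQL", winner["prepareSQL"])]))
```

In this data the round with the lowest response time also has the highest throughput, so both criteria agree on the winner.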

Front-end tuning on paper...

I have put down some thoughts and gathered some information on front-end tuning for your reference. It is only on paper so far, but I really want to get started with it:

Front-end tuning Checklist:

1. Check for memory leaks on the browser side
2. Cache static objects: make use of ETag or Expires headers to reduce the number of 304 (Not Modified) requests
3. Combine JavaScript/CSS files to reduce the number of requests
4. Gzip: not much to say, we benefit from it a lot (even though there are some issues)...
5. Compress images: use JPG, GIF or PNG instead of BMP and other formats to reduce the size of images as much as possible
6. Minify JavaScript/CSS: remove unnecessary comments and white space to save more KBytes received. We can only run this experiment in the local performance environment, not in the dev or QA environment, so that it does not disturb our normal life
7. Remove references to resources that no longer exist, which cause "404 errors" even though nothing is downloaded
8. Tune JavaScript performance if we have time or gain some experience with it
9. Apache HTTP Server tuning initiative
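
To get a rough feeling for item 4 before changing any server setting, the saving from gzip can be estimated offline. This is only a sketch: the JavaScript text is an invented, very repetitive sample, so real files will compress less well.

```python
import gzip


def gzip_savings(payload):
    """Return (original size, compressed size, fraction saved) for a bytes payload."""
    compressed = gzip.compress(payload)
    saved = 1 - len(compressed) / len(payload)
    return len(payload), len(compressed), saved


# Invented sample: one small function repeated many times.
sample_js = (
    b"function add(a, b) {\n"
    b"    // add two numbers\n"
    b"    return a + b;\n"
    b"}\n"
) * 200

original, compressed, saved = gzip_savings(sample_js)
print("original: %d bytes, gzipped: %d bytes, saved: %.1f%%"
      % (original, compressed, 100 * saved))
```

Running the same function on a minified and a non-minified copy of one file gives a quick comparison for item 6 as well.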

What gets measured gets managed:

Monitoring request and response data from IE/Firefox
1. Using Firebug together with YSlow
2. Using AOL Pagetest
3. Using IBM Page Detailer
4. Using HttpFox
5. Writing Watir automation scripts to measure the end-to-end response time
6. Task Manager (for monitoring the browser memory usage)

A few awesome books and articles which can be considered a "cookbook":

1. O'Reilly: High Performance Web Sites by Steve Souders
2. O'Reilly: Website Optimization by Andy King
3. AST: "Right Click → View Source and other Tips for Performance Testing the Front End" by Scott Barber
http://www.associationforsoftwaretesting.org/drupal/December.Final.pdf
4. JavaScript tuning tips by Steve Souders: http://stevesouders.com/docs/widget-summit-2008.ppt
5. Apache performance tuning tips: http://httpd.apache.org/docs/2.0/misc/perf-tuning.html

JVM GC tuning experiment

I got a lot of useful tips from http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html, but you really need to try them before believing in them :)

I ran load tests with a 60-user load under different JVM heap size settings; here are the results for the different JVM metrics:

Heap size (Eden size / total heap size)  Full GC frequency                   JVM throughput  Throughput measured by JMeter  Max pause time (Full GC time cost)
100m/1024m                               every 7 mins                        96.71%          27.42                          2.01s
256m/1024m                               every 20 mins                       97.62%          28.69                          1.91s
512m/1280m                               every 1 hour (default GC interval)  98.25%          29.46                          1.92s

The table below shows the difference in response time under the different JVM heap size settings:

Heap size (Eden size / total heap size)  100m/1024m  256m/1024m  512m/1280m
action1                                  520ms       449ms       424ms
action2                                  474ms       416ms       378ms
action3                                  746ms       639ms       567ms
action4                                  1345ms      1227ms      1193ms
action5                                  1219ms      1100ms      1079ms

In general, we can see that different JVM heap size settings affect our application's performance. We mainly want to minimize both the max pause time and the (Full) GC frequency of the JVM.
This time we simply enlarged the Eden space (AKA the young generation size) and the total heap size respectively.
We will take a further look at this when we have time…
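
For reference, "JVM throughput" in the table is taken here to mean the percentage of total time not spent in garbage collection, which is how the Sun tuning guide linked above defines throughput. A minimal sketch of that arithmetic, with invented pause times rather than values from these test runs:

```python
def gc_throughput_percent(total_seconds, gc_pause_seconds):
    """Percentage of the total run time NOT spent in garbage collection."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * (total_seconds - sum(gc_pause_seconds)) / total_seconds


def max_pause(gc_pause_seconds):
    """Longest single GC pause, or 0.0 when there were no pauses."""
    return max(gc_pause_seconds) if gc_pause_seconds else 0.0


# Hypothetical pause times (seconds) collected from a GC log of a 200 s run.
pauses = [2.0, 1.5, 0.5]
run_seconds = 200.0

print("JVM throughput: %.2f%%" % gc_throughput_percent(run_seconds, pauses))
print("max pause: %.2fs" % max_pause(pauses))
```

The two numbers pull in different directions: a bigger heap usually means fewer collections and higher throughput, but each full collection has more to scan, so both should be tracked together.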

Capturing Live Network Data using WireShark

Network data transfer is a black box to most of us: when there is slowness, we do not know what the servers are actually doing. (I usually isolate the network first when I conduct a performance test, but that is just an assumption, or an ideal environment...)

Here is a great tool for capturing live network data; with it I can see what is going on with network issues in more detail!!

Using HttpFox to view popup window source

Sometimes you want to see the source of a popup window, but you cannot get it from "right click -> view source".
I get the popup window source with HttpFox instead:
1. Start recording the HTTP traffic in HttpFox
2. Open the URL of the popup window in the browser
3. Get the HTML source from the Content tab
That's all :)

Get response data size in LoadRunner

For some reason, I needed to get the response data size for particular web pages. In LoadRunner, there is one simple function to call: web_get_int_property()

It supports:
- HTTP_INFO_RETURN_CODE
- HTTP_INFO_DOWNLOAD_SIZE
- HTTP_INFO_DOWNLOAD_TIME
- HTTP_INFO_TOTAL_REQUEST_STAT
- HTTP_INFO_TOTAL_RESPONSE_STAT


Code sample:
...................
// downloadsize must be declared as an int at the top of the action: int downloadsize;
web_browser("www.google.com",
    DESCRIPTION,
    ACTION,
    "Navigate=http://www.google.com/",
    LAST);

// Read the size of the last download
downloadsize = web_get_int_property(HTTP_INFO_DOWNLOAD_SIZE);

// Log the size together with the current {username} parameter
lr_log_message("%s,size=%d", lr_eval_string("{username}"), downloadsize);
....................

Another input from someone's blog:
http://www.cptloadtest.com/2005/05/10/PageSizeMonitorInLoadRunner.aspx
It might be useful for monitoring your page size.

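Clearing up a doubt

The surname Chi (茌)

  Chi [茌, pronounced chí (ㄔˊ)]
  1. Origins of the surname:
  First origin: derived from the Ji (姬) surname, from the ancient place Chiqiu (茌丘) in the State of Cao during the Spring and Autumn period; a surname taken from the name of the settlement where the family lived.
  In the Spring and Autumn period, Cao Shu Zhenduo (Ji Zhenduo), a son of King Wen of Zhou, was enfeoffed at Cao (present-day Dingtao, Shandong) by his elder brother King Wu of Zhou. He founded the State of Cao with its capital at Taoqiu; its territory roughly covered the area around present-day Dingtao, Shandong, and it was a vassal state of the rank of earl. The State of Cao had a place named Chongyi (重邑), later renamed Chiqiu (茌丘). In 487 BC the State of Song destroyed the State of Cao and took Chiqiu as Chiyi (茌邑), which was later renamed Shanchi City (山茌城; present-day Dangjiazhuang Town, Licheng, Jinan, Shandong). Its residents took the settlement name as their surname and were called the Chiqiu clan (茌丘氏), later shortened to the Chi clan (茌氏).
  Second origin: derived from the Jiang (姜) surname, from Wu Zhong (吴中), a descendant of Wu Quan (吴权), a minister of Emperor Yan (炎帝); a surname taken from a title of nobility granted by the emperor.
  Wu Zhong (1373-1442), a native of Wucheng, was a prominent senior official of the Ming dynasty. At the end of the Hongwu reign he served as registrar of the Yingzhou Rear Garrison Guard (营州后屯卫). When Zhu Di, Emperor Chengzu of Ming, took Daning, he came out to surrender. For his merit in transporting army provisions and in defense, he was promoted to Right Censor-in-Chief. In the fifth year of Yongle (1407) he was transferred to Minister of Works and later accompanied the northern campaign; after returning home in mourning, he was reassigned as Minister of Justice. In the nineteenth year of Yongle (1421), Wu Zhong, together with Xia Yuanji, Left Vice Minister of Revenue, and Fang Bin, Minister of War, memorialized that provisions for the northern campaign were hard to supply; this went against the emperor's wishes and he was put in prison. After Zhu Gaochi, Emperor Renzong of Ming, ascended the throne, he was restored to office and given the additional title of Junior Guardian of the Heir Apparent. In the first year of Xuande (1426) he accompanied the campaign against Le'an. In the third year of Xuande (1428) he was convicted of giving government timber and stone to the palace eunuch Yang Qing to build a residence and was imprisoned; he was later released on bail and forfeited one year's salary. In the sixth year of Zhengtong (1441) he was restored to office and promoted to Junior Guardian of the Heir Apparent.
  Wu Zhong was diligent, quick-witted and good at calculation, and served in the Ministry of Works for more than twenty years in total. The palaces of present-day Beijing and the Chang, Xian and Jing mausoleums of three emperors were all built under his direction. His duties were many and piled up, yet his planning was orderly. However, he showed no compassion for the craftsmen and indulged in sensual pleasures, and the opinion of his time despised him for it.
  Wu Zhong died in the seventh year of Zhengtong (1442) at the age of seventy. He was posthumously enfeoffed as Earl of Chiping (茌平伯) and given the posthumous name "Rongxiang" (荣襄). Some of his descendants took his title as their surname and are called the Chi clan (茌氏).
  2. Migration and distribution:
  Today, members of the Chi clan are mainly found in the area around Xuzhou and Feng County in Jiangsu Province.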

AutoItX3 helps to make automation happen

If you get an error like this when you run a Ruby script:

Unknown OLE server: `AutoItX3.Control' (WIN32OLERuntimeError)

it probably means you have not registered AutoItX3.dll on your computer.
In the current Watir version, AutoItX3.dll is included in the watir folder,
so run this command from "Run":
regsvr32 "C:\ruby\lib\ruby\gems\1.8\gems\watir-1.6.2\lib\watir\AutoItX3.dll"

Note: use your own path to where AutoItX3.dll is located.

Then you will not get the error :)

To learn AutoItX3, go to http://www.autoitscript.com/autoit3/index.shtml
I just want to use it to handle popup windows when writing Watir scripts.

How to handle a popup window in Watir scripts

Sometimes our automation scripts run into a popup window that Watir cannot catch easily. With AutoItX3 we can solve this kind of problem well:
require "watir"
require "win32ole"

# Wait for a popup window and dismiss it by sending Enter.
def check_for_popup_window
  flag = 0
  autoit = WIN32OLE.new('AutoItX3.Control')

  while flag == 0
    # Check the popup window status with a timeout of 1 second.
    # WinWait returns 1 if the window appeared and 0 on timeout.
    flag = autoit.WinWait("Microsoft Internet Explorer", "", 1)

    # Another way to do this; you can get the popup info with the AutoIt Spy tool.
    # autoit.ControlClick("Microsoft Internet Explorer", "", "[CLASS:Button; INSTANCE:1]")

    puts(flag)

    # If the window was found, send the appropriate keystroke (e.g. {Enter}, {Y}, {N}).
    autoit.Send("{Enter}") if flag == 1

    sleep(0.5)
  end
end

# Main body of the Watir program:
ie = Watir::IE.new
ie.speed = :fast

puts "Step 5: Click one vendor detail link"

# Create a new thread that runs the popup check.
popup = Thread.new { check_for_popup_window }

# test_vendorURI is expected to be defined elsewhere in the script.
ie.goto test_vendorURI

sleep 1

# Tear down the thread that was created.
at_exit { Thread.kill(popup) }

A simple XPath example for web service testing

When you do automation or performance testing on a web service, you may need to correlate dynamic values returned by the server.
Taking JMeter as an example, there are two ways to capture a dynamic value for correlation:
1. Using the Regular Expression Extractor: write a simple regular expression to extract the value
2. Using the XPath Extractor: write an XPath expression to locate the value

Here is a simple example showing how to handle web service correlation with XPath:

I want to get the token value returned by the server. Here is the response XML:





true
3FBCB0BDA521B5BC1A399AC094063993D26BC70BA1E3175927D6F






XPath: //*[local-name()='token']

What if there is a namespace in the XML? Like this:



getResetPasswordTokenResponse xmlns:ns1="http://user.123.com/user_security_service">

true
3FBCB0BDA521B5BC1A399AC094063993D26BC70BA1E3175927D6F






XPath: //*[local-name()='token' and namespace-uri()="http://api.user.123.com"]

It is simple and user-friendly :)
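
To try the same matching outside JMeter, here is a small sketch. The XPath support in Python's standard xml.etree.ElementTree does not include the local-name() function, so the sketch reproduces the two expressions above by splitting each tag into namespace and local name. The response XML is a made-up stand-in: its element names and the namespace placement are assumptions, and only the token value is copied from the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical SOAP-style response; the structure is assumed for illustration.
RESPONSE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ns1:getResetPasswordTokenResponse xmlns:ns1="http://api.user.123.com">
      <ns1:success>true</ns1:success>
      <ns1:token>3FBCB0BDA521B5BC1A399AC094063993D26BC70BA1E3175927D6F</ns1:token>
    </ns1:getResetPasswordTokenResponse>
  </soap:Body>
</soap:Envelope>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def find_by_local_name(root, local_name, namespace_uri=None):
    """Emulate //*[local-name()='x'] and, when namespace_uri is given,
    //*[local-name()='x' and namespace-uri()='...']."""
    matches = []
    for element in root.iter():
        namespace, local = split_tag(element.tag)
        if local == local_name and (namespace_uri is None or namespace == namespace_uri):
            matches.append(element)
    return matches


root = ET.fromstring(RESPONSE)
for element in find_by_local_name(root, "token", "http://api.user.123.com"):
    print(element.text)
```

Matching on the local name alone is the quick option; adding the namespace check avoids picking up a "token" element that belongs to a different schema.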

Finally, you can test your XPath expression here: http://www.mizar.dk/XPath/Default.aspx

Hope this helps!

Use XPath in Watir with caution: it is slow

I am used to writing Watir scripts for single-user performance tests, or launching them during a load test; it helps a lot when I want to get the response time from a real user experience perspective.

But in some situations I need to use XPath to get the object to be clicked...
ie.button(:xpath, "//div[@id='ContentWrapper']/div/div[3]/a/img").click

I find it is realllllly slow when Watir is searching for the object in the DOM. I realize this approach is not suitable for measuring response time; maybe just for automation tests...

In most situations where I need to click an object, I use:
ie.link(:index,"118").click

Or I change the POST into a GET to reach the same page, for example:
Change: ie.button(:xpath, "//div[@id='ContentWrapper']/div/div[3]/a/img").click
into:
ie.goto "http://1.2.3.4/search.do?para1=performance&pageNumber=1"

So anyway, I like Watir, but use XPath in Watir with caution; it is a little bit slow...

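2009 is a year to look forward to

After the eventful year of 2008, I do not really want to look back too much; if anything, I just have the feeling of wanting to get 2008 over with...
Winter is almost over; can spring be far behind?
For 2009, I am filled mostly with hope. I want to find a reason to tell myself: a new start, throw away all the unhappiness, and let us hit the road again...
Life will go on; what is different is our own attitude towards life :)
Happy 2009!! A brand New Life is Coming!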

I miss Sierra Nevada Beer

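I suddenly remembered the beer I had in a bar in San Jose the year before last: Sierra Nevada. Its color is different from the beer we usually drink, somewhat darker; it tastes bitter with a touch of sweetness and has the delicate fragrance of flower petals... (I do not know whether it is meant for ladies, haha, but it really is quite good, and not very expensive either)


Another thing that left a deep impression: before serving us, the waiter asked to see my passport. I had not brought it, so I asked: "I only have my Chinese ID card with me, will that do?" (Note that my ID card was not even a second-generation one; it was the oldest kind, and the print was not very clear.)
The waiter took a look and said he would ask the boss, then took the ID card to show it to the manager.
About five minutes later he came back with the beer menu, smiled at me and said: no problem, what would you like to drink... My colleagues all laughed at me, which shows I still look young...

It is the new year, and I wish everyone a new start. If this beer is sold in China, I would like to raise it in a toast with everyone! Cheers Up!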

Do not shout at your JBOD!

An interesting post by Brendan Gregg, who ran a rather unthinkable experiment on his JBOD disk array.

He found that when he shouted at the disks, disk I/O latency went up sharply... so he gives us a "takeaway": do not shout at your JBOD if you want high performance! :)

Here is the link to Brendan Gregg's experiment:
http://blogs.sun.com/brendan/entry/unusual_disk_latency

Class SimpleDateFormat is not thread safe

One tiny finding from our team this week, made while I was running a load test that includes a step where we fill the target date into a text box.

In our code we parse the date/time format with the SimpleDateFormat class. If threadA is parsing a value on a shared instance while threadB is changing the pattern, the error message below can happen "occasionally":
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:424)
at java.lang.Long.parseLong(Long.java:461)
at java.text.DigitList.getLong(DigitList.java:167)
at java.text.DecimalFormat.parse(DecimalFormat.java:1271)
at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1375)
at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1124)
at java.text.DateFormat.parse(DateFormat.java:333)
at com.XXXX.struts.DateConverter.convert(DateConverter.java:41)
at org.apache.commons.beanutils.ConvertUtilsBean.convert(ConvertUtilsBean.java:428)
So we simply changed the SimpleDateFormat object from a class variable to a local variable, and the problem was solved!

"This is a pitfall in the JDK" :) I am not sure how many of you have already realized it...
You can look at this interesting article by Brian Goetz:
http://www.ibm.com/developerworks/java/library/j-jtp09263.html

One thread lock issue solved, another rises...

Recently, while load testing our application, we took stack traces from the back end and found that 90% of the HTTP threads were blocked in this pattern:

"http-0.0.0.0-8080-12" daemon prio=6 tid=0x4bc5f620 nid=0xc54 waiting for monitor entry [0x4f4de000..0x4f4dfcec]
at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:81)
- waiting to lock <0x07e4b968> (a java.lang.ref.ReferenceQueue$Lock)
at java.io.ObjectStreamClass.processQueue(ObjectStreamClass.java:2206)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:253)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1035)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1375)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1347)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:302)
..........

Looking for help with a popular search engine, I came across this bug report:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6525425
It is exactly the same issue we met!!
We are using JDK 1.5.0_07 in our performance environment (BTW, we are not crazy about doing version upgrades :) )
This lock issue was fixed in JDK 1.5.0_14; see the release notes: http://java.sun.com/j2se/1.5.0/ReleaseNotes.html#150_14

After validation, we found that the issue really has been fixed: no thread is blocked in the previous pattern, and throughput improved by nearly 20% (under a relatively high user load). HOWEVER, another type of lock has appeared in almost 90% of all HTTP threads... Pattern 2:

"http-0.0.0.0-8080-48" daemon prio=6 tid=0x4dcc3808 nid=0xb0c waiting for monitor entry [0x5208c000..0x5208f9ec]
at org.jboss.metadata.WebMetaData.getRunAsIdentity(WebMetaData.java:511)
- waiting to lock <0x24e8c7e0> (a java.util.HashMap)
at org.jboss.web.tomcat.security.RunAsListener.instanceEvent(RunAsListener.java:67)
at org.apache.catalina.util.InstanceSupport.fireInstanceEvent(InstanceSupport.java:295)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:676)
at org.apache.catalina.core.ApplicationDispatcher.doInclude(ApplicationDispatcher.java:574)
at org.apache.catalina.core.ApplicationDispatcher.include(ApplicationDispatcher.java:499)
at org.apache.jasper.runtime.JspRuntimeLibrary.include(JspRuntimeLibrary.java:966)
at org.apache.jasper.runtime.PageContextImpl.include(PageContextImpl.java:602)
at org.apache.struts.tiles.TilesUtilImpl.doInclude(TilesUtilImpl.java:99)
at org.apache.struts.tiles.TilesUtil.doInclude(TilesUtil.java:135)
........
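
When a dump contains hundreds of threads, counting the "waiting to lock" lines per lock makes patterns like the two above easy to spot. A rough sketch follows, assuming the thread dump has been saved as text. The sample below is shortened from the traces in this post, and the second thread "http-0.0.0.0-8080-13" is invented for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   - waiting to lock <0x07e4b968> (a java.lang.ref.ReferenceQueue$Lock)
LOCK_LINE = re.compile(r"- waiting to lock <(0x[0-9a-fA-F]+)> \(a ([\w.$]+)\)")


def blocked_lock_counts(dump_text):
    """Count blocked threads per (lock address, lock class)."""
    counts = Counter()
    for match in LOCK_LINE.finditer(dump_text):
        counts[(match.group(1), match.group(2))] += 1
    return counts


def thread_count(dump_text):
    """Thread header lines in a dump start with the quoted thread name."""
    return sum(1 for line in dump_text.splitlines() if line.startswith('"'))


# Shortened sample; the thread ending in -13 is hypothetical.
SAMPLE_DUMP = '''
"http-0.0.0.0-8080-12" daemon prio=6 tid=0x4bc5f620 nid=0xc54 waiting for monitor entry [0x4f4de000..0x4f4dfcec]
    at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:81)
    - waiting to lock <0x07e4b968> (a java.lang.ref.ReferenceQueue$Lock)

"http-0.0.0.0-8080-13" daemon prio=6 waiting for monitor entry
    at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:81)
    - waiting to lock <0x07e4b968> (a java.lang.ref.ReferenceQueue$Lock)

"http-0.0.0.0-8080-48" daemon prio=6 tid=0x4dcc3808 nid=0xb0c waiting for monitor entry [0x5208c000..0x5208f9ec]
    at org.jboss.metadata.WebMetaData.getRunAsIdentity(WebMetaData.java:511)
    - waiting to lock <0x24e8c7e0> (a java.util.HashMap)
'''

total = thread_count(SAMPLE_DUMP)
for (address, lock_class), count in blocked_lock_counts(SAMPLE_DUMP).most_common():
    print("%s %s: %d of %d threads" % (address, lock_class, count, total))
```

Comparing these counts between a dump taken before a fix and one taken after shows quickly whether a contended lock disappeared or was just replaced by another one.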

We are now looking at the third-party source code and hope to solve it with some luck. One of the related source files:
http://www.java2s.com/Open-Source/Java-Document/EJB-Server-JBoss-4.2.1/tomcat/org/jboss/web/tomcat/security/RunAsListener.java.htm