Friday, May 20, 2016

Get Page Performance Results by Combining the User Timing and Resource Timing APIs

 // calculate page loading time via the Resource Timing API
 var perfEntries = performance.getEntries();
 // page load finish indicator: the request that marks the end of the page load
 var end_probe = perfEntries.filter(function(item) {
   return item.name.indexOf('/req_url_end') > -1;
 });
 // page load start indicator: the request that marks the beginning of the page load
 var start_probe = perfEntries.filter(function(item) {
   return item.name.indexOf('/req_url_start') > -1;
 });
 if (end_probe.length > 0 && start_probe.length > 0) {
   var end_time = end_probe[0].responseEnd;
   var start_time = start_probe[0].startTime;
   var duration = end_time - start_time;
   console.log(duration);
 } else {
   console.log("page loading start/end indicator not found, please double-check all performance entries");
 }
There is a better way: combine the User Timing API and the Resource Timing API to get accurate page performance numbers. I am using the Nightmare APIs in the sample below to run automated page tests, which helps our continuous, daily page performance testing process:

 var url = "http://www.yourhost.com/abc/def/";  
 var page_complete_idy = 'key_request_name'; //page indicator by resource name  
 var tag = 'ReviewPage';  
 var env_name = 'prod'; 

 const RENDER_TIME_MS = 2000;  

var Nightmare = require('nightmare'),
  nightmare = Nightmare({show: true, switches: {
    'ignore-certificate-errors': true
  }});

nightmare
  .goto(url)
  .wait(function(idy){
    var perfEntries = window.performance.getEntries();
    if (perfEntries.length > 20) {
      return perfEntries.some( function (item) {
        if (item.name.includes(idy)) {
          return true;
        }
      });
    } else {
      return false;
    }
  }, page_complete_idy)
  .evaluate(function(idy){
    var perfEntries = window.performance.getEntries();
    var perf_obj = perfEntries.find(function (item) {
      return item.name.includes(idy)
    });
    if (perf_obj) {
      return perf_obj.responseEnd.toFixed(1);
    } else {
      return 'undefined'
    }
  }, page_complete_idy)
  .end()
  .then( function (duration) {
    console.log(tag + ":" + duration);
  })
  .catch( function (error) {
    console.error(tag + ": page test failed: " + error);
  });

Thursday, May 14, 2015

Replaying Your [access] log by JMeter


Replaying the Apache access log with JMeter to mimic real user load.
This is inspired by the BlazeMeter post Learn How to Replay Your Production Traffic With JMeter, but I made my own optimizations and enhancements. Check it out if you are interested:

https://github.com/joychester/Doraemon

PS: the first row in the formatted log file is ignored, because it is only used to fetch the log's start timestamp.
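To illustrate the idea, here is a minimal Ruby sketch (not the Doraemon implementation; the log file name and format details are assumptions) of deriving relative replay offsets from the access-log timestamps, where the first row is only used as the start timestamp:

 require 'time'

 # matches the [28/Mar/2014:03:24:38 +0000] timestamp of a combined access log
 LOG_TS = /\[(\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]/

 lines = File.readlines('formatted_access.log')   # hypothetical formatted log file
 start_ts = Time.strptime(lines.first[LOG_TS, 1], '%d/%b/%Y:%H:%M:%S %z')

 lines.drop(1).each do |line|
   raw = line[LOG_TS, 1]
   next unless raw                                  # skip rows without a timestamp
   ts  = Time.strptime(raw, '%d/%b/%Y:%H:%M:%S %z')
   url = line[/"(?:GET|POST) (\S+)/, 1]
   # offset in seconds after the first log row; JMeter timers can then pace the replay
   puts format('t+%.0fs  %s', ts - start_ts, url)
 end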

Wednesday, February 11, 2015

Upgrade Ruby version on your Mac OS X

Finally I got a 15'' MacBook Pro as my working laptop; it is kind of a Chinese New Year gift :)
First, the OS ships with Ruby 2.0, which is a bit out of date, so upgrading to 2.2.0 is the first thing to do on my Mac.
The main steps I followed are from this guide.
However, after doing that, it did not work properly on my machine, so I will simplify the steps as follows:
  1. Install Homebrew: ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  2. Install rbenv to help you manage Ruby versions: brew install rbenv ruby-build
  3. Add the rbenv shims path to ~/.bash_profile: export PATH=/Users/cchi/.rbenv/shims:$PATH
  4. Open Terminal -> Preferences -> Shell -> Startup: source ~/.bash_profile
  5. List all Ruby versions available to install: rbenv install -l
  6. Install Ruby 2.2.0: rbenv install 2.2.0
  7. Set the global version to Ruby 2.2.0: rbenv global 2.2.0
  8. Check the Ruby version: ruby -v
Start to Taste your Ruby and Mac!!


Update: I found this link, which has great steps and explanations for installing Ruby with rbenv:
https://cbednarski.com/articles/installing-ruby/

Thursday, January 08, 2015

Scaffold Code on my github based on Structuring Sinatra Web Application



It is based on Sinatra's modular style and is more structured (and cleaner) than the classical-style code I wrote for my internal project back in 2013, which is why I rewrote the code and packaged it as a "framework".

Check it out on my GitHub.

A demo app using jQuery, Highcharts, Bootstrap and Sinatra modular-style code shows how to organize the code and the folder structure when using this "framework".
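For reference, here is a minimal sketch of the Sinatra modular style this scaffold is built on; the class name and routes below are illustrative, not taken from the Arowana code:

 require 'sinatra/base'
 require 'json'

 # a modular-style app: routes live inside a class, not at the top level
 class DemoApp < Sinatra::Base
   configure do
     set :views, File.expand_path('../views', __dir__)
   end

   get '/' do
     erb :index
   end

   get '/api/metrics' do
     content_type :json
     { page: 'home', load_time_ms: 1234 }.to_json
   end
 end

 # config.ru then just mounts the app:
 #   require './demo_app'
 #   run DemoApp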

Inspired by:
Structuring Sinatra Applications
Structuring Sinatra Apps 

Online Editor c9.io 

 If you are using c9.io as your IDE, you can use the following script to grab my code:
 require 'git'
 require 'fileutils'
 require 'sys/proctable'

 $: << File.expand_path(File.dirname(__FILE__))

 git_repo = 'https://github.com/joychester/Arowana.git'
 target_dir = './arowana'

 # always start from a fresh clone
 FileUtils.remove_entry(target_dir) if Dir.exist?(target_dir)
 Git.clone(git_repo, target_dir)

 # run 'bundle install' and 'rackup config.ru' inside the cloned repo
 Dir.chdir(target_dir) do
   `bundle install`
   # check whether the postgresql service is running
   pg_service = Sys::ProcTable.ps.select { |process|
     process.cmdline.to_s.include?('postgres')
   }
   if pg_service.empty?
     p 'please check whether your postgresql service is running, exiting...'
     exit(1)
   else
     p 'ready to start your Arowana App'
     `rackup config.ru -p $PORT -o $IP`
   end
 end

Sunday, December 28, 2014

Automated WebPageTest using "snowboard"

I have pushed my project "snowboard" to my GitHub; check it out if you want to see whether it is helpful for your daily synthetic front-end performance testing:
https://github.com/joychester/snowboard

Thanks to WebPageTest; from now on, you can request your own API key from: http://www.webpagetest.org/getkey.php

You can write your own dashboard, or store the results in MongoDB, PostgreSQL, etc. for page trending and further analysis. You can also define your own perceived page load time from the filmstrip, which is an existing approach to redefine page load time for highly dynamic web pages.
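For reference, here is a minimal Ruby sketch (not the snowboard code itself) of driving the WebPageTest REST API; the API key and test URL are placeholders, and the polling interval is arbitrary:

 require 'net/http'
 require 'json'
 require 'uri'

 api_key  = 'YOUR_WPT_API_KEY'                      # from http://www.webpagetest.org/getkey.php
 test_url = 'http://www.yourhost.com/abc/def/'

 # submit the test and ask WebPageTest for a JSON response
 submit = URI('http://www.webpagetest.org/runtest.php?' +
              URI.encode_www_form(url: test_url, k: api_key, f: 'json'))
 test_id = JSON.parse(Net::HTTP.get(submit))['data']['testId']

 # poll until the result is ready, then pull a couple of metrics for trending
 result_uri = URI("http://www.webpagetest.org/jsonResult.php?test=#{test_id}")
 loop do
   result = JSON.parse(Net::HTTP.get(result_uri))
   if result['statusCode'] == 200
     first_view = result['data']['median']['firstView']
     puts "loadTime=#{first_view['loadTime']}ms SpeedIndex=#{first_view['SpeedIndex']}"
     break
   end
   sleep 30                                         # test still queued or running
 end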


Monday, December 15, 2014

HTTP1.0 and HTTP1.1 Performance with KeepAlive enabled


A recent misconfiguration in Apache's ssl.conf gave me the chance to test the performance difference between HTTP/1.1 and HTTP/1.0 with KeepAlive on; the setting had actually been there for years...

Pic1: Shows the HTTP/1.1 with KeepAlive ON performance over time, stable and fast.

Pic2: Shows the HTTP/1.0 with KeepAlive ON performance over time, up and down.

The current setting in ssl.conf, which makes all IE user agents get HTTP/1.0 responses:

SetEnvIf User-Agent ".*MSIE.*" \
         nokeepalive ssl-unclean-shutdown \
         downgrade-1.0 force-response-1.0

To fix the issue, apply the workaround only to IE 1-6, which may actually have the issues, instead of to all IE user agents (it is said to be fixed already in the latest Apache versions):

SetEnvIf User-Agent ".*MSIE [1-6].*" \
         nokeepalive ssl-unclean-shutdown \
         downgrade-1.0 force-response-1.0

PS: I also tested with KeepAlive turned off; the response times of HTTP/1.0 and HTTP/1.1 were similar, but 3-4 times slower than with KeepAlive on, due to the extra handshakes.
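As a rough way to see the handshake cost yourself, here is a hedged Ruby sketch using net/http (the host is a placeholder and the numbers will vary with network latency): one loop reuses a single keep-alive connection, the other opens a new connection per request.

 require 'net/http'
 require 'benchmark'

 host = 'www.yourhost.com'      # placeholder host
 n = 20

 # one persistent (keep-alive) connection reused for every request
 keepalive = Benchmark.realtime do
   Net::HTTP.start(host, 443, use_ssl: true) do |http|
     n.times { http.get('/') }
   end
 end

 # a brand-new TCP + TLS handshake for every request
 fresh = Benchmark.realtime do
   n.times do
     Net::HTTP.start(host, 443, use_ssl: true) { |http| http.get('/') }
   end
 end

 puts format('keep-alive: %.2fs, new connection per request: %.2fs', keepalive, fresh)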

Monday, November 10, 2014

Tweaking your load generator machine if you are using the Windows platform


I have been doing this for a while; recently someone came to me and asked about the same issue they had noticed, so here is the story:
Sometimes you may notice that performance results on a Windows load generator carry an extra ~200ms of latency compared with previous results; that may be due to the following reason (AKA the Nagle algorithm):

http://en.wikipedia.org/wiki/Nagle%27s_algorithm

How to fix the problem: enable TCP no delay on the client side (TCP_NODELAY):

http://www.justanswer.com/computer/3du1a-rid-200ms-delay-tcp-ip-ack-windows.html
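For illustration, here is a minimal Ruby sketch of what setting TCP_NODELAY on a client socket looks like (the host is a placeholder); most load tools expose an equivalent switch in their HTTP client or socket settings:

 require 'socket'

 sock = TCPSocket.new('www.yourhost.com', 80)
 # disable the Nagle algorithm so small writes are sent immediately
 sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)

 sock.write("HEAD / HTTP/1.1\r\nHost: www.yourhost.com\r\nConnection: close\r\n\r\n")
 puts sock.readline          # status line of the response
 sock.close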

Meanwhile, you may also want to tweak/increase your dynamic TCP/UDP port range to support more concurrent requests and avoid port exhaustion:
https://docs.microsoft.com/en-us/windows/client-management/troubleshoot-tcpip-port-exhaust

Check the IPv4 TCP dynamic port range:
PS C:\WINDOWS\system32> netsh int ipv4 show dynamicport tcp 

Set the IPv4 TCP dynamic port range (requires administrator rights):
PS C:\WINDOWS\system32> netsh int ipv4 set dynamicport tcp start=10000 num=20000

P.S. Meanwhile, if delayed TCP ACK is configured on the server side, you may consider disabling it by using the TCP_QUICKACK socket option, since it can add another 200ms of delay before ACKs are sent back to the client.

Sunday, March 30, 2014

Socket read timeout issue -- A Pattern with GC Activity



There may be several patterns behind a socket read timeout between the client and the server, but this is one pattern I want to share:

Pattern A Description:
As we know, GC stops the world (different GC collectors behave differently: https://www.cubrid.org/blog/3826410 and https://www.cubrid.org/blog/3826519).

When the "world is stopped", JBoss (Tomcat) stops executing application threads and stops accepting incoming connections; only the GC threads keep doing their cleanup work...

Meanwhile, if the Apache web server tries to establish a connection to JBoss (Tomcat) over the AJP protocol, it can easily hit the 200-second socket timeout (which we define in workers.properties), and you will find the error entries in mod_jk.log:
PS: we run one Apache and one JBoss on the same host; I am borrowing cubrid's nice picture, but we are using the worker MPM instead of prefork:

 

Reproduce this scenario:
  Steps to reproduce this socket_timeout issue:

  •  Kick off load testing
  •  Manually trigger a Full GC with jvisualvm; the JMeter tree view will then show the Socket read timeout error


Jmeter_log:
GET https://hostname/help/services/popUp?nodeDesc=param1
Request Headers:
Connection: keep-alive
User-Agent:  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36 perfheader=4xlr3puk

Apache_Access_log:
10.80.8.59 - [28/Mar/2014:03:24:38 +0000] "GET /help/services/popUp?nodeDesc=param1 HTTP/1.1" 200 58 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36 perfheader=4xlr3puk" + requestTimeMicroS=200606264 xforwarded=10.80.8.59

Mod_jk_log: (PS: the timestamp is 200 seconds after the request was sent out)
[Fri Mar 28 03:27:58 2014] [20266:1183357248] [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1274): (worker1) can't receive the response header message from tomcat, network problems or tomcat (127.0.0.1:8009) is down (errno=11)
[Fri Mar 28 03:27:58 2014] [20266:1183357248] [error] ajp_get_reply::jk_ajp_common.c (2118): (worker1) Tomcat is down or refused connection. No response has been sent to the client (yet)
[Fri Mar 28 03:27:58 2014] [20266:1183357248] [info] ajp_service::jk_ajp_common.c (2607): (worker1) sending request to tomcat failed (recoverable),  (attempt=1)
[Fri Mar 28 03:27:59 2014] worker1 hostname 200.610982

Current solutions to reduce such timeout issues:
1. JVM tuning: leverage the CMS GC collector to reduce GC timing (make the stop-the-world pauses as short as possible, and GC less frequently)
2. Enable CPing/CPong in workers.properties when AJP is used between Apache and JBoss (detect broken pipes and avoid handshake failures in advance; see the configuration sketch after the note below)
3. For the JBoss 4 role in particular, we need to get rid of the TCP CLOSE_WAIT connection problem introduced by ping mode, by replacing the AJP processor with the JBoss Web Native connector (http://www.jboss.org/jbossweb/downloads/jboss-native-2-0-10)
4. Per a mod_jk bug reported to Apache, replace socket_timeout with socket_connect_timeout and activate ping mode with proper timeouts (https://issues.apache.org/bugzilla/show_bug.cgi?id=49468); socket_connect_timeout specifies the timeout of the TCP connect phase from Apache to JBoss over the AJP protocol

      Why setting CPing and CPong in workers.properties is important:
"No CPing/CPong set
The CPing/CPong property in mod_jk is the most important worker property setting, allowing mod_jk to test and detect faulty connections. Not setting this parameter can lead to bad connections not being detected as quickly, which can lead to web requests behaving as if 'hung'." (https://issues.apache.org/bugzilla/show_bug.cgi?id=49468)
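For illustration, here is a sketch of what such a workers.properties could look like; the worker name matches the logs above, but the host, port and timeout values are placeholders rather than our production settings:

 worker.list=worker1

 worker.worker1.type=ajp13
 worker.worker1.host=127.0.0.1
 worker.worker1.port=8009

 # probe the AJP connection with CPing/CPong: C=connect, P=prepost, I=interval, A=all
 worker.worker1.ping_mode=A
 # milliseconds to wait for the CPong reply
 worker.worker1.ping_timeout=10000

 # timeout only the TCP connect phase from Apache to JBoss (milliseconds)
 worker.worker1.socket_connect_timeout=5000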

Next Step:
  •      Find some typical socket timeout cases on PROD
  •      Compare different configurations/Settings during local PE test to see the effect
  •      Test and Learn...

Sunday, February 09, 2014

Lessons learned during one of my recent projects

1. Prepare good planning and a clear target in advance, and make sure we share a common purpose

2. Understand the environments/software/settings (design, mechanical sympathy, VMware DRS, gateway throttling/replication mechanism); do some research before doing any testing

3. Keep tests simple but straightforward; complex ones make things complicated and hard to narrow down the problem

4. Clean up the environment

5. Get a repeatable, accurate and detailed baseline first; do not rush into any optimization before a repeatable baseline is captured

6. Get full real-time monitoring, even on the load generators

7. Make changes one by one, not all at once; do not mix things up, and keep a checklist of all the changes we have made

8. Logs are not free; turn DEBUG logs off if nobody looks at them, and if we really need them, make them INFO

9. Check the disk space when you run multiple rounds and write tons of logs; response time can suddenly slow down

10. Visibility into the VMX hosts is critical for anyone using VMware for PE tests: get to know how your hosts are distributed, keep resource utilization (CPU, IO, memory) balanced and away from each host's ceiling, and disable the DRS feature for the PE cluster, since it is not good for getting repeatable data; instead, optimize the hosts based on observations from monitoring host utilization

11. Do not blindly trust open source; you need to dig into it when you are using it intensively!!