Tuesday, January 04, 2022

API Performance Testing in k6 during the development phase

 

General Goal -> Find performance bottlenecks and regressions by simply...

  • Running API-level tests
  • Measuring the key performance indicators
  • Analyzing the performance results and trends
  • Isolating external dependencies if needed (focus on your own code rather than anything outside your control)

In this wiki, we will adopt k6.io as the performance/load testing tool, since it is easy to set up and run locally. We will also create a complete monitoring system to visualize the test results together with essential JVM performance metrics. To isolate external dependencies, we will create a Docker-based mock service, so that we can control the response time and customize the response body to simulate different scenarios with minimal effort.

If you want to run scenario-based performance tests against an integration environment such as staging, I would recommend JMeter, which is a more comprehensive and mature tool, but that is outside this wiki's scope. We are also not going to cover stress, soak, or capacity tests, since they need a more standard environment (one that mimics production or is scaled proportionally), a different test strategy, and a thoughtful plan focused on what we want to achieve with each experiment. The good news is that once you understand the basics of performance testing, the other types of tests become much easier to understand.

 

What I talk about when I talk about performance

My Daily life about Performance Engineering Cycle:

Performance is a generic term, and it is difficult to give it a concrete definition from a single perspective. Performance issues can be caused by one or many factors; you may spend a lot of time finding the right pieces and clues, or even use an educated guess, to isolate the factors, prove your findings, and resolve the issues. That's why performance issues are always hard, and why some nerds are so obsessed with troubleshooting them.

 

Why Local Performance test? (AKA, Unit test for API Performance)

The local environment is a great treasure (any project that cannot easily be set up in a local environment should be retired, seriously)! It’s where we should write our load test scripts and where we should initiate our load tests from. When I say "local", I am referring not only to your own desktop or laptop, but to any environment that you fully control and can manage and change without any impact on others.

Pros:

  • Easy to control
  • Flexible to manage your dependencies
  • Easy to set up, and tests are cheap to run

Cons:

  • Hardware Spec limitation
  • Hard to compare with previous baseline
  • Difficult to simulate the complex scenario

 

K6 Local Env Setup:

To install k6 on macOS, simply run the following command:

brew install k6

If you are using another OS to run the tests, please refer to this link

 

Create a Simple API Test script using javascript and k6 lib:

API-level performance testing is supposed to be simple and straightforward, so that developers can run it easily and often whenever they make changes.


 

k6.io uses JavaScript as its scripting language and is built on Go. For detailed usage of k6.io, you can start with the k6 documentation

In general, a k6 test script contains at least a few of the following blocks:

  • import used libs
  • define global const variables
  • define customized metrics/checks
  • define test running configs
  • setup code function, runs once for the whole test before the VUs start, e.g. to prepare test data (optional)
  • VU test code function, the scenario/steps for each VU
  • teardown code function, runs once for the whole test before it ends/shuts down (optional)

To simplify what I mentioned above, we will use the following API test script as a template. It provides the essential elements and components needed to run a local perf test; for example, name your test script sample_script.js:

 

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate } from 'k6/metrics';
 
const SLEEP_DURATION = 0.2;
const PROTOCOL = "https"
const HOST_NAME = "test-api.k6.io";
 
//Define custom metrics
let successRate = new Rate("check_success_rate");
 
//Test running configs
export let options = {
  discardResponseBodies: false,
  userAgent: 'MyK6UserAgentString/1.0',
  scenarios:{
    http_get_api_3RPS: {
      executor: 'constant-arrival-rate', // use open model instead of close model
      rate: 3, // 3 RPS
      timeUnit: '1s',
      duration: '30s',
      preAllocatedVUs: 5,
      maxVUs: 15,
      startTime: '0s', // config stage tests
    },
    http_get_api_5RPS: { // renamed: a duplicate key would silently replace the 3 RPS scenario
      executor: 'constant-arrival-rate', // use open model instead of close model
      rate: 5, // 5 RPS
      timeUnit: '1s',
      duration: '30s',
      preAllocatedVUs: 5,
      maxVUs: 15,
      startTime: '31s', // config stage tests
    },
  },
  thresholds: {
    http_req_duration: ['p(90) < 250'],
    'check_success_rate': [{
      threshold: 'rate > 0.95',
      abortOnFail: true,
      delayAbortEval: '15s'}],
  }
}
 
//Setup code: runs once before the VUs start; its return value is passed to teardown()
export function setup() {
  console.log("Init Testing..." + new Date().toLocaleString());
  return Date.now();
}
 
//VU test code
export default function() {
  // Send out the API
  const response = http.get(`${PROTOCOL}://${HOST_NAME}/public/crocodiles/?format=json`, {
    cookies: { my_cookie: "123456" },
    headers: { 'X-MyHeader': "apitest" },
    timeout: "15s",
    compression: "gzip, deflate, br",
    tags: {name: 'APINAME--GET'},
  });
 
  // Assert the response
  const checkResp = check(response, { // can be a combination assertion
    "response code is 200": (resp) => resp.status === 200,
    "content is present": (resp) => resp.body.includes("Bert"),
  });
 
  successRate.add(checkResp);
 
  // Simulate the think time
  sleep(Math.random() * SLEEP_DURATION);
}
 
//TearDown code
export function teardown(data) {
  console.log(`Test duration: ${ Date.now()- data }ms`);
}
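
To make the two thresholds in the script more concrete, here is a minimal plain-JavaScript sketch of what `p(90) < 250` and `rate > 0.95` are asserting. It uses a nearest-rank percentile, which is an assumption for illustration only; k6 computes its statistics internally and may interpolate differently. The `percentile` and `passRate` helpers and the sample numbers are hypothetical.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Share of truthy results, comparable to what a Rate metric reports.
function passRate(results) {
  if (results.length === 0) return NaN;
  return results.filter(Boolean).length / results.length;
}

// Made-up samples: ten response times (ms) and ten check outcomes.
const durationsMs = [120, 135, 140, 150, 160, 180, 190, 210, 240, 400];
const checks = [true, true, true, true, true, true, true, true, true, false];

const p90 = percentile(durationsMs, 90);
const rate = passRate(checks);
console.log(`p(90)=${p90}ms, threshold met: ${p90 < 250}`);
console.log(`rate=${rate}, threshold met: ${rate > 0.95}`);
```

With these samples the latency threshold holds while the success-rate threshold does not, which is the situation where `abortOnFail` would stop the run.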


During the scripting phase, we prefer to parameterize the test data, so that we avoid hitting caches and simulate real-world scenarios. The typical methods for doing this are described at https://k6.io/docs/examples/data-parameterization/ or you can refer to one of the sample scripts I wrote in the git repo
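
As a rough illustration of data parameterization, the sketch below picks a different data row for each iteration so that consecutive requests do not keep hitting the same cached entry. The `users` rows, the `/users/` path, and the `pickRow` and `buildUrl` helpers are all hypothetical; in a real k6 script the rows would typically be loaded once into a `SharedArray` (from `k6/data`) and the index could come from `exec.scenario.iterationInTest` (from `k6/execution`).

```javascript
// Hypothetical test data; in k6 this would usually be loaded from a CSV/JSON file.
const users = [
  { id: 1, name: 'alice' },
  { id: 2, name: 'bob' },
  { id: 3, name: 'carol' },
];

// Round-robin selection: iteration N uses row N modulo the number of rows.
function pickRow(rows, iteration) {
  if (rows.length === 0) throw new Error('no test data rows');
  return rows[iteration % rows.length];
}

// Build a per-iteration URL so that consecutive requests vary.
function buildUrl(base, row) {
  return `${base}/users/${encodeURIComponent(row.id)}`;
}

// Simulate four iterations; the fourth wraps around to the first row.
for (let i = 0; i < 4; i++) {
  console.log(buildUrl('https://example.test', pickRow(users, i)));
}
```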

In some use cases, the target API needs another API's output as its input; this is called correlation. For example, we can extract data from a previous API response body and use it as an input parameter for the API we most want to measure. k6 can parse the response body and grab what you need for further steps (make sure your running config has discardResponseBodies: false). More examples of correlation: https://k6.io/docs/examples/correlation-and-dynamic-data/
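
A minimal sketch of the correlation step, written as plain functions so that it can be tried outside k6. The response shape (a JSON array of objects with an `id` field) and the detail path are assumptions for illustration; in a k6 script, `body` would be `response.body` from the first `http.get` call, and the returned URL would feed the second call.

```javascript
// Parse the first response and return the id of the first item, or null if absent.
function extractFirstId(body) {
  let items;
  try {
    items = JSON.parse(body);
  } catch (err) {
    return null; // body was not valid JSON
  }
  if (!Array.isArray(items) || items.length === 0) return null;
  return items[0]?.id ?? null;
}

// Compose the follow-up request URL from the extracted value (path is assumed).
function detailUrl(base, id) {
  return `${base}/public/crocodiles/${id}/`;
}

// Sample payload with an assumed shape.
const listBody = '[{"id":1,"name":"Bert"},{"id":2,"name":"Ed"}]';
const id = extractFirstId(listBody);
console.log(id === null ? 'nothing to correlate' : detailUrl('https://test-api.k6.io', id));
```

Returning `null` instead of throwing keeps the VU code simple: the script can record a failed check and skip the dependent request.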

Recommendation: in local performance testing, we should avoid as many dependencies as possible, using mock services or generated "fake data" to remove them. Focus on your code and design first!

To run your test script locally once it is ready, cd to your test script folder and execute the following command. Usually you start with a smoke test to make sure your script has no errors or unexpected results:

k6 run sample_script.js

Once the script is ready for load testing, you can tweak the test running configs in the script, or override some critical configs through the CLI, to meet your load target.
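
One possible way to make the load target overridable is to build the scenario from environment variables. The sketch below is an assumption about how you might structure it, not a k6 convention: the variable names `RATE`, `DURATION`, and `ITER_SECONDS` and both helpers are hypothetical. The VU estimate follows Little's law (arrival rate times expected iteration time), and the factor of 3 for `maxVUs` is arbitrary headroom.

```javascript
// Estimate concurrent VUs for an open-model scenario: rate x expected iteration time.
function estimateVUs(ratePerSec, iterationSeconds) {
  return Math.max(1, Math.ceil(ratePerSec * iterationSeconds));
}

// Build a constant-arrival-rate scenario from an env-like object (pass __ENV in k6).
function buildScenario(env) {
  const rate = Number(env.RATE || 3);
  const duration = env.DURATION || '30s';
  const iterationSeconds = Number(env.ITER_SECONDS || 1);
  const vus = estimateVUs(rate, iterationSeconds);
  return {
    executor: 'constant-arrival-rate',
    rate,
    timeUnit: '1s',
    duration,
    preAllocatedVUs: vus,
    maxVUs: vus * 3, // arbitrary headroom for slow responses
  };
}

console.log(JSON.stringify(buildScenario({ RATE: '10', DURATION: '1m' }), null, 2));
```

In a k6 script this could be wired up as `scenarios: { api: buildScenario(__ENV) }` and driven with `k6 run -e RATE=10 -e DURATION=1m sample_script.js`.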

After all, we want to smell our own API and gain confidence before we submit our commits and go to prod, where we monitor the API with something (smile)

Some typical use case examples: https://k6.io/docs/examples/

K6 API documentations: https://k6.io/docs/javascript-api/

 

Test result visualization: 

I prefer to use InfluxDB + Grafana to store and visualize test results over time, so that you can easily notice changes and when things started to go wrong, and easily compare runs from time to time.

Install InfluxDB on macOS. At the time of writing, k6 does not support InfluxDB 2.0, so we will keep using InfluxDB 1.8 until 2.0 support is officially added:

brew install influxdb@1

Start the InfluxDB instance locally (in background mode); by default it listens on port 8086 for exchanging data:

brew services start influxdb@1
or
nohup /usr/local/opt/influxdb@1/bin/influxd &

To run the k6 test and store the test data in the local InfluxDB instance (the following example creates the "myk6db" database automatically):

k6 run --out influxdb=http://localhost:8086/myk6db sample_script.js

Install Grafana on Mac OS:

brew install grafana

Start Grafana service:

brew services start grafana

Open your local Grafana page at http://localhost:3000/ and enter admin for both the username and the password.

Next, add the InfluxDB myk6db datasource and create your own dashboard to visualize the k6 test results:

(If you would like to add a Grafana panel plugin to build a fancy dashboard, you can download the plugin folder and drop it into the Grafana plugin folder: /usr/local/var/lib/grafana/plugins/)

I have defined a basic k6 test Grafana dashboard for anyone to import as a quick start; feel free to download it from my github repo
P.S. I highly recommend running the baseline right before you make your changes rather than comparing against an out-of-date "baseline", since things can change in a local env.
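
To make the baseline comparison less of an eyeball exercise, a small helper can flag regressions once you have the numbers from both runs. This is only a sketch: the metric names, the sample values, and the 10% tolerance are made up, and it assumes lower is better (latency-style metrics).

```javascript
// Flag a regression when the current value exceeds the baseline by more than tolerancePct percent.
function compareToBaseline(baseline, current, tolerancePct) {
  const report = {};
  for (const metric of Object.keys(baseline)) {
    if (!(metric in current)) continue; // metric missing from this run
    const changePct = ((current[metric] - baseline[metric]) / baseline[metric]) * 100;
    report[metric] = {
      changePct: Math.round(changePct * 10) / 10, // one decimal place
      regressed: changePct > tolerancePct,
    };
  }
  return report;
}

// Made-up numbers from a fresh baseline run and a run with the change applied.
const baseline = { p90_ms: 200, p95_ms: 260 };
const current = { p90_ms: 230, p95_ms: 265 };
console.log(compareToBaseline(baseline, current, 10));
```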

Monitoring Your (Java) application:


Monitoring a local Java process is easy nowadays. I recommend using Java Mission Control (JMC) and Java Flight Recorder (JFR), which were developed by the Oracle Java team. You can download the latest version of JMC separately from here, along with instructions on how to start JMC. Another option is VisualVM, one of my previous favorite monitoring tools for the JVM.

Configure the VM options of your Java application correctly; just make sure you copy the same JVM options that are currently in use in production for your own role.

If the application is newly developed, you can configure the options yourself or use the following simple template to get started; if GC overhead turns out to be a bottleneck, you will have to revisit and tune it. If GC throughput is above 99.5% (which means GC takes less than 0.5% of your whole test time), you normally do not need to bother with JVM options. Keep the JVM options to a minimum, and make sure you fully understand the impact of an option before you add it.
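
The 99.5% rule of thumb is simple arithmetic, sketched here in plain JavaScript. The helper name and the sample numbers are made up; in practice the total pause time would come from your GC logs or from JMC.

```javascript
// GC throughput = share of wall-clock time NOT spent in GC pauses, as a percentage.
function gcThroughputPct(totalGcPauseMs, testDurationMs) {
  if (testDurationMs <= 0) throw new Error('test duration must be positive');
  return (1 - totalGcPauseMs / testDurationMs) * 100;
}

const testDurationMs = 10 * 60 * 1000; // a 10-minute run (made-up number)
const totalGcPauseMs = 1800;           // sum of all GC pauses in that run (made-up number)
const throughput = gcThroughputPct(totalGcPauseMs, testDurationMs);

console.log(`GC throughput: ${throughput.toFixed(2)}%`);
console.log(throughput >= 99.5 ? 'GC is unlikely to be the bottleneck' : 'GC tuning may be worth a look');
```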

G1 GC is my general recommendation if you are on JDK8+; however, if you are on JDK11+ (ZGC) or JDK12+ (Shenandoah), you may want to compare the newly added collectors with G1 GC. Assume you have at least 16GB RAM on your local machine and you wish to size your Java heap at 4GB:


-Xms4096m
-Xmx4096m
-Xss256k
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:+DisableExplicitGC
-XX:+UseStringDeduplication
-XX:+ParallelRefProcEnabled
-XX:MaxMetaspaceSize=512m
-Djava.rmi.server.hostname=192.168.0.xxx

 P.S. The -Djava.rmi.server.hostname VM option needs to be added to your Java application to let JMC or VisualVM connect to this host; otherwise, you may get the following error when trying to connect to the JMX server:

...
Caused by: java.rmi.ConnectException: Connection refused to host: <Some_else_IP>; nested exception is:
    java.net.ConnectException: Operation timed out (Connection timed out)
...

Pay attention: if you are connected to your VPN, you might have a separate IP address to connect to. Run the following command on your local machine:

% ifconfig | grep "inet "


It will show you the IP addresses you could use; if you cannot decide which one to use, try each until the connection succeeds.

How to use JMC to monitor, or JFR to profile and analyze, your Java application is outside this wiki's scope; please find out more here. For JFR, you need to add additional VM options to enable it. Please make sure you do not enable the JFR VM options in a production env, since that requires an additional commercial license and adds some overhead to your services; alternatively, use OpenJDK JMC and JFR for free (you need OpenJDK 11+).

The Key Java performance metrics you need to pay attention to:

  • JAVA CPU%
  • Machine CPU%
  • Heap Memory Usage/Footprint
  • Non-Heap Memory Usage/Footprint
  • GC throughput, GC timings and GC Frequency
  • Java Threads count/trend
  • JDBC Connection Stats
  • System Level performance metrics(collect separately, but on local testing, it is optional)

P.S. I highly recommend saving the key JMX metrics to InfluxDB during local testing, so that you get a historical view and can compare how things change over time. You can use jmxtrans together with jmxtrans-output-influxdb to export important JVM metrics to InfluxDB and visualize them in Grafana.

  • Install jmxtrans on Mac OS : 

    brew install jmxtrans
  • Add the JVM options to expose the JMX port: 

    -Dcom.sun.management.jmxremote.port=9426
    -Dcom.sun.management.jmxremote.authenticate=false
    -Dcom.sun.management.jmxremote.ssl=false
  • Add the JVM option to define the hostname for connecting to the JMX server: 

    -Djava.rmi.server.hostname=<Local_IP_Address>
  • Define jmxtrans configuration file, for example, save the file as "~/Tools/k6_Loadtest/jmxtrans_config/jmxconfig.json": 

    {
      "servers": [
        {
          "port": "9426",
          "host": "<Local_IP_Address>",
          "runPeriodSeconds": "10",
          "queries": [
            {
              "obj": "java.lang:type=Memory",
              "attr": ["HeapMemoryUsage", "NonHeapMemoryUsage"],
              "resultAlias": "jvmMemory",
              "outputWriters": [
                {
                  "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
                  "username": "admin",
                  "password": "admin",
                  "database": "jmxDB",
                  "tags": {
                    "application": "demoApp"
                  }
                }
              ]
            }
          ]
        }
      ]
    }
  • Start jmxtrans process: 

    /usr/local/opt/jmxtrans/bin/jmxtrans ~/Tools/k6_Loadtest/jmxtrans_config/jmxconfig.json

    By default, jmxtrans collects the JMX metrics defined in the JSON config file once per minute. For production monitoring that is good enough, but for local performance testing it is better to adjust it to one collection every 10 seconds for finer granularity. Once it is set up, it's time to create a Grafana dashboard that shows the JMX metrics together with the k6.io test data. It helps a ton in understanding your tests and the application under load.

Create a mock service (optional but highly recommended):

Having external services mocked is quite helpful and will make your life much easier:

  • it saves the time of finding a workable and stable environment;
  • you can focus on your own code;
  • test results are more predictable and repeatable.

Since you are working in a local env that you fully manage, it is your choice whether to use an external mock service (such as Mockoon) or just comment out some of the code to make your test work. I would suggest simulating the remote connection as closely as possible, since it helps to reproduce the threads, memory usage, and network connections of real use cases.

In this section, I will create a dummy mock service using Docker/Golang and the Caddy HTTP server in order to simulate different REST API HTTP methods, payloads, and response times.


 
The sample code is in my github repo for reference

Preparations:

How to build and run dummy-mock service:

  • clone the github repo to your local machine
  • cd /path/to/target/dir/with/dockerfile
  • define your own response-GET.json and response-POST.json files
  • docker build . -t dummymock
  • docker images
  • docker run -d --rm -p 9091:80/tcp dummymock
  • /path/to/caddydir/caddy start (note: make sure the current dir has the predefined Caddyfile, so caddy will auto-load the config file)
  • Use postman or curl to try the mock service with your HTTP method + the response duration you want the mock service to simulate, for example: http://localhost:8020/?duration=200

Note:

  • If you want to support the https protocol, you can dig into the caddy documentation and config to enable it.
  • By default, the mock supports only a few JSON response payloads; if you need more, it is easy to extend by adding them to the source (main.go) and rebuilding it.

Fine Tuning your OS (Optional):

Make sure your desktop or laptop is not the bottleneck while running your performance test. If that is the case, consider fine-tuning your OS first; if nothing works, consider adopting dedicated load generators to help with the test, although that makes testing less flexible, so it is a trade-off. Do remember: focus on your code first; no one cares if you do not even care.

 

Install xk6, the k6 extension modules(Optional):

  • Make sure you have Go installed
  • You can download binaries that are already compiled for your platform
  • Extract them into a local directory and go to that directory
  • If you are using macOS, right-click and open with Terminal to grant permission to run xk6 locally
  • Select the xk6 extensions you want to try; for example, to run your k6 test with CSV parser functionality, use: https://github.com/szkiba/xk6-csv
  • Run the command to build k6 with the extensions you selected: xk6 build --with github.com/szkiba/xk6-csv
  • It will generate a new k6 binary in the same folder; run the test with the following command:

    ./k6 run test.js

 

Wednesday, February 20, 2019

How to List all the JVM options defaults

$ java -XX:+PrintFlagsFinal -version | grep ParallelGCThreads
    uintx ParallelGCThreads                         = 8                                   {product}

Friday, May 18, 2018

How to check your linux host memory consumption

ps aux  | awk '{print $6/1024 " MB\t\t" $11 " PID\t" $2}'  | sort -n

Thursday, January 04, 2018

Good posts to share on Detecting Connection Leak and How to test connection leak

In case you see the following exception:
"org.apache.http.conn.ConnectionPoolTimeoutException: 
Timeout waiting for connection from pool at 
org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:412) at 
org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:298) at 
org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:238) at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:422) at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106) at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) at 
...... " 
Here are some good posts on detecting HTTP connection leaks and how to test for them:
Detect HTTP Connection leak Post: http://phillbarber.blogspot.com/2014/02/lessons-learned-from-connection-leak-in.html
How to test Connection leak: http://phillbarber.blogspot.co.uk/2015/02/how-to-test-for-connection-leaks.html

Wednesday, November 22, 2017

How to find High CPU% Threads in Jboss through JMX Console

1. Open your JBoss JMX Console page
2. Find the Path : [jboss.system]: [ServerInfo] : [listThreadCpuUtilization]
3. Then you can Dump your threads by listThreadDump() operation to find out what the threads are doing

Wednesday, October 18, 2017

NMT and Java Native memory leak

The memory used by a Java application process usually includes the JVM heap, non-heap (PermGen/Metaspace), and native memory, which covers JVM internals and native OS libs. We noticed that our physical memory ran out after the application had been running for a while, even though the heap usage was fine and normal. Using the top command, we found the process had eaten almost all the physical memory, far more than the -Xmx heap size we had assigned, so my first thought was that native memory might be the trouble...
But how do we confirm that native memory is the culprit?
Use -XX:NativeMemoryTracking=summary to help (available since JDK 7u40)
After you add the above option to your JVM startup config file, make sure you have root permission or switch to the root user, then run the following command, for example:
sudo -u {UID} /opt/java/bin/jcmd {PID} VM.native_memory baseline

After the test has been running for a while, run another command to show the difference compared with the baseline:
sudo -u {UID} /opt/java/bin/jcmd {PID} VM.native_memory summary.diff

P.S. If you do not use sudo -u {UID}, you may get exceptions like the following:

java.io.IOException: Operation not permitted

or
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
This gives you a high-level idea of whether you have a native memory leak, but not which object is causing the trouble. Since NMT doesn't track memory allocations by non-JVM code, you can use jemalloc / pmap to detect memory leaks in native code. A few good posts for your reference: http://jenshadlich.blogspot.com/2016/08/find-native-memory-leaks-in-java.html
or http://lysu.github.io/blog/2015/02/02/how-to-deal-with-non-heap-or-native-memory-leak/

Tuesday, March 07, 2017

Questionnaire Template for API Performance Review Process


  • API Owner: Product Owner, Dev Lead and QA Lead
  • Release Target Date
  • Business Impact(GMS or Save Cost), Impacted flows and User types
  • How much traffic expected During Peak hour(TPS/TPM/TPH)
  • API Name/EndPoint & Method/Sample Request & Response?
  • API priority based on its traffic and importance
  • High level Design and Workflow diagram for the API or API dependencies
  • How many Roles get deployed, and their .war names/versions
  • Existing or New API, If an existing API, any Monitoring Dashboard and Baseline captured against PROD?
  • JDBC queries
  • Third parties dependencies and End points
  • Firewall Ruleset/Gateway Synapse changes
  • Project Wiki page Link
  • Dev API testing plan/scripts/results for reference

Monday, November 07, 2016

Make a Performance budgeting chart for measuring page performance

You should have a performance goal before you measure your page performance. I would suggest having a performance budget for each component, and then going after the ones that have overdrawn it.
Use the Navigation Timing API, User Timing API, and Resource Timing API to do the measurement, in both synthetic and RUM ways!

Besides measuring the duration of each component as the main KPI, the downloaded content size and the number of requests for each component also need to be considered.
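
A performance budget can be kept as plain data and checked automatically. The sketch below is one possible shape, not a standard: the component names, the limits, the measured values, and the `findOverdrawn` helper are all made up for illustration, covering the three KPIs mentioned above (duration, downloaded size, number of requests).

```javascript
// Hypothetical budget per component: duration (ms), downloaded size (KB), request count.
const budget = {
  header: { durationMs: 300, sizeKb: 50, requests: 5 },
  reviews: { durationMs: 800, sizeKb: 200, requests: 10 },
};

// Return one entry per budget line that was exceeded.
function findOverdrawn(budget, measured) {
  const overdrawn = [];
  for (const [component, limits] of Object.entries(budget)) {
    const actual = measured[component] || {};
    for (const [kpi, limit] of Object.entries(limits)) {
      if (actual[kpi] > limit) {
        overdrawn.push({ component, kpi, limit, actual: actual[kpi] });
      }
    }
  }
  return overdrawn;
}

// Made-up measurements, e.g. aggregated from Resource Timing entries.
const measured = {
  header: { durationMs: 280, sizeKb: 64, requests: 4 },
  reviews: { durationMs: 950, sizeKb: 180, requests: 12 },
};
console.table(findOverdrawn(budget, measured));
```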

Here is another point of view to explain the key page performance metrics. Please remember that every page is designed differently, so no single rule fits all; considering things from the end user's perspective is always a great start:



Friday, May 20, 2016

Get Page performance result by Combining User and Resource timing API

 //calculate page loading time by resource timing API  
 var perfEntries = performance.getEntries();  
 var end_probe = perfEntries.filter(function(item) {  
   //page load finish request indicator  
   if (item.name.indexOf('/req_url_end') > -1) {  
     console.log(item.name);  
     return true;  
   }  
 });  
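 if (end_probe.length > 0) {
   var end_time = end_probe[0].responseEnd;
   var start_probe = perfEntries.filter(function(item) {
     //page load start indicator
     if (item.name.indexOf('/req_url_start') > -1) {
       console.log(item.name);
       return true;
     }
   });
   if (start_probe.length > 0) {
     //duration from the startTime of the start request to the responseEnd of the end request
     var start_time = start_probe[0].startTime;
     var duration = end_time - start_time;
     console.log(duration);
   } else {
     console.log("page loading start indicator is not found, please double check all perf Entries");
   }
 } else {
   console.log("page loading end indicator is not found, please double check all perf Entries");
 }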
There is a better way: combine the User Timing API and Resource Timing API to get accurate page performance. I am using the Nightmare APIs as a sample for automated page tests, which can support a continuous page performance test process on a daily basis:

 var url = "http://www.yourhost.com/abc/def/";  
 var page_complete_idy = 'key_request_name'; //page indicator by resource name  
 var tag = 'ReviewPage';  
 var env_name = 'prod'; 

 const RENDER_TIME_MS = 2000;  

var Nightmare = require('nightmare'),
  nightmare = Nightmare({show: true, switches: {
    'ignore-certificate-errors': true
  }});

nightmare
  .goto(url)
  .wait(function(idy){
    var perfEntries = window.performance.getEntries();
    if (perfEntries.length > 20) {
      return perfEntries.some( function (item) {
        if (item.name.includes(idy)) {
          return true;
        }
      });
    } else {
      return false;
    }
  }, page_complete_idy)
  .evaluate(function(idy){
    var perfEntries = window.performance.getEntries();
    var perf_obj = perfEntries.find(function (item) {
      return item.name.includes(idy)
    });
    if (perf_obj) {
      return perf_obj.responseEnd.toFixed(1);
    } else {
      return 'undefined'
    }
  }, page_complete_idy)
  .end()
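  .then( function (duration) {
    console.log(tag + ":" + duration);
  })
  .catch( function (error) {
    //report navigation or wait timeouts instead of failing silently
    console.error(tag + " failed:", error);
  });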
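
The combination of the two timing APIs can also be written as a small pure function: measure from a User Timing mark (set in the page with `performance.mark(...)`) to the `responseEnd` of a key resource. This is a sketch under assumptions: the mark name `review-start` and the `durationFromMark` helper are hypothetical, and the entries here are faked so that the snippet runs outside a browser; in a page they would come from `window.performance.getEntries()`.

```javascript
// Duration from a User Timing mark to the responseEnd of a matching resource entry.
// `entries` is expected to look like the output of performance.getEntries().
function durationFromMark(entries, markName, resourceKey) {
  const mark = entries.find((e) => e.entryType === 'mark' && e.name === markName);
  const resource = entries.find(
    (e) => e.entryType === 'resource' && e.name.includes(resourceKey)
  );
  if (!mark || !resource) return null; // one of the two indicators is missing
  return resource.responseEnd - mark.startTime;
}

// Fake entries for illustration only.
const sampleEntries = [
  { entryType: 'mark', name: 'review-start', startTime: 120.5 },
  { entryType: 'resource', name: 'https://www.yourhost.com/key_request_name?x=1', responseEnd: 980.5 },
];
console.log(durationFromMark(sampleEntries, 'review-start', 'key_request_name'));
```

Inside the Nightmare `.evaluate(...)` callback, the same function could be called with the real entries so that the page itself decides where the measurement starts.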