Monday, September 22, 2014

VRQ 0.2 release

As 3.17 will be released soon, earlier than expected, VRQ development has been cut off and tagged as the 0.2 release.

Some of the changes are bug fixes and the others are improvements. Some of them are not related to VRQ locking, and I will see whether they can be back-ported to the original BFS as baseline improvements in the next release. The detailed changes are:

3ef882c bfs: Rework swap_sticky().
-- More work on this will continue in the next release.
e8754f9 bfs: rework resched_xxxx_idle(), basic version.
-- I will write another post to describe it in detail. In brief, it rewrites resched_xxxx_idle() using a cpumask-based method.
2ae0fb6 bfs: refactory schedule() for rq&grq lock ctx switching.
cea6ce8 bfs: vrq: rq&grq locking ctx switch v3.
-- It's a bad idea to split one context_switch process into two grq locking sessions, so I turned to this solution, which holds both the rq and grq locks during context_switch.
319cd02 bfs: vrq, refactory wake_up_new_task.
79f5644 bfs: Fix need_other_cpu logic in schedule().
4d511fb bfs: RQ niffy solution.
-- Already described in the previous post.
7c519a7 bfs: inlined routines update.

The test results are:


50% Ratio:
5m19.531s
5m19.519s
5m19.509s
5m19.508s
5m19.430s
5m19.376s
5m19.363s
5m19.359s
5m19.333s
5m19.299s

150% Ratio:
2m54.394s
2m53.632s
2m51.960s
2m51.929s
2m51.925s
2m51.801s
2m51.790s
2m51.747s
2m51.641s
2m51.592s

100% Ratio:
2m51.001s
2m50.150s
2m49.881s
2m49.860s
2m49.812s
2m49.770s
2m49.764s
2m49.733s
2m49.699s
2m49.660s

100%+50% Ratio:
2m49.987s
2m49.980s
2m49.916s
2m49.865s
2m49.835s
2m49.828s
2m49.802s
2m49.784s
2m49.744s
2m49.733s

Compared to vrq-02-baseline-test-result, VRQ 0.2 shows visibly better throughput than the baseline under low or heavy workloads, and slightly better throughput under the optimal workload.

If you want to try VRQ 0.2, the code is located at v3.16.2-vrq.

Wednesday, September 17, 2014

VRQ 0.2: RQ niffy solution

One of the changes in VRQ 0.2 is the RQ niffy solution, which replaces grq niffies by putting a niffy value into each RQ. For the original design of grq niffies, please read CK's post http://ck-hack.blogspot.com/2010/10/of-jiffies-gjiffies-miffies-niffies-and.html

The functions that need grq.niffies are time_slice_expired() and task_prio().

update_clocks() is called with the grq lock held before every time_slice_expired(), which means that using the RQ niffy solution instead of grq.niffies has no impact on update_clocks() and time_slice_expired().

In task_prio(), grq.niffies can be replaced by the niffy of the current RQ. It may not be the latest niffy among all the RQs, but that is acceptable.

With the RQ niffy solution, the grq lock is no longer required to update or read the niffy. This is designed to reduce grq lock hot spots.

For the code change, please check https://bitbucket.org/alfredchen/linux-gc/commits/f6ec6f5303cb88e7462f4321b7a29d6c8ab83e89?at=linux-3.16.y-vrq

Saturday, September 13, 2014

VRQ 0.2 Baseline Test Result

After syncing up with the kernel mainline stable release, the baseline for this VRQ cycle is frozen. In the following 3 or 4 weeks, until the 3.16.5 or 3.16.6 release, the feature code of VRQ 0.2 will be committed.

Over the weekend, I ran the tests for the baseline and the current VRQ. The results look good. Below are the details.

50% ratio:

3.16_0456_50#
5m34.068s
5m34.061s
5m34.021s
5m33.930s
5m33.927s
5m33.923s
5m33.860s
5m33.855s
5m33.767s
5m33.754s

3.16_Baseline_50#
5m22.297s
5m22.272s
5m22.173s
5m22.159s
5m22.085s
5m22.062s
5m22.024s
5m21.983s
5m21.967s
5m21.884s

3.16_VRQ_50#
5m22.313s
5m22.089s
5m22.071s
5m22.037s
5m22.026s
5m21.964s
5m21.949s
5m21.918s
5m21.900s
5m21.782s

The results show that commit https://bitbucket.org/alfredchen/linux-gc/commits/ad9dd03db1002717f155c859ee613641620d3ba0?at=linux-3.16.y-gc really boosts system performance, by about 3%. VRQ is as good as the baseline in this test.

150% ratio:

3.16_0456_150 #
2m56.433s
2m56.412s
2m56.371s
2m56.354s
2m56.348s
2m56.342s
2m56.340s
2m56.327s
2m56.279s
2m56.271s

3.16_Baseline_150 #
2m57.551s
2m57.516s
2m56.365s
2m56.354s
2m56.335s
2m56.295s
2m56.290s
2m56.271s
2m56.258s
2m56.187s

3.16_VRQ_150 #
2m53.562s
2m53.048s
2m51.942s
2m51.855s
2m51.803s
2m51.786s
2m51.777s
2m51.771s
2m51.694s
2m51.585s

The baseline is as good as the original BFS, while VRQ shows a performance boost of about 2%.

100% ratio:

3.16_0456_100 #
2m51.594s
2m50.640s
2m50.598s
2m50.592s
2m50.574s
2m50.535s
2m50.511s
2m50.509s
2m50.477s
2m50.434s

3.16_Baseline_100 #
2m50.702s
2m50.633s
2m50.614s
2m50.598s
2m50.579s
2m50.555s
2m50.501s
2m50.498s
2m50.445s
2m50.311s

3.16_VRQ_100 #
2m49.929s
2m49.860s
2m49.853s
2m49.836s
2m49.827s
2m49.800s
2m49.800s
2m49.788s
2m49.770s
2m49.721s

The baseline is as good as the original BFS, and VRQ is slightly better than the baseline.


100%+50% IdlePrio ratio:

3.16_0456_100_50 #
2m50.928s
2m50.893s
2m50.828s
2m50.796s
2m50.794s
2m50.784s
2m50.771s
2m50.746s
2m50.675s
2m50.666s

3.16_Baseline_100_50 #
2m50.863s
2m50.861s
2m50.854s
2m50.844s
2m50.804s
2m50.736s
2m50.713s
2m50.707s
2m50.640s
2m50.536s

3.16_VRQ_100_50 #
2m50.114s
2m49.965s
2m49.926s
2m49.910s
2m49.878s
2m49.863s
2m49.817s
2m49.802s
2m49.796s
2m49.705s

The results are almost the same as in the 100% ratio test, which is worth a closer look.

Friday, September 12, 2014

Branches sync-up with 3.16.2

-bfs 

This branch is for BFS-related development that is considered stable, applied on top of the original BFS code.

Changes:

-- Revert a commit that was found to cause a regression.
-- Sync up with mainline 3.16.2

linux-3.16.y-bfs

-vrq 

This branch is for development of the BFS VRQ solution. It should be considered experimental and used only for testing.

Changes:
-- Sync up with mainline 3.16.2
-- Rebase onto the latest -bfs branch.
-- Add some bug fixes to the VRQ code.

linux-3.16.y-vrq

-gc

Changes:

-- Sync up with mainline 3.16.2
-- Merge the -bfs branch instead of tracking all BFS-related commits individually.

linux-3.16.y-gc