Wednesday, November 2, 2016

VRQ 0.89 test branch released


VRQ at its beginning is to reduce grq(global run queue) lock access as much as possible. In previous release of VRQ, it was trying to get rid of grq lock access hot spots and create grq lock free code path. With the recently introduced skip list queue data structure, grq could be wiped out completely, and it will be happened in this and the incoming release of VRQ.

The immediately question will be whether it is base on MuQSS by CK? Answer is NO. It actually divided from an early commit of 4.8 -vrq branch. The reasons why not based on MuQSS are
1. Different skip list implementation.
2. There are still many sync-up and feature commits need to be picked up from previous -vrq branch.
3. Different rules to be followed and different routine implementation.
4. Codes are more controlable when work on a familiar code base.

So here comes this new release of VRQ with the below changes

* Totally remove the global rq structure, per cpu run queue has its own skip list to hold the running tasks on this run queue and be accessed by rq->lock(which is existed in previous version of VRQ).

* Update task_access_lock/unlock(...) strategy for grq/rq data structure changes.

* Update set_task_cpu() logic and usage as it has to follow the principle rules, 1. don't use set_task_cpu() for blocked tasks, let ttwu to solve out. 2. Setting task's cpu means to change the cpu/rq which task resided on.

* Update set_cups_allowed_ptr() logic and usage when task is queued or running on wrong cpu.

* Update cpu hot-plug api implementation as tasks now reside on per cpu run queue instead of a global run queue. And makes cpu on/off-line and suspend/resume work more reliable.

* Remove unused code such as sticky task, because by putting prev task to per rq skip list is natural stick/cache.

* more to be listed.

It's not done yet, so this is called release 0.89, the major known issue is the imbalanced cpu loading, on two cpus system, most system workload will be on cpu0. On a quad cores system, it becomes better but some cores are still not running at 100% cpu usage. This is because a very simple version of __schedule() and TTWU() is using in this release, not fantastic feature is deployed yet.

With the above known issue/limitation, it's not suitable to use this release in a production environment. This release is aim to demonstration the foundation routine changes adapting to per cpu run queue and verify no major system broken occurs. Then in next release(0.9), fix for imbalance cpu loading and scheduler performance improvement could be added.

You are encouraged to test this VRQ release, don't looking at performance, it sucks due to the known limitation. But please looking for miss behaviors of system or suspected kernel log comparing to previous VRQ release.

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.8.y-test
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.8.y-test

Enjoy it.

BR Alfred

3 comments:

  1. Replies
    1. Oh, too many abbreviation in my head, fixed. Thanks.

      Delete
  2. @Alfred,

    I'll hold out testing until 0.9 version comes, except if this test is important for You, then I can give it a shot some time.

    But as I understood You're going multi-queue path as well. Will this path include features which makes D3 crawl?

    Looking forward to stable 0.9.

    Br, Eduardo

    ReplyDelete