Thursday, December 29, 2016

VRQ 0.90 release

VRQ 0.90 is released with the following changes

1. Introduce sched_yield_type from MuQSS
2. Remove temp variables in run queue for current task
3. Refine dither.

As there is no more feature code from the previous VRQ that needs to be ported, I'd like to announce the VRQ 0.90 release just before the new year. All known issues will be tracked in 0.90+ releases.

In the 0.90+ releases, I am planning to focus on fixing known issues, porting useful code changes from BFS/MuQSS and developing new features in VRQ.

Enjoy 0.90 release of VRQ in 2017, :)

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.9.y-vrq
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.9.y-vrq

All-in-one patch is available too.

BR Alfred 

Friday, December 23, 2016

VRQ 0.89h release

VRQ 0.89h is released with the following changes

1. Fix UP compile issue.
2. Refine take_other_rq_task(), which creates a code path for CPUs running at a scaled frequency.
3. Code cleanup and removal of unused code.

I'll spend the rest of this year double-checking whether there are still helpful features in the previous VRQ that have not yet been ported to 0.89. If not, there will be a 0.9 release at the end of 2016.

Enjoy this X'mas release of VRQ, :)

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.9.y-vrq
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.9.y-vrq

All-in-one patch is available too.

BR Alfred 

Thursday, December 15, 2016

VRQ 0.89g release

VRQ 0.89g is released with the following changes

1. Sync-up with kernel 4.9
2. Fix a bug where the scheduler may get out of sync with acpi governors when switching from non-acpi drivers.
3. Minor compile warning fix.

As 4.9 is a release with huge changes, two of my machines still have issues caused by mainline code changes. It may require one or two minor releases to settle down.
Meanwhile, I am planning to continue deploying the missing changes from the previous VRQ to 0.89. And considering that BFS is no longer under development and VRQ doesn't need to rebase onto any BFS code base, I will squash some commits, which should help to reduce the overhead when porting to the next kernel release.

Enjoy kernel 4.9 with VRQ scheduler, :)

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.9.y-vrq
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.9.y-vrq

All-in-one patch is available too.

BR Alfred

Tuesday, December 6, 2016

VRQ 0.89f release

Normally I won't do two releases in a day, but there are always exceptions. Here is the VRQ 0.89f release with just one single commit:

Rewrite best_mask_cpu(), which now uses sched_cpu_affinity_chk_masks to provide better performance and to avoid additional checking on CPUs without SMT capability when the SMT kernel config is enabled.
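
The idea of walking an ordered list of affinity masks can be sketched as below. This is only a minimal illustration and not the actual kernel code: cpu masks are modeled as plain integers, and the layout of the mask lists (self, SMT sibling, all cpus) as well as the helper first_cpu() are assumptions made for the example. A cpu without SMT siblings simply has no sibling level in its list, so that check is never performed.

```python
def first_cpu(mask):
    """Index of the lowest set bit, i.e. the first cpu in the mask."""
    return (mask & -mask).bit_length() - 1


def best_mask_cpu(chk_masks, candidates):
    """Return the closest candidate cpu, or -1 if no candidate is set.

    chk_masks  -- masks ordered from nearest to farthest (assumed layout)
    candidates -- bitmask of cpus the caller is willing to use
    """
    for level_mask in chk_masks:
        hit = level_mask & candidates
        if hit:
            return first_cpu(hit)
    return -1


# Hypothetical topology: 2 cores x 2 threads, seen from cpu0.
smt_masks_cpu0 = [0b0001, 0b0011, 0b1111]   # self, SMT sibling, all cpus
# Hypothetical topology: 4 cores without SMT, seen from cpu0.
non_smt_masks_cpu0 = [0b0001, 0b1111]       # no sibling level to check

# cpu1 and cpu3 are candidates: the SMT sibling cpu1 wins.
print(best_mask_cpu(smt_masks_cpu0, 0b1010))
# cpu2 and cpu3 are candidates: only two masks are checked before cpu2 is found.
print(best_mask_cpu(non_smt_masks_cpu0, 0b1100))
```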

In short, my sanity test shows improvement in all kinds of workloads. Here are the sanity results of VRQ 0.89f before the next kernel release.

vrq0.89f

>>>>>50% workload
>>>>>round 1
real    5m27.812s
user    10m13.565s
sys     0m39.533s
>>>>>round 2
real    5m27.771s
user    10m13.407s
sys     0m39.521s
>>>>>round 3
real    5m27.834s
user    10m13.448s
sys     0m39.579s
>>>>>100% workload
>>>>>round 1
real    2m54.660s
user    10m30.269s
sys     0m41.142s
>>>>>round 2
real    2m54.602s
user    10m30.652s
sys     0m41.021s
>>>>>300% workload
>>>>>round 1
real    2m57.899s
user    10m40.231s
sys     0m41.864s
>>>>>round 2
real    2m57.682s
user    10m40.238s
sys     0m41.928s
>>>>>round 3
real    2m57.480s
user    10m40.219s
sys     0m41.282s

Enjoy this final vrq release before next kernel, :)

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.8.y-test
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.8.y-test


All-in-one patch is available too.

BR Alfred

Monday, December 5, 2016

VRQ 0.89e release

The 4.9 kernel will be released soon and VRQ 0.89 will replace the previous version as the official release of the VRQ branch. To give enough hand-off time before 4.9 officially comes out, here comes the VRQ 0.89e release with just two commits

1. Fix a hang issue when switching to the schedutil governor.
2. Use mainline loadavg.c for load avg calculation.


Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.8.y-test
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.8.y-test

All-in-one patch is available too.

Enjoy this release, :)


There are just one or maybe two missing features which existed in the previous VRQ but are not yet in the VRQ 0.89 release. Once they are done, 0.9 can be officially released; hopefully that can happen this year.

 BR Alfred

Thursday, December 1, 2016

VRQ 0.89d release

VRQ 0.89d is now released with

1. Fix the cpu c-state issue. It is a long-standing bug that was masked by other issues.
2. Don't punish the run queue time slice for RT/ISO and NORMAL policy tasks. The hackbench test shows that sharing time slices between parent and child tasks (enabled in 089c) limited the fork boost to one time slice. So here comes this policy-specific modification.
3. Rewrite task_preemptible_rq(), which is more efficient than the previous version and helps with policy fairness.
4. Remove unneeded code and debug code.

The cpufreq_trigger investigation is still ongoing, and policy fairness is being watched to see if further improvement is needed.

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.8.y-test
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.8.y-test

All-in-one patch is available too.

Enjoy this release, :)

BR Alfred

PS, here are some sanity test results compared to vrq089a, if you want to see them.

089d

>>>>>50% workload
>>>>>round 1
real    5m27.954s
user    10m12.988s
sys     0m40.254s
>>>>>round 2
real    5m27.918s
user    10m13.064s
sys     0m40.219s
>>>>>round 3
real    5m28.132s
user    10m13.435s
sys     0m40.086s
>>>>>100% workload
>>>>>round 1
real    2m54.629s
user    10m30.754s
sys     0m41.447s
>>>>>round 2
real    2m54.776s
user    10m30.643s
sys     0m41.513s
>>>>>round 3
real    2m54.765s
user    10m30.421s
sys     0m41.619s
>>>>>300% workload
>>>>>round 1
real    2m58.007s
user    10m40.934s
sys     0m42.030s
>>>>>round 2
real    2m57.813s
user    10m40.255s
sys     0m42.349s
>>>>>round 3
real    2m58.158s
user    10m40.527s
sys     0m42.589s

089a

>>>>>50% workload
>>>>>round 1
real    5m29.051s
user    10m15.233s
sys     0m40.015s
>>>>>round 2
real    5m28.288s
user    10m13.595s
sys     0m40.065s
>>>>>round 3
real    5m28.229s
user    10m13.232s
sys     0m40.328s
>>>>>100% workload
>>>>>round 1
real    2m55.358s
user    10m32.229s
sys     0m41.553s
>>>>>round 2
real    2m55.629s
user    10m32.527s
sys     0m41.358s
>>>>>round 3
real    2m55.252s
user    10m31.858s
sys     0m41.873s
>>>>>300% workload
>>>>>round 1
real    2m59.998s
user    10m47.413s
sys     0m42.727s
>>>>>round 2
real    3m0.404s
user    10m47.422s
sys     0m43.425s
>>>>>round 3
real    2m59.934s
user    10m47.287s
sys     0m43.103s
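
To compare two releases from the raw numbers above, the "real" lines can be averaged per workload. Below is a minimal sketch, assuming the output format of the shell `time` builtin shown above; the helper names are made up for this example, and only the 300% workload rounds of 089d and 089a are copied in.

```python
import re
from statistics import mean


def real_seconds(line):
    """Convert a line like 'real    2m58.007s' to seconds, or None for other lines."""
    m = re.match(r"real\s+(\d+)m([\d.]+)s", line)
    if m is None:
        return None
    return int(m.group(1)) * 60 + float(m.group(2))


def mean_real(lines):
    """Average wall-clock time over all 'real' lines, ignoring user/sys lines."""
    return mean(s for s in map(real_seconds, lines) if s is not None)


# 300% workload rounds, copied from the results above.
rounds_089d = ["real    2m58.007s", "real    2m57.813s", "real    2m58.158s"]
rounds_089a = ["real    2m59.998s", "real    3m0.404s", "real    2m59.934s"]

d = mean_real(rounds_089d)
a = mean_real(rounds_089a)
print(f"089d: {d:.3f}s  089a: {a:.3f}s  saved: {a - d:.3f}s ({(a - d) / a:.2%})")
```

For these rounds, 089d comes out about 2.1 seconds (roughly 1.2%) faster than 089a at 300% workload.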

Saturday, November 26, 2016

VRQ 0.89c released


VRQ 0.89c was released with

1. Introduce sched_rq_queued_masks, which makes a run queue look for higher-policy queued tasks from other run queues first.
2. Fix behavior that did not match the design intention when creating new tasks, and fix a hang issue found when enabling this new code.
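
The "higher policy first" lookup in item 1 can be pictured with one cpu bitmask per policy level. The sketch below is purely illustrative and not the kernel implementation: the dictionary layout, the policy names in POLICY_ORDER and the function find_source_cpu() are all assumptions for this example.

```python
# Assumed policy order, highest priority first.
POLICY_ORDER = ("RT", "ISO", "NORMAL", "BATCH", "IDLE")


def lowest_cpu(mask):
    """Index of the lowest set bit in a cpu bitmask."""
    return (mask & -mask).bit_length() - 1


def find_source_cpu(queued_masks, this_cpu):
    """Pick another cpu whose run queue holds the highest-policy queued task.

    queued_masks maps a policy name to a bitmask of cpus that have queued
    tasks of that policy (hypothetical stand-in for sched_rq_queued_masks).
    Returns (policy, cpu) or None when no other run queue has queued tasks.
    """
    others = ~(1 << this_cpu)          # never pick from our own run queue
    for policy in POLICY_ORDER:
        hit = queued_masks.get(policy, 0) & others
        if hit:
            return policy, lowest_cpu(hit)
    return None


# cpu1 and cpu2 have NORMAL tasks queued, cpu3 has an ISO task queued.
queued = {"NORMAL": 0b0110, "ISO": 0b1000, "IDLE": 0b0001}
print(find_source_cpu(queued, this_cpu=0))   # the ISO task on cpu3 is found first
```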

I have decided to make a last-minute rollback of the planned task policy fairness commit, as it introduced a deadlock scenario, and fixing this deadlock leads to other side effects. Task policy fairness may need a different solution.

The cause of the cpu c-state issue has been found: it's the cpufreq_trigger callback. It seems to be related to the schedutil and intel cpufreq governor issues as well. It's time to solve it once and for all.

The above two items are on the to-do list for 0.89d. Although there are fewer commits in this release, the sanity test still shows visible improvement in all kinds of workloads.

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.8.y-test
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.8.y-test

All-in-one patch is also available.

Enjoy it! :)

Wednesday, November 16, 2016

VRQ 0.89b released

VRQ 0.89b is released with

1. Fix the low workload regression compared to the previous VRQ release by not trying to pick tasks from other run queues when the cpu is scaling.
2. Follow cpu affinity order when picking tasks from other run queues.
3. Fix a long-standing wrong run queue scaling value when running on the performance governor or exiting from a dynamic governor.

With all the above changes, this release shows better sanity performance than any previous VRQ release. The next release will focus on task policy fairness.

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.8.y-test
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.8.y-test

All-in-one patch is also available.

Enjoy it, :)

Thursday, November 10, 2016

VRQ 0.89a released

VRQ 0.89a is released, which mainly fixes the imbalanced cpu usage issue. Now all cpus can run at 100%, and no issue has been found in daily usage.

There are also other code changes/refinements in the scheduler core, but I'm afraid they are not so visible.

For this release, sanity kernel compilation tests also show a noticeable improvement compared to the previous VRQ release at 100%~300% workload. At 50% workload, a small regression is found on the 4-core system. A debug patch already gets it back to the same level as the previous VRQ, but I'd like it to be well tested on other systems before officially committing it.

I am happy with the sanity test results. They show that this new release of VRQ is already better than the previous release, and there are several performance improvement ideas not yet implemented.

Have fun with this release and expect the next. :)

BR Alfred

PS: code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.8.y-test
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.8.y-test

Wednesday, November 2, 2016

VRQ 0.89 test branch released


VRQ, from its beginning, has aimed to reduce grq (global run queue) lock access as much as possible. Previous releases of VRQ tried to get rid of grq lock access hot spots and create grq-lock-free code paths. With the recently introduced skip list queue data structure, grq can be wiped out completely, and that will happen in this and the upcoming releases of VRQ.

The immediate question will be whether it is based on MuQSS by CK. The answer is NO. It actually branched off from an early commit of the 4.8 -vrq branch. The reasons why it is not based on MuQSS are
1. Different skip list implementation.
2. There are still many sync-up and feature commits that need to be picked up from the previous -vrq branch.
3. Different rules to be followed and different routine implementations.
4. Code is more controllable when working on a familiar code base.

So here comes this new release of VRQ with the below changes

* Totally remove the global rq structure. Each per-cpu run queue has its own skip list to hold the running tasks on this run queue, with access guarded by rq->lock (which already existed in previous versions of VRQ).

* Update the task_access_lock/unlock(...) strategy for the grq/rq data structure changes.

* Update set_task_cpu() logic and usage, as it has to follow these principles: 1. Don't use set_task_cpu() for blocked tasks; let ttwu sort it out. 2. Setting a task's cpu means changing the cpu/rq on which the task resides.

* Update set_cpus_allowed_ptr() logic and usage when a task is queued or running on the wrong cpu.

* Update the cpu hot-plug api implementation, as tasks now reside on per-cpu run queues instead of a global run queue. This also makes cpu on/off-line and suspend/resume work more reliably.

* Remove unused code such as sticky task, because putting the prev task into the per-rq skip list is a natural stick/cache.

* more to be listed.
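
To picture the first item in the list above, here is a toy model of per-cpu run queues that each own their queue and their lock, so enqueueing on one cpu never touches a global lock. It is only a sketch under simplifying assumptions: a sorted Python list stands in for the skip list, threading.Lock stands in for rq->lock, and the class and method names are made up for this example.

```python
import bisect
import threading


class RunQueue:
    """Toy per-cpu run queue: its own ordered queue plus its own lock."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.lock = threading.Lock()   # stand-in for rq->lock
        self.queue = []                # stand-in for the per-rq skip list
        self._seq = 0                  # keeps FIFO order among equal deadlines

    def enqueue(self, deadline, name):
        with self.lock:
            bisect.insort(self.queue, (deadline, self._seq, name))
            self._seq += 1

    def pick_next(self):
        """Remove and return the task with the earliest deadline, or None."""
        with self.lock:
            return self.queue.pop(0)[2] if self.queue else None


# One run queue per cpu, no global queue and no global lock.
rqs = [RunQueue(cpu) for cpu in range(2)]
rqs[0].enqueue(300, "editor")
rqs[0].enqueue(100, "compositor")
rqs[1].enqueue(200, "make")

print(rqs[0].pick_next(), rqs[1].pick_next())
```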

It's not done yet, so this is called release 0.89. The major known issue is imbalanced cpu loading: on a two-cpu system, most of the system workload will be on cpu0. On a quad-core system it gets better, but some cores are still not running at 100% cpu usage. This is because a very simple version of __schedule() and TTWU() is used in this release; no fancy feature is deployed yet.

With the above known issue/limitation, this release is not suitable for use in a production environment. It aims to demonstrate the foundational routine changes for adapting to per-cpu run queues and to verify that no major system breakage occurs. Then in the next release (0.9), the fix for imbalanced cpu loading and scheduler performance improvements can be added.

You are encouraged to test this VRQ release. Don't look at performance; it sucks due to the known limitation. But please look for system misbehavior or suspicious kernel log entries compared to the previous VRQ release.

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.8.y-test
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.8.y-test

Enjoy it.

BR Alfred

Sunday, October 9, 2016

VRQ patch v4.8.1-vrq0 released

VRQ patch v4.8.1-vrq0 is released, all-in-one patch is available. linux-4.8.y-vrq branch has been pushed to bitbucket and github.

What's new
1. Sync up with 4.8 mainline scheduler code changes.
2. Introduced skip list as the queue data structure. For detailed information about the skip list implementation, please refer to my previous posts (#1, #2, #3) and also the original skip list design idea from CK.
3. Based on user feedback, sticky and caching features are turned off for NORMAL policy tasks, which gives the best interactivity experience for desktop & gaming usage (commit is here). Further tuning may still be needed, but it looks like the best choice out of the 4 debug patches, based on the feedback so far.
4. Workaround fix for a boot-up issue caused by preempt task solution.
5. A v2 fix for smp_processor_id() preempt code usage in smpboot_thread_fn().


PS:

The policy-based sticky/caching feature is almost done. Unlike the interactivity switch in the original BFS, this feature provides throughput and interactivity at the same time. "At the same time" does not mean that a single task can get both throughput and interactivity; it means that throughput tasks and interactivity tasks can run together. Just assign the right policy to the tasks.


Enjoy VRQ for 4.8 kernel, :)

BR Alfred

Monday, September 26, 2016

About Skip List

I read a post about skip lists today which links to this page (http://ticki.github.io/blog/skip-lists-done-right/), in which some very useful information can be found.

The "O(1) level generation" is just what I used in my new skip list implementation in BFS, though I had never read anything similar before.

BR Alfred

Saturday, September 17, 2016

Skip list + VRQ for 4.7

Based on the new implementation of skip list, I am adding -VRQ commits on top of it. All commits will be pushed to the linux-4.7.y-sl branch. Unlike previous -vrq releases, I am releasing additional checkpoint tags, so users can use these tags to check their issues and narrow down the code changes which introduced them.

Checkpoint tags:
1. 4.7_0472_sl_baseline and all in one patch.
    This is the starting point of all the work; all commits are covered in About skip list in BFS.

2. 4.7_0472_sl_new and all in one patch.
    New implementation of skip list, please check the post at New implementation of skip list for BFS.

3. 4.7_0472_sl_new_sync and all in one patch.
    Includes all sync-up commits from release to release which have not yet been picked up by the original BFS.

4. 4.7_0472_sl_new_gc and all in one patch.
    This patch includes all former -gc commits; the most important one is the "Full cpumask based and LLC sensitive cpu selection" commit, which helps with performance under low workload.

5. 4.7_0472_sl_new_vrq and all in one patch.
    This is not the same as the VRQ3 patch; it only includes the two major features, "VRQ solution" and "preempt task solution". I want to tag a checkpoint here before the latest stick/cache code changes, as they are not yet finalized.

6. 4.7_0472_sl_new_vrq_full and all in one patch.
    Includes all -vrq features up to VRQ3 in 4.7.

...to be continued

BR Alfred

Thursday, September 15, 2016

New implementation of skip list for BFS

Two commits have been pushed to the linux-4.7.y-sl branch.

The first one is an embedded rewrite of CK's original skip list in 0480 bfs. It is similar to the code changes CK made in 0497, but my implementation differs in
1. Just keep one data structure, that's skiplist_node.
2. Remove kmalloc/kfree completely.
3. Remove value in skiplist_node and use container_of() to obtain the pointer to the structure in which skiplist_node is embedded.

The second commit is a new implementation of skip list. In the new implementation:

A customized search function should be defined using the DEFINE_SKIPLIST_INSERT macro and used for the skip list insert operation.

The random level should be generated by a customized implementation, set to node->level, and then passed to the customized skiplist_insert function.
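
The shape of such an insert, where the caller chooses the ordering and presets the node level, can be sketched roughly as below. This is a simplified model and not the kernel code: the factory define_skiplist_insert() only loosely mirrors what a macro like DEFINE_SKIPLIST_INSERT would generate, the list is singly linked with a head node, and the rule that the list grows by at most one level per insert is an assumption of this sketch.

```python
NUM_SKIPLIST_LEVEL = 8   # levels 0..7


class SkipNode:
    def __init__(self, key=None, level=0):
        self.key = key
        self.level = level                          # preset by the caller
        self.forward = [None] * NUM_SKIPLIST_LEVEL


def define_skiplist_insert(goes_before):
    """Build an insert function specialised for one ordering rule."""
    def insert(head, node):
        update = [head] * NUM_SKIPLIST_LEVEL
        cur = head
        # Search from the current top level of the list down to level 0.
        for lvl in range(head.level, -1, -1):
            while cur.forward[lvl] is not None and not goes_before(node, cur.forward[lvl]):
                cur = cur.forward[lvl]
            update[lvl] = cur
        # Assumption: cap the node level to one above the current list level.
        if node.level > head.level:
            head.level += 1
            node.level = head.level
        for lvl in range(node.level + 1):
            node.forward[lvl] = update[lvl].forward[lvl]
            update[lvl].forward[lvl] = node
        return node
    return insert


def keys(head, lvl=0):
    """Collect the keys linked on one level, in list order."""
    out, cur = [], head.forward[lvl]
    while cur is not None:
        out.append(cur.key)
        cur = cur.forward[lvl]
    return out


# Order nodes by key; equal keys stay in insertion (FIFO) order.
insert_by_key = define_skiplist_insert(lambda a, b: a.key < b.key)
head = SkipNode()
for key, level in [(30, 0), (10, 3), (20, 1), (10, 0)]:
    insert_by_key(head, SkipNode(key, level))
print(keys(head), keys(head, 1))
```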

Levels start at zero and go up to (NUM_SKIPLIST_LEVEL - 1).
NUM_SKIPLIST_LEVEL in this implementation is 8 instead of the original 16, considering that about 256 entries are needed to populate the top level when using random level p=0.5, and that number is more than enough for a run queue in scheduler usage. It also helps to reduce the memory usage of the embedded skip list node in task_struct to about 50%.

Based on testing, the first 8 bits of niffies in microseconds are suitable for random level population. find_first_bit() is used to satisfy p = 0.5 between levels, and there should be a hardware-supported instruction (known as ctz/clz) to speed up this function.
The skiplist level for a task is populated when the task is created and doesn't change during the task's lifetime. When the task is being inserted into a run queue, this skiplist level is set to the task's sl_node->level; the skiplist insert function may change it based on the current level of the skip list.
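
Why the position of the first set bit gives p = 0.5 between levels can be checked with a few lines. This is only a sketch, not the kernel code: the shift by 10 to get roughly microseconds from a nanosecond clock and the choice of the top level when no bit is set are assumptions of this example, and the function names are made up.

```python
from collections import Counter

NUM_SKIPLIST_LEVEL = 8


def level_from_bits(bits):
    """Level = index of the first (lowest) set bit among the low 8 bits."""
    bits &= 0xFF
    if bits == 0:
        return NUM_SKIPLIST_LEVEL - 1       # assumption: no bit set means top level
    return (bits & -bits).bit_length() - 1  # what a ctz instruction would return


def level_from_clock(niffies_ns):
    """Derive a level from a nanosecond clock value (shift amount is assumed)."""
    return level_from_bits(niffies_ns >> 10)


# Count how many of the 256 possible low-byte values land on each level.
histogram = Counter(level_from_bits(v) for v in range(256))
for level in range(NUM_SKIPLIST_LEVEL):
    print(level, histogram[level])
```

Half of all values land on level 0, a quarter on level 1, and so on, which is the p = 0.5 ratio between levels. Only about one value in 128 reaches the top level, so it takes a few hundred queued entries before the top level is populated at all.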

And there are a lot of enhancements in the insert/delete functions.

The hackbench and kernel compilation sanity tests both show improvement compared to the baseline version.

You can download the all in one patch of baseline version and this new implementation and have a try.


-- Alfred

Edit: Fixed typos silently.

Wednesday, September 14, 2016

About skip list in BFS

It's great that CK released huge changes for BFS in the 0480 release. The changes include not only the cpufreq trigger and cpufreq governor changes from recent kernel releases but also the most fun part: using a skip list to replace the bitmap mask linked queue in the global run queue data structure.

Maybe because there are many changes at once, BFS 0480 has not settled down yet; most issues are related to the cpufreq trigger and cpufreq governor code. I had planned to rework the roughly 60 -vrq commits on top of 0480, but now I have to wait until it calms down. Meanwhile, I have done some tests of the skiplist design in BFS. That means only three patches from 0480 are used; they are


bfs472-fix_set_task_cpu.patch
skiplists.patch
bfs472-skiplist.patch

For my kernel compilation sanity tests, sl is almost the same as bfs0472; the only difference is that sl is a little bit better than 0472 at 300% workload.

For hackbench (from https://lwn.net/Articles/351058/) with 20 groups, 0472 finished in about 4sec and sl finished in 3.6sec. Considering that -vrq (0472 based) finished in 3.8sec and cfs in 3.2sec, I can't wait to see how it goes when sl is combined with -vrq.

The skiplist design for BFS, IMO, looks pretty good as an initial release and has a lot of potential. Here are my comments on the skiplist in 0480

#1 In earliest_deadline_task(), RT tasks are now impacted by the interactive setting. IMO, it should keep the same behavior as in 0472.

#2 When interactivity is enabled and both normal policy tasks and idle policy tasks are in the queue, for example tasks queued in this order (N1, N2, D1, D2), the current implementation may have earliest_deadline_task() go through all 4 tasks to pick one among them. But normal tasks should always have higher priority than idle ones, so just going through the tasks with normal policy (N1 and N2 in this example) is enough.

#3 kmalloc is used in the current skip list implementation. It is not a good design for critical code like a scheduler. I have rewritten the skip list implementation to embed the data structure into task_struct etc. (PS, CK addressed this in 0497; meanwhile I have rewritten an embedded implementation too)

#4 Currently, randomLevel() returns 0~15 with equal probability. But "where an element in layer i appears in layer i+1 with some fixed probability p (two commonly used values for p are 1/2 or 1/4)" -- from https://en.wikipedia.org/wiki/Skip_list , so there is room to improve how the random level is generated.
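
The difference described in #4 can be put into numbers. The sketch below compares a uniform pick over 16 levels with the fixed-probability scheme from the Wikipedia quote (p = 1/2), by computing the average number of forward pointers a node has to maintain. It is plain arithmetic for illustration; the function names are made up and nothing here is taken from the BFS code.

```python
def level_probabilities(num_levels, p):
    """P(level = i) when each layer is reached from the one below with probability p."""
    probs = [(p ** i) * (1 - p) for i in range(num_levels - 1)]
    probs.append(p ** (num_levels - 1))     # the top level takes the remainder
    return probs


def mean_pointers(probs):
    """Average forward pointers per node; a node at level i is linked on i + 1 levels."""
    return sum((level + 1) * pr for level, pr in enumerate(probs))


uniform = [1 / 16] * 16                     # every level 0~15 equally likely
geometric = level_probabilities(16, 0.5)    # fixed probability p = 1/2

print(f"uniform:   {mean_pointers(uniform):.3f} pointers per node")
print(f"geometric: {mean_pointers(geometric):.3f} pointers per node")
```

With a uniform pick, a node is linked on 8.5 levels on average, against about 2 with p = 1/2, so the upper levels stop being sparse express lanes and every insert has far more links to update.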

I have finished the proposed enhancement patches for #1 and #2 and take them as a baseline for the rest of my embedded/improved skip list implementation patches. Once the code has been cleaned up, I will push the remaining patches to the linux-4.7.y-sl branch in my git repositories.

BR Alfred

Sunday, August 28, 2016

v4.7_0472_vrq3 patch released

v4.7_0472_vrq3 patch is released

The new code changes have been in the -test branch since the 4.6 release and have been tested by users since then; bugs have been fixed and improvements have been added. Although the -test branch is currently pending user feedback for further improvement, IMO the code changes now on the -test branch are stable enough to be merged into the -vrq branch. So here comes the v4.7_0472_vrq3 patch.

This patch is identical to v4.7_0472_test2 except for the version printed in dmesg. Compared to -vrq2, the major feature in vrq3 (-test2) is the introduction of the preempt stick task to replace the original stick timeout of the previous release, which helps with high workload performance, as shown in the sanity test report.

Code has been pushed to bitbucket and github, all-in-one patch is also available.

Enjoy -vrq3, :)

BR Alfred

Monday, August 22, 2016

v4.7_0472_test2 patch released

v4.7_0472_test2 patch has been released, with the following changes
1. Revert "Immediately select preempt task in deactivate_choose_task().", which was reported to introduce mouse pointer lag when idle.
2. Add a quick path in pick_other_cpu_stick_task() when there is no preemptible rq/cpu or just one preemptible rq/cpu.

Enjoy this new release; feedback is welcome. :)

BR Alfred

EDIT:
It's nice to hear that -test2 fixed the issues in -test1. Now, based on -test2, there are debug patches for testing; you can check them out in the previous post "debug patches call for testing".

Friday, August 19, 2016

4.7 debug patches, call for testing

I was working on the debug patches from the previous release and trying to work out which direction to take, but in the end one last puzzle piece is still missing. So I have uploaded three debug patches on top of the -test1 patch, so users can help to complete the whole picture.

The stick and cache mechanisms for NORMAL policy tasks are both turned on in the -test1 patch. Here are three debug patches to find out how it goes when these two mechanisms are turned on and off independently.

#1 4.7_vrq_test1_debug_s0c0.patch, which turns both stick and cache off; it was the debug1 patch in the 4.6 release
#2 4.7_vrq_test1_debug_s0c1.patch, which turns stick off and cache on for NORMAL policy tasks; it was the debug2 patch in the 4.6 release
#3 4.7_vrq_test1_debug_s1c0.patch, which turns stick on and cache off; this is the missing puzzle piece, :)

It would be appreciated if users could compare test0, test1 and these three debug patches, focusing on NORMAL policy task interactivity, and then provide feedback.

Enjoy this puzzle game, :)

BR Alfred

EDIT:
-test2 patch has been released and fixes the issues reported by users on -test1, so these debug patches should be applied on top of the -test2 patch for testing and compared against the -test2 patch.

Wednesday, August 17, 2016

v4.7_0472_test1 patch released

v4.7_0472_test1 patch is released. In this release
1. Improve throughput performance by introducing cpu affinity selection in pick_other_cpu_stick_task(). This helps to reduce the regression on SMT cpus.
2. Immediately select the preempt task in deactivate_choose_task(). This helps with performance; for interactivity, I need your comments to find out.

Enjoy it. I am looking at the debug patches from the 4.6 release, and the -test2 patch will be out next week, :)

BR Alfred

Friday, August 12, 2016

v4.7_0472_test0 patch released

There is no new code in this first test patch; it doesn't even include the debug patches (for 4.6) from the previous post.

Based on the sanity test results, the new code changes in the test patch improve throughput on non-SMT cpus in all kinds of workload scenarios, just like the sanity results in the 4.6 release (v4.6_0470_test4 patch for testing & 4.6 Sanity test raw data). But for SMT cpus, throughput regressions are recorded.

So the next important item for the -test branch will be fixing the regression on SMT cpus.

Enjoy this first -test patch for 4.7 and wait for the next -test release, :)

BR Alfred

Tuesday, August 9, 2016

v4.7_0472_vrq2 patch released

v4.7_0472_vrq2 patch was released at bitbucket and github.

What's new

1. Rearrange commits for -test branch work
2. Several minor changes upon 0472, which includes
57374d9 bfs/vrq: Do not need to set_task_cpu() in task_preemptable_rq()
7a1262a bfs/vrq: task_cpu_hotplug() update
cbfc46d bfs/vrq: Deploy cpufreq_trigger() in task_preempt_rq()
7829fd9 bfs/vrq: Add WARN_ON_ONCE() when to_wakeup equal prev

 
Currently, the -test branch rework is ongoing; hopefully it will be available at the weekend for testing.
 
Enjoy it, :)
 
BR Alfred

Friday, August 5, 2016

v4.7_0472_vrq1 patch released

v4.7_0472_vrq1 patch is released with the fix for wrong task cpu affinity after a suspend/resume cycle.

Code is committed at bitbucket and github; the all-in-one patch is also available.

Enjoy it, :)

BR Alfred

Wednesday, August 3, 2016

v4.7_0472_vrq0 patch released

Finally, v4.7_0472_vrq0 patch released

Please check it out at bitbucket and github, all-in-one patch is also available.

It's based on BFS 0472 with *no new code changes* from the 4.6 vrq branch. The cpu hotplug api changes in 4.7 impact the availability of several cpumask bitmaps during the system init/suspend/resume phases, so there are some code modifications in individual commits to adapt to these changes.

The patch is running fine on 3 of my machines and no suspend/resume regression has been observed so far. Sanity tests are on the way.

The next release will be the vrq1 patch, which will merge/rearrange some commits and prepare for the -test branch.

Enjoy this vrq patch for 4.7; your feedback is always welcome.

BR Alfred

Sunday, July 31, 2016

BFS 0472 Sync-up patch for 4.7 kernel is available

The BFS 0472 sync-up patch for the 4.7 kernel is available, after I spent last weekend putting all sync-up patches on top of bfs 0472 and fixing the suspend/resume issue on one of my notebooks.

Based on quick sanity tests, the cpufreq api deployment issue in 0470 has been fixed, and the performance under low workload is better than in the previous release.

Enjoy this and wait for the incoming vrq patch. :)


BR Alfred

Thursday, July 28, 2016

BFS 0470 Sync-up patch for 4.7 kernel is available

My port of the BFS 0470 sync-up patch for the 4.7 kernel is available.
It contains all sync-up work from release to release, and mainly the sync-up work for the 4.7 kernel. It's working, and suspend/resume also seems good considering the cpu hot-plug api updates from upstream.

Have fun with this "pure" BFS patch on 4.7 while I work on the -vrq branch.

BR Alfred

Tuesday, July 19, 2016

Heads up! Performance regression over several releases

A performance regression over several releases from 4.3 to 4.6 has been observed under low workload (50% workload) on an SMT machine (without the SMT_NICE config).

I just re-tested the 4.3-vrq kernel, 4.5-vrq and bfs/vrq/test on 4.6. The sanity tests show that a major performance regression occurs, about 20~30 more seconds in a 7min test. The pure bfs of 4.6 is impacted badly, taking about 50% more time (11mins).

More sanity tests for different kernel versions from 4.3 to 4.5 are still on the way to find out where to look first. There could be more than one cause for the regression, based on the sanity results I currently have. I will keep you updated.

BR Alfred

Updates:

After investigation, there are three factors contributing to the regression.

1. Regression from release to release: minor regressions were recorded from 4.4 to 4.5 and from 4.5 to 4.6 using the default CFS setting. Unfortunately, nothing can be done about this from the BFS/VRQ scheduler's perspective.
2. The interactivity-on-by-default setting introduced in BFS 0466 (4.5), which contributes about 50% of the regression.
3. The cpufreq_trigger() API introduced in 4.6 was not properly deployed in BFS 0470. I have a debug load that improves this API deployment, but it's not robust enough to go public.

As 4.7 was released this week, I'd like to address this regression issue in 4.7. There are major changes to the scheduler code in this release, and it will take longer to port/sync the code. The good news is that it's about 25% finished; the first bfs 4.7 kernel is up and running.

Thursday, July 14, 2016

v4.6_0470_test5 patch released

v4.6_0470_test5 patch is available, which fixes a bug in the last two test patches (test3 and test4), so it's highly recommended to upgrade to this test5 patch if you are on the test branch.

PS, it does not include the debug patch for Eduardo. I'm still waiting for the feedback before deciding how to tune the code.

BR Alfred

Saturday, July 2, 2016

v4.6_0470_test4 patch for testing & 4.6 Sanity test raw data

v4.6_0470_test4 patch is available, which has only one big update
- low workload performance regression fix

It is highly recommended to update to this version if you are on the -test branch.

The 4.6 sanity tests, which were run on cfs/bfs/vrq/vrq-test, are also done. The raw data can be downloaded here, if you are interested.

BR Alfred

Friday, July 1, 2016

BFS/VRQ on Raspberry Pi 2

Back at the 4.1 release, the Raspberry Pi 2 used to have a stability issue with the VRQ patch, which caused it to hang after about 1 day of uptime. I had to run it with the -gc patch for 7*24 usage.

In this release, it happened that I had to debug the -vrq stability issue, and I decided to try -vrq again on the rpi2. So far, my Raspberry Pi 2 7*24 box has been up with the 4.6 BFS/VRQ patch for almost 12 days. No scheduler-related kernel debug warnings/errors have been found so far, so it is considered stable.

The -vrq patch for 4.6 applies cleanly on the rpi kernel tree and no additional patch is needed. Have fun with these wonderful SoCs with the BFS/VRQ scheduler.

BR Alfred

Wednesday, June 29, 2016

4.6 VRQ patch v4.6_0470_vrq2 and 4.6 VRQ test patch v4.6_0470_test3 released

4.6 VRQ patch v4.6_0470_vrq2 is released with
- v4.6.3 based
- merge commits
- minor code change to remove a never-reached branch in __schedule()


4.6 VRQ test patch v4.6_0470_test3 is released with
- v4.6.3 based
- merge commits
- code clean up

Same tags are also available on my github repository

For the rest of this release cycle, I'll perform the sanity and latency tests for cfs, bfs, vrq and vrq-test; the results will be posted when they are done.
BR Alfred

Edit:

VRQ-test all-in-one patch is available here.

Wednesday, June 22, 2016

v4.6_0470_test1 patch for testing

The 4.6 vrq test patch has been updated again with an adjustment to address the interactivity issue. Hopefully this helps.

Here is the all in one patch. And thanks for testing and the feedback.

BR Alfred

Tuesday, June 21, 2016

v4.6_0470_test0 patch for testing

Here comes the first test patch of the vrq test branch for the 4.6 kernel. (It's an all-in-one patch that applies cleanly on the vanilla kernel tree)

Have fun and feedback will be welcome.

BR Alfred

Tuesday, June 14, 2016

4.6 VRQ patch v4.6_0470_vrq1 released

4.6 VRQ patch v4.6_0470_vrq1 released, please find it on bitbucket and github

Changes:
- Based on 4.6.2
- Based on bfs 0470
- vrq0 compile issue fix

Have fun with 4.6 kernel, -test branch for 4.6 is incoming.

BR Alfred

Tuesday, June 7, 2016

4.5 VRQ test patch v4.5_0469_test0 updated, again

Yes, once again the 4.5 VRQ test patch has been updated. After many attempts to improve interactivity on top of the current -test code changes, one new commit has been added to the -test branch.

In my test environment, with this new commit, there are no frame drops when playing h264 video while a 300% BATCH/NORMAL/IDLE workload runs in the background. Feel free to give it a try; your feedback is welcome. The next update will happen in 4.6, :)

PS, I have force-updated the git branch, so please re-fetch it from git.

BR Alfred

Wednesday, May 18, 2016

Pre-release of v4.6_0469_vrq0 patch

Pre-release of v4.6_0469_vrq0 patch is available now.

It is the first BFS/VRQ patch for kernel 4.6. A few things haven't been done yet, including the cpufreq_util adaptation in BFS and some possible enhancements during the 4.6 sync-up, which will require more time to complete.

Again, there are no new changes compared to the v4.5 vrq branch, just the sync-up code for v4.6.

Time to have fun with 4.6 kernels. :)

BR Alfred

Monday, May 16, 2016

4.5 VRQ test patch v4.5_0469_test0 updated

4.5 VRQ test patch has been updated with two new commits, which

  • only stick non-rt and non-kernel tasks
  • make the sticky preempt task preemptible

With these two new commits, no frame drops can be seen during h264 playback while a 300% IDLE workload runs in the background. But with a 300% nice 19 NORMAL workload, there are still 4 frame drops in a test of about 5 minutes, compared to zero frame drops with the pure VRQ branch kernel. So yes, there is still something that can be done to improve interactivity for NORMAL policy tasks. Since 4.6 has come out and I'm going to work on the 4.6 sync-up, the improvements will continue in the 4.6 test branch. Before testing the 4.6 VRQ, you can give this new test patch for 4.5 a try, and I'd like to hear your feedback.

PS, I have force-updated the git branch, so please re-fetch it from git.

BR Alfred

Wednesday, April 27, 2016

4.5 VRQ test patch v4.5_0469_test0 released

This is the first test patch branch for the v4.5 kernel. It is based entirely on the v4.5_0469_vrq0 code and includes only one change.

The change is a further improvement to the "sticky task". For background information about the "sticky task", please read about it on CK's blog, or search for "sticky" there.

In VRQ, the sticky task has already been modified once, along with the policy caching timeout changes; please refer to this blog post for information. In this change, I'm trying *not* to put the sticky task into the grq. Instead, the sticky task is now set as the preempt task of the rq, which will be selected to run immediately at the next reschedule.
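
The idea can be pictured with a small toy model. This is only a sketch and not the actual VRQ code: the struct layouts, the `preempt_task` field, the helper names and the `lock_count` counter (which stands in for taking the grq lock) are all hypothetical, chosen to show why parking the sticky task on the local rq avoids a trip through the global run queue.

```c
#include <stddef.h>

/* Hypothetical, simplified types; not the real BFS/VRQ structures. */
struct task {
    int id;
};

struct grq {                      /* global run queue shared by all CPUs */
    struct task *queue[8];
    size_t len;
    unsigned long lock_count;     /* stands in for "grq lock was taken" */
};

struct rq {                       /* per-CPU run queue */
    struct task *preempt_task;    /* sticky task parked here, not in grq */
};

/* Deschedule a sticky task: park it on the local rq, no grq access. */
static void park_sticky(struct rq *rq, struct task *p)
{
    rq->preempt_task = p;
}

/* Pick the next task: a parked task wins, otherwise fall back to grq. */
static struct task *pick_next(struct rq *rq, struct grq *g)
{
    struct task *p = rq->preempt_task;

    if (p) {
        rq->preempt_task = NULL;  /* consumed without touching grq */
        return p;
    }

    g->lock_count++;              /* only this path pays for the grq lock */
    if (g->len == 0)
        return NULL;
    return g->queue[--g->len];
}
```

In this model, a task that is parked and picked again never increments `lock_count`, which is the kind of saving the change is aiming for.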

Pros:
This change reduces grq locking overhead for all workloads, especially heavy ones: for NORMAL policy tasks under a 300% workload, the recorded time is 2m32.xxxs, compared to the original 2m36.xxxs.

Cons:
Theoretically, in the current implementation, two tasks A and B running on the same cpu could take turns being the running task and the sticky task, so that other tasks in the run queue never get selected.
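
The worst case can be sketched as a tiny simulation. It is a toy model under a strong assumption that is not taken from the VRQ code: every descheduled task is parked as the sticky task of the cpu, and a parked task always wins the next pick. The function name and the task labels are hypothetical.

```c
/*
 * Toy model, not VRQ code. 'A' runs, 'B' is parked as the sticky task,
 * and 'C' waits in the run queue. Returns how often 'C' gets the CPU.
 */
static unsigned int queued_task_runs(unsigned int reschedules)
{
    char running = 'A';           /* task currently on the CPU */
    char parked = 'B';            /* sticky task waiting on the same CPU */
    unsigned int c_runs = 0;
    unsigned int i;

    for (i = 0; i < reschedules; i++) {
        /* A parked task is preferred over anything in the run queue. */
        char next = parked ? parked : 'C';

        parked = running;         /* outgoing task becomes the sticky task */
        running = next;
        if (running == 'C')
            c_runs++;
    }
    return c_runs;
}
```

Under that assumption A and B simply swap roles on every reschedule and C is never picked, however many reschedules happen. In practice a task that blocks or exits would break the cycle.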


The test branch is now on bitbucket and github. Have fun with this first test branch; your feedback is welcome.

BR Alfred

Wednesday, April 13, 2016

4.5 VRQ patch v4.5_0469_vrq0 released

Here comes the all-in-one vrq patch v4.5_0469_vrq0, which is now based on the latest BFS 0469. Of course, the new sched_interactive design in the latest BFS has been removed and replaced by the task policy based caching timeout design in VRQ.

Basically, this release doesn't introduce many code changes compared to the previous release. It just rebases the code onto the latest BFS for easier maintenance in the future.

As usual, I'm still waiting for the upstream BFQ patch for 4.5 before updating the GIT repositories. If you have any questions or feedback about the VRQ patch, please let me know.

Have fun with this new release!

BR Alfred

Edit:
I have pushed the latest vrq0 code to my repositories on bitbucket and github. Individual commits can be viewed now.

Wednesday, April 6, 2016

4.5 VRQ patch v4.5_0466_vrq0 released

Finally, here is the download link for the all-in-one vrq patch for kernel 4.5. The Git repository will be updated in the next two days.

It's still based on bfs 0466 and, following the earlier branch strategy change, it contains no new code changes apart from the 4.5 sync-up code. CK has released 0469, but it is said to contain no code changes on top of 0467 except the sync-up code. I will check 0469 and pick up any useful changes for the incoming vrq1 release.

Have fun with 4.5 kernel.

BR Alfred

Friday, March 4, 2016

Branch strategy update

You may have noticed that there is no -gc branch in my git repositories as of kernel 4.4. Yes, this is intended, as maintaining two branches and keeping code changes stable on both is too much overhead. So here is the updated branch strategy:

1. From 4.4 onwards, there will be only the -vrq branch and no more -gc branch. The -vrq branch is considered stable.
2. There will be a -wip branch containing new code changes that need to be tested before being added to the -vrq branch.
3. The first release for a new kernel major version will contain *no new feature* code except the mainline scheduler sync-up changes. Hopefully this helps to identify issues with a new kernel release.
4. LTS (long term support) branch. For a number of reasons, I think an LTS branch is needed, as LTS kernels run for years, at least until after the next LTS kernel comes out. I will try my best to back-port new code changes to the LTS branch and see how it helps people who stick with it. The first LTS branch is linux-4.4.y-vrq.

BR Alfred

Monday, February 15, 2016

VRQ v4.4_0466_vrq3 released

This minor update has just two new commits, plus the version bump:

88a07b9 bfs/vrq: Do not cache rt tasks
23c7b29 bfs/vrq: Add policy_stick_timeout
7d189bb bfs/vrq: -vrq version bump to v4.4_0466_vrq3 

With just three commits, I am not providing an all-in-one patch file. You can find the new tag (v4.4_0466_vrq3) in the bitbucket and github repositories, and the linux-4.4.y-vrq branch has also been updated.

Cached tasks now have a stick timeout, which is set to 1/16 ms, rather than using the same cache timeout value as in the previous version. Based on my tests, this helps with the system resume delay and also with overall performance. Because of this change, I have set the NORMAL policy task caching timeout to 6ms. Please re-test this default setting and see how it works in this new version.
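
To put the numbers side by side, here is a minimal sketch of the two timeouts. The enum, macro and function names are hypothetical and do not come from the patch, and the expiry check is only one plausible way such a timeout could be tested. The values are the ones stated above: 1/16 ms for the stick timeout, 6ms for NORMAL policy caching, and no caching for rt tasks (per commit 88a07b9).

```c
#include <stdint.h>

/* Hypothetical subset of policies, for illustration only. */
enum sketch_policy {
    SKETCH_POLICY_RT,
    SKETCH_POLICY_NORMAL
};

#define SKETCH_NSEC_PER_MSEC   UINT64_C(1000000)
/* 1/16 ms = 62500 ns */
#define SKETCH_STICK_TIMEOUT   (SKETCH_NSEC_PER_MSEC / 16)

/* Caching timeout per policy, in nanoseconds. */
static uint64_t sketch_cache_timeout_ns(enum sketch_policy policy)
{
    switch (policy) {
    case SKETCH_POLICY_NORMAL:
        return 6 * SKETCH_NSEC_PER_MSEC;  /* 6ms default in this release */
    case SKETCH_POLICY_RT:
    default:
        return 0;                         /* rt tasks are not cached */
    }
}

/* Has the stick timeout expired since the task was descheduled? */
static int sketch_stick_expired(uint64_t descheduled_ns, uint64_t now_ns)
{
    return now_ns - descheduled_ns >= SKETCH_STICK_TIMEOUT;
}
```

The stick timeout is therefore about two orders of magnitude shorter than the NORMAL caching timeout (62.5us versus 6ms).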

BR Alfred

Thursday, January 28, 2016

v4.4.0-vrq2 released

v4.4.0-vrq2 has been released. The all-in-one patch file can be downloaded here. Both the bitbucket and github repositories have been updated with the linux-4.4.y-vrq branch and the v4.4.0-vrq2 tag.

What's new:
Mainly focuses on the startup/shutdown and suspend/resume issues in previous releases.
Holding back some feature commits like removal of SMT_NICE code.

BR Alfred

Edit:

Story about the startup/shutdown and suspend/resume issues

Several issues were combined in the previous release.
1. dmesg shows a delay of about 1 second in the kernel log while the system is booting up.
2. Failed to reboot/shutdown machine.
3. Failed to resume from suspend.

The causes are complicated. The major one is that I had removed some code paths that reschedule a cpu/rq after putting a task into the global run queue. The second one may be a circular deadlock in mainline: I caught it in dmesg twice during my 200+ suspend/resume tests, and reducing the task caching time-out seems to help the resume success rate.

In this release, besides adding back the code to pump the scheduler, the NORMAL policy task caching time-out has been changed to 3ms and the caching time-out of all rt policy tasks to 0ms (in fact, rt policy tasks are never affected by the caching time-out unless they are changed to NORMAL policy after caching). Issues 1 and 2 are fixed. Issue 3 was tested with 10 suspend/resume cycles in console and 10 in X, so the failure rate of suspend/resume should be <5%.

Wednesday, January 13, 2016

First BFS/VRQ patch for kernel v4.4

Here is the all in one vrq patch for the latest linux kernel v4.4.

What's new:
1) Sync up with upstream scheduler code changes.
2) Remove original SMT_NICE code in BFS, something new incoming.
3) Quick path for best_mask_cpu(), which improves performance when workload<100%.
4) Minor refinements.
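
As an illustration of item 3 only, a quick path of this kind could look like the sketch below. This is an assumption and not the code in the patch: the function name, the 64-bit idle mask, and the choice to test the task's previous cpu first are all hypothetical, and the slow path here is just a linear scan standing in for the real search.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 64

/*
 * Hypothetical sketch: return an idle cpu for a task, or -1 if none.
 * idle_mask has bit N set when cpu N is idle.
 */
static int best_mask_cpu_sketch(int prev_cpu, uint64_t idle_mask)
{
    int cpu;

    /* Quick path (assumed): the previous cpu is idle, reuse it at once. */
    if (prev_cpu >= 0 && prev_cpu < SKETCH_NR_CPUS &&
        (idle_mask & (UINT64_C(1) << prev_cpu)))
        return prev_cpu;

    /* Slow path: plain scan, standing in for the full search. */
    for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        if (idle_mask & (UINT64_C(1) << cpu))
            return cpu;

    return -1;
}
```

When the workload is below 100% there is usually an idle cpu available, so a check like this would often succeed without running the full search.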

I'd like to wait for other patches (BFQ etc.) and merge some commits before pushing the code to git. Meanwhile, and most importantly, I'd like to hear your feedback about this patch on v4.4 and see if any adjustment is needed.

Have fun with VRQ in this new kernel release and in 2016.

BR Alfred

Edit:
Thanks to pf for testing and reporting back. I have updated the code and changed the link to https://bitbucket.org/alfredchen/linux-gc/downloads/v4.4_vrq_1.patch

Heads-up:
Please be aware that the current vrq may fail to reschedule in some rare cases, especially during system boot-up/reboot/shutdown and suspend/resume. I am looking into which code changes introduced the issue.

Updates:
It looks like there are 2~3 issues in the area I'm hunting. One is the roughly 1 second boot-up delay shown in dmesg, and the fix is done. Another is the suspend/resume issue: I have bisected and found the commit, and the issue is not related to bfq v7r10. The fix is ready but needs more time to verify, and then I need to see whether any other commits up to the latest one cause suspend/resume issues. The third issue is the failure to shut down; hopefully the fix for the second issue also helps with this.

Another heads-up:
Remember the "unplugged io" issue in bfs? Mainline code changes also affect the fix for this issue, so I have removed one condition check from the fix because it can never be true in the current version. But anyway, please re-check the "unplugged io" issue, as which I can reproduce in my machines to verify it.