Alfred Chen's Blog: First BFS/VRQ patch for kernel v4.4

Wednesday, January 13, 2016

First BFS/VRQ patch for kernel v4.4

Here is the all in one vrq patch for the latest linux kernel v4.4.

What's new:
1) Sync up with upsteam schedule code changes.
2) Remove original SMT_NICE code in BFS, something new incoming.
3) Quick path for best_mask_cpu(), which improve performance when workload<100%.
4) Minor refines.

I'd like to wait for other patches(BFQ etc) and do some commit merges before pushing the code to git. Meanwhile, of course, the most important, I'd like to hear your feedback about this patch on v4.4 and see if any adjustment is needed.

Having fun with VRQ in this new kernel release and the 2016.

BR Alfred

Edit:
Thanks pf for testing and reports back. I have update the code change the link to https://bitbucket.org/alfredchen/linux-gc/downloads/v4.4_vrq_1.patch

Heads-up:
Please be notified that current vrq may failed to reschedule in some rare cases, specially when system boot up/reboot/shut-down and suspend/resume. I am looking back what code changes introduce the issue.

Updates:
Looks like there are 2~3 issues in the field I'm hurting. One is about 1sec boot up delay shows in dmesg, and fix is done. Another is suspend/resume issue, I have bisected and found the commit, the issue is not related to bfq v7r10, fixing code is ready and need more time to verify it then see if any other commits cause suspend/resume issue back to the latest commit. The third issue is unable to shutdown, hopefully the fix of second issue also help with this.

Another heads-up:
Remember the "unplugged io" issue in bfs? Since mainline code changes, it also impact the fix code for this issue. So I have removed one condition checking in the fix code because that is never be true in current version. But anyway, please re-check the "unplugged io" issue, as which I can reproduce in my machines to verify it.

35 comments:

Oleksandr NatalenkoJanuary 13, 2016 at 11:45 PM
BFQ prerelease (r10) is already available for 4.4.

https://github.com/linusw/linux-bfq/tree/bfq-v7r10

No sync with original BFSv467?
ReplyDelete
Replies
AnonymousJanuary 14, 2016 at 1:55 PM
@Alfred:
I'm continuing testing the NORMAL_POLICY_CACHED_WAITTIME, now with this kernel, first at (4) to compare my last 4.3.3 and 4.4.0, and now 4.4.0 at (3). For now, I'm undecided, whether it gets better/worse in both directions interactivity/performance.

I've now also re-tested the issue of my last posting from the previous blog-thread. It's really difficult to reproduce. But I see that there is a problem with setting nice values and SCHED_* policies (schedtool). In a certain row of commands they show a difference in gkrellm display as niced processes, although they aren't (no matter, SCHED_IDLEPRIO or SCHED_NORMAL). Even as SCHED_NORMAL and schedtool -n -19, there is no performance/interactivity impact vs. -n 19. I consider this as a design error.

What I now want you to do, is to thoroughly check your code for these paths. In my actual experience the nice & policy paths are not 100% functional.

BR Manuel Krause
ReplyDelete
Replies
AnonymousJanuary 15, 2016 at 10:03 PM
I'm experiencing a larger input latency with this version when playing games in wine. Playing games in wine involves a lot of context switches per frame about 400 every 4ms. I have to set 'NORMAL_POLICY_CACHED_WAITTIME (0)' to have acceptable latency in this version, the previous versions worked fine with the default value for NORMAL_POLICY_CACHED_WAITTIME.
ReplyDelete
Replies
AnonymousJanuary 16, 2016 at 4:38 AM
your current linux-4.3.y-vrq branch
ReplyDelete
Replies
AnonymousJanuary 18, 2016 at 12:32 PM
@Alfred:
Regarding NORMAL_POLICY_CACHED_WAITTIME and your sentence in the previous blog thread "In previously, the cache time out is about 1/16 ms or some like that when using SCOST." -- How should we understand it? Setting NORMAL_POLICY_CACHED_WAITTIME to 1 means one ms now? How is it comparable to the 1/16 ms in the SCOST approach? Shouldn't there be finer granularity for WAITTIME below 0 vs. 1/16 ms?

I've tested the NORMAL_POLICY_CACHED_WAITTIME with 8, 6, 5, 4, 3, 1, 0. Only 2 is missing in the suggested row now. What I've seen, is, that lowering the value is better for interactivity with the desktop + video playback and doesn't harm everydays' throughput. Setting 8 really hurts interactivity! I don't know what 1000 ms may lead to. When setting it more and more in direction to (0), I've watched, that processes are more easily switching to my second of two cores, tending to equalize cpu0 vs. cpu1. So, this setting also affects the cpu affinity for processes.

Best regards,
Manuel Krause
ReplyDelete
Replies
AnonymousJanuary 20, 2016 at 9:50 AM
Regarding your "Heads-up" edit of the top message, mentioning rare reschedule failures, please, also be aware, that there are more possible issues with 4.4.0:

Especially users of the BFQ I/O scheduler v7r10 should have a look at the latest posts on https://groups.google.com/forum/?fromgroups=#!forum/bfq-iosched
and at the bottom of this here: https://groups.google.com/forum/?fromgroups=#!topic/bfq-iosched/9N1QL9E-KH4

Best regards,
Manuel Krause
ReplyDelete
Replies
kernelOfTruthJanuary 20, 2016 at 10:53 AM
Hi Alfred,

thanks for your continued work on BFS & VRQ !

There appears to be a scheduler issue with SCHED_ISO tasks:

My current "best" test for this is GRID Autosport's built-in benchmark mode

When running it with default options,

the game experience is buttery-smooth (well, as smooth as it can be with mid/high settings on a GTX 760 @1920x1080 - min 53 fps, max 76 fps, average around 63 fps),

this wasn't the case with 4.3 kernel - so you significantly improved BFS-VRQ from 4.3 to 4.4 !

With schedtool -I -e

max is about the same 73 fps, min - however 39 fps and average around 56 fps.

Now you could say that 39 fps isn't bad, granted - however there's several total "stuttering" occurrences where the whole screen content stands still and then continues,

this occurs at least 2-3 times during the benchmark, totally out of the blue and randomly - and, as you can imagine, this is pretty deadly during long-term races (especially "Endurance").

I've observed similar behavior when running mpv via schedtool -I -e with vapoursynth scripts and 60fps movies on the 4.3 kernel

caveat: (especially) chromium or other programs accessing the gpu, need to be closed otherwise the stuttering become even more frequent (3+ times on the benchmark and for longer time - sometimes even several seconds at once)

So ... is this more of a contention issue (throughput) or a (re)scheduling issue ?

Anyway - hope this gets you on the right track to track this down and abolish it,

BFS is really needed since even with a tweaked CFS scheduler, the experience is BFS is simply superior (high i/o, high load, general work with desktop, etc. etc.)

Thanks !
ReplyDelete
Replies