I'd like to wait for v4.3.1 before officially releasing the new gc and vrq code, but it looks like a test patch would be welcome before that.
Here it comes: please download vrq_v4.3_0465_2.patch from the bitbucket download page; it contains the bfs 0465 rebase and part 3 of the task caching commits. Feel free to give it a try and report back.
PS, I got feedback from a user who reports that wine gaming feels better with vrq (mouse movement etc.). It turns out the initial idea behind vrq, reducing grq lock sessions, helps.
BR Alfred
... yes, very welcome indeed! :-)))
Thank you, Alfred! And I really like your quick reactions to users' wishes.
It's up and running very well so far and now in my "reliability testing" ;-) No anomalies, except for... maybe... it's even snappier than before. Great work!
BR Manuel
Thx, Alfred, runs fine
I know... not enough uptime so far, but the first testing hours show that something has greatly improved with this VRQ patch. :-)))
Atm I especially want to point out that the reliability of hibernation with TuxOnIce has increased to 100%. By that I mean, in practice, that each resume from hibernation (1) succeeded and (2) succeeded at the first attempt. I've never had such a streak of 5 (so far) working hibernations since kernel ~4.2.3+vrq (if at all). I've kept my original .config for this one, varied the memory/shm/swap load for each hibernation, and should add the remark that my tests of plain BFS 0465+BFQ haven't shown that high a reliability.
You can't imagine how thankful I am, Alfred!
This patch minimizes the risk of headaches, hair loss ;-) and, of course, wasted lifetime.
BR Manuel
As I already assumed, the report above was written too early.
This revision of VRQ can also fail on TuxOnIce's resumes from hibernation -->on my system<--. And yes, the failures needn't be caused by the BFS/VRQ scheduler. What I can say from my humble tally-sheet statistics is only that this VRQ version greatly reduces resume failures vs. previous VRQs and vs. plain BFS for me.
For the last 2 days I've been planning to test more internal settings for the i915 module, xorg.conf parameters and some of the TuxOnIce sysconfig options -- to reduce the remaining risk of resume failures.
What I don't understand, @ Alfred: How can the success rate increase with your new VRQ patch vs. the old -- if it's NOT scheduler related?
BR Manuel Krause
The delta between 0463 and 0465 may contribute to the increased success rate, as you report that bfs 0465 works better than the -gc branch. And the vrq code seems to work better than gc, right? So that also contributed.
I don't know the real cause of this trouble, but I will try to make the code work in the correct way, and hope that helps to solve such issues in a general way.
Your summary is exact and thoughtful.
This issue itself drives me mad. (Not your fault.) Yesterday afternoon I thought I had found some settings that raised the rate to 10/10 (ten of ten) consecutive successful resumes without any retries(!) -- each time only the content of /dev/shm, browser tabs and playing videos changed -- up to and including this morning's resume. So I thought I had configured my system the right way.
Later on, at noon, hibernating again, the system decided it was the "odd" day... Row after row of required resume retries and very, very rare successes. No settings changed.
Of course, you can only cover your code. And I know that you're (and have always been) doing your very best to make it work correctly.
Maybe there's another function in all the related resume code including TuxOnIce & i915 that wants correct return values...
Another, maybe crude, idea: is there a way to speed up CPU1 (my second core here) coming online within the resume process, either by parameters or in the code? What leads me to this idea: my stuck resumes always fail at a stage where the GFX should be (re)set and/or the second of two cores (CPU1) should come up so that the planned read-in of caches (by TuxOnIce) can begin, and both can fail.
Only a suggestion for more thoughts from your side.
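(Just as a checking aid, not a fix: whether CPU1 actually came back online after a resume can be read from the standard CPU hotplug node in sysfs. A minimal sketch, assuming the stock sysfs layout; the helper name is purely illustrative:)

#include <stdio.h>

/* cpu_online_state() is an invented helper for this sketch only. */
static int cpu_online_state(int cpu)
{
        char path[64];
        FILE *f;
        int online = -1;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/online", cpu);
        f = fopen(path, "r");
        if (!f)
                return -1; /* node absent, e.g. cpu0 is usually not hot-pluggable */
        if (fscanf(f, "%d", &online) != 1)
                online = -1;
        fclose(f);
        return online;
}

int main(void)
{
        int state = cpu_online_state(1);

        if (state == 1)
                printf("CPU1 is online\n");
        else if (state == 0)
                printf("CPU1 is still offline\n");
        else
                printf("CPU1 online state unknown\n");
        return 0;
}

Running that right after a (partially) successful resume would at least tell whether the second core is the part that got stuck.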
My very kind regards and thanks for your work.
Manuel
Tonight I've even tried the same setup + test scenario but with a ->CFS<- compiled kernel (apart from that, the same .config). It also fails at the same resume stage, and I stopped testing in the first round after 25 failed resume attempts (no successes). Just to be sure.
BR Manuel
If it fails with CFS, then most likely it is not a scheduler related issue, IMO.
Yes, thank you, Alfred. This is also my conclusion for the moment. To prove it was the reason I re-checked with CFS. Most probably I won't bother you with that issue again ;-)
Unfortunately I haven't found many bug reports that pinpoint this issue to the i915 code. I've read one or two in the bugzillas that show the same failure behaviour (without TuxOnIce), which would slightly indicate a timing issue in the i915 resume code (not scheduler related, not TuxOnIce caused => gfx restore related). IMO.
Best regards, and thanks for your assistance,
Manuel
Sadly this one is crashing for me if I get the system under load and play a video. It seems sound related: video without sound works; with sound the system crashes.
It's a USB sound card, and I will set up netconsole to provide a stack trace.
So here we go, crash log:
http://pastebin.com/XUW5iuBY
.config:
http://pastebin.com/k87NgRCk
The older vrq0 patch works without that problem.
I have done a quick check of your config file and found that CONFIG_SMT_NICE is enabled, which I have never tried myself and do not suggest for gc and vrq. And unfortunately there is a bug in the current vrq related to CONFIG_SMT_NICE, which may cause unexpected results if it is enabled. So, simply disable CONFIG_SMT_NICE and see if this fixes the issue.
BR Alfred
From the crash log and the assembly code of bfs.c, I am pretty sure it's caused by the "return;" in task_preemptable_rq() when CONFIG_SMT_NICE is enabled. The fixed code should look like below; you can give it a try, but I can't guarantee SMT_NICE works well.
task_preemptable_rq(struct task_struct *p, int only_preempt_idle)
{
        ...
#ifdef CONFIG_SMT_NICE
        if (!smt_should_schedule(p, target_cpu))
                return NULL;
#endif
PS, sorry that I have merged this fix into a previous commit, so I can't provide you a simple patch file.
BR Alfred
The related code, for which you've shown a fix, is also in the "old" vrq0 patch. @Anonymous doesn't have problems with the "old" patch.
I hope that I didn't miss some #ifdef ... #endif lines.
BR Manuel Krause
Use "return" instead of return a value in a function require a return value, which indeed the caller will has a value but it's unpredictable. It may happen to be an zero, and maybe it always be an zero, but once it's not, it's a mass.
DeletePS, I'm not intend to return a non-value, I changed the return type of this function but since I never tried CONFIG_SMP_NICE, so no compile warning to catch my attention that I miss this one.
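To make that concrete, here is a small stand-alone sketch (plain userspace C; the struct and function names are invented for illustration and are not the bfs.c code): a function declared to return a pointer must return something on every path, and the buggy path had a bare "return;" there.

#include <stdio.h>
#include <stdlib.h>

struct rq { int cpu; };

/* Fixed pattern: every exit path returns a pointer (or NULL).
 * The buggy variant used a bare "return;" on the first path, which
 * compiles (at best with a warning) but leaves the caller reading an
 * indeterminate value that may only happen to be zero. */
static struct rq *pick_preemptable_rq(struct rq *candidate, int should_schedule)
{
        if (!should_schedule)
                return NULL; /* the fix: explicit NULL instead of "return;" */
        return candidate;
}

int main(void)
{
        struct rq r = { .cpu = 1 };
        struct rq *target = pick_preemptable_rq(&r, 0);

        if (target)
                printf("preempt on cpu %d\n", target->cpu);
        else
                printf("no preemptable rq, so the caller skips preemption\n");
        return EXIT_SUCCESS;
}

With the bare "return;" the pointer the caller sees is whatever happens to sit in the return register, which is why it can look fine for a long time and then suddenly crash.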
Just for information: this fixed the crash.
That's good. Thanks for testing. And does SMT_NICE work as expected on vrq?
Mmm. @Alfred: Sometimes your English is unreadable:
Did you mean:
Using simply "return" instead of "return VALUE" in a function that gives back a VALUE confuses the caller, regarding the VALUE.
?
Anyway: Was it only a coincidence that @Anonymous didn't face this issue with the =same= earlier vrq0 code?
BR Manuel
Yes. He must have been very lucky that the caller always got a zero in the earlier vrq kernel build.
CONFIG_SMT_NICE was not set in the earlier vrq patch; it wasn't available.
@Anonymous:
Thank you for coming back to clarify this!
BR Manuel