There are three major features in this VRQ branch:
1. VRQ lock strategy update: replace the grq lock strategy with the task_access lock strategy
That is:
* lock on rq->lock when task is on cpu
* lock on grq.lock when task is in queue
* otherwise lock on task's pi_lock
This is the biggest change, and it touches almost the whole scheduler code. Building on it, some improvements to the grq lock sessions are made for task activation and idle task scheduling.
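The three locking rules above can be sketched as a small decision function. This is only an illustration under assumptions: `struct task`, `enum lock_kind` and `task_access_lock_kind()` are hypothetical names, not the actual VRQ structures or functions, and no real locks are taken here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real kernel or VRQ types. */
enum lock_kind {
    LOCK_RQ,   /* rq->lock of the cpu the task is running on */
    LOCK_GRQ,  /* grq.lock, the global run queue lock */
    LOCK_PI    /* the task's own pi_lock */
};

struct task {
    bool on_cpu;  /* task is currently running on a cpu */
    bool queued;  /* task is waiting in the global run queue */
};

/* Choose which lock guards access to the task, following the three rules. */
static enum lock_kind task_access_lock_kind(const struct task *p)
{
    assert(p != NULL);
    if (p->on_cpu)
        return LOCK_RQ;
    if (p->queued)
        return LOCK_GRQ;
    return LOCK_PI;
}
```

A real implementation would presumably have to re-check the task state after taking the chosen lock, since the task can move between these states in the meantime. That retry logic is left out of the sketch.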
2. preempt task solution
This is an enhancement for try_to_wake_up(). Instead of putting the woken task in grq and rescheduling a cpu/rq to pick it up, the woken task now becomes the preempt task of the rq and is picked immediately in the next schedule run. This saves the effort of putting the task into grq and getting it back out, and avoids other cpus/rqs accessing grq.
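A toy model of the preempt task idea, to make the flow concrete. Everything here is an assumption made for illustration: the `preempt_task` slot, `wake_up_task()`, `pick_next_task()` and the array-based `struct grq` are hypothetical, there is no locking, and the fallback to grq when the slot is already taken is a guess, not something the post describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GRQ_MAX 64

/* Hypothetical, simplified types; not the real VRQ structures. */
struct task { int pid; };

struct grq {
    struct task *queue[GRQ_MAX];
    size_t len;
};

struct rq {
    struct task *preempt_task;  /* woken task to run on the next schedule */
};

static void grq_enqueue(struct grq *grq, struct task *p)
{
    assert(grq->len < GRQ_MAX);
    grq->queue[grq->len++] = p;
}

/*
 * Wake-up path: park the task in the rq's preempt slot when it is free,
 * so grq is not touched. Returns true if the slot was used.
 * Falling back to grq when the slot is busy is an assumption.
 */
static bool wake_up_task(struct rq *rq, struct grq *grq, struct task *p)
{
    if (rq->preempt_task == NULL) {
        rq->preempt_task = p;
        return true;
    }
    grq_enqueue(grq, p);
    return false;
}

/* Schedule path: take the preempt task first, only then look at grq. */
static struct task *pick_next_task(struct rq *rq, struct grq *grq)
{
    struct task *p = rq->preempt_task;

    if (p != NULL) {
        rq->preempt_task = NULL;
        return p;
    }
    if (grq->len == 0)
        return NULL;
    /* Simplification: take the last entry; a real scheduler picks by policy. */
    return grq->queue[--grq->len];
}
```

In this model a wake-up followed by a schedule run never reads or writes grq as long as the slot is free, which is the saving described above.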
3. cache_count solution
Introduce cache_count for tasks, which indicates that a task is still cache hot while it waits in the queue. This replaces the sticky task solution in BFS.
The current settings, 14 for activated tasks and 4 for deactivated tasks, are both tested values. In a future version, the algorithm will be based on more meaningful factors.
IMPORTANT NOTES:
1. SMT_NICE code is kept but is not tested; don't enable it for VRQ yet.
2. yield_to() locking is unchanged and not tested, so kvm may not work.
3. UP is not tested, and VRQ is not designed for UP; don't try VRQ on UP.
4. Based on user reports, VRQ may not work with some kernel config, but it's unknown which config is causing the issue. Further testing is still needed.
5. Try VRQ if you want to help with testing. If it runs well, keep using it; if not, fall back to the -gc branch.
Enjoy and have fun.
BR Alfred
Update:
Found an issue while investigating Manuel's config. A quick workaround is to set NR_CPUS to the exact number of cores in your system. A fix will be in the 0.5 release.
I've also tested this one. First boot hung at accessing the root disk/partition. No message.
Second boot hung while starting the system. A BUG message was shown but is not in the logs. No camera.
Which parts of the BUG information do you need for debugging? I mean, what should I write down on paper and type back in here?
Best regards,
Manuel
BTW, I had sent my kernel .config to you. Is there anything problematic in it, given that you point out "4. Based on user reports, VRQ may not work with some kernel config, but it's unknown which config is causing the issue. Further testing is still needed."?
Please let me know which setting may be "wrong".
Manuel
@Manuel Thanks for testing. I have sent you an email and updated this post. Please try with NR_CPUS = 2 and see how it goes. I'm working on the fix.
Unfortunately, I can't tell you whether this setting is a useful workaround, as the machine doesn't get through the bootup, no matter what runlevel I choose. And the saddest part is that I don't get any failure message. The system just stops working, and the point at which it stops is not reproducible. :-(
@Manuel
Here is the plan:
1. On the last try, without the acpid, alsasound and intel drm services and with NR_CPUS = 2, the system booted up and shut down gracefully with a clean dmesg log. I'll run a stability test and see what happens.
2. I'll try to add back the removed system services and test it again.
When 1. or 2. is finished, I'll send the kernel config file back to you so you can give it a try on your side.
BR Alfred
O.k. I really like testing the VRQ kernels, but I hope I don't need to exclude system services from the bootup on here. I had thought that booting into single user admin mode would/should be sufficient to reduce whatever things VRQ doesn't like.
Of course I'd try with your kernel .config.
BR Manuel
@Manuel
I have sent you the updated kernel config files. Both work on my notebook, which should be similar to your hw. You can give them a try.
Many thanks.
@Alfred
I'm a bit late due to the long-term stability testing I needed to do on old and new tuxonice versions with kernels (4.0.1 & 4.0.2).
I've gotten your email with the two kernel configs. I needed some time to work out which differences made sense on here. I have no clue which changes you made for a particular reason. Can you explain which ones make sense to you, and why?
First, I won't boot into a blind kernel without gfx. It is also quite senseless if you are aiming VRQ at normal users.
Second, the adapted config from your "ok_graphic" version was as unstable as the very first trial. Not booting through (not even into S or 1).
Third, if it's only a timing issue introduced by the 16 VRQ patches, wouldn't it be easier for both of us if you pointed me to one or two "hot" patches that I should omit to get your VRQ stable on here?
Best regards,
Manuel
BTW, the pure -gc patches with 4.0.2 & BFQ & tuxonice & Con's latest two fixes work well ! BR Manuel
And in the meantime I've incorporated some of your .config's settings into mine, to see if they improve stability on here. Thank you very much anyway.
The most valuable changes from your kernel .config to the first kernel config I sent you are:
1. NR_CPUS = 2, to match the number of cores in your system. A mismatch here has been identified as a bug in -vrq 0.4, and this change should work around it.
2. All graphics config disabled. The most difficult part of adapting your .config to my hw is the graphics, so I disabled all of it to boot into a console system.
Based on my test, my system runs stable with this config.
The second config I sent you is based on the first one, but I played around with the graphics configs. In the end I built all of them into the kernel, and graphics works fine on my hw.
All of these were tested on my hw. If the first one works fine on your hw but the second doesn't, I think the issue can be narrowed down to something graphics related.
Yes, I indeed came back to some VRQ testing. ;-)
I have now applied the 4 additional patches you've published since my last tests and went with my NR_CPUS=4 (that is, I compiled with my cleared-out .config but including gfx).
For now I can boot into single user mode reliably. But when attempting runlevel 5, it hangs even before X would get started (IIRC in the network loading stage).
BR, Manuel Krause
It's good news that single user mode works. As a next step, I would suggest isolating the issue by disabling the suspected services (e.g. networking, graphics).
Disable the services, or disable them in the kernel config, and see what happens.
Just to be honest, I don't see what this is leading to. Seeing essential system components failing with VRQ should be enough information for you to rework the patches. Also, having Linux without gfx and without a running desktop and without network, is just a NOGO.
BR, Manuel
When debugging an issue, it is useful to isolate it as far as you can, especially in your case, where the system just hangs without any useful crash log to look into. This is certainly *NOT* to say that if there is an issue with those components, you should disable them for daily usage. We need to know what is causing the issue, when and why, in this phase before coming up with any solution.
The -vrq branch is much closer to the edge than the -gc branch is, but based on my testing with the kernel config file you provided, I believe that it may be a graphics config related issue. It may be similar to the resume problem with bfs&toi that you have investigated.
At the same time, it is quite understandable that doing this kind of debugging on their main machine is not very convenient for users.
BR Alfred