Alfred Chen's Blog: VRQ 0.96 release

Wednesday, June 7, 2017

VRQ 0.96 release

This release has been abandoned because of lock-up issues reported by users. The lock-ups are caused by the "Lock strategy update" commit. It worked well on my own machines throughout my work on SMT sensitive scheduling, which led me to believe it was good and stable.

A new "Lock strategy update" debug patch will be posted here for testing. Once it is confirmed to work well for other users, a respun 096a will be released.

VRQ 0.96 is released with the following changes:

1. Sync up cpufreq util usage.
2. Lock strategy update, which hopefully fixes a potential lock issue during task migration.
3. SMT sensitive scheduling v0.1

The main feature in this release is the first version of SMT sensitive scheduling, which cuts about 10s off the kernel compile benchmark on my test machine (originally 7m17s) under 50% workload.
You can also easily observe the change in cpu usage: as long as an idle physical core is available, the scheduler will not put a task on the SMT sibling of a busy core. For example, if two tasks are running on a 2-core, 4-thread cpu, one will be on cpu 0 or 1 and the other will be on cpu 2 or 3.
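
As a rough illustration of the placement rule described above (not the actual VRQ code), here is a small Python sketch. The sibling map and the names `pick_cpu` and `place_tasks` are made up for this example, and it assumes cpus 0/1 share one physical core and cpus 2/3 share the other.

```python
# Hypothetical topology: 2 cores, 4 threads.
# cpus 0 and 1 are SMT siblings on core 0, cpus 2 and 3 on core 1.
SIBLINGS = {0: (0, 1), 1: (0, 1), 2: (2, 3), 3: (2, 3)}


def pick_cpu(busy, siblings=SIBLINGS):
    """Return an idle cpu, preferring one whose whole physical core is idle."""
    idle = [cpu for cpu in sorted(siblings) if cpu not in busy]
    # First pass: a cpu whose siblings are all idle (a free physical core).
    for cpu in idle:
        if all(s not in busy for s in siblings[cpu]):
            return cpu
    # Fallback: any idle cpu, even if its sibling is already busy.
    return idle[0] if idle else None


def place_tasks(n, siblings=SIBLINGS):
    """Place n tasks one by one and return the cpus chosen, in order."""
    busy = set()
    placement = []
    for _ in range(n):
        cpu = pick_cpu(busy, siblings)
        if cpu is None:
            break  # no idle cpu left
        busy.add(cpu)
        placement.append(cpu)
    return placement


print(place_tasks(2))  # two tasks land on different physical cores
print(place_tasks(4))  # siblings are only used once every core is busy
```

With two tasks the sketch spreads them over both physical cores, and it only falls back to SMT siblings once no fully idle core is left.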

Further improvements to SMT sensitive scheduling will come in the next release. I'll see whether the current design can be improved or simplified.

Enjoy VRQ 0.96 for the v4.11 kernel, and unlock your SMT cpu's ability with VRQ :)

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.11.y-vrq
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.1.y-vrq

All-in-one patch is available too.

BR Alfred

76 comments:

Anonymous, June 7, 2017 at 5:13 AM
Hi Alfred,
I'm testing it on Ryzen and i7. I probably broke something on the i7, because it doesn't boot and halts at "triggering udev events". The Ryzen build did boot up fully, and I was able to start the test compile job and see the SMT magic in effect by watching the core utilization graphs (good job btw), but it gradually crashed, and these are the weird logs I was able to catch:

make[3]: Warning: File 'include/config/auto.conf' has modification time 7023 s in the future
/bin/sh: fork: retry: Resource temporarily unavailable
make[3]: fork: Resource temporarily unavailable
make[3]: warning: Clock skew detected. Your build may be incomplete.

After that came build errors and waiting for jobs to finish from the parallel builds, and later only fork errors.
The ssh session in second terminal crashed too.
The machine didn't die completely, ping works but ssh doesn't respond anymore.
I was using gcc 7.0 on previous builds, and that was updated to 7.1 quite recently; I'm not sure whether I have had a successful vanilla build with it. So that might be the culprit, and maybe the same applies to the i7 machine. I will try to compile a vanilla kernel and report whether it works (after I get home and reboot the machine).

Best regards,
Dzon
Anonymous, June 7, 2017 at 10:34 AM
@Alfred:
Thank you for the update! It's now in use together with a fresh 4.11.4 kernel. Apart from the kernel changes, which aren't relevant for my HW, only the "vrq: Lock strategy update" would make a difference here, as I'm without SMT (HW not capable and not configured).
Regarding Dzon's message above: I have no problems with usual compilation (dualcore and make -j2).

If you have a little time, can you please explain in short your expected effects of the lock strategy update?

BR, Manuel Krause
Anonymous, June 8, 2017 at 7:44 AM
Hi Alfred, and thank you for this release of VRQ.

I had a freeze both times I ran my usual 'make -j4 ffmpeg' benchmark. I didn't try a third time. The build freezes, but I'm still able to switch to another tty.
Here is the error log :
https://pastebin.com/i41j3zr6

'make -j1', -j2, -j8 and above are fine though.

If I find time, I'll try with VRQ 0.95b.

Pedro
Alfred Chen, June 8, 2017 at 7:29 PM
This release has been abandoned because of lock-up issues reported by users. The lock-ups are caused by the "Lock strategy update" commit. It worked well on my own machines throughout my work on SMT sensitive scheduling, which led me to believe it was good and stable.

A new "Lock strategy update" debug patch will be posted here for testing. Once it is confirmed to work well for other users, a respun 096a will be released.
Anonymous, June 8, 2017 at 10:25 PM
@Alfred,

I'll try to check whether I can backport those 3 (and the debug patch) to 4.10. Or do you already know that this idea won't work, because the patches for 4.10 and 4.11 differ too much for a simple manual merge?

Thanks and br,
Eduardo
Alfred Chen, June 15, 2017 at 12:04 AM
Hi, all,
I still can't figure out what's wrong with the lock strategy update commit in VRQ096, so I think I have to do this the hard way: change the code step by step and see which step goes wrong. Luckily, it is not a huge commit.

So here is the #1 lock strategy debug patch, to be applied on top of the VRQ095b patch. Please try it out and give your feedback, then I'll prepare #2.

https://bitbucket.org/alfredchen/linux-gc/downloads/lock_strategy_00.patch

Thanks for testing, :)

BR Alfred
Alfred Chen, June 15, 2017 at 11:44 PM
@all
Thank you all for the quick test of lock_strategy_00.patch. It looks like the first step was a good move.

Here comes the #2 debug patch. It changes just a little bit more and is applied on top of VRQ095b.
https://bitbucket.org/alfredchen/linux-gc/downloads/lock_strategy_01.patch

After this, two more debug patches are planned.

BR Alfred
Anonymous, June 18, 2017 at 11:27 PM
@Alfred,

I have used VRQ 095b + lock strategy 01 for a couple of days, with no problems so far. Compilations, everyday usage and games, both native and wine, work fine.

Br, Eduardo
Alfred Chen, June 19, 2017 at 7:35 PM
@all
Sorry that I was a little busy last weekend.
Here comes the #3 debug patch, https://bitbucket.org/alfredchen/linux-gc/downloads/lock_strategy_02.patch

@Manuel
These debug patches are only meant to track down the lock-up issue in 096.

BR Alfred
Anonymous, June 20, 2017 at 4:15 AM
'make -j4 ffmpeg' ran successfully 6 times in a row with linux 4.11.4 and VRQ 0.95b+lock_strategy_02.patch.
No errors in the logs.

Pedro
Anonymous, June 20, 2017 at 5:49 AM
Boot without warnings. One kernel build completed without errors on kernel 4.11.5 with vrq95b and lock_strategy_02.patch (Ryzen machine).

BR, Dzon.
Anonymous, June 20, 2017 at 2:31 PM
Also no issues or anomalies with 4.11.6 & lock_strategy_02.patch with BFQ (core2duo).
BR, Manuel Krause
Alfred Chen, June 21, 2017 at 1:22 AM
@all
Thank you all for testing. Here comes the final debug patch to find out what causes the lock-up.
#4 https://bitbucket.org/alfredchen/linux-gc/downloads/lock_strategy_03.patch
(apply on top of 095b)
Depending on the test results, I may ask for your help to double-check the original lock strategy commit in VRQ096. Let's first see how this final debug patch goes.

BR Alfred
Alfred Chen, June 21, 2017 at 4:47 PM
@all
Here is the double check action.
Please apply the 096 lock strategy update commit *ON TOP OF* VRQ095b. If you don't want to fetch it from git, I have uploaded it to https://bitbucket.org/alfredchen/linux-gc/downloads/lock_strategy_096.patch

Test it harder if possible. :)

BR Alfred
Anonymous, June 22, 2017 at 4:37 AM
I cannot test this until this evening (GMT+2). However, as I discovered earlier, I got the same error as everyone else affected with 4a41e41 and b13b01c applied to 4.10.17, so it seems to me that e970c48 is not the cause of the problem, since I specifically left it out because of the reported failures.
I will try to apply freq + lock strategy, leaving out the smt sched patch, and based on the result apply smt as the last one.
I haven't tried 096 on Ryzen yet, so probably I'll start with that to confirm whether I have the problem at all :)

Br, Eduardo
Alfred Chen, June 25, 2017 at 8:00 PM
@all
Based on the latest testing results, the "Lock strategy update" commit doesn't introduce the lock-up issue. I have also double-checked all possible call paths of the lock APIs, and I don't see any unexpected scenario that could happen.
However, @Dzon has a lock-up issue with commit "4a41e41" (vrq: smt sensitive scheduling v0.1), so I plan an improvement update for this commit and will release a debug patch ASAP this week.

BR Alfred
Alfred Chen, June 26, 2017 at 7:24 PM
@all
Please try this patch on top of VRQ096. It uses strict locking when doing smt balancing.
https://bitbucket.org/alfredchen/linux-gc/downloads/v4.11_vrq096_096a.txt

BR Alfred
Alfred Chen, June 28, 2017 at 7:52 PM
@all
Here comes the second respun debug patch on top of VRQ096. It fixes an issue that occurs when NR_CPUS is larger than the real number of cpus, which led to a task being scheduled onto a non-existent cpu and a write protection fault.
https://bitbucket.org/alfredchen/linux-gc/downloads/v4.11_vrq096_096b.txt

Hopefully this fixes the issue for most of you.
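
To make the NR_CPUS problem a bit more concrete, here is a minimal Python sketch of the general idea, not the actual patch. It assumes a bitmask of candidate cpus that is NR_CPUS bits wide and a separate set of cpus that really exist; the names `pick_idle_cpu`, `idle_mask` and `online_cpus`, and the value 8 for `NR_CPUS`, are made up for illustration.

```python
NR_CPUS = 8  # hypothetical compile-time upper bound, larger than the real cpu count


def pick_idle_cpu(idle_mask, online_cpus, nr_cpus=NR_CPUS):
    """Return the lowest idle cpu that really exists, or None.

    idle_mask is a bitmask with one bit per possible cpu id (0..nr_cpus-1).
    Bits above the real cpu count may be set by mistake, so every candidate
    is also checked against online_cpus before it is returned.
    """
    for cpu in range(nr_cpus):
        if idle_mask & (1 << cpu) and cpu in online_cpus:
            return cpu
    return None


online = {0, 1, 2, 3}  # a machine with 4 real cpus
# Only bits 4..7 are set: without the online check, cpu 4 would be chosen.
print(pick_idle_cpu(0b11110000, online))
# Bit 2 is set as well, so a real cpu is found.
print(pick_idle_cpu(0b11110100, online))
```

The point of the sketch is only that any scan bounded by NR_CPUS has to be filtered by the cpus that are actually present.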
Alfred Chen, July 3, 2017 at 12:06 AM
@all
Thank you for testing. It turns out that the latest debug patch fixes the issue in VRQ096 for most users, so VRQ096b is officially released. Feel free to continue the discussion in the new post.