Comments on Alfred Chen's Blog: VRQ 0.3 updates

Just replied in a new post. There will be a new so...

2015-02-09T01:36:42.968-08:00

Just replied in a new post. There will be a new solution in 3.19.

BR Alfred

Hi, Alfred! Just wanted to come back and ask now, ...

2015-02-06T14:37:53.560-08:00

Hi, Alfred!
Just wanted to come back and ask now, if you've found a new solution/fix (other than me reverting the patches).

Best regards,
Manuel

Yes, you're right: Reverting these three -vrq ...

2015-01-19T07:01:19.771-08:00

Yes, you're right: Reverting these three -vrq patches (thank you for naming them!) brings back normal behaviour. :-) Also for the case of resuming from suspend to disk everything works well.
BR Manuel

I have bisect and found commit "bfs: vrq: RQ ...

2015-01-19T00:51:34.020-08:00

I have bisected and found that the commit "bfs: vrq: RQ niffy solution." introduced this issue. You can skip the last 3 to 4 commits in the -vrq branch and see if that fixes the issue for you. Those commits are:
"bfs: vrq: dedicated xxxx_schedule()."
"bfs: vrq, refactory wake_up_new_task."
"bfs: vrq: RQ niffy solution."
I am still waiting for the debug load info to tell me the detailed cause of this issue.
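
For readers who have not bisected before, the idea can be sketched in a few lines of Python. This is only a toy illustration: the commit labels and the `is_bad` check are hypothetical stand-ins, and on a real kernel each check means building, booting and waiting for the imbalance to appear. It assumes commits are ordered oldest to newest, the newest one is bad, and every commit after the first bad one is bad too.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[-1] is bad and that once a commit is bad,
    all later commits are bad as well.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # the culprit is mid or something older
        else:
            lo = mid + 1      # the culprit is strictly newer than mid
    return commits[lo]


# Hypothetical history of 24 patches; pretend the regression starts at index 20.
commits = [f"commit-{i}" for i in range(24)]
tested = []


def is_bad(commit):
    tested.append(commit)     # record how many kernels would need testing
    return int(commit.split("-")[1]) >= 20


culprit = first_bad(commits, is_bad)
print(culprit, "found after", len(tested), "tests")
```

With a test cycle of several hours per kernel, needing only a handful of checks instead of one per commit is what makes this practical.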

BR Alfred

Oh, that doesn't sound good. But, anyway, a qu...

2015-01-14T06:48:58.958-08:00

Oh, that doesn't sound good. But, anyway, a quality fix is better than a too early one.
Please, remember, that my tests with your baseline patches up to 3.18.2 were error-free.
I'll stay tuned and can also test preliminary test-fix-patches for you.

Good luck, Manuel

I have to said it's too earlier to say 100% re...

2015-01-13T19:23:36.248-08:00

I have to say it was too early yesterday to claim the issue is 100% reproduced. Yes, when the issue is triggered, no matter what background workload (SCHED_BATCH or SCHED_NORMAL) is running, one cpu fails to pick up normal or system tasks. But the trigger I use here, "schedtool -3 xxx" to set the mprime threads to SCHED_BATCH, does not work well on a freshly restarted system. Last night, no matter how I played with it, it just couldn't trigger the issue, but when I tried it today (the system was not restarted during the night), one-shot bingo!
Based on the current info, a test cycle is 12h+ for me, and I think I need to re-test the baseline version and then bisect to find the commit. In other words, the fix should not be expected in a short time.

I will let you know of any updates.

Please, do also provide an incremental patch for m...

2015-01-13T07:08:42.796-08:00

Please, do also provide an incremental patch for me, somewhere, so we all are able to see what you've changed vs. current VRQ.

Thanks, Manuel

Glad, to hear that we've found the culprit. :-...

2015-01-13T04:27:47.286-08:00

Glad to hear that we've found the culprit. :-) I wish you good luck with working out the fix and hope that it won't hurt the benchmark results of VRQ, which look promising for now.

BR, Manuel

Thanks for your time to help testing and provide u...

2015-01-12T22:12:39.021-08:00

Thanks for taking the time to help with testing and for providing useful info.
Once the policy is set to SCHED_BATCH, the issue is 100% reproduced, and I can confirm that the baseline version (-gc branch) is clean and -vrq is affected.
Once the fix is ready, I will update the -vrq branch and let you know.
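
For anyone who wants to put a test workload under SCHED_BATCH without schedtool, here is a minimal Python sketch. It assumes a Linux system where Python exposes `os.sched_setscheduler` and `os.SCHED_BATCH`; the helper name `set_batch_policy` is made up for this example, and it simply reports failure where the call is unavailable or refused.

```python
import os


def set_batch_policy(pid=0):
    """Try to switch `pid` (0 means the calling process) to SCHED_BATCH.

    Returns True if the policy is SCHED_BATCH afterwards, False if the
    platform lacks the call or the kernel refuses the change.
    """
    if not hasattr(os, "sched_setscheduler") or not hasattr(os, "SCHED_BATCH"):
        return False  # not Linux, or a Python build without these bindings
    try:
        # SCHED_BATCH takes a static priority of 0, like SCHED_NORMAL.
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    except OSError:
        return False
    return os.sched_getscheduler(pid) == os.SCHED_BATCH


print("now SCHED_BATCH:", set_batch_policy())
```

A CPU-bound loop started after a successful call would then play the role of the batch background load discussed here.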

BR Alfred

Mmmh, I don't understand exactly why you need ...

2015-01-12T05:34:39.554-08:00

Mmmh, I don't understand exactly why you need to re-model my system's behaviour with taskset. Here this happens without manual intervention.
To your questions: There may be one significant difference to your mprime tests:
1. Querying the two wcg tasks:
# schedtool `pidofproc wcgrid_faah_7.1`
PID 15980: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0x3
PID 21063: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0x3
I mean, it's the SCHED_BATCH scheduling policy that's different.
2. Yes, the "make -j2" commands are executed at default priority, SCHED_NORMAL and NICE 0.
3. Yes, repeatedly stopping+restarting both wcg tasks and kernel compile, resulted in cpu1 not picking up normal tasks from cpu0 over the repeated steps.
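
The schedtool lines quoted in point 1 can be decoded with a small Python sketch like the one below. The regular expression is written only against the two lines shown in this comment, so treat the exact output layout as an assumption; the function names are made up for the example.

```python
import re

# Pattern modelled on the two schedtool lines quoted above (an assumption).
LINE_RE = re.compile(
    r"PID\s+(?P<pid>\d+):\s+PRIO\s+(?P<prio>-?\d+),\s+"
    r"POLICY\s+\w+:\s+(?P<policy>SCHED_\w+)\s*,\s+"
    r"NICE\s+(?P<nice>-?\d+),\s+AFFINITY\s+(?P<mask>0x[0-9a-fA-F]+)"
)


def cpus_from_mask(mask):
    """Turn an affinity bitmask into a list of allowed CPU numbers."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def parse_schedtool_line(line):
    """Return a dict describing one schedtool line, or None if it doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "prio": int(m.group("prio")),
        "policy": m.group("policy"),
        "nice": int(m.group("nice")),
        "cpus": cpus_from_mask(int(m.group("mask"), 16)),
    }


sample = "PID 15980: PRIO 0, POLICY B: SCHED_BATCH , NICE 19, AFFINITY 0x3"
parsed = parse_schedtool_line(sample)
print(parsed)
```

Affinity 0x3 has bits 0 and 1 set, so both work units are allowed on both cores; nothing pins them to cpu1.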

I hope this info helps to understand a bit better what's going on here with VRQ.

BR, Manuel

PS, what does "And, while doing this row of t...

2015-01-11T19:09:28.256-08:00

PS, what does "And, while doing this row of tests (of course, without rebooting), with each test, the ability of CPU1 vanishes, to take over "normal" or "system" tasks. For kernel compilation: CPU0 at full load, CPU1 only showing some peaks." mean?

When you disable/enable wcg repeatedly, cpu1 fails to pick up normal/system tasks? If so, it looks like wcg is running at a higher priority than it seems to be.

BR Alfred

I have tried to simulate your usage as your last d...

2015-01-11T17:22:07.609-08:00

I have tried to simulate your usage as in your last description.
I started 2 mprime threads and used taskset to make them all run on cpu1. After this, I can see the background workload occupying cpu1 at 100% and the normal system workload on cpu0. The mprime threads are running at nice level 19.
Then I started a kernel compile with "make -j2": cpu0 is occupied by 90%+ normal workload and cpu1 is occupied by 70%-90% normal workload, some system workload (red), and the rest is taken by the mprime threads. The result is quite as expected. I have also rolled back to the baseline version kernel and got the same result.

So let's give it a last shot before I go and install BOINC.
1. What priority and nice level is the wcg work unit running at?
2. Is your "make -j2" running at default priority and nice level? (I guess so)

O.k., now I've found time to test a bit and to...

2015-01-09T21:51:23.213-08:00

O.k., now I've found time to test a bit and to answer.
I've tested your baseline 3.18.y-gc (instead of your proposal of "pure" BFS, as I use your .y-gc on 3.17.x, too).
Please also note that I constantly watch the two cpu cores via gkrellm, where I see in colours what amount of idle load vs. normal+system load happens ATM. Here the interval is very small. I understand that the watched "balance" is not a real balance, but a statistical one.
With your baseline patches, on the kernels 3.17.8 & 3.18.2, the 2 cores show a (virtually) equal load over short and long time. While I can agree with your explanation of the unbalanced behaviour in general, the VRQ patched 3.18.2 doesn't equalise correctly at all.
I run wcg with BOINC at default settings. It seems to fetch and queue work units as needed and hands a new one to a cpu core whenever that core is left without an active work unit. In the BOINC Manager I can temporarily suspend tasks and see that this is true.
With the VRQ 0.3 patched 3.18 the firefox task settles on CPU0 (first core) completely over time, while CPU1 (second core) then only serves wcg work unit 2. Starting a kernel compile "make -j1" or "-j2" shows that CPU1 doesn't do more than half of what CPU0 does. When I then stop all wcg clients, the main desktop+firefox load switches over to CPU1. And after that "make -j1" and "-j2" show equal load on both cores. After quitting the compilation and starting wcg again, the firefox load goes back to CPU0. And, while doing this row of tests (of course, without rebooting), with each test, the ability of CPU1 vanishes, to take over "normal" or "system" tasks. For kernel compilation: CPU0 at full load, CPU1 only showing some peaks.

Conclusion: Something is severely imbalanced within your VRQ (only).

Best regards, Manuel

Yeah, in 3.18, suspend is broken in my workstation...

2015-01-05T22:27:18.534-08:00

Yeah, in 3.18, suspend is broken on my workstation and the mpv movie screen is garbled in window mode.
Ok, back to the topic. I have tried to reproduce your usage by using mprime to simulate the wcg background workload (I don't have wcg installed) and mpv playing an mkv movie to simulate the foreground workload.

In both 3.17 and 3.18 -vrq branch, I can both obse...

2015-01-05T22:24:00.846-08:00

In both the 3.17 and 3.18 -vrq branches, I can observe a kind of imbalanced workload among cpus, but it doesn't continue forever as you said. I will re-check this on the original BFS kernel tonight.

As for the imbalanced behaviour, I think I can explain it this way. The cpu usage in htop is statistical data. For example, take a single-threaded task which occupies 10% cpu: if, during htop's calculation window of, say, 2 seconds, it stays on cpu0 for 1 second and on cpu1 for 1 second, htop will show 5% usage on each cpu, and this looks balanced. From the scheduler's point of view, switching a task among cpus is not a good idea (though, in my opinion, on your hardware, where the 2 cores share the same LLC, switching between these two cpus is cost free). So in the ideal case, this task may stay the whole 2 seconds on one cpu, which results in 0% usage on one cpu and 10% on the other, and that looks imbalanced in htop.
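
The arithmetic in this example can be checked with a short Python sketch. It is a simplified model, not how htop actually samples: it assumes the task is busy for a fixed fraction of whatever time it spends on a cpu, and the function name is made up for the illustration.

```python
def per_cpu_usage(placement, busy_fraction, window_s):
    """Average per-cpu usage over one sampling window.

    placement: list of (cpu, seconds) slices saying where the task sat.
    busy_fraction: share of its time the task is actually running (0.10 = 10%).
    window_s: length of the sampling window in seconds.
    """
    usage = {}
    for cpu, seconds in placement:
        usage[cpu] = usage.get(cpu, 0.0) + busy_fraction * seconds / window_s
    return usage


WINDOW = 2.0  # seconds, the example window used above

# Task migrates: 1 second on cpu0, 1 second on cpu1.
migrating = per_cpu_usage([(0, 1.0), (1, 1.0)], 0.10, WINDOW)

# Task stays put: the whole window on cpu0, none on cpu1.
sticky = per_cpu_usage([(0, 2.0), (1, 0.0)], 0.10, WINDOW)

print("migrating:", migrating)
print("sticky:   ", sticky)
```

Both cases do the same total amount of work; only the placement differs, which is why the per-cpu figures alone cannot tell a migrating task from a badly balanced one.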

As for your new discovery: I can't get suspend-to-disk to work, so I tested with suspend (to ram) and resume. Both foreground and background load are the same after resume.

Two additional questions:
1. How does wcg place its work units: can they be fixed on a certain cpu, or does it not care? mprime, for example, just starts a number of threads and doesn't care which cpu/core they run on.
2. What is the behaviour of your usage in 3.17 and 3.18 with pure BFS?

Nothing, what I've written, means: Only the wc...

2015-01-05T01:34:23.620-08:00

"Nothing", as I've written it, means: only the wcg client kept running on CPU1.

Mmmh. I don't like 3.18.x, as the automatic fa...

2015-01-05T01:24:18.599-08:00

Mmmh. I don't like 3.18.x, as the automatic fan management fails.
But o.k.: Now the answers:
1. I run wcg as root and it's a nice 19 process. There are 2 work units running, each of them on one of my cpu's two cores.
2. smplayer/ MPlayer doesn't play a role for the kind of videos I usually play. The load is negligible. About +5% on one core?
3. Unbalanced load would stay forever from the beginning. At least it seems so after booting, logging in to KDE, then starting programs and so on.
4. htop shows the same load imbalance.

Yesterday I've then made a new discovery: After Suspend-to-Disk and resuming, all load was assigned to CPU0. Nothing balancing to CPU1. That stayed "forever".

Manuel

It's interesting, I will try to reproduced it....

2015-01-04T01:07:59.560-08:00

It's interesting, I will try to reproduce it. So I would like to ask a few questions.
1. What priority is the wcg client running at? Single thread or multithread?
2. Is smplayer using a single thread or multiple threads? How much cpu% does it use when it plays your avi movie?
3. When the unbalanced load is triggered, how long does it last or stay that way?
4. Would you be able to use htop and observe the same?

"bfs460-locked-pluggedio.patch" is a fix...

2015-01-04T00:47:13.153-08:00

"bfs460-locked-pluggedio.patch" is a fix for a known issue; if you don't have that issue, you can safely ignore it, ATM. I will wait for the next sync with a new release of BFS to pick up all these patches.

Yes. That is for debugging only.

2015-01-04T00:39:32.321-08:00

Yes. That is for debugging only.

My first observation with 3.18.1 + BFQ + 23 BFS-VR...

2015-01-03T00:40:13.681-08:00

My first observation with 3.18.1 + BFQ + 23 BFS-VRQ-branch patches (omitted patch 23 of 24 "3b9cc00 bfs: xxxx_schedule() stat debug") is that my two CPU cores aren't used/ loaded equally. I see it when observing the gkrellm CPU0/1 charts while doing things.
At that moment, what's mainly running is a worldcommunitygrid client in the background, a firefox-esr with 110 open tabs and a smplayer playing an .avi movie.

One of the two cores shows approx. 50% of the NON-IDLE load of the other. Funnily, if I quit and restart firefox, the higher load can end up on either cpu0 or cpu1.
After quitting the low-priority wcg client, the core that had less NON-IDLE load before then shows 50% more load than the other.

The normal BFS always tried to balance these loads equally. Maybe there's something going wrong?

Best regards, Manuel

BTW, have you already reviewed the patches in http...

2015-01-02T22:05:10.268-08:00

BTW, have you already reviewed the patches in http://ck.kolivas.org/patches/bfs/3.0/3.18/pending/ from 20141231?
Would they be useful with 3.18-vrq, too? At least from the "bfs460-locked-pluggedio.patch" I see it's not applying correctly.

Best regards, Manuel

Can I omit "3b9cc00 bfs: xxxx_schedule() sta...

2015-01-02T21:58:14.553-08:00

Can I omit "3b9cc00 bfs: xxxx_schedule() stat debug" or is it needed? Sounds like debugging overhead.

Thanks for your continued work on BFS+ and, yes, I wish a Happy New Year,

Manuel