We have been tracing the unplugged_io issue for the past two weeks; most of the discussion is in the replies to a-big-commit-added-to-41-vrq.
At first I thought that
"I guess the sched_submit_work() doesn't work for bfs b/c bfs use grq_lock instead task_lock() in mainline which a combine of task's pi_lock and rq->lock, the checking of tsk_is_pi_blocked(tsk) is not enough for BFS."
After investigation, it turns out that tsk_is_pi_blocked() was introduced in v3.3 by
3c7d518 sched/rt: Do not submit new work when PI-blocked
and it does not indicate that tsk->pi_lock is held, as I used to think.
So the question is back again. When sched_submit_work() was introduced in mainline 3.1, it moved the blk_schedule_flush_plug(tsk) call out of schedule(), but it also relaxed the checks that decide when the call is skipped. This code change is OK for mainline CFS, but somehow it is not for BFS.
Adding those checks back is the current solution. The last patch for this issue is unchanged. I'll update the -gc and -vrq branches soon to include it.
BR Alfred
Thursday, August 27, 2015
Monday, August 24, 2015
4.1 -gc -vrq sanity test result and look forward
There was a toolchain upgrade in my distribution, so the system now uses the new gcc 4.9.x etc. It runs a little slower than with 4.8.x, so I had to overclock the test-bed system to get an acceptable run time for the sanity test. The result is as expected: compared to previous test results, no regression is introduced in this release. The results are attached at the end of this post.
The official 4.2 release is delayed by one week, which gives me a chance to list the todo items for the next cycle. Here they are:
1. Sync up with mainline 4.2. When previewing the code changes in 4.2, I saw many changes in the scheduler code, over 1200 lines of diff.
2. Start work on a new commit which auto-adjusts the cpu cache size factor of the task caching; currently it's hard-coded to optimize for my test-bed machine.
3. Fix known bugs, add comments and try to finalize some of the commits in VRQ.
4. Test and tune SMT.
5. Introduce another benchmark test.
It seems there are enough things to keep me busy for weeks :)
BR Alfred
4.1 CFS
>>>>>spining up
>>>>>50% workload
>>>>>round 1
real 4m40.652s
user 8m39.005s
sys 0m35.902s
>>>>>round 2
real 4m40.688s
user 8m39.100s
sys 0m35.892s
>>>>>round 3
real 4m40.879s
user 8m39.041s
sys 0m35.881s
>>>>>100% workload
>>>>>round 1
real 2m30.750s
user 8m56.625s
sys 0m38.958s
>>>>>round 2
real 2m32.314s
user 9m2.696s
sys 0m39.169s
>>>>>round 3
real 2m32.873s
user 9m5.219s
sys 0m39.235s
>>>>>150% workload
>>>>>round 1
real 2m35.384s
user 9m13.719s
sys 0m40.464s
>>>>>round 2
real 2m34.874s
user 9m11.656s
sys 0m40.704s
>>>>>round 3
real 2m34.973s
user 9m10.739s
sys 0m40.397s
>>>>>200% workload
>>>>>round 1
real 2m36.812s
user 9m17.614s
sys 0m40.828s
>>>>>round 2
real 2m36.634s
user 9m18.383s
sys 0m40.701s
>>>>>round 3
real 2m36.992s
user 9m19.108s
sys 0m40.819s
>>>>>250% workload
>>>>>round 1
real 2m37.632s
user 9m21.271s
sys 0m41.163s
>>>>>round 2
real 2m38.446s
user 9m24.224s
sys 0m41.022s
>>>>>round 3
real 2m38.602s
user 9m24.575s
sys 0m41.436s
>>>>>300% workload
>>>>>round 1
real 2m39.867s
user 9m29.286s
sys 0m41.574s
>>>>>round 2
real 2m40.615s
user 9m29.444s
sys 0m41.578s
>>>>>round 3
real 2m40.111s
user 9m29.686s
sys 0m41.852s
4.1 BFS
>>>>>50% workload
>>>>>round 1
real 4m45.965s
user 8m53.304s
sys 0m32.862s
>>>>>round 2
real 4m45.964s
user 8m53.812s
sys 0m32.378s
>>>>>round 3
real 4m45.919s
user 8m53.194s
sys 0m32.927s
>>>>>100% workload
>>>>>round 1
real 2m30.846s
user 9m1.581s
sys 0m33.857s
>>>>>round 2
real 2m31.267s
user 9m2.822s
sys 0m34.096s
>>>>>round 3
real 2m31.666s
user 9m4.665s
sys 0m33.841s
>>>>>150% workload
>>>>>round 1
real 2m34.415s
user 9m16.511s
sys 0m34.483s
>>>>>round 2
real 2m34.530s
user 9m16.214s
sys 0m35.030s
>>>>>round 3
real 2m34.578s
user 9m17.104s
sys 0m34.456s
>>>>>200% workload
>>>>>round 1
real 2m35.951s
user 9m22.398s
sys 0m34.514s
>>>>>round 2
real 2m37.026s
user 9m22.704s
sys 0m34.639s
>>>>>round 3
real 2m36.158s
user 9m22.571s
sys 0m35.061s
>>>>>250% workload
>>>>>round 1
real 2m37.269s
user 9m25.792s
sys 0m35.212s
>>>>>round 2
real 2m37.058s
user 9m25.937s
sys 0m34.739s
>>>>>round 3
real 2m37.132s
user 9m25.538s
sys 0m35.453s
>>>>>300% workload
>>>>>round 1
real 2m37.935s
user 9m24.762s
sys 0m35.681s
>>>>>round 2
real 2m37.039s
user 9m25.452s
sys 0m35.822s
>>>>>round 3
real 2m38.103s
user 9m26.001s
sys 0m35.129s
4.1 GC
>>>>>50% workload
>>>>>round 1
real 4m43.899s
user 8m50.524s
sys 0m32.508s
>>>>>round 2
real 4m43.831s
user 8m50.031s
sys 0m32.868s
>>>>>round 3
real 4m43.810s
user 8m49.999s
sys 0m32.926s
>>>>>100% workload
>>>>>round 1
real 2m30.824s
user 9m1.669s
sys 0m34.747s
>>>>>round 2
real 2m31.382s
user 9m4.495s
sys 0m34.260s
>>>>>round 3
real 2m31.539s
user 9m5.008s
sys 0m34.470s
>>>>>150% workload
>>>>>round 1
real 2m35.457s
user 9m18.970s
sys 0m34.946s
>>>>>round 2
real 2m34.628s
user 9m18.050s
sys 0m34.884s
>>>>>round 3
real 2m34.648s
user 9m18.807s
sys 0m34.446s
>>>>>200% workload
>>>>>round 1
real 2m36.268s
user 9m23.971s
sys 0m35.149s
>>>>>round 2
real 2m36.410s
user 9m24.660s
sys 0m35.172s
>>>>>round 3
real 2m36.670s
user 9m25.137s
sys 0m35.346s
>>>>>250% workload
>>>>>round 1
real 2m37.606s
user 9m29.152s
sys 0m36.025s
>>>>>round 2
real 2m38.546s
user 9m27.398s
sys 0m35.950s
>>>>>round 3
real 2m38.509s
user 9m28.057s
sys 0m35.655s
>>>>>300% workload
>>>>>round 1
real 2m37.824s
user 9m28.526s
sys 0m36.302s
>>>>>round 2
real 2m37.473s
user 9m28.433s
sys 0m35.741s
>>>>>round 3
real 2m37.049s
user 9m27.219s
sys 0m35.622s
4.1 VRQ
>>>>>50% workload
>>>>>round 1
real 4m43.533s
user 8m49.706s
sys 0m32.653s
>>>>>round 2
real 4m43.630s
user 8m49.385s
sys 0m32.904s
>>>>>round 3
real 4m43.468s
user 8m49.845s
sys 0m32.537s
>>>>>100% workload
>>>>>round 1
real 2m30.467s
user 9m1.640s
sys 0m34.555s
>>>>>round 2
real 2m30.812s
user 9m1.790s
sys 0m34.305s
>>>>>round 3
real 2m30.675s
user 9m2.192s
sys 0m34.027s
>>>>>150% workload
>>>>>round 1
real 2m33.289s
user 9m12.513s
sys 0m34.640s
>>>>>round 2
real 2m33.166s
user 9m12.042s
sys 0m34.795s
>>>>>round 3
real 2m33.135s
user 9m12.005s
sys 0m35.120s
>>>>>200% workload
>>>>>round 1
real 2m36.200s
user 9m19.313s
sys 0m35.160s
>>>>>round 2
real 2m35.053s
user 9m18.936s
sys 0m35.322s
>>>>>round 3
real 2m34.917s
user 9m19.771s
sys 0m34.833s
>>>>>250% workload
>>>>>round 1
real 2m37.391s
user 9m23.886s
sys 0m35.097s
>>>>>round 2
real 2m35.889s
user 9m23.426s
sys 0m35.680s
>>>>>round 3
real 2m36.198s
user 9m23.343s
sys 0m35.443s
>>>>>300% workload
>>>>>round 1
real 2m36.724s
user 9m26.019s
sys 0m35.194s
>>>>>round 2
real 2m36.576s
user 9m25.513s
sys 0m35.794s
>>>>>round 3
real 2m36.759s
user 9m25.738s
sys 0m35.238s
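To compare schedulers without reading every round, the raw `time` output above can be reduced to a mean "real" time per workload. The following is a minimal sketch, assuming the log layout shown in this post (a `>>>>>N% workload` marker followed by `real`/`user`/`sys` lines). The function names `to_seconds` and `mean_real_by_workload` are hypothetical helpers for illustration and are not part of the test script used here. The sample data is the 4.1 CFS 50% workload section copied from the log.

```python
import re
from statistics import mean

# Matches lines such as "real 4m40.652s" as printed by the shell's `time`.
TIME_LINE = re.compile(r"^(real|user|sys)\s+(\d+)m(\d+(?:\.\d+)?)s$")


def to_seconds(line):
    """Return (kind, seconds) for a timing line, or None for any other line."""
    m = TIME_LINE.match(line.strip())
    if m is None:
        return None
    kind, minutes, seconds = m.groups()
    return kind, int(minutes) * 60 + float(seconds)


def mean_real_by_workload(log_text):
    """Return {workload label: mean 'real' seconds} for one scheduler's log."""
    rounds = {}
    workload = None
    for raw in log_text.splitlines():
        # Drop the ">>>>>" prefix used for workload and round markers.
        line = raw.strip().lstrip(">").strip()
        if line.endswith("workload"):
            workload = line
            rounds[workload] = []
            continue
        parsed = to_seconds(line)
        if parsed and parsed[0] == "real" and workload is not None:
            rounds[workload].append(parsed[1])
    return {w: mean(v) for w, v in rounds.items() if v}


# Sample input: the 4.1 CFS 50% workload rounds from the log above.
SAMPLE = """\
>>>>>50% workload
>>>>>round 1
real 4m40.652s
user 8m39.005s
sys 0m35.902s
>>>>>round 2
real 4m40.688s
user 8m39.100s
sys 0m35.892s
>>>>>round 3
real 4m40.879s
user 8m39.041s
sys 0m35.881s
"""

if __name__ == "__main__":
    for workload, secs in mean_real_by_workload(SAMPLE).items():
        print(f"{workload}: {secs:.3f}s")
```

Feeding each scheduler's section (CFS, BFS, GC, VRQ) through the same function gives one number per workload level, which makes the "no regression" comparison easier to check by eye.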
Sunday, August 16, 2015
4.1 VRQ branch rework finished
Here are the new commits added to the vrq branch (in reverse order):
2a8eea0 bfs: vrq: grq.lock free schedule for deactivate code path
8e1ae7c bfs: vrq: grq.lock free context switch for prev==idle path
34c262f bfs: vrq: refine task_preemptable_rq().
22ce18c bfs: vrq: [3/3] preempt task solution, v1.2
79265ca bfs: vrq: [2/3] introduce xxxx_choose_task() in __schedule().
fc44466 bfs: vrq: [1/3] RQ on_cpu states v1.1
be4207e bfs: vrq: refine rq->prq/w_prq as rq->try_preempt_tsk
f4aeee0 bfs: vrq: remove unused unsticky_task.
9c53147 bfs: vrq: Fix vrq solution 0.5 UP compile issue
Both bitbucket and github are updated! The most important objective of this release is stability. I got a new HW platform that exposed stability issues which could not be found on the old platforms, and I believe the major ones have been fixed.
There are still three key features on the vrq branch, as mentioned in vrq-04-update-for-linux-40y, but the cache count solution has advanced a little. The responsible commit is now
ed20056 bfs: vrq: [2/2] scost for task caching v0.7
which is a replacement for the sticky_task design in the original bfs. I'll start another topic for it.
All commits are now set for the vrq 4.1 branch. Benchmarks will be run this week, since there are many toolchain upgrades in my distribution for this release. Looking forward, 4.2 will be out next week, and hopefully there will be less sync-up work so I can spend more time on new commits.
BR Alfred
Monday, August 10, 2015
A big commit added to 4.1 VRQ
As the title says, this big commit is 117d783 bfs: VRQ solution v0.5
I think most of the instability in the previous vrq release was caused by this, and I believe most known issues (on my machines) have been fixed. It has been running stably for two weeks, so you are encouraged to give it a try.
Known issue:
BUG: using smp_processor_id() in preemptible code, call trace from sys_sched_yield().
There are still a few commits left that I haven't reworked yet. I plan to finish them within two weeks, before the new kernel release and another sync-up cycle begins.
BR Alfred
Friday, August 7, 2015
4.1 vrq branch update -- reworking
The 4.1 vrq branch is updated, but no new commit has been added. Because there is a new sync to pick up bfs0463 and kernel v4.1.4, new commits have to be postponed to next week.
A fix has been added to the last commit to fix the compile error on UP config.
BR Alfred
Wednesday, August 5, 2015
gc-branch update with CK's BFS 0463
CK finally released BFS 0463 against kernel 4.1 this week, so here comes the gc branch update.
What's new:
1. Based on BFS 0463 and kernel v4.1.4
2. Fix/Sync against BFS 0463
- 3b14908 bfs: [Sync] 4.1 schedule_user().
- 9f9dc34 bfs: [Fix] 0463 remove unused register_task_migration_notifier().
- 0145370 bfs: [Sync] TIF_POLLING_NRFLAG for wake_up_if_idle() and resched_curr().
- 775e28a bfs: [Sync] sched_init_numa().
- c6c5894 bfs: [Sync] task_sched_runtime().
- 4a48abf bfs: [Sync] sched_setscheduler() logic, v3
- dc4fa45 bfs: -gc BFS enchancement patch set version.
The code has been force-pushed to bitbucket and github. For those who just want an easier way to apply the patches, here is an all-in-one patch that includes all BFS-related commits in my gc-branch: bfs_enhancement_v4.1_0463_1.patch
If you are using the gc-branch, I highly suggest you upgrade to this gc release. An updated -vrq branch will be coming soon; no new commits are planned (they have to be delayed to next week because of the amount of sync-up work this week), but there will be some bug fixes for the existing ones.
BR Alfred Chen
Update:
Added one more commit to fix the RCU stall issue:
bfs: v4.1_0463_1 rcu stall fix.