After reverting the commit that I bisected to, I got a result close to the Baseline, but the grq lock issue is still there to work on.
I played with the grq lock in the idle task path in several ways (see the sketch after this list):
1. Placing grq_lock() before update_clocks(): the result is close to the Baseline.
2. Placing grq_lock() after update_clocks(): the regression comes back.
3. A draft grq-lock-free idle fast path: the regression comes back.
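For reference, here is a minimal user-space sketch of the two lock placements from items 1 and 2, with pthread stubs standing in for the kernel's grq lock and update_clocks(); the names mirror this post, not the real BFS/VRQ code.

    #include <pthread.h>

    /* Stand-ins for the kernel primitives; illustrative only. */
    static pthread_mutex_t grq_mutex = PTHREAD_MUTEX_INITIALIZER;
    static void grq_lock(void)      { pthread_mutex_lock(&grq_mutex); }
    static void grq_unlock(void)    { pthread_mutex_unlock(&grq_mutex); }
    static void update_clocks(void) { /* per-rq clock bookkeeping */ }

    /* Variant 1: grq_lock() before update_clocks() -- close to Baseline. */
    static void idle_path_lock_first(void)
    {
        grq_lock();
        update_clocks();            /* clock update done under the lock */
        /* ... pick next task or go idle ... */
        grq_unlock();
    }

    /* Variant 2: grq_lock() after update_clocks() -- regression returns. */
    static void idle_path_lock_later(void)
    {
        update_clocks();            /* clock update outside the lock */
        grq_lock();
        /* ... pick next task or go idle ... */
        grq_unlock();
    }

    int main(void)
    {
        idle_path_lock_first();
        idle_path_lock_later();
        return 0;
    }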
It doesn't make sense to me that these code changes contribute to the regression, and I believe the grq lock may just be masking a hidden issue.
Looking back at the unusual sched_count and sched_goidle values in schedstat: sched_goidle is easier to trace than sched_count, and there are three code paths that lead to sched_goidle:
1. idle == prev, !qnr
2. idle != prev and deactivate, !qnr
3. idle != prev and prev needs other cpus, !qnr
So I wrote debug code to find out how much each of these three paths contributes to sched_goidle; here is the result:
             idle    deactivate  needother
Baseline     9727    56106       0
VRQ          27764   61276       0
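The debug code can be as simple as one counter per path. Below is a self-contained user-space sketch of the idea, not the actual kernel instrumentation; all names are invented for the sketch.

    #include <stdio.h>
    #include <stdbool.h>

    /* One counter per sched_goidle path; hypothetical names. */
    static unsigned long goidle_idle, goidle_deactivate, goidle_needother;

    /* Classify one schedule() pass that ends up running idle (!qnr). */
    static void account_goidle(bool prev_is_idle, bool deactivate,
                               bool need_other_cpus)
    {
        if (prev_is_idle)
            goidle_idle++;          /* path 1: idle == prev, !qnr */
        else if (deactivate)
            goidle_deactivate++;    /* path 2: prev deactivated, !qnr */
        else if (need_other_cpus)
            goidle_needother++;     /* path 3: prev needs other cpus, !qnr */
    }

    int main(void)
    {
        account_goidle(true, false, false);   /* one hit per path, as a demo */
        account_goidle(false, true, false);
        account_goidle(false, false, true);
        printf("idle=%lu deactivate=%lu needother=%lu\n",
               goidle_idle, goidle_deactivate, goidle_needother);
        return 0;
    }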
It looks like schedule() is called while the idle task is running even though no task is queued in the grq, so the scheduler has to pick idle again. This idle->idle code path is inefficient; it should be hit as rarely as possible.
One suspect that can send schedule() down the idle->idle code path is the resched_best_idle(p) call. Currently, resched_best_idle(p) is called when prev is not equal to next. I wrote a patch to remove the duplicated resched_best_idle() call in 3.15.x; this time, I decided to take a close look at the conditions under which resched_best_idle() should be called.
#1 prev == idle
#2 !#1 and deactivate
#3 !#1 and !#2 and need_other_cpus
#4 !#1..3 and qnr
#5 !#1..3 and !qnr
                    #1   #2   #3   #4   #5
resched_best_idle   N    N    Y    ?    N
next != prev        ?    Y    Y    ?    N
? means that the result depends on which next task is fetched from the grq.
Obviously, the current next != prev condition gets #1 and #2 wrong: it can fire resched_best_idle() when it should not, causing unnecessary schedule() calls. Take the 50% job/core ratio test for example: when a task is deactivated and the next task has not been generated yet, the scheduler will choose idle to run. But resched_best_idle(prev) will cause schedule() to run again on this rq, which hits the idle->idle code path.
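To make the failure mode concrete, here is a toy model of case #2 under the current guard. It is a deliberately simplified stand-in for schedule(), not the real code, and the next != prev test is approximated as "the running task changed".

    #include <stdio.h>
    #include <stdbool.h>

    static int idle_to_idle;    /* schedule() passes that go idle -> idle */

    /* One simplified schedule() pass: if nothing is queued (!qnr), idle
     * runs next. Returns true when the "next != prev" guard would fire
     * resched_best_idle() and kick this rq into another pass. */
    static bool schedule_pass(bool prev_is_idle, bool qnr)
    {
        bool next_is_idle = !qnr;   /* grq empty -> pick idle */

        if (prev_is_idle && next_is_idle)
            idle_to_idle++;

        /* approximation of "next != prev": the running task changed */
        return next_is_idle != prev_is_idle;
    }

    int main(void)
    {
        /* Pass 1, case #2: prev deactivates so it is not requeued and
         * the grq stays empty; the guard still fires. */
        bool kicked = schedule_pass(false, false);

        /* Pass 2: the kick re-runs schedule() with idle as prev and
         * still nothing queued -> the wasteful idle -> idle pass. */
        if (kicked)
            schedule_pass(true, false);

        printf("idle->idle passes: %d\n", idle_to_idle);   /* prints 1 */
        return 0;
    }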
That is a lot of talk, but the code change is very simple; for the baseline, just one line is added (a guess at its shape is sketched below).
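I won't reproduce the patch here, but based on the table above, a plausible shape for that one line is a stronger guard that skips cases #1 and #2. The following compilable sketch is my reconstruction, not the actual patch; the stub types and maybe_kick() helper are invented for illustration.

    #include <stdio.h>
    #include <stdbool.h>

    /* Stub task type; only the guard logic matters here. */
    struct task { const char *name; };

    static void resched_best_idle(struct task *p)
    {
        printf("kick an idle cpu for %s\n", p->name);
    }

    static void maybe_kick(struct task *prev, struct task *next,
                           struct task *idle, bool deactivate)
    {
        /* old guard:  if (next != prev) resched_best_idle(prev);
         * new guard also skips #1 (prev == idle) and #2 (deactivate): */
        if (next != prev && prev != idle && !deactivate)
            resched_best_idle(prev);
    }

    int main(void)
    {
        struct task idle = { "idle" }, a = { "taskA" };

        maybe_kick(&a, &idle, &idle, true);    /* case #2: no kick now */
        maybe_kick(&a, &idle, &idle, false);   /* e.g. case #3: still kicks */
        return 0;
    }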
Below are the 50% job/core ratio throughput test results for the baseline and VRQ with the fix:
              sched_count  sched_goidle  ttwu_count  ttwu_local  real
Baseline+fix  200751       52479         85490       39002       5m22.097s
VRQ+fix       202821       51795         85278       36583       5m23.010s

              idle   deactivate
Baseline+fix  8024   44455
VRQ+fix       11379  40388
The fix is good for both the baseline and VRQ. Compared to the baseline without the fix, which took about 5m33s, 10~11 seconds are saved (roughly 11s out of 333s), which is about a 3% improvement.