Tuesday, August 26, 2014

VRQ 0.1 schedstat result and regression on 50% ratio test

The first stage VRQ 50% job/core ratio test shows there is about 3 seconds regression comparing to the Baseline. So I decide to find out the cause of it.

First, looked at the schedstat output which has been collected during the test. Below shows the result between of the Baseline code-base and the VRQ.

*********************************************************************
sched_count sched_goidle    ttwu_count    ttwu_local    rq_cpu_time        rq_sched_info.run_delay    rq_sched_info.pcount
251973            61299     83682         39699         181334347343    372048261887                        97709
domain0 19012
domian1 24971

288154            77212     85273         43396         174958346976    317271010776                        107263
domain0 18564
domian1 23882
*********************************************************************

The most remarkable diff is the larger sched_count and sched_goidle. The stat number doesn't tell what caused this. I have to bisect and find it out.

Finally, it turns out the commit which divide update_clocks() into update_rq_clock() and update_grq_clock() cause the regression. It's a mistake to separate these two too far away as bfs will use jiffies to adjust ndiff. After reversed that commit(part of it), an approaching result comes.

*********************************************************************
sched_count sched_goidle    ttwu_count    ttwu_local    rq_cpu_time        rq_sched_info.run_delay
267428            66304     82625         40292         171637450209 
*********************************************************************

No comments:

Post a Comment