Thursday, December 1, 2016

VRQ 0.89d release

VRQ 0.89d is now released with the following changes:

1. Fix the CPU C-state issue. It is a long-standing bug that was masked by other issues.
2. Don't penalize the run queue time slice for RT/ISO and NORMAL policy tasks. The hackbench test shows that sharing time slices between parent and child tasks (enabled in 089c) limited the fork boost to a single time slice, hence this policy-specific modification.
3. Rewrite task_preemptible_rq(). The new version is more efficient than the previous one and helps with policy fairness.
4. Remove unneeded code and debug code.

The cpufreq_trigger investigation is still ongoing, and policy fairness is being watched to see whether further improvement is needed.

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.8.y-test
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.8.y-test

An all-in-one patch is available too.

Enjoy this release, :)

BR Alfred

PS: if you want to see some sanity test results compared to vrq089a, here they are.

089d

>>>>>50% workload
>>>>>round 1
real    5m27.954s
user    10m12.988s
sys     0m40.254s
>>>>>round 2
real    5m27.918s
user    10m13.064s
sys     0m40.219s
>>>>>round 3
real    5m28.132s
user    10m13.435s
sys     0m40.086s
>>>>>100% workload
>>>>>round 1
real    2m54.629s
user    10m30.754s
sys     0m41.447s
>>>>>round 2
real    2m54.776s
user    10m30.643s
sys     0m41.513s
>>>>>round 3
real    2m54.765s
user    10m30.421s
sys     0m41.619s
>>>>>300% workload
>>>>>round 1
real    2m58.007s
user    10m40.934s
sys     0m42.030s
>>>>>round 2
real    2m57.813s
user    10m40.255s
sys     0m42.349s
>>>>>round 3
real    2m58.158s
user    10m40.527s
sys     0m42.589s

089a

>>>>>50% workload
>>>>>round 1
real    5m29.051s
user    10m15.233s
sys     0m40.015s
>>>>>round 2
real    5m28.288s
user    10m13.595s
sys     0m40.065s
>>>>>round 3
real    5m28.229s
user    10m13.232s
sys     0m40.328s
>>>>>100% workload
>>>>>round 1
real    2m55.358s
user    10m32.229s
sys     0m41.553s
>>>>>round 2
real    2m55.629s
user    10m32.527s
sys     0m41.358s
>>>>>round 3
real    2m55.252s
user    10m31.858s
sys     0m41.873s
>>>>>300% workload
>>>>>round 1
real    2m59.998s
user    10m47.413s
sys     0m42.727s
>>>>>round 2
real    3m0.404s
user    10m47.422s
sys     0m43.425s
>>>>>round 3
real    2m59.934s
user    10m47.287s
sys     0m43.103s
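
For readers who want a single number per workload, the following is a minimal sketch (not part of the original test script) that averages the "real" times of the three rounds listed above and reports how much lower the 089d mean is than the 089a mean. The helper names to_seconds, mean_real and speedup_percent are made up for this illustration; only the timing values come from the logs above.

```python
from statistics import mean

# "real" times copied from the logs above, keyed by kernel and workload.
REAL = {
    "089d": {
        "50%": ["5m27.954s", "5m27.918s", "5m28.132s"],
        "100%": ["2m54.629s", "2m54.776s", "2m54.765s"],
        "300%": ["2m58.007s", "2m57.813s", "2m58.158s"],
    },
    "089a": {
        "50%": ["5m29.051s", "5m28.288s", "5m28.229s"],
        "100%": ["2m55.358s", "2m55.629s", "2m55.252s"],
        "300%": ["2m59.998s", "3m0.404s", "2m59.934s"],
    },
}


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '5m27.954s' to seconds."""
    minutes, rest = stamp.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


def mean_real(version, load):
    """Mean wall-clock time in seconds over the three rounds."""
    return mean(to_seconds(s) for s in REAL[version][load])


def speedup_percent(load):
    """Percentage by which the 089d mean is lower than the 089a mean."""
    old = mean_real("089a", load)
    new = mean_real("089d", load)
    return (old - new) / old * 100.0


if __name__ == "__main__":
    for load in ("50%", "100%", "300%"):
        print(f"{load:>4} workload: 089a {mean_real('089a', load):.3f}s, "
              f"089d {mean_real('089d', load):.3f}s, "
              f"089d faster by {speedup_percent(load):.2f}%")
```

With the figures above, 089d comes out ahead in all three workloads, by roughly 0.2%, 0.4% and 1.2% in mean real time. With only three rounds per case, the smaller differences should be read with caution.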

9 comments:

  1. @Alfred:
    Your CPU core work balancing is still out of range. The first CPU gets too much work.
    BR, Manuel Krause

    1. @Manuel
      This is more or less by design in the current VRQ: tasks stick to the run queue they reside on, to avoid the overhead of switching CPUs. For example, if 2 tasks need CPU time, #0 using 100% and #1 using 50%, then in VRQ it is normal to see #0 occupy one CPU at 100% usage while #1 shows up on another CPU at about 50% usage.

      For your case, please capture a top output when you observe the imbalance, so I can check whether it is normal or not.

      BR Alfred

    2. @Alfred:
      I'm not familiar with the special top options needed to provide you with the most relevant information, so please give me some short advice on this, and I'll send it by email soon.
      Some more info on the scenario: Flash player has a video loaded. When it is stopped, both cores show ~10% us+sy load equally, and the rest is nice load (wcg). When the video is playing, the 1st core shows ~60% us+sy and the 2nd core ~30%, with the rest up to 100% on each being wcg. This is what gkrellm shows. The sum divided by 2 gives the same value as observed on the most recently tested MuQSS kernel (where gkrellm shows equalised load on the 2 cores). So the VRQ scheduler is working really well; only the different load distribution is still confusing me (as in earlier times^^).
      Also, I didn't face any problems with this VRQ during ~23h of uptime (I have not tested previous ones for kernel 4.8), so great work from your hands. Thank you!

      BR, Manuel Krause

    3. @Manuel
      Getting the top output should be as easy as opening a terminal, executing the "top" command, and pressing "1" to show per-CPU usage instead of the sum. Then use the mouse to select and copy the text, and paste it into a file, :). I just want to check which tasks are running and how many CPUs are occupied.

    4. Hi Alfred,
      I'm sorry to be so late. I emailed you.

      BR, Manuel Krause

    5. I have checked your "top" output. It looks normal, as it should: firefox uses ~50% CPU and sticks "mostly" to cpu0, while another task (a firefox plugin) takes ~20% CPU and sticks "mostly" to cpu1.

      BR Alfred

  2. Hi. Here are the benchmark results for VRQ0.89d:
    http://openbenchmarking.org/result/1612032-LO-CFSVSVRQ833

    Acpi-cpufreq + ondemand is still used for VRQ, whereas intel-pstate is used on CFS.
    The standard deviations are quite high with VRQ on SQLite and John the Ripper.
    That may be of interest to you.

    As for the incomplete results with VRQ0.89b and VRQ0.89c, I had to interrupt the tests. It was not a problem with your scheduler.

    Pedro

    1. Thanks for the benchmark. I'll take a look at the regression on sqlite and john the ripper. john the ripper should be a good starting point, as there are 3 samples from 089b to 089d.

      BR Alfred

    2. @Pedro
      I have tested johntheripper on three machines with kernels from vrq089d back to vrq089b, and the results of the three kernels on the three machines are almost the same. So I wonder whether it may be related to the kernel config. Would you please provide the kernel config the benchmark is using, so I can check with it?
