Thursday, January 28, 2016

v4.4.0-vrq2 released

v4.4.0-vrq2 has been released. The all-in-one patch file can be downloaded here. Both the bitbucket and github repositories have been updated with the linux-4.4.y-vrq branch and the v4.4.0-vrq2 tag.

What's new:
This release mainly focuses on the startup/shutdown and suspend/resume issues in previous releases.
Some feature commits, such as the removal of the SMT_NICE code, are being held back.

BR Alfred

Edit:

The story behind the startup/shutdown and suspend/resume issues

Several issues were combined in the previous release:
1. dmesg shows a delay of about 1 second in the kernel log while the system is booting up.
2. The machine fails to reboot/shut down.
3. The machine fails to resume from suspend.

The causes are complicated. The most significant one is that I had removed a code path that reschedules a cpu/rq after a task is put into the global run queue. The second one may be a circular deadlock in mainline: I caught it in dmesg twice during my 200+ suspend/resume tests, and reducing the task caching time-out seems to help the resume success rate.

In this release, besides adding back the code to pump the scheduler, the NORMAL policy task caching time-out has been changed to 3ms and the caching time-out for all rt policy tasks to 0ms (in fact, rt policy tasks are never affected by the caching time-out unless they are changed to NORMAL policy after being cached). Issues 1 and 2 are fixed. Issue 3 was tested with 10 suspend/resume cycles in the console and 10 in X, so the failure rate of suspend/resume should be <5%.
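
As a rough illustration of the time-out rule described above, here is a toy model in Python. It is not the kernel code: the function name, the constant names and the use of SCHED_FIFO/SCHED_RR as "the rt policies" are assumptions made for this sketch; only the 3ms and 0ms values come from this release.

```python
# Toy model only; names are hypothetical and not taken from the vrq patch.
NORMAL_CACHED_TIMEOUT_MS = 3  # NORMAL policy tasks in v4.4.0-vrq2
RT_CACHED_TIMEOUT_MS = 0      # rt policy tasks

# Assumption: "rt policy" means the two standard Linux real-time policies.
RT_POLICIES = {"SCHED_FIFO", "SCHED_RR"}


def cached_timeout_ms(policy: str) -> int:
    """Return the caching time-out (in ms) for a task of the given policy."""
    if policy in RT_POLICIES:
        return RT_CACHED_TIMEOUT_MS
    if policy == "SCHED_NORMAL":
        return NORMAL_CACHED_TIMEOUT_MS
    # Other policies are not covered by the description in this post.
    raise ValueError(f"policy not covered by this sketch: {policy}")


if __name__ == "__main__":
    for policy in ("SCHED_NORMAL", "SCHED_FIFO", "SCHED_RR"):
        print(policy, cached_timeout_ms(policy), "ms")
```

The sketch deliberately raises for any other policy rather than guessing a value the post does not state.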

14 comments:

  1. @Alfred:
    Thank you very much for the update! Up and running fine for some hours now with the default settings. Combined with BFQ I/O scheduler v7r11 and TuxOnIce.

    Most likely off-topic: does anyone else with TuxOnIce on kernel 4.4.0 experience a long resume time (namely only in the "Reading kernel & process data..." phase, which previously took a second and now takes minutes)?

    BR Manuel Krause

    1. The latter issue most likely has nothing to do with VRQ.
      I've now re-tested with CFS, resulting in the same problems.

      I assume it's again an i915 driver issue. What a pain! And I don't see fixes for the 4.4 kernel so far.

      BR Manuel Krause

    2. @Manuel
      Thanks for testing. Please note that the NORMAL policy task caching time-out is set to 3ms now.

      BR Alfred

    3. @Alfred:
      Yes, I've already seen it when comparing the old vs. the new patch.
      It seems I prefer an even lower value for my system and usual usage pattern. And it's good to know that this knob (still) exists.

      BR Manuel Krause

    4. @Alfred:
      An additional observation: in my current experience, the cpu affinity stickiness is eased by a great amount with this new revision, which I consider good.
      At least I see this with my standard processes that have run for a longer time. The result is an almost equalized load across my two cpu cores. Not constantly, due to interaction, but now it appears to make (more) sense.
      Currently I run with NORMAL_POLICY_CACHED_WAITTIME 2, but I will most probably want to use 1, as with 4.3.4.

      BR Manuel Krause

  2. On my system I'm getting really bad performance values compared to cfs.
    Look at the values for NUMERIC SORT or LU DECOMPOSITION.

    vrq:
    BYTEmark* Native Mode Benchmark ver. 2 (10/95)
    Index-split by Andrew D. Balsa (11/97)
    Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

    TEST                : Iterations/sec.  : Old Index   : New Index
                        :                  : Pentium 90* : AMD K6/233*
    --------------------:------------------:-------------:------------
    NUMERIC SORT        :          990.82  :      25.41  :       8.35
    STRING SORT         :          285.48  :     127.56  :      19.74
    BITFIELD            :      5.5814e+08  :      95.74  :      20.00
    FP EMULATION        :          472.13  :     226.55  :      52.28
    FOURIER             :           33948  :      38.61  :      21.69
    ASSIGNMENT          :          43.007  :     163.65  :      42.45
    IDEA                :          9677.4  :     148.01  :      43.95
    HUFFMAN             :          3806.1  :     105.54  :      33.70
    NEURAL NET          :          72.825  :     116.99  :      49.21
    LU DECOMPOSITION    :          1039.8  :      53.87  :      38.90
    ==========================ORIGINAL BYTEMARK RESULTS==========================
    INTEGER INDEX : 108.736
    FLOATING-POINT INDEX: 62.426
    Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
    ==============================LINUX DATA BELOW===============================
    CPU : Dual AuthenticAMD AMD Athlon(tm) II X2 220 Processor 800MHz
    L2 Cache : 512 KB
    OS : Linux 4.4.0-vrq
    C compiler :
    libc :
    MEMORY INDEX : 25.591
    INTEGER INDEX : 28.352
    FLOATING-POINT INDEX: 34.624
    Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
    * Trademarks are property of their respective holder.



    cfs:
    TEST                : Iterations/sec.  : Old Index   : New Index
                        :                  : Pentium 90* : AMD K6/233*
    --------------------:------------------:-------------:------------
    NUMERIC SORT        :          1959.8  :      50.26  :      16.51
    STRING SORT         :          332.96  :     148.78  :      23.03
    BITFIELD            :      6.3746e+08  :     109.35  :      22.84
    FP EMULATION        :          619.13  :     297.09  :      68.55
    FOURIER             :           37072  :      42.16  :      23.68
    ASSIGNMENT          :           43.92  :     167.12  :      43.35
    IDEA                :           10172  :     155.57  :      46.19
    HUFFMAN             :          3879.3  :     107.57  :      34.35
    NEURAL NET          :          75.425  :     121.17  :      50.97
    LU DECOMPOSITION    :            2160  :     111.90  :      80.80
    ==========================ORIGINAL BYTEMARK RESULTS==========================
    INTEGER INDEX : 131.484
    FLOATING-POINT INDEX: 82.989
    Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
    ==============================LINUX DATA BELOW===============================
    CPU : Dual AuthenticAMD AMD Athlon(tm) II X2 220 Processor 1600MHz
    L2 Cache : 512 KB
    OS : Linux 4.4.0-4-ARCH
    C compiler :
    libc :
    MEMORY INDEX : 28.356
    INTEGER INDEX : 36.605
    FLOATING-POINT INDEX: 46.029
    Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
    * Trademarks are property of their respective holder.
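
The size of the gap can be worked out by dividing each vrq iterations/sec figure by the corresponding cfs figure. A minimal sketch in Python follows; the numbers are copied from the two runs above, while the variable and function names are only illustrative.

```python
# Iterations/sec copied from the two nbench runs quoted above.
vrq = {
    "NUMERIC SORT": 990.82,
    "STRING SORT": 285.48,
    "BITFIELD": 5.5814e+08,
    "FP EMULATION": 472.13,
    "FOURIER": 33948,
    "ASSIGNMENT": 43.007,
    "IDEA": 9677.4,
    "HUFFMAN": 3806.1,
    "NEURAL NET": 72.825,
    "LU DECOMPOSITION": 1039.8,
}
cfs = {
    "NUMERIC SORT": 1959.8,
    "STRING SORT": 332.96,
    "BITFIELD": 6.3746e+08,
    "FP EMULATION": 619.13,
    "FOURIER": 37072,
    "ASSIGNMENT": 43.92,
    "IDEA": 10172,
    "HUFFMAN": 3879.3,
    "NEURAL NET": 75.425,
    "LU DECOMPOSITION": 2160,
}


def relative_scores(candidate, baseline):
    """Return candidate/baseline for every test present in both runs."""
    return {
        name: candidate[name] / baseline[name]
        for name in candidate
        if name in baseline
    }


ratios = relative_scores(vrq, cfs)

# Print the tests from the largest gap to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<17} {ratio:.2f}")
```

On these figures NUMERIC SORT and LU DECOMPOSITION come out at roughly half the cfs rate, while most other tests are much closer. The two runs also report different CPU clock values (800MHz and 1600MHz), so the ratios may not reflect the scheduler alone.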

    1. @Anonymous
      In my own nbench testing, bfs/vrq scores about 2/3 of cfs in NUMERIC SORT etc. on my system. Comparing original bfs and vrq, there is not much difference.

      nbench is an interesting benchmark; I have used it to tune compiler flags. But I don't think it is quite suitable for comparing bfs and cfs, as they are tuned for different aims.

      BR Alfred

  3. @Alfred:
    As I'm quite unhappy with the 4.4 kernel, due to the i915 VT/console issues and (TuxOnIce) resume-from-disk, and I didn't want to ask you specially for it, I wanted to port your recent improvements from 4.4-vrq1 to vrq2 to the 4.3 kernel on my own. The result can be seen at http://paste.opensuse.org/ca970ee9
    It is an incremental patch for your latest 4.3-vrq version and only covers the above-mentioned changes, not the 0465 to 0466 changes.

    Can you please have a short look at it and let me know if I missed something important or if I should add something?
    It is running fine for 20h now, but I'm in doubt that it's complete and 100% correct.

    Thank you in advance and BR,
    Manuel Krause

    1. Oh, I've just realized that I've at least forgotten to add the commits "bfs/vrq: refine *_ns functions" and "bfs/vrq: quick path for best_mask_cpu()" from 4.4-vrq2 to my 4.3 backport attempt... recompiling.

      BR Manuel

    2. No answer at least means... no complaints... ;-) or no review... ^^
      With a little more manual work I've integrated those two omitted commits and also the first two early 4.4-vrq fixes done by post-factum, which are also applicable to 4.3-vrq.
      Result, now with correct patching: http://paste.opensuse.org/8c93919e
      It's still incremental on the latest 4.3-vrq2.

      BR Manuel

    3. @Manuel
      I may back-port vrq to 4.3, as my chromebook pixel has an issue with the tpm_tis module in 4.4 which causes the system to fail to resume from suspend when it runs without it. But I am a little busy since CNY is near, so it will happen when the holiday is over. For the same reason, I am very sorry that I don't have time to look at your back-port code changes.

      BR Alfred

    4. @Alfred:
      O.k. - no problem at all. I wish you joyful holidays!

      My last on-top backport patch hasn't shown issues here so far within 24h of full usage. So you can give it a "blind test" on your chromebook and see whether it's better or worse than the previous 4.3-vrq2 and the current 4.4-vrq2 in your experience, maybe even without looking at the code changes.
      My uncertainty only results from the fact that I'm no programmer. The code that I've shifted to 4.3 may at most introduce possible bugs that you've made in 4.4-vrq0/1/2. ;-) Just kidding.

      BR and have a good time,
      Manuel

    5. Just an addendum after all these days of using "my" last patch:
      It hasn't shown any regression since I started using it (with 4.3.5 atm). So it seems that I haven't done too bad a job of combining your relevant 4.4 patches for 4.3-vrq.
      Btw., for this kernel I increased NORMAL_POLICY_CACHED_WAITTIME to 2 again (while preferring 1 for kernel 4.4).

      BR Manuel
