Friday, August 19, 2016

4.7 debug patches, call for testing

I worked on the debug patches in the previous release to work out which direction to take, but one last piece of the puzzle is still missing. So I have uploaded three debug patches on top of the -test1 patch, so that users can help complete the whole picture.

The stick and cache mechanisms for NORMAL policy tasks are both turned on in the -test1 patch. The three debug patches below show how the scheduler behaves when these two mechanisms are switched on and off independently.

#1 4.7_vrq_test1_debug_s0c0.patch turns both stick and cache off; it was the debug1 patch in the 4.6 release.
#2 4.7_vrq_test1_debug_s0c1.patch turns stick off and cache on for NORMAL policy tasks; it was the debug2 patch in the 4.6 release.
#3 4.7_vrq_test1_debug_s1c0.patch turns stick on and cache off; this is the missing piece of the puzzle, :)
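
For anyone keeping notes while testing, the naming scheme can be decoded mechanically. This is only a small sketch, assuming the "sXcY" suffix means s = stick, c = cache, 0 = off, 1 = on, as the list above suggests. The helper name decode_variant is made up for illustration and is not part of any patch.

```python
import re

# Assumption: the "sXcY" suffix of each debug patch name encodes the two
# switches from the list above (s = stick, c = cache; 0 = off, 1 = on).
VARIANT_RE = re.compile(r"_s([01])c([01])\.patch$")


def decode_variant(patch_name):
    """Return the stick/cache switches implied by a debug patch file name."""
    match = VARIANT_RE.search(patch_name)
    if match is None:
        raise ValueError(f"not a stick/cache debug patch: {patch_name}")
    return {"stick": match.group(1) == "1", "cache": match.group(2) == "1"}


DEBUG_PATCHES = [
    "4.7_vrq_test1_debug_s0c0.patch",
    "4.7_vrq_test1_debug_s0c1.patch",
    "4.7_vrq_test1_debug_s1c0.patch",
]

for name in DEBUG_PATCHES:
    switches = decode_variant(name)
    stick = "on" if switches["stick"] else "off"
    cache = "on" if switches["cache"] else "off"
    print(f"{name}: stick={stick}, cache={cache}")
```

The fourth combination, stick on and cache on, has no debug patch of its own because it is what the plain -test1 patch already does.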

It would be appreciated if users compare test0, test1 and these three debug patches, focusing on NORMAL policy task interactivity, and then provide feedback.

Enjoy this puzzle game, :)

BR Alfred

EDIT:
The -test2 patch has been released and fixes the issues users reported on -test1, so these debug patches should be applied on top of the -test2 patch for testing and compared against plain -test2.

20 comments:

  1. Just to appease you a little or not at all: I've started the puzzle game based on the good -test2 version on here today.
    For the first round, i've given each debug patch (single applied onto -test2) a runtime of one hour to show up differences in behaviour.
    What I can say so far: Now that the lagging pointer has gone, it would be very difficult for me to choose a winner patch, they're all behaving well. I'm going to give each of them a longer testing time and watch for uptime-related and side effects.
    So, coming back with serious results would take some days -- or maybe could not come from my side when my unspecific "everyday's usage" scenario doesn't lead to a decision.

    It would be really nice to see other testers' opinions.

    Explicitly, I am NOT asking for your performance test results, so as not to skew my subjective results in a "wishful" direction (my focus is interactivity, as long as video players incl. flash in firefox don't stutter).

    BR, Manuel Krause

    1. @Alfred:
      At the moment I'm testing debug#3 (s1c0), the 'missing' puzzle, for a longer period now, and, as it's running good, thinking over my test "strategy".
      I won't run any "dumb" (IMO) technical benchmark on here, real usage is less comparable but more meaningful, IMO and proved by my testing assistance's experience and results in the past.

      Question: As I still use NORMAL_POLICY_CACHED_WAITTIME (4) and not your default (5) for the 4.7 kernels, formerly tried to solve the pointer issue -- would resetting it to 5 (or even increasing) show up more differences for the debug patches' testing? What's your recommendation?

      BR, Manuel Krause

    2. I'd suggest keeping it at the "stock" setting of 5 and not modifying it for testing.

      BR Alfred

    3. Please give a reason for it. I'd need to recompile all 4 kernels in the comparison, as they're all set to (4) atm.
      BR, Manuel Krause

    4. No answer means... nothing, o.k. Nonetheless I want to post some updated info.
      Following your suggestion I'm now recompiling and retesting with (5) and for at least 24h uptime each, in this order: VRQ2test2, then +debug[1/2/3].
      Now I'm at debug1, for some hours now.
      As I definitely favoured the debug1 version on the 4.6 kernel, because of much better interactivity, I'm highly surprised that the current debug1 patch doesn't behave any differently on 4.7.2 VRQ2 test2 in my usage pattern, so far.

      I'll continue the whole test series.
      BR, Manuel Krause

    5. Just a thought: Could it be that the -test2 patch disabled (maybe bad) code paths, so that I neither benefit nor get negative results from the debug patches in any way? Can you please look into it?
      BR, Manuel Krause

    6. @Manuel
      Sorry, I didn't notice your last reply. Keeping the "stock" setting makes everyone's test results comparable.
      There should be no huge code change in -test2. I'd suggest you continue blind testing these debug patches and see if you can tell the difference based on your usage pattern.

    7. @Alfred:
      Thanks for your reply. Of course, it's highly reasonable to have comparable test results, at least for this software part called kernel.
      I'm continuing blind testing on here (now at -debug2) without changing the rest of the software(!!!) -- and would come back with a report in 3~4 days.

      @all other readers/ supporters:
      I want to encourage you to also test these patches, as my usage pattern is quite limited (office/ video editing+playback/ browser +flash/ high swap usage) and I'd like to see results from your other patterns to go into a future decision.

      Best regards, Manuel Krause

    8. @Alfred:
      Sorry for being late.
      I'm not happy with my subjective test results, as they appear quite illogical to me (judging from your info about the changes in question).

      plain -test2: good
      test2+debug1: better
      test2+debug2: worse
      test2+debug3: good
      BFS490: good or better

      Each of the above test runs got 48h of my everyday usage, including two overnight hibernations with TuxOnIce.

      I personally don't trust these results of mine (mainly because debug3 behaves as well as plain -test2 while debug2 behaves worse) and am about to run the test series again.
      Please advise me whether I should. Maybe you should rely on more serious testing, such as Eduardo's.

      Any info about your efforts of porting BFS490 / porting your VRQ upon BFS490?

      Thanks in advance,
      BR, Manuel Krause

    9. @Manuel
      To me, your test results are more or less expected. The s0c0 patch provides the most interactivity but causes a throughput regression. s0c1 was reported to be no better than the baseline in 4.6, so I don't expect it to be better in 4.7 either. For s1c0, I still need Eduardo's feedback to tell the difference from -test2 (the baseline).

      CK introduced too many changes at a time in 0480, and the code changes have not settled down yet. Currently, I have just picked up the skiplist changes and am playing with them.

      So don't expect a bfs490 based -vrq soon. I would first see how well the skiplist design does and what enhancements can be made to it, then slowly put the -vrq changes on top of it. A BFS490 based -vrq would be in the 4.8 release, hopefully.

      BR Alfred

    10. Guessing from your previous development steps and speed, it's more likely to see your BFS490 port with kernel 4.7, already ;-)
      But, as always, quality matters more than time.
      In the meantime I'd test the plain VRQ3 vs. debug3 (s1c0) again. Needed due to changed GFX and Mesa drivers on here.

      BR, and happy developing to you,
      Manuel Krause

    11. @Alfred,

      I was not active recently as I had pneumonia, but now I'm back (sorta) and I'll try compiling test2 + s1c0 today (hopefully) and will try to run the default testing and D3 as well.
      Will get back to you.

      br, Eduardo

    12. @Alfred,

      blogspot ate my last comment, plz check if You have it,

      br, Eduardo

    13. Reposting Eduardo's eaten comment here.

      >I did some benchmarks, including D3 as well.
      >Unigine benchmarks are there, if they are of any use to You:
      >https://docs.google.com/spreadsheets/d/1EayezAsGlJdXjZbS3b9m7YtvtRF-DJ3xrT3hYCvfymQ/edit?usp=sharing

      >The rest is as follows:
      >D3 - I tried 3 kernels: 4.6.3+test5+debug2+NORMAL_POLICY_CACHED_WAITTIME(2) (as per this comment http://cchalpha.blogspot.com/2016/07/v460470test5-patch-released.html?showComment=1469423526528#c8993763693987516002 ) and 4.7.3+VRQ3 and VRQ3+DEBUG3(s1c0) - all 3 are not usable, stutters every half a sec or so, game is not playable. 4.7.3+BFS490 and 4.6.3+BFS470 - no problems.
      >Standard World of Tanks (WOT further on) tests - I tried all aforementioned kernels, none has issues with stutters any more, I played 5-10 minutes only with each kernel, but usually when problem arises, it's quite fast to reveal itself.
      >I also ran some comparison with all kernels mentioned in Google Sheet above, all ran fine.

      >Why such good results? Well, it's been some time between the previous gaming and now; mesa changed, wine changed, etc. And now instead of Voluntary Preemption I use Low Latency Desktop Preemption, maybe that's the fix I needed for WOT stutters.
      >The thing is I don't know for sure at this point, what I can tell is that when problems will show themselves, I'll post the results.

      >If You would like me to test more, please tell which kernel.

      >br, Eduardo

    14. @Eduardo
      First of all, take care of yourself, :)
      Thanks for the detailed testing. From the Unigine benchmarks, it looks like the fix for the low workload performance regression also helps with the ondemand governor. And BFS 0490 is also awesome; I think the skiplist contributes the most.

      For your usage, WOT no longer seems to be an issue, so let's focus on D3. Since there is no issue on BFS490 and BFS470, would you please try the VRQ3 + s0c0 patch for D3? The s0c0 patch turns off stick and cache for NORMAL policy tasks, which is just like the default interactivity 1 behaviour in the original BFS.

      BR Alfred

    15. @Alfred:
      At the moment I'm at a nice alternative to s0c0 with stock settings: It's DEBUG3(s1c0) with NORMAL_POLICY_CACHED_WAITTIME(4), one lower than default. In my humble scenario and subjective experience it's behaving better than plain VRQ3 with NORMAL_POLICY_CACHED_WAITTIME(4) and stock settings at DEBUG3(s1c0). Need another 24h until I'd cross-check DEBUG1(s0c0) also with (4). But it's promising on here - maybe as good as or better than s0c0 with default (5) - to include it in your performance tests(?).

      BR to all of you,
      Manuel Krause

    16. For the s0c0 and s1c0 patches, NORMAL_POLICY_CACHED_WAITTIME no longer has any effect, as caching is already disabled for NORMAL policy tasks.

      BR Alfred

    17. @Alfred:
      From that you can see how blindly I did the testing ;-) -- without even knowing what your debug patches intended to achieve.
      You can also see how humble my testing scenario is or, in a more friendly interpretation, how closely the variants in question score in everyday use.

      BR Manuel Krause

    18. @Manuel
      These debug patches are built to test how the scheduler works in extreme scenarios like wine gaming; in daily usage, they don't have that much impact. Based on the feedback on these debug patches, I'll decide which way to go in the -test branch.
      For normal users, I'd suggest staying on the -vrq or -test branches, especially as I am going to rebase the -vrq branch upon 472+skiplist soon.

      BR Alfred

    19. @Alfred:
      I read you have the same concerns as I do. I've recently switched to a more reliable Mesa version and several previous interactivity issues went away by only that step.
      But, really uncomfortably, that earlier broken Mesa install had shown "effective" differences between the three debug versions and the VRQ-test2 base. And, don't ask... I won't install the broken Mesa ever again.
      I still want to encourage you to work as transparently as you can, so to say to publish beneficial debug patches also officially.

      BR Manuel Krause
