Saturday, July 2, 2016

v4.6_0470_test4 patch for testing & 4.6 Sanity test raw data

The v4.6_0470_test4 patch is available. It contains only one big update:
- low workload performance regression fix

Updating to this version is highly recommended if you are on the -test branch.

The 4.6 sanity tests, run on cfs/bfs/vrq/vrq-test, are also done; the raw data can be downloaded here if you are interested.

BR Alfred

20 comments:

  1. Hi,

    I tested the new test4 patch on my laptop (all Intel, i7): 15 hrs uptime, no issues. On my desktop, which is all AMD (Phenom), I encountered microfreezes while gaming. About every 5 - 15 sec, depending on whether the game is native or runs under wine, sound and picture stop for a very small amount of time; the sound glitch is easily noticeable, the picture glitch is harder to notice. I have verified that it does not happen with the Ubuntu standard kernel or with plain BFS.
    So currently I'm keeping test4 on the laptop only; let's see whether others have the same issue or not.

    br,
    Eduardo

    Replies
    1. @Eduardo
      Would you please verify this on vrq branch and the test3 patch?

      BR Alfred

    2. @Alfred,

      I tested the 4.6.3 VRQ kernel, not yet a test3 version, and even with standard VRQ I have those hiccups, but a _lot_ fewer: once in a while, and only a sound glitch was noticeable... I thought maybe it was the game, but testing with 4.5 + BFS did not show any hiccups at all for hours.
      This time I tested wine + gallium 9. The game I play is World of Tanks.
      I'll compile 4.6.3 + plain BFS and test3, test both, and get back with results (when I can).

      br,
      Eduardo

    3. @Alfred,

      I compiled 4.6.3 + BFS and test3 as well. The situation is similar: with test3 it's about the same as with test4, glitches happen about every 10 - 20 secs, while plain BFS does not have this problem. The kernel config is the same for all kernels tested so far.
      Could it be that the problem only affects AMD hardware, which has 4 cores and no HT? I will try to run the game on the laptop, when I can, to see whether it's a problem there as well.
      Would sending you the kernel config help?
      It appears that only I am having this issue; the rest of the audience doesn't game :)

      br
      Eduardo

    4. @Eduardo
      Sorry for the late reply, I was out of town last weekend.
      Please test it with an Intel CPU to find out. Personally I don't have any AMD CPU to test on; I just assume they have little cache, and that it should make almost no difference. :)

      Also, would you please send me an email? I think I can provide a debug patch so you can check whether it brings your issue back to the -vrq level.

      BR Alfred

    5. Just an update for Eduardo's case.
      An additional debug patch has brought Eduardo's issue back to the -vrq level. Two or more further debug patches are planned to see whether the glitches can be eliminated.
      Thanks, Eduardo, for testing the debug patches, which help me understand how caching/sticking affects a gaming environment.

      BR Alfred

    6. If these patches turn out to be useful and cure the issue, it would be nice to see them published in the download repo.

      Thanks in advance, thank you for all the testing time and BR
      Manuel Krause

  2. Sorry for messing the test plan up, but with the latest VRQ branches (at least with test1–test3) I get exactly the same freeze: https://gist.github.com/bf4a107b268f76f4f3f88c5cd5c0a074

    And again, I get that stacktrace via netconsole only if the NMI watchdog is enabled. If it is disabled, everything just freezes silently.

    Also, this time the freezes occur more frequently: either within 2 hours or within 2 days. No need to wait for a week :). Probably just bad luck.

    So, Alfred, should we refine our test plan, or should I stick to your latest suggestion and revert back to -vrq for the 4.4 branch?

  3. Additional info from #kernelnewbies:

    ===
    [00:07] post-factum: Looks like a hard lockup inside schedule() - most likely somebody/something corrupted the scheduler queues.
    [00:07] valdis: what makes you think so? how did you find that out?
    [00:08] Well, the NMI entries are the lockup detector. And the two entries above it in the stack are __schedule() and schedule().
    ===

  4. Also:

    ===
    [00:23] post-factum: More likely, your problem is some code that does while (foo) {yadda yadda yadda} but nothing sets foo to false...
    [00:24] infinite loop in atomic context?
    [00:25] post-factum: That could do it, except that *should* leave an atomic context in the stack...
    [00:25] ok, could preemption be disabled outside of atomic context, and that happens with preemption disabled?
    [00:26] post-factum: As I said - look at the stack, and see which of those have code that can get caught in a loop.
    ===

  5. The parser messed up some of the IRC stuff, so please check your email.

  6. @Alfred: Regarding -test4:
    I'm not sure how to comment on this test release after having read the previous comments. I don't face post-factum's issue and I don't game, so I don't see Eduardo's problem here.
    Frame drops in video playback still occur -- when /dev/shm is written to for the first time and when heavy swapping occurs as a result -- but I won't be able to identify the point in time "when it was better". You know that the standard -vrq worked for me, but it had this problem too.
    Since -test3 I don't have problems regarding interactivity any more and wouldn't be able to distinguish it from standard -VRQ.
    Maybe one observation is worth mentioning: the throughput of my WCG clients seems to have dropped by ~12% since using the 4.6 kernel series (always with your patches). No one to blame; this is just me reading the charts.

    BR and "happy developing" :-)
    Manuel Krause

    Replies
    1. pf and I are working on a refined test plan for his issue.
      @Manuel
      For your throughput drop, please compare the latest 4.5-vrq with the first release of 4.6-vrq; between them there should be only sync-up changes and no feature code changes.
      Based on the sanity data I have got, there are no major throughput drops from the 4.5 to the 4.6 series.

      BR Alfred

    2. I already realized that the WCG throughput is NO kind of benchmark in my everyday use. Currently I have (and had before) two different projects with different clients running, which may not count equally in the average score (I simply don't know). And in the end, the result also depends on the usage of other CPU-demanding programs (e.g. use of flash within Firefox). In the next test I'd re-investigate 4.5 with the last VRQ for it, if you don't propose other tests.

      At the moment I'm re-testing the 4.6 VRQ0, which is based on BFS 0469, to test whether other problems may have been introduced during 4.6's VRQ development:
      On my KDE desktop, middle mouse clicks with the 4th button on my Logitech trackball always result in either pasting or enabling wheel-like scrolling in Firefox. And for some weeks now, it doesn't start scrolling as promptly as "earlier". Of course, this is most likely caused by the XOrg/Mesa/input & intel drivers, which have changed a lot in the last month, but my current testing shows that this is also scheduler related. The VRQ0 based system is faster to recognize the "scrolling" situation than VRQ2test4.

      Dunno, if this disturbs your test plan again or may be helpful,
      BR Manuel Krause

  7. OK, this funny blog system ate up posts, *again*!

    @kernelOfTruth
    Assuming that v4.6_0469_vrq0 is the good revision for you, would you please try v4.6_0470_vrq1 and v4.6_0470_vrq2? Thanks.

    >>kernelOfTruth has left a new comment on your post "v4.6_0470_test4 patch for testing & 4.6 Sanity tes...":

    >>Hi guys,

    >>confirmed,

    >>I got a hardlock twice within a few days apart (I'm often dual-booting into windows, so it could actually occur way more often when only using Linux).

    >>Both times it hardlocked during Browsing & Scrolling through websites with Chromium, proprietary nvidia driver 367.27 is used,

    >>the first time the system was mostly idle,

    >>the second time (just now) a database update of recoll/xapian (desktop search) was running in addition to it.

    >>Magic SYSRQ Key, etc. nothing worked.

    >>Interestingly the first time the system rebooted on its own, this time it hardlocked.


    >>Responsiveness isn't too bad but I got the impression that there are tons of micro-stutters (barely noticeable) during benchmarking of GRID Autosport

    >>Kernel base is 4.6.3, test4,

    >>I'm not 100% certain but afaik with v4.6_0469_vrq0.patch the freezes didn't occur, which was the patch + kernel I used before updating to this one


    >>Thanks

    Replies
    1. JFYI, 4.4-vrq3 uptime for me is 5 days now. It's too early to draw conclusions, as I'm going to keep it for at least 10 days, but the trend is that 4.4 does not lock up.

    2. And yes, the lock up happens on the Intel machine only. Uptime for my AMD machine is 31 days now with no issues.

    3. @pf
      Thanks for the update. I'd like to suggest 14 days of uptime for 4.4-vrq3 testing, as we need a strong confirmation and are not likely to come back to this release again.

      On my side, the kworkers-occupying-CPU issue I mentioned in the email still can't be triggered. I am planning to write a preventive fix next week anyway if it still doesn't happen.

      BR Alfred

    4. Unfortunately, I had to reboot the machine because of a Wayland crash. But still no hang.

    5. @pf
      NP, another 5~7 days uptime should be good enough to confirm no lock up.
