Wednesday, May 2, 2018

PDS 0.98o release

PDS 0.98o is released with the following changes:

1. Minor code cleanup and optimization here and there.
2. Fix a bug in get_nohz_timer_target().

Although no improvement can be observed in the sanity tests, there should be no regression either. Code cleanup and optimization will continue in the next release (in two or three weeks), then some features will be added in the next kernel release.

Enjoy PDS 0.98o for the v4.16 kernel. :)

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.16.y-pds
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.16.y-pds

An all-in-one patch is available too.

56 comments:

  1. Hi,

    What are the real-life consequences of the bug fixed in this release? Is the fix just cosmetic, or were there side-effects of some kind?

    Br, Eduardo

    Replies
    1. The bug fix commit is https://github.com/cchalpha/linux-gc/commit/cd16b6b9b934602579adc2449fec1127fe4d6747

      I observed nothing from it; maybe it's not triggered by my kernel config.

  2. @Alfred:
    My first results: Compiles fine on 4.16.7 + BFQ and all appears to work as well as before, for some hours now.

    Thank you very much & BR,
    Manuel Krause

  3. @Alfred:
    What reference kernel do you use for your sanity tests at the moment? I thought about running some benchmarks here, but want them to be comparable.

    BR, Manuel Krause

    Replies
    1. The sanity tests I run are usually just to compare against the previous version of the PDS kernel, to find improvements or regressions.

      O.k., so you don't compile one specific kernel source to relate your results to each other over a longer range?
      Another question: is the "sanity" script based on a quad-core? Meaning: what do I need to change for my dual-core? Am I right that I only need to divide the compileTest's first values by 2?

      Thanks in advance, BR,
      Manuel Krause

    3. It's just a simple script that runs on my testbed machine, so I haven't tried to make it general. You can modify it to whatever you need.
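      For example, something along these lines would express the same passes on any core count (just a sketch, not the actual script; it scales the make -j values off the number of cores):

      #!/bin/sh
      # run the compile passes at 50%..300% load regardless of core count
      cores=$(nproc)
      for load in 1 2 3 4 5 6; do        # load in multiples of half the cores
          j=$(( cores * load / 2 ))
          [ "$j" -lt 1 ] && j=1
          make -s clean
          { time -p make -s -j"$j" ; } 2>> ~/sanity-times.log
      done

      On a dual-core that gives -j1 through -j6, on a quad-core -j2 through -j12.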

    4. @Alfred:
      Really sorry for bothering you again: I took the scripts available from your https://bitbucket.org/alfredchen/linux-gc/downloads/
      and I don't understand "~/src/sys > /dev/null" in the 'compile_throughput' script. Maybe you can update these scripts or leave me a hint here.

      Unfortunately the running time on my machine exceeds a night's sleep, which doesn't go well with the fact that this is the only machine I have at hand. Mmmh, I've been meaning to declutter my .config anyway, for months.

      BR, Manuel Krause

    5. Ah, the sys program is a C program which calls the get_rr_interval system interface, where I used to print debug information. It's a hook for hacking, to print information that needs to be known for each round of testing.
      You can safely remove it from the script, as you are not running a debug load and there is no debug information to print out from that interface.

    6. Ah, o.k., now this all makes sense; thanks for the clarification.

      Unfortunately, so far I've only been able to cut compile time by 25%, from ~60min to ~45min, in my first round of "decluttering". At least I haven't made any visible mistakes, meaning the system still runs well.

    7. Hi,
      maybe you can use the modprobed-db script from graysky together with localmodconfig (if you don't use it already).
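      The workflow, roughly, as I understand it from graysky's documentation (the database path is the documented default, but treat it as an assumption for your setup):

      # record the modules currently loaded (run it regularly, e.g. from cron)
      modprobed-db store
      # later, build the kernel config from that database
      cd ~/src/linux
      make LSMOD=$HOME/.config/modprobed.db localmodconfig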

      Pedro

    8. @Pedro:
      Great hint -- many thanks! This seems to be the right tool for my situation and goal, especially with reducing overhead for my brain ;-)

      BR, Manuel Krause

    9. @Pedro:
      I assume you use or have used this 'modprobed-db' yourself? As a little precaution, may I ask you a little question: are there .config areas where I should expect rarely used modules not to be captured by the script, so that I shouldn't change their settings from my current defaults? (E.g. USB, security or encryption related things come to my mind atm.) Mmmh, but maybe this question depends too much on the rest of the system's configuration and use to be answered easily.

      Thank you very much anyway,
      BR, Manuel Krause

    10. @Manuel, I used it for a couple years on both my Linux machines; it cut down compile time (and cross-compile of the netbook's kernel) on the Athlon X2 machine from ~2-3 hours to ~25-35 minutes. Just be sure to plug in any peripherals/devices you might ever use so it can pick up those configs, or it will be a pain to fix later. :) E.g. I went to use an exfat USB drive once and had to figure out the configs needed for it; likewise with a recent wireless adapter I was testing, which required booting the vanilla kernel to `lsmod` and check its config (and/or googling it).

      You can remove entire subsystems too, like DVB and AGP and unneeded CPU/chipset/video/etc. manufacturer devices, but the localmodconfig stuff should do most of it already. I recall your system is (was?) i915-based, like my netbook?

    11. @jwh7:
      Also many thanks for your input! Quite promising... ~30min should be my target, too. My first step, two days ago, was to figure out the most probably unneeded stuff. That already eliminated ~500 of ~1400 modules. But I really find it hard to make profound decisions on my own, so this script would be a great helper. I've let it run for one day now, and yesterday I plugged in all the usually used peripheral devices, just like you suggest, and damn, I'd forgotten the USB stick. Thanks for the reminder!
      Maybe tonight I'll get around to carefully evaluating the changed config and giving it a compile and test.
      Fortunately I have fallback kernels and a fresh disk backup.

      Yes, on here it's the integrated Intel GM45 GFX using the i915 module.

      BR, Manuel Krause

    12. There was a "Kernel Seeds" website which explained each kernel option in detail (not just the option's help string). That helped a lot for anyone wanting to slim down their kernel config. But it no longer exists.

    13. @Alfred:
      That's sad that it doesn't exist any more. It also means that "The internet forgets nothing." isn't the complete truth.
      With my very first shot at this, which I left unmentioned out of embarrassment, I was misled by some kernel xconfig help messages in the AHCI and SCSI stacks saying "Say N if unsure", leaving my root partition in darkness, not booting. Fortunately I was lucky to quickly find the options to fix it, without reverting all the previous manual elimination work.

      BR, Manuel Krause

    14. Well, "The internet forgets nothing" is kinda true ;)
      I hope the site is the correct one: https://web.archive.org/web/20160110061812/http://kernel-seeds.org/

      Br, Eduardo

    15. I just gave it a blind compile after reviewing the new .config from "make localmodconfig". An astonishing ~16min. Maybe something is missing now, but it's an advantage of Linux to easily be able to compile in new stuff, rather than having a huge block of code. So far all works well!!! :-)))
      I also saw that I have many things compiled in, not as modules, which may be worth a third look.

      Many thanks to you people answering, although it's quite off-topic.

      BR, Manuel Krause

    16. @Alfred:
      How can I pipe the whole output of the sanity tests, as seen on the tty, to a dedicated log file, e.g. when running without X in runlevel 3?

      TIA, Manuel Krause

    17. This is what I use
      { time -p COMMAND ; } 2> LOGFILE

      Pedro

    18. @Alfred & @Pedro:
      I'm really too inexperienced with this:
      When using e.g. "./sanity nospin 2> ./logfile2.txt" in the right directory, nothing gets piped to the file, although the logfile is created and everything appears on the console. What am I doing wrong?

      TIA, Manuel

    19. I misunderstood your question.
      When you use 2> you only redirect the error stream, stderr, to the file.

      If you want to redirect all the output (stdout and stderr) do
      ./sanity nospin > ./logfile.txt 2>&1

      If you want to see the log on your screen at the same time do
      ./sanity nospin 2>&1 | tee ./logfile.txt

      Search for standard stream redirection if you want more info.

      Pedro

    20. Oh, my dear Pedro,
      then I possibly made mistakes with my testing last night, as I (by trial & error) issued "sanity 2>&1 > ./logfile.txt"? The logfile was o.k., but the results disappointing.

      Anyway, I've added your advice to my "Useful Commands List", and I hope I won't need to bother you for such private coaching lessons in the future.

      Many thanks, Manuel Krause

  4. Here is a survey for the incoming feature commits. Do you know the SMT_NICE feature in BFS/MuQSS? How do you feel about this feature, and do you think it is useful?

    Replies
    1. That's an interesting question. To determine whether it's useful, the kernel has to be tested with and without the option. I have it enabled, and so far I have not had any issues with it.
      Reading the theory behind the option gives some assurance that the feature is useful: the scheduler knows what the sibling threads are doing, thus it should be more interactive.
      But in the end, the numbers should tell whether it's useful and under which circumstances.

      Br, Eduardo

    2. @Alfred:
      Do you have something else to offer to vote for? My machine isn't capable of HT and the follow-up stuff, although it's a dual-core.

      How should I vote in a survey, when there's only one choice???

      BR, Manuel Krause

    3. @Manuel
      Sorry, that's the only feature idea for the incoming 4.17 kernel release, for now.
      PS: I think PDS is doing well with older cpu topologies, like MC and SMT. I am also looking at Intel Turbo Boost 3.0, which is the "SCHED_MC_PRIO" option in the kernel config, but nothing will happen before I get a cpu which supports it. The good news is I have ordered a customized board with an 8th generation cpu; we'll see what can be done with it in the second half of this year.
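      If you want to check whether a given kernel/cpu combination already exposes this, a rough check could be (assuming your distro keeps configs under /boot; the ITMT sysctl is what the upstream SCHED_MC_PRIO implementation uses):

      # is the option compiled into the running kernel?
      grep CONFIG_SCHED_MC_PRIO /boot/config-$(uname -r)
      # on supporting hardware the ITMT knob shows up here
      cat /proc/sys/kernel/sched_itmt_enabled 2>/dev/null || echo "no ITMT support"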

    4. @Alfred,

      Intel Turbo Boost is just for Intel, right? So Ryzen appears to be out of luck from this perspective :(
      I would suggest improvements that do something for both Intel and AMD.

      Br, Eduardo

    5. I imagine @Oleksandr would have some good input here. Personally, I've never messed w/ any 'nice' stuff, but I think many do.

  5. Thanks Alfred.
    I've done the usual throughput benchmarks.
    https://docs.google.com/spreadsheets/d/163U3H-gnVeGopMrHiJLeEY1b7XlvND2yoceKbOvQRm4/edit?usp=sharing

    PDS performance is still good, although a little slower than CFS at load >= core count.
    These tests don't show the overhead of SMT_NICE. Maybe it would show with more passes.

    Pedro

    Replies
    1. Thanks for your testing. I have checked the results; there is a regression when load >= core count. I have started the sanity tests from 098l again. Would you please also run 098l and 098m on your side? Thanks in advance.

    2. PS: SMT_NICE is not used in PDS currently. I'm working on re-enabling it.

    3. I thought this was a possibility.
      The CONFIG_SMT_NICE option is still there in PDS, by the way, even if it is not used.

      I'll run 0.98l and 0.98m.

      Pedro

    4. @Pedro
      I saw your updated benchmarks. Please don't waste time testing SMT_NICE on PDS for now; the kernel config option is kept, but PDS doesn't currently use it. I will provide a debug patch to enable it in the next kernel release.

    5. Based on Pedro's benchmarks and my previous sanity tests, at least from 098n to 098o there is no regression. That's good and as expected.
      I am still waiting for the re-run 098l ~ 098m sanity results for further investigation.

    6. I've run 098m and 098l.
      It seems performance regressed between 098l and 098m (load > core count) and also between 098m and 098n (load < core count).
      But then, it could be within the uncertainty of the tests, because I only run 3 passes per test.

      Pedro

    7. A short side question, as I don't have enough knowledge of statistics: what number of passes would lead to sufficient statistical certainty? Does it also depend on the average run time of the passes?

      BR, Manuel Krause

    8. At the least, all testing should always be done against the same, persisting source tree to compile.

      From my running 4.16.8 PDS-o I just compiled the l, m, n and o versions of the current 4.16 PDS, and they showed quite a difference.
      PDS-l: 18 min
      PDS-m: 19 min
      PDS-n: 17 min
      PDS-o: 20 min

      These were only single-compile shots, but they definitely suggest that code differences can matter in compilation time.

      BR, Manuel Krause

    9. I'm not really good at statistics, so maybe someone can confirm this.

      The standard error of the mean run time is SEM = s/sqrt(n), where s is the standard deviation of the run times and n the number of runs.
      The SEM is also an estimate of the standard deviation of the error between the computed mean and the true mean.
      In short, this means that if you run 4x more passes, you decrease the uncertainty by 2.

      Then there is the problem that the SEM of a small number of runs is often underestimated.

      All in all, I think n = 8 or 10 would be acceptable, and around 20 quite good.
      But then the time it takes to benchmark increases a lot, and that's what prevents me from doing this :(

      You can push things further if you have a big number of runs and if you assume the run times are normally distributed. You can build confidence intervals, e.g. it's 95% sure that the true mean is within the computed mean +- SEM*1.96.
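      To make that concrete, here is a small sketch of the arithmetic, assuming the run times sit one per line (in seconds) in a file called times.txt (the file name is just an example):

      # mean, sample standard deviation, SEM and a ~95% confidence interval
      awk '{ s += $1; ss += $1*$1; n++ }
           END {
             m = s/n; sd = sqrt((ss - n*m*m)/(n-1)); sem = sd/sqrt(n)
             printf "mean=%.2f sd=%.2f sem=%.2f 95%%CI=[%.2f, %.2f]\n",
                    m, sd, sem, m - 1.96*sem, m + 1.96*sem
           }' times.txt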


      Regarding your second question, there are at least two things that come into play.
      First, the resolution of the measuring tool (here the time command), and second, the measured thing itself.
      Generally speaking the answer is complex. But looking at my tests, I would say their precision (the standard deviation, mathematically speaking) does not depend on the runtime.
      I don't know why.

      Pedro

    10. @Pedro & @Manuel
      Thanks for the benchmarks and the discussion here. Based on all the information we have, 098m is overall the best among the four releases; there is a regression in 098n, and I am working on a debug patch. From 098n to 098o, it seems there is no regression. I still need time to look into the commits from 098l to 098m.
      Besides this, I am also worried about the rate at which cost climbs as the workload grows. It's higher than for CFS and MuQSS in this kernel release.

    11. @Pedro:
      Again many thanks for taking so much time to try to explain it to me, and for the additional hint above. I admit that I'd need some more Wikipedia input (and some testing data of my own) to understand it a bit better.
      For me, too, the time constraints limit the number of possible runs, to probably 5 runs for each of the 6 passes in the sanity test per kernel release.

      BR, Manuel Krause

    12. Here is the patch to address the regression from 098m to 098n; please give it a try and provide feedback.
      https://bitbucket.org/alfredchen/linux-gc/downloads/pds098o_balance_optimization.patch
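      For anyone who hasn't applied a standalone patch on top of the PDS tree before, the usual steps look roughly like this (directory and download paths are just examples):

      cd ~/src/linux-4.16.y-pds
      # check that it applies cleanly first, then apply for real
      patch -p1 --dry-run < ~/Downloads/pds098o_balance_optimization.patch
      patch -p1 < ~/Downloads/pds098o_balance_optimization.patch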

      And here is the explanation of why the regression was not found when 098m was released: my main testbed machine has been a Chromebook Pixel 2013 since my old 4-core Xeon server died last year. The test results on the chromebook are not very reliable because of thermals, especially for minor differences.
      Hopefully this HW limitation will change in the second half of the year.

    13. @Alfred:
      Many thanks for your quick rework!
      Benchmarking comparisons from my side can be expected in ~2 days (for PDS-o and +patch); each previous kernel will take one more night.

      BR, Manuel

    14. Well done Alfred, it seems the regression from 098m to 098n when load < core count is fixed. See the results.

      However, I think I should run more passes in my tests to get more significant results.
      I'll see what I can do.

      Pedro

    15. Oh, statistics... Evaluating the first dataset of the 098o + balance_optimization sanity test with 5 rounds, I assume there's a problem with the data, or more precisely with its collection. I get really high averages/standard deviations for some of the 'make -j?' passes, up to 8.8/12.3%!
      So, what can I improve now? I guess it has something to do with the runlevel and some process(es) disturbing the tests; I ran them in runlevel 3. Any other ideas are welcome.
      In which runlevel do you perform the sanity/'make -j?' testing?

      TIA, Manuel Krause

    16. Besides the above-mentioned problem, I've made another weird observation after also taking the "user" and "sys" times into the comparison. When summing up all the real, user and sys times measured by the 'time' command and the real-world times from the 'date' command, there appears a big lag: 520.8min according to 'date' versus 578.10min for the 'time' values summed up and divided by my 2 cores. Where did all the time go?

      BR, Manuel Krause

    17. Real time (from the time command) is the wall-clock time taken to complete the command. This is what you should compare to the time difference given by the date command.
      Also, you should not sum up real + sys + user time. Those are different "times": user and sys are CPU time, accumulated across all cores.
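      A quick illustration of the difference (the figures are made up, just to show the shape of the output on a dual-core):

      { time -p make -j2 ; } 2>&1 | tail -n 3
      # real 900.00    <- wall-clock seconds; compare this with 'date'
      # user 1650.00   <- CPU seconds summed over both cores, so it may exceed 'real'
      # sys 110.00     <- CPU seconds spent in the kernel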

      Regarding the high standard deviation on make, this can be because of:
      - background processes launching randomly, as you said
      - disk access, if you build from your HDD
      - swap access, if you run low on ram while building
      - thermal throttling, which is what happened to Alfred
      and other things I don't see.

      Maybe you can try building in ram (/dev/shm) and monitoring your cpu frequency and temperature.
      I run my tests in runlevel 3 and in ram, and my build times are quite reproducible.
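      The building-in-ram part needs nothing more than something like this (a sketch; the source path is an assumption, and you need enough free ram for the tree plus the build):

      mkdir -p /dev/shm/build
      cp -a ~/src/linux-4.16 /dev/shm/build/
      cd /dev/shm/build/linux-4.16
      { time -p make -j2 ; } 2>> ~/buildtimes.log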

      Pedro

    18. O.k., thanks to you I now know how to read "time". And indeed, the summed 'real' values match the ones calculated from "date" (with only a little difference).

      Unfortunately, the second dataset is almost as disappointing as the first. What would you consider an acceptable standard deviation for useful comparisons? In the coloured areas of your benchmarking spreadsheets you mention "within +- 1.15 stdev". How did you arrive at this value?

      For the coming night's testing I'll implement some conclusions from your thoughts above: using /dev/shm instead of the HDD, and setting up a small script to monitor cpu frequency, temperature and swapping. Maybe this will either improve the data or show something about the issue.

      BR, Manuel Krause

    19. So much learning...
      My background logging script looks ready and works. I hope that it's sufficient to:
      - capture swapping by reading the pswpin and pswpout fields in /proc/vmstat
      - gather the current cpu frequency from /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq, and correspondingly from cpu1 for the 2nd core
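      In rough script form, that's something like this (a sketch; the 5-second interval and the log path are arbitrary choices):

      #!/bin/sh
      # background logger: swap counters and per-core cpu frequency every 5s
      while true; do
          swp=$(awk '/^pswp/ { printf "%s=%s ", $1, $2 }' /proc/vmstat)
          f0=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)
          f1=$(cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq)
          echo "$(date +%T) ${swp}cpu0=${f0} cpu1=${f1}" >> /tmp/buildmon.log
          sleep 5
      done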

      Please, tell me when my assumptions above are wrong!

      I'd also log:
      - min/max cpufreq per core
      - temperature per core, in addition to the compartment temperature, the latter triggering the fan
      - the fan stepping [%]

      I've also learned today, while testing this little script, that even with the performance governor the reported cpu frequency goes lower depending on the current load, and differently for each core.

      BR, Manuel Krause

    20. As I recall, /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq doesn't give the real frequency, but rather the product "real frequency" * "percentage of time the core has been busy".
      That's why you see frequencies below the max frequency even when using the performance governor.

      The reason is that on Intel cpus, at least until Skylake, all cores share the same frequency, but cores can be put to sleep separately.
      Turbostat shows more detailed and accurate info.
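      For reference, turbostat ships with the kernel source (tools/power/x86/turbostat); it reads MSRs, so it needs root and the msr module. A minimal invocation would be something like:

      sudo modprobe msr
      # actual core frequencies, C-state residency etc., printed every 5 seconds
      sudo turbostat --interval 5    # older versions use: turbostat -i 5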

      Pedro

    21. Thanks @Pedro for answering questions here while I am busy with the code changes and testing. PDS 098p is released; you can give it a try, and benchmarks will be welcome.

    22. First of all: still no good news from my side regarding the data quality.
      Regarding the logging script, IMHO I don't need high accuracy at this point, as its main purpose is/was to detect severe problems like throttling, emergency cooling and swapping. None of these happened last night: frequencies, temperatures and fan speed stayed at the same level (only) during compilation, and no swapping occurred.
      Compiling on /dev/shm may have improved the quality of the data a bit, but I'm still not satisfied (and don't reach your, @Pedro's, and @Alfred's accuracy at all, for an honest comparison).
      Values of the standard deviations from last night, calculated with the same formulas Pedro uses:
      make -j1, load 50%: 0.01%
      make -j2, load 100%: 4.04%
      make -j3, load 150%: 1.37%
      make -j4, load 200%: 0.66%
      make -j5, load 250%: 0.23%
      make -j6, load 300%: 0.08%
      Average runtime per round @100% load ~20min, for all higher loads ~15.3min, 5 rounds per make pass.

      I've really run out of ideas and feel frustrated. Any further suggestions are highly welcome! Thank you for all the help up to this point.

      BR, Manuel Krause

    23. As "No hope means no future", I went on investigating. I even was able to derive quite accurate times for each possibly error-affected testing-value from my collected data. Unfortunately, so far I haven't gathered the uptime at test start.
      Now I'm looking at a way to most easily change these timers, systemd and cron based from within KDE.

      BR, Manuel Krause

    24. Quite astonishing that I had btrfs-related timers and services on here, despite not having any such partition at work. Many thanks do NOT go to openSUSE NOR the systemd developers for such unclear dependencies.
      I've thrown away all such timers and all related services.
      BTW, I only found a way to remove the related things manually. Let's hope it'll survive reboots (and updates).

      BR, Manuel
