Negative execution times #57
Hi Peter! I wish it was a time machine... unfortunately, it's not :) The thing is that the profiler's impact, although very small, is still measurable. The whole enter/exit hook adds about 120 CPU cycles (40 ns) on my laptop. Now consider the functions you mentioned: their average call times are about 1 ns or zero. How can we accurately measure the timing of such small functions? The right answer is: we cannot. We can only estimate it, by subtracting the profiler's latency (that enter/exit hook time) from the observed time of the function.

Now, how do you get the latency value? That's the tricky part, mainly because this value tends to drift depending on many factors, the most important of which are the cache, shared (by different threads/cores) memory access, etc. I used to calibrate this value by taking the smallest latency observed during the calibration step (when micro-profiler_*.dll gets loaded). But this led to error accumulation, since the average latency was ~5-15 cycles longer than that. Then again, this 'average' value is taken during the DLL load, which means the CPU is busy and the average comes out slightly greater than the one expected in a 'free' run. So, this becomes visible with the tiny functions.

Take SlowMotionDisplay::Notify, for instance - its average printed exclusive/inclusive time equals -3.95 ns. We can easily get such a value if the real time of this function is extremely short (say 0.33 ns - one cycle on a 3 GHz machine) and we overcompensated it by 4.2 ns. So, all in all, there is no overflow here; all the arithmetic is signed. But the outcome is confusing. I knew you'd spot it, I just don't know how to display this without lying to the user. What I'm thinking about is periodic latency re-evaluation - with a kind of negative feedback - for instance, detecting the average negative value observed and decreasing the calibrated latency by that amount.
Hi tyoma! Thank you for this detailed and open explanation. I understand the problem with measuring execution times in the nanosecond range. The best method I know of is QueryPerformanceCounter; I found it in the micro-profiler sources as well. As far as I can see, it has an accuracy of about 100 ns, so you have a really hard job measuring the runtime of very small functions. Processor caches make the problem much worse. I once heard - I think it was in a talk by Scott Meyers - that good or bad usage of CPU caches can make a factor of 20 in runtime!

I still remember the times when it was possible to determine the runtime of code exactly by adding up the cycle times of machine instructions. I used to look them up in a light blue book called "The Intel 8086 Instruction Set" or something like that. My boss always told me not to waste time on such useless stuff. Later, when I was a manager myself, I said similar things to my developers. But that's another story.

I'm not sure it makes sense to mess around with latency re-evaluation or the like. At the end of the day it would only be a more sophisticated way to lie to the user. For me, the benefit of micro-profiler is not diminished now that I better understand the limits of its accuracy. To stick to the example above: the interesting thing is that SlowMotionDisplay::Notify is called more than 150 million times. That is suspicious, no matter how fast it is. You wrote:
"I just don't know how to display this without lying to the user." What about asking your marketing manager? The answer will be something like: "Are you scrupulous about lying to your customers? How cute. Forget it! Customers want to be lied to. Print out a small (positive!) random number with 10 digits of accuracy. They will love it." Best regards
Sorry for my long-winded comment. In short: I understand your explanation and will close the issue now.
Hi Peter! |
Hi Artem! By the way, during the reinstallation of both versions (v1.6.622.0 and v1.7.621.0) I experienced the old problem from issue #51 again. In the meantime I have a certain routine to fix it: copy the path name from the linker error message and rename the microprofiler installation directory. It works, but I suppose it's not meant to be that way. Regards
Hi Peter! Long reply: the version you tried (1.7.621) has 2x faster collector code, but its analyzer code was slow, so the collector had to wait until the analyzer crunched everything collected before the data was submitted to the frontend. v1.7.624 has a faster analyzer (~40% better in x64 code), which makes things approximately twice as fast compared to v1.6. And these numbers are all about high load - when the profiled code has lots of functions measured at tens of nanoseconds or less (as a reference, on my machine - an i5 3570, 3.4 GHz, 4 cores, which is ancient compared to yours - profiler v1.7 adds approximately 16 ns of overhead). If the profiling load is lower (i.e. tiny functions get inlined), the gain is higher.
Hi Artem! v1.7.624 seems to be a bit faster, but not that much in my perception. But it doesn't matter. What I've written about the overhead caused by microprofiler, true or not, in no way diminishes the practical value of the tool. It helps me to identify the hotspots of my code, and that's what it's for. The functions I am currently investigating are very small, about 10-20 ns. Considering that microprofiler causes about 16 ns of overhead, it is plausible that the overall performance is affected in the order of magnitude I observe. My new CPU (AMD Ryzen 3600, 4.2 GHz, 6(12) cores) is indeed faster than your reference, but the single-thread performance is not that much different, maybe 50% faster. I have tons of cores, but I still can't use them effectively - still struggling with parallelizing the computation-intensive code. Yes, I'm interested to learn how to get more functions inlined. Do you need my source code, or is it more general advice? Regarding #51: you're right. Restart VS and it's OK. I was too impatient :-)
"Yes, I'm interested to learn how to get more functions inlined. Do you need my source code or is it more general advice?" - totally, I can take a look if you point out the hotspots you currently have. Are you talking about your project here on GitHub? I had, and still have, some interest in evolutionary models (you can see I have polyworld forked in my profile - it shattered me when I first watched its presentation), so it'll be a fun ride. "Considering that microprofiler causes about 16 ns overhead" / "My new CPU (AMD Ryzen 3600, 4.2 GHz, 6(12) cores)" - it seems to be 10 ns or less, because Ryzen has a more sophisticated pipeline and a bigger cache.
Hi Artem!
I have a console, but I don't see any output from microprofiler. Maybe my way of starting the console conflicts with microprofiler?
(I turn off the system menu because my "customer" likes to close the console window and then complains that the application exits unexpectedly.) Best regards
Hi Peter! |
I just tortured my application with a long stress test running unattended for about half an hour. When I came back, I saw, to my astonishment, this:
![Anmerkung 2020-02-28 012510](https://user-images.githubusercontent.com/25415300/75551537-b930be00-5a34-11ea-84f6-f66539acbab0.jpg)
MicroProfiler helped me to improve the performance of my app a lot, but negative execution times - wow! Did you implement some kind of time machine?
But seriously, there seems to be some overflow condition. I don't think it is necessary to put a lot of effort into handling such extreme situations. A simple marker to indicate overflow would be sufficient, imho.