Chris Guest
|
Posted: Fri Nov 14, 2008 6:38 am Post subject: measuring clock cycles per second |
|
|
Hello,
How do I measure the execution time of an algorithm/function in units of
clock cycles per second?
I'm in the process of measuring an algorithm in terms of units of time. To
measure the wall-clock time I am using gettimeofday(). To measure the
processor time, I am using getrusage(RUSAGE_SELF). So far, all is good.
Now I would like to measure the execution time of the algorithm in terms
of clock cycles per second. My goal is to determine/guesstimate the minimum
power requirements (e.g. psu and cpu) my algorithm would/could/may require.
After googling and reading the first few pages of chapter 6 in Linux Device
Drivers 2nd edition (I don't have the 3rd edition yet), it appears I can
use rdtscl() to access the TSC?
That said, rdtscl() has been removed as of the latest kernels (I am using
kernel 2.6.22.19-0.1-default - opensuse 10.3) as per this post
<http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=47;att=0;bug=436332>.
I did manage to get the test program, using rdtscl, to compile and run. The
code is at the end of this post.
Ultimately, I am going to create a linux boot cd (probably based off of the
opensuse 10.3 kiwi project) that will run the code. So, compatibility of
different kernels will not be an issue. My plan is to boot the CD on
different computers (Intel and AMD based) and record the results.
So, is there a way to measure the clock cycles or perhaps convert the time
elapsed from getrusage() to clock cycles per second?
Any information or guidance would be greatly appreciated, thank you.
---8<----
// test program calling rdtscl()
// compile using: gcc -o testcycles testcycles.c
// The following two lines are needed since asm-i386/msr.h does not
// contain the definitions for rdtsc() and rdtscl().
// On my machine, opensuse 10.3 kernel 2.6.22 amd x64 4200+, rdtscl() is
// only defined in <asm-x86_64/msr.h>
#define __x86_64__
typedef unsigned int u32;
#include <asm/msr.h>
#include <stdio.h>
#include <unistd.h>   // sleep()
int main(int argc, char *argv[])
{
    // rdtscl() stores only the low 32 bits of the TSC
    unsigned int start = 0;
    unsigned int end = 0;
    // cost of two back-to-back reads
    rdtscl(start);
    rdtscl(end);
    printf("rdtscl: No. of cycles: %u\n", end - start);
    // ticks elapsed across a one-second sleep
    rdtscl(start);
    sleep(1);
    rdtscl(end);
    printf("sleep 1s: No. of cycles: %u\n", end - start);
    return 0;
}
--->8----
Here are the results from my machine:
OS: openSUSE 10.3
chris@Desktop:~/tests/rdtscl_test> uname -a
Linux Desktop 2.6.22.19-0.1-default #1 SMP 2008-10-14 22:17:43 +0200 i686
athlon i386
chris@Desktop:~/tests/rdtscl_test> ./testcycles
rdtscl: No. of cycles: 9
sleep 1s: No. of cycles: 1004509086
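As a rough reading of the two numbers above, and assuming that sleep(1) lasted almost exactly one second and that the counter ticked at a constant rate during it, the tick rate and the wrap-around time of a 32-bit counter value work out as shown here. The symbols are only labels chosen for this note.

```latex
f_{\text{TSC}} \approx \frac{\Delta_{\text{ticks}}}{\Delta t}
             = \frac{1004509086}{1\ \text{s}}
             \approx 1.0 \times 10^{9}\ \text{ticks/s}
\qquad
t_{\text{wrap}} = \frac{2^{32}}{f_{\text{TSC}}}
               \approx \frac{4294967296}{1004509086}\ \text{s}
               \approx 4.3\ \text{s}
```

Since rdtscl() stores only the low 32 bits of the counter, an interval longer than about four seconds would wrap on this machine, so longer measurements would need the full 64-bit value.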
--
Chris |
|
David Schwartz Guest
|
Posted: Fri Nov 14, 2008 6:38 am Post subject: Re: measuring clock cycles per second |
|
|
On Nov 13, 4:38 pm, Chris <ch...@thisisnotanemailaddress.ca> wrote:
| Quote: | That said, rdtscl() has been removed as of the latest kernels (I am using
kernel 2.6.22.19-0.1-default - opensuse 10.3) as per this post
<http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=47;att=0;bug=436332>.
|
Since you're not writing kernel code (are you?!) why do you care
whether rdtscl is in the kernel or not?
DS |
|
Guest
|
Posted: Fri Nov 14, 2008 6:38 am Post subject: Re: measuring clock cycles per second |
|
|
There is no way to do this. Even Linus agrees that profiling is worse
on Linux and is one of the things that still needs to be done.
TSC and other CPU counter events do not work for this because they are
not saved across task switches. So it's almost useless to use them.
I have a Blade 1000 SPARC machine here with Solaris which is able to
do something like this. You can also use Itanium2 CPUs with HP-UX. |
|
Jasen Betts Guest
|
Posted: Fri Nov 14, 2008 3:10 pm Post subject: Re: measuring clock cycles per second |
|
|
On 2008-11-14, Chris <chris@thisisnotanemailaddress.ca> wrote:
| Quote: | Hello,
How do I measure the execution time of an algorithm/function in units of
clock cycles per second?
|
You have a unit mismatch: clock cycles per second is a frequency, not an
amount of time or work.
| Quote: | I'm in the processes of measuring an algorithm in terms of units of time. To
measure the wall-clock time I am using gettimeofday(). To measure the
processor time, I am using getrusage(RUSAGE_SELF). So far, all is good.
Now I would like to measure the the execution time of the algorithm in terms
of clock cycles per second. My goal is to determine/guestimate the minimum
power requirements (e.g. psu and cpu) my algorithm would/could/may require.
|
Divide by the speed of the processor you are using?
Clock cycles are a rather meaningless unit; they come in all
different sizes. |
|
David Schwartz Guest
|
Posted: Fri Nov 14, 2008 6:01 pm Post subject: Re: measuring clock cycles per second |
|
|
On Nov 14, 6:17 am, Chris <ch...@thisisnotanemailaddress.ca> wrote:
| Quote: | David Schwartz wrote:
Since you're not writing kernel code (are you?!) why do you care
whether rdtscl is in the kernel or not?
Ugh, I was afraid that this might be something that I cannot do from
userspace.
|
You can. You can call 'rdtscl', whether the kernel has it or not.
You're not writing kernel code, so it doesn't matter whether the
kernel has it or not. Since you're calling from user space, all that
matters is whether user space has it or not, and you have complete
control over user space.
| Quote: | I have several computers that I would like to test the algorithm on: Pentium
150 -> AMD 64x2 6000+. To minimise the variables, I am thinking of creating
a boot CD based on opensuse 10.3 or 11.0 (I use 10.3 to develop). I haven't
got this far yet, so the details of the CD are sketchy at best.
|
Depending on what the algorithm does and what you hope to measure,
there may or may not be a sensible way to do what you're trying to do.
Why not just come up with a realistic test scenario that stresses your
algorithm, measure its wall time, and if you really want to, multiply
by the system's clock speed?
DS |
|
Rainer Weikusat Guest
|
Posted: Fri Nov 14, 2008 6:47 pm Post subject: Re: measuring clock cycles per second |
|
|
Chris <chris@thisisnotanemailaddress.ca> writes:
| Quote: | How do I measure the execution time of an algorithm/function in units of
clock cycles per second?
|
Such a measure isn't very useful. Except in special cases, execution
time will be dominated by memory access latencies and these will vary
wildly, depending on such things as 'past history of the system' (i.e.
what's [still] contained in the various cache(s)) or seemingly
innocuous minor changes in machine code layout due to C-level changes
completely unrelated to the algorithm you plan to measure.
| Quote: | I'm in the processes of measuring an algorithm in terms of units of time. To
measure the wall-clock time I am using gettimeofday(). To measure the
processor time, I am using getrusage(RUSAGE_SELF). So far, all is
good.
|
Consider using a profiler. gprof is basically useless except on
ancient hardware. I found OProfile to be fairly usable, though.
http://oprofile.sourceforge.net/about |
|
Rainer Weikusat Guest
|
Posted: Fri Nov 14, 2008 6:50 pm Post subject: Re: measuring clock cycles per second |
|
|
scholz.lothar@gmail.com writes:
| Quote: | There is not way to do this. Even Linus agrees that profiling is worse
on Linux and one of things that still needs to be done.
TSC and other CPU counter events are not working because they are
not saved across task switches.
|
There is no instruction to set the time stamp counter and there cannot
even be one, because this would directly contradict its purpose. |
|
David Schwartz Guest
|
Posted: Fri Nov 14, 2008 7:27 pm Post subject: Re: measuring clock cycles per second |
|
|
On Nov 14, 11:09 am, Chris <ch...@thisisnotanemailaddress.ca> wrote:
| Quote: | What timings would you suggest I be concerned with: the system (wallclock)
time or the process time (time in CPU) or both? The timing code I wrote
gives me both. The algorithm isn't simply a FP calculation but it also
includes reading large chunks of memory. I guess, if I want to truly gauge
the performance, the system time would be better since it would include bus
delays, interrupts, etc. which are important if I want to be able to
determine the minimum hardware requirements for the algorithm.
|
Bus delays and memory delay will be counted as user time. Interrupts
will not be counted as user time or system time. System time is time
that your process spends using the CPU in kernel space.
DS |
|
Chris Guest
|
Posted: Fri Nov 14, 2008 8:08 pm Post subject: Re: measuring clock cycles per second |
|
|
Rainer Weikusat wrote:
| Quote: | Chris <chris@thisisnotanemailaddress.ca> writes:
How do I measure the execution time of an algorithm/function in units of
clock cycles per second?
Such a measure isn't very useful. Except in special cases, execution
time will be dominated by memory access latencies and these will vary
wildly, depending on such things as 'past history of the system' (ie
what's [still] contained in the various cache(s)) or seemingly
innocuos minor changes in machine code layout due to C-level changes
completely unrelated to the algorithm you plan to measure.
I'm in the processes of measuring an algorithm in terms of units of time.
To
measure the wall-clock time I am using gettimeofday(). To measure the
processor time, I am using getrusage(RUSAGE_SELF). So far, all is
good.
Consider using a profiler. gprof is basically useless except on
ancient hardware. I found OProfile to be fairly usuable, though.
http://oprofile.sourceforge.net/about
|
Thanks for the link. I will look into OProfile in more detail.
--
Chris |
|
Chris Guest
|
Posted: Fri Nov 14, 2008 8:11 pm Post subject: Re: measuring clock cycles per second |
|
|
Jasen Betts wrote:
| Quote: | On 2008-11-14, Chris <chris@thisisnotanemailaddress.ca> wrote:
Now I would like to measure the the execution time of the algorithm in
terms of clock cycles per second. My goal is to determine/guestimate the
minimum power requirements (e.g. psu and cpu) my algorithm
would/could/may require.
divide by the speed of the processor you are using ??
clocks cycles are a rather meaningless unit, they come in all
different sizes.
|
I figured I could do that, but I have several machines that I wish to test,
ranging from a Pentium 150 MHz up to an AMD 64X2 6000+. My goal is to create
a boot CD that boots the 2.6 kernel (not sure how that will work with the
P150, since the last OS I used on it was Red Hat 6.0, or possibly the version
before that. It's been a while since I turned it on).
I was kind of hoping there would be some dynamic way of
measuring/calculating/deriving the clock cycles.
--
Chris |
|
Chris Guest
|
Posted: Fri Nov 14, 2008 8:17 pm Post subject: Re: measuring clock cycles per second |
|
|
David Schwartz wrote:
| Quote: | On Nov 13, 4:38 pm, Chris <ch...@thisisnotanemailaddress.ca> wrote:
That said, rdtscl() has been removed as of the latest kernels (I am using
kernel 2.6.22.19-0.1-default - opensuse 10.3) as per this post
<http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=47;att=0;bug=436332>.
Since you're not writing kernel code (are you?!) why do you care
whether rdtscl is in the kernel or not?
DS
|
Ugh, I was afraid that this might be something that I cannot do from
userspace.
I have several computers that I would like to test the algorithm on: Pentium
150 -> AMD 64x2 6000+. To minimise the variables, I am thinking of creating
a boot CD based on opensuse 10.3 or 11.0 (I use 10.3 to develop). I haven't
got this far yet, so the details of the CD are sketchy at best.
--
Chris |
|
Chris Guest
|
Posted: Sat Nov 15, 2008 1:09 am Post subject: Re: measuring clock cycles per second |
|
|
David Schwartz wrote:
| Quote: | I have several computers that I would like to test the algorithm on:
Pentium 150 -> AMD 64x2 6000+. To minimise the variables, I am thinking
of creating a boot CD based on opensuse 10.3 or 11.0 (I use 10.3 to
develop). I haven't got this far yet, so the details of the CD are
sketchy at best.
Depending on what the algorithm does and what you hope to measure,
there may or may not be a sensible way to do what you're trying to do.
Why not just come up with a realistic test scenario that stresses your
algorithm, measure its wall time, and if you really want to, multiply
by the system's clock speed.
DS
|
I think I will do this since this was my original plan.
What timings would you suggest I be concerned with: the system (wallclock)
time or the process time (time in CPU) or both? The timing code I wrote
gives me both. The algorithm isn't simply a FP calculation but it also
includes reading large chunks of memory. I guess, if I want to truly gauge
the performance, the system time would be better since it would include bus
delays, interrupts, etc. which are important if I want to be able to
determine the minimum hardware requirements for the algorithm.
--
Chris |
|
Nate Eldredge Guest
|
Posted: Sat Nov 15, 2008 1:27 am Post subject: Re: measuring clock cycles per second |
|
|
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
| Quote: | scholz.lothar@gmail.com writes:
There is not way to do this. Even Linus agrees that profiling is worse
on Linux and one of things that still needs to be done.
TSC and other CPU counter events are not working because they are
not saved across task switches.
There is no instruction to set the time stamp counter and there cannot
even be one, because this would directly contradict its purpose.
|
Interestingly, that's not quite true. On my Opteron CPU, the TSC is a
model-specific register and can be written with the appropriate
(privileged) WRMSR instruction. (I just tried it and it works, though
the documentation says this feature is not to be relied upon.) So in
principle, you could have the TSC saved on task switches, so it would
count cycles for each process. Not that it seems like a particularly
good idea.
FreeBSD has support for CPU performance-monitoring counters (PMC) which
can count not only clock cycles but many other CPU events (jumps taken,
cache misses, pipeline stalls, etc). These can be set to run on a
systemwide or per-process basis. It doesn't appear that Linux has this
support yet, unless I am missing something. |
|
Jasen Betts Guest
|
Posted: Sat Nov 15, 2008 7:28 pm Post subject: Re: measuring clock cycles per second |
|
|
On 2008-11-14, Chris <chris@thisisnotanemailaddress.ca> wrote:
| Quote: | Jasen Betts wrote:
On 2008-11-14, Chris <chris@thisisnotanemailaddress.ca> wrote:
Now I would like to measure the the execution time of the algorithm in
terms of clock cycles per second. My goal is to determine/guestimate the
minimum power requirements (e.g. psu and cpu) my algorithm
would/could/may require.
divide by the speed of the processor you are using ??
clocks cycles are a rather meaningless unit, they come in all
different sizes.
I figured I could do that, but I have several machines that I wish to test,
ranging from a Pentium 150Mhz up to a AMD 64X2 6000+. My goal is to create
a boot CD that boots the 2.6 kernel (Not sure how that will work with the
P150 since that last OS I used on it as Red Hat 6.0 possibly the version
before that. It's been awhile since I turned it on ).
I was kind of hoping there would some dynamic way of
measuring/calculating/deriving the clock cycles.
|
Whatever it is you are trying to do, you are going about it the wrong way. |
|
Chris Guest
|
Posted: Sat Nov 15, 2008 7:58 pm Post subject: Re: measuring clock cycles per second |
|
|
Jasen Betts wrote:
| Quote: | On 2008-11-14, Chris <chris@thisisnotanemailaddress.ca> wrote:
Jasen Betts wrote:
On 2008-11-14, Chris <chris@thisisnotanemailaddress.ca> wrote:
Now I would like to measure the the execution time of the algorithm in
terms of clock cycles per second. My goal is to determine/guestimate
the minimum power requirements (e.g. psu and cpu) my algorithm
would/could/may require.
divide by the speed of the processor you are using ??
clocks cycles are a rather meaningless unit, they come in all
different sizes.
I figured I could do that, but I have several machines that I wish to
test, ranging from a Pentium 150Mhz up to a AMD 64X2 6000+. My goal is to
create a boot CD that boots the 2.6 kernel (Not sure how that will work
with the P150 since that last OS I used on it as Red Hat 6.0 possibly the
version before that. It's been awhile since I turned it on ).
I was kind of hoping there would some dynamic way of
measuring/calculating/deriving the clock cycles.
Whatever it is you are tyring to do you are going about it the wrong way.
|
It's better for me to find out now before I invest too much time in this.
Basically, I am benchmarking my algorithm/process for its
appropriateness/likelihood to be executed on specialised, embedded
hardware. The benchmarking app I have written takes a set of input and
executes the algorithm and writes a report on the various timings
(milliseconds) and memory consumption.
It was suggested to me, by a coworker, that I look into counting the clock
cycles the algorithm takes (to report the number of clock cycles needed to
execute the algorithm and various parts of the algorithm). The idea is to
provide a metric that can be used to determine/guesstimate the minimum cpu
requirements and to (possibly) determine/guesstimate the power requirements
of the algorithm (i.e. if it takes N clock cycles on a Pentium 150 MHz CPU,
then it requires X watts/volts/etc.).
I know there are a lot of factors that I haven't even begun to consider. I
was just curious if there was a way to report that the algorithm took N
clock cycles to complete. Where N would be a number on the order of 1e9 I'm
sure!
The more reading I have done on this suggests that if it takes N clock
cycles on a Pentium 150 MHz CPU, it will not necessarily take the same N
cycles on a Pentium 3 1 GHz; it may take some M < N. It all depends on how
many operations the CPU can execute in one clock cycle.
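The usual textbook way of writing this down, with symbol names chosen here only for illustration (N_instr for instructions executed, CPI for average cycles per instruction, f for clock frequency), is:

```latex
N_{\text{cycles}} = N_{\text{instr}} \times \text{CPI},
\qquad
t = \frac{N_{\text{cycles}}}{f}
```

Two CPUs running the same instructions can have different CPI values, which is why a cycle count measured on one does not carry over to the other.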
For some reason I am failing to construct the right query for Google in
order to get more information on this.
--
Chris |
|