High Resolution Timing On SGI 230, 330 and 550 Visual Workstations on Linux and Windows NT

By Jaya Kanajan, Graphics Software, SGI

H1. Introduction:

High resolution timing is useful when implementing heavily repeated functions. For example, applications expect to be able to call glVertex3f 20 million times per second. A high resolution timer makes it easier to choose the instructions for such a function so that it takes the minimum average number of processor cycles. High resolution timing is also used for synchronization (e.g. in digital media to synchronize mixed media I/O, and in visual simulation to synchronize output).

H1. Motivation:

This article is targeted at folks who need functionality similar to that available with IRIX/MIPS products. The following functions are commonly used on IRIX/MIPS products for high resolution timing:

    syssgi(SGI_QUERY_CYCLECNTR, cyclecount)
    clock_gettime(CLOCK_SGI_CYCLE,...)

On Windows NT and Linux, there is not yet a standard way to take advantage of the Pentium III's high resolution timer. To show an example for Linux, I wrapped this functionality into a header file containing a set of macros that I find useful.

H1. Implementation Details

The 230, 330 and 550 are currently based on the Pentium III class of processors. These processors have a 64 bit cycle counter (the time stamp counter, or TSC) that ticks at the same rate as the processor clock. The counter is read with the rdtsc instruction, which writes its value to two 32 bit general purpose registers, EDX:EAX, with the higher order bits in EDX. This value gives you a timing resolution of 1/PROCESSOR_SPEED (e.g. on a 1 GHz CPU, one tick is 1 nanosecond).

H2. Things to Keep In Mind:

This cycle counter is nothing more than a wall clock: it keeps ticking whether or not your task is running.
If the OS context switches your task out, or pages your address space in or out, during your benchmark, expect to see inconsistent results. You should also try to account for cache effects (these can be avoided to some extent by cache warming). Both Windows (2000, NT, 98, 95) and Linux are general purpose operating systems and only offer soft real time support, so you'll need to go to some effort to ensure consistency in your results. Also note that the Pentium III class of processors all execute instructions out of order, so the rdtsc instruction needs to be serialized (e.g. using cpuid) in order to measure instruction level cycle counts.

H2. Example On Linux:

The macro definitions are provided at the end of this article. The actual use looks like the following:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include "systa.h"

    uint64_t before, after;
    float cpuspeed, deltat;

    sysgetTSC(before);
    dosomething();   /* placeholder for the code being timed */
    sysgetTSC(after);
    sysgetCPUspeed(cpuspeed);
    sysgetDeltaT(before, after, cpuspeed, deltat);
    fprintf(stderr, "something took %f seconds\n", deltat);

At the end of this article, there is a complete example that you can compile and run.

H2. Example On Windows NT

Unfortunately, I didn't attempt to make nice macros for Windows, since I've had trouble with the __int64 type in Microsoft Visual C++ 6.0 before. Instead I left it as a little bit of inline assembly within ifdefs in the test application itself. The general usage is the same as on Linux; the only macro provided is the one that reads the CPU speed from the registry.

H2. Conclusions

Because it is a 64 bit counter, the TSC can be used safely without worrying about it wrapping in less than 10 years (at 1 GHz, 2^64 cycles take roughly 584 years). The TSC is useful when you want a low overhead method for determining time differences. Other schemes such as gettimeofday invariably lead to system calls and result in scheduler jitter, whereas rdtsc executes purely in your application space.
Overall, the TSC has its uses, but one should choose carefully when to tie an application's timing mechanism to a hardware specific feature. There are quite a few OS and hardware independent methods of timing that give decent resolution (such as gettimeofday).

H1. Example Files

systa.h
===============================================================================
/*
 * jaya@sgi.com
 * quick and dirty macros for high res timing example
 */
#ifdef __linux__
#include <stdio.h>
#include <stdint.h>

/* Read the two halves of the counter separately (EDX = high, EAX = low)
 * so the macro also works when compiled for 64-bit x86. */
#define sysgetTSC(tsc) \
  do { \
    uint32_t tsc_lo_, tsc_hi_; \
    __asm__ __volatile__ ("rdtsc" : "=a" (tsc_lo_), "=d" (tsc_hi_)); \
    (tsc) = ((uint64_t) tsc_hi_ << 32) | tsc_lo_; \
  } while (0)

/* cpuspeed must be a float; it is set to Hz, or to 0 if no
 * "cpu MHz" line could be read from /proc/cpuinfo. */
#define sysgetCPUspeed(cpuspeed) \
  do { \
    FILE *fp = fopen("/proc/cpuinfo", "r"); \
    int fresult = 0; \
    char tmp[400]; \
    (cpuspeed) = 0; \
    while ((fp != NULL) && (fresult != 1) && \
           (fgets(tmp, sizeof(tmp), fp) != NULL)) { \
      fresult = sscanf(tmp, "cpu MHz : %f", &(cpuspeed)); \
    } \
    if (fp != NULL) fclose(fp); \
    (cpuspeed) = (cpuspeed) * 1e6; \
  } while (0)

/* Subtract as integers first, then convert, to keep full precision. */
#define sysgetDeltaT(before,after,cpuspeed,deltat) \
  ((deltat) = (double) ((after) - (before)) / (cpuspeed))
#endif

#if defined(_WIN32)
#include <windows.h>

/* Reads the ~MHz registry value of processor 0 and stores Hz in cpuspeed. */
#define sysgetCPUspeed(cpuspeed) \
  do { \
    HKEY hkey; LONG rc; \
    DWORD freq = 0, size = sizeof(freq), dwtype; \
    rc = RegOpenKeyEx(HKEY_LOCAL_MACHINE, \
           "HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0", \
           0, KEY_READ, &hkey); \
    if (rc == ERROR_SUCCESS) { \
      rc = RegQueryValueEx(hkey, "~MHz", NULL, &dwtype, (LPBYTE) &freq, &size); \
      if (rc == ERROR_SUCCESS) { \
        (cpuspeed) = freq * 1000000.0; \
      } \
      RegCloseKey(hkey); \
    } \
  } while (0)
#endif
===============================================================================

drawpixels.c
===============================================================================
/*
 * jaya@sgi.com
 * timing of drawpixels using rdtsc
 */
/* The header names below are the ones this listing needs to compile;
 * GLUT pulls in the GL headers. */
#include <stdio.h>
#include <stdlib.h>
#include <GL/glut.h>

#ifdef __linux__
/* assume i586 and above arch */
#include <stdint.h>
#include "systa.h"
float cpuspeed, deltat;
uint64_t before, after;
#elif defined(_WIN32)
#include <windows.h>
#include "systa.h"
__int64 time, cpuspeed;
unsigned tlow, thigh;
float deltat;
#endif

GLuint h=300, w=300, x=0, y=0, bsize=360000, numiter=1000;
char
*buf, *buf2;
GLenum pixformat = GL_RGBA, buftype = GL_UNSIGNED_BYTE;

static void init(void)
{
    GLuint i;

    buf  = (char *) malloc(bsize * sizeof(char));
    buf2 = (char *) malloc(bsize * sizeof(char));
    if (buf == NULL || buf2 == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    for (i = 0; i < bsize; i++) {
        buf[i]  = (char) (255 * ((double) rand() / RAND_MAX));
        /* second pattern; the shift count must stay below the operand width */
        buf2[i] = (char) (0xFEu << (i % 8));
    }
    glClearColor(0, 0, 0, 0);
    glShadeModel(GL_FLAT);
}

void display(void)
{
    GLuint i;

    /* draw once, untimed, to warm things up */
    glDrawPixels(w, h, pixformat, buftype, buf);
    glFinish();

    /* look up the CPU speed outside the timed region */
    sysgetCPUspeed(cpuspeed);

#ifdef __linux__
    sysgetTSC(after); /* cache warm, it won't help itc */
    sysgetTSC(before);
    sysgetTSC(before);
#elif defined(_WIN32)
    __asm {
        rdtsc
        mov tlow, eax
        mov thigh, edx
    }
#endif

    for (i = 0; i < numiter; i += 2) {
        glDrawPixels(w, h, pixformat, buftype, buf);
        glDrawPixels(w, h, pixformat, buftype, buf2);
        glFinish();
    }

#ifdef __linux__
    sysgetTSC(after);
    sysgetDeltaT(before, after, cpuspeed, deltat);
#elif defined(_WIN32)
    __asm {
        rdtsc
        sub eax, tlow
        sbb edx, thigh          /* carry the borrow into the high half */
        mov DWORD PTR [time + 0], eax
        mov DWORD PTR [time + 4], edx
        fild cpuspeed
        fild time
        fdivrp st(1), st        /* st(0) = time / cpuspeed */
        fstp deltat
    }
#endif

    fprintf(stderr, "%s took %f seconds\n", "drawpixels", deltat);
    exit(0);
}

void reshape(int w, int h)
{
    glViewport(0, 0, w, h);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0, 1, 0, 1, -1, +1);
}

void keyboard(unsigned char key, int x, int y)
{
    switch (key) {
    case 27: /* ESC */
        exit(0);
        break;
    }
}

int main(int argc, char **argv)
{
    glutInitWindowSize(w, h);
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);
    glutCreateWindow(argv[0]);
    init();
    glutReshapeFunc(reshape);
    glutKeyboardFunc(keyboard);
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}
===============================================================================
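
The arithmetic described under Implementation Details and Conclusions, and the cpuid serialization mentioned under Things to Keep In Mind, can be sketched as a few small helpers. This is only an illustration and not part of the systa.h package: the names tsc_combine, tsc_delta_seconds, tsc_wrap_years and tsc_read_serialized are made up for this sketch, and the serialized read assumes a reasonably recent GCC-compatible compiler targeting x86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: join the two 32-bit halves that rdtsc leaves
 * in EDX (high order bits) and EAX (low order bits). */
static inline uint64_t tsc_combine(uint32_t edx_hi, uint32_t eax_lo)
{
    return ((uint64_t) edx_hi << 32) | eax_lo;
}

/* Hypothetical helper: seconds between two readings at hz cycles per
 * second. Unsigned subtraction stays correct across a single wrap. */
static inline double tsc_delta_seconds(uint64_t before, uint64_t after,
                                       double hz)
{
    assert(hz > 0.0);
    return (double) (after - before) / hz;
}

/* Hypothetical helper: years until a 64-bit counter wraps at hz. */
static inline double tsc_wrap_years(double hz)
{
    const double two_pow_64 = 18446744073709551616.0;
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    assert(hz > 0.0);
    return two_pow_64 / hz / seconds_per_year;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Assumption: GCC-style inline assembly on x86. cpuid is a serializing
 * instruction, so earlier instructions complete before rdtsc runs. */
static inline uint64_t tsc_read_serialized(void)
{
    uint32_t lo = 0, hi;   /* lo doubles as the cpuid leaf (0) on input */
    __asm__ __volatile__ ("cpuid\n\trdtsc"
                          : "+a" (lo), "=d" (hi)
                          :
                          : "ebx", "ecx", "memory");
    return tsc_combine(hi, lo);
}
#endif
```

With helpers like these, the delta computation in the Linux example could be written as deltat = tsc_delta_seconds(before, after, cpuspeed), and tsc_wrap_years(1e9) gives the wrap time for a 1 GHz clock. Note that the cpuid call itself costs a fair number of cycles, so the serialized read is best kept for instruction level measurements.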