Something seems to be making Dreamwidth respond really slowly, so I'll keep this short.
Fine-tuned the benchmark code and added scripting in Rexx to execute and time it. Interesting results.
The benchmark estimates the value of PI to 10 decimal places, using Simpson's rule to find the area of one quarter of a unit circle. The same source code, compiled by SAS/C 6.3 on the Amiga and by gcc 4.4.3 on Linux, yields these times. The results were the same in all cases, and correct to the tenth decimal place.
Amiga 3000T (actual hardware Amiga system with AmigaOS 2.04): 12 minutes, 20 seconds.
Amiga 3000 (E-UAE emulation, with AmigaOS 3.1): 5.2 seconds (amazing but true)
Intel P4 2.4 GHz (Ubuntu 10.04 LTS, Linux/gcc): 0.22 seconds (even I can't believe this)
I should note that the Z80A TRS-80 4P emulation takes 70 minutes to achieve 6 decimal places. I haven't tried to push it any farther than that.
The Amiga 3000T has a Motorola 68020 CPU and 68881 math co-processor. Running the same code using software double precision or IEEE libraries is slower and yields less precision than the hardware floating point.
I expected an obvious difference in speed, but not to this degree. I'm both impressed and puzzled.
Addendum, July 27: This morning I booted the DEC Alpha (old one, only 433MHz, with VMS 8.3 and HP/Compaq/VMS C compiler) and tried the PI code on it. It ran the 10 decimal places in about 3 seconds, so I pushed it up to 12 decimal places which took it 46 seconds to complete. Tried the Linux system and gcc, and the 12 decimal places took about 3 seconds. This is very roughly an order of magnitude in time for both machines to go from 10 to 12 decimal places.
It's the nature of the algorithm that each pass uses twice as many slices, and so about twice as many calculations, as the one before it, and each additional decimal place takes more than one pass. On multiprocessing systems there can of course be other things that affect the timing, so this all seems to be in order.
no subject
Date: 2011-07-27 08:23 am (UTC)
no subject
Date: 2011-07-27 11:13 am (UTC)
At least, better shape than some of the competition, I'd say. Blue Ribbon's music product was unusable in my opinion and I'm not sure it ever went anywhere. Dr. T's products were more professional but too expensive for most users.
no subject
Date: 2011-07-27 01:13 pm (UTC)
no subject
Date: 2011-07-27 01:23 pm (UTC)
The inefficiencies could well be in my C code, which is a port of a Fortran 77 program I wrote some years ago. I'm very much an amateur at C.
I did at one point try bypassing the sqrt() function in the C code by raising the value to the 0.5 power instead, but of course the pow() function could be just as inefficient. It made no significant difference in either timing or accuracy.
The SAS/C compiler on the Amiga offers four different floating point libraries, but after trying all of them I've stuck with the one optimized for the 68881 floating point coprocessor. It seems to be both the fastest and the most precise.
no subject
Date: 2011-07-27 06:58 pm (UTC)
Native mode (gcc/linux): 1.2 seconds to 12 decimal places
E-UAE (SAS/C on E-UAE Amiga emulation): 47 seconds to 12 decimal places
It occurs to me that you haven't seen how my program operates. When we get down to completion times of 1 or 2 seconds, that is probably almost entirely due to I/O. At the completion of each iteration of the Simpson method, I'm printing out the number of slices and the derived area so I can watch the progress as the integration approaches the tolerance I've set. That's console I/O performed through the operating system, and it certainly adds a non-negligible amount of CPU effort and elapsed time to the program.
It's easy on the slower machines to see the increasing amount of CPU effort required for calculations, as the time between each line of display increases exponentially. When the program runs very fast, as it does on native Linux, the entire table of values appears almost instantaneously.
The elapsed time is determined by a Rexx macro that just starts a stopwatch, starts the program, and stops the watch when the program exits. Consequently the timings include spurious events, interrupts, I/O operations, etc. I run it several times and take an average, but when the total time is just 2 seconds, it is almost certainly more unrelated activity than actual calculation.
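One way to separate calculation from console I/O would be to time the work inside the program with the standard clock() function and print only after the clock has stopped. This is a sketch of that idea, not what the Rexx macro does; sample_work is a hypothetical stand-in for the integration and timed_work is an assumed helper name.

```c
#include <time.h>

/* Hypothetical stand-in for the integration: a CPU-bound loop, no I/O. */
static double sample_work(long n)
{
    double sum = 0.0;
    long i;

    for (i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}

/* Time one call of sample_work(n) in CPU seconds using clock().
   The result is stored through *result so the caller can print it
   after the clock has been read, keeping console I/O out of the figure. */
static double timed_work(long n, double *result)
{
    clock_t start = clock();

    *result = sample_work(n);
    return (double)(clock() - start) / (double)CLOCKS_PER_SEC;
}
```

clock() reports processor time used by the program rather than wall-clock time, so on a multitasking system it should be less sensitive to unrelated activity than an external stopwatch, though its resolution varies between platforms.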
Simpson's rule integration
       2 0.74401693585629236071810055364
       4 0.77089878873674044790220705181
       8 0.78029729244385437336717359358
      16 0.78359941724614923241887254335
      32 0.78476305447339866905309690992
      64 0.78517376902013380490785721122
     128 0.78531885473389795304655081054
     256 0.78537012828602537073408029755
     512 0.78538825232678266541341827178
    1024 0.78539465945303477134586955799
    2048 0.78539692459223109377575156032
    4096 0.78539772541829744323393924788
    8192 0.78539800854925345685586535183
   16384 0.78539810865048986787684270894
   32768 0.78539814404149932425269753367
   65536 0.78539815655409195294112123520
  131072 0.78539816097795633886846644600
  262144 0.78539816254202976519849244141
  524288 0.78539816309501331303977167408
 1048576 0.78539816329052103416330510299
 2097152 0.78539816335964651727863383712
 4194304 0.78539816338408574569740494553
 8388608 0.78539816339270696055052667361
16777216 0.78539816339583357063247603946
33554432 0.78539816339686507884465527241
67108864 0.78539816339727974714435276837
Area = 0.78539816339727974714435276837
Estimated value of PI = 3.14159265358911898857741107349
1.235047 seconds elapsed time
The "Area" shown is one fourth of a complete unit circle, just the first quadrant from X=0 to 1 and Y=0 to 1. The estimated value for PI is four times the quadrant area.
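The first rows of the table give a feel for how quickly the method converges. A small sketch, assuming the true quadrant area pi/4 is taken from atan(1.0); error_ratio is a hypothetical helper, not part of the benchmark:

```c
#include <math.h>

/* Factor by which the error shrinks when the slice count doubles,
   given the computed areas before and after the doubling. */
static double error_ratio(double area_n, double area_2n)
{
    double exact = atan(1.0);   /* pi/4, the true quadrant area */

    return (exact - area_n) / (exact - area_2n);
}
```

Fed the 2-slice and 4-slice areas from the table, error_ratio(0.74401693585629, 0.77089878873674) comes out between 2.8 and 2.9, so each doubling of the slices buys roughly half a decimal place.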
no subject
Date: 2011-07-30 10:36 pm (UTC)
no subject
Date: 2011-07-31 11:58 am (UTC)