altivo

Something seems to be making Dreamwidth respond really slowly, so I'll keep this short.

Fine-tuned the benchmark code and added Rexx scripting to execute and time it. Interesting results.

Estimating the value of PI to 10 decimal places, using Simpson's rule to find the area of one quarter of a unit circle. The same source code, compiled by SAS/C 6.3 on the Amiga, and gcc 4.4.3 on Linux, yields these times. The results were the same in all cases, and correct to the tenth decimal place.

Amiga 3000T (actual hardware Amiga system with AmigaOS 2.04): 12 minutes, 20 seconds
Amiga 3000 (E-UAE emulation, with AmigaOS 3.1): 5.2 seconds (amazing but true)
Intel P4 2.4 GHz (Ubuntu 10.04 LTS, Linux/gcc): 0.22 seconds (even I can't believe this)

I should note that the Z80A TRS-80 4P emulation takes 70 minutes to achieve 6 decimal places. I haven't tried to push it any farther than that.

The Amiga 3000T has a Motorola 68020 CPU and 68881 math co-processor. Running the same code using software double precision or IEEE libraries is slower and yields less precision than the hardware floating point.

I expected an obvious difference in speed, but not to this degree. I'm both impressed and puzzled.

Addendum, July 27: This morning I booted the DEC Alpha (old one, only 433MHz, with VMS 8.3 and HP/Compaq/VMS C compiler) and tried the PI code on it. It ran the 10 decimal places in about 3 seconds, so I pushed it up to 12 decimal places which took it 46 seconds to complete. Tried the Linux system and gcc, and the 12 decimal places took about 3 seconds. This is very roughly an order of magnitude in time for both machines to go from 10 to 12 decimal places.

It's the nature of the algorithm that each pass uses twice as many slices, and so takes about twice as many calculations, as the one before it, and several passes are needed to gain each additional decimal place. Of course on multiprocessing systems there can be other things that affect the timing, so this all seems to be in order.

Simpson's rule integration

       2 0.74401693585629236071810055364
       4 0.77089878873674044790220705181
       8 0.78029729244385437336717359358
      16 0.78359941724614923241887254335
      32 0.78476305447339866905309690992
      64 0.78517376902013380490785721122
     128 0.78531885473389795304655081054
     256 0.78537012828602537073408029755
     512 0.78538825232678266541341827178
    1024 0.78539465945303477134586955799
    2048 0.78539692459223109377575156032
    4096 0.78539772541829744323393924788
    8192 0.78539800854925345685586535183
   16384 0.78539810865048986787684270894
   32768 0.78539814404149932425269753367
   65536 0.78539815655409195294112123520
  131072 0.78539816097795633886846644600
  262144 0.78539816254202976519849244141
  524288 0.78539816309501331303977167408
 1048576 0.78539816329052103416330510299
 2097152 0.78539816335964651727863383712
 4194304 0.78539816338408574569740494553
 8388608 0.78539816339270696055052667361
16777216 0.78539816339583357063247603946
33554432 0.78539816339686507884465527241
67108864 0.78539816339727974714435276837

Area = 0.78539816339727974714435276837
Estimated value of PI = 3.14159265358911898857741107349
1.235047 seconds elapsed time


baphnedia

Now I'm just waiting to see how technical support deals with you when you ask Electronic Arts for help on Deluxe Paint.

altivo

I was a beta tester for EA back in those days. I learned never to ask them anything because they didn't listen anyway. I should note, though, that I did not work on Deluxe Paint. It was Deluxe Music Construction Set v.2 that I pulled my fur out over. We did eventually get it beaten into working shape.

At least, better shape than some of the competition, I'd say. Blue Ribbon's music product was unusable in my opinion and I'm not sure it ever went anywhere. Dr T's products were more professional but too expensive for most users.

Edited Date: 2011-07-27 11:16 am (UTC)

lhexa

That's actually surprisingly slow for the P4, considering how efficient an algorithm Simpson's rule is. It's possible that you have a poorly optimized square root function in there, eating up most of the time. Newton's Rule would be the fast way to calculate a square root, but maybe your library uses brute force instead.

When I get to work this afternoon I'll try it on my desktop there, which is a Core Duo.

The inefficiencies could well be in my C code, which is a port of a Fortran 77 program I wrote some years ago. I'm very amateur at C.

I did at one point try bypassing the sqrt() function in the C by instead raising the value to the 0.5 power, but of course the pow() function could be just as inefficient. It made no significant difference in either timing or accuracy.

The SAS/C compiler on the Amiga offers four different floating point libraries, but after trying all of them I've stuck with the one optimized for the 68881 floating point coprocessor. It seems to be both the fastest and the most precise.

Same code, OS, and compiler version on Intel Core2 Duo @2.93GHz

Native mode (gcc/linux): 1.2 seconds to 12 decimal places
E-UAE (SAS/C on E-UAE Amiga emulation): 47 seconds to 12 decimal places

It occurs to me that you haven't seen how my program operates. When we get down to completion times of 1 or 2 seconds, that is probably almost entirely due to I/O. At the completion of each increasing iteration of the Simpson method, I'm printing out the number of slices and the derived area so I can watch the progress as the integration approaches the tolerance I've set. That's console I/O performed through the operating system, and it certainly takes a non-negligible amount of CPU effort and elapsed time. It's easy on the slower machines to see the increasing amount of CPU effort required for calculations, as the time between each line of display increases exponentially. When the program runs very fast, as it does on native Linux, the entire table of values appears almost instantaneously. The elapsed time is determined by a Rexx macro that just sets a stop watch, starts the program, and stops the watch when the program exits. Consequently the timings include spurious events, interrupts, I/O operations, etc. I run several times and take an average, but when the total time is just 2 seconds, it is almost certainly more unrelated activity than actual calculation.

The "Area" shown is one fourth of a complete unit circle, just the first quadrant from X=0 to 1 and Y=0 to 1. The estimated value for PI is four times the quadrant area.

Ah, I underestimated the number of steps needed. I'm seeing the expected dependence of the error on the square of the step size, but even with that, one more digit requires more than tripling the steps taken.
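The "more than tripling" figure can be worked out directly. Taking the stated dependence at face value, and assuming the error behaves like a constant times a power of the step size (the symbols E, C, p and k are introduced here only for this illustration), gaining one decimal digit means cutting the error by a factor of ten:

```latex
E(h) \approx C\,h^{p}, \qquad
\frac{E(h/k)}{E(h)} \approx \frac{1}{k^{p}} = \frac{1}{10}
\;\Longrightarrow\; k = 10^{1/p}, \qquad
p = 2:\; k = \sqrt{10} \approx 3.16
```

So with a quadratic dependence each extra digit needs roughly 3.16 times as many steps, and a weaker dependence (smaller p) would need more.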

Yes, the number of calculations required for each additional digit of precision rises exponentially.
