More performance comments..

From: Dave Hudson <dave_at_nospam.org>
Date: Mon Jan 30 1995 - 09:39:14 PST

Hi All,

I thought I'd report the latest performance results I've achieved, and
mention a couple of things I found over the weekend.

I now have perf1 showing 47 microseconds, and my perf2 (sched_yield based)
code showing around 13 microseconds per context switch on my cached DX/4-100
system. This is with a kernel cross-compiled with gcc 2.5.7 under Linux.
IOzone is showing a speedup of around 1% on writes and 6% on reads for the
1 MByte file test.

I've optimised a few more code sequences (which should help all CPUs), but I
realised two things about the 486 stuff I've been working on:

1) Better results come from statically linked test programs (should have
been obvious really), although the results I've quoted are all for
dynamically linked code.

2) GNU C and GNU ld don't seem to get 486 code alignment right :-( This one
is something that might prove interesting for some of the other free 486
O/S's (I've already verified the same problem under Linux, gcc-2.6.2 and
binutils-2.5.2). Basically, the 486 needs function entry points to be
cache-line aligned (16 bytes) to avoid potential prefetch stalls. gcc 2.x
inserts ".align 4" directives (2^4 = 16 bytes) to achieve this, so functions
are aligned correctly within each object file. Unfortunately it doesn't
ensure that the text size of each object file is also a multiple of 16
bytes. When ld links the code it concatenates the text spaces of the object
files, so any object that follows one whose size isn't a multiple of 16
starts off a 16 byte boundary and the executable ends up unaligned (on a
recent Linux kernel, for example, "cat /proc/ksyms" shows functions aligned
on 4 byte boundaries, not 16). In my testing I've found poor alignment can
cost 5 or 6 microseconds on a DX/2-66 running perf1.
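
To make the arithmetic in point 2 concrete, here is a rough sketch in C. It
assumes a linker that places each object's text at the next 4 byte boundary
after the previous one; the function names and the object sizes are made up
for illustration, and this is a toy model, not what ld actually runs.

```c
#include <stddef.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static unsigned long round_up(unsigned long addr, unsigned long align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model of the link step: place n objects' text back to back,
 * honouring only link_align between objects. Each object's base address is
 * stored in base[]; the return value is the number of bases that are NOT
 * on a 16 byte boundary.
 */
static int count_misaligned_bases(const unsigned long *text_size, size_t n,
                                  unsigned long start,
                                  unsigned long link_align,
                                  unsigned long *base)
{
    unsigned long addr = start;
    int misaligned = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        addr = round_up(addr, link_align);
        base[i] = addr;
        if (addr % 16 != 0)
            misaligned++;
        addr += text_size[i];   /* next object starts where this one ends */
    }
    return misaligned;
}
```

With text sizes of 100, 52 and 64 bytes and 4 byte placement, the second and
third objects start at 100 and 152, so a function at offset 0 or 16 inside
either of them no longer sits on a 16 byte boundary.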

For now I've got round the problem by adding a 16 byte static function at
the end of all of my kernel files to guarantee the alignment of the object
files.
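
The effect of that workaround can be sketched in C as follows. The
assumption here (mine, for illustration) is that the trailing function ends
up rounding each object's text size to a multiple of 16, so that objects
placed back to back from an aligned start all begin on a 16 byte boundary.
The names are hypothetical and nothing below is taken from the kernel
source.

```c
#include <stddef.h>

#define CACHE_LINE 16UL   /* 486 cache line size in bytes */

/* Bytes of padding needed to bring text_size up to a multiple of 16. */
static unsigned long padding_needed(unsigned long text_size)
{
    return (CACHE_LINE - text_size % CACHE_LINE) % CACHE_LINE;
}

/* Text size after padding; always a multiple of 16. */
static unsigned long padded_size(unsigned long text_size)
{
    return text_size + padding_needed(text_size);
}

/*
 * Place n padded objects back to back starting at 'start' and return 1 if
 * every object begins on a 16 byte boundary, 0 otherwise.
 */
static int all_bases_aligned(const unsigned long *text_size, size_t n,
                             unsigned long start)
{
    unsigned long addr = start;
    size_t i;

    for (i = 0; i < n; i++) {
        if (addr % CACHE_LINE != 0)
            return 0;
        addr += padded_size(text_size[i]);
    }
    return 1;
}
```

For example, an object with 100 bytes of text needs 12 bytes of padding to
reach 112, after which the next object starts aligned again.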

FWIW at the moment this is only relevant to cross-compiled code as gcc 1.42
doesn't do 486 code generation, but all of my code will still build native
under VSTa (it's just not quite as fast).

                        Regards,
                        Dave
Received on Mon Jan 30 10:29:25 1995
