D. J. Bernstein
Sun's manuals page
microSPARC-IIep user's manual (PS),
SuperSPARC-II user's manual (PS),
UltraSPARC-I user's manual (PS),
UltraSPARC-IIi user's manual (PS, aka 79578.ps).
Warning: The microSPARC-IIep has data-dependent FPU timings.
Each of the UltraSPARC manuals has a chapter on instruction grouping.
These chapters are not particularly clear and have far too few examples, but they are better than nothing.
Some important tidbits:
- The store buffer has eight slots.
Stores do not take precedence over loads
until five store-buffer slots are full.
- At most four instructions can be executed in one cycle.
- At most two integer instructions can be executed in one cycle:
first, a generic instruction (add, sub, and, or, xor, sethi) or a shift;
second, a generic instruction or an instruction setting the flags.
(Alternative: first, an instruction setting the flags;
second, a generic instruction or a shift.)
An integer instruction cannot be the fourth instruction in a cycle.
- At most one load/store instruction can be executed in one cycle.
A load/store instruction cannot be the fourth instruction in a cycle.
- At most one floating-point addition can be executed in one cycle.
- At most one floating-point multiplication can be executed in one cycle.
- An instruction cannot read or write a register
written by a previous instruction in the same cycle.
(Exception: A store instruction can read a register written by a
previous instruction in the same cycle.)
- An instruction cannot read a register loaded recently.
The register is readable two cycles after the load.
(However, there is an extra delay of one cycle for a signed integer load below 64 bits;
there is an extra delay of seven or nine cycles for an L1 cache miss;
and the result of a load is never readable
until at least one cycle after the result of the previous load.)
- An instruction cannot read a register written recently
by a floating-point addition or multiplication.
(Exception: The result can be stored at any time.)
The register is readable three cycles after the floating-point operation.
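The per-cycle limits and latencies above can be sketched as a small C model, which may help when hand-scheduling a loop. This is only an illustration under stated assumptions: the instruction classes and function names are hypothetical, "readable N cycles after" is read as "issue cycle plus N", L1 cache misses are ignored, and neither register dependencies nor the ordering rules among integer instructions are checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction classes; the names are illustrative only. */
enum insn_class {
    INSN_INTEGER,    /* generic integer, shift, or flag-setting instruction */
    INSN_LOADSTORE,  /* load or store */
    INSN_FPADD,      /* floating-point addition */
    INSN_FPMUL,      /* floating-point multiplication */
    INSN_OTHER       /* anything the rules above do not restrict */
};

/* Returns 1 if the n instructions in group[] stay within the per-cycle
   limits listed above, 0 otherwise.  Register dependencies and the
   ordering rules among integer instructions are NOT checked. */
int group_fits(const enum insn_class *group, size_t n)
{
    size_t i, integer = 0, loadstore = 0, fpadd = 0, fpmul = 0;

    assert(group != NULL || n == 0);
    if (n > 4) return 0;                      /* at most four per cycle */
    for (i = 0; i < n; ++i) {
        switch (group[i]) {
        case INSN_INTEGER:                    /* at most two, never fourth */
            if (i == 3 || ++integer > 2) return 0;
            break;
        case INSN_LOADSTORE:                  /* at most one, never fourth */
            if (i == 3 || ++loadstore > 1) return 0;
            break;
        case INSN_FPADD:                      /* at most one */
            if (++fpadd > 1) return 0;
            break;
        case INSN_FPMUL:                      /* at most one */
            if (++fpmul > 1) return 0;
            break;
        case INSN_OTHER:
            break;
        }
    }
    return 1;
}

/* Earliest cycle in which a non-store instruction can read the result of
   a load issued in issue_cycle, assuming an L1 cache hit.
   signed_narrow is nonzero for a signed integer load below 64 bits.
   prev_load_ready is the ready cycle of the previous load, or -1 if none. */
int load_ready_cycle(int issue_cycle, int signed_narrow, int prev_load_ready)
{
    int ready;

    assert(issue_cycle >= 0);
    ready = issue_cycle + 2 + (signed_narrow ? 1 : 0);
    if (prev_load_ready >= 0 && ready < prev_load_ready + 1)
        ready = prev_load_ready + 1;          /* one cycle after previous load */
    return ready;
}

/* Earliest cycle in which a non-store instruction can read the result of
   a floating-point addition or multiplication issued in issue_cycle. */
int fp_ready_cycle(int issue_cycle)
{
    assert(issue_cycle >= 0);
    return issue_cycle + 3;
}
```

For example, a group of one integer instruction, one load, one floating-point addition and one floating-point multiplication, in that order, passes group_fits(); the same group with the integer instruction moved to the fourth position does not.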
The UltraSPARC has a 64-bit cycle counter, the %tick register.
Unfortunately, this register is readable only by the kernel by default.
Fortunately, the Solaris 8 kernel makes the register readable by user code.
Solaris also provides a gethrtime() library routine
that returns the current time in nanoseconds,
measured from an arbitrary point in the past.
This routine is based on a system call that reads the cycle counter.
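A minimal sketch of timing a piece of code with gethrtime() might look like the following. The helper names now_ns(), min_elapsed_ns() and sample_work() are hypothetical, and the clock_gettime() fallback for non-Solaris systems is an assumption added only so that the sketch compiles elsewhere; it is not part of the notes above.

```c
#if !defined(__sun) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() in the fallback */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__sun)
#include <sys/time.h>             /* gethrtime(), hrtime_t */
#endif

/* Current time in nanoseconds from an arbitrary origin. */
int64_t now_ns(void)
{
#if defined(__sun)
    return (int64_t) gethrtime();
#else
    /* Fallback for non-Solaris systems (an assumption, not from the notes). */
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void) rc;
    return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
#endif
}

/* Smallest elapsed time, in nanoseconds, over `trials` calls of fn(arg).
   Taking the minimum reduces noise from interrupts and cache misses. */
int64_t min_elapsed_ns(void (*fn)(void *), void *arg, int trials)
{
    int64_t best = -1;
    int i;

    assert(fn != NULL && trials > 0);
    for (i = 0; i < trials; ++i) {
        int64_t start = now_ns();
        int64_t elapsed;
        fn(arg);
        elapsed = now_ns() - start;
        if (best < 0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical workload: sums the integers below *arg. */
void sample_work(void *arg)
{
    volatile unsigned long sum = 0;
    unsigned long i, n = *(unsigned long *) arg;

    for (i = 0; i < n; ++i)
        sum += i;
}
```

Calling min_elapsed_ns(sample_work, &n, 5) with an unsigned long n returns the best of five timings. The measurement includes the overhead of the two timer calls, so very short code sequences are better timed in a loop.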
Code measurement tools
Sun's INCAS is an UltraSPARC-I simulator.
Sun claims that INCAS is available for free;
unfortunately, its download page makes several unreasonable demands.
The UltraSPARC has a rdpr %ver instruction
that copies the processor version into a register.
Unfortunately, this instruction is usable only by the kernel by default.